I have a 10 GB CSV file in a storage account.
I call HTTP GET and fetch the contents in byte ranges, e.g. 0 to 500 MB in the first loop iteration, then 501 MB to 1000 MB, and so on.
The code below works fine if I comment out the DataFrame union. How can I write it differently to solve this error?
It fails exactly in the 5th iteration, I guess after processing 2 GB (500 MB x 4 iterations), which presumably crosses some heap space limit.
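import java.text.SimpleDateFormat
import java.util.Calendar

for (i <- 1 to chunkNum) {
  println(i)
  // Hiding unnecessary code to get data in ranges
  // "yyyy" is the calendar year; "YYYY" is the week-based year and can be wrong around New Year
  val dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
  val currentDate = dateFormat.format(Calendar.getInstance.getTime)
  println("BeforeResponse")
  val response = GetHttpResponse(headers, "https://mystorage.blob.core.windows.net/test/traindata.csv")
  println("AfterResponse")
  // Append this chunk as a single row; dfRestAPI must be a var declared before the loop
  dfRestAPI = dfRestAPI.union(Seq((response, currentDate)).toDF("Chunk", "InsertedDate"))
}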
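
For reference, the range-building code hidden above could look roughly like the sketch below. It is only an illustration of how I split the file, not the exact code: `ByteRanges` and `rangeHeaders` are hypothetical names, and it assumes the standard HTTP `Range` header format with inclusive start and end offsets. Each range starts exactly one byte after the previous one ends, so no bytes are skipped or fetched twice.

```scala
// Hypothetical helper (not part of the original code): builds the value of
// an HTTP Range header for every chunk of a file of known size.
object ByteRanges {
  // totalBytes: size of the blob in bytes; chunkBytes: size of each chunk in bytes.
  def rangeHeaders(totalBytes: Long, chunkBytes: Long): Seq[String] = {
    require(totalBytes > 0 && chunkBytes > 0, "sizes must be positive")
    // Chunk start offsets: 0, chunkBytes, 2 * chunkBytes, ...
    (0L until totalBytes by chunkBytes).map { start =>
      // HTTP byte ranges are inclusive, so the last byte is (next start - 1),
      // capped at the final byte of the file.
      val end = math.min(start + chunkBytes, totalBytes) - 1
      s"bytes=$start-$end"
    }
  }
}
```

For a 10 GB file and 500 MB chunks, `ByteRanges.rangeHeaders(10L * 1024 * 1024 * 1024, 500L * 1024 * 1024)` gives 21 values, the first being `bytes=0-524287999`. How each value gets into `headers` depends on how `GetHttpResponse` is written.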