I’d like to parallelise the filling of multiple columns, but I can’t figure out how to do this when the process is sequential, i.e. each row depends on what is in the previous row.
I have a list of data frames. Each data frame has a predefined distance in every row and the same starting station in the first row of each station column. All other rows in these columns have to be filled in randomly while respecting the distance criterion, so that the next station is within the distance given for that row. To do this, I sample several candidate stations from a separate data frame that stores the distances between stations, and at each successive step I check which candidates are within range. If at least two candidates are within reach, the next row is filled with the nearer of the first two; otherwise, the station from the previous row is retained. I hope this isn’t too confusing…
A simplified version of my code is given below.
My actual dataset is huge and I need to fill in 100 columns in each data frame. I would like to parallelise this step so that several columns can be filled at the same time. I looked into the doParallel package but couldn’t figure out how to make it work on a sequential process.
Thanks in advance!
data <- list(data.frame(distance = c(2,3,6,7,9,4,5),
                        station1 = c("s1",NA,NA,NA,NA,NA,NA),
                        station2 = c("s1",NA,NA,NA,NA,NA,NA),
                        station3 = c("s1",NA,NA,NA,NA,NA,NA),
                        station4 = c("s1",NA,NA,NA,NA,NA,NA)),
             data.frame(distance = c(5,3,4,10,4,4,1),
                        station1 = c("s3",NA,NA,NA,NA,NA,NA),
                        station2 = c("s3",NA,NA,NA,NA,NA,NA),
                        station3 = c("s3",NA,NA,NA,NA,NA,NA),
                        station4 = c("s3",NA,NA,NA,NA,NA,NA)))
alldist <- data.frame(from = c("s1","s1","s1","s2","s2","s2","s3","s3","s3","s4","s4","s4"),
                      to = c("s2","s3","s4","s1","s3","s4","s1","s2","s4","s1","s2","s3"),
                      distance = c(3,5,1,3,2,7,5,2,4,1,7,4))
newdata <- list()
for (j in seq_along(data)) {
  for (i in 2:nrow(data[[j]])) {
    for (k in 1:4) { # Need to parallelize this part #
      # sample 3 rows of alldist that start from the station in the previous row
      stations <- sample(rownames(alldist[which(alldist$from == data[[j]][[i-1, paste0("station", k)]]), ]), 3, replace = TRUE)
      # keep only the sampled rows whose distance is within reach
      stations.in <- stations[which(alldist[stations, "distance"] <= data[[j]]$distance[i])]
      data[[j]][i, paste0("station", k)] <- if (length(stations.in) < 2) {
        # stays at the same station
        data[[j]][[i-1, paste0("station", k)]]
      } else {
        # goes to the nearer of the first two selected stations
        alldist[stations.in[which.min(alldist[stations.in[c(1,2)], "distance"])], "to"]
      }
    }
  }
  newdata[[j]] <- data[[j]]
}
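
One possible way to restructure this, given as a rough sketch rather than a definitive answer: in the code above each station column depends only on its own previous row, never on the other columns, so the loop over columns can be moved outermost and each column filled by a separate worker while the walk down the rows stays sequential. The sketch uses the base parallel package instead of doParallel (a foreach/%dopar% loop over columns would have the same shape). The names fill_column, cols and filled are hypothetical and introduced only for illustration, and only the first data frame from the example is used.

```r
library(parallel)

# Distance table between stations, as in the example above.
alldist <- data.frame(
  from = c("s1","s1","s1","s2","s2","s2","s3","s3","s3","s4","s4","s4"),
  to   = c("s2","s3","s4","s1","s3","s4","s1","s2","s4","s1","s2","s3"),
  distance = c(3,5,1,3,2,7,5,2,4,1,7,4),
  stringsAsFactors = FALSE
)
distance <- c(2, 3, 6, 7, 9, 4, 5)  # distance column of the first data frame
n_cols <- 4                         # number of station columns to fill

# Hypothetical helper: fills ONE station column from top to bottom.
# The row loop stays sequential because row i needs row i - 1.
fill_column <- function(start, distance, alldist) {
  out <- character(length(distance))
  out[1] <- start
  for (i in seq_along(distance)[-1]) {
    # rows of alldist that start from the previous station
    candidates <- which(alldist$from == out[i - 1])
    # sample 3 of them with replacement (by position, to avoid sample() surprises)
    picked <- candidates[sample.int(length(candidates), 3, replace = TRUE)]
    # keep the sampled rows that are within reach for this step
    in_reach <- picked[alldist$distance[picked] <= distance[i]]
    if (length(in_reach) < 2) {
      out[i] <- out[i - 1]                    # stay at the same station
    } else {
      first_two <- in_reach[1:2]              # nearer of the first two
      best <- first_two[which.min(alldist$distance[first_two])]
      out[i] <- as.character(alldist$to[best])
    }
  }
  out
}

cl <- makeCluster(2)            # assumption: 2 local workers
clusterSetRNGStream(cl, 123)    # reproducible, independent random streams
# One task per column; every column starts at "s1" here.
cols <- parLapply(cl, rep("s1", n_cols), fill_column,
                  distance = distance, alldist = alldist)
stopCluster(cl)

names(cols) <- paste0("station", seq_len(n_cols))
filled <- cbind(data.frame(distance = distance),
                as.data.frame(cols, stringsAsFactors = FALSE))
```

With 100 columns per data frame the same call applies with n_cols set to 100. Whether this is actually faster depends on how long a single column takes compared with the cost of sending data to the workers, so it is worth timing on a realistic subset first.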