We start with an interval axis that is divided into bins of length 5: [0, 5), [5, 10), … (left-closed, matching the sample data below).
There is a timestamp column that contains timestamps >= 0. Using pd.cut(), the interval bin that corresponds to each timestamp is determined (e.g., “timestamp” = 3.0 -> “time_bin” = [0, 5)).
If no timestamp falls into a time bin, that bin does not show up in the interval column. Thus, there can be interval gaps in the “time_bin” column, e.g., [5, 10), [15, 20), where the interval [10, 15) is missing. Note that the timestamp column is sorted.
The goal is to obtain two columns. “connected_interval” should number the runs of connected intervals within each group (0, 1, 2, …, as in the sample below), where connected means there are no interval gaps, e.g., [0, 5), [5, 10), [10, 15). “conn_interv_len” should give, for each largest possible connected interval, the total length of that interval. The run [0, 5), [5, 10), [10, 15) would have length 15.
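For reference, here is a minimal sketch of how the “time_bin” column could be produced. The names `timestamps`, `bin_width`, `n_bins` and `edges` are illustrative and not from the original setup, and `right=False` is an assumption chosen so that the bins are left-closed like the `pd.Interval(..., closed='left')` values in the sample data.

```python
import numpy as np
import pandas as pd

# Illustrative timestamps (group A of the sample data).
timestamps = pd.Series([0.0, 3.0, 9.0, 24.2, 30.2])

bin_width = 5  # assumed bin length

# Enough edges so that the largest timestamp still falls inside a bin.
n_bins = int(timestamps.max() // bin_width) + 1
edges = np.arange(0, (n_bins + 1) * bin_width, bin_width)

# right=False yields left-closed bins [0, 5), [5, 10), ...,
# so a timestamp of exactly 0.0 is assigned to [0, 5).
time_bin = pd.cut(timestamps, bins=edges, right=False)
print(time_bin)
```

Bins that receive no timestamp (here [10, 15), [15, 20) and [25, 30)) simply do not occur among the values, which is where the gaps come from.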
The initial dataframe has columns “group_id”, “timestamp”, “time_bin”. Columns “connected_interval” & “conn_interv_len” should be computed.
Note: any solution to obtaining the length of populated connected intervals is welcome.
import pandas as pd

df = pd.DataFrame({
    "group_id": ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
    "timestamp": [0.0, 3.0, 9.0, 24.2, 30.2, 0.0, 136.51, 222.0, 237.0, 252.0],
    "time_bin": [pd.Interval(0, 5, closed='left'), pd.Interval(0, 5, closed='left'), pd.Interval(5, 10, closed='left'), pd.Interval(20, 25, closed='left'), pd.Interval(30, 35, closed='left'),
                 pd.Interval(0, 5, closed='left'), pd.Interval(135, 140, closed='left'), pd.Interval(220, 225, closed='left'), pd.Interval(235, 240, closed='left'), pd.Interval(250, 255, closed='left')],
    # Expected output columns:
    "connected_interval": [0, 0, 0, 1, 2, 0, 1, 2, 3, 4],
    "conn_interv_len": [10, 10, 10, 5, 5, 5, 5, 5, 5, 5],
})
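One possible way to compute the two columns is sketched below; it is not the only approach. It assumes the rows are sorted by timestamp within each group (as stated above) and treats two consecutive bins as connected when the later bin starts no later than the earlier one ends. The helper names `left`, `right`, `prev_right`, `new_run` and `keys` are illustrative.

```python
import pandas as pd

# Rebuild the input columns of the sample data.
lefts = [0, 0, 5, 20, 30, 0, 135, 220, 235, 250]
df = pd.DataFrame({
    "group_id": list("AAAAABBBBB"),
    "timestamp": [0.0, 3.0, 9.0, 24.2, 30.2, 0.0, 136.51, 222.0, 237.0, 252.0],
    "time_bin": [pd.Interval(l, l + 5, closed="left") for l in lefts],
})

# Left and right edges of each row's bin, aligned with df's index.
bins = pd.IntervalIndex(df["time_bin"])
left = pd.Series(bins.left, index=df.index)
right = pd.Series(bins.right, index=df.index)

# Right edge of the previous row's bin within the same group (NaN on the first row).
prev_right = right.groupby(df["group_id"]).shift()

# A gap exists when the current bin starts after the previous bin ended.
# Comparisons against NaN are False, so each group starts at run 0.
new_run = (left > prev_right).astype(int)
df["connected_interval"] = new_run.groupby(df["group_id"]).cumsum()

# Length of each run: largest right edge minus smallest left edge.
keys = [df["group_id"], df["connected_interval"]]
df["conn_interv_len"] = (
    right.groupby(keys).transform("max") - left.groupby(keys).transform("min")
)
print(df)
```

On the sample data this should reproduce the expected “connected_interval” and “conn_interv_len” columns shown above.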