I have some test data which looks like this:
test_data = {'equipment_barcode': ['0001', '0001', '0001', '0001', '0002', '0002', '0002', '0002'],
'date': ['2022-04-16', '2022-04-17', '2022-04-18', '2022-04-19', '2022-05-13', '2022-05-14', '2022-05-15', '2022-05-16'],
'downtime_minutes':[1200, 240, 1440, 600, 360, 1440, 1440, 180]}
I have converted this into a pandas dataframe:
import pandas as pd

df = pd.DataFrame(test_data)
print(df)
  equipment_barcode        date  downtime_minutes
0              0001  2022-04-16              1200
1              0001  2022-04-17               240
2              0001  2022-04-18              1440
3              0001  2022-04-19               600
4              0002  2022-05-13               360
5              0002  2022-05-14              1440
6              0002  2022-05-15              1440
7              0002  2022-05-16               180
There are 1,440 minutes in one day. This information is required to calculate the NUMBER of faults. A fault may span several consecutive days, and it is still counted as ONE fault.
The task is to find the number of faults for each equipment_barcode. For example, equipment 0001 had 2 faults: one on the 16th of April lasting 1200 minutes, and one from the 17th to the 19th of April lasting 2280 minutes (240 + 1440 + 600).
For added clarity, equipment 0002 had 1 fault, from the 13th to the 16th of May, lasting 3420 minutes (360 + 1440 + 1440 + 180).
In the end I would like a resulting dataframe that shows something like this:
  equipment_barcode  fault_number  total_downtime_minutes
0              0001             1                    1200
1              0001             2                    2280
2              0002             1                    3420
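One possible way to get there is sketched below. It rests on an assumption inferred from the two examples rather than stated as a rule: a row continues the previous fault only when the dates are consecutive AND at least one of the two days is a full 1,440-minute day. Otherwise a new fault starts. The names FULL_DAY and continues are only illustrative.

```python
import pandas as pd

test_data = {
    'equipment_barcode': ['0001', '0001', '0001', '0001', '0002', '0002', '0002', '0002'],
    'date': ['2022-04-16', '2022-04-17', '2022-04-18', '2022-04-19',
             '2022-05-13', '2022-05-14', '2022-05-15', '2022-05-16'],
    'downtime_minutes': [1200, 240, 1440, 600, 360, 1440, 1440, 180],
}
df = pd.DataFrame(test_data)
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['equipment_barcode', 'date']).reset_index(drop=True)

FULL_DAY = 1440  # minutes in one day (illustrative constant)

# Previous row's values within the same equipment (NaN/NaT on the first row).
grouped = df.groupby('equipment_barcode')
prev_minutes = grouped['downtime_minutes'].shift()
prev_date = grouped['date'].shift()

# ASSUMED rule: a row continues the previous fault if the dates are
# consecutive and the previous day or the current day is a full day.
continues = (
    (df['date'] - prev_date).eq(pd.Timedelta(days=1))
    & (prev_minutes.eq(FULL_DAY) | df['downtime_minutes'].eq(FULL_DAY))
)

# Every row that does not continue a fault starts a new one; a running
# count of those starts per equipment gives the fault number.
df['fault_number'] = (
    (~continues).astype(int).groupby(df['equipment_barcode']).cumsum()
)

result = (
    df.groupby(['equipment_barcode', 'fault_number'], as_index=False)
      .agg(total_downtime_minutes=('downtime_minutes', 'sum'))
)
print(result)
```

On the sample data this gives the three rows shown above. Note that under this assumed rule two adjacent partial days (such as 1200 followed by 240) are always treated as separate faults, which matches the 0001 example but may not hold for every real case.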