I have a dataset with transactions for 2 customers:
```
Cust_No Transaction_date amount credit_debit running_total row_num
1 5/27/2022 800 D -200 1
1 5/26/2022 300 D 600 2
1 5/22/2022 800 C 900 3
1 5/20/2022 100 C 100 4
9 5/16/2022 500 D -300 1
9 5/14/2022 300 D 200 2
9 5/6/2022 200 C 500 3
9 5/5/2022 500 D 300 4
9 5/2/2022 300 D 800 5
9 5/2/2022 500 C 1100 6
9 5/1/2022 500 C 600 7
9 5/1/2022 100 C 100 8
```
The result I am looking for is:
```
Cust_No Transaction_date amount credit_debit running_total row_num
1 5/27/2022 800 D -200 1
1 5/26/2022 300 D 600 2
1 5/22/2022 800 C 900 3
9 5/16/2022 500 D -300 1
9 5/14/2022 300 D 200 2
9 5/6/2022 200 C 500 3
9 5/5/2022 500 D 300 4
9 5/2/2022 300 D 800 5
9 5/2/2022 500 C 1100 6
```
For each customer, we note the amount of the latest transaction (row_num 1), search for the first (most recent) credit (C) transaction with the same amount, and exclude all rows after it.
In the example above, customer 9's latest transaction is a debit of 500, so we look for the most recent credit of 500 (row_num 6, dated 5/2/2022) and exclude all rows after it for customer 9.
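The rule can be stated procedurally before translating it to SQL. This is a minimal Python sketch, assuming the rows are already ordered by `cust_no` and `row_num` (1 = latest), as in the sample. Two behaviours the question does not specify are assumptions here: the latest row itself is never treated as the matching credit, and a customer with no matching credit keeps all rows. `ROWS` and `keep_until_matching_credit` are hypothetical names.

```python
from itertools import groupby

# Sample rows from the question: (cust_no, transaction_date, amount, credit_debit, row_num),
# ordered by cust_no, then row_num (row_num 1 = latest transaction).
ROWS = [
    (1, "5/27/2022", 800, "D", 1), (1, "5/26/2022", 300, "D", 2),
    (1, "5/22/2022", 800, "C", 3), (1, "5/20/2022", 100, "C", 4),
    (9, "5/16/2022", 500, "D", 1), (9, "5/14/2022", 300, "D", 2),
    (9, "5/6/2022", 200, "C", 3), (9, "5/5/2022", 500, "D", 4),
    (9, "5/2/2022", 300, "D", 5), (9, "5/2/2022", 500, "C", 6),
    (9, "5/1/2022", 500, "C", 7), (9, "5/1/2022", 100, "C", 8),
]


def keep_until_matching_credit(rows):
    """Per customer, keep rows up to and including the first credit whose
    amount equals the latest transaction's amount; drop the rest."""
    result = []
    for _, group in groupby(rows, key=lambda r: r[0]):  # r[0] is cust_no
        group = list(group)
        latest_amount = group[0][2]  # amount of row_num 1
        result.append(group[0])  # the latest row is always kept
        # Assumption: the search starts after the latest row. If no credit
        # matches, the loop never breaks and every row is kept.
        for row in group[1:]:
            result.append(row)
            if row[3] == "C" and row[2] == latest_amount:
                break  # stop right after the matching credit
    return result


for row in keep_until_matching_credit(ROWS):
    print(row)
```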
Progress so far:
- Calculated the running total using this logic:
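```
-- running balance accumulated from the oldest transaction (ascending date)
sum(case when credit_debit = 'C' then amount else -1 * amount end) over (partition by cust_no order by transaction_date rows between unbounded preceding and current row) as running_total
```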
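The running_total values shown in the sample are reproduced when the balance is accumulated from the oldest transaction forward (ascending date). Below is a sketch that runs the window function through Python's built-in `sqlite3` module (window functions need SQLite 3.25 or later). The table name `txn`, the ISO-formatted dates, and breaking same-date ties with `row_num DESC` are assumptions made for the sketch.

```python
import sqlite3

# Question's sample data as (cust_no, transaction_date, amount, credit_debit, row_num).
# Assumption: dates rewritten as ISO strings so they sort correctly as text in SQLite.
SAMPLE = [
    (1, "2022-05-27", 800, "D", 1), (1, "2022-05-26", 300, "D", 2),
    (1, "2022-05-22", 800, "C", 3), (1, "2022-05-20", 100, "C", 4),
    (9, "2022-05-16", 500, "D", 1), (9, "2022-05-14", 300, "D", 2),
    (9, "2022-05-06", 200, "C", 3), (9, "2022-05-05", 500, "D", 4),
    (9, "2022-05-02", 300, "D", 5), (9, "2022-05-02", 500, "C", 6),
    (9, "2022-05-01", 500, "C", 7), (9, "2022-05-01", 100, "C", 8),
]

# Oldest-first accumulation. row_num DESC breaks ties between rows on the same
# date (assumption); the explicit ROWS frame accumulates one row at a time.
RUNNING_TOTAL_SQL = """
SELECT cust_no, row_num,
       SUM(CASE WHEN credit_debit = 'C' THEN amount ELSE -1 * amount END)
           OVER (PARTITION BY cust_no
                 ORDER BY transaction_date, row_num DESC
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM txn
ORDER BY cust_no, row_num
"""


def running_totals():
    """Load the sample into an in-memory table and return (cust_no, row_num, running_total)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE txn (cust_no INTEGER, transaction_date TEXT, "
                 "amount INTEGER, credit_debit TEXT, row_num INTEGER)")
    conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?, ?)", SAMPLE)
    rows = conn.execute(RUNNING_TOTAL_SQL).fetchall()
    conn.close()
    return rows


for cust_no, row_num, total in running_totals():
    print(cust_no, row_num, total)
```

If the ORDER BY used only `transaction_date`, the default RANGE frame would treat the two 5/1/2022 rows as peers and give both the total 600 instead of 100 and 600, which is why the sketch adds a tiebreaker and an explicit ROWS frame.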
I also tried getting the data with `lead` at offsets 1, 2, 3, 4 and 5, but this is not efficient, and there could be any number of rows before the first credit with the same amount as the first row. For example, for offset 1:
```
case when lead(amount, 1) over(partition by cust_no order by transaction_date desc) = amount then amount else null end as lead1
```
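Instead of one `lead` column per offset, the cutoff can be found with two window passes: `FIRST_VALUE` picks each customer's latest amount, and a windowed `MIN` finds the smallest `row_num` that is a credit of that amount; rows past that `row_num` are dropped. This is a sketch run through Python's `sqlite3` (SQLite 3.25+), assuming the `row_num` column from the question (1 = latest), that the latest row itself is not a candidate match, and that a customer with no matching credit keeps all rows. The names `txn`, `with_latest`, `with_stop` and `stop_row` are hypothetical. `FIRST_VALUE` and `MIN(...) OVER` are standard window functions, but the query is only exercised on SQLite here.

```python
import sqlite3

# Question's sample data; ISO dates are an assumption of this sketch.
SAMPLE = [
    (1, "2022-05-27", 800, "D", 1), (1, "2022-05-26", 300, "D", 2),
    (1, "2022-05-22", 800, "C", 3), (1, "2022-05-20", 100, "C", 4),
    (9, "2022-05-16", 500, "D", 1), (9, "2022-05-14", 300, "D", 2),
    (9, "2022-05-06", 200, "C", 3), (9, "2022-05-05", 500, "D", 4),
    (9, "2022-05-02", 300, "D", 5), (9, "2022-05-02", 500, "C", 6),
    (9, "2022-05-01", 500, "C", 7), (9, "2022-05-01", 100, "C", 8),
]

FILTER_SQL = """
WITH with_latest AS (
    -- amount of row_num 1, repeated on every row of the customer
    SELECT t.*,
           FIRST_VALUE(amount) OVER (PARTITION BY cust_no ORDER BY row_num) AS latest_amount
    FROM txn t
),
with_stop AS (
    -- smallest row_num (after the latest row) that is a credit of that amount;
    -- NULL when the customer has no such credit
    SELECT w.*,
           MIN(CASE WHEN credit_debit = 'C' AND amount = latest_amount AND row_num > 1
                    THEN row_num END) OVER (PARTITION BY cust_no) AS stop_row
    FROM with_latest w
)
SELECT cust_no, transaction_date, amount, credit_debit, row_num
FROM with_stop
WHERE stop_row IS NULL OR row_num <= stop_row
ORDER BY cust_no, row_num
"""


def filtered_rows():
    """Load the sample into an in-memory table and return the rows that survive the cutoff."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE txn (cust_no INTEGER, transaction_date TEXT, "
                 "amount INTEGER, credit_debit TEXT, row_num INTEGER)")
    conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?, ?)", SAMPLE)
    rows = conn.execute(FILTER_SQL).fetchall()
    conn.close()
    return rows


for row in filtered_rows():
    print(row)
```

Because the cutoff is computed once per customer, the query handles any number of rows between the latest transaction and the matching credit, which is the limitation of the fixed `lead` offsets.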