I am converting code from SQL to pandas. I have 80 custom elements in my dataset/DataFrame, each of which requires custom logic. In SQL, I use multiple CASE statements within a single SELECT, like this:
SELECT x, y, z, (CASE WHEN am_type1 IS NULL THEN 0 WHEN am_type2 IS NOT NULL THEN 0 ELSE 1 END) AS am_inv, next CASE statement, next CASE statement, etc...
Now in pandas, I have constructed the same logic using a series of np.where calls, like this:
[x, y, z have already been added to df] df['am_inv'] = np.where(df['am_type1'].notnull() & df['am_type2'].isnull(), 1, 0), then the next np.where, the next np.where, etc...
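To make the pattern concrete, here is a minimal runnable version of one such assignment on toy data (column names taken from my example above; the sample values are made up):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; values are invented for illustration.
df = pd.DataFrame({
    "x": [1, 2, 3],
    "am_type1": ["a", None, "b"],
    "am_type2": [None, None, "c"],
})

# Mirrors the SQL: 1 only when am_type1 is not null AND am_type2 is null.
df["am_inv"] = np.where(df["am_type1"].notnull() & df["am_type2"].isnull(), 1, 0)
print(df["am_inv"].tolist())  # [1, 0, 0]
```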
After writing all 80 np.where statements, my output from Pandas matches my output from SQL exactly, so no problems there. I was starting to think I had been successful.
BUT, then I saw this warning:
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
So my question is: how do I use pd.concat with 80 consecutive np.where statements? Or should I keep my code the way it is and just use copy() to create a de-fragmented copy of the DataFrame? I have tried searching this forum for an answer but did not find anything, so I decided to post a new question. I appreciate your time in responding.
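For context, here is roughly what I understand the warning to be suggesting: compute all the derived columns first and attach them in a single pd.concat, rather than assigning 80 columns one by one. This is only a sketch with placeholder column names, not my actual code:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "am_type1": ["a", None, "b"],
    "am_type2": [None, None, "c"],
})

# Build every derived column up front instead of inserting one at a time.
new_cols = {
    "am_inv": np.where(df["am_type1"].notnull() & df["am_type2"].isnull(), 1, 0),
    # ... the other 79 np.where results would go here ...
}

# Attach all new columns in a single concat along the column axis.
df = pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)
```

Is this the intended pattern, or is copy() at the end good enough?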