I am converting code from SQL to Pandas. My dataset/dataframe has 80 custom elements, each requiring its own logic. In SQL, I use multiple CASE statements within a single SELECT, like this:
SELECT x, y, z,
       (CASE WHEN am_type1 IS NULL THEN 0
             WHEN am_type2 IS NOT NULL THEN 0
             ELSE 1 END) AS am_inv,
       next case statement,
       next case statement,
       etc...
Now in Pandas, I have constructed the same logic using a series of np.where statements like this:
[x, y, z have already been added to df]
df['am_inv'] = np.where(df['am_type1'].notnull() & df['am_type2'].isnull(),
                        1,
                        0)
next np.where
next np.where
etc...
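To sanity-check the first assignment on its own, here is a minimal runnable sketch on a toy frame (the column values are made up for illustration): am_inv should be 1 only when am_type1 is populated and am_type2 is empty, mirroring the CASE expression above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'am_type1': [None, 'a', 'b', None],
    'am_type2': [None, None, 'c', 'd'],
})

# 1 only when am_type1 IS NOT NULL and am_type2 IS NULL,
# i.e. the ELSE 1 branch of the SQL CASE; everything else gets 0.
df['am_inv'] = np.where(df['am_type1'].notnull() & df['am_type2'].isnull(),
                        1,
                        0)
print(df['am_inv'].tolist())  # [0, 1, 0, 0]
```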
After writing all 80 np.where statements, my output from Pandas matches my output from SQL exactly, so no problems there. I was starting to think I had been successful.
BUT, then I saw this warning:
DataFrame is highly fragmented...poor performance...Consider using pd.concat instead...or use copy().
So my question is: how do I use pd.concat across 80 consecutive np.where statements? Or do I keep my code the way it is and just call copy() to create a de-fragmented copy of the dataframe? I have tried searching this forum for an answer but did not find anything, so I decided to post. I appreciate your time in responding.
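For context, one common shape of the pd.concat approach the warning hints at, sketched here on a toy frame with illustrative names: compute each np.where result into a plain dict first, then attach all new columns in a single concat instead of 80 separate column insertions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'am_type1': [None, 'a', 'b'],
    'am_type2': [None, None, 'c'],
})

# Build every derived column in a dict first...
new_cols = {
    'am_inv': np.where(df['am_type1'].notnull() & df['am_type2'].isnull(), 1, 0),
    # 'next_col': np.where(...),  # ...and so on for the remaining columns
}

# ...then attach them all at once, so the frame is allocated in one
# step rather than fragmented by many individual inserts.
df = pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)
print(df['am_inv'].tolist())  # [0, 1, 0]
```

df.assign(**new_cols) is another single-step way to attach the columns; the dict-then-concat form is shown because pd.concat is what the warning itself suggests.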