StackOverflow Point

Asked by Alex Hales (Teacher) on June 9, 2022

python – How to resolve Pandas performance warning “highly fragmented” after using many custom np.where statements?

I am converting code from SQL to Pandas. I have 80 custom columns in my dataset/dataframe, each requiring its own logic. In SQL, I use multiple CASE statements within a single SELECT, like this:

SELECT x, y, z,
    (CASE WHEN am_type1 IS NULL THEN 0
          WHEN am_type2 IS NOT NULL THEN 0
          ELSE 1 END) AS am_inv,
    next CASE statement,
    next CASE statement,
    etc...

Now in Pandas, I have constructed the same logic using a series of np.where statements like this:

# x, y, z have already been added to df
df['am_inv'] = np.where(df['am_type1'].notnull() & df['am_type2'].isnull(),
                        1,
                        0)
# next np.where
# next np.where
# etc...
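For reference, here is a minimal runnable version of one such derived column. The data and the column names `am_type1`/`am_type2` are stand-ins for the real dataset, not the asker's actual data:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataframe (real column names, made-up data)
df = pd.DataFrame({
    "am_type1": ["a", None, "b", None],
    "am_type2": [None, None, "c", "d"],
})

# am_inv is 1 when am_type1 is present and am_type2 is absent, else 0,
# matching the SQL CASE logic above
df["am_inv"] = np.where(df["am_type1"].notnull() & df["am_type2"].isnull(), 1, 0)
```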

After writing all 80 np.where statements, my output from Pandas matches my output from SQL exactly, so no problems there. I was starting to think I had been successful.

BUT, then I saw this warning:

DataFrame is highly fragmented...poor performance...Consider using pd.concat 
instead...or use copy().

So my question is: how do I use pd.concat on 80 consecutive np.where statements? Or do I keep my code the way it is and just use copy() to create a defragmented copy of the DataFrame? I have tried searching this forum but did not find anything, so I decided to post a new question. I appreciate your time in responding.
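One way the pd.concat suggestion in the warning can be read is: compute all 80 np.where results into plain arrays first, then attach them in a single concat instead of 80 separate column inserts. A minimal sketch of that pattern, using made-up data and the same stand-in column names as above:

```python
import numpy as np
import pandas as pd

# Toy stand-in dataframe (made-up data)
df = pd.DataFrame({
    "am_type1": ["a", None],
    "am_type2": [None, "d"],
})

# Collect each derived column in a dict instead of assigning it to df
# one at a time (each assignment is what fragments the frame)
new_cols = {}
new_cols["am_inv"] = np.where(df["am_type1"].notnull() & df["am_type2"].isnull(), 1, 0)
# ... the other 79 np.where results go into new_cols the same way ...

# Attach everything in one operation
df = pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)
```

Alternatively, keeping the 80 assignments and running `df = df.copy()` once afterward also produces a defragmented frame, at the cost of one extra copy.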

© 2022 Stackoverflow Point. All Rights Reserved.
