Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

StackOverflow

StackOverflow Logo StackOverflow Logo

StackOverflow Navigation

Search
Ask A Question

Mobile menu

Close
Ask A Question
  • Home
  • Add group
  • Feed
  • User Profile
  • Communities
  • Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
Home/ Questions/Q 664
Next

StackOverflow Latest Questions

Saralyn
  • 0
  • 0
SaralynBegginer
Asked: April 4, 20252025-04-04T11:16:04+00:00 2025-04-04T11:16:04+00:00In: PHP

php – Prevent backtracking with regex when web scraping this simple pattern

  • 0
  • 0
php – Prevent backtracking with regex when web scraping this simple pattern

I have a web scrapper that crawls pages with an average size of 3MB, full of

, ... elements with all sorts of attributes. From these pages, I usually have to grab just a few informations. For this, I am usually using this type of regex below (I know it will fail if the devs of the page in the future change id="xxx" to id = "xxx" but let's not make this regex more complex adding * in many places for now):

id="email"[^>]*?>(.*?)

.*?id="phone"[^>]*?>(.*?)

The problem is that mostly (.*?) makes backtrack a lot. After studying for more than a week, I came to the conclusion that the best approach to avoid the backtracking (in my cases) is using atomic group ?> (which once matched what is inside it, does not backtrack) and possessive quantifier.

I succesfully managed using atomic group and possessive quantifiers in lots of my regex and it indeed helped a ton! But in the case above, I still cant find how possessive can help. I could, for example, use possessive quantifier to change from:

id="email"[^>]*?>(.*?)
0
  • 0 0 Answers
  • 47 Views
  • 0 Followers
  • 0
Answer
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

Sidebar

Ask A Question
  • Popular
  • Answers
  • W3spoint99

    What is Physics? Definition, History, Importance, Scope (Class 11)

    • 1 Answer
  • W3spoint99

    The Living World – Introduction, Classification, Characteristics, FAQs (Class 11 ...

    • 1 Answer
  • W3spoint99

    Explain - Biological Classification (Class 11 - Biology)

    • 1 Answer
  • Saralyn
    Saralyn added an answer When Humans look at their childhood pictures, the first thing… January 17, 2025 at 3:25 pm
  • Saralyn
    Saralyn added an answer Previously, length was measured using units such as the length… January 17, 2025 at 3:25 pm
  • Saralyn
    Saralyn added an answer Measurement forms the fundamental principle to various other branches of… January 17, 2025 at 3:25 pm

Related Questions

  • Reading fancy apostrophe PHP [duplicate]

    • 0 Answers
  • Unable to send mail via PHPMailer [SMTP->Error: Password not accepted ...

    • 0 Answers
  • Concerns when migrating from PHP 5.6 to 8.4 [closed]

    • 0 Answers
  • Laravel Auth::attempt() error: "Unknown column 'password'" when using a custom ...

    • 0 Answers
  • Core PHP cURL - header origin pass null value

    • 0 Answers

Trending Tags

biology class 11 forces how physics relates to other sciences interdisciplinary science learn mathematics math sets tutorial null sets physics physics and astronomy physics and biology physics and chemistry physics applications science science connections science education sets in mathematics set theory basics types of sets types of sets explained

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help

Footer

  • About US
  • Privacy Policy
  • Questions
  • Recent Questions
  • Web Stories

© 2025 WikiQuora.Com. All Rights Reserved

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.