Regular expressions sit at the intersection of search engine optimization and programming. If you look for information about RegExp, you’re more likely to find it on GitHub than in a niche SEO publication. That doesn’t mean you can’t use regular expressions in SEO or even content marketing. Web scraping with RegExp can make your work easier by automating many routine tasks. In this article, we’ll show you several examples of how you can use it in your daily routine.

1. RegExp Scraping Setup

In this section, we’ll explain how to extract data with Netpeak Spider. It’s a crawler that helps you run a fast, comprehensive technical audit of an entire website. It can also scrape data using RegExp, CSS selectors, and XPath. Scraping takes several simple steps:

 

  1. Launch Netpeak Spider.
  2. Paste the website address into the ‘Initial URL’ field.
  3. Set up scraping in the corresponding menu.


    Choose the scraping mode and the search field: ‘RegExp’ and ‘Inner HTML content’. Then paste the regular expression you want to use for data scraping. Click the ‘OK’ button to return to the main window.

  4. To avoid overloading the program, turn off all parameters except ‘Scraping’ in the sidebar.
  5. Start scraping and wait until it finishes.
  6. Go to the sidebar and open the ‘Reports’ → ‘Scraping’ tab: there will be a summary of all your searches. Click on ‘Found’ → ‘Show detected’ to open a table with extracted data.

  7. Save the extracted data with the ‘Export’ button.
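If you want to see what this workflow boils down to, the core of it can be sketched in a few lines of Python. This is only an illustration of the idea (apply one rule to the HTML of every page, collect the matches in a table, export them), not a description of how Netpeak Spider works internally. The function names `scrape` and `export_csv`, the sample pages, and the `\d+ USD` rule are all made up for the example, and downloading the pages is left out.

```python
import csv
import io
import re


def scrape(pages, pattern):
    """Apply one regular expression to the HTML of every page.

    `pages` maps a URL to its HTML source (already downloaded).
    Returns a list of (url, match) rows, one row per match.
    """
    regex = re.compile(pattern)
    rows = []
    for url, html in pages.items():
        for match in regex.finditer(html):
            rows.append((url, match.group(0)))
    return rows


def export_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url", "found"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical pages; in practice the HTML would come from a crawler.
pages = {
    "https://example.com/": "<p>Price: 25 USD</p><p>Sale: 19 USD</p>",
    "https://example.com/about": "<p>No prices here</p>",
}

rows = scrape(pages, r"\d+ USD")
print(export_csv(rows))
```

Pages without matches simply produce no rows, which is roughly what an empty ‘Found’ column would tell you.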

2. Examples of RegExp Data Scraping

2.1. Collecting Reviews

While building a content marketing strategy, you need a deeper understanding of the product’s key strengths and weaknesses so that you can present them in the right way. One of the main sources of such information is customer reviews. You can even use reviews of similar competitor products.

 

In most cases, you can extract reviews using RegExp, whether the source is a big marketplace, a review platform, or any other type of website. You only need to find a suitable regular expression. For example, you can extract reviews from Google Play using the following rule:

 

,[1-5],null,"[^"]*"

 

If you want to scrape reviews for an Android app, don’t forget to add the GET parameter showAllReviews=true and set the language with the hl parameter in the URL. As a result, a link for data scraping should look like this:

 

https://play.google.com/store/apps/details?id=com.playrix.fishdomdd.gplay&showAllReviews=true&hl=en
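When you have many apps to check, typing these links by hand gets tedious. Here is a minimal sketch of building them in Python with the standard `urllib.parse.urlencode` function. The helper name `review_url` is hypothetical; the parameter names are the ones mentioned above.

```python
from urllib.parse import urlencode


def review_url(app_id, language="en"):
    """Build a Google Play app URL with the review and language parameters."""
    query = urlencode({
        "id": app_id,
        "showAllReviews": "true",
        "hl": language,
    })
    return "https://play.google.com/store/apps/details?" + query


print(review_url("com.playrix.fishdomdd.gplay"))
```

The resulting list of links can then be pasted into the crawler as the URLs to scrape.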

 

When scraping is complete, you’ll get a table with the 40 most relevant reviews.

The same regular expression also captures the user rating: it is the [1-5] part of each match. If you don’t need some of the data, you can remove it from the table with any spreadsheet editor (Google Sheets, Microsoft Excel, etc.).
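To split each match into a rating and a review text, you can wrap the two parts of the rule in capture groups. The sketch below uses Python’s `re` module and types the rule with plain straight quotes. The sample string is invented and only imitates the shape the rule expects; it is not real Google Play markup, which may differ or change over time.

```python
import re

# Same rule as in the article, with capture groups added around
# the rating and the review text.
REVIEW_RE = re.compile(r',([1-5]),null,"([^"]*)"')

# Invented fragment, for illustration only.
sample = '["u1",5,null,"Great game, very relaxing"],["u2",2,null,"Too many ads"]'


def extract_reviews(html):
    """Return (rating, text) pairs for every match in the page source."""
    return [(int(rating), text) for rating, text in REVIEW_RE.findall(html)]


for rating, text in extract_reviews(sample):
    print(rating, text)
```

With the rating in its own column, you can filter out, say, only the one-star and two-star reviews to study the product’s weaknesses.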

 

By the way, you can find a G2 Crowd reviews scraping case here.

2.2. Extracting Emails

Imagine you have a list of potential customers or websites where you want to publish an article and place a link. Collecting contact information manually can take too much time. In such a case, automatic data scraping is a logical solution.

Use the following regular expression to collect emails:

 

[a-zA-Z0-9-_.]+@[a-zA-Z0-9-.]+

or

[a-zA-Z0-9-_.]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+

 

You can also write a similar regular expression for phone numbers.
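Here is a short Python sketch of both ideas. The email rule is the second one from above, unchanged. The phone rule is our own hypothetical example (an optional plus sign followed by digits that may be separated by spaces, dots, dashes, or parentheses) and will need adjusting to the formats used in your region. The sample text is invented.

```python
import re

# Second email rule from the article, unchanged.
EMAIL_RE = re.compile(r"[a-zA-Z0-9-_.]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")

# Hypothetical phone rule (not from the article): optional "+", then at
# least nine characters that start and end with a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Invented contact page text.
sample = "Write to editor@example.com or sales@shop.example.org, call +1 (555) 010-9999."

emails = sorted(set(EMAIL_RE.findall(sample)))  # deduplicate and sort
phones = PHONE_RE.findall(sample)

print(emails)
print(phones)
```

Keep in mind that the email rule allows dots at the end of a match, so an address followed directly by a period at the end of a sentence will be captured with that period attached. It’s worth cleaning the results before use.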

2.3. Spelling Check

Even if you hired an experienced copywriter, there’s always a small risk of spelling mistakes. Use the following regular expression to check an entire website for known misspellings of certain words:

 

(?i)ekspresso|skool|cappuchino

 

By analogy with the example above, you can build your own regular expression from the spelling mistakes most common in your niche.
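As a rough illustration, this is how the same check could look in Python. The `(?i)` prefix makes the search case-insensitive, so ‘Cappuchino’ and ‘cappuchino’ are both caught. The URLs and page texts are invented, and `find_typos` is a hypothetical helper name.

```python
import re

# Rule from the article: (?i) makes the search case-insensitive.
TYPO_RE = re.compile(r"(?i)ekspresso|skool|cappuchino")

# Invented page texts keyed by hypothetical URLs.
pages = {
    "https://example.com/menu": "Try our Cappuchino and double Ekspresso.",
    "https://example.com/about": "We roast espresso beans every morning.",
}


def find_typos(pages):
    """Map each URL to the misspellings found on it, skipping clean pages."""
    report = {}
    for url, text in pages.items():
        found = TYPO_RE.findall(text)
        if found:
            report[url] = found
    return report


print(find_typos(pages))
```

Only pages with at least one hit end up in the report, so the output doubles as a to-do list for your editor.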

2.4. Searching for Words with Incorrect Capitalization

 

It is common for companies to change the spelling of their names after a rebranding. If the brand has existed for years, its website can contain a lot of pages with the brand name spelled the old way. A regular expression that lists the spellings you want to check will find them:

 

(BrandName|brandname|Brandname)

 

This rule will also be useful for those who still don’t know how to write kHz or kB.

 

Before you launch scraping, don’t forget to turn off the ‘Ignore case’ option in ‘Settings’ → ‘Scraping’.
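A small Python sketch shows why case sensitivity matters here. Python’s `re` module is case-sensitive by default, which mirrors scraping with ‘Ignore case’ switched off. We assume, purely for the example, that ‘BrandName’ is the current spelling and the other two are outdated; the page text is invented.

```python
import re
from collections import Counter

# Rule from the article; the search is case-sensitive by default.
BRAND_RE = re.compile(r"(BrandName|brandname|Brandname)")

# Assumption for this example: 'BrandName' is the current spelling.
CURRENT = "BrandName"

# Invented page text.
text = "Welcome to Brandname! BrandName tools: brandname audit, Brandname crawler."

# Count every spelling, then keep only the ones that differ from the current one.
counts = Counter(BRAND_RE.findall(text))
outdated = {spelling: n for spelling, n in counts.items() if spelling != CURRENT}

print(outdated)
```

If the search ignored case, all three spellings would count as the same match and the outdated ones could not be told apart.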

2.5. Searching for Brand Mentions

This use of RegExp scraping may interest link builders and outbound marketers who work with brand mentions on external platforms.

 

You can use it to find out:

 

  • which pages from your backlink list don’t contain brand mentions, so you know where to place them,
  • if your brand name has been mentioned correctly.

 

The search can be performed as described in section 2.3.
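The first question from the list above (which pages lack a mention) can be sketched in Python like this. The brand name, the backlink URLs, and the page texts are all invented, and `split_by_mention` is a hypothetical helper name.

```python
import re

# Hypothetical brand name; (?i) ignores case, as in the rule from section 2.3.
MENTION_RE = re.compile(r"(?i)brandname")

# Invented backlink list: URL -> page text.
backlinks = {
    "https://blog.example.com/tools": "We tested BrandName and liked it.",
    "https://news.example.org/roundup": "A roundup of crawlers with a bare link.",
    "https://forum.example.net/t/42": "Anyone tried brandname yet?",
}


def split_by_mention(pages):
    """Return (with_mention, without_mention) lists of URLs."""
    with_mention, without_mention = [], []
    for url, text in pages.items():
        target = with_mention if MENTION_RE.search(text) else without_mention
        target.append(url)
    return with_mention, without_mention


mentioned, missing = split_by_mention(backlinks)
print("Missing a brand mention:", missing)
```

To answer the second question (whether the name is written correctly), run a case-sensitive rule over the pages in the first list.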

In a Nutshell

Web scraping with regular expressions helps simplify and automate many SEO and marketing tasks, including:

 

  • collecting user reviews
  • extracting emails
  • checking the spelling of certain words
  • searching for brand mentions

 

The list of tasks you can solve with RegExp data scraping doesn’t end with those described above: it’s limited only by your needs, your creativity, and whether you have a teammate who understands RegExp syntax 🙂

 

Do you use web scraping with regular expressions in your work? If so, for what purpose? Share your experience in the comments below. We’ll be glad to add more interesting use cases to our article.
