Hands-On Website-Extraction

Exemplary Extraction of a Job Board Using the Adcolabs-Scraper

Ahmet Taha Özdemir on 16 December 2024

Modern browser automation tools have made web content extraction easier than ever, because they can load and capture dynamic web content automatically. In this post, we walk through the Adcolabs Scraper, which offers screenshots, extractors, and proxies in addition to basic browser automation.

Setup

A simple Adcolabs Scraper account includes 10 scrapes with access to all features. Newsletter subscribers receive 20 scrapes. For additional support or more scrapes, you can contact support directly in the app. Because the Scraper is a REST service, no external SDK needs to be integrated.

The Extraction

The setup is straightforward: navigate through the user interface to the “+ Extraction” menu.

hands-on-page-extraction-example-1

On the configuration page, set the desired parameters. In our example, we are extracting a job board and would like the following information:

  1. Screenshot of the viewport
  2. Screenshot of the full page
  3. Complete page source code
  4. Extraction of paths to the job listings

Configuration

  1. Enter the page URL.
  2. Under Browser Workflow (middle column, Configuration), select the Screenshot option and click Add.
  3. Also under Browser Workflow, select the Full Screenshot option and click Add.
  4. Under Selectors And Extractors (left column, Create Extraction), select the Regex option, enter the regular expression href="(/is-ilani/[^\"]*)" and click Add.
  5. The Full Extraction option is activated by default and always extracts the complete page source code.
  6. Start extraction by clicking Create.
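To see what the regular expression from step 4 captures, it can be tried locally first. The following is a minimal sketch in Python; the HTML fragment is made up for illustration and does not come from the actual job board.

```python
import re

# The pattern from step 4. The backslash before the quote inside the
# character class is a harmless escape; it is kept so the pattern matches
# the one entered in the UI verbatim.
JOB_LINK = re.compile(r'href="(/is-ilani/[^\"]*)"')

# Made-up HTML fragment standing in for the scraped page source.
sample_html = '''
<a href="/is-ilani/backend-developer-123">Backend Developer</a>
<a href="/hakkimizda">About</a>
<a href="/is-ilani/data-engineer-456">Data Engineer</a>
'''

# findall returns only the capture group, i.e. the path without href="...".
paths = JOB_LINK.findall(sample_html)
print(paths)
```

Only links whose path starts with /is-ilani/ are captured; the capture group ends at the closing quote of the attribute.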

hands-on-page-extraction-example-2

Results

An extraction takes between 30 and 60 seconds, depending on complexity. The result can then be accessed on the Scraper overview page.

Results can be viewed using the Inspect button.

hands-on-page-extraction-example-3

This is what the overview page of the extraction looks like.

hands-on-page-extraction-example-4

Two screenshots, the complete page source code, and all paths to the available job listings were extracted.

Extraction via API

The same steps can also be performed via the API. The documentation is available at: https://docs.scraper.adcolabs.com/

This guide includes an example of such an extraction. We start the scrape with curl and analyze the result at the end with jq. The corresponding curl command can be copied directly from the extraction page.

hands-on-page-extraction-example-5 hands-on-page-extraction-example-6

Paste the command into a terminal and run it.

hands-on-page-extraction-example-7

Progress can be retrieved through a GET request.

hands-on-page-extraction-example-8
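Because an extraction takes some time, the GET request usually has to be repeated until the scrape has finished. The following is a minimal polling sketch in Python. It does not call the real API: fetch_status is a placeholder for whatever function performs the GET request, and the "state" field and its "done" value are assumptions, not taken from the Adcolabs documentation.

```python
import time
from typing import Callable


def wait_for_scrape(fetch_status: Callable[[], dict],
                    interval: float = 5.0,
                    timeout: float = 120.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status() until it reports completion or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        # "state" and "done" are hypothetical names; replace them with the
        # field and value that the real API response contains.
        if status.get("state") == "done":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("scrape did not finish in time")
        sleep(interval)


# Stand-in for the real GET request: reports completion on the third call.
responses = iter([{"state": "running"}, {"state": "running"}, {"state": "done"}])
result = wait_for_scrape(lambda: next(responses), interval=0, sleep=lambda s: None)
print(result)
```

The default timeout of 120 seconds leaves some headroom over the 30 to 60 seconds an extraction typically takes.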

Once the scrape is complete, the output can be passed to jq.

hands-on-page-extraction-example-9

From here, the extracted data can be processed further with standard tools.
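As one example of such a processing step, the extracted job paths are relative to the site and usually need to be turned into full URLs before they can be fetched or stored. The sketch below assumes the paths have already been read from the result into a Python list; the base URL and the paths are placeholders.

```python
from urllib.parse import urljoin


def to_absolute_urls(base_url: str, paths: list[str]) -> list[str]:
    """Resolve site-relative paths against base_url, dropping duplicates."""
    seen = set()
    urls = []
    for path in paths:
        url = urljoin(base_url, path)
        if url not in seen:  # keep first occurrence, preserve order
            seen.add(url)
            urls.append(url)
    return urls


# Placeholder base URL and paths; a listing page often links the same job twice.
job_urls = to_absolute_urls(
    "https://jobs.example.com",
    [
        "/is-ilani/backend-developer-123",
        "/is-ilani/data-engineer-456",
        "/is-ilani/backend-developer-123",
    ],
)
print(job_urls)
```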

Further Use Cases

Beyond the example above, the following use cases are conceivable:

  • Scraping of company pages (news, jobs, documents)
  • Scraping of shops (prices, availability, reviews)
  • Scraping of car and real estate marketplaces (prices, availability, dealer information)
  • Scraping of service platforms (reviews, appointment information)

Conclusion

The example presented shows how the Adcolabs Scraper can be used efficiently. Many more complex and challenging applications are possible.

If you are interested in a demonstration or want your data delivered ready to use, please feel free to contact us. An email or a phone call is enough, and we will get back to you promptly.