

Website Screenshots Made Easy via API

Simplify website screenshots: A guide using the Adcolabs Scraper

Ahmet Taha Özdemir on 30 December 2024

Screenshots of websites can be created in various ways. The fastest and easiest method often involves using the built-in tools of your operating system. Alternatively, many browsers offer corresponding tools. But what if you need to take screenshots regularly? And perhaps not only of the visible area but also of entire pages—and in larger quantities?

For such requirements, the Adcolabs Scraper is an excellent choice. In this article, a short example shows how easy it is to use.

Getting Started

All you need to get started is an account and an API key for the Adcolabs Scraper. You can find a step-by-step guide in our blog post on website extraction.

Creating Screenshots

Here is an example of a curl command to take a simple screenshot:

curl -X POST https://api.scraper.adcolabs.de/v1/extractions \
     -H "Content-Type: application/json" \
     -H "API-KEY: <API-KEY>" \
     -d '{"url":"https://adcolabs.de","agent":{"resolution":{"width":"1903","height":"927"},"options":{"headers":[]}},"connectivity":{"proxy":"europe"},"workflow":[{"action":"screenshot"}],"selector":[],"webhook":{"enabled":false,"headers":[]}}'

This command initiates what is called an extraction. The output of the command provides an ID, which we will need for the next steps:

{"id":"67683c348c067b2886aa1d43","url":"https://adcolabs.de","status":"CREATED","created":"2024-12-22T16:20:04.812794718","webhook":{"enabled":false,"headers":[],"successful":false,"retry":0},"agent":{"resolution":{"width":1903,"height":927},"options":{"headers":[]}},"workflow":[{"action":"screenshot","value":0}],"extractions":[],"connectivity":{"proxy":"europe"}}

You can use the ID to retrieve the result:

curl -X GET https://api.scraper.adcolabs.de/v1/extractions/67683c348c067b2886aa1d43 \
     -H "Content-Type: application/json" \
     -H "API-KEY: <API-KEY>"

Since processing takes a moment, the status will initially show as WAITING:

{"id":"67683c348c067b2886aa1d43","url":"https://adcolabs.de","timeStamp":"2024-12-22T16:20:04.812","agent":{"resolution":{"width":1903,"height":927},"options":{"headers":[]}},"selector":[],"status":"WAITING","workflow":[{"action":"screenshot","value":0}],"webhook":{"enabled":false,"headers":[],"successful":false,"retry":0},"connectivity":{"proxy":"europe"}}

After a few seconds, the status changes to DONE. Additionally, a list of artifact URLs containing the results becomes available:

{"id":"67683c348c067b2886aa1d43","url":"https://adcolabs.de","timeStamp":"2024-12-22T16:20:04.812","agent":{"resolution":{"width":1903,"height":927},"options":{"headers":[]}},"selector":[],"status":"DONE","workflow":[{"action":"screenshot","value":0}],"webhook":{"enabled":false,"headers":[],"successful":false,"retry":0},"artifacts":["https://artifacts.s3c.adcolabs.de/2024-12-22/67683c348c067b2886aa1d43/49d78836a161836a/67683c348c067b2886aa1d43_1734884442.png"],"connectivity":{"proxy":"europe"},"extractionsResponses":[],"outputAgents":{"statusCode":0,"headers":{"accept-ranges":"bytes","content-length":"21067","content-type":"text/html","date":"Sun, 22 Dec 2024 16:20:38 GMT","etag":"\"6767f0be-524b\"","last-modified":"Sun, 22 Dec 2024 10:58:06 GMT","strict-transport-security":"max-age=15724800; includeSubDomains"}}}

With the jq tool, you can list artifacts more clearly:

curl -X GET https://api.scraper.adcolabs.de/v1/extractions/67683c348c067b2886aa1d43 \
     -H "Content-Type: application/json" \
     -H "API-KEY: <API-KEY>" \
     | jq .artifacts

The output will then only display the artifact URLs:

[
  "https://artifacts.s3c.adcolabs.de/2024-12-22/67683c348c067b2886aa1d43/49d78836a161836a/67683c348c067b2886aa1d43_1734884442.png"
]
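
For example, the following one-liner fetches the artifact list and downloads every file into the current directory, using nothing beyond standard curl, jq, and xargs:

curl -X GET https://api.scraper.adcolabs.de/v1/extractions/67683c348c067b2886aa1d43 \
     -H "Content-Type: application/json" \
     -H "API-KEY: <API-KEY>" \
     | jq -r '.artifacts[]' \
     | xargs -n 1 curl -s -O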

The created screenshot looks like this:

[Screenshot: website-screenshots-made-easy-via-api-example-1]

Full Pages and More

With the full-page-screenshot action, you can also capture screenshots of entire web pages:

curl -X POST https://api.scraper.adcolabs.de/v1/extractions \
     -H "Content-Type: application/json" \
     -H "API-KEY: <API-KEY>" \
     -d '{"url":"https://www.adcolabs.de/blog/hands-on-seitenextraktion/","agent":{"resolution":{"width":"1903","height":"927"},"options":{"headers":[]}},"connectivity":{"proxy":"europe"},"workflow":[{"action":"full-page-screenshot"}],"selector":[],"webhook":{"enabled":false,"headers":[]}}'

Here’s an example of the result for one of our German blog posts:

[Screenshot: website-screenshots-made-easy-via-api-example-2]

Moreover, additional parameters can simulate scrolling and clicking actions. This allows you to create specific states of the web page or target individual areas precisely. You can also add wait times to account for loading effects on the page.
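
As a rough sketch of what such a multi-step workflow could look like (the wait and scroll actions and their value formats used here are illustrative assumptions; the exact action names and values are documented in the Adcolabs app), a request could chain several steps before the actual screenshot:

curl -X POST https://api.scraper.adcolabs.de/v1/extractions \
     -H "Content-Type: application/json" \
     -H "API-KEY: <API-KEY>" \
     -d '{"url":"https://adcolabs.de","agent":{"resolution":{"width":"1903","height":"927"},"options":{"headers":[]}},"connectivity":{"proxy":"europe"},"workflow":[{"action":"wait","value":2000},{"action":"scroll","value":800},{"action":"screenshot"}],"selector":[],"webhook":{"enabled":false,"headers":[]}}'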

Automated Screenshots with a Bash Script

For easy and flexible website screenshot creation, we’ve prepared a handy Bash script. With just a few parameters, you can capture screenshots and download them directly. The script is highly customizable and supports various screenshot types and output locations.

#!/bin/bash
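# Capture a website screenshot via the Adcolabs Scraper API and download the
# resulting artifacts. Run with --help for usage details.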

BROWSER_RESOLUTION_X="1903"
BROWSER_RESOLUTION_Y="927"
MAX_RETRIES=30
RETRY_INTERVAL=3

function display_help {
    echo "Usage: $0 --url <website_url> --api-key <api_key> [--type <screenshot_type>] [--output <output_directory>]"
    echo
    echo "Options:"
    echo "  --url      The URL of the website to capture (required)."
    echo "  --api-key  API key for the screenshot service (required)."
    echo "  --type     Type of screenshot to capture. Options:"
    echo "             'screenshot' (default) or 'full-page-screenshot'."
    echo "  --output   Directory to save downloaded artifacts (default: current directory)."
    echo "  --help     Show this help message and exit."
    echo
    echo "Examples:"
    echo "  $0 --url http://example.com --api-key YOUR_API_KEY"
    echo "  $0 --url http://example.com --api-key YOUR_API_KEY --type full-page-screenshot --output /path/to/save"
}

WEBSITE_URL=""
SCREENSHOT_TYPE="screenshot"
OUTPUT_DIR="."
API_KEY=""

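# Parse command-line arguments.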
while [[ "$#" -gt 0 ]]; do
    case "$1" in
        --help)
            display_help
            exit 0
            ;;
        --url)
            WEBSITE_URL="$2"
            shift 2
            ;;
        --api-key)
            API_KEY="$2"
            shift 2
            ;;
        --type)
            SCREENSHOT_TYPE="$2"
            shift 2
            ;;
        --output)
            OUTPUT_DIR="$2"
            shift 2
            ;;
        *)
            echo "Error: Unknown argument '$1'"
            echo "Run '$0 --help' for usage information."
            exit 1
            ;;
    esac
done

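# Validate required options and the output directory.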
if [[ -z "$WEBSITE_URL" ]]; then
    echo "Error: --url is required."
    echo "Run '$0 --help' for usage information."
    exit 1
fi

if [[ -z "$API_KEY" ]]; then
    echo "Error: --api-key is required."
    echo "Run '$0 --help' for usage information."
    exit 1
fi

if [[ "$SCREENSHOT_TYPE" != "screenshot" && "$SCREENSHOT_TYPE" != "full-page-screenshot" ]]; then
    echo "Error: Invalid value for --type. Use 'screenshot' or 'full-page-screenshot'."
    echo "Run '$0 --help' for usage information."
    exit 1
fi

if [[ ! -d "$OUTPUT_DIR" ]]; then
    echo "Error: Output directory '$OUTPUT_DIR' does not exist."
    exit 1
fi

echo "Starting screenshot extraction for URL: $WEBSITE_URL with type: $SCREENSHOT_TYPE"

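# Build the JSON payload for the extraction request.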
EXTRACTION_PAYLOAD=$(cat <<EOF
{
    "url": "${WEBSITE_URL}",
    "agent": {
        "resolution": {
            "width": "${BROWSER_RESOLUTION_X}",
            "height": "${BROWSER_RESOLUTION_Y}"
        },
        "options": {
            "headers": []
        }
    },
    "connectivity": {
        "proxy": "europe"
    },
    "workflow": [
        {
            "action": "${SCREENSHOT_TYPE}"
        }
    ],
    "selector": [],
    "webhook": {
        "enabled": false,
        "headers": []
    }
}
EOF
)

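# Create the extraction and read its ID from the JSON response.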
EXTRACTION_ID=$(curl -s -X POST "https://api.scraper.adcolabs.de/v1/extractions" \
    -H "Content-Type: application/json" \
    -H "API-KEY: ${API_KEY}" \
    -d "$EXTRACTION_PAYLOAD" | jq -r .id)

if [[ -z "$EXTRACTION_ID" || "$EXTRACTION_ID" == "null" ]]; then
    echo "Error: Failed to initiate screenshot extraction."
    exit 1
fi

echo "Extraction initiated. ID: $EXTRACTION_ID"

RETRIES_LEFT=$MAX_RETRIES
EXTRACTION_STATUS=""

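# Poll the extraction status until it is DONE or the retry budget is used up.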
echo -n "Checking extraction status"
while [[ $RETRIES_LEFT -gt 0 ]]; do
    echo -n "."
    EXTRACTION_STATUS=$(curl -s -X GET "https://api.scraper.adcolabs.de/v1/extractions/${EXTRACTION_ID}" \
        -H "Content-Type: application/json" \
        -H "API-KEY: ${API_KEY}" | jq -r .status)

    case "$EXTRACTION_STATUS" in
        DONE)
            echo -e "\nExtraction completed successfully."
            break
            ;;
        WAITING|CREATED)
            ((RETRIES_LEFT--))
            sleep $RETRY_INTERVAL
            ;;
        *)
            echo -e "\nError: Unexpected extraction status - $EXTRACTION_STATUS"
            exit 1
            ;;
    esac
done

if [[ "$EXTRACTION_STATUS" != "DONE" ]]; then
    echo -e "\nError: Extraction did not complete within the allowed time."
    exit 1
fi

echo "Fetching extraction artifacts..."
ARTIFACTS=$(curl -s -X GET "https://api.scraper.adcolabs.de/v1/extractions/${EXTRACTION_ID}" \
    -H "Content-Type: application/json" \
    -H "API-KEY: ${API_KEY}" | jq -r .artifacts)

if [[ -z "$ARTIFACTS" || "$ARTIFACTS" == "null" ]]; then
    echo "Error: No artifacts available for the extraction."
    exit 1
fi

echo "Extraction artifacts:"
echo "$ARTIFACTS"

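# Download every artifact URL into the output directory.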
ARTIFACTS_ARRAY=$(echo "$ARTIFACTS" | jq -r '.[]')
for URL in $ARTIFACTS_ARRAY; do
    FILENAME=$(basename "$URL")
    OUTPUT_PATH="${OUTPUT_DIR}/${FILENAME}"
    echo "Downloading $URL to $OUTPUT_PATH"
    curl -s -o "$OUTPUT_PATH" "$URL"
    if [[ $? -ne 0 ]]; then
        echo "Error: Failed to download $URL"
    fi
done

echo "All artifacts have been downloaded to $OUTPUT_DIR."

Example Applications of the Screenshot Script

To get an overview of the usage options, run:

./screenshot-downloader.sh --help

Creating a screenshot of the visible area of a website:

./screenshot-downloader.sh --url https://www.adcolabs.de/blog/hands-on-seitenextraktion/ --api-key <API-KEY>

Creating a screenshot of the entire website:

./screenshot-downloader.sh --url https://www.adcolabs.de/blog/hands-on-seitenextraktion/ --type full-page-screenshot --api-key <API-KEY>

Saving the screenshot in a custom directory:

./screenshot-downloader.sh --url https://www.adcolabs.de/blog/hands-on-seitenextraktion/ --output <CUSTOM-DIRECTORY-PATH> --api-key <API-KEY>

Adcolabs Scraper

The Adcolabs Scraper not only allows for screenshots but also supports video recordings of websites. The script can be adjusted to utilize this feature as well. Discover more features directly in the Adcolabs app. Check it out and test the extensive possibilities!
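
If you want to adapt the Bash script above for recordings, one possible adjustment (a sketch only; the exact name of the recording action has to be taken from the API documentation) is to replace the fixed --type allowlist with a simple non-empty check, so that any workflow action the API supports can be passed through:

# Sketch: accept any non-empty workflow action for --type instead of the
# fixed allowlist; the concrete action name for video recordings must come
# from the Adcolabs Scraper API documentation.
if [[ -z "$SCREENSHOT_TYPE" ]]; then
    echo "Error: --type must not be empty."
    exit 1
fi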