
Pagescraper quick start guide

This is the abbreviated setup guide for Pagescraper. For the advanced setup instructions, see: Pagescraper App

You will need:

  1. An Alli Data data source containing all the URLs you want Pagescraper to scrape

  2. Access to your client in Alli Marketplace to set up Pagescraper

  3. Another data source in Alli Data to pull your Pagescraper results

1. Set up your data source in Alli Data

For this example, we’re going to set up a Google Drive data source that pulls a list of URLs from a Google Sheet.

Google Sheet Example

DWH Datasource Example

Set up, authorize, and load data.

2. Set up Pagescraper in Alli Marketplace

Add new App

Log in to Alli Marketplace, go to your client, then select Add App to create a new Pagescraper App.

Link your App to Alli Data

On the setup screen, enter the name of the data source you just created and the name of the column that contains your URLs.

Specify what you want to scrape

For Scrape Selections File, select edit and add the elements on the page that you want scraped.

The instructions below apply if you want to pass back information from the web page. If you only want to return the status code, enter [] and jump to step 3.

The Scrape Selections File will look something like:

CODE
[
  {
    "name": "store",
    "css_selector": ".mn_rebateValue"
  }
]

The css_selector tells Pagescraper where to look in the code of the page. Scroll down for instructions on getting the CSS selector.

To scrape multiple elements, add one block per element:

CODE
[
  {
    "name": "cash_back",
    "css_selector": ".cashback-amount"
  },
  {
    "name": "cash_back2",
    "css_selector": ".blk.cb a"
  }
]
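
Before pasting a Scrape Selections File into the App, it can help to check it locally. The following is a minimal Python sketch, not part of Pagescraper: it only assumes what the examples above show (a JSON list of objects, each with a "name" and a "css_selector"). The function name check_selections and the duplicate-name check are assumptions made for illustration; Pagescraper's own validation rules may differ.

```python
import json


def check_selections(text):
    """Return a list of problems found in a Scrape Selections File body.

    Hypothetical helper: it mirrors the structure shown in this guide,
    not Pagescraper's actual validation.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg} (line {exc.lineno})"]

    if not isinstance(data, list):
        return ["top level must be a list, e.g. [] or [{...}]"]

    problems = []
    seen = set()
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: expected an object")
            continue
        # Both keys appear in every example block in this guide.
        for key in ("name", "css_selector"):
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {i}: missing or empty '{key}'")
        # Assumption: names should be unique so results stay distinguishable.
        name = entry.get("name")
        if isinstance(name, str):
            if name in seen:
                problems.append(f"entry {i}: duplicate name '{name}'")
            seen.add(name)
    return problems


example = """
[
  {"name": "cash_back", "css_selector": ".cashback-amount"},
  {"name": "cash_back2", "css_selector": ".blk.cb a"}
]
"""
print(check_selections(example) or "looks OK")
```

An empty list ([]), used when you only want the status code, passes this check with no problems reported.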

For a full list of Pagescraper configuration settings and options, click here.

3. Pulling the results into Alli Data

Save and run your Alli Marketplace App to ensure it is set up and runs correctly. Your execution page should look something like this:

Find the line “Use the following link to download the results. The link is valid for 7 days.” in your output, then copy the URL below it (line 27 above) and paste it into your browser to download the results and confirm the output looks as you expect.

If everything looks good, you can now set up an S3 data source in Alli Data to pull your results back into Alli Data.

Finding the CSS Selector

These instructions are for Google Chrome.

Go to one of the pages you want to scrape, right-click the element (e.g. text) you want to scrape, and click ‘Inspect’.

This will open the web inspector with the code of your element highlighted. Right-click the highlighted code and go to ‘Copy’ → ‘Copy selector’.

Paste the selector into your Alli Marketplace App.

Before:

During:

Note that this copies the entire CSS selector path to the item. You typically only need the last few sections: the selector should be specific enough to select the item you want, but not so specific that it stops matching anything when a page differs slightly.

After:
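
The note above about trimming the copied selector can be sketched in Python. This is an illustration only: the helper name shorten_selector and the copied value are hypothetical, and it assumes the copied path separates its steps with ">". Always confirm that the shortened selector still matches the element you want.

```python
def shorten_selector(selector, keep=2):
    """Keep only the last `keep` steps of a '>'-separated selector path.

    Simplification: a '>' inside an attribute value would be split too.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    steps = [s.strip() for s in selector.split(">") if s.strip()]
    return " > ".join(steps[-keep:])


# Hypothetical value, similar to what 'Copy selector' might produce.
copied = "#main > div.content > div:nth-child(2) > div.offer > span.mn_rebateValue"

print(shorten_selector(copied, keep=1))  # span.mn_rebateValue
print(shorten_selector(copied, keep=2))  # div.offer > span.mn_rebateValue
```

Keeping fewer steps makes the selector more tolerant of layout differences between pages, but also more likely to match unrelated elements, so start short and add steps only if the wrong element is returned.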
