Pagescraper quick start guide
This is the abbreviated setup guide for Pagescraper. For the advanced setup instructions, see: Pagescraper App
You will need:
An Alli Data data source containing all the URLs you want Pagescraper to scrape
Access to your client in Alli Marketplace to set up Pagescraper
Another data source in Alli Data to pull your Pagescraper results
1. Set up your data source in Alli Data
For this example, we’re going to set up a Google Drive data source that pulls a list of URLs from a Google Sheet.
Google Sheet Example
DWH Datasource Example
Set up, authorize, and load data.
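Before loading the sheet, it can help to confirm that every row in the URL column is a complete URL. The sketch below is only an illustration: it assumes Pagescraper expects absolute http or https URLs (this guide does not state that), and the sample rows are hypothetical.

```python
from urllib.parse import urlparse

def invalid_urls(urls):
    """Return the entries that are not absolute http(s) URLs."""
    bad = []
    for url in urls:
        parts = urlparse(url.strip())
        # Assumption: a usable URL has an http/https scheme and a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            bad.append(url)
    return bad

# Hypothetical rows copied from the URL column of the sheet.
rows = ["https://example.com/store/1", "example.com/store/2", ""]
print(invalid_urls(rows))
```

Any entries it prints are worth fixing in the sheet before running the App.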
2. Set up Pagescraper in Alli Marketplace
Add new App
Log in to Alli Marketplace, go to your client, then select Add App to create a new Pagescraper App.
Link your App to Alli Data
On the setup screen, enter the name of the data source you just created and the name of the column that contains your URLs.
Specify what you want to scrape
For Scrape Selections File, select edit to add the elements on the page that you want scraped.
The instructions below apply if you want to pass back information from the web page. If you only want to return the status code, enter [] and jump to step 3.
This will look something like:
[
{
"name": "store",
"css_selector": ".mn_rebateValue"
}
]
The css_selector tells Pagescraper where to look in the code of the page. Scroll down to Finding the CSS Selector for instructions on getting the CSS selector.
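A small typo in this file (a missing comma or quote) is easy to make, so it can be worth checking the text before pasting it in. The following is a minimal sketch, not part of Pagescraper: check_selections is a hypothetical helper, and it assumes each block needs only the name and css_selector keys shown above. The full configuration may accept more options.

```python
import json

def check_selections(text):
    """Parse Scrape Selections File text; return a list of problems (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, list):
        return ["top level must be a JSON array"]
    problems = []
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: must be an object")
            continue
        # Assumption: only these two keys are required per block.
        for key in ("name", "css_selector"):
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {i}: missing or empty '{key}'")
    return problems

single = '[{"name": "store", "css_selector": ".mn_rebateValue"}]'
print(check_selections(single))                 # no problems expected
print(check_selections("[]"))                   # status-code-only setup
print(check_selections('[{"name": "store"}]'))  # reports the missing selector
```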
To scrape multiple elements, add one block per element:
[
{
"name": "cash_back",
"css_selector": ".cashback-amount"
},
{
"name": "cash_back2",
"css_selector": ".blk.cb a"
}
]
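If you keep your selectors in a script or spreadsheet, the multi-block layout above can be generated rather than typed by hand. This is a sketch under that assumption; build_selections is a hypothetical helper name, and the names and selectors are the ones from the example above.

```python
import json

def build_selections(selectors):
    """Turn a {name: css_selector} mapping into the list-of-blocks layout."""
    return [{"name": name, "css_selector": css} for name, css in selectors.items()]

selections = build_selections({
    "cash_back": ".cashback-amount",
    "cash_back2": ".blk.cb a",
})

# Text to paste into the Scrape Selections File editor.
print(json.dumps(selections, indent=2))
```

Generating the text with json.dumps avoids hand-typed bracket and comma mistakes as the list grows.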
For a full list of Pagescraper configuration settings and options, click here.
3. Pulling the results into Alli Data
Save and run your Alli Marketplace App to ensure it is set up and runs correctly. Your execution page should look something like this:
Find the “Use the following link to download the results. The link is valid for 7 days.” line in your output, then copy and paste the URL below it (line 27 above) into your browser to download the results and confirm that the output looks as you expect.
If everything looks good, you can now set up an S3 data source in Alli Data to pull your results back into Alli Data.
Finding the CSS Selector
These instructions are for Google Chrome.
Go to one of the pages you want to scrape, right-click the element (e.g. text) you want to scrape, and click ‘Inspect’.
This will open the web inspector with the code of your element highlighted. Right-click the highlighted code and go to ‘Copy’ → ‘Copy selector’.
Paste the selector into your Alli Marketplace App.
Before:
During:
Note that this copies the entire CSS selector path to the item. You typically only need the last few sections: the selector should be specific enough to select the item you want, but not so specific that it stops matching anything when the page differs slightly.
After:
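The trimming described in the note above can be sketched in a few lines. This is an illustration only: shorten_selector is a hypothetical helper, the copied selector is a made-up example of what Chrome might produce, and the code assumes the path uses ">" only as a separator between sections (not inside attribute values).

```python
def shorten_selector(full_selector, keep=2):
    """Keep only the last `keep` sections of a copied selector path."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Assumption: ">" appears only as the child combinator between sections.
    parts = [p.strip() for p in full_selector.split(">") if p.strip()]
    return " > ".join(parts[-keep:])

# Hypothetical result of 'Copy selector' in Chrome.
copied = "#content > div.offers > ul > li:nth-child(3) > span.mn_rebateValue"

print(shorten_selector(copied, keep=2))
print(shorten_selector(copied, keep=1))
```

A shortened selector can match more elements than the full path did, so it is worth confirming that it still selects only the item you want, for example with document.querySelectorAll in the browser console, before pasting it into your Alli Marketplace App.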