Grab Google Search Console Data

⚠️ THIS IS A WORK IN PROGRESS


SearchConsoleR

First, we’ll load searchConsoleR, an awesome package by Mark Edmondson. It will allow us to send requests to the Google ‘Search Console API’ very easily.

install.packages("searchConsoleR")
library(searchConsoleR)

As seen before, we’ll also use googleAuthR (still by Mark Edmondson) to deal with Google account authentication. It will spare us the pain of having to set up an API key.

install.packages("googleAuthR")
library(googleAuthR)

Gather DATA

Let’s initiate authentication. This should open a new browser window asking you to grant access to your GSC account. The script will then be allowed to make requests for a limited period of time.

scr_auth()

This will create an sc.oauth file inside your working directory, which stores your temporary access tokens. If you wish to switch between Google accounts, just delete the file, re-run the command, and log in with the other account.
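A minimal sketch of that account switch, assuming the sc.oauth token file sits in the current working directory:

# delete the cached token, then trigger a fresh login
if (file.exists("sc.oauth")) file.remove("sc.oauth")
scr_auth()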

Let’s list all websites we are allowed to send requests about:

# fetch the list of GSC properties we can access
sc_websites <- list_websites()
# and display it
View(sc_websites)

and pick one:

hostname <- "https://www.example.com/"

Don’t forget to update this with your own hostname.

As you may know, Search Console data is not available right away. If we want, for example, to request the last 2 available months, we need a date range running from 3 days ago back to 2 months before that. The lubridate package will help us compute those dates.

install.packages("lubridate")
require(lubridate)
tree_days_ago <- lubridate::today()-3
beforedate <- tree_days_ago
month(beforedate) <- month(beforedate) - 2
day(beforedate) <- days_in_month(beforedate)
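
A quick sanity check on the computed window never hurts:

# print both ends of the date range
beforedate
three_days_ago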

and now the actual request (at last!)

gsc_all_queries <- search_analytics(hostname,
                    beforedate, three_days_ago,
                    c("query", "page"), rowLimit = 80000)

There is no point in asking for a longer time period: we want to know whether our web pages compete with one another right now.

We are requesting the ‘query’ and ‘page’ dimensions. If you wish, it’s possible to restrict the request to one type of user device, like ‘desktop only’. See the search_analytics function documentation.
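
For instance, here is a sketch of a desktop-only request; the dimensionFilterExp argument and its device==DESKTOP syntax come from the searchConsoleR documentation, so double-check them against your package version:

# same request, restricted to desktop traffic
gsc_desktop_queries <- search_analytics(hostname,
                         beforedate, three_days_ago,
                         c("query", "page"),
                         dimensionFilterExp = c("device==DESKTOP"),
                         rowLimit = 80000)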

rowLimit is a deliberately large round number, and should be enough for most sites. If you have a popular website with a lot of long-tail traffic, you might need to increase it.
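
One way to tell, sketched here under the assumption that the API simply truncates results at the limit: if you get back exactly as many rows as you asked for, you probably hit the cap.

# a row count equal to rowLimit suggests the result was truncated
if (nrow(gsc_all_queries) >= 80000) {
  warning("rowLimit reached, re-run with a higher value")
}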

The API response is stored inside the gsc_all_queries variable as a data frame.
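
Take a quick look at what came back; alongside the requested dimensions you should find the usual GSC metrics (clicks, impressions, ctr, position):

# inspect the first rows and the column types
head(gsc_all_queries)
str(gsc_all_queries)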

If you happen to have several domains or subdomains that compete with each other for the same keywords, this process should be repeated for each property. The results will then have to be aggregated; the bind_rows function from dplyr will help you bind them together. This is how to use it:

library(dplyr)
bind_rows(gsc_queries_1, gsc_queries_2)
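
Putting it together, a minimal sketch that runs the same request for several properties and stacks the results; the hostnames below are placeholders:

# hypothetical list of competing properties
hostnames <- c("https://www.example.com/", "https://blog.example.com/")

# query each property, then bind all data frames into one
gsc_all_properties <- dplyr::bind_rows(
  lapply(hostnames, function(h)
    search_analytics(h, beforedate, three_days_ago,
                     c("query", "page"), rowLimit = 80000))
)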
