Grab Google Search Console Data x

⚠️ THIS IS A WORK IN PROGRESS

SearchConsoleR

First, we’ll load searchConsoleR, an awesome package by Mark Edmondson. This will allow us to send requests to Google ‘Search Console API’ very easily.

install.packages("searchConsoleR")
library(searchConsoleR)

and to help to deal with Google Account Authentication (still by Mark Edmondson). It will spare the pain of having to set up an API Key.

install.packages("googleAuthR")
library(googleAuthR)

Gather DATA

Let’s initiate authentification. This should open a new browser window, asking you to validate access to your GSC account. The script will be allowed to make requests for a limited period of time.

scr_auth()

This will create a sc.oauth file inside your working directory. It stores your temporary Access tokens. If you wish to switch between Google accounts, just delete the file, re-run the command and log in with another account.

Let’s list all websites we are allowed to send requests about:

# Load
sc_websites <- list_websites()
# and display the list
View(sc_websites)

and pick one

hostname <- "https://www.example.com/"

don’t forget to update this with your hostname

As you may know, Search Console data is not available right away. If we want, for example, to request data for the last available 2 months, we'll need the date range to be between 3 days ago and 2 months before that… As seen before we will be helped by the Lubridate package

install.packages("lubridate")
require(lubridate)
tree_days_ago <- lubridate::today()-3
beforedate <- tree_days_ago
month(beforedate) <- month(beforedate) - 2
day(beforedate) <- days_in_month(beforedate)

and now the actual request (at last!)

gsc_all_queries <- search_analytics(hostname,
                    beforedate, tree_days_ago,
                    c("query", "page"), rowLimit = 80000)

We are requesting ‘query’ and ‘page’ dimensions. If you wish, it’s possible to restrict the request to some type of user device, like ‘desktop only’. See function documentation.

There is no point in asking for a longer time period. We want to know if our web pages currently compete with one another now.

rowLimit is a bit of a big random number, this should be enough. If you have a popular website, with a lot of long-tail traffic. You might need to increase it.

API respond is store inside gbr_all_queries variable as a data frame.

If you happen to have several domains/subdomains that compete with each other for the same keywords, this process should be repeated. The results will have to be aggregated, bind_rows function will help you bind them together. This is how to use it :

bind_rows(gsc_queries_1,gsc_queries_2)

Last updated