Join Crawl data with Google Analytics Data

The SEO data we analyze often comes from different sources, which is why it's better to know how to connect or merge them. Let's imagine we have crawled your website: it would be quite nice to check which of these pages got some SEO traffic.

To do that, we'll need to merge or join the two "datasets".

1. Crawl data

Using Rcrawler, we've collected our pages (see the Rcrawler article).

library(Rcrawler)
Rcrawler(Website = "https://www.rforseo.com/")

We now have a data frame called INDEX that lists each crawled URL (second column) along with its crawl depth.

View(INDEX)
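If you just want to eyeball the columns that matter here, you can print them on their own. A minimal sketch, assuming the INDEX data frame uses Rcrawler's default Url and Level column names (check with names(INDEX) if in doubt):

# Peek at the crawled page URLs and their crawl depth
head(INDEX[, c("Url", "Level")])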

2. Google Analytics data

Using the googleAnalyticsR package, we grab the Google Analytics SEO landing pages (see article).

library(googleAnalyticsR)
library(dplyr)

# Authenticate first; ga_id is your Google Analytics view id
ga_auth()

# Between 1 January and 1 February 2021
# we want the sessions
# we request the landing page and medium dimensions too
# and use the anti-sampling option

ga <- google_analytics(ga_id,
    date_range = c("2021-01-01", "2021-02-01"),
    metrics = "sessions",
    dimensions = c("medium", "landingPagePath"),
    anti_sample = TRUE)

# We filter the data to only keep the SEO sessions
ga_seo <- ga %>% filter(medium == "organic")
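A quick peek at ga_seo is useful before going further: Google Analytics reports landing pages as paths starting with "/" rather than full URLs, which is exactly what we'll have to reconcile in the next step.

# landingPagePath values are paths such as "/", not full URLs
head(ga_seo)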

3. Fuuuuu...sion!

First, you need to define the common ground between the two datasets. On the crawler side we have the Url column, and on the Google Analytics side the landingPagePath column.

So we need to make a conversion. We'll remove the hostname from the Url values using the path() function from the urltools package.

# Rebuild GA-style paths by stripping the hostname from the crawled URLs
INDEX$landingPagePath <- paste0("/", urltools::path(INDEX$Url))

# path() returns NA for the homepage, so turn "/NA" back into "/"
INDEX$landingPagePath[INDEX$landingPagePath == "/NA"] <- "/"
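If you want to see what path() actually returns before trusting the conversion, here's a quick illustration (the non-homepage URL is just a made-up example):

urltools::path("https://www.rforseo.com/some-page")
# "some-page": the path without its leading slash, hence the paste0("/", ...)

urltools::path("https://www.rforseo.com/")
# NA for the bare homepage, which is why we patch "/NA" back to "/"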

And now we can merge:

# merge() joins on the column name the two data frames share, here landingPagePath
crawl_ga_merged <- merge(INDEX, ga_seo)

That's it, really. Let's display the data:

View(crawl_ga_merged)
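And since the whole point was to spot which crawled pages get SEO traffic, here is one optional extra step, a minimal sketch using the same merge() function: a left join keeps every crawled URL, so pages with no organic sessions show up with an NA sessions value.

# Keep every crawled page, even those without organic sessions
crawl_ga_all <- merge(INDEX, ga_seo, by = "landingPagePath", all.x = TRUE)

# Crawled pages that got no SEO traffic over the period
no_seo_traffic <- crawl_ga_all[is.na(crawl_ga_all$sessions), ]
View(no_seo_traffic)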
