The SEO data we analyze often comes from different sources, which is why it's useful to know how to join them. That is what this article covers. Let's imagine we have crawled your website: it would be nice to check which of the crawled pages received some SEO traffic.
To do that, we'll need to join two datasets.
Using rcrawler, we've collected our pages (see the How to use rcrawler article):
library(Rcrawler)
Rcrawler(Website = "https://www.rforseo.com/")
We now have a dataset (data frame) of URLs associated with their crawl depth, called INDEX.
Using the googleAnalyticsR package, we grab the Google Analytics SEO landing pages (see the How to use googleAnalyticsR article):
# dplyr provides %>% and filter(), used below
library(dplyr)

# Between 1 January and 1 February 2021
# we want the sessions
# we request landing page and medium info too
# and use the anti-sampling option
ga <- google_analytics(ga_id,
                       date_range = c("2021-01-01", "2021-02-01"),
                       metrics = "sessions",
                       dimensions = c("medium", "landingPagePath"),
                       anti_sample = TRUE)

# We filter the data to only keep the SEO sessions
ga_seo <- ga %>% filter(medium == "organic")
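The first step is to find the common ground between the two datasets. On the crawler side we have the Url column, and on the GA side the landingPagePath column. The first holds full URLs and the second only paths, so we need to make a conversion. We'll remove the hostname from Url using the urltools package: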
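# path() returns the path without its leading slash, so we add it back
INDEX$landingPagePath <- paste0("/", urltools::path(INDEX$Url))
# The home page has no path (NA), which paste0() turns into "/NA"
INDEX$landingPagePath[INDEX$landingPagePath == "/NA"] <- "/"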
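To see what this conversion does, here is a minimal sketch on made-up URLs. The toy data frame, the example.com addresses and the helper name to_landing_path are assumptions for illustration only. The helper is a slightly more defensive variant of the two lines above: it treats both a missing (NA) and an empty path as the home page.

```r
library(urltools)

# Hypothetical helper: turn full URLs into GA-style landing page paths
to_landing_path <- function(urls) {
  p <- urltools::path(urls)
  # A missing or empty path means the home page
  ifelse(is.na(p) | p == "", "/", paste0("/", p))
}

# Made-up crawl data standing in for the Url column of INDEX
toy <- data.frame(
  Url = c("https://www.example.com/",
          "https://www.example.com/about",
          "https://www.example.com/blog/post-1"),
  stringsAsFactors = FALSE
)

toy$landingPagePath <- to_landing_path(toy$Url)
print(toy)
```

Wrapping the conversion in a function makes it easy to reuse the same rule if you later need to join the crawl with another source keyed on page paths.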
Now we can merge the two data frames on their shared landingPagePath column:
crawl_ga_merged <- merge(INDEX, ga_seo, by = "landingPagePath")
That's it, really. Let's display the data.
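The real output depends on your own crawl and Analytics account, so here is a small sketch on made-up data instead. The crawl and ga_toy data frames and their values are invented for illustration; Level stands in for the crawl depth. By default merge() keeps only the rows found in both data frames, so crawled pages without any SEO session disappear. Passing all.x = TRUE keeps every crawled page, which may be closer to what you want when looking for pages that got no traffic.

```r
# Made-up crawl data standing in for INDEX (only a few columns)
crawl <- data.frame(
  Url = c("https://www.example.com/",
          "https://www.example.com/about",
          "https://www.example.com/contact"),
  Level = c(0, 1, 1),
  landingPagePath = c("/", "/about", "/contact"),
  stringsAsFactors = FALSE
)

# Made-up Google Analytics data standing in for ga_seo
ga_toy <- data.frame(
  medium = "organic",
  landingPagePath = c("/", "/about", "/old-page"),
  sessions = c(120, 30, 5),
  stringsAsFactors = FALSE
)

# Default merge: only pages present in both datasets
inner <- merge(crawl, ga_toy, by = "landingPagePath")

# Keep every crawled page, even those without SEO sessions
left <- merge(crawl, ga_toy, by = "landingPagePath", all.x = TRUE)
left$sessions[is.na(left$sessions)] <- 0

# Display the pages sorted by SEO sessions, highest first
print(left[order(-left$sessions), c("landingPagePath", "Level", "sessions")])
```

In this sketch /contact was crawled but has no organic session, so it only appears in the all.x = TRUE version, while /old-page received sessions but was not crawled and is dropped from both.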