Track SEO active pages percentage over time

What are active pages, and why would you want to track them?

An active page is a page that generates at least one organic search (SEO) visit over a given period. If a page gets at least one visit, it means the page is indexed and Google doesn't consider it useless. The share of active pages is therefore a good indicator of a website's SEO health.

To make things even more interesting, we will grab Google Search Console data and compare it to the number of pages submitted in the XML sitemap file.
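
The sitemap URL count used later on this page comes from a CSV that a scheduled script refreshes daily (see the "Launch an R script using github actions" resource). If you just want a one-off number, you can also count submitted URLs directly. Here is a minimal sketch using xml2, assuming the sitemap lives at https://www.rforseo.com/sitemap.xml and is a plain URL set rather than a sitemap index:

library(xml2)

# download and parse the XML sitemap (assumed location)
sitemap <- read_xml("https://www.rforseo.com/sitemap.xml")

# each <loc> node holds one submitted URL;
# local-name() sidesteps the sitemap namespace
submitted_urls <- xml_find_all(sitemap, "//*[local-name()='loc']")
length(submitted_urls)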

Step 1: Counting active URLs using Search Console data

(see the article about grabbing Google Search Console data)

library(searchConsoleR)
library(googleAuthR)
scr_auth()

# load the list of Search Console properties you have access to
sc_websites <- list_websites()

# and display the list
View(sc_websites)

# pick the one you want to analyse
hostname <- "https://www.rforseo.com/"

library(lubridate)

# we want data covering roughly the last two months
# (Search Console data lags a few days, hence today() - 3)
now <- lubridate::today() - 3

# start from 'now', move back two months,
# then jump to the last day of that month
beforedate <- now
month(beforedate) <- month(now) - 2
day(beforedate) <- days_in_month(beforedate)

# request data broken down by date and page
gsc_all_queries <- search_analytics(hostname,
                                    beforedate, now,
                                    c("date", "page"), rowLimit = 80000)


library(dplyr)

# count URLs with at least one click, per day
gsc_all_queries_clicks <- gsc_all_queries %>%
  filter(clicks != 0) %>%
  group_by(date) %>%
  tally()

colnames(gsc_all_queries_clicks) <- c("date","clicks")

# count URLs with at least one impression, per day
gsc_all_queries_impr <- gsc_all_queries %>%
  filter(impressions != 0) %>%
  group_by(date) %>%
  tally()

colnames(gsc_all_queries_impr) <- c("date","impr")

# merge the two counts, joining on the shared 'date' column
gsc_all_queries_stats <- merge(gsc_all_queries_clicks, gsc_all_queries_impr)



# read the daily sitemap URL count from a CSV generated by a scheduled script on GitHub
urls <- read.csv(url("https://raw.githubusercontent.com/pixgarden/scrape-automation/main/data/xml_url_count.csv"))

# rename columns
colnames(urls)  <- c("date","urls")

# convert the date strings into Date objects
urls$date <- as.Date(urls$date)

# merge with the Google Search Console data;
# since the shared column is named 'date' in both, merge() needs no extra arguments
gsc_all_queries_merged <- merge(gsc_all_queries_stats, urls)


# URLs that got impressions but no clicks
gsc_all_queries_merged$impr <- gsc_all_queries_merged$impr - gsc_all_queries_merged$clicks
# URLs with neither impressions nor clicks
# (subtract both the impression-only and the clicked URLs from the sitemap total)
gsc_all_queries_merged$urls <- gsc_all_queries_merged$urls -
  gsc_all_queries_merged$impr - gsc_all_queries_merged$clicks

# rename columns
colnames(gsc_all_queries_merged) <- c("date", "url-with-clicks", "url-only-impr", "url-no-impr")

library(tidyr)
# reshape to long format: one row per date / URL category / count
test <- gather(gsc_all_queries_merged, urls, count, 2:4)
# build the chart interactively with esquisse
esquisse::esquisser(test)

library(ggplot2)

# the ggplot2 code generated by esquisse: stacked bars of URL categories per day
ggplot(test) +
 aes(x = date, fill = urls, weight = count) +
 geom_bar() +
 scale_fill_hue() +
 theme_minimal()
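
The stacked bars above show absolute URL counts per day. To track the actual percentage of active pages over time, you can derive it from the merged data frame before reshaping. Here is a minimal sketch, assuming the column names set just after the merge step and counting a page as active when it received at least one click:

# share of sitemap URLs that got at least one click, per day
gsc_all_queries_merged$active_pct <-
  100 * gsc_all_queries_merged$`url-with-clicks` /
    (gsc_all_queries_merged$`url-with-clicks` +
     gsc_all_queries_merged$`url-only-impr` +
     gsc_all_queries_merged$`url-no-impr`)

ggplot(gsc_all_queries_merged) +
  aes(x = date, y = active_pct) +
  geom_line() +
  labs(x = NULL, y = "Active pages (%)") +
  theme_minimal()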