
What's crawling and why is it useful?


Last updated 3 years ago


What is crawling?

Crawling is fetching the contents of a web page using an app or a script. This is what Google does when its bot explores the web and analyzes webpage content.
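In R, a minimal "crawl" is a single read_html() call followed by some extraction. The sketch below parses a small inline page with rvest so it runs offline; the same two extraction calls work on a live URL (the commented-out URL is just a placeholder):

```r
library(rvest)

# On a real site you would fetch a live page:
# page <- read_html("https://example.com")

# Offline sketch: read_html() also accepts a literal HTML string
page <- read_html('<html><head>
  <title>My page</title>
  <meta name="description" content="A short description.">
</head><body><h1>Hello</h1></body></html>')

# What a crawler "sees": the title and meta description tags
title <- page %>% html_element("title") %>% html_text2()
description <- page %>%
  html_element('meta[name="description"]') %>%
  html_attr("content")

title
description
```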

How is crawling useful for SEO?

As someone doing SEO, you need to know what you are showing to Google, and what your website looks like from a (Google) bot's perspective. You need to check the quality of your XML sitemap if you are submitting one. You need to check your website's webpages and metadata. Checking the web server logs is also a good idea, to know what the Google bot is doing on your website.
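A first sitemap sanity check can be sketched with the xml2 package: fetch the sitemap, pull out every &lt;loc&gt; entry, and count them. The sketch below parses a tiny two-URL sitemap from a string so it runs offline; the commented-out URL is a placeholder for your own sitemap:

```r
library(xml2)

# On a real site you would fetch the live sitemap:
# sitemap <- read_xml("https://www.example.com/sitemap.xml")

# Offline sketch: a minimal two-URL sitemap
sitemap <- read_xml('<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>')

# Sitemaps declare a default namespace, so match on local-name()
locs <- xml_text(xml_find_all(sitemap, "//*[local-name() = 'loc']"))

length(locs)  # how many URLs you are submitting to Google
locs          # the URLs themselves, ready for further checks
```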

You can also respectfully crawl your competitors' websites to better understand their SEO strategy.

Crawling is also useful for grabbing data.

There are some great public datasets out there; even Wikipedia is a great source. Let's take this world population data table, which can be crawled:

library(dplyr)
library(rvest)
url <- "https://en.wikipedia.org/wiki/World_population"
population <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@id="mw-content-text"]/div[1]/table[7]') %>%
  html_table() %>%
  as.data.frame()

# remove the extra header row
population <- population[-1,]

# convert to numeric
population$Population <- as.numeric(gsub(",","",population$Population))
population$Year <- as.numeric(population$Year)

and display it as a plot:

library(ggplot2)
ggplot(population) +
  aes(x = Year, y = Population) +
  geom_point() +
  theme_minimal() +
  scale_y_continuous(labels = scales::comma)

et voila

It's not really SEO, but it can be useful. I've also used it to check the quality of the data on websites: product prices, image availability, etc.

Again, Screamingfrog or other crawlers might be a better choice; it depends on how integrated you want this to be and how custom those checks should be.

Let's move on to a more practical use case: downloading and checking XML sitemaps using R.
