Launch an R script using github actions


Last updated 4 years ago


The easiest way to do this is to duplicate this repository on GitHub.

Just click the "Fork" button to create your own copy.

Let me explain how it works. It's basically all about two files:

sitemap_scraping.R

This is a classic R script. It reaches this website's XML sitemap and counts the number of URLs submitted. It relies on the rvest package.

# Load libraries
library(tidyverse)
library(rvest)

# Declare the XML sitemap URL
url <- 'https://www.rforseo.com/sitemap.xml'

# Grab the XML content
url_html <- read_html(url)

# Select all the <loc> nodes
# and count them
nbr_url <- url_html %>% 
  html_nodes("loc") %>%
  length()

# Create a new row of data with today's date and the URL count
row <- data.frame(Sys.Date(), nbr_url)

# Append the new row to the end of the CSV
write_csv(row, 'data/xml_url_count.csv', append = TRUE)
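Note that write_csv() with append = TRUE writes no header row, so the accumulated file has to be read back with explicit column names. A minimal sketch (the temp path and the column labels date and url_count are my own illustration, not part of the repo):

```r
library(readr)

# Simulate two daily appends to a counts file (temp path for illustration)
path <- tempfile(fileext = ".csv")
write_csv(data.frame(Sys.Date() - 1, 9L), path, append = TRUE)
write_csv(data.frame(Sys.Date(), 10L), path, append = TRUE)

# append = TRUE skips column names, so supply them when reading back
counts <- read_csv(path, col_names = c("date", "url_count"))
print(counts)
```

The same read_csv() call works against the raw GitHub URL of the committed CSV.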

main.yml

This is where we are going to schedule the process.

name: sitemap_scraping

# Controls when the action will run.
on:
  schedule:
    - cron:  '0 13 * * *'


jobs: 
  autoscrape:
    # The type of runner that the job will run on
    runs-on: macos-latest

    # Load repo and install R
    steps:
    - uses: actions/checkout@master
    - uses: r-lib/actions/setup-r@master

    # Set-up R
    - name: Install packages
      run: |
        R -e 'install.packages("tidyverse")'
        R -e 'install.packages("rvest")'
    # Run R script
    - name: Scrape
      run: Rscript sitemap_scraping.R
      
    # Add new files in data folder, commit along with other modified files, push
    - name: Commit files
      run: |
        git config --local user.name actions-user
        git config --local user.email "actions@github.com"
        git add data/*
        git commit -am "GH ACTION Headlines $(date)"
        git push origin main
      env:
        REPO_KEY: ${{secrets.GITHUB_TOKEN}}
        username: github-actions

Parts you may want to modify are:

  • The execution frequency rule. It's the odd-looking cron line; this one means "runs at 13:00 UTC every day" (see the full cron syntax documentation).

  • If you are using packages, you need to ask GitHub to install them before running the script, so be sure to include them in the install list.

The resulting CSV is updated every day and can be scraped.
RAW LINK:

https://raw.githubusercontent.com/pixgarden/scrape-automation/main/data/xml_url_count.csv