# Using R for SEO: what to expect?

## The power of R. What's different about it?

**R** is a high-level programming language that focuses mainly on data analysis, meaning it's "specialized". With a few lines of code, you can do a lot. Let me give you an example:

```r
internal_linking = read.csv(file.choose())
View(internal_linking)
```

These lines of code will:

* open a file picker so you can select a CSV (*`file.choose`*)
* import the data into R (*`read.csv`*) and store it in the `internal_linking` variable
* display it, on the second line (*`View`*)
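If you already know where the file lives, you can skip the dialog and pass a path to `read.csv()` directly. The sketch below is self-contained: it first writes a tiny made-up CSV to a temporary file (the `URL` and `Status` columns and their values are illustrative, not taken from the real crawl export), then reads it back.

```r
# Write a tiny, made-up CSV to a temporary file so the example runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("URL,Status",
    "https://www.example.com/,200 no error",
    "https://www.example.com/old-page,404 not found"),
  csv_path
)

# Same import as above, but with an explicit path instead of file.choose()
sample_links <- read.csv(csv_path)

# Quick checks that also work outside the RStudio viewer
str(sample_links)   # column names and types
nrow(sample_links)  # number of rows
head(sample_links)  # first rows
```

An explicit path is handy once the script has to run again without anyone clicking through a dialog.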

Let's do it with a file of website links:

![internal hyperlinks ](https://2998538899-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MSXey8kI3RllV-BSG8m%2F-MXvUfkVsbwxZKbgWzDe%2F-MXv_LDWhPsSReDr5uQj%2FtFobxabJRI.gif?alt=media\&token=b10442a1-2084-44c8-a18a-ed2f76e4ad12)

This is how you open and **browse a file with 2.6 million rows effortlessly**. Notice the small search icon on the top right? Yes, you can search within it quite easily too.

![search for dead links using http code](https://2998538899-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MSXey8kI3RllV-BSG8m%2F-MXvUfkVsbwxZKbgWzDe%2F-MXvahaWPvWoB8ERtDOE%2FScreenshot%202021-04-10%20at%2012.45.14%20pm.png?alt=media\&token=c5fbd0bd-4d71-445d-a5bb-372725c03406)

Want to count HTTP status codes? Here it is:

```r
View(table(internal_linking$Status))
```

You'll recognize the `View` function from before. The `table` function simply counts how many times each value appears, and **`$`** is a shortcut to access a column's values.
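To see these two pieces in isolation, here is a minimal sketch on a made-up data frame (the rows and status labels are invented for illustration). It also sorts the counts, which is an optional extra step.

```r
# Made-up data frame standing in for the crawl export
sample_links <- data.frame(
  URL = c("/a", "/b", "/c", "/d"),
  Status = c("200 no error", "404 not found", "200 no error", "500 server error")
)

# `$` pulls one column out of the data frame as a vector
sample_links$Status

# table() counts how many times each distinct value appears
status_counts <- table(sample_links$Status)

# Sort so the most frequent status comes first
sort(status_counts, decreasing = TRUE)
```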

It displays:

![count of http code](https://2998538899-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MSXey8kI3RllV-BSG8m%2F-MXvb72sde8C6D5xkw2L%2F-MXvcvIFm8iHXCESlzJF%2FScreenshot%202021-04-10%20at%2012.55.52%20pm.png?alt=media\&token=bb991fa5-aed2-42fa-900b-18cc36ebaf95)

This is a 30-second job. The most time-consuming part was finding the file on the hard disk.

Of course, this is just a silly example. There are countless ways to do this (third-party app, terminal, Excel pivot table, pandas/Polars), but it gives a nice introduction to R's possibilities and how simple it is.

## *'There is a package for that'*

The real power of R lies in its packages. What's a package, you may ask? It's an on-demand library of functions you can load to help with specialized tasks. Again, let's take some examples.

### ⬢ `ggplot2`

It's one of the most famous packages. It can be used to build advanced charts and plots. To use it, you just have to install it once, like this:

```r
install.packages("ggplot2")
```

then load it:

```r
library("ggplot2")
```

and after that, you can use it:

```r
ggplot(internal_linking)+
  aes(x = Status, fill = Status) +
  geom_bar() +
  scale_fill_hue() +
  theme_minimal()+ 
  coord_flip()
```

![](https://2998538899-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MSXey8kI3RllV-BSG8m%2F-M_C8pE4nj9NNfMs5b1N%2F-M_CA4339S7EtNtsjEzX%2FRplot014.png?alt=media\&token=1bea0dba-a01c-4a81-bd24-a8a9f88641e5)

Because we only want to see the problematic HTTP codes, we are going to filter:

```r
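# filter() is assumed here to be dplyr's (install it once with install.packages("dplyr"))
library("dplyr")

# keep only the rows whose Status is NOT one of the "nothing to fix" values
internal_linking_filtered <- filter(
  internal_linking,
  !(Status %in% c("200 no error", "Not checked", "999 LinkedIn blocking automated testing"))
)

ggplot(internal_linking_filtered) +
  aes(x = Status, fill = Status) +
  geom_bar() +
  scale_fill_hue() +
  theme_minimal() +
  coord_flip()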
```

![](https://2998538899-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MSXey8kI3RllV-BSG8m%2F-M_C8pE4nj9NNfMs5b1N%2F-M_CwFZO25jyKGJhuX6Z%2FRplot01.png?alt=media\&token=db9ab318-b7e7-4f53-abd8-41485ea908c3)
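If you are curious what the filtering step does under the hood, here is a rough base R equivalent that needs no extra package. It is a sketch on made-up rows, not the real crawl data, and the variable names are invented for the example.

```r
# Made-up data standing in for the crawl export
sample_links <- data.frame(
  URL = c("/a", "/b", "/c", "/d"),
  Status = c("200 no error", "404 not found", "Not checked", "500 server error")
)

# Status values we consider fine and want to leave out
ignored_status <- c("200 no error", "Not checked")

# %in% returns TRUE/FALSE for each row; ! flips it
keep_row <- !(sample_links$Status %in% ignored_status)

# Keep only the rows flagged TRUE (the empty slot after the comma means "all columns")
problem_links <- sample_links[keep_row, ]
problem_links
```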

Let's not go into details for now but, believe it or not, I couldn't have written this code from memory. I just googled "bar chart ggplot", "flip axis ggplot", ... and shamelessly copy-pasted the code.

ggplot2 is powerful: it can make basically every chart you can think of.

A few examples of plots made using `ggplot2`:

![](https://2998538899-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MSXey8kI3RllV-BSG8m%2F-Md1WfNGXfvSWoRTuoeM%2F-Md1X-gfWVMDjdrQyVYN%2Froeder-feature-lawschools1.png?alt=media\&token=bdeed7c0-f473-4e05-a6a1-68fd0d2de72a) ![](https://2998538899-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MSXey8kI3RllV-BSG8m%2F-Md1WfNGXfvSWoRTuoeM%2F-Md1X6BEOw9Hwig3DS5R%2Fggplot_masterlist_42.png?alt=media\&token=c5e83904-5ea3-4b22-b5f6-ce2ce7891587)

![](https://2998538899-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MSXey8kI3RllV-BSG8m%2F-Md1WfNGXfvSWoRTuoeM%2F-Md1X3Dg1EbDjj5GVuyv%2Fggplot_masterlist_29.png?alt=media\&token=4da4920a-018f-4321-b1a0-d0e2800a88a3) ![](https://2998538899-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MSXey8kI3RllV-BSG8m%2Fuploads%2Fe5pbs59nOFNBhSpwbWV7%2Fthecode9-1.png?alt=media\&token=ec779cc7-eab5-4a55-be64-158d42c36f6a)

To see more examples:

* [The R Graph Gallery](https://www.r-graph-gallery.com/) / [Top 50 ggplot2 Visualizations](http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html), some nice code to copy-paste
* [Tidy Tuesday](https://github.com/HudsonJamie/tidy_tuesday), nice to see how far ggplot2 can be pushed

Let's look at another package

### ⬢ `lubridate`

[Lubridate](https://lubridate.tidyverse.org/) will help us deal with our timestamp values. After the now-classic installing and loading:

```r
install.packages("lubridate")
library("lubridate")
```

It can be used to guess the format of this `Time.stamp` column and transform it into a real date format:

```r
internal_linking$real_date = dmy_hms(internal_linking$Time.stamp)
```

Values have been transformed into a true date-time format (`POSIXct`).

![before and after using Lubridate function](https://2998538899-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MSXey8kI3RllV-BSG8m%2F-MYHTyPp5uOObblVJhMf%2F-MYHV7FPwXmrtu9m9uOo%2FScreenshot%202021-04-14%20at%2011.27.50%20pm.png?alt=media\&token=86233f56-32c2-4160-a32d-e676ebaed2a8)

No more "at" in the middle or "am/pm". It's now easier to read and sort. The `dmy_hms` function guessed successfully that the "at" was useless.
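To appreciate what `dmy_hms` saves you, here is a sketch of the base R route, where the exact layout has to be spelled out by hand. The timestamp below is made up and simplified (24-hour clock, no "am/pm"); it is not a value from the real file.

```r
# Made-up timestamp in day/month/year order, with a literal "at" in the middle
raw_stamp <- "10/04/2021 at 12:45:14"

# Base R needs the exact layout spelled out, including the literal "at"
parsed <- as.POSIXct(raw_stamp, format = "%d/%m/%Y at %H:%M:%S", tz = "UTC")

class(parsed)                     # a date-time class, no longer a character string
format(parsed, "%Y-%m-%d %H:%M")  # reformat it any way you like
```

If one row uses a slightly different layout, this approach returns `NA` for it, which is where a parser that guesses becomes convenient.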

Now that those are real dates and no longer character strings, we can plot them using `ggplot`:

```r
ggplot(internal_linking) +
   aes(x = real_date) +
   geom_histogram() +
   theme_minimal()
```

![the number of links discovered per date.](https://2998538899-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MSXey8kI3RllV-BSG8m%2F-MYHXF3Du7SOyBojmcTL%2F-MYHXLuqusJopbTjwr2u%2FRplot02.png?alt=media\&token=2c1daf3a-7d37-4029-8daa-2720c4d9829f)

The `lubridate` package can also help with durations, time zones, intervals, ... Have a look at the [cheatsheet](https://rawgit.com/rstudio/cheatsheets/master/lubridate.pdf). It is a bit complex to get into, but so much less than trying to do it yourself. I've literally lost days of my working life trying to do this kind of stuff badly in Excel/Google Sheets.

### ⬢ `urltools`

One last example for the road. Want to extract the domains of your links? You can of course use a regex, or even try to split the string using "/" as a separator... OR you can use the more reliable `urltools` package, which has a dedicated `domain()` function to do exactly that.

```r
# Installing and Loading Package
install.packages("urltools")
library("urltools")
# extract domain and feed it to a new data column called 'domain'
internal_linking$domain <- domain(internal_linking$URL)

```

Let's check out the values, nearly the same code as before:

```r
View(table(internal_linking$domain))
```

![top domains](https://2998538899-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MSXey8kI3RllV-BSG8m%2F-MYHXWFQT-qemJmOA9sx%2F-MYHZGehi8KYCb_LmEA7%2FScreenshot%202021-04-14%20at%2011.46.52%20pm.png?alt=media\&token=67256b13-4d07-412c-861b-37022c949773)
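For comparison, here is roughly what the regex route mentioned above could look like in base R. This is a deliberately naive sketch on made-up URLs, not what `urltools` does internally.

```r
# Made-up URLs for illustration
urls <- c(
  "https://www.example.com/blog/post?id=1",
  "http://shop.example.org:8080/cart",
  "https://example.net"
)

# Naive approach: drop the scheme, then keep everything up to the first / : ? or #
naive_domain <- sub("^https?://([^/:?#]+).*$", "\\1", urls)
naive_domain
```

It breaks on URLs without a scheme or with credentials such as `user:pass@host`, which is the kind of edge case a dedicated package is meant to handle.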

### Where to find packages?

Good question! All the previous packages have been downloaded from CRAN. It's a repository that contains [thousands of packages](https://cran.r-project.org/web/packages/available_packages_by_date.html). GitHub is also a great source. There are so many that the problem is often finding the right one. The way to go is usually to ask around using:

* Twitter using the #rstats hashtag
* [rstats subreddit](https://www.reddit.com/r/rstats/)
* [rstudio forum](https://community.rstudio.com/)
* A couple of nice Slack communities, like [Measurecamp's one](http://join.measure.chat/)

The community is smaller than those of other programming languages, but people are more willing to help, which compensates.

## The confusing things about R

### The name

> *Oh, you do "R programming", that's cool. Is it like* Air Guitar? You do fake programming?\
> \- An anonymous member of my family

"R" is a weird name, especially in these Covid times, and it's not the most Google-friendly name either. So here are a few links to help find R resources.

* <https://rseek.org/> - R search engine
* <https://www.r-bloggers.com/> - R blogs aggregator
* <https://www.bigbookofr.com/> - all the free R books
* <https://github.com/search?l=R&q=seo&type=code> - search GitHub for R source code

### The `<-`

If you've seen some R code before, you might have been surprised to see `<-` being used. It's mostly a legacy thing: historically, R used `<-` for *assignment*, which keeps it clearly apart from *comparison*. For example:

*assignment -* If you want to **set** the value of X to 3.

```r
x <- 3
```

*comparison -* Is X **equal** to 3?

```r
x == 3
```

If you want to keep this little *tradition* alive you can use `<-`, but it is really up to you. It's perfectly fine to use **=**:

```r
x = 3
# same as
x <- 3
```
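One nuance worth knowing, shown here as a small sketch with invented variable names: the two forms are interchangeable at the top level of a script, but inside a function call `=` names an argument while `<-` still assigns.

```r
# At the top level both forms assign
a = 3
b <- 3
identical(a, b)

# Inside a call, `=` only names the argument: no variable called x is created
mean(x = c(1, 2, 3))

# `<-` inside a call assigns as a side effect, then passes the value along
mean(y <- c(1, 2, 3))
y
```

In everyday scripts this rarely matters, but it explains why some style guides stick to `<-` for assignment.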

### The `|>` or `%>%`

The (weird) **`|>`** operator, built into R since version 4.1, allows operations to be carried out successively. Meaning: the result of the previous command is the input of the next one. It works like the **`|`** ("pipe") on the terminal command line, if you've come across it. You might also see `%>%` sometimes: it's the older pipe from the `magrittr` package, and it behaves the same way in most cases.

It's always better with an example, so let's take the first lines of code of this page, written as a single instruction:

```r
View(read.csv(file.choose()))
```

Here, 3 functions are nested one inside the other. The readability is still decent, but I wouldn't recommend adding a fourth. The **`|>`** operator fixes this soon-to-be problem.

```r
# equivalent to the previous instruction
file.choose() |> read.csv() |> View()

# again equivalent
file.choose() |>
 read.csv() |>
 View()
```

As you can see, it's fairly easy to read. This operator is so practical that most R practitioners now use it.
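Here is another small sketch you can run without picking a file, using a made-up vector of URLs. It assumes R 4.1 or later, since that is when `|>` was introduced.

```r
# Made-up list of URLs, with one duplicate
urls <- c("https://example.com/a", "https://example.com/b", "https://example.com/a")

# Nested version: has to be read from the inside out
length(unique(tolower(urls)))

# Piped version: reads top to bottom, one step per line
n_unique <- urls |>
  tolower() |>
  unique() |>
  length()

n_unique
```

Both versions compute the same thing; the piped one just makes the order of the steps visible.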

### R **relies a lot on vectors, which can be confusing**

Let's see some examples.

```r
# this instruction combines 3 numbers into a vector and stores it in the x variable
x <- c(1, 2, 3)

# this will display our vector
x

# this will concatenate the x vector with itself
c(x, x)

# unlike arrays in many other languages, indexing starts at 1, not 0
x[1]
```

The good part is that you don't need to write a loop every time you want to do some basic operation:

```r
# this will add one to every element of the vector
y <- x + 1

# want to add two vectors element by element? this will work
x + y

# it also works with functions: as.double() is applied to every element
x <- as.double(x)
```
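The same idea applied to something closer to SEO work, as a minimal sketch on made-up URL paths (the 10-character threshold is arbitrary and only there for illustration):

```r
# Made-up URL paths
paths <- c("/", "/blog/", "/blog/a-very-long-article-title-about-seo/")

# nchar() is applied to every element at once, no loop needed
path_length <- nchar(paths)
path_length

# Comparisons are vectorized too: one TRUE/FALSE per element
is_long <- path_length > 10
is_long

# A logical vector can be used to pick elements
paths[is_long]
```

Since a data frame column is a vector, the same pattern works on a column accessed with `$`.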
