# Join crawl data with Google Analytics data

SEO data often comes from several sources, so it helps to know how to connect or merge them.

Say we have crawled a website. It would be useful to check which of those pages received SEO traffic.

To do that, we need to `merge` (or `join`) the two datasets.

### 1. Crawl data

Using `Rcrawler`, we've collected our pages (see the [How to use rcrawler](https://www.rforseo.com/crawl/rcrawler) article).

```r
library(Rcrawler)
Rcrawler(Website = "https://www.rforseo.com/")
```

We now have a data frame called `INDEX` that lists the crawled URLs along with their crawl depth.

```r
View(INDEX)
```

![second column is the url](https://2998538899-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MSXey8kI3RllV-BSG8m%2F-MYqU1rAnQ6HkKIcpo4V%2F-MYqUZoEsqh5RPnp5ZFk%2FScreenshot%202021-04-21%20at%2011.11.18%20pm.png?alt=media\&token=9204c5f5-2b10-4012-9bd3-7ed52b378401)

### 2. Google Analytics data

Using the `googleAnalyticsR` package, we grab the Google Analytics SEO landing pages (see the [How to use googleAnalyticsR](https://www.rforseo.com/apis/web-analytics-google-analytics) article).

```r
library(googleAnalyticsR)
library(dplyr)

# ga_id is your Google Analytics view ID (authenticate first with ga_auth())
# Sessions between 1 January and 1 February 2021,
# split by medium and landing page,
# with the anti-sampling option enabled
ga <- google_analytics(ga_id,
    date_range = c("2021-01-01", "2021-02-01"),
    metrics = "sessions",
    dimensions = c("medium", "landingPagePath"),
    anti_sample = TRUE)

# Keep only the SEO (organic) sessions
ga_seo <- ga %>% filter(medium == "organic")
```

### 3. Fuuuuu...sion!

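First, we need to find the common ground between the two datasets. The crawl data has a `Url` column and the GA data has `landingPagePath`.

So we need a conversion. We'll remove the hostname from the URL using the `path` function from the `urltools` package. `path` returns the path without its leading slash, and `NA` when there is no path (the homepage), so we add the slash back and turn `/NA` into `/`.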

```r
INDEX$landingPagePath <- paste0("/",urltools::path(INDEX$Url))

INDEX$landingPagePath[INDEX$landingPagePath == "/NA"] <- "/"
```
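To see what this conversion produces without running a crawl, here is a minimal base R sketch on a few example URLs. The `to_landing_path` helper is hypothetical (it is not part of `urltools` or `Rcrawler`) and uses a regular expression instead of `urltools::path`. It also keeps any query string, which may or may not match how your GA view records `landingPagePath`.

```r
# Example crawl URLs (in a real crawl these come from INDEX$Url)
urls <- c("https://www.rforseo.com/",
          "https://www.rforseo.com/crawl/rcrawler",
          "https://www.rforseo.com/apis/web-analytics-google-analytics")

# Hypothetical helper: strip the scheme and hostname,
# keeping everything from the first "/" onwards
to_landing_path <- function(url) {
  path <- sub("^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]+", "", url)
  path[path == ""] <- "/"  # a bare hostname is the homepage
  path
}

landing_paths <- to_landing_path(urls)
print(landing_paths)
```

Each value now starts with a slash, like the `landingPagePath` values returned by Google Analytics, and the homepage becomes `/`.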

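Now we can merge the two data frames on `landingPagePath`. By default `merge` keeps only the rows whose key exists in both, so the result lists the crawled pages that received organic sessions.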

```r
crawl_ga_merged <- merge(INDEX, ga_seo, by = "landingPagePath")
```
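If you also want to see crawled pages that got no SEO traffic, a left join may be more useful than the default inner join. The sketch below uses two small made-up data frames standing in for `INDEX` and `ga_seo`; the column values are invented for illustration only.

```r
# Toy stand-ins for INDEX and ga_seo (all values are made up)
crawl <- data.frame(
  Url = c("https://www.rforseo.com/",
          "https://www.rforseo.com/crawl/rcrawler",
          "https://www.rforseo.com/about"),
  Level = c(0, 1, 1),
  landingPagePath = c("/", "/crawl/rcrawler", "/about")
)
seo <- data.frame(
  medium = "organic",
  landingPagePath = c("/", "/crawl/rcrawler", "/old-page"),
  sessions = c(120, 45, 3)
)

# Default merge: inner join, only paths found in both data frames
inner <- merge(crawl, seo, by = "landingPagePath")

# all.x = TRUE: left join, every crawled page is kept
left <- merge(crawl, seo, by = "landingPagePath", all.x = TRUE)

# Crawled pages without organic sessions have NA in the sessions column
no_traffic <- left$landingPagePath[is.na(left$sessions)]
print(no_traffic)
```

Here `/about` was crawled but has no organic sessions, so it only appears in the left join. `/old-page` got traffic but was not found by the crawl, so it appears in neither result; `all.y = TRUE` would keep it.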

That's it, really. Let's display the data:

```r
View(crawl_ga_merged)
```

![](https://2998538899-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MSXey8kI3RllV-BSG8m%2F-M_aJ3O5y_TJPYRQa58r%2F-M_aL8mONImRmQbaR0Ed%2FScreenshot%202021-05-13%20at%204.28.01%20pm.png?alt=media\&token=587c0ba7-1b88-4ab7-8ee6-73b1ec8de874)
