Grab Google Suggest Search Queries using R

To make things easier, I've created two dedicated functions:

  • getGSQueries: grabs the suggestions for a single query

  • suggestGSQueries: calls getGSQueries repeatedly and merges each request's results

Just copy and paste these two functions into your RStudio console:

getGSQueries <- function (search_query, code_lang) {
  # install any missing dependency, then load both packages
  packages <- c("XML", "httr")
  missing_packages <- setdiff(packages, rownames(installed.packages()))
  if (length(missing_packages) > 0) {
    install.packages(missing_packages)
  }
  library(httr)
  library(XML)

  # reserved = TRUE also escapes characters such as "&" and "+"
  query <- URLencode(search_query, reserved = TRUE)
  url <-
    paste0(
      "http://suggestqueries.google.com/complete/search?output=toolbar&hl=",
      code_lang,
      "&q=",
      query
    )

  # use GET method
  req <- GET(url)

  # read the body as text so that xmlParse() receives a string
  xml <- content(req, as = "text")
  # parse xml
  doc <- xmlParse(xml, asText = TRUE)

  # extract the "data" attribute from
  # <CompleteSuggestion><suggestion data="XXXXXX"/></CompleteSuggestion>
  suggestions <-
    xpathSApply(doc, "//CompleteSuggestion/suggestion", xmlGetAttr, 'data')

  return(suggestions)
}
suggestGSQueries <- function (search_query, code_lang, level) {
  if (length(search_query) == 1) {
    # single seed keyword: expand it according to `level`
    all_suggestion <- getGSQueries(search_query, code_lang)
    message("level 1")

    if (level > 1) {
      # level 2: seed followed by each letter ("covid a", "covid b", ...)
      for (l in letters) {
        message("level 2 ", l)
        Sys.sleep(runif(1, 0, 2))
        local_suggestion <-
          getGSQueries(paste0(search_query, " ", l), code_lang)
        all_suggestion <- c(all_suggestion, local_suggestion)
      }

      if (level > 2) {
        # level 3: seed followed by each pair of letters ("covid aa", ...)
        for (l1 in letters) {
          for (l2 in letters) {
            Sys.sleep(1 + runif(1, 0, 9))
            message("level 3 ", l1, l2)
            local_suggestion <-
              getGSQueries(paste0(search_query, " ", l1, l2), code_lang)
            all_suggestion <- c(all_suggestion, local_suggestion)
          }
        }
      }
    }

    all_suggestion <- unique(all_suggestion)
  } else {
    # vector of keywords: fetch the suggestions of each one (level is ignored)
    message(1, " ", search_query[1])
    all_suggestion <- getGSQueries(as.character(search_query[1]), code_lang)
    for (word in 2:length(search_query)) {
      Sys.sleep(1 + runif(1, 0, 9))
      message(word, " ", search_query[word])
      all_suggestion <-
        c(all_suggestion, getGSQueries(as.character(search_query[word]), code_lang))
    }
  }

  return(all_suggestion)
}

This is how you can use it:

kwd <- suggestGSQueries('covid', 'en', 2)

View(as.data.frame(unlist(kwd)))

The first parameter is the seed keyword, the second one is the language code (Google's hl, or host language, parameter), and the last one is the level of detail (1, 2 or 3).

Level 1 grabs only the suggestion list for the seed itself. Level 2 also grabs the suggestions for the seed followed by each letter ('covid a', 'covid b', 'covid c', ...). Level 3, which I don't recommend, also tries every two-letter combination ('covid aa', 'covid ab', ...).

It's also possible to pass a vector instead of a string (the level parameter is ignored in that case). In this example, we ask for Google suggestions for each of the results from the previous step. It will drastically increase the keyword list and... it might take a little bit of time too :)

deeper_kwd <- suggestGSQueries(kwd, 'en', 1)

View(as.data.frame(unlist(deeper_kwd)))
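
When a vector is passed, the combined results are returned without removing duplicates, so the deeper list can contain the same suggestion several times. A minimal sketch of flattening and de-duplicating it, using made-up values in place of real Google output:

```r
# Made-up values standing in for the result of suggestGSQueries(kwd, 'en', 1)
deeper_kwd <- list("covid test", "covid vaccine", "covid test", "covid symptoms")

# Flatten to a character vector, then drop repeats (the first occurrence is kept)
deeper_unique <- unique(unlist(deeper_kwd))

print(deeper_unique)
```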

Use these functions with caution because they can send a lot of queries to Google and you might get your IP banned.
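
To gauge how many requests a call will send before running it, a small helper can list the queries without contacting Google. This is only a sketch: previewGSQueries is a hypothetical name, not one of the two functions above, and it simply mirrors the way suggestGSQueries builds its queries for a single seed keyword.

```r
# Hypothetical helper (an assumption, not part of the original functions):
# lists the queries suggestGSQueries() would send for one seed keyword.
previewGSQueries <- function(search_query, level) {
  queries <- search_query                      # level 1: the seed itself
  if (level > 1) {
    # level 2: seed followed by each letter, e.g. "covid a"
    queries <- c(queries, paste(search_query, letters))
  }
  if (level > 2) {
    # level 3: every two-letter combination "aa", "ab", ..., "zz"
    pairs <- paste0(rep(letters, each = 26), rep(letters, times = 26))
    queries <- c(queries, paste(search_query, pairs))
  }
  queries
}

length(previewGSQueries("covid", 1))  # 1
length(previewGSQueries("covid", 2))  # 27
length(previewGSQueries("covid", 3))  # 703
```

Level 3 pauses between 1 and 10 seconds before each of its 676 extra requests, so at an average of 5.5 seconds the waiting alone adds up to roughly an hour.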
