Keyword rank tracking is a staple of marketing. Many teams pay for expensive tools that track where their website ranks for multiple keywords, and because the checks have to run daily, the cost adds up quickly for new businesses, individuals, and startups.

So in this post, we will create a crawler that keeps you updated on your latest rank for any keyword you want to track.

We will build a web scraper that scrapes Google search results using Python. I am assuming you already have Python installed on your computer. Let's begin by coding the scraper.

Let’s code

First, we need to install all the necessary libraries.

· Requests
· BeautifulSoup

Create a folder, move into it, and then install these libraries:

mkdir googlescraper
cd googlescraper
pip install requests
pip install beautifulsoup4

Then we will import these libraries into our file. You can name the file googlescraper.py:

import requests
from bs4 import BeautifulSoup

Our target URL will change according to the keyword we want to scrape, but the basic structure of the Google URL will remain the same.

Google URL structure — https://www.google.com/search?q={any keyword or phrase}
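For keywords containing spaces or special characters, the query string needs to be URL-encoded. Here is a minimal sketch using the standard library (the helper name build_search_url is my own, not part of the script we build below):

from urllib.parse import quote_plus

def build_search_url(keyword, num=10):
    # quote_plus turns spaces into "+" and escapes special characters
    return f"https://www.google.com/search?q={quote_plus(keyword)}&num={num}"

print(build_search_url("scrape prices"))
# https://www.google.com/search?q=scrape+prices&num=10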

For this blog post, our target keyword will be “scrape prices” and we have to find the rank of the domain blog.christian-schou.dk for this keyword.

So, our target URL will be https://www.google.com/search?q=scrape+prices.

Let us first check whether this domain is present in the first 10 results. If you inspect the results page, each result URL sits inside a div with class jGGQ5e, which in turn contains a div with class yuRUbf. Inside that yuRUbf div we have to find the anchor (a) tag and read the value of its href attribute.

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36', 'referer': 'https://www.google.com'}
target_url = 'https://www.google.com/search?q=scrape+prices'
resp = requests.get(target_url, headers=headers)
print(resp.status_code)

Here we have declared some headers like User-Agent and a Referer to act like a standard browser and not as a crawler.

Then we declared our target URL and finally made the GET request using the requests library. Once we run this code you should see 200 on your terminal.
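If you do not see a 200, Google has likely blocked the request or served a consent/captcha page instead of results. A quick guard (my own addition, not part of the original script) makes that failure explicit:

if resp.status_code != 200:
    # Google often answers automated-looking traffic with 429 (too many
    # requests) or a redirect to a consent page
    raise SystemExit(f"Request failed with status {resp.status_code}")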

Now, our target is to find our domain. Let’s find it using BS4.

soup = BeautifulSoup(resp.text, 'html.parser')
results = soup.find_all("div", {"class": "jGGQ5e"})

We have used html.parser with BS4 to build a parse tree of the HTML. If you print the results list, you will see the HTML of all of the top 10 results.

In this list, we have to search for our link one by one. For that, we will use a for loop.

from urllib.parse import urlparse

found = False
position = 0
for x in range(len(results)):
    # Pull the result link out of the yuRUbf div and keep only the domain
    domain = urlparse(results[x].find("div", {"class": "yuRUbf"}).find("a").get("href")).netloc

    if domain == 'blog.christian-schou.dk':
        found = True
        position = x + 1
        break

if found:
    print("Found at position", position)
else:
    print("not found in top", len(results))

We have used the urlparse function to pull the domain out of each link. Then we try to match the extracted domain against our target domain.

If it matches we will get the position and if it does not match then it will print not found.
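To make the matching concrete, here is what urlparse returns for a typical result link (the URL below is illustrative):

from urllib.parse import urlparse

link = "https://blog.christian-schou.dk/how-to-scrape-prices/"  # example link
print(urlparse(link).netloc)
# blog.christian-schou.dk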

Let us run this code and let’s see what we get.

Well, the request was successful, as I can see a 200, but we could not find this domain in the top 10 results.

Let’s search for it in the top 20 results. For that, we need to change the target URL and add the parameter &num=20 to our Google URL.

Google URL will become https://www.google.com/search?q=scrape+prices&num=20

Run the program again and check whether you see this domain or not.

This time I found the domain at the 18th position in the Google search results.

So, the rank of this domain for “scrape prices” is 18th in my country. This position will change according to the country, as Google displays different results in different countries.

This is how you can track the rank of any domain for any keyword. If you want to track it for different countries then you can use google search result scraper.
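If you just want to approximate results for a specific country without a third-party scraper, Google's search URL accepts gl (country) and hl (language) parameters. The snippet below sketches that; treat the parameter behavior as an assumption, since Google may still localize by IP address:

# Sketch: request results as they might appear in Denmark (gl=dk).
# gl and hl are assumptions here, not something the rest of this
# tutorial relies on; results can still be localized by IP.
target_url = "https://www.google.com/search?q=scrape+prices&num=20&gl=dk&hl=en"
resp = requests.get(target_url, headers=headers)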

Going forward, you could build an SEO tool just like Ahrefs or Semrush, or a lead generation tool like Snov.

Complete Code

import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36', 'referer': 'https://www.google.com'}
target_url = 'https://www.google.com/search?q=scrape+prices&num=20'

resp = requests.get(target_url, headers=headers)
print(resp.status_code)

soup = BeautifulSoup(resp.text, 'html.parser')
results = soup.find_all("div", {"class": "jGGQ5e"})
# print(results)

found = False
position = 0
for x in range(len(results)):
    # Pull the result link out of the yuRUbf div and keep only the domain
    domain = urlparse(results[x].find("div", {"class": "yuRUbf"}).find("a").get("href")).netloc

    if domain == 'blog.christian-schou.dk':
        found = True
        position = x + 1
        break

if found:
    print("Found at position", position)
else:
    print("not found in top", len(results))

Running the code every 24 hours

Let’s say you want to track your position every 24 hours because you are putting lots of effort into marketing and want to see results on a daily basis.

For that, you can mail yourself the current position every morning to stay updated. We will use the schedule library to implement this task.

Complete Code

import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse
import schedule
import time

def tracker():
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36', 'referer': 'https://www.google.com'}
    target_url = 'https://www.google.com/search?q=scrape+prices&num=20'

    resp = requests.get(target_url, headers=headers)
    print(resp.status_code)

    soup = BeautifulSoup(resp.text, 'html.parser')
    results = soup.find_all("div", {"class": "jGGQ5e"})
    # print(results)

    found = False
    position = len(results)
    for x in range(len(results)):
        domain = urlparse(results[x].find("div", {"class": "yuRUbf"}).find("a").get("href")).netloc

        if domain == 'blog.christian-schou.dk':
            found = True
            position = x + 1
            break

    if found:
        print("Found at position", position)
    else:
        print("not found in top " + str(position) + " results")

if __name__ == "__main__":
    schedule.every(5).seconds.do(tracker)
    while True:
        schedule.run_pending()
        time.sleep(1)

Here we are running the schedule every 5 seconds just to test that everything works. Once you run it, you should see the position printed in your terminal every five seconds.

Now, to run it every day or after every 24 hours you can use:

schedule.every().day.at("12:00").do(tracker)
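When scheduling once a day, there is no need to poll in a tight loop; sleeping between checks keeps CPU usage negligible. A sketch of the main block (the 60-second wake-up interval is my own choice):

if __name__ == "__main__":
    schedule.every().day.at("12:00").do(tracker)
    while True:
        schedule.run_pending()
        # waking once per minute is more than enough for a daily job
        time.sleep(60)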

Now, let us mail ourselves these results to stay updated with the latest position on Google. For this task, we will use the smtplib library.

Mail

import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse
import schedule
import time
import smtplib

def mail(position):
    alert_msg = position
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.login("from@gmail.com", "xxxx")
    SUBJECT = "Position Alert"
    message = 'From: from@gmail.com\nSubject: {}\n\n{}'.format(SUBJECT, alert_msg)
    server.sendmail("from@gmail.com", "send_to@gmail.com", message)

    server.quit()
    return True

def tracker():
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36', 'referer': 'https://www.google.com'}
    target_url = 'https://www.google.com/search?q=scrape+prices&num=20'

    resp = requests.get(target_url, headers=headers)
    print(resp.status_code)

    soup = BeautifulSoup(resp.text, 'html.parser')
    results = soup.find_all("div", {"class": "jGGQ5e"})
    # print(results)

    found = False
    position = len(results)
    for x in range(len(results)):
        domain = urlparse(results[x].find("div", {"class": "yuRUbf"}).find("a").get("href")).netloc

        if domain == 'blog.christian-schou.dk':
            found = True
            position = x + 1
            break

    if found:
        message = "Found at position " + str(position)
    else:
        message = "not found in top " + str(position) + " results"
    mail(message)

if __name__ == "__main__":
    schedule.every().day.at("12:00").do(tracker)
    while True:
        schedule.run_pending()
        time.sleep(1)

In the mail function, we are making a login attempt to our Gmail account with the password. Note that Gmail no longer accepts regular account passwords for SMTP logins; you will need an app password generated in your Google account settings.

Then we declared the subject and the message that will be sent to us. Finally, we used the sendmail function to send the email alert. This will send an email alert every 24 hours directly to your inbox.
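If you prefer properly structured email headers over hand-concatenating the From and Subject lines, Python's standard email.message.EmailMessage builds the message for you. A sketch, using the same placeholder addresses and password as above:

import smtplib
from email.message import EmailMessage

def mail(position):
    msg = EmailMessage()
    msg["From"] = "from@gmail.com"      # placeholder sender
    msg["To"] = "send_to@gmail.com"     # placeholder recipient
    msg["Subject"] = "Position Alert"
    msg.set_content(position)

    # the context manager closes the connection automatically
    with smtplib.SMTP('smtp.gmail.com', 587) as server:
        server.starttls()
        server.login("from@gmail.com", "xxxx")  # use a Gmail app password
        server.send_message(msg)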

Now, you might wonder: if we stop the script, our scheduler will stop working. Yes, you are right, and to tackle that we are going to use nohup.

nohup ignores the hangup signal, so your script keeps running even after you close the terminal session.
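As a hint, the basic invocation looks like this (assuming your script is saved as googlescraper.py); output that would normally go to the terminal is appended to nohup.out:

nohup python3 googlescraper.py &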

I leave wiring this up to you as homework, in the hope that you will learn something new and unique.

Conclusion

In this post, we learned how to create a task that runs at any given interval. We used four libraries, requests, BS4, schedule, and smtplib, to complete this task.

It does not stop here: you can build any kind of scheduled tracker, such as news updates, stock updates, etc. I am sure Python will make the job fast and simple.