List Of User Agents For Scraping

By admin

When you start scraping the web, you’ll notice that scripts are sometimes blocked seemingly without reason. Somehow, the website knows you’re not using a real browser, and sensing your intentions, it blocks your access.

There’s an easy first step to tackle this: change your user agent header. Keep reading to find out what a user agent is, which user agents are the most common, and how you can adjust your code to mimic the user agent header of a browser.

What Is a User Agent, and Why Is It Important for Web Scraping?

When you send a request to a server through a browser or an HTTP client like the Requests library in Python, the request includes HTTP headers, which contain all kinds of information about this request.

Among them is the user agent header, a string that lets the server identify what kind of application the request comes from.

To see how it looks, you can use the following code. It uses the Requests library to send a request from your device. Then, it prints out the HTTP headers of that request in the console.

import requests

response = requests.get("https://example.com/")
print(response.request.headers)

It should print out something like this:

{'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

The relevant header in this case is the first one. As you can see, the Requests library tells the server that you’re using it, version number and all. What a snitch!
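You can also check what Requests sends by default without making a network request at all. The sketch below uses requests.utils.default_user_agent(), which returns the library’s own user agent string, and requests.utils.default_headers(), which returns the full set of default headers. The exact version number depends on the Requests release you have installed.

```python
import requests

# The user agent Requests sends when you don't set one yourself,
# e.g. "python-requests/2.31.0" (the version depends on your install)
default_ua = requests.utils.default_user_agent()
print(default_ua)

# The complete set of default headers; User-Agent is one of them
defaults = requests.utils.default_headers()
print(dict(defaults))
```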

The user agent header of a web browser looks quite different from that of a standalone application. It lists the browser and its version, the operating system, the rendering engine, and some other tidbits the server can use to decide what kind of content to serve the user.

For example, here’s the user agent string for the latest version of Chrome (at the time of writing this article) running on Windows:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
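To see those pieces separately, here’s a small sketch that splits the Chrome string above into its “Name/Version” product tokens and its parenthesized comments. The two regular expressions and the split_user_agent helper are simplifications made up for illustration, not a complete parser of the user agent format.

```python
import re

CHROME_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36")

# Product tokens look like "Name/Version"
PRODUCT_RE = re.compile(r"([^\s/()]+)/([^\s()]+)")
# Comments are the parts wrapped in parentheses
COMMENT_RE = re.compile(r"\(([^)]*)\)")


def split_user_agent(ua):
    """Split a user agent string into (name, version) tokens and comments."""
    return PRODUCT_RE.findall(ua), COMMENT_RE.findall(ua)


products, comments = split_user_agent(CHROME_UA)
for name, version in products:
    print(f"{name:12} {version}")
print(comments)
```

The first comment carries the operating system (Windows NT 10.0, 64-bit), while the product tokens name the browser and the engines it claims compatibility with.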

By replacing the default user agent string with one that browsers use, you can make your request look like it comes from a real browser instead of announcing that it was sent by a scraping script.

How Do You Change Your User Agent?

Most HTTP client libraries used in web scraping let you easily change the user agent string and, in that way, mimic a real browser.
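For instance, Python’s built-in urllib.request module accepts the same kind of headers dictionary when you build a Request object. This sketch only constructs the request and reads the header back, so it runs without network access.

```python
import urllib.request

CHROME_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36")

# Build the request object; nothing is sent until urlopen() is called
req = urllib.request.Request("https://example.com/", headers={"User-Agent": CHROME_UA})

# urllib normalizes header names with str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))

# To actually send it:
# with urllib.request.urlopen(req) as response:
#     html = response.read()
```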

In this part, you’ll learn how to do it with Requests, the most popular Python HTTP client library.

Let’s say that you already have some code using Requests up and running.

import requests

response = requests.get("https://example.com/")
print(response.request.headers)

To change the user agent header that Requests uses, create a new headers variable that holds a dictionary. The dictionary needs just one entry: the user agent header.

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36'}

Now you can pass the headers variable to the requests.get() function.

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36'}

response = requests.get("https://example.com/", headers=headers)
print(response.request.headers)

Requests merges your dictionary with its default headers, so your value overwrites the default user agent while the other headers stay the same. The request now has a user agent header that matches that of the Chrome browser:

{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
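If your script makes many requests, you can set the user agent once on a requests.Session instead of passing headers to every call. The sketch below prepares a request without sending it, which lets you confirm that your user agent replaced the default one while Accept and Connection kept their default values.

```python
import requests

CHROME_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36")

session = requests.Session()
# Replace the default User-Agent for every request made through this session
session.headers.update({"User-Agent": CHROME_UA})

# Prepare (but don't send) a request to inspect the merged headers
prepared = session.prepare_request(requests.Request("GET", "https://example.com/"))
print(prepared.headers)

# In a real script you would simply call:
# response = session.get("https://example.com/")
```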

Which User Agents Are Commonly Used for Scraping Websites?

If you need to pick a user agent string for your web scraping script, the best bet is to just take the most commonly used one. It will blend with the other traffic sent to the website and not stand out.

At the time of writing, the most common user agent is the following:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36

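It’s the user agent header of the latest Chrome browser running on Windows. (Browsers on Windows 11 still report “Windows NT 10.0”, so the string doesn’t distinguish between Windows 10 and 11.) It might seem odd that this user agent also mentions Mozilla, Safari, and the KHTML and Gecko engines, but there’s a historical reason: websites used to serve their best content only to browsers they recognized by name, so newer browsers copied the tokens of older ones to avoid being shut out, and those tokens have stayed for compatibility.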

Here are other user agents you can use, in order of popularity:

  1. Chrome 115.0 on macOS
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
  2. Chrome 114.0 on Windows
    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
  3. Firefox 116.0 on Windows
    Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0
  4. Firefox 115.0 on Windows
    Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0
  5. Chrome 114.0 on macOS
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
  6. Chrome 115.0 on Linux
    Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
  7. Chrome 116.0 on Windows
    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
  8. Firefox 115.0 on Linux
    Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0
  9. Firefox 116.0 on Linux
    Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0
  10. Edge 115.0 on Windows
    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36 Edg/115.0.1901.188
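If you plan to use several of these strings, one option (a common scraping practice, not something this list requires) is to pick one at random for each request. The sketch below uses a hypothetical helper named random_headers and only a few entries from the list; the rng parameter lets you pass a seeded random.Random for reproducible runs.

```python
import random

# A few of the strings from the list above
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
]


def random_headers(rng=random):
    """Return a headers dict with a user agent picked at random from USER_AGENTS."""
    return {"User-Agent": rng.choice(USER_AGENTS)}


headers = random_headers()
print(headers["User-Agent"])
# With Requests: requests.get("https://example.com/", headers=random_headers())
```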

Keep in mind that new browser versions come out regularly, so it’s a good idea to check which user agent headers are currently common and update your headers from time to time.
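If you keep a list of strings in your code, a small check can flag the ones that have fallen behind. This sketch only handles Chrome-based strings (Firefox strings are skipped), and chrome_major, stale_user_agents, and the threshold of 116 are hypothetical choices for illustration; you’d still need to look up the current Chrome version yourself.

```python
import re

CHROME_VERSION_RE = re.compile(r"Chrome/(\d+)\.")


def chrome_major(ua):
    """Return Chrome's major version from a user agent string, or None if absent."""
    match = CHROME_VERSION_RE.search(ua)
    return int(match.group(1)) if match else None


def stale_user_agents(user_agents, min_major):
    """List the Chrome-based strings whose major version is below min_major."""
    return [ua for ua in user_agents
            if (major := chrome_major(ua)) is not None and major < min_major]


my_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0",
]
# 116 is only an example threshold
print(stale_user_agents(my_agents, 116))
```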

Alternatively, you can just use your own browser’s user agent string. Search Google for “What’s my user agent?”, and it should display the string, which you can then set as the user agent for your script. Since it’s copied from a real browser, it should look natural enough to get past most restrictions based on the user agent.
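To avoid editing your script every time you copy a new string from your browser, you could read it from an environment variable and fall back to a hard-coded default. SCRAPER_USER_AGENT and user_agent_from_env are hypothetical names chosen for this sketch.

```python
import os

# Fallback used when no browser string has been supplied
DEFAULT_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36")


def user_agent_from_env(var="SCRAPER_USER_AGENT"):
    """Return the user agent stored in the environment variable var, or DEFAULT_UA."""
    value = os.environ.get(var, "").strip()
    return value or DEFAULT_UA


headers = {"User-Agent": user_agent_from_env()}
print(headers)
```

In a Unix shell you could then run SCRAPER_USER_AGENT="<string copied from your browser>" python scraper.py, where scraper.py stands in for your own script.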

Where Can I Find a Comprehensive List of User Agents for Web Scraping Purposes?

If you require a more detailed list of user agents you can use for web scraping, check out this blog post. It contains a list of commonly used desktop-based user agents that is continually updated based on data from people visiting the blog.