Bot Traffic: How to Identify & Protect Your Website

Updated on: December 23, 2023
Learn how to detect, block, and manage bot traffic on your website to improve site performance, protect ad revenue, and maintain accurate analytics data.

As a growing publisher, you’re likely grappling with two key challenges: selling millions of ad impressions at competitive rates and tackling bot traffic. Invalid bots can inflate your page views and jeopardize your relationships with demand partners. But don’t worry; we’re here to help you identify and block them.

Did you know that in 2021, bad bots accounted for 27.7% of all web traffic, a 2.1-percentage-point increase over the previous year?

Most of the time, we don’t even notice them. However, if you’re reading this blog, it’s a sign that you’ve encountered bot traffic and are ready to take action. Unchecked bot traffic can lead to various problems, such as demand partners severing ties with you, skewed analytics data, and decreased rankings due to slow site speeds.

In this blog, we’ll explore how to detect and block bot traffic. But before diving into blocking strategies, it’s crucial to determine whether your site is actually receiving bot traffic and whether that traffic should be blocked.

Understanding Bot Traffic

Bots are more prevalent on the internet than you might think, and they can significantly impact your website and business as a publisher. While some bots are harmless or even beneficial, others cause real problems.

The issues caused by bot traffic include:

  1. Skewed analytics: Bot traffic can distort your website’s analytics data, making it difficult to understand the actual behavior of human visitors. This makes it challenging to optimize the site for user experience and conversions.
  2. Ad fraud: Invalid bots can generate fraudulent clicks on your ads, leading to inaccurate ad performance reporting. This could also result in demand partners cutting ties with you due to the presence of fraudulent traffic.
  3. Decreased site performance: A large volume of bot traffic can slow down your website, negatively impacting the user experience and causing your search engine rankings to drop.
  4. Security risks: Malicious bots can compromise your site’s security, steal sensitive information, or even take control of your site to use it for their purposes, such as launching spam campaigns or distributing malware.

Detecting Bot Traffic Using Google Analytics

Bot traffic can cause noticeable inconsistencies in your website analytics data. By recognizing these anomalies, you can identify potential bot traffic and take action to address it. Here are some common irregularities and the reasons behind them:

  1. Increased number of page views: Bots crawling your entire website may load multiple pages simultaneously, leading to a spike in page views.
  2. Decreased session duration: Bots quickly collect data from your pages without needing to read them like humans, resulting in a reduced average session duration for your site.
  3. Increased bounce rate: Scraping bots often visit a single page on your site before moving on to the next one, causing an increased bounce rate.
  4. Increased number of pages per session: Bots collecting large amounts of data from your site may browse hundreds of pages in a single session, leading to an unnaturally high number of pages per session. Some bots are deliberately designed to visit only a few pages to avoid detection through this metric.
  5. Decreased bounce rate: When multiple bots visit more than one page on your site, you may see a sharp decline in the bounce rate, which can be another indicator of bot activity.
  6. Decreased page load speed: A significant number of bots accessing your site simultaneously can overload your server and slow down your website.

By monitoring these anomalies in your Google Analytics data, you can detect the presence of bot traffic and take appropriate measures to minimize its impact on your website and overall online presence as a publisher.
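
If you export this data, the check can be automated. Below is a minimal Python sketch, assuming a hypothetical CSV export with one row per day and columns named date, pageviews, avg_session_duration, and bounce_rate; it flags days where a metric strays unusually far from its average:

```python
# Minimal anomaly check over exported analytics data (a sketch, not a
# substitute for proper bot detection). Assumes a hypothetical CSV with
# columns: date, pageviews, avg_session_duration, bounce_rate.
import csv
import statistics

def flag_anomalies(path: str, threshold: float = 3.0) -> list[str]:
    """Flag days where a metric deviates from its mean by more than
    `threshold` standard deviations."""
    with open(path) as f:
        rows = list(csv.DictReader(f))
    flagged = []
    for metric in ("pageviews", "avg_session_duration", "bounce_rate"):
        values = [float(row[metric]) for row in rows]
        mean, stdev = statistics.mean(values), statistics.stdev(values)
        for row, value in zip(rows, values):
            if stdev and abs(value - mean) > threshold * stdev:
                flagged.append(f"{row['date']}: unusual {metric} ({value:.1f})")
    return flagged

for line in flag_anomalies("analytics_export.csv"):
    print(line)
```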

Understanding Ghost Traffic

When discussing bot traffic, it’s important to be aware of “ghost traffic.” This type of traffic appears in your Google Analytics reports but never actually visits your website. Ghost traffic usually appears as referral traffic from irrelevant sites and targets Google Analytics servers to add data to your reports.

The primary purpose of ghost traffic is to pique the curiosity of webmasters. When you notice significant referral traffic from a particular site, you might be tempted to investigate the source. Visiting the referral site can expose you to various risks, such as hacking, viruses, cookie stuffing, or even unwanted ads. Since the traffic only affects your analytics tool, filtering it out will provide you with more accurate data.

To filter out ghost traffic, compile a list of hostnames sending ghost traffic to your analytics tool. A hostname is any domain where your Google Analytics tracking code is present, so be cautious when selecting hostnames. Some genuine hostnames, like “webcache.googleusercontent.com,” should not be dismissed as bot traffic.

Most ghost traffic will come from a “(not set)” hostname. Other invalid hostnames might appear legitimate but will have spammy URLs in the Source dimension. To filter out ghost traffic:

  1. Go to Google Analytics and select a date range of 1 year or more.
  2. Navigate to Audience > Technology > Network in the reporting section.
  3. Set “Hostname” as the primary dimension.
  4. Identify genuine hostnames from the list and create a regular expression that matches only those names (see the example after these steps).
  5. Create a filter that includes only valid hostnames to exclude ghost traffic from your reports:
  •   Go to Admin in the bottom left corner of Google Analytics.
  •   Under the View section, navigate to Filters > Add Filter and name the filter.
  •   Select Custom as the Filter Type.
  •   Choose Hostname under Filter Field.
  •   Paste the regular expression you created into the Filter Pattern field.
  •   Save the filter after verifying it.
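
For illustration, if your site were example.com, the hostname regular expression from step 4 might look like the sketch below; substitute your own valid hostnames, and keep genuine ones such as webcache.googleusercontent.com:

```
^(example\.com|www\.example\.com|webcache\.googleusercontent\.com)$
```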

Remember to update the filter whenever you add your tracking ID to a new location; otherwise, your reports won’t include data from that location. You can also use the opposite approach by excluding bot-related hostnames, which requires continually updating the list to exclude new ones.

Appreciating the Role of Legitimate Bots

Before you start blocking bot traffic on your site, you must understand that not all bots are harmful. Some bots play crucial roles in the smooth functioning of your website, and blocking them might have unintended consequences. 

Examples of Beneficial Bots:

  1. Search Engine Crawlers: Googlebot, Bingbot, and other search engine crawlers index your site’s content and help improve your search rankings. Blocking these bots may negatively impact your site’s visibility on search engines.
  2. Social Media Bots: Bots from platforms like Facebook, Twitter, and Pinterest crawl your site to gather information for sharing on their networks. Blocking these bots might hinder the visibility of your content on social media platforms.
  3. Monitoring and Security Bots: Some bots are designed to monitor your site’s performance, uptime, and security. Blocking these bots might expose your site to various security threats, such as hacking attempts, distributed denial-of-service (DDoS) attacks, and unauthorized access to sensitive information. 

Additionally, blocking monitoring bots can leave performance problems undetected: excessive server load, slow-loading resources, or other technical issues that hurt your website’s speed, responsiveness, or overall user experience may go unnoticed.

Blocking Invalid Bot Traffic: Methods and Solutions

Use a Bot Management Solution

Bot management providers work toward a single goal: protecting websites from malicious traffic. As specialists, they possess in-depth knowledge and expertise that goes beyond what a typical website owner has regarding site protection. More importantly, they maintain data on the good and bad bots active across the internet.

An up-to-date database helps combat the latest bots on the web. Contact a reputable bot management provider, and they will help you set up a system tailored to your needs. Well-known names in the field include Cloudflare Bot Management, DataDome, and Pixalate.

Manually Block Invalid IP Addresses

  1. Identify the IP addresses of bot traffic.
  2. Block the addresses in your website’s cPanel, typically found under the security tab of your web hosting account.
  3. If the IP addresses are unknown, create a “honeypot*”: an invisible link on your homepage that only bots will follow.

*A honeypot is a decoy resource set up to look like a legitimate target but designed to lure attackers into revealing their tactics and techniques; here, it is a hidden page that only bots will discover.

  4. Disallow the honeypot page in your robots.txt file. Legitimate crawlers will respect the rule, while bad bots typically ignore it and follow the hidden link (see the example after these steps).
  5. Check your server access logs for requests to the honeypot page to collect the bots’ IP addresses (analytics tools such as Google Analytics do not expose visitor IPs).
  6. Block the identified IP addresses in the cPanel.
  7. Regularly update the block list as new IP addresses are discovered.
  8. For Apache web servers, use the .htaccess file to block bots based on IP address, HTTP referrer, or user agent.
  9. Create or update the .htaccess file with blocking instructions.
  10. Upload the .htaccess file to your site’s root directory using FTP.
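
For illustration, here is what the robots.txt rule from step 4 and the .htaccess blocking instructions from step 9 might look like. The IP addresses, honeypot path, and user-agent strings are hypothetical placeholders; the .htaccess directives assume Apache 2.4:

```
# robots.txt: well-behaved crawlers respect this rule and skip the honeypot
User-agent: *
Disallow: /honeypot-page/

# .htaccess (Apache 2.4): deny the bot IPs you identified
<RequireAll>
  Require all granted
  Require not ip 203.0.113.15
  Require not ip 198.51.100.0/24
</RequireAll>

# .htaccess: block by user agent (requires mod_rewrite)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (badbot|scrapey-crawler) [NC]
RewriteRule .* - [F,L]
```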

Use a Web Application Firewall (WAF)

A Web Application Firewall (WAF) serves as a protective shield for websites, guarding against security threats. In simple terms, it functions as a barrier between a website (or web app) and the client, acting as a reverse proxy server. 

WAFs can be set up as host-based, network-based, or cloud-based solutions, with both free open-source and commercial options available. Advanced WAFs, such as Akamai’s, inspect HTTP requests and stop malicious attacks before they reach your servers.
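
To make the reverse-proxy placement concrete, here is a toy Python (WSGI) middleware that inspects each request before it reaches the application, which is the position a WAF occupies in front of your servers. It is purely illustrative, with a hypothetical rule list; real WAFs apply far richer, continuously updated rule sets:

```python
# Toy request filter illustrating where a WAF sits in the request path.
# Not a real WAF: it checks one hypothetical rule (bad user agents).
BLOCKED_AGENT_SUBSTRINGS = ("badbot", "scrapey-crawler")  # placeholder rules

class TinyFirewall:
    """WSGI middleware that screens requests before the app sees them."""

    def __init__(self, app):
        self.app = app  # the protected web application

    def __call__(self, environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(s in agent for s in BLOCKED_AGENT_SUBSTRINGS):
            # Reject the request at the "firewall" layer.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        # Otherwise, pass the request through to the application.
        return self.app(environ, start_response)
```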

Use reCAPTCHA

You must have seen the small rectangular checkbox on many websites that says, “I’m not a robot”; this is reCAPTCHA. When a user clicks the box, it studies the movement of the cursor to differentiate a bot from a human.

Humans always introduce a little randomness into their cursor movement, whereas bots move in straight, uniform paths. If the reCAPTCHA test cannot decide based on mouse movement, the user is given a harder challenge, such as identifying images, and may proceed only after passing it.

Google provides reCAPTCHA v3 for free. Unlike the v2 checkbox, v3 runs invisibly in the background and returns a score for each request rather than interrupting the user. You need to register your site before starting the integration; the reCAPTCHA developer guide has all the resources you’ll need.
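
Once your site is registered, each token your pages obtain must be verified on your server. Below is a minimal Python sketch of that verification call, assuming the client has already sent the token with its request; the secret key and score threshold are placeholders you would set yourself:

```python
# Server-side verification of a reCAPTCHA v3 token (minimal sketch).
import requests

RECAPTCHA_SECRET = "your-secret-key"  # placeholder: from the reCAPTCHA admin console

def is_probably_human(token: str, min_score: float = 0.5) -> bool:
    """Ask Google to verify the token; v3 returns a score from 0.0 (bot)
    to 1.0 (human), so pick a threshold that fits your risk tolerance."""
    resp = requests.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={"secret": RECAPTCHA_SECRET, "response": token},
        timeout=5,
    )
    result = resp.json()
    return result.get("success", False) and result.get("score", 0.0) >= min_score
```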

Use WordPress Plugins

WordPress provides various plugins to fight bots. Many websites use WordPress, so these plugins are accessible to numerous publishers. Using plugins is an easy and convenient solution for smaller publishers with fewer resources.

For effective spambot and referral bot management on your WordPress site, follow these steps to find and set up a suitable bot-blocking plugin:

  1. Visit the WordPress Dashboard.
  2. Navigate to Plugins > Add New.
  3. Search for bot-blocking plugins, such as “Stop Spammers,” “Wordfence Security,” “WPBruiser,” or “Anti-spam by CleanTalk.”
  4. Install and activate the plugin that best fits your needs.
  5. Go to the settings page of the installed plugin.
  6. Explore and configure the various actions the plugin can perform to effectively manage spambots and referral bots.


Use IAB Bot Lists

The Interactive Advertising Bureau (IAB) provides two lists of identifiers to help you block bot traffic: a blacklist and a whitelist. The lists are updated monthly and are based on user agents, the strings that browsers and bots send to identify themselves to a web server.

When a user agent matches an entry in the whitelist and does not match any entry in the blacklist, it should be treated as a real user.

If a user agent matches no entry in the whitelist, or matches the whitelist but also matches an entry in the blacklist, it should be treated as a bot. The sketch below illustrates this decision logic.
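
Here is a minimal Python sketch of that decision logic. It assumes the lists have been exported to plain-text files of literal user-agent substrings, one per line; the real IAB/ABC file format is more detailed, so consult their documentation before relying on this:

```python
# Whitelist/blacklist user-agent check (a sketch of the logic above).
# Assumes hypothetical files with one literal user-agent substring per line.
import re

def load_patterns(path: str) -> list[re.Pattern]:
    """Compile each non-empty line as a case-insensitive literal match."""
    with open(path) as f:
        return [re.compile(re.escape(line.strip()), re.IGNORECASE)
                for line in f if line.strip()]

whitelist = load_patterns("iab_whitelist.txt")  # placeholder filenames
blacklist = load_patterns("iab_blacklist.txt")

def is_valid_user(user_agent: str) -> bool:
    """Human only if the agent matches the whitelist and nothing in the blacklist."""
    on_whitelist = any(p.search(user_agent) for p in whitelist)
    on_blacklist = any(p.search(user_agent) for p in blacklist)
    return on_whitelist and not on_blacklist
```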

You can download the IAB/ABC International Spiders and Bots List from the IAB website. The list is not free; access requires a paid annual membership. Once you have it, the list can help you effectively identify and block bot traffic.

By using these lists and regularly updating them, you can efficiently manage the bots accessing your site and ensure that only genuine users are interacting with your content. This can ultimately help improve ad performance and revenue.

Improve Your Paid Traffic 

Many publishers acquire traffic through content recommendation services. If your content is recommended on sites riddled with bots, the traffic you acquire will include bots as well. You therefore need to ensure that your traffic sources are high quality.

To ensure high-quality traffic, adopt the following best practices for traffic acquisition:

  • Block spammy websites to avoid low-quality traffic.
  • Target websites that have proven to be effective for your campaigns in the past.
  • Use keyword targeting carefully, and avoid excessive keyword blocking.
  • Continuously refine your strategies based on new data and insights.
  • Find the optimal balance between CPC and CTR that keeps your campaign within budget while driving traffic from high-quality sites.

Conclusion: Tackling Bot Traffic Effectively

For any publisher, effectively managing bot traffic is crucial to maintaining the integrity of your website, user experience, and advertising partnerships. 

By understanding the differences between harmful and beneficial bots, detecting bot traffic using tools like Google Analytics, and implementing various techniques such as using a bot management solution, Web Application Firewall, reCAPTCHA, WordPress plugins, or IAB bot lists, you can minimize the impact of invalid bot traffic on your site.

Moreover, it’s essential to continually monitor your website’s performance, adjust your strategies to tackle emerging bot threats, and maintain a healthy online presence. By taking a proactive approach to identifying and blocking invalid bots, you can protect your site from the negative effects of bot traffic, improve your analytics data, and ensure a smooth and enjoyable user experience for your human visitors.

FAQs

1. What is Bot Traffic?

Bot traffic refers to non-human visitors to a website generated by automated programs or scripts called bots. These bots can perform various tasks, such as crawling websites for search engines or monitoring site performance.

2. How to Identify Bot Traffic?

Bot traffic can be identified through analytics tools like Google Analytics by observing unusual spikes in traffic, abnormal bounce rates, or a high percentage of visits with very short session durations. Additionally, you can check server logs for suspicious IP addresses or user agents.

3. How to Block Bot Traffic?

To block bot traffic, you can use methods like updating your website’s .htaccess file, implementing a Web Application Firewall (WAF), using bot management solutions, or employing plugins for content management systems like WordPress. It’s essential to distinguish between good and bad bots to avoid blocking helpful bots like search engine crawlers.

4. What is the Impact of Bot Traffic on Ad Revenue?

Bot traffic can negatively impact ad revenue, as it inflates the number of ad impressions without generating genuine user engagement. This can lead to lower ad performance, decreased click-through rates, and potential penalties from ad networks, ultimately reducing ad revenue.
