Published on 2025-08-07T06:18:08Z

TurnitinBot

TurnitinBot is the official web crawler for Turnitin, the well-known academic plagiarism detection service. Its purpose is to scan and collect public, text-based web content to build the vast comparison database that is used to check student submissions for plagiarism. The bot's activity supports academic integrity but does not offer a direct benefit, such as referral traffic, to the websites it crawls.

What is TurnitinBot?

TurnitinBot is the web crawler for Turnitin, a company that provides academic integrity solutions to educational institutions. The bot systematically crawls the internet to collect publicly available content for its plagiarism detection service. It identifies itself in server logs with a user-agent string like TurnitinBot/2.1 (http://www.turnitin.com/robot/crawlerinfo.html). The bot is designed to focus specifically on extracting text content and does not process dynamic elements like JavaScript. The content it collects becomes part of Turnitin's reference database.
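
In an Apache-style access log, a request from the bot might look like the line below. The IP address, timestamp, and path are hypothetical; the user-agent string matches the format described above.

203.0.113.7 - - [07/Aug/2025:06:18:08 +0000] "GET /articles/example HTTP/1.1" 200 48213 "-" "TurnitinBot/2.1 (http://www.turnitin.com/robot/crawlerinfo.html)"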

Why is TurnitinBot crawling my site?

TurnitinBot visits your website to collect its text content for Turnitin's comparison database. The bot prioritizes content with academic relevance, such as scholarly articles and educational resources, which students might reference or copy. How often it visits depends on your site's content and how frequently that content is updated. The crawling is generally considered authorized: the bot respects robots.txt directives and accesses only publicly available pages.

What is the purpose of TurnitinBot?

The purpose of TurnitinBot is to support Turnitin's plagiarism detection service. When a student submits a paper, it is compared against the database built from the bot's crawls to identify potential matches with existing online sources. This helps educational institutions maintain academic integrity. The bot does not store complete copies of websites; instead, it creates text fingerprints used for similarity matching. While website owners do not directly benefit from the crawl, the broader academic ecosystem benefits from tools that uphold standards of original scholarship.
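
Turnitin's actual fingerprinting scheme is proprietary, but a common way to fingerprint text is w-shingling: hash every overlapping run of a few words, so that two documents sharing a passage also share hashes. The Python sketch below only illustrates that general idea; it is not Turnitin's method, and the five-word shingle size is an arbitrary choice.

import hashlib

def fingerprint(text, shingle_size=5):
    # Hash every overlapping run of shingle_size words (a "shingle").
    words = text.lower().split()
    shingles = (
        " ".join(words[i:i + shingle_size])
        for i in range(len(words) - shingle_size + 1)
    )
    return {hashlib.sha1(s.encode()).hexdigest() for s in shingles}

def similarity(a, b):
    # Jaccard similarity of the two fingerprint sets, from 0.0 to 1.0.
    fa, fb = fingerprint(a), fingerprint(b)
    return len(fa & fb) / len(fa | fb) if fa | fb else 0.0

Comparing hash sets like these reveals shared passages without storing the full source text, which is consistent with the fingerprint database described above.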

How do I block TurnitinBot?

To prevent TurnitinBot from crawling your content and adding it to its plagiarism database, add a disallow rule for its user-agent to your robots.txt file. This is the standard method for managing crawler access.

To block this bot, add the following lines to your robots.txt file:

User-agent: TurnitinBot
Disallow: /
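
These two lines tell the crawler to stay away from every path on your site. The rule can also be scoped to part of a site; the directory name below is just an example:

User-agent: TurnitinBot
Disallow: /private/

Keep in mind that robots.txt is advisory: compliant crawlers such as TurnitinBot honor it, but it does not technically enforce the block.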

How do I verify the authenticity of a user-agent claiming to be TurnitinBot?

Reverse DNS lookup technique

To verify the user-agent, run the Linux host command twice, starting with the IP address of the requester.
  1. > host IPAddressOfRequest
    This returns the reverse DNS hostname for the address (for example, host 8.8.4.4 reports a pointer to dns.google).
  2. > host HostnameFromTheFirstCommand
    This resolves the returned hostname back to an IP address.
If the resolved IP address matches the original one and the hostname belongs to a domain operated by Turnitin, the user-agent can be considered legitimate. This check is known as forward-confirmed reverse DNS.
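
The same two lookups can be scripted if you need to check requests automatically, for example while processing server logs. The following Python sketch performs the forward-confirmed reverse DNS check; the trusted hostname suffix turnitin.com is an assumption, so confirm the actual crawler hostnames against Turnitin's documentation.

import socket

# Assumed trusted suffix -- verify the real crawler hostnames with Turnitin.
TRUSTED_SUFFIXES = (".turnitin.com",)

def verify_crawler_ip(ip):
    try:
        # Step 1: reverse lookup, IP address -> hostname.
        hostname, _, _ = socket.gethostbyaddr(ip)
        # Step 2: forward lookup, hostname -> IP addresses.
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except (socket.herror, socket.gaierror):
        return False  # no PTR record, or the hostname does not resolve
    # Legitimate only if the forward lookup confirms the original IP
    # and the hostname belongs to a trusted domain.
    return ip in addresses and hostname.endswith(TRUSTED_SUFFIXES)

print(verify_crawler_ip("203.0.113.7"))  # hypothetical requester IP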

IP list lookup technique

Some operators publish a list of the IP addresses their crawlers use. Cross-referencing the requester's IP against such a list is another way to verify a user-agent's authenticity. However, these lists are not always kept up to date, so use this method with caution and in combination with other verification techniques.
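
If the operator does publish address ranges, the check is easy to script with Python's standard ipaddress module. The ranges below are documentation-only placeholders (RFC 5737), not real Turnitin addresses; substitute whatever list the operator actually publishes.

import ipaddress

# Placeholder networks -- replace with the operator's published ranges.
PUBLISHED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def ip_in_published_ranges(ip):
    # True if the requesting IP falls inside any published range.
    address = ipaddress.ip_address(ip)
    return any(address in network for network in PUBLISHED_RANGES)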