Published on 2025-08-07T06:18:08Z
magpie-crawler
magpie-crawler is an intelligence-gathering web crawler for Brandwatch, a leading social media monitoring and digital consumer intelligence company. Its purpose is to scan public web pages, blogs, and forums to find mentions of specific keywords, brands, or products that Brandwatch's clients are tracking. It is a well-behaved crawler that respects robots.txt directives.
What is magpie-crawler?
magpie-crawler is a web crawler operated by Brandwatch, a social media monitoring and intelligence company. It functions as a conventional web scraper that systematically indexes public content from blogs, forums, news sites, and social media platforms. The crawler identifies itself in server logs with a user-agent string like magpie-crawler/1.1. It is designed for efficient data collection and adheres to standard web protocols, including respecting robots.txt, making it a legitimate and transparent crawler.
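To spot the crawler in your own logs, you can match on the user-agent token. The following is a minimal Python sketch, assuming the user-agent always contains a token shaped like magpie-crawler/1.1; the function name magpie_version and the regular expression are illustrative, not something Brandwatch publishes.

```python
import re

# Assumption: the crawler's user-agent contains "magpie-crawler/<version>".
MAGPIE_UA = re.compile(r"magpie-crawler/(\d+(?:\.\d+)*)", re.IGNORECASE)


def magpie_version(user_agent):
    """Return the version from a magpie-crawler user-agent, or None if absent."""
    match = MAGPIE_UA.search(user_agent)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(magpie_version("magpie-crawler/1.1"))
    print(magpie_version("Mozilla/5.0 (X11; Linux x86_64)"))
```

A user-agent string is supplied by the client and can be forged, so a match here only shows what the request claims to be.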
Why is magpie-crawler crawling my site?
magpie-crawler is visiting your website to collect public information that may be relevant to Brandwatch's clients. It is likely that your site contains content related to brands, products, or industry keywords that are being monitored. The crawler is particularly interested in content that expresses opinions or reviews. The frequency of its visits is determined by the relevance of your content to the monitoring needs of Brandwatch's clients. This crawling is considered authorized as it only accesses publicly available content.
What is the purpose of magpie-crawler?
The purpose of magpie-crawler is to support Brandwatch's social listening and digital consumer intelligence platform. The data it collects helps organizations monitor online mentions of their brands, analyze market trends, and gain insights into consumer behavior. For website owners, there is no direct benefit from being crawled, as the service is designed to benefit Brandwatch's clients. However, the crawler is designed to be respectful of server resources and should not cause performance issues.
How do I block magpie-crawler?
To prevent magpie-crawler from accessing your website, you can add a disallow rule for it in your robots.txt file. This is the standard method for managing access for legitimate web crawlers.
Add the following lines to your robots.txt file to block magpie-crawler:
User-agent: magpie-crawler
Disallow: /
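Before deploying the rule, you may want to confirm that it matches the crawler and leaves other agents alone. Here is a minimal sketch using Python's standard urllib.robotparser module; the example.com URL and the SomeOtherBot agent name are placeholders, and the result reflects how Python's parser interprets the rule, which may differ in detail from the crawler's own parser.

```python
from urllib.robotparser import RobotFileParser

# The same two lines shown above, as they would appear in robots.txt.
RULES = """\
User-agent: magpie-crawler
Disallow: /
"""


def is_allowed(user_agent, url, rules=RULES):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL and agent names, used only for illustration.
    print(is_allowed("magpie-crawler/1.1", "https://example.com/reviews"))
    print(is_allowed("SomeOtherBot/2.0", "https://example.com/reviews"))
```

With these rules the first check should report that magpie-crawler is blocked, while the second agent remains allowed because no rule names it.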
How do I verify the authenticity of the user-agent operated by Brandwatch?
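Reverse IP lookup technique
Run the host Linux command two times, starting with the IP address of the requester.
- > host IPAddressOfRequest
  This command returns the reverse lookup record for the address (e.g., 4.4.8.8.in-addr.arpa.), which points to a hostname.
- > host ReverseDNSFromTheOutputOfFirstRequest
  This command returns the IP address for that hostname, which should match the IP address of the original request.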
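The same two lookups can be scripted. The following is a hedged Python sketch of a forward-confirmed reverse DNS check using the standard socket module. The function name and the injectable lookup parameters are hypothetical conveniences, and the sketch only confirms that the hostname resolves back to the requesting IP. It does not know which domain Brandwatch's hostnames use, so you still have to judge the returned hostname yourself.

```python
import socket


def forward_confirmed_hostname(ip, reverse_lookup=None, forward_lookup=None):
    """Return the reverse-DNS hostname of ip if it resolves back to ip, else None.

    The lookup parameters are hypothetical hooks so the logic can be
    exercised without network access; by default real DNS is queried.
    """
    if reverse_lookup is None:
        # First lookup: IP address -> hostname (PTR record).
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # Second lookup: hostname -> list of IPv4 addresses.
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]
    try:
        hostname = reverse_lookup(ip)
        addresses = forward_lookup(hostname)
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return None
    return hostname if ip in addresses else None


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; substitute the requester's IP.
    print(forward_confirmed_hostname("192.0.2.10"))
```

The default forward lookup handles IPv4 only. A result of None means the round trip failed, which is a reason to treat the request's user-agent claim with suspicion.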