Published on 2025-08-07T06:18:08Z
YandexNews bot
The YandexNews bot is a specialized web crawler from the Russian search engine Yandex. Its purpose is to discover and index news content for Yandex's news aggregation services. For publishers, being indexed by this bot can drive significant traffic and increase visibility among Yandex users, particularly in the Russian-speaking market.
What is the YandexNews bot?
The YandexNews bot is a dedicated crawler from Yandex designed to discover, collect, and process news content from websites for Yandex's news aggregation services. The bot identifies itself in server logs with the user-agent string Mozilla/5.0 (compatible; YandexNews/4.0; +http://yandex.com/bots). It systematically visits websites to identify fresh news content, headlines, and publication dates to help Yandex categorize and present news to its users.
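As a rough illustration of spotting this user-agent string in request logs, the sketch below matches the YandexNews product token. The function name and the choice to accept any version number (not only 4.0) are assumptions made for the example. A match only shows what the client claims to be, since the header can be set to any value.

```python
import re

# Matches the product token from the user-agent string quoted above,
# e.g. "YandexNews/4.0". Accepting other version numbers is an assumption.
YANDEX_NEWS_UA = re.compile(r"\bYandexNews/\d+(?:\.\d+)*", re.IGNORECASE)


def is_yandex_news_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent header claims to be the YandexNews bot."""
    return bool(YANDEX_NEWS_UA.search(user_agent or ""))


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; YandexNews/4.0; +http://yandex.com/bots)"
    print(is_yandex_news_user_agent(ua))
```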
Why is the YandexNews bot crawling my site?
The YandexNews bot is crawling your site to discover and index its news content for Yandex's news services. If the bot is visiting your site, it is because your content appears to be news-related. The frequency of visits depends on how often you publish new content. Sites that publish breaking news may be crawled more frequently. This is an authorized and standard activity for a search engine with a news aggregation service.
What is the purpose of the YandexNews bot?
The purpose of the YandexNews bot is to power Yandex's news aggregation services by collecting and indexing news content from across the web. This allows Yandex to provide its users with current news stories organized by topic. For publishers, having your content included in these services can drive significant traffic to your site and increase your visibility among Yandex users. The service creates a centralized news experience while still directing users to the original source for the full article.
How do I block the YandexNews bot?
To prevent the YandexNews bot from including your content in its news services, you can add a specific disallow rule to your robots.txt file. This is the standard method for managing crawler access.
To block this bot, add the following lines to your robots.txt file:
User-agent: YandexNews
Disallow: /
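Before deploying the rule, it can be checked locally. The sketch below feeds the two lines above to Python's standard-library robots.txt parser and asks whether a given crawler may fetch a URL. The example.com URL and the helper name are placeholders, and the standard-library parser only approximates how Yandex itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The rule from the article, exactly as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: YandexNews
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/story.html"  # placeholder URL
    print("YandexNews:", allowed("YandexNews", url))
    print("Googlebot:", allowed("Googlebot", url))
```

Because the rule names only YandexNews, crawlers with other user-agent tokens are unaffected by it.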
How to verify the authenticity of the user-agent operated by Yandex?
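Reverse IP lookup technique
Run the host Linux command two times, starting with the IP address of the requester.
- > host IPAddressOfRequest
This command returns the reverse lookup (PTR) record for the IP address. For example, for 8.8.4.4 the output names the record 4.4.8.8.in-addr.arpa and points it to the hostname dns.google. For a genuine request, the hostname should belong to a Yandex domain.
- > host ReverseDNSFromTheOutputOfFirstRequest
This command resolves that hostname back to its IP addresses. The result should include the IP address of the original request; if it does not, the user-agent is likely spoofed.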
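The two host lookups can also be scripted. The following is a minimal sketch, not Yandex's own tooling: it does a reverse lookup, checks the hostname suffix, then does a forward lookup and confirms that the original IP address comes back. The suffix list (yandex.ru, yandex.net, yandex.com) is an assumption that should be confirmed against Yandex's current documentation, and the function names are hypothetical. The resolver functions are passed in as parameters so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, List

# Assumption: genuine Yandex crawler hostnames end in one of these domains.
YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")


def reverse_lookup(ip: str) -> str:
    """First `host` call: IP address -> hostname (PTR record)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Second `host` call: hostname -> IP addresses (IPv4 and IPv6)."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def is_genuine_yandex(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a requester's IP address."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        if not hostname.endswith(YANDEX_SUFFIXES):
            return False
        # Compare parsed addresses so differing text forms still match.
        resolved = {ipaddress.ip_address(addr) for addr in forward(hostname)}
        return ipaddress.ip_address(ip) in resolved
    except (OSError, ValueError):
        # Lookup failures and unparsable addresses count as unverified.
        return False
```

A call such as is_genuine_yandex(request_ip) would typically run only for requests whose user-agent claims to be YandexNews, with the result cached per IP address to avoid repeated DNS queries.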