Published on 2025-08-07T06:18:08Z

YandexNews bot

The YandexNews bot is a specialized web crawler from the Russian search engine Yandex. Its purpose is to discover and index news content for Yandex's news aggregation services. For publishers, being indexed by this bot can drive significant traffic and increase visibility among Yandex users, particularly in the Russian-speaking market.

What is the YandexNews bot?

The YandexNews bot is a dedicated crawler from Yandex designed to discover, collect, and process news content from websites for Yandex's news aggregation services. The bot identifies itself in server logs with the user-agent string Mozilla/5.0 (compatible; YandexNews/4.0; +http://yandex.com/bots). It systematically visits websites to identify fresh news content, headlines, and publication dates to help Yandex categorize and present news to its users.
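As a rough illustration of how that user-agent string might be spotted in request logs, here is a minimal Python sketch. The function name claims_yandex_news and the regular expression are assumptions made for this example, not something Yandex publishes. A match only shows what the request claims to be, since any client can send this header.

```python
import re

# Matches the product token "YandexNews/<version>" anywhere in a User-Agent header.
# The pattern is an illustrative assumption based on the string quoted above.
YANDEX_NEWS_UA = re.compile(r"\bYandexNews/\d+(?:\.\d+)*", re.IGNORECASE)


def claims_yandex_news(user_agent: str) -> bool:
    """Return True if the User-Agent header claims to be the YandexNews bot."""
    return bool(YANDEX_NEWS_UA.search(user_agent or ""))


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; YandexNews/4.0; +http://yandex.com/bots)"
    print(claims_yandex_news(ua))
```

Because the header is trivial to spoof, a positive match is best treated as a reason to run one of the verification techniques described later in this article.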

Why is the YandexNews bot crawling my site?

The YandexNews bot is crawling your site to discover and index its news content for Yandex's news services. If the bot is visiting your site, it is because your content appears to be news-related. The frequency of visits depends on how often you publish new content, so sites that publish breaking news may be crawled more frequently. This is normal, expected behavior for a search engine that runs a news aggregation service.

What is the purpose of the YandexNews bot?

The purpose of the YandexNews bot is to power Yandex's news aggregation services by collecting and indexing news content from across the web. This allows Yandex to provide its users with current news stories organized by topic. For publishers, having your content included in these services can drive significant traffic to your site and increase your visibility among Yandex users. The service creates a centralized news experience while still directing users to the original source for the full article.

How do I block the YandexNews bot?

To prevent the YandexNews bot from including your content in its news services, you can add a specific disallow rule to your robots.txt file. This is the standard method for managing crawler access.

To block this bot, add the following lines to your robots.txt file:

```
User-agent: YandexNews
Disallow: /
```
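Before deploying the rule, it can help to confirm that it blocks YandexNews without affecting other crawlers. The following sketch uses Python's standard urllib.robotparser module. The helper name is_allowed, the example.com URL, and the SomeOtherBot agent are placeholders chosen for this example. Python's parser only approximates how a real crawler reads robots.txt, so treat the result as a sanity check.

```python
from urllib.robotparser import RobotFileParser

# The rule from this article, held in a string so no network access is needed.
ROBOTS_TXT = """\
User-agent: YandexNews
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/story.html"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "YandexNews", url))    # False
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", url))  # True
```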

How do I verify that a user-agent is really operated by Yandex?

Reverse IP lookup technique

To verify a user-agent's authenticity, run the Linux host command twice, starting with the IP address of the request.

```
> host IPAddressOfRequest
```
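This command returns the hostname stored in the PTR record for that IP address. For example, for 8.8.4.4 the output is "4.4.8.8.in-addr.arpa domain name pointer dns.google.", so the hostname is dns.google. Then run host again on the returned hostname: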

```
> host ReverseDNSFromTheOutputOfFirstRequest
```

If the second lookup returns the original IP address and the hostname belongs to a domain of a trusted operator (e.g., Yandex), the user-agent can be considered legitimate.
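The two lookups can also be automated. The sketch below is one possible way to do this with Python's standard socket module, not an official Yandex tool. The function names are hypothetical. The suffix list (yandex.ru, yandex.net, yandex.com) is an assumption about the domains Yandex uses for crawler hostnames and should be checked against Yandex's current documentation.

```python
import socket

# Assumption: domains Yandex is understood to use for crawler hostnames.
TRUSTED_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")


def reverse_lookup(ip: str) -> str:
    """PTR lookup, IP address -> hostname (the first `host` call)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> set:
    """Forward lookup, hostname -> set of IP addresses (the second `host` call)."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def is_verified_crawler(ip: str, reverse=reverse_lookup, forward=forward_lookup) -> bool:
    """Return True only if both lookups agree and the hostname is trusted."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        # The leading dot in each suffix rejects look-alikes such as "fakeyandex.com".
        if not hostname.endswith(TRUSTED_SUFFIXES):
            return False
        # The hostname must resolve back to the address that made the request.
        return ip in forward(hostname)
    except OSError:
        # socket.herror and socket.gaierror (failed lookups) are OSError subclasses.
        return False
```

The resolver functions are passed in as parameters so the logic can be exercised without network access. In production the defaults perform real DNS queries, so caching results per IP address is advisable.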

IP list lookup technique

Some operators provide a public list of IP addresses used by their crawlers. This list can be cross-referenced to verify a user-agent's authenticity. However, both operators and website owners may find it challenging to maintain an up-to-date list, so use this method with caution and in conjunction with other verification techniques.
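When an operator does publish address ranges, the cross-reference is a membership test against CIDR blocks. The sketch below uses Python's standard ipaddress module. The ranges shown are reserved documentation blocks used as placeholders, not real Yandex addresses, and the function name ip_in_ranges is hypothetical.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), NOT real Yandex addresses.
# Replace them with the list published by the operator, if one exists.
EXAMPLE_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def ip_in_ranges(ip: str, cidr_ranges) -> bool:
    """Return True if `ip` falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network and vice versa, so mixed lists are safe.
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


if __name__ == "__main__":
    print(ip_in_ranges("192.0.2.10", EXAMPLE_RANGES))    # True
    print(ip_in_ranges("198.51.100.7", EXAMPLE_RANGES))  # False
```

Since published lists can go stale, a check like this works best as a fast first filter, with the reverse lookup used to confirm requests that do not match.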