Published on 2025-08-07T06:18:08Z

YandexOntoDBAPI

YandexOntoDBAPI is a specialized web crawler from the Russian search engine Yandex that focuses on extracting structured data from websites, similar to YandexOntoDB but often with a higher request frequency. It targets pages with rich structured data and API endpoints to help build Yandex's knowledge graph, which powers semantic search capabilities and rich results in search.

What is YandexOntoDBAPI?

YandexOntoDBAPI is a Yandex web crawler focused on structured data extraction and ontology processing for the search engine's knowledge graph. It identifies itself in server logs with the user-agent string Mozilla/5.0 (compatible; YandexOntoDBAPI/1.0; +http://yandex.com/bots). The bot is distinguished by its high request frequency and by its focus on pages that expose structured data markup, API endpoints, and other semantic web technologies.
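To spot this crawler in your own logs, you can match the product token from the user-agent string quoted above. The following is a minimal sketch; the function name `claims_yandex_ontodbapi` is hypothetical, and a match only shows what the request claims to be, not that it really comes from Yandex.

```python
import re

# Product token taken from the user-agent string quoted above.
# The version part is matched loosely in case it changes.
BOT_TOKEN = re.compile(r"\bYandexOntoDBAPI/\d+(?:\.\d+)*", re.IGNORECASE)


def claims_yandex_ontodbapi(user_agent: str) -> bool:
    """Return True if the User-Agent header claims to be YandexOntoDBAPI."""
    return BOT_TOKEN.search(user_agent) is not None


ua = "Mozilla/5.0 (compatible; YandexOntoDBAPI/1.0; +http://yandex.com/bots)"
print(claims_yandex_ontodbapi(ua))  # True
```

Because the header is trivially spoofed, treat a positive match as a reason to verify the request's IP address rather than as proof of origin.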

Why is YandexOntoDBAPI crawling my site?

YandexOntoDBAPI is visiting your website to extract structured data that can enhance Yandex's knowledge graph. If your site contains rich structured data like JSON-LD or has semantic web elements, you are more likely to see this crawler. The frequency of visits depends on your site's authority and the presence of valuable structured data. Its crawling is generally considered an authorized part of normal search engine operations, though its high request rate may be a concern for some operators.

What is the purpose of YandexOntoDBAPI?

The purpose of YandexOntoDBAPI is to build and enhance Yandex's knowledge graph by extracting and processing structured data from websites. This supports Yandex's search engine by gathering information that can be used to improve search results and power rich snippets. The 'API' in its name suggests it is designed for automated data retrieval. For website owners, this crawler can increase your content's visibility in Yandex search, especially for queries that require specific factual information.

How do I block YandexOntoDBAPI?

To prevent YandexOntoDBAPI from crawling your website, add a disallow rule for it to your robots.txt file. This is the standard method for managing crawler access. The following lines block the bot from the entire site:

User-agent: YandexOntoDBAPI
Disallow: /
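
Before deploying the rule, you may want to check that it parses the way you expect. The sketch below feeds the two lines above to Python's standard `urllib.robotparser`; the `example.com` URL and the helper name `is_allowed` are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rule from this section, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: YandexOntoDBAPI
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(agent_token: str, url: str) -> bool:
    """Check whether the parsed rules let the given crawler token fetch the URL."""
    return parser.can_fetch(agent_token, url)


# Pass the crawler's product token, not the full User-Agent header.
print(is_allowed("YandexOntoDBAPI", "https://example.com/products/1"))  # False
print(is_allowed("SomeOtherBot", "https://example.com/products/1"))     # True
```

This only confirms how a standards-following parser reads the file. Whether a given crawler honors the rule is up to the crawler.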

How do I verify that a request really comes from Yandex?

Reverse IP lookup technique

To verify that a request really comes from the operator named in its user-agent, run the Linux host command twice, starting with the IP address of the requester.

```
> host IPAddressOfRequest
```
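This command performs a reverse lookup and returns the hostname that the IP address's PTR record points to. For example, for 8.8.4.4 the output is "4.4.8.8.in-addr.arpa domain name pointer dns.google.", so the hostname is dns.google. Next, look up that hostname: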

```
> host ReverseDNSFromTheOutputOfFirstRequest
```

If the second lookup returns the original IP address and the hostname belongs to a domain of the trusted operator (for Yandex crawlers, a hostname ending in yandex.ru, yandex.net, or yandex.com), the user-agent can be considered legitimate.
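
The two lookups can also be automated. The sketch below is one possible implementation using Python's standard `socket` module, not an official Yandex tool. The suffix list reflects the domains Yandex documents for its crawlers and should be confirmed against Yandex's current documentation. The resolver functions are injectable so the logic can be tried without network access; the hostnames and the 203.0.113.x addresses in the demo are invented placeholders.

```python
import ipaddress
import socket

# Assumption: hostname suffixes Yandex documents for its crawlers.
YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")


def reverse_lookup(ip):
    """First `host` call: IP address -> PTR hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname):
    """Second `host` call: hostname -> list of IP addresses."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def is_verified_yandex(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Return True only if the PTR hostname is a Yandex name that resolves back to ip."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        if not hostname.endswith(YANDEX_SUFFIXES):
            return False
        wanted = ipaddress.ip_address(ip)
        return any(ipaddress.ip_address(a) == wanted for a in forward(hostname))
    except (OSError, ValueError):
        # Lookup failures and malformed addresses count as "not verified".
        return False


# Offline demo with fake DNS data (placeholder names and addresses).
fake_ptr = {
    "203.0.113.10": "spider.yandex.com",
    "203.0.113.20": "spider.yandex.com",
    "203.0.113.30": "crawler.example.net",
}
fake_a = {"spider.yandex.com": ["203.0.113.10"]}


def fake_reverse(ip):
    return fake_ptr[ip]


def fake_forward(hostname):
    return fake_a.get(hostname, [])


print(is_verified_yandex("203.0.113.10", fake_reverse, fake_forward))  # True
print(is_verified_yandex("203.0.113.20", fake_reverse, fake_forward))  # False
print(is_verified_yandex("203.0.113.30", fake_reverse, fake_forward))  # False
```

In production you would call `is_verified_yandex(ip)` with the default resolvers and cache the result per IP address, since each check costs two DNS queries.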

IP list lookup technique

Some operators provide a public list of IP addresses used by their crawlers. This list can be cross-referenced to verify a user-agent's authenticity. However, both operators and website owners may find it challenging to maintain an up-to-date list, so use this method with caution and in conjunction with other verification techniques.
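
If you do have a published list of ranges, the cross-reference itself is simple. The sketch below uses Python's standard `ipaddress` module. The ranges shown are reserved documentation blocks used as placeholders, not real Yandex ranges, and the function name is hypothetical.

```python
import ipaddress

# Placeholder CIDR ranges (reserved documentation blocks), NOT real Yandex ranges.
# Replace them with the list your operator actually publishes.
PUBLISHED_RANGES = ["192.0.2.0/24", "2001:db8::/32"]
NETWORKS = [ipaddress.ip_network(cidr) for cidr in PUBLISHED_RANGES]


def ip_in_published_ranges(ip: str) -> bool:
    """Return True if ip falls inside any of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in NETWORKS)


print(ip_in_published_ranges("192.0.2.44"))    # True
print(ip_in_published_ranges("198.51.100.7"))  # False
```

Since such a list can go stale, a miss here is better treated as a prompt for the reverse lookup check than as a final verdict.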