Published on 2025-08-07T06:18:08Z

Applebot-Extended

Applebot-Extended is a specialized web agent from Apple, introduced in June 2024 to support Apple Intelligence. Unlike the primary Applebot that indexes for search, Applebot-Extended's specific purpose is to evaluate already-crawled web content for its suitability in training Apple's generative AI and large language models (LLMs). Website owners can opt out of this AI training usage without affecting their site's visibility in standard Apple search results.

What is Applebot-Extended?

Applebot-Extended is a special-purpose web crawler operated by Apple. It was announced in June 2024 alongside Apple Intelligence. Its function is distinct from the main Applebot; instead of crawling the web for search indexing, it assesses content that has already been indexed by Applebot to see if it is suitable for training Apple's generative artificial intelligence (AI) models. It acts as a secondary filter for AI training data, identifying itself with the user-agent string Applebot-Extended. This creates a clear separation between content used for search and content used for AI model training.

Why is Applebot-Extended on my site?

Applebot-Extended is not crawling your site independently but is evaluating content previously fetched by the primary Applebot. Its presence indicates that Apple's systems are considering your publicly available web content, particularly high-quality, text-rich pages, for use in training its foundation AI models. The frequency of its activity is tied to the main Applebot's crawl schedule. This is an authorized activity from Apple that respects standard web protocols, and website owners are provided with a mechanism to opt out.

What is the purpose of Applebot-Extended?

The sole purpose of Applebot-Extended is to identify and collect high-quality web content to train and improve Apple's generative AI models. These models are the foundation for features within the "Apple Intelligence" ecosystem. The data gathered by this bot supplements other licensed and proprietary datasets, helping Apple enhance its AI capabilities while minimizing the use of private or user-generated data. For website owners, having your content used in this way offers no direct benefit, but it does contribute to the improvement of AI systems used by millions of Apple customers.

How do I block Applebot-Extended?

If you do not want your website's content to be used for training Apple's AI models, you can block Applebot-Extended by adding a specific rule to your robots.txt file. Blocking this bot will not affect your site's visibility in Apple's standard search services like Siri and Spotlight.

To opt out, add the following lines to your robots.txt file:

User-agent: Applebot-Extended
Disallow: /
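
If your robots.txt already contains rules for other crawlers, the opt-out can sit alongside them. As a sketch, the following keeps the primary Applebot allowed for search indexing while excluding the site from AI-training use:

```
User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Disallow: /
```

Because each User-agent group is matched independently, this combination blocks only the AI-training evaluation and leaves search crawling unaffected.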

How do I verify that a user-agent claiming to be from Apple is authentic?

Reverse IP lookup technique

To verify a user-agent's authenticity, run the Linux host command twice, starting with the IP address of the requester.
  1. > host IPAddressOfRequest
    This reverse (PTR) lookup returns the hostname registered for the address (e.g., host 8.8.4.4 returns dns.google).
  2. > host HostnameFromStepOne
    This forward lookup should resolve the hostname back to an IP address.
If the forward lookup returns the original IP address and the hostname belongs to a domain associated with a trusted operator (e.g., apple.com for Apple), the user-agent can be considered legitimate.
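
The two-step check above (often called forward-confirmed reverse DNS) can be automated. The sketch below, written in Python, performs the PTR lookup, checks the hostname against an operator domain, and confirms the forward resolution; the apple.com suffix is an assumption for illustration, so confirm the correct domain against Apple's own documentation before relying on it.

```python
import socket

# Assumed operator domain for illustration; verify against Apple's docs.
APPLE_DOMAIN = "apple.com"

def hostname_in_domain(hostname: str, domain: str) -> bool:
    """Return True if hostname is the domain itself or a subdomain of it."""
    hostname = hostname.lower().rstrip(".")
    return hostname == domain or hostname.endswith("." + domain)

def is_verified_crawler(ip: str, domain: str = APPLE_DOMAIN) -> bool:
    """Forward-confirmed reverse DNS check for a requesting IP address."""
    try:
        # Step 1: reverse (PTR) lookup of the requester's IP.
        hostname, _, _ = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return False
    # The PTR hostname must belong to the trusted operator's domain.
    if not hostname_in_domain(hostname, domain):
        return False
    try:
        # Step 2: forward lookup of that hostname.
        forward_ips = socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
    # The forward lookup must resolve back to the original IP.
    return ip in forward_ips
```

A spoofed user-agent string fails this check because the attacker controls neither the PTR record for their IP nor the forward DNS of the operator's domain.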

IP list lookup technique

Some operators provide a public list of IP addresses used by their crawlers. This list can be cross-referenced to verify a user-agent's authenticity. However, both operators and website owners may find it challenging to maintain an up-to-date list, so use this method with caution and in conjunction with other verification techniques.
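
Cross-referencing a published IP list can be sketched with Python's standard ipaddress module. The 17.0.0.0/8 range below is an assumption used only for illustration (the block is publicly registered to Apple, but the ranges a crawler actually uses may differ); always fetch the operator's current list rather than hard-coding it.

```python
import ipaddress

# Hypothetical published ranges; replace with the operator's current list.
OPERATOR_RANGES = [ipaddress.ip_network("17.0.0.0/8")]

def ip_in_published_ranges(ip: str, ranges=OPERATOR_RANGES) -> bool:
    """Return True if the requester's IP falls inside a published range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in ranges)
```

Because published lists can lag behind reality, treat a match as supporting evidence and combine it with the reverse-DNS technique above rather than using it alone.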