Published on 2025-08-07T06:18:08Z
Kangaroo Bot
Kangaroo Bot is a specialized web crawler from Kangaroo LLM, an Australian AI consortium. Its unique mission is to collect textual content exclusively from Australian websites to build a dataset for training Australia's first open-source large language model (LLM). The project emphasizes data sovereignty and aims to create an AI that understands the specific linguistic and cultural nuances of Australian English.
What is Kangaroo Bot?
Kangaroo Bot is a data-scraping web crawler operated by the Australian AI consortium Kangaroo LLM. Its purpose is to systematically collect text content from Australian websites to build a dataset that captures the unique characteristics of Australian English. The bot identifies itself in server logs with the user-agent string Kangaroo Bot and primarily targets Australian domains or servers located in Australia. Unlike global AI scrapers, it uses geographic filtering to ensure the relevance of its data. It is designed to be a well-behaved bot, respecting robots.txt protocols and maintaining a reasonable request rate.
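To see whether the bot has visited, you can search your access logs for that user-agent string. The following is a minimal sketch, assuming logs in the common "combined" format where the user agent is the last quoted field; the function name kangaroo_bot_hits and the sample log lines are made up for illustration and are not real crawler traffic.

```python
import re
from collections import Counter

# Combined Log Format: the user agent is the last double-quoted field on the line.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .* "(?P<agent>[^"]*)"$')


def kangaroo_bot_hits(lines, token="Kangaroo Bot"):
    """Count requests per client IP whose user-agent field contains `token`."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and token.lower() in match.group("agent").lower():
            hits[match.group("ip")] += 1
    return hits


# Hypothetical sample lines using documentation-only IP ranges.
sample = [
    '203.0.113.7 - - [07/Aug/2025:06:18:08 +0000] "GET / HTTP/1.1" 200 512 "-" "Kangaroo Bot"',
    '203.0.113.7 - - [07/Aug/2025:06:18:09 +0000] "GET /about HTTP/1.1" 200 256 "-" "Kangaroo Bot"',
    '198.51.100.4 - - [07/Aug/2025:06:19:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(kangaroo_bot_hits(sample))
```

A user-agent string is trivial to spoof, so a match here only shows that a client claimed to be Kangaroo Bot.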
Why is Kangaroo Bot crawling my site?
Kangaroo Bot is visiting your website because it has been identified as a source of Australian content that could be valuable for training Australia's first open-source large language model. The bot prioritizes websites based on factors like lexical density, update frequency, and the presence of user-generated content, with a focus on sites like forums and news outlets. If your site is on an Australian domain (.au) or hosted on Australian servers, it is a prime target for this crawler. The frequency of visits depends on your site's content volume and update patterns.
What is the purpose of Kangaroo Bot?
The main purpose of Kangaroo Bot is to collect data for the 'VegeMighty' dataset, which will be used to train an Australian large language model. This initiative aims to create an AI that understands Australian language and culture, promoting data sovereignty by processing and storing all information within Australia. The project follows an 'opt-out-plus' model, allowing website owners to control how their content is used. The ultimate goal is to boost Australian AI innovation and ensure the country's digital future is shaped by a language model that accurately reflects its linguistic diversity.
How do I block Kangaroo Bot?
To prevent Kangaroo Bot from collecting your website's content for its AI training dataset, you can add a specific disallow rule to your robots.txt file. This is the standard method for managing access for well-behaved web crawlers.
Add the following lines to your robots.txt file to block Kangaroo Bot:
User-agent: Kangaroo Bot
Disallow: /
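Before deploying the rule, you may want to confirm that it parses the way you expect. The sketch below feeds the two lines above to Python's standard urllib.robotparser and checks which user agents are allowed; the URL https://example.com.au/page is a placeholder, and how the crawler itself matches its name against robots.txt is not something this check can confirm.

```python
from urllib.robotparser import RobotFileParser

# The rule from this article, exactly as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Kangaroo Bot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; substitute a page from your own site.
url = "https://example.com.au/page"

blocked = not parser.can_fetch("Kangaroo Bot", url)
others_allowed = parser.can_fetch("Googlebot", url)

print("Kangaroo Bot blocked:", blocked)
print("Other crawlers allowed:", others_allowed)
```

Because the rule names only Kangaroo Bot, crawlers with other user agents remain unaffected.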
How do I verify the authenticity of the user-agent operated by Kangaroo LLM?
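Reverse IP lookup technique
Run the host Linux command twice, starting with the IP address of the requester.
- > host IPAddressOfRequest
  This command returns the reverse lookup record for the address (e.g., a lookup of 8.8.4.4 queries 4.4.8.8.in-addr.arpa. and returns the hostname it points to).
- > host ReverseDNSFromTheOutputOfFirstRequest
  This command resolves that hostname back to an IP address. The request passes the check only if the result matches the original IP address of the requester.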
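The two lookups can also be scripted. The following is a minimal sketch of this forward-confirmed reverse DNS check using Python's standard socket module; the function name and its injectable resolver parameters are assumptions added for illustration, and the article does not state which hostnames Kangaroo LLM's crawler resolves to, so the sketch only confirms that the reverse and forward lookups agree.

```python
import socket


def forward_confirmed_reverse_dns(ip, reverse_lookup=None, forward_lookup=None):
    """Return the reverse-DNS hostname of `ip` if it resolves back to `ip`, else None.

    The resolver parameters are hypothetical hooks so the logic can be
    exercised without network access; by default the system resolver is used.
    """
    reverse_lookup = reverse_lookup or socket.gethostbyaddr
    forward_lookup = forward_lookup or socket.gethostbyname_ex
    try:
        # Step 1: IP address -> hostname (equivalent to `host IPAddressOfRequest`).
        hostname, _aliases, _addrs = reverse_lookup(ip)
        # Step 2: hostname -> IP addresses (equivalent to the second `host` call).
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        # Covers socket.herror and socket.gaierror (no record, lookup failure).
        return None
    return hostname if ip in addresses else None


# Usage (performs real DNS queries, so it is left commented out):
# print(forward_confirmed_reverse_dns("8.8.4.4"))
```

Note that socket.gethostbyname_ex handles IPv4 only, so this sketch does not cover IPv6 clients. A returned hostname still has to be compared against whatever domain the crawler operator documents.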