For those interested in seeing how their website's components are indexed (or not), Google's decision to open-source its robots.txt parser is an amazing bit of news. Webmasters have struggled to understand robots.txt files for many years. The challenge was not so much how to write and declare the directives in the files as how to fully comprehend what actions each search engine would take. While there is a single de facto standard, the Robots Exclusion Protocol (REP), the handling of corner cases was ambiguous, for example when a text editor inserted BOM (byte order mark) characters at the start of a robots.txt file.
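To make the BOM corner case concrete, here is a minimal Python sketch. It is an illustration only and not Google's implementation: the helper names `strip_utf8_bom` and `directive_key` and the sample robots.txt content are hypothetical. It shows how a deliberately naive parser could fail to recognize the first directive when an editor has prepended a UTF-8 BOM.

```python
import codecs


def strip_utf8_bom(raw: bytes) -> bytes:
    """Return raw with a leading UTF-8 byte order mark removed, if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def directive_key(text: str) -> str:
    """Naively read the key of the first line: the part before ':', lowercased."""
    first_line = text.splitlines()[0]
    key, _, _value = first_line.partition(":")
    return key.strip().lower()


# Hypothetical robots.txt bytes as saved by an editor that prepends a BOM.
raw_robots = codecs.BOM_UTF8 + b"User-agent: *\nDisallow: /private/\n"

# Decoded as plain UTF-8, the BOM survives as U+FEFF and pollutes the key,
# so a naive comparison against "user-agent" would not match.
key_with_bom = directive_key(raw_robots.decode("utf-8"))

# Removing the BOM first yields the key a parser would expect.
key_clean = directive_key(strip_utf8_bom(raw_robots).decode("utf-8"))

print(repr(key_with_bom))
print(repr(key_clean))
```

In Python, decoding with the `utf-8-sig` codec is another way to drop a leading BOM. How a given search engine treats the same file is exactly the kind of ambiguity a published reference parser helps resolve.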
On July 1, 2019, Google announced that they are spearheading the effort to make the REP an internet standard. Thank you, Google. We applaud your moves thunderously!
Webstation has set up a public GitHub repository where you can download a snapshot of the full C++ library. It is released under an Apache license. Full instructions on how to build and run the library are included.
Read the full article at https://webmasters.googleblog.com/2019/07/repp-oss.html