Documentation Proposal: Dealing With Robots
TWikiAdminCookBook has a section on how to deal with robots.
I don't know if this is the kind of documentation you guys can use or not, but if it is, please feel free to take it and do whatever you want with it. Or email me and let me know: pauljohn AT ku DOT edu.
--
PaulJohnson - 11 Nov 2004
What about creating a robots.txt based on the topic names in the wiki (listing the pages that are allowed)?
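A minimal Perl sketch of that idea. It assumes the scripts live under /twiki/bin/ and that the topic list is obtained elsewhere; the helper robots_txt_for_topics is hypothetical. Allow is an extension honored by the major crawlers rather than part of the original robots.txt convention. Because rules match by prefix, an allowed topic URL with a query string (such as ?rev=2) is allowed too.

```perl
use strict;
use warnings;

# Hypothetical helper: allow plain views of the listed topics and disallow
# everything else under the script directory. The /twiki/bin/ layout is an
# assumption; adjust it to your installation.
sub robots_txt_for_topics {
    my ($web, @topics) = @_;
    my @lines = ('User-agent: *');
    # "Allow" lines come first; crawlers that support Allow prefer the
    # longer (more specific) match anyway.
    push @lines, "Allow: /twiki/bin/view/$web/$_" for sort @topics;
    push @lines, 'Disallow: /twiki/bin/';
    return join("\n", @lines) . "\n";
}

print robots_txt_for_topics('Main', qw(WebHome TWikiUsers));
```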
--
WillNorris - 11 Nov 2004
Isn't it also possible to identify robots from the way they identify themselves to the web server and present different skins to them? I am not sure whether TWiki has the capability, but it should be possible to use a different skin for Internet Explorer, Mozilla and Opera, and one for bots. Bots could be presented with only the plain topic text and no sidebars.
--
ChristopherOezbek - 27 Jan 2005
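A rough Perl sketch of user-agent based skin selection. The bot pattern, the 'plain' skin name and the helper skin_for_user_agent are assumptions for illustration, not an existing TWiki feature; how the result would be wired into TWiki's skin selection is left open. User agents are self-reported, so this is only a heuristic.

```perl
use strict;
use warnings;

# Heuristic bot detection by User-Agent (illustrative pattern, can be spoofed).
# Matches e.g. "Googlebot/2.1" and "Yahoo! Slurp;".
my $BOT_RE = qr/(?:bot|crawler|spider|slurp)\b/i;

# Hypothetical helper: 'plain' (assumed skin name) for bots, default otherwise.
sub skin_for_user_agent {
    my ($ua, $default_skin) = @_;
    $ua = '' unless defined $ua;
    return $ua =~ $BOT_RE ? 'plain' : $default_skin;
}

print skin_for_user_agent($ENV{HTTP_USER_AGENT}, 'pattern'), "\n";
```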
PreventGoogleToIndexRevisions has an example robots.txt for blocking access to actions such as edit/attach/diff etc.
SearchEngineIndexOnlyPlainView should solve most of the other problems mentioned above once it is implemented.
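To illustrate how such a robots.txt blocks actions: a well-behaved crawler skips every URL whose path starts with a Disallow prefix. In this Perl sketch the /twiki/bin/ layout is an assumption, rdiff is TWiki's diff script, and is_disallowed is a hypothetical helper.

```perl
use strict;
use warnings;

# Disallow prefixes for the action scripts (installation path assumed).
my @DISALLOW = map { "/twiki/bin/$_/" } qw(edit attach rdiff);

# Returns 1 if a crawler honoring these rules would skip the path.
sub is_disallowed {
    my ($path) = @_;
    for my $prefix (@DISALLOW) {
        return 1 if index($path, $prefix) == 0;
    }
    return 0;
}

# Emit the corresponding robots.txt fragment.
print "User-agent: *\n";
print "Disallow: $_\n" for @DISALLOW;
```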
--
SamHasler - 01 Feb 2005
Even though the skins may have a <meta name="robots" ...> statement in them, it is edited out in View.pm for all but older revisions.
This makes no sense to me. What is the point of making the site unconditionally indexable?
--
AntonAylward - 17 Jul 2005
Not so. In CairoRelease it is never edited out. In DevelopBranch it is edited out only if you have enabled {AntiSpam}{RobotsAreWelcome} in configure.
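For reference, configure stores this setting as an ordinary Perl assignment. A sketch of the resulting line, assuming TWiki 4.0's lib/LocalSite.cfg is the file it ends up in:

```perl
# Excerpt in the style of lib/LocalSite.cfg (assumed location).
# With this on, View.pm strips the robots NOINDEX meta tag from indexable
# views that carry no URL parameters.
$TWiki::cfg{AntiSpam}{RobotsAreWelcome} = 1;
1;
```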
--
CrawfordCurrie - 18 Jul 2005
My audience asked me to explicitly allow our corporate intranet search engine on my TWiki. In addition to a robots.txt file, I seem to have found an efficient approximation: I have replaced the meta element which usually excludes robots with the following conditional:
%IF{ "$ QUERYSTRING" then="<meta name='robots' content='noindex, nofollow' />"}%
Unlike robots.txt, this does not prevent the spider from visiting the pages. However, it works fine against indexing any of:
- old revisions
- non-default skins or covers (like "printable")
- the many sortcol variants of pages containing sortable/editable tables
- ...or any combination of the above.
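The decision that %IF line makes can be modeled in a few lines of Perl. robots_meta_for is a hypothetical helper mirroring the intent (noindex whenever the URL carries a query string), not TWiki's %IF implementation; the sample query strings are illustrative.

```perl
use strict;
use warnings;

# Return the noindex meta element if there is a query string, else nothing.
sub robots_meta_for {
    my ($query_string) = @_;
    return '' unless defined $query_string && length $query_string;
    return q{<meta name='robots' content='noindex, nofollow' />};
}

# Old revision, printable cover and sorted table all get noindex.
for my $qs ('', 'rev=3', 'skin=print', 'sortcol=2') {
    printf "%-15s %s\n", "?$qs", (robots_meta_for($qs) ? 'noindex' : 'indexable');
}
```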
--
HaraldJoerg - 20 Mar 2006
This should not be necessary in TWiki 4.0, since the robots noindex metatag is already present if there is a query string. Technically, the skin's view templates contain a robots noindex metatag that twiki/lib/UI/View.pm removes only when robots are welcome and there is no query string:
if( $indexableView &&
$TWiki::cfg{AntiSpam}{RobotsAreWelcome} &&
!$query->param() ) {
# it's an indexable view type, there are no parameters
# on the url, and robots are welcome. Remove the NOINDEX meta tag
$tmpl =~ s/<meta name="robots"[^>]*>//goi;
}
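A self-contained Perl version of the View.pm condition quoted above, handy for checking what a given template ends up with. The helper strip_robots_meta and its argument list are hypothetical stand-ins for $indexableView, the configure setting and $query->param().

```perl
use strict;
use warnings;

# Remove the robots meta tag only for an indexable view, with robots
# welcome and no URL parameters (same regex as View.pm, minus /o).
sub strip_robots_meta {
    my ($tmpl, $indexable_view, $robots_welcome, @params) = @_;
    if ($indexable_view && $robots_welcome && !@params) {
        $tmpl =~ s/<meta name="robots"[^>]*>//gi;
    }
    return $tmpl;
}

my $tmpl = '<head><meta name="robots" content="noindex" /></head>';
print strip_robots_meta($tmpl, 1, 1), "\n");        # prints <head></head>
print strip_robots_meta($tmpl, 1, 1, 'rev'), "\n";  # prints the template unchanged
```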
--
PeterThoeny - 22 Mar 2006
My fault. I had "slightly" changed the meta element in my custom skin, so that the regex never removed it, even with RobotsAreWelcome set to $TRUE. Eventually I removed the meta element altogether and, as a consequence, had to compensate with the %IF conditional I quoted. Now, with the correct meta element in place, everything works fine.
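The View.pm regex matches only one exact spelling of the tag, which is easy to break by accident. A small Perl illustration; the tag variants are examples, not the actual change made to the custom skin.

```perl
use strict;
use warnings;

# Same pattern View.pm uses to find the robots meta tag.
my $re = qr/<meta name="robots"[^>]*>/i;

# Single quotes or a different attribute order are not matched,
# so such a tag would never be removed.
for my $tag (
    '<meta name="robots" content="noindex" />',
    q{<meta name='robots' content='noindex, nofollow' />},
    '<meta content="noindex" name="robots" />',
) {
    printf "%-55s %s\n", $tag, ($tag =~ $re ? 'removed' : 'NOT removed');
}
```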
All that cool stuff about how to control robots (with /bin/configure, robots.txt, httpd.conf) would make a nice HowTo recipe. Let's see when (or whether) I get round to it...
--
HaraldJoerg - 22 Mar 2006