Here's my concern. I would like to use TWiki for a project that wants to reach a large public audience. For this reason, I would like the Wiki material to be picked up by the major search engines. This requirement is paramount to me.

My question: How well do wikiwebs perform in this regard?

Thanks.

-- TWikiGuest - 04 May 2000


Dear guest, a TWiki site can be spidered by search engines. Just use the regular ways of submitting new sites, for example http://all4one.com/all4submit/ or http://www.submit-it.com/ .

-- PeterThoeny - 04 May 2000


Thanks for your reply, Peter.

However, my question was more along the lines of what experience people have with it, if any.

Specifically, I believe that at least some search engines will look at such things as the page title, metatags, the first 100 words, the ratio of graphics to text, the number of links pointing to that page (Google, for instance, does that), the URL, the last modification time, and so on.

So for instance, does wiki accurately report modification times? Is it easy to change the title? Insert metatags? etc. I am wondering how many search engines will be "willing" to rank wiki URLs among their top five hits.

Do these questions/concerns make any sense?

-- TWikiGuest - 04 May 2000


For instance, try typing "Salginatobel bridge" in google. You won't find this twikiweb.

-- TWikiGuest - 04 May 2000


For example, "Salginatobel bridge" on the old server ( http://starship.python.net/crew/scharf/TWiki/bin/ ) is indexed by All the Web ( http://www.alltheweb.com/ )

Meta tags: TWiki is based on templates. You can easily modify the templates to be spider friendly. If needed we also could introduce a new variable to customize the meta tag per topic.

Modification time: No, dynamically generated pages do not show a Last Modified time. Does anybody know if there is a trick to do that?

Title: Is fixed. Can be modified per web in the templates.

-- PeterThoeny - 05 May 2000


Is there a Last Modified time HTTP header? If so, you should be able to put in an HTTP-EQUIV META tag with a %LASTREVISION% variable that is formatted correctly for HTTP. This can be placed in the view.tmpl template file.

-- JamalWills - 09 May 2000

Regarding modification time: follow up in LastModifiedFieldOfHttpHeader.

-- PeterThoeny - 09 May 2000

Last-Modified headers are now working. (The HTTP-EQUIV approach only affects the HTML document, which many types of software, including proxy caches, do not read; typically only browsers look at the HTML document.) See LastModifiedFieldOfHttpHeader for the patch.
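
As a rough illustration of what "formatted correctly for HTTP" means for a Last-Modified header, here is a minimal Python sketch. It is not the patch from LastModifiedFieldOfHttpHeader (TWiki itself is written in Perl); the function names http_date and is_not_modified are hypothetical, and the topic's modification time is assumed to be available as a Unix timestamp.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime


def http_date(mtime: float) -> str:
    """Format a Unix timestamp as an HTTP date, e.g. for a Last-Modified header."""
    # usegmt=True produces the "GMT" suffix that HTTP expects.
    return formatdate(mtime, usegmt=True)


def is_not_modified(if_modified_since: str, mtime: float) -> bool:
    """Return True if the page has not changed since the date the client sent.

    A server could answer "304 Not Modified" in that case instead of
    regenerating the page.
    """
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Unparseable header: treat the page as modified and serve it in full.
        return False
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop fractional seconds.
    modified = datetime.fromtimestamp(int(mtime), tz=timezone.utc)
    return modified <= since
```

A CGI script would emit the header as "Last-Modified: " followed by http_date(mtime), where mtime is the time of the topic's last revision.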

-- RichardDonkin - 15 Jan 2002

There's a good article about optimising sites to get high rankings on search engines over on webmonkey - well worth a read. TWiki already does some things right, e.g. the WebIndex is a 'crawler page' and the URL format avoids '?'.

-- RichardDonkin - 20 Feb 2002


moved here from RobotsBlackList -- MattWilkie - 03 May 2004

[...]

Note (to all, or to myself ;-): I might be crying wolf (in other words, the problem I discuss below might now be solved), but on occasion back in say December 2001, if I did a Google search I found many duplicate hits. You see more if you ask Google to repeat the search to show the duplicate hits that they hid on the first search (can't recall the exact words they use). I hesitate to say this, but, IIRC, in one case, I found the same page listed something like 18 times.

I hope Richard's robots.txt file is the answer to this, and I suppose we will have to wait 4 to 8 weeks or longer to see (to let Google go through its next index cycle). In the meantime, if I come across another search that shows an outrageous number of duplicate hits, I'll post it here so we can test again later.
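
Richard's actual robots.txt is not reproduced on this page, so the rules below are purely hypothetical. As a sketch of how one could check which URLs a set of rules keeps away from crawlers, Python's standard urllib.robotparser can be fed the rules directly. The script paths (edit, rdiff, oops) are assumptions about a typical TWiki install, and the helper name allowed is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, NOT the real twiki.org robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/edit/
Disallow: /cgi-bin/rdiff/
Disallow: /cgi-bin/oops/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str, agent: str = "*") -> bool:
    """Return True if the hypothetical rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)
```

Note that this parser does simple prefix matching on the path, so rules of this kind can exclude whole scripts such as edit but cannot single out a query parameter such as skin=print.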

Note also that pages are (were?) indexed both under twiki.org and twiki.sourceforge.net. I'm not sure whether that's a good or a bad thing. In general, I think it's bad as it just clutters up Google's index and everybody's search results. On the other hand, at the time I last did some testing, the indexing of the two domains was "out-of-sync" resulting in TWiki being indexed twice as often as it otherwise would be.

I have a few very rough (and now out-of-date) pages over on Wikilearn discussing my experiences with Google and TWiki, including:

To some extent, the second was an attempt to summarize the problems that I found that should potentially be fixed, but there seem to be fewer problems at this time:

I just tried a search on [Wikilearn WideTextTestPage site:twiki.org]. It found seven hits (which is not 18, so that's good):

  • Two of them are "appropriate" because there are two valid pages (WideTextTestPage itself, and GoogleSearchTestResults which includes the page title "WideTextTestPage").
  • Two are inappropriate because they are the same page with the print skin (for example, URL = twiki.org/cgi-bin/view/Wikilearn/WideTextTestPage?skin=print).
  • Three more are inappropriate (IMHO) because they are the Wikilearn.WebTopicList. These are three occurrences of the same page, and I don't think even one occurrence is appropriate. One duplicate occurrence is because of the print skin; I can't see why the second occurs. The only difference I see on the list is that the page is listed as 49K in one case and 50K in the other. I guess I'll follow the link (and cached links) and try to see why it's listed twice. I'll be back.
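
To make the print-skin duplication concrete, here is a small Python sketch that groups search hits by a canonical URL with the skin parameter stripped. It assumes that skin only changes presentation, not content; the names PRESENTATION_PARAMS, canonical_url and group_duplicates are invented for this illustration, and the http:// scheme is added to the example URL for the sake of parsing.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters change only how a topic is rendered.
PRESENTATION_PARAMS = {"skin"}


def canonical_url(url: str) -> str:
    """Drop presentation-only query parameters and any fragment from a URL."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(query), "")
    )


def group_duplicates(urls):
    """Map each canonical URL to the list of original URLs that collapse to it."""
    groups = {}
    for url in urls:
        groups.setdefault(canonical_url(url), []).append(url)
    return groups
```

Any group with more than one entry is a set of hits that a reader would probably regard as the same page.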

I'm back, and still confused, and in fact more confused, because I clicked on "Similar Pages" on one of these hits (which showed two hits, including WebTopicList and WebChanges), and then clicked on similar pages on the WebTopicList, which added WebIndex and some other pages.

I guess the situation is much better than I seem to recall from the previous tests I had done.

[...]

-- RandyKramer


Topic revision: r10 - 2004-05-03 - MattWilkie
 
Copyright © 1999-2026 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.