
Question

I am OK at Perl and CGI, but certainly not able to help with optimization of the TWiki core. However, I now have a pressing need to reduce the CPU strain caused by upgrading TWiki to 20040901. I opened this topic to discuss some ideas I've already tried and to get advice about other small things I might try.

Pressing Questions

  1. I found this support topic, TWikiIsTooSlow. What is the current status of CacheAddOn and TWikiCacheAddOn? Do they work? Do they help? How does the CPU cost of caching, checking, and displaying cached items compare with the cost of dynamic display? These add-ons have not had recent releases; people have proposed patches, but the authors do not appear to respond. What is best?
  2. Can somebody who knows Perl give us page writers some idea of how much CPU is required for various tasks? For example, if I recode or redesign my pages, what impact will it have?
    • Eliminate pattern.tmpl use of separate bar files. If I don't grab the WebTopBar, WebBottomBar, WebLeftBar out of separate TWiki topics, and move that same code back into the tmpl file (as was done in old TWiki), do I save a lot of time, or a little?
    • Eliminate usage of macro variables like %WIKIWEB%. I notice that many files are written to ease renaming of webs, but I don't expect to rename the TWiki web and don't need that generality. Or, another example: many of my template files reference the script suffix. On my system there is no suffix, so if I eliminate all of those references, does it make any difference? I'm not averse to fine-tuning template files to get rid of things like %TMPL{"SEP"}% or whatnot. It seems logical to me that every time I remove a %% thing, Perl would do less work. However, I've seen the view in these pages (can't find where now, though) that Perl would not run faster, because it is going to process all of those strings no matter what, so getting rid of the %% usages would not help. (A substitution command runs across a string whether or not the target element is in the string. Right?)
    • Eliminate plugins that came with the distribution. I am using SessionPlugin and SpreadSheet plugin. I can't do without them. But I notice there are several other plugins and I don't know what they do. Is the mere loading of them making me slow?
  3. Is there any other handy advice?

Things I've done that are helping.

I have the problem that when robots prowl my site, they impose a huge CPU cost by repeatedly doing useless stuff. I've talked to the sysadmin about adding some things to robots.txt. A couple of days ago I happened across the BetterPerformance topic and jotted down some ideas I was considering to deal with robots that repeatedly call expensive scripts like search and rdiff (what is rdiff for, anyway? It drives CPU usage high!). I have tried some things that seem to help.

  1. Change the topicactions in the template so that TWikiGuest does not see the "Edit", "Attach", and other links at the bottom. This seems to help. When those were still exposed, web robots would find them, get oops messages, and be sent off on yet more links. I asked in Support a couple of days ago how this can be done and was quickly answered: the SpreadSheetPlugin lets me show links only to users who are not TWikiGuest. That's significant. More importantly, it does not rely on robots honestly respecting robots.txt or meta tags.
  2. Get rid of as many references as possible to WebTopicList and other topics that use search to generate content. I don't want robots to trigger those costly searches; only users should. So I've fiddled with the navigation bars to remove all the links I can, and I've rewritten human-readable topic pages so that people can see how to search by cutting and pasting a URL, which a robot can't understand.
  3. I've fiddled with the TWiki Perl code to make sure the robots meta tag says noindex,nofollow on every single page. I would rather have no robot attention than the surplus I was having. However, I am thinking of changing from "noindex,nofollow" to just "nofollow". Here's why: the default in pattern.tmpl is "noindex" because links to previous revisions were exposed at the bottom of every page (and we don't want those indexed). Now that I've redone the topic actions so those links no longer show, there's no need to worry that robots will try to index them. With nofollow only, if a robot happens to find a page it will index it, but it won't try to follow the search-related links that might exist on the page.
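The guest-hiding trick in step 1 can be sketched with the SpreadSheetPlugin. This is only a sketch of the idea, not my exact template code; it assumes the plugin's $IF and $EXACT functions and the standard %WIKINAME% variable, and the edit link shown is illustrative:

```
%CALC{$IF($EXACT(%WIKINAME%,TWikiGuest), , <a href="%SCRIPTURL%/edit/%WEB%/%TOPIC%">Edit</a>)}%
```

And step 3's tag, emitted in the HTML head by the template, would look like:

```
<meta name="robots" content="nofollow">
```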

Environment

TWiki version: TWikiRelease01Sep2004
TWiki plugins: DefaultPlugin, EmptyPlugin, InterwikiPlugin
Server OS: Compaq/DEC OSF
Web server: Apache
Perl version: 5.8
Client OS: Linux
Web Browser: Mozilla

-- PaulJohnson - 20 Oct 2004

Answer

  • AFAIK none of the cache add-ons has been tested against Cairo. There is mirror-site code in the TWiki core, but you're on your own while working out how to use it.
    • Did not get around to upgrading yet; the Unix/sh version of the CacheAddOn is so trivial that it should work regardless of the version. But caching will not reduce the CPU load unless the crawler revisits faster than the cache expires. -- PeterKlausner - 20 Oct 2004
  • The main improvement you can get from recoding your pages is to minimise the use of searches.
  • Topic and template loading is not a big CPU muncher. Don't waste time with it.
  • Variable expansion is expensive in Cairo. If you can, minimise the number of variables you define and use.
  • Keep the number of webs small. Iterating over multiple webs is expensive.
  • Keep the number of plugins small. Each plugin adds an additional 3% CPU burden, on average. You probably don't need DefaultPlugin unless you have very old data.
  • Use SpeedyCGI to avoid recompiling the Perl code on every invocation (around 50% of the runtime of a view!).
  • I assume you have your robots.txt configured appropriately, as described in http://www.robotstxt.org/wc/norobots.html?
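The SpeedyCGI change above is typically just a shebang edit on the CGI scripts in the bin directory. A minimal sketch, assuming speedy is installed at /usr/bin/speedy (adjust paths and flags to match your existing scripts):

```
#!/usr/bin/speedy -wT
# was: #!/usr/bin/perl -wT
# SpeedyCGI keeps a compiled Perl interpreter resident between
# requests, so each view skips the per-invocation compile step.
```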

Are you getting hit by robots that don't understand robots.txt? Your problem may be the result of stupid robots, and not a TWiki problem at all.

-- CrawfordCurrie - 20 Oct 2004

Thanks for your advice. But I have to take issue with the last comment. It is a given that there are stupid robots, and I am trying to rewrite my site so I don't get pounded so horribly.

I find it really makes a big difference to hide the links to "Edit", "Attach", and so forth on all the pages. Think of the number of links a robot finds: counting the left bar, it finds 10 or 20 per page, and all of them are completely wasted effort. Not all are fast page views, either; some send search strings that send TWiki off on errands. I also try to make sure there are no exposed links to WebPreferences, AdminTools, and whatever else. It is a big, big problem, because the TWiki default pages are written to maximize the number of interconnections and ease of user access to information, and I don't want that at all.

Last time I checked, the sysop is no longer calling me the King of CPU. I may be the Duke, though. In a couple of weeks, I should be able to review the TWiki logs and see if the total number of hits is reduced as much as I think.

I think I may try to experiment with some Caching on the most frequently used pages, but will wait until the Cache plugin authors sound the signal that it is time for regular people to test them.

-- PaulJohnson - 23 Oct 2004

Fair point ;-)

Paul, do you think you could summarise your experiences in a style suitable for inclusion in the documentation set? It would be good to have a FAQ on anti-robot counter-measures!

-- CrawfordCurrie - 23 Oct 2004

I would definitely try the caching addon by PeterKlausner - I had a go with this and it worked fine. It's very simple so there's not much to go wrong.

You do have a robots.txt file, I hope? This will be more effective than the noindex attributes on links.

-- RichardDonkin - 24 Oct 2004

I'm trying to work with the site administrator on getting robots.txt in place. I want to block all uses of all TWiki functions except view.
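A robots.txt along those lines could look like the sketch below. The paths assume the default /twiki/bin script layout (adjust to your install), and the original robots.txt convention has no Allow rule, so the blocked scripts are listed one by one while view is simply left unmentioned. Only well-behaved robots will honor it:

```
User-agent: *
Disallow: /twiki/bin/edit
Disallow: /twiki/bin/attach
Disallow: /twiki/bin/search
Disallow: /twiki/bin/rdiff
Disallow: /twiki/bin/oops
Disallow: /twiki/bin/rename
```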

Lately I have noticed a new robot problem. Although I have gone through my sites, deleted as many topics as possible, and hidden links to the rest, some robots still come along and ask for that stuff. Apparently they have been here before and are looking for updates.

chopped the rest to DealingWithRobots as a DocRequest

CrawfordCurrie asked if I would write down the steps I took. I will try to retrace my steps, but here is a start.

Topic revision: r12 - 2005-02-20 - CrawfordCurrie
 