CacheAddOnDev Discussion: Page for developer collaboration, enhancement requests, patches and improved versions on CacheAddOn contributed by the TWikiCommunity.
• Please let us know what you think of this extension.
• For support, check the existing questions, or ask a new support question in the Support web!
• Please report bugs below

Feedback and discussion on the server side CacheAddOn.

Discussion

Speed vs outdated content

Incredible speed improvement at the price of outdated content.

-- PeterThoeny - 10 Nov 2002

Ad performance: Note, that the benchmark figures are actually optimistic for TWiki, as the TWiki documentation web is around 130 topics, i.e. small to medium size. For a web of 400+ pages, I can't get the index faster than 11 seconds. A formatted search returning 40 of 400 pages: no faster than 5 seconds.
And note, that an occasional post on twiki.org does not require the same responsiveness as more dedicated work -- which I try to do.

Ad outdated pages: You must consider different types of pages, what can be out-of-date, how often, impact versus potential gain.

| Type of page | What is out of date | Change rate | Impact | Gain |
| Regular, "static" text page | wiki links | every few weeks | small | nice |
| Index with formatted search, e.g. Plugins.WebHome | page index entries | every few days | medium | huge |
| Status overview with formatted search, e.g. Support.WebHome | status changes of pages missing | every few hours | high | huge |
| Change overview, e.g. WebChanges | newest entries missing | less than hourly | high | big |

The first two cases should be tolerable for many environments. The last case can be easily fixed by changing the templates to refer to ../fresh/WebChanges.

The case for generated status overview pages is more difficult. You need them to be accurate. You want to use them frequently, as they are an important entry point for navigation. I propose to work around that problem by adding a prominent refresh link/button to such pages.

-- PeterKlausner 13 Nov 2002

Detect dirty entries

By coding it in Perl, one strategy would be to do some "push": keep track of the pages a topic depends on (via hooks on all the functions that load files during rendering: preferences, included topics, ...), then use this information in reverse: when you save a page, remove (or just mark "dirty") all cached pages that depend on the saved page, recursively.

Note that a dumb solution which would invalidate all caches on each save anywhere would probably work, too, as saves are rarer than views.

-- ColasNahaboo - 13 Nov 2002
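
The "push" idea above can be sketched in a few lines of shell. This is only an illustration under loud assumptions, not part of the add-on: it assumes a hypothetical directory of reverse-dependency files (one file per topic, named Web/Topic, listing one dependent topic per line) that some rendering hook would have to maintain, and it assumes cache entries are stored as $cache/Web/Topic with no skin suffix. The function name invalidate and all variable names are made up.

```sh
# invalidate CACHE DEPS TOPIC
# Remove the cached copy of TOPIC (written as Web/Topic) and of every page
# that depends on it, directly or indirectly.
# DEPS is a hypothetical directory: DEPS/Web/Topic lists, one per line,
# the topics that include or otherwise depend on Web/Topic.
invalidate() {
    inv_cache=$1
    inv_deps=$2
    inv_todo=$3     # space-separated queue (topic names contain no spaces or globs)
    inv_seen=" "    # topics already handled; guards against include cycles
    while :; do
        set -- $inv_todo
        [ $# -gt 0 ] || break
        inv_topic=$1
        shift
        inv_todo="$*"
        case "$inv_seen" in
            *" $inv_topic "*) continue ;;
        esac
        inv_seen="$inv_seen$inv_topic "
        rm -f "$inv_cache/$inv_topic"
        if [ -f "$inv_deps/$inv_topic" ]; then
            # queue everything that depends on the topic just invalidated
            inv_todo="$inv_todo $(tr '\n' ' ' < "$inv_deps/$inv_topic")"
        fi
    done
}
```

A save hook would then call something like invalidate /var/twiki/cache /var/twiki/deps Main/SomeTopic. The hard part, building and maintaining the dependency files, is not addressed here.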

Calculating pages that require cache invalidation (or refresh) is the really hard part, and of course you could end up invalidating a huge set of pages after a single change... I agree the plugin should probably be in Perl, for portability reasons and access to TWiki.cfg for location of data and cache directories - however, would be interesting to see how the benchmarks change.

-- RichardDonkin - 13 Nov 2002

Ad "push": I was thinking about reusing the link list built by the TouchGraphAddOn. This easily shows all potentially affected reverse links. Clearing their entries would fix outdated EditThis links. But the most critical issue is not addressed: pages with inline search results. That would be rather hard to fix, I guess. Clearing the whole cache on save works in principle, but it depends on the usage scenario. On our TWiki documentation web, save/upload accounts for less than 1% of accesses, so caching could kick in pretty well. On our "real work" web, this ratio is 10%. Fewer than 10 views before a flush is probably not worth anything.

-- PeterKlausner 13 Nov 2002

Note, that version 2.0 offers the -maxage=0 option to the fresh (or cache.pl) script. It will refresh the page, if the data directory changed. This should catch like 99% of all changes, the 1% being changes from subsequent saves, which will be caught when the lock expires -- or any other activity starts.

-- PeterKlausner - 04 Mar 2003
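
To make the -maxage=0 idea concrete, here is a minimal shell sketch of the comparison it implies: a cache entry counts as fresh only if it is newer than the web's data directory. It is not taken from cache.pl; the function name fresh_for_web is made up, and it relies on the general fact that a directory's modification time changes whenever a file is created in or removed from it (the thread above suggests that saves and lock expiry cause such changes, but that is not verified here).

```sh
# fresh_for_web ENTRY WEBDIR
# Succeed only if the cache file ENTRY exists and is newer than the web's
# data directory WEBDIR, whose modification time moves whenever a file is
# added to or removed from it.
fresh_for_web() {
    [ -f "$1" ] && [ "$1" -nt "$2" ]
}
```

For example, fresh_for_web "$cache/Main/WebHome" "$data/Main" would decide between serving the cached copy and re-rendering.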

Why not Perl?

I agree the plugin should probably be in Perl, for portability reasons and access to TWiki.cfg for location of data and cache directories - however, would be interesting to see how the benchmarks change.

-- RichardDonkin - 13 Nov 2002

Using Perl adds the 0.65 secs shown in the benchmark. A workaround could be to just require TWiki.cfg. This should give us all variables without much compile overhead. . . yes, looks good:

| Test case | Min[s] | Avg[s] | Max[s] | URL path |
| Start perl +req vars | 0.10 | 0.13 | 0.39 | /twiki/benchmark?-s |
| Hello-world +load twiki | 0.63 | 0.65 | 0.74 | /twiki/benchmark?-h |

-- PeterKlausner 13 Nov 2002

Ok, here it is...

Release 2.0 offers a Perl version, specifically tested on Windows (95, yep...)

-- PeterKlausner - 04 Mar 2003

Why ksh not sh?

  • I notice the shell script uses ksh. It would be better to change this to #!/bin/sh. I haven't checked if there's any korn-isms in the script. Some sites don't have ksh (like sourceforge.net). -- JonathanCline - 05 Apr 2003
    • Regular Bourne shell does not support the -nt (newer than) test operator -- PeterKlausner 6 Apr 2003
      • I just double-checked my project at sourceforge (where I'm running this addon successfully): they don't have ksh, but they do have /bin/bash, which is symlinked as /bin/sh -- that explains why it worked when I used #!/bin/sh. -- JonathanCline - 06 Apr 2003
  • Note, that plain old Bourne Shell is not available on Linux. There you "only" have bash - kind of Korn Shell on steroids. -- PK
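
For shells whose test builtin lacks -nt, one possible workaround is the -newer primary of find, which POSIX specifies. The following is a sketch of that substitution, not something the add-on ships; the function name is_newer is made up. It forks one find process per check, so it will be slower than the builtin test.

```sh
# is_newer A B
# Succeed if file A exists and was modified more recently than file B.
# find prints A only when the -newer test holds; -prune keeps it from
# descending if A happens to be a directory.
is_newer() {
    [ -n "$(find "$1" -prune -newer "$2" 2>/dev/null)" ]
}
```

One difference to be aware of: if B does not exist, this function fails, whereas the -nt operator of bash succeeds as long as A exists.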

Plugin vs Add-On

Considering the nature of this Plugin (reinforced by examining its current implementation), wouldn't it make more sense to consider recategorizing this as an Addon instead of a Plugin?

-- TomKagan - 26 Nov 2002

Good point. Renamed from CachePlugin to CacheAddOn. PeterKlausner: I recommend to repackage the zip file.

-- PeterThoeny - 27 Nov 2002

The current implementation does not interfere with the core code or rendering, so it is an add-on. Still, I named it a plug-in because I figured that we will need per-topic cache control. Can we do that without parsing something like %CACHE{expire="1hr"}% ?

-- PeterKlausner - 28 Nov 2002

Where is fresh.pl?

BTW: the Perl version doesn't need an extra fresh script, just pass maxage=-1 to view.pl; I clarified the install instructions.

-- PeterKlausner - 21 May 2003

On the page CacheAddOn the Perl version and Usage sections still refer to "fresh". BTW great speed gain on my old PentiumPro server :-)

-- ArnoldAtMallos - 22 Dec 2007

Feature requests

Cache clean up frequency

Possibly set a daily cron job to remove the cache files?

-- PeterThoeny - 10 Nov 2002

Ad periodic flushing: Currently, I'm using 14 days [sic] for all pages. I'm thinking of a special variable like %EXPIRES, which could set a shorter cache retention period. Then manually add that to pages like WebChanges or pages with similar, known dynamics. Without such a variable, different cronjobs with given lists of topics to flush/refresh would do. E.g. solve the CPU load problem for WebRss. A different idea: flush/pre-cache pages with inline %SEARCH faster than regular ones.

-- PeterKlausner - 13 Nov 2002
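
A periodic flush by age could look roughly like the sketch below, meant to be called from a cron job. The function name purge_cache is made up, and the sketch treats all pages alike; it does not implement the %EXPIRES idea or per-topic lists.

```sh
# purge_cache CACHEDIR DAYS
# Delete cached files whose last modification is more than DAYS whole days
# in the past (find rounds file ages down to full days).
purge_cache() {
    find "$1" -type f -mtime +"$2" -exec rm -f {} +
}
```

Called as purge_cache /var/twiki/cache 14, this would approximate the 14-day retention mentioned above. Shorter periods for known dynamic pages could be handled by separate calls on specific files or subdirectories.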

Pre-caching

Or better, a cron job to refresh the cache files so that the user does not experience cache refresh on rarely updated topics.

-- PeterThoeny - 10 Nov 2002

Yes, that should be fairly easy. Currently, it's a cheap-o-cheap 80:20 solution.

-- PeterKlausner 13 Nov 2002

A fairly simplistic approach could be this cronjob:

wget
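
Independently of the cron job above, here is one way pre-caching might be sketched: list a URL for every topic file in the data directory and fetch each one. This is an assumption-laden illustration, not the author's cron job. The function name list_topic_urls and the base URL are hypothetical, and it assumes topics are stored as DATADIR/Web/Topic.txt.

```sh
# list_topic_urls DATADIR BASEURL
# Print one URL per topic file found as DATADIR/Web/Topic.txt.
# RCS history files (Topic.txt,v) do not match the pattern and are skipped.
list_topic_urls() {
    for ltu_file in "$1"/*/*.txt; do
        [ -f "$ltu_file" ] || continue      # the pattern matched nothing
        ltu_topic=${ltu_file#"$1"/}         # Web/Topic.txt
        printf '%s/%s\n' "$2" "${ltu_topic%.txt}"
    done
}
```

The output could be fed to a loop such as: list_topic_urls /var/twiki/data http://server/twiki/bin/view | while read url; do wget -q -O /dev/null "$url"; done. Whether the base URL should point at view or at the refreshing script depends on whether stale entries are to be re-rendered.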

Port to windows

I got this working on Windows using CygWin, but had to make some changes to use bash (also relevant to TWikiOnLinux) - also, I can't work out how the original script would work since it seemed to always suffix ? to every 'entry' value. Here is a patch to work with bash, tested on bash 2.05b:

*** cache.old   Wed Nov 13 08:49:32 2002
--- cache       Wed Nov 13 09:05:18 2002
***************
*** 1,4 ****
! #!/bin/ksh
  #
  # @(#)$Id: CacheAddOnDev.txt,v 1.20 2003/04/28 19:13:00 RichardDonkin Exp nobody $ (c) Peter Klausner
  #
--- 1,4 ----
! #!/bin/bash
  #
  # @(#)$Id: CacheAddOnDev.txt,v 1.20 2003/04/28 19:13:00 RichardDonkin Exp nobody $ (c) Peter Klausner
  #
***************
*** 25,33 ****
  #exec 2> /tmp/qik.log
  #set -x

! entry="$cache$PATH_INFO?$QUERY_STRING"

! if [ "$entry" -nt "$data/$PATH_INFO.txt" \) ]
  then
        exec cat "$entry"
  else
--- 25,37 ----
  #exec 2> /tmp/qik.log
  #set -x

! entry="$cache$PATH_INFO"
! if [ "$QUERY_STRING" != '' ]
! then
!     entry="$cache$PATH_INFO?$QUERY_STRING"
! fi

! if [ "$entry" -nt "$data/$PATH_INFO.txt" ]
  then
        exec cat "$entry"
  else

(BTW the RCS ID keywords above are bogus - the TWiki page's RCS string is shown, rather than the one in the patch - I logged this bug at ExpandsRcsKeywordsInText a while back, can be fixed quite easily using RCS options in TWiki.cfg.) The speedup was great, but shortly after running the benchmark program I ran into Apache on Cygwin problems that weren't cured by a reboot (probably these are Cygwin socket issues) - not sure if these are related, but should test this script thoroughly before deploying on Cygwin at any rate!

There are some CPAN modules that do similar things, which we should investigate, e.g. CGI::Cache - this would require some small code changes I think, though it's possible an external script could also work. CGI::Cache is compatible with ModPerl and SpeedyCGI.

-- RichardDonkin - 13 Nov 2002

Having problems with this addon, and I'll struggle (or keep asking) till I get it fixed - waiting 10 seconds for a page to load is not going to help my company adopt the TWiki I've set up here. Anyway, I have TWiki installed and running fine (apart from file attachment) on Win2K, with Apache and Cygwin. I am getting these errors:

[Wed Jan 15 14:26:06 2003] [error] [client 10.17.0.247] Premature end of script headers: c:/twiki/bin/cache
[Wed Jan 15 14:26:06 2003] [error] [client 10.17.0.247] c:\twiki\bin\cache: line 40: tee: command not found

Any suggestions would be greatly appreciated. From within Cygwin, tee is a valid command and seems to work fine. My TWiki is installed in c:\twiki and Cygwin in c:\cygwin, if that's any help (all correctly mounted in Cygwin, with write permissions). After all the fixes mentioned here, the cache script looks like this:

#!c:/cygwin/bin/bash
#
#
# NAME:
# cache - quick'n dirty page caching for TWiki
#
# SYNOPSIS:
# Identical to TWiki's view
#
# DESCRIPTION:
# Rename original view to render
# Link this to 'view'
# See CachePlugin page for more.
#
# SEE ALSO:
# view  fresh
#
 
# customize...
data=/twiki/data
cache=/twiki/cache
 
# debug...
#exec 2> /tmp/qik.log
#set -x
 
entry="$cache$PATH_INFO"
if [ "$QUERY_STRING" != '' ]
then
    entry="$cache$PATH_INFO.$QUERY_STRING"
fi
 
if [[ -f "$entry" && "$entry" -nt "$data/$PATH_INFO.txt" ]]
then
        exec cat "$entry"
elif [ -d "$entry" ]
then
        exec ./render "$@"
else
        exec ./render "$@" | tee "$entry"
fi

-- ClaudeSchneider - 15 Jan 2003

Ad cygwin: I don't really use it; I always have problems with missing stuff and bad interoperability... That may well be the problem with your missing tee. Try inserting the absolute path c:/cygwin/bin/tee (or such), test it interactively and then try it via the web server.

HTH - PeterKlausner - 16 Jan 2003

Awful performance on Windows

Thanks for those tips - writing the full path to tee (and cat) worked a treat - the page loads now, and an HTML file is created in the cache folder. The benchmark script fails completely (probably too many Unix-dependent commands to work with Cygwin), but a quick estimate shows that loading Twiki WelcomeGuest used to take 8 seconds without the cache and now takes 4 seconds. This still isn't as quick as it should be (I guess, from all the 0.5-second benchmarks mentioned here), whereas the cached HTML page loaded directly (http://twiki/cache/Twiki/WelcomeGuest.html) appears instantly, even with Shift-F5.

-- ClaudeSchneider - 17 Jan 2003

Use cached pages from Go box

P.S. Something else I've just noticed - if you navigate to a page using the Go input box at the top, the resulting URL ends in something like ?topic=Test.TestTopic5, which gets cached as .topic=Test.TestTopic5 in whichever web I was in. This results in LOTS of duplicates of the same page being cached everywhere. I'll try to figure out a way of removing this redundancy, but my perl/bash is non-existent, so any help would be appreciated.

-- ClaudeSchneider - 17 Jan 2003

The GoBox implementation from GoIsSearch avoids this problem.

-- PeterKlausner - xx Feb 2003
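
As an alternative to changing the Go box itself, the cache script could map such a query back to the ordinary path before building the cache file name. The sketch below shows only that mapping; it is not what GoIsSearch does, and the function name canonical_path is made up. It handles only the simple case where the query string is exactly topic=Web.Topic and leaves everything else untouched.

```sh
# canonical_path PATH_INFO QUERY_STRING
# If QUERY_STRING is exactly topic=Web.Topic, print /Web/Topic.
# In every other case print PATH_INFO unchanged.
canonical_path() {
    case "$2" in
        topic=*.*)
            cp_topic=${2#topic=}
            case "$cp_topic" in
                # further parameters, odd characters, extra or misplaced dots
                *[!A-Za-z0-9_.]*|*.*.*|.*|*.)
                    printf '%s\n' "$1" ;;
                *)
                    printf '/%s/%s\n' "${cp_topic%%.*}" "${cp_topic#*.}" ;;
            esac ;;
        *)
            printf '%s\n' "$1" ;;
    esac
}
```

With this, a request for /Main/WebHome with the query topic=Test.TestTopic5 would be cached under /Test/TestTopic5, the same entry a direct visit uses. The renderer would still need to be called with the original request.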

Avoid misleading edit links

When you follow the edit link from a cached page and create the missing topic, the cached page won't reflect this. When you follow this link a second time, you will be dumped into the edit window. Annoying! Patch the edit script to invalidate the parent page like so:

diff -c -r1.1 edit
*** edit        2002/06/19 13:53:37     1.1
--- edit        2003/02/13 08:06:30
***************
*** 150,155 ****
--- 150,160 ----
          $meta->put( "TOPICPARENT", ( "name" => $theParent ) );
      }
      $tmpl =~ s/%TOPICPARENT%/$theParent/;
+
+     # touch parent file to update links in cache; breaks in different web!
+     my $now = time();
+     my $parent = "$TWiki::dataDir/$webName/$theParent.txt";
+     $parent =~ m{^([A-Za-z0-9_./:-]+)$}    and utime $now, $now, $1;

      # Processing of formtemplate - comes directly from query parameter formtemplate ,
      # or indirectly from webtopictemplate parameter.

-- PeterKlausner - xx Feb 2003

Use with mod_perl, etc.

See TwikiFreeBsdPerformance for a case where this addon, probably the Perl version, may be useful alongside ModPerl on a slower machine.

Also, see RenderOnceReadMostly for some discussion in this area.

-- RichardDonkin - 28 Apr 2003

Detect corrupt entries

A problem not yet solved: if an error creates a corrupt page, this page will be served from the cache until it expires just like a regular page. Unfortunately, you do not even have a refresh button. The only way to fix it is to know & enter the refresh URL manually. Rendering errors should be detected and never go into the cache.

-- PeterKlausner - 15 May 2003

I ran into this problem somehow: cached files of zero size, probably caused by an interrupted rendering process. I solved this by adding a find -size 0 command to the cache script to remove those files.

+   `/usr/bin/find $cache$path$sep -size 0 -type f -exec rm {} \\;`;
   # handle max age parm...
   if ( $maxage == 0 ) {           # re-render on _any_ change in web, i.e.

This works for me but is not cross-platform, as it depends on the standard find command. A better way is probably to use the File::Find library in Perl.

-- MartenMartensson - 26 Oct 2003

Compress cache contents

  • I've also just run a test for storing the cached pages using gzip -1 (my personal closet-web-server machine is an old PC with limited disk space -- i386 at 233Mhz and 1 gig HD). The speed improvement is still dramatic and the compressed version is 30% of the rendered. A great improvement. -- JonathanCline - 05 Apr 2003
    • The small patch to cache.sh:
if [ "$entry.gz" -nt "$data/$PATH_INFO.txt" ]
then
        exec gzcat -d "$entry"
else
        exec ./render "$@" | tee "$entry" 2>/dev/null
        gzip -f -1 "$entry" &
fi

Second Level Cache for Topics with Access Restriction

The installation instructions for the CacheAddOn say

  1. If you are using bin/viewauth link it to bin/render

Only after inspecting the cached page I understood why this could work:

Status: 302 Moved
location: http://host.domain/twiki/bin/viewauth/Restricted/WebHome

That is, the view (render) script responds by sending a redirect to the client. So the access control is not invalidated by the cache, as I feared it would. However, access restricted pages are not really cached; only the redirection is. To get the caching right we would need a secondary cache script, which implements access control as the ordinary view script does, but produces the cached page if it exists instead of rerendering it. As far as I can judge, this entails incorporating the access control checks from view in cache.pl. I'm not proficient in perl and the cache.pl script doesn't work on my Linux box anyway. So before I might try this:

  • Has anyone considered this already?
  • Would the overhead of the access control make caching still worthwhile?

Since a considerable portion of the view script seems to be needed, it becomes worth considering building caching into the view script itself.

-- EelcoVisser - 01 Jul 2003

Yes, viewauth is not cached; will clarify the doc.
No, I see no practical way to implement authentication outside of view.
No, putting it into the core/view will not be accepted easily, as TWikiMission seems to be more of an application server, which requires real-time display.

2¢ by PeterKlausner - 02 Jul 2003

Add a SHORTDESCRIPTION to the Add-On Info Section

I added a SHORTDESCRIPTION to the Add-On Info section so that this add-on is represented properly in the AddOnPackage topic and query topics. Please take this into the release.

-- PeterThoeny - 06 Oct 2006

Bugs

Syntax error with \) in bash

I can't work out how the original -nt test worked with the \) included, as I got a syntax error from bash - is this valid ksh syntax? The script could of course work on any Bourne shell derivative by rewriting the -nt test using ls -t - however, this should be a config option as it would probably be somewhat slower.

-- RichardDonkin - 13 Nov 2002

Egg on my face :-( It's not valid syntax, but tolerated by ksh. (Leftover from an overly complicated if construct to avoid serving deleted pages, which doesn't buy enough to be worth it.)

-- PeterKlausner 13 Nov 2002

? separator in path name

I can't work out how the original script would work, since it seemed to always suffix ? to every 'entry' value. (My bash patch under "Port to windows" above appends the separator only when QUERY_STRING is non-empty.)

-- RichardDonkin - 13 Nov 2002

The ? separator is not a bug but a feature, which seems not to work on Windows. The idea is to cache skinned pages as well. For symmetry with the URL syntax, I chose '?' to separate PATH_INFO from the optional QUERY_STRING. '.' should work as well, is actually more convenient on Unix, and does not collide with the TWiki word namespace. I will change this.

-- PeterKlausner 13 Nov 2002

A separator between the PATH_INFO and QUERY_STRING parts of the cache filename must fulfill these criteria:

  1. Not valid in a wiki word
  2. Not valid within PATH_INFO nor QUERY_STRING
  3. Valid as filename in Unix and Windows
  4. Not a directory name in Unix nor Windows

Actually, I don't know of one character meeting this; probably best to go for

`?' on Unix
`__' on Windows; conflicts with 'TopicNamesLikeThis__'

HTH - PeterKlausner - 16 Jan 2003
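
To illustrate the platform-dependent choice, the cache file name could be built by a small function that takes the separator as a parameter. This is a sketch, not code from the add-on; the name cache_entry is made up. It follows the bash patch above in appending the separator only when there is a query string, which differs from script versions that always append it.

```sh
# cache_entry CACHE PATH_INFO QUERY_STRING [SEP]
# Print the cache file name for a request. SEP defaults to '?';
# pass '__' (or another separator) on platforms where '?' is not
# allowed in file names.
cache_entry() {
    if [ -n "$3" ]; then
        printf '%s%s%s%s\n' "$1" "$2" "${4:-?}" "$3"
    else
        printf '%s%s\n' "$1" "$2"
    fi
}
```

For example, cache_entry /var/twiki/cache /Main/WebHome skin=print __ prints /var/twiki/cache/Main/WebHome__skin=print.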

Caching per web

This is an outstanding enhancement to the standard TWiki install. Poor performance was probably our most frequent complaint with the standard TWiki install.

One question: I want to cache some, but not all webs. (We have some webs where a lot of pages are built dynamically from content of other pages, so these should not be cached.) How could I handle this? For now I am doing this by creating directories for only the webs I want to cache under the "cache" directory, but that results in lots of warnings in the apache error log when it tries to do the tee command to write to a directory that does not exist. I don't know ksh so do not know how to extract the cache/web path and test that it exists before attempting a write.

-- MartinWatt - 22 Nov 2002

Sorry for the late response. Yes. This is a bug. Fix by redirecting stderr to /dev/null like so:

#!/bin/ksh

# customize...
data=/var/twiki/data
cache=/var/twiki/cache

entry="$cache$PATH_INFO.$QUERY_STRING"

if [ "$entry" -nt "$data/$PATH_INFO.txt" ]
then
        exec cat "$entry"
else
        exec ./render "$@" | tee "$entry" 2>/dev/null
fi
Warning: I don't know yet, whether this affects Perl's error handling.

Parsing the directory etc. gets expensive very soon because *sh forks a lot. Then we rather bite the bullet and go the bare-bones Perl route.

-- PeterKlausner - 28 Nov 2002
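
For the original question (how to extract the web from the path and test that its cache directory exists), a sketch using only shell parameter expansion follows. Parameter expansion is handled inside the shell, so no extra process is forked for the parsing. The function name web_is_cached is made up, and the ${var%%pattern} syntax requires ksh, bash or another POSIX shell rather than the original Bourne shell.

```sh
# web_is_cached CACHE PATH_INFO
# Succeed if a cache directory exists for the web named in PATH_INFO,
# which is expected to look like /Web/Topic.
web_is_cached() {
    wic_web=${2#/}            # strip the leading slash: Web/Topic
    wic_web=${wic_web%%/*}    # keep the first component: Web
    [ -n "$wic_web" ] && [ -d "$1/$wic_web" ]
}
```

The script could then pipe the renderer through tee only when web_is_cached "$cache" "$PATH_INFO" succeeds, and otherwise exec ./render "$@" directly, so that webs without a cache directory are never written to.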

Occasionally corrupted page

One thing I see very occasionally, like once every 1000 page saves, is a half-written page - it just cuts off partway through. Looks like the write operation got interrupted. I have not figured out exactly what circumstances cause this to happen - maybe hitting save again when a save is already in progress?

-- MartinWatt - 09 Jan 2003

Here is an update on this problem. We get this fairly regularly, I'd say once every week or two, which is roughly once per 1000 page saves. It causes considerable alarm for users whose pages suddenly disappear or truncate. I am almost certain that it is caused by impatient users hitting a browser button when the addon is in the middle of saving the cached page.

As for a solution, well the obvious one is to switch the company to decaf and have our users mellow out a little :-) Alternatively, is there a way to make a script being executed by the browser non-interruptible so the browser cannot just kill it partway through? Probably not, I suppose. My final suggestion is to have the addon attach an identifier to the very end of the cache file and have the caching addon only return the cached page if the identifier exists (as it then knows the write completed successfully), otherwise delete the cache file and regenerate it.

-- MartinWatt - 28 Apr 2003

Sorry, but I cannot reproduce this behaviour. My combinations of apache & fs settings seem to prevent this from happening. Maybe this patch of cache.sh works for you:

<       exec ./render "$@" | tee "$entry" 2>/dev/null
---
>       tmp="$entry.tmp$$"      # use same filesystem!
>       exec ./render "$@" | tee "$tmp"  2>/dev/null    &&
>               mv "$tmp" "$entry"
This should update the cache only after the rendering completed successfully. If you see abandoned .tmp files lying around, check whether they have corrupt content as you suspect. Once we have figured out what is wrong, insert an exit handler before the exec:
        trap  "/bin/rm -f $tmp 2>/dev/null"  0
-- PeterKlausner - 04 May 2003
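
A stricter variant of the same temp-file idea is sketched below. Unlike the tee pipeline, it checks the renderer's own exit status and refuses to cache empty output. It is an illustration only: the function name render_to_cache is made up, it assumes the renderer exits with a non-zero status when it fails, and it trades away streaming, since the page is sent only after rendering has finished.

```sh
# render_to_cache ENTRY COMMAND [ARGS...]
# Run COMMAND, copy its output to stdout, and keep that output as ENTRY
# only if COMMAND exited with status 0 and produced at least one byte.
render_to_cache() {
    rtc_entry=$1
    shift
    rtc_tmp="$rtc_entry.tmp$$"           # same directory, hence same filesystem
    "$@" > "$rtc_tmp"
    rtc_status=$?
    cat "$rtc_tmp"                       # the client gets whatever was rendered
    if [ "$rtc_status" -eq 0 ] && [ -s "$rtc_tmp" ]; then
        mv -f "$rtc_tmp" "$rtc_entry"    # a rename within one filesystem is atomic
    else
        rm -f "$rtc_tmp"                 # never cache a failed or empty rendering
    fi
    return "$rtc_status"
}
```

It would be used as render_to_cache "$entry" ./render "$@". Because the entry only ever appears through a rename, a reader never sees a half-written file. If the cache directory does not exist, nothing is rendered at all, so this would need to be combined with a directory check.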

Failure on missing topic name, i.e. WebHome

In the bash version of this add-on, I've added handling for URLs that give only the web name and omit the topic name (http://server/twiki/Web1 or http://server/twiki/Web1/); without it, such URLs generated internal server errors (at least on our TWiki).

See newer version below...

The bottom if block is now:

if [[ -f "$entry" && "$entry" -nt "$data/$PATH_INFO.txt" ]]
then
        exec cat "$entry"
elif [ -d "$entry" ]
then
        exec ./render "$@"
else
        exec ./render "$@" | tee "$entry"
fi
Note that I added the elif clause to not pipe the Directory name to tee, I believe this is a better option than sending the output of tee to /dev/null, as it's (remotely) possible that some fixable error is getting swallowed.

Note that this is only for bash, but I assume the same issue exists for ksh.

-- MikeMaurer - 14 Jan '03

Just recently, I ran across this as well. If I understand correctly, the fix works only if you use '.' as separator, because this makes $entry refer to the same directory '.../Web/.' -- kewl. If you want to re-use the cached WebHome, put this in front of the first if:

test -d "$entry"        && entry="$entry/WebHome."

HTH - PeterKlausner - 16 Jan 2003

Directory bug one more time: Duuuh... I missed the point: '.' is not a kewl feature, it is the bug. With the original '?' it worked right away. (Small annoyance: any directory touch invalidates the odd .../Web? and .../Web/? cache files but not .../Web/WebHome? )

HTH - PeterKlausner - 16 Jan 2003

I've tweaked my hack a little more to make it cache entries that don't include the specific topic (they probably want WebHome). This should go below the debug block in bash versions of this script. This is not tested on KSH and almost certainly won't work on it.

fullpath=$PATH_INFO$QUERY_STRING
fullpath=${fullpath%/}

entry="$cache$fullpath"
if [[ -f "$entry" && "$entry" -nt "$data/$PATH_INFO.txt" ]]
then
        exec cat "$entry"
elif [ -d "$entry" ]
then
        if [[ -f "$entry/WebHome" && "$entry/WebHome" -nt "$data/WebHome" ]]
        then
                exec cat "$entry/WebHome"
        else
                exec ./render "$entry/WebHome" | tee "$entry/WebHome"
        fi
else
        exec ./render "$@" | tee "$entry"
fi

-- MikeMaurer - 2 Feb '03
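
The "missing topic means WebHome" rule can also be expressed purely on the path string, before any cache file name is built. The sketch below is a different approach from the directory test used above, shown only for illustration; the function name default_topic is made up.

```sh
# default_topic PATH_INFO
# Print PATH_INFO with /WebHome appended when it names only a web
# (/Web or /Web/). Paths that already name a topic are printed unchanged
# apart from the removal of one trailing slash.
default_topic() {
    dt_path=${1%/}                                  # drop one trailing slash
    case "$dt_path" in
        /*/*) printf '%s\n' "$dt_path" ;;           # already /Web/Topic
        /?*)  printf '%s/WebHome\n' "$dt_path" ;;   # just /Web
        *)    printf '%s\n' "$dt_path" ;;           # empty or unexpected
    esac
}
```

Both /Web1 and /Web1/ then map to /Web1/WebHome, so the two URL forms share one cache entry and one data file to compare against.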

Refresh doesn't work on Windows

I've got the cache script saved in bin/view, and the original view script saved in bin/render, as well as having bin/fresh and bin/benchmark as they came from the zip file. I have a feeling that the cache script is doing something (it's writing the cached HTML file), but once the topic has been cached, shouldn't it load the HTML instantly? Also, the refresh link I've added (which loads the page using the fresh script), does load in a second, but doesn't force the page to be recached? It doesn't seem to actually remove the cached page and re-execute view...

Any assistance or insight would be greatly appreciated.

-- ClaudeSchneider - 17 Jan 2003

See Perl version of release 2.0

-- PeterKlausner - 04 Mar 2003

Installation/configuration troubles with mod_perl

I'm having a heck of a time getting CacheAddOn to work. The page keeps sending back nothing. I've created the .../cache/myweb dir, I've changed the paths in the cache script to point to the proper Perl binary and location of the render script, as well as the proper directories for cache and data, and nothing gets returned (literally, nothing. I used a Java program to just look at the exact text sent over the socket from the server, and it was receiving nothing) when I try to replace "view" with "cache" in the url. This is for the Perl version of the script. Any idea of things I can look at? Looking at my httpd-error.log, I see only this:

get s:/usr/local/www/twiki/data/Javatips/WebHome.txt c:1052788701 s:1052768185 m:336

In the .../cache/myweb directory, I only have one file created after an attempted view, and it's WebHome__. It's empty.

-- SeanLeBlanc - 14 May 2003

Does it work, after you rename the cache directory tree?
If yes, then the 0-length file was created before the config was ok, but pollutes the cache. Delete the empty WebHome__. Noted feature request above: don't cache trash!
If not, then your call to render.pl is not yet correct.

-- PeterKlausner - 15 May 2003

Thanks. I was able to get a different error when I renamed the cache dir. The error is:

Software error: Can't locate object method "request" via package "Apache" at /usr/libdata/perl/5.00503/CGI.pm line 234.

For help, please send mail to the webmaster (you@yourPLEASENOSPAM.address), giving this error message and the time and date of the error.

I'm set up to use mod_perl. Is there something I need to change to make this work with mod_perl? Or a perl module that is missing?

-- SeanLeBlanc

Ignore this bit, leaving it in as general info only:
You may be using a version of CPAN:CGI (CGI.pm) that doesn't support mod_perl 2.0. See the first hit on Google:Can%27t+locate+%22method+request%22+via+package+%22Apache%22++cgi.pm, which is actually a TWiki.org page. It would be best to get the latest CGI.pm version in any case, but do run the latest testenv from CVSget:bin/testenv to see the actual version. See also IssuesWithPerl5dot8 where CGI.pm versions caused problems on Perl 5.8.0.
End of ignore

The most likely possibility is 2nd hit on this search, here - if the Perl script needs to run outside mod_perl, but you are running it as a process from underneath a mod_perl Apache server, you'll get this error, as mod_perl sets the %ENV hash to indicate it should be used. This can be fixed by a tweak to the script to never use mod_perl, or perhaps using SelectiveModPerl. However, it is clearly better for this script to work under mod_perl if possible to maximise performance by not forking a Perl interpreter process.

Having now looked at the cache.pl script, it forks the render script without resetting the environment as mentioned in this message from Lincoln Stein, author of CGI.pm - hence the TWiki render (really view) script loads CGI.pm, which tries to use mod_perl since the %ENV (environment) says it can do. In cache.pl, try changing the following line as highlighted in bold (this assumes you are using CygWin for bash and perl):

$render = "GATEWAY_INTERFACE=CGI/1.1 perl c:/opt/twiki/bin/render.pl";   # you might need full path!

This forces the forked Perl process to think that mod_perl is not available, which is actually the case. It would be more efficient to do $ENV{GATEWAY_INTERFACE} = 'CGI/1.1' perhaps, to avoid forking a shell, but one of these should work (not tested.)

Please attach the testenv output to TwikiFreeBsdPerformance, where this originated, as I don't have mod_perl access. This will help in checking your environment and in updating testenv (see ImproveTestenv) to recommend a CGI.pm upgrade to 2.87 or higher when using mod_perl 2.0 (aka 1.99.x for some reason). Even if you are not on mod_perl 2.0, this output would be useful.

I think this problem will also apply to those using the shell version of the script, if running under a mod_perl enabled web server, since the environment is also inherited in the same way.

The most efficient option for mod_perl (future project) would be to somehow run the render script without forking a Perl interpreter, sending its output into a string rather than to STDOUT. This would get the most performance out of mod_perl with this add-on, and is almost certainly possible with a small change to the Perl code in view / render, but I'm not sure how at the moment.

-- RichardDonkin - 17 May 2003
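
For the shell version of the script, the environment reset described above could be done with env, which changes the variable for the child process only. This is an untested-in-production sketch; the function name as_plain_cgi is made up.

```sh
# as_plain_cgi COMMAND [ARGS...]
# Run COMMAND with GATEWAY_INTERFACE forced to CGI/1.1, so that a forked
# Perl script sees an ordinary CGI environment. The caller's own
# environment is left untouched.
as_plain_cgi() {
    env GATEWAY_INTERFACE=CGI/1.1 "$@"
}
```

The render call would then become as_plain_cgi ./render "$@". Whether this is sufficient depends on which environment variables the installed CGI.pm version actually inspects.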

As discussed in the rationale section, this add-on was intended as an alternative to mod_perl. The shell version is totally incompatible with it, as far as I understand. To get something working for Windows, I implemented the same crude thing in Perl; I never expected the exec logic to work with mod_perl. To squeeze out more performance for saves and reloads, I guess you need a complete rewrite.

-- PeterKlausner - 18 May 2003

I think that cache.pl can just be tweaked to work better with ModPerl - it may seem a bit odd to use the two together, but many Linux boxes these days come with mod_perl enabled so it would be good if it doesn't break. SeanLeBlanc is using this add-on as a workaround to the SiteMapIsSlow issue, which isn't helped by mod_perl, so until that issue is fixed there's some rationale for using this with mod_perl.

-- RichardDonkin - 19 May 2003

Putting in the mentioned line (with a semicolon) like so:

$render = "GATEWAY_INTERFACE=CGI/1.1; perl c:/opt/twiki/bin/render.pl"; # you might need full path!

did work for me. However, my ksh didn't seem to work out so well with the "fresh" script, and I'm trying to rewrite that in perl, so if anyone has already done that, please let me know. Also, and this is a weird one, sometimes when I edit topics, I cannot save them, or even preview them. I have to go back and hit my half-working refresh link, and then go at the editing again. The symptoms are that it either a) acts like I'm not logged in, even though I was already permitted in to do editing or b) errors when doing the preview, complaining that it cannot find oops.tmpl. Both messages are clearly bogus. Here's what httpd-error.log has in it for the times it fails.

get s:/usr/local/www/twiki/data/Generalsoftware/MsSqlServer.txt c:1053381064 s:1053381063 m:24
[Mon May 19 16:35:06 2003] [warn] Apache::Registry: T switch ignored, enable with 'PerlTaintCheck On'
[Mon May 19 16:35:10 2003] [warn] Apache::Registry: T switch ignored, enable with 'PerlTaintCheck On'
[Mon May 19 16:35:23 2003] [warn] Apache::Registry: T switch ignored, enable with 'PerlTaintCheck On'
[Mon May 19 16:35:38 2003] [warn] Apache::Registry: T switch ignored, enable with 'PerlTaintCheck On'
put s:/usr/local/www/twiki/data/Generalsoftware/MsSqlServer.txt c:1053381064 s:1053383738 m:24
[Mon May 19 16:35:50 2003] [warn] Apache::Registry: T switch ignored, enable with 'PerlTaintCheck On'

-- SeanLeBlanc 19 May 2003

One thing to do is to use PerlTaintCheck On in httpd.conf. Another option is to use SelectiveModPerl to ensure that the new fresh.pl script, or original ksh fresh script, are run outside mod_perl (it could be that Perl is trying to run the ksh script because it's in the same bin directory that is assigned to mod_perl). You may also need to unset the GATEWAY_INTERFACE variable as above.
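
As a rough illustration of the last point, here is a minimal sketch of overriding GATEWAY_INTERFACE for the child process only, from inside the calling Perl script. The helper name run_as_plain_cgi is hypothetical and not part of cache.pl, and whether the override reaches the child under a given mod_perl version is an assumption that has not been verified here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of cache.pl): run a command as if it were
# started by plain CGI and return everything it prints.
sub run_as_plain_cgi {
    my (@cmd) = @_;
    # local restores the previous value when this sub returns, so the
    # calling script keeps whatever GATEWAY_INTERFACE it had before.
    local $ENV{GATEWAY_INTERFACE} = 'CGI/1.1';
    open(my $out, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    local $/;                       # slurp mode: read all output at once
    my $text = <$out>;
    close $out;
    return defined $text ? $text : '';
}
```

It could be called as run_as_plain_cgi('perl', '/path/to/render.pl'), where the path is a placeholder for the local installation.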

-- RichardDonkin - 20 May 2003

I made the classic mistake of combining two issues into one entry. I think I have the fresh script hacked up to do what I want. The more pressing matter in any case is that I can't seem to do more than one edit per "session"...I get an error upon preview saying that I'm not logged in. Will PerlTaintCheck On help with this, and if so, what do I have to do to avoid the insecure path entry error? Also, in what script would I unset the GATEWAY_INTERFACE variable?

-- SeanLeBlanc - 20 May 2003

I'm working on several performance optimizations right now, with the main goal of keeping response times below one second for most pages. Regarding cache.pl, I changed the following:

  • make cache.pl run under mod_perl in an optimized way, that is, without forking an external perl interpreter on cache miss or refresh
  • additional cleanup to get rid of warnings or errors from tainted paths
  • check for cache directories (also to get rid of error messages)
  • implement stderr redirection (doesn't work for me at all under mod_perl)
  • cache per user (session plugin) to not mix up user-specific settings /access rights and things like user names in skins
I'll clean up the script (lots of old/commented code in there right now) and attach it here. It still needs further polishing and some better security checking, but to give you some figures:

| time | mod_perl | cache | refresh |
| 446ms | yes | no | n/a |
| 1385ms | no | no | n/a |
| 120ms | yes | yes | no |
| 168ms | no | yes | no |
| 721ms | yes | yes | yes |
| 1574ms | no | yes | yes |

The first two lines are from calling render.pl directly and can be used as a reference. The performance gain from running cache.pl under mod_perl with cached content is 48ms or 28% (3rd and 4th line), which in absolute figures is too small to notice. The biggest gain comes when the page is refreshed: here we gain 853ms or 54% from mod_perl. And most importantly, we stay under the one-second barrier.
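
For anyone who wants to check the percentages, the arithmetic can be sketched as below. The numbers are copied from the measurements above; the hash keys and the gain helper are names made up for this illustration.

```perl
use strict;
use warnings;

# Response times in milliseconds, copied from the measurements above.
my %ms = (
    cached_mod_perl  => 120,
    cached_cgi       => 168,
    refresh_mod_perl => 721,
    refresh_cgi      => 1574,
);

# Absolute saving in ms and relative saving in percent of the slower time.
sub gain {
    my ($slow, $fast) = @_;
    my $saved = $slow - $fast;
    return ($saved, 100 * $saved / $slow);
}

printf "cached:  %d ms saved (%.1f%%)\n", gain($ms{cached_cgi},  $ms{cached_mod_perl});
printf "refresh: %d ms saved (%.1f%%)\n", gain($ms{refresh_cgi}, $ms{refresh_mod_perl});
```

This prints 48 ms (28.6%) for the cached case and 853 ms (54.2%) for the refresh case, which matches the figures quoted above after dropping the decimals.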

Finally, I commented out the info messages which gave another, incredible boost from 120ms to 35ms. TWiki on steroids!!!

I've attached my working version; it needs further modification on render.pl (the old view.pl), so I'll write some instructions on how to install later...

-- MichaelRausch - 04 Jun 2003

I've just seen Michael's impressive script - interesting that there's a benefit to both mod_perl and caching. In fact, various templating environments such as TemplateToolkit prefer mod_perl and implement caching, so there is a precedent for combining the two.

Which 'info messages' were you talking about? Are these where the cache script writes to a log file? If so, it's probably worth leaving them in by default; a high-volume site could take them out if needed.

-- RichardDonkin - 03 Jul 2003

In one of the first comments on this page, PeterKlausner talks about WebChanges, and how you can simply revise all your URLs to be /fresh/WebChanges. That seemed dirty to me, and a lot of work. Instead, I modified the cache.pl script to add another conditional to the checking for cache, namely to ignore the cache for WebChanges, WebIndex, and WebTopicList:

if ( ( $t_cache > $t_change )                   # cached copy is newer
and  ( $t_cache + $maxage * 3600 > time() )     # and expires in the future
and  ( $path !~ /Web(Changes|Index|Topic)/ ) )  # and is not an index lookup.

I'm getting a few Perl warnings though:

[Fri Nov 28 15:06:18 2003] view: Use of uninitialized value in numeric gt (>) at view line 40.
[Fri Nov 28 15:08:07 2003] edit: Use of uninitialized value in substitution (s///) at edit line 273.

Not sure where the second one is coming from (I've yet to reliably reproduce it), but the first can be solved relatively easily. The warning is caused when a cache file does not yet exist for the particular topic. Since that file doesn't exist, the time check will fail:

$entry = "$cache$path$sep$query";
my $t_cache = (stat "$entry")[$mtime];
my $t_change = (stat "$source")[$mtime];

and thus the conditional ends up comparing an undefined value against a timestamp:

if ( ( $t_cache > $t_change )                   # cached copy is newer

Simple solution:

my $t_cache = (stat "$entry")[$mtime] || 0;
my $t_change = (stat "$source")[$mtime] || 0;
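
Putting the two changes together, the whole check can be sketched as one self-contained function. This is only an illustration: the function name cache_is_fresh and the way the arguments are passed in are assumptions, while the variable names and the regular expression follow the snippets above.

```perl
use strict;
use warnings;

my $mtime = 9;   # index of the modification time in the list returned by stat

# Return 1 when the cached copy may be served, 0 when the page must be rendered.
# $entry: cache file, $source: topic file, $maxage: hours, $path: request path.
sub cache_is_fresh {
    my ($entry, $source, $maxage, $path) = @_;
    my $t_cache  = (stat $entry)[$mtime]  || 0;   # 0 when no cache file exists yet
    my $t_change = (stat $source)[$mtime] || 0;
    return 0 if $path =~ /Web(Changes|Index|Topic)/;   # index lookups are never cached
    return ( $t_cache > $t_change                      # cached copy is newer
         and $t_cache + $maxage * 3600 > time() )      # and expires in the future
         ? 1 : 0;
}
```

With the "|| 0" fallback a missing cache file simply counts as infinitely old, so the comparison no longer sees an undefined value.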

-- MorbusIff - 28 Nov 2003

Here is a patch for the fresh script. In my own setup the cache is not in the standard place, which is how I noticed the bug when trying to refresh a topic in the cache.

I decided to write the variable names in capitals and to protect them with "{" and "}".

Tested it with pdksh, and it worked.

24,25c24,25
< cache=/var/twiki/cache
< data=/var/twiki/data
---
> TWIKI_CACHE=/var/twiki/cache
> TWIKI_DATA=/var/twiki/data
29c29
<     if [ "$cache$PATH_INFO?$QUERY_STRING" -nt `dirname "$data$PATH_INFO"` ]
---
>     if [ "${TWIKI_CACHE}${PATH_INFO}?${QUERY_STRING}" -nt `dirname "${TWIKI_DATA}${PATH_INFO}"` ]
34,35c34,35
< /bin/rm -f "/var/twiki/cache$PATH_INFO?"* \
<          "/var/twiki/cache$PATH_INFO?$QUERY_STRING"   2>/dev/null
---
> /bin/rm -f "${TWIKI_CACHE}${PATH_INFO}?"* \
>          "${TWIKI_CACHE}${PATH_INFO}?${QUERY_STRING}" 2>/dev/null
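
For those rewriting fresh in Perl, the removal step of the patch might be sketched as follows. This is an assumption-laden illustration, not a port of the script: the name purge_topic_cache is made up, and cache entries are assumed to be stored as "topic?query" files below the cache root, as the rm lines above suggest.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);

# Delete every cached variant of one topic and return how many files went away.
# $cache_root plays the role of TWIKI_CACHE, $path_info that of PATH_INFO.
sub purge_topic_cache {
    my ($cache_root, $path_info) = @_;
    my $dir    = dirname("$cache_root$path_info");
    my $prefix = basename($path_info) . '?';
    opendir(my $dh, $dir) or return 0;        # nothing cached for this web yet
    my @hits = grep { index($_, $prefix) == 0 } readdir $dh;
    closedir $dh;
    my $removed = 0;
    for my $name (@hits) {
        $removed += unlink "$dir/$name";      # unlink returns the number deleted
    }
    return $removed;
}
```

Matching on the file name prefix avoids shell globbing, so a "?" in the entry name needs no quoting.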

-- LaurentGautrot - 13 Jan 2005

Unknown features

Workaround to force that certain pages such as WebChanges are always refreshed

  1. cd .../Cache/Main
  2. touch -m -t 200108150000 WebChanges
  3. chmod ugo-w WebChanges

This forces the Cache/Main/WebChanges page to look older than the data/Main/WebChanges.txt and makes it non-writable, thus ensuring that the page will always be refreshed.
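
On systems without touch and chmod, the same two steps could be sketched in Perl as below. The function name pin_as_stale is made up for this illustration; the date is the one used in the touch command above.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Give a cache file an old modification time and strip its write bits,
# mirroring "touch -m -t 200108150000" followed by "chmod ugo-w".
sub pin_as_stale {
    my ($file) = @_;
    my @st = stat($file);
    die "cannot stat $file: $!" unless @st;
    my $old = timelocal(0, 0, 0, 15, 7, 2001);   # 15 Aug 2001, 00:00 local time
    utime($st[8], $old, $file)                   # keep atime, change only mtime
        or die "cannot set mtime on $file: $!";
    chmod($st[2] & 07555, $file)                 # clear the u, g and o write bits
        or die "cannot chmod $file: $!";
    return 1;
}
```

Note that removing write permission does not stop a process running as root from overwriting the file.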

-- WolfgangSlany - 01 Oct 2005

Does not work with logged-in users

I just tried CacheAddOn and it is great, but I quickly noticed that all users become me! Whenever a user edited a page, the changes were attributed to my wiki user. The top left corner says Welcome and my username instead of theirs. It is not possible to log out either.

I guess the wiki session got cached in some way. How do I resolve this so I can start using CacheAddOn again?

-- FredrikLarsson - 02 Oct 2007

The whole page is cached, so everybody looking at a cached page will see the side bars of the person who was logged in at the time it was cached. I use TWiki with registered users only, so I changed my apache2 configuration to require authentication when the view script is called. On my Debian installation, I added 'view' to the FilesMatch line, so it reads

<FilesMatch "(attach|edit|manage|rename|save|upload|mail|logon|view|.*auth).*">
Now the script cache.pl has knowledge about the user. I changed it so it will create and keep cache per user.
twiki/bin$ diff -au cache.pl view
--- cache.pl    2007-12-20 22:19:52.000000000 +0100
+++ view        2007-12-22 23:20:15.916829117 +0100
@@ -23,6 +23,16 @@
 #$render =~ /(.*)/;
 #$render = $1;
 
+# if a user is logged in, use the cache directory for that user
+my $user = $ENV{'REMOTE_USER'};
+if (defined $user) {
+       $cache .= "/$user";
+       # create the directory if it does not exist
+       if (! -e $cache) {
+               mkdir $cache;
+       }
+} # end of setting up user specific $cache 
+
 my $webhome = "WebHome";
 my $maxage = 24 * 14;  # default expiration after 14 hours (days?)
This seems to work fine.
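
Since REMOTE_USER ends up in a directory name, a slightly more defensive variant may be worth considering. The following is a sketch only: the function name user_cache_dir and the set of characters accepted in a user name are assumptions, not part of the patch above.

```perl
use strict;
use warnings;

# Return the cache directory for one user, creating it on first use.
# Anonymous requests (no user name) share the top-level cache directory.
sub user_cache_dir {
    my ($cache, $user) = @_;
    return $cache unless defined $user && length $user;
    # Assumed policy: letters, digits, "_", "." and "-" only, and no leading
    # dot, so a name such as "../etc" can never leave the cache directory.
    $user =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]*)\z/
        or die "refusing unusual user name for cache path\n";
    my $dir = "$cache/$1";          # the captured $1 is also untainted
    unless (-d $dir) {
        mkdir $dir or die "cannot create $dir: $!";
    }
    return $dir;
}
```

In the patched script this would replace the if block, along the lines of $cache = user_cache_dir($cache, $ENV{'REMOTE_USER'});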

-- ArnoldAtMallos - 22 Dec 2007

You may want to have a go at my different implementation of the same idea, at PublicCacheAddOn (5 years after my comment on this topic... time flies...). The difference:

  • it has a C front end for even faster operation, with only two disk accesses (it could even be optimized to one).
  • it gets the page as TWikiGuest, so you see "neutral" pages, not the ones of the last user. But it will not cache read-protected pages.
  • it detects errors in getting pages, retries the original view script for them, and remembers not to cache them
  • it locks, so that even if you issue 100 requests for the same, yet-uncached page, [1] only one process will build the page while the others wait, and [2] no corruption due to race conditions is possible. This is really important as it means you will still be able to edit your public site, even under heavy use.
  • it has a fully automated install and uninstall
  • and, most importantly, it solves the problem of freshness in a way that is, to my knowledge, original: instead of trying to determine freshness of a cache by comparing with the source, it compares with the ... reader! It works this way: when you edit a page, you are noted as a "changer" by your IP address, and the cache will let all requests from this IP fetch uncached contents, while the rest of the world still sees the cache, so both views are consistent. After you are done editing (it waits 15 minutes after your last save), it actually "publishes" your changes by clearing the cache. This concept really solves a lot of problems that were very hard to solve when trying to determine cache freshness by comparing it to the TML source.
The only drawback is that it is Unix-only (it uses bash, sed, grep, wget, crontab jobs...). It could probably be made to work on Windows with Cygwin, though I am not sure of the performance; perhaps the C front end would help. Rewriting it in Perl may also be possible.
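
To make the "changer" idea concrete, here is a toy sketch of the bookkeeping in Perl. It is not how PublicCacheAddOn is implemented (that uses a C front end and shell tools); all names are invented, the state lives in a plain in-memory hash that would not survive across CGI processes, and waiting until every changer has been quiet before publishing is an assumption.

```perl
use strict;
use warnings;

my $QUIET_PERIOD = 15 * 60;   # seconds to wait after the last save
my %last_save;                # IP address => time of most recent save

# Record that this address has just saved a topic.
sub note_save {
    my ($ip, $now) = @_;
    $last_save{$ip} = $now;
}

# Changers bypass the cache and see live content; everyone else gets the cache.
sub serve_from_cache {
    my ($ip) = @_;
    return exists $last_save{$ip} ? 0 : 1;
}

# Return 1 once every changer has been quiet for the whole period. The caller
# would then clear the cache, which makes the edits visible to all readers.
sub publish_due {
    my ($now) = @_;
    return 0 unless %last_save;
    for my $t (values %last_save) {
        return 0 if $now - $t < $QUIET_PERIOD;
    }
    %last_save = ();          # nobody is a changer any more
    return 1;
}
```

A cron job or the front end itself would call publish_due periodically and clear the cache whenever it returns 1.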

-- ColasNahaboo - 03 Feb 2008

Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| TIESTDERR.pm.txt | r1 | 0.9 K | 2003-06-04 - 13:15 | UnknownUser | Stripped version of Tie::STDERR for cache.pl |
| cache.pl.txt | r1 | 4.8 K | 2003-06-04 - 13:12 | UnknownUser | mod_perl version; needs further cleanup! |
Topic revision: r48 - 2008-02-03 - ColasNahaboo
 
Copyright © 1999-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.