It might be worth considering a system similar to FAQ-O-MATIC (
http://www.dartmouth.edu/~jonh/ff-serve/cache/1.html
).
This would certainly help reduce system load at peak times.
--
NicholasLee - 04 May 2000
This is somewhat related to the discussions we had in
OfflineWiki, e.g. to create static html pages for offline reading.
At work we actually have this setup:
This approach works well for us: no performance issues behind the firewall (fewer than 100 people), and also no performance issues on our web site.
Page caching for the actual Wiki pages is also possible, but it is kind of complex if the pages are to be kept in sync. Regenerating all static pages on each topic save would be easy but expensive. Finding out which pages need to be updated is difficult:
- Creating a new topic that is already referenced (question-mark link). This can happen across webs.
- Variables, especially %SEARCH% and %INCLUDE%.
- Topic delete / rename (currently manual)
To me it sounds more feasible to have only dynamic topics for everyday use and to generate static pages for static views.
--
PeterThoeny - 04 May 2000
For the
WebTeach project I expect to serve several hundred hits per day, so I started considering page caching. I would limit it to the "view" action, which is the most used one.
One idea could be to write a makefile by extracting the references from the page and then use "make" to judge if the page should be rebuilt or not. Pages with "searches" would not be cached.
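A minimal sketch of that idea, assuming a per-web cache directory, a per-topic dependency fragment that a master Makefile could include, and a hypothetical renderpage command (none of these names are part of TWiki or WebTeach):

    #!/usr/bin/perl -w
    # Hypothetical sketch: extract the WikiWord references of a topic and
    # write a make rule, so that "make" can decide whether the cached HTML
    # is older than any of the topics it depends on.
    use strict;

    my ( $topic, $dataDir, $cacheDir ) = @ARGV;   # e.g. WebHome /twiki/data/Test /twiki/cache/Test

    open my $fh, '<', "$dataDir/$topic.txt" or die "cannot read $topic: $!";
    my $text = do { local $/; <$fh> };
    close $fh;

    # very rough WikiWord matcher; topics containing searches would simply not be cached
    my %refs = map { $_ => 1 } $text =~ /\b([A-Z][a-z]+[A-Z]\w*)\b/g;

    # one dependency fragment per topic; a master Makefile could just "include *.dep"
    open my $dep, '>', "$cacheDir/$topic.dep" or die "cannot write $topic.dep: $!";
    print $dep "$cacheDir/$topic.html: $dataDir/$topic.txt",
               ( map { " $dataDir/$_.txt" }
                 grep { -e "$dataDir/$_.txt" } sort keys %refs ), "\n";
    print $dep "\trenderpage $topic > $cacheDir/$topic.html\n";
    close $dep;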
Caching would be greatly simplified by using the threaded discussion method described in
WebTeach, since most comments on a given topic would be stored in a separate page instead of changing the topic contents.
This is only brainstorming, I'll start working on it in a couple of weeks. Suggestions and comments are welcome.
--
FrancoBagnoli - 30 Sep 2000
Have a look also at the way APW (by
SergioFanchiotti) does it:
- when a topic is stored, APW adds to a database (the back-link database) a record for each TWiki name present in the page.
- moreover, the cached files of all pages that reference this page are deleted (it gets them from the same database)
- the next time a view is done on this page, the rendered text is saved as a cache file
In TWiki we must also consider the presence of included files, TOC, searches, indices ...
- the first two can be handled by marking the corresponding backlinks as "transitive"
- when we compute the set of cached pages to be deleted:
- if the backlink is "normal" we stop at first level
- if the backlink is "transitive" we follow it and continue deleting cached files
- if a "transitive" follows a deleted page we follow the link and delete de following also
E.g. if we have (suppose --> is a normal backlink and ==> a transitive one):
a --> b --> c ==> d --> e
a ==> c --> f ==> h
- for a new version of "a" we should delete cache files for topics
b,c,d.
- for a new version of "c" we should delete
f,d,h
(I hope there are no mistakes ... I'm just jotting the idea down)
For links that show a more complicated behavior (e.g. searches) we just don't cache the page ...
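A small sketch of that deletion rule, assuming the back-link database is simply a hash from a topic to the topics whose cache depends on it, each link tagged normal or transitive (the data layout is my assumption, not APW's):

    use strict;
    use warnings;

    # $links{ChangedTopic}{DependentTopic} = 'normal' | 'transitive'
    my %links = (
        a => { b => 'normal',     c => 'transitive' },
        b => { c => 'normal' },
        c => { d => 'transitive', f => 'normal' },
        d => { e => 'normal' },
        f => { h => 'transitive' },
    );

    sub cacheFilesToDelete {
        my ( $changed ) = @_;
        my ( %deleted, @queue );
        # first level: every page backlinked to the changed topic loses its cache
        for my $page ( keys %{ $links{$changed} || {} } ) {
            $deleted{$page} = 1;
            push @queue, $page;
        }
        # beyond the first level, only transitive links propagate the deletion
        while ( my $page = shift @queue ) {
            for my $next ( keys %{ $links{$page} || {} } ) {
                next unless $links{$page}{$next} eq 'transitive';
                next if $deleted{$next}++;
                push @queue, $next;
            }
        }
        return sort keys %deleted;
    }

    print join( ',', cacheFilesToDelete('a') ), "\n";   # b,c,d
    print join( ',', cacheFilesToDelete('c') ), "\n";   # d,f,h

The actual invalidation would then just unlink the corresponding cache files, while holding the lock mentioned below.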
There is only one (small) problem ... the database must be locked when writing to it ...
--
AndreaSterbini - 30 Sep 2000
I'll consider this approach, but it seems to me that most of it is simply remaking make!
After a quick review of the TWiki source code, I think that one possibility could be:
- rename view to makehtml (see below)
- whenever a page (say ANewPage) is made, add an entry to the directory's Makefile like
ANewPage.html : FORCE
	makehtml ANewPage
...
FORCE : # this is a dummy entry
- the new view should simply call make and redirect to ANewPage.html
makehtml is mostly the same as view, except that it also builds a list of dependencies. These are (mostly?) collected by wiki::initialize and wiki::handleCommonTags by filling the array
@dependences with
- preference files
- templates
- the additional return values of the &handle... functions
If a &handle... function decides that the page is not cachable (for instance the search or time functions), it simply undefs
@dependences
(or uses another signal).
Finally, if
@dependences is valid, the entry in the makefile is updated. It again needs to be locked, but the operation is quite fast.
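To make this concrete, here is a rough sketch of the two halves, assuming one Makefile per web and the makehtml script described above; the subroutine names, paths and locking details are my guesses, not working TWiki code:

    use strict;
    use warnings;
    use Fcntl qw( :flock );

    # called by the new, thin "view": make sure the cache is fresh, then redirect
    sub viewCached {
        my ( $web, $topic, $cacheDir, $cacheUrl ) = @_;
        system( 'make', '-C', "$cacheDir/$web", "$topic.html" ) == 0
            or die "make failed for $web.$topic";
        print "Location: $cacheUrl/$web/$topic.html\n\n";
    }

    # called by makehtml after rendering, with the @dependences it collected
    sub updateMakefileEntry {
        my ( $web, $topic, $cacheDir, @dependences ) = @_;
        return unless @dependences;              # empty means "this page is not cachable"
        my $makefile = "$cacheDir/$web/Makefile";

        open my $mk, '+>>', $makefile or die "$makefile: $!";
        flock $mk, LOCK_EX;                      # the short critical section
        seek $mk, 0, 0;
        my ( @keep, $skipping );
        while ( my $line = <$mk> ) {
            $skipping = 1 if $line =~ /^\Q$topic\E\.html\s*:/;   # drop the old rule...
            if ($skipping) {
                $skipping = 0 if $line =~ /^\s*$/;               # ...up to its trailing blank line
                next;
            }
            push @keep, $line;
        }
        seek $mk, 0, 0;
        truncate $mk, 0;
        print $mk @keep;
        print $mk "$topic.html: ", join( ' ', @dependences ), "\n";
        print $mk "\tmakehtml $web $topic\n\n";
        close $mk;                               # releases the lock
    }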
Alternatively, one could make a subweb for each topic (i.e. a subdirectory in which to store per-page permissions, attachments,
etc.), and then one does not need any locking on the makefile.
Moreover, if one uses my threaded discussions approach (see
WebTeach) and frames to separate the topic from the list of replies, rebuilding the topic page is not needed when a comment is added.
--
FrancoBagnoli - 30 Sep 2000
There is a difference with make: make is called once for each view call, while the above is done once for each save call.
Thus, view is extremely efficient (it couldn't be more so).
A second observation: with the APW scheme, if all the cached pages backlinked to the saved topic are recomputed instead of deleted, then we can replace all
view CGI calls with direct links to the cached pages ... with the side effect that we have a site ready for offline browsing (for CD-ROM mirrors or zipped files).
--
AndreaSterbini - 01 Oct 2000
Clearly, it would be perfect if we could access static web pages, but I am afraid of the mass of computation needed when changing
WebPreferences!
I think that the makefile approach should be considered as the "most conservative" first improvement.
In any case it would be preferable to add (optionally) a
DatabaseConnection, in which to store the modification date of a topic and also its dependencies (forwards and backwards). With this help, it should be quite easy to invoke the rebuild after editing, possibly disabling the "make" invocation.
With the "make" option one can also trigger a rebuild of all pages from a cron job, so as to have an up-to-date version every day.
--
FrancoBagnoli - 02 Oct 2000
I discovered that Netscape caches gnuplot (or other dynamically generated GIF) images regardless of their URL
(i.e., if I have http://url/bin/gnuplot?plot%20sin(x)
and http://url/bin/gnuplot?plot%20cos(x) on the same page, they result in the same image!).
So I decided to generate gnuplot images statically and to assign them unique names. After starting to work on it, I discovered that I was implementing a large part of page caching, so I decided to go ahead with it.
After having studied the problem, I decided to cache only the body of the topic. This choice is based on the following considerations:
- the full page depends on the choice of template, and I want to offer users the possibility of changing the template (see NewTemplateScheme)
- references to the current user's name (e.g. Main.guest) or to the current date prevent caching; this way they are allowed only in the templates (which is the usual case)
- adding a reply (see WebTeach) would prevent caching, but again %REPLY% generally stays in the template
In practice, what I did is copy portions of the attachment management. I created a cache directory (in /home/httpd/twiki/) and recreated the web structure there, with a directory for each topic inside each web (the same layout as the pub directory; the two may eventually merge). Then:
- in wiki.cfg add the variables
$cacheUrlPath and $cacheDir
- in wiki.pm, add
$cacheDir, $cacheUrlPath and %dependences to the list of global vars.
- in wiki.pm add the functions
getCacheDir and getCacheUrlPath
- in wiki.pm, at the end of the subroutine
readFile, add the name of the file just read to the hash
%dependences (a rough sketch of this and of makemake follows the list)
- in wiki.pm, add the subroutine
makemake, which creates the cache directory if not present and creates the makefile there. If the %dependences hash contains a key "FORCE", the makefile rule forces a rebuild; otherwise it lists all files that are keys in %dependences (I use a hash instead of an array so as not to worry about multiple inclusions -- BTW, it can also be used in readFile to check for multiple inclusions and to solve the "Warning: Can't find topic" runaway problem)
- in &handleSearchWeb, $dependences{"FORCE"} is defined to prevent caching
- in &handleCommonTags, two new variables: %CACHEURL% and %CACHEURLPATH%, corresponding to $cacheDir and $cacheUrlPath (not very useful, but included for completeness)
- in view, when dealing with normal text (not an old revision), check if the makefile exists; otherwise create it by calling
wiki::makemake (with the FORCE option), and then call make. After that, the text is read from the cache and is not further processed by &handleCommonTags and &getRenderedVersion.
- &handleCommonTags and &getRenderedVersion are called by an external script, makecache, which is invoked by the makefile.
- add an alias for the cache directory to httpd.conf
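As promised above, here is a rough sketch of the two central pieces (readFile feeding %dependences, and makemake writing the per-topic makefile that invokes makecache). It is a reconstruction for illustration only, not the code in the attached archive:

    use strict;
    use warnings;

    our %dependences;      # filled while the topic is rendered
    our $cacheDir = '/home/httpd/twiki/cache';

    # every file read while rendering becomes a dependency of the cached page
    sub readFile {
        my ( $name ) = @_;
        $dependences{$name} = 1;
        open my $fh, '<', $name or return '';
        local $/;
        my $text = <$fh>;
        close $fh;
        return $text;
    }

    # write the per-topic makefile in the cache directory
    sub makemake {
        my ( $web, $topic ) = @_;
        my $dir = "$cacheDir/$web/$topic";
        mkdir "$cacheDir/$web" unless -d "$cacheDir/$web";
        mkdir $dir             unless -d $dir;

        open my $mk, '>', "$dir/Makefile" or die "cannot write Makefile: $!";
        if ( $dependences{'FORCE'} ) {
            # searches etc.: always rebuild
            print $mk "$topic.html: FORCE\n\tmakecache $web $topic\n\nFORCE:\n";
        }
        else {
            print $mk "$topic.html: ", join( ' ', sort keys %dependences ), "\n";
            print $mk "\tmakecache $web $topic\n";
        }
        close $mk;
    }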
I attach a tar+gz file containing: wikicfg.pm, wiki.pm, view, makecache. They are heavily modified from the present version of TWiki (beta), but all changes are marked by
#FB and
#/FB tags. The cache add-on is marked by
#FB cache.
I have not yet performed benchmarks, but the response time for large topics is quite good.
--
FrancoBagnoli - 05 Oct 2000
You can find the latest update for the cache plug-in in
WebTeach
--
FrancoBagnoli - 09 Oct 2000
I would've thought the easiest way of reducing
CGI load on the server would be to simply have the default viewed
version be a published version of the page...
i.e. people view a published version of the page, which includes an edit button that takes you
to the proper edit URL. They then edit the page to their heart's content, and when they click save, the published
version is autogenerated and placed in the published location on the web server.
OK, you will get results cached by web caches and also browsers, but their behaviour can very easily be tweaked.
The advantages of having an "always" publish option are pretty good IMO:
- Offline Wikis become a doddle.
- Caching is relegated to locations where it's dealt with best - browsers & caches - both of which invest heavily in doing things "properly".
- Having a read-only version of the site becomes trivial.
- Although I don't have access to the logs, I suspect that on some Wikis the majority of accesses are reads, with only occasional writes. If that's the case, then eliminating virtually all the CGI accesses for 'view' URLs would be an extremely good thing, since it has the potential to massively reduce the load. (If it's different, then knowing the profile would be especially useful in optimising this.)
And so on...
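For what it's worth, a small sketch of the tweaking mentioned above, assuming the remaining dynamic requests still go through a Perl CGI script; the concrete header values are only an example, not anything TWiki does today:

    use strict;
    use warnings;
    use CGI qw( header );

    # let browsers and proxy caches keep a rendered view for a while
    print header(
        -type          => 'text/html',
        -expires       => '+10m',                  # Expires: ten minutes from now
        -Cache_Control => 'public, max-age=600',   # same policy for HTTP/1.1 caches
    );
    print "<html><body>...rendered topic...</body></html>\n";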
--
TWikiGuest - 11 Oct 2000
This is an option, and I originally started by generating static HTML pages (and redirecting to them after generation).
However, it has some drawbacks:
- page reload is not automatic (you have to "reload" a page to check for changes -- and this does not work with embedded images; at least with Netscape you have to shift-reload each image)
- the page is the same regardless of the user (i.e., no multilingual templates, no variables that depend on who is looking, etc.).
After considering that, I decided to cache only the body part, which is better than nothing. I could easily cache the whole page; I just have to experiment with "meta" instructions to avoid the first drawback. Suggestions?
--
FrancoBagnoli - 11 Oct 2000
Why caching? Just partially compile the page by finding all patterns such as
%variable-name%,
WikiName, tables, lists, fonts etc., so that you end up with a program which just concatenates some constants and function calls. That should speed TWiki up.
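A tiny sketch of what such a partial compilation could look like, with a deliberately small and hypothetical set of variables; it only shows the split into constants and function calls, not real TWiki rendering:

    use strict;
    use warnings;

    my %handlers = (
        TOPIC    => sub { $_[0]->{topic} },
        WIKINAME => sub { $_[0]->{user} },
        DATE     => sub { scalar localtime },
    );

    # split the raw text once into literal chunks and per-request function calls
    sub compilePage {
        my ( $raw ) = @_;
        my @pieces;
        while ( $raw =~ /\G(.*?)%(\w+)%/gsc ) {
            my ( $literal, $var ) = ( $1, $2 );
            push @pieces, sub { $literal };
            push @pieces, $handlers{$var} || sub { "%$var%" };   # unknown variables pass through
        }
        if ( $raw =~ /\G(.*)/gs ) {
            my $tail = $1;
            push @pieces, sub { $tail };
        }
        return \@pieces;       # store this instead of re-parsing the topic on every view
    }

    # a view is then just a concatenation
    sub renderPage {
        my ( $compiled, $ctx ) = @_;
        return join '', map { $_->($ctx) } @$compiled;
    }

    my $compiled = compilePage("Hello %WIKINAME%, this is %TOPIC% rendered on %DATE%.");
    print renderPage( $compiled, { user => 'Main.guest', topic => 'PageCaching' } ), "\n";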
If it is still too slow, consider using
OpenLink Virtuoso (
http://www.openlinksw.com
) for your web publishing. If you only need a small number of RDBMS connections, it's free. I use it, e.g., to publish the whole ODP database from
http://www.dmoz.org
on our intranet.
--
IvAn - 16 Oct 2000