Feature Proposal: Control of proxy and browser caching through HTTP headers
Motivation
For search engines, it is very useful to be able to search for topics modified in a given period; for heavy-traffic sites, it is useful to make the browser reuse a page rather than request it again on each view.
Description
The patch makes TWiki emit a Last-Modified date (LMD for short), taken from the metadata date of the topic, and (optionally) an expiration date set N seconds (default 60) in the future, controlled by a new variable in TWiki.cfg, $expirationLastDateOffset.
Dynamic TWiki constructs (like %SEARCH) can force the Last-Modified date to be NOW.
--
ColasNahaboo - 19 Aug 2005
Impact and Available Solutions
Note: Patch is attached as
https://twiki.org/p/pub/Codev/CacheControlHeaders/LastModifiedDate-HttpHeader.diff. The patch is against the TWiki Cairo release.
Documentation
(patches may have some offset lines)
- Apply patches in order:
- in your bin/ directory, do a:
sed -e "s/force = ''/force = 1/" <view >viewf;chmod a+x viewf
ln -s viewf viewauthf
- in
bin/.htaccess, give viewauthf the same protections as viewauth, i.e. write:
<Files "viewauthf">
require valid-user
</Files>
- if you have special protections for your
view script in bin/.htaccess (normal installations do not), give viewf the same rights/protections as view
- (optional) try to make
editf as non-cacheable as possible, in bin/.htaccess
<Files "editf">
ExpiresDefault "access"
Header set Cache-control max-age=0,no-cache,no-store,must-revalidate
</Files>
(first line needs apache module expires_module, second headers_module)
Be sure to clear any value of HTTP_EQUIV_ON_VIEW in your global TWiki.TWikiPreferences,
otherwise the patch will have no effect on expiration dates
* Set HTTP_EQUIV_ON_VIEW =
You can now set
$expirationLastDateOffset in
lib/TWiki.cfg to the value N you want. On sites heavily edited by multiple authors and with no performance problems, set it to 0. On internet sites
with performance problems, where pages are edited by people aware that they may have to hit reload to see
the new versions, you can set it to 3600 (one hour). Values from 60 (the default) to 600 should be a good middle ground. What this means is that users returning to a page they visited less than N seconds ago will see the old version without the server being asked to generate a new one, except for dynamic pages with %SEARCH in them, which will always be recomputed.
Pages will also now be correctly dated, so that Google-like search engines will be able to provide
a more accurate search.
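As a rough illustration of the behaviour described above, here is a minimal sketch of how the two headers could be derived from the topic date and the offset. It is written in Python for brevity and is not the patch's Perl code; the function name cache_headers and its parameters are hypothetical.

```python
import time
from email.utils import formatdate


def cache_headers(topic_mtime, offset_seconds=60, now=None):
    """Build Last-Modified / Expires headers for a view response.

    topic_mtime    -- topic metadata date, in seconds since 01 Jan 1970
    offset_seconds -- plays the role of $expirationLastDateOffset (N)
    now            -- current time; injectable so the function is testable
    """
    if now is None:
        now = time.time()
    return {
        # HTTP dates are always expressed in GMT.
        "Last-Modified": formatdate(topic_mtime, usegmt=True),
        # With an offset of 0 the page expires immediately.
        "Expires": formatdate(now + offset_seconds, usegmt=True),
    }
```

With N = 0 the Expires value equals the time of the request, so a well-behaved cache treats the page as stale right away, which matches the recommendation for heavily edited sites.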
Examples
Implementation
TWiki::writeHeader and TWiki::writeHeaderFull gain a new optional parameter, $lastModifiedDate,
giving the date in unix time. The view script takes the date found in the parsed meta value and passes it to the header generation code.
Dynamic constructs should be modified to include the lines:
# set last-modified date in http headers to now
$TWiki::UI::View::lastModifiedDate = '';
Version 1.1 of this patch does this for %SEARCH, but it should be done for other constructs too.
TODO: put the above lines in all the %-constructs generating dynamic content.
Note that you can also put a date (number of seconds since 01 Jan 1970) in it if you know precisely
the "freshness" date of the generated content.
%INCLUDE has been modified in version 1.2 of the patch to set $TWiki::UI::View::lastModifiedDate
to the most recent of the dates of the included topics and the including one.
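The rule described above (dynamic constructs force "now", %INCLUDE takes the most recent date) can be sketched as follows. This is an illustrative Python sketch, not the patch's code; effective_last_modified and its arguments are hypothetical names.

```python
import time


def effective_last_modified(topic_date, included_dates=(), dynamic=False, now=None):
    """Pick the date to report as Last-Modified for a rendered page.

    topic_date     -- date of the viewed topic (seconds since 01 Jan 1970)
    included_dates -- dates of topics pulled in through %INCLUDE
    dynamic        -- True if the page contains a dynamic construct
                      such as %SEARCH
    """
    if dynamic:
        # Dynamic content is considered modified at rendering time.
        return int(time.time() if now is None else now)
    # Otherwise the page is as fresh as its most recently changed part.
    return max([topic_date, *included_dates])
```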
The problem now is that when one edits a topic, after saving the topic appears not to have changed:
the browser believes, based on the previous LMD, that the topic is unchanged and
does not refetch it. The solution I have found is to make
all TWiki-generated redirects to view of topics
redirect to a new script,
viewf (f for force), that emits the same topic as view, but
with an LMD of now and an expiration date of now as well. This is controlled by a new request-global variable,
$viewForce, which the
view script sets to '' and the
viewf script sets to 1.
Also, for view-protected pages, we need to take into account the view/viewauth antics. The simplest way I have found was to:
- create a
viewauthf script which is to viewf what viewauth is to view
- exclude the view=>viewf conversion in TWiki redirects if we see /viewauth in the url
But that in turn means that the trick of adding the server's view time to the edit URL no longer works, since the view page could be reused many times, making the user edit a stale, previously edited version fetched from the browser cache. We need to make the edit URL, in skins or as generated by the engine, call editf, a non-cacheable page that
in turn redirects to the edit page with a time parameter computed when the user clicks edit, not when the page was viewed. This redirected edit page, by contrast, will be
very cacheable, to avoid losing edits under IE when going back/forward in the browser.
Discussion:
Tags to be modified to change the LMD: (non-exhaustive)
-
%TOPICLIST%
-
%WEBLIST%
-
%DATE%
-
%GMTIME%
-
%SERVERTIME%
-
%DISPLAYTIME%
--
ColasNahaboo - 22 Aug 2005
Note: if you took the patch on Aug 22, please apply the 3rd one, create the
viewauthf script, and add its entry in your
bin/.htaccess
--
ColasNahaboo - 23 Aug 2005
If you want to force in all cases the expiration date to be immediate, I recommend also putting in the VIEW template for your skin the html meta tags in the head:
<meta http-equiv="expires" content="-1">
<meta http-equiv="pragma" content="no-cache">
<meta http-equiv="cache-control" content="no-cache">
Firefox especially seems not to honour the expiration date when a last-modified date is present.
--
ColasNahaboo - 31 Aug 2005
I was forced to abandon the
bin/viewf solution, as it could only work if browsers always obeyed the expiration date, which is not the case.

Instead I resorted to redirecting to a URL with ?t=number appended to it (or &t= if there are already parameters). The code does this only for
bin/view* URLs (thus also for viewauth), and does not re-add the parameter if it is already there. Implementation:
- You can remove everything about viewf which is not used anymore
- apply the patch after all the other ones above: ForceViewReloadFromRedirects-4.diff: addendum to the above: no more /viewf, but ?t=xxx
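The redirect rule described above can be sketched like this. It is a Python illustration under stated assumptions, not the code in the attached diff: the function name add_reload_token is hypothetical, and matching on "/bin/view" in the path is my reading of "bin/view* URLs".

```python
import time
from urllib.parse import parse_qs, urlsplit, urlunsplit


def add_reload_token(url, now=None):
    """Append t=<hex time> to view URLs so the browser refetches the page."""
    parts = urlsplit(url)
    # Only touch view-style scripts (view, viewauth, ...).
    if "/bin/view" not in parts.path:
        return url
    # Do not re-add the parameter if it is already there.
    if "t" in parse_qs(parts.query, keep_blank_values=True):
        return url
    stamp = int(time.time() if now is None else now)
    token = f"t={stamp:x}"
    # "?" is used when there is no query yet, "&" otherwise.
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))
```

Going through urlsplit/urlunsplit keeps any #anchor at the end of the URL instead of appending the parameter after it.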
--
ColasNahaboo - 05 Sep 2005
Variant: if you want to limit the argument "t" to at most 4 hex digits (the value then wraps roughly every 18 hours, since 0xffff seconds is 65535 s), replace the 2 lines in TWiki.pm:
$url .= sprintf("&t=%x" ,time());
by:
$url .= sprintf("&t=%x" ,time() % 0xffff);
--
ColasNahaboo - 06 Sep 2005
We use &t=%GMTIME{"$epoch"}% on
DevelopBranch.
--
CrawfordCurrie - 06 Sep 2005
Patch to avoid adding ?t=xxx after save when $expirationLastDateOffset is 0 (it is not needed in this case).
--
ColasNahaboo - 30 Sep 2005
Colas, the HTML meta tags you gave to stop caching are not the accepted way of handling this. The correct way is to add HTTP headers alongside the date header discussed here.
The three HTTP headers to add to stop caching are (CASE is significant):
Cache-Control: no-cache
Expires: Wed, 28 Dec 2005 18:53:15 GMT
Pragma: no-cache
Obviously, the Expires date should be set to a time before
right now. Also, busting the cache has implications for the "back" button in the browser.
Managing these HTTP headers is the only universal way to control browser and proxy caching. Most proxy servers will ignore the HTML meta tags.
BTW, if TWiki sends an HTTP "Last-Modified" header in its response, subsequent browser requests will include an HTTP "If-Modified-Since" request header, which TWiki could use to increase performance by sending a "304 Not Modified" without a response body where appropriate.
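A minimal sketch of the "304 Not Modified" idea, in Python rather than TWiki's Perl. The function name respond and its return shape are hypothetical; only the header names and the status code come from HTTP itself.

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def respond(topic_mtime, if_modified_since=None):
    """Return (status, headers) for a view request.

    topic_mtime       -- topic date in seconds since 01 Jan 1970
    if_modified_since -- raw If-Modified-Since request header, if any
    """
    headers = {"Last-Modified": formatdate(topic_mtime, usegmt=True)}
    if if_modified_since:
        try:
            dt = parsedate_to_datetime(if_modified_since)
            if dt.tzinfo is None:
                # Treat dates without a usable zone as GMT.
                dt = dt.replace(tzinfo=timezone.utc)
            since = dt.timestamp()
        except (TypeError, ValueError):
            since = None  # unparsable header: fall back to a full response
        if since is not None and int(topic_mtime) <= since:
            # The browser's copy is still current: no body needed.
            return 304, headers
    return 200, headers
```

A page containing dynamic constructs would have to report "now" as its date, so it would never qualify for the 304 path in this sketch.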
--
TomKagan - 28 Dec 2005
Just saw this page, which I had been ignoring due to the uninformative title (HTTP dates are a tiny detail of
CacheControlHeaders, which is how I'm renaming this page...)
I researched this a lot a few years back, and implemented two key cache-related bug fixes relating to page editing. Some of the existing TWiki cache coding is required to work around issues with
InternetExplorer and
OperaBrowser, and the one thing I know is that caching is very hard to get right, and very dependent on bugs in proxy caches and (particularly) browsers. The following pages have some useful information and discussion:
- BrowserAndProxyCacheControl - overview and links to research
- BackFromPreviewLosesText - major issue with IE 5 and 6, in which the Back key causes you to lose text. May be less of an issue now that Preview is not mandatory, but I believe that in a very few cases BackFromPreviewStillLosesText when doing Preview and Back. [I think Colas' code addresses this]
- RefreshEditPage - fix for Opera's aggressive caching behaviour, which caused 2nd edit of specific page in a session to fail to retrieve latest page state. [Colas' suggestion of moving the edit URL suffix generation to the client is a good one, where JavaScript is enabled, but you can't rely on that always being the case.]
- ViewAfterSaveCachesOldPage - Colas' code may or may not fix this.
-
- This page has a good discussion between me, Colas and AndrewMoise that outlined an approach to configurable cache control to suit different types of TWiki deployments and user bases.
I agree that
HTML meta tags are not useful to stop caching, and should be avoided.
Getting caching to work better is a hard problem, and a suitably configurable solution is needed to address different scenarios such as:
- Small workgroup in single office that edits TWiki pages many times per hour - virtually no caching is acceptable here.
- Large corporation with many users across timezones and slow wide-area-network (WAN) links - proxies are deployed widely and important to get good performance, so a few hours' cache expiry is OK on many pages, but may need to vary across webs or TWiki sites. Not controlling proxy caching may result in overly stale pages being served.
Proper control of caching is greatly complicated by features such as embedded searches (
FormattedSearch) and per-user skins. The cache plugin work may have some good discussion here as well.
Given that this work started in the summer, I'd be astonished if it has correctly addressed all the issues (e.g. re Tom's comments on breaking the Back button). Even if it were bug-free with respect to common browsers and proxy cache software, I don't think that it is sufficiently configurable.
We
should not put this into DakarRelease unless we want to delay Dakar for long enough to get this feature fully baked and tested in a lot of different environments.
--
RichardDonkin - 30 Dec 2005
Interesting approach to improve performance. However, we should not take this into
DakarRelease since it is in code freeze.
--
PeterThoeny - 30 Dec 2005
Richard, good point about the embedded searches and per-user skins. What can help in this case is the HTTP "Etag:" header, and possibly a "Vary:" header marking the Etag.
--
TomKagan - 30 Dec 2005
I suggest we change the proposed release for this feature to
EdinburghRelease, to avoid delaying Dakar.
It's also worth noting that Firefox 1.5 has new back button behaviour compared with Firefox 1.0.x and presumably most other Mozilla/Gecko based browsers: it is now much more aggressive, like Opera, though it remains to be seen if it has the same
RefreshEditPage behaviour.
There have been several people working on server-side caching for TWiki, and much investigation of caching issues that is relevant to caching dynamic TWiki pages:
- CacheAddOn - used quite a lot
- TWikiCacheAddOn - re-implementation, can do per-user caching to handle per-user skins, and was apparently used to great effect at TWiki.org while running on slow hardware.
- CacheChooserAddOn - more controllable by users, may be suitable for technical user base
Also, the TWiki built-in plugin,
VarCachePlugin, caches the results of evaluating variables (e.g. searches), but not the actual web page resulting from
view. This plugin has been used to
CacheWebRssFeedForSpeed and for
TWikiOrgTopicCaching (of
WebIndex pages, etc).
VarCachePlugin caches at an intermediate stage that's not directly relevant to cache control headers but could be very helpful in figuring out how to generate correct ETags that are used in such headers.
The add-on authors have some useful experience of caching dynamic TWiki pages, and one comment identifies the huge range of potential dependencies (plugins, searches, embedded TWiki variables, etc.) and changes (e.g. renames) that can invalidate a cache entry (or set of entries). There's also a huge list of cache-related pages at
CacheAddOnDev.
The dependency tracking issue (i.e. when to invalidate a cache entry, or to not cache in the first place) may well be too hard to solve completely in the short term, and is more of a server-side problem perhaps, but it could be important to use the ETag as mentioned to distinguish between pages that are truly dynamic or per-user, and those that are cacheable across users or even for a single user. The trouble is that TWiki is so dynamic in its use of
TWikiVariables that it might be necessary to only create a 'static page' ETag when the TWiki code is fairly sure the page is static.
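To make the ETag idea concrete, here is a rough Python sketch of one possible scheme. Everything in it is an assumption for illustration: the function name make_etag, the choice of inputs (web, topic, revision, skin, user), and the is_static flag standing in for whatever check would decide that a page has no dynamic content.

```python
import hashlib


def make_etag(web, topic, revision, skin="", user="", is_static=True):
    """Return a quoted ETag value, or None if the page should not get one."""
    if not is_static:
        # Only tag pages the code is fairly sure are static.
        return None
    # Anything that changes the rendered page must be part of the key;
    # pass user="" for pages that render identically for every user.
    key = "\x00".join([web, topic, str(revision), skin, user])
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]
    return f'"{digest}"'
```

Including the skin and user in the key is what lets a cache distinguish per-user renderings from pages that are shareable across users.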
--
RichardDonkin - 02 Jan 2006
I agree that this is too early for Dakar. We have been running with the above modification at ILOG for some time, but we found that we could not get satisfactory behavior with caching, as it will always trap users in some bad cases (editing a previous version, saving and not seeing changes).
So we enabled the above code
only when queried by our search engine (
http://www.aspseek.org/
), a nice free google clone.
I now think that we should not use any HTTP caching, which is too hard to get right across browsers and proxies, but aim for server-side caching (generating pre-computed pages). But this definitely needs more experience.
--
ColasNahaboo - 04 Jan 2006
It's hard to get HTTP caching right, but in some companies proxy caches are mandatory for intranet performance as well as the Web, and of course virtually every Web user has caching enabled in their browser.
So I think it's worth the effort to come up with a caching approach that uses HTTP headers, if only to enable the View page expiry time to be configured (could be left as 'expire now' by default).
However, it probably makes sense to push ahead first with server-side caching as this is better at handling dependencies such as dynamic variables. It only addresses the CPU overhead of TWiki, but in most cases that will be the primary cause of a slowdown.
One simple but important point for server-side caching is to change skins to include (A) time of cached copy and (B) a Refresh button. Point (A) is also important for proxy/browser caching - a suitable
TWikiVariable should be enough.
--
RichardDonkin - 04 Jan 2006