Cache WebRss Feeds
A look at WebStatistics shows one big CPU hog: getting one WebRss involves 2 formatted searches, so it is a very expensive operation.
It is pulled more often than the next 10 most popular topics together.
So I dare to propose: replace WebRss with a static entry declaring it is (temporarily) out of service.
BTW, does anybody really read the feed? Any reports on TWikiSyndication actually working properly on a reader?
--
PeterKlausner - 25 Sep 2003
12416 hits in 25 days is 496 hits per day, an average of 20 hits per hour. I wonder if this would cause a hog. I do read the feed besides .changes, although I don't rely on it (sometimes already-read topic links that have been updated are not placed at the top of the list; that must be a bug in my NetNewsWire).
Correction: I read the topic titles in the newsfeed reader, not the summaries, because these never change (they only take the first x characters), so they are of no use when following a topic discussion. It would be more useful to use the diff output somehow.
--
ArthurClemens - 25 Sep 2003
Granted, it's not WebRss.
Arthur's math is right; WebRss cannot possibly cause the slowdown, so turning it off doesn't buy anything.
Once it works for me, I would really like to use TWikiSyndication.
--
PeterKlausner - 25 Sep 2003
Nevertheless, it would pay off to cache WebRss. This could be implemented relatively easily with a WebRssCachePlugin. Spec:
- WebRss contains just a %WEBRSSFEED% tag
- The Plugin gets active with this tag:
  - Create a cache of the RSS feed; refresh criteria is the latest entry in data/Web/.changes
  - Store the cache in the attachments directory of WebRss as pub/Web/WebRss/_cache.txt
  - Return cache data
This would solve two issues: Server load and speed of RSS feed.
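The spec above could be sketched roughly as follows. This is a Python illustration only, not plugin code: `cached_feed` and `render_feed` are hypothetical names, and the file modification time of `.changes` is assumed to stand in for "the latest entry" as the refresh criterion.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def cached_feed(changes_file: Path, cache_file: Path,
                render_feed: Callable[[], str]) -> str:
    """Return the feed, re-rendering only if .changes is newer than the cache."""
    cache_is_fresh = (
        cache_file.exists()
        and cache_file.stat().st_mtime >= changes_file.stat().st_mtime
    )
    if cache_is_fresh:
        return cache_file.read_text(encoding="utf-8")

    feed = render_feed()  # the expensive part: the formatted searches
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file and rename it, so that a concurrent
    # reader never sees a half-written cache.
    fd, tmp_name = tempfile.mkstemp(dir=cache_file.parent)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(feed)
    os.replace(tmp_name, cache_file)
    return feed
```

With this shape the expensive searches run at most once per topic change in the web, no matter how often the feed is pulled.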
--
PeterThoeny - 26 Sep 2003
Related to this: I was wondering if we shouldn't consider caching SEARCHes too. If the same SEARCH query happens twice and there have been no topic changes, is there any way at all that the answer could be different?
--
SvenDowideit - 16 Jan 2004
Wouldn't that mean that when one topic changes, all cached searches are invalidated? I think that in almost all cases a topic will be changed before the same search query is entered.
--
ArthurClemens - 16 Jan 2004
However, a date-time could be kept for each cached search, and then when one is re-requested a check is performed to see if the search pattern matches any topics that have changed since that time. If it doesn't, then the date-time for the cached search is changed to the current time and it is re-used; if there are topics that have changed which match the search pattern, then it must be re-calculated.
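A rough Python sketch of that idea, under stated assumptions: `SearchCache` and its in-memory `topics` store are hypothetical, and searches are plain regular expressions over topic text. The sketch also recomputes when a changed topic was part of the cached result, since an edit may have removed the match; that extra check is not part of the proposal above. Deleted topics are not handled.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CachedSearch:
    result: List[str]   # names of the matching topics
    checked_at: float   # time the result was last known to be valid


class SearchCache:
    def __init__(self, topics: Dict[str, Tuple[str, float]]):
        # Hypothetical store: topic name -> (text, last-modified time)
        self.topics = topics
        self.cache: Dict[str, CachedSearch] = {}
        self.full_searches = 0  # counts the expensive recalculations

    def search(self, pattern: str, now: float) -> List[str]:
        regex = re.compile(pattern)
        entry = self.cache.get(pattern)
        if entry is not None:
            # Only look at topics changed since the last validity check.
            affected = any(
                name in entry.result or regex.search(text)
                for name, (text, mtime) in self.topics.items()
                if mtime > entry.checked_at
            )
            if not affected:
                entry.checked_at = now  # still valid: move the date-time forward
                return entry.result
        self.full_searches += 1
        result = sorted(
            name for name, (text, _) in self.topics.items() if regex.search(text)
        )
        self.cache[pattern] = CachedSearch(result, now)
        return result
```

The validity check still has to match the pattern against every changed topic, which is the extra work the cache would pay on each hit.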
--
SamHasler - 16 Jan 2004
Yes, any edit will invalidate the cache. But I suspect that there are a large number of repeated SEARCHes in between each edit.
WebRss is just a single case of this. I wouldn't bother doing extra work to test for the validity of the cache, as you're reducing the speed of the cache.
But if the work is done, it will be good to test the idea.
--
SvenDowideit - 17 Jan 2004
The Codev web's WebRss feed has on average 3 accesses per minute. Caching the feed helps reduce the load on the server.
I just created a cached WebRssTest feed based on the VarCachePlugin. It does not need any parameters; the SKIN = rss setting and the cache settings are hidden in HTML comments. This should work, since XML allows comments. Could you test it out and report any feedback here? If successful I will enable it on all TWiki.org webs to reduce the load on the server.
What is a reasonable cache time? For now I set it to 0.1 hours (6 minutes).
--
PeterThoeny - 19 Sep 2005
Well, set it to 30 minutes. That's what slashdot does. Edit traffic on twiki.org is too low to fear not being up to date, and there's no time-critical mission. Btw, if you pull slashdot's RSS more often than that too many times, you get blacklisted for 72 hours (no hint). They too fight high traffic generated by RSS requests.
--
MichaelDaum - 20 Sep 2005
It works brilliantly here, with Mozilla Firefox. 30 minutes sounds a bit on the high side to me; at times you're involved in a topic that evolves like a discussion, and then 30 minutes becomes dreadful. Personally I like the 6 minutes better, especially as an alternative to producing a "personal" RSS feed in the sandbox or refreshing topics to look for updates.
If the 6-minute cache time is enough to help the server, I vote we leave the setting there.
--
SteffenPoulsen - 20 Sep 2005
OK, I enabled caching of the RSS feeds for the Codev, Main, Plugins, Sandbox, Support and TWiki webs. Caching is done for 15 minutes max. To use caching, the WebRss topic must be called without any URL parameter (the VarCachePlugin does not cache topics if there are parameters).
Appeal to all folks using RSS feeds on TWiki.org: Please help reduce the load on the TWiki.org server by removing the ?skin=rss parameter or ?skin=rss&contenttype=text/xml parameter from TWiki.org's RSS URLs. That is, in your news reader specify
http://twiki.org/cgi-bin/view/Codev/WebRss
instead of
http://twiki.org/cgi-bin/view/Codev/WebRss?skin=rss
--
PeterThoeny - 23 Sep 2005
Is mod_rewrite installed on the server? This could all be handled server-side.
--
WillNorris - 23 Sep 2005
Good point. Let's ask Sven.
--
PeterThoeny - 23 Sep 2005
A handy mod_rewrite cheat sheet
--
WillNorris - 23 Sep 2005
Thanks a bunch! Some RSS requests are now accessed without a parameter. Until yesterday, the top command showed 0% CPU idle most of the time during daytime in the USA. Now it fluctuates between a few percent and 90%, with a guesstimated average of 30%. We can improve that further if more folks remove the parameter from the RSS feeds.
--
PeterThoeny - 23 Sep 2005
I celebrated too early; we are now solid at 0% idle again. The current high traffic is mainly due to spiders, a large part of it from one IP address (66.249.66.98) of Google. This IP address accessed 1562 topics in the last 60 minutes (vs. 698 WebRss requests). This looks like a misconfigured spider. I filed a request to reduce the hit rate.
--
PeterThoeny - 23 Sep 2005
And we currently have many new registrations due to a Freshman Academy Orientation of Western Oregon University. Lately we have had on average around 25 new registrations a day; today there are already over 70.
--
PeterThoeny - 23 Sep 2005
This table indicates the total percentage of WebRss requests with parameters removed:
| Date | Percent |
| 2005-09-24 | 10% |
| 2005-09-25 | 17% |
| 2005-09-28 | 23% |
| 2005-10-01 | 29% |
| 2005-10-05 | 32% |
| 2005-10-13 | 41% |
| 2005-10-24 | 51% |
| 2005-10-31 | 54% |
| 2005-12-19 | 74% |
--
PeterThoeny - 25 Sep 2005
To force all old-style RSS requests to just WebRss, couldn't .htaccess be leveraged? E.g.
Alias /blahblah/WebRss?skin=rss /blahblah/WebRss ?
--
MattWilkie - 26 Sep 2005
Not generically, since RSS requests with a search parameter should be retained. See TWiki.WebRssBase
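The distinction can be sketched in Python (standard library only). `cacheable_feed_url` and `DROPPABLE` are hypothetical names; the only parameters treated as droppable are the two mentioned earlier in this thread, and any other parameter is assumed to change the feed content.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit, urlunsplit

# Parameters that only select the RSS rendering and are safe to drop.
DROPPABLE = {"skin": "rss", "contenttype": "text/xml"}


def cacheable_feed_url(url: str) -> Optional[str]:
    """Return the parameter-free WebRss URL, or None if the request
    has to be served as it is."""
    parts = urlsplit(url)
    if not parts.path.endswith("/WebRss"):
        return None
    params = parse_qs(parts.query, keep_blank_values=True)
    for name, values in params.items():
        if name not in DROPPABLE or any(v != DROPPABLE[name] for v in values):
            return None  # e.g. a search parameter narrows the feed: keep it
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

A request with `?skin=rss&search=...` comes back as `None` and stays untouched, while `?skin=rss` alone maps to the plain WebRss URL.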
--
PeterThoeny - 28 Sep 2005
How many people actually don't read the output of their aggregator?
--
MartinCleaver - 06 Oct 2005
I do not know. A lot do not seem to read them. The percentage in the table above climbs steadily though. It won't get near 100% since some people use the search parameter to narrow down a feed.
--
PeterThoeny - 07 Oct 2005
One third of the topic views on TWiki.org is caused by RSS feeds (151K of a total of 452K views in the last week). The majority of that is already cached with the VarCachePlugin. Nevertheless, the CPU has to work a lot for those feeds since each one is still a topic view. I added a new caching mechanism for the Codev, Main, Plugins, Sandbox, Support and TWiki webs: RSS feeds are now cached and served as static HTML pages. This is done transparently with an Apache rewrite rule, e.g. if you access:
https://twiki.org/cgi-bin/view/Codev/WebRss
you will be served with:
http://twiki.org/feeds/CodevWebRss.xml
The HTML files are updated once every 15 minutes. If you prefer to access the static HTML files without rewrite, here are the URLs:
This caching should make TWiki.org more responsive. It will affect the TWikiOrgStatistics since RSS feed requests without parameters are no longer in the TWiki logs.
I summarized the setup in HowToCacheRssFeedsWithRewriteRule.
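For illustration, the mapping that the rewrite rule performs can be expressed as a small Python function. This is a sketch, not the actual Apache configuration: `static_feed_path` is a hypothetical name, and the `<Web>WebRss.xml` file naming is assumed from the single Codev example above.

```python
import re
from typing import Optional

# Webs with a statically cached feed, as listed in this post.
CACHED_WEBS = {"Codev", "Main", "Plugins", "Sandbox", "Support", "TWiki"}
FEED_PATH = re.compile(r"^/cgi-bin/view/(\w+)/WebRss$")


def static_feed_path(path: str, query: str = "") -> Optional[str]:
    """Return the static file that serves this request, or None if the
    request must go through the normal TWiki view script."""
    if query:
        return None  # requests with parameters are not served from the cache
    match = FEED_PATH.match(path)
    if match is None or match.group(1) not in CACHED_WEBS:
        return None
    return f"/feeds/{match.group(1)}WebRss.xml"
```

For example, `static_feed_path("/cgi-bin/view/Codev/WebRss")` gives `/feeds/CodevWebRss.xml`, while the same path with a `skin=rss` query still falls through to the view script.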
--
PeterThoeny - 09 Mar 2006