Cache WebRss Feeds

A look at WebStatistics shows one big CPU hog:

Hits   Topic
12416  WebRss
 2484  WebHome
 1395  WebChanges
 1059  CairoRelease

Fetching WebRss once involves two formatted searches, so it is a very expensive operation. It is pulled more often than the next 10 most popular topics together.

So I dare to propose: replace WebRss with a static entry declaring it is (temporarily) out of service.

BTW, does anybody really read the feed? Any reports of TWikiSyndication actually working properly in a reader?

-- PeterKlausner 25 Sep 03

12416 hits in 25 days is about 497 hits per day, or roughly 21 hits per hour. I doubt that this would hog the CPU. I do read the feed in addition to .changes, although I don't rely on it (sometimes updated topics that I have already read are not placed at the top of the list; that must be a bug in my NetNewsWire).

Correction: I read the topic titles in the newsfeed reader, not the summaries, because the summaries never change (they only take the first x characters of the topic), so they are of no use when following a topic discussion. It would be more useful to use the diff output somehow.

-- ArthurClemens - 25 Sep 2003

Granted, it's not WebRss

Arthur's math is right; WebRss cannot possibly cause the slowdown, so turning it off buys nothing. Once it works for me, I would really like to use TWikiSyndication.

-- PeterKlausner - 25 Sep 2003

Nevertheless, it would pay off to cache WebRss. This could be implemented relatively easily with a WebRssCachePlugin. Spec:

  • WebRss contains just a %WEBRSSFEED% tag
  • The plugin becomes active when it encounters this tag:
    • Create a cache of the RSS feed; the refresh criterion is the latest entry in data/Web/.changes
    • Store the cache in the attachments directory of WebRss as pub/Web/WebRss/_cache.txt
    • Return cache data

This would solve two issues: server load and the speed of the RSS feed.
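
The spec above could be sketched roughly as follows. This is Python rather than a real TWiki plugin, and it only illustrates the refresh criterion. The function names, the `render` callback and the extra `.stamp` file that remembers which `.changes` entry the cache was built from are all assumptions made for the illustration, not part of the proposal.

```python
from pathlib import Path
from typing import Callable


def latest_change(changes_file: Path) -> str:
    """Return the last non-empty line of the .changes file, or "" if there is none."""
    if not changes_file.exists():
        return ""
    lines = [ln for ln in changes_file.read_text().splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def cached_feed(changes_file: Path, cache_file: Path, render: Callable[[], str]) -> str:
    """Serve the feed from cache_file unless .changes has a newer latest entry.

    render() stands in for the expensive formatted searches (hypothetical callback).
    """
    stamp = latest_change(changes_file)
    # Assumption: the entry the cache was built from is kept next to the cache.
    stamp_file = cache_file.with_suffix(".stamp")
    if cache_file.exists() and stamp_file.exists() and stamp_file.read_text() == stamp:
        return cache_file.read_text()  # nothing changed: return cache data
    feed = render()  # something changed: rebuild the feed once
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(feed)
    stamp_file.write_text(stamp)
    return feed
```

With this shape, the expensive searches run once per edit in the web rather than once per feed request.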

-- PeterThoeny - 26 Sep 2003

Related to this: I was wondering if we shouldn't consider caching SEARCHes too. If the same SEARCH query is run twice and there have been no topic changes in between, is there any way at all that the answer could be different?

-- SvenDowideit - 16 Jan 2004

Wouldn't that mean that when one topic changes, all cached searches are invalidated? I think that in almost all cases a topic will be changed before the same search query is entered.

-- ArthurClemens - 16 Jan 2004

However, a date-time could be kept for each cached search. When a search is re-requested, a check is performed to see whether the search pattern matches any topics that have changed since that time. If it doesn't, the date-time of the cached search is changed to the current time and the cached result is re-used; if there are changed topics that match the search pattern, the search must be re-calculated.
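
A minimal sketch of that idea, in Python and not tied to TWiki's actual SEARCH code. The class name and the two callbacks (`run_search`, `changed_since`) are hypothetical. One deliberate addition: the sketch also recalculates when a changed topic is already in the cached results, since an edit may have removed the match. That is not in the description above, but it seems to be needed for correctness.

```python
import re
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass
class CachedSearch:
    results: List[str]   # topic names returned by the last full search
    checked_at: float    # date-time up to which the results are known to be valid


class SearchCache:
    def __init__(
        self,
        run_search: Callable[[str], List[str]],
        changed_since: Callable[[float], Iterable[Tuple[str, str]]],
    ) -> None:
        self._run_search = run_search        # the expensive full search
        self._changed_since = changed_since  # yields (topic name, text) changed after a time
        self._entries: Dict[str, CachedSearch] = {}

    def search(self, pattern: str, now: Optional[float] = None) -> List[str]:
        now = time.time() if now is None else now
        entry = self._entries.get(pattern)
        if entry is not None:
            regex = re.compile(pattern)
            stale = any(
                # Addition to the proposal: a changed topic that was in the old
                # results may no longer match, so it invalidates the entry too.
                name in entry.results or regex.search(text)
                for name, text in self._changed_since(entry.checked_at)
            )
            if not stale:
                entry.checked_at = now  # move the date-time forward and re-use
                return entry.results
        results = self._run_search(pattern)
        self._entries[pattern] = CachedSearch(results, now)
        return results
```

The validity check only scans topics changed since the last check, so it stays cheap as long as edits are rare compared to repeated searches.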

-- SamHasler - 16 Jan 2004

Yes, any edit will invalidate the cache. But I suspect that there are a large number of repeated SEARCHes in between each edit. WebRss is just a single case of this. I wouldn't bother doing extra work to test the validity of the cache, as you're reducing the speed of the cache.

But if the work is done, it will be good to test the idea ;-)

-- SvenDowideit - 17 Jan 2004

The Codev web's WebRss feed averages 3 accesses per minute. Caching the feed helps reduce the load on the server.

I just created a cached WebRssTest feed based on the VarCachePlugin. It does not need any parameters; the SKIN = rss setting and the cache settings are hidden in HTML comments. This should work, since XML allows comments. Could you test it out and report any feedback here? If successful, I will enable it on all TWiki.org webs to reduce the load on the server.

What is a reasonable cache time? For now I set it to 0.1 hours (6 minutes).

-- PeterThoeny - 19 Sep 2005

Well, set it to 30 minutes. That's what Slashdot does. Edit traffic on twiki.org is low enough that there is no need to fear being out of date, and nothing here is time-critical. Btw, if you pull Slashdot's RSS more often than that limit allows, too many times, you get blacklisted for 72 hours (without warning). They too are fighting the high traffic generated by RSS requests.

-- MichaelDaum - 20 Sep 2005

It works brilliantly here, with Mozilla Firefox. 30 minutes sounds a bit on the high side to me; at times you're involved in a topic that evolves like a discussion, and then 30 minutes becomes dreadful. Personally I like the 6 minutes better, especially as an alternative to producing a "personal" RSS feed in the sandbox or refreshing topics to look for updates.

If the 6-minute cache time is enough to help the server, I vote we leave the setting there.

-- SteffenPoulsen - 20 Sep 2005

OK, I enabled caching of the RSS feeds for the Codev, Main, Plugins, Sandbox, Support and TWiki webs. Caching is done for 15 minutes max. To use caching, the WebRss topic must be called without any URL parameter (the VarCachePlugin does not cache topics if there are parameters).
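
To illustrate why the parameters matter, here is a small Python model of the behaviour described above. It is not the VarCachePlugin's code; the class and method names are invented for the example. Only the 15-minute limit and the "no caching when there are parameters" rule come from this post.

```python
import time
from typing import Callable, Dict, Mapping, Optional, Tuple


class TopicCache:
    """Time-limited page cache that is bypassed whenever URL parameters are present."""

    def __init__(self, max_age_seconds: float = 15 * 60) -> None:
        self.max_age = max_age_seconds
        self._store: Dict[str, Tuple[float, str]] = {}  # topic -> (time cached, page)

    def view(
        self,
        topic: str,
        params: Mapping[str, str],
        render: Callable[[], str],
        now: Optional[float] = None,
    ) -> str:
        if params:
            # Any URL parameter (for example skin=rss) forces a full render.
            return render()
        now = time.time() if now is None else now
        hit = self._store.get(topic)
        if hit is not None and now - hit[0] < self.max_age:
            return hit[1]  # still fresh: serve the cached copy
        page = render()
        self._store[topic] = (now, page)
        return page
```

In this model a request with `{"skin": "rss"}` renders the topic every time, while a request with no parameters renders it at most once per 15 minutes.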

ALERT! Appeal to all folks using RSS feeds on TWiki.org: Please help reduce the load on the TWiki.org server by removing the ?skin=rss parameter or ?skin=rss&contenttype=text/xml parameter from TWiki.org's RSS URLs. That is, in your news reader specify http://twiki.org/cgi-bin/view/Codev/WebRss instead of http://twiki.org/cgi-bin/view/Codev/WebRss?skin=rss
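
For anyone maintaining a list of feed URLs, the change asked for above can be sketched in Python like this. The function name is made up for the example. It drops the query string only when it consists solely of the two parameters named above, and leaves every other URL untouched, on the assumption that any other parameter changes the content of the feed.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit

# The two parameters that the appeal above asks readers to remove.
REDUNDANT = {("skin", "rss"), ("contenttype", "text/xml")}


def cacheable_feed_url(url: str) -> str:
    """Return the parameter-free form of a feed URL when that is safe to do."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if pairs and all(pair in REDUNDANT for pair in pairs):
        return urlunsplit(parts._replace(query=""))
    return url  # no parameters, or parameters we do not recognise: leave as-is


print(cacheable_feed_url("http://twiki.org/cgi-bin/view/Codev/WebRss?skin=rss"))
```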

-- PeterThoeny - 23 Sep 2005

is mod_rewrite installed on the server? this could all be handled server-side.

-- WillNorris - 23 Sep 2005

Good point. Let's ask Sven.

-- PeterThoeny - 23 Sep 2005

a handy mod_rewrite cheat sheet

-- WillNorris - 23 Sep 2005

Thanks a bunch! Some RSS requests are now made without a parameter. Until yesterday, the top command showed 0% CPU idle most of the time during daytime in the USA. Now it fluctuates between a few percent and 90%, with a guesstimated average of 30%. We can improve that further if more folks remove the parameter from the RSS feeds.

-- PeterThoeny - 23 Sep 2005

I rejoiced too early; we are now solid at 0% idle again. The current high traffic is mainly due to spiders, a large part of it from one Google IP address (66.249.66.98). This IP address accessed 1562 topics in the last 60 minutes (vs. 698 WebRss requests). This looks like a misconfigured spider. I filed a request to reduce the hit rate.

-- PeterThoeny - 23 Sep 2005

And we currently have many new registrations due to a Freshman Academy Orientation at Western Oregon University. Lately we have averaged around 25 new registrations a day; today there are already over 70.

-- PeterThoeny - 23 Sep 2005

This table shows the percentage of all WebRss requests that are made with the parameters removed:

Date        Percent
2005-09-24  10%
2005-09-25  17%
2005-09-28  23%
2005-10-01  29%
2005-10-05  32%
2005-10-13  41%
2005-10-24  51%
2005-10-31  54%
2005-12-19  74%

-- PeterThoeny - 25 Sep 2005

To force all old-style RSS requests to just WebRss, couldn't .htaccess be leveraged? E.g. Alias /blahblah/WebRss?skin=rss /blahblah/WebRss ?

-- MattWilkie - 26 Sep 2005

Not generically, since RSS requests with a search parameter should be retained. See TWiki.WebRssBase

-- PeterThoeny - 28 Sep 2005

How many people actually don't read the output of their aggregator?

-- MartinCleaver - 06 Oct 2005

I do not know. A lot do not seem to read them. The percentage in above table climbs steadily though. It won't reach near 100% since some people use the search parameter to narrow down a feed.

-- PeterThoeny - 07 Oct 2005

One third of the topic views on TWiki.org are caused by RSS feeds (151K of 452K total views in the last week). The majority of those are already cached with the VarCachePlugin. Nevertheless, the CPU has to work a lot for those feeds, since each one is still a topic view. I added a new caching mechanism for the Codev, Main, Plugins, Sandbox, Support and TWiki webs: RSS feeds are now cached and served as static files. This is done transparently with an Apache rewrite rule, e.g. if you access:

http://www.twiki.org/cgi-bin/view/Codev/WebRss

you will be served with:

http://twiki.org/feeds/CodevWebRss.xml

The static files are updated once every 15 minutes. If you prefer to access the static files directly, without the rewrite, here are the URLs:

Codev web: http://twiki.org/feeds/CodevWebRss.xml
Main web: http://twiki.org/feeds/MainWebRss.xml
Plugins web: http://twiki.org/feeds/PluginsWebRss.xml
Sandbox web: http://twiki.org/feeds/SandboxWebRss.xml
Support web: http://twiki.org/feeds/SupportWebRss.xml
TWiki web: http://twiki.org/feeds/TWikiWebRss.xml
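
The mapping from view URL to static file can be modelled in a few lines of Python. This is not the Apache rule itself, only a sketch of the mapping it appears to perform, with hypothetical names. It assumes that requests carrying any query string are left alone so that they still reach TWiki.

```python
import re
from typing import Optional

# The six webs whose feeds are served from static files.
CACHED_WEBS = {"Codev", "Main", "Plugins", "Sandbox", "Support", "TWiki"}
_VIEW_RSS = re.compile(r"^/cgi-bin/view/([A-Za-z0-9]+)/WebRss$")


def static_feed_path(path: str, query: str = "") -> Optional[str]:
    """Return the static feed path for a WebRss view request, or None if not rewritten."""
    if query:
        return None  # assumption: requests with parameters are not rewritten
    match = _VIEW_RSS.match(path)
    if match is None or match.group(1) not in CACHED_WEBS:
        return None
    return f"/feeds/{match.group(1)}WebRss.xml"


print(static_feed_path("/cgi-bin/view/Codev/WebRss"))
```

Returning None for anything with a query string keeps feeds that are narrowed down with a search parameter going through the normal view script.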

This caching should make TWiki.org more responsive. It will affect the TWikiOrgStatistics since RSS feed requests without parameters are no longer in the TWiki logs.

I summarized the setup in HowToCacheRssFeedsWithRewriteRule.

-- PeterThoeny - 09 Mar 2006

Topic revision: r30 - 2006-03-09 - PeterThoeny
 
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.