Over the years there have been many attempts to characterise TWiki performance, some more successful than others. Performance evaluation falls into two main areas:
- End user performance. This is the performance perceived by a TWiki user sitting at their browser, working with TWiki. It's best measured using ab (the apache benchmark tool), and is the performance that most people want to hear about when we discuss "performance" in public.
- More focused evaluation of code runtime. This is primarily of interest to developers, as it's the way we identify hotspots in the code so they can be optimised, and is the subject of this topic.
In the past we have made use of performance analysis tools such as CPAN:Devel::DProf and CPAN:Devel::SmallProf to try and analyse TWiki performance. These profilers suffer from a major problem: they rely on time-based sampling. This works well in an environment where a few functions absorb the bulk of the runtime, but isn't that helpful when a lot of smaller functions are run sequentially, as the sample rate can end up sampling a pseudo-random selection of functions. This results in different performance information for each run, even over the same data. The sampling is also highly sensitive to perl's memory management, which in turn depends on what the operating system is doing. The conclusion from my own experience is that these tools are useless for profiling TWiki performance, except in the rare case where something in the code is so bad that it stands out like a sore thumb. TWiki is simply too fast to be usefully profiled this way.
A more accurate measure is obtainable by instrumentation, where special statements are added to the code to record the runtime at different points. This is the basis of my methodology.
In brief, the approach is to use a class called Monitor, which is used like ASSERT - unless the TWIKI_MONITOR environment variable is defined, the functions of this package are nops and are optimised out of existence by the perl compiler. The Monitor class uses CPAN:Benchmark to capture runtime snapshots at various points, and outputs a report at the end of the run.
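Something like this is enough to get useful numbers - a minimal sketch of the idea, not the actual TWiki Monitor code (the report format here is invented):
<verbatim>
package Monitor;

use strict;
use warnings;
use Benchmark qw(:hireswallclock timediff timestr);

my @marks;

if ( $ENV{TWIKI_MONITOR} ) {
    # Record the event name and a Benchmark snapshot
    *MARK = sub {
        my $event = shift;
        push @marks, [ $event, Benchmark->new() ];
    };
}
else {
    # In the real Monitor the calls compile down to nothing when
    # TWIKI_MONITOR is unset; an empty sub stands in for that here.
    *MARK = sub { };
}

# Print a delta/absolute report, like the tables below, when the run ends
END {
    for my $i ( 1 .. $#marks ) {
        my $delta = timediff( $marks[$i][1], $marks[ $i - 1 ][1] );
        my $total = timediff( $marks[$i][1], $marks[0][1] );
        print STDERR '| ', $marks[$i][0], ' | ', timestr($delta),
          ' | ', timestr($total), " |\n";
    }
}

1;
</verbatim>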
I'm going to make a contentious statement.
As perl programs go, TWiki does not perform badly.
Let's take a look at the detail of the runtime for a particular topic; in this case, I'm interested in the compilation time, which has often been pilloried. To generate the trace below, I added a BEGIN block that is executed as each module completes compilation, so any line that says "XXX begun" is a reasonable reflection of the compilation time for that module. This is out-of-the-box TWiki with a typical configuration (Template login), and I'm rendering Main:
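The markers themselves are trivial; something like this at the end of each module source (a sketch - the exact call used in the real code may differ):
<verbatim>
package TWiki::Attrs;

# ... module body ...

# A BEGIN block at the very end of the file runs as soon as perl has
# finished compiling everything above it, so the delta since the
# previous mark approximates this module's compilation time.
# (Assumes Monitor has already been loaded by the calling script.)
BEGIN {
    Monitor::MARK('TWiki::Attrs begun');
}

1;
</verbatim>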
| Event | Delta | Absolute |
| TWiki::Sandbox begun | 0.006006 wall 0.01 CPU | 0.006006 wall 0.01 CPU |
| Static configuration loaded | 0.00957608 wall 0.01 CPU | 0.0155821 wall 0.02 CPU |
| TWiki::Access begun | 0.00133896 wall 0.00 CPU | 0.016921 wall 0.02 CPU |
| TWiki::Meta begun | 0.00286412 wall 0.00 CPU | 0.0197852 wall 0.02 CPU |
| TWiki::Attrs begun | 0.016588 wall 0.01 CPU | 0.0363731 wall 0.03 CPU |
| TWiki::Store begun | 0.018333 wall 0.02 CPU | 0.0547061 wall 0.05 CPU |
| TWiki::Prefs::Parser begun | 0.00115395 wall 0.00 CPU | 0.05586 wall 0.05 CPU |
| TWiki::Prefs::PrefsCache begun | 0.00152206 wall 0.00 CPU | 0.0573821 wall 0.05 CPU |
| TWiki::Prefs begun | 0.00163388 wall 0.00 CPU | 0.059016 wall 0.05 CPU |
| TWiki::Attach begun | 0.00482416 wall 0.01 CPU | 0.0638402 wall 0.06 CPU |
| TWiki::Form begun | 0.00489783 wall 0.00 CPU | 0.068738 wall 0.06 CPU |
| TWiki::Access begun | 0.00360608 wall 0.00 CPU | 0.0723441 wall 0.06 CPU |
| TWiki::Plugin begun | 0.00256109 wall 0.01 CPU | 0.0749052 wall 0.07 CPU |
| TWiki::Func begun | 0.00801897 wall 0.01 CPU | 0.0829241 wall 0.08 CPU |
| TWiki::Plugins begun | 0.00342083 wall 0.00 CPU | 0.086345 wall 0.08 CPU |
| TWiki::Render begun | 0.0166001 wall 0.02 CPU | 0.102945 wall 0.10 CPU |
| TWiki::Search begun | 0.00997305 wall 0.01 CPU | 0.112918 wall 0.11 CPU |
| TWiki::Templates begun | 0.00250697 wall 0.00 CPU | 0.115425 wall 0.11 CPU |
| TWiki::LoginManager begun | 0.00718594 wall 0.01 CPU | 0.122611 wall 0.12 CPU |
| TWiki::Users begun | 0.00268412 wall 0.00 CPU | 0.125295 wall 0.12 CPU |
| TWiki::I18N begun | 0.00232196 wall 0.00 CPU | 0.127617 wall 0.12 CPU |
| TWiki::UI begun | 0.026973 wall 0.03 CPU | 0.15459 wall 0.15 CPU |
| TWiki::UI:View begun | 0.00332785 wall 0.00 CPU | 0.157918 wall 0.15 CPU |
As we can see, each module takes 1 or 2 centiseconds to compile; modules that are physically bigger take longer. This "compile time" accounts for ~0.15 seconds, which we can take as a constant - it's time that has to be spent irrespective of the page being viewed.
The next job is creating the session, and that's a big job - isn't it?
| About to create session | 0.0080719 wall 0.01 CPU | 0.166477 wall 0.16 CPU |
| TWiki::Users::TWikiUserMapping beginning | 0.0219061 wall 0.02 CPU | 0.188383 wall 0.18 CPU |
| TWiki::Users::TWikiUserMapping begun | 0.00474095 wall 0.01 CPU | 0.193124 wall 0.19 CPU |
| Htpasswd begun | 0.001158 wall 0.00 CPU | 0.194282 wall 0.19 CPU |
| TWiki session created | 0.044009 wall 0.04 CPU | 0.238291 wall 0.23 CPU |
Big? I guess so - it took 0.07s. But that included reading all the TWiki preferences, so altogether I think that's not too shabby. So, as this is a view, access permissions need to be checked:
| Access checked | 0.000117064 wall 0.00 CPU | 0.238408 wall 0.23 CPU |
OK, now we can start the view proper. We have 0.23s of "constant time" under our belts that we can't do much about - haven't we? Well, yes, we have, but only because in the course of this exercise I changed a use File::Temp to a require File::Temp, saving 0.1s, and commented out use CGI::Carp qw(fatalsToBrowser), which saved another 0.05s. (So there are still proportionally significant gains to be found through simple analysis of the compilation process.)
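For reference, the File::Temp change is just deferred loading - a sketch (the helper sub name is invented):
<verbatim>
# Before: the module is compiled on every request, whether or not a
# temporary file is ever needed.
# use File::Temp;

sub _mkTempFile {
    # After: File::Temp is loaded (once) only on the code path that
    # really needs it, so requests that never get here skip the cost.
    require File::Temp;
    return File::Temp->new();
}
</verbatim>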
Now, this is Pattern skin, so there's a web left bar. When I run this same page with Classic skin, it renders in 0.27s i.e. the whole page rendering process takes 0.04s. But with Pattern skin, it takes:
| END | 0.110471 wall 0.11 CPU | 0.530478 wall 0.52 CPU |
Nearly twice as long. Why? Here's why:
| Getting weblist | 0.0866899 wall 0.09 CPU | 0.325098 wall 0.32 CPU |
| Got weblist | 0.094909 wall 0.09 CPU | 0.420007 wall 0.41 CPU |
When I switch off hierarchical webs, the total runtime drops back to 0.4s - so listing the hierarchical webs is taking ~0.12s. Note that I am using a leftbar with no twiki tips. When I include the twiki tips, the runtime increases by another 0.15s.
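(If you don't use sub-webs, hierarchical web support can be switched off in LocalSite.cfg - the setting name below is from memory, so check it against your configure screen:)
<verbatim>
# lib/LocalSite.cfg - turn off sub-web (hierarchical web) support if
# unused, so the weblist expansion no longer recurses into every web.
$TWiki::cfg{EnableHierarchicalWebs} = 0;
</verbatim>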
So, what does that mean for the end user? Well, we've established that the perl runtime for rendering this page is of the order of 0.5s. What does ab say?
Time per request: 630.933 [ms] (mean)
Same server, same load conditions, little or no network latency. Classic skin:
| END | 0.069102 wall 0.07 CPU | 0.317981 wall 0.31 CPU |
Time per request: 408.015 [ms] (mean)
OK, what lessons can we take from this?
- The above analysis provides a way to baseline the code - eliminate the compilation phase, so we can focus on the post-compilation execution, which is the most time-consuming part. It has already indicated simple changes that together achieve a (relatively) major improvement in static initialisation time.
- Static initialisation time can be improved, but only by reducing the amount of code loaded, by whatever method. Peter has always argued that "extra modules penalise runtime", and he is dead right. We should do this, but we cannot expect major performance improvements from it.
- The design of the skin can have a huge impact - much larger than optimising the code can ever deliver. What appear to be small productivity features in the skin can have major performance impacts.
- If you are not using a feature, turn it off.
My feeling after this exercise is that optimising the code will provide ever diminishing returns, and that far more would be gained by putting effort into (1) caching and (2) optimising the skins. That's not to say there aren't still some slow chunks of code after initialisation - I'm well aware of the performance issues with queries, and search generally is a bear - but I think there are many more accessible gains to be made by looking again at the skins. And just about anyone can optimise a skin - no perl experience required!
-- Contributors: CrawfordCurrie - 04 Jun 2007
Discussion
So the dynamic parts are eating seconds, no surprise. I've been using VarCachePlugin for our homepage. I wonder if that could be used selectively for left bar and top bar.
Pattern skin has a lot of icon decoration. Each icon creates a server up-to-date request, and that adds up. It would be better if we used one image file and CSS to position it (the "sliding doors" technique). Any ideas how we could make this work with no code change for the users? (global setting anyone...)
Crawford, did you test with locale? You didn't mention it, so I assume you did not. With a language selector and lots of MAKETEXT, performance will be worse. We do need to try to use mo files instead of po files.
-- ArthurClemens - 04 Jun 2007
No, I didn't. Maybe some time, if I have time.
-- CrawfordCurrie - 04 Jun 2007
Crawford. Excellent work.
Some details. ab is a very good tool for the perl part. But measure how long it takes from when the page is loaded: the time the skin takes to check that all the many files are up to date, run the javascripts and calculate the page rendering adds up to 0.5 seconds. So skin developers can do a lot to optimize. BUT skin CPU time does not depend that much on server load, because it is mainly spent in the client CPU. The perl code combined with many users doing many searches is what kills the server - and searching is naturally the worst bandit.
It seems the weblist could be a quick little win. How about having the list in a cache file, like it was implemented for the language list used in the top bar? That seems like a simple thing that could give a small but worthwhile win. Updating the cache could be done from the web manage page.
-- KennethLavrsen - 04 Jun 2007
About the web cache, I see a "small" problem: when should the cache be updated? If webs were created only through TWiki tools, it would be trivial to update the cache on each web manipulation. But things get complicated when you can create webs manually (cp -R _default MyNewWeb). Perhaps this could be solved by providing a perl script to update the cache from the command line.
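A sketch of what such a script might look like (paths and the cache file name are invented for illustration, and subwebs would need a recursive scan):
<verbatim>
#!/usr/bin/perl
# update_weblist_cache.pl - rebuild a cached list of webs from the data
# directory, so the skin can read one file instead of scanning every time.
use strict;
use warnings;

my $dataDir   = shift || '/var/www/twiki/data';    # assumed location
my $cacheFile = "$dataDir/.weblist_cache";         # invented name

opendir( my $dh, $dataDir ) or die "Can't read $dataDir: $!";
my @webs = sort grep {
    !/^[._]/                                  # skip _default, .svn, etc.
      && -d "$dataDir/$_"
      && -e "$dataDir/$_/WebPreferences.txt"  # only real webs
} readdir($dh);
closedir $dh;

open( my $fh, '>', $cacheFile ) or die "Can't write $cacheFile: $!";
print $fh map { "$_\n" } @webs;
close $fh;

print "Cached ", scalar(@webs), " webs to $cacheFile\n";
</verbatim>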
-- RafaelAlvarez - 04 Jun 2007
Sven tells me he once proposed having the list of webs in a configure variable. I wouldn't go quite that far, but there is definitely a case for having an "index file" in the root that has a list of webs. There is another reason for wanting this. At the moment webs have nowhere to hang meta-data - except by creating side files or topics in the webs themselves - which has been a PITA for a long time. A properly designed index file could provide that.
An index file would also provide a way (potentially) to support softlinks on Windows platforms, as it could record absolute paths to the web directories.
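Purely as a strawman (the file name and format are invented for illustration), such an index could be as simple as a perl hash serialised into a file in the data root:
<verbatim>
# Hypothetical data/.webindex - one place to hang per-web meta-data,
# including an absolute path so soft-linked webs could work on Windows.
%TWiki::webIndex = (
    'Main' => {
        path   => '/var/www/twiki/data/Main',
        parent => undef,
    },
    'Sandbox/Nested' => {
        path   => '/var/www/twiki/data/Sandbox/Nested',
        parent => 'Sandbox',
    },
);
1;
</verbatim>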
-- CrawfordCurrie - 05 Jun 2007
Another important factor of code analysis is how it scales: how is performance related to an increasing factor x, with x = number of topics, number of users, number of requests, whatever.
Typically, the gain of caching a web list/a topic list/backlinks/user mapping info/templates will increase with the number of webs/topics/users/whatever up to a certain point where cache maintenance kicks in and the ratio stagnates or even starts dropping again.
We need to explore these kinds of experiments to get to know our code.
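For example, a crude scaling harness could look something like this (a sketch - the populate/search subs are placeholders to be wired up to real TWiki calls):
<verbatim>
#!/usr/bin/perl
# Time one operation at increasing data sizes to get a rough scaling curve.
use strict;
use warnings;
use Benchmark qw(:hireswallclock timeit timestr);

# Placeholders - replace with real TWiki operations for a real experiment.
sub populate_test_web { my ($n) = @_; }    # create $n dummy topics
sub run_search        { }                  # e.g. a %SEARCH% over the web

for my $nTopics ( 100, 1_000, 10_000 ) {
    populate_test_web($nTopics);
    my $t = timeit( 10, \&run_search );
    printf "%6d topics: %s\n", $nTopics, timestr($t);
}
</verbatim>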
-- MichaelDaum - 05 Jun 2007
I added a Monitor::This(module, method) which allows us to see how long a particular method takes, or even all the methods in a module (if you leave method undefined). It's already helped me find one single performance hog in the usermapping code, and hopefully will allow us to do similar in other places.
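For anyone curious, the general technique for that kind of per-method timing is to wrap the original sub via the symbol table. A simplified sketch (not necessarily how Monitor::This itself is implemented; the sub name here is invented):
<verbatim>
package Monitor;

use strict;
use warnings;
use Benchmark qw(:hireswallclock timediff timestr);

# Wrap one method of a package so that every call is timed and reported.
sub wrapMethod {
    my ( $package, $method ) = @_;
    no strict 'refs';
    no warnings 'redefine';
    my $original = \&{"${package}::${method}"};
    *{"${package}::${method}"} = sub {
        my $start  = Benchmark->new();
        my @result = wantarray ? $original->(@_) : scalar $original->(@_);
        print STDERR "$package\::$method took ",
          timestr( timediff( Benchmark->new(), $start ) ), "\n";
        return wantarray ? @result : $result[0];
    };
}

1;
</verbatim>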
-- SvenDowideit - 13 Jun 2007
Michael - I think your statement, about cache maintenance overheads that cause cache efficiency to drop as numbers scale up, is only true if the algorithm for cache replacement (deciding which cache entry to drop when adding a new entry to a fixed-size cache) is inefficient, i.e. O(N) not O(1). There are many efficient O(1) cache replacement algorithms - see this search for some papers - but note that the popular LRU (Least Recently Used) is one of them, implementable as a linked list ordered by last access time.
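For the record, here is a bare-bones sketch of that classic O(1) structure - a hash for lookup plus a doubly-linked list kept in access order for reordering and eviction (illustrative only, not TWiki code; CPAN has production-quality implementations):
<verbatim>
package SimpleLRU;

use strict;
use warnings;

sub new {
    my ( $class, $maxSize ) = @_;
    return bless {
        max   => $maxSize,
        nodes => {},       # key => node, for O(1) lookup
        head  => undef,    # most recently used
        tail  => undef,    # least recently used
    }, $class;
}

# Detach a node from the list, fixing up head/tail as needed - O(1)
sub _unlink {
    my ( $self, $node ) = @_;
    if   ( $node->{prev} ) { $node->{prev}{next} = $node->{next} }
    else                   { $self->{head} = $node->{next} }
    if   ( $node->{next} ) { $node->{next}{prev} = $node->{prev} }
    else                   { $self->{tail} = $node->{prev} }
}

# Put a node at the head of the list (most recently used) - O(1)
sub _pushFront {
    my ( $self, $node ) = @_;
    $node->{prev} = undef;
    $node->{next} = $self->{head};
    $self->{head}{prev} = $node if $self->{head};
    $self->{head} = $node;
    $self->{tail} = $node unless $self->{tail};
}

sub get {
    my ( $self, $key ) = @_;
    my $node = $self->{nodes}{$key} or return undef;
    $self->_unlink($node);      # mark as most recently used
    $self->_pushFront($node);
    return $node->{value};
}

sub set {
    my ( $self, $key, $value ) = @_;
    if ( my $node = $self->{nodes}{$key} ) {
        $node->{value} = $value;
        $self->_unlink($node);
        $self->_pushFront($node);
        return;
    }
    my $node = { key => $key, value => $value, prev => undef, next => undef };
    $self->{nodes}{$key} = $node;
    $self->_pushFront($node);
    # Evict the least recently used entry if the cache has grown too big
    if ( keys %{ $self->{nodes} } > $self->{max} ) {
        my $victim = $self->{tail};
        $self->_unlink($victim);
        delete $self->{nodes}{ $victim->{key} };
    }
    return;
}

1;
</verbatim>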
-- RichardDonkin - 16 Jun 2007