Feature Proposal: Thin Prefs Mechanism
Motivation
TWiki deals with preferences like a stack: topic preferences overrides web preferences, that override plugins preferences, and so on. Also, there are some requirements:
- Code to
get a preference value must be fast, since it's called a lot of times
- Preferences state must be able to be saved and restored (used by
INCLUDE)
- There must be a way to take preference values from a web/topic without considering the stack (used by access control mechanism)
The current implementation is exactly a stack: each level corresponds to a topic/web/plugin/etc. I guess it is done this way because of (b) above. However suppose I have a deep stack and I want a value that is defined on the first level. I'd have to search for the definition on every level! To improve performance, the current code
copies all values from one level into another: the second level has the values defined on the second level
plus those defined on the first. The third level copies all values from second level, including those copied from the first... considering that
each plugin defines a level we have about 14 levels by the end of
TWiki::new. We have
all preferences defined on
TWiki.TWikiPreferences 14 times on memory! (
this growth seems to be exponential...)
IMHO this is too much memory. Memory is cheap, but is not free.
The tradeoff in the prefs code (once the prefs are loaded) is of course between the storage used to represent the Prefs, and the cost of looking them up. Since prefs cost very little to store, but are looked up many, many times, I chose to use the flat-copy approach you describe above. There are a small number of levels, each of which contains the P(L-1) preferences from the level above plus the P(L) preferences from the current level. The worst-case is where all N prefs are defined in P(0) (i.e. in TWikiPreferences) and copied L times. There are rarely more than 200 prefs in the system, and you mention the number 14 for L, so memory used in the worst case is 200 X 14 = a 2800 entry hash (i.e. peanuts, and it grows linearly, not exponentially). -- CrawfordCurrie - 15 Oct 2008
Yeah, I just checked it mathematically and the growth is linear for most cases, where each level adds a few preferences. If all levels defined the same number k preferences, then the growth is quadratic (that is the worst case for the growth. The worst case for storage is to define all preferences on the first level, as you showed). Sorry for the first unchecked statement. -- GilmarSantosJr - 15 Oct 2008
Description and Documentation
I developed a new
TWiki::Prefs based on:
- It should have at least similar performance to the current one
- As little redundant copies as possible
- It should be 100% compatible with the current (so it could be a drop-in replacement)
- It should be back-end agnostic, so it doesn't matter how the preferences are stored/loaded (flat text files that are parsed for each request, Storable on-disk cache, etc)
Here how it works:
-
TWiki::Prefs is the front-end and TWiki::Prefs::* are the back-ends
- The front-end deals with preferences logic.
- The back-ends deal with the way preferences are stored/loaded.
- The front-end:
- has a bitmap associated with each preference. This bitmap indicates the levels where the preference is defined
- There is a level list: each element corresponds to a level and it's a simple back-end object
- Also, there is a hash that maps
"$web.$topic" to back-end objects (the same as above)
- This way I have preferences values of each topic at most once (if the back-end loads it into memory)
- There is a level index, so it's possible to know where we are
- There is a hash that maps preferences to the level where they were finalized.
- There is a list to hold runtime-defined values (with
setPreferencesValue)
- The back-end:
- Initializes itself
- It's able to return a list of preferences defined (only the keys, not the values)
- It's able to return a local-defined preference value (
Local, instead of Set)
- It's able to return a preference value
- Finalizes itself
But how can it be fast to get a value?
To simplify the example, lets suppose we are dealing with preferences stacks smaller than 8 levels. Suppose a preference map is
01011101. For this example lets assume that the rightmost bit is bit
0 (and corresponds to level
0). This preference is defined on levels
0,
2,
3,
4 and
6. We want the value from the 6th level. This bitstring, converted to a decimal number, is
93. Lets take the integer part of the logarithm of this number on the base 2:
int(log2(93)) = int(ln(93)/ln(2)) = int(6.54) = 6. Hey, we got the level where the preference is defined! This rule always holds ( The mathematical proof is left as an exercise

). So, I can figure out from which level I should take the value with a simple O(1) operation. Then I ask the backend object at level
6 to give me the value I want.
Note: the implementation doesn't use strings, but
bitstrings
With this design it's easy to save state: it's enough to return the level index. The recover process is a bit slow and complicated (compared to the current implementation), more specifically the map: the idea is to set to
0 all bits above the level we want to restore. To achieve this I build a special bitmap that has only
1 from level 0 to the one I want to restore, then I perform a bitwise
AND. Taking from our previous example: if we want to restore to level
3, then
(2 ** (3+1)) - 1 = 15 = 00001111b.
00001111 & 01011101 = 00001101, that correspond to the first 4 levels.
Since I have only one copy of preference values of each topic and I must satisfy
(c), I need a hash of values associated with each level, used to store custom settings (those set with
setPreferencesValue).
This design has another interesting part: the backends. I wrote two:
TWiki::Prefs::TopicRAM and
TWiki::Prefs::HASH. The first is used to get this mechanism working without adding dependencies. The second is used to store custom settings. I could develop a
TWiki::Prefs::TopicDBM, for example, and store all preferences values from all topics in a DBM. Probably, there would be a performance gain, since preferences would not be extracted from topics for each view. The database would be updated at
save time. I would have almost nothing in RAM and DBM itself uses shared memory among concurrent processes making requests. I don't know if the performance loss to get a preference would be noticeable (DBM is fast, but plain hashes are faster).
The bitstring approach is fine, and bears little additional runtime cost (though of course it will have to be rethought if we ever need >32 levels). But I do wonder if your (memory, CPU) benchmarks indicate any measurable performance advantage from having done this? I doubt it. But I also doubt this is the real reason for your refactoring.
The real problem with preferences is the time required to load the preferences, not the storage or post-load runtime. Each time a preference level is loaded it requires a text parse of a topic, which is slow - many, many, many times slower, and more memory hungry, than any prefs storage model. Because prefs are so intimately bound up with the TWiki architecture, it would be really hard to do anything to improve performance that doesn't involve cacheing the values read from these topics, for example in a DB. So readers, don't get misdirected by the above; I believe the real justification for this refactor is to improve the prefs architecture to make use of such a cache DB easier. This is a sufficient justification in itself, but please let's make sure we document the correct reasons for doing this. -- CrawfordCurrie - 15 Oct 2008
The bitstring approach works with any number of levels. I use perl's vec: vec($a,127,1) = 1 will set to 1 the 7th bit of the 16th byte. In the code I always use the last byte of the string to do the calculation. Refer to the code ;). I simplified the above example to jump this part, that is not relevant to get the underlying idea.
Believe me: my real motivation for this was memory usage, but I got very disappointed when my tests didn't pointed any improvement. My assumption was based on a dump of twiki object by the end of request: most of it were preferences. The second (but not less important) motivation was to avoid preferences parsing for each request. I wanted a way to put them in a DBM at save time (I started to think about this last year, so the DatabaseStore way was not so strong as it is now) and improve startup performance.
After this experiment, I think that most memory footprint is due to read lots of whole files in memory. Please, correct me if I'm wrong again. -- GilmarSantosJr - 15 Oct 2008
Notes
- Sorry about code documentation. I coded almost everything today and I wanted results. I hope the code to be clear and if not, the above description should help
- I changed
TWiki::Prefs::Parser to be a singleton, like described at Perl Design Patterns site
. It made possible to call TWiki::Prefs::Parser->new() whenever I needed without knowing about a global variable
- I didn't implement
stringify method (I'm very sleepy to this)
-
$type argument is supported, but almost ignored. The only useful $type is Local. All other are used only on stringify method... (how about purging them? What they are for?)
Impact
Implementation
See attached code. Unzip it over your checkout root (
core dir). All unit tests passed.
--
Contributors: GilmarSantosJr - 28 Sep 2008
Discussion
I know about "no new code on stable release", but considering this is a refactor (not a new feature), how about considering it also for
FreetownRelease? (given enough test, of course).
--
GilmarSantosJr - 28 Sep 2008
By tossing new stuff into 4.2.x, we're making a mockery of the point of the numbering system - to communicate the level of change people can expect.
I'm quite open to the ideas of a 4.3 stream though - as 5.0 has pretty much died for now (my part is due to
many other commitments, and a lack of motivation brought about via the governance issues).
instead of unzip over - how about refactoring the current code into a
OldStylePrefsContrib, and yours as the
NewHotnessPrefsContrib ? then a build could change what we ship
just by changing the MANIFEST include lines..
--
SvenDowideit - 28 Sep 2008
I agree that there is no way to put something like this into 4.2. This is not a code re-factor. It is a totally different implementation.
4.2.X is bug fixes only. That is the whole idea of patch level releases.
I am not sure I fully understand why we will want to persue this.
On the 5.0 roadmap we are looking for an optimized storage where preferences and access rights are in indexed fast lookup DB storage. I see this as a diversion away from the straight path to this. If we start working on a 4.3 we kill any dream of a 5.0. We do not have the development bandwidth to work on both. I fear we channel the energy away from working on the technology that we desperately need to make TWiki survive against competition.
Why not work on a DB storage for the preferences instead?
The technology suggested in this feature proposal could be used in this storage to make the speed extremely fast.
Making the 5.0 DB storage in small chunks is in my opinion the best way to do it. On the
DatabaseStoreTopics topic I tried to create a starting point where the DB storage is split into a set of definition topics for the different aspects of a DB storage trying to chop the whole thing into seperate manageable chunks. This one would go into either the meta
DatabaseStorageMetaData or perhaps a separate
DatabaseStoragePreferences. There is also an old
DatabaseForPreferences which we can take inspiration from.
--
KennethLavrsen - 28 Sep 2008
Ok, I was already expecting a "no" to 4.2. That makes sense
I liked your idea Sven!
Kenneth, this proposal is not about the way preferences are stored, but the way they are managed at runtime. Even if we put preferences on an optimized storage the current code would still make tons of unnecessary copies. Notice that the current code is composed of:
-
TWiki::Prefs, whose interface is used by core and Plugins
-
TWiki::Prefs::PrefsCache, that is used only by the one above
-
TWiki::Prefs::Parser, that extract preferences from topics and is used only by TWiki::Prefs::PrefsCache
If we change storage mechanism the only part that would change on the current implementation would be the parser. The current code is already
very fast, except for the part of parsing preferences for each request, that is a consequence of using flat text files as primary storage backend.
Notice that this proposal make it very easy to take data from other sources. After defining the interface of
DatabaseStore in respect to preferences, it's easy to write a
TWiki::Prefs::TopicDB
My main motivation for this is memory usage, not directly speed. (Although at least unit tests always run a bit faster with this new implementation than with the current)
--
GilmarSantosJr - 28 Sep 2008
Nice work! Good to see active work on TWiki's performance.
--
ArthurClemens - 28 Sep 2008
If the implementation is inline of a later DB storage then my concern is over.
And Arthur is right. It also warms my heart to see these kinds of proposals coming up.
--
KennethLavrsen - 28 Sep 2008
Having preferences in a database store might not be desirable. What databases can do best is taking care of indexes and process queries to extract results from large datasets. Reading preference info bits is not a typical database operation. The win of having them in a database is levelled by the costs to access it, e.g. prepairing and
SQL query.
However, I'd prefer to see the preferences, once computed, to be serialized into a cache on disk, e.g. using perl's Storable module. You probably won't have to recompute them on every click. The current
PrefsCache unfortunately does not do this. It barely is a cache at all.
--
MichaelDaum - 03 Oct 2008
I know I'm probably the only mod_perl advocate around, but from what I can see, you've created a (some?) object(s) that will hold these preferences.
We could simply re-use this object, and only change the top layer (the user one) for each call, which would save something like up to 13 level, if I understood your document properly.
So no need to implement some caching when you can use existing ones.
"Reinventing the toothbrush is OK, but you shouldn't re-invent the wheel".
And yes, I know that not everybody uses mod_perl, and I'm sure
GilmarSantosJr will disagree with me

but most probably using mod_perl programming features should be re-usable on other technologies (if not automagically granted).
--
OlivierRaginel - 03 Oct 2008
My first idea about preferences mechanism, when I studied the current implementation, was to reuse a generic object among many requests. But that would be useful only with mod_perl and would not help that much other mechanisms. Then I had this idea.
The objects doesn't necessarily hold the preferences. If we have them stored at fast indexed database, for example, the preferences values would come to memory only when they were needed. With this design, the preferences object is so small and fast to build that doesn't worth it to share among requests (but for sure, if any persistent engine is in use, we can leave it pre-built for the first levels, that are
always the same).
--
GilmarSantosJr - 03 Oct 2008
Sure, something like: if mod_perl, reuse, else, load from Storable.
Just need to take care of refreshing the cache upon save of lower levels, but this shouldn't be much of an issue.
Sounds good to me, Micha, what do you think?
--
OlivierRaginel - 03 Oct 2008
"if ( mod_perl ) { ... } else { ... }" is something I try to avoid, because it easily would become "if ( mod_perl ) { ... } elsif ( fast_cgi ) { ... } elsif ( http ) { ... } elsif...", that doesn't sound good to me
If twiki core doesn't know about execution mechanism it is easier to maintain, to test and it keeps portable (in respect to web servers, operating systems and even versions of the same, like mod_perl 1.x and 2.x).
--
GilmarSantosJr - 03 Oct 2008
Caching the preferences in ANY form of "database" (rdbms, tied hash, Storable object,etc) that allows nearly O(1) access without the need to read at least 5 topics (
TWiki.TWikiPrefernces, TWikiPreferences, WebPreferences, Main.USERTOPIC and the topic being viewed) is a gain.
--
RafaelAlvarez - 03 Oct 2008
Gilmar, I totally agree. But with mod_perl, one could simply do:
my $prefs = new TWiki::Prefs;
And this would return the cached object if available, otherwise load it from, as Rafael points out, ANY kind of database. Also Rafael, you forgot all the plugins preferences, which makes a few more pages too.
--
OlivierRaginel - 03 Oct 2008
There are some plugins that combined both: keeping stuff in memory as long as possible (using global hashes or whatewver), and save/reload them from disk as needed. Surely, this is only viable for objects of a certain kind. But I think a prefs
memory cache, rebuild on demand, is one such case. That's actually memory I'd like to trade for speed.
--
MichaelDaum - 03 Oct 2008
I just extended a little the architecture documentation above. But
there is no substitute to reading the code.
Olivier, that could be true not only to mod_perl, but to all persistent engines. It's not too difficult to add this behavior and it doesn't depend on any mod_perl specific feature.
--
GilmarSantosJr - 03 Oct 2008
I see good discussion related to implementation but noone against or concerned so accepted by 14-day rule / consensus.
--
KennethLavrsen - 13 Oct 2008
Gilmar you rock.
--
MichaelDaum - 14 Oct 2008
Thanks, Michael!
I'll implement
stringify() method before merging into core, and consequently add
$type support. And also write code documentation.
--
GilmarSantosJr - 14 Oct 2008
CrawfordCurrie added some comments
in red to the intro.
I added some comments in
blue above. I came up to this design as an easy way to put preferences in a DBM without breaking compatibility to the current implementation, I mean: it can be a drop-in replacement without extra dependencies. My first idea (last year) was to release this modification to core and a DBM back end as a contrib. I think DBMs are fast enough to drop the cache need, but I don't have much experience with them (actually I never used DBM nor Cache::* before), so it was also my initial plan to compare both approaches.
--
GilmarSantosJr - 15 Oct 2008
I refactored following discussion on cacheing to
ImprovingPerformanceUsingCaching. Please continue cacheing discussion there.
- --
CrawfordCurrie - 16 Oct 2008
I am setting this to parked and no committed developer. Please feel free to flip that and own & implement.
--
PeterThoeny - 2010-08-01