Feature Proposal: Refactor the Store to allow multiple pluggable backends
Motivation
To provide the infrastructure for DatabaseStore.
TWiki topics will continue to be distributed in RCS store form; to get them into the database, the two stores will need to be accessible at the same time.
Description and Documentation
In the current test implementation of DatabaseStore, I have two stores loaded simultaneously, and am able to use ManagingWebs to create a database store using an RCS store, though the UI will obviously need work to make that a useful thing.
MultiStoreRefactor means that when a web is created, the admin must choose which *Store the web will reside in. (Initially, pub files will still be stored in the current RCS form.)
An interesting side effect will be that instead of softlinking several TWikis' webs into a data directory, you may be able to specify where the data is for each web.
Examples
Impact
Implementation
At the moment, I'm largely working on fixing unit tests and places in the core that were hard coded to think they can look at the file system to find out about topics.
--
Contributors: SvenDowideit - 11 Oct 2007
Discussion
This proposal seems to have jumped straight to Under implementation. Please try to keep to the process so we can track and decide on the committed proposals.
For a proposal to go through the approval cycle it starts at UnderInvestigation. And it requires a CommittedDeveloper AND a DateOfCommitment to show up on the TWikiFeature04x02.
This proposal lacks a DateOfCommitment. It also lacks a spec to decide on. It is straight in line with the roadmap decided in Rome, so the motivation is right.
I could add today's date but I think it is better to wait until Sven adds a spec to this topic. This is part of what will probably be the most important work in GeorgeTownRelease.
--
KennethLavrsen - 17 Dec 2007
Sven, great idea. Not sure it needs any core changes. You could simply plug in a kind of TwinStore, which is a thin layer that delegates any operation to two independent stores. Being fault-tolerant is then the tricky part. One might even think of plugging a first full-featured store into the TwinStore, i.e., the RcsWrap store, to be used as a rock-solid foundation, and then start implementing a second database store prototype, keeping it running along the way without too much of a problem and implementing each store feature step by step. The TwinStore layer could then be made to tolerate different capabilities. Just my 2 cents.
--
MichaelDaum - 17 Dec 2007
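To make the TwinStore idea above concrete, here is a minimal sketch in Python (not TWiki's Perl, and not code from the TWiki tree). Every class and method name in it is hypothetical. It assumes the primary store is authoritative, that a capability missing from the secondary is skipped, and that secondary failures are recorded rather than raised.

```python
class DictStore:
    """Hypothetical full-featured store, standing in for something like RcsWrap."""

    def __init__(self):
        self.topics = {}

    def save_topic(self, web, topic, text):
        self.topics[(web, topic)] = text

    def read_topic(self, web, topic):
        return self.topics[(web, topic)]


class FlakyPrototypeStore:
    """Hypothetical half-built store: no read_topic yet, and saving fails."""

    def save_topic(self, web, topic, text):
        raise RuntimeError("prototype backend is not ready")


class TwinStore:
    """Thin layer that delegates each operation to two independent stores."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.secondary_errors = []  # failures tolerated on the secondary

    def save_topic(self, web, topic, text):
        # The primary is authoritative: its errors propagate to the caller.
        self.primary.save_topic(web, topic, text)
        self._mirror("save_topic", web, topic, text)

    def read_topic(self, web, topic):
        # Reads are served by the primary only.
        return self.primary.read_topic(web, topic)

    def _mirror(self, method_name, *args):
        method = getattr(self.secondary, method_name, None)
        if method is None:
            return  # the secondary does not support this capability yet
        try:
            method(*args)
        except Exception as exc:  # tolerate any secondary failure
            self.secondary_errors.append((method_name, args, exc))


twin = TwinStore(DictStore(), FlakyPrototypeStore())
twin.save_topic("Main", "WebHome", "hello")
```

In this sketch the prototype can fail on every call without affecting users, but nothing here re-synchronises the two stores afterwards; that is the fault-tolerance problem mentioned above.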
Yeah, I started with a twin store, but ended up finding that all it did was create a façade, like the one in the Plugins handling code (that Crawford's working to remove).
The code changes to make TWiki::Store handle multiple stores directly amount to considerably less code, and in essence remove that extra layer of performance hit and developer confusion.
(Unlike the user code, where the different mappings occupy the same namespace, in the store it's a single hash lookup, based on Web.)
--
SvenDowideit - 17 Dec 2007
I made some long-run tests on TWikiStandAlone, and one of my conclusions is that filesystem access is one big bottleneck nowadays: it's responsible for a very large standard deviation in response times. This proposal and DatabaseStore are very important to improve overall TWiki performance.
I only disagree with the simultaneous use of two store mechanisms, unless the database would be used most and RCS only for data security.
--
GilmarSantosJr - 17 Dec 2007
Gilmar, right. The TwinStore layer, a pure facade, would be taken out when the DatabaseStore is mature enough. An interim TwinStore layer would only be of use during development. It will impose its own performance burden that we don't want to pay in the long run. Any multi store setup will be as slow as its slowest backend. So there's definitely no performance argument in here, only one to help develop additional storage backends. Once they are mature, you'd switch back to a single store again.
The biggest advantages of a DatabaseStore will be:
- scalability,
- its implicit caching and indexing facilities and
- advanced querying in TWikiApplications by either leveraging SQL to TWiki, or even XQuery+XUpdate when using a native XML database store.
--
MichaelDaum - 17 Dec 2007
The reason for an actual MultiStore is more long term than that. Because of the way I use TWiki to integrate into many backend systems, I want to be able to connect other backends as though they were TWiki webs. That way there could be a BugzillaStore, an SvnStore, a LegacyManagemenSystemStore, and a TWikiStore (DB or whatever).
'Any multi store setup will be as slow as its slowest backend.' is something I am grappling with. I'm not quite sure why it's necessary for TWiki operations to access all the webs on all transactions, but I have seen that it does.
It is likely that you will be surprised how little code is changed to make MultiStore a reality (basically a hash of Web->storeClassName).
The hard work is replacing the code that assumes that it can just look directly at the file system, and fixing the unit tests that do the same.
--
SvenDowideit - 17 Dec 2007
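As an illustration of the "hash of Web->storeClassName" idea, here is a rough sketch in Python rather than TWiki's Perl. It is not the actual change to TWiki::Store. The class names, method names and the fallback to a default store are all assumptions made for the example.

```python
class RcsStore:
    """Hypothetical file-style backend; an in-memory dict stands in for RCS files."""

    def __init__(self):
        self.topics = {}

    def save_topic(self, web, topic, text):
        self.topics[(web, topic)] = text

    def read_topic(self, web, topic):
        return self.topics[(web, topic)]


class DatabaseStore(RcsStore):
    """Hypothetical database backend; same interface, separate storage."""


class MultiStore:
    """Routes every operation to the one store that owns the given web."""

    def __init__(self, default_store):
        self.default_store = default_store
        self.store_for_web = {}  # web name -> store instance

    def register_web(self, web, store):
        self.store_for_web[web] = store

    def _store(self, web):
        # A single hash lookup keyed on the web name; unregistered webs
        # fall back to the default store (an assumption of this sketch).
        return self.store_for_web.get(web, self.default_store)

    def save_topic(self, web, topic, text):
        self._store(web).save_topic(web, topic, text)

    def read_topic(self, web, topic):
        return self._store(web).read_topic(web, topic)


rcs = RcsStore()
db = DatabaseStore()
multi = MultiStore(default_store=rcs)
multi.register_web("Sandbox", db)

multi.save_topic("Main", "WebHome", "kept in the RCS-style store")
multi.save_topic("Sandbox", "TestTopic", "kept in the database-style store")
```

Each topic lives in exactly one store here, so there is nothing to keep in sync between backends; the lookup only decides which store handles a given web.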
'It is likely that you will be surprised how little code is changed to make MultiStore a reality (basically a hash of Web->storeClassName).'
How did you handle backend errors that might lead to stores not being in sync anymore?
--
MichaelDaum - 18 Dec 2007
Aaaaahhh, no, this is not for having the same topics in more than one store at a time. This is to allow some webs to be in RCS form, some in the database, some in XML etc.
--
SvenDowideit - 18 Dec 2007
If I got your point, Sven, I think this will be a little complex. The hardest problem is to map TWiki semantics (view, edit, save, forms, etc.) onto these other stores (like Bugzilla or SVN). But it would be great to have this possibility!!
--
GilmarSantosJr - 19 Dec 2007
Okay got it. Sorry, Sven, my fault that I misunderstood your proposal.
--
MichaelDaum - 19 Dec 2007
FYI
There are currently three layers of store abstraction, two of which are firm and one of which is widely abused. These are:
- TWiki::Meta objects (what I call the "TOM layer"). Meta objects should support all of the methods necessary to manipulate the contents and history of a topic in an abstract way. The current TWiki::Meta implementation depends on TWiki::Store. This is the interface that is widely abused, as calls that should go to the meta object go instead to:
- TWiki::Store, which is the "traditional" facade for the TWiki store engine. It is used widely in other core modules. I have been working quietly but steadily to hide TWiki::Store behind TWiki::Meta for some time now, but because of the legacy APIs it's a long, slow process.
- TWiki::Store::RcsFile is a relatively simple RCS-style API that is used by TWiki::Store. It is meant to be hidden entirely within TWiki::Store, but different implementations of it are possible e.g. TWiki::Store::RcsWrap, TWiki::Store::RcsLite and TWiki::Store::Subversive (an inactive subversion layer).
With Sven's help, over time I have been:
- pushing TWiki::Store into the role of a simple facade, by pushing implementation dependencies (such as searching) down into TWiki::Store::RcsFile, and pulling abstractions into TWiki::Meta.
- promoting TWiki::Meta to the role of a TOM, by deprecating the use of $topic, $web etc in function parameters.
Some time ago I added the ability for different webs to have different store implementations. This was a five minute hack that usefully demonstrated that we are already at the point of a viable TwinStore as described above. The hack couldn't work long term because there are still severe weaknesses in the abstraction (to do with moving, renaming and deleting topics and webs).
--
CrawfordCurrie - 21 Dec 2007
There is still no commitment date on this so I am not sure how to process it.
From a customer advocate point of view this is more a code refactoring than something affecting the end user - until someone actually implements something that uses this new refactoring. And I see no possible harm in this proposal.
So it seems to me that the key core developers simply need to agree on this one and announce when they have reached a consensus.
The best I can do now is to add Michael and Crawford to the concern field. Then simply remove yourself if you have no concern.
--
KennethLavrsen - 25 Dec 2007
god, if you're just missing a date
--
SvenDowideit - 26 Dec 2007
I thought it could be a signal that you were not done with the proposal description. If it is clear that people forgot I usually just add the date.
--
KennethLavrsen - 26 Dec 2007
Good move here. I suggest to KISS on the audit trail, that is, keep ACLs and other meta data in topics but cache meta data in a database for fast retrieval.
--
PeterThoeny - 02 Jan 2008
Peter, that's not the point here. Sven wants to configure which store to use per web. And only one at a time.
TwinStore is something different, meaning to piggyback one store on top of another, actually mirroring it, like DBCacheContrib already does, but built into the core.
I have no concerns about MultiStore on that level.
--
MichaelDaum - 02 Jan 2008
One of the roadmap subjects was the scalability of TWiki and working on a spec that enables us to place information such as ACLs, form data, etc. in a standard indexed storage format.
The reason for this is obviously performance. As a TWiki grows we have to parse through more and more flat text files.
On the other hand the flat text file format including the meta data gives the audit trail which would be much harder to get in a database unless it is designed for it.
So we discussed in Rome that instead of arguing religiously about flatfiles versus database storage, the answer may be to do both. Having form fields and settings incl ACL in meta stored with topic for audit trail and having the data for the current version in an additional indexed storage. And also allowing the current version of the topic itself to be in a database if required.
So how does this relate to Sven's proposal? In principle not at all. But it does indirectly. Both touch the design of how data is stored.
I think it will be important that a change for multiple storage backends is specified and thought through in the context of the road map goal because no matter what, one influences the other.
I think we can all agree that we do not want to implement something now that makes it harder later to make the changes for the generic storage design.
I would really prefer if the core developers would have completed and agreed on the initial work on the principle design for the road map part of the storage concept before this proposal gets implemented.
I think that once we have agreed on the TWiki 5.0 storage concepts it will be easy to define the spec for this proposal, and maybe even get some synergy. I have added my name in the concern field in this context. I am not at all against Sven's proposal from a feature point of view.
--
KennethLavrsen - 02 Jan 2008
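One way to picture the "do both" idea from Rome is the following Python sketch; it is not a spec and not TWiki code. It assumes form fields arrive as a simple name/value mapping, keeps every saved revision as the audit trail, and mirrors only the current revision into an indexed SQLite table. The class name, table layout and method names are all hypothetical.

```python
import sqlite3


class IndexedTopicStore:
    """Keeps full revision history per topic plus an index of current fields."""

    def __init__(self):
        # Audit trail: (web, topic) -> list of revisions, oldest first.
        self.history = {}
        # Indexed storage holding only the current revision's fields.
        self.index = sqlite3.connect(":memory:")
        self.index.execute(
            "CREATE TABLE fields ("
            " web TEXT, topic TEXT, name TEXT, value TEXT,"
            " PRIMARY KEY (web, topic, name))"
        )

    def save(self, web, topic, fields):
        # Every save appends a revision, so history is never overwritten.
        self.history.setdefault((web, topic), []).append(dict(fields))
        # Replace the indexed copy of this topic in one transaction.
        with self.index:
            self.index.execute(
                "DELETE FROM fields WHERE web = ? AND topic = ?", (web, topic)
            )
            self.index.executemany(
                "INSERT INTO fields VALUES (?, ?, ?, ?)",
                [(web, topic, name, value) for name, value in fields.items()],
            )

    def find(self, name, value):
        # Query current field values without parsing any topic text.
        rows = self.index.execute(
            "SELECT web, topic FROM fields"
            " WHERE name = ? AND value = ? ORDER BY web, topic",
            (name, value),
        )
        return rows.fetchall()


store = IndexedTopicStore()
store.save("Bugs", "Item1", {"Status": "Open"})
store.save("Bugs", "Item1", {"Status": "Closed"})
store.save("Bugs", "Item2", {"Status": "Open"})
open_items = store.find("Status", "Open")
```

Here the index can always be rebuilt from the history, which is the property that keeps the flat-file audit trail authoritative.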
Thanks Kenneth for highlighting the indirect relationship of my comment. I am with you, I'd like to see a clearly defined spec of the storage backend in Codev topics before the actual implementation.
--
PeterThoeny - 03 Jan 2008
Yes, Kenneth, I agree. We know TWiki won't scale without a real database backend. But that really is not the scope of this proposal. Maybe this is just YAPWAMT (yet another proposal with a misleading topic name).
Refactoring the store is a pending issue (...been following the development of dbxml for quite some time now).
As far as I understand Sven's proposal, his changes are quite trivial. Maybe best would be to see some code first before accepting or rejecting this proposal.
--
MichaelDaum - 03 Jan 2008
This proposal is indeed relatively trivial in scope: it is a refactoring to remove the remaining file assumptions from the non-store code, and to finish up the changes for having more than one store backend active at one time (important for moving webs from the distributed RCS files into a database).
It has little to do with the DatabaseStore feature, except that it is a pre-requisite for that work.
--
SvenDowideit - 03 Jan 2008
It is this pre-requisite factor I am concerned about.
If this is a pre-requisite, then it is a pre-requisite to something we have not defined.
So why don't we start defining it at a very high level?
--
KennethLavrsen - 04 Jan 2008
We have. That's what the DatabaseStore topic exists for. I do not, however, want to distract myself from the work needed to release 4.2 at this point.
--
SvenDowideit - 06 Jan 2008
Note that I didn't raise any concern over this, just provided a point of information, so I'm removing my name from the ConcernRaisedBy field.
--
CrawfordCurrie - 26 May 2008
I would still like to see this proposal specified a little further for ONE reason only.
The proposal itself is quite OK. But since this is the heart of the core code and since we all want to see more contributors, it is essential that major rewrites of code are planned just a little bit to
- Give others a chance to participate
- Give others a chance to understand what is happening before, during and after it is implemented.
Code refactorings that do not change any specs or create any compatibility issues do not need a community decision. If this is a pure refactoring then go ahead. But please consider my request to add 10-20 lines of documentation at the top of this topic before anyone starts coding.
I have changed proposal to Accepted.
--
KennethLavrsen - 27 May 2008
What is the status of this?
--
RafaelAlvarez - 04 Aug 2008
I have to port the work to trunk, and last time I looked, trunk had unit test failures in the SEARCH code, but in the next week or so I'll be able to plan out some work in this area.
--
SvenDowideit - 05 Aug 2008
Bumping this feature proposal topic. This feature goes hand in hand with the DatabaseStore.
--
PeterThoeny - 2010-08-01
I am parking this proposal, no action for a long time.
Anyone with interest in this topic, please re-propose feature.
--
PeterThoeny - 2012-10-18