There have been many topics and threads along the lines of:
"TWiki uses RCS as backend, so it's easy to switch to CVS, Subversion, this, that..."
But it never materialised.
Why?
The Hypothesis
It's the way TWiki abuses
RCS locking!
All source code control systems (that I'm aware of) have the concept
of a
checked-out copy.
I.e. you
- have a repository
- check out into your workspace
- start working: now your workspace copy is out of sync
- commit back to the repository: your workspace is in sync again
Now look at TWiki, where you
- have a single workspace, the data dir
- plus a repository, which always mirrors the latest workspace copy, i.e.
- start working, then each and every save goes directly into the repository!
Side note on how to force the revision
Now that I look into it, I realise:
it doesn't (cannot?) use ci -f[rev]
(force overwriting [rev]), but instead runs:
- rlog
- rcs -u
- rcs -o[rev]
- rcs -l
- ci -l
5 external calls needed instead of 1 (or 2?) per save - no wonder it's so slow
Although I admit that the sequence of
ci -l and
rcs -u,
rcs -o1.x commands
is obscure to me, the upshot is simple:
During the lock window,
TWiki tricks
RCS into overwriting existing revisions
with each fresh save copy from the workspace.
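To make the effect concrete, here is a toy in-memory model (not TWiki code, and not calling RCS at all) contrasting a normal VCS commit, which appends a new revision, with TWiki's behaviour during the lock window, where each save overwrites the latest revision in place:

```python
# Toy model: contrast normal "commit appends a revision" behaviour
# with TWiki's "save overwrites the newest revision" lock-window trick.

class ToyRepo:
    def __init__(self):
        self.revisions = []          # revisions[i] holds revision 1.(i+1)

    def commit(self, text):
        """Normal VCS style: every commit creates a new revision."""
        self.revisions.append(text)

    def save_in_lock_window(self, text):
        """TWiki style: while the edit lock is held, each save
        replaces the newest revision instead of adding one."""
        if not self.revisions:
            self.revisions.append(text)
        else:
            self.revisions[-1] = text

repo = ToyRepo()
repo.commit("first draft")            # creates revision 1.1
repo.save_in_lock_window("typo fix")  # still revision 1.1, overwritten
repo.save_in_lock_window("more text") # still revision 1.1
print(len(repo.revisions))            # 1: all saves collapsed into 1.1
repo.commit("published")              # lock expired -> new revision 1.2
print(len(repo.revisions))            # 2
```

This is exactly the property most other VCSs refuse to offer: their history is append-only, so there is no clean equivalent of "replace the head revision in place".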
The advantage of the current scheme is clear:
the code can always assume that there are revisions 1.1, 1.2 .. 1.n in the repository.
No out-of-sync copy ever in the workspace,
no special case to program for.
But the price is high...
Even
RCS normally isn't used this way;
it can merely be forced into doing this.
CVS et al. can't without heavy workarounds.
This defeats any attempt to use software configuration management tools as a backend for
DistributedTWikis,
i.e. to delegate synchronisation and conflict resolution to them.
Likewise,
external channels like the
MailToTWikiAddOn are very difficult to get right.
Discuss Solution(s)
Change TWiki to have the notion of an
in work version.
If the lock timer expires or is broken,
commit the working file and increase the revision number.
This shouldn't be rocket science;
"just" a lot of work,
because the assumption of version 1..n being in the repo
is spread all over the place...
But without this, it is very hard to use any loosely coupled backend,
in other words, any
interesting backend.
Rough Sketch
- view just visits the checked-out copy, which it already does now, for performance reasons
- edit starts right away in the workspace, if there is no old lock
- edit first checks in, if the lock is too old or from someone else
  - 1 but checks in what?
    - checks in the last saved text from the workspace to the repository -- PK
- notify checks in, if a lock expired
- rdiff has to show the "in work" revision; hmm - maybe edit should increment the revision in the meta data in advance?
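The edit-entry decision above could be sketched roughly like this (function and parameter names are invented for illustration; real TWiki keeps locks in .lock files and its default edit-lock window is one hour):

```python
import time

LOCK_TIMEOUT = 3600  # seconds; assumed 1-hour edit-lock window

def on_edit(lock, user, workspace_dirty, now=None):
    """Hypothetical sketch: decide what 'edit' should do under the
    in-work-version scheme. Returns 'commit_then_edit' when a stale
    in-work copy must first be checked in as a new revision."""
    now = now if now is not None else time.time()
    lock_stale = lock is not None and (
        now - lock["time"] > LOCK_TIMEOUT or lock["user"] != user
    )
    if workspace_dirty and (lock is None or lock_stale):
        # someone's saved-but-uncommitted text is lying in the workspace:
        # check it in first, so their work becomes revision 1.(n+1)
        return "commit_then_edit"
    return "edit"

print(on_edit(None, "alice", workspace_dirty=False))               # edit
print(on_edit({"user": "alice", "time": 0}, "bob", True, now=10))  # commit_then_edit
```

The same predicate would serve notify: it commits whenever the lock has expired and the workspace copy is newer than the repository head.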
--
PeterKlausner - 14 Aug 2003
Reconcile Alice and Bob working parallel

1: the problem is that the edit copy is in the memory of the browser; the server has no way to access it
A practical thing could be:
- document is at version 1.4
- Alice edits, the lock is set, and she goes to lunch.
- Bob edits, and since the lock is older than 1 hour, it is overwritten silently
- Bob saves, version becomes 1.5
- Alice saves. Now, what should be done (and is not done yet) is, on saving, to check the metadata of Alice's buffer, which says:
%META:TOPICINFO{... version="1.4"}%
We see that the metadata version is out of sync, so we:
- save Alice's version as 1.6
- do an rcsmerge to incorporate Alice's & Bob's versions, making it 1.7, a merge of 1.5 and 1.6 from 1.4
- if all went well (rcsmerge gave no warnings of overlapping conflicts), just log this fact.
- if there were conflicts,
- prepend to the topic a specific string warning of conflicts, with details: Bob saved 1.5 from 1.4 on xxx, Alice saved 1.6 from 1.4 on yyy... (so that we can %SEARCH for them and have a topic listing them)
- mail Alice and Bob (and other people involved, if Alice waited, say, 1 day before saving and overrode 3 people's work) about the situation and that the conflicts should be edited by hand. The mail should be easy for non-technicians, hinting that they ask the local Wikimaster (who should be in CC, too) for assistance, with direct links to the various revisions.
So, I think this should be easy to implement (not a lot of places in the TWiki code to modify), use existing standard
RCS tools and practice, and cover all cases...
Variant: When saving Alice's version, in case of conflicts, save the merged version as 1.6 and Alice's as 1.7.
This could be useful, as thus the viewable version will always be coherent (the conflict marks <<<<< >>>>>>
could wreak havoc on the layout of
HTML or complex wiki pages).
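The core of this check, detecting from %META:TOPICINFO% that Alice's buffer is out of sync with the head revision, could look roughly like this (hypothetical helper names, simplified revision strings; the real save would then shell out to rcsmerge):

```python
import re

def topicinfo_version(topic_text):
    """Extract version="1.x" from the %META:TOPICINFO{...}% line."""
    m = re.search(r'%META:TOPICINFO\{[^}]*version="([\d.]+)"', topic_text)
    return m.group(1) if m else None

def save_action(buffer_text, head_version):
    """If the buffer's base version equals the repository head, it is a
    plain save; otherwise someone (Bob) committed in between, and we must
    save Alice's text as its own revision and rcsmerge the two."""
    base = topicinfo_version(buffer_text)
    if base == head_version:
        return "plain_save"
    return "save_then_merge"

alice = 'Some text\n%META:TOPICINFO{author="alice" version="1.4"}%\n'
print(save_action(alice, "1.4"))  # plain_save
print(save_action(alice, "1.5"))  # save_then_merge: Bob saved 1.5 meanwhile
```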
--
ColasNahaboo - 15 Aug 2003
Decouple saves to disk and repository
OK, I considered the browser edit buffer to be clearly off-limits,
and wanted to leave this problem to the lock warnings.
But the Alice-Bob scenario
does happen,
and it's very annoying when it happens.
Interestingly, the issues are similar with parallel TWiki servers.
And there are descriptions lurking in some CVS topics
which mirror Colas' approach.
I just never realised that non-distributed TWikis profit as well.
I think it is really all about de-coupling the
save-to-disk from the save-to-repository.
The save-to-repository (aka commit) should happen,
whenever someone starts doing something
and discovers an "old" disk copy.
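A minimal sketch of that de-coupling, with invented names ("dirty" here means the disk copy is newer than the last committed revision):

```python
class TopicStore:
    """Toy model: save() only touches the disk copy; the repository
    commit happens lazily, when the next operation finds an old copy."""
    def __init__(self):
        self.disk = None        # workspace copy (data dir)
        self.repo = []          # committed revisions
        self.dirty = False      # disk newer than repo head?

    def save(self, text):
        self.disk = text
        self.dirty = True       # no repository call here at all

    def begin_operation(self):
        """Called by edit/notify/rdiff etc.: commit any stale copy first."""
        if self.dirty:
            self.repo.append(self.disk)
            self.dirty = False

store = TopicStore()
store.save("v1")
store.save("v2")                 # still zero commits
store.begin_operation()          # someone shows up -> commit once
print(len(store.repo))           # 1
```

Note how the repository ends up with one revision for a burst of saves, which is also why the current five-RCS-calls-per-save overhead would disappear.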
--
PeterKlausner - 15 Aug 2003
And attachments?
Yes. Note that the same problem exists for
attachments (people overwriting colleagues' changes). But we cannot solve it the "Alice & Bob way", because we do not have metadata in the attachment. It is a problem, since people using TWiki to handle cooperative work on attached files are often not developers (developers will just use a proper source management system already in place, like CVS), and do not really understand the issues.
Maybe in this case, have a way to manually "lock" attachments from TWiki?
BTW, I do not think that TWiki abuses
RCS so much; it basically does what it can, since it does everything from the same UID, the web server's UID...
On parallel TWiki servers, why not just leverage TWiki's use of standard files and share the files via
a shared file system, be it NFS, AFS, DFS... ?
--
ColasNahaboo - 15 Aug 2003
Couple comments:
On using NFS/etc. for parallel servers: This only works if your only reason for splitting the services is concern over CPU usage. While this may cover most of the scalability cases for TWiki installations, other concerns are also possible. Two examples:
- Bandwidth usage: NFS can't help here, because generally if you have a lot of extra bandwidth between two servers, they're close enough network-topography-wise that they end up sharing the (bottlenecked) bandwidth to the rest of the world. And you HAVE to have spare bandwidth to run the NFS over, or you're just compounding the problem.
- Mirroring: it might be useful to keep two TWiki installations mirrored, but most networked FS's I've seen are strictly client-server: the "real" data is on the server, and is sent to and from the client as needed. This means that for every request to the mirror, the primary server will need to retrieve the information; the whole point of mirroring data is generally to avoid this.
On TWiki's approach to versioning: I think that what needs to happen is that, to use other VCS's as TWiki backends, we need to detach TWiki's notion of a file's revision from that of the VCS. Then TWiki can do all of its tricks with maintaining revision numbers during an edit, etc. without needing to have support for such overrides in the VCS. This would mean that you'd need to keep some kind of correspondence that would let TWiki retrieve a specific version.
Here's an example: Say you decide you want
TWikiWithSubversion. Subversion, however, doesn't have any concept of an individual file's revision
at all. However, it does allow you to tag a file at any point in its history with arbitrary key-value pairs. The TWiki backend could use this to store an analogue to
RCS revisions, and retrieve different versions (for diffs, etc) by referring to the value of that property.
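The mapping Walter describes could be as simple as a per-topic table from TWiki's own revision numbers to whatever identifier the backend uses natively (a Subversion property value or global repository revision, a commit hash, an RCS 1.x number). A hypothetical sketch:

```python
class RevisionMap:
    """Toy per-topic map: TWiki revision "1.n" -> backend-native id.
    With Subversion this could be persisted as a versioned property;
    here it is just an in-memory dict for illustration."""
    def __init__(self):
        self._twiki_to_native = {}
        self._count = 0

    def record_commit(self, native_id):
        """The backend committed; assign the next TWiki revision to it."""
        self._count += 1
        twiki_rev = f"1.{self._count}"
        self._twiki_to_native[twiki_rev] = native_id
        return twiki_rev

    def native(self, twiki_rev):
        return self._twiki_to_native[twiki_rev]

m = RevisionMap()
m.record_commit("r101")   # e.g. the global svn revision touching this topic
m.record_commit("r107")
print(m.native("1.2"))    # r107: what rdiff would fetch for TWiki rev 1.2
```

With RCS as the backend the mapping is simply the identity, which is why the existing store keeps working unchanged.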
RCS, of course, would be fine as it is with such a scheme, and so would the existing TWiki::Store API...which was my idea in the first place.
This should also be compatible with the proposals above for adding some kind of conflict handling to TWiki. (something that probably does need to happen eventually if it hasn't already)
--
WalterMundt - 15 Aug 2003
- On file systems: I was thinking of real distributed FSs like AFS. The W3C has 3 places (US, France, Japan) and uses AFS for their work directories, and it works well: basically, each site has a full local copy, so reads are fast, and only writes are propagated, much more efficiently than with the dumb NFS protocol.
- On Subversion: what is the gain? Why on earth would you fix something that is not broken? RCS is a perfectly good system for TWiki's needs (single user, single file). CVS was even just a collection of shell scripts above RCS. Having TWiki support multiple backends will only introduce maintenance hell...
--
ColasNahaboo - 16 Aug 2003
1 write plus n read-only webs
Having set up and run a
DistributedTWiki in the past quite successfully, with actually satisfied people using it, I think it's worth sharing the following again:
- Each TWiki Server is logically a shared text editor & file clerk who takes submissions from clients.
- These clients currently get told (due to locking) that things are in use, if they try to edit on the same server.
- Each TWiki Server can be assumed to be on the end of a wet piece of string that can break at any time (eg the US's recent power blackout, fibre cuts on transatlantic links, fires etc)
- Each TWiki Web will generally have an owner, and often will be most commonly used by one group or another.
To take into account these ideas we did the following:
- Editing was only allowed on the owner location. Templates were changed to enforce this, with the owner location defined in WebPreferences, making it possible to shift ownership should it be necessary.
- Other sites used rsync to pull the data, templates, pub directories (and any associated cron jobs & cgi's) periodically.
This worked remarkably well, and remarkably simply - people stopped thinking of TWiki as a server and more of a
TWikiNet and using it as a very simple to use CDN. (After all you could guarantee that within an hour people would have a local quick to access copy of whatever you were sending)
Our next step (which never got implemented for various reasons) would have been to do this:
- Allow local editing
- Stop using rsync for direct synchronisation
- Have a central CVS repository on a central server.
- Allow the TWiki's to rsync their webs to a "Personal" workspace on the central server, and perform a standard CVS update/checkin.
- This would result in clashes - almost by definition
- The TWiki server would be notified of this, which would then notify the original editor of the problem and ask them to resolve the problem.
i.e. make the local TWikis automated checkin/checkout/patch/change-control sergeants who bug people to resolve clashes. Rsyncing to "personal" workspaces on the central server was designed to speed up the checkin/out process, and hence allow frequent checkin/out of changed content.
I did some experimental manual tests of this, and the results looked promising, to say the least. Logically it works pretty well, and the
TWikiUnixInstaller is designed to make installed TWikis capable of acting in this way...
--
MichaelSparks - 16 Aug 2003
n read/write webs
Shouldn't the attachment's %META information in the text
at least identify conflicts?
In which case you can inform the uploader of the problem
("You just overwrote revision 1.123 of user ...").
Then:
while AFS (or DCE DFS or CODA) are technically superior,
they are about to die.
And at any rate, could they really allow conflict resolution?
So my hope was to offload the remote synchronisation stuff to a VCS.
Michael's approach serves distributed sites very well,
where you can define clear responsibilities
for local
servers.
But what I'm really after are
ReadWriteOfflineWikis.
The delayed save-to-repository would do the replication.
The VCS handles intelligent merges,
which are necessary after concurrent edits,
line outages, or intentional disconnects,
like with laptops.
Sort of a poor man's Groove...
--
PeterKlausner - 18 Aug 2003