How useful would a NativeVersionFormat be, rather than RCS? The advantage would be the complete removal of the RCS dependency.
This would probably be a good thing from the point of view of TWikiOnWindows.
--
NicholasLee - 20 Mar 2001
It would be good to have something in Perl, but I would think it best to go with an existing format. Given that the RCS format is already in use, why not stick with it? That will make a future upgrade a lot easier.
--
JohnTalintyre - 21 Mar 2001
I guess the issue is: I will need to replace the RCS binaries with a native Perl read/write parser, in order to do TWiki::Store::DBI and to completely remove system pipes.
We don't use a lot of the RCS format's capabilities (*) but I'd still have to deal with them in the parser. It makes sense to consider a basic replacement.
(*) branches, tags, etc.
--
NicholasLee - 21 Mar 2001
Consider that you may want to increase the functionality rather than decrease it. For instance, tagging using CVS would provide the ability to go back to a previous version of the whole site.
Revision control systems such as RCS and CVS do work on most platforms; I'd say that programming resources would be best spent on other tasks.
--
MartinCleaver - 21 Mar 2001
Certainly that's true, although getting tags to work with TWiki might be tricky. Anyway, once I've got my framework in place, getting my old CVS modifications to work again with the new experimental TWiki shouldn't be more than a weekend's work. Of course, getting to that point is still far from completion. 8)
--
NicholasLee - 21 Mar 2001
On the original topic: no, no, no!! I have to agree with MartinCleaver; there is no point in re-inventing the wheel yet again. With pretty good tools like CVS and RCS around, what is the point of spending precious programmer time re-creating this functionality? Especially when all the source code for those tools is freely available!
If there is some additional functionality that we might want to see in TWiki, it's probably there already; if not, maybe we should contribute or suggest it to the teams that are already working on those tools.
If we are too impatient to see that happen, we can modify the source of those tools to fit our purpose and distribute the source patches with TWiki; that would be much faster than a complete rewrite.
--
EdgarBrown - 21 Mar 2001
I basically agree that we should not reinvent the wheel. Nevertheless, we can improve the performance where it makes sense. TWiki's most time-critical RCS access is on topic view and search; there it makes sense to get rid of the RCS system calls. Other RCS calls, like saving a topic, are not time critical.
The function that is most time critical is TWiki::Store::getRevisionInfo(). I suggest that TWiki gets its own little parser to parse the RCS header for the top revision number with author / date, or the author / date of a specific revision. The RCS header has a very simple format to parse. We can introduce a new switch, e.g. $doFastRevisionAccess, to enable the TWiki-internal parsing, bypassing the external RCS call.
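A minimal sketch of such a header parser might look like the following (the function name is made up for illustration, and it assumes the standard RCS ",v" file layout; branches, tags and locks are ignored, as discussed above):

```perl
# Hypothetical sketch: read the head revision plus its author and
# date straight out of an RCS ",v" file, with no external rcs call.
sub getRevisionInfoFast {
    my ($rcsFile) = @_;
    open( my $fh, '<', $rcsFile ) or return;
    local $/;                      # slurp mode: read the whole file
    my $text = <$fh>;
    close $fh;

    # The admin section starts with e.g. "head 1.5;"
    my ($head) = $text =~ /^head\s+([0-9.]+)\s*;/m or return;

    # The delta entry for that revision looks like:
    #   1.5
    #   date 2001.03.21.10.11.12; author PeterThoeny; state Exp;
    my ( $date, $author ) =
        $text =~ /^\Q$head\E\s*\ndate\s+([0-9.]+);\s*author\s+([^;\s]+);/m
            or return;

    return ( $head, $date, $author );
}
```

A $doFastRevisionAccess switch could then simply choose between this function and the existing external RCS call.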
--
PeterThoeny - 21 Mar 2001
Hmm. See
UsingPerlVCSModules
--
MartinCleaver - 22 Mar 2001
I don't see why getting the revision info would be so time critical compared to a full search, for example; but then I haven't timed the execution. However:
- (System calls are expensive. The grep search is fast; it's only one system call. Building the search result list is slowed down considerably because there is one system call to RCS for each entry. Parsing the RCS header in TWiki would eliminate those system calls. -- PeterThoeny - 22 Mar 2001)
- [ EdgarBrown - 23 Mar 2001]
Now I see your point, but you can see how the following caching procedure would work to alleviate this:
- Initially run a utility (or periodically a cron job) to make all pages conform to this new format.
- On topic save, store the RCS header info in the page header itself, under an HTML comment. Alternatively, store all the information in an index file somewhere else.
- On topic view, get the information from the header; if it's not there, run the RCS call and store the information there (just in case something goes wrong somewhere).
- On topic search, just get the information from the header; if it is not there, trigger either a log entry saying so or a background process that will re-index the pages (I know, I'm paranoid about error checking), and default to the current way of doing it.
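The save/view half of the procedure above could be sketched roughly like this (the comment marker and function names are invented for illustration; a cache miss is signalled by an empty return so the caller can fall back to RCS):

```perl
# Hypothetical sketch of the cache above: keep the revision info in
# an HTML comment on the first line of the topic text itself.
sub storeRevInfoComment {
    my ( $text, $rev, $date, $author ) = @_;
    $text =~ s/^<!--TWikiRevInfo:.*?-->\n//;    # drop any stale entry
    return "<!--TWikiRevInfo: $rev $date $author -->\n" . $text;
}

sub readRevInfoComment {
    my ($text) = @_;
    if ( $text =~ /^<!--TWikiRevInfo: (\S+) (\S+) (\S+) -->/ ) {
        return ( $1, $2, $3 );    # cache hit
    }
    return;    # cache miss: caller falls back to the usual RCS call
}
```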
In that case I would suggest a completely different route for this. A lot of the information that is dynamically generated in TWiki is relatively static; revision information is an example of this. It only changes on the comparatively infrequent topic saves but is required by the far more frequent topic views.
So why not put this information in hashed cache files for the web, or for the pages themselves? (It could even be stored in the same page under an HTML comment.) It can be updated on topic saves and, depending on the implementation, regenerated via a cron job, so that any corruption would be averted.
I am toying with this idea for searches, as the current implementation shows a noticeable delay even for 50 pages in a web. Besides, indexing would make it easy to add a lot of functionality to the search engine without incurring a performance hit [but that is another topic: SearchSuggestion].
--
EdgarBrown - 22 Mar 2001
Have a look at PageCaching. It's on my to-do list after I've sorted the storage mechanism out. I recall we did some testing with native header parsing; check out LoadTesting. I figure I can get even better than that if I can sort out parsing the delta tree as well.
The problem with indexes at the moment is that the code rebuilds the topic list each time, either via `ls` or via readdir. Creating an index will probably help this. Of course, there is the issue that you'd probably need to use something like DBM to make it worthwhile.
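For comparison, the `ls` pipe can already be replaced by a plain readdir with no system pipe at all (this sketch assumes the usual one-`.txt`-file-per-topic layout of a web's data directory):

```perl
# Sketch: build the topic list for a web via readdir instead of
# forking `ls` -- one less system pipe per request.
sub getTopicList {
    my ($webDir) = @_;
    opendir( my $dh, $webDir ) or return ();
    # keep only "<Topic>.txt" entries, stripped of the extension
    my @topics = sort map { /^(.+)\.txt$/ ? $1 : () } readdir $dh;
    closedir $dh;
    return @topics;
}
```

An index (DBM or otherwise) would still help on top of this, since the revision info per topic is the expensive part, not the directory listing itself.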
--
NicholasLee - 22 Mar 2001
Yes, it would have to be some database engine of sorts. I was actually looking into some [non-Perl] alternatives to this, the main issues being the increase in the required code base and the ability to generate delta indexes when only one page changes. For this particular case, the delta indexes seem easy to generate.
--
EdgarBrown - 23 Mar 2001
If you want to put a proposal together, I'd be happy to work on it with you.

I'd suggest, though, that anything considered be pluggable. Peter prefers to keep the core requirements simple, so any change like this would have to be pluggable.
Having a look at the *DBM_File modules might be an idea, as these are pretty standard on most *nix systems.
Furthermore, I suggest any index be additional, rather than a large change to the default RCS format. Have a look at FOM, which has quite a nice page caching system.
--
NicholasLee - 23 Mar 2001
First I must confess that I had no Perl (or DBM) experience before running into TWiki. With that caveat out of the way:
I outlined the proposal for this particular speed issue above (it is out of sequence, as it was a comment on an out-of-sequence comment), but I would surely vote for the use of DBM under Perl for managing these indexes.
I do agree that the indexing should be completely orthogonal to RCS, and serve just as a complement to it, not a substitute for the RCS files. I read a bit about DBM under Perl, and it really looks like the right tool without placing a big load on the code base; the alternatives I was looking at would require additional C code and the like.
On being a plug-in: I think that is not a very good idea, as it would actually complicate the code; for every file view/search/edit there would have to be conditionals deciding which functions to use. However, since all file accesses seem to be channeled through a single file, I do see the possibility of having both styles as an option at installation time (maintaining both in parallel is a bit of an issue, though).
As a starting point for DBM file access, I would suggest the following:
- a DBM index database per web, indexed by TWiki topic, containing the revision info (hey!!! that's too simple ;^)
- of course each entry could be an array containing:
- revision info (user, date, etc.)
- references from other TWiki topics (to make the referred-by function faster)
- search summaries, so that the search function won't have to access anything else!!! (I still think that this function needs to be re-tooled, using a more complete database to add functionality, and now that I've been introduced to DBM...)
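The first bullet could be sketched with SDBM_File, which ships with Perl, roughly like this (file and function names are made up for illustration; a real index would also need locking against concurrent saves):

```perl
use Fcntl qw( O_RDWR O_CREAT );
use SDBM_File;

# Hypothetical per-web index: topic name => tab-joined revision info.
sub indexStore {
    my ( $indexFile, $topic, @revInfo ) = @_;
    tie my %index, 'SDBM_File', $indexFile, O_RDWR | O_CREAT, 0644
        or die "cannot tie $indexFile: $!";
    $index{$topic} = join( "\t", @revInfo );    # e.g. rev, date, author
    untie %index;
}

sub indexLookup {
    my ( $indexFile, $topic ) = @_;
    tie my %index, 'SDBM_File', $indexFile, O_RDWR | O_CREAT, 0644
        or die "cannot tie $indexFile: $!";
    my @revInfo =
        defined $index{$topic} ? split( /\t/, $index{$topic} ) : ();
    untie %index;
    return @revInfo;
}
```

Delta updates come for free: a topic save just overwrites that one key, with no full re-index needed.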
Oh yeah, I forgot: this was supposed to make the life of window$ implementations easier, not harder...
--
EdgarBrown - 23 Mar 2001
If you make a switch to DBM, please consider how an external search engine can index / search the content.
....
Moved most of RandyKramer's comment to SearchSuggestion [EB]
--
RandyKramer - 24 Mar 2001
The idea here is not to change the current text files, just to add indexes, in a faster format more suitable for common TWiki activities, for the kind of information that is seldom modified but used a lot.
Of course, I understand that NicholasLee is toying with this same idea for PageCaching, but even in that case the current text file format of TWiki is preserved; some more files are just added for speed efficiency (and to allow off-line browsing).
I do see that whatever is implemented in this case should be integrated with TWiki's search facilities (or at least planned alongside them).
--
EdgarBrown - 24 Mar 2001
Some other related topics:
PackageTWikiStore,
CanonicalTopicStoredForm,
TWikiXML