Furthering my comments in NonWebAccessToWiki, this space is for developing a basic XML format for storage.
Initial requirements would be:
- Header
- Body
- attachments
- revision information for attachments
Furthermore, with a good spec, we might be able to create generic links from one Web to internal sections of another.
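A minimal sketch of what such a file could look like, written in Python with the standard `xml.etree.ElementTree` module rather than Perl. The element and attribute names (`topic`, `header`, `body`, `attachments`, `attachment`, `revision`) are assumptions for illustration, not an agreed DTD. Storing the body as character data means the serializer escapes any `&` or `<` in the user's text, so raw TWiki markup survives a round trip.

```python
import xml.etree.ElementTree as ET


def build_topic(web, name, author, date, body, attachments):
    """Build a hypothetical <topic> element: header, body, attachments."""
    root = ET.Element("topic", {"web": web, "name": name})

    # Header: meta-information about the topic itself.
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "author").text = author
    ET.SubElement(header, "date").text = date

    # Body: the raw TWiki markup, stored as escaped character data.
    ET.SubElement(root, "body").text = body

    # Attachments, each with its own revision information.
    atts = ET.SubElement(root, "attachments")
    for filename, revisions in attachments.items():
        att = ET.SubElement(atts, "attachment", {"name": filename})
        for rev, rev_author, rev_date in revisions:
            ET.SubElement(att, "revision",
                          {"rev": rev, "author": rev_author, "date": rev_date})
    return root


topic = build_topic(
    "Main", "WebHome", "NicholasLee", "24 Apr 2000",
    "Some *bold* text & a <b>tag</b>",
    {"spec.txt": [("1.1", "NicholasLee", "24 Apr 2000"),
                  ("1.2", "PeterThoeny", "25 Apr 2000")]},
)
xml_text = ET.tostring(topic, encoding="unicode")
print(xml_text)

# Reading it back: the parser un-escapes the body text.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("body"))                          # Some *bold* text & a <b>tag</b>
print([r.get("rev") for r in parsed.iter("revision")])  # ['1.1', '1.2']
```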
-- NicholasLee - 24 Apr 2000
I like that idea and was considering it myself. It would open the door to easily interface with other tools.
TWiki already uses a mix of XML/HTML for FileAttachments, which makes the attachment data easy to parse when uploading files. To see the tags, use your browser to look at the source of a TWiki page that has a file attachment.
Another question is how to migrate existing data to a new XML file format. Options:
- One time conversion: Provide script that converts all ".txt" and ".txt,v" files.
- Backward compatibility: The new TWiki engine understands both formats when reading topics (including older revisions) but saves only in the new XML format.
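A minimal sketch of the backward-compatibility option, again in Python and assuming a hypothetical `<topic>` layout with `header` and `body` elements: the reader accepts both formats, and the writer only ever emits XML. `load_topic` and `save_topic` are illustrative names, not existing TWiki functions.

```python
import xml.etree.ElementTree as ET


def load_topic(raw: str) -> dict:
    """Read a topic stored in either the legacy .txt or the new XML format.

    Assumed detection rule: text that parses as a single <topic> element is
    the new format; anything else (including plain text that merely starts
    with an HTML tag) is a legacy topic whose whole content is the body.
    """
    try:
        root = ET.fromstring(raw.strip())
    except ET.ParseError:
        root = None
    if root is not None and root.tag == "topic":
        header = root.find("header")
        meta = {c.tag: c.text or "" for c in header} if header is not None else {}
        return {"format": "xml", "meta": meta,
                "body": root.findtext("body", default="")}
    # Legacy .txt: no structured meta-data, just the topic text.
    return {"format": "txt", "meta": {}, "body": raw}


def save_topic(topic: dict) -> str:
    """Always write the new XML format, whatever format was read."""
    root = ET.Element("topic")
    header = ET.SubElement(root, "header")
    for key, value in topic["meta"].items():
        ET.SubElement(header, key).text = value
    ET.SubElement(root, "body").text = topic["body"]
    return ET.tostring(root, encoding="unicode")


old = load_topic("Plain *TWiki* text\n")        # legacy topic
converted = load_topic(save_topic(old))        # re-read after saving
print(old["format"], converted["format"])      # txt xml
```

With this approach a topic migrates lazily the first time it is saved; a one-time conversion script could simply call the same two functions over every file.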
-- PeterThoeny - 24 Apr 2000
Seems worthwhile to convert everything over to the XML format.
Once we get the format sorted out and actually implemented, it shouldn't be that hard to convert everything over.
There are a few issues to consider before we even go there, though, such as how to actually do differencing on the XML, especially if we include revision information in the XML format.
This, and the whole modularisation issue, requires some thought about the best way to abstract the components of the system: the versioning information (in the header) is required at the IO module level, while file attachment and similar information is required at the presentation layer (in the body).
-- NicholasLee - 24 Apr 2000
Starting HowShouldTWikiBeModularized to get the design under way. -- kk 02 May 00
Revision information does not necessarily need to be included in the XML format.
RCS (or any other VCS) could be used to check in XML files and read revisions. For example, to compare r1.2 and r1.3, TWiki would read the two XML file revisions, render them, and then diff them. External programs could interface with requests like "get me revision r1.x of Main.WebHome in XML" or "replace the last revision of Main.WebHome with this XML text".
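A sketch of the render-then-diff idea, assuming both revisions have already been read from the version control system. `render` here is a hypothetical, deliberately tiny stand-in for the real TWiki renderer (it only handles `*bold*` and `_italic_`); the diffing uses Python's standard `difflib.unified_diff`.

```python
import difflib
import re


def render(text: str) -> list[str]:
    """Toy renderer (hypothetical): *bold* -> <b>, _italic_ -> <i>."""
    html = re.sub(r"\*(\S[^*]*?)\*", r"<b>\1</b>", text)
    html = re.sub(r"_(\S[^_]*?)_", r"<i>\1</i>", html)
    return html.splitlines(keepends=True)


def diff_revisions(old_text: str, new_text: str,
                   old_rev: str = "r1.2", new_rev: str = "r1.3") -> str:
    """Render both revisions first, then diff the rendered lines."""
    return "".join(difflib.unified_diff(
        render(old_text), render(new_text), fromfile=old_rev, tofile=new_rev))


r12 = "Hello *world*\nSome _old_ text\n"
r13 = "Hello *there*\nSome _old_ text\n"
print(diff_revisions(r12, r13))
```

Diffing the rendered output means the comparison reflects what readers see, at the cost of running the renderer twice per diff.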
Other points to consider:
- What to show in the edit box? Keep, for example, *bold* and _italic_ but store them as XML tags? Probably not a good idea to expose XML to the user...
- What to do with non-conforming XML? TWiki allows you to enter any type of HTML tag, even incorrect ones. Is some error checking necessary at preview time?
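For the preview-time check, one option (an assumption, not something TWiki does today) is to wrap the edited text in a dummy root element and ask an XML parser whether it is well-formed. A Python sketch using `xml.etree.ElementTree`, whose `ParseError` carries a `(line, column)` `position`; `check_markup` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_markup(body: str) -> Optional[tuple[int, int, str]]:
    """Return None if `body` is well-formed XML, else (line, column, message).

    The text is wrapped in a dummy <body> root so that plain text and several
    top-level tags are allowed, as in a normal topic.
    """
    try:
        ET.fromstring(f"<body>{body}</body>")
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


print(check_markup("Some <b>bold</b> text"))   # None
print(check_markup("Some <b>bold text"))
```

Note that this is strict: ordinary HTML such as an unclosed `<br>`, an `&nbsp;` entity, or a bare `&` in "AT&T" is rejected, so deciding what counts as "non-conforming" is really a decision about how much HTML the format should accept.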
-- PeterThoeny - 03 May 2000
Unfortunately, I'm discovering that CVS might not be so good for us. Firstly, it can't do the version-replacing stuff in the current code. Secondly, and most importantly, it takes the author information from the login ID rather than from the command line, as RCS can. This is of course because of the environment CVS was designed for. We could get around this by storing some revision information in the data file.
It seems worth storing everything in XML; it makes it easier to design other tools to interact with the data, e.g. email and news tools.
-- NicholasLee - 04 May 2000
I believe that there is a versioning tool for XML that knows about XML structure. I've forgotten where I saw it, though.
In general, XML with XSL (when it gets implemented in more browsers) should provide the most flexibility. Indeed, XUL in Mozilla might provide the capability for some really neat downloadable tools.
-- JeffPutnam - 12 May 2000
I know I'm a fine one to talk, with all the silly tables and stuff I edit into topics, but... before we go too bonkers with the latest greatest browser XML/SL/UL hoo-hah, I'd like to mention that I'm editing this topic in Emacs, via Lynx, and if we come to rely on browser-specific tagging (like JavaScript) I won't be able to do that anymore, and I'll be all sad & stuff...
Maybe we should keep all of the hairy processing strictly on the server side, and spit out plain old HTML 4 to the browsers.
-- KevinKinnell - 12 May 2000
Actually, I don't see XML being that useful on the client side for a while, basically for the reasons Kevin stated.
On the server side, having read the rcs spec, I think I might convert it over to XML. That saves me having to write a parser (which I don't know much about).
Once the storage files are in a nice standard XML format with native Perl access methods, it becomes cheap and easy to introduce nice features like email, news, and gopher access. Maybe even WAP!
It also makes it easy to store further information inside the data files, e.g. authorisation information.
As for XML tools that might be of use: the XML diff program that Jeff mentioned is probably the IBM alphaWorks one, although it's in Java, and I haven't had time to look at it. The other thing of use might be one of the XML-oriented database tools. Hopefully those would save us from having to deal with things like caching.
As I go along, I'm investigating these options.
-- NicholasLee - 12 May 2000
I really like the XML idea if what we mean is "we come up with a system of TWikiTags to mark up edits." We could then safely add some dynamic content control (we have some already with inline tags, but I'm thinking of control structures: a TWiki macro language. Not right away, of course.) With our own tagging system, we can allow complex and dynamic markup without (necessarily) giving away security. I don't particularly care if what we do is called "XML" or whatever, though I see the point of following the buzz-flood in order to take advantage of third-party tools. One thing to note: going this route means that we hide all standard HTML under TWikiTags, so no more HTML <FONT> tags. Not that we can't define some TWikiTags to be completely identical to their HTML counterparts, of course; it would just mean we parse them as TWikiTags first.
I like writing parsers (but then, I'm a masochist.)
--Main.KevinKinnell - 13 May 2000
Want to create a grammar for RCS, based on the information in man 5 rcsfile? Particularly something that will work with one of the Perl Parse::* modules (e.g. http://search.cpan.org/search?dist=Parse-RecDescent). We could at least use the information there to remove some of the rcs system calls from the code as well, i.e. produce author and version regular expressions.
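A sketch of the "author and version regular expressions" idea, in Python, covering only a small assumed subset of the rcsfile(5) grammar: the `head` phrase and each delta's `date`/`author`/`state` line. It deliberately stops at `desc`, so the @-quoted log messages and deltatext are never scanned; a full grammar (e.g. for Parse::RecDescent) would still be needed to handle those. `SAMPLE` is a made-up file in the ,v layout, and `read_rcs_header` is a hypothetical name.

```python
import re

# Assumed subset of rcsfile(5): only the admin "head" phrase and each delta's
# "date ...; author ...; state ...;" line are matched. Everything from "desc"
# onwards (@-quoted log messages and deltatext) is ignored.
HEAD_RE = re.compile(r"^head\s+([0-9.]+);", re.MULTILINE)
DELTA_RE = re.compile(
    r"^([0-9.]+)\n"                # revision number on its own line
    r"date\s+([0-9.]+);\s*"        # e.g. 2000.05.14.10.00.00
    r"author\s+([^;\s]+);\s*"      # author id as stored in the file
    r"state\s+([^;\s]*);",         # e.g. Exp
    re.MULTILINE,
)


def read_rcs_header(text: str):
    """Return (head revision, [(rev, date, author, state), ...])."""
    admin_and_deltas = text.split("\ndesc\n", 1)[0]
    head = HEAD_RE.search(admin_and_deltas)
    deltas = DELTA_RE.findall(admin_and_deltas)
    return (head.group(1) if head else None), deltas


SAMPLE = (
    "head\t1.2;\naccess;\nsymbols;\nlocks; strict;\ncomment\t@# @;\n\n\n"
    "1.2\ndate\t2000.05.14.10.00.00;\tauthor NicholasLee;\tstate Exp;\n"
    "branches;\nnext\t1.1;\n\n"
    "1.1\ndate\t2000.04.24.09.30.00;\tauthor PeterThoeny;\tstate Exp;\n"
    "branches;\nnext\t;\n\n\n"
    "desc\n@@\n\n\n1.2\nlog\n@@\ntext\n@Hello\n@\n"
)

head, deltas = read_rcs_header(SAMPLE)
print(head)    # 1.2
print(deltas)
```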
With regard to the comments about actually using XML-style markup, I'm not considering that at the moment. The XML simply serves as a way for me to avoid writing a parser. Someone else can work on making quick XML::* parsers for us to use.
The benefit of this is that it is trivial to extend the DTD we come up with initially to include any extra information we might want, e.g. style markup and stylesheets. For now, a standard, well-defined native storage mechanism is what I'm hoping for. Of course, I have to find the time to write it.
Some tasks that need to be completed before this happens:
- Produce a DTD for a versioning storage format. (convert man 5 rcsfile)
- Create Perl::Patch to go with Algorithm::Diff (www.plover.com/perl/diff)
- rcs -> new format conversion tools.
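The diff/patch pair in the second task can be sketched in Python with the standard `difflib.SequenceMatcher`, whose `get_opcodes()` yields the edit operations between two line lists. `make_delta` and `apply_delta` are hypothetical names, and the delta format (start, end, replacement lines) is an assumption, not the RCS `a`/`d` deltatext syntax.

```python
from difflib import SequenceMatcher


def make_delta(old: list[str], new: list[str]) -> list[tuple[int, int, list[str]]]:
    """Diff: hunks (start, end, replacement) meaning old[start:end] -> replacement."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_delta(old: list[str], delta: list[tuple[int, int, list[str]]]) -> list[str]:
    """Patch: rebuild the new text from `old` and the hunks of make_delta."""
    result: list[str] = []
    pos = 0
    for start, end, replacement in delta:
        result.extend(old[pos:start])  # copy the unchanged lines before the hunk
        result.extend(replacement)     # add inserted/replacement lines
        pos = end                      # skip deleted/replaced lines of `old`
    result.extend(old[pos:])           # copy the unchanged tail
    return result


old = ["Hello\n", "world\n", "again\n"]
new = ["Hello\n", "there\n", "again\n", "bye\n"]

forward = make_delta(old, new)
print(forward)
print(apply_delta(old, forward) == new)   # True

# Reverse delta: enough to recover the previous revision from the newer one.
reverse = make_delta(new, old)
print(apply_delta(new, reverse) == old)   # True
```

Because RCS keeps the latest trunk revision in full and older ones as reverse deltas, a commit would store the full new text and a delta from it back to the previous revision, as in the reverse example.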
-- NicholasLee - 14 May 2000
Now you went and got me all interested in doing Pure Perl RCS, drat you! Hmmmmmm... It can't be that difficult (can it?) -- kk
Most of the grammar is in the man page; I just don't have the experience to convert it into a usable form. Seriously, parsing it for reads with regular expressions is trivial.(*) The tricky bit is getting a diff/patch module going so we can commit to it as well. Of course, if we had an rcs grammar for one of the Perl parsers, things would be even easier. I could probably figure it out myself, but since you seem to be the parser (puts his hands up) expert.....
(*) I was hoping to get a chance to have a look at it this week, but some more crap has fallen on my desk.
-- NicholasLee - 14 May 2000
I don't think that the XML format should contain any revision data. Depending on what you actually want to use TWiki for, the different version management systems each have their advantages and disadvantages. Remember that there are a lot of these systems: RCS, SCCS, CVS, ClearCase, Perforce, TeamWare, etc. The one thing all these systems have in common is that they can all process text files, as they were designed mainly to handle source code.
For this reason, it makes sense for TWiki to perform all its actions on a plain (XML) text file, and to "out-source" all revision handling to an external system.
Personally, I'm not a big fan of creating a TWiki-XML language for marking up the texts either. Using simple things like *'s to mark bold text, etc. is one of the main advantages of TWiki in my opinion, as it's very easy to learn for both technical and non-technical people.
As all the more powerful version control systems have their own merging and merge management systems, all handling for this should be done in a specific module for each revision control system, accessed through an abstraction layer.
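A minimal Python sketch of such an abstraction layer, assuming a small hypothetical interface (`RevisionStore` with `checkin`, `checkout`, `revisions`, `author_of`). Each version control system would get its own subclass; the in-memory `MemoryStore` only illustrates the contract. Passing the author explicitly on check-in sidesteps the CVS login-ID problem mentioned earlier, since each backend decides how to record it.

```python
from abc import ABC, abstractmethod
from typing import Optional


class RevisionStore(ABC):
    """Hypothetical interface the TWiki core would talk to."""

    @abstractmethod
    def checkin(self, topic: str, text: str, author: str) -> str:
        """Store a new revision and return its revision id."""

    @abstractmethod
    def checkout(self, topic: str, rev: Optional[str] = None) -> str:
        """Return the text of `rev`, or of the latest revision if None."""

    @abstractmethod
    def revisions(self, topic: str) -> list[str]:
        """Return all revision ids of `topic`, oldest first."""

    @abstractmethod
    def author_of(self, topic: str, rev: str) -> str:
        """Return the author recorded for `rev`."""


class MemoryStore(RevisionStore):
    """Toy backend keeping full copies in memory; RCS, CVS, etc. would
    each provide their own subclass with the same methods."""

    def __init__(self) -> None:
        self._history: dict[str, list[tuple[str, str]]] = {}

    def _index(self, topic: str, rev: Optional[str]) -> int:
        # Revision ids are "1.1", "1.2", ...; map them to list positions.
        history = self._history[topic]
        return len(history) - 1 if rev is None else int(rev.split(".")[1]) - 1

    def checkin(self, topic, text, author):
        history = self._history.setdefault(topic, [])
        history.append((text, author))
        return f"1.{len(history)}"

    def checkout(self, topic, rev=None):
        return self._history[topic][self._index(topic, rev)][0]

    def revisions(self, topic):
        return [f"1.{n}" for n in range(1, len(self._history.get(topic, [])) + 1)]

    def author_of(self, topic, rev):
        return self._history[topic][self._index(topic, rev)][1]


store: RevisionStore = MemoryStore()
store.checkin("Main.WebHome", "First draft", "NicholasLee")
store.checkin("Main.WebHome", "Second draft", "PeterThoeny")
print(store.revisions("Main.WebHome"))        # ['1.1', '1.2']
print(store.checkout("Main.WebHome", "1.1"))  # First draft
```

The TWiki core would only ever see `RevisionStore`, so swapping RCS for another system means writing one new subclass rather than touching the rendering code.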
-- AlainPenders - 22 May 2000
I would have to say I agree with the "outsourcing" sentiment, but I'm not sure it's practical yet. I favor a system-independent abstraction layer for storage, but that requires an almost complete rewrite/modularization of TWiki. It's in the works, but don't look for a release date yet: PeterThoeny has the throttle, and he seems to be maintaining enough speed to get there without going so fast that it crashes & burns.
Then too, NicholasLee looked into TWikiWithCVS and determined it wasn't feasible at the present time. The LoadTesting problems don't go away unless TWiki stops making system calls, so we may need to duplicate a lot of functionality inside TWiki if that saves system time and resource overhead.
Something to remember is that this iteration of TWiki has been driven by a user base that installs it on intranets behind firewalls, which means it needs to be optimized for smaller/slower servers. My guess would be that it gets installed as a "test" platform...
Boss : What's that?
Tech : Oh, that's just something we're testing.
There are bigger installations (this is one, f'rinstance.)
As far as XMLizing the TWiki markup system goes, that can be done transparently; most users would never need to know any capabilities other than _, *, and __.
-- KevinKinnell - 22 May 2000
I think you've missed the comments I made. I intend to create native Perl access to the file storage mechanism. Constantly opening system pipes to an external version control system is not scalable at all: any given read currently involves at least 3-4 system pipes.
Of course, if there is a free version control system with an API to its functions, I can just write a Perl wrapper around that. Otherwise, I'm just going to convert the rcs spec into XML and use that.
Why I use XML:
- I can't be bothered writing and maintaining a parser.
- It makes the application logic layer easy to port.
The big thing driving this is the constant demand for CVS-style locking. Using CVS is never going to work in a high-load environment, plus I'm not sure I can meld TWiki's current features onto CVS's offering.
Consider that the direction I'm taking with XML at the moment is purely for META-information and has nothing to do with markup.
-- NicholasLee - 22 May 2000
Nope, I caught all that. I just didn't reiterate it very well. ;-) But you know my take on the "locking problem": it's perceptual, not functional...
Groucho: They tried to use the sanity clause in my contract to fire me.
Zeppo: They did?
Groucho: Yeah, but they can't fool ME, 'cause there AIN'T no Sanity Clause!
-- kk
Still, things like WikiClusters are going to need some fancy back-end locking and file storage schemes.
-- NicholasLee - 23 May 2000