I'm also going through existing comments to see if anyone else has tried to come up with a good description of the current logical breakdown of a TWiki page in this kind of way. Any pointers from others would be appreciated - MS
Structure of a TWiki
This page aims to document how TWiki pages are currently used from a structural perspective, to look at the positives of how this structure is used and stored, and to look at the key "pain points" it currently causes. The aim is not to redesign from scratch and throw away, but to move the current advantages forward. I would expect the step after that to cover TWiki Webs as well, but one thing at a time.
TWiki Pages
By default a TWiki page can be viewed as structureless - a featureless void, where only text exists. Then came formatting, then came automatic linking [1], followed by whole rafts of structure in the form of multiple webs, category tables which evolved into TWikiForms, tables of contents, and even the ability to include content not just from local topics, but from remote sites.
- [1] NOTE: I'm making history up as I go here ...
Then people started doing more wacky things and discovered the need for more control over includes, skins, hierarchical (logical, physical, nested, non-nested, green, even purple Wikis), finer control over existing access control, finer grained addressing, etc.
Moving to a slightly more serious mode, it's struck me that the way I think about TWiki pages might not be the same as the way you do. The same probably goes for others. I think that in order to keep this discussion moving forward, looking at the key pain points (IMO speed & flexibility) is worth doing. I want to produce extras to address the speed issue first, without sacrificing the current flexibility.
The following is how I view a TWiki structurally, both currently and where I would like my install to go. In case it's of any use I'm sharing this here. Opinions will differ, I hope.
Some TWiki Document Abstraction Levels
TWiki Pages have various levels of abstraction. I illustrate a few types here for discussion, rather than a strict grammar - TWiki doesn't follow a strict grammar. This isn't exhaustive, obviously.
(I would prefer BNF here, but for discussion purposes I'll use illustrations)
Page Abstraction 1
- Page
- freeform text with markup & autolinking
Page Abstraction 2
- Page
- freeform text with markup & autolinking
- freeform text
- Access Control
- TWikiForms
Page Abstraction 3
- Page
- freeform text with markup & autolinking
- text
- Access Control
- TOC
- Sections
- H1 (named start) sections
- H2 (named start) sections
- H3 (named start) sections ...
- Anchors
- TWikiForms
- Collections of keys & values.
- Each stored as META fields
Page Abstraction 4
- Page
- Text
- freeform text
- TOC
- Sections
- H1 (named start) sections
- H2 (named start) sections
- H3 (named start) sections ...
- Anchors
- Auto anchored fine grained addressing
- Auto section detection.
- Meta Data Storage
- Page Meta Data
- TWikiForms
- Collections of keys & values.
- Each stored as META fields
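As a rough illustration only, Page Abstraction 4 could be written down as a small data model. This is a sketch for discussion, not TWiki's actual internals: the class and field names (`Page`, `Section`, `page_meta`, `form_fields`) and the sample values are all made up here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Section:
    """A heading-delimited section; level 1 = H1, 2 = H2, and so on."""
    level: int
    title: str
    anchor: str          # explicit or auto-generated anchor name
    body: str = ""


@dataclass
class Page:
    """Hypothetical model of Page Abstraction 4 (not TWiki's real classes)."""
    text: str = ""                                              # freeform text
    sections: List[Section] = field(default_factory=list)
    page_meta: Dict[str, str] = field(default_factory=dict)    # page metadata
    form_fields: Dict[str, str] = field(default_factory=dict)  # TWikiForms keys & values

    def toc(self) -> List[str]:
        """Derive a table of contents from the sections, indented by level."""
        return ["  " * (s.level - 1) + s.title for s in self.sections]


# Illustrative values only.
page = Page(
    text="Some freeform text",
    sections=[Section(1, "Intro", "Intro"), Section(2, "Detail", "Detail")],
    form_fields={"Status": "Draft"},
)
print(page.toc())
```

The point of the sketch is that the TOC is derived from the sections rather than stored, while form fields and page metadata sit beside the text rather than inside it.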
Current Uses of a TWiki Page
Does this cover what people have been saying they want, and use? Nope. What do we have?
- Data - which contains other chunks of data. Some of these chunks overlap. (Named sections and auto header sections) Some of the data is in fact metadata. TWikiVariables & access control being the most obvious examples.
- TWikiVariables describe:
- Pieces of information able to be included on any page. (Fine grained addressing of some pieces of information)
- Meta data about a web.
- Layout and presentation of the page - PreferencesAsStyleSheets
- Templates should be editable at the browser level : WebBasedTemplateMaintenance . If this happens, topics are "suddenly" templates as well
- INCLUDEs from other sites. At the most basic level you're essentially using WebServices, on a high level you could have a site that reviews other sites, and pulls the front page of the site in locally, in a spirit similar to the comments in AnnoteaProject
- Plugins essentially allow the en-masse replacement of TWiki's backend storage using databases, or generated systems, or even alternate languages. This means a large amount of data is actually now stored for some people in VirtualTopics
- INCLUDES/Plugins to include at a low level for things like SharedAdminTopics for web preferences. (Again this is a form of VirtualTopics)
- META data - The meta here is a misnomer... TWiki META data is:
- TWikiForms - which often contain structured data
- In some TWiki webs considered part of the information on the page
- In some TWiki webs considered to be data about the information on the page.
- Structural information...
- Contains pointers to attachments. Attachments are more data directly considered "part" of the page by virtue of being "attached". However this structural information contains meta data regarding each attachment:
- Original source filename on user's original system (security issues...)
- Version number relating to the file
- Date of upload
- File size
- Uploading user
- A comment. (can be considered data)
- Contains a pointer to a parent topic.
- A "TopicInfo" tag:
- Contains last editor
- Contains time of last edit
- A "format" version indicator.
- A version indicator (the version number of the current document)
- Additional information that stores information about the topic but is not rendered directly
I use the term META rather deliberately to indicate TWiki's %META tag. How this META data is used varies: some of it (forms, for example) can be used as part of the data of the page. Other metadata is truly metadata.
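To make the list above concrete, here is a minimal sketch that separates the body of a topic from its %META lines. It assumes each META entry sits on its own line in the form `%META:NAME{key="value" ...}%`; the sample topic, the user name and the field values are invented for illustration, and escaped quotes inside values are not handled.

```python
import re
from typing import Dict, List, Tuple

# Assumed on-disk shape: %META:NAME{key="value" key2="value2"}% on its own line.
META_LINE = re.compile(r'^%META:(\w+)\{(.*)\}%\s*$')
ATTR = re.compile(r'(\w+)="([^"]*)"')


def split_topic(raw: str) -> Tuple[str, List[Tuple[str, Dict[str, str]]]]:
    """Separate the topic body from its META lines.

    Returns (body_text, [(meta_name, attributes), ...]). META entries are
    kept in file order because some names (e.g. FIELD) can repeat.
    """
    body, meta = [], []
    for line in raw.splitlines():
        m = META_LINE.match(line)
        if m:
            meta.append((m.group(1), dict(ATTR.findall(m.group(2)))))
        else:
            body.append(line)
    return "\n".join(body), meta


# Hypothetical topic content, for illustration only.
sample = '''%META:TOPICINFO{author="ExampleUser" date="1056000000" format="1.0" version="1.3"}%
Some topic text.
%META:TOPICPARENT{name="WebHome"}%
%META:FIELD{name="Status" title="Status" value="Draft"}%'''

body, meta = split_topic(sample)
print(body)
for name, attrs in meta:
    print(name, attrs)
```

Seen this way, TOPICINFO and TOPICPARENT are metadata in the strict sense, while FIELD entries may well be page data that merely shares the same storage syntax.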
Advantages and Disadvantages of the Current Page Model
None of these are absolutes of course.
What's good about this being all in one file?
- Everything is kept under version control meaning audit trails and reversion of all aspects is "simple" to do. (Simpler than if things weren't kept together)
- Copy the topic & you've copied all its meta data. (Except its history of course...)
- Copy the ,v file and you have the topic, and its metadata throughout its entire existence.
- If you want access to earlier versions of a page, this is trivial to do.
- System level tools to find information work. ( grep information |cut -d: -f1|uniq )
What's bad?
- People access data at a finer granularity than just the whole topic - TWikiVariables and Meta Data spring to mind.
- Access control and page rendering specifically require access to information at this fine grained level in order to check permissions, replacements, and even page layout
- Rendering a page can cause a large number (>4) of files to be read, even for a page anyone can read, using the default skin, with no custom TWikiVariables. (A common case, I suspect)
- A business may need to roll back the topic but not roll back the access control.
- Integration of TopicText and metadata can cause problems here - suppose a user changes name, and they're the only user allowed to edit/view the data for an existing page. If the content is rolled back, the authorisation may refer to the old name, locking people out. (Except for the admin group of course)
- A business may need to be able to roll back the topic body separately from the attachments.
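The roll-back problem can be sketched as follows. This is an illustration under assumptions, not how TWiki behaves today: it assumes access control is held in the topic text as bullet settings of the form `   * Set ALLOWTOPICCHANGE = ...`, and the function names and sample user names are hypothetical.

```python
import re
from typing import List

# Assumption: access control lives in the topic text as bullet settings such as
#    * Set ALLOWTOPICCHANGE = Main.SomeGroup
ACL_LINE = re.compile(r'^\s+\*\s+Set\s+(ALLOW|DENY)TOPIC\w+\s*=')


def acl_lines(text: str) -> List[str]:
    """Return only the access control setting lines of a topic."""
    return [line for line in text.splitlines() if ACL_LINE.match(line)]


def roll_back_body(old_text: str, current_text: str) -> str:
    """Return old_text's content, but with current_text's access control lines.

    Simplification: the current settings are appended at the end instead of
    being restored to their original position in the text.
    """
    kept = [line for line in old_text.splitlines() if not ACL_LINE.match(line)]
    return "\n".join(kept + acl_lines(current_text))


# Hypothetical revisions: the only authorised user was renamed between them.
old = "Old body\n   * Set ALLOWTOPICCHANGE = Main.OldName"
current = "New body\n   * Set ALLOWTOPICCHANGE = Main.NewName"
print(roll_back_body(old, current))
```

The content goes back to the old revision while the authorisation still names the current user, which is exactly the separation a single-file roll-back cannot give.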
Why are the bad things bad?
- You can access the data and use it in this way - what's wrong with that?
- It relates directly to the page - keeps everything together and user editable in a "simple" manner (arguable, but I believe that to be true). What's wrong with that ?
- Allows access control to be added, and controlled by users, and allows creation of random groups of authorisation without a central admin. What's wrong with that ?
- In the general case since things are only defined once (the point of normalisation in databases) what's wrong with that ?
- The problem is speed. Pure and simple. There are no indexes into this "normalised" information. There's no caching of information.
- Some things that are stored as metadata aren't metadata, they're structured data. TWikiForms springs to mind here. The reason for this is metadata is TWiki's way of storing structured data.
- Some things that are meta-data are stored as data.
- Structured data (meta data or data) is stored mixed in with the data, precluding interesting techniques for working with it. TWiki's search is very useful, but database joins of TWiki data & pages would be very nice.
- What's bad about the things that are listed as bad is that the body, metadata, preferences and access control are logically orthogonal - they should be able to be accessed independently of each other.
- This argument is equivalent to stating that every table in every database must be in Boyce-Codd normal form (or even 4th/5th), which would be naive. The common case when reading a topic - the most common action - is that all this data relating to a topic is required. Keeping access to all data in a file requiring 1 access is optimal. The problem comes when reading multiple files is required.
- This point has no information regarding how data is currently used, and WHAT data is currently used over and above that which is detailed above.
Next Steps
These are various ideas as to where to go next, some will be more concrete than others. These comments don't relate to HOW TWiki is currently being used. This will be factored out to a separate topic as this page grows. (They are forces on a potential later implementation)
Meta Data - produce cache/index into topics
How to resolve? (Initial requirements for resolution, not all required)
- We're talking about optimisation. First rule of optimisation is to only optimise where it must be done, where it will have the biggest impact, cause the least disruption, and NOT to break anything. This is hard.
- Keep the current disk format as is, the advantages of history, auditing, simplicity of backup, and copying of data are too good to lose.
- Move all structured data into TWikiForms. This includes web preferences. This would require the ability to add extra metadata fields for the user.
- Keep the current interface. TWiki already performs conversions before storage to disk, this would be no different.
- When systems are writing to disk, a metadata cache or better index would be updated with key, value semantics. How to deal with repeated fields needs addressing. Normally with databases you'd do something nice like treat the objects as a weak entity, but in this case the repeating fields will be determined by unruly users.
Possible techniques for improvement:
- Flatfile/ondisk hash - kept in sync with topic, but not required for functionality - purpose is to act as an index/cache.
- Impacts on usage, hierarchy of data? Nested data? Order? Version history of metadata? Current, or not ?
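The flat-file index idea could look something like the sketch below. It is only one possible shape, under assumptions: META entries are taken to be lines of the form `%META:NAME{key="value"}%`, the index is a JSON sidecar file with a made-up `.idx` suffix, and repeated fields are handled by storing a list per META name. Hierarchy, nesting and version history of metadata are not addressed.

```python
import json
import os
import re
import tempfile
from collections import defaultdict
from typing import Dict, List

META_LINE = re.compile(r'^%META:(\w+)\{(.*)\}%\s*$')
ATTR = re.compile(r'(\w+)="([^"]*)"')


def build_index(raw: str) -> Dict[str, List[Dict[str, str]]]:
    """Map each META name to a list of attribute dicts (lists keep repeats in order)."""
    index = defaultdict(list)
    for line in raw.splitlines():
        m = META_LINE.match(line)
        if m:
            index[m.group(1)].append(dict(ATTR.findall(m.group(2))))
    return dict(index)


def save_topic(path: str, raw: str) -> None:
    """Write the topic in its existing format, then refresh the sidecar index."""
    with open(path, "w") as f:
        f.write(raw)
    with open(path + ".idx", "w") as f:
        json.dump(build_index(raw), f)


def load_meta(path: str) -> Dict[str, List[Dict[str, str]]]:
    """Use the index when present and fresh; otherwise fall back to the topic."""
    idx = path + ".idx"
    if os.path.exists(idx) and os.path.getmtime(idx) >= os.path.getmtime(path):
        with open(idx) as f:
            return json.load(f)
    with open(path) as f:
        return build_index(f.read())


# Hypothetical topic name and field values, for illustration only.
with tempfile.TemporaryDirectory() as d:
    topic = os.path.join(d, "ExampleTopic.txt")
    save_topic(topic, '%META:FIELD{name="Status" value="Draft"}%\nBody text\n'
                      '%META:FIELD{name="Owner" value="ExampleUser"}%\n')
    from_index = load_meta(topic)
    os.remove(topic + ".idx")      # the index is optional: drop it ...
    from_topic = load_meta(topic)  # ... and the topic alone still answers
print(from_index == from_topic)
```

The topic file stays the single source of truth, so history, backup and copying work as before; the index can be deleted and rebuilt at any time.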
Provide profiles for Operating functionality
Example approaches:
- CPAN:CGI::KWiki, mentioned elsewhere by PeterMaiser, uses a config file to determine which modules get loaded so as to give a (radically) different operating profile.
- Again this is an optimisation - if more of the core was moved out as plugins, activating & deactivating modules would be much simpler.
- This is similar to TWiki using WebPreferences activating and deactivating Plugins, except clearly more efficient because it is done via object mechanisms, but at the expense of user level editing of the configuration in topics.
- Will this need a per-site config file beyond what we have now?
- Depends on what is meant by "per site" - Per "web"? or Per domain? Per machine? The TWiki Installer provides for localisation since it was designed to be used in a DistributedTWiki setup. (geographic, and hence machine)
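A config-driven profile might be sketched like this. Nothing here is taken from CGI::KWiki or TWiki: the feature names, the INI layout and the `load_profile` and `render` helpers are assumptions made for the example, and the "features" are toy text transforms standing in for real modules.

```python
import configparser
from typing import Callable, Dict, List

Step = Callable[[str], str]

# Hypothetical feature registry; in a real system each entry would be a module.
FEATURES: Dict[str, Step] = {
    "autolink": lambda text: text.replace("WebHome", "[WebHome]"),
    "upper_headings": lambda text: "\n".join(
        line.upper() if line.startswith("---+") else line
        for line in text.splitlines()),
}

# Hypothetical per-site config file: one section per operating profile.
CONFIG = """
[minimal]
features = autolink

[full]
features = autolink, upper_headings
"""


def load_profile(config_text: str, profile: str) -> List[Step]:
    """Return the processing steps enabled for the named profile, in order."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    names = [n.strip() for n in parser[profile]["features"].split(",") if n.strip()]
    return [FEATURES[n] for n in names]


def render(text: str, pipeline: List[Step]) -> str:
    """Run the text through each enabled step."""
    for step in pipeline:
        text = step(text)
    return text


source = "---+ Title\nSee WebHome"
print(render(source, load_profile(CONFIG, "minimal")))
print(render(source, load_profile(CONFIG, "full")))
```

Features left out of a profile are never invoked at all, which is where the efficiency comes from; the cost, as noted above, is that the profile is no longer edited as a topic.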
Example uses:
- Minimalist wiki sites like the Portland Pattern Repository. These sites view themselves as having no user need for attachments, forms, access control. They are really just a hyperlinked scratchpad.
Organisational Impacts on Current Usage
- There may be a need for storage of webs, or webs' attachments, on read-only removable media, either in "published" or non-published form. Use cases:
- Archival & backup
- Data on the move - such as laptops, PDAs, credit card CDs. (Photo albums to give to friends/family, presentations) Notes:
- Solutions for published versions exist already using addons
- Issues: Anything performing storage (caching, indexing, editing)
- Requirement for storage of webs, or webs' attachments, on remote media or remote servers, determined by fully qualified URI: URN or URL. Use cases:
- DistributedTWiki style (synchronisation)
- Radio Userland style (remote procedure call) Notes:
- Interwiki implements URN based page addressing.
- Issues regarding access to META data, and non-page objects. (TWiki's raw=debug mode, along with viewfile & changes listing, nearly resolves a number of issues)
Notes
Comments from below that were factored into the above have been removed. Any ideas left aren't necessarily bad, it just means I've not thought of a place for them. For original comments see diffs. If anything has dropped out, please put it back if you think things have been missed.
A request... Discussion that moves away from the initial aim of documenting the current use cases of TWiki pages, or structural aspects of TWiki pages (not webs), might be best linked from this page rather than held on this page. Perhaps including a summary of the page?
Can this page please only include information. Without links or documentation to what's being tried or has been done, the following is just "noise" (no content):
- AntonAylward is experimenting with an approach for a radically different code-base where (almost) everything is done by using objects that override base classes.
- ... This is the basis for the version AntonAylward is currently experimenting with. "Everything is a plugin".
- "Business Needs" - are user requirements. This page should document existing in-use-cases first.
- But like "The Panda's Thumb" it is a design that is limiting.
They're worthwhile activities, but they don't add to this discussion of how pages in TWiki are being used, and therefore viewed. Discussion of the codebase is premature for this page. This page is about WHAT is wanted, and WHY. Not how. (That's actually relatively simple in comparison) Sources of where the data causes problems match this statement. As noted above, references to topics concretely discussing either design or implementation are great, but this is intended as a requirements page.
-- MichaelSparks - 19 Jun 2003
Related Topics on Usage
Discussion
Thoughts?
Factored out
AntonAylward's comment re optimisation being hard:
Inappropriate optimisation IMO. Problems:
- Changes the logic regarding how preferences are used.
- Appears to change the logic of final prefs.
- Storage mechanism appears to fail to deal with the fact that the order of variables is significant. (As noted in docs, WebPreferences and the code)
- The hash storage mechanism is inherently flat, in practice prefs are at minimum 4 layers deep in many installations. (Site, Web, Personal, Topic, Final)
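The layering and ordering concerns in the list above can be illustrated with a small sketch. It is an assumed simplification rather than TWiki's actual preference code: each layer is an ordered list of settings, a layer may name variables in `FINALPREFERENCES`, and those names cannot be overridden by later (more specific) layers. The variable values below are invented.

```python
from typing import Dict, List, Tuple

Layer = List[Tuple[str, str]]  # ordered (name, value) pairs; order matters


def resolve_prefs(layers: List[Layer]) -> Dict[str, str]:
    """Apply layers from most general to most specific.

    Within a layer, a later setting overrides an earlier one. A layer may
    set FINALPREFERENCES to a comma-separated list of names; once the layer
    has been processed, those names are locked against later layers.
    """
    prefs: Dict[str, str] = {}
    final = set()
    for layer in layers:
        newly_final = set()
        for name, value in layer:
            if name == "FINALPREFERENCES":
                newly_final.update(n.strip() for n in value.split(",") if n.strip())
            elif name not in final:
                prefs[name] = value
        final |= newly_final
    return prefs


# Hypothetical settings for the Site, Web, Personal and Topic layers.
site = [("SKIN", "classic"), ("EDITBOXWIDTH", "70"), ("FINALPREFERENCES", "SKIN")]
web = [("SKIN", "fancy"), ("WEBBGCOLOR", "#FFD8AA")]
personal = [("EDITBOXWIDTH", "100")]
topic = [("WEBBGCOLOR", "#FFFFFF")]

print(resolve_prefs([site, web, personal, topic]))
```

A single flat hash can hold the result of this resolution, but not the layers, their order, or the finals that produced it, which is the gap the bullets above point at.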
Source of problem is it doesn't deal with how prefs are currently used. Nice start however. Statistics about improvements in speed would be interesting.
-- MichaelSparks - 20 Jun 2003
Contributors:
-- MichaelSparks - 18 Jun 2003
-- AntonAylward - 19 Jun 2003
-- ThomasWeigert - 20 Jun 2003