The Problem
Discussions in
HierarchicallyNestedTwikiWebsNaming
and
RenameMainWebToHome
have touched on the issue:
Never break existing URLs!
I generalize this issue:
if at time T0 there is a valid reference R to an object O,
then at any later time TF, TF > T0, that reference should still be
"good for something".
Where "good for something" includes
- pointing to the latest version of the object
- telling you that the object was deleted, and no longer exists
- pointing you to the renamed object O', and/or taking you directly to the renamed object O' but warning you that the old reference is deprecated
- telling you that the old object O has been "split" into different objects O1 and O2, and pointing you to them and/or taking you to them
- telling you that the reference is valid, but that you no longer have permission to access it.
Unfortunately SecurityMayWantNonpermittedLinksToAppearInvalid.
- telling you that some other object Odifferent has assumed the name O; you probably want Ooriginal
- etc.
But not giving you a dangling reference that is indistinguishable from
the error for an object that never existed. And ideally taking you to
the object, or its most reasonable descendant.
As you may note above, renaming is not the only example of the problem.
This issue applies to many aspects of computer systems, including
- wikis
- version control systems (like CVS)
- file systems
etc.
Internal Solution
Systems like TWiki provide a modicum of support for object renaming,
by renaming all references inside the database.
If done correctly and completely, this is good.
(Unfortunately, TWiki has minor bugs renaming some free links.
Also, TWiki's renaming of the
RCS files means that,
if you go back in history to an earlier version of the database,
the links may be a problem. (I.e. the history isn't renamed
- and, in general, you do not want the history to be renamed.
In some cases you want history to be renamed; others not.))
But it doesn't help references from outside that point into the TWiki database.
Unique Object IDs over all Time
The classical solution, used by tools based on databases,
is typically to provide an object "name" which is
something like a unique number that is always incremented
and never gets reused.
Such an object ID is purely a handle,
and is not at all related to the object's position in
the logical structure.
I.e. it is a name that has no semantic content.
Trouble is, people just plain don't like typing such object IDs.
You can't tell anything about the object by looking at the name.
Also, a fairly common activity is to
- Have logical object L with object id #1
- Delete the logical object L => object id #1 gets marked deleted
- Create a new object LL that is supposed to do everything that the original object L did. But, because it is new, it gets a new object id #2.
I.e. accidental deletions happen, and do not always get fixed by
reviving the old object.
There has to be some mechanism mapping #2 to #1.
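A minimal sketch of such a mechanism, assuming an in-memory table. The class name `ObjectTable` and its methods are hypothetical, invented here for illustration; they are not taken from any particular database tool. IDs come from a counter that only goes up, deletion just marks an ID, and a replacement link lets a reference to #1 find #2.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ObjectTable:
    """Hypothetical table of opaque object IDs that are never reused."""
    next_id: int = 1
    deleted: Dict[int, bool] = field(default_factory=dict)     # id -> marked deleted?
    replaced_by: Dict[int, int] = field(default_factory=dict)  # old id -> newer id

    def create(self, replaces: Optional[int] = None) -> int:
        oid = self.next_id
        self.next_id += 1            # always incremented, so no ID is handed out twice
        self.deleted[oid] = False
        if replaces is not None:
            self.replaced_by[replaces] = oid   # record that oid takes over from the old ID
        return oid

    def delete(self, oid: int) -> None:
        self.deleted[oid] = True     # the ID stays known; it is only marked deleted

    def current(self, oid: int) -> Optional[int]:
        """Follow replacement links to the live successor, or None if there is none."""
        seen = set()
        while oid in self.replaced_by and oid not in seen:
            seen.add(oid)
            oid = self.replaced_by[oid]
        return None if self.deleted.get(oid, True) else oid
```

With this sketch, creating L, deleting it, and then creating LL with `replaces` set to L's ID means that `current` on the old ID leads to the new one instead of to a dead end.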
The Solution
Here is the supposedly well-known solution.
Let the object reference (the URL, if we are talking about wikis)
consist of two parts:
- The logical object name
- The object number
There may also be an object version number, but we'll forget about that.
The logical object name may well correspond to a physical filesystem pathname.
Or, it may correspond to an entry in a table.
Given a full object reference of (name,number), the system will
- if the same logical name and number still exist, provide the latest version
- if the name no longer exists, but the number does under another name, point you to the renamed object (optionally warning)
- the system may be keeping a history of all names, so it may well expect to have a matching (name,number) entry in its history. But not all systems bother to keep a complete history.
- if the name exists, but has a different object number, and the object number still exists, warn you that a renaming and replacement has been done
- there should never be a case where the number does not exist. The rule is that the number will never be reused.
Given a partial object reference of (name,number=unknown)
- if the logical name still exists, point you to the latest object
- if the logical name does not now exist, but you have a history of logical names, use that
- if the old object of that name was renamed, point to that
- similarly for deleted
- if the system doesn't create a history of logical names, you may lose
Given a partial object reference of the form (name=unknown,number)
- if the object still exists, whatever its logical name, point to it
- if the object existed, but no longer does, indicate that
And so on. The object mapping database can track renamings, deletions,
mergings, splits, and other evolutions of the object.
The logical name helps span deletions and restorations.
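The resolution rules above can be sketched as follows. This is only an illustration under stated assumptions: `ObjectMap`, `Resolution`, the status strings, and the topic names used in the usage note are all hypothetical, and the sketch keeps a full history of names, which the text notes not every system does. Merges and splits are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Resolution:
    status: str                  # "ok", "renamed", "replaced", "deleted" or "unknown"
    number: Optional[int] = None
    name: Optional[str] = None   # current logical name, when the object is live


class ObjectMap:
    """Hypothetical mapping database between logical names and object numbers."""

    def __init__(self) -> None:
        self._next = 1
        self.name_of: Dict[int, Optional[str]] = {}  # number -> current name (None = deleted)
        self.number_of: Dict[str, int] = {}          # live name -> number
        self.history: Dict[str, List[int]] = {}      # name -> numbers that ever held it

    def _bind(self, name: str, number: int) -> None:
        if name in self.number_of:
            raise ValueError(f"name already in use: {name}")
        self.name_of[number] = name
        self.number_of[name] = number
        self.history.setdefault(name, []).append(number)

    def create(self, name: str) -> int:
        number = self._next
        self._bind(name, number)
        self._next += 1          # numbers only go up, so none is ever reused
        return number

    def rename(self, number: int, new_name: str) -> None:
        old = self.name_of[number]
        if old is None:
            raise ValueError("cannot rename a deleted object")
        if new_name in self.number_of:
            raise ValueError(f"name already in use: {new_name}")
        del self.number_of[old]
        self._bind(new_name, number)

    def delete(self, number: int) -> None:
        old = self.name_of[number]
        if old is not None:
            del self.number_of[old]
            self.name_of[number] = None

    def resolve(self, name: Optional[str] = None,
                number: Optional[int] = None) -> Resolution:
        if number is not None:
            if number not in self.name_of:
                return Resolution("unknown")         # this number was never issued
            current = self.name_of[number]
            if current is None:
                return Resolution("deleted", number)
            if name is None or name == current:
                return Resolution("ok", number, current)
            if name in self.number_of:               # another object now holds the name
                return Resolution("replaced", number, current)
            return Resolution("renamed", number, current)
        if name in self.number_of:                   # name only, and the name is live
            return Resolution("ok", self.number_of[name], name)
        past = self.history.get(name)
        if past:                                     # fall back on the name history
            return self.resolve(name=name, number=past[-1])
        return Resolution("unknown")
```

For example, after `n = m.create("Main.SomeTopic")` and `m.rename(n, "Home.SomeTopic")`, both `m.resolve("Main.SomeTopic", n)` and the partial reference `m.resolve(name="Main.SomeTopic")` report `"renamed"` and carry the new name. A reference that was once valid never comes back as `"unknown"`.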
What this would look like for a wiki
A TWiki URL looks like this:
With an object number added, it might look like
But, the following partial references would still be valid
The object can be logically renamed in the wiki or the filesystem
but still found.
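As a rough sketch of how a wiki front end might split such a reference into its two parts: the URL layout below (a TWiki-style `/view/Web/Topic` path plus an `objnum` query parameter) is an assumption made purely for illustration, as are the host `wiki.example.org` and the function name `parse_reference`. It is not the format proposed on this page.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def parse_reference(url: str) -> Tuple[Optional[str], Optional[int]]:
    """Split a URL into (logical name, object number); either part may be None."""
    parts = urlsplit(url)
    prefix = "/view/"                      # assumed script path separator
    idx = parts.path.find(prefix)
    name = parts.path[idx + len(prefix):].strip("/") if idx >= 0 else ""
    values = parse_qs(parts.query).get("objnum", [])   # hypothetical parameter name
    number = None
    if values and values[0].isascii() and values[0].isdigit():
        number = int(values[0])
    return (name or None, number)
```

Under these assumptions, a URL with both parts yields a full (name, number) reference, while a URL missing the query parameter or the topic path yields one of the partial references, which the mapping database can still resolve.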
What this looks like for CVS
This renaming technique works okay for CVS, too.
(It has been built.)
Problem: CVS is embedded in the UNIX filesystem.
Users can copy directory subtrees around at any time.
If you do so, you really
should copy (and version) the
database that maps object numbers to logical names, etc.
And that is the basic problem.
If the database is maintained in the root of the repository
or CVS module, ordinary UNIX manipulations will not keep it consistent.
If the database is maintained in each directory, it is more likely
to be consistent.
But the overall problem is keeping it consistent.
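To illustrate how ordinary file operations leave such a database behind, here is a small sketch that assumes a per-directory map stored as JSON. The file name `.objmap.json`, its layout, and the function `check_directory` are hypothetical, and nothing like them is part of CVS.

```python
import json
import os
from typing import Dict, List, Tuple

MAP_FILE = ".objmap.json"   # hypothetical per-directory map: {"file name": object number}


def check_directory(path: str) -> Tuple[List[str], List[str]]:
    """Return (names in the map but missing on disk, files on disk with no map entry)."""
    map_path = os.path.join(path, MAP_FILE)
    mapping: Dict[str, int] = {}
    if os.path.exists(map_path):
        with open(map_path, encoding="utf-8") as fh:
            mapping = json.load(fh)
    on_disk = {n for n in os.listdir(path) if n != MAP_FILE}
    stale = sorted(set(mapping) - on_disk)       # renamed or removed behind the map's back
    untracked = sorted(on_disk - set(mapping))   # created or moved in without an entry
    return stale, untracked
```

A plain `mv a.txt b.txt` in such a directory would show up as one stale entry and one untracked file. The check can detect the inconsistency, but it cannot tell whether `b.txt` is the old object renamed or a new object.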
CVS's big strength is that it uses ordinary files and ordinary
directories. But that is also the problem.
NOTE: filesystems that provide hooks or callbacks on common filesystem
operations solve these problems.
E.g. a database that can be mounted as a UNIX filesystem.
--
AndyGlew - 17 Apr 2003