
Feature Proposal: Repository for site metadata, web metadata, and more

Motivation

  • If you have hundreds or thousands of webs on a TWiki site, web metadata in a data repository is useful.
    • It can make possible things that are otherwise impossible.
    • It can make things more efficient.
  • If you run a federation of TWiki sites (details below), site metadata is necessary.
  • A mechanism to store and retrieve metadata of sites and webs in a uniform manner is handy for a large and federated TWiki installation.
  • There are cases where other kinds of metadata need to be handled.

Description and Documentation

Basics of the metadata repository

Based on the motivation above, a metadata repository of the following nature is proposed.
  • The repository houses data tables such as the site metadata table and the web metadata table.
  • A table consists of records which have unique names. A web metadata record is named by the web name.
  • A record consists of fields, each of which consists of a field name and value. A field name is unique in a record and a field value is a string.
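The three-level structure above (tables of uniquely named records, and records of uniquely named string fields) maps naturally onto nested Perl hashes. Below is a minimal in-memory sketch of that shape only. It is not the proposed storage format, and the field names server, datadir, and pubdir are illustrative assumptions.

```perl
use strict;
use warnings;

# Table name -> record name -> field name -> string value.
# Hash keys make record names unique within a table and field names
# unique within a record. Field names here are illustrative assumptions.
my %repository = (
    sites => {
        am => { server => 'strawman', datadir => '/disk0/data', pubdir => '/disk0/pub' },
    },
    webs => {
        WebOne => { admin => 'GodelGroup', master => 'am' },
    },
);

# Check that every field value in a record is a plain string, not a reference.
sub fields_are_strings {
    my ($record) = @_;
    for my $value ( values %$record ) {
        return 0 if !defined $value || ref $value;
    }
    return 1;
}

print "WebOne's master site: $repository{webs}{WebOne}{master}\n";
```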

Why metadata repository rather than TWiki topics?

A large TWiki site may have hundreds or even thousands of webs. In that case, housing the metadata of all webs in one topic, or the metadata of each web in its own topic, is inefficient. You should use a properly indexed data repository such as GDBM instead.
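To show roughly why an indexed store helps, the sketch below keeps one DBM file per table and fetches a single web's record by key, without reading any other record. It uses SDBM_File only because that module ships with Perl; GDBM_File or DB_File would be tied the same way. The one-line-per-field record encoding is an assumption of this sketch and is not part of the proposal.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use File::Spec;
use File::Temp qw(tempdir);
use SDBM_File;

# Serialize a record as sorted "field=value" lines. This assumes field names
# contain no "=" and values contain no newlines.
sub encode_record {
    my ($rec) = @_;
    return join "\n", map { "$_=$rec->{$_}" } sort keys %$rec;
}

sub decode_record {
    my ($str) = @_;
    my %rec;
    for my $line ( split /\n/, $str ) {
        my ( $k, $v ) = split /=/, $line, 2;    # values may contain "="
        $rec{$k} = $v;
    }
    return \%rec;
}

# One DBM file per table. The DBM index lets one web's record be fetched by
# key without scanning the other (possibly thousands of) records.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'webs' );
tie my %webs, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666
    or die "Cannot open $file: $!";

$webs{WebOne} = encode_record( { admin => 'GodelGroup', master => 'am' } );
my $rec = decode_record( $webs{WebOne} );
print "$rec->{admin}\n";
```

SDBM_File limits each key/value pair to about 1 KB. A real repository would therefore more likely use GDBM_File or DB_File, as the text suggests.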

Since site metadata is small, there is no need to care about efficiency. However, handling it in the same manner as web metadata makes things simpler and more consistent.

It's optional

Since the repository is intended for large sites with hundreds or thousands of webs, its use is optional. Various places in the TWiki core would change to make use of the metadata repository, but those changes would take effect only if the site owner explicitly turns the metadata repository on.

Examples

Here's how a metadata repository would be used by a federated and large TWiki installation.

Federation of sites

Let's assume the following federation of TWiki sites.
  • It consists of three TWiki sites - in Americas, Europe, and Asia.
  • All sites in the federation have the same set of webs.
  • Each web in the federation has one master site where updates happen. This means that a web is read-only on a non-master site.
    • Let's say WebOne's master is the Americas site, WebTwo's is Europe, and WebThree's is Asia.
    • Each site periodically mirrors the webs whose master is another site.
      • Americas site mirrors WebTwo and WebThree.
      • Europe site mirrors WebOne and WebThree.
      • Asia site mirrors WebOne and WebTwo.
Figure: site-mirroring.png (mirroring among the three sites)
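The mirroring rule above follows mechanically from each web's master site. The sketch below assumes a simple web-to-master hash. A web with no master, such as the local-only Sandbox web described later, is never mirrored.

```perl
use strict;
use warnings;

# Web name -> master site, from the federation example.
# Sandbox has no master: every site keeps its own local copy.
my %master = (
    WebOne   => 'am',
    WebTwo   => 'eu',
    WebThree => 'as',
    Sandbox  => undef,
);

# A site mirrors every web whose master is defined and is some other site.
sub webs_to_mirror {
    my ( $site, $master ) = @_;
    my @webs = grep { defined $master->{$_} && $master->{$_} ne $site } keys %$master;
    return sort @webs;
}

for my $site (qw(am eu as)) {
    print "$site mirrors: ", join( ', ', webs_to_mirror( $site, \%master ) ), "\n";
}
```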

Web admins

If a TWiki site has hundreds or thousands of webs, defining admins of each web is crucial for efficient management of the site.
  • Notifying and asking questions about a web are straightforward - it's only a matter of contacting the admins. When admins become unreachable, you can flag a web as abandoned and start a removal process.
  • By properly enhancing TWiki, you can permit web admins to do everything on their web regardless of access control settings. This is like being a TWiki admin, but only for that web, so very little TWiki admin intervention is required for individual webs.

Site metadata and web metadata fields

To achieve the above, a site metadata record needs the following fields.
  • server
  • data directory path on the server
  • pub directory path on the server
Server and directory information are needed for mirroring. For example (the table also lists each site's view URL, script URL, and script suffix):

| Name | Server | DataDir | PubDir | ViewURL | ScriptURL | ScriptSuffix |
| am | strawman | /disk0/data | /disk0/pub | http://strawman | http://strawman/cgi-bin | |
| eu | woodenman | /d/twiki/data | /d/twiki/pub | http://woodenman/cgi-bin/view | http://woodenman/cgi-bin | |
| as | tinman | /twiki/data | /twiki/pub | http://tinman/twiki | http://tinman/twiki/cgi-bin | pl |
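Given site records like those above, the source of a mirror copy can be computed from the master site's Server, DataDir, and PubDir fields. The proposal does not say which transfer tool is used. This sketch assumes an rsync- or scp-style HOST:PATH source, and its lowercase field names and the mirror_sources helper are hypothetical.

```perl
use strict;
use warnings;

# Site records from the example table (lowercase field names are assumed).
my %sites = (
    am => { server => 'strawman',  datadir => '/disk0/data',   pubdir => '/disk0/pub' },
    eu => { server => 'woodenman', datadir => '/d/twiki/data', pubdir => '/d/twiki/pub' },
    as => { server => 'tinman',    datadir => '/twiki/data',   pubdir => '/twiki/pub' },
);

# Return the HOST:PATH locations of a web's data and pub directories on its
# master site, in the form rsync or scp accept as a remote source.
sub mirror_sources {
    my ( $web, $master_site, $sites ) = @_;
    my $s = $sites->{$master_site} or die "Unknown site: $master_site\n";
    return ( "$s->{server}:$s->{datadir}/$web/", "$s->{server}:$s->{pubdir}/$web/" );
}

# For example, the Americas site pulling WebTwo from its master, Europe.
my ( $data_src, $pub_src ) = mirror_sources( 'WebTwo', 'eu', \%sites );
print "$data_src\n$pub_src\n";
```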

The web metadata needs the following fields:

  • admin group
  • master site
For example:

| Name | Admin | Master |
| WebOne | GodelGroup | am |
| WebTwo | EscherGroup | eu |
| WebThree | BachGroup | as |
| Sandbox | TWikiAdminGroup | |
| Trash | TWikiAdminGroup | |

For practicality, only top level webs (not subwebs) have metadata in the repository. A subweb inherits its parent's metadata.

Each site in a federation needs to have its own local Sandbox and Trash webs, and it's natural for those webs not to be mirrored. To represent that, those webs have no master defined.
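Together, the subweb inheritance rule and the undefined-master convention determine where a web may be edited. The sketch below uses hypothetical helper names. It looks up a subweb through its top-level web and treats a web as writable locally when the web has no master or the local site is its master. Accepting both / and . as web separators, and treating an unknown web as not writable, are assumptions.

```perl
use strict;
use warnings;

my %webs = (
    WebOne  => { admin => 'GodelGroup',      master => 'am' },
    Sandbox => { admin => 'TWikiAdminGroup' },    # no master: never mirrored
);

# Subwebs inherit their top-level web's record, so only the first path
# component is looked up ("WebOne/SubWeb" and "WebOne.SubWeb" -> "WebOne").
sub web_record {
    my ( $web, $webs ) = @_;
    my ($top) = split m{[/.]}, $web;
    return $webs->{$top};
}

# Writable on $local_site if the web has no master (a local-only web such as
# Sandbox) or if $local_site is its master; otherwise it is a read-only mirror.
# Unknown webs are treated as not writable (an assumption).
sub is_writable_here {
    my ( $web, $local_site, $webs ) = @_;
    my $rec = web_record( $web, $webs ) or return 0;
    return ( !defined $rec->{master} || $rec->{master} eq $local_site ) ? 1 : 0;
}

my $state = is_writable_here( 'WebOne/SubWeb', 'eu', \%webs ) ? 'writable' : 'read-only mirror';
print "WebOne/SubWeb on eu: $state\n";
```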

Both the site metadata and the web metadata are shared among the TWiki sites comprising a federation.

Impact

WhatDoesItAffect: Security, UI, Vars

Basic Design

The TWiki object will have an mdrepo attribute. The name is short for "metadata repository".

The mdrepo attribute is set when the TWiki object is constructed, and only on installations that use the metadata repository. Everything keeps working without the attribute.

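If a site employs the metadata repository, its LocalSite.cfg would have the following, along with the {Mdrepo}{Dir} and {Mdrepo}{Tables} settings that the code below also requires: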

$TWiki::cfg{Mdrepo}{Store} = 'DB_File';

TWiki.pm would have:

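    # Set up the metadata repository only if it is fully configured;
    # otherwise TWiki works as usual without the mdrepo attribute.
    if( $TWiki::cfg{Mdrepo}{Store} && $TWiki::cfg{Mdrepo}{Dir} &&
        $TWiki::cfg{Mdrepo}{Tables}
    ) {
        require TWiki::Mdrepo;
        $this->{mdrepo} = new TWiki::Mdrepo( $this );
    }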

The mdrepo object would have the following methods:

$mdrepo->getRec("TABLE_NAME", "RECORD_NAME")
    returns the record as a hash ref.
$mdrepo->getList("TABLE_NAME")
    returns the list of all record names.
$mdrepo->putRec("TABLE_NAME", "RECORD_NAME", HASH_REF)
    updates the table entry for the record name with the hash ref.

  • An mdrepo script would update the data repository. It is accessible only to TWiki admins.
  • A %MDREPO{...}% variable would retrieve data from the data repository.

The table name for site metadata would be "sites" while the table name for web metadata would be "webs".

$mdrepo->getRec("webs", "WebOne") would yield:

{
    admin  => "GodelGroup",
    master => "am",
}
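To make the three methods concrete, here is a minimal in-memory sketch with the same method names and argument order. The package name MdrepoSketch is hypothetical. A real TWiki::Mdrepo would persist tables in the configured store (for example DB_File) rather than in a Perl hash. Returning undef for a missing record and dying on an unknown table are assumptions. Returning copies keeps callers from modifying stored records by accident.

```perl
use strict;
use warnings;

package MdrepoSketch;    # hypothetical stand-in for TWiki::Mdrepo

sub new {
    my ($class) = @_;
    return bless { tables => { sites => {}, webs => {} } }, $class;
}

# getRec: a copy of the record's fields as a hash ref, or undef if missing.
sub getRec {
    my ( $this, $table, $name ) = @_;
    my $rec = $this->_table($table)->{$name} or return undef;
    my %copy = %$rec;
    return \%copy;
}

# getList: all record names of a table, sorted.
sub getList {
    my ( $this, $table ) = @_;
    return sort keys %{ $this->_table($table) };
}

# putRec: store (or replace) the record with a copy of the given fields.
sub putRec {
    my ( $this, $table, $name, $fields ) = @_;
    $this->_table($table)->{$name} = {%$fields};
    return;
}

sub _table {
    my ( $this, $table ) = @_;
    return $this->{tables}{$table} // die "No such table: $table\n";
}

package main;

my $mdrepo = MdrepoSketch->new;
$mdrepo->putRec( 'webs', 'WebOne', { admin => 'GodelGroup', master => 'am' } );
my $rec = $mdrepo->getRec( 'webs', 'WebOne' );
print "$rec->{admin} $rec->{master}\n";
```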

Implementation

-- Contributors: HideyoImazu - 2011-12-16

Web interface to view and update web metadata

With the following combination of HTML tags and TWiki markup, you can make a web interface to view and update web metadata.
<form>
Initial Letters:<input class="twikiInputField" name="websfilter" value="%URLPARAM{"websfilter"}%" size="12" />
<input class="twikiSubmit" type="submit" value="List sites" /><br/>
</form>
<form action="%SCRIPTURL%/mdrepo" method="post">
<input class="twikiSubmit" type="submit" name="_add" value="add"/>
<input class="twikiSubmit" type="submit" name="_updt" value="update"/>
<input class="twikiSubmit" type="submit" name="_del" value="delete"/>
<input class="twikiSubmit" type="reset" value="clear" />

| *Name* | *Admin* | *Master* |
| <input class="twikiInputField" size="20" name="_recname"/> | <input class="twikiInputField" name="__admin" size="12"/> | <select name="__master"><option>am</option><option>eu</option><option>as</option></select> |
%MDREPO{"webs"
filter="%IF{"$'URLPARAM{websfilter}' = ''" then="^ " else="^%URLPARAM{"websfilter"}%"}%"
format="| $_ | $admin | $master |"}%
</form>

It would look as follows:

Initial Letters:

| Name | Admin | Master | Load Row |
| WebOne | GodelGroup | am | do |
| WebThree | BachGroup | as | do |
| WebTwo | EscherGroup | eu | do |
  • You would type one or more initial letters of web names in the "Initial Letters:" box and submit to list the webs whose names start with those letters.
  • You would add, update, and delete webs using the form above.
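The %MDREPO{...}% call in the form above takes a table name, a filter, and a format in which $_ stands for the record name and $FIELD for a field value. Below is a minimal sketch of that expansion. It assumes the filter is a Perl regular expression matched against record names, and that rows come out sorted by name, which matches the order in the rendered table. expand_mdrepo is a hypothetical name.

```perl
use strict;
use warnings;

# Expand one row per record whose name matches $filter (a regex), sorted by
# name. In $format, "$_" becomes the record name and "$field" becomes that
# field's value, or an empty string if the field is unset.
sub expand_mdrepo {
    my ( $records, $filter, $format ) = @_;
    my @rows;
    for my $name ( sort keys %$records ) {
        next unless $name =~ /$filter/;
        my $rec = $records->{$name};
        ( my $row = $format ) =~ s{\$(\w+)}{ $1 eq '_' ? $name : ( $rec->{$1} // '' ) }ge;
        push @rows, $row;
    }
    return join "\n", @rows;
}

my %webs = (
    WebOne   => { admin => 'GodelGroup',  master => 'am' },
    WebTwo   => { admin => 'EscherGroup', master => 'eu' },
    WebThree => { admin => 'BachGroup',   master => 'as' },
    Sandbox  => { admin => 'TWikiAdminGroup' },
);
my $out = expand_mdrepo( \%webs, '^Web', '| $_ | $admin | $master |' );
print "$out\n";
```

With the form's default filter "^ ", no web name matches, so nothing is listed until some initial letters are entered. Because the filter comes from a URL parameter, a real implementation would need to restrict what it accepts before using it as a regex.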

For metadata updates made through the web form above to work smoothly, the mdrepo script redirects back to the view script of the same page.

Command line interface

In addition to working as a CGI script, the mdrepo script can be used as a command line interface to view and update the metadata repository. The general syntax is as follows.

mdrepo COMMAND  ARGUMENT ...
COMMAND is one of show, list, add, updt, del, and load. Usage of each command is as follows.
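The argument handling implied by the synopses below can be sketched as follows. The minimum argument counts come from those synopses. Rejecting FIELD_NAME=VALUE pairs for commands other than add and updt, and the parse_args name itself, are assumptions of this sketch.

```perl
use strict;
use warnings;

# Arguments required after COMMAND: TABLE_NAME, plus RECORD_NAME for
# commands that act on a single record.
my %min_args = ( show => 2, list => 1, add => 2, updt => 2, del => 2, load => 1 );

# Parse "COMMAND ARGUMENT ..." into (command, table, record name, fields).
sub parse_args {
    my ( $cmd, @args ) = @_;
    die 'Unknown command: ' . ( $cmd // '' ) . "\n"
        unless defined $cmd && exists $min_args{$cmd};
    die "Too few arguments for $cmd\n" if @args < $min_args{$cmd};
    my ( $table, @rest ) = @args;
    my $record = $min_args{$cmd} >= 2 ? shift @rest : undef;
    die "$cmd takes no FIELD_NAME=VALUE pairs\n"
        if @rest && $cmd ne 'add' && $cmd ne 'updt';
    my %fields;
    for my $pair (@rest) {
        my ( $name, $value ) = $pair =~ /\A(\w+)=(.*)\z/s
            or die "Expected FIELD_NAME=VALUE, got '$pair'\n";
        $fields{$name} = $value;
    }
    return ( $cmd, $table, $record, \%fields );
}

# In the script this would be parse_args(@ARGV); here, the add example.
my ( $cmd, $table, $record, $fields ) =
    parse_args(qw(add webs WebFour admin=HofstadterGroup master=am));
```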

mdrepo show TABLE_NAME  RECORD_NAME

This shows a record of a table.

Example:

$ ./mdrepo show webs WebOne
WebOne
    admin=GodelGroup
    master=am
In this and the following examples, $ at the beginning of a line is the command line prompt, and the current directory is the bin directory. The mdrepo command is invoked as ./mdrepo because the current directory (.) must not be in PATH.

mdrepo list TABLE_NAME

This shows an entire table.

Example:

$ ./mdrepo list webs
WebOne
    admin=GodelGroup
    master=am

WebThree
    admin=BachGroup
    master=as

WebTwo
    admin=EscherGroup
    master=eu

mdrepo add TABLE_NAME  RECORD_NAME  FIELD_NAME=VALUE ...

This adds a new record. It fails if a record with the specified name already exists. It produces no output.

Example:

$ ./mdrepo add webs WebFour admin=HofstadterGroup master=am 

mdrepo updt TABLE_NAME  RECORD_NAME  FIELD_NAME=VALUE ...

This updates an existing record. It fails if no record with the specified name exists. It produces no output.

Example:

$ ./mdrepo updt webs WebOne admin=GaussGroup master=as

mdrepo del TABLE_NAME  RECORD_NAME

This deletes a record from a table. It produces no output.

Example:

$ ./mdrepo del webs WebFour
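The semantics of add, updt, and del on a single table can be sketched as plain hash operations. The text does not say whether updt keeps fields that are left out of the command, or whether del fails on a missing record. This sketch assumes both and labels them in comments. The helper names are hypothetical.

```perl
use strict;
use warnings;

# add: store a new record; fail if the name is already taken.
sub add_rec {
    my ( $table, $name, $fields ) = @_;
    die "Record $name already exists\n" if exists $table->{$name};
    $table->{$name} = {%$fields};
    return;
}

# updt: fail if the record is missing. Fields not given on the command line
# are assumed to be kept. Returns the previous record for the audit trail.
sub updt_rec {
    my ( $table, $name, $fields ) = @_;
    die "Record $name does not exist\n" unless exists $table->{$name};
    my $old = $table->{$name};
    $table->{$name} = { %$old, %$fields };
    return $old;
}

# del: remove the record and return it (assumed to fail if it is missing).
sub del_rec {
    my ( $table, $name ) = @_;
    die "Record $name does not exist\n" unless exists $table->{$name};
    return delete $table->{$name};
}

my %webs;
add_rec( \%webs, 'WebFour', { admin => 'HofstadterGroup', master => 'am' } );
updt_rec( \%webs, 'WebFour', { master => 'as' } );
```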

mdrepo load TABLE_NAME

This loads a table from standard input, in the list output format. It produces no output.

Example:

$ ./mdrepo load webs < webs.txt
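Since load reads the same format that list prints, the two can be sketched as a formatter and a parser that round-trip. The layout is taken from the list example above: the record name, then indented field=value lines, with a blank line between records. Sorting fields by name and the four-space indent are assumptions based on that example.

```perl
use strict;
use warnings;

# Format records the way "mdrepo list" prints them: the record name, then one
# indented "field=value" line per field, with a blank line between records.
sub format_list {
    my ($records) = @_;
    return join "\n", map {
        my $name = $_;
        my $rec  = $records->{$name};
        join( '', "$name\n", map { "    $_=$rec->{$_}\n" } sort keys %$rec );
    } sort keys %$records;
}

# Parse that format back into records. The script itself would read the text
# from standard input, e.g. parse_list( do { local $/; <STDIN> } ).
sub parse_list {
    my ($text) = @_;
    my ( %records, $current );
    for my $line ( split /\n/, $text ) {
        next if $line =~ /\A\s*\z/;    # blank lines separate records
        if ( $line =~ /\A\s+(\w+)=(.*)\z/ ) {
            die "Field line before any record name: $line\n" unless defined $current;
            $records{$current}{$1} = $2;
        }
        else {
            ( $current = $line ) =~ s/\s+\z//;
            $records{$current} ||= {};
        }
    }
    return \%records;
}

my %webs = (
    WebOne => { admin => 'GodelGroup',  master => 'am' },
    WebTwo => { admin => 'EscherGroup', master => 'eu' },
);
my $text   = format_list( \%webs );
my $loaded = parse_list($text);
```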

Audit trail

All add, update, and delete operations are logged to the logYYYYMM.txt log file in the same manner as other scripts.
  • With update, both the previous and new record are put in the log.
  • With delete, the current record is put in the log.
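The text names the log file pattern and what each entry must contain, but not the entry layout. The sketch below assumes a pipe-delimited line with a timestamp, user, action, table.record, and record details. It shows both the old and new records for updt and the removed record for del. The format and helper names are hypothetical and do not represent TWiki's actual log writer. UTC is used only to keep the example deterministic.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Serialize a record compactly, e.g. "admin=GodelGroup,master=am".
sub rec_to_string {
    my ($rec) = @_;
    return '(none)' unless $rec;
    return join ',', map { "$_=$rec->{$_}" } sort keys %$rec;
}

# Build one hypothetical pipe-delimited log line for a repository change.
sub audit_line {
    my ( $user, $op, $table, $name, $old, $new, $time ) = @_;
    my $stamp  = strftime( '%Y-%m-%d - %H:%M', gmtime($time) );
    my $detail = $op eq 'updt' ? rec_to_string($old) . ' -> ' . rec_to_string($new)
               : $op eq 'del'  ? rec_to_string($old)
               :                 rec_to_string($new);
    return "| $stamp | $user | mdrepo $op | $table.$name | $detail |";
}

# The log file name follows the logYYYYMM.txt pattern.
sub log_file_name {
    my ($time) = @_;
    return strftime( 'log%Y%m.txt', gmtime($time) );
}
```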

Web admins

The web admin feature would be implemented mostly by a user mapping handler. More specifically, it's a matter of whether the isAdmin() method returns true for the user.
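In TWiki, the decision would be made by the mapping's isAdmin() method. The sketch below only shows the intended rule as a standalone function with hypothetical group data. It is not the real TWiki::Users API. A user counts as an admin for a web if they are a TWiki admin, or a member of the admin group recorded for the web's top-level web.

```perl
use strict;
use warnings;

# Hypothetical group membership and "webs" table.
my %members = (
    TWikiAdminGroup => ['AdminUser'],
    GodelGroup      => ['KurtGodel'],
);
my %webs = ( WebOne => { admin => 'GodelGroup', master => 'am' } );

sub in_group {
    my ( $user, $group ) = @_;
    return scalar grep { $_ eq $user } @{ $members{$group} || [] };
}

# TWiki admins are admins everywhere; otherwise use the admin group of the
# top-level web, which subwebs inherit.
sub is_admin_for_web {
    my ( $user, $web ) = @_;
    return 1 if in_group( $user, 'TWikiAdminGroup' );
    my ($top) = split m{[/.]}, $web;
    my $rec = $webs{$top} or return 0;
    return defined $rec->{admin} && in_group( $user, $rec->{admin} ) ? 1 : 0;
}
```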

Integrating to the core features

If you have thousands of webs on your site, making the metadata repository an integral part of the site becomes beneficial, since it gives you better control of webs. Initially, though, you may be fine with the metadata repository being advisory only.

If $TWiki::cfg{Mdrepo}{WebRecordRequired} is true (false by default), the following behaviors would be turned on.

More efficient %WEBLIST{...}%

Usually, the list of webs is generated by traversing the hierarchy of the directory specified by $TWiki::cfg{DataDir}. This takes time if you have thousands of webs.

If $TWiki::cfg{Mdrepo}{WebRecordRequired} is true, TWiki::WEBLIST() refers to web metadata instead of traversing the data directory. Since web metadata exists only for top-level webs, TWiki::WEBLIST() would add the subwebs of the current web, found by directory traversal.

Because it omits the subwebs of other top-level webs, the result of %WEBLIST{...}% differs from usual, but it is substantially faster.
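The following is a sketch of the faster listing. It assumes the top-level web names have already been read from the "webs" table. As a further assumption, it treats every subdirectory of the current web's data directory as a subweb. Only that one directory tree is walked, instead of the whole $TWiki::cfg{DataDir}.

```perl
use strict;
use warnings;
use File::Find;

# Top-level webs come from the repository; only the current web's directory
# is traversed, so subwebs of other top-level webs are not listed.
sub web_list {
    my ( $top_webs, $data_dir, $current_web ) = @_;
    my @list = @$top_webs;
    my $root = "$data_dir/$current_web";
    if ( -d $root ) {
        find(
            sub {
                return unless -d $_ && $File::Find::name ne $root;
                ( my $rel = $File::Find::name ) =~ s{\A\Q$data_dir\E/}{};
                push @list, $rel;
            },
            $root
        );
    }
    return sort @list;
}
```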

Consistency between web metadata and web existence

Usually, existence of a web is equal to existence of the corresponding directory in $TWiki::cfg{DataDir}.

If $TWiki::cfg{Mdrepo}{WebRecordRequired} is true, web existence checking and web creation refer to web metadata.

As such, before creating a new web, you need to register its metadata using the mdrepo script from the command line or browser. After a web is removed (moved to Trash), its metadata needs to be removed as well.
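The following is a sketch of the two existence rules. Without WebRecordRequired, a web exists if its directory does. With WebRecordRequired, a top-level web exists if it has a record. Records cover only top-level webs, so this sketch assumes that a subweb additionally needs its own directory. The text does not specify this. web_exists is a hypothetical name.

```perl
use strict;
use warnings;

# Decide whether a web exists, with or without WebRecordRequired.
sub web_exists {
    my ( $web, $webs, $data_dir, $record_required ) = @_;
    return( -d "$data_dir/$web" ? 1 : 0 ) unless $record_required;
    my ( $top, @sub ) = split m{/}, $web;
    return 0 unless exists $webs->{$top};    # no record: no web
    return 1 unless @sub;                    # top-level web: the record decides
    return -d "$data_dir/$web" ? 1 : 0;      # subweb: assumed to need its directory
}
```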

Discussion

I can see the need to scale TWiki with web-level TWiki admins.

Is there another reason besides not locking oneself out? If not, maybe there is a solution that does not raise the complexity of setup and use? For example, TWiki could be enhanced to check, on save of any topic, whether the user locks him/herself out, and refuse to save the topic with a proper error message if this is the case. This would be useful for non-admins too.

If there are other reasons for web-level TWiki admins, and we proceed with this proposal, I recommend adding a clear spec: what configure settings enable this feature, the UI to define the web-level TWiki admins, where the setting is stored, etc.

-- PeterThoeny - 2011-12-21

Good proposal, setting to accepted state.

Items to consider as discussed at JerusalemReleaseMeeting2012x05x11:

  • To get a complete audit trail (for traceability) it would be good to have settings version controlled, or alternatively, log changes to settings in TWiki log file
  • Think of a more descriptive term than "super", which could be confused with super class. It can be anything descriptive, such as extended attributes, extra preferences, extended settings, extended meta, or the like.

-- PeterThoeny - 2012-05-11

  • logging for audit trail is now mentioned.
  • I switched from "super" to "mdrepo".

-- HideyoImazu - 2012-05-11

Topic revision: r34 - 2013-02-18 - HideyoImazu
 