
Proposed: Proper Handling of Meta Data in Topic Text

Issue: Inserting TWikiMetaData in a topic text is an unsupported case. Meta data inserted at the beginning of a line is removed on topic save and stored as real meta data, which is clearly not what the user expects. For example, if you enter %META:TOPICPARENT{name="CoffeeBreak"}% at the beginning of a line, it is removed on topic save. (Actually, I think it replaces the original parent meta data -- ThomasWeigert - 08 Feb 2004) Users are expected to escape meta data like other TWiki variables, e.g. by writing %<nop>META:TOPICPARENT{name="CoffeeBreak"}%, but TWiki should guard against the case where they do not.

Proposed spec 1: (PeterThoeny) Consistent handling of meta data even if entered manually in a topic text, which is an unsupported case. Handling of meta data:

  • In normal text: TWiki escapes meta data on topic save if needed.
  • In verbatim blocks: TWiki leaves meta data alone so that it survives edit/save cycles.
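
As an illustration of spec 1, here is a minimal sketch in Python (not TWiki's actual Perl code). The regular expression for a line-initial meta data statement and the function name `escape_meta_on_save` are assumptions made for this example, and verbatim detection is simplified to `<verbatim>` and `</verbatim>` tags that start a line.

```python
import re

# Assumed pattern: a line that begins with something that parses as meta data.
META_AT_LINE_START = re.compile(r"^%META:[A-Z]+\{.*\}%")


def escape_meta_on_save(text: str) -> str:
    """Escape line-initial %META:...% statements outside verbatim blocks."""
    out = []
    in_verbatim = False
    for line in text.split("\n"):
        stripped = line.strip().lower()
        if stripped.startswith("<verbatim>"):
            in_verbatim = True
        elif stripped.startswith("</verbatim>"):
            in_verbatim = False
        elif not in_verbatim and META_AT_LINE_START.match(line):
            # Insert <nop> after the leading % so the line is stored as text.
            line = "%<nop>" + line[1:]
        out.append(line)
    return "\n".join(out)
```

Because an escaped line no longer starts with `%META:`, applying the function again on a later save leaves the text unchanged.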

Proposed spec 2: (ThomasWeigert) Consistent handling of meta data even if entered manually in a topic text, which is an unsupported case. Handling of meta data:

  • Text entered that parses as meta data will not be entered as meta data.
  • It is undefined what happens otherwise with the entered string of characters that parses as meta data (in other words, there are no guarantees, the case remains unsupported).
  • Characters following the text parsing as meta data up to the line end are preserved and correctly rendered/stored.

I have no strong feelings either way on the middle bullet but thought it was easier to implement by leaving this undefined. The user is really not meant to do this, so we just need to prevent bad things from happening; we don't need to turn it into a feature.

Note that the last bullet leaves open the possibility that the characters following are themselves text that parses as meta data. In this case, either this definition should be applied recursively and those characters handled, or they would be left in the text and dealt with the next time the text is edited. (Either way they would not result in meta data.)
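
A minimal Python sketch of spec 2 with the recursive reading (not TWiki's actual code). Since the spec leaves the fate of the meta-like string undefined, this sketch simply drops it; that choice, the pattern, and the names `sanitize_line` and `sanitize_text` are assumptions for illustration. The pattern also assumes attribute values contain no `}`.

```python
import re

# Assumed pattern for text at the start of a line that parses as meta data.
META_PREFIX = re.compile(r"^%META:[A-Z]+\{[^}]*\}%")


def sanitize_line(line: str) -> str:
    """Drop leading meta-like statements; keep the rest of the line."""
    while True:
        match = META_PREFIX.match(line)
        if match is None:
            return line
        # Apply the rule again to whatever follows the removed statement.
        line = line[match.end():]


def sanitize_text(text: str) -> str:
    """Apply sanitize_line to every line; nothing is ever stored as meta data."""
    return "\n".join(sanitize_line(line) for line in text.split("\n"))
```

Nothing returned by `sanitize_text` is passed to the meta data store, so the first bullet holds regardless of how the second bullet is decided.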

Proposed Spec 3: (SvenDowideit) replace MetaData with tables with (name, value) pairs.

  • all topic text is visible in edit mode
    • CN: well, editing tables in plain text twiki mode sucks. But having a "raw MD edit view" could be nice for all the proposals
    • SD: if editing as a table sucks, it needs fixing
    • SD: i should not have said table in hindsight. any name:value pair is sufficient for our needs - and they can then be inline in the topic
  • when you want to protect metadata you put it in a separate web
  • you can create workflow forms that may not have a general edit box (webform items only)
  • extremely wiki
  • proven idea (i implemented this before the current metadata system existed)

(The question of how meta data should be stored which emerged from this topic was refactored as a separate question, as it is somewhat unrelated to the requirements discussion of what should be done when a user enters meta data in text -- ThomasWeigert - 09 Feb 2004)

-- PeterThoeny - 08 Feb 2004
-- ThomasWeigert - 08 Feb 2004
-- ColasNahaboo - 08 Feb 2004
-- SvenDowideit - 08 Feb 2004


In MetaDataHandlerCantProcessCrLfLineEndings it was pointed out that meta data is treated inconsistently if entered into a topic text (an unsupported case), which manifests itself in that particular bug fix. The above spec is a first cut; please feel free to modify it.

-- PeterThoeny - 08 Feb 2004

(Discussion below moved here from StructureOfOndiskTopicFormat -- ThomasWeigert - 08 Feb 2004)

I have conducted some experiments in my own system (Beijing release) and observed the following behavior:

  • If a string of characters S that could be parsed as a meta data statement is inserted in the text during edit
    • if S begins a line, the statement is interpreted as meta data and everything after it up to the end of the line is thrown away
    • if S does not begin a line, the whole line is treated as text.
  • If a ^M is inserted with a text editor after a line with meta data, this line is still interpreted as meta data (just a consequence of above)
  • It is possible to insert the meta data for a form in the text area during edit and have it converted to meta data upon save (again just a consequence of above)
    • Note that during preview one sees the inserted text, but after save it is gone into meta data

These are my observations. My question now is: what are the requirements you are trying to meet? For example,

  1. When a string of characters that could be parsed as a meta data statement is inserted into the text area, should it always be interpreted as text? (In other words, still be there as text after the save?)
  2. When a string of characters that could be parsed as a meta data statement is inserted into the text area, should it always be thrown away as illegal text? (In other words, be gone after the save, but not be added to the meta data?)
  3. When a string of characters that could be parsed as a meta data statement is inserted into the text area, should it always be interpreted as meta data? (In other words, be gone after the save from the text, but added to the meta data?)
  4. One should also clarify whether or not characters following a string of text parsed as meta data in behaviors 2 and 3 should be ignored or should remain in the text.

Again, this is just a list of questions to be answered, without any opinion stated as to what the right way of doing things is. (I should note that I was surprised by the text following the string of characters that could be parsed as meta data being thrown away.)

-- ThomasWeigert - 07 Feb 2004

Meta data you do not want to have expanded should be escaped properly like any other TWiki variable. At TWiki.org we frequently write about meta data, therefore you find non-escaped meta data in the Codev web. Codev topics containing non-escaped meta data should be fixed. Michael, thanks for pointing them out; they are fixed now.

I think the likelihood of a user inserting %META:SOMETHING{...}% at the beginning of a line is very low for a typical TWiki installation (unless the subject is about TWiki internals).

Nevertheless, we should decide what to do if a user enters unsupported meta data at the beginning of a line. My inclination is to escape it on topic save if needed. As John pointed out, meta data in verbatim text should survive edit cycles verbatim. Please follow up in HandlingOfMetaDataInTopicText.

Thomas, I apologize for the confusion this topic causes. In reality it is quite simple if you read the docs.

-- PeterThoeny - 07 Feb 2004

Thanks, Peter. As there was so much discussion on this topic, I am trying to understand the exact requirements that underlie this aspect of the system. Based on the experiment I described above, for example, I was surprised to find that text following unescaped meta data at the start of a line was thrown away. I had not seen that in the description of meta data. I agree with you that this is not a big issue (just an issue of documentation); I was only trying to understand.

Here is my understanding of the grammar for the text entered in the edit box:

   edited-text ::= ( topic-text-line | meta-text-line )+
   topic-text-line ::= printable-chars-not-starting-with-meta-identifier end-of-line
   meta-text-line ::= meta-identifier upper-alpha-char* "{" meta-parameters "}%" char* end-of-line
   meta-identifier ::= "%META:"
   end-of-line ::= "\n"

using the definitions from StructureOfOndiskTopicFormat.

Then I observe that upon save the complete meta-text-line is removed from edited-text and the part of it that is meta data is moved to the meta data, if it is the last occurrence of that particular meta data. The char* following the meta data are deleted.

If I were to redesign the parser for the text, I would not delete the char* but leave it in the text.
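
The observed behaviour and the suggested change can be modelled in a few lines of Python. This is a sketch of the observations above, not TWiki's parser; the function name `split_on_save`, the `keep_trailing` flag, and keying the stored meta data by its type are assumptions made for illustration.

```python
import re

# Follows the meta-text-line rule above: identifier, upper-alpha chars,
# "{" parameters "}%", then arbitrary trailing characters.
META_LINE = re.compile(r"^(%META:([A-Z]*)\{.*?\}%)(.*)$")


def split_on_save(edited_text: str, keep_trailing: bool = False):
    """Return (topic text, meta data) as observed on save.

    keep_trailing=False models the observed behaviour (trailing chars lost);
    keep_trailing=True models the redesign that leaves them in the text.
    """
    text_lines = []
    meta = {}
    for line in edited_text.split("\n"):
        match = META_LINE.match(line)
        if match is None:
            text_lines.append(line)
            continue
        statement, kind, trailing = match.groups()
        meta[kind] = statement  # a later occurrence replaces an earlier one
        if keep_trailing and trailing:
            text_lines.append(trailing)
    return "\n".join(text_lines), meta
```

With `keep_trailing=True` the user's trailing characters survive the save, which is the only difference between the two modes.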

-- ThomasWeigert - 08 Feb 2004

Disappearing text: This is an esoteric case when discussing TWiki internals. Yes, the parser should be changed so that user entered text does not disappear. Follow up in HandlingOfMetaDataInTopicText.

-- PeterThoeny - 08 Feb 2004

Actually, thinking about it some more, I do think that there is a problem. If a user enters some text that parses as meta data at the beginning of a line, then this is interpreted, upon topic save, as meta data. This allows the user to enter meta data that she might not otherwise be allowed to enter.

Here is an example: I have an application where I have a form field "Validated on" and another one "Validated by". These fields cannot be edited but can only be set by other means. (In detail, they are defined as "label". In the action bar at the bottom, when an entitled user is viewing the page, there is also a "validate" link, which when pressed inserts the validation data. This is done by an edit in the template, the view script, and a validate script.) In my application it is important that only certain users can validate. (This is financial information.) Using the trick with the meta data, if a topic is not yet validated, a user can put the correct meta data for the "Validated on" and "Validated by" fields into the text and have it stored as meta data upon save. As a consequence, the form shows the topic as being validated, when in fact the user cheated.

The conclusion I draw from this example (and I can now see many others) is that text that parses as meta data entered in the text area should not be treated as meta data. I don't care if it is thrown out or not, but it had better not be treated as meta data, lest one allow the situation described in the previous paragraph.

-- ThomasWeigert - 08 Feb 2004

Thomas, this observation is correct; it is the reason for proper HandlingOfMetaDataInTopicText.

-- PeterThoeny - 08 Feb 2004

I proposed requirements for HandlingOfMetaDataInTopicText above.

-- ThomasWeigert - 08 Feb 2004

Thinking about the implementation somewhat, either

  • one has to parse the text upon saving differently from the on-disk format to recognize the text masquerading as meta data, or
  • one needs a means of distinguishing meta data from text in the on-disk format (including text masquerading as meta data).

Note that it is not enough to say that text starts when the first non-meta data line is found. This would still allow the user to enter meta data in the text as long as it is at the beginning of the text. (The same argument goes for the end of the text.)

Note that in the current (Beijing) format it is possible to differentiate meta data from text. Text is apparently always terminated by ^M before the newline character. Meta data does not end in a ^M. Thus text masquerading as meta data is easily recognized by the ^M at the end. I understand that this might change in the Cairo release (see discussion in StructureOfOndiskTopicFormat).
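
The ^M heuristic can be sketched as follows in Python. This assumes the on-disk behaviour described in the paragraph above (text lines end in a carriage return, real meta data lines do not); the function names are hypothetical, and this is not how TWiki itself reads topics.

```python
def is_real_meta(raw_line: str) -> bool:
    """A line counts as real meta data only if it lacks the trailing ^M."""
    return raw_line.startswith("%META:") and not raw_line.endswith("\r")


def split_raw_topic(raw: str):
    """Split raw on-disk content into (meta lines, text lines)."""
    meta, text = [], []
    for line in raw.split("\n"):
        if is_real_meta(line):
            meta.append(line)
        else:
            # User text, including text masquerading as meta data.
            text.append(line.rstrip("\r"))
    return meta, text
```

The heuristic only works as long as the user cannot produce a line without the trailing ^M, which is exactly the property that may change with the line-ending handling in a later release.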

-- ThomasWeigert - 08 Feb 2004

Update to my empirical test above: If the meta data added in this sneaky manner is a field, and this field is not present in the attached form or if there is no form, it will be present upon the first save. However, upon the next save it will be removed from the raw topic.

This does not happen for other meta data, even for meta data not known to TWiki. By the way, this would be really bad, as we rely heavily on additional meta data added by our applications.

-- ThomasWeigert - 08 Feb 2004

By the way, I have decided to put the "sneaky data entry" to good use. I like the InterestedParties form field, but there are forms on TWiki.org which do not have that field, and there are pages which do not have any form. To be able to keep track of those pages, I add the field

  %META:INFO{name="InterestedParties" title="InterestedParties" value="Main.ThomasWeigert"}%
which is then stored in meta data and can be queried for.

We have explicit support for this on our twiki, see TWikiInALargeCorporateSetting.

-- ThomasWeigert - 08 Feb 2004

Added proposal #3
-- ColasNahaboo - 08 Feb 2004

Whatever version is taken, note that this does not change the fact that META tags can be, and have been, entered into topic text on multiple wikis around the world. Their users expect their content not to be broken (as some of the proposals above suggest), and expect that if they enter anything into the topic text edit box, TWiki will not corrupt it. Furthermore they expect that TWiki will not break and corrupt their existing content.

Disk filing systems do not prevent you entering arbitrary sequences of letters and numbers. Neither does any modern day application.

Most of the discussion on this page has been caused by an inability (by TWiki's alpha release) to cope with the concept that topic storage separation and functionality are independent issues, and this problem has been caused by a completely different issue - the fact that not all important characters are escaped before storage.

I.e., the root cause is that marshalling and unmarshalling of meta data attribute values is not happening correctly, not the issue described on this page. If it were happening correctly, then the change made to the codebase on 17 Jan 2004 would not have resulted in the knock-on bug described in MetaDataHandlerCantProcessCrLfLineEndings.

Unless the former of these two issues is resolved, further "patch-patch-patch" fixes like this recent sequence will be needed. (Much like the problems caused when root issues regarding HTTP/1.0 vs HTTP/1.1 were ignored*.)

  • * or left because people didn't have time, the expertise to understand RFCs or whatever caused the need for repeated explanations,

Alternative solutions to both issues will be posted shortly, making this discussion more suitable to be resolved at a slower, more considered pace. (Corrupting people's content surely precludes any further releases?)

-- MS - 08 Feb 2004

M, I think you touch on an important point: what are the "intangible laws" TWiki should obey? Let's see:

  • WYSIWYR (What you save is what you read): TWiki should not transform data on save. This is the basis of wiki operation, and something that has been compromised with the current MD handling. I think your position is to enforce this rule, and I agree with you (hence my proposal to separate text and MD, even if they stay physically in the same file).
    Note that this would rule out things like abbreviations, such as ~~~~ being expanded on save into your signature.
Then, we could add other potential laws, to guide implementation of features, such as:
  • All in files: Have all the information in plain text files. Databases could be used as caches to speed things up, but we could stop/reinit the DB at any time and it would re-read the info from files. The DB could cache the data, not store it.

But I digress. M, for the issue at hand, can we summarize your position as "whatever you do, do it according to WYSIWYR"? And that the current situation is thus a bug WRT this rule and should be corrected? I think we should try to agree on the goals, otherwise the arguments can degenerate into unproductive flame wars.

  • You can put it that way. I would put it even more simply and directly: user's data and intent must be respected. I note that none of the current proposals appear to do anything about how to deal with existing content, short of PeterThoeny logging into everyone's systems and adding <nop> tags like he did here. Currently TWikiAlphaRelease breaks both of these rules. Eliminating a user's use of extra %META{}% tags inline would also break that rule. Whilst it was never explicitly stated that this would work, people tried it because it seemed sensible. This includes experienced TWiki people like RandyKramer and ThomasWeigert, amongst many others. -- MS - 09 Feb 2004

-- ColasNahaboo - 08 Feb 2004

WYSIWYR can only be accomplished easily if the storage of real meta data is somehow different from the storage of text, either by

  • a difference in content that cannot be created by the user (such as the missing ^M in current (Athens, Beijing) meta data), or
  • use of a different location for storage (clearly recognizable beginning or end of the file, separate file, data base, etc.).

Obviously, having meta data and text be stored consistently goes against the WYSIWYR principle.

Regarding the data base, I would like to understand why you suggest that a data base would be inappropriate for the storage of meta data? You are right, there may be a philosophical principle of keeping everything in one place, but on the other hand, efficiency and performance concerns suggest that structured data be stored in a manner where it can be efficiently accessed and queried.

-- ThomasWeigert - 09 Feb 2004

Thank you, Thomas, well said. I'll point out that meta-data is not currently WYSIWYR. Configuration management metadata is hidden away in RCS files. Since the precedent is set this way, I'd favour moving all the metadata out of topic files, or failing that, Colas' proposal 1.1. If anyone has a problem with moving a meta-datum out of the topic, then what they are looking at is data and not meta data.

-- CrawfordCurrie - 09 Feb 2004

My take on the data vs. meta data is slightly different in formulation, but the same in spirit.

I think that today we are looking at meta data not really as meta data (except for the RCS information, the owner, and parent information) but as data of a separate application from the white board. This is one of the points I want to make in TWikiWhatWillYouBeWhenYouGrowUp but have not gotten that far yet.

So, I agree with you that people are looking at meta data as data, but I do not agree that, therefore, this information should be visible in the edit text area. Instead, I will argue in TWikiWhatWillYouBeWhenYouGrowUp (this is a spoiler so you might not want to read on) that we need to present in every application that comprises TWiki just the data that this application is interested in. That is, if you want the "form application" you should see the form data, if you want the "attachment application" you should see the attachment data, and if you want the "whiteboard application" you should see the text. You should never see or be able to edit the true meta data (parent, RCS information, stored date, etc.) when you edit information.

Unfortunately, today we have the situation that meta data is used for more than just true meta data. As the applications other than the white board need some storage for their data, unless this is solved some other way, we need to keep that information in the "meta data".

-- ThomasWeigert - 09 Feb 2004

mmmm, I like what you say here Thomas.

can you please define what you would call the (AssignedToCore:SvenDowideit) named pair? I think that it's not metadata by your definition, but we have generally called it that around here (to our detriment perhaps)

as you can see from my other writing, i'd like the named pair to be in the topic text, but accessible as you say for application data. and in this way i think we would dispense with the name meta-data (which is just historical).

-- SvenDowideit - 09 Feb 2004

MIME (rfc:2045), HTTP (rfc:2616), RTSP (rfc:2326) & NNTP (rfc:850, rfc:822, rfc:2822) call them message-headers or fields, and the two parts a field-name and field-value. Often they're referred to as fields. The message as a whole is essentially treated as a record.

FlexWiki calls them properties.

HTTP actually allows for a structure that is:

field: line
more: field lines
etc: etc
<empty line>
Payload Text
more: field lines

The trailers aren't commonly used, but they do exist. Inline field use as you suggest (in keeping with current behaviour) allows users to define more fields and edit them in a wiki manner. (In order for trailers to work in HTTP, a content length header field is required, a wiki could be more relaxed)

The system metadata lines (ie true metadata) can then be stored where every other major network application stores them - in the main MIME header.

Furthermore, having inline metadata doesn't mean that Sven is wrong or Thomas is wrong. What's wrong with having a FieldDefinitionPage which lists all the fields and their types? A user can then have a preference of clicky pointy or "wiki" for field settings.

You can even add in extra properties for the properties/fields on that page then - such as who is allowed to change things, what sequence of changes is acceptable, and so on. Ideally most people would not use that kind of restrictive behaviour, but in some circumstances it would be useful.

On the point of META:FIELD data being user editable being a security hole: that's a misnomer, and it misses the point. Client-side enforced security being security is an idea about as dated as the concept that bare NFS and ident are secure protocols.

Simply sending a client a page that says "this field is read only" does not preclude the client from saving the HTML to a file, editing the field into a type-in box, and then sending the data back.

If you truly want fields protectable, then it has to be done server side, not client side.
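
A minimal sketch of what server-side protection could look like, in Python rather than TWiki's Perl. The field names come from the "Validated on" / "Validated by" example earlier on this page; the function `apply_submitted_fields`, the `validators` set, and the user names are hypothetical.

```python
# Fields that only entitled users may change (taken from the example above).
PROTECTED_FIELDS = {"Validated on", "Validated by"}


def apply_submitted_fields(stored: dict, submitted: dict,
                           user: str, validators: set) -> dict:
    """Merge submitted field values into the stored ones on the server.

    Values for protected fields are ignored unless the user is a validator,
    no matter what the client-side form allowed or what HTML was sent back.
    """
    result = dict(stored)
    for name, value in submitted.items():
        if name in PROTECTED_FIELDS and user not in validators:
            continue  # keep the stored value; do not trust the client
        result[name] = value
    return result
```

The check runs on every save, so it also covers meta data smuggled in through the text area, provided the save path routes all field changes through it.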

However, the point still stands either way: breaking people's existing content is bad.

Also, if you DO change the on-disk storage format, bump the format version.

-- MS - 09 Feb 2004

I don't think there is a right answer here for meta data, so I'll mainly focus on the specific bug that has come up, that is, what happens when meta data is entered into the topic text. The choice is between proposal (1) or (2) above, or some variant of one of these. I favour (1), possibly with a switch that lets a Wiki WebMaster allow direct editing of meta data by all users if there's a strong demand for this (this would likely include the modification date, person, file format, attachments etc).

I think (1) as stated is probably the cleanest solution, but we could have:

  • Meta data
  • Empty line
  • Topic
  • Empty line
  • Meta data

That way a topic could contain meta data, but it would be ignored. This saves special processing when saving and means that meta data rendering is applied only to meta data. However, I suspect this approach will confuse.
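
A rough Python sketch of reading such a layout, assuming the file holds a leading meta data block, an empty line, the topic, an empty line, and a trailing meta data block, with no newline after the last line. The function name `split_topic_file` is hypothetical and this is not an existing TWiki routine.

```python
def split_topic_file(raw: str):
    """Split raw content into (head meta lines, topic lines, tail meta lines).

    The first and last empty lines act as separators, so the topic itself
    may contain empty lines and lines that look like meta data.
    """
    lines = raw.split("\n")
    blanks = [i for i, line in enumerate(lines) if line == ""]
    if len(blanks) < 2:
        raise ValueError("expected two separating empty lines")
    first, last = blanks[0], blanks[-1]
    return lines[:first], lines[first + 1:last], lines[last + 1:]
```

Anything that looks like meta data inside the middle part is returned as topic text and never reaches the meta data handling.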

A few comments (some off topic):

  • Hiding meta data has always seemed a bit non Wiki
  • Much of the meta data needs to be protected to retain integrity e.g. attachments, format, modification date, last changed by
  • Storing in the topic makes version control easier and means a single topic file can be copied and stays valid
  • Meta data is currently stripped from topic text, into separate data structures. It has been argued before that this takes away from the simplicity of passing topic text as one long (renderable) string.
  • I'm not clear that a material file format change is actually being proposed here. Changing the line ending shouldn't make a material difference - sorry if I've missed something ...
    • What would go is the ability to add meta data via the text box
  • I think the main issue when looking at meta data is allowing rendering to be overridden by plugins and allowing extra meta data items to be defined

-- JohnTalintyre - 09 Feb 2004

Let me present the view of a user, not a coder.

1. As a TWiki newbie I was surprised that the data from the web forms is stored in the meta data section of the txt file. Why? Because, like most people, I was accustomed to treating metadata as data referring only to the contents and to the authoring/publishing of the object. Now I know that what is usually contained in ONE meta description item (in an HTML page) is broken up here (in a TWiki page/topic) into MANY separate metadata items like operating system, OS version, topic classification, status of the topic in the workflow,... and whatever the user included in the web form. Now I think it is useful and OK, provided that such an extension of the customary meaning and application of metadata is clearly explained in the documentation.

2. So, let's agree that metadata IS DATA, subject to partial manipulation by the users. I see the evil in the word "partially". Going along with JohnTalintyre, I would like to suggest going one step further: to implement a policy for metadata of varying criticality. Specifically:

  • to secure sensitive metadata like attachments, format, modification date, last changed by - with read-only access
  • to enable some manipulation of other data. Exactly what kind of manipulation can be enabled is a question to be discussed, but the community has much experience (changing workflow status, for example).

Technically, this policy can be implemented as a link "meta" in the bottom navigation bar, giving access to a read-only view of the whole topic txt file, and to an "extended edit" view of all editable contents of the topic txt file. Details to be discussed.

3. I think that the WYSIWYR rule is important, esp. for users not familiar with the application's complexity and not conscious of the possible consequences of incorrect operation. So, if the above policy is implemented, the "delete on save" behavior should be eliminated as a bug. If anybody wants to add an item like InterestedParties, mentioned above by ThomasWeigert, she or he should use the link "meta" and edit/add what is permitted.

4. Consequently, please extend the usability of METASEARCH and add a format option, like in Formatted Search. This is not only my point of view; some other users have asked for that. Such a possibility would enable us, for example, to format a metasearch of the parent topic in the same way as the search for "Back_to..." in the standard topics_in_this_collection application. Great simplification!

The above is the point of view of a user, not a coder, so excuse me and correct me if anything is stupid. Excuse also my poor English, as it is not my native language.

-- AndrzejGoralczyk - 09 Feb 2004

As I said earlier, I believe that twiki meta data today serves two purposes:

  1. Represent "real" meta data, such as "parent", "topicinfo", "topicmoved"
  2. Represent data for the other twiki applications besides the white board, such as form, attachments, SimpleTableEntryUsingForms, and a number of internal applications people have developed.

I don't think that (1) should be editable by the user. (2) belongs to the applications that created that data and operate on it and, therefore, should not be accessible to the white board application. At least, I don't want somebody putting text on the white board to mess with my application data.

Thus, given how twiki has evolved, it is inappropriate, in my opinion, to expose its meta data to the white board application.

-- ThomasWeigert - 10 Feb 2004

By the way, the above is an example of why it would be nice to have named include sections. The comment is literally copied from TWikiWhatWillYouBeWhenYouGrowUpDiscussion. I would have liked to include just that section here, so as to not repeat it. (I realize there are no other includes on that topic, so I could have just taken that section using the current means, but it seems presumptuous to claim that those few lines are the most important part of the whole topic.)

-- ThomasWeigert - 10 Feb 2004

My big Yes for named include sections. I extended SearchingSectionsOfTheTopics to describe an example of a powerful application of such a feature.

-- AndrzejGoralczyk - 10 Feb 2004

Storing the meta data in the text file was a conscious decision - you'll see a limited amount of discussion at GenericMetaDataStoreForTopics. The main thing that's kept simple is version control of meta data and keeping it in line with the topic text revisions.

Clearly meta search would be faster if text and meta data were separate, but I would ask:

  • Do we need more speed in this area?
  • Do we really need to jump through hoops at present?
  • I would think import/export would become more difficult - at present moving one topic means copying one file (+ one directory for attachments).

I don't think there is a clear cut answer here. Keeping everything together in one file makes some things easier and others harder. The same is true of separating the data. We can't make a change based just on issues with the current approach; we need to balance that with the advantages of the current approach plus the cost of change.

-- JohnTalintyre - 10 Feb 2004

(Note: Discussion of storage format for meta data moved to MetaDataStorage -- ThomasWeigert - 10 Feb 2004)

Topic revision: r21 - 2004-02-11 - JohnTalintyre
Copyright © 1999-2018 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.