Bug: Meta Data Handler can't Process CR-LF Line Endings
A meta data line ending in "\r\n" is not recognized as a valid meta data line, it shows up literaly as META:DATA{....} in the topic text. Plugins that manipulate meta data might introduce "\r\n" line endings instead of the default "\n" line ending.
Test case
Edit a topic file on the shell level and save with CR-LF line endings; or try the undocumented repRev command (which is broken in the latest Beta release)
Environment
--
PeterThoeny - 29 Jan 2004
Fix record
Accept "\r\n" and "\n" line endings when parsing the meta data. Change is in
TWikiAlphaRelease and at TWiki.org.
*** ./b/lib/TWiki/Meta.pm 2004-01-17 00:23:58.000000000 -0800
--- ./a/lib/TWiki/Meta.pm 2004-01-28 22:23:35.000000000 -0800
***************
*** 298,304 ****
my $newText = "";
! foreach ( split( /\n/, $text ) ) {
if( /^%META:([^{]+){(.*)}%$/ ) { # greedy match for ending "}%"
my $type = $1;
my $args = $2;
--- 298,304 ----
my $newText = "";
! foreach ( split( /\r?\n/, $text ) ) {
if( /^%META:([^{]+){(.*)}%$/ ) { # greedy match for ending "}%"
my $type = $1;
my $args = $2;
--
PeterThoeny - 29 Jan 2004
Follow up
Just realized that there is a change in storing files:
- Before this fix:
- Meta data lines terminated by "\n" (LF) only
- Topic text lines terminated by "\r\n" (CR-LF)
- After this fix:
- Meta data lines terminated by "\n" (LF) only
- Topic text lines terminated by "\n" (LF) only
Does that impose a problem?
--
PeterThoeny - 29 Jan 2004
Yes -- you now have no way of
correctly determining the end of the topic text, since having %META lines in topic text is valid, having them at the end of the topic is also valid. (And indeed functions as expected - there's been numerous discussion in Codev where this has been used as a demonstration.)
This is the same problem that SMTP has with embedded full stops and mbox format mail systems have with
^From lines. This will break content - (including pages in Codev).
An example page that in the past would still be edittable now is here:
Note that the debug version now contains what the raw=1 version above shows on a standard TWiki:
Also note that the text has been mangled - the text as entered looked like:
The topic text is now returned as:
And the raw view is:
I suspect even this simple view might confuse TWiki - that's not the intent, the intent is to show the problems this causes. (The last example TWiki has problems with and inserts the rogue
<p /> )
The reason this happens is because you've constrained the by your earlier bugfix (without any tests that it breaks other content):
if( /^%META:([^{]+){(.*)}%$/ ) { # greedy match for ending "}%"
Used to be: (and is in the majority of people's installations)
if( /^%META:([^{]+){(.*)}%/ ) {
To summarise, the change you've made prevents metadata from being allowed at the end of topic text, thus breaking content on TWiki.org and elsewhere. Reverting to
if( /^%META:([^{]+){(.*)}%/ ) {
Would re-open the old bug, but resolve this one. (And indeed if this patch was applied by anyone other than yourself, this is probably what would happen)
Incidentally the original bug was a problem caused by lack of escaping metadata content and storage of metadata does not go through a marshall/unmarshalling phase.
Since your "fix" actually breaks more content than it fixes, I've change the
TopicClassification from
BugResolved to
BugReport .
--
MS - 29 Jan 2004
The new spec with identical line endings for meta data and topic text is orthogonal and more clear. My only concern is to make sure that this does not break existing
tool functionality like
RCS on different platforms,
RcsLite and diff (which can be fixed). Because of this I leave this bug report open for a while. Can anyone help test this?
Entering meta data into topic text is not documented and should not be used. The common practice to escape TWiki variables is
%<nop>NAME%, that should also be the case for meta data.
--
PeterThoeny - 03 Feb 2004
See
Sandbox.WillProbablyResultInCorruptedText for content
that is bust, and can no longer be editted by end users - thereby
breaking the wiki nature of those pages (the reason I've put it in
the sandbox incidentally).
The content was entered looking like this - note that content was
entered inside <verbatim> tags - another method that people
expect to
just work . The example is also not specious - there are
many places on twiki.org where people discuss meta data handling
changes.
This is text preceding the inline meta definition surrounded by verbatim. We are discussing here what these declarations do, and considering
extending this to do more things, which is why we have an extra field
<verbatim>
---
%META:FORM{name="NonExistantForm"}%
%META:FIELD{name="TopicClassification" title="TopicClassification" value="DocumentationPage"}%
%META:FORMLOOKANDFEEL{Topic="NonExistantFormLookAndFeel"}%
---
</verbatim>
This is text preceding the inline meta definition surrounded by verbatim. We are discussing here what these declarations do, and considering
extending this to do more things, which is why we have an extra field
This is a trivial way to a) break a wiki with this feature b) make it difficult
(not impossible) for the normal audience to fix without access to
WebPreferences.
This change makes TWiki vulnerable to
defacement in such a way that
SoftSecurity is also compromised.
(Since the community can't fix this easily) It might be said that
this isn't a problem, but the corruption of topic text on it's way to
store, the misdisplay of content, the fact it's a security issue as
well strike me as major problems.
Furthermore, to make line endings orthoganal
there is no point in storing topic text with trailing ^M. After all,
if this security flaw that breaks soft security is ignored then you
lose all the benefits you used to have of ^M's still being there
(differentiation of metadata from topic text). At that point removing
the ^M's when storing topic text would make logical sense.
However in that situation a user may start editting a topic which
ends with ^M on disk, edits it, and the ^M's are replaced with
standard line endings. This results in a potential 2 words edit
looking like the entire page contents has changed.
- Consider what I'm saying for once, not who is saying it.
- Corruption of user's text is about the worst possible thing any application can do, and it's a huge problem for a wiki.
(
"Oh that wiki software corrupted my data, now we're happily using..." )
--
MS - 05 Feb 2004
Could everyone also consider how they are saying things? Consider
this revision for example.
--
SamHasler - 05 Feb 2004
I do consider how I'm saying things. In my experience
PeterThoeny
does NOT (appear to) listen when people raise points that are counter to his own
viewpoint - he dismisses these independent opinions. He has dismissed the point
originally of no longer being able to tell topic text from meta data. He has
dismissed it breaks content (on twiki.org of all places, as well as other sites).
He has dismissed he's changing semantics, without providing a mechanism to
change topic storage format. (Which he's dismissed as going against the
TWikiMission)
That is why I had phrased things more forcefully. You have diluted that.
I was also specifically speaking to
PeterThoeny when I said "If
you're serious...". You have changed the meaning of what I've said.
If you intend to change meaning of what I've said, delete my signature
from the contribution. Better yet, refactor the discussion, since then
you won't (accidentally) misrepresent what other people have said.
The point stands - this small lexical* change is
bad and breaks content,
(due to also changing siginifcantly semantics) introduces a security hole
that is hard to fix with soft security.
- * It's specifically about how the written text is stored/dealt with, and specifically a regex change which is a lexical issue in parsing parlance.
In case it helps, I'll write a brief doc on the
structure of ondisk topic format
to illustrate the change that this change causes and
why I'm so adamant that
this introduces a nasty bug (aside from all the other reasons noted above).
--
MS - 06 Feb 2004
First my apologies, as I must be the guilty party that introduced this problem (different line ending for meta). Second, lets please focus on all understanding the issue here, this will not be achieved by making alegations against Peter.
I agree with Peter that having all line endings the same is important. Clearly this must no break with existing content. If use of meta examples in verbatim tags fails, the we clearly need to address this. I'm not clear from the above, that there are any other bugs that will be caused by cleaning up line endings. Have I understood correctly?
I should add that the intention was to destinguish meta data purely on the basic of %META:...% being present at the start of a line (verbatim should clearly be taken note of). Unless I'm missing something, I can't see why you would want another mechanism. Although this might mean that TWiki needs to throw out META data added during a normal edit.
--
JohnTalintyre - 06 Feb 2004
Could "real" meta be defined as being on the first & last lines in the file, and if any are added at the start or end during edit then an extra line end could be automatically inserted between the "real" and the "edit" meta tags.
--
SamHasler - 06 Feb 2004
No, you can't define it that way. (Prove me wrong with code. Please!) The reason is simple - people can -
and do
add meta data in topics manually. Often at the beginning or end of topics. Furthermore, how would you tell which on disk format you were dealing with?
- * I can find more examples if required - it's been suggested in many places.
A good wrapper mechanism provides a clear mechanism to define the contents of what's been wrapped. Whilst not a perfect wrapper (I don't believe a
perfect one exists) lines ending ^M versus not ending ^M is a good wrapper by that particular definition. If John did it by accident, it explains alot, but it doesn't change the fact it's a good wrapper. I would like to see a better wrapper. But if that is done, the format version number at the top of the topic must be bumped (ie from
format="1.0" ) Calls from plugins for raw topic text should ideally be presented a representation consistent with that format. Whilst this might be called "unsupported", the fact it's had to be used surely takes precedence?
John,
>
I'm not clear from the above, that there are any other bugs that will be caused by cleaning up line endings. Have I understood correctly?
Yes. See also:
structure of ondisk topic format.
Bottom line - this change dramatically changes the interpretation of on-disk storage introducing further bugs. It's also a knockon from Peter's earlier change on 17 Jan (or so).
On a secondary note - this point is bad: (IMO)
the intention was to destinguish meta data purely on the basic of %META:...% being present at the start of a line (verbatim should clearly be taken note of). Unless I'm missing something, I can't see why you would want another mechanism. Although this might mean that TWiki needs to throw out META data added during a normal edit.
Reasons users want to be able to add META in the topic text are:
- They don't want to define a webform, but do want to use metadata. (There's no support for arbitrary fields)
- Users want to define a webform, but can't because the local site has denied them that ability. (Goes against the WikiWay)
- Because they can, and hence do.
- Because they want to talk about meta tags.
- Because they might say something like:
... yeah, the storage on disk starts off with a
%META:BLA{... type declaration, and continues on until the end of the line. Customisation using %PARAM{"param"}%
is fairly common, replacing the older %URLPARAM{"param"}% in many locations.
And last but by no means least:
- Because the topic text container should preserve anything they choose to say, and not hijack it.
Unless this metadata shows up in the topic text edit box, they effectively
end up with a write once mechanism.
On another point, whether the verbatim was there or not is
irrelevant - I was providing a fairly reasonable example of how
discussion on TWiki.org can happen (I can find further real
examples if needed), with the storage mechanism corrupting
the contents.
A storage mechanism that seperates that separates wrapper
information (meta data) from the thing being wrapper (topic text)
should do so in such a way that it can cope with any form of
content it's wrapping. (Marshalling & unmarshalling as it needs to)
If it starts looking at the content - ie the contents of <verbatim>
then the wrapper mechanism is bust.
Second note, the bug report states this:
- "Edit a topic file on the shell level and save with CR-LF line endings;"
- Surely this is unsupported? Anyone could break TWiki by editting the raw topic text, and the obligation is on the onus of the user, not the code to cope with this.
- Furthermore trying this with a Store.pm with unmodified semantics from BeijingRelease shows that this bug does not exhibit itself when doing precisely this. Example:
This means the TWiki version noted up the top (
BeijingRelease) is wrong - it's actually a problem with recent TWiki alphas.
BeijingRelease copes with this "issue" fine. (Surely
BugRejected against
BeijingRelease in that case, but
BugReport for current alpha?)
--
MS - 07 Feb 2004
Question moved to StructureOfOndiskTopicFormat -- ThomasWeigert - 07 Feb 2004
Hi
Thomas,
Peter asked whether his change broke stuff. I said it did. I
provided examples. I then provided more examples. I've
provided further examples now of people talking about ways
of doing things which his change breaks. I've also posted
structure of ondisk topic format.
I think I've made my point. Whether it will be accepted is
another matter. I'm ceasing contribution on this issue at this
point - clearly breaking the ondisk storage format is not a
problem for TWiki.org, or else I would not have recieved the
response I did. There are only so many hours in the day.
(Also there's a saying, if at first you don't succeed, try, try
again and then give up, you're probably making yourself look
a fool)
My "decision" will have no bearing at all on what TWiki.org
does or does not do. If you take a CVS alpha update at this
time however, your content will be busted*, since
PeterThoeny
has already taken that decision (this is checked in). If you want
that decision reversed, please read
structure of ondisk topic format
and decide for yourself whether this is a big enough issue for you
to disagree with
Peter on.
* If any of your users are using the facility you mentioned
on MetaDataRendering regarding embedding META:FIELD values
in a topic. (It's simple enough to check of course - write a perl one
liner to grep your topic text for lines
The "bug" described at the beginning of this topic does not exist
in Athens or Beijing release - it only exists in recent TWiki alphas.
Examples of content broken on TWiki.org:
Possibly unrelated oddness:
Who ever wanted to edit a METASEARCH inside a topic anyway...
--
MS - 07 Feb 2004
Michael, can you explain what you think is wrong in these examples? I looked at each of the indicated pages and could not see what you were referring to. Please explain... Thanks.
--
ThomasWeigert - 08 Feb 2004