Bug: Topic text is handled inconsistently with respect to protective encoding
I might have gotten confused somewhere, but it appears to me that there may be some old or partially completed stuff hanging around in the TWiki core.
Let me trace how text is passed around from view through the various stages of the edit cycle:
- view
   - Text is taken from file
   - Text is rendered and substituted for %TEXT% in template (for display)
   - Alternatively, in raw view mode, text is encoded to protect &, %, <, and >, and tabs are converted to 3 spaces
   - (Except in DEVELOP, where the conversion to 3sp is not done)
- edit
   - Text is taken from file, or optionally from URL parameter text
   - Text is run through decodeSpecialChars if taken from URL parameter text (the "special characters" are &, <, >, the quote character, and a sequence of newlines)
   - Text is encoded to protect &, <, and >, and tabs are converted to 3 spaces
   - (Except in DEVELOP, where the conversion to 3sp is not done)
   - Text is substituted for %TEXT% in template (for inclusion in the textarea)
   - Text will be passed in URL parameter text (a textarea input element) to the save script
- preview
   - Text is taken from URL parameter text
   - In the text, 3 spaces are converted to tabs
   - Text is rendered and substituted for %TEXT% in template (for display)
   - Another copy of the text is run through encodeSpecialChars (for later saving)
   - Text will be passed in URL parameter text (a hidden input element) to the save script
- save
   - Text is taken from URL parameter text (may come from textarea or hidden)
   - Text is run through decodeSpecialChars
   - In the text, 3 spaces are converted to tabs
   - Text is written to file
In the above trace, one can observe the following inconsistencies:
1. When text is shown in raw form (in raw view mode and in edit mode), sometimes the % character is protected, sometimes not.
2. When text is passed as a URL parameter from edit to the save script, it is not encoded, but when it is passed from preview to the save script, it is encoded.
The difference in (1) should be easy to resolve. Either % needs to be protected or it does not, and whichever we choose should be done everywhere. I suggest that we write a single function that is called from everywhere.
The difference in (2) is trickier. As the encoding is applied when passing data in a URL parameter from preview to save, but not from edit to save, it appears that most of the encoding may not really be needed; otherwise the edited text would always be messed up.
However, according to reports in SectionalEditPlugin (reported by MarioFrasca), there is one problem with passing text in a URL parameter:
- On Firefox (and possibly other browsers; it is claimed that all browsers other than IE are affected), all leading and trailing \n are chopped off when posted as the value of a hidden input element.
From these reports and what I see in the code, I conclude that most of the protective encoding is not needed for (2), but that we need to protect leading \n for non-IE browsers. The problem reported by MarioFrasca does not show up when going through the preview script (as there the protection through the %_N_% translation takes effect, albeit it might be overkill to replace all \n rather than just the leading ones). However, it does show up when saving directly, as we obviously cannot ask the user to type %_N_% whenever \n is meant.
The consequence appears to be (I could not verify this, as I don't have browsers other than IE, but the bug reports on SectionalEditPlugin are convincing) that whenever text is edited and saved, all leading and trailing newlines disappear unless we protect them somehow. The current solution has a consistent loss of a single leading \n character in both the edit-save and edit-preview-save cycles. I don't quite know what the spec says about preservation of leading newlines, though.
Either way, with respect to (2) I suggest:
- We need to clarify whether protection of &, <, >, the quote character, and sequences of newlines is required when passing text as URL parameters.
- We need to figure out a uniform way to protect leading and trailing newlines when passing the edited text to the save script.
Finally, I would like to understand the difference between the two kinds of protection:
- &, <, >, and the quote character (for parameter passing)
- &, <, >, and % (for display in a textarea)
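For concreteness, the two protections can be sketched as follows. This is an illustrative sketch only, not the actual TWiki code; the function names are made up, and the newline handling assumes the %_N_% translation mentioned above.

```perl
# Protection (a): for parameter passing (&, <, >, the quote character,
# and newlines, here protected with the %_N_% token).
sub encodeForParam {
    my( $text ) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/\n/%_N_%/g;
    return $text;
}

sub decodeForParam {
    my( $text ) = @_;
    $text =~ s/%_N_%/\n/g;
    $text =~ s/&quot;/"/g;
    $text =~ s/&gt;/>/g;
    $text =~ s/&lt;/</g;
    $text =~ s/&amp;/&/g;   # last, so the other entities survive the round trip
    return $text;
}

# Protection (b): for display in a textarea (&, <, >, and %).
sub encodeForTextArea {
    my( $text ) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/%/&#037;/g;
    return $text;
}
```

Writing both down side by side makes the question concrete: (a) must round-trip exactly, while (b) only has to display correctly in the browser.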
Test case
Environment
--
ThomasWeigert - 18 Mar 2005
Impact and Available Solutions
Follow up
Discussions on the leading \n bug have moved to SomeBrowsersLoseInitialNewlineInTextArea. Let's keep this topic for discussion of the general problem of collapsing all the different encodings.
Thomas, note that there are other protective encodings for form fields. It is those encodings that I tried to collapse with the "standard" encodings.
--
CrawfordCurrie - 19 Mar 2005
Guide to related discussions:
The most relevant text I could find in the HTML standard is in section 17.13.4 of the W3C recommendation. It appears that form input data is by default encoded as application/x-www-form-urlencoded and should, therefore, need no protective encoding except for the & (see also the discussion Ampersands in URI values).
There may be some issues hidden with respect to our platforms: at times there are \r\n at the ends of lines, but at other times there are just \n. When an HTML page is submitted, the expectation is that (as in all MIME transmissions) CRLF is used to separate lines (see the quoted standard). I wonder if the leading newline loss (see SomeBrowsersLoseInitialNewlineInTextArea) is a victim of this?
Either way, it appears that much of the protective encoding applied is unnecessary.
--
ThomasWeigert - 19 Mar 2005
Moved some material to the more relevant SomeBrowsersLoseInitialNewlineInTextArea --TW
- Actually, I think that the moved material is more relevant here. in the first instance I had written it there because it was an answer to some discussion there, but, as it was generalizing the discussion and this topic was intended for the more general issue, I removed it from there and pasted it here. well, let's decide in which topic to discuss how we address our consistent or inconsistent treatment of text encoding before we look for solutions. --MF
- Please see how the related areas of this discussion have been factorized for easier treatment. --TW
Thomas, you write "form input data". do I get you right reading you so: the content of a TEXTAREA? or are you speaking more in general, also of what I call the value attribute of a type HIDDEN INPUT element? in the remainder of this contribution I assume you solely meant the first thing; here is the reason: as far as I could understand, the P in PCDATA (content type) stands for "processed", so we don't need to care about that (exception made for the some browsers lose initial newline in text area problem). if we want to keep our text untouched, we have to process it ourselves before we put it into an attribute of type CDATA. I hope that this is what we are discussing here...
as I see it now, we have two choices:
- we assume the user agent adheres to the w3 description in the most free way (worst case of standard behaviour).
- we make a survey of current user agents and describe the common ground.
at the moment twiki does the first thing (protecting [\ \t\n] whitespace), while the SectionalEditPlugin and its siblings, which is where the discussion originated, do neither.
whichever the choice, as we receive the text value in the save script, we can:
- make save aware of the type of data received (PCDATA or CDATA).
- take additional initialization care so that the CDATA processed by us is processed back to PCDATA, so that the save script does not see the difference.
- apply the transformation in any case, making sure that t*t = t (applying it more than once has no further effect).
at the moment twiki does the third thing.
my earlier change proposals were based on insufficient knowledge of the w3 recommendations, so please disregard them. sorry and thanks!
--
MarioFrasca - 20 Mar 2005
Excuse me for stating the obvious; I need to formalise a bit to get my head round this. Looking from the server side, I need all data exchanges to be content-preserving.
- If I pass $text in a form field to the browser, and the browser posts it back without any intermediate changes, I must be able to recover $text exactly as originally written.
- This applies to INPUT type="text" (single-line edit), INPUT type="hidden" (buffer), and TEXTAREA (multi-line edit) elements.
The attached CGI script allows you to explore the exchange when there is no encoding applied, for these three types. Install it in your twiki bin directory; it has no dependencies. The encoding in the "Received" strings is just # and then the ASCII decimal code for non-\w characters.
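As a reference point, the diagnostic display encoding amounts to something like the sketch below. This is a hypothetical reconstruction, not the attached script itself: the text above says non-\w characters, but the result tables later in this topic show punctuation passing through untouched while \r\n shows up as #13;#10;, so the sketch encodes only non-printable characters.

```perl
# Hypothetical reconstruction of the exchange script's display
# encoding: every character outside printable ASCII is shown as '#'
# plus its decimal code and ';', so "\r\n" displays as "#13;#10;".
sub showReceived {
    my( $text ) = @_;
    $text =~ s/([^\x20-\x7e])/'#' . ord( $1 ) . ';'/ge;
    return $text;
}
```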
A small amount of experimentation reveals how brain-dead the browsers really are. It is notable that when you pass a linefeed as the first character, it gets treated just like a \n (at least in Firefox it does). The encoding/decoding of other characters (e.g. ©) seems to be ok, though I am left wondering what would happen if the browser locale were different from the server locale.
Try these:
- enter A in the textfield
- URL-escape things on the URL line, and see the effect on the received text
--
CrawfordCurrie - 20 Mar 2005
The discussion in this topic applies to all situations where input is passed from client to server via form fields or URL parameters (really the same thing). However, as noted, there are two situations in the standard:
- PCDATA (passed by textarea fields), which supports multiline text
- CDATA (passed by everything else), which does not guarantee to preserve linefeed or newline characters.
A wrinkle in this discussion is that apparently some (or all?) browsers drop the leading newline in a textarea, albeit this seems not licensed by the standard.
--
ThomasWeigert - 20 Mar 2005
Crawford, Thomas, let me put that part here again: since the leading \n problem is closed, we can leave that topic alone...
From http://www.w3.org/TR/REC-html40/interact/forms.html I read:
In general, a control's "initial value" may be specified with the control element's value attribute. However, the initial value of a TEXTAREA element is given by its contents,
17.4 the INPUT element
<!ELEMENT INPUT - O EMPTY -- form control -->
<!ATTLIST INPUT
...
value CDATA #IMPLIED -- Specify for radio buttons and checkboxes --
...
17.7 the TEXTAREA element
<!ELEMENT TEXTAREA - - (#PCDATA) -- multi-line text field -->
so in our save script we are receiving the text from a PCDATA or a CDATA, not knowing which was the case...
from the same source (http://www.w3.org/TR/REC-html40/types.html#type-cdata) I read:
User agents may ignore leading and trailing white space in CDATA attribute values
(e.g., " myval " may be interpreted as "myval"). Authors should not declare
attribute values with leading or trailing white space.
...
CDATA is a sequence of characters from the document character set and may include
character entities. User agents should interpret attribute values as follows:
- Replace character entities with characters,
- Ignore line feeds,
- Replace each carriage return or tab with a single space.
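Spelled out as code, the user-agent rules quoted above amount to something like this. An illustrative sketch only: character entity replacement is omitted, and the leading/trailing stripping is a "may", not a "must".

```perl
# What a conforming user agent may do to a CDATA attribute value,
# following the rules quoted above (entity replacement omitted).
sub normalizeCDATA {
    my( $value ) = @_;
    $value =~ s/\n//g;        # ignore line feeds
    $value =~ s/[\r\t]/ /g;   # each CR or tab becomes a single space
    $value =~ s/^ +| +$//g;   # leading/trailing white space MAY be ignored
    return $value;
}
```

Which is exactly why newlines handed over in a value attribute cannot be relied upon to survive.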
I think that it is time to refactor this page again, respecting the discussion guide above. who dares?
--
MarioFrasca - 20 Mar 2005
Mario, the save script receives data as PCDATA when invoked from edit, but as CDATA when invoked from preview.
Therefore, care has to be taken that preview encodes the text to avoid any newline being lost. Edit need not worry, except for the leading newline apparently being lost by some (or all?) browsers.
Note that the text you quote is about any whitespace characters, not just leading or trailing newlines, as CDATA should not even contain newlines. However, as we have seen, browsers do pass some newlines along, but not always the leading ones.
--
ThomasWeigert - 20 Mar 2005
Thomas, it is good to see that we agree down the whole line. what you state here summarizes things once again. in my opinion any of us may go on and rewrite the description of the problem...
please, don't use undefined terms. I did not manage to find any definition of what you call a "url-parameter"... can we stick with PCDATA, CDATA, input element, textarea, value, type, hidden... thanks. goodnight,
--
MarioFrasca - 20 Mar 2005
Crawford, here is the result of my testing using your nice bin/exchange script:
| Browser | Input | Received (Text) | Received (Textarea) | Received (Hidden) |
| Firefox | abc#13;#10;#13;#10;def#13;#10;#13;#10; | abc | abc#13;#10;#13;#10;def#13;#10;#13;#10; | abc#13;#10;#13;#10;def |
| IE | abc#13;#10;#13;#10;def#13;#10;#13;#10; | abcdef | abc#13;#10;#13;#10;def#13;#10;#13;#10; | #13;#10;abc#13;#10;#13;#10;def#13;#10;#13;#10; |
This confirms and clarifies what we have been seeing:
- While IE passes on in hidden fields exactly what it was handed, Firefox drops leading and trailing newlines.
- Both browsers drop the leading newlines in a textarea, but keep the rest intact.
- Firefox only keeps the first legal string of characters in a text field, while IE filters out illegal characters.
This, for example, implies that IE will not lose any leading newline inserted in the edit topic textarea, but Firefox will.
I also ran a test with blanks and there both browsers are consistent and pass the full character string through unaltered.
From this it is not obvious what these browsers actually implement, as they interpret the W3C spec quoted earlier differently for blank whitespace vs. newline whitespace.
The consequences for us are clear: we must encode linefeeds when passing text through hidden elements. We probably should make it spec that the textarea does not have leading newlines; otherwise we will end up having to use your javascript trick or something similar.
--
ThomasWeigert - 21 Mar 2005
Here is the same experiment for non-alpha characters:
| Browser | Input | Received (Text) | Received (Textarea) | Received (Hidden) |
| Firefox | abc~!@#$%^&*()_+>?/.,-=:"< | abc~!@#$%^&*()_+>?/.,-=:"< | abc~!@#$%^&*()_+>?/.,-=:"< | abc~!@#$%^&*()_+>?/.,-=:"< |
| IE | abc~!@#$%^&*()_+>?/.,-=:"< | abc~!@#$%^&*()_+>?/.,-=:"< | abc~!@#$%^&*()_+>?/.,-=:"< | abc~!@#$%^&*()_+>?/.,-=:"< |
This seems to indicate that we are encoding much too much. The only problem I had was with <, which messed up the whole output afterwards if followed by certain characters, such as ? or /.
My conclusion is that we should encode (in TWiki::Render::encodeSpecialChars) only newlines and <.
--
ThomasWeigert - 21 Mar 2005
Thanks for the excellent analysis. Before we make any code changes, I want to have testcases in place. We need to eliminate any chance of this re-occurring in the future. I can test firefox, konqueror and mozilla, and IE on checked-in code. Is this a sufficient set, or do we need to add opera and/or safari?
--
CrawfordCurrie - 21 Mar 2005
Crawford, Thomas, my idea is that in TWiki::Render::encodeSpecialChars we could add a parameter which specifies whether we are encoding for PCDATA or for CDATA. in the second case we should also encode newlines and tabs; otherwise only, as Thomas also says, <.
also: if you want to eliminate any chance that this reoccurs in the future, we cannot base the decision on the current browsers alone, but also on the w3c recommendations.
one more thought: may it be that the <textarea> is parsed including the first following \n, if present? this would explain the behaviour... just a thought; it could be tested with an xml parser... maybe I'll check it later.
cheers,
--
MarioFrasca - 21 Mar 2005
Reasonable, except that there is no way to know what you are encoding for. In the places where that method is called, all it knows is that it is about to replace %TEXT% with the value. The only solution is to replace %TEXT% with %CDATA_TEXT% and %PCDATA_TEXT% as appropriate in the templates. Clunky, but nothing better springs to mind. I'd have preferred %CDATA{"!%TEXT%"}%, but that won't work, as the %TEXT% substitution is done after common tags expansion is finished. (why? good question. make a mental note to research that)
BTW I also prefer a separate function in this case; it is better practice to avoid an "if" statement. As they say, "every IF adds a BUT".
--
CrawfordCurrie - 21 Mar 2005
There is nothing we can do about PCDATA, as we do not have a way of intercepting the passing of that data other than by Crawford's javascript trick, which I don't think is worth it.
For CDATA, it appears all we need to do is encode newlines and <, I think.
There is no need to differentiate text types in the templates.
Oh, and by the way, we should deploy the protective encoding systematically where it is needed (i.e., wherever we pass text containing these characters in URL parameters).
--
ThomasWeigert - 21 Mar 2005
Crawford, why do you say that there is no way for us to know which encoding we should produce? when we are filling in a CDATA field, we know it is a CDATA field, don't we? I can't think of an example where we do not. I'm thinking of edit putting the text inside the textarea. edit knows it is a PCDATA field, so it knows it should only protect the < character (or whatever; anyway no \n nor tabs). similarly, preview knows it is putting the text inside the value of a hidden input element, that is, it is passing CDATA, so it should also protect the other characters at risk (the \n and \t)... you surely have reasons for stating what you state, but I don't understand...
--
MarioFrasca - 21 Mar 2005
Mario, what you say above makes only partial sense in the context of twiki (or other web tools for that matter).
The edit script is generating the edit topic; I cannot do anything about the encoding of the textarea (other than using the javascript trick Crawford talked about earlier). The contents of the textarea (or any other input field, for that matter) are passed by the browser to the server (using the standard encoding rules defined for forms). We cannot intercept that.
The only control we have is when dealing with hidden fields, as these are generated by the scripts (and, of course, the initial values of the other fields, albeit there the concern is that it has to display right for the user).
Thus the focus has to be on
- encoding hidden fields properly, and
- not relying on browsers when assembling the resultant final text into what is saved or displayed.
Mario, it would be fruitful for this discussion if you familiarized yourself with the internal working of twiki. In particular, how data is passed between browsers, scripts, and servers throughout the view-edit-(preview)-save cycle.
--
ThomasWeigert - 21 Mar 2005
<rant mode>
Thomas, in an earlier contribution signed by me which you factored away, I wrote something along the lines of: we are here to make TWiki a better tool and the internet a better place. do we agree on this? so please lower the tone when you're replying to my posts, or avoid replying altogether. thanks.
</rant mode>
now to the point:
When we are talking about what the server passes to the browser, it knows what kind of data it is encoding. At least this is my understanding and this is the reason for asking Crawford why he states what he states, so Crawford please if you can explain what you meant, I'll read you with interest. as things are now, I too don't see the need for distinguishing PCDATA_TEXT and CDATA_TEXT in the templates.
More interestingly, when we're talking about what the client passes to the server, there in fact is a problem in the save script, where it does not know whether what it is receiving has been passed as the content of a textarea or as the value of a hidden input. (see my contribution in this very topic, version 1.4)
so actually my remark to Crawford is that it is on the other side that I see problems. I've been experimenting with a very small modification in the only place where TWiki uses new CGI... save expects a text PCDATA, which is what it does receive when the client posts the form built by the edit script. on the other hand, save can be called after the client posts the form built by the preview script, which holds a CDATA text. if we call the parameters differently, say 'hid_text' and 'text', a very limited change in UI.pm does the trick:
if( $ENV{'DOCUMENT_ROOT'} ) {
    # script is called by browser
    $query = new CGI;
+   if( $query->param( 'hid_text' ) ) {
+       $query->param( 'text' => &recoverFromCDATA( $query->param( 'hid_text' ) ) );
+   }
    # SMELL: The Microsoft Internet Information Server is broken with
    # respect to additional path information. If you use the Perl DLL
    # library, the IIS server will attempt to execute the additional
you could even think about looping over the whole post and recovering from CDATA all posted data whose name matches /^hid_/... in the remainder of TWiki we would only need to deal with the PCDATA encoding.
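For what it's worth, the recoverFromCDATA helper assumed by the patch above is hypothetical (it does not exist in TWiki); under the special-character encoding discussed in this topic it might look like:

```perl
# Hypothetical helper assumed by the UI.pm patch above: undo the
# protective encoding that preview applies before placing the text in
# the hidden field (the special characters discussed in this topic).
sub recoverFromCDATA {
    my( $text ) = @_;
    $text =~ s/%_N_%/\n/g;   # restore protected newlines
    $text =~ s/&quot;/"/g;
    $text =~ s/&gt;/>/g;
    $text =~ s/&lt;/</g;
    $text =~ s/&amp;/&/g;    # last, so other entities survive
    return $text;
}
```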
well, this is it for today, keep it cool,
--
MarioFrasca - 21 Mar 2005
Mario, what this topic addresses is the problem that in many different places there are encodings applied to parameters passed from the browser to the server which are not consistent. See, for example, TWiki::Render::encodeSpecialChars; but then there are other places where these encodings are applied manually. E.g., the SectionalEditPlugin code which I inherited had a function which applied a totally different encoding, missing the protection of newlines but protecting many irrelevant things. In this topic we have discussed how we can make this encoding more consistent, and also what it really should be.
Your concern as expressed by the above suggestion appears to be to avoid applying a decoding unnecessarily. This is an appropriate performance concern, but it probably does not have much impact on overall TWiki performance. But you are right in the above suggestion: if we were to differentiate whether data came from the edit script (via savemulti) vs. the preview script, we could avoid applying the decoding step in the former situation, as the text is not encoded when it comes from the textarea input (it is PCDATA).
The other issue that we have been discussing, which belongs in the other topic, SomeBrowsersLoseInitialNewlineInTextArea, is that when data is being passed from the textarea, some browsers drop the leading newline. That behavior is what Crawford and I were concerned about when we were saying that you cannot prevent it from happening other than by resorting to javascript trickery.
--
ThomasWeigert - 21 Mar 2005
I'd still like to get to the point where we decide:
- What should the appropriate encoding be that is applied to hidden text?
- Can we replace all occurrences of this encoding by a standard function provided in TWiki?
- Should we differentiate in the save script where the data is coming from, to avoid one unneeded decoding?
--
ThomasWeigert - 21 Mar 2005
- We require hidden and textarea to be content-preserving
- Additional encodings can be applied to hidden (it's hidden, after all) but not to textarea.
- We can live with textfield (input type="text") stripping leading/trailing [\r?\n]
I can't see a solution to textarea other than JavaScript. The encoding used with hidden can be any reversible encoding. For example, s/([^ -~]|%)/'%'.sprintf('%02x',ord($1))/ge; (reverse: s/%([0-9a-f]{2})/chr(hex($1))/ge;).
The current encodings are a mess, and need sorting out. Some reasons:
- urlEncode encodes \n as <br />, for goodness' sake!
- encodeSpecialCharacters collapses \r's on either side of \n's!
- entityEncode only handles a subset of the 7-bit characters!
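For concreteness, a self-contained round-trip version of the reversible hidden-field encoding suggested above (the helper names are made up; the decode pattern expects the two lowercase hex digits that the %02x format produces):

```perl
# Reversible encoding for hidden-field values: every character outside
# printable ASCII, plus '%' itself, becomes %hh (two lowercase hex
# digits). Decoding reverses it exactly, so decode(encode(t)) == t.
sub hideEncode {
    my( $text ) = @_;
    $text =~ s/([^ -~]|%)/'%' . sprintf( '%02x', ord( $1 ) )/ge;
    return $text;
}

sub hideDecode {
    my( $text ) = @_;
    $text =~ s/%([0-9a-f]{2})/chr( hex( $1 ) )/ge;
    return $text;
}
```

Encoding '%' itself is what makes the scheme unambiguous: a literal %0a in the input becomes %250a on the wire and decodes back to %0a.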
--
CrawfordCurrie - 22 Mar 2005
Crawford, hold on: 4 questions in a row! what was the problem with textarea? you mean the loss of the first \n? as for SomeBrowsersLoseInitialNewlineInTextArea, hasn't that been sorted out? or did I miss anything?
--
MarioFrasca - 22 Mar 2005
It appears that the initial \n in textarea is sorted out; I thought we still had different behaviour with IE, but from your test results that appears not to be the case.
I've been looking at the encode/decode functions, and I think we need these:
- entityEncode - encode to HTML entities - entityDecode
- urlEncode - encode to %nn, e.g. %10 - urlDecode
However the existing implementations are crap, and need generalising.
The current uses of encodeSpecialChars can be replaced with:
- encodeCDATA($text, $hidden) - encode for CDATA - decodeCDATA($text)
The $hidden flag marks when the data is being encoded in a hidden field. In this case, it will add a unique byte sequence, e.g. 0xDEC0DE, to the start and end of the value and then urlEncode it. If decodeCDATA sees this byte sequence, it will apply the decoding. This technique needs to be applied evenly to $text and form values - anything where a hidden may be used.
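A hedged sketch of how that proposal could look (the marker value and the inlined url-style encoding are illustrative, not committed code):

```perl
# Sketch of the encodeCDATA/decodeCDATA proposal: when $hidden is set,
# bracket the value with a unique byte sequence (0xDEC0DE) and then
# url-encode it; decodeCDATA only decodes when it finds that marker.
my $MARKER = "\xDE\xC0\xDE";

sub encodeCDATA {
    my( $text, $hidden ) = @_;
    return $text unless $hidden;
    $text = $MARKER . $text . $MARKER;
    $text =~ s/([^ -~]|%)/'%' . sprintf( '%02x', ord( $1 ) )/ge;
    return $text;
}

sub decodeCDATA {
    my( $text ) = @_;
    my $decoded = $text;
    $decoded =~ s/%([0-9a-f]{2})/chr( hex( $1 ) )/ge;
    # only apply the decoding if the marker brackets the value
    if( $decoded =~ s/^\Q$MARKER\E(.*)\Q$MARKER\E$/$1/s ) {
        return $decoded;
    }
    return $text;   # was never encoded by us: pass through untouched
}
```

The marker is what would let save stay ignorant of whether the text arrived from the textarea (edit) or from a hidden field (preview).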
There is an outstanding question about the encoding used to protect characters in field data stored in topics. This currently uses a subset of encodeSpecialChars. I think we should increment the data format number and change this to entity encoding. What do you think? (The major impact of this would be that older versions of TWiki would not correctly read field values from data generated by this version.)
--
CrawfordCurrie - 22 Mar 2005
I think that I have to think about it... it all sounds quite reasonable, but I don't yet see where we are right now. ...I must experiment with the data stored in the topics; I'll be back.
about encoding for hidden: it really is encoding to CDATA, no?
--
MarioFrasca - 22 Mar 2005
Crawford, regarding your suggestions/question above:
- I do not think we should encode the textarea, as this requires the use of Javascript. In such a central area as the editing box we cannot rely on Javascript, I believe. That means we need to live with the leading \n being dropped.
- Thus the only encoding of URL parameters needs to be for hidden fields. These need to encode all characters that cannot be in CDATA, and also characters that may cause a mess otherwise, based on observation. I believe these are linefeeds, newlines, and <.
- We need to agree on the algorithm here, e.g., whether \n and \r are combined, etc.
- The "hidden" flag above is unnecessary, as no other input fields should be encoded (there is no way of sensibly doing that).
- The adding of byte sequences to the front and back seems unnecessary for hidden fields. You need to encode all the problem characters, even if observation teaches us that the browsers we looked at only appear to mess with the leading and trailing newlines. I cannot see anything in the W3C spec that would guarantee us that this remains the case.
Thus, with respect to parameters, it appears to me that the only thing to do is to find all the places where we apply such encoding and use a standard function.
Secondly, we need to do the same analysis for the encoding applied to protect rendered text, which is a second can of worms.
--
ThomasWeigert - 22 Mar 2005
Ah, were it that simple. When you are editing a topic and hit "change form", the $text is encoded in a hidden parameter for passing to the "change form" oops script. This text value is subsequently passed back to edit when you select the new form type. Without the hidden encoding, it would appear that the content of the editing textarea changed in mid flow as it lost its leading newline. If we take the stance that leading/trailing newlines are fair game everywhere, then this is a moot point. However, this feels like dodging the issue.
As for other characters in the hidden fields, entity encoding should suffice.
The entityEncode function is used for protecting rendered text in raw mode. It needs to be used for verbatim as well. I think those are the only two places that matter - unless someone else knows differently?
--
CrawfordCurrie - 22 Mar 2005
when you are editing a topic and don't see the "change form", you ask someone to help. the same you do when a native English speaker uses slang. or when a tuesday feels like monday.
--
MarioFrasca - 22 Mar 2005
Sorry, "Replace form" (or "Add form" if the topic doesn't have a form).
OK, despite an influenza-fuelled haze, I have performed the following experiment:
- Replaced all inline HTML with calls to CGI (e.g. CGI::start_form)
- Removed all encoding of CDATA and PCDATA (to let CGI deal with it). This involved moving the textarea out of the edit.tmpl into code, so I could leverage the CGI encoding.
- Deleted the methods encodeSpecialChars and decodeSpecialChars
I've only tried it on firefox so far, but it works perfectly (i.e. it works the same as it did before)
Interestingly enough, the move to using CGI for HTML composition has radically improved the readability of the code.
--
CrawfordCurrie - 22 Mar 2005
this sounds good... (?) why don't you put it somewhere we can test it with other browsers? if your influenza allows you... you know that my system can be at your disposition, if necessary. what does CGI-encoded CDATA look like? I'm curious. all right, more sleepy than curious, but still curious.
... if it is the CGI producing the textarea, does it include the extra leading \n?
--
MarioFrasca - 22 Mar 2005
Crawford, you have proposed two different things; this last one, letting CGI do all the work, sounds interesting...
the previous one: two different functions for encoding for CDATA and PCDATA. I would say that you apply first the one (toPCDATA) and, if the data has to go into a value attribute, you encode it with toCDATA; so in fact, no, there is no need for that second parameter. well, unless you want to have one single function and keep a second parameter to specify whether you are encoding for CDATA or PCDATA. I'm still just talking about the encoding at the server side. on the client side, I don't see the problem. the browser passes any CDATA as it has received it, and the server knows how to redecode it. after the data has been reduced to PCDATA, it can be redecoded... but enough about this; I'm a lot more interested in the CGI doing all the work... (laziness, what a good property)
--
MarioFrasca - 23 Mar 2005
CGI is now doing all the work (on DevelopBranch). Please watch it like a hungry owl, ready to swoop down on anything that's wrong. There were a number of mysterious and unexplained encoding/decoding steps that I never fully understood and have now removed; when we see it go wrong, we can recode them and this time explain them with a comment. (you can't make an omelette without breaking eggs)
--
CrawfordCurrie - 24 Mar 2005