Raw text:
Entity encoded text:
URL param encoded text:
-- PeterThoeny - 11 Nov 2003
This seems to give a problem:
a-&\\
-- ArthurClemens - 12 Nov 2003
Actually, the backslash only causes problems when rendering the page. But to be neat, you can add the backslash to the list of characters to be encoded; its encoding is %5C.
Diff:
--- /Library/WebServer/alphatwiki/lib/TWiki.pm.org.pm Thu Dec 4 09:16:10 2003
+++ /Library/WebServer/alphatwiki/lib/TWiki.pm Mon Dec 8 00:08:05 2003
@@ -1912,6 +1912,7 @@
$theStr =~ s/\+/\%2B/g;
$theStr =~ s/\</\%3C/g;
$theStr =~ s/\>/\%3E/g;
+ $theStr =~ s/\\/\%5C/g;
# Encode characters with 8th bit set (ASCII-derived charsets only)
$theStr =~ s/([\x7f-\xff])/'%' . unpack( "H*", $1 ) /ge;
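For illustration, here is a minimal self-contained sketch of the same substitution chain with the backslash rule applied. The sub name `url_encode_sketch` is hypothetical, and this is not the full TWiki.pm routine, which the hunk above only shows in part. Encoding `%` first and encoding `&` are assumptions added so the result is usable as a URL parameter.

```perl
use strict;
use warnings;

# Hypothetical helper modelled on the substitutions in the diff above;
# it is NOT the full TWiki.pm routine, only a sketch of the idea.
sub url_encode_sketch {
    my ($str) = @_;
    $str =~ s/%/%25/g;     # assumption: encode '%' first so later escapes are not re-encoded
    $str =~ s/&/%26/g;     # assumption: '&' separates URL parameters
    $str =~ s/\+/%2B/g;
    $str =~ s/</%3C/g;
    $str =~ s/>/%3E/g;
    $str =~ s/\\/%5C/g;    # the backslash rule added by the patch
    # Encode characters with 8th bit set (ASCII-derived charsets only)
    $str =~ s/([\x7f-\xff])/'%' . unpack("H*", $1)/ge;
    return $str;
}

print url_encode_sketch('a-&\\'), "\n";
```

Called on the problem string a-&\ (written 'a-&\\' in Perl), it returns a-%26%5C.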
-- ArthurClemens - 07 Dec 2003
Escaping of the backslash is in TWikiAlphaRelease.
-- PeterThoeny - 14 Dec 2003
Now the method can be used for form data to fix BugWithQuotesInAttachmentComment.
-- ArthurClemens - 14 Dec 2003
The pipe (|) is still a problem. This is especially troublesome in tables, such as the attachment table (see also AttachCommentBadCharacters). Is the URL encoding procedure used globally, including for forms?
I am sure we can come up with a formal test that checks whether all illegal URL characters AND all characters used by TWiki markup are encoded properly, together with a list of the occasions where encoding must be used.
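One way such a formal test could look, as a sketch under stated assumptions: the encoder under test (`candidate_encode`, hypothetical) uses an explicit substitution list like the patch above, the character list is assumed rather than authoritative, and "encoded properly" is taken to mean that the output contains only RFC 3986 unreserved characters and %XX escapes.

```perl
use strict;
use warnings;

# Hypothetical encoder under test: an explicit list of substitutions in the
# style of the TWiki.pm patch (not the real routine).
sub candidate_encode {
    my ($s) = @_;
    $s =~ s/%/%25/g;    # '%' first, so later escapes are not re-encoded
    $s =~ s/&/%26/g;
    $s =~ s/\+/%2B/g;
    $s =~ s/</%3C/g;
    $s =~ s/>/%3E/g;
    $s =~ s/\\/%5C/g;
    return $s;
}

# Assumed test set: reserved and unsafe URL characters, plus '*' and '_',
# which are TWiki emphasis markup ('|' is both unsafe and a table delimiter).
my @chars = ('$', '&', '+', ',', '/', ':', ';', '=', '?', '@', ' ', '"',
             '<', '>', '#', '%', '{', '}', '|', '\\', '^', '~', '[', ']',
             '`', '*', '_');

# A character counts as encoded properly if its encoded form contains only
# RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) and %XX triplets.
my @unencoded = grep {
    my $enc = candidate_encode($_);
    $enc !~ /\A(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2})+\z/;
} @chars;

print "not encoded: @unencoded\n";
```

With this explicit list the report includes the pipe, matching the problem described above, while the backslash, %, &, +, < and > pass. Adding a substitution and re-running the script shows whether a gap is closed.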
-- ArthurClemens - 22 Feb 2004
Here ya' go:
http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
-- TomKagan - 22 Feb 2004
Taking the list from this URL, I get this table:
| Character name | Char | Encoded | Comment |
| Reserved characters | | | |
| dollar | $ | %24 | |
| ampersand | & | %26 | |
| plus | + | %2b | |
| comma | , | %2c | |
| forward slash | / | / | |
| colon | : | : | |
| semi-colon | ; | %3b | |
| equals | = | %3d | |
| question mark | ? | %3f | |
| at | @ | %40 | |
| Unsafe characters | | | |
| space | | %20 | |
| quotation marks | " | %22 | |
| less than | (not shown) | %3c | |
| greater than | > | %3e | |
| pound | # | %23 | |
| percent | % | %25 | |
| left curly brace | { | %7b | |
| right curly brace | } | %7d | |
| pipe | (not shown) | %7c | |
| backslash | (not shown) | %22%22 | |
| caret | ^ | %5e | |
| tilde | ~ | ~ | |
| left square bracket | [ | %5b | |
| right square bracket | ] | %5d | |
| grave accent | ` | %60 | |
The table gets messed up because of the pipe and the backslash. Another try:
pipe, |, %7c
backslash, \, %22%22
caret, ^, %5e
Quite a few characters don't get encoded (for example /, : and ~ come through unchanged)...
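The Encoded column can be cross-checked against the percent-encoding rule itself (RFC 3986, section 2.1): a character becomes % followed by the two hexadecimal digits of its byte value, and the hex digits are case-insensitive, so %7c and %7C are equivalent. A small Perl sketch with a hypothetical helper, `percent_encode_char`, for a few rows including the ones that broke the table. By this rule the backslash is %5C, the value given earlier in this topic, not the %22%22 shown above (%22 is the encoding of the double quote).

```perl
use strict;
use warnings;

# Percent-encode one ASCII character: '%' followed by the two hexadecimal
# digits of its byte value.
sub percent_encode_char {
    my ($c) = @_;
    return sprintf('%%%02X', ord($c));
}

# A few rows from the table, including the ones that broke its layout.
for my $c ('|', '\\', '^', '<', ' ') {
    my $label = $c eq ' ' ? 'space' : $c;
    printf "%-6s %s\n", $label, percent_encode_char($c);
}
```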
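Instead of escaping individual characters, one could use a generic algorithm:
use CGI;
my $q = CGI->new();
# escape() percent-encodes every character except letters, digits and a few safe marks such as - _ .
$text = $q->escape($text);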
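If instantiating a CGI object is a concern, the same generic idea can be written without CGI. A minimal sketch, assuming a hypothetical `generic_url_encode` and byte-string input: the escape table is built once at load time, and every character outside the RFC 3986 unreserved set is replaced by its table entry. The allow-list is an assumption and may differ slightly from what a given CGI.pm version leaves unencoded.

```perl
use strict;
use warnings;

# Hypothetical generic encoder. The escape table is built once, when the
# code loads, so each call is a single substitution with hash lookups and
# no CGI object is needed. Assumes byte strings (ord() <= 255).
my %ESCAPE = map { (chr($_) => sprintf('%%%02X', $_)) } 0 .. 255;

sub generic_url_encode {
    my ($text) = @_;
    # Everything outside the RFC 3986 unreserved set is percent-encoded.
    $text =~ s/([^A-Za-z0-9\-._~])/$ESCAPE{$1}/g;
    return $text;
}

print generic_url_encode('a-&\\ |50%'), "\n";
```

Because the allow-list is fixed, characters such as the pipe and the backslash are covered automatically; the trade-off is that / and :, which could stay readable, are encoded too.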
-- ArthurClemens - 22 Feb 2004
There are two types of encoding:
- Entity encoding, needed when supplying data to an input field
  - Example:
    <input type="text" name="test" value="%URLPARAM{ "test" encode="entity" }%" />
- URL encoding, needed when supplying data to a URL parameter
  - Example:
    https://www.twiki.org/cgi-bin/view/Codev/UrlEncodeTesting/?test=%URLPARAM{ "test" encode="url" }%
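To make the difference concrete, here is a sketch that encodes the same value both ways. Both subs are hypothetical stand-ins, not TWiki's implementation of encode="entity" and encode="url"; in particular, the set of characters `entity_encode` replaces is an assumption.

```perl
use strict;
use warnings;

# Hypothetical entity encoder: replace characters that are special in an
# HTML attribute value (or in TWiki markup, such as '|' and '%') with
# numeric character references like &#124;. The character set is an assumption.
sub entity_encode {
    my ($s) = @_;
    $s =~ s/(["&<>'|%])/'&#' . ord($1) . ';'/ge;
    return $s;
}

# Hypothetical URL encoder: percent-encode everything outside the
# RFC 3986 unreserved set (assumes a byte string).
sub url_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

my $value = 'a|b "c"';
print qq{<input type="text" name="test" value="}, entity_encode($value), qq{" />\n};
print 'https://www.twiki.org/cgi-bin/view/Codev/UrlEncodeTesting/?test=', url_encode($value), "\n";
```

The pipe ends up as &#124; inside the attribute value, which keeps it from splitting a TWiki table cell, and as %7C in the URL.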
I updated the example form at the top to reflect these two types of encoding. I also put the form into a TWiki table to show that pipes in input fields work when they are entity encoded.
- Does that mean that attachment comments are not entity encoded? -- Main.AC - 22 Feb 2004
- They are not; that is a different case. TWiki does an internal encoding of some special characters that interfere with the meta data syntax. -- PeterThoeny - 24 Feb 2004
It is easy to encode additional characters if any are missing. This is faster at run time than instantiating a new CGI object (the TWiki code needs to run in both CGI and non-CGI environments).
BTW, as documented, use the verbatim tag only on a line by itself; use =...= syntax to mix monospaced text with proportional text on the same line.
-- PeterThoeny - 22 Feb 2004
Is this only discussion or did it involve a code change? And if so, what release did it affect?
(I'm trying to tidy up some of the misclassified topics.)
-- SamHasler - 12 Nov 2004