
Bug: Non-Roman Alphabet characters in form field names removed

See Support.InternationalCharactersInFormFields for details - this affects Greek, Cyrillic and East Asian languages where Roman alphabetic characters are rarely used. It would also corrupt any use of accented characters in field names in European languages.

Test case

  1. Create fields called 'יי' and 'חיי' using TWikiForms and enter two different values
  2. Result: the field names and values do not appear

See Support.InternationalCharactersInFormFields for a real-world example.

Environment

Any version of TWiki up to 12 Dec 2004.

Fix record

Fixed in SVN DEVELOP.

Note that there really needs to be a setting for 'this is a primarily non-alphabetic locale', so that there's a way of only removing non-alpha characters when using an alphabetic language (including Greek and Cyrillic but not including Japanese). When this setting is off, any character could be used in form fields. See the TODO in SVNget:lib/TWiki/Form.pm.

-- RichardDonkin - 12 Dec 2004

Richard, simply commenting out $text =~ s/[^A-Za-z0-9_\.]//go; is likely to cause incompatibilities and could break TWikiApplications. TWiki stores the field name in two formats, as title and name respectively. Commenting out the filter results in the title and name being the same. Example field with a space in its title:

%META:FIELD{name="TopicClassification" title="Topic Classification" value=""}%
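To make the effect of that filter concrete, here is a minimal sketch that applies the quoted substitution to a few titles. The helper name title_to_name and the sample titles are illustrative assumptions, not code from lib/TWiki/Form.pm.

```perl
use strict;
use warnings;
use utf8;    # the string literals below contain non-ASCII text

# Hypothetical helper: derive the stored name from a field title using
# the ASCII-only filter quoted above.
sub title_to_name {
    my ($title) = @_;
    ( my $name = $title ) =~ s/[^A-Za-z0-9_\.]//g;
    return $name;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $title ( 'Topic Classification', 'Priorité', 'חיי' ) {
    printf "title=\"%s\" name=\"%s\"\n", $title, title_to_name($title);
}
```

In this sketch the space is dropped from 'Topic Classification', the accented letter is dropped from 'Priorité', and the Hebrew title is reduced to an empty name, which seems consistent with the symptom in the test case.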

This should be reverted to the way it was before (not supporting I18N), or done properly with a filter based on the locale.

-- PeterThoeny - 13 Dec 2004

I'll have a look at this - what needs to happen depends on the type of language involved:

  1. Alphabetic languages including Greek and Cyrillic - allow only alphabetic characters, same behaviour as now but works for Greek and Cyrillic as well as Roman languages
  2. All other languages (e.g. Japanese) - these typically don't have a concept of WikiWord so it's not a problem that the field name is the same as the title IMO (e.g. could be two Japanese words/characters).

Until we have full Unicode support for sites using Perl 5.8, we can't do much about the second case, so this would just have name = title, which I think is OK.
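A rough sketch of the two cases, assuming a hypothetical boolean setting for a 'primarily non-alphabetic locale' (the function name, the parameter and the use of the POSIX [:alnum:] class under 'use locale' are all assumptions for illustration, not the contents of the attached patch):

```perl
use strict;
use warnings;

# Hypothetical: $non_alphabetic_locale is true for a primarily
# non-alphabetic locale (for example Japanese), false otherwise.
sub field_name {
    my ( $title, $non_alphabetic_locale ) = @_;

    # Case 2: no filtering, so the name is identical to the title.
    return $title if $non_alphabetic_locale;

    # Case 1: keep only characters that the current locale classes as
    # alphanumeric, plus underscore and dot.
    use locale;
    ( my $name = $title ) =~ s/[^[:alnum:]_.]//g;
    return $name;
}
```

Which accented, Greek or Cyrillic characters survive in case 1 depends on the locale the process runs under, so this only behaves as intended when the site locale matches the language of the field names.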

I'm not sure why it's useful to set the title (human readable version) and name (cleaned up version) to different values when using non-alphabetic languages - after all, the name is just intended to look up the field in the form definition (ref: TWikiMetaData spec). If you are using Japanese, you can't create WikiWords anyway and there won't be any spaces in either the name or the title.

As we move to support East Asian languages such as Japanese and Chinese better, through Unicode, we won't be able to use locales anyway, and matching based on valid 'letter' characters becomes language dependent. For example, to know whether a form field name consists of valid Japanese characters, you would have to have it marked as Japanese (or perhaps mark the whole page as Japanese). This language marking is fairly painful, so I haven't seen many applications do it, though it is important for displaying Japanese and Chinese characters properly. It may be enough to mark the whole site as one language (already supported through the %LANG% variable) but allow other languages' characters anyway, except for Japanese strings in a Chinese site (say), which would need to be properly marked as Japanese.

The summary is that matching on locale is not as easy as it looks when using Unicode, so as far as possible we should do script/language independent matching - perhaps using [\p{Letter}\p{Mark}] to match on letters and combining characters (accents and so on, where written as separate Unicode characters) - this should cover all scripts' concept of letters, including East Asian languages.
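As an illustration of that script-independent approach, the following sketch keeps anything Unicode classes as a letter or combining mark, plus the digits, underscore and dot that the existing filter allows. The helper name is hypothetical, and the sketch assumes the title has already been decoded into Perl character strings (Perl 5.8 or later).

```perl
use strict;
use warnings;
use utf8;    # the sample literals below contain non-ASCII text

# Hypothetical helper: strip everything except letters and combining
# marks from any script, ASCII digits, underscore and dot.
sub title_to_name_unicode {
    my ($title) = @_;
    ( my $name = $title ) =~ s/[^\p{Letter}\p{Mark}0-9_.]//g;
    return $name;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $title ( 'Topic Classification', 'Priorité', 'חיי', '日本語 名前' ) {
    printf "title=\"%s\" name=\"%s\"\n", $title, title_to_name_unicode($title);
}
```

With this filter spaces and punctuation are still removed, so the title and name stay distinct for titles such as 'Topic Classification', while Hebrew, accented and East Asian letters are kept.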

-- RichardDonkin - 13 Dec 2004

Topic attachments
Attachment: I18N-formfieldfix.patch (r1, 1.7 K, 2004-12-12 13:24, RichardDonkin)
Comment: Patch against SVN DEVELOP - may work on other releases too
Topic revision: r6 - 2008-09-02 - TWikiJanitor