While I was introducing someone I work with to TWiki, he wondered if the enumerated lists could use things other than numbers (i.e., letters and Roman numerals).
HTML 4.01 has a type attribute for the <OL> tag which can accomplish this, although W3C warns that this is deprecated and that it should be done with style sheets instead. TWiki doesn't appear to be set up to render from style sheets yet, but it would be a good thing to keep in mind for the future if things become driven by style sheets.
This request even includes the change. In TWiki.pm, at around line 2611, add the line that passes ol type="$2" to emitList (shown in red in the original post):
# Lists and paragraphs
s/^\s*$/<p \/>/o && ( $isList = 0 );
m/^(\S+?)/o && ( $isList = 0 );
s/^(\t+)(\S+?):\s/<dt> $2<\/dt><dd> /o && ( $result .= &emitList( "dl", "dd", length $1 ) );
s/^(\t+)\* /<li> /o && ( $result .= &emitList( "ul", "li", length $1 ) );
s/^(\t+)\d+\.? ?/<li> /o && ( $result .= &emitList( "ol", "li", length $1 ) );
s/^(\t+)([AaIi])\s?/<li> /o && ( $result .= &emitList( qq!ol type="$2"!, "li", length $1 ) );
if( ! $isList ) {
$result .= &emitList( "", "", 0 );
$isList = 0;
}
With this additional line in place, there are now five possible types of enumerated lists:
| Character | Result | Sample |
| 1 | Arabic Numerals | 1, 2, 3... |
| A | Uppercase Letters | A, B, C... |
| a | Lowercase Letters | a, b, c... |
| I | Uppercase Roman Numerals | I, II, III, IV... |
| i | Lowercase Roman Numerals | i, ii, iii, iv... |
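As a rough stand-alone illustration of the table above, the sketch below maps the leading character of a tab-indented line to the opening tag that would be handed to emitList. The helper name ol_open_tag is hypothetical and is not part of TWiki.pm, which does this inline with s/// substitutions; this is only a minimal sketch of the matching logic.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; TWiki.pm performs these
# matches inline and passes the tag string to emitList.
sub ol_open_tag {
    my ($line) = @_;

    # Existing rule: a tab-indented line starting with digits.
    return 'ol' if $line =~ /^\t+\d+\.? ?/;

    # Proposed rule: a single A, a, I or i selects the list type.
    return qq{ol type="$1"} if $line =~ /^\t+([AaIi])\s?/;

    return;    # not an enumerated list item
}

# Show the tag chosen for each of the five marker characters.
for my $marker (qw(1 A a I i)) {
    my $tag = ol_open_tag("\t$marker First item");
    print "$marker => <$tag>\n";
}
```

Only the character right after the tabs is inspected, which is why the marker decides the numbering style for that list item.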
If this would be useful, feel free to incorporate it.
-- MarkFeit - 20 Nov 2003
Wow, what a nice simple one line upgrade. Thanks Mark!
I wouldn't be too concerned about start being deprecated in the HTML spec. As far as I know, browser support for controlling the numbering options of lists via style sheets is still quite limited outside Mozilla/Safari/Konqueror, so it will be supported for a long time to come.
-- MattWilkie - 21 Nov 2003
Already in O'Wiki alpha & TWiki alphas :
s/^(\t+)([1AaIi]\.|\d+\.?) ?/<li> /o && ( $result .= &emitList( "ol", "li", length $1, $2 ) );
Anyone interested in installing a fully working version simply & easily can grab the O'Wiki alpha release. People interested in sticking with a standard code base can grab my script for building a usable install from a standard release & alpha release from TWiki alpha release.
Another problem caused by slow release cycles - reimplementation of features.
-- MS - 22 Nov 2003
One comment on the implementation in the alpha: The changes to emitList chew up some extra CPU cycles evaluating regexps and with the if...else that generates the <OL> tag. This isn't a real big deal in general but could generate extra load for topics having very long lists hosted on heavily-loaded servers. My version of the patch saves some of those cycles and also doesn't require the mods to emitList. (I cut my teeth when CPU cycles and memory weren't cheap, so I'm kind of anal-retentive about things like that. Now that I'm using TWiki at work, I have an excuse to spend a little bit of time making improvements to the code.)
Side note: I did some experimentation with catching and renumbering lists that were already pre-numbered with roman numerals or letter sequences (aa, ab, ac, ad, etc.) but found that it opened some really ugly cans of worms with how some things in the markup are treated. I have a rough idea how to make it work neatly, and I'll tuck it away to work on when I have some time.
Mark,
Unfortunately, your code is (benchmarked) slower than the code in the alpha release. Whilst you save cycles in the emitList function you lose them in the pattern match. Rather than have a single pattern match across all of the text you have 2 - which means you're effectively parsing the whole text a second time. (Integrating the \d+ in breaks backwards data compatibility - as I suspect you found.)
Test data source: cd $TWIKIHOME/data/TWiki06x01 ; cat *txt |grep -v ^.META > ../testdata.txt - size is ~711K
Benchmark iterations: 5000
Body of Version 1
$_=$source;
# Version 1
s/^(\t+)([1AaIi]\.|\d+\.?) ?/<li> /o && ( $result .= &emitListStandard( "ol", "li", length $1, $2 ) );
Body of Version 2
$_=$source;
# Version 2
s/^(\t+)(\d+\.?)\s?/<li> /o && ( $result .= &emitListNew( "ol", "li", length $1) );
s/^(\t+)([1AaIi]|\d+)\.\s?/<li> /o && ( $result .= &emitListNew( qq!ol type="$2"!, "li", length $1 ) );
NB. This is a slight variation of your patch that includes backwards compatibility with existing functionality.
- emitListStandard is the standard function
- emitListNew is the same as the standard function, but with the olType conditional commented out.
Results:
- version 1 is 5 ms per iteration
- version 2 is 5.2 ms per iteration
Upping the iters to 20000 gives:
- version 1 : 5.1 ms
- version 2 : 5.2 ms
Not a very scientific benchmark (it needs more mixed data to be truly fair, but it should be fairly representative of the content at people's locations), but it matches the intuition that you're doing 2 pattern matches across the entire text instead of a simple true/false check. When the difference is of the order of 100 microseconds (and the wrong way at that), I'd be inclined to leave the code as is myself if the only benefit is performance.
There are other areas of the code that need attention first. (That said, if you get a version with one match that's backwards compatible, that'd be great, since I'm certain these figures would then be reversed.)
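For anyone who wants to repeat a comparison of this kind, here is a minimal sketch using Perl's core Benchmark module. It is not the harness used for the figures above: the test data is synthetic, the emitList calls are left out so only the pattern matching is timed, and the names one_pass and two_pass are made up for the example.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic test data (an assumption; the run described above used
# roughly 711K of real TWiki topic text instead).
my @lines = map {
    ( "\t$_. numbered item", "\tA. lettered item", "plain paragraph text $_" )
} 1 .. 200;

# Version 1: a single combined pattern.
sub one_pass {
    my ($line) = @_;
    $line =~ s/^(\t+)([1AaIi]\.|\d+\.?) ?/<li> /;
    return $line;
}

# Version 2: two separate patterns, so a line that is not a numbered
# item is scanned a second time.
sub two_pass {
    my ($line) = @_;
    $line =~ s/^(\t+)(\d+\.?)\s?/<li> /;
    $line =~ s/^(\t+)([1AaIi]|\d+)\.\s?/<li> /;
    return $line;
}

# Run each version for about one CPU second and print a comparison table.
cmpthese( -1, {
    one_pass => sub { one_pass($_) for @lines },
    two_pass => sub { two_pass($_) for @lines },
} );
```

Absolute numbers will differ from machine to machine, so only the relative ordering of the two rows is worth reading.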
-- MS - 22 Nov 2003
Hm, you're right, although I'd wager that some of those microseconds could be recouped with some tidying up of emitList. Now you've got me interested in making this thing good and fast, so I'm going to have to scrape up some time to work on it.
-- MarkFeit - 22 Nov 2003
This looks like a sensible enhancement request.
Question: Do we need to worry about compatibility? For example, how likely is this:
I write this paragraph with three leading spaces. The first
line will be interpreted as a bullet with Roman letter I if we
implement this new feature.
Mark, if you find time to write a solid patch we would appreciate it here on TWiki.org.
I'm on the fence about this because of the aforementioned cans of worms. My inclination is to let paragraphs like that one be numbered. The definition of the markup pretty much implies that lines beginning with an even multiple of three spaces will get special treatment, and this change would expand on that treatment. One thing I experimented with was matches that would, like the existing code does with Arabic numerals, handle lists that were pre-enumerated with letters or Roman numerals and properly renumber them. For example:
Markup...          Would render as...
i Rome             i. Rome
xi Paris           ii. Paris
mmi Teaneck        iii. Teaneck
vii London         iv. London
aa Red             a. Red
ad Green           b. Green
zap Blue           c. Blue
boing Plaid        d. Plaid
Because the letters that make up Roman numerals are a subset of the letters that can make up lettered enumerations, you have to match for Romans before letters. Under some circumstances, that could cause unexpected behavior when someone pastes in part of a list that happens to begin with a lettered enumeration that looks like Roman numerals. For example:
Markup...          Would render as...
b. Peach           a. Peach
c. Pineapple       b. Pineapple
d. Grape           c. Grape
e. Starfruit       d. Starfruit
...but...
c. Pineapple       i. Pineapple
d. Grape           ii. Grape
e. Starfruit       iii. Starfruit
f. Guava           iv. Guava
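The ambiguity in the two examples above can be reproduced with a small sketch that tests for Roman numerals before letters. The function classify_marker and the labels it returns are hypothetical and only illustrate the ordering problem; they are not taken from any TWiki patch.

```perl
use strict;
use warnings;

# Hypothetical classifier: decide the numbering style from the marker
# on the first line of a list, checking Roman numerals before letters.
sub classify_marker {
    my ($marker) = @_;
    return 'arabic'      if $marker =~ /^\d+$/;
    return 'roman-lower' if $marker =~ /^[ivxlcdm]+$/;
    return 'roman-upper' if $marker =~ /^[IVXLCDM]+$/;
    return 'alpha-lower' if $marker =~ /^[a-z]+$/;
    return 'alpha-upper' if $marker =~ /^[A-Z]+$/;
    return;    # not a recognised marker
}

# A list pasted starting at "b" is treated as letters, but the same
# list pasted starting at "c" is treated as Roman numerals.
for my $marker (qw(b c mmi aa boing)) {
    print "$marker => ", classify_marker($marker), "\n";
}
```

Because c, d, i, l, m, v and x are all valid Roman digits, any lettered list that happens to begin on one of them is misread when only the first marker is examined.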
If we were able to analyze the list as a whole, it would be trivial to accurately select which way to enumerate it, but that's not something I want to take on. Or we could just simplify it and handle 1., 2., 3. as is currently done and only use the alternate numbering schemes when the line matches /^(\t+)[1AaIi]\s/. The more complete version would be nice, but the simplified one is ...well... simpler.
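A minimal sketch of the simplified approach, under the assumption that the alternate-numbering check runs before the existing digit rule, might look like this. The function name open_tag_for is hypothetical, and the real code would call emitList rather than return the tag.

```perl
use strict;
use warnings;

# Hypothetical helper: return the opening tag and nesting depth for a
# line, or an empty list if the line is not an enumerated item.
sub open_tag_for {
    my ($line) = @_;

    # Alternate numbering: a single marker character followed by whitespace.
    if ( $line =~ /^(\t+)([1AaIi])\s/ ) {
        return ( qq{ol type="$2"}, length $1 );
    }

    # Existing behaviour: 1., 2., 3. and so on.
    if ( $line =~ /^(\t+)\d+\.? ?/ ) {
        return ( 'ol', length $1 );
    }

    return;
}

for my $line ( "\t1. One", "\ti Rome", "\tInteresting", "\tI write this" ) {
    my ($tag) = open_tag_for($line);
    ( my $shown = $line ) =~ s/\t/\\t/g;    # make the tab visible
    print "$shown => ", ( defined $tag ? "<$tag>" : 'no list' ), "\n";
}
```

Requiring whitespace after the marker keeps a word such as "Interesting" from being treated as a list item, but an indented paragraph that begins with the word "I" still matches, which is the compatibility question raised above.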
Any thoughts?
-- MarkFeit - 22 Nov 2003
-- PeterThoeny - 22 Nov 2003