Please, do not use XHTML in TWiki

I see people progressively converting TWiki to XHTML, and I would like to speak against it. Do not forget that TWiki pages include HTML typed by hand by users, and it is impossible to force them to use syntactically correct XHTML instead:

  1. Most of these users do not know what HTML is and just cut and paste until it works,
    and it will be impossible to make them use proper XHTML.
  2. HTML/XML Gurus (like me :-)) find it bothersome to have to close their <li> tags, and to put quotes around attribute values.
HTML is a well-defined language. It has a specification, and thus can be validated (with the W3C HTML validator for instance). Mixing XHTML and HTML gives you something which is illegal in both worlds.

What will you put in the DOCTYPE?

So, I agree that wiki pages purely generated by programs could be XHTML.

But any pages with hand-typed (X)HTML data should be 100% HTML.

-- ColasNahaboo - 16 Feb 2002

We made the decision some time ago to generate XHTML. Looking forward, this is the right thing to do. TWiki does (should?) comply with XHTML, and it is not recommended to use HTML where you can use TWikiShorthands.

The example you made for demonstration purposes breaks because of the <br> tag (instead of <br />). Granted, there are many users who are not familiar with XHTML, but so what?

A proper solution would be to add a switch that disallows any HTML/XHTML markup, possibly exempting the TWikiAdminGroup.

-- PeterThoeny - 16 Feb 2002

I think XHTML is the way forward - as XML becomes more common, it's useful to have a version of HTML that can be more easily validated. If people are not familiar with XHTML, maybe we should provide links to reference pages and to validators (or put one in a plugin for intranet use).

-- RichardDonkin - 16 Feb 2002

This all may be right and fine, but it still doesn't invalidate Colas' comments. The ability to enter HTML if standard markup isn't enough has always been a great advantage of TWiki; a forced move towards XHTML is going to make TWiki less useful for those who use this facility.

I'm not sure what the Right Thing is in this situation.

Of course, we could disable HTML entirely (this would also disable some dirty tricks that you can play with HTML, such as <div> tags that overlay the TWiki header or footer area).

-- JoachimDurchholz - 17 Feb 2002

I'm not aware of any 'forced move' to XHTML - the content of any TWiki site can be TWiki shorthand, HTML and/or XHTML, or even XML, depending entirely on what people type (and perhaps on local policies). The templates used by TWiki, and any dynamic code, should be XHTML, so that sites that want to use XHTML aren't held back by TWiki-generated HTML. Sites that want to use plain HTML can just ignore the fact that the TWiki-generated bits are XHTML, I think.

I don't think XHTML is any less readable than plain HTML - the use of quotation marks and '/' terminations isn't really a big difference IMO. Indentation and layout of the HTML source is much more of an issue.

Perhaps one solution is to make the DOCTYPE configurable for XHTML or HTML at the site level, through TWikiPreferences.

-- RichardDonkin - 17 Feb 2002

Actually, making the DOCTYPE configurable could be a solution. My point is that you should not worry too much about XHTML. It is a quick temporary hack, not really XML (no XLinks...). The real way forward will actually be XML + CSS2, allowing us to use any tag we want.

On the use of HTML in TWiki, I think one of the neat features of TWiki is that you can use TWiki syntax in ASCII emails to other people in a natural way. It helps non-technical people a lot in getting to grips with the system when they can think of it as a kind of email system. Getting rid of HTML will force us to design more and more ad-hoc syntax for this, which will make TWiki notation less natural. So, keeping a way to add the occasional HTML is nice, although it brings these validation problems (and makes smart diff representations difficult to implement).

-- ColasNahaboo - 17 Feb 2002

There's a good article on why it's worth writing validated HTML (whether 3.2, 4.0, XHTML 1.1, etc) at http://developer.netscape.com/evangelism/docs/articles/validate/. It also talks about the implications of strict vs loose DOCTYPEs, which is well worth thinking about - if browsers start producing XML-style validation errors on TWiki pages, which is quite likely with hand-entered HTML, it will be a pain. So perhaps we should specify the DOCTYPE as HTML 4.0 (loose) by default, and make it configurable - sites that can guarantee everything within the topics is XHTML can then change the DOCTYPE to suit, but by default browsers will be able to display any TWiki page.

The IETF has a saying - "Be strict in what you send, and liberal in what you receive", i.e. generate correct output when sending in a given protocol, but accept somewhat incorrect input when receiving. This applies here, with the wrinkle that the DOCTYPE controls how liberal the browser will be (in Mac IE5, IE6 and probably Mozilla) - so we should try to generate correct XHTML but not force the browser to validate it strictly.
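A minimal sketch of what a site-level DOCTYPE switch could look like, assuming a hypothetical setting with the values `html4_loose` and `xhtml1_transitional` (neither the setting nor the function name exists in TWiki; they are made up for illustration). It uses the HTML 4.01 Transitional declaration as the relaxed default and falls back to it for any unknown value:

```perl
use strict;
use warnings;

# DOCTYPE declarations keyed by a hypothetical site-level setting.
# The keys are invented for this sketch; the declarations are the W3C ones.
my %doctype = (
    html4_loose =>
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
      . '"http://www.w3.org/TR/html4/loose.dtd">',
    xhtml1_transitional =>
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
      . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">',
);

# Return the DOCTYPE for a setting; unknown or missing values fall back
# to the relaxed HTML default so pages with hand-typed HTML still display.
sub doctype_for {
    my ($setting) = @_;
    $setting = 'html4_loose'
        unless defined $setting && exists $doctype{$setting};
    return $doctype{$setting};
}

print doctype_for('xhtml1_transitional'), "\n";
print doctype_for(undef), "\n";
```

A template would then emit the result of `doctype_for` as the first line of the page, so a site that can guarantee XHTML-only topics changes one setting and nothing else.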

Speaking of using TWiki syntax in emails - I wonder how hard it would be to do OutlookAutoLinking, i.e. render TWiki words in Outlook emails as links? I suspect this is non-trivial but it would be very useful.

-- RichardDonkin - 18 Feb 2002

Up with XHTML, down with HTML - or at least omitted end tags! It is sometimes correct in HTML (which is based on SGML, which allows omitted end tags to be declared in the DTD), but my experience is that it doesn't co-exist well with stylesheets (see BadHTML).

If a concern is that people are going to enter bad xhtml, then it should be possible to write additions to twiki that correct this, particularly in the example <br> vs <br /> which is an easy fix.

-- DavidLeBlanc - 15 Apr 2002

Correcting HTML that is non-XHTML conformant sounds like rather a large project, considering the range of errors that are possible (e.g. omitting a closing P tag, omitting double quotes, etc.) - so TWiki isn't going to get into this, no matter how useful XHTML is... (and it is).

It's much easier to set a more relaxed DTD spec as the default - that way TWiki works out of the box without browser errors due to DTD pickiness, but those who want to use XHTML only are free to set the DTD to XHTML and to train their users to write only valid XHTML.

-- RichardDonkin - 17 Apr 2002

I prefer to keep the XHTML transitional DTD since it is a user error if incorrect tags are entered.

We could AutoCorrectNonConformantXHTML for those tags that are (mis-)used most often, like <br> and non closing <p> as a start.
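As a rough illustration of the easy part of such auto-correction, here is a sketch that rewrites unclosed empty elements like <br> into <br />. The function name `close_empty_tags` is hypothetical, not an existing TWiki routine, and the element list is deliberately short. It does not attempt the harder cases (unclosed <p>, unquoted attributes), and it would be confused by a ">" inside an attribute value:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TWiki: rewrite empty elements such as
# <br> or <hr size=1> into the XHTML form <br /> and <hr size=1 />.
sub close_empty_tags {
    my ($text) = @_;

    # Only elements that HTML defines as empty are touched (partial list).
    my $empty = qr/(?:br|hr|img|input|meta|link)/i;

    # Capture the tag name and any attributes; an existing trailing "/"
    # is swallowed so already-correct tags are left effectively unchanged.
    # \L...\E lowercases the name, since XHTML is case-sensitive.
    $text =~ s{<($empty)\b([^<>]*?)\s*/?>}{<\L$1\E$2 />}g;

    return $text;
}

print close_empty_tags("one<br>two<BR />three<hr size=1>"), "\n";
```

Running the substitution twice gives the same result as running it once, which matters if topic text is rendered repeatedly.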

-- PeterThoeny - 17 Apr 2002

In the event that anyone is interested, Mozilla-0.99 (very latest) doesn't take css styles on <p /> tags. I think it has problems with other such tags too, but I haven't investigated enough to be sure. I'm actually surprised that such tags are legal in xhtml - I doubt they would be in pure xml as they seem to be a hack to allow the effect of omitted end tags and those aren't allowed in xml.

-- DavidLeBlanc - 17 Apr 2002

Although we can't expect the average user to enter well-formed xhtml by hand, using HTMLTidy at http://tidy.sourceforge.net/ it is reasonably possible to convert sometimes erroneous html into xhtml, and as an added bonus, we get all the features that xml brings to the table!

Anonymous Coward 4 Jun 2002

Note that in the HTML spec, it is forbidden to close some tags, namely:

  • link meta br col base img param area basefont hr frame input isindex
And that closing p tags will bring unwanted side effects to old (ns4) browsers.

Thus writing <hr></hr> is illegal in traditional HTML. (Yes, it sucks, like most features inherited from SGML :-))

  • but <hr /> is okay, right? -- mw
  • Actually, it is illegal, but all existing browsers allow it, so it is now part of the legacy browsers must support. cn

-- ColasNahaboo - 04 Jun 2002

Because the issue is raised again in MozillaNotStylingXHTML, I will tirelessly add some more:

  • It should not be a goal of TWiki to validate all content pages, but the core templates should be XHTML compliant.
    • Why? you seem to imply that HTML is not a proper language and XHTML is. But no, HTML has a fully-defined spec, and validators.
  • Users should normally not care about HTML or XHTML formatting. With twiki shorthand, valid XHTML will be generated for them.
    • Nearly all our TWiki pages use some HTML, if only for <br>. This week, one of our webmasters had his pages issuing errors... it took him a day to understand that XHTML is case-sensitive, and thus <H1> is invalid, as is <b>...</B>. Why punish people for no gain?
  • If users willingly choose to use their own HTML markup, they will end up in the gray zone themselves. That is, they are at the mercy of particular browser rendering of common HTML.
    • TWiki syntax is clean because one can revert to HTML. Preventing people from using HTML will make us add tons of ugly, special-purpose syntax that cannot be reused anywhere else, a route I do not want to take. Why make users learn something like %BR% that has no specification and cannot be reused anywhere else on the web, rather than <br>?
  • How big a problem is it for bloggers that movabletype uses XHTML?
    • The engine uses it. But they prevent their users from entering any tag by hand! (only Blogger or MT XML). This is the case where XHTML use is appropriate.
  • XHTML is not difficult to learn, and HTML is not difficult to unlearn.
    • My users never learnt HTML. They just imitate pages or use tools like FrontPage. XHTML, by the way, is not difficult to learn; it is just gratuitously bothersome to use.
  • XHTML will make it easier for AccessibleTWiki. Strict separation of structure (XHTML) from style (CSS) would enable us to comply with the W3C's Priority One rating for User Agent Accessibility Guidelines 1.0.
    • This has NOTHING to do with XHTML. HTML+CSS gives you this feature. Using XML (not XHTML) + CSS (2, 3...) is the real solution, but it is not very well supported yet.
  • When do we stop to fully support NS4? Let it degrade gracefully.
    • We dropped NS4 support on our intranet some time ago. The KoalaSkin does not even work gracefully with NS4. NS4 is a non-issue. As a remark, we support lynx, mozilla, IE, opera, konqueror (thus safari). Yes, it is easier to support lynx, which totally ignores CSS and JavaScript, than NS4, which attempts to handle them but fails.

A couple of more useful links from http://www.alistapart.com/stories/betterliving/:

I was working inside the W3C at the time XHTML was designed. Believe me, a lot of people were opposed to XHTML; it was seen as a really dirty hack leaving all the important problems unresolved.

The pages you cite are quite incorrect. For instance, they say:

> Increased interoperability: Unlike old–style HTML pages, valid, well–formed XHTML documents can easily be “transported” to wireless devices, Braille readers and other specialized web environments.

This is totally irrelevant. XHTML still mixes structure and presentation, so PDAs have the same problem displaying XHTML as HTML, since the presentation is done with tables, etc. What is needed for this is to totally separate style (in CSS) from structure (in XML). XHTML does not allow you to define your own tags and still has presentation constructs just like HTML, so there is no gain there.

> Writing well–formed, valid XHTML pages is the easiest way to begin this transition.

This is the wrong assumption. Going to XHTML does not bring you a step closer to the right direction. Going full abstract XML ( <left_panel> instead of <table> or even <div> tags) + CSS is the goal; going through XHTML to get there gives you no gain and just brings unproductive pains.

XHTML was not designed with CSS in mind, but with XSL in mind. It was just a way to "clean up" HTML input so that it could be handed to primitive XSL parsers. But XSL is just a transformation language; we do not need it in TWiki since we already have a script engine (Perl). And all real-life systems now accept (valid) HTML as input as well as XML (or it can be tidied).

-- ArthurClemens - 11 Oct 2003

comments added this way -- ColasNahaboo - 11 Oct 2003

I would add that I think our manpower should first be allocated to two tasks (much more important in my opinion) before tackling XHTML:

  1. remove all embedded styling from TWiki: use only pure HTML tags + CSS
  2. implement a toy idea of mine: allow users to type just any tag, and have the TWiki engine convert it to a span whose class is the pseudo-tag, e.g.:
    <date>11 Oct 2003</date> would be rewritten as:
    <span class=date>11 Oct 2003</span>
But this is something I would surely do anyway as soon as we complete the move :-)
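The pseudo-tag idea in item 2 could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the list of "real" HTML elements is abbreviated, only lowercase tags without attributes are handled, and it says nothing about how any actual plugin implements the idea. The class attribute is quoted here so the output also passes as XHTML:

```perl
use strict;
use warnings;

# Real HTML element names that must be left alone. Abbreviated for the
# sketch (an assumption); a full version would list every HTML element.
my %html_tag = map { $_ => 1 } qw(
    a b i p br hr em strong span div pre code table tr td th
    ul ol li h1 h2 h3 h4 h5 h6 img font form input
);

# Decide what a single tag becomes. $slash is "/" for a closing tag.
sub rewrite_tag {
    my ($slash, $name) = @_;
    return "<$slash$name>" if $html_tag{$name};    # real HTML: keep as is
    return $slash ? '</span>' : qq{<span class="$name">};
}

# Hypothetical helper: <date>...</date> becomes <span class="date">...</span>.
sub pseudo_tags_to_spans {
    my ($text) = @_;
    $text =~ s{<(/?)([a-z][a-z0-9_]*)>}{ rewrite_tag($1, $2) }ge;
    return $text;
}

print pseudo_tags_to_spans("<b>Due</b>: <date>11 Oct 2003</date>"), "\n";
```

A stylesheet rule such as `span.date { ... }` can then style every pseudo-tag without the engine knowing anything about its meaning.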

-- ColasNahaboo - 11 Oct 2003

i've got a plugin that does a primitive form of #2. i've bundled it up as a proper twiki plugin and posted it at PseudoXmlPlugin

-- WillNorris - 11 Oct 2003

Colas, I would like to make sure I understand your position, here is my summary. Please correct as necessary.

We should be striving for structural markup and using CSS for presentation. It doesn't matter which markup, html or xhtml, is used, it's the structure that counts. However because the pages are written by hand, html is better (less prone to error, easier to type) than xhtml.

yes?

-- MattWilkie - 12 Oct 2003

Matt:
Exactly. Note that I was toying with forking tidy to XML-ise HTML on the fly, and it does not seem to add a lot of overhead. Processing the %TEXT% this way may be a viable solution for people wanting to go the XHTML route.
Will:
nice move!!! I'll try to help you on this one

-- ColasNahaboo - 12 Oct 2003

Colas, you put some well-formed opinions here, in contrast to my emerging and often incomplete knowledge on the subject (indeed fed by incorrect information on the net). Is XHTML the new hype word?

Maybe you can help me a bit more: what about the so-called 'quirks mode' of browsers? If you use <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> as in Koala wiki, in what mode will the browsers render the page?

-- ArthurClemens - 12 Oct 2003

A good introduction is: http://www.htmlhelp.com/tools/validator/doctype.html . Basically you have 3 choices: (PICK styled definition list)

HTML 4.01 Transitional
    is the "lingua franca" HTML spoken by everybody on the web nowadays.
    with loose.dtd
        browsers should try to render HTML by the spec.
    without dtd
        browsers should emulate traditional legacy browser bugs.
HTML 4.01 Strict
    is the ideal HTML with all style removed (into CSS sheets), and no frames.

You could conclude from the above discussion that going to HTML 4.01 Strict is what I advocate. But since users still enter code by hand, I think actually enforcing this rule would be unproductive, as it would make constructs like <span color=red> incorrect: you must write <span style='color:red'>. Quite pedantic, isn't it? (even if we know this is the way). And frames are evil, but they sometimes offer some usability gains.

So, I'll summarize why I am using this doctype:

  • we should all agree that we should write the engine+skins in HTML 4.01 Strict
  • but we cannot actually declare so in the doctype, so as not to annoy our users.
  • but we still want the browsers to not emulate old browsers bugs, so we specify the DTD.

PS: XHTML is part of the XML hype. Going towards XML is good in general, but it is just syntactic sugar. When the existing format was well-defined and had numerous available parsers, "going XML" is not needed. It can be really ridiculous and harmful in some cases, like the people trying to impose script languages using the XML syntax. XML is really useful when the existing syntax was bad (ambiguous, fuzzily defined, or lacking internationalisation).

PS2: PICK styled definition list: what do you think of my styling of DL/DT/DD (definition lists)? Quite legible, and works on all browsers, even text ones... but not on NS4 :-). This is more and more the case now: you can have nice clean HTML with a good rendering nowadays, even on text browsers, if you drop NS4 support. I did this because nobody used definition lists as they looked ugly, resorting to the ubiquitous * *name* description "emulation".

I like this very much indeed. --Main.MattWilkie

-- ColasNahaboo - 12 Oct 2003

Colas, I understand your points but do not agree that XHTML should be dropped:

  • XHTML is clean HTML
    • There is no overhead for the user if the TWiki engine generates XHTML, so why revert back to old HTML 4.01?
  • It makes processing of page data easier
    • Without XHTML, the RSS feed could not be a simple SEARCH; it would need extra code to produce valid XML.
    • Integration with other enterprise tools is simplified
    • Fewer problems if at a later point the TWiki topic data is stored in an XML format
  • HTML or XHTML should not be used by the user
    • This is documented in the TextFormattingRules
    • (X)HTML should only be used where needed, like by web applications for HTML forms.
      • We could make this more strict, e.g. require to put this in <xhtml> ... </xhtml> tags
    • The TWikiML can be extended over time to reduce the need for users wanting to use HTML
    • The <span class=colas> text on this page is an example of mixing presentation with content. This logic ought to be automated, it should go into the TWiki engine, e.g. a mode where text of contributors is shown in different colors. The key is to keep topic content clean and easy to read/type in source form.
      • BTW, believe it or not, a large percentage of engineers at my workplace who work mainly on Unix still use NS 4.7! That is, they would not see Colas' highlight.
  • Style sheets
    • The way to go with compatibility in mind
    • This should be used transparently, some skins might use it, some not
    • The TWiki engine can generate style sheets as long as it does not break XHTML and cause problems for older browsers. One good example is the TOC style added in TocNotClosedProperly.

-- PeterThoeny - 12 Oct 2003

> HTML or XHTML should not be used by the user [...]
> The TWikiML can be extended over time to reduce the need for users wanting to use HTML

This would need to come first. We are already routinely using HTML where TWiki markup is lacking (the span class=colas bullet above, for example). Telling users that they have to stop using HTML before there are good alternatives in place will lead to needless aggravation (though it would add to the motivation to fix it... ;-)).

So perhaps in three steps: 1) divorce structure and presentation in all TWiki-generated HTML, using the 4.01 Strict doctype, 2) enhance TWikiML so that user-generated HTML is not wanted, and then 3) move to XHTML.

Also beware of reinventing HTML in yet another syntax. Case in point: we have verbatim, pre and code, which all do very nearly the same thing, but are different enough that they can't be used interchangeably.

-- MattWilkie - 13 Oct 2003

To Peter:
I think we disagree on the issues:
  • "We should prevent the use of HTML by hand in the future": I think it is an unreasonable goal, as we will reinvent HTML in an ugly way. For instance, I argue that <span class=colas> is purely semantic, with no presentation: it indicates what I authored. Things like %GREEN% ... %ENDCOLOR%, on the contrary, are purely presentational, and both ugly and hard for users to remember. Giving up HTML to get a plethora of these tags is not something I want. At least with Will's plugin, you could use <green>...</green>
    So, since I think we will never be able to forbid HTML tags, we will never be able to have the XHTML doctype, so no tool will be able to take advantage of it.
    So we disagree on this. But it is not very relevant for the short & medium term, so let's live with it. Let's all do Matt's point (1). You will think we will do (2) and (3) afterwards, and I think not, but it is not important for the next TWiki versions.
  • NS4 issues: ILOG makes some JavaScript components. They decided to support NS4, as some people are stuck with it on intranets because they have to use custom-coded intranet sites which only work on NS4. What we did, and what could be done for TWiki, is just detect NS4 and then disable ALL styling. It can be done, for instance, by loading the CSS stylesheet via an @import, which will not be understood by old browsers (IE3, NS4). But I think this is a skin issue. So a nice hack could be to have a skin setting telling the engine whether to be in CSS mode or not. Perhaps some convention on the skin name: use CSS mode if the skin name begins with a capital letter, or with CSS? We would then have to code things as:
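    if ( $skin =~ /^[A-Z]/ ) {
      # CSS mode: the skin name begins with a capital letter
      return "<span class='showerror'>$errormessage</span>";
    } else {
      # Non-CSS mode for old browsers such as NS4
      return "<font size=\"-1\" color=\"#FF0000\">$errormessage</font>";
    }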

    But I really think it is much more trouble than it is worth.

-- ColasNahaboo - 13 Oct 2003

i updated PseudoXmlPlugin. Version 1.001 handles nested tags now, but i may have significantly impacted the performance?

-- WillNorris - 14 Oct 2003

Topic revision: r29 - 2003-10-14 - WillNorris
Copyright © 1999-2026 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.