Html Filtering Plugin Development/Brainstorming
This page is to brainstorm some ideas on creating a plugin that lets good
html through into topics (e.g. <p>, <form>) and stops bad html
(e.g. <script>) from being displayed when viewing topics.
For this plugin to work on display, it must be called before any other
plugins process the page. Otherwise it could filter out a plugin's html
rather than user added html.
However, if this filter can be applied to the topic when it is saved,
it would cut down on processing, and permit particular users to pass
"bad" html through the filter. I think the beforeSaveHandler hook
in the current alpha allows this.
I envision the HtmlFilterPlugin page having preferences
that allow users or groups to use certain tags in the
pages they save, e.g.
- Set AllowJavaScript = TWikiAdminGroup
would mean that anybody in the TWikiAdminGroup would be allowed to save
pages including javascript. However the javascript tags would be removed (or sanitized)
if somebody outside of the TWikiAdminGroup saved the page.
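A rough sketch of how that save-time check could behave, written in Python purely for illustration (a real plugin would hook into beforeSaveHandler instead). The GROUPS table, the ALLOW_JAVASCRIPT constant and the filter_on_save name are all made up for this example, and it only neutralizes script tags, nothing else:

```python
import re

# Hypothetical stand-ins for TWiki group membership and the
# AllowJavaScript preference; neither is a real TWiki API.
GROUPS = {"TWikiAdminGroup": {"JohnRouillard"}}
ALLOW_JAVASCRIPT = "TWikiAdminGroup"

# Matches the "<" that opens a <script> or </script> tag, in any letter case.
SCRIPT_OPEN = re.compile(r"<(?=/?script\b)", re.IGNORECASE)

def filter_on_save(text, user):
    """Return the text that would be stored when `user` saves a topic."""
    if user in GROUPS.get(ALLOW_JAVASCRIPT, set()):
        return text  # members of the allowed group save script tags unchanged
    # Everyone else: escape the bracket so the tag is displayed as plain text.
    return SCRIPT_OPEN.sub("&lt;", text)
```

Escaping the bracket rather than deleting the tag keeps the original text visible, so an authorized user could restore it later.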
Using INCLUDE, "bad" html pages can be created and the functionality made available
in pages editable by ordinary users.
I originally thought that the filter should pass bad html only if
the edited page had an ALLOWTOPICCHANGE that restricted permissions
to the allowed group. However I don't think that's needed. If unauthorized
users edit the page, the "bad" html can be recovered from RCS, or
the bad html tag can just be mangled (javascript->scriptjava)
into an innocuous form, waiting for the next authorized user to
reverse the change.
- One problem is that if an unauthorized user edits the page, the script that he adds could then hijack the authentication of the authorized user; so that when the authorized user views the page the script could, for example, do an HTTP POST / edit-save to a page that the original unauthorized user would not have been able to edit directly. -- DaleBrayden - 11 Mar 2003
So the questions are:
- What are safe html tags?
- What are bad html tags?
- Can the input be filtered correctly so that only allowed tags are passed through?
-- JohnRouillard - 31 Dec 2002
You have to do more than just filter tags - you need to look at event handlers on tags (onclick, onmouseover, etc.). These can occur on just about any tag. Fortunately, the event handler content must either include a script specifier (like 'javascript:do evil stuff here') or make a call to code defined elsewhere within <script> tags or linked in with a LINK tag.
-- DaleBrayden - 11 Mar 2003
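To make the event-handler point concrete, here is a small sketch that lists the attributes in a fragment that can carry script. It uses Python's standard html.parser module; the HandlerFinder class and the find_script_attributes name are invented for this example. The "starts with on" test is a deliberately blunt assumption that would flag any attribute whose name begins with those letters:

```python
from html.parser import HTMLParser

class HandlerFinder(HTMLParser):
    """Collect (tag, attribute) pairs that look able to carry script."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # The parser hands us tag and attribute names already lowercased.
        for name, value in attrs:
            value = (value or "").strip().lower()
            # Assumption: treat every on* attribute as an event handler,
            # and any value starting with "javascript:" as a script URL.
            if name.startswith("on") or value.startswith("javascript:"):
                self.findings.append((tag, name))

def find_script_attributes(html):
    finder = HandlerFinder()
    finder.feed(html)
    finder.close()
    return finder.findings
```

A filter built along these lines would strip or escape whatever is reported here, in addition to handling script tags themselves.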
These may be of some interest:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/nms-cgi/modules/NMS/HTMLFilter/
http://nick.cleaton.net/xssrant.html
...HTML filtering is a complicated problem, and you need to consider what are safe and bad attributes as well as what are safe and bad tags.
-- NickCleaton - 11 Mar 2003
Nick's HTMLFilter sounds very useful - since it's whitelist based it sounds like it should be quite safe. I'd like to see if it can be used when saving a page, to avoid the performance overhead of running it on every page view.
-- RichardDonkin - 15 Mar 2003
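For illustration only, a whitelist filter applied at save time might look roughly like the sketch below. This is not Nick's HTMLFilter module, just a minimal Python stand-in built on the standard html.parser module; the ALLOWED table, the WhitelistFilter class and the sanitize function are hypothetical, and the whitelist contents are placeholders rather than a recommendation:

```python
from html import escape
from html.parser import HTMLParser

# Placeholder whitelist: tag name -> attribute names permitted on that tag.
ALLOWED = {"p": set(), "b": set(), "i": set(), "a": {"href"}}

class WhitelistFilter(HTMLParser):
    """Rebuild the markup, keeping only whitelisted tags and attributes."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.skipping = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skipping = True  # drop the element and its content
            return
        if tag not in ALLOWED:
            return  # drop the unknown tag but keep the text inside it
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            if value.strip().lower().startswith("javascript:"):
                continue  # refuse script URLs even in allowed attributes
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skipping = False
        elif tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data, quote=False))

def sanitize(html):
    """Return `html` with everything outside the whitelist removed."""
    flt = WhitelistFilter()
    flt.feed(html)
    flt.close()
    return "".join(flt.out)
```

Because the output is rebuilt from parsed pieces rather than edited in place, anything the parser does not recognise as a whitelisted tag or attribute simply never reaches the stored text.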
Not only that, but filtering the raw text as it's saved avoids the problem that a rendering-time plugin would have: the filtering plugin would have to run before all other plugins, to avoid undoing what the other plugins (and twiki's own rendering code) have done.
At the risk of re-opening a discussion that may have been covered at DisableHTML or SanitisingHTML, it seems to me that a worthy goal would be
- Make enough TWiki syntax to make embedding of html unnecessary, and
- Provide a twiki configuration option that outright disables html input
The 2nd part of the goal would allow us to add the strip-html-during-save directly into the TWiki core. I think that the first part of the goal is mostly achieved already - the only thing I sometimes miss is the ability to express an href that opens in another window (i.e. <a href="foo" target="win2">). I'm sure there are other constructs that can't be expressed, but surely these are all things that could be done with plugins and extended syntax?
-- DaleBrayden - 15 Mar 2003
DaleBrayden said:
it seems to me that a worthy goal would be
1 Make enough TWiki syntax to make embedding of html unnecessary, and
2 Provide a twiki configuration option that outright disables html input
I claim that we will not and should not totally eliminate html/javascript.
The nice thing about TWiki is that 80% of work can be done without having
to know HTML. This makes it easier for people to use. But advanced HTML items
like forms, and javascript (which raises the requirements bar but may be suitable
for some intranets) also make things easier to use when you are following a process,
e.g. the bug creation pages.
Now why spend time re-inventing syntax for forms or javascript when it will be used
less than 20% of the time? I suspect that only TWiki developers and advanced users
will use the html, since its features are usually required for adding process
rather than information to pages.
Now with this being said, should we provide safeguards against malicious
html/javascript? Certainly. But why reinvent HTML for the last 20% or less of
operations?
-- JohnRouillard - 17 Mar 2003
OK - fair enough. I don't use forms much on either of my TWiki sites, so I tend to forget how useful they are. Still, it seems to me that form definition is not something that a TWiki user needs. I see this as akin to the definition of templates - we put our templates in a directory where they cannot be updated by twiki users.
It's not quite right that only TWiki developers and advanced users will use html - unless you include hackers and hacker wannabes as advanced users. This proposal about eliminating html has been made before by other people (e.g. GetRidOfJavaScript), and has met with vehement opposition before. Maybe there are two types of people involved in this discussion: those who have been hacked and those who haven't.
Anyway - if the HtmlFilterPlugin provides the AllowJavaScript preference variable, as defined at the top of this topic, then my concerns are fully addressed.
Just one other note: someone at c2 recently said something to the effect that "hacking wiki is about as intellectually challenging as hacking a corkboard - so why bother?" And it does seem to be true that wiki sites are hacked less often and less destructively than, say, phpWebSite or phpNuke sites. The sad corollary to this is that any effort to hack-proof twiki had better be genuinely hack-proof, or it will become a challenging target for the defacement sub-culture.
-- DaleBrayden - 17 Mar 2003