Try doing a search for the word "format" on your wiki.
You will find that every topic matches. This is broken behaviour, stemming from the fact that every topic contains a line like the one below, which grep finds and the matching process does not filter out.
%META:TOPICINFO{author="MartinCleaver" date="1068398763" format="1.0" version="1.3"}%
--
MartinCleaver - 27 Nov 2003
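To make the report concrete, here is a minimal Python sketch of a grep-style search over raw topic files. The topic names, the body text and the `naive_search` helper are made up for illustration; only the TOPICINFO line is taken from the report above. This is not TWiki's actual search code.

```python
import re

# The META line quoted in the report; every stored topic starts with one like it.
TOPICINFO = '%META:TOPICINFO{author="MartinCleaver" date="1068398763" format="1.0" version="1.3"}%\n'

# Hypothetical raw topic files (names and body text are illustrative only).
TOPICS = {
    "WebHome": TOPICINFO + "Welcome to the web.\n",
    "MeetingNotes": TOPICINFO + "Agenda for Monday.\n",
}

def naive_search(topics, term):
    """Return names of topics whose raw text contains term, like a plain grep."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return sorted(name for name, raw in topics.items() if pattern.search(raw))

# Neither body mentions "format", yet both topics match via the key name.
print(naive_search(TOPICS, "format"))
```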
Rather than arguing about whether this is a bug, it would be useful to suggest what specification is required.
- On the contrary, I'd assert that it is useful to make a note of bugs even if the author does not have time to state how it should be corrected. -- MartinCleaver - 28 Nov 2003
- I think my point was missed. Given this isn't clearly specified, it's not clear it's a bug. Also, specifying correct behaviour is not a solution, but it's a good starting point. [ JohnTalintyre - 28 Nov 2003 ]
It would be fairly easy to ignore all META lines, but that would also likely give a
wrong result. Would it be all keys in all META lines? Unfortunately, not a trivial thing to implement.
--
JohnTalintyre - 28 Nov 2003
John,
Why have search & metasearch then?
What would a user consider a search to be - a search of the data, the meta data (same thing in this case) or the meta-meta-data (which is really meta data)? In this case "format" above is meta-meta-data.
Users expect to search data when using search.
data in TWiki pages is (text, field1, field2, field3, field4)*
- In traditional TWiki parlance field1, field2, field3 are metadata, however with a structured topic they're just data, with the metadata being structure.
It's a bug (in everyone's fork).
The solution is to simply change search to only search data. (Some of which you may term data, but a wiki is the simplest database that can possibly work, and TWiki is
an implementation of a more complex one with no indices.)
-- MS
I thought I'd posed a simple question above, but perhaps not. I would think people would expect a search to show up comments against an attachment, this is meta data. Clarifying a spec when an issue like this comes up is to my mind the first step towards solving it.
--
JohnTalintyre - 28 Nov 2003
I agree - this could do with speccing out further. I'd prefer not to search only the text, but to pre-render the topics (to the pre-html, fully expanded topic) and then search that.
- revision control, multi-version container (possibly even multi-backend (rcs & sql))
- raw topic
- expand all VARIABLES
- convert to presentation language (html / pdf / xml / ...)
I think non-technical users would expect SEARCH to find all topics that include what they are SEARCHing for even if that text is actually generated by the expand all VARIABLES process (INCLUDES etc).
Conversely, METASEARCH can be considered a bit techie - most of the time I'm using regex, so that could be more tricksy - but even then I like the idea of generating METADATA using a query.
Pros:
- what you see is what you search
Cons:
- bloody hell, how big a computer do you want me to have?
--
SvenDowideit - 28 Nov 2003
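A rough Python sketch of the "expand, then search" idea follows. It assumes a toy `%INCLUDE{"Topic"}%` syntax and a made-up `expand` helper with an arbitrary nesting limit; real variable expansion is far richer, so treat this only as an illustration of the trade-off listed under Pros and Cons.

```python
import re

# Simplified INCLUDE syntax, assumed for this sketch only.
INCLUDE_RE = re.compile(r'%INCLUDE\{"(\w+)"\}%')

def expand(name, topics, depth=0):
    """Expand INCLUDE references in a topic, up to a small nesting depth."""
    text = topics.get(name, "")
    if depth >= 5:  # arbitrary guard against include loops
        return text
    return INCLUDE_RE.sub(lambda m: expand(m.group(1), topics, depth + 1), text)

def search_expanded(topics, term):
    """Search what the reader would see, not the raw topic text."""
    term = term.lower()
    return sorted(n for n in topics if term in expand(n, topics).lower())

# Hypothetical topics: WebHome only shows "webmaster" through its include.
PAGES = {
    "Footer": "Contact the webmaster.",
    "WebHome": 'Welcome. %INCLUDE{"Footer"}%',
    "Other": "Nothing here.",
}
print(search_expanded(PAGES, "webmaster"))
```

Every search now expands every topic, which is exactly the cost raised under Cons.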
John,
- Point: It's not metadata - it's data (people want to search it, yes?). Metadata is that it's a comment against an attachment.
(OK, in common parlance it's metadata, but if you're implementing an application framework it's just data.) In TWiki structure I laid out that the TWiki data structure is significantly more complex than people are treating it, and that's why you're hitting these problems. The data in a TWiki topic is a composite comprising (text, mdatafield1, mdatafield2, mdatafield3, attach1, attach2, attach3, attach1comment, attach2comment, attach3comment) as one large non-normalised value.
Consider that some people have some topics without the large textarea text field.
It's not a simple question - which is why I started with the existing usage structures many months ago, and hence all the points I made about TWiki actually being a relational store, albeit one without indices, and all the gumf in logically nested webs. (And why I spent so long refactoring that particular topic despite it being >1000 lines long) -- MS
There are a lot of good points made above and I agree it would be good to enhance searching so that users are more likely to see what they expect. I have moved this to a
FeatureEnhancementRequest. I think a first step improvement could be:
- Add an extra switch to search
- This could be used from normal form based searches
- Would check meta lines of matching topics and remove topic if only matches were in meta data keys
I would add that in my experience this is a fairly common issue, with many search engines returning results based on "keys" in HTML.
--
JohnTalintyre - 29 Nov 2003
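The proposed switch could be sketched in Python roughly as below. The switch name `ignore_meta_keys`, the regular expressions and the sample topics are all assumptions made for illustration; the proposal above does not name the switch or describe an implementation.

```python
import re

META_LINE = re.compile(r'^%META:\w+\{(.*)\}%$', re.MULTILINE)
KEY_VALUE = re.compile(r'\w+="([^"]*)"')

def strip_meta_keys(raw):
    """Replace each META line with just its attribute values."""
    return META_LINE.sub(lambda m: " ".join(KEY_VALUE.findall(m.group(1))), raw)

def search(topics, term, ignore_meta_keys=False):
    """Plain substring search; optionally drop topics matched only by meta keys."""
    term = term.lower()
    hits = []
    for name, raw in sorted(topics.items()):
        if term not in raw.lower():
            continue
        # Second pass: the topic survives only if it still matches without keys.
        if ignore_meta_keys and term not in strip_meta_keys(raw).lower():
            continue
        hits.append(name)
    return hits

INFO = '%META:TOPICINFO{author="MartinCleaver" date="1068398763" format="1.0" version="1.3"}%\n'
SAMPLE = {
    "FileFormats": INFO + "Notes on the format of attachments.\n",
    "MeetingNotes": INFO + "Agenda for Monday.\n",
}
print(search(SAMPLE, "format"))
print(search(SAMPLE, "format", ignore_meta_keys=True))
```

Because the default stays off, existing searches that rely on key names keep working; meta values such as the author name still match with the switch on.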
Warning: some people are actually using this "feature" to search for text in form metadata. If you take it away, a lot of their pages will break....
--
CrawfordCurrie - 29 Nov 2003
Well, they ought to be doing a METASEARCH. I don't mind so much that the values match, but matching the key names is, in my mind, nothing but a bug.
I guess we ought to have caught this at the specification stage. My vote (as if we had a convenient voting mechanism) is to force these users to change their search. I'd recommend something more friendly but I fear the complexity would get us into deeper trouble.
--
MartinCleaver - 29 Nov 2003
Many folks think TWiki is just a container for content. From that perspective it makes sense to search just for content. However, TWiki is moving more towards an ApplicationPlatform. With that, most TWikiApplications depend on searching meta data.
Here is a simple solution to combine both worlds, and to be backward compatible: Add new keywords to the scope parameter:
- Default would be topic text including meta data (current spec)
- scope="all" searches on topic name, topic text and meta data (see SearchScopeForTopicAndText)
- scope="text" searches just for text, excluding meta data
--
PeterThoeny - 01 Dec 2003
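As an illustration only, the three scope behaviours listed above might look like this in Python. The `scoped_search` function, the rule that a meta line starts with `%META:`, and the sample topics are assumptions of this sketch, not TWiki's implementation.

```python
def scoped_search(topics, term, scope=None):
    """Substring search whose haystack depends on the scope keyword."""
    term = term.lower()
    hits = []
    for name, raw in sorted(topics.items()):
        if scope == "text":
            # Topic text only: drop every line that looks like meta data.
            lines = [l for l in raw.splitlines() if not l.startswith("%META:")]
            haystack = "\n".join(lines)
        elif scope == "all":
            # Topic name plus topic text plus meta data.
            haystack = name + "\n" + raw
        else:
            # Default: topic text including meta data (current behaviour).
            haystack = raw
        if term in haystack.lower():
            hits.append(name)
    return hits

HEADER = '%META:TOPICINFO{author="MartinCleaver" date="1068398763" format="1.0" version="1.3"}%\n'
DEMO = {
    "FormatGuide": HEADER + "How to lay out a page.\n",
    "MeetingNotes": HEADER + "Agenda for Monday.\n",
}
print(scoped_search(DEMO, "format"))                # default
print(scoped_search(DEMO, "format", scope="text"))  # text only
print(scoped_search(DEMO, "guide", scope="all"))    # includes topic name
```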
So, if I read you correctly, you agree that searching for "format" should not match the key name? i.e. that this is a bug?
--
MartinCleaver - 01 Dec 2003
I would not call that a bug, more like a minor inconvenience. Again, this is a feature if used by a
TWikiApplication.
--
PeterThoeny - 01 Dec 2003
Agreed, it's a minor bug. But it's still a bug! And surely it would still be broken if the TWikiApplication wanted a value rather than a key called "format"?
--
MartinCleaver - 01 Dec 2003
You're both right.
- Having every topic returned for an end user search (Topic, version, author, format) from WebSearch is clearly bust, and counter intuitive.
- %SEARCH% is used by applications and relies on searching metadata. (Fact of life) Changing the default from this will break lots of things for many people.
The first is user driven - clicky pointy, and easiest to fix without hugely affecting people's code, or bookmarks, etc. The second is used in sufficiently many places for its behaviour to need to stay the same.
Why not simply add on a flag to search to have:
- onlydata=1 - https://twiki.org/cgi-bin/search/Codev/?scope=text&search=format&onlydata=1 which searches meta data values (not keys) and topic text (since the tuple of these is the topic contents - for example, Sven has mentioned he runs some wikis without the main topic text area active). This also means that bookmarked search results won't change behaviour.
- Allow people to add onlydata="1" (on/foo/whatever) to %SEARCH%.
- You could allow this to be overridden by a web preference, or have DATASEARCH ala METASEARCH.
That way backwards compatibility is maintained for topic contents all the way back to category tables, and allows the casual user not to get confused by a clear bug -
which is where this bug lies. The key place this is a bug is in the
WebSearch based usage, not the %SEARCH% based usage.
Other possible names might be excludemeta, onlytopic, bibble. Having scope extended is misguided - people might want those other scope values and this behaviour.
-- MS - 01 Dec 2003
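A small Python sketch of the onlydata idea, under stated assumptions: the topic is flattened into a tuple of body text plus meta values, any non-empty onlydata value switches the behaviour on, and the `topic_data` and `search_from_url` helpers and the sample topics are hypothetical. The scope parameter in the URL is ignored here to keep the sketch short.

```python
import re
from urllib.parse import urlparse, parse_qs

META_LINE = re.compile(r'^%META:\w+\{(.*)\}%$')
VALUE = re.compile(r'\w+="([^"]*)"')

def topic_data(raw):
    """Return the topic as a tuple of data items: body text, then meta values."""
    body, values = [], []
    for line in raw.splitlines():
        m = META_LINE.match(line)
        if m:
            values.extend(VALUE.findall(m.group(1)))  # values only, keys dropped
        else:
            body.append(line)
    return ("\n".join(body), *values)

def search_from_url(topics, url):
    """Run a search described by a query string; onlydata is off unless present."""
    params = parse_qs(urlparse(url).query)
    term = params.get("search", [""])[0].lower()
    only_data = bool(params.get("onlydata", [""])[0])  # "1", "on", "foo", ...
    hits = []
    for name, raw in sorted(topics.items()):
        items = topic_data(raw) if only_data else (raw,)
        if any(term in item.lower() for item in items):
            hits.append(name)
    return hits

TOPLINE = '%META:TOPICINFO{author="MartinCleaver" date="1068398763" format="1.0" version="1.3"}%\n'
WEB = {
    "FileFormats": TOPLINE + "Notes on the format of attachments.\n",
    "MeetingNotes": TOPLINE + "Agenda for Monday.\n",
}
BASE = "https://twiki.org/cgi-bin/search/Codev/?scope=text&search=format"
print(search_from_url(WEB, BASE))                  # old behaviour, bookmarks unchanged
print(search_from_url(WEB, BASE + "&onlydata=1"))  # data only
```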
That sounds reasonable Michael. Is it onlydata/DATASEARCH, onlycontent/CONTENTSEARCH or onlytopic/TOPICSEARCH though?
and then SEARCHTOPIC or TOPICSEARCH?
--
MartinCleaver - 01 Dec 2003
It is obvious that there are severe limitations to what you can put into the searchable content files without making a mess of things. This is particularly glaring if you try to use something like HtmlAreaEditor and wind up with a bunch of HTML in your file. Of course, searching for something like "TABLE" will return all topics that include a table of some kind, not what you would expect, but who knows, maybe someone actually wants to do that...
I think it is time to consider objectifying the Topic Object. I have started work on the
TopicObjectModel, which includes various "properties" of the files that would be useful, and should clarify what we are working with. I claim that it is impossible to have a flexible system which will also be restricted to a Topic Object definition that is suitable for searching. Also, providing this sort of object model solves a great many other problems I have been wrestling with, such as
SearchInINCLUDE,
SaveUnpublished,
ParameterizedIncludes,
ExcludeWebTopicsFromSearch, and the general Variables question.
--
RaymondLutz - 03 Dec 2003