Try doing a search for the word "format" on your wiki.

You will find that every topic matches. This is broken behaviour, stemming from the fact that every topic contains a line like the one below, which grep finds and which the matching process does not filter out.

%META:TOPICINFO{author="MartinCleaver" date="1068398763" format="1.0" version="1.3"}%
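To make the failure concrete, here is a minimal Python sketch of a search that scans the raw stored topic, META lines included. It is only an illustration: `raw_search` and the two sample topics are invented for this example and are not TWiki code; only the TOPICINFO line follows the form quoted above.

```python
# Hypothetical stand-in for a grep-style search over raw stored topics.
def raw_search(topics, term):
    """Return names of topics whose raw text contains term (case-insensitive)."""
    term = term.lower()
    return sorted(name for name, raw in topics.items() if term in raw.lower())

TOPICINFO = '%META:TOPICINFO{author="MartinCleaver" date="1068398763" format="1.0" version="1.3"}%'

# Sample topics made up for the example.
topics = {
    "WebHome": TOPICINFO + "\nWelcome to the web.",
    "FormatNotes": TOPICINFO + "\nNotes on the format of dates.",
}

# Both topics match, although only FormatNotes mentions the word in its text.
print(raw_search(topics, "format"))
```

WebHome is returned purely because of the `format="1.0"` key in its TOPICINFO line.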

-- MartinCleaver - 27 Nov 2003

Rather than arguing about whether this is a bug, it would be useful to suggest what specification is required.

  • On the contrary, I'd assert that it is useful to make a note of bugs even if the author does not have time to state how it should be corrected. -- MartinCleaver - 28 Nov 2003
    • I think my point was missed. Given this isn't clearly specified, it's not clear it's a bug. Also, specifying correct behaviour is not a solution, but it's a good starting point. [ JohnTalintyre - 28 Nov 2003 ]

It would be fairly easy to ignore all META lines, but that would also likely give a wrong result. Would it be all keys in all META lines? Unfortunately, not a trivial thing to implement.

-- JohnTalintyre - 28 Nov 2003

John,

Why have search & metasearch then?

What would a user consider a search to cover: the data, the meta data (the same thing in this case), or the meta-meta-data (which is really the meta data)? In this case, format above is meta-meta-data.

Users expect to search data when using search.

data in TWiki pages is (text, field1, field2, field3, field4)*

  • In traditional TWiki parlance field1, field2, field3 are metadata, however with a structured topic they're just data, with the metadata being structure.

It's a bug (in everyone's fork).

The solution is simply to change search to only search data. (Some of which you may term metadata, but a wiki is the simplest database that can possibly work, and TWiki is an implementation of a more complex one with no indices.)

-- MS

I thought I'd posed a simple question above, but perhaps not. I would think people would expect a search to show up comments against an attachment, and these are meta data. Clarifying a spec when an issue like this comes up is, to my mind, the first step towards solving it.

-- JohnTalintyre - 28 Nov 2003

I agree - this could do with speccing out further. I'd prefer not to search only the text, but to pre-render the topics (to the pre-html, fully expanded topic) and then search that.

  1. revision control, multi-version container (possibly even multi-backend (RCS & SQL))
  2. raw topic
  3. expand all VARIABLES
  4. convert to presentation language (html / pdf / xml / ...)

I think non-technical users would expect SEARCH to find all topics that include what they are SEARCHing for even if that text is actually generated by the expand all VARIABLES process (INCLUDES etc).
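A rough Python sketch of "expand, then search" might look like the following. It is a toy under stated assumptions: only a simplified `%INCLUDE{"Topic"}%` form is expanded, and `expand`, `search_expanded` and the sample topics are hypothetical names invented here, not TWiki functions.

```python
import re

# Assumption: only a simplified %INCLUDE{"Topic"}% variable is modelled.
INCLUDE_RE = re.compile(r'%INCLUDE\{"(\w+)"\}%')

def expand(name, topics, depth=0, max_depth=5):
    """Return the topic text with each INCLUDE replaced by the included topic's text."""
    if depth > max_depth:  # guard against topics that include each other in a loop
        return ""

    def repl(match):
        target = match.group(1)
        if target not in topics:
            return ""
        return expand(target, topics, depth + 1, max_depth)

    return INCLUDE_RE.sub(repl, topics[name])

def search_expanded(topics, term):
    """Search the expanded text of every topic rather than the raw text."""
    term = term.lower()
    return sorted(n for n in topics if term in expand(n, topics).lower())

# Sample topics made up for the example.
topics = {
    "Footer": "Contact the helpdesk for access.",
    "ProjectHome": 'Project overview.\n%INCLUDE{"Footer"}%',
}

# ProjectHome matches because the included text is searched too.
print(search_expanded(topics, "helpdesk"))
```

Note that every search expands every topic, which is exactly the cost concern listed under "Cons" in this post.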

Conversely, METASEARCH can be considered a bit techie - most of the time I'm using regex, so that could be more tricksy - but even then I like the idea of generating METADATA using a query.

Pros:

  • what you see is what you search
Cons:
  • bloody hell, how big a computer do you want me to have? :-)

-- SvenDowideit - 28 Nov 2003

John,

  • Point: It's not metadata - it's data (people want to search it, yes?). Metadata is that it's a comment against an attachment.

(OK, in common parlance it's metadata, but if you're implementing an application framework it's just data.) In TWiki structure I laid out that the TWiki data structure is significantly more complex than people are treating it, and that's why you're hitting these problems. The data in a TWiki topic is a composite comprising (text, mdatafield1, mdatafield2, mdatafield3, attach1, attach2, attach3, attach1comment, attach2comment, attach3comment) as one large non-normalised value.

Consider that some people have some topics without the large textarea text field.

It's not a simple question - which is why I started with the existing usage structures many months ago, and hence all the points I made about TWiki actually being a relational store, albeit one without indices, and all the gumf in logically nested webs. (And why I spent so long refactoring that particular topic despite it being >1000 lines long.) -- MS

There are a lot of good points made above and I agree it would be good to enhance searching so that users are more likely to see what they expect. I have moved this to a FeatureEnhancementRequest. I think a first step improvement could be:

  • Add an extra switch to search
  • This could be used from normal form based searches
  • Would check meta lines of matching topics and remove topic if only matches were in meta data keys
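As a hedged illustration of that last bullet, the Python sketch below post-filters an existing result list: each matching topic is re-checked with META key names (and the META type) stripped, and dropped if the term no longer matches. The regular expressions, function names and sample topics are assumptions made for this example, not an existing TWiki API.

```python
import re

# Assumed shape of a META line: %META:TYPE{key="value" key="value"}%
META_LINE = re.compile(r'^%META:\w+\{(.*)\}%$')
VALUE = re.compile(r'\w+="([^"]*)"')

def without_meta_keys(raw):
    """Return raw topic text with META lines reduced to their values only."""
    out = []
    for line in raw.splitlines():
        m = META_LINE.match(line)
        out.append(" ".join(VALUE.findall(m.group(1))) if m else line)
    return "\n".join(out)

def filter_key_only_matches(matches, topics, term):
    """Keep only topics that still match once META key names are ignored."""
    term = term.lower()
    return [n for n in matches if term in without_meta_keys(topics[n]).lower()]

TOPICINFO = '%META:TOPICINFO{author="MartinCleaver" date="1068398763" format="1.0" version="1.3"}%'

# Sample topics made up for the example.
topics = {
    "PlainTopic": TOPICINFO + "\nNothing relevant here.",
    "AttachTopic": TOPICINFO + '\nSee attachment.\n%META:FILEATTACHMENT{name="spec.txt" comment="Old format spec"}%',
}

# Pretend the existing search returned both topics for "format".
print(filter_key_only_matches(["AttachTopic", "PlainTopic"], topics, "format"))
```

AttachTopic survives because the word appears in an attachment comment (a value), while PlainTopic is removed because its only match was the `format` key.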

I would add that in my experience this is a fairly common issue, with many search engines returning results based on "keys" in HTML.

-- JohnTalintyre - 29 Nov 2003

Warning: some people are actually using this "feature" to search for text in form metadata. If you take it away, a lot of their pages will break....

-- CrawfordCurrie - 29 Nov 2003

Well, they ought to be doing a METASEARCH. I don't mind so much that the values match, but matching the key names is, in my mind, nothing but a bug.

I guess we ought to have caught this at the specification stage. My vote (as if we had a convenient voting mechanism) is to force these users to change their search. I'd recommend something more friendly but I fear the complexity would get us into deeper trouble.

-- MartinCleaver - 29 Nov 2003

Many folks think TWiki is just a container for content. From that perspective it makes sense to search just for content. However, TWiki is moving more towards an ApplicationPlatform. With that, most TWikiApplications depend on searching meta data.

Here is a simple solution to combine both worlds, and to be backward compatible: Add new keywords to the scope parameter:

  • Default would be topic text including meta data (current spec)
  • scope="all" searches on topic name, topic text and meta data (see SearchScopeForTopicAndText)
  • scope="text" searches just for text, excluding meta data
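The three behaviours listed above could be sketched in Python roughly as follows. This is an illustration of the proposal only: the `search` function, its `scope` argument handling and the sample topics are hypothetical, and META lines are assumed to be exactly the lines starting with `%META:`.

```python
def search(topics, term, scope="default"):
    """Sketch of the proposed scopes: default, "all" and "text"."""
    term = term.lower()
    hits = []
    for name, raw in topics.items():
        # Topic text with META lines removed (assumed to start with %META:).
        body = "\n".join(l for l in raw.splitlines() if not l.startswith("%META:"))
        if scope == "text":
            haystack = body                # text only, no meta data
        elif scope == "all":
            haystack = name + "\n" + raw   # topic name, text and meta data
        else:
            haystack = raw                 # current behaviour: text plus meta data
        if term in haystack.lower():
            hits.append(name)
    return sorted(hits)

TOPICINFO = '%META:TOPICINFO{author="MartinCleaver" date="1068398763" format="1.0" version="1.3"}%'

# Sample topics made up for the example.
topics = {
    "WebHome": TOPICINFO + "\nWelcome.",
    "DateNotes": TOPICINFO + "\nThe date format is ISO 8601.",
    "FormatGuide": TOPICINFO + "\nHow to lay out pages.",
}

print(search(topics, "format"))                # every topic, via the META key
print(search(topics, "format", scope="text"))  # only DateNotes
print(search(topics, "guide", scope="all"))    # FormatGuide, via its name
```

In this sketch scope="text" drops META lines entirely, so form values and attachment comments would no longer be searched under that scope.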

-- PeterThoeny - 01 Dec 2003

So, if I read you correctly, you agree that searching for "format" should not match the key name? i.e. that this is a bug?

-- MartinCleaver - 01 Dec 2003

I would not call that a bug, more like a minor inconvenience. Again, this is a feature if used by a TWikiApplication.

-- PeterThoeny - 01 Dec 2003

Agreed, it's a minor bug. But it's still a bug! And surely it would still be broken if the TWikiApplication wanted a value rather than a key called "format"?

-- MartinCleaver - 01 Dec 2003

You're both right.

  1. Having every topic returned for an end user search (Topic, version, author, format) from WebSearch is clearly bust, and counter-intuitive.
  2. %SEARCH% is used by applications and relies on searching metadata. (Fact of life) Changing the default from this will break lots of things for many people.

1 is user driven - and clicky pointy, and easiest to fix without hugely affecting people's code, or bookmarks, etc. 2 is used in sufficiently many places for its behaviour to need to stay the same.

Why not simply add on a flag to search to have:

  • onlydata=1 - http://www.twiki.org/cgi-bin/search/Codev/?scope=text&search=format&onlydata=1 which searches meta data values (not keys) and topic text, since the tuple of these is the topic contents (Sven, for example, has mentioned he runs some wikis without the main topic text area active). This also means that bookmarked search results won't change behaviour.
  • Allow people to add onlydata="1" (on/foo/whatever) to %SEARCH.
    • You could allow this to be overridden by a web preference, or have DATASEARCH ala METASEARCH.

That way backwards compatibility is maintained for topic contents all the way back to category tables, and allows the casual user not to get confused by a clear bug - which is where this bug lies. The key place this is a bug is in the WebSearch based usage, not the %SEARCH% based usage.

Other possible names might be excludemeta, onlytopic, bibble. Extending scope is misguided - people might want those other scope values together with this behaviour.

-- MS - 01 Dec 2003

That sounds reasonable Michael. Is it onlydata/DATASEARCH, onlycontent/CONTENTSEARCH or onlytopic/TOPICSEARCH though?

and then SEARCHTOPIC or TOPICSEARCH?

-- MartinCleaver - 01 Dec 2003

It is obvious that there are severe limitations to what you can put into the searchable content files without making a mess of things. This is particularly glaring if you try to use something like HtmlAreaEditor and wind up with a bunch of HTML in your file. Of course, searching for something like "TABLE" will return all topics that include a table of some kind, not what you would expect, but who knows, maybe someone actually wants to do that...

I think it is time to consider objectifying the Topic Object. I have started work on the TopicObjectModel, which includes various "properties" of the files that would be useful, and should clarify what we are working with. I claim that it is impossible to have a flexible system which will also be restricted to a Topic Object definition that is suitable for searching. Also, providing this sort of object model solves a great many other problems I have been wrestling with, such as SearchInINCLUDE, SaveUnpublished, ParameterizedIncludes, ExcludeWebTopicsFromSearch, and the general Variables question.

-- RaymondLutz - 03 Dec 2003

Topic revision: r20 - 2004-05-20 - PeterThoeny
 
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.