NOTE: This is a living document; please help improve it.

Managing Stale Content

Any wiki administrator who maintains a large wiki that is several years old knows the problem of managing aging content. The problem gets larger if you have 50K, 100K or 500K topics. Projects come and go, teams come and go; for some companies the only constant is the change that comes with re-orgs. With all this change, content gets out of date quickly. How can this be managed?

Issues with stale content

  • You browse to a topic and you are not sure if this is the most up to date topic on a subject
  • Search shows you a flat list of content, a mixture of active and stale content
  • Stale content hinders navigation to relevant content
  • Stale content slows down the system (depends on implementation)
  • Stale content clutters the namespace

When does content get stale?

It depends. Examples:
  • Due to a re-org, a team gets split into two. Content needs to be split into three: Content for the two new groups, and no longer relevant content.
  • An engineering project is completed. It is "done" for the engineers, but still relevant for support and marketing. Some project related content can be retired earlier (such as meeting minutes), some needs to be retained longer (such as design documents)
  • A product is no longer maintained. Associated engineering and marketing topics are still needed by customer support.
  • A product hits end of life. The life span can be 2 years, 5 years or 10 years. Associated engineering and marketing topics are no longer needed, until a customer demands an audit.
  • Company policy documents and process documents evolve over time. The product release process doc version 2 is no longer current and could be retired. However, all policy and process documents need to be retained for legal reasons.

Retire content manually or automatically?

From the list above we can see that the answer is not that simple. Some stale content needs to be retired manually, some can be retired automatically. Some content can be retired sooner rather than later. Some content should never be retired.

A. Manually retire content

Marking a topic as retired can be as simple as a delete operation. Or even easier, a one click "Retire me" operation. Possibly supported by a search that offers check boxes to retire a set of topics selectively.

B. Retire content by expiration date

A driver's license has an expiration date. The same could be applied to some types of documents. For example, you could say that company internal news articles expire automatically after 6 months.
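As a rough illustration of expiration by date, the sketch below computes when a topic would expire given a lifetime in calendar months. The function names `add_months` and `is_expired` are hypothetical and not part of TWiki; the 6 month lifetime is the news article example from above.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month, so
    31 August plus 6 months gives the last day of February.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_expired(published, lifetime_months, today):
    """True once `today` has reached the expiration date (hypothetical rule)."""
    return today >= add_months(published, lifetime_months)


# Example: a news article published on 27 Oct 2005 with a 6 month lifetime.
article_date = date(2005, 10, 27)
print(add_months(article_date, 6))
print(is_expired(article_date, 6, date(2006, 4, 26)))
```

Whether the clock starts at creation or at the last edit is a policy choice this sketch leaves open; it simply takes whichever date is passed in.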

Possible solutions of retiring content

1. Use many smaller webs

One obvious solution is to use one little web per project or team. Pros and cons:
  • Pro: Easy to get rid of a 5 year old project by removing the web
  • Pro: Even without removing the web, it is out of the way (by removing links to the web in prominent places)
  • Pro: URLs do not change
  • Con: Cross-linking and search get more complicated
  • Con: Finding related content is more difficult
  • Con: Low degree of context due to using fewer WikiWord links

Overall, if you disregard the content aging question, it is better to have few webs with many topics.

2. Move stale content to Trash web

The obvious solution is to trash stale content. Pros and cons of putting stale content into the Trash bin:
  • Pro: Out of the way of active content
  • Con: Labor intensive way of retiring content topic by topic
  • Con: Retired content gets mixed with other webs' content
  • Con: Content might no longer be available when needed at a later point (audit etc)
  • Con: URLs change, e.g. broken links in e-mail archives, bookmarks, intranet backlinks

3. Use one Archive web per web

Another solution is to move content into an area marked as stale, e.g. an Archive web per web. Pros and cons:
  • Pro: Out of the way of active content
  • Con: Labor intensive way of retiring content
  • Pro: Content available when needed at a later point (audit etc)
  • Con: URLs change, e.g. broken links in e-mail archives, bookmarks, intranet backlinks

4. Assign Stale state to each topic

Each topic could have a state, say active or retired. The state could have a finer granularity, such as draft, active, final, retired (which could be used by FacetedNavigation.) Each topic can be retired/restored easily by the users. Retired topics are omitted from a search result by default. There is a visible indicator if you look at a retired topic.

Pros and cons:

  • Pro: Stale content is logically out of the way of active content, and still accessible when needed
  • Neutral: Relatively easy to retire content
  • Pro: Distinction between retiring content ("may be needed in future") and deleting content into Trash web ("this is trash")
  • Pro: Content available when needed at a later point (for audits etc)
  • Pro: URLs do not change, e.g. links in e-mail archives, bookmarks, intranet backlinks work
  • Pro: Whole applications with associated data can be retired because URLs do not change
  • Con: Stale content clutters the namespace

Spec of system based on Stale state per topic

Based on the solutions listed above, the most promising one is the last one that keeps track of stale content on a per topic level.

Specification of user interface:

  • Every topic has a state of active or retired
  • A topic is active by default
  • WebSearch and WebChanges omit topics marked as retired
    • WebChanges has an option to show also retired topics:
      • See 50, 100, 200, 400, 800 most recent changes including retired topics
    • WebSearchAdvanced has a new switch to show also retired topics:
      • Do show: [ ] retired topics
  • Topics marked as retired are shown as such (old brown-yellowish newspaper look?)
  • Links to retired topics are marked as such (brownish looking link?)
  • Push button action to retire/restore a topic:
    • The "More screen" has these new bullets:
      • Retire topic:
        • Status: This topic is active
        • Action:
    • Or this for a retired topic:
      • Retire topic:
        • Status: This topic is retired
        • Action:
  • Special search that shows a list of topics with check boxes to retire a set of topics selectively
    • Similar user interface to GlobalReplacePlugin. Example:
      [ ] WebcolorInclude possible a bug in in %INCLUDE: if i edit webcolor.incl (contains only #xxxxxx) with vi the result is a 8 byte file. this file will result in non working colors in... PeterThoeny - 2000-01-21 - 00:26
      [ ] EmailAddressInWikiNotation Bug: An email address that starts with a WikiName gets rendered as a Wiki link, not as an email address. Example: WebHome #64;testPLEASENOSPAM.test PeterThoeny... PeterThoeny - 2000-01-29 - 09:11
      [ ] SuffixForCgiScripts How about parameterizing the name of the cgi script `view`. The reason is that I can`t add ScriptAlias entries to my web server, but ExecCGI is on ... but... PeterThoeny - 2000-02-12 - 08:11
      [ ] WikiProperties I recently modified my TWikiWeb to support per web properties files. It started because I realized I wanted to to a fancier page heading instead of a single block... PeterThoeny - 2000-02-28 - 09:17
      [ ] TWikiVariableScopeTag Discovered on this TWiki. The scope attribute appears not to work when given an explicit value of `topic`. `topic` and `title` logic reversed? Peter, I`ll see if... PeterThoeny - 2000-04-19 - 23:13
      [ ] WikiPropertiesCache Now that we have the WikiProperties at several different levels, I think we need to add a function to return a named property at site, web (by name) or user (again... AndreaSterbini - 2000-09-04 - 13:17

Technical specification

  • Implement as new RetireContentPlugin
    • Expand %RETIRECONTENT{ action="show" }% to active or retired
    • Change to retired state with %RETIRECONTENT{ action="set" value="retired" in="3 month" }%
    • Change to active state with %RETIRECONTENT{ action="set" value="active" }%
    • Retire scheduled topics with %RETIRECONTENT{ action="batch" value="scheduled" }%
    • Plugin introduces new meta data:
      • %META:RETIRECONTENT{name="state" value="active"}% or value="retired"
      • %META:RETIRECONTENT{name="scheduled" value="1130886435"}% in absolute epoch time
  • Define new Plugin callback searchResultHandler
    • Plugin suppresses retired topics in search result unless URL parameter showretired=on is specified
  • Define new Plugin callback linkRenderHandler (or the like)
    • Plugin renders link to retired topics differently
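The meta data and the batch action in this specification could work roughly as sketched below. This is an illustration in Python rather than plugin code: the functions `read_retire_meta` and `batch_retire` are hypothetical, and only the %META:RETIRECONTENT{...}% line format and the epoch time value are taken from the spec above.

```python
import re
import time

# Matches one proposed meta data line, e.g.
# %META:RETIRECONTENT{name="state" value="retired"}%
META_RE = re.compile(
    r'^%META:RETIRECONTENT\{name="(\w+)" value="([^"]*)"\}%$',
    re.MULTILINE,
)


def read_retire_meta(topic_text):
    """Collect the RETIRECONTENT meta data of one topic into a dict.

    A topic without meta data is active by default, as in the spec.
    """
    meta = {"state": "active"}
    for name, value in META_RE.findall(topic_text):
        meta[name] = value
    return meta


def batch_retire(topics, now=None):
    """Return the names of active topics whose scheduled time has passed.

    `topics` maps topic name to raw topic text (an assumption made for
    this sketch; a real plugin would read topics from the store).
    """
    now = time.time() if now is None else now
    due = []
    for name, text in topics.items():
        meta = read_retire_meta(text)
        scheduled = meta.get("scheduled")
        if meta["state"] == "active" and scheduled and int(scheduled) <= now:
            due.append(name)
    return sorted(due)


topics = {
    "OldNews": 'Some text\n'
               '%META:RETIRECONTENT{name="state" value="active"}%\n'
               '%META:RETIRECONTENT{name="scheduled" value="1130886435"}%\n',
    "DesignDoc": "A topic without any retire meta data\n",
}
print(batch_retire(topics, now=1130886436))
```

The sketch only reports which topics are due; actually writing the state back to the topic is left out.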

Open questions

  • Add feature to remind people to look at topic in the future?
    • The "More screen" could have this additional bullet:
      • Reminder:
    • A new report could show the pending/upcoming topics

Contributors:
-- PeterThoeny - 27 Oct 2005

Discussions

This enhancement could be done in a Plugin or in the core. If it is done in a Plugin, we need to enhance the Plugin API to hook into the search result (to filter results). One could argue that this should go into the core, e.g. the storage backend should be aware of the staleness of content.

-- PeterThoeny - 27 Oct 2005

I'm all for putting this into the core. Some things to sort out:

  • single option retired vs. grayscale of retiredness, especially regarding batch marking
  • some topics would be better marked with a version ('final' version, not 'in progress' revision number), so instead of retired they would be "SalesGroup.2004" or "BigIdeas.Trashed"

-- ArthurClemens - 30 Oct 2005

TWiki has survived for a number of years without a concept of stale content. Given the goal of keeping the core light, I do not think this should be in the core. It should be a plugin. It is not something all admins will want.

Having said that, I am in favour of extending the access plugins have to core functions. For one thing, plugins should probably be able to store new, out of band, metadata in topics (for recording your "staleness" state). Here's the approach I would favour:

  1. Implement unit tests to verify that the Meta API allows storage and recovery of out-of-band meta-data (e.g. %META:OOB{name="staleness"....)
  2. Implement a handler to filter search results. Your plugin provides the handler.
  3. Implement a handler to change the rendering of topics in search results (the method for doing this at the moment is pretty horrific - all munged into a big if statement in Search.pm - and needs heavy refactoring anyway, which could be done at the same time)
  4. We have already discussed a link rendering handler, so this is not the only plugin requiring this handler.
This isn't going to get done for Dakar, but the core extensions could be implemented as a Contrib that extends the Dakar baseline until Edinburgh is ready.
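A handler that filters search results, as in step 2 above, might look like the following sketch. It is written in Python purely for illustration; the registry, the handler signature and the `RETIRED` set are assumptions, not an existing Plugin API. The `showretired=on` parameter is the one named in the technical specification.

```python
# Hypothetical registry of search result handlers; the core would call
# every registered handler in turn before rendering the hits.
search_result_handlers = []


def register_search_result_handler(handler):
    """Register a handler taking (hits, params) and returning filtered hits."""
    search_result_handlers.append(handler)
    return handler


def run_search(hits, params):
    """Pass the raw list of hits through all registered handlers."""
    for handler in search_result_handlers:
        hits = handler(hits, params)
    return hits


# Stand-in for the staleness meta data a plugin would read from topics.
RETIRED = {"OldMeetingMinutes"}


@register_search_result_handler
def hide_retired(hits, params):
    """Drop retired topics unless the request asks to show them."""
    if params.get("showretired") == "on":
        return hits
    return [hit for hit in hits if hit not in RETIRED]


print(run_search(["WebHome", "OldMeetingMinutes"], {}))
print(run_search(["WebHome", "OldMeetingMinutes"], {"showretired": "on"}))
```

Keeping the filter in a handler means the core only needs to know that handlers exist, not what staleness means.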

-- CrawfordCurrie - 31 Oct 2005

I'm very much for the adoption of a stale-concept. We battle the issue currently, and one suggestion seen is to make use of the ActionTrackerPlugin to mark SMELLs of staleness with an action, and assign the action to most-relevant/latest/best-guess-author(s). This ensures:

  • The topic is marked with the SMELL instantly, for others to see, together with a deadline (action date) for a next determined state (wishful thinking!). The action won't change how the topic looks, but the action will be at the bottom of the page as a kind of "last note / state". Actually, thinking about it, I guess this could be made more automated by use of the comment plugin in the default templates .. will have to try that combination out.
  • The authors that get assigned can get a list of the content they are (best guess) responsible for updating and, very importantly, they can show this list to their bosses. The bosses can then get a handle on how much effort goes into keeping the documentation up to date, and how it should be prioritized. Getting a slice of resources assigned for keeping up documentation is a battle; this is a way of visualizing it (assigning resources for just writing the damn thing was a battle, and now you want resources to update it??)
  • Actions become mails in inboxes, which seems to have some effect on getting things done.

I'm all for the suggested no. 4 solution, it would make our day.

-- SteffenPoulsen - 31 Oct 2005

I'd like to point out that TWiki.org desperately needs this work done. I'm looking forward to seeing your implementation and the results of this (as I'm presuming that cleaning up TWiki.org is what triggered this!). A huge number of us believe that the only way to make twiki.org usable again is to delete most of the content, and only bring back topics that we realise we need. I hope to be proven wrong :-)

-- SvenDowideit - 01 Nov 2005

Arthur: Thinking it over, it might be better to use WikiTagging to mark topics as "reviewed", "final" etc.

Crawford: I tend to agree now; implement this as a Plugin and enhance the Plugin API to do that. This could be a pre-installed Plugin though. CairoRelease and DakarRelease already support non-core meta data introduced by Plugins, such as %META:ANYTHINGNEW{name="..." value="..."}%. Some proprietary Plugins use this undocumented feature already. Non-core meta data survives an edit/save cycle (verified also in Dakar).

Steffen: Good point on making maintenance tasks more transparent.

Sven: This topic was triggered by something much bigger than TWiki.org. TWiki.org also needs cleanup, but what is needed here more is to review the workflow and taxonomy, and to restructure the content accordingly.

-- PeterThoeny - 01 Nov 2005

Crawford's proposal to implement handlers to intercept search is very interesting for TWikiApplications too, where implementation, admin and data topics are located in the same web but a search should only retrieve data topics and not implementation or admin topics. Data topics might provide all their payload in forms and not in the whiteboard area, which is only used to lay out the TWikiApplication. So displaying a search hit should not show the typical text summary but render a nice form extract that is entirely application-specific.

-- MichaelDaum - 02 Nov 2005

Just realized, this could be implemented very easily, and we are half way there:

  • Tag topics as "stale_content"
  • Add a switch to search to exclude topics tagged as "stale_content". This should be done via a Plugin callback handler.

-- PeterThoeny - 03 Mar 2006

There was some discussion in twiki-dev to create an attic web for the Codev web.

I do not think that creating a new web is necessarily the best solution. Let's take a step back and see what the problem points are. The key issue is that there is a lot of stale content in Codev; the secondary point is that there is clutter and unorganized content.

I see these requirements:

  • Stale content needs to be out of the way, but accessible
  • Cool URIs don't change
  • It should be easy to mark content as stale
  • It should be easy to mark useful content
  • It should be easy to access useful content
  • No new webs on twiki.org (use other means of organizing content)

Organizing content is a taxonomy/folksonomy question. I posted a proposed process/tool to manage stale content on this topic a while ago.

Looking at the requirements and the proposed process, we can manage content in Codev like this:

  • Tag useful content
  • Tag stale content as "stale_content"
  • Enhance the Plugins API with search result hook
  • Filter out stale content from a default search
  • Make a search switch to include stale content in the search
  • Do a scripted tagging of very stale Codev topics (something like: older than x months, and not looked at by an authenticated user for y months)
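The scripted tagging step could select its candidates roughly as follows. This is only a sketch: the function `select_stale`, the shape of the input data and the use of days instead of months are assumptions made for illustration, and "older than" is read here as "not modified since".

```python
from datetime import datetime, timedelta


def select_stale(topics, now, max_age_days, max_unviewed_days):
    """Return the topics that should be tagged "stale_content".

    `topics` maps topic name to a pair of datetimes:
    (last modification, last view by an authenticated user).
    A topic qualifies only if BOTH dates are older than their cutoff.
    """
    age_cutoff = now - timedelta(days=max_age_days)
    view_cutoff = now - timedelta(days=max_unviewed_days)
    return sorted(
        name
        for name, (modified, last_view) in topics.items()
        if modified < age_cutoff and last_view < view_cutoff
    )


now = datetime(2006, 3, 8)
topics = {
    "CairoReleasePlan": (datetime(2004, 1, 1), datetime(2004, 6, 1)),
    "OldButStillRead": (datetime(2004, 1, 1), datetime(2006, 3, 1)),
    "FreshIdea": (datetime(2006, 2, 1), datetime(2006, 3, 1)),
}
print(select_stale(topics, now, max_age_days=365, max_unviewed_days=180))
```

Requiring both conditions keeps old topics that are still being read out of the stale set, which matches the "and" in the bullet above.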

-- PeterThoeny - 08 Mar 2006

Organizing content is a taxonomy/folksonomy question when you're starting with an empty or at least very small web. Then people (hopefully) tag as they go. It is not a taxonomy question when you have 4k+ topics.

-- MeredithLesly - 08 Mar 2006

Oh, yes, I forgot ExampleAtticTopics and CruftSearch. (The latter by Rafael.)

-- MeredithLesly - 08 Mar 2006

There is another perspective that I think is worth considering, even if it doesn't neatly fit into the above discussion. That is the factor of how much effort the solution will require in relation to the net benefit. The above discussion mostly explores optimal solutions in an ideal world.

For me, the overwhelming need is simply to have TWiki.org more accessible to new users, who are quickly overwhelmed and discouraged by the current status quo. For this reason, I have advocated moving all of the development discussion to a more discrete location (and then working on reorganizing it if so desired), and reorganizing twiki.org into a radically simplified format that better serves the broadest user needs. This, imho, would bring the most immediate, most broad benefit with the least effort. I have a pretty serious concern that while the twiki development community is splitting hairs in this kind of discussion, we are rapidly losing our strategic position in the wiki marketplace.

-- LynnwoodBrown - 08 Mar 2006

This is not a problem of managing "Stale Content". It is more a problem of the tremendous chaos that is Codev.

I ask "What is the purpose of Codev"? I see it has several uses:

  1. Brainstorm Ideas, that later are made into Features
  2. Report Bugs and Issues
  3. Serve as a place to store... interesting stuff (Howto's, links to the competition, etc)
  4. Track & Announce releases

As such, Codev has accumulated too much "cruft" over the years because it was "overloaded" with too many responsibilities.

Now, we can see that some of the responsibilities of Codev are being moved to other places:

  2. Report Bugs and Issues: This is being done in Support and Bugs
  4. Track & Announce releases: Tracking is being done in the Bugs web

What I think that we should do is:

  • Move all the Bug reports, feature done and release tracking topics "out of the way" (Attic?).
  • Either move all the Feature brainstorming discussion to a Development web, or all the "interesting stuff" someplace else (I vote for the former, as I think it's easier).
  • Use each place for its function, and only that function alone.

I strongly disagree that tagging content will help. It won't stop WebChanges from being flooded with content I don't care about while the content I do care about is pushed off the page (no, I don't want to use the RSS feed), nor will it help with the severe performance impact on search of looking through 4000+ topics, and it won't help when I look for information about the META PI and need to scan 500+ irrelevant topics.

Think about this: If you are looking for information that you've never looked at and don't know how to classify, what is your first choice: Google or del.icio.us?

-- RafaelAlvarez - 08 Mar 2006

I agree wholeheartedly with both Lynnwood and Rafael. I cringe at the idea of people looking at twiki.org and being so flooded that they flee.

Managing chaos is an all-too-good way of describing it.

-- MeredithLesly - 09 Mar 2006

No new webs on twiki.org (use other means of organizing content) : but we do have to think anew about how to organize content for (new) visitors/TWiki users.

Look at the current web names: Codev is not a friendly name. TWiki is not clear that it is about TWiki. Main is not the main entrance on twiki.org. Plugins is actually for developers.

I don't know if the web names should go (actually yes, preferably), but at least we need a clear navigation with recognizable labels. And preferably so few that they will fit on a horizontal menu bar:

  • Home (shortcuts to download, TWiki features, Success stories and TWiki in the news)
  • About TWiki (TWiki features, Success stories and TWiki in the news)
  • Development
    • Plugins
    • Archived development topics
    • Sandbox
  • Documentation (lots of the current TWiki web)
    • Cookbooks
  • Support

So 4 main entrances. Main web can be a small link in the meta navigation as it is not important for twiki.org. Call it Users or Users & Groups.

Cool URIs don't change - note that TWiki URIs are not cool at all: cgi-bin in the middle, pages organized by file structure.

"We would like to, but we just don't have the right tools."

"Now here is one I can sympathize with. I agree entirely. What you need to do is to have the web server look up a persistent URI in an instant and return the file, wherever your current crazy file system has it stored away at the moment."

Cool URIs don't change - by Tim Berners-Lee (1998)

If we want to have cool URIs we have some work to do. Don't let it be an argument now to hold on to a bad structure.

-- ArthurClemens - 09 Mar 2006

Wow. Such a well written and thought out comment that I don't really have anything to add.

-- MeredithLesly - 09 Mar 2006

Arthur, indeed very well summarized! I get a sense of déjà vu on growing pains, not here but seen at WindRiver. An improved HomePageNavigation was the answer. We should focus on a good TWikiOrgNavigationModel, which is not necessarily tied to the web structure (in fact it should be detached from it.)

I see a lot of discussion but not much coordinated effort in improving the taxonomy of TWiki.org with navigation/browsing/search. Lynnwood helped a lot in refining the document structure of the TWiki web (with entry point TWiki.WebHome, and sets of DistributionDocuments, SupplementalDocuments and HistoricalDocuments.)

-- PeterThoeny - 09 Mar 2006

Peter, I think that the whole point is that it is not possible to improve the taxonomy of Codev in the state it is in today. Pretending that the community will tag the current 4257 topics in Codev is naive at the least.

Besides, as I stated in my email and pasted above, improved navigation (be it tags or categories) will be useful only to search for "related content", but it's totally useless to look for "specific content". Why? Because the "taxonomy" is being created in a free-form way (much like del.icio.us) and there is absolutely no guarantee that the content I'm looking for is using the tag I think it should have. If it isn't, then I'll need to wade through all the cruft in search (because of the 4000+ topics and growing) to find the topic I need and finally tag it in a way that will be useless to another user who doesn't think the same way I do. Also, it won't help at all with the fact that searches get slower as the number of topics increases.

-- RafaelAlvarez - 09 Mar 2006

Peter raised the matter of HomePageNavigation. In fact, with a good navigation scheme it doesn't matter where the topics are stored. It's the presentation and human interface that count.

I've been experimenting with a variation on things Peter has shown. I use a nested search by topic classification, with a twisty to hide the topics in each classification. It allows a vast amount of information to be presented and makes use of the classification scheme in a useful manner.

-- AntonAylward - 09 Mar 2006

Does anyone care that some of us are getting more and more disenchanted? It seems that discussions/arguments either go endlessly or terminate due to hopelessness, knowing that the PowersThatBe will do what they want to do no matter what.

I am not going to go through Codev tagging things. I don't feel that searching Codev for information is productive. I don't understand why people think a chaotic multi-purpose web with 5 years of stuff and over 4k topics is a good thing.

Yes, a good navigation scheme is, well, good. That doesn't solve the searching problem at all. And, as Rafael points out, tags are already becoming crufty and unmanageable and they've been around, what, a couple of weeks? So in the end we're going to attempt to organize chaos with more chaos.

I assume, from its name, that folksonomy assumes a way of categorising things that is common to a community. Somehow I'm skeptical that that concept applies here.

-- MeredithLesly - 09 Mar 2006

IMHO the discussions in their current form are not productive and do not help advance the TWikiCommunity. How about a CustomerFocusedTWikiOrg for a change?

-- PeterThoeny - 10 Mar 2006

I would like to know why nearly EVERYBODY is evading my main point: A good navigation won't help without good content. My MAIN complaint, and the one that seems to be diluted (my fault) or ignored (not my fault) or whatever (I don't care whose fault), is: SEARCH results are polluted by all the cruft. If we don't solve that, any "good" navigation will be useless for those that want to SEARCH (not navigate, SEARCH) Codev.

Perhaps I'll give a chance to Crawford's proposal in ScriptToReorganiseCodev (I think I can tag at least 10 topics a day), but without an archive web it'll be fruitless.

-- RafaelAlvarez - 10 Mar 2006

Rafael, we are in agreement that stale content should not pollute search. You possibly missed related discussions elsewhere. The "what" is that search, WebIndex, WebChanges, WebRss, etc. do not show stale content. The "how" is another question. It could be done by moving content around (thus creating many broken links, also from outside, such as from the many TWikiHistory topics out there). It can also be done by leaving the content where it is, but with stale content excluded. So, if you can't see it, it is not there. My plan is to add Plugin handler(s) for search, so that Plugins can intervene in the search result. That way, the TagMePlugin can actually filter out all topics marked as "stale_content". This, combined with a scripted approach to tag stale content, will address your concern.

-- PeterThoeny - 11 Mar 2006

Having handlers to filter the result may slow down SEARCH even more. Btw, archiving is a well-known operation that was done even before computers were there. Moving stale content out of the way to a well-known location will not break existing links. And if you plan to put in a new handler (in TWiki4 or for Edin*, btw?), why not put in a handler so WikiWords can be resolved by plugins? This way FindElsewherePlugin can work on URLs too (and you can implement mirroring with a plugin, hint, hint :-) )

Now, forgetting about all the rest, I just want BugResolved, FeatureDone, FeatureRejected and all the old tracking topics out of Codev. That way it's easier to look for old (but good) ideas to implement in Edin*.

-- RafaelAlvarez - 11 Mar 2006

I note that under "When does content get stale?" above there is the item "An engineering project is completed". This happens to us on TWiki.org with every release. After every release we are left with the old tracking topics. This has been lessened somewhat with the latest release because of the use of the Bugs web, but I suspect that somewhere down the line, after a few releases, people will be asking for the Bugs web to be cleaned up because it takes too long to search it.

What I think we need is a separate web for each release/branch, starting with creating "archive" webs for all the previous releases. (See CoolURIsDontChange, AvoidRenameLosingHistory or TopicRenamedHandler for ways to preserve outside links.)

This would give us several benefits:

  • We effectively get an automatic cleanup after every release by moving to a new web.
  • We can reuse topic names to track the same feature in each release. No more CairoPerformanceIssues / DakarPerformanceIssues, just a PerformanceIssues topic in each web.
  • Easier and faster to monitor changes to just one release/branch (i.e. monitor Dakar point releases while Edinburgh development continues)
  • Much easier to redefine WebForm for each release to change the way we track development.

That last point is the best reason I can think of for having a new web for each release. If we want to change the way we track features/bugs after every release, it currently requires a lot of work to modify all the historic content that is already using the form you are changing. Preserving existing content becomes a lot easier when you've left it behind in another web. (We could use a new form for each release but that would get just as confusing and has its own set of problems.)

We've already had to do this for Dakar by creating the Bugs web.

-- SamHasler - 19 Apr 2006

Excellent idea, Sam. The tricky part right now is that it wasn't done after Dakar was released, so some content will have to be moved or copied. But that's a small price to pay IMO.

-- MeredithLesly - 19 Apr 2006

Topic revision: r26 - 2006-04-19 - MeredithLesly
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.