Tags:
create new tag
, view all tags

Feature Proposal: Topic Diffs should support by-word comparison

Motivation

The current topic diff does not make it easy to see what changed. The output format is marginal for code, but quite unhelpful for the text that TWiki is oriented toward. I spent 30 minutes this morning trying to find what turned out to be a 2 word change in a 300 word paragraph - at least, as DIFF saw it.

Description and Documentation

Just highlight the insertions and deletions

It would be nice if the rdiff script used to compare topic revisions highlighted changes within each line.

With each paragraph in a topic being a very long line, and users who add a comma, delete a space, or fix a single word, it can be difficult to see what happened.

Examples

Currrent:

Changed:
<
<
The web is represents each person as a primary topic that has a standardized format. These topics contain a form for vital statistics, a machine-generated
>
>
The web represents each person as a primary topic that has a standardized format. These topics contain a form for vital statistics, a machine-generated

Desired: Caution: the WYSIWYG editor will remove the formatting in the following line - if you don't see red, try an earlier revision

Changed:
  The web is represents each person as a primary topic that has a standardized format. These topics contain a form for vital statistics, a machine-generated

Impact

WhatDoesItAffect: Documentation, UI, Usability

Implementation

I suppose I should just complain - but here's a solution:

I'm not sure that I understand all the subtleties of the renderer, but I do have a prototype that seems functional.

This patch:
* Is against TWiki 4.2.3
* implements renderstyle={sequentialbyword, sequentialbyline}. byword is the default and what you get with renderstyle=sequential. byline is for anyone who relies on the old style. One might argue that the the default should be backwards compatibility - but then I'd have had to patch the differences GUI. (Exercise for the reader who cares.)
* Modifies UI/RDiff.pm
* Adds UI/WDiff.pm
* Requires CPAN's Text:::WordDiff
* Is a lot simpler than it appears - there are only 13 new/changed executable statements in RDiff and 7 in WDiff - the rest is indentation & comments.
* Could probably use some more testing.

You're welcome to take it as is, or send it through your new feature process. It seems a simple enough patch that the latter would be overkill.

As usual, I'm providing the technlogy - the release mechanics need to come from someone else.

Enjoy.

<verbatim> 
--- lib/TWiki/UI/RDiff.pm.4.2.3 2008-09-11 23:41:58.000000000 -0400
+++ lib/TWiki/UI/RDiff.pm       2009-01-01 10:17:56.000000000 -0500
@@ -244,13 +244,13 @@
 #| Parameter: =$left= | the text blob before the opteration |
 #| Parameter: =$right= | the text after the operation |
 #| Return: =$result= | Formatted html text |
 #| TODO: | this should move to Render.pm |
 sub _renderSequential
 {
-    my ( $session, $web, $topic, $diffType, $left, $right ) = @_;
+    my ( $session, $web, $topic, $diffType, $left, $right, $seqByLine ) = @_;
     my $result = '';
     ASSERT($topic) if DEBUG;
     #note: I have made the colspan 9 to make sure that it spans all columns (thought there are only 2 now)
     if ( $diffType eq '-') {
         $result .=
@@ -274,26 +274,40 @@
                           'Unchanged',
                           'Unchanged',
                           _renderCellData( $session, $right, $web, $topic ),
                           'u', '',
                           $session );
     } elsif ( $diffType eq 'c') {
-        $result .=
-          _sequentialRow( '#D0FFD0',
-                          'Changed',
-                          'Deleted',
-                          _renderCellData( $session, $left, $web, $topic ),
-                          '-', '&lt;',
-                          $session );
-        $result .=
-          _sequentialRow( undef,
-                          'Changed',
-                          'Added',
-                          _renderCellData( $session, $right, $web, $topic ),
-                          '+', '&gt;',
-                          $session );
+       if( $seqByLine ) {
+           $result .=
+               _sequentialRow( '#D0FFD0',
+                               'Changed',
+                               'Deleted',
+                               _renderCellData( $session, $left, $web, $topic ),
+                               '-', '&lt;',
+                               $session );
+           $result .=
+               _sequentialRow( undef,
+                               'Changed',
+                               'Added',
+                               _renderCellData( $session, $right, $web, $topic ),
+                               '+', '&gt;',
+                               $session );
+       } else {
+           my $diff = word_diff( \$left, \$right, { STYLE=>'TWiki::UI::WDiff',
+                                                } );
+           $diff = _renderCellData( $session, $diff, $web, $topic );
+
+           $result .= CGI::Tr(CGI::td({bgcolor=>'#D0FFD0',
+                                       class=>"twikiDiffChangedHeader",
+                                       colspan=>9},
+                                      CGI::b( $session->i18n->maketext('Changed').': ')));
+           $result .=CGI::Tr(CGI::td( "&nbsp;" ),
+                             CGI::td({class=>"twikiDiffChangedText", colspan=>9}, $diff));
+
+       }
     } elsif ( $diffType eq 'l' && $left ne '' && $right ne '' ) {
         $result .= CGI::Tr({bgcolor=>$format{l}[0],
                             class=>'twikiDiffLineNumberHeader'},
                            CGI::th({align=>'left',
                                     colspan=>9},
                                     ($session->i18n->maketext('Line: [_1] to [_2]',$left,$right))
@@ -311,13 +325,13 @@
 #| Parameter: =$diffArray= | array generated by parseRevisionDiff |
 #| Parameter: =$renderStyle= | style of rendering { debug, sequential, sidebyside} |
 #| Return: =$text= | output html for one renderes revision diff |
 #| TODO: | move into Render.pm |
 sub _renderRevisionDiff
 {
-    my( $session, $web, $topic, $sdiffArray_ref, $renderStyle ) = @_;
+    my( $session, $web, $topic, $sdiffArray_ref, $renderStyle, $seqByLine ) = @_;
 #combine sequential array elements that are the same diffType
     my @diffArray = ();
        foreach my $ele ( @$sdiffArray_ref ) {
                if( ( @$ele[1] =~ /^\%META\:TOPICINFO/ ) || ( @$ele[2] =~ /^\%META\:TOPICINFO/ ) ) {
                        # do nothing, ignore redundant topic info
@@ -348,26 +362,26 @@
                }
                if (( @$diff_ref[0] eq '-' ) && ( @$next_ref[0] eq '+' )) {
                    $diff_ref = ['c', @$diff_ref[1], @$next_ref[2]];
             $next_ref = undef;
                }
                if ( $renderStyle eq 'sequential' ) {
-                   $result .= _renderSequential( $session, $web, $topic, @$diff_ref );
+                   $result .= _renderSequential( $session, $web, $topic, @$diff_ref, $seqByLine );
                } elsif ( $renderStyle eq 'sidebyside' ) {
             $result .= CGI::Tr(CGI::td({ width=>'50%'}, ''),
                                CGI::td({ width=>'50%'}, ''));
                    $result .= _renderSideBySide( $session, $web, $topic, @$diff_ref );
                } elsif ( $renderStyle eq 'debug' ) {
                    $result .= _renderDebug( @$diff_ref );
                }
                $diff_ref = $next_ref;
        }
     #don't forget the last one ;)
     if ( $diff_ref ) {
         if ( $renderStyle eq 'sequential' ) {
-            $result .= _renderSequential ( $session, $web, $topic, @$diff_ref );
+            $result .= _renderSequential ( $session, $web, $topic, @$diff_ref, $seqByLine );
         } elsif ( $renderStyle eq 'sidebyside' ) {
             $result .= CGI::Tr(CGI::td({ width=>'50%'}, ''),
                                CGI::td({ width=>'50%'}, ''));
             $result .= _renderSideBySide( $session, $web, $topic, @$diff_ref );
         } elsif ( $renderStyle eq 'debug' ) {
             $result .= _renderDebug( @$diff_ref );
@@ -388,13 +402,13 @@
 invoked via the =UI::run= method.
 Renders the differences between version of a TwikiTopic
 | topic | topic that we are showing the differences of |
 | rev1 | the higher revision |
 | rev2 | the lower revision |
-| render | the rendering style {sequential, sidebyside, raw, debug} | (preferences) DIFFRENDERSTYLE, =sequential= |
+| render | the rendering style {sequential, sidebyside, raw, debug} | (preferences) DIFFRENDERSTYLE, =sequential= ; sequentialbyword (default)
 or sequentialbyline |
 | type | {history, diff, last} history diff, version to version, last version to previous | =history= |
 | context | number of lines of context |
 | skin | the skin(s) to use to display the diff |
 TODO:
    * add a {word} render style
    * move the common CGI param handling to one place
@@ -412,12 +426,17 @@
     TWiki::UI::checkWebExists( $session, $webName, $topic, 'diff' );
     TWiki::UI::checkTopicExists( $session, $webName, $topic, 'diff' );
     my $renderStyle = $query->param('render') ||
       $session->{prefs}->getPreferencesValue( 'DIFFRENDERSTYLE' ) ||
         'sequential';
+    my $seqByLine = ($renderStyle eq 'sequentialbyline');
+    $renderStyle =~ s/^sequential.*$/sequential/;
+    if( $renderStyle eq 'sequential' && !$seqByLine ) {
+       use Text::WordDiff;
+    }
     my $diffType = $query->param('type') || 'history';
     my $contextLines = $query->param('context');
     unless( defined $contextLines ) {
         $session->{prefs}->getPreferencesValue( 'DIFFCONTEXTLINES' );
         $contextLines = 3 unless defined $contextLines;
     }
@@ -491,13 +510,13 @@
         # eliminate white space to prevent wrap around in HR table:
         $rInfo =~ s/\s+/&nbsp;/g;
         $rInfo2 =~ s/\s+/&nbsp;/g;
         my $diffArrayRef = $session->{store}->getRevisionDiff(
             $session->{user}, $webName, $topic, $r2, $r1, $contextLines );
         $text = _renderRevisionDiff( $session, $webName, $topic,
-                                     $diffArrayRef, $renderStyle );
+                                     $diffArrayRef, $renderStyle, $seqByLine );
         $diff =~ s/%REVINFO1%/$rInfo/go;
         $diff =~ s/%REVINFO2%/$rInfo2/go;
         $diff =~ s/%TEXT%/$text/go;
         $page .= $diff;
         $r1 = $r1 - 1;
         $r2 = $r2 - 1;
--- /dev/null   2008-04-26 06:27:25.709194980 -0400
+++ lib/TWiki/UI/WDiff.pm       2009-01-01 10:22:56.000000000 -0500
@@ -0,0 +1,43 @@
+package TWiki::UI::WDiff;
+
+use strict;
+use HTML::Entities qw(encode_entities);
+use vars qw($VERSION @ISA);
+
+$VERSION = '0.01';
+@ISA = qw(Text::WordDiff::Base);
+
+sub file_header {
+    return '';
+}
+
+sub same_items {
+    shift;
+    return encode_entities( join '', @_ );
+}
+
+sub delete_items {
+    shift;
+    return '<del style="background:#ff9999;" class="twikiDiffDeletedMarker">' . encode_entities( join'', @_ ) . '</del>';
+}
+
+sub insert_items {
+    shift;
+    return '<ins style="background:#ccccff;" class="twikiDiffAddedMarker">' . encode_entities( join'', @_ ) . '</ins>';
+}
+
+1;
+__END__
+
+
+=begin comment
+
+=head1 Copyright and License
+
+Copyright (c) 2005-2008 David Wheeler. Some Rights Reserved.
+Copyright (c) 2009 Timothe Litt. Some Rights Reserved.
+
+This module is free software; you can redistribute it and/or modify it under the
+same terms as Perl itself.
+
+=cut
</verbatim>

-- Contributors: TimotheLitt - 01 Jan 2009

Discussion

Thank you Timothe for providing this feature request and a patch! Line-only diff was bugging me for a long time, I think a solid word-by-word diff should be part of the core TWiki.

For inspiration, look at DiffWordByWordAddOn (can't be used in tWiki distribution since it depends on Python) and CompareRevisionsAddOn (word diff of page revisons.)

-- PeterThoeny - 01 Jan 2009

I looked at CompareRevisions - it's pretty good, but not fully integrated. For example, mailer-contrib doesn't sense & use it, though pattern.viewtopicbuttons does. It requires the history plugin for a GUI - and then still leaves the rdiff plugin as the more topic actions default. Compare Revisions doesn't return 1 on load, and still uses the ineffecient common tags handler. So there's lots of work to be done. None particularly hard, but it's work.

As for inspiration, I'm done. This patch isn't perfect, but it seems to do pretty well for the cases it's encountered so far. I'm currently using it + CompareRevisions + History. Seems to handle the various situations & was high return on small effort.

For a production version, I suggest integrating the three components into the core (including MailerContrib), fixing the more topic actions screen, and perhaps improving the more topic actions GUI. The latter should default to comparing the last two revisions & should have a place to explicitly compare two numeric revisions.

But what I have is good enough for my present needs.

-- TimotheLitt - 05 Jan 2009

Edit | Attach | Watch | Print version | History: r3 < r2 < r1 | Backlinks | Raw View | Raw edit | More topic actions
Topic revision: r3 - 2009-01-05 - TimotheLitt
 
  • Learn about TWiki  
  • Download TWiki
This site is powered by the TWiki collaboration platform Powered by Perl Hosted by OICcam.com Ideas, requests, problems regarding TWiki? Send feedback. Ask community in the support forum.
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.