Feature Proposals » Anchor regex not defined correctly

Summary

Current State: AcceptedProposal
Developer: CalvinSo
Reason: AcceptedBy7DayFeedbackPeriod
Date: 2021-12-23
Concerns By:
Bug Tracking: Item7937
Proposed For: LimaRelease

Motivation

Currently, anchors work correctly in external links but not in internal links. The anchor definition for [[..][..]] links should follow the fragment definition in RFC 3986.

See the Examples section below.

https://twiki.org/cgi-bin/view/Codev/AnchorRegexnotdefinedcorrectly#:~:text=AnchorText

AnchorRegexnotdefinedcorrectly#:~:text=AnchorText

Description and Documentation

The regex definition of an anchor in TWiki.pm is not correct:

1. It does not support anchor names with special characters such as '-', e.g. Testpage#this-is-an-anchor

2. It does not support anchor names with percent-encoded characters such as '%2A', e.g. Testpage#%2A%32

3. It does not support Scroll To Text Fragment anchors such as Testpage#:~:text=anchortext

Line 468

   $regex{anchorRegex} = qr/\#[$regex{mixedAlphaNum}_]+/o;

Line 440

   $regex{mixedAlphaNum} = $regex{mixedAlpha}.$regex{numeric};
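
The three failures listed above can be reproduced outside TWiki with a minimal sketch. It assumes ASCII-only stand-ins for $regex{mixedAlpha} and $regex{numeric} (the real definitions in TWiki.pm may be locale-dependent), and matched_anchor is a hypothetical helper, not a TWiki function.

```perl
use strict;
use warnings;

# Assumption: ASCII-only stand-ins for the character classes in TWiki.pm.
my %regex;
$regex{mixedAlpha}    = 'a-zA-Z';
$regex{numeric}       = '\d';
$regex{mixedAlphaNum} = $regex{mixedAlpha} . $regex{numeric};

# The current anchor definition, as quoted from TWiki.pm.
$regex{anchorRegex} = qr/\#[$regex{mixedAlphaNum}_]+/;

# Hypothetical helper: return the part of $link that the anchor regex
# recognizes, or '' if it recognizes nothing.
sub matched_anchor {
    my ($link) = @_;
    return $link =~ /($regex{anchorRegex})/ ? $1 : '';
}

for my $link (
    'Testpage#this-is-an-anchor',
    'Testpage#%2A%32',
    'Testpage#:~:text=anchortext',
  )
{
    printf "%-30s => '%s'\n", $link, matched_anchor($link);
}
```

Under these assumptions the first link yields only '#this' (the match stops at the first '-'), and the other two yield an empty string because the character right after '#' is not in the allowed class.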

Examples

AnchorText 日本

These links work when written as external links:

https://twiki.org/cgi-bin/view/Codev/AnchorRegexnotdefinedcorrectly#:~:text=AnchorText

https://twiki.org/cgi-bin/view/Codev/AnchorRegexnotdefinedcorrectly#%E6%97%A5%E6%9C%AC

https://twiki.org/cgi-bin/view/Codev/AnchorRegexnotdefinedcorrectly#this-is-an-anchor

https://twiki.org/cgi-bin/view/Codev/AnchorRegexnotdefinedcorrectly#:~:text=a%20dot,

These links do not work as internal links; the whole link is treated as a new topic name:

AnchorRegexnotdefinedcorrectly#:~:text=AnchorText

AnchorRegexnotdefinedcorrectly#%E6%97%A5%E6%9C%AC

AnchorRegexnotdefinedcorrectly#this-is-an-anchor

AnchorRegexnotdefinedcorrectly

Impact

WhatDoesItAffect: Rendering

Implementation

-- Contributors: Calvin So - 2021-12-23

Discussion

Proposing a one-line code change, Item7937:

$regex{anchorRegex} = qr/\#(?:[$regex{mixedAlphaNum}_\/\?\-\+\.\'=~:!\$&\(\)*,;]|%[0-9a-fA-F]{2})+/o;

-- Calvin So - 2021-12-23
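
As a rough check of the proposed definition, the sketch below runs it against the anchors from the Examples section. It assumes an ASCII-only stand-in for $regex{mixedAlphaNum}; matched_anchor is a hypothetical helper used only for illustration.

```perl
use strict;
use warnings;

# Assumption: ASCII-only stand-in for $regex{mixedAlphaNum} from TWiki.pm.
my $mixedAlphaNum = 'a-zA-Z\d';

# The anchor definition proposed above: RFC 3986 fragment punctuation
# plus percent-encoded octets.
my $anchorRegex =
  qr/\#(?:[${mixedAlphaNum}_\/\?\-\+\.\'=~:!\$&\(\)*,;]|%[0-9a-fA-F]{2})+/;

# Hypothetical helper: return the anchor part that the regex recognizes.
sub matched_anchor {
    my ($link) = @_;
    return $link =~ /($anchorRegex)/ ? $1 : '';
}

for my $link (
    'AnchorRegexnotdefinedcorrectly#:~:text=AnchorText',
    'AnchorRegexnotdefinedcorrectly#%E6%97%A5%E6%9C%AC',
    'AnchorRegexnotdefinedcorrectly#this-is-an-anchor',
  )
{
    print matched_anchor($link), "\n";
}
```

With this stand-in, each of the three anchors is matched in full, from '#' to the end of the link.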

This looks good, except for one corner case: It is not uncommon to write a sentence that ends in a link, followed by a dot, comma, or semicolon, such as https://twiki.org/cgi-bin/view/Codev/AnchorRegexnotdefinedcorrectly#:~:text=AnchorText, or https://twiki.org/cgi-bin/view/Codev/AnchorRegexnotdefinedcorrectly#this-is-an-anchor. In this case, the trailing punctuation should be excluded.

-- Peter Thoeny - 2021-12-24

For a use case like https://twiki.org/cgi-bin/view/Codev/AnchorRegexnotdefinedcorrectly#:~:text=a%20dot,%20 and according to RFC 3986, punctuation such as , ; / ? should be included.

-- Calvin So - 2021-12-27

Correct, just not at the end.

-- Peter Thoeny - 2021-12-27

TWiki already excludes punctuation at the end of external links, for inspiration see sub getRenderedVersion in lib/TWiki/Render.pm

$text =~ s/(^|[-*\s(|])($TWiki::regex{linkProtocolPattern}:([^\s<>"]+[^\s*.,!?;:)<|]))/$1._externalLink( $this,$2)/geo;

-- Peter Thoeny - 2021-12-27
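
To see how the tail of that pattern drops trailing punctuation, here is a small standalone sketch. The protocol list is a hypothetical stand-in for $TWiki::regex{linkProtocolPattern}, and find_urls is an illustrative helper rather than anything in Render.pm; only the character classes are copied from the line quoted above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $TWiki::regex{linkProtocolPattern}.
my $linkProtocolPattern = qr/(?:https?|ftp)/;

# Collect the URLs found in free text. The last character of a URL must not
# be whitespace or one of * . , ! ? ; : ) < | so the regex engine backtracks
# past any trailing punctuation.
sub find_urls {
    my ($text) = @_;
    my @urls;
    while ( $text =~
        /(?:^|[-*\s(|])(${linkProtocolPattern}:[^\s<>"]+[^\s*.,!?;:)<|])/g )
    {
        push @urls, $1;
    }
    return @urls;
}

my $text =
    'See https://twiki.org/cgi-bin/view/Codev/AnchorRegexnotdefinedcorrectly#this-is-an-anchor. '
  . 'Or https://twiki.org/cgi-bin/view/Codev/AnchorRegexnotdefinedcorrectly#:~:text=a%20dot, maybe.';

print "$_\n" for find_urls($text);
```

In this sketch the first URL is returned without its final '.', and the second without its final ','. Punctuation inside the URL is unaffected because only the last character is restricted.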

Thanks, I see what you are talking about now. It seems external links are handled in a totally different manner that never references $regex{anchorRegex}, and wiki links do not seem to exclude punctuation currently. Do you want to exclude it for them now as well?

-- Calvin So - 2022-01-04

Currently, external links are working fine; this problem only occurs in the anchor definition for [[...][...]] when it tries to guess the name from the anchor and also checks for valid wiki words. Punctuation seems to be ignored entirely during that check.

-- Calvin So - 2022-01-04

The punctuation problem does not exist for anchor links in the current implementation because qr/\#[$regex{mixedAlphaNum}_]+/o does not include punctuation. Once you add punctuation to the anchor, we should exclude trailing punctuation, as is done for external links. That is, handle trailing punctuation for anchors in standalone external links and internal links (i.e. the special case for trailing punctuation does not apply to [[...][...]] links).

Examples where the special case of trailing punctuation applies:

The special case can be ignored in [[...][...]] links, or kept the same as for external links if that makes the implementation easier.

-- Peter Thoeny - 2022-01-04
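
One possible way to express "punctuation allowed, just not at the end" for standalone anchors is a negative lookbehind appended to the proposed regex. This is only a sketch under assumptions, not something settled in this topic: $anchorNoTrailingPunct and anchor_of are hypothetical names, the character class for $regex{mixedAlphaNum} is an ASCII-only stand-in, and the excluded set is the subset of the external-link tail exclusion that can occur in an anchor.

```perl
use strict;
use warnings;

# Assumption: ASCII-only stand-in for $regex{mixedAlphaNum} from TWiki.pm.
my $mixedAlphaNum = 'a-zA-Z\d';

# The anchor regex proposed in this topic.
my $anchorRegex =
  qr/\#(?:[${mixedAlphaNum}_\/\?\-\+\.\'=~:!\$&\(\)*,;]|%[0-9a-fA-F]{2})+/;

# Hypothetical variant: same characters, but the match may not end in
# * . , ! ? ; : or ) so the engine backtracks past trailing punctuation.
my $anchorNoTrailingPunct = qr/${anchorRegex}(?<![*.,!?;:)])/;

# Hypothetical helper: return the anchor that $re recognizes in $text.
sub anchor_of {
    my ( $text, $re ) = @_;
    return $text =~ /($re)/ ? $1 : '';
}

for my $text (
    'TestPage#this-is-an-anchor.',
    'TestPage#:~:text=a%20dot, and more',
    'TestPage#:~:text=a%20dot,%20b',
  )
{
    printf "%-36s plain: %-24s trimmed: %s\n", $text,
      anchor_of( $text, $anchorRegex ),
      anchor_of( $text, $anchorNoTrailingPunct );
}
```

In this sketch the plain regex keeps the final '.' or ',' of the first two inputs while the variant drops it, and the comma in the third input survives in both because it is not the last character. Whether [[...][...]] links should use the plain or the trimmed form is the open question discussed above.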

I would first like to understand why external links exclude trailing punctuation, because it is not excluded in the RFC document. I am wondering if it has to do with the "autolink" function, which tries to drop trailing punctuation when detecting a link in a sentence ending with a comma, full stop, question mark, etc.?

-- Calvin So - 2022-01-06

If that is the case, I am planning to handle it like this:

    unless( TWiki::isTrue( $prefs->getPreferencesValue('NOAUTOLINK')) ) {
        # Handle WikiWords
        $text = $this->takeOutBlocks( $text, 'noautolink', $removed );
        $text =~ s/$STARTWW(?:($TWiki::regex{webNameRegex})\.)?($TWiki::regex{wikiWordRegex}|$TWiki::regex{abbrevRegex})($TWiki::regex{anchorRegex}?[^\s*.,!?;:)<|]+)?/_handleWikiWord( $this,$theWeb,$1,$2,$3)/geom;
        $this->putBackBlocks( \$text, $removed, 'noautolink' );
    }

-- Calvin So - 2022-01-06

I am sorry, I noticed only very late that you are referring to square bracket links only. I think we should look at internal and external links, each in plain form and with square brackets, with and without trailing punctuation.

As a test case I created TestLinks. IN6...IN11 and IB5...IB8 currently do not work properly.

$TWiki::regex{anchorRegex} is also used by the WYSIWYG editor, so that needs some investigation too.

-- Peter Thoeny - 2022-01-18

Topic revision: r11 - 2022-01-18 - PeterThoeny