NeditRegularExpressions < Wikilearn

Some examples of regular expressions I've used in Nedit's search and replace dialog.

Hint: If you're going to do much with the dialog, click on the "Keep Dialog" "pushbutton" to keep the dialog from closing and losing all your settings each time you do one find or replace.

Updates: Some quirks (or features) of Nedit's Find and Find and Replace dialogs:

<feature:> (re: closing and losing all your settings, above) I learned (quite a while ago) that, in that case, you can simply use the UP arrow to scroll back through previous settings (or the DOWN arrow to scroll forward again)

<quirk:> Sometimes (on systems I've used), when I pop up a find or find and replace dialog, I cannot enter anything. If I move the mouse cursor into the document (anywhere) and click, I can then (usually) enter into the dialog.

See AboutThesePages.

Contents:

Change an UPPERCASE WORD to Title Case
Delete Cruft after a WikiWord
Duplicate a TWiki Variable
Removing Hard Line Breaks from Text
Convert Snippets of an HTML File to TWiki Markup
Convert a man page to plain text
Tips
Contributors
Page Ratings

Change an UPPERCASE WORD to Title Case

The following works!

find:

<([A-Z])([A-Z]+)

replace with (note one vs. ell!):

\1\L\2

Changes RESULTS to Results.
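For comparison outside Nedit, here is a minimal Python sketch of the same idea. Python's re module has no \L in replacement strings, so the lowercasing is done in a small function instead; \b stands in for Nedit's < word-start anchor, and the function name is made up for illustration.

```python
import re

def title_case_upper_words(text):
    """Turn each all-uppercase word of two or more letters into Title Case."""
    # Group 1 is the first capital (kept), group 2 is the rest (lowercased),
    # mirroring the Nedit replacement \1\L\2.
    return re.sub(
        r"\b([A-Z])([A-Z]+)",
        lambda m: m.group(1) + m.group(2).lower(),
        text,
    )

print(title_case_upper_words("RESULTS"))
```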

Learn more about \l, \L, &, and so forth on http://www.nedit.org/documentation/5.1/RegEx_Parenthetical_Constructs.shtml -- they might be unique to Nedit:

"The capitalization of text inserted by `&' or `\1', `\2', ... `\9' can be altered by preceding them with `\U', `\u', `\L', or `\l'. `\u' and `\l' change only the first character of the inserted entity, while `\U' and `\L' change the entire entity to uppercase or lowercase, respectively."

Delete Cruft after a WikiWord

(after copying and pasting results from an inline search on a TWiki view page)

Aside: Now that I know / remember that I can extend a selection with the keyboard navigation keys it would be faster and easier to record a quick and dirty keyboard macro, especially as the " * " was added at the beginning of the line via a keyboard macro, and I will concatenate two adjacent lines together with a keyboard macro -- the keyboard macro can do it all in one step.

To convert this:

   * EmailInterimUpdate 16 Aug 2002 - 03:06 - NEW   RandyKramer

To this:

   * EmailInterimUpdate --

I searched for:

   \* (\w*) .*

and replaced it with:

   \* \1 --

Notes:

\w* found the wiki word (\w matches a single character from [a-zA-Z0-9_]; the * repeats it zero or more times)
the parentheses assigned the wiki word to \1
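The same search and replace can be tried in Python, roughly as below. This is only a sketch: the function name is hypothetical, and the replacement string drops the backslash before * because Python does not need it there.

```python
import re

def strip_cruft(line):
    """Keep the bullet and the wiki word, replace the rest of the line with --."""
    # \* matches the literal bullet, (\w*) captures the wiki word as group 1,
    # and .* swallows everything after the following space.
    return re.sub(r"\* (\w*) .*", r"* \1 --", line)

sample = "   * EmailInterimUpdate 16 Aug 2002 - 03:06 - NEW   RandyKramer"
print(strip_cruft(sample))
```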

Duplicate a TWiki Variable

I don't recall exactly why I was doing this -- I think I was creating some documentation for TWiki (probably on my private TWiki or on WikiLearn) and wanted to show the name of the variable (hence the %NOTIFYTOPIC%), the current contents (hence WebNotify), and the comments ("notify topic name (WebNotify)").

To change this:

%NOTIFYTOPIC%    notify topic name (WebNotify)

To this:

%<nop>NOTIFYTOPIC% %NOTIFYTOPIC%    notify topic name (WebNotify)

I searched for this:

(^%)([a-zA-Z]+%)

And replaced it with this:

\1<nop>\2 \1\2
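A Python sketch of the same substitution, in case it helps to test it outside Nedit. The function name is made up; re.MULTILINE is my addition so that ^ matches at the start of every line, which I assume is what the Nedit search effectively did.

```python
import re

def duplicate_variable(text):
    """Prefix each line-leading %VARIABLE% with an escaped copy of itself."""
    # Group 1 is the leading %, group 2 is the name plus the closing %.
    # The <nop> in the first copy keeps TWiki from expanding it.
    return re.sub(r"(^%)([a-zA-Z]+%)", r"\1<nop>\2 \1\2", text, flags=re.MULTILINE)

print(duplicate_variable("%NOTIFYTOPIC%    notify topic name (WebNotify)"))
```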

Removing Hard Line Breaks from Text

While trying to preserve proper spacing around punctuation.

This is the strategy I started to follow to remove new lines from the plain text file while trying to preserve proper spacing around punctuation. I tried this once and had some problems, so it needs some troubleshooting and meticulous retrying. The intent is to follow the steps in order.

This could get even more complicated with end of sentence or mid sentence constructs like:

last word."

last word".

I haven't attempted to deal with quotes in the following table.

Initially, the best thing may be to handle this semi-automatically, until I see a real possibility for an automatic "algorithm" (the way I did similar things in Word).

BTW: The first and last steps are a big part of the "trick" to doing this — if you don't replace the \n\n with some unique string, then, when you start doing other substitutions you may change some \n\n paragraph separators to single \n's. The last step puts the \n\n back in place of the unique string. I used to do things like this fairly often in Word for basically text only files (no code, verbatim sections, or ASCII graphics) — I may have some new things to learn.

And, to repeat, the following needs testing!

Step	Find	Replace	Notes
1	\n\n	<>	or some string that will be unique in the file
2	". \n"	". "	ignore quotes, they are just to show spaces
3	". \n"	". "
4	".\n"	". "
5 - 10	""	""	Repeat steps 2 thru 4 for ! and \?
11	",\n"	", "
12	";\n"	"; "
13	")\n"	") "	Might be OK for plain text, not so sure for code or verbatim stuff
14 - 16	""	""	Repeat step 13 for >, }, and ]
?	"<>"	"\n\n"	This is the last step

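The steps in the table can be sketched in Python as below. This is as untested against real files as the table itself; the sentinel string and the function name are my own choices, and the two regular expressions fold several table rows into one pass each. Like the table, it leaves alone any line that ends without punctuation.

```python
import re

# Hypothetical marker; any string that cannot occur in the file will do.
SENTINEL = "\x00PARA\x00"

def remove_hard_breaks(text):
    """Join hard-wrapped lines after punctuation, keeping paragraph breaks."""
    # First step: protect paragraph separators before touching single newlines.
    text = text.replace("\n\n", SENTINEL)
    # Steps 2-10: ., ! or ? followed by optional spaces and a newline.
    text = re.sub(r"([.!?]) *\n", r"\1 ", text)
    # Steps 11-16: comma, semicolon and closing brackets followed by a newline.
    text = re.sub(r"([,;)>}\]])\n", r"\1 ", text)
    # Last step: put the paragraph separators back.
    return text.replace(SENTINEL, "\n\n")

print(remove_hard_breaks("One.\nTwo,\nthree.\n\nNext paragraph."))
```

Swapping the first and last steps out shows why they matter: without the sentinel, the rule for ".\n" would eat the first newline of every paragraph break.
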
Convert Snippets of an HTML File to TWiki Markup

The first aggravating problem I ran into was how to find line breaks with a regular expression. I found out how to do it in Nedit — is it the same (or similar) in Perl, sed, awk?

Introduction and Status Overview

UPDATE: Attempting this was interesting, and could probably be done (I'll be adding some more notes about things I did), but, as it turned out, simply copying (copy and paste) from the regular browser window to Nedit made a file that seems highly readable and wraps lines to the window. (This as opposed to copying (copy and paste) from the View Source browser window, which resulted in a file with tons of HTML to deal with.)

I'm trying to convert some web pages on routing that won't wrap, so I can put them on my private TWiki (at least) where they will wrap and be easier to read. In the course of doing that, I need to make the conversions listed below. I may think about making a Perl, awk, or sed script to make this easier in the future (for similar pages; however, that may not be a terrible limitation, as the page I'm working on came from the LDP and (IIUC) originates as SGML markup). Hmm, maybe I should start from the raw SGML instead of copying and pasting HTML from the web page?

How to Match Newlines

To make a \s or . match a newline, prefix the regular expression with "?n" and enclose the entire thing in parentheses, as in (?n...). (See the third screen of Nedit help for regular expressions for more explanation and variations.) The Nedit help describes ?n as a command to get \s and . to match newlines; the parentheses around the whole string are part of the syntax. I'm not clear why you'd need ?N, but it forces \s and . to not match newlines. I guess that would be useful if you had a more complicated situation and needed to nest the expressions.

How to construct the search string:

copy the target string to a scratch area (if you put it in the search dialog, you will have to backspace to delete each newline character to see the entire string)
insert a \ in front of all characters that need to be escaped (in these cases, mainly < and >)
insert a \s in place of each newline character (now the string should be all on one line)
insert a ?n in front of the entire string
enclose the entire string in parentheses
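The recipe above is mechanical enough to script. Here is a small Python sketch that builds the Nedit search string from a multi-line target; the function name is hypothetical, and it escapes only < and >, so any other special characters in the target would still need escaping by hand.

```python
def nedit_search_string(target):
    """Build a Nedit (?n...) regular expression from a multi-line string."""
    # Put a \ in front of the characters that need escaping (here only < and >).
    escaped = "".join("\\" + ch if ch in "<>" else ch for ch in target)
    # Put a \s in place of each newline so the string fits on one line.
    one_line = escaped.replace("\n", "\\s")
    # Prefix with ?n and enclose the whole thing in parentheses.
    return "(?n" + one_line + ")"

print(nedit_search_string("</P\n><P\n>"))
```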

These examples might take less space as a table, if I could put the multi-line strings in a table (or, if I didn't want to preserve the line breaks in the table).

Paragraph End and Start

String to find (including line breaks):

</P
><P
>

regular expression to match the string: (?n\</P\s\>\<P\s\>)

replace with: "\n\n" (ignore the " ")

Literal (=)

String to find (including line breaks): (Later, consider an optional leading ' (or `?).)

<TT
CLASS="LITERAL"
>

regular expression to match the string: (?n\<TT\sCLASS="LITERAL"\s\>)

replace with: "=" (ignore the " ")

/Literal (=)

String to find (including line breaks):

</TT
>

regular expression to match the string: (?n\</TT\s\>)

replace with: "=" (ignore the " ")

<pre>

String to find (including line breaks):

</P
><P
><PRE
CLASS="SCREEN"
>

regular expression to match the string: (?n\</P\s\>\<P\s\>\<PRE\sCLASS="SCREEN"\s\>)

replace with: "\n\n<pre>" (ignore the " ")

</pre>

String to find (including line breaks):

</PRE
></P
><P
>

regular expression to match the string: (?n\</PRE\s\>\</P\s\>\<P\s\>)

replace with: "</pre>\n\n" (ignore the " ")
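As a partial answer to the "is it the same in Perl, sed, awk?" question, here is how the conversions above might look in Python, where \s matches a newline without any ?n prefix and < and > need no escaping. Treat it as a sketch: the rule list and function name are mine, and the two pre rules are deliberately listed first because the plain paragraph rule would otherwise consume part of what they need to match.

```python
import re

# (pattern, replacement) pairs, longest patterns first.
RULES = [
    (r'</P\s><P\s><PRE\sCLASS="SCREEN"\s>', "\n\n<pre>"),
    (r'</PRE\s></P\s><P\s>', "</pre>\n\n"),
    (r'</P\s><P\s>', "\n\n"),
    (r'<TT\sCLASS="LITERAL"\s>', "="),
    (r'</TT\s>', "="),
]

def html_to_twiki(html):
    """Apply each rule in order to a snippet of the HTML source."""
    for pattern, replacement in RULES:
        html = re.sub(pattern, replacement, html)
    return html

snippet = 'one</P\n><P\n>use <TT\nCLASS="LITERAL"\n>ls</TT\n> now'
print(html_to_twiki(snippet))
```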

Next Case

String to find (including line breaks):


regular expression to match the string: 

replace with: "" (ignore the " ")

Convert a man page to plain text

Create a file containing the text of the man page: man <man_page_name> >> <man_page_name>.txt
Find "\n       " (a newline followed by seven spaces), replace with \n\n
Find "       " (seven spaces), replace with " " (one space)
Find <bs> (backspace), replace with nothing
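The three find and replace steps can be sketched in Python as plain string replacements, applied to text that has already been captured from man. The function name is made up, and the steps are taken literally: removing only the backspace leaves the overstruck character next to its copy, so bold or underlined words may still need cleanup afterwards.

```python
def man_text_to_plain(text):
    """Apply the find and replace steps above to captured man page text."""
    indent = " " * 7
    # A newline followed by the seven-space indent becomes a blank line.
    text = text.replace("\n" + indent, "\n\n")
    # Any remaining run of seven spaces collapses to one space.
    text = text.replace(indent, " ")
    # Drop the backspace characters man uses for overstriking.
    return text.replace("\b", "")

print(man_text_to_plain("NAME\n       ls - list directory contents"))
```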

Tips

"Search" (move cursor) to beginning of next word: do an RE search on "<"
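A rough Python equivalent of that tip, assuming \b followed by a word character is a fair stand-in for Nedit's < anchor. The function name and the -1 return value for "no further word" are my own conventions.

```python
import re

# A word boundary with a word character right after it marks a word start.
WORD_START = re.compile(r"\b(?=\w)")

def next_word_start(text, cursor):
    """Return the index of the next word start after cursor, or -1 if none."""
    match = WORD_START.search(text, cursor + 1)
    return match.start() if match else -1

print(next_word_start("foo bar baz", 0))
```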

Contributors

RandyKramer - 27 Sep 2001 (on home TWiki)
RandyKramer - 24 Mar 2002 (transferred to WikiLearn)
<If you edit this page, add your name here, move this to the next line>

Page Ratings

WebForm
PageStatus	Scribbles

Topic revision: r11 - 2016-02-04 - RandyKramer

Copyright © 1999-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.