Some examples of regular expressions I've used in Nedit's search and replace dialog.
Hint: If you're going to do much with the dialog, click the "Keep Dialog" button so the dialog doesn't close and lose all your settings each time you do one find or replace.
Updates: Some quirks (or features) of Nedit's find or find and replace dialogs:
- <feature:> (re: closing and losing all your settings, above) I learned (quite a while ago) that, in that case, you can simply use the UP arrow to scroll back through previous settings (or the DOWN arrow to scroll forward again)
- <quirk:> Sometimes (on systems I've used), when I pop up a find or find and replace dialog, I cannot enter anything. If I move the mouse cursor into the document (anywhere) and click, I can then (usually) enter into the dialog.
See
AboutThesePages.
Change an UPPERCASE WORD to Title Case
The following works!
find:
<([A-Z])([A-Z]+)
replace with (note the digit one in \1 vs. the letter ell in \L!):
\1\L\2
Changes RESULTS to Results.
Learn more about \l, \L, &, and so forth at
http://www.nedit.org/documentation/5.1/RegEx_Parenthetical_Constructs.shtml
(they might be unique to Nedit):
"The capitalization of text inserted by `&' or `\1', `\2', ... `\9' can be altered by preceding them with `\U', `\u', `\L', or `\l'. `\u' and `\l' change only the first character of the inserted entity, while `\U' and `\L' change the entire entity to uppercase or lowercase, respectively."
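For comparison, here is a minimal Python sketch of the same substitution. Python's re module has no \L in replacement strings, so this version lowercases the second group in a replacement function, and \b stands in for Nedit's "<" word-start anchor. The function name is made up for illustration.

```python
import re

def title_case_upper_words(text: str) -> str:
    """Turn each ALL-CAPS word into Title Case (hypothetical helper)."""
    # \b approximates Nedit's "<" anchor; group 1 is kept, group 2 is lowercased,
    # which is what \1\L\2 does in the Nedit replace string.
    return re.sub(
        r"\b([A-Z])([A-Z]+)",
        lambda m: m.group(1) + m.group(2).lower(),
        text,
    )

print(title_case_upper_words("RESULTS of the TEST"))  # prints: Results of the Test
```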
Delete Cruft after a WikiWord
(after copying and pasting results from an inline search on a TWiki view page)
Aside: Now that I know / remember that I can extend a selection with the keyboard navigation keys it would be faster and easier to record a quick and dirty keyboard macro, especially as the " * " was added at the beginning of the line via a keyboard macro, and I will concatenate two adjacent lines together with a keyboard macro -- the keyboard macro can do it all in one step.
To convert this:
* EmailInterimUpdate 16 Aug 2002 - 03:06 - NEW RandyKramer
To this:
* EmailInterimUpdate --
I searched for:
\* (\w*) .*
and replaced it with:
\* \1 --
Notes:
- \w* found the wiki word (\w matches a single word character, equivalent to [a-zA-Z0-9_]; the * repeats it)
- the parentheses assigned the wiki word to \1
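The same search and replace can be tried outside Nedit. This is a small Python sketch, assuming the line looks like the sample above; the function name is hypothetical, and in a Python replacement string the * does not need escaping.

```python
import re

def strip_cruft(line: str) -> str:
    """Keep the bullet and the wiki word, replace the rest with ' --'."""
    # \* matches the literal bullet, (\w*) captures the wiki word as \1,
    # and " .*" swallows the date, time, and author that follow it.
    return re.sub(r"\* (\w*) .*", r"* \1 --", line)

sample = "* EmailInterimUpdate 16 Aug 2002 - 03:06 - NEW RandyKramer"
print(strip_cruft(sample))  # prints: * EmailInterimUpdate --
```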
Duplicate a TWiki Variable
I don't recall exactly why I was doing this. I think I was creating some documentation for TWiki (probably on my private TWiki or on WikiLearn) and wanted to show the name of the variable (hence the %NOTIFYTOPIC%), the current contents (hence WebNotify), and the comments (notify topic name (WebNotify)).
To change this:
%NOTIFYTOPIC% notify topic name (WebNotify)
To this:
%<nop>NOTIFYTOPIC% %NOTIFYTOPIC% notify topic name (WebNotify)
I searched for this:
(^%)([a-zA-Z]+%)
And replaced it with this:
\1<nop>\2 \1\2
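A Python sketch of the same replacement, for anyone who wants to run it over a whole file of variable lines. The function name is made up for illustration; re.MULTILINE is added (an assumption about the intended use) so that ^ matches at the start of every line, not just the start of the text.

```python
import re

def duplicate_variable(text: str) -> str:
    """Prefix each leading %VARIABLE% with an escaped %<nop>VARIABLE% copy."""
    # Group 1 is the opening %, group 2 is the name plus the closing %.
    return re.sub(
        r"(^%)([a-zA-Z]+%)",
        r"\1<nop>\2 \1\2",
        text,
        flags=re.MULTILINE,
    )

print(duplicate_variable("%NOTIFYTOPIC% notify topic name (WebNotify)"))
# prints: %<nop>NOTIFYTOPIC% %NOTIFYTOPIC% notify topic name (WebNotify)
```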
Removing Hard Line Breaks from Text
While trying to preserve proper spacing around punctuation.
This is the strategy I started to follow to remove newlines from a plain text file while trying to preserve proper spacing around punctuation. I tried this once and had some problems; it needs some troubleshooting and meticulous retrying. The intent is to follow the steps in order.
This could get even more complicated with end of sentence or mid sentence constructs like:
last word."
or
last word".
I haven't attempted to deal with quotes in the following table.
Initially, the best thing may be to handle this semi-automatically, until I see a real possibility for an automatic "algorithm" (the way I did similar things in Word).
BTW: The first and last steps are a big part of the "trick" to doing this. If you don't replace the \n\n with some unique string, then, when you start doing other substitutions, you may change some \n\n paragraph separators to single \n's. The last step puts the \n\n back in place of the unique string. I used to do things like this fairly often in Word for basically text-only files (no code, verbatim sections, or ASCII graphics), so I may have some new things to learn.
And, to repeat, the following needs testing!
| Step | Find | Replace | Notes |
| 1 | \n\n | <> | or some string that will be unique in the file |
| 2 | ". \n" | ". " | ignore quotes, they are just to show spaces |
| 3 | ". \n" | ". " | |
| 4 | ".\n" | ". " | |
| 5 - 10 | "" | "" | Repeat steps 2 thru 4 for ! and \? |
| 11 | ",\n" | ", " | |
| 12 | ";\n" | "; " | |
| 13 | ")\n" | ") " | Might be OK for plain text, not so sure for code or verbatim stuff |
| 14 - 16 | "" | "" | Repeat step 13 for >, }, and ] |
| ? | "<>" | "\n\n" | This is the last step |
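The steps in the table can be sketched in Python as follows. This is only a rough sketch under a few assumptions: the placeholder string and the function name are made up, the separate steps for ., !, and ? are folded into one pattern that tolerates trailing spaces, and (like the table) it does nothing about quotes or about lines that end in an ordinary word.

```python
import re

PLACEHOLDER = "<<PARA>>"  # hypothetical; any string that is unique in the file

def remove_hard_breaks(text: str) -> str:
    """Join lines that end in punctuation while keeping paragraph breaks."""
    # First step: protect paragraph separators before touching single newlines.
    text = text.replace("\n\n", PLACEHOLDER)
    # Steps 2-10: sentence-ending punctuation, optional spaces, then a newline.
    text = re.sub(r"([.!?]) *\n", r"\1 ", text)
    # Steps 11-16: comma, semicolon, and closing brackets followed by a newline.
    text = re.sub(r"([,;)>}\]])\n", r"\1 ", text)
    # Last step: put the paragraph separators back.
    return text.replace(PLACEHOLDER, "\n\n")

print(remove_hard_breaks("One.\nTwo,\nthree\n\nNext para."))
```

Without the placeholder, the substitution for ".\n" would also eat the first newline of any paragraph break that follows a period, which is exactly the problem the first and last steps avoid.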
Convert Snippets of an HTML File to TWiki Markup
The first aggravating problem I ran into was how to find line breaks with a regular expression.
I found out how to do it in Nedit — is it the same (or similar) in Perl, sed, awk?
Introduction and Status Overview
UPDATE: Attempting this was interesting, and could probably be done (I'll be adding some more notes about things I did), but, as it turned out, simply copying (copy and paste) from the regular browser window to Nedit made a file that seems highly readable and wraps lines to the window. (This as opposed to copying (copy and paste) from the View Source browser window, which resulted in a file with tons of HTML to deal with.)
I'm trying to convert some web pages on routing that won't wrap, to put on my private TWiki (at least) so they will wrap and be easier to read. In the course of doing that, I need to make the conversions listed below. I may think about making a Perl, awk, or sed script to make this easier in the future (for similar pages; however, that may not be a terrible limitation, as the page I'm working on came from the LDP and (IIUC) originates as SGML markup). Hmm, maybe I should start from the raw SGML instead of copying and pasting HTML from the web page?
How to Match Newlines
To make \s or . match a newline, prefix the regular expression with "?n" and enclose the entire thing in parentheses. (See the third screen of Nedit help for regular expressions for more explanation and variations.) The Nedit help describes ?n as a command to get \s and . to match newlines; the parentheses around the whole string are part of the syntax. I'm not clear why you'd need ?N, but it forces \s and . to not match newlines. I guess that would be useful if you had a more complicated situation and needed to nest the expressions.
How to construct the search string:
- copy the target string to a scratch area (if you put it in the search dialog, you will have to backspace to delete each newline character to see the entire string)
- insert a \ in front of all characters that need to be escaped (in these cases, mainly < and >)
- insert a \s in place of each newline character (now the string should be all on one line)
- insert a ?n in front of the entire string
- enclose the entire string in parentheses
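The recipe above can be mimicked in Python, partly answering the "is it the same elsewhere?" question for one language at least. In Python's re module, \s already matches newlines, so nothing like ?n is needed (for . you would pass re.DOTALL instead). The helper name below is made up for illustration.

```python
import re

def build_search_pattern(target: str) -> str:
    """Turn a multi-line target string into a one-line regular expression."""
    # Escape each line so special characters are taken literally, then put
    # a \s where each newline was, so the pattern sits on a single line.
    return r"\s".join(re.escape(line) for line in target.split("\n"))

pattern = build_search_pattern("</P\n><P\n>")
print(re.sub(pattern, "\n\n", "one</P\n><P\n>two") == "one\n\ntwo")  # prints: True
```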
These examples might take less space as a table, if I could put the multi-line strings in a table (or, if I didn't want to preserve the line breaks in the table).
Paragraph End and Start
String to find (including line breaks):
</P
><P
>
regular expression to match the string: (?n\</P\s\>\<P\s\>)
replace with: "\n\n" (ignore the " ")
Literal (=)
String to find (including line breaks): (Later, consider an optional leading ' (or `?).)
<TT
CLASS="LITERAL"
>
regular expression to match the string: (?n\<TT\sCLASS="LITERAL"\s\>)
replace with: "=" (ignore the " ")
/Literal (=)
String to find (including line breaks):
</TT
>
regular expression to match the string: (?n\</TT\s\>)
replace with: "=" (ignore the " ")
<pre>
String to find (including line breaks):
</P
><P
><PRE
CLASS="SCREEN"
>
regular expression to match the string: (?n\</P\s\>\<P\s\>\<PRE\sCLASS="SCREEN"\s\>)
replace with: "\n\n<pre>" (ignore the " ")
</pre>
String to find (including line breaks):
</PRE
></P
><P
>
regular expression to match the string: (?n\</PRE\s\>\</P\s\>\<P\s\>)
replace with: "</pre>\n\n" (ignore the " ")
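The five cases above could be applied in one pass with a small Python sketch like the one below. The rule list and function name are made up for illustration, and the patterns are Python translations of the Nedit expressions (no ?n wrapper, no backslashes before < and >). The longer <pre> patterns go first because they contain the paragraph pattern as a substring.

```python
import re

# (pattern, replacement) pairs; order matters, longest patterns first.
RULES = [
    (r'</P\s><P\s><PRE\sCLASS="SCREEN"\s>', "\n\n<pre>"),
    (r"</PRE\s></P\s><P\s>", "</pre>\n\n"),
    (r"</P\s><P\s>", "\n\n"),
    (r'<TT\sCLASS="LITERAL"\s>', "="),
    (r"</TT\s>", "="),
]

def html_to_twiki(html: str) -> str:
    """Apply each conversion rule in turn to a snippet of HTML source."""
    for pattern, replacement in RULES:
        html = re.sub(pattern, replacement, html)
    return html

sample = 'Use <TT\nCLASS="LITERAL"\n>ls</TT\n> here.</P\n><P\n>Next'
print(html_to_twiki(sample))
```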
Next Case
String to find (including line breaks):
regular expression to match the string:
replace with: "" (ignore the " ")
Convert a man page to plain text
- Create a file containing the text of the man page: man <man_page_name> >> <man_page_name>.txt
- Find "\n       " (a newline followed by seven spaces), replace with \n\n
- Find "       " (seven spaces), replace with " " (one space)
- Find <bs> (the backspace character), replace with nothing
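The three find and replace steps can be sketched in Python like this, assuming the man page text has already been saved to a file and read into a string. The function name is made up, and the steps are followed literally, in the order listed.

```python
def clean_man_text(text: str) -> str:
    """Apply the three find/replace steps to the text of a man page."""
    indent = " " * 7
    # Step 1: a newline followed by seven spaces becomes a paragraph break.
    text = text.replace("\n" + indent, "\n\n")
    # Step 2: any remaining run of seven spaces becomes a single space.
    text = text.replace(indent, " ")
    # Step 3: drop backspace characters (written \x08 in Python).
    return text.replace("\x08", "")

print(clean_man_text("NAME\n       ls - list\x08 files"))
```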
Tips
- "Search" (move cursor) to beginning of next word: do an RE search on "<"
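In Nedit, "<" anchors at the start of a word. A rough Python equivalent, as a sketch only, uses \b followed by a lookahead for a word character; the function name and the -1 return value for "no more words" are assumptions made for illustration.

```python
import re

def next_word_start(text: str, pos: int) -> int:
    """Return the index of the first word start after pos, or -1 if none."""
    # \b(?=\w) matches the empty position just before the first character
    # of each word, similar to searching for "<" in Nedit.
    for match in re.finditer(r"\b(?=\w)", text):
        if match.start() > pos:
            return match.start()
    return -1

print(next_word_start("foo bar baz", 0))  # prints: 4
```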
Contributors
- RandyKramer - 27 Sep 2001 (on home TWiki)
- RandyKramer - 24 Mar 2002 (transferred to WikiLearn)
- <If you edit this page, add your name here, move this to the next line>