Shallow Thoughts : tags : regexp
Akkana's Musings on Open Source Computing, Science, and Nature.
Sat, 19 Jan 2013
I'm fiddling with a serial motor controller board, trying to get it
working with a Raspberry Pi. (It works nicely with an Arduino, but
one thing I'm learning is that everything hardware-related is far
easier with Arduino than with RPi.)
The excellent Arduino library helpfully
provided by Pololu
has a list of all the commands the board understands. Since it's Arduino,
they're in C++, and look something like this:
#define QIK_GET_FIRMWARE_VERSION 0x81
#define QIK_GET_ERROR_BYTE 0x82
#define QIK_GET_CONFIGURATION_PARAMETER 0x83
[ ... ]
#define QIK_CONFIG_DEVICE_ID 0
#define QIK_CONFIG_PWM_PARAMETER 1
and so on.
On the Arduino side, I'd prefer to use Python, so I need to get
them to look more like:
QIK_GET_FIRMWARE_VERSION = 0x81
QIK_GET_ERROR_BYTE = 0x82
QIK_GET_CONFIGURATION_PARAMETER = 0x83
[ ... ]
QIK_CONFIG_DEVICE_ID = 0
QIK_CONFIG_PWM_PARAMETER = 1
... and so on ... with an indent at the beginning of each line since
I want this to be part of a class.
There are 32 #defines, so of course, I didn't want to make all those
changes by hand. So I used vim. It took a little fiddling -- mostly
because I'd forgotten that vim doesn't offer + to mean "one or more
repetitions", so I had to use * instead. Here's the expression
I ended up with:
.,$s/\#define *\([A-Z0-9_]*\) *\(.*\)/ \1 = \2/
In English, you can read this as:
From the current line to the end of the file (,.$/),
look for a pattern
consisting of only capital letters, digits and underscores
([A-Z0-9_]). Save that as expression #1 (\( \)).
Skip over any spaces, then take the rest of the line (.*),
and call it expression #2 (\( \)).
Then replace all that with a new line consisting of 4 spaces,
expression 1, a spaced-out equals sign, and expression 2
( \1 = \2).
Who knew that all you needed was a one-line regular expression to
translate C into Python?
(Okay, so maybe it's not quite that simple.
Too bad a regexp won't handle the logic inside the library as well,
and the pin assignments.)
Tags: regexp, linux, programming, python, editors, vim
[
20:38 Jan 19, 2013
More linux/editors |
permalink to this entry |
comments
]
Sun, 18 Dec 2011
A friend had a fun problem: she had some XML files she needed to
import into GNUcash, but the program that produced them left names
in all-caps and she wanted them more readable. So she'd have a file
like this:
<STMTTRN>
<TRNTYPE>DEBIT
<DTPOSTED>20111125000000[-5:EST]
<TRNAMT>-22.71
<FITID>****
<NAME>SOME COMPANY
<MEMO>SOME COMPANY ANY TOWN CA 11-25-11 330346
</STMTTRN>
and wanted to change the NAME and MEMO lines to read
Some Company and Any Town. However, the tags, like <NAME>,
all had to remain upper case, and presumably so did strings like DEBIT.
How do you change just the NAME and MEMO lines from upper case to title case?
The obvious candidate to do string substitutes is sed.
But there are several components to the problem.
Addresses
First, how do you ensure the replacement only happens on lines with
NAME and MEMO?
sed lets you specify address ranges for just that purpose.
If you say sed 's/xxx/yyy/' sed will change all xxx's
to yyy; but if you say sed '/NAME/s/xxx/yyy/'
then sed will only do that substitution on lines containing NAME.
But we need this to happen on lines that contain either NAME or MEMO.
How do you do that? With \|, like this:
sed '/\(NAME\|MEMO\)/s/xxx/yyy/'
Converting to title case
Next, how do you convert upper case to lower case?
There's a
sed
command for that: \L. Run
sed 's/.*/\L&/' and type some upper and lower case
characters, and they'll all be converted to lower-case.
But here we want title case -- we want most of each word converted
to lowercase, but the first letter should stay uppercase.
That means we need to detect a word and figure out which is the
first letter.
In the strings we're considering, a word is a set of letters A through Z
with one of the following characteristics:
- It's preceded by a space
- It's preceded by a close-angle-bracket, >
So the pattern /[ >][A-Z]*/ will match anything we consider a word
that might need conversion.
But we need to separate the first letter and the rest of the word,
so we can treat them separately. sed's \( \) operators will let us do that.
The pattern \([ >][A-Z]\) finds the first letter of a word (including
the space or > preceding it), and saves that as its first matched
pattern, \1.
Then \([A-Z]*\) right after it will save the rest of the word as \2.
So, taking our \L case converter, we can convert to title case like this:
sed 's/\([ >][A-Z]\)\([A-Z]*\)/\1\L\2/g
Starting to look long and scary, right? But it's not so bad if you build
it up gradually from components. I added a g on the end to tell sed
this is a global replace: do the operation on every word it finds in
the line, otherwise it will only make the substitution once, on the
first word it sees, then quit.
Putting it together
So we know how to seek out specific lines, and how to convert to
title case. Put the two together, and you get the final command:
sed '/\(NAME\|MEMO\)/s/\([ >][A-Z]\)\([A-Z]*\)/\1\L\2/g'
I ran it on the test input, and it worked just fine.
For more information on sed, a good place to start is the
sed
regular expressions manual.
Tags: regexp, cmdline, sed
[
13:13 Dec 18, 2011
More linux/cmdline |
permalink to this entry |
comments
]
Tue, 15 Mar 2011
It's another episode of "How to use Linux to figure out CarTalk puzzlers"!
This time you don't even need any programming.
Last week's puzzler was
A
Seven-Letter Vacation Curiosity. Basically, one couple hiking
in Northern California and another couple carousing in Florida
both see something described by a seven-letter word containing
all five vowels -- but the two things they saw were very different.
What's the word?
That's an easy one to solve using basic Linux command-line skills --
assuming the word is in the standard dictionary. If it's some esoteric
word, all bets are off. But let's try it and see. It's a good beginning
exercise in regular expressions and how to use the command line.
There's a handy word list in /usr/share/dict/words, one word per line.
Depending on what packages you have installed, you may have bigger
dictionaries handy, but you can usually count on /usr/share/dict/words
being there on any Linux system. Some older Unix systems may have it in
/usr/dict/words instead.
We need a way to choose all seven letter words.
That's easy. In a regular expression, . (a dot) matches one letter.
So ....... (seven dots) matches any seven letters.
(There's a more direct way to do that: the expression .\{7\}
will also match 7 letters, and is really a better way. But personally,
I find it harder both to remember and to type than the seven dots.
Still, if you ever need to match 43 characters, or 114, it's good to know the
"right" syntax.)
Fine, but if you grep ....... /usr/share/dict/words
you get a list of words with seven or more letters. See why?
It's because grep prints any line where it finds a match -- and a
word with nine letters certainly contains seven letters within it.
The pattern you need to search for is '^.......$' -- the up-caret ^
matches the beginning of a line, and the dollar sign $ matches the end.
Put single quotes around the pattern so the shell won't try to interpret
the caret or dollar sign as special characters. (When in doubt, it's
always safest to put single quotes around grep patterns.)
So now we can view all seven-letter words:
grep '^.......$' /usr/share/dict/words
How do we choose only the ones that contain all the letters a e i o and u?
That's easy enough to build up using pipelines, using the pipe
character | to pipe the output of one grep into a different grep.
grep '^.......$' /usr/share/dict/words | grep a
sends that list of 7-letter words through another grep command to
make sure you only see words containing an a.
Now tack a grep for each of the other letters on the end, the same way:
grep '^.......$' /usr/share/dict/words | grep a | grep e | grep i | grep o | grep u
Voilà! I won't spoil the puzzler, but there are two words that
match, and one of them is obviously the answer.
The power of the Unix command line to the rescue!
Tags: cmdline, regexp, linux, shell, pipelines, puzzles
[
10:00 Mar 15, 2011
More linux/cmdline |
permalink to this entry |
comments
]
Mon, 29 Mar 2010
I maintain the websites for several clubs. No surprise there -- anyone
with even a moderate knowledge of HTML, or just a lack of fear of
it, invariably gets drafted into that job in any non-computer club.
In one club, the person in charge of scheduling sends out an elaborate
document every three months in various formats -- Word, RTF, Excel, it's
different each time. The only regularity is that it's always full of
crap that makes it hard to turn it into a nice simple HTML table.
This quarter, the formats were Word and RTF. I used unrtf to turn
the RTF version into HTML -- and found a horrifying page full of
lines like this:
<body><font size=3></font><font size=3><br>
</font><font size=3></font><b><font size=4></font></b><b><font size=4><table border=2>
</font></b><b><font size=4><tr><td><b><font size=4><font face="Arial">Club Schedule</font></font></b><b><font size=4></font></b><b><font size=4></font></b></td>
<font size=3></font><font size=3><td><font size=3><b><font face="Arial">April 13</font></b></font><font size=3></font><font size=3><br>
</font><font size=3></font><font size=3><b></b></font></td>
I've put the actual page content in bold; the rest is just junk,
mostly doing nothing, mostly not even legal HTML,
that needs to be eliminated if I want
the page to load and display reasonably.
I didn't want to clean up that mess by hand! So I needed some regular
expressions to clean it up in an editor.
I tried emacs first, but emacs makes it hard to try an expression then
modify it a little when the first try doesn't work, so I switched to vim.
The key to this sort of cleanup is non-greedy regular expressions.
When you have a bad tag sandwiched in the middle of a line containing
other tags, you want to remove everything from the <font
through the next > -- but no farther, or else you'll delete
real content. If you have a line like
<td><font size=3>Hello<font> world</td>
you only want to delete through the <font>, not through the </td>.
In general, you make a regular expression non-greedy by adding a ?
after the wildcard -- e.g. <font.*?>. But that doesn't work
in vim. In vim, you have to use \{M,N} which matches
from M to N repetitions of whatever immediately precedes it.
You can also use the shortcut \{-} to mean the same thing
as *? (0 or more matches) in other programs.
Using that, I built up a series of regexp substitutes to clean up
that unrtf mess in vim:
:%s/<\/\{0,1}font.\{-}>//g
:%s/<b><\/b>//g
:%s/<\/b><b>//g
:%s/<\/i><i>//g
:%s/<td><\/td>/<td><br><\/td>/g
:%s/<\/\{0,1}span.\{-}>//g
:%s/<\/\{0,1}center>//g
That took care of 90% of the work, leaving me with hardly any cleanup
I needed to do by hand. I'll definitely keep that list around for
the next time I need to do this.
Tags: regexp, html, editors, vim
[
22:02 Mar 29, 2010
More linux/editors |
permalink to this entry |
comments
]
Sun, 12 Oct 2008
Someone on LinuxChix' techtalk list asked whether she could get
tcsh to print "[no output]" after any command that doesn't produce
output, so that when she makes logs to help her co-workers, they
will seem clearer.
I don't know of a way to do that in any shell (the shell would have
to capture the output of every command; emacs' shell-mode does that
but I don't think any real shells do) but it seemed like it ought
to be straightforward enough to do as a regular expression substitute
in vi. You're looking for lines where a line beginning with a prompt
is followed immediately by another line beginning with a prompt;
the goal is to insert a new line consisting only of "[no output]"
between the two lines.
It turned out to be pretty easy in vim. Here it is:
:%s/\(^% .*$\n\)\(% \)/\1[no results]\r\2/
Explanation:
- :
- starts a command
- %
- do the following command on every line of this file
- s/
- start a global substitute command
- \(
- start a "capture group" -- you'll see what it does soon
- ^
- match only patterns starting at the beginning of a line
- %
- look for a % followed by a space (your prompt)
- .*
- after the prompt, match any other characters until...
- $
- the end of the line, after which...
- \n
- there should be a newline character
- \)
- end the capture group after the newline character
- \(
- start a second capture group
- %
- look for another prompt. In other words, this whole
- expression will only match when a line starting with a prompt
- is followed immediately by another line starting with a prompt.
- \)
- end the second capture group
- /
- We're finally done with the mattern to match!
- Now we'll start the replacement pattern.
- \1
- Insert the full content of the first capture group
- (this is also called a "backreference" if you want
- to google for a more detailed explanation).
- So insert the whole first command up to the newline
- after it.
- [no results]
- After the newline, insert your desired string.
- \r
- insert a carriage return here (I thought this should be
- \n for a newline, but that made vim insert a null instead)
- \2
- insert the second capture group (that's just the second prompt)
- /
- end of the substitute pattern
Of course, if you have a different prompt, substitute it for "% ".
If you have a complicated prompt that includes time of day or
something, you'll have to use a slightly more complicated match
pattern to match it.
Tags: regexp, shell, CLI, linux, editors
[
13:34 Oct 12, 2008
More linux/editors |
permalink to this entry |
comments
]
Sun, 31 Aug 2008
I wanted to get a list of who'd been contributing the most in a
particular open source project. Most projects of any size have a
ChangeLog file, in which check-ins have entries like this:
2008-08-26 Jane Hacker <hacker@domain.org>
* src/app/print.c: make sure the Portrait and Landscape
* buttons update according to the current setting.
I wanted to take each entry, save the name of the developer checking
in, then eventually count the number of times each name occurs (the
number of times that developer checked in) and print them in order
from most check-ins to least.
Getting the names is easy: for check-ins in the last 9 years, I just
want the lines that start with "200". (Of course, if I wanted earlier
check-ins I could make the match more general.)
grep "^200" ChangeLog
But now I want to trim the line so it includes only the
contributor's name. A bit of sed geekery can do that: the date is a
fixed format (four characters, a dash, two, dash, two, then two
spaces, so "^....-..-.. " matches that pattern.
But I want to remove the email address part too
(sometimes people use different email addresses
when they check in). So I want a sed pattern that will match
something at the front (to discard), something in the middle (keep that part)
and something at the end (discard).
Here's how to do that in sed:
grep "^200" ChangeLog | sed 's/^....-..-.. \(.*\)<.*$/\1/'
In English, that says: "For each line in the ChangeLog that starts
with 200, find a pattern at the beginning consisting of any four
characters, a dash, two characters, dash, two characters, dash, and
two spaces; then immediately after that, save all characters up to
a < symbol; then throw away the < and any characters that follow
until the end of the line."
That works pretty well! But it's not quite right: it includes the
two spaces after the name as part of the name. In sed, \s matches
any space character (like space or tab).
So you'd think this should work:
grep "^200" ChangeLog | sed 's/^....-..-.. \(.*\)\s+<.*$/\1/'
\s+ means it will require that at least one and maybe more space
characters immediately before the < are also discarded.
But it doesn't work. It turns out the reason is that the \(.*\)
expression is "greedier" than the \s+: so the saved name expression
grabs the first space, leaving only the second to the \s+.
The way around that is to make the name expression specify that it
can't end with a space. \S is the term for "anything that's not a
space character"; so the expression becomes
grep "^200" ChangeLog | sed 's/^....-..-.. \(.*\S\)\s\+<.*$/\1/'
(the + turned out to need a backslash before it).
We have the list of names! Add a | sort on the end to
sort them alphabetically -- that will make sure you get all the
"Jane Hacker" lines listed together. But how to count them?
The Unix program most frequently invoked after sort
is uniq, which gets rid of all the repeated lines.
On a hunch, I checked out the man page, man uniq,
and found the -c option: "prefix lines by the number of occurrences".
Perfect! Then just sort them by the number, from largest to
smallest:
grep "^200" ChangeLog | sed 's/^....-..-.. \(.*\S\)\s+<.*$/\1/' | sort | uniq -c | sort -rn
And we're done!
Now, this isn't perfect since it doesn't catch "Checking in patch
contributed by susan@otherhost.com" attributions -- but those aren't in
a standard format in most projects, so they have to be handled by hand.
Disclaimer: Of course, number of check-ins is not a good measure of
how important or productive someone is. You can check in a lot of
one-line fixes, or you can write an important new module and submit
it for someone else to merge in. The point here wasn't to rank
developers, but just to get an idea who was checking into the tree
and how often.
Well, that ... and an excuse to play with nifty Linux shell pipelines.
Tags: shell, CLI, linux, pipelines, regexp
[
11:12 Aug 31, 2008
More linux |
permalink to this entry |
comments
]
Thu, 20 Dec 2007
I had a chance to spend a day at the AGU conference last week. The
American Geophysical Union is a fabulous conference -- something like
14,000 different talks over the course of the week, on anything
related to earth or planetary sciences -- geology, solar system
astronomy, atmospheric science, geophysics, geochemistry, you name it.
I have no idea how regular attendees manage the information overload
of deciding which talks to attend. I wasn't sure how I would, either,
but I started by going
through the schedule
for the day I'd be there, picking out a (way too long) list of
potentially interesting talks, and saving them as lines in a file.
Now I had a file full of lines like:
1020 U22A MS 303 Terrestrial Impact Cratering: New Insights Into the Cratering Process From Geophysics and Geochemistry II
Fine, except that I couldn't print out something like that -- printers
stop at 80 columns. I could pass it through a program like "fold" to
wrap the long lines, but then it would be hard to scan through quickly
to find the talk titles and room numbers. What I really wanted was to
wrap it so that the above line turned into something like:
1020 U22A MS 303 Terrestrial Impact Cratering: New Insights
Into the Cratering Process From Geophysics
and Geochemistry II
But how to do that? I stared at it for a while, trying to figure out
whether there was a clever vim substitute that could handle it.
I asked on a couple of IRC channels, just in case there was some
amazing Linux smart-wrap utility I'd never heard of.
I was on the verge of concluding that the answer was no, and that I'd
have to write a python script to do the wrapping I wanted, when
Mikael emitted a burst of line noise:
%s/\(.\{72\}\)\(.*\)/\1^M^I^I^I\2/
Only it wasn't line noise. Seems Mikael just happened to have been
reading about some of the finer points of vim regular expressions
earlier that day, and he knew exactly the trick I needed -- that
.\{72\}, which matches lines that are at least 72
characters long. And amazingly, that expression did something very
close to what I wanted.
Or at least the first step of it. It inserts the first line break,
turning my line into
1020 U22A MS 303 Terrestrial Impact Cratering: New Insights
Into the Cratering Process From Geophysics and Geochemistry II
but I still needed to wrap the second and subsequent lines.
But that was an easier problem -- just do essentially the same thing
again, but limit it to only lines starting with a tab.
After some tweaking, I arrived at exactly what I wanted:
%s/^\(.\{,65\}\) \(.*\)/\1^M^I^I^I\2/
%g/^^I^I^I.\{58\}/s/^\(.\{,55\}\) \(.*\)/\1^M^I^I^I\2/
I had to run the second line two or three times to wrap the very long
lines.
Devdas helpfully translated the second one into English:
"You have 3 tabs, followed by 58 characters, out of
which you match the first 55 and put that bit in $1, and the capture
the remaining in $2, and rewrite to $1 newline tab tab tab $2."
Here's a more detailed breakdown:
Line one:
| % | Do this over the whole file
|
|---|
| s/ | Begin global substitute
|
|---|
| ^ | Start at the beginning of the line
|
|---|
| \( | Remember the result of the next match
|
|---|
| .\{,65\}_ | Look for up to 65 characters with a space at the end
|
|---|
| \) \( | End of remembered pattern #1, skip a space, and
start remembered pattern #2
|
|---|
| .*\) | Pattern #2 includes everything to the end of the line
|
|---|
| / | End of matched pattern; begin replacement pattern
|
|---|
| \1^M | Insert saved pattern #1 (the first 65 lines ending with a
space) followed by a newline
|
|---|
| ^I^I^I\2 | On the second line, insert three tabs then
saved pattern #2
|
|---|
| / | End replacement pattern
|
|---|
Line two:
| %g/ | Over the whole file, only operate on lines with this pattern
|
|---|
| ^^I^I^I | Lines starting with three tabs
|
|---|
| .\{58\}/ | After the tabs, only match lines that still have at
least 58 characters
(this guards against wrapping already wrapped lines
when it's run repeatedly)
|
|---|
| s/ | Begin global substitute
|
|---|
| ^ | Start at the beginning of the line
|
|---|
| \( | Remember the result of the next match
|
|---|
| .\{,55\} | Up to 55 characters
|
|---|
| \) \( | End of remembered pattern #1, skip a space, and
start remembered pattern #2
|
|---|
| .*\) | Pattern #2 includes everything to the end of the line
|
|---|
| / | End of matched pattern; begin replacement pattern
|
|---|
| \1^M | The first pattern (up to 55 chars) is one line
|
|---|
| ^I^I^I\2 | Three tabs then the second pattern
|
|---|
| / | End replacement pattern
|
|---|
Greedy and non-greedy brace matches
The real key is those curly-brace expressions, \{,65\}
and \{58\} -- that's how you control how many characters
vim will match and whether or not the match is "greedy".
Here's how they work (thanks to Mikael for explaining).
The basic expression is {M,N} --
it means between M and N matches of whatever precedes it.
(Vim requires that the first brace be escaped -- \{}. Escaping the
second brace is optional.)
So .{M,N} can match anything between M and N characters
but "prefer" N, i.e. try to match as many as possible up to N.
To make it "non-greedy" (match as few as possible, "preferring" M),
use .{-M,N}
You can leave out M, N, or both; M defaults to 0 while N defaults to
infinity. So {} is short for {0,∞} and is
equivalent to *, while {-} means {-0,∞}, like a non-greedy
version of *.
Given the string: one, two, three, four, five
| ,.\{}, | matches , two, three, four,
|
|---|
| ,.\{-}, | matches , two,
|
|---|
| ,.\{5,}, | matches , two, three, four,
|
|---|
| ,.\{-5,}, | matches , two, three,
|
|---|
| ,.\{,2}, | matches nothing
|
|---|
| ,.\{,7}, | matches , two,
|
|---|
| ,.\{5,7}, | matches , three,
|
|---|
Of course, this syntax is purely for vim; regular expressions are
unfortunately different in sed, perl and every other program.
Here's a fun
table of
regexp terms in various programs.
Tags: linux, editors, regexp
[
11:44 Dec 20, 2007
More linux/editors |
permalink to this entry |
comments
]
Sun, 14 May 2006
I had a page of plaintext which included some URLs in it, like this:
Tour of the Hayward Fault
http://www.mcs.csuhayward.edu/~shirschf/tour-1.html
Technical Reports on Hayward Fault
http://quake.usgs.gov/research/geology/docs/lienkaemper_docs06.htm
I wanted to add links around each of the urls, so that I could make
it part of a web page, more like this:
Tour of the Hayward Fault
http://www.mcs.csu
hayward.edu/~shirschf/tour-1.html
Technical Reports on Hayward Fault
htt
p://quake.usgs.gov/research/geology/docs/lienkaemper_docs06.htm
Surely there must be a program to do this, I thought. But I couldn't
find one that was part of a standard Linux distribution.
But you can do a fair job of linkifying just using a regular
expression in an editor like vim or emacs, or by using sed or perl from
the commandline. You just need to specify the input pattern you want
to change, then how you want to change it.
Here's a recipe for linkifying with regular expressions.
Within vim:
:%s_\(https\=\|ftp\)://\S\+_<a href="&">&</a>_
If you're new to regular expressions, it might be helpful to see a
detailed breakdown of why this works:
- :
- Tell vim you're about to type a command.
- %
- The following command should be applied everywhere in the file.
- s_
- Do a global substitute, and everything up to the next underscore
will represent the pattern to match.
- \(
- This will be a list of several alternate patterns.
- http
- If you see an "http", that counts as a match.
- s\=
- Zero or one esses after the http will match: so http and https are
okay, but httpsssss isn't.
- \|
- Here comes another alternate pattern that you might see instead
of http or https.
- ftp
- URLs starting with ftp are okay too.
- \)
- We're done with the list of alternate patterns.
- ://
- After the http, https or ftp there should always be a colon-slash-slash.
- \S
- After the ://, there must be a character which is not whitespace.
- \+
- There can be any number of these non-whitespace characters as long
as there's at least one. Keep matching until you see a space.
- _
- Finally, the underscore that says this is the end of the pattern
to match. Next (until the final underscore) will be the expression
which will replace the pattern.
- <a href="&">
- An ampersand, &, in a substitute expression means "insert
everything that was in the original pattern". So the whole url will
be inserted between the quotation marks.
- &</a>
- Now, outside the <a href="..."> tag, insert the matched url
again, and follow it with a </a> to close the tag.
- _
- The final underscore which says "this is the end of the
replacement pattern". We're done!
Linkifying from the commandline using sed
Sed is a bit trickier: it doesn't understand \S for
non-whitespace, nor = for "zero or one occurrence".
But this expression does the trick:
sed -e 's_\(http\|https\|ftp\)://[^ \t]\+_<a href="&">&</a>_' <infile.txt >outfile.html
Addendum: George
Riley tells me about
VST for Vim 7,
which looks like a nice package to linkify, htmlify, and various
other useful things such as creating HTML presentations.
I don't have Vim 7 yet, but once I do I'll definitely check out VST.
Tags: linux, editors, pipelines, regexp, shell, CLI
[
12:40 May 14, 2006
More linux/editors |
permalink to this entry |
comments
]
Mon, 10 Oct 2005
Ever want to look for something in your browser cache, but when you
go there, it's just a mass of oddly named files and you can't figure
out how to find anything?
(Sure, for whole pages you can use the History window, but what if
you just want to find an image you saw this morning
that isn't there any more?)
Here's a handy trick.
First, change directory to your cache directory (e.g.
$HOME/.mozilla/firefox/blahblah/Cache).
Next, list the files of the type you're looking for, in the order in
which they were last modified, and save that list to a file. Like this:
% file `ls -1t` | grep JPEG | sed 's/: .*//' > /tmp/foo
In English:
ls -t lists in order of modification date, and -1 ensures
that the files will be listed one per line. Pass that through
grep for the right pattern (do a file * to see what sorts of
patterns get spit out), then pass that through sed to get rid of
everything but the filename. Save the result to a temporary file.
The temp file now contains the list of cache files of the type you
want, ordered with the most recent first. You can now search through
them to find what you want. For example, I viewed them with Pho:
pho `cat /tmp/foo`
For images, use whatever image viewer you normally use; if you're
looking for text, you can use grep or whatever search you lke.
Alternately, you could
ls -lt `cat foo` to see what was
modified when and cut down your search a bit further, or any
other additional paring you need.
Of course, you don't have to use the temp file at all. I could
have said simply:
pho `ls -1t` | grep JPEG | sed 's/: .*//'`
Making the temp file is merely for your convenience if you think you
might need to do several types of searches before you find what
you're looking for.
Tags: tech, web, mozilla, firefox, pipelines, CLI, shell, regexp
[
21:40 Oct 10, 2005
More tech/web |
permalink to this entry |
comments
]