Shallow Thoughts : : linux

Akkana's Musings on Open Source Computing, Science, and Nature.

Tue, 08 Jul 2014

Big and contrasty mouse cursors

[Big mouse cursor from Comix theme] My new home office with the big picture windows and the light streaming in come with one downside: it's harder to see my screen.

A sensible person would, no doubt, keep the shades drawn when working, or move the office to a nice dim interior room without any windows. But I am not sensible and I love my view of the mountains, the gorge and the birds at the feeders. So accommodations must be made.

The biggest problem is finding the mouse cursor. When I first sit down at my machine, I move my mouse wildly around looking for any motion on the screen. But the default cursors, in X and in most windows, are little subtle black things. They don't show up at all. Sometimes it takes half a minute to figure out where the mouse pointer is.

(This wasn't helped by a recent bug in Debian Sid where the USB mouse would disappear entirely, and need to be unplugged from USB and plugged back in before the computer would see it. I never did find a solution to that, and for now I've downgraded from Sid to Debian testing to make my mouse work. I hope they fix the bug in Sid eventually, rather than porting whatever "improvement" caused the bug to more stable versions. Dealing with that bug trained me so that when I can't see the mouse cursor, I always wonder whether I'm just not seeing it, or whether it really isn't there because the kernel or X has lost track of the mouse again.)

What I really wanted was bigger mouse cursor icons in bright colors that are visible against any background. This is possible, but it isn't documented at all. I did manage to get much better cursors, though different windows use different systems.

So I wrote up what I learned. It ended up too long for a blog post, so I put it on a separate page: X Cursor Themes for big and contrasty mouse cursors.

It turned out to be fairly complicated. You can replace the existing cursor font, or install new cursor "themes" that many (but not all) apps will honor. You can change theme name and size (if you choose a scalable theme), and some apps will honor that. You have to specify theme and size separately for GTK apps versus other apps. I don't know what KDE/Qt apps do.

I still have a lot of unanswered questions. In particular, I was unable to specify a themed cursor for xterm windows, and for non text areas in emacs and firefox, and I'd love to know how to do that.

But at least for now, I have a great big contrasty blue mouse cursor that I can easily see, even when I have the shades on the big windows open and the light streaming in.

Tags: ,
[ 10:25 Jul 08, 2014    More linux | permalink to this entry | comments ]

Sun, 15 Jun 2014

Vim: Set wrapping and indentation according to file type

Although I use emacs for most of my coding, I use vim quite a lot too, for quick edits, mail messages, and anything I need to edit when logged onto a remote server. In particular, that means editing my procmail spam filter files on the mail server.

The spam rules are mostly lists of regular expression patterns, and they can include long lines, such as:
gift ?card .*(Visa|Walgreen|Applebee|Costco|Starbucks|Whitestrips|free|Wal.?mart|Arby)

My default vim settings for editing text, including line wrap, don't work if get a flood of messages offering McDonald's gift cards and decide I need to add a "|McDonald" on the end of that long line.

Of course, I can type ":set tw=0" to turn off wrapping, but who wants to have to do that every time? Surely vim has a way to adjust settings based on file type or location, like emacs has.

It didn't take long to find an example of Project specific settings on the vim wiki. Thank goodness for the example -- I definitely wouldn't have figured that syntax out just from reading manuals. From there, it was easy to make a few modifications and set textwidth=0 if I'm opening a file in my procmail directory:

" Set wrapping/textwidth according to file location and type
function! SetupEnvironment()
  let l:path = expand('%:p')
  if l:path =~ '/home/akkana/Procmail'
    " When editing spam filters, disable wrapping:
    setlocal textwidth=0
endfunction
autocmd! BufReadPost,BufNewFile * call SetupEnvironment()

Nice! But then I remembered other cases where I want to turn off wrapping. For instance, editing source code in cases where emacs doesn't work so well -- like remote logins over slow connections, or machines where emacs isn't even installed, or when I need to do a lot of global substitutes or repetitive operations. So I'd like to be able to turn off wrapping for source code.

I couldn't find any way to just say "all source code file types" in vim. But I can list the ones I use most often. While I was at it, I threw in a special wrap setting for mail files:

" Set wrapping/textwidth according to file location and type
function! SetupEnvironment()
  let l:path = expand('%:p')
  if l:path =~ '/home/akkana/Procmail'
    " When editing spam filters, disable wrapping:
    setlocal textwidth=0
  elseif (&ft == 'python' || &ft == 'c' || &ft == 'html' || &ft == 'php')
    setlocal textwidth=0
  elseif (&ft == 'mail')
    " Slightly narrower width for mail (and override mutt's override):
    setlocal textwidth=68
  else
    " default textwidth slightly narrower than the default
    setlocal textwidth=70
  endif
endfunction
autocmd! BufReadPost,BufNewFile * call SetupEnvironment()

As long as we're looking at language-specific settings, what about doing language-specific indentation like emacs does? I've always suspected vim must have a way to do that, but it doesn't enable it automatically like emacs does. You need to set three variables, assuming you prefer to use spaces rather than tabs:

" Indent specifically for the current filetype
filetype indent on
" Set indent level to 4, using spaces, not tabs
set expandtab shiftwidth=4

Then you can also use useful commands like << and >> for in- and out-denting blocks of code, or ==, for indenting to the right level. It turns out vim's language indenting isn't all that smart, at least for Python, and gets the wrong answer a lot of them time. You can't rely on it as a syntax checker the way you can with emacs. But it's a lot better than no language-specific indentation.

I will be a much happier vimmer now!

Tags: , ,
[ 11:29 Jun 15, 2014    More linux/editors | permalink to this entry | comments ]

Tue, 29 Apr 2014

The evil HTML double-dash problem in Emacs is still there

Long ago (in 2006!), I blogged on an annoying misfeature of Emacs when editing HTML files: you can't type double dashes. Emacs sees them as an SGML comment and insists on indenting all subsequent lines in strange ways.

I wrote about finding a fix for the problem, involving commenting out four lines in sgml-mode.el. That file had a comment at the very beginning suggesting that they know about the problem and had guarded against it, but obviously it didn't work and the variable that was supposed to control the behavior had been overridden by other hardwired behaviors.

That fix has worked well for eight years. But just lately, I've been getting a lot of annoying warnings when I edit HTML files: "Error: autoloading failed to define function sgml_lexical_context". Apparently the ancient copy of sgml-mode.el that I'd been using all these years was no longer compatible with ... something else somewhere inside emacs. I needed to update it.

Maybe, some time during the intervening 8 years, they'd actually fixed the problem? I was hopeful. I moved my old patched sgml-mode.el aside and edited some files. But the first time I tried typing a double dashes -- like this, with text inside that's long enough to wrap to a new line -- I saw that the problem wasn't fixed at all.

I got a copy of the latest sgml-mode.el -- on Debian, that meant:

apt-get install emacs23-el
cp /usr/share/emacs/23.4/lisp/textmodes/sgml-mode.el.gz ~/.emacs-lisp
gunzip ~/.emacs-lisp/sgml-mode.el.gz
Then I edited the file and started searching for strings like font-lock and comment.

Unfortunately, the solution I documented in my old blog post is no longer helpful. The code has changed too much, and now there are many, many different places where automatic comment handling happens. I had to comment out each of them bit by bit before I finally found the section that's now causing the problem. Commenting out these lines fixed it:

   (set (make-local-variable 'indent-line-function) 'sgml-indent-line)
   (set (make-local-variable 'comment-start) "")
   (set (make-local-variable 'comment-indent-function) 'sgml-comment-indent)
   (set (make-local-variable 'comment-line-break-function)
        'sgml-comment-indent-new-line)

I didn't have to remove any .elc files, like I did in 2006; just putting the sgml-mode.el file in my Emacs load-path was enough. I keep all my customized Emacs code in a directory called .emacs-lisp, and in my .emacs I make sure it's in my path:

(setq load-path (cons "~/.emacs-lisp/" load-path))
And now I can type double dashes again. Whew!

Tags: , ,
[ 12:42 Apr 29, 2014    More linux/editors | permalink to this entry | comments ]

Sat, 28 Dec 2013

Finding filenames in a disorganized directory

I've been scanning a bunch of records with Audacity (using as a guide Carla Schroder's excellent Book of Audacity and a Behringer UCA222 USB audio interface -- audacity doesn't seem able to record properly from the built-in sound card on any laptop I own, while it works fine with the Behringer.

Audacity's user interface isn't great for assembly-line recording of lots of tracks one after the other, especially on a laptop with a trackpad that doesn't work very well, so I wasn't always as organized with directory names as I could have been, and I ended up with a mess. I was periodically backing up the recordings to my desktop, but as I shifted from everything-in-one-directory to an organized system, the two directories got out of sync.

To get them back in sync, I needed a way to answer this question: is every file inside directory A (maybe in some subdirectory of it) also somewhere under subdirectory B? In other words, can I safely delete all of A knowing that anything in it is safely stored in B, even though the directory structures are completely different?

I was hoping for some clever find | xargs way to do it, but came up blank. So eventually I used a little zsh loop: one find to get the list of files to test, then for each of those, another find inside the target directory, then test the exit code of find to see if it found the file. (I'm assuming that if the songname.aup file is there, the songname_data directory is too.)

for fil in $(find AAA/ -name '*.aup'); do
  fil=$(basename $fil)
  find BBB -name $fil >/dev/null
  if [[ $? != 0 ]]; then
    echo $fil is not in BBB
  fi
done

Worked fine. But is there an easier way?

Tags: , , ,
[ 10:36 Dec 28, 2013    More linux/cmdline | permalink to this entry | comments ]

Wed, 13 Nov 2013

Does scrolling output make a program slower? Followup.

Last week I wrote about some tests I'd made to answer the question Does scrolling output make a program slower? My test showed that when running a program that generates lots of output, like an rsync -av, the rsync process will slow way down as it waits for all that output to scroll across whatever terminal client you're using. Hiding the terminal helps a lot if it's an xterm or a Linux console, but doesn't help much with gnome-terminal.

A couple of people asked in the comments about the actual source of the slowdown. Is the original process -- the rsync, or my test script, that's actually producing all that output -- actually blocking waiting for the terminal? Or is it just that the CPU is so busy doing all that font rendering that it has no time to devote to the original program, and that's why it's so much slower?

I found pingu on IRC (thanks to JanC) and the group had a very interesting discussion, during which I ran a series of additional tests.

In the end, I'm convinced that CPU allocation to the original process is not the issue, and that output is indeed blocked waiting for the terminal to display the output. Here's why.

First, I installed a couple of performance meters and looked at the CPU load while rendering. With conky, CPU use went up equally (about 35-40%) on both CPU cores while the test was running. But that didn't tell me anything about which processes were getting all that CPU.

htop was more useful. It showed X first among CPU users, xterm second, and my test script third. However, the test script never got more than 10% of the total CPU during the test; X and xterm took up nearly all the remaining CPU.

Even with the xterm hidden, X and xterm were the top two CPU users. But this time the script, at number 3, got around 30% of the CPU rather than 10%. That still doesn't seem like it could account for the huge difference in speed (the test ran about 7 times faster with xterm hidden); but it's interesting to know that even a hidden xterm will take up that much CPU.

It was also suggested that I try running it to /dev/null, something I definitely should have thought to try before. The test took .55 seconds with its output redirected to /dev/null, and .57 seconds redirected to a file on disk (of course, the kernel would have been buffering, so there was no disk wait involved). For comparison, the test had taken 56 seconds with xterm visible and scrolling, and 8 seconds with xterm hidden.

I also spent a lot of time experimenting with sleeping for various amounts of time between printed lines. With time.sleep(.0001) and xterm visible, the test took 104.71 seconds. With xterm shaded and the same sleep, it took 98.36 seconds, only 6 seconds faster. Redirected to /dev/null but with a .0001 sleep, it took 97.44 sec.

I think this argues for the blocking theory rather than the CPU-bound one: the argument being that the sleep gives the program a chance to wait for the output rather than blocking the whole time. If you figure it's CPU bound, I'm not sure how you'd explain the result.

But a .0001 second sleep probably isn't very accurate anyway -- we were all skeptical that Linux can manage sleep times that small. So I made another set of tests, with a .001 second sleep every 10 lines of output. The results: 65.05 with xterm visible; 63.36 with xterm hidden; 57.12 to /dev/null. That's with a total of 50 seconds of sleeping included (my test prints 500000 lines). So with all that CPU still going toward font rendering, the visible-xterm case still only took 7 seconds longer than the /dev/null case. I think this argues even more strongly that the original test, without the sleep, is blocking, not CPU bound.

But then I realized what the ultimate test should be. What happens when I run the test over an ssh connection, with xterm and X running on my local machine but the actual script running on the remote machine?

The remote machine I used for the ssh tests was a little slower than the machine I used to run the other tests, but that probably doesn't make much difference to the results.

The results? 60.29 sec printing over ssh (LAN) to a visible xterm; 7.24 sec doing the same thing with xterm hidden. Fairly similar to what I'd seen before when the test, xterm and X were all running on the same machine.

Interestingly, the ssh process during the test took 7% of my CPU, almost as much as the python script was getting before, just to transfer all the output lines so xterm could display them.

So I'm convinced now that the performance bottleneck has nothing to do with the process being CPU bound and having all its CPU sucked away by rendering the output, and that the bottleneck is in the process being blocked in writing its output while waiting for the terminal to catch up.

I'd be interested it hear further comments -- are there other interpretations of the results besides mine? I'm also happy to run further tests.

Tags: , , ,
[ 17:19 Nov 13, 2013    More linux | permalink to this entry | comments ]

Fri, 08 Nov 2013

Does scrolling output make a program slower?

While watching my rsync -av messages scroll by during a big backup, I wondered, as I often have, whether that -v (verbose) flag was slowing my backup down.

In other words: when you run a program that prints lots of output, so there's so much output the terminal can't display it all in real-time -- like an rsync -v on lots of small files -- does the program wait ("block") while the terminal catches up?

And if the program does block, can you speed up your backup by hiding the terminal, either by switching to another desktop, or by iconifying or shading the terminal window so it's not visible? Is there any difference among the different ways of hiding the terminal, like switching desktops, iconifying and shading?

Since I've never seen a discussion of that, I decided to test it myself. I wrote a very simple Python program:

import time

start = time.time()

for i in xrange(500000):
    print "Now we have printed", i, "relatively long lines to stdout."

print time.time() - start, "seconds to print", i, "lines."

I ran it under various combinations of visible and invisible terminal. The results were striking. These are rounded to the nearest tenth of a second, in most cases the average of several runs:

Terminal type Seconds
xterm, visible 56.0
xterm, other desktop 8.0
xterm, shaded 8.5
xterm, iconified 8.0
Linux framebuffer, visible 179.1
Linux framebuffer, hidden 3.7
gnome-terminal, visible 56.9
gnome-terminal, other desktop 56.7
gnome-terminal, iconified 56.7
gnome-terminal, shaded 43.8

Discussion:

First, the answer to the original question is clear. If I'm displaying output in an xterm, then hiding it in any way will make a huge difference in how long the program takes to complete.

On the other hand, if you use gnome-terminal instead of xterm, hiding your terminal window won't make much difference. Gnome-terminal is nearly as fast as xterm when it's displaying; but it apparently lacks xterm's smarts about not doing that work when it's hidden. If you use gnome-terminal, you don't get much benefit out of hiding it.

I was surprised how slow the Linux console was (I'm using the framebuffer in the Debian 3.2.0-4-686-pae on Intel graphics). But it's easy to see where that time is going when you watch the output: in xterm, you see lots of blank space as xterm skips drawing lines trying to keep up with the program's output. The framebuffer doesn't do that: it prints and scrolls every line, no matter how far behind it gets.

But equally interesting is how much faster the framebuffer is when it's not visible. (I typed Ctrl-alt-F2, logged in, ran the program, then typed Ctrl-alt-F7 to go back to X while the program ran.) Obviously xterm is doing some background processing that the framebuffer console doesn't need to do. The absolute time difference, less than four seconds, is too small to worry about, but it's interesting anyway.

I would have liked to try it my test a base Linux console, with no framebuffer, but figuring out how to get a distro kernel out of framebuffer mode was a bigger project than I wanted to tackle that afternoon.

I should mention that I wasn't super-scientific about these tests. I avoided doing any heavy work on the machine while the tests were running, but I was still doing light editing (like this article), reading mail and running xchat. The times for multiple runs were quite consistent, so I don't think my light system activity affected the results much.

So there you have it. If you're running an output-intensive program like rsync -av and you care how fast it runs, use either xterm or the console, and leave it hidden most of the time.

Tags: , , ,
[ 15:17 Nov 08, 2013    More linux | permalink to this entry | comments ]

Sat, 24 Aug 2013

A nifty shell redirection trick: process substitution

I love shell pipelines, and flatter myself that I'm pretty good at them. But a discussion last week on the Linuxchix Techtalk mailing list on finding added lines in a file turned up a terrific bash/zsh shell redirection trick I'd never seen before:

join -v 2 <(sort A.txt) <(sort B.txt)

I've used backquotes, and their cognate $(), plenty. For instance, you can do things like PS1=$(hostname): or PS1=`hostname`: to set your prompt to the current hostname: the shell runs the hostname command, takes its output, and substitutes that output in place of the backquoted or parenthesized expression.

But I'd never seen that <(...) trick before, and immediately saw how useful it was. Backquotes or $() let you replace arguments to a command with a program's output -- they're great for generating short strings for programs that take all their arguments on the command line. But they're no good for programs that need to read a file, or several files. <(...) lets you take the output of a command and pass it to a program as though it was the contents of a file. And if you can do it more than once in the same command -- as in Little Girl's example -- that could be tremendously useful.

Playing with it to see if it really did what it looked like it did, and what other useful things I could do with it, I tried this (and it worked just fine):

$ diff <(echo hello; echo there) <(echo hello; echo world)
2c2
< there
---
> world
It acts as though I had two files, which each have "hello" as their first line; but one has "there" as the second line, while the other has "world". And diff shows the difference. I don't think there's any way of doing anything like that with backquotes; you'd need to use temp files.

Of course, I wanted to read more about it -- how have I gone all these years without knowing about this? -- and it looks like I'm not the only one who didn't know about it. In fact, none of the pages I found on shell pipeline tricks even mentioned it.

It turns out it's called "process substitution" and I found it documented in Chapter 23 of the Advanced Bash-Scripting Guide.

I tweeted it, and a friend who is a zsh master gave me some similar cool tricks. For instance, in zsh echo hi > >(cat) > >(cat -n) lets you pipe the output of a command to more than one other command.

That's zsh, but in bash (or zsh too, of course), you can use >() and tee to do the same thing: echo hi | tee >(cat) | cat -n

If you want a temp file to be created automatically, one you can both read and write, you can use =(foo) (zsh only?)

Great stuff! Some other pages that discuss some of these tricks:

Tags: , , ,
[ 19:23 Aug 24, 2013    More linux/cmdline | permalink to this entry | comments ]

Wed, 24 Jul 2013

Yet more on that comma-inserting regexp, plus a pattern to filter unprintable characters

One more brief followup on that comma inserting sed pattern and its followup:

$ echo 20130607215015 | sed ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta'
20,130,607,215,015

In the second article, I'd mentioned that the hardest part of the exercise was figuring out where we needed backslashes. Devdas (f3ew) asked on Twitter whether I would still need all the backslash escapes even if I put the pattern in a file -- in other worse, are the backslashes merely to get the shell to pass special characters unchanged?

A good question, and I suspected the need for some of the backslashes would disappear. So I tried this:

$ echo ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta' >/tmp/commas   
$ echo 20130607215015 | sed -f /tmp/commas

And it didn't work. No commas were inserted.

The problem, it turns out, is that my shell, zsh, changed both instances of \b to an ASCII backspace, ^H. Editing the file fixes that, and so does

$ echo -E ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta' >/tmp/commas   

But that only applies to echo: zsh doesn't do the \b -> ^H substitution in the original command, where you pass the string directly as a sed argument.

Okay, with that straightened out, what about Devdas' question?

Surprisingly, it turns out that all the backslashes are still needed. None of them go away when you echo > file, so they weren't there just to get special characters past the shell; and if you edit the file and try removing some of the backslashes, you'll see that the pattern no longer works. I had thought at least some of them, like the ones before the \{ \}, were extraneous, but even those are still needed.

Filtering unprintable characters

As long as I'm writing about regular expressions, I learned a nice little tidbit last week. I'm getting an increasing flood of Asian-language spams which my mail ISP doesn't filter out (they use spamassassin, which is pretty useless for this sort of filtering). I wanted a simple pattern I could pass to egrep (via procmail) that would filter out anything with a run of more than 4 unprintable characters in a row. [^[:print:]]{4,} should do it, but it wasn't working.

The problem, it turns out, is the definition of what's printable. Apparently when the default system character set is UTF-8, just about everything is considered printable! So the trick is that you need to set LC_ALL to something more restrictive, like C (which basically means ASCII) to before :print: becomes useful for language-based filtering. (Thanks to Mikachu for spotting the problem).

So in a terminal, you can do something like

LC_ALL=C egrep -v '[^[:print:]]' filename

In procmail it was a little harder; I couldn't figure out any way to change LC_ALL from a procmail recipe; the only solution I came up with was to add this to ~/.procmailrc:

export LC_ALL=C

It does work, though, and has cut the spam load by quite a bit.

Tags: , , , ,
[ 19:35 Jul 24, 2013    More linux/cmdline | permalink to this entry | comments ]

Syndicated on:
LinuxChix Live
Ubuntu Women
Women in Free Software
Graphics Planet
DevChix
Ubuntu California
Planet Openbox
Devchix
Planet LCA2009

Friends' Blogs:
Morris "Mojo" Jones
Jane Houston Jones
Dan Heller
Long Live the Village Green
Ups & Downs
DailyBBG

Other Blogs of Interest:
DevChix
Scott Adams
Dave Barry
BoingBoing

Powered by PyBlosxom.