Shallow Thoughts : tags : performance
Akkana's Musings on Open Source Computing and Technology, Science, and Nature.
Wed, 13 Nov 2013
Last week I wrote about some tests I'd made to answer the question
Does
scrolling output make a program slower?
My test showed that when running a program that generates lots of output,
like an rsync -av, the rsync process will slow way down as it waits for
all that output to scroll across whatever terminal client you're using.
Hiding the terminal helps a lot if it's an xterm or a Linux console,
but doesn't help much with gnome-terminal.
A couple of people asked in the comments about the actual source of
the slowdown. Is the original process -- the rsync, or my test script,
whatever is producing all that output -- actually blocking, waiting
for the terminal? Or is it just that the CPU is so busy doing all that
font rendering that it has no time to devote to the original program,
and that's why it's so much slower?
I found pingu on IRC (thanks to JanC) and the group had a very
interesting discussion, during which I ran a series of additional tests.
In the end, I'm convinced that CPU allocation to the original process
is not the issue, and that output is indeed blocked waiting for the
terminal to display the output. Here's why.
First, I installed a couple of performance meters and looked at the
CPU load while rendering. With conky, CPU use went up equally (about
35-40%) on both CPU cores while the test was running. But that didn't
tell me anything about which processes were getting all that CPU.
htop was more useful. It showed X first among CPU users, xterm second,
and my test script third. However, the test script never got more than
10% of the total CPU during the test; X and xterm took up nearly all
the remaining CPU.
Even with the xterm hidden, X and xterm were the top two CPU users.
But this time the script, at number 3, got around 30% of the CPU
rather than 10%. That still doesn't seem like it could account for the
huge difference in speed (the test ran about 7 times faster with xterm
hidden); but it's interesting to know that even a hidden xterm will
take up that much CPU.
It was also suggested that I try redirecting the output to /dev/null,
something I definitely should have thought to try before.
The test took .55 seconds with its output redirected to /dev/null,
and .57 seconds redirected to a file on disk (of course, the kernel
would have been buffering, so there was no disk wait involved).
For comparison, the test had taken 56 seconds with xterm visible
and scrolling, and 8 seconds with xterm hidden.
I also spent a lot of time experimenting with sleeping for various
amounts of time between printed lines.
With time.sleep(.0001) and xterm visible, the test took 104.71 seconds.
With xterm shaded and the same sleep, it took 98.36 seconds, only 6 seconds
faster. Redirected to /dev/null but with a .0001 sleep, it took 97.44 sec.
I think this argues for the blocking theory rather than the CPU-bound one:
the sleeps give the terminal a chance to catch up with the output,
so the program doesn't spend the whole time blocked on its writes.
If you figure it's CPU bound, I'm not sure how you'd explain the result.
But a .0001 second sleep probably isn't very accurate anyway -- we
were all skeptical that Linux can manage sleep times that small.
So I made another set of tests, with a .001 second sleep every 10
lines of output. The results:
65.05 with xterm visible; 63.36 with xterm hidden; 57.12 to /dev/null.
That's with a total of 50 seconds of sleeping included
(my test prints 500000 lines).
So with all that CPU still going toward font rendering, the
visible-xterm case still only took 7 seconds longer than the /dev/null
case. I think this argues even more strongly that the original test,
without the sleep, is blocking, not CPU bound.
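For the record, the "sleep every 10 lines" variant looked roughly like
this -- the script from the original post (below on this page) with a
short sleep added inside the loop:

import time
start = time.time()
for i in xrange(500000):
    print "Now we have printed", i, "relatively long lines to stdout."
    if i % 10 == 0:
        time.sleep(.001)
print time.time() - start, "seconds to print", i, "lines."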
But then I realized what the ultimate test should be. What happens when
I run the test over an ssh connection, with xterm and X running on my
local machine but the actual script running on the remote machine?
The remote machine I used for the ssh tests was a little slower than the
machine I used to run the other tests, but that probably doesn't make
much difference to the results.
The results? 60.29 sec printing over ssh (LAN) to a visible xterm;
7.24 sec doing the same thing with xterm hidden. Fairly similar to
what I'd seen before when the test, xterm and X were all running on the
same machine.
Interestingly, the ssh process during the test took 7% of my CPU,
almost as much as the python script was getting before, just to
transfer all the output lines so xterm could display them.
So I'm convinced now that the performance bottleneck has nothing to do
with the process being CPU bound and having all its CPU sucked away by
rendering the output, and that the bottleneck is in the process being
blocked in writing its output while waiting for the terminal to catch up.
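For anyone who wants to see that kind of blocking in isolation, here's
a little stand-alone sketch (not my test script, just an illustration,
and Linux/Unix only): it writes lines into a pipe whose reader is
deliberately slow, and times each individual write. Once the pipe
buffer fills up, the writes themselves stall -- which is just what a
slow terminal does to a chatty program.

import os, time

r, w = os.pipe()
pid = os.fork()
if pid == 0:                      # child: play the part of a slow "terminal"
    os.close(w)
    while os.read(r, 4096):
        time.sleep(0.01)          # pretend rendering each chunk takes a while
    os._exit(0)

os.close(r)
line = b"x" * 80 + b"\n"
slowest = 0.0
for i in range(5000):
    t0 = time.time()
    os.write(w, line)             # blocks once the pipe buffer is full
    slowest = max(slowest, time.time() - t0)
os.close(w)
os.waitpid(pid, 0)
print("slowest single write: %.3f sec" % slowest)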
I'd be interested to hear further comments -- are there other
interpretations of the results besides mine?
I'm also happy to run further tests.
Tags: performance, linux, programming, python
[ 17:19 Nov 13, 2013 | More linux | permalink to this entry ]
Fri, 08 Nov 2013
While watching my rsync -av messages scroll by during a big backup,
I wondered, as I often have, whether that -v (verbose) flag was slowing
my backup down.
In other words: when you run a program that prints so much output
that the terminal can't display it all in real time
-- like an rsync -v on lots of small files --
does the program wait ("block") while the terminal catches up?
And if the program does block, can you speed up your backup by
hiding the terminal, either by switching to another desktop, or by
iconifying or shading the terminal window so it's not visible?
Is there any difference among the different ways of hiding the
terminal, like switching desktops, iconifying and shading?
Since I've never seen a discussion of that, I decided to test it myself.
I wrote a very simple Python program:
import time
start = time.time()
for i in xrange(500000):
    print "Now we have printed", i, "relatively long lines to stdout."
print time.time() - start, "seconds to print", i, "lines."
I ran it under various combinations of visible and invisible terminal.
The results were striking.
These are rounded to the nearest tenth of a second, in most cases
the average of several runs:
Terminal type                  | Seconds
xterm, visible                 |    56.0
xterm, other desktop           |     8.0
xterm, shaded                  |     8.5
xterm, iconified               |     8.0
Linux framebuffer, visible     |   179.1
Linux framebuffer, hidden      |     3.7
gnome-terminal, visible        |    56.9
gnome-terminal, other desktop  |    56.7
gnome-terminal, iconified      |    56.7
gnome-terminal, shaded         |    43.8
Discussion:
First, the answer to the original question is clear. If I'm displaying
output in an xterm, then hiding it in any way will make a huge
difference in how long the program takes to complete.
On the other hand, if you use gnome-terminal instead of xterm,
hiding your terminal window won't make much difference.
Gnome-terminal is nearly as fast as xterm when it's displaying;
but it apparently lacks xterm's smarts about not doing
that work when it's hidden. If you use gnome-terminal,
you don't get much benefit out of hiding it.
I was surprised how slow the Linux console was (I'm using the framebuffer
with the Debian 3.2.0-4-686-pae kernel, on Intel graphics). But it's easy to see
where that time is going when you watch the output: in xterm, you see
lots of blank space as xterm skips drawing lines trying to keep up
with the program's output. The framebuffer doesn't do that:
it prints and scrolls every line, no matter how far behind it gets.
But equally interesting is how much faster the framebuffer is when
it's not visible. (I typed Ctrl-alt-F2, logged in, ran the program,
then typed Ctrl-alt-F7 to go back to X while the program ran.)
Obviously xterm is doing some background processing that the framebuffer
console doesn't need to do. The absolute time difference, less than four
seconds, is too small to worry about, but it's interesting anyway.
I would have liked to try my test on a bare Linux console, with no framebuffer,
but figuring out how to get a distro kernel out of framebuffer mode was
a bigger project than I wanted to tackle that afternoon.
I should mention that I wasn't super-scientific about these tests.
I avoided doing any heavy work on the machine while the tests were running,
but I was still doing light editing (like this article), reading mail and
running xchat. The times for multiple runs were quite consistent, so I
don't think my light system activity affected the results much.
So there you have it. If you're running an output-intensive program
like rsync -av and you care how fast it runs, use either xterm or the
console, and leave it hidden most of the time.
Tags: performance, linux, programming, python
[ 15:17 Nov 08, 2013 | More linux | permalink to this entry ]
Fri, 11 Sep 2009
Linux Planet requested an article on multicore processors and how to
make the most of them. As it happens, I've been playing with that
anyway lately, so I was happy to oblige:
Get
the Most Out of Your Multicore Processor:
Two heads are better than one!
Tags: writing, linux, performance
[ 22:59 Sep 11, 2009 | More writing | permalink to this entry ]
Thu, 27 Aug 2009
Part 2 of my Linux bloat article looks at information you can get
from the kernel via some useful files in /proc, at three scripts
that display that info, and also at how to use exmap, an app and
kernel module that shows you a lot more about what resources your
apps are using.
How
Do You Really Measure Linux Bloat?
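Not one of the three scripts from the article, but just to give a
taste of the kind of information sitting in /proc, here's a minimal
sketch that prints each process's name and resident set size (VmRSS)
straight out of /proc/PID/status:

import os

for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open("/proc/%s/status" % pid) as f:
            fields = dict(line.split(":", 1) for line in f if ":" in line)
        print("%6s  %-18s %s" % (pid, fields["Name"].strip(),
                                 fields.get("VmRSS", "").strip()))
    except (IOError, KeyError):
        continue    # process exited, or nothing readable there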
Tags: writing, programming, linux, performance, bloat
[ 20:52 Aug 27, 2009 | More writing | permalink to this entry ]
Thu, 13 Aug 2009
Continuing my Linux Planet series on Linux performance monitoring,
the latest article looks at bloat and how you can measure it:
Finding
and Trimming Linux Bloat.
This one just covers the basics.
The followup article, in two weeks, will dive into more detail
on how to analyze what resources programs are really using.
Tags: writing, programming, linux, performance, bloat
[ 11:27 Aug 13, 2009 | More writing | permalink to this entry ]
Tue, 21 Jul 2009
It's been a day -- or week, month -- of performance monitoring.
I'm posting this
while sitting in an excellent OSCON tutorial on Linux
System and Network Performance Monitoring, by
Darren Hoch.
It's full of great information and I'm sure his web site is
equally useful.
And it's a great extension to a topic that's been occupying me
over the past few months: performance tracking to slim down
software that might be slowing a Linux system down.
That's the topic of one of my two OSCON talks this Wednesday:
"Featherweight Linux: How to turn a netbook or older laptop into a Ferrari."
Although I don't go into anywhere near the detail Darren does,
a lot of the principles are the same, and I know I'll find a use
for a lot of his techniques. The talk also includes a free bonus
tourist tip for San Jose visitors.
Today's Linux Planet article is related to my Featherweight talk:
What's
Bogging Down Your Linux PC? Tracking Down Resource Hogs.
Usually they publish my articles on Thursdays, but I asked for an
early release since it's related to tomorrow's talk.
For anyone at OSCON in San Jose, I hope you can come to Featherweight late
Wednesday afternoon, or to my other talk, Wednesday just after lunch,
"Bug Fixing for Everyone (even non-programmers!)" where I'll go over
the steps programmers use while fixing bugs, and show that anyone can
fix simple bugs even without any prior knowledge of programming.
Tags: linux, performance, conferences, oscon09
[ 11:58 Jul 21, 2009 | More conferences | permalink to this entry ]
Thu, 02 Jul 2009
Suspend (sleep) works very well on the dual-Atom desktop. The only
problem with it is that the mouse or keyboard wake it up. I don't mind
the keyboard, but the mouse is quite sensitive, so a breeze through
the window or a big truck driving by on the street can jiggle the
mouse and wake the machine when I'm away.
I've been through all the BIOS screens looking for a setting to flip,
but there's nothing there. Some web searching told me that under
Windows, there's a setting you can change that will affect this,
but I couldn't find anything similar for Linux, until finally
drc clued me in to /proc/acpi/wakeup.
cat /proc/acpi/wakeup
will tell you all the events that can cause your machine to wake up
from various sleep states.
Unfortunately, they're also obscurely coded. Here are mine:
Device  S-state  Status    Sysfs node
SLPB      S4    *enabled
P32       S4     disabled  pci:0000:00:1e.0
UAR1      S4     enabled   pnp:00:0a
PEX0      S4     disabled  pci:0000:00:1c.0
PEX1      S4     disabled
PEX2      S4     disabled  pci:0000:00:1c.2
PEX3      S4     disabled  pci:0000:00:1c.3
PEX4      S4     disabled
PEX5      S4     disabled
UHC1      S3     disabled  pci:0000:00:1d.0
UHC2      S3     disabled  pci:0000:00:1d.1
UHC3      S3     disabled  pci:0000:00:1d.2
UHC4      S3     disabled  pci:0000:00:1d.3
EHCI      S3     disabled  pci:0000:00:1d.7
AC9M      S4     disabled
AZAL      S4     disabled  pci:0000:00:1b.0
What do all those symbols mean? I have no clue. Apparently the codes
come from the BIOS's DSDT code, and since it varies from board to
board, nobody has published tables of likely translations.
The only two wakeups that were enabled for me were SLPB and UAR1.
SLPB apparently stands for SLeeP Button, and Rik suggested UAR
probably stood for Universal Asynchronous Receiver (the more familiar
term UART both receives and Transmits.)
Some of the other devices in the list can possibly be identified by
comparing their pci: codes against lspci, but not those two.
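Here's a quick throwaway sketch (nothing polished, just a convenience)
that looks up each pci: node from /proc/acpi/wakeup with lspci, to
help match devices to names:

import subprocess

with open("/proc/acpi/wakeup") as f:
    lines = f.readlines()[1:]      # skip the header line

for line in lines:
    fields = line.split()
    if len(fields) < 3:
        continue
    device, state, status = fields[0], fields[1], fields[2]
    desc = ""
    if fields[-1].startswith("pci:"):
        slot = fields[-1][len("pci:"):]        # e.g. 0000:00:1e.0
        out = subprocess.check_output(["lspci", "-D", "-s", slot])
        desc = out.decode().strip()
    print("%-5s %-3s %-9s %s" % (device, state, status, desc))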
Time for some experimentation.
You can toggle any of these by writing to the wakeup device:
echo UAR1 >/proc/acpi/wakeup
It turned out that to disable mouse and keyboard wakeup, I had to
disable both SLPB and UAR1. With both disabled, the machine wakes
up when I press the power button.
(What the SLeeP Button is, if it's not the power button, I don't know.)
My mouse and keyboard are PS/2. For a USB mouse and keyboard, look
for something like USB0, UHC0, USB1.
The UAR1 setting persists across boots: there's nothing more you need
to do for that one. But the SLPB setting resets every time I boot.
So I edited /etc/rc.local and
added this line:
echo SLPB >/proc/acpi/wakeup
Tags: linux, kernel, performance
[ 10:21 Jul 02, 2009 | More linux/kernel | permalink to this entry ]
Sun, 11 Jan 2009
Update: For details on how to edit Firefox history rather than just
delete it, see this later post:
Removing Bad Autocompletes from Firefox's Location Bar.
Firefox decided, some time ago, that whenever I try to type in a local
file pathname, as soon as I start typing /home/... I must be
looking for one specific file: an article I wrote over two months
ago and am long since done with.
Usually it happens when I'm trying to preview a new article.
I no longer have any interest in my local copy of that old article;
it's not bookmarked or anything like that; I haven't viewed it in
quite some time. But try to tell Firefox that. It's convinced that
the old one (why that one, and not one of the more recent ones?)
is what I want.
A recursive grep in ~/.mozilla/firefox showed that the only reference
to the old unwanted file was in the binary file places.sqlite.
My places.sqlite was 11Mb. A look through the Prefs window
showed that the default setting was to store history for a minimum of
90 days. That seemed rather excessive, so I reduced it drastically.
But that didn't reduce the size of the file any, nor did it banish
the spurious URLbar suggestion when I typed /home/....
After some discussion with folks on IRC, it developed that Firefox
may never actually reduce the size of the places.sqlite file at all.
Even if it did reduce the amount of data in the file (which it's not
clear it does), it never tells sqlite to compact the file to use less
space. Apparently there was some work on that about a year ago, but it
was slow and unreliable and they never got it working, and
eventually gave up on it.
You can run an sqlite compaction by hand (make sure to exit your
running firefox first!):
sqlite3 places.sqlite vacuum
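If you'd rather do it from Python, the standard sqlite3 module can run
the same VACUUM command (again, exit firefox first, and run this in
your profile directory):

import sqlite3

conn = sqlite3.connect("places.sqlite")
conn.execute("VACUUM")
conn.close()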
But vacuuming really didn't help much. It reduced the size of the file
from 11 to 8.8 Mb (even after I'd cut the number of days of history
firefox was supposed to store to less than a third of the original),
and it didn't get rid of that spurious suggestion.
So the only remaining option seemed to be to remove the file.
It stores both history and bookmarks, so it's best to back up
bookmarks before removing it. I backed up bookmarks to the
.json format firefox likes to use for backups, and also exported
them to a more human (and browser) readable bookmarks.html.
Then I removed the places.sqlite file.
Success! The spurious recommendation was gone. Typing seems faster
too (fewer of those freezes while the "awesomebar" searches through
its list of recommendations).
So I guess firefox can't be trusted to clean up after itself,
and users who care have to do that manually.
It remains to be seen how much the file will grow now. I expect
periodic vacuumings or removals will still be warranted if I don't
want a huge file; but it's pretty easy to do, and firefox found the
bookmarks backup and reloaded them without any extra work on my part.
In the meantime, I made a new bookmark -- hidden in my bookmarklets
menu so it doesn't clutter the main bookmarks menu -- to the
directory where I preview articles I'm writing. That ought to help a
bit with future URLbar suggestions.
Tags: firefox, mozilla, performance
[ 12:00 Jan 11, 2009 | More tech/web | permalink to this entry ]
Fri, 31 Oct 2008
Quite a while ago I noticed that drag-n-drop of images from Firefox
had stopped working for me in GIMP's trunk builds (2.6 and 2.7);
it failed with a "file not found" error. Opening URIs with
Open
Location also failed in the same way.
Since I don't run a gnome desktop, I assumed it probably had something
to do with requiring gnome-vfs services that I don't have. But
yesterday I finally got some time to chase it down with help from
various folk on #gimp.
I had libgnomevfs (and its associated dev package) installed on my
Ubuntu Hardy machine, but I didn't have gvfs. It was suggested that
I install the gvfs-backends package. I tried that, but it
didn't help; apparently gvfs requires not just libgvfs and
gvfs-backends, but also running a new daemon, gvfsd.
Finding an alternative was starting to sound appealing.
Turns out gimp now has three compile-time
configure options related to opening URIs:
--without-gvfs build without GIO/GVfs support
--without-gnomevfs build without gnomevfs support
--without-libcurl build without curl support
These correspond to four URI-getting methods in the source, in
plug-ins/file-uri:
- uri-backend-gvfs.c
- uri-backend-gnomevfs.c
- uri-backend-libcurl.c
- uri-backend-wget.c
GIMP can degrade from gvfs to gnomevfs to libcurl to wget, but only at
compile time, not at runtime: only one of the four is built.
On my desktop machine, --without-gvfs
was all I needed.
Even without running the gnome desktop, the gnomevfs
front-end seems to work fine. But it's good to know about the other
options too, in case I need to make a non-gnomevfs version to run on
the laptop or other lightweight machines.
Tags: gimp, desktop, performance, gnome
[ 12:09 Oct 31, 2008 | More gimp | permalink to this entry ]
Sat, 16 Aug 2008
Last night Joao and I were on IRC helping someone who was learning
to write gimp plug-ins. We got to talking about pixel operations and
how to do them in Python. I offered my arclayer.py as an example of
using pixel regions in gimp, but added that C is a lot faster for
pixel operations. I wondered if reading directly from the tiles
(then writing to a pixel region) might be faster.
But Joao knew a still faster way. As I understand it, one major reason
Python is slow at pixel region operations compared to a C plug-in is
that Python only writes to the region one pixel at a time, while C can
write batches of pixels by row, column, etc. But it turns out you
can grab a whole pixel region into a Python array, manipulate it as
an array then write the whole array back to the region. He thought
this would probably be quite a bit faster than writing to the pixel
region for every pixel.
He showed me how to change the arclayer.py code to use arrays,
and I tried it on a few test layers. Was it faster?
I made a test I knew would take a long time in arclayer,
a line of text about 1500 pixels wide. Tested it in the old arclayer;
it took just over a minute to calculate the arc. Then I tried Joao's
array version: timing with my wristwatch stopwatch, I call it about
1.7 seconds. Wow! That might be faster than the C version.
The updated, fast version (0.3) of arclayer.py is on my
arclayer page.
If you just want the trick to using arrays, here it is:
from array import array

[ ... setting up ... ]

# initialize the regions and get their contents into arrays:
srcRgn = layer.get_pixel_rgn(0, 0, srcWidth, srcHeight,
                             False, False)
src_pixels = array("B", srcRgn[0:srcWidth, 0:srcHeight])

dstRgn = destDrawable.get_pixel_rgn(0, 0, newWidth, newHeight,
                                    True, True)
p_size = len(srcRgn[0,0])
dest_pixels = array("B", "\x00" * (newWidth * newHeight * p_size))

[ ... then inside the loop over x and y ... ]

    src_pos = (x + srcWidth * y) * p_size
    dest_pos = (newx + newWidth * newy) * p_size

    newval = src_pixels[src_pos: src_pos + p_size]
    dest_pixels[dest_pos : dest_pos + p_size] = newval

[ ... when the loop is all finished ... ]

# Copy the whole array back to the pixel region:
dstRgn[0:newWidth, 0:newHeight] = dest_pixels.tostring()
Good stuff!
Tags: gimp, python, programming, performance
[ 22:02 Aug 16, 2008 | More gimp | permalink to this entry ]
Sat, 20 Oct 2007
I remember a few years ago the Mozilla folks were making a lot of
noise about the "blazingly fast Back/Forward" that was coming up
in the (then) next version of Firefox. The idea was that the layout
engine was going to remember how the page was laid out (technically,
there would be a "frame cache" as opposed to the normal cache which
only remembers the HTML of the page). So when you click the Back
button, Firefox would remember everything it knew about that page --
it wouldn't have to parse the HTML again or figure out how to lay
out all those tables and images, it would just instantly display
what the page looked like last time.
Time passed ... and Back/Forward didn't get faster. In fact, they
got a lot slower. The "Blazingly Fast Back" code did get checked in
(here's
how to enable it)
but somehow it never seemed to make any difference.
The problem, it turns out, is that the landing of
bug
101832 added code to respect a couple of HTTP Cache-Control header
settings, no-store and no-cache. There's also a
third cache control header, must-revalidate, which is similar
(the difference among the three settings is fairly subtle, and
Firefox seems to treat them pretty much the same way).
Translated, that means that web servers, when they send you a page,
can send some information along with the page that asks the browser
"Please don't keep a local copy of this page -- any time you want
it again, go back to the web and get a new copy."
There are pages for which this makes sense. Consider a secure bank
site. You log in, you do your banking, you view your balance and other
details, you log out and go to lunch ... then someone else comes by
and clicks Back on your browser and can now see all those bank
pages you were just viewing. That's why banks like to set no-cache
headers.
But those are secure pages (https, not http). There are probably
reasons for some non-secure pages to use no-cache or no-store
... um ... I can't think of any offhand, but I'm sure there are some.
But for most pages it's just silly. If I click Back, why wouldn't I
want to go back to the exact same page I was just looking at?
Why would I want to wait for it to reload everything from the server?
The problem is that modern Content Management Systems (CMSes) almost
always set one or more of these headers. Consider the
Linux.conf.au site.
Linux.conf.au is one of the most clueful, geeky conferences around.
Yet the software running their site sets
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
on every page. I'm sure this isn't intentional -- it makes no sense for
a bunch of basically static pages showing information about a
conference several months away. Drupal, the CMS used by
LinuxChix sets
Cache-Control: must-revalidate
-- again, pointless.
All it does is make you afraid to click on links because then
if you want to go Back it'll take forever. (I asked some Drupal
folks about this and they said it could be changed with
drupal_set_header).
(By the way, you can check the http headers on any page with:
wget -S -O /dev/null http://...
or, if you have curl,
curl --head http://...
)
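Or, from Python, a rough stdlib-only sketch along the same lines
(using the conference site mentioned above as the example URL):

import urllib2    # Python 2; in Python 3 the module is urllib.request

resp = urllib2.urlopen("http://linux.conf.au/")
print("Cache-Control: %s" % resp.info().getheader("Cache-Control"))
print("Pragma: %s" % resp.info().getheader("Pragma"))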
Here's an excellent summary of the options in an
Opera developer's
blog, explaining why the way Firefox handles caching is not only
unfriendly to the user, but also wrong according to the specs.
(Darn it, reading sensible articles like that makes me wish I wasn't
so deeply invested in Mozilla technology -- Opera cares so much
more about the user experience.)
But, short of a switch to Opera, how could I fix it on my end?
Google wasn't any help, but I figured that this must be a reported
Mozilla bug, so I turned to Bugzilla and found quite a lot.
Here's the scoop. First, the code to respect the cache settings
(slowing down Back/Forward) was apparently added in response to bug 101832.
People quickly noticed the performance problem, and filed
bug 112564.
(This was back in late 2001.) There was a long debate,
but in the end, a fix was checked in which allowed no-cache http
(non-secure) sites to cache and get a fast Back/Forward.
This didn't help no-store and must-revalidate sites, which
were still just as slow as ever.
Then a few months later,
bug
135289 changed this code around quite a bit. I'm still getting
my head around the code involved in the two bugs, but I think this
update didn't change the basic rules governing which
pages get revalidated.
(Warning: geekage alert for next two paragraphs.
Use this fix at your own risk, etc.)
Unfortunately, it looks like the only way to fix this is in the
C++ code. For folks not afraid of building Firefox, the code lives in
nsDocShell::ShouldDiscardLayoutState and controls the no-cache and
no-store directives. In nsDocShell::ShouldDiscardLayoutState
(currently line 8224, but don't count on it), the final line is:
return (noStore || (noCache && securityInfo));
Change that to
return ((noStore || noCache) && securityInfo);
and Back/Forward will get instantly faster, while still preserving
security for https. (If you don't care about that security issue
and want pages to cache no matter what, just replace the whole
function with
return PR_FALSE;
)
The must-revalidate setting is handled in a completely different
place, in nsHttpChannel.
However, for some reason, fixing nsDocShell also fixes Drupal pages
which set only must-revalidate. I don't quite understand why yet.
More study required.
(End geekage.)
Any Mozilla folks are welcome to tell me why I shouldn't be doing
this, or if there's a better way (especially if it's possible in a
way that would work from an extension or preference).
I'd also be interested in hearing from Drupal or other CMS folks defending why
so many CMSes destroy the user experience like this. But please first
read the Opera article referenced above, so that you understand why I
and so many other users have complained about it. I'm happy to share
any comments I receive (let me know if you want your comments to
be public or not).
Tags: tech, web, mozilla, firefox, performance
[ 20:32 Oct 20, 2007 | More tech/web | permalink to this entry ]