Shallow Thoughts : : Jan

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Fri, 25 Jan 2019

Announcing the New Mexico Bill Tracker

For the last few weeks I've been consumed with a project I started last year and then put aside for a while: a bill tracker.

The project sprang out of frustration at the difficulty of following bills as they pass through the New Mexico legislature. Bills I was interested in would die in committee, or they would make it to a vote, and I'd read about it a few days later and wish I'd known that it was a good time to write my representative or show up at the Roundhouse to speak. (I've never spoken at the Roundhouse, and whether I'd have the courage to actually do it remains to be seen, but at least I'd like to have the chance to decide.)

New Mexico has a Legislative web site where you can see the status of each bill, and they even offer a way to register and save a list of bills; but then there's no way to get alerts about bills that change status and might be coming up for debate.

New Mexico legislative sessions are incredibly short: 60 days in odd years, 30 days in even. During last year's 30-day session, I wrote some Python code that scraped the HTML pages describing a bill, extracted the useful information like when the bill last changed status and where it was right now, presented it in a table the user could easily scan, and emailed the user a daily summary. Fortunately, the nmlegis.gov site, while it doesn't offer raw data for bill status, at least uses lots of id tags in its HTML, which makes the pages relatively easy to scrape.
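
Just to give the flavor of what that scraping looks like, here's a minimal sketch -- not the billtracker's actual code; the URL pattern and the id name in it are made-up placeholders rather than whatever nmlegis.gov really uses:

import requests
from bs4 import BeautifulSoup

def bill_location(billno):
    '''Scrape a bill's page and return its current location.
       The URL pattern and the id below are hypothetical placeholders,
       not the real nmlegis.gov ones.
    '''
    url = 'https://www.nmlegis.gov/Legislation/Legislation?billno=%s' % billno
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    tag = soup.find(id='MainContent_lblLocation')
    return tag.get_text(strip=True) if tag else None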

Then the session ended and there was no further way to test it, since bills' statuses were no longer changing. So the billtracker moved to the back burner.

In the runup to this year's 60-day session, I started with Flask, a lightweight Python web library I've used for a couple of small projects, and added some extensions that help Flask handle tasks like user accounts. Then I patched in the legislative web scraping code from last year, and the result was The New Mexico Bill Tracker. I passed the word to some friends in the League of Women Voters and the Sierra Club to help me test it, and I think (hope) it's ready for wider testing.
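
For anyone curious what the web side looks like, the general shape is roughly the sketch below. Flask-Login is one common extension for handling user sessions -- I'm not claiming it's exactly what the billtracker uses -- and the route and user class here are invented for illustration:

from flask import Flask
from flask_login import LoginManager, UserMixin, login_required

app = Flask(__name__)
app.secret_key = 'change me'        # required for session cookies

login_manager = LoginManager(app)   # handles the "who is logged in" plumbing

class User(UserMixin):
    '''Stand-in user class; a real app would back this with a database.'''
    def __init__(self, user_id):
        self.id = user_id

@login_manager.user_loader
def load_user(user_id):
    return User(user_id)

@app.route('/')
@login_required
def billlist():
    # A real app would render the table of this user's tracked bills.
    return "Your tracked bills would go here."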

There's lots more I'd like to do, of course. I still have no way of knowing when a bill will be up for debate. It looks like this year the Legislative web site is showing committee schedules in a fairly standard way, as opposed to the unparseable PDFs they used in past years, so I may be able to get that. Not that legislative committees actually stick to their published schedules, but at least it's a start.

New Mexico readers (or anyone else interested in following the progress of New Mexico bills) are invited to try it. Let me know about any problems you encounter. And if you want to adapt the billtracker for use in another state, send me a note! I'd love to see it extended and would be happy to work with you. Here's the source: BillTracker on GitHub.

[ 12:34 Jan 25, 2019    More politics | permalink to this entry | ]

Fri, 18 Jan 2019

Python: Find Your System's Biggest CPU Hogs

My machine has recently developed an overheating problem. I haven't found a solution for that yet -- you'd think Linux would have a way to automatically kill or suspend processes based on CPU temperature, but apparently not -- but my investigations led me down one interesting road: how to write a Python script that finds CPU hogs.

The psutil module can get a list of processes with psutil.process_iter(), which yields Process objects that have a cpu_percent() method. Great! Except it always returns 0.0, even for known hogs like Firefox, or if you start up VLC and make it play video scaled to the monitor size.

That's because cpu_percent() needs to run twice, with an interval in between: it records the elapsed run time and sees how much it changes. You can pass an interval to cpu_percent() (the units aren't documented, but apparently they're seconds). But if you're calling it on more than one process -- as you usually will be -- it's better not to wait for each process. You have to wait at least a quarter of a second to get useful numbers, and longer is better. If you do that for every process on the system, you'll be waiting a long time.

Instead, use cpu_percent() in non-blocking mode. Pass None as the interval (or leave it blank since None is the default), then loop over the process list and call proc.cpu_percent(None) on each process, throwing away the results the first time. Then sleep for a while and repeat the loop: the second time, cpu_percent() will give you useful numbers.

import psutil
import sys
import time

def hoglist(delay=5):
    '''Return a list of processes using a nonzero CPU percentage
       during the interval specified by delay (seconds),
       sorted so the biggest hog is first.
    '''
    processes = list(psutil.process_iter())
    for proc in processes:
        proc.cpu_percent(None)    # non-blocking; throw away the first bogus value

    print("Sleeping ...")
    sys.stdout.flush()
    time.sleep(delay)
    print()

    procs = []
    for proc in processes:
        try:
            percent = proc.cpu_percent(None)
            if percent:
                procs.append((proc.name(), percent))
        except psutil.NoSuchProcess:
            pass    # the process exited while we were sleeping

    procs.sort(key=lambda x: x[1], reverse=True)
    return procs

if __name__ == '__main__':
    hogs = hoglist()
    for p in hogs:
        print("%20s: %5.2f" % p)

It's a useful trick. Though actually applying this to a daemon that responds to temperature, to solve my overheating problem, is more complicated. For one thing, you need rules about special processes. If your Firefox goes wonky and starts making your X server take lots of CPU time, you want to suspend Firefox, not the X server.
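
If I ever do write that daemon, I imagine the core loop looking something like the sketch below. Everything tunable in it is an assumption: the 90-degree threshold, the set of processes that must never be suspended, and the policy of stopping only the single biggest hog. A real daemon would also need hysteresis, and a way to resume processes once the machine cools down.

import time
import psutil

TEMP_LIMIT = 90.0                       # degrees C -- an arbitrary guess
NEVER_SUSPEND = { 'Xorg', 'systemd' }   # processes we must leave alone

def cpu_temperature():
    '''Return the hottest temperature psutil can find, or None.'''
    temps = psutil.sensors_temperatures()
    readings = [t.current for entries in temps.values() for t in entries]
    return max(readings) if readings else None

def watch(delay=5):
    while True:
        temp = cpu_temperature()
        if temp and temp > TEMP_LIMIT:
            for name, percent in hoglist(delay):    # hoglist() from above
                if name in NEVER_SUSPEND:
                    continue
                for proc in psutil.process_iter(['name']):
                    if proc.info['name'] == name:
                        try:
                            proc.suspend()   # SIGSTOP; undo later with proc.resume()
                        except psutil.Error:
                            pass             # e.g. not our process to suspend
                break                        # only stop the single biggest hog
        time.sleep(delay)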

[ 15:54 Jan 18, 2019    More programming | permalink to this entry | ]

Sun, 13 Jan 2019

Snowy Views and Giant Curling Icicles

[Snowy back yard] And the snow continues to fall. We got a break of a few days, but today it's snowed fairly steadily all day, adding another -- I don't know, maybe four inches? Snow is hard to measure because it piles up so unevenly, two inches here, eight there.

[Snowshoe trail, Jemez East Fork] The hiking group I'm in went snowshoeing up in the Jemez last week -- lovely! The shrubs that managed to stick up above the snow all wore coats of ice, which fell by afternoon, littering the snow around them with an extra coat of glitter.

[icicle] And it was lovely here too, with a thick blanket of snow over everything. (I need to get some snowshoes of my own, to make it easier to explore the yard when conditions get like this, otherwise the snow would be thigh-deep in places. For the hike last week, I borrowed a pair.)

[Curling icicles] And, of course, there's the never-ending fascination of watching icicles, snow glaciers moving down the roof, and, this time, huge curving icicles growing downward above the den deck. They hung more than four feet below the roof before they finally separated and fell with a huge THUMP!, leaving a three-foot-high pile of snow that poor Dave had to shovel (I helped with shoveling at first, until I slipped and sprained my wrist; it's improving, but not enough that I can shovel ice yet).
Images of the snowstorm and the snowshoe hike: Snowstorms in January 2019.

[ 16:00 Jan 13, 2019    More misc | permalink to this entry | ]

Thu, 10 Jan 2019

Drawing on Slides

Years ago, I saw someone demonstrating an obscure slide presentation system, and one of the tricks it had was to let you draw on slides with the mouse. So you could underline or arrow specific points, or, more important (since underlines and arrows are easily included in slides), draw something in response to an audience question.

Neat feature, but there were other reasons I didn't want to switch to that particular slide system.

Many years later, and quite happy with my home-grown htmlpreso system for HTML-based slides, I was sitting in an astronomy panel discussion listening to someone explain black holes when it occurred to me: with HTML Canvas being a fairly mature technology, how hard could it be to add drawing to my htmlpreso setup? It would just take a JavaScript snippet that creates a canvas on top of the existing slide, plus some basic event handling and drawing code that surely someone else has already written. [Drawing on top of an HTML slide]

Curled up in front of the fire last night with my laptop, I needed only a couple of hours to whip up a proof of concept that seems remarkably usable. I've added it to htmlpreso.

I have to confess, I've never actually felt the need to draw on a slide during a talk. But I still love knowing that it's possible. It'll be interesting to see how often I actually use it.

To play with drawing on slides, go to my HTMLPreso self-documenting slide set (with JavaScript enabled) and, on any slide, type Shift-D. Some color swatches should appear in the upper right of the slide, and now you can scribble over the tops of slides to your heart's content.

[ 14:39 Jan 10, 2019    More speaking | permalink to this entry | ]

Sun, 06 Jan 2019

Keeping track of reading

About fifteen years ago, a friend in LinuxChix blogged about doing the "50-50 Book Challenge". The goal was to read fifty new books in a year, plus another fifty old books she'd read before.

I had no idea whether this was a lot of books or not. How many books do I read in a year? I had no idea. But now I wanted to know. So I started keeping a list: not for the 50-50 challenge specifically, but just to see what the numbers were like.

It would be easy enough to do this in a spreadsheet, but I'm not really a spreadsheet kind of girl, unless there's a good reason to use one, like accounting tables or other numeric data. So I used a plain text file with a simple, readable format, like these entries from that first year, 2004:

Dragon Hunter: Roy Chapman Andrews and the Central Asiatic Expeditions, Charles Gallenkamp, Michael J. Novacek
  Fascinating account of a series of expeditions in the early 1900s
  searching for evidence of early man.  Instead, they found
  groundbreaking dinosaur discoveries, including the first evidence
  of dinosaurs protecting their eggs (Oviraptor).

Life of Pi
  Uneven, quirky, weird.  Parts of it are good, parts are awful.
  I found myself annoyed by it ... but somehow compelled to keep
  reading.  The ending may have redeemed it.

The Lions of Tsavo : Exploring the Legacy of Africa's Notorious Man-Eaters, Bruce D. Patterson
  Excellent overview of the Tsavo lion story, including some recent
  findings.  Makes me want to find the original book, which turns
  out to be public domain in Project Gutenberg.

- Bellwether, Connie Willis
  What can I say?  Connie Willis is one of my favorite writers and
  this is arguably her best book.  Everyone should read it.
  I can't imagine anyone not liking it.

If there's a punctuation mark in the first column, it's a reread. (I keep forgetting which character to use, so sometimes it's a dot, sometimes a dash, sometimes an atsign.) If the first column holds anything else besides a space, it's a new book. Lines starting with spaces are short notes on what I thought of the book. I'm not trying to write formal reviews, just reminders. If I don't have anything specific to say, I leave it blank or write a word or two, like "fun" or "disappointing".

Crunching the numbers

That means it's fairly easy to pull out book titles and count them with grep and wc. For years I just used simple aliases:

 All books this year: egrep '^[^ ]' books2019 | wc -l
 Just new books:      egrep '^[^ -.@]' books2019 | wc -l
 Just reread books:   egrep '^[-.@]' books2019 | wc -l

But after I had years of accumulated data I started wanting to see it all together, so I wrote a shell function that I put in my .zshrc:

booksread() {
  setopt extendedglob
  for f in ~/Docs/Lists/books/books[0-9](#c4); do
    year=$(echo $f | sed 's/.*books//')
    let allbooks=$(egrep '^[^ ]' $f | grep -v 'Book List:' | wc -l)
    let rereads=$(egrep '^[-.@\*]' $f  | grep -v 'Book List:'| wc -l)
    printf "%4s:   All: %3d   New: %3d   rereads: %3d\n" \
           $year $allbooks $(($allbooks - $rereads)) $rereads
  done
}
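
If zsh globs aren't your thing, the same counting logic is easy enough in Python; here's a quick sketch that assumes the same ~/Docs/Lists/books/books<year> file layout:

import glob
import os
import re

def booksread(bookdir=os.path.expanduser('~/Docs/Lists/books')):
    for path in sorted(glob.glob(os.path.join(bookdir, 'books[0-9][0-9][0-9][0-9]'))):
        year = path[-4:]
        allbooks = rereads = 0
        with open(path) as fp:
            for line in fp:
                if not line.strip() or line.startswith(' ') or 'Book List:' in line:
                    continue                     # skip notes, blank lines, headers
                allbooks += 1
                if re.match(r'[-.@*]', line):    # punctuation in column 1: a reread
                    rereads += 1
        print("%4s:   All: %3d   New: %3d   rereads: %3d"
              % (year, allbooks, allbooks - rereads, rereads))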

In case you're curious, my numbers are all over the map:

$ booksread
2004:   All:  53   New:  44   rereads:   9
2005:   All:  51   New:  36   rereads:  15
2006:   All:  72   New:  59   rereads:  13
2007:   All:  59   New:  49   rereads:  10
2008:   All:  42   New:  33   rereads:   9
2009:   All:  56   New:  47   rereads:   9
2010:   All:  43   New:  27   rereads:  16
2011:   All:  80   New:  55   rereads:  25
2012:   All:  65   New:  58   rereads:   7
2013:   All:  59   New:  54   rereads:   5
2014:   All: 128   New: 121   rereads:   7
2015:   All: 111   New: 103   rereads:   8
2016:   All:  66   New:  62   rereads:   4
2017:   All:  57   New:  56   rereads:   1
2018:   All:  74   New:  71   rereads:   3
2019:   All:   3   New:   3   rereads:   0

So sometimes I beat that 100-book target that the 50-50 people advocated, other times not. I'm not worried about the overall numbers. Some years I race through a lot of lightweight series mysteries; other years I spend more time delving into long nonfiction books.

But I have learned quite a few interesting tidbits.

What Does it all Mean?

I expected my reread count would be quite high. As it turns out, I don't reread nearly as much as I thought. I have quite a few "comfort books" that I like to read over and over again (am I still five years old?), especially when I'm tired or ill. I sometimes feel guilty about that, like I'm wasting time when I could be improving my mind. I tell myself that it's not entirely a waste: by reading these favorite books over and over, perhaps I'll absorb some of the beautiful rhythms, strong characters, or clever plot twists that make me love them; and that maybe some of that will carry over into my own writing. But it feels like rationalization.

But that first year, 2004, I read 44 new books and reread 9, including the Lord of the Rings trilogy that I hadn't read since I was a teenager. So I don't actually "waste" that much time on rereading. Over the years, my highest reread count was 25 in 2011, when I reread the whole Tony Hillerman series.

Is my reread count low because I'm conscious of the record-keeping, and therefore I reread less than I would otherwise? I don't think so. I'm still happy to pull out a battered copy of Tea with the Black Dragon or Bellwether or Watership Down or The Lion when I don't feel up to launching into a new book.

Another thing I wondered: would keeping count encourage me to read more short mysteries and fewer weighty non-fiction tomes? I admit I am a bit more aware of book lengths now -- oh, god, the new Stephenson is how many pages? -- but I try not to get competitive, even with myself, about numbers, and I don't let a quest for big numbers keep me from reading Blood and Thunder or The Invention of Nature. (And I had that sinking feeling about Stephenson even before I started keeping a book list. The man can write, but he could use an editor with a firm hand.)

What counts as a book? Do I feel tempted to pile up short, easy books to "get credit" for them, or to finish a bad book I'm not enjoying? Sometimes a little, but mostly no. What about novellas? What about partial reads, like skipping chapters? I decide on a case-by-case basis but don't stress over it. I do keep entries for books I start and don't finish (with spaces at the beginning of the line so they don't show up in the count), with notes on why I gave up on them, or where I left off if I intend to go back.

Unexpected Benefits

Keeping track of my reading has turned out to have other benefits. For instance, it prevents accidental rereads. Last year Dave checked a mystery out of the library (we read a lot of the same books, so anything one of us reads, the other will at least consider). I looked at it and said "That sounds awfully familiar. Haven't we already read it?" Sure enough, it was on my list from the previous year, and I hadn't liked it. Dave doesn't keep a book list, so he started reading, but eventually realized that he, too, had read it before.

And sometimes my memory of a book isn't very clear, and my notes on what I thought of a book are useful. Last year, on a hike, a friend and I got to talking about the efforts to eradicate rats on southern California's Channel Islands. I said "Oh, I read an interesting novel about that recently. Was it Barbara Kingsolver? No, wait ... I think it was T.C. Boyle. Interesting book, you should check it out."

When I got home, I consulted my book lists and found it in 2011:

When the Killing's Done, T.C. Boyle
  A tough slog through part 1, but it gets somewhat better in part 2
  (there are actually a few characters you don't hate, finally)
  and some plot eventually emerges, near the end of the novel.

I sent my friend an email rescinding my recommendation. I told her the book does cover some interesting details related to the rat eradication, but I'd forgotten that it was a poor excuse for a novel. In the end she decided to read it anyway, and her opinion agreed with mine. I believe she's started keeping a book list of her own now.

On the other hand, it's also good to have a record of delightful new discoveries. A gem from last year:

Mr. Penumbra's 24-hour bookstore, Robin Sloan
  Unexpectedly good! I read this because Sloan was on the Embedded
  podcast, but I didn't expect much. Turns out Sloan can write!
  Had me going from the beginning. Also, the glow-in-the-dark books
  on the cover were fun.

Even if I forget Sloan's name (sad, I know, but I have a poor memory for names), when I see a new book of his I'll know to check it out. I didn't love his second book, Sourdough, quite as much as Mr. Penumbra, but he's still an author worth following.

[ 12:09 Jan 06, 2019    More misc | permalink to this entry | ]