Shallow Thoughts : : programming

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Sat, 25 Mar 2017

Reading keypresses in Python

As part of preparation for Everyone Does IT, I was working on a silly hack to my Python script that plays notes and chords: I wanted to use the computer keyboard like a music keyboard, and play different notes when I press different keys. Obviously, in a case like that I don't want line buffering -- I want the program to play notes as soon as I press a key, not wait until I hit Enter and then play the whole line at once. In Unix that's called "cbreak mode".

There are a few ways to do this in Python. The most straightforward way is to use the curses library, which is designed for console based user interfaces and games. But importing curses is overkill just to do key reading.

Years ago, I found a guide on the official Python Library and Extension FAQ: Python: How do I get a single keypress at a time?. I'd even used it once, for a one-off Raspberry Pi project that I didn't end up using much. I hadn't done much testing of it at the time, but trying it now, I found a big problem: it doesn't block.

Blocking is whether the read() waits for input or returns immediately. If I read a character with c = sys.stdin.read(1) but there's been no character typed yet, a non-blocking read will throw an IOError exception, while a blocking read will waitm not returning until the user types a character.

In the code on that Python FAQ page, blocking looks like it should be optional. This line:

fcntl.fcntl(fd, fcntl.F_SETFL, oldflags | os.O_NONBLOCK)
is the part that requests non-blocking reads. Skipping that should let me read characters one at a time, block until each character is typed. But in practice, it doesn't work. If I omit the O_NONBLOCK flag, reads never return, not even if I hit Enter; if I set O_NONBLOCK, the read immediately raises an IOError. So I have to call read() over and over, spinning the CPU at 100% while I wait for the user to type something.

The way this is supposed to work is documented in the termios man page. Part of what tcgetattr returns is something called the cc structure, which includes two members called Vmin and Vtime. man termios is very clear on how they're supposed to work: for blocking, single character reads, you set Vmin to 1 (that's the number of characters you want it to batch up before returning), and Vtime to 0 (return immediately after getting that one character). But setting them in Python with tcsetattr doesn't make any difference.

(Python also has a module called tty that's supposed to simplify this stuff, and you should be able to call tty.setcbreak(fd). But that didn't work any better than termios: I suspect it just calls termios under the hood.)

But after a few hours of fiddling and googling, I realized that even if Python's termios can't block, there are other ways of blocking on input. The select system call lets you wait on any file descriptor until has input. So I should be able to set stdin to be non-blocking, then do my own blocking by waiting for it with select.

And that worked. Here's a minimal example:

import sys, os
import termios, fcntl
import select

fd = sys.stdin.fileno()
newattr = termios.tcgetattr(fd)
newattr[3] = newattr[3] & ~termios.ICANON
newattr[3] = newattr[3] & ~termios.ECHO
termios.tcsetattr(fd, termios.TCSANOW, newattr)

oldterm = termios.tcgetattr(fd)
oldflags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, oldflags | os.O_NONBLOCK)

print "Type some stuff"
while True:
    inp, outp, err = select.select([sys.stdin], [], [])
    c = sys.stdin.read()
    if c == 'q':
        break
    print "-", c

# Reset the terminal:
termios.tcsetattr(fd, termios.TCSAFLUSH, oldterm)
fcntl.fcntl(fd, fcntl.F_SETFL, oldflags)

A less minimal example: keyreader.py, a class to read characters, with blocking and echo optional. It also cleans up after itself on exit, though most of the time that seems to happen automatically when I exit the Python script.

Tags: ,
[ 12:40 Mar 25, 2017    More programming | permalink to this entry | comments ]

Fri, 24 Feb 2017

Coder Dojo: Kids Teaching Themselves Programming

We have a terrific new program going on at Los Alamos Makers: a weekly Coder Dojo for kids, 6-7 on Tuesday nights.

Coder Dojo is a worldwide movement, and our local dojo is based on their ideas. Kids work on programming projects to earn colored USB wristbelts, with the requirements for belts getting progressively harder. Volunteer mentors are on hand to help, but we're not lecturing or teaching, just coaching.

Despite not much advertising, word has gotten around and we typically have 5-7 kids on Dojo nights, enough that all the makerspace's Raspberry Pi workstations are filled and we sometimes have to scrounge for more machines for the kids who don't bring their own laptops.

A fun moment early on came when we had a mentor meeting, and Neil, our head organizer (who deserves most of the credit for making this program work so well), looked around and said "One thing that might be good at some point is to get more men involved." Sure enough -- he was the only man in the room! For whatever reason, most of the programmers who have gotten involved have been women. A refreshing change from the usual programming group. (Come to think of it, the PEEC web development team is three women. A girl could get a skewed idea of gender demographics, living here.) The kids who come to program are about 40% girls.

I wondered at the beginning how it would work, with no lectures or formal programs. Would the kids just sit passively, waiting to be spoon fed? How would they get concepts like loops and conditionals and functions without someone actively teaching them?

It wasn't a problem. A few kids have some prior programming practice, and they help the others. Kids as young as 9 with no previous programming experience walk it, sit down at a Raspberry Pi station, and after five minutes of being shown how to bring up a Python console and use Python's turtle graphics module to draw a line and turn a corner, they're happily typing away, experimenting and making Python draw great colorful shapes.

Python-turtle turns out to be a wonderful way for beginners to learn. It's easy to get started, it makes pretty pictures, and yet, since it's Python, it's not just training wheels: kids are using a real programming language from the start, and they can search the web and find lots of helpful examples when they're trying to figure out how to do something new (just like professional programmers do. :-)

Initially we set easy requirements for the first (white) belt: attend for three weeks, learn the names of other Dojo members. We didn't require any actual programming until the second (yellow) belt, which required writing a program with two of three elements: a conditional, a loop, a function.

That plan went out the window at the end of the first evening, when two kids had already fulfilled the yellow belt requirements ... even though they were still two weeks away from the attendance requirement for the white belt. One of them had never programmed before. We've since scrapped the attendance belt, and now the white belt has the conditional/loop/function requirement that used to be the yellow belt.

The program has been going for a bit over three months now. We've awarded lots of white belts and a handful of yellows (three new ones just this week). Although most of the kids are working in Python, there are also several playing music or running LED strips using Arduino/C++, writing games and web pages in Javascript, writing adventure games Scratch, or just working through Khan Academy lectures.

When someone is ready for a belt, they present their program to everyone in the room and people ask questions about it: what does that line do? Which part of the program does that? How did you figure out that part? Then the mentors review the code over the next week, and they get the belt the following week.

For all but the first belt, helping newer members is a requirement, though I suspect even without that they'd be helping each other. Sit a first-timer next to someone who's typing away at a Python program and watch the magic happen. Sometimes it feels almost superfluous being a mentor. We chat with the kids and each other, work on our own projects, shoulder-surf, and wait for someone to ask for help with harder problems.

Overall, a terrific program, and our only problems now are getting funding for more belts and more workstations as the word spreads and our Dojo nights get more crowded. I've had several adults ask me if there was a comparable program for adults. Maybe some day (I hope).

Tags: ,
[ 13:46 Feb 24, 2017    More programming | permalink to this entry | comments ]

Mon, 23 Jan 2017

Testing a GitHub Pull Request

Several times recently I've come across someone with a useful fix to a program on GitHub, for which they'd filed a GitHub pull request.

The problem is that GitHub doesn't give you any link on the pull request to let you download the code in that pull request. You can get a list of the checkins inside it, or a list of the changed files so you can view the differences graphically. But if you want the code on your own computer, so you can test it, or use your own editors and diff tools to inspect it, it's not obvious how. That this is a problem is easily seen with a web search for something like download github pull request -- there are huge numbers of people asking how, and most of the answers are vague unclear.

That's a shame, because it turns out it's easy to pull a pull request. You can fetch it directly with git into a new branch as long as you have the pull request ID. That's the ID shown on the GitHub pull request page:

[GitHub pull request screenshot]

Once you have the pull request ID, choose a new name for your branch, then fetch it:

git fetch origin pull/PULL-REQUEST_ID/head:NEW-BRANCH-NAME
git checkout NEW-BRANCH-NAME

Then you can view diffs with something like git difftool NEW-BRANCH-NAME..master

Easy! GitHub should give a hint of that on its pull request pages.

Fetching a Pull Request diff to apply it to another tree

But shortly after I learned how to apply a pull request, I had a related but different problem in another project. There was a pull request for an older repository, but the part it applied to had since been split off into a separate project. (It was an old pull request that had fallen through the cracks, and as a new developer on the project, I wanted to see if I could help test it in the new repository.)

You can't pull a pull request that's for a whole different repository. But what you can do is go to the pull request's page on GitHub. There are 3 tabs: Conversation, Commits, and Files changed. Click on Files changed to see the diffs visually.

That works if the changes are small and only affect a few files (which fortunately was the case this time). It's not so great if there are a lot of changes or a lot of files affected. I couldn't find any "Raw" or "download" button that would give me a diff I could actually apply. You can select all and then paste the diffs into a local file, but you have to do that separately for each file affected. It might be, if you have a lot of files, that the best solution is to check out the original repo, apply the pull request, generate a diff locally with git diff, then apply that diff to the new repo. Rather circuitous. But with any luck that situation won't arise very often.

Update: thanks very much to Houz for the solution! (In the comments, below.) Just append .diff or .patch to the pull request URL, e.g. https://github.com/OWNER/REPO/pull/REQUEST-ID.diff which you can view in a browser or fetch with wget or curl.

Tags: , ,
[ 14:34 Jan 23, 2017    More programming | permalink to this entry | comments ]

Thu, 19 Jan 2017

Plotting Shapes with Python Basemap wwithout Shapefiles

In my article on Plotting election (and other county-level) data with Python Basemap, I used ESRI shapefiles for both states and counties.

But one of the election data files I found, OpenDataSoft's USA 2016 Presidential Election by county had embedded county shapes, available either as CSV or as GeoJSON. (I used the CSV version, but inside the CSV the geo data are encoded as JSON so you'll need JSON decoding either way. But that's no problem.)

Just about all the documentation I found on coloring shapes in Basemap assumed that the shapes were defined as ESRI shapefiles. How do you draw shapes if you have latitude/longitude data in a more open format?

As it turns out, it's quite easy, but it took a fair amount of poking around inside Basemap to figure out how it worked.

In the loop over counties in the US in the previous article, the end goal was to create a matplotlib Polygon and use that to add a Basemap patch. But matplotlib's Polygon wants map coordinates, not latitude/longitude.

If m is your basemap (i.e. you created the map with m = Basemap( ... ), you can translate coordinates like this:

    (mapx, mapy) = m(longitude, latitude)

So once you have a region as a list of (longitude, latitude) coordinate pairs, you can create a colored, shaped patch like this:

    for coord_pair in region:
        coord_pair[0], coord_pair[1] = m(coord_pair[0], coord_pair[1])
    poly = Polygon(region, facecolor=color, edgecolor=color)
    ax.add_patch(poly)

Working with the OpenDataSoft data file was actually a little harder than that, because the list of coordinates was JSON-encoded inside the CSV file, so I had to decode it with json.loads(county["Geo Shape"]). Once decoded, it had some counties as a Polygonlist of lists (allowing for discontiguous outlines), and others as a MultiPolygonlist of list of lists (I'm not sure why, since the Polygon format already allows for discontiguous boundaries)

[Blue-red-purple 2016 election map]

And a few counties were missing, so there were blanks on the map, which show up as white patches in this screenshot. The counties missing data either have inconsistent formatting in their coordinate lists, or they have only one coordinate pair, and they include Washington, Virginia; Roane, Tennessee; Schley, Georgia; Terrell, Georgia; Marshall, Alabama; Williamsburg, Virginia; and Pike Georgia; plus Oglala Lakota (which is clearly meant to be Oglala, South Dakota), and all of Alaska.

One thing about crunching data files from the internet is that there are always a few special cases you have to code around. And I could have gotten those coordinates from the census shapefiles; but as long as I needed the census shapefile anyway, why use the CSV shapes at all? In this particular case, it makes more sense to use the shapefiles from the Census.

Still, I'm glad to have learned how to use arbitrary coordinates as shapes, freeing me from the proprietary and annoying ESRI shapefile format.

The code: Blue-red map using CSV with embedded county shapes

Tags: , , , , ,
[ 09:36 Jan 19, 2017    More programming | permalink to this entry | comments ]

Sat, 14 Jan 2017

Plotting election (and other county-level) data with Python Basemap

After my arduous search for open 2016 election data by county, as a first test I wanted one of those red-blue-purple charts of how Democratic or Republican each county's vote was.

I used the Basemap package for plotting. It used to be part of matplotlib, but it's been split off into its own toolkit, grouped under mpl_toolkits: on Debian, it's available as python-mpltoolkits.basemap, or you can find Basemap on GitHub.

It's easiest to start with the fillstates.py example that shows how to draw a US map with different states colored differently. You'll need the three shapefiles (because of ESRI's silly shapefile format): st99_d00.dbf, st99_d00.shp and st99_d00.shx, available in the same examples directory.

Of course, to plot counties, you need county shapefiles as well. The US Census has county shapefiles at several different resolutions (I used the 500k version). Then you can plot state and counties outlines like this:

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

def draw_us_map():
    # Set the lower left and upper right limits of the bounding box:
    lllon = -119
    urlon = -64
    lllat = 22.0
    urlat = 50.5
    # and calculate a centerpoint, needed for the projection:
    centerlon = float(lllon + urlon) / 2.0
    centerlat = float(lllat + urlat) / 2.0

    m = Basemap(resolution='i',  # crude, low, intermediate, high, full
                llcrnrlon = lllon, urcrnrlon = urlon,
                lon_0 = centerlon,
                llcrnrlat = lllat, urcrnrlat = urlat,
                lat_0 = centerlat,
                projection='tmerc')

    # Read state boundaries.
    shp_info = m.readshapefile('st99_d00', 'states',
                               drawbounds=True, color='lightgrey')

    # Read county boundaries
    shp_info = m.readshapefile('cb_2015_us_county_500k',
                               'counties',
                               drawbounds=True)

if __name__ == "__main__":
    draw_us_map()
    plt.title('US Counties')
    # Get rid of some of the extraneous whitespace matplotlib loves to use.
    plt.tight_layout(pad=0, w_pad=0, h_pad=0)
    plt.show()
[Simple map of US county borders]

Accessing the state and county data after reading shapefiles

Great. Now that we've plotted all the states and counties, how do we get a list of them, so that when I read out "Santa Clara, CA" from the data I'm trying to plot, I know which map object to color?

After calling readshapefile('st99_d00', 'states'), m has two new members, both lists: m.states and m.states_info.

m.states_info[] is a list of dicts mirroring what was in the shapefile. For the Census state list, the useful keys are NAME, AREA, and PERIMETER. There's also STATE, which is an integer (not restricted to 1 through 50) but I'll get to that.

If you want the shape for, say, California, iterate through m.states_info[] looking for the one where m.states_info[i]["NAME"] == "California". Note i; the shape coordinates will be in m.states[i]n (in basemap map coordinates, not latitude/longitude).

Correlating states and counties in Census shapefiles

County data is similar, with county names in m.counties_info[i]["NAME"]. Remember that STATE integer? Each county has a STATEFP, m.counties_info[i]["STATEFP"] that matches some state's m.states_info[i]["STATE"].

But doing that search every time would be slow. So right after calling readshapefile for the states, I make a table of states. Empirically, STATE in the state list goes up to 72. Why 72? Shrug.

    MAXSTATEFP = 73
    states = [None] * MAXSTATEFP
    for state in m.states_info:
        statefp = int(state["STATE"])
        # Many states have multiple entries in m.states (because of islands).
        # Only add it once.
        if not states[statefp]:
            states[statefp] = state["NAME"]

That'll make it easy to look up a county's state name quickly when we're looping through all the counties.

Calculating colors for each county

Time to figure out the colors from the Deleetdk election results CSV file. Reading lines from the CSV file into a dictionary is superficially easy enough:

    fp = open("tidy_data.csv")
    reader = csv.DictReader(fp)

    # Make a dictionary of all "county, state" and their colors.
    county_colors = {}
    for county in reader:
        # What color is this county?
        pop = float(county["votes"])
        blue = float(county["results.clintonh"])/pop
        red = float(county["Total.Population"])/pop
        county_colors["%s, %s" % (county["name"], county["State"])] \
            = (red, 0, blue)

But in practice, that wasn't good enough, because the county names in the Deleetdk names didn't always match the official Census county names.

Fuzzy matches

For instance, the CSV file had no results for Alaska or Puerto Rico, so I had to skip those. Non-ASCII characters were a problem: "Doña Ana" county in the census data was "Dona Ana" in the CSV. I had to strip off " County", " Borough" and similar terms: "St Louis" in the census data was "St. Louis County" in the CSV. Some names were capitalized differently, like PLYMOUTH vs. Plymouth, or Lac Qui Parle vs. Lac qui Parle. And some names were just different, like "Jeff Davis" vs. "Jefferson Davis".

To get around that I used SequenceMatcher to look for fuzzy matches when I couldn't find an exact match:

def fuzzy_find(s, slist):
    '''Try to find a fuzzy match for s in slist.
    '''
    best_ratio = -1
    best_match = None

    ls = s.lower()
    for ss in slist:
        r = SequenceMatcher(None, ls, ss.lower()).ratio()
        if r > best_ratio:
            best_ratio = r
            best_match = ss
    if best_ratio > .75:
        return best_match
    return None

Correlate the county names from the two datasets

It's finally time to loop through the counties in the map to color and plot them.

Remember STATE vs. STATEFP? It turns out there are a few counties in the census county shapefile with a STATEFP that doesn't match any STATE in the state shapefile. Mostly they're in the Virgin Islands and I don't have election data for them anyway, so I skipped them for now. I also skipped Puerto Rico and Alaska (no results in the election data) and counties that had no corresponding state: I'll omit that code here, but you can see it in the final script, linked at the end.

    for i, county in enumerate(m.counties_info):
        countyname = county["NAME"]
        try:
            statename = states[int(county["STATEFP"])]
        except IndexError:
            print countyname, "has out-of-index statefp of", county["STATEFP"]
            continue

        countystate = "%s, %s" % (countyname, statename)
        try:
            ccolor = county_colors[countystate]
        except KeyError:
            # No exact match; try for a fuzzy match
            fuzzyname = fuzzy_find(countystate, county_colors.keys())
            if fuzzyname:
                ccolor = county_colors[fuzzyname]
                county_colors[countystate] = ccolor
            else:
                print "No match for", countystate
                continue

        countyseg = m.counties[i]
        poly = Polygon(countyseg, facecolor=ccolor)  # edgecolor="white"
        ax.add_patch(poly)

Moving Hawaii

Finally, although the CSV didn't have results for Alaska, it did have Hawaii. To display it, you can move it when creating the patches:

    countyseg = m.counties[i]
    if statename == 'Hawaii':
        countyseg = list(map(lambda (x,y): (x + 5750000, y-1400000), countyseg))
    poly = Polygon(countyseg, facecolor=countycolor)
    ax.add_patch(poly)
The offsets are in map coordinates and are empirical; I fiddled with them until Hawaii showed up at a reasonable place. [Blue-red-purple 2016 election map]

Well, that was a much longer article than I intended. Turns out it takes a fair amount of code to correlate several datasets and turn them into a map. But a lot of the work will be applicable to other datasets.

Full script on GitHub: Blue-red map using Census county shapefile

Tags: , , , , , ,
[ 15:10 Jan 14, 2017    More programming | permalink to this entry | comments ]

Thu, 22 Dec 2016

Tips on Developing Python Projects for PyPI

I wrote two recent articles on Python packaging: Distributing Python Packages Part I: Creating a Python Package and Distributing Python Packages Part II: Submitting to PyPI. I was able to get a couple of my programs packaged and submitted.

Ongoing Development and Testing

But then I realized all was not quite right. I could install new releases of my package -- but I couldn't run it from the source directory any more. How could I test changes without needing to rebuild the package for every little change I made?

Fortunately, it turned out to be fairly easy. Set PYTHONPATH to a directory that includes all the modules you normally want to test. For example, inside my bin directory I have a python directory where I can symlink any development modules I might need:

mkdir ~/bin/python
ln -s ~/src/metapho/metapho ~/bin/python/

Then add the directory at the beginning of PYTHONPATH:

export PYTHONPATH=$HOME/bin/python

With that, I could test from the development directory again, without needing to rebuild and install a package every time.

Cleaning up files used in building

Building a package leaves some extra files and directories around, and git status will whine at you since they're not version controlled. Of course, you could gitignore them, but it's better to clean them up after you no longer need them.

To do that, you can add a clean command to setup.py.

from setuptools import Command

class CleanCommand(Command):
    """Custom clean command to tidy up the project root."""
    user_options = []
    def initialize_options(self):
        pass
    def finalize_options(self):
        pass
    def run(self):
        os.system('rm -vrf ./build ./dist ./*.pyc ./*.tgz ./*.egg-info ./docs/sphinxdoc/_build')
(Obviously, that includes file types beyond what you need for just cleaning up after package building. Adjust the list as needed.)

Then in the setup() function, add these lines:

      cmdclass={
          'clean': CleanCommand,
      }

Now you can type

python setup.py clean
and it will remove all the extra files.

Keeping version strings in sync

It's so easy to update the __version__ string in your module and forget that you also have to do it in setup.py, or vice versa. Much better to make sure they're always in sync.

I found several version of that using system("grep..."), but I decided to write my own that doesn't depend on system(). (Yes, I should do the same thing with that CleanCommand, I know.)

def get_version():
    '''Read the pytopo module versions from pytopo/__init__.py'''
    with open("pytopo/__init__.py") as fp:
        for line in fp:
            line = line.strip()
            if line.startswith("__version__"):
                parts = line.split("=")
                if len(parts) > 1:
                    return parts[1].strip()

Then in setup():

      version=get_version(),

Much better! Now you only have to update __version__ inside your module and setup.py will automatically use it.

Using your README for a package long description

setup has a long_description for the package, but you probably already have some sort of README in your package. You can use it for your long description this way:

# Utility function to read the README file.
# Used for the long_description.
def read(fname):
    return open(os.path.join(os.path.dirname(__file__), fname)).read()
    long_description=read('README'),

Tags: , ,
[ 10:15 Dec 22, 2016    More programming | permalink to this entry | comments ]

Sat, 17 Dec 2016

Distributing Python Packages Part II: Submitting to PyPI

In Part I, I discussed writing a setup.py to make a package you can submit to PyPI. Today I'll talk about better ways of testing the package, and how to submit it so other people can install it.

Testing in a VirtualEnv

You've verified that your package installs. But you still need to test it and make sure it works in a clean environment, without all your developer settings.

The best way to test is to set up a "virtual environment", where you can install your test packages without messing up your regular runtime environment. I shied away from virtualenvs for a long time, but they're actually very easy to set up:

virtualenv venv
source venv/bin/activate

That creates a directory named venv under the current directory, which it will use to install packages. Then you can pip install packagename or pip install /path/to/packagename-version.tar.gz

Except -- hold on! Nothing in Python packaging is that easy. It turns out there are a lot of packages that won't install inside a virtualenv, and one of them is PyGTK, the library I use for my user interfaces. Attempting to install pygtk inside a venv gets:

********************************************************************
* Building PyGTK using distutils is only supported on windows. *
* To build PyGTK in a supported way, read the INSTALL file.    *
********************************************************************

Windows only? Seriously? PyGTK works fine on both Linux and Mac; it's packaged on every Linux distribution, and on Mac it's packaged with GIMP. But for some reason, whoever maintains the PyPI PyGTK packages hasn't bothered to make it work on anything but Windows, and PyGTK seems to be mostly an orphaned project so that's not likely to change.

(There's a package called ruamel.venvgtk that's supposed to work around this, but it didn't make any difference for me.)

The solution is to let the virtualenv use your system-installed packages, so it can find GTK and other non-PyPI packages there:

virtualenv --system-site-packages venv
source venv/bin/activate

I also found that if I had a ~/.local directory (where packages normally go if I use pip install --user packagename), sometimes pip would install to .local instead of the venv. I never did track down why this happened some times and not others, but when it happened, a temporary mv ~/.local ~/old.local fixed it.

Test your Python package in the venv until everything works. When you're finished with your venv, you can run deactivate and then remove it with rm -rf venv.

Tag it on GitHub

Is your project ready to publish?

If your project is hosted on GitHub, you can have pypi download it automatically. In your setup.py, set

download_url='https://github.com/user/package/tarball/tagname',

Check that in. Then make a tag and push it:

git tag 0.1 -m "Name for this tag"
git push --tags origin master

Try to make your tag match the version you've set in setup.py and in your module.

Push it to pypitest

Register a new account and password on both pypitest and on pypi.

Then create a ~/.pypirc that looks like this:

[distutils]
index-servers =
  pypi
  pypitest

[pypi]
repository=https://pypi.python.org/pypi
username=YOUR_USERNAME
password=YOUR_PASSWORD

[pypitest]
repository=https://testpypi.python.org/pypi
username=YOUR_USERNAME
password=YOUR_PASSWORD

Yes, those passwords are in cleartext. Incredibly, there doesn't seem to be a way to store an encrypted password or even have it prompt you. There are tons of complaints about that all over the web but nobody seems to have a solution. You can specify a password on the command line, but that's not much better. So use a password you don't use anywhere else and don't mind too much if someone guesses.

Update: Apparently there's a newer method called twine that solves the password encryption problem. Read about it here: Uploading your project to PyPI. You should probably use twine instead of the setup.py commands discussed in the next paragraph.

Now register your project and upload it:

python setup.py register -r pypitest
python setup.py sdist upload -r pypitest

Wait a few minutes: it takes pypitest a little while before new packages become available. Then go to your venv (to be safe, maybe delete the old venv and create a new one, or at least pip uninstall) and try installing:

pip install -i https://testpypi.python.org/pypi YourPackageName

If you get "No matching distribution found for packagename", wait a few minutes then try again.

If it all works, then you're ready to submit to the real pypi:

python setup.py register -r pypi
python setup.py sdist upload -r pypi

Congratulations! If you've gone through all these steps, you've uploaded a package to pypi. Pat yourself on the back and go tell everybody they can pip install your package.

Some useful reading

Some pages I found useful:

A great tutorial except that it forgets to mention signing up for an account: Python Packaging with GitHub

Another good tutorial: First time with PyPI

Allowed PyPI classifiers -- the categories your project fits into Unfortunately there aren't very many of those, so you'll probably be stuck with 'Topic :: Utilities' and not much else.

Python Packages and You: not a tutorial, but a lot of good advice on style and designing good packages.

Tags: , ,
[ 16:19 Dec 17, 2016    More programming | permalink to this entry | comments ]

Sun, 11 Dec 2016

Distributing Python Packages Part I: Creating a Python Package

I write lots of Python scripts that I think would be useful to other people, but I've put off learning how to submit to the Python Package Index, PyPI, so that my packages can be installed using pip install.

Now that I've finally done it, I see why I put it off for so long. Unlike programming in Python, packaging is a huge, poorly documented hassle, and it took me days to get a working.package. Maybe some of the hints here will help other struggling Pythonistas.

Create a setup.py

The setup.py file is the file that describes the files in your project and other installation information. If you've never created a setup.py before, Submitting a Python package with GitHub and PyPI has a decent example, and you can find lots more good examples with a web search for "setup.py", so I'll skip the basics and just mention some of the parts that weren't straightforward.

Distutils vs. Setuptools

However, there's one confusing point that no one seems to mention. setup.py examples all rely on a predefined function called setup, but some examples start with

from distutils.core import setup
while others start with
from setuptools import setup

In other words, there are two different versions of setup! What's the difference? I still have no idea. The setuptools version seems to be a bit more advanced, and I found that using distutils.core , sometimes I'd get weird errors when trying to follow suggestions I found on the web. So I ended up using the setuptools version.

But I didn't initially have setuptools installed (it's not part of the standard Python distribution), so I installed it from the Debian package:

apt-get install python-setuptools python-wheel

The python-wheel package isn't strictly needed, but I found I got assorted warnings warnings from pip install later in the process ("Cannot build wheel") unless I installed it, so I recommend you install it from the start.

Including scripts

setup.py has a scripts option to include scripts that are part of your package:

    scripts=['script1', 'script2'],

But when I tried to use it, I had all sorts of problems, starting with scripts not actually being included in the source distribution. There isn't much support for using scripts -- it turns out you're actually supposed to use something called console_scripts, which is more elaborate.

First, you can't have a separate script file, or even a __main__ inside an existing class file. You must have a function, typically called main(), so you'll typically have this:

def main():
    # do your script stuff

if __name__ == "__main__":
    main()

Then add something like this to your setup.py:

      entry_points={
          'console_scripts': [
              script1=yourpackage.filename:main',
              script2=yourpackage.filename2:main'
          ]
      },

There's a secret undocumented alternative that a few people use for scripts with graphical user interfaces: use 'gui_scripts' rather than 'console_scripts'. It seems to work when I try it, but the fact that it's not documented and none of the Python experts even seem to know about it scared me off, and I stuck with 'console_scripts'.

Including data files

One of my packages, pytopo, has a couple of files it needs to install, like an icon image. setup.py has a provision for that:

      data_files=[('/usr/share/pixmaps',      ["resources/appname.png"]),
                  ('/usr/share/applications', ["resources/appname.desktop"]),
                  ('/usr/share/appname',      ["resources/pin.png"]),
                 ],

Great -- except it doesn't work. None of the files actually gets added to the source distribution.

One solution people mention to a "files not getting added" problem is to create an explicit MANIFEST file listing all files that need to be in the distribution. Normally, setup generates the MANIFEST automatically, but apparently it isn't smart enough to notice data_files and include those in its generated MANIFEST.

I tried creating a MANIFEST listing all the .py files plus the various resources -- but it didn't make any difference. My MANIFEST was ignored.

The solution turned out to be creating a MANIFEST.in file, which is used to generate a MANIFEST. It's easier than creating the MANIFEST itself: you don't have to list every file, just patterns that describe them:

include setup.py
include packagename/*.py
include resources/*
If you have any scripts that don't use the extension .py, don't forget to include them as well. This may have been why scripts= didn't work for me earlier, but by the time I found out about MANIFEST.in I had already switched to using console_scripts.

Testing setup.py

Once you have a setup.py, use it to generate a source distribution with:

python setup.py sdist
(You can also use bdist to generate a binary distribution, but you'll probably only need that if you're compiling C as part of your package. Source dists are apparently enough for pure Python packages.)

Your package will end up in dist/packagename-version.tar.gz so you can use tar tf dist/packagename-version.tar.gz to verify what files are in it. Work on your setup.py until you don't get any errors or warnings and the list of files looks right.

Congratulations -- you've made a Python package! I'll post a followup article in a day or two about more ways of testing, and how to submit your working package to PyPI.

Update: Part II is up: Distributing Python Packages Part II: Submitting to PyPI.

Tags: , ,
[ 12:54 Dec 11, 2016    More programming | permalink to this entry | comments ]