Letters not used in Python keywords (Shallow Thoughts)

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Tue, 19 Mar 2013

Letters not used in Python keywords

One of the closing lightning talks at PyCon this year concerned the answers to a list of Python programming puzzles given at some other point during the conference. I hadn't seen the questions (I'm still not sure where they are), but some of the problems looked fun.

One of them was: "What are the letters not used in Python keywords?" I hadn't known about Python's keyword module, which could come in handy some day:

>>> import keyword
>>> keyword.kwlist
['and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'exec', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'not', 'or', 'pass', 'print', 'raise', 'return', 'try', 'while', 'with', 'yield']

So, given the list of keywords, what's the best way to find the list of unique letters?

Any time you want a list of unique anything, you want a set. For instance,

>>> set([1, 2, 3, 2, 2, 4, 5, 1, 5])
set([1, 2, 3, 4, 5])
But first you need a list of letters so can make a set out of it.

Split the list of words into a list of letters

My first idea was to use list comprehensions. You can split a single word into letters like this:

>>> [ x for x in 'hello' ]
['h', 'e', 'l', 'l', 'o']

It took a bit of fiddling to get the right syntax to apply that to every word in the list:

>>> [[c for c in w] for w in keyword.kwlist]
[['a', 'n', 'd'], ['a', 's'], ['a', 's', 's', 'e', 'r', 't'], ... ]

Update: Dave Foster points out that [list(w) for w in keyword.kwlist] is another way, simpler and cleaner way than the double list comprehension.

That's a list of lists, so it needs to be flattened into a single list of letters before we can turn it into a set.

Flatten the list of lists

There are lots of ways to flatten a list of lists. Here are four of them:

[item for sublist in [[c for c in w] for w in keyword.kwlist] for item in sublist]

reduce(lambda x,y: x+y, [[c for c in w] for w in keyword.kwlist])

import itertools
list(itertools.chain.from_iterable([[c for c in w] for w in keyword.kwlist]))

sum([[c for c in w] for w in keyword.kwlist], [])

That last one, using sum(), makes use of the fact that Python uses + for list concatenation -- in other words, that [1, 2, 3] + [4, 5, 6] is [1, 2, 3, 4, 5, 6]. But the first method (item for sublist in) is faster: see Making a flat list out of list of lists in Python on StackOverflow. And another StackOverflow thread has a nice script for plotting speed vs. list size of various flatteners.

A simpler way of making the set

But it turns out none of this list comprehension stuff is needed anyway. set('word') splits words into letters already:

>>> set('bubble')
set(['e', 'b', 'u', 'l'])
Ignore the order -- elements of a set often end up displaying in some strange order. The important thing is that it has all the letters and no repeats.

Now we have an easy way of making a set containing the letters in one word. But how do we apply that to a list of words?

Again I initially tried using list comprehensions, then realized there's an easier way. Given a list of strings, it's trivial to join them into a single string using ''.join(). And that gives us our set of letters within keywords:

>>> set(''.join(keyword.kwlist))
set(['a', 'c', 'b', 'e', 'd', 'g', 'f', 'i', 'h', 'k', 'm', 'l', 'o', 'n', 'p', 's', 'r', 'u', 't', 'w', 'y', 'x'])

What letters are not in the set?

Almost done! But the original problem was to find the letters not in keywords. We can do that by subtracting this set from the set of all letters from a to z. How do we get that? The string module will give us a list:

>>> string.lowercase
'abcdefghijklmnopqrstuvwxyz'

You could also use a list comprehension and ord and chr (alas, range won't give you a range of letters directly):

>>> [chr(i) for i in range(ord('a'), ord('z')+1)]
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
It's a bit longer, but doesn't require an import.

Now that you have your a-z set, just subtract the two sets:

>>> set(string.lowercase[:]) - set(''.join(keyword.kwlist))
set(['q', 'j', 'z', 'v'])

So the only letters not used in Python keywords are q, j, z and v.

Just a useless little ditty, really ... but I thought it was a fun exercise, so maybe you will too.

Tags: ,
[ 13:36 Mar 19, 2013    More programming | permalink to this entry | ]

Comments via Disqus:

blog comments powered by Disqus