Shallow Thoughts : tags : database

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Sat, 01 Feb 2020

Migrate a sqlite3 Flask App to Postgresql

The New Mexico legislature is in session again, which means the New Mexico Bill Tracker I wrote last year is back in season. But I guess the word has gotten out, because this year, I started seeing a few database errors. Specifically, "sqlite3.OperationalError: database is locked".

It turns out that even read queries on an sqlite3 database inside flask and sqlalchemy can sometimes keep the database open indefinitely. Consider something like:

    userbills = user.get_bills()    # this does a read query

    # Do some slow operations that don't involve the database at all
    for bill in userbills:
        slow_update_involving_web_scraping(bill)

    # Now bills are all updated; add and commit them.
    # Here's where the write operations start.
    for bill in userbills:
        db.session.add(bill)
    db.session.commit()

I knew better than to open a write query that might keep the database open during all those long running operations. But apparently, when using sqlite3, even the initial query of the database to get the user's bill list opens the database and keeps it open ... until when? Can you close it manually, then reopen it when you're ready? Does it help to call db.session.commit() after the read query? No one seems to know, and it's not obvious how to test to find out.

I've suspected for a long time that sqlite was only a temporary solution. While developing the billtracker, I hit quite a few difficulties where the answer turned out to be "well, this would be easy in a real database, but sqlite doesn't support that". I figured I'd eventually migrate to postgresql. But I'm such a database newbie that I'd been putting it off.

And rightly so. It turns out that migrating an existing database from sqlite3 to postgresql isn't something that gets written about much; I really couldn't find any guides on it. Apparently everybody but me just chooses the right database to begin with? Anyway, here are the steps on Debian. Obviously, install postgresql first.

Create a User and a Database

Postgresql has its own notion of users, which you need to create. At least on Debian, the default is that if you create a postgres user named martha, then the Linux user martha on the same machine can access databases that the postgres user martha has access to. This is controlled by the "peer" auth method, which you can read about in the postgresql documentation on pg_hba.conf.

First su to the postgres Linux user and run psql:

$ sudo su - postgres
$ psql

Inside psql, create a postgresql user with the same name as your flask user, and create a database for that user:

CREATE USER myflaskuser WITH PASSWORD 'password';
ALTER ROLE myflaskuser SET client_encoding TO 'utf8';
ALTER ROLE myflaskuser SET default_transaction_isolation TO 'read committed';
ALTER ROLE myflaskuser SET timezone TO 'UTC';

CREATE DATABASE dbname;
GRANT ALL PRIVILEGES ON DATABASE dbname TO myflaskuser;

If you like, you can also create a user yourusername and give it access to the same database, to make debugging easier.

With the database created, the next step is to migrate the old data from the sqlite database.

pgloader (if you have a very recent pgloader)

Using sqlalchemy in my flask app meant that I could use flask db upgrade to create the database schema in any database I chose. It does a lovely job of creating an empty database. Unfortunately, that's no help if you already have an existing database full of user accounts.

Some people suggested exporting data in either SQL or CSV format, then importing it into postgresql. Bad idea. There are many incompatibilities between the two databases: identifiers that work in sqlite but not in postgresql (like "user", which is a reserved word in postgres but a common table name in flask-based apps), capitalization of column names, incompatible date formats, and probably many more.

A program called pgloader takes care of many (but not all) of the incompatibilities. Create a file -- I'll call it migrate.pgloader -- like this:

load database
    from 'latest-sqlite-file.db'
    into postgresql:///new_db_name

with include drop, quote identifiers, create tables, create indexes, reset sequences

set work_mem to '16MB', maintenance_work_mem to '512 MB';

Then, from a Linux user who has access to the database (e.g. the myflaskuser you created earlier), run pgloader migrate.pgloader.

That worked nicely on my Ubuntu 19.10 desktop, which has pgloader 3.6.1. It failed utterly on the server, which is running Debian stable and pgloader 3.3.2. Building the latest pgloader from source didn't work on Debian either; it's based on Common Lisp, and the older CL on Debian dumped me into some kind of assembly debugger when I tried to build pgloader. Rather than build CL from source too, I looked for another option.

On an Older OS: Use pgloader Remotely

Postgresql can take commands from remote machines. So you can configure postgresql to accept remote connections, then run the migration from a machine with a new enough pgloader version.

There are two files to edit. The location of postgresql's configuration directory varies with version, so do a locate pg_hba.conf to find it. In that directory, first edit pg_hba.conf and add these lines to the end to allow net socket connections from IP4 and IP6:

host  all     all   0.0.0.0/0     md5
host  all     all   ::/0          md5

In the same directory, edit postgresql.conf and search for listen_addr. Comment out the localhost line if it's uncommented, and add this to allow connections from anywhere, not just localhost:

listen_addresses = '*'

Then restart the database with

service postgresql restart

Modify the migrate.pgloader file from the previous section so the "into" line looks like

    into postgresql://username:password@host/dbname
The username there is the postgres username, if you made that different from the Unix username. You need to use a password because postgres is no longer using peer auth (see that postgres documentation file I linked earlier).

Assuming this You're done with the remote connection part. If you don't need remote database connections for your app, you can now edit postgresql.conf, comment out that listen_addresses = '*' line, and restart the database again with service postgresql restart. Don't remove the two lines you added in pg_hba.conf; flask apparently needs them.

You're ready for the migration. Make sure you have the latest copy of the server's sqlite database, then, from your desktop, run:

pgloader migrate.pgloader

Migrate Autoincrements to Sequences

But that's not enough. If you're using any integer primary keys that autoincrement -- a pretty common database model -- postgresql doesn't understand that. Instead, it has sequence objects. You need to define a sequence, tie it to a table, and tell postgresql that when it adds a new object to the table, the default value of id is the maximum number in the corresponding sequence. Here's how to do that for the table named "user":

CREATE SEQUENCE user_id_seq OWNED by "user".id;
ALTER TABLE "user" ALTER COLUMN id SET default nextval('user_id_seq');
SELECT setval(pg_get_serial_sequence('user', 'id'), coalesce(max(id)+1,1), false) FROM "user";

Note the quotes around "user" because otherwise user is a postgresql reserved word. Repeat these three lines for every table in your database, except that you don't need the quotes around any table name except user.

Incidentally, I've been told that using autoincrement/sequence primary keys isn't best practice, because it can be a bottleneck if lots of different objects are being created at once. I used it because all the models I was following when I started with flask worked that way, but eventually I plan try to switch to using some other unique primary key.

Update: Turns out there was another problem with the sequences, and it was pretty annoying. I ended up with a bunch of indices with names like "idx_15517_ix_user_email" when they should have been "ix_user_email". The database superficially worked fine, but it havoc ensues if you ever need to do a flask/sqlalchemy/alembic migration, since sqlalchemy doesn't know anything about those indices with the funny numeric names. It's apparently possible to rename indices in postgresql, but it's a tricky operation that has to be done by hand for each index.

Now the database should be ready to test.

Test

Your flask app probably has something like this in config.py:

    SQLALCHEMY_DATABASE_URI = os.environ.get('DATABASE_URL') or \
        'sqlite:///' + os.path.join(basedir, 'dbname.db')

If so, you can export DATABSE_URL=postgresql:///dbname and then test it as you usually would. If you normally test on a local machine and not on the server, remember you can tell flask's test server to accept connections from remote machines with flask run --host=0.0.0.0

Database Backups

You're backing up your database, right? That's easier in sqlite where you can just copy the db file.

From the command line, you can back up a postgresql database with: pg_dump dbname > dbname-backup.pg You can do that from Python in a subprocess:

    with open(backup_file, 'w') as fp:
        subprocess.call(["pg_dump", dbname], stdout=fp)

Verify You're Using The New Database

I had some problems with that DATABASE_URL setting; I'd never used it so I didn't realize that it wasn't in the right place and didn't actually work. So I ran through my migration steps, changed DATABASE_URL, thought I was done, and realized later that the app was still running off sqlite3.

It's better to know for sure what your app is running. For instance, you can add a route to routes.py that prints details like that.

You can print app.config["SQLALCHEMY_DATABASE_URI"]. That's enough in theory, but I wanted to know for sure. Turns out str(db.session.get_bind()) will print the connection the flask app's database is actually using. So I added a route that prints both, plus some other information about the running app.

Whew! I was a bit surprised that migrating was as tricky as it was, and that there wasn't more documentation for it. Happy migrations, everyone.

Tags: , , , , ,
[ 12:34 Feb 01, 2020    More tech/web | permalink to this entry | comments ]

Tue, 15 Sep 2015

Hacking / Customizing a Kobo Touch ebook reader: Part II, Python

I wrote last week about tweaking a Kobo e-reader's sqlite database by hand.

But who wants to remember all the table names and type out those queries? I sure don't. So I wrote a Python wrapper that makes it much easier to interact with the Kobo databases.

Happily, Python already has a module called sqlite3. So all I had to do was come up with an API that included the calls I typically wanted -- list all the books, list all the shelves, figure out which books are on which shelves, and so forth.

The result was kobo_utils.py, which includes a main function that can list books, shelves, or shelf contents.

You can initialize kobo_utils like this:

import kobo_utils

koboDB = KoboDB("/path/where/your/kobo/is/mounted")
koboDB.connect("/path/to/KoboReader.sqlite")
connect() throws an exception if it can't find the .sqlite file.

Then you can list books thusly:

koboDB.list_books()
or list shelf names:
koboDB.list_shelves()
or use print_shelf which books are on which shelves:
shelves = koboDB.get_dlist("Shelf", selectors=[ "Name" ])
for shelf in shelves:
    print shelf["Name"]

What I really wanted, though, was a way to organize my library, taking the tags in each of my epub books and assigning them to an appropriate shelf on the Kobo, creating new shelves as needed. Using kobo_utils.py plus the Python epub library I'd already written, that ended up being quite straightforward: shelves_by_tag.

Tags: , , , , ,
[ 20:38 Sep 15, 2015    More tech | permalink to this entry | comments ]

Thu, 03 Sep 2015

Hacking / Customizing a Kobo Touch ebook reader: Part I, sqlite

I've been enjoying reading my new Kobo Touch quite a lot. The screen is crisp, clear and quite a bit whiter than my old Nook; the form factor is great, it's reasonably responsive (though there are a few places on the screen where I have to tap harder than other places to get it to turn the page), and I'm happy with the choice of fonts.

But as I mentioned in my previous Kobo article, there were a few tweaks I wanted to make; and I was very happy with how easy it was to tweak, compared to the Nook. Here's how.

Mount the Kobo

When you plug the Kobo in to USB, it automatically shows up as a USB-Storage device once you tap "Connect" on the Kobo -- or as two storage devices, if you have an SD card inserted.

Like the Nook, the Kobo's storage devices show up without partitions. For instance, on Linux, they might be /dev/sdb and /dev/sdc, rather than /dev/sdb1 and /dev/sdc1. That means they also don't present UUIDs until after they're already mounted, so it's hard to make an entry for them in /etc/fstab if you're the sort of dinosaur (like I am) who prefers that to automounters.

Instead, you can use the entry in /dev/disk/by-id. So fstab entries, if you're inclined to make them, might look like:

/dev/disk/by-id/usb-Kobo_eReader-3.16.0_N905K138254971:0 /kobo   vfat user,noauto,exec,fmask=133,shortname=lower 0 0
/dev/disk/by-id/usb-Kobo_eReader-3.16.0_N905K138254971:1 /kobosd vfat user,noauto,exec,fmask=133,shortname=lower 0 0

One other complication, for me, was that the Kobo is one of a few devices that don't work through my USB2 powered hub. Initially I thought the Kobo wasn't working, until I tried a cable plugged directly into my computer. I have no idea what controls which devices work through the hub and which ones don't. (The Kobo also doesn't give any indication when it's plugged in to a wall charger, nor does

The sqlite database

Once the Kobo is mouted, ls -a will show a directory named .kobo. That's where all the good stuff is: in particular, KoboReader.sqlite, the device's database, and Kobo/Kobo eReader.conf, a human-readable configuration file.

Browse through Kobo/Kobo eReader.conf for your own amusement, but the remainder of this article will be about KoboReader.sqlite.

I hadn't used sqlite before, and I'm certainly no SQL expert. But a little web searching and experimentation taught me what I needed to know.

First, make a local copy of KoboReader.sqlite, so you don't risk overwriting something important during your experimentation. The Kobo is apparently good at regenerating data it needs, but you might lose information on books you're reading.

To explore the database manually, run: sqlite3 KoboReader.sqlite

Some useful queries

Here are some useful sqlite commands, which you can generalize to whatever you want to search for on your own Kobo. Every query (not .tables) must end with a semicolon.

Show all tables in the database:

.tables
The most important ones, at least to me, are content (all your books), Shelf (a list of your shelves/collections), and ShelfContent (the table that assigns books to shelves).

Show all column names in a table:

PRAGMA table_info(content);
There are a lot of columns in content, so try PRAGMA table_info(content); to see a much simpler table.

Show the names of all your shelves/collections:

SELECT Name FROM Shelf;

Show everything in a table:

SELECT * FROM Shelf;

Show all books assigned to shelves, and which shelves they're on:

SELECT ShelfName,ContentId FROM ShelfContent;
ContentId can be a URL to a sideloaded book, like file:///mnt/sd/TheWitchesOfKarres.epub, or a UUID like de98dbf6-e798-4de2-91fc-4be2723d952f for books from the Kobo store.

Show all books you have installed:

SELECT Title,Attribution,ContentID FROM content WHERE BookTitle is null ORDER BY Title;
One peculiarity of Kobo's database: each book has lots of entries, apparently one for each chapter. The entries for chapters have the chapter name as Title, and the book title as BookTitle. The entry for the book as a whole has BookTitle empty, and the book title as Title. For example, I have file:///mnt/sd/earnest.epub sideloaded:
sqlite> SELECT Title,BookTitle from content WHERE ContentID LIKE "%hamlet%";
HAMLET, PRINCE OF DENMARK|Hamlet
PERSONS REPRESENTED.|Hamlet
ACT I.|Hamlet
Scene II. Elsinore. A room of state in the Castle.|Hamlet
Scene III. A room in Polonius's house.|Hamlet
Scene IV. The platform.|Hamlet
Scene V. A more remote part of the Castle.|Hamlet
Act II.|Hamlet
  [ ... and so on ... ]
ACT V.|Hamlet
Scene II. A hall in the Castle.|Hamlet
Hamlet|
Each of these entries has Title set to the name of the chapter (an act in the play) and BookTitle set to Hamlet, except for the final entry, which has Title set to Hamlet and BookTitle set to nothing. That's why you need that query WHERE BookTitle is null if you just want a list of your books.

Show all books by an author:

SELECT Title,Attribution,ContentID FROM content WHERE BookTitle is null
AND Attribution LIKE "%twain%" ORDER BY Title;
Attribution is where the author's name goes. LIKE %% searches are case insensitive.

Update: how to change a field

I realized I didn't include how to change a field, and that seems to fit better with this article rather than writing a whole new blog post just for that.

The Kobo doesn't show series order. So to find a specific book, and then update its title to include the series and series number, do something like this:

SELECT Title,Attribution,ContentID,BookTitle FROM content WHERE Attribution LIKE "%Doyle%" AND Title LIKE "%Adventures%";
UPDATE content SET Title="Sherlock Stories 1: The Adventures of Sherlock Holmes" WHERE Attribution LIKE "%Doyle%" AND Title LIKE "%Adventures%";
    

To delete an entry -- in this case I had two copies of the same book and needed to specify the ContentID of the older one:

DELETE from content WHERE Attribution LIKE "%Doyle%" AND ContentID="file:///mnt/sd/memoirs-holmes.epub";
    

Of course, it's a lot handier to have a program that knows these queries so you don't have to type them in every time (especially since the sqlite3 app has no history or proper command-line editing). But this has gotten long enough, so I'll write about that separately.

Tags: , , , , ,
[ 19:11 Sep 03, 2015    More tech | permalink to this entry | comments ]

Sun, 25 Nov 2012

Keeping track of the deer

[DEER DB] I saw this sign on a fence around a debris dam.

It's good to see that LA County cares so much about keeping track of the local deer population that it maintains a special database for them.

Tags: ,
[ 11:20 Nov 25, 2012    More humor | permalink to this entry | comments ]

Fri, 15 Oct 2010

Snakes on a Couch! Using Python with CouchDB

Part II of my CouchDB tutorial is out at Linux Planet. In it, I use Python and CouchDB to write a simple application that keeps track of which restaurants you've been to recently, and to suggest new places to eat where you haven't been.

Snakes on a Couch, Part 2: Where do you want to eat?

Tags: , , , ,
[ 21:00 Oct 15, 2010    More writing | permalink to this entry | comments ]

Thu, 23 Sep 2010

Snakes on a Couch! Using Python with CouchDB

I've been learning CouchDB, the hot NoSQL database, as part of my new job. It's interesting -- a very different mindset compared to classic databases like MySQL.

There's a fairly good Python package for it, python-couchdb ... but the documentation is somewhat incomplete and there's very little else written about it, and virtually no sample code to steal.

That makes it a perfect topic for a Linux Planet tutorial! So here it is, Part 1:

Snakes on a Couch! Using Python with CouchDB.

I have a rather fun application for the database I introduce in the article, but you'll have to wait until Part 2, two weeks from now, to see the details.

Tags: , , , ,
[ 11:55 Sep 23, 2010    More writing | permalink to this entry | comments ]