Migrate a sqlite3 Flask App to Postgresql
The New Mexico legislature is in session again, which means the New Mexico Bill Tracker I wrote last year is back in season. But I guess the word has gotten out, because this year, I started seeing a few database errors. Specifically, "sqlite3.OperationalError: database is locked".
It turns out that even read queries on an sqlite3 database inside flask and sqlalchemy can sometimes keep the database open indefinitely. Consider something like:
userbills = user.get_bills()    # this does a read query

# Do some slow operations that don't involve the database at all
for bill in userbills:
    slow_update_involving_web_scraping(bill)

# Now bills are all updated; add and commit them.
# Here's where the write operations start.
for bill in userbills:
    db.session.add(bill)
db.session.commit()
I knew better than to open a write query that might keep the database open during all those long running operations. But apparently, when using sqlite3, even the initial query of the database to get the user's bill list opens the database and keeps it open ... until when? Can you close it manually, then reopen it when you're ready? Does it help to call db.session.commit() after the read query? No one seems to know, and it's not obvious how to test to find out.
I've suspected for a long time that sqlite was only a temporary solution. While developing the billtracker, I hit quite a few difficulties where the answer turned out to be "well, this would be easy in a real database, but sqlite doesn't support that". I figured I'd eventually migrate to postgresql. But I'm such a database newbie that I'd been putting it off.
And rightly so. It turns out that migrating an existing database from sqlite3 to postgresql isn't something that gets written about much; I really couldn't find any guides on it. Apparently everybody but me just chooses the right database to begin with? Anyway, here are the steps on Debian. Obviously, install postgresql first.
Create a User and a Database
Postgresql has its own notion of users, which you need to create. At least on Debian, the default is that if you create a postgres user named martha, then the Linux user martha on the same machine can access databases that the postgres user martha has access to. This is controlled by the "peer" auth method, which you can read about in the postgresql documentation on pg_hba.conf.
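To see peer auth in action, here's a minimal sketch of connecting from Python, assuming the psycopg2 driver and placeholder database and user names:

import psycopg2

# Peer auth in action: omitting host= makes psycopg2 connect over the
# local Unix socket, so postgresql trusts the Linux username and no
# password is needed. Run this as the Linux user martha.
conn = psycopg2.connect(dbname="dbname", user="martha")
print(conn.get_dsn_parameters())    # show what we actually connected to
conn.close()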
First su to the postgres Linux user and run psql:
$ sudo su - postgres
$ psql
Inside psql, create a postgresql user with the same name as your flask user, and create a database for that user:
CREATE USER myflaskuser WITH PASSWORD 'password';
ALTER ROLE myflaskuser SET client_encoding TO 'utf8';
ALTER ROLE myflaskuser SET default_transaction_isolation TO 'read committed';
ALTER ROLE myflaskuser SET timezone TO 'UTC';
CREATE DATABASE dbname;
GRANT ALL PRIVILEGES ON DATABASE dbname TO myflaskuser;
If you like, you can also create a user yourusername and give it access to the same database, to make debugging easier.
With the database created, the next step is to migrate the old data from the sqlite database.
pgloader (if you have a very recent pgloader)
Using sqlalchemy in my flask app meant that I could use
flask db upgrade
to create the database schema in any database I chose. It does a lovely job of creating an empty database. Unfortunately, that's no help if you already have an existing database full of user accounts.
Some people suggested exporting data in either SQL or CSV format, then importing it into postgresql. Bad idea. There are many incompatibilities between the two databases: identifiers that work in sqlite but not in postgresql (like "user", which is a reserved word in postgres but a common table name in flask-based apps), capitalization of column names, incompatible date formats, and probably many more.
A program called pgloader takes care of many (but not all) of the incompatibilities. Create a file -- I'll call it migrate.pgloader -- like this:
load database
    from 'latest-sqlite-file.db'
    into postgresql:///new_db_name

with include drop, quote identifiers, create tables,
     create indexes, reset sequences

set work_mem to '16MB', maintenance_work_mem to '512 MB';
Then, from a Linux user who has access to the database (e.g. the myflaskuser you created earlier), run:
pgloader migrate.pgloader
That worked nicely on my Ubuntu 19.10 desktop, which has pgloader 3.6.1. It failed utterly on the server, which is running Debian stable and pgloader 3.3.2. Building the latest pgloader from source didn't work on Debian either; it's based on Common Lisp, and the older CL on Debian dumped me into some kind of assembly debugger when I tried to build pgloader. Rather than build CL from source too, I looked for another option.
On an Older OS: Use pgloader Remotely
Postgresql can take commands from remote machines. So you can configure postgresql to accept remote connections, then run the migration from a machine with a new enough pgloader version.
There are two files to edit. The location of postgresql's configuration directory varies with version, so do a
locate pg_hba.conf
to find it.
In that directory, first edit pg_hba.conf and add these lines to the end to allow network socket connections from IPv4 and IPv6 addresses:
host    all    all    0.0.0.0/0    md5
host    all    all    ::/0         md5
In the same directory, edit postgresql.conf and search for listen_addr. Comment out the localhost line if it's uncommented, and add this to allow connections from anywhere, not just localhost:
listen_addresses = '*'
Then restart the database with
service postgresql restart
Modify the migrate.pgloader file from the previous section so the "into" line looks like:
into postgresql://username:password@host/dbname
The username there is the postgres username, if you made that different from the Unix username. You need to use a password because postgres is no longer using peer auth (see that postgres documentation on pg_hba.conf I linked earlier).
Assuming this works, you're done with the remote connection part. Once the migration below is finished, if you don't need remote database connections for your app, you can edit postgresql.conf, comment out that
listen_addresses = '*'
line, and restart the database again with service postgresql restart. Don't remove the two lines you added in pg_hba.conf, though; flask apparently needs them.
You're ready for the migration. Make sure you have the latest copy of the server's sqlite database, then, from your desktop, run:
pgloader migrate.pgloader
Migrate Autoincrements to Sequences
But that's not enough. If you're using any integer primary keys that autoincrement -- a pretty common database model -- postgresql doesn't understand that. Instead, it has sequence objects. You need to define a sequence, tie it to a table, and tell postgresql that when it adds a new row to the table, the default value of id is the next value from the corresponding sequence. Here's how to do that for the table named "user":
CREATE SEQUENCE user_id_seq OWNED BY "user".id;
ALTER TABLE "user" ALTER COLUMN id SET DEFAULT nextval('user_id_seq');
SELECT setval(pg_get_serial_sequence('user', 'id'), coalesce(max(id)+1, 1), false) FROM "user";
Note the quotes around "user": they're needed because user is a postgresql reserved word. Repeat these three lines for every table in your database; you don't need the quotes around any table name except user (or any other name that happens to be a reserved word).
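If you have a lot of tables, a short script saves retyping. Here's a rough sketch that runs the same three statements for each table from Python with psycopg2; the table list and database name are placeholders, and it assumes every table's primary key is an integer column named id:

import psycopg2

# Run the three sequence-fixup statements for each table. Identifiers
# are double-quoted so reserved words like "user" work too.
tables = ["user", "bill", "legislator"]    # your table names here

conn = psycopg2.connect(dbname="dbname")
cur = conn.cursor()
for table in tables:
    seq = f"{table}_id_seq"
    cur.execute(f'CREATE SEQUENCE {seq} OWNED BY "{table}".id;')
    cur.execute(f"""ALTER TABLE "{table}"
                    ALTER COLUMN id SET DEFAULT nextval('{seq}');""")
    cur.execute(f"""SELECT setval(pg_get_serial_sequence('{table}', 'id'),
                           coalesce(max(id)+1, 1), false) FROM "{table}";""")
conn.commit()
cur.close()
conn.close()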
Incidentally, I've been told that using autoincrement/sequence primary keys isn't best practice, because it can be a bottleneck if lots of different objects are being created at once. I used it because all the models I was following when I started with flask worked that way, but eventually I plan to try switching to some other kind of unique primary key.
Update: Turns out there was another problem with the sequences, and it was pretty annoying. I ended up with a bunch of indices with names like "idx_15517_ix_user_email" when they should have been "ix_user_email". The database superficially worked fine, but havoc ensues if you ever need to do a flask/sqlalchemy/alembic migration, since sqlalchemy doesn't know anything about those indices with the funny numeric names. It's apparently possible to rename indices in postgresql, but it's a tricky operation that has to be done by hand for each index.
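For what it's worth, postgresql's ALTER INDEX ... RENAME TO is the relevant command. If you hit the same problem, something like this untested sketch might automate the drudgery; it assumes pgloader's idx_<number>_<original name> pattern and that your tables live in the public schema, so look at what it plans to do (and back up) before trusting it:

import re
import psycopg2

# Find indices named like idx_15517_ix_user_email and rename them
# back to their original names (ix_user_email).
conn = psycopg2.connect(dbname="dbname")
cur = conn.cursor()
cur.execute("SELECT indexname FROM pg_indexes WHERE schemaname = 'public';")
for (name,) in cur.fetchall():
    m = re.match(r'idx_\d+_(.+)', name)
    if m:
        print("renaming", name, "to", m.group(1))
        cur.execute(f'ALTER INDEX "{name}" RENAME TO "{m.group(1)}";')
conn.commit()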
Now the database should be ready to test.
Test
Your flask app probably has something like this in config.py:
SQLALCHEMY_DATABASE_URI = os.environ.get('DATABASE_URL') or \
    'sqlite:///' + os.path.join(basedir, 'dbname.db')
If so, you can
export DATABASE_URL=postgresql:///dbname
and then test it as you usually would. If you normally test on a local machine and not on the server, remember you can tell flask's test server to accept connections from remote machines with
flask run --host=0.0.0.0
Database Backups
You're backing up your database, right? That's easier in sqlite where you can just copy the db file.
From the command line, you can back up a postgresql database with:
pg_dump dbname > dbname-backup.pg
You can do that from Python in a subprocess:
import subprocess

with open(backup_file, 'w') as fp:
    subprocess.call(["pg_dump", dbname], stdout=fp)
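If you want backups on a schedule, here's a sketch of a small helper that datestamps each dump; the function name and path handling are just illustrative:

import datetime
import os
import subprocess

def backup_postgres(dbname, backup_dir="."):
    # Dump the given database to a dated file, e.g. dbname-2020-02-01.pg
    datestr = datetime.date.today().isoformat()
    backup_file = os.path.join(backup_dir, f"{dbname}-{datestr}.pg")
    with open(backup_file, 'w') as fp:
        subprocess.call(["pg_dump", dbname], stdout=fp)
    return backup_file

Since pg_dump's default output is plain SQL, restoring is just a matter of feeding the file back to psql.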
Verify You're Using The New Database
I had some problems with that DATABASE_URL setting; I'd never used it so I didn't realize that it wasn't in the right place and didn't actually work. So I ran through my migration steps, changed DATABASE_URL, thought I was done, and realized later that the app was still running off sqlite3.
It's better to know for sure what your app is running. For instance, you can add a route to routes.py that prints details like that. You can print app.config["SQLALCHEMY_DATABASE_URI"]. That's enough in theory, but I wanted to know for sure. It turns out str(db.session.get_bind()) will print the connection the flask app's database is actually using. So I added a route that prints both, plus some other information about the running app.
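Something like this minimal sketch does the job, assuming the usual flask + flask-sqlalchemy layout where routes.py can import app and db; the route name is arbitrary, and you'll want to remove or protect it on a public server:

from app import app, db

@app.route('/dbinfo')
def dbinfo():
    # Show both the configured URI and the live connection.
    return (f'Configured: {app.config["SQLALCHEMY_DATABASE_URI"]}<br>\n'
            f'Actually using: {db.session.get_bind()}')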
Whew! I was a bit surprised that migrating was as tricky as it was, and that there wasn't more documentation for it. Happy migrations, everyone.