The New Mexico legislature is in session again, which means the
New Mexico Bill Tracker
I wrote last year is back in season. But I guess the word has gotten
out, because this year, I started seeing a few database errors.
Specifically, "sqlite3.OperationalError: database is locked".
It turns out that even read queries on an sqlite3 database inside
flask and sqlalchemy can sometimes keep the database open
indefinitely. Consider something like:
userbills = user.get_bills()    # this does a read query

# Do some slow operations that don't involve the database at all
for bill in userbills:
    slow_update_involving_web_scraping(bill)

# Now bills are all updated; add and commit them.
# Here's where the write operations start.
for bill in userbills:
    db.session.add(bill)
db.session.commit()
I knew better than to open a write query that might keep the database
open during all those long running operations. But apparently, when
using sqlite3, even the initial query of the database to get the
user's bill list opens the database and keeps it open ... until
when? Can you close it manually, then reopen it when you're ready?
Does it help to call db.session.commit()
after the read query? No one seems to know, and it's not obvious
how to test to find out.
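For what it's worth, here's the sort of thing I'd try (completely
untested): force the read query's results into memory, close the
session so sqlite isn't holding the database during the slow part,
then merge the objects back into a fresh session before committing.
close() and merge() are standard sqlalchemy session calls; whether
this actually cures the locking, I can't say.
# Untested sketch: end the read transaction before the slow work.
userbills = list(user.get_bills())   # force the query to run now
db.session.close()                   # release the connection; objects become detached

for bill in userbills:
    slow_update_involving_web_scraping(bill)

# Reattach the detached (and now modified) objects, then commit.
for bill in userbills:
    db.session.merge(bill)
db.session.commit()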
I've suspected for a long time that sqlite was only a temporary
solution. While developing the billtracker, I hit quite a few
difficulties where the answer turned out to be "well, this would be
easy in a real database, but sqlite doesn't support that". I figured
I'd eventually migrate to postgresql. But I'm such a database newbie
that I'd been putting it off.
And rightly so. It turns out that migrating an existing database
from sqlite3 to postgresql isn't something that gets written
about much; I really couldn't find any guides on it.
Apparently everybody but me just chooses the right
database to begin with? Anyway, here are the steps on Debian.
Obviously, install postgresql first.
Create a User and a Database
Postgresql has its own notion of users, which you need to create.
At least on Debian, the default is that if you create a postgres
user named martha, then the Linux user martha on the same machine
can access databases that the postgres user martha has access to.
This is controlled by the "peer" auth method, which you can read about in
the
postgresql documentation on pg_hba.conf.
First su to the postgres Linux user and run psql:
$ sudo su - postgres
$ psql
Inside psql, create a postgresql user with the same name as your
flask user, and create a database for that user:
CREATE USER myflaskuser WITH PASSWORD 'password';
ALTER ROLE myflaskuser SET client_encoding TO 'utf8';
ALTER ROLE myflaskuser SET default_transaction_isolation TO 'read committed';
ALTER ROLE myflaskuser SET timezone TO 'UTC';
CREATE DATABASE dbname;
GRANT ALL PRIVILEGES ON DATABASE dbname TO myflaskuser;
If you like, you can also create a user yourusername and give
it access to the same database, to make debugging easier.
With the database created, the next step is to migrate the old
data from the sqlite database.
pgloader (if you have a very recent pgloader)
Using sqlalchemy in my flask app meant that I could use
flask db upgrade
to create the database schema in any
database I chose. It does a lovely job of creating an empty database.
Unfortunately, that's no help
if you already have an existing database full of user accounts.
Some people suggested exporting data in either SQL or CSV format,
then importing it into postgresql. Bad idea. There are many
incompatibilities between the two databases: identifiers that
work in sqlite but not in postgresql (like "user", which is a reserved
word in postgres but a common table name in flask-based apps),
capitalization of column names, incompatible date formats, and
probably many more.
A program called pgloader takes care of many (but not all)
of the incompatibilities.
Create a file -- I'll call it migrate.pgloader --
like this:
load database
from 'latest-sqlite-file.db'
into postgresql:///new_db_name
with include drop, quote identifiers, create tables, create indexes, reset sequences
set work_mem to '16MB', maintenance_work_mem to '512 MB';
Then, from a Linux user who has access to the database (e.g. the
myflaskuser you created earlier),
run pgloader migrate.pgloader.
That worked nicely on my Ubuntu 19.10 desktop, which has pgloader
3.6.1. It failed utterly on the server, which is running Debian stable
and pgloader 3.3.2.
Building the latest pgloader from source didn't work on Debian either;
it's based on Common Lisp, and the older CL on Debian dumped me into
some kind of assembly debugger when I tried to build pgloader. Rather
than build CL from source too, I looked for another option.
On an Older OS: Use pgloader Remotely
Postgresql can take commands from remote machines. So you can
configure
postgresql to accept remote connections, then run the migration
from a machine with a new enough pgloader version.
There are two files to edit. The location of postgresql's
configuration directory varies with version, so do a
locate pg_hba.conf
to find it.
In that directory, first edit pg_hba.conf
and add these lines to the end to allow net socket connections
from IP4 and IP6:
host all all 0.0.0.0/0 md5
host all all ::/0 md5
In the same directory, edit postgresql.conf and
search for listen_addresses.
Comment out the localhost line if it's uncommented, and add this to allow
connections from anywhere, not just localhost:
listen_addresses = '*'
Then restart the database with
service postgresql restart
Modify the migrate.pgloader file from the previous section
so the "into" line looks like
into postgresql://username:password@host/dbname
The
username there is the postgres username, if you made that
different from the Unix username. You need to use a password because
postgres is no longer using
peer auth (see that postgres
documentation file I linked earlier).
Assuming this worked, the remote connection setup is done. Once the
migration (below) has finished, if you don't need remote database
connections for your app, you can edit postgresql.conf again, comment
out that listen_addresses = '*' line, and restart the database once
more with service postgresql restart.
Don't remove the two lines you added in pg_hba.conf, though;
flask apparently needs them.
You're ready for the migration.
Make sure you have the latest copy of the server's sqlite database,
then, from your desktop, run:
pgloader migrate.pgloader
Migrate Autoincrements to Sequences
But that's not enough. If you're using any integer primary keys that
autoincrement -- a pretty common database model -- postgresql doesn't
understand that. Instead, it has sequence objects.
You need to define a sequence, tie it to a table, and tell postgresql
that when it adds a new row to the table,
the default value of id is the next value from the
corresponding sequence. Here's how to do that for the table named "user":
CREATE SEQUENCE user_id_seq OWNED by "user".id;
ALTER TABLE "user" ALTER COLUMN id SET default nextval('user_id_seq');
SELECT setval(pg_get_serial_sequence('user', 'id'), coalesce(max(id)+1,1), false) FROM "user";
Note the quotes around "user" because otherwise user is a postgresql
reserved word. Repeat these three lines for every table in your database,
except that you don't need the quotes around any table name except user.
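If you have a lot of tables, you can script the repetition. Here's a
rough sketch, not something from my actual migration, using psycopg2:
the table names are made-up examples, and it assumes every table has
an integer id primary key and that peer auth lets the current Linux
user connect.
# Hypothetical helper: run the three sequence statements for each table.
# Table names here are examples; substitute your own. Assumes each table
# has an integer "id" column and that peer auth lets this user connect.
# (Plain string interpolation is fine here because the names are
# hard-coded, not user input.)
import psycopg2

tables = ["user", "bill", "legislator"]

conn = psycopg2.connect(dbname="dbname")
cur = conn.cursor()
for table in tables:
    seq = table + "_id_seq"
    cur.execute('CREATE SEQUENCE %s OWNED BY "%s".id;' % (seq, table))
    cur.execute('ALTER TABLE "%s" ALTER COLUMN id SET DEFAULT nextval(\'%s\');'
                % (table, seq))
    cur.execute("SELECT setval(pg_get_serial_sequence('%s', 'id'), "
                "coalesce(max(id)+1, 1), false) FROM \"%s\";" % (table, table))
conn.commit()
cur.close()
conn.close()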
Incidentally, I've been told that using autoincrement/sequence
primary keys isn't best practice, because it can be a bottleneck
if lots of different objects are being created at once. I used it
because all the models I was following when I started with flask
worked that way, but eventually I plan to try switching to some
other kind of unique primary key.
Update:
Turns out there was another problem with the sequences, and it was
pretty annoying. I ended up with a bunch of indices with names like
"idx_15517_ix_user_email" when they should have been "ix_user_email".
The database superficially worked fine, but havoc ensues if you ever
need to do a flask/sqlalchemy/alembic migration, since sqlalchemy
doesn't know anything about those indices with the funny numeric names.
It's apparently possible to rename indices in postgresql, but it's a
tricky operation that has to be done by hand for each index.
Now the database should be ready to test.
Test
Your flask app probably has something like this in config.py:
SQLALCHEMY_DATABASE_URI = os.environ.get('DATABASE_URL') or \
    'sqlite:///' + os.path.join(basedir, 'dbname.db')
If so, you can export DATABASE_URL=postgresql:///dbname
and then test it as you usually would. If you normally test on a
local machine and not on the server, remember you can tell flask's
test server to accept connections from remote machines with
flask run --host=0.0.0.0
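Independent of flask, a quick way to sanity-check the new database
from Python is to connect and count rows in one table. This sketch
assumes sqlalchemy and the psycopg2 driver are installed, and uses
the "user" table as an example:
# Quick sanity check outside flask: connect to the new database and
# count rows in one table. Assumes sqlalchemy and psycopg2 are installed.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql:///dbname")
with engine.connect() as conn:
    count = conn.execute(text('SELECT COUNT(*) FROM "user"')).scalar()
    print("users:", count)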
Database Backups
You're backing up your database, right?
That's easier in sqlite where you can just copy the db file.
From the command line, you can back up a postgresql database with:
pg_dump dbname > dbname-backup.pg
You can do that from Python in a subprocess:
import subprocess

with open(backup_file, 'w') as fp:
    subprocess.call(["pg_dump", dbname], stdout=fp)
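Here's a slightly fuller sketch (the function and file names are
arbitrary) that timestamps each backup file:
# Sketch of a timestamped backup helper; function and variable names
# are arbitrary. Assumes pg_dump is in the PATH and that peer auth
# lets this Unix user read the database.
import subprocess
from datetime import datetime

def backup_postgres(dbname, backup_dir="."):
    stamp = datetime.now().strftime("%Y-%m-%d-%H%M")
    backup_file = "%s/%s-%s.pg" % (backup_dir, dbname, stamp)
    with open(backup_file, 'w') as fp:
        subprocess.call(["pg_dump", dbname], stdout=fp)
    return backup_file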
Verify You're Using The New Database
I had some problems with that DATABASE_URL setting; I'd never
used it so I didn't realize that it wasn't in the right place and
didn't actually work. So I ran through my migration steps, changed
DATABASE_URL, thought I was done, and realized later
that the app was still running off sqlite3.
It's better to know for sure what your app is actually running.
For instance, you can add a route to routes.py that prints
details like that. Start by printing
app.config["SQLALCHEMY_DATABASE_URI"].
That's enough in theory, but I wanted to know for sure.
Turns out str(db.session.get_bind())
will print the
connection the flask app's database is actually using. So I added a route
that prints both, plus some other information about the running app.
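Here's a sketch of such a route; the route name and output format are
arbitrary, but the two values printed are the ones described above:
# Sketch of a debugging route; the route name and output format are
# arbitrary. It shows the configured URI alongside the connection
# sqlalchemy is actually using.
@app.route('/dbinfo')
def dbinfo():
    configured = app.config["SQLALCHEMY_DATABASE_URI"]
    actual = str(db.session.get_bind())
    return "configured: %s<br>actual: %s" % (configured, actual)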
Whew! I was a bit surprised that migrating was as tricky as it was,
and that there wasn't more documentation for it. Happy migrations, everyone.
Tags: programming, flask, python, database, sqlite, postgresql
[ 12:34 Feb 01, 2020 | More tech/web ]
I wrote last week about
tweaking a
Kobo e-reader's sqlite database by hand.
But who wants to remember all the table names and type out those queries?
I sure don't. So I wrote a Python wrapper that makes it much easier to
interact with the Kobo databases.
Happily, Python already has a module called sqlite3.
So all I had to do was come up with an API that included the calls
I typically wanted -- list all the books, list all the shelves,
figure out which books are on which shelves, and so forth.
The result was
kobo_utils.py,
which includes a main function that can list books, shelves, or shelf contents.
You can initialize kobo_utils like this:
import kobo_utils

koboDB = kobo_utils.KoboDB("/path/where/your/kobo/is/mounted")
koboDB.connect("/path/to/KoboReader.sqlite")
connect()
throws an exception if it can't find the
.sqlite file.
Then you can list books thusly:
koboDB.list_books()
or list shelf names:
koboDB.list_shelves()
or use
print_shelf
to see which books are on which shelves:
shelves = koboDB.get_dlist("Shelf", selectors=[ "Name" ])
for shelf in shelves:
    print shelf["Name"]
What I really wanted, though, was a way to organize my library,
taking the tags in each of my epub books and assigning them to
an appropriate shelf on the Kobo, creating new shelves as needed.
Using kobo_utils.py plus the
Python
epub library I'd already written, that ended up being quite
straightforward:
shelves_by_tag.
Tags: ebook, kobo, epub, database, sql, sqlite
[ 20:38 Sep 15, 2015 | More tech ]
I've been enjoying reading my new Kobo Touch quite a lot. The screen
is crisp, clear and quite a bit whiter than my old Nook;
the form factor is great, it's reasonably responsive (though there
are a few places on the screen where I have to tap harder than other
places to get it to turn the page), and I'm happy with the choice of fonts.
But as I mentioned in my
previous Kobo article,
there were a few tweaks I wanted to make; and I was very happy with how
easy it was to tweak, compared to the Nook. Here's how.
Mount the Kobo
When you plug the Kobo in to USB, it automatically shows up as a
USB-Storage device once you tap "Connect" on the Kobo -- or as two
storage devices, if you have an SD card inserted.
Like the Nook, the Kobo's storage devices show up without partitions.
For instance, on Linux, they might be /dev/sdb and /dev/sdc, rather
than /dev/sdb1 and /dev/sdc1. That means they also don't present UUIDs
until after they're already mounted, so it's hard to make an entry for
them in /etc/fstab if you're the sort of dinosaur (like I am) who prefers
that to automounters.
Instead, you can use the entry in /dev/disk/by-id.
So fstab entries, if you're inclined to make them, might look like:
/dev/disk/by-id/usb-Kobo_eReader-3.16.0_N905K138254971:0 /kobo vfat user,noauto,exec,fmask=133,shortname=lower 0 0
/dev/disk/by-id/usb-Kobo_eReader-3.16.0_N905K138254971:1 /kobosd vfat user,noauto,exec,fmask=133,shortname=lower 0 0
One other complication, for me, was that the Kobo is one of a few
devices that don't work through my USB2 powered hub. Initially I
thought the Kobo wasn't working, until I tried a cable plugged
directly into my computer. I have no idea what controls which devices
work through the hub and which ones don't.
(The Kobo also doesn't give any indication when it's plugged in to a
wall charger.)
The sqlite database
Once the Kobo is mounted, ls -a
will show a directory
named .kobo. That's where all the good stuff is:
in particular, KoboReader.sqlite, the device's database,
and Kobo/Kobo eReader.conf, a human-readable configuration file.
Browse through Kobo/Kobo eReader.conf for your own amusement,
but the remainder of this article will be about KoboReader.sqlite.
I hadn't used sqlite before, and I'm certainly no SQL expert. But a
little web searching and experimentation taught me what I needed to know.
First, make a local copy of KoboReader.sqlite, so you don't risk
overwriting something important during your experimentation.
The Kobo is apparently good at regenerating data it needs, but
you might lose information on books you're reading.
To explore the database manually, run:
sqlite3 KoboReader.sqlite
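You can also run the same queries from Python with the standard
sqlite3 module, against your local copy of the file; for instance:
# Query the local copy from Python using the standard sqlite3 module.
import sqlite3

conn = sqlite3.connect("KoboReader.sqlite")
cur = conn.cursor()
cur.execute("SELECT Name FROM Shelf;")
for (name,) in cur.fetchall():
    print(name)
conn.close()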
Some useful queries
Here are some useful sqlite commands, which you can generalize to
whatever you want to search for on your own Kobo. Every SQL query
must end with a semicolon (dot commands like .tables don't).
Show all tables in the database:
.tables
The most important ones, at least to me, are content (all your books),
Shelf (a list of your shelves/collections), and ShelfContent
(the table that assigns books to shelves).
Show all column names in a table:
PRAGMA table_info(content);
There are a lot of columns in
content, so try
PRAGMA table_info(Shelf);
to see a much simpler table.
Show the names of all your shelves/collections:
SELECT Name FROM Shelf;
Show everything in a table:
SELECT * FROM Shelf;
Show all books assigned to shelves, and which shelves they're on:
SELECT ShelfName,ContentId FROM ShelfContent;
ContentId can be a URL to a sideloaded book, like
file:///mnt/sd/TheWitchesOfKarres.epub, or a UUID like
de98dbf6-e798-4de2-91fc-4be2723d952f for books from the Kobo store.
Show all books you have installed:
SELECT Title,Attribution,ContentID FROM content WHERE BookTitle is null ORDER BY Title;
One peculiarity of Kobo's database: each book has lots of entries,
apparently one for each chapter. The entries for chapters have the
chapter name as Title, and the book title as BookTitle. The entry
for the book as a whole has BookTitle empty, and the book title as Title.
For example, I have a copy of
Hamlet sideloaded:
sqlite> SELECT Title,BookTitle from content WHERE ContentID LIKE "%hamlet%";
HAMLET, PRINCE OF DENMARK|Hamlet
PERSONS REPRESENTED.|Hamlet
ACT I.|Hamlet
Scene II. Elsinore. A room of state in the Castle.|Hamlet
Scene III. A room in Polonius's house.|Hamlet
Scene IV. The platform.|Hamlet
Scene V. A more remote part of the Castle.|Hamlet
Act II.|Hamlet
[ ... and so on ... ]
ACT V.|Hamlet
Scene II. A hall in the Castle.|Hamlet
Hamlet|
Each of these entries has Title set to the name of the chapter (an act
in the play) and BookTitle set to
Hamlet, except for the final
entry, which has Title set to
Hamlet and BookTitle set to nothing.
That's why you need that
WHERE BookTitle is null
clause if you just want a list of your books.
Show all books by an author:
SELECT Title,Attribution,ContentID FROM content WHERE BookTitle is null
AND Attribution LIKE "%twain%" ORDER BY Title;
Attribution is where the author's name goes. LIKE %% searches
are case insensitive.
Update: how to change a field
I realized I didn't include how to change a field, and
that seems to fit better with this article rather than writing
a whole new blog post just for that.
The Kobo doesn't show series order. So to find a specific book, and
then update its title to include
the series and series number, do something like this:
SELECT Title,Attribution,ContentID,BookTitle FROM content WHERE Attribution LIKE "%Doyle%" AND Title LIKE "%Adventures%";
UPDATE content SET Title="Sherlock Stories 1: The Adventures of Sherlock Holmes" WHERE Attribution LIKE "%Doyle%" AND Title LIKE "%Adventures%";
To delete an entry -- in this case I had two copies of the same book
and needed to specify the ContentID of the older one:
DELETE from content WHERE Attribution LIKE "%Doyle%" AND ContentID="file:///mnt/sd/memoirs-holmes.epub";
Of course, it's a lot handier to have a program that knows these queries
so you don't have to type them in every time (especially since the sqlite3
app has no history or proper command-line editing).
But this has gotten long enough, so I'll write about that separately.
Tags: ebook, kobo, epub, database, sql, sqlite
[ 19:11 Sep 03, 2015 | More tech ]