the lazy hacker blog

Bulk loading data into MySQL

July 26th 2010

During a casual conversation last week with a friend, we happened to talk about how you can bulk load data data in PostgreSQL without writing to temporary files via the PQputCopyData API.

The MySQL client API does not have an equivalent way of streaming data right from your program/script. So I started thinking about how I can do this in MySQL. MySQL C API does allow you to specify your own infile handlers but it seemed unnecessarily complicated.

A few drinkies later, I tripped on a neat solution. named pipes – et voilĂ !

So all I had to do was create a named pipe, fire up a thread and stream the data to the pipe in that thread and let the mysql client library read from it. No temporary files, no cleanups (except for unlinking the FIFO file).

In principle this is same as

mkfifo -m0666 /tmp/somefile.fifo
zcat users.gz > /tmp/somefile.fifo &
mysql -uroot mytestdb -e "load data local infile '/tmp/somefile.fifo' into table users;"

UPDATE Although named pipes worked like a charm it started to cause issues with threaded applications in ruby. So I had to implement custom infile handlers for dbic++ which was a bit more code but works like a charm.

But a nice little api around this helps!

require 'etc'
require 'swift'
require 'stringio'

data = StringIO.new "sally\tsally@local\njerry\tjerry@local\n"

Swift.setup :default, Swift::DB::Mysql, user: Etc.getlogin, db: 'swift'

Swift.db do |db|
  db.execute 'drop table if exists users'
  db.execute 'create table users (id serial, name text, email text, primary key(id))'
  db.write   'users', %w{name email}, data
end

The block can stream data from anywhere, file, network etc and straight into MySQL. You just have to make sure you do it in batches of reasonable size to avoid MySQL writing too much to its binary log (WAL in case of PosgreSQL) in case you have to rollback.

Grab the latest dbic++ and swift, have fun!

blog comments powered by Disqus