Tag Archive for 'python'

How to Not Suck at Transcoding Music

Sometimes I get really interested in something and can’t sleep and end up spending all night working on it. This was one of those nights.

I wrote flacsync a while ago in order to make an MP3 with suitable tags for every FLAC I own. I do this so that I can actually play my favorite music on my iPhone and also so that I can keep a lightweight copy of my music database on my laptop, which has far less space than my desktop.

The idea was to keep a database containing an MD5 hash for every FLAC I have and, whenever I run the script, check that hash to see if the FLAC has changed and needs to be re-transcoded. If so, transcode it and store the new hash in the database.

I made some remarkably stupid initial design choices. I knew that I wanted to thread it in order to maximize throughput, but I had some ridiculous bottlenecks. For example, for some reason, I thought it would be a good idea to use a SQLite database to store the hashes and then have a complicated DBWorker thread that would interface with all the processing threads. Although I got this original design working, it was slow. It took maybe 30 minutes to run through all my FLACs.

I later redesigned the script to just load a Python dictionary containing all the hashes from a file into memory. Then, I could update the table freely and wouldn’t even really need to worry about locking since only one thread acted on a track.
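Here is a minimal sketch of that in-memory table, not the actual flacsync code. The JSON file format and the names [cci]load_hashes[/cci], [cci]save_hashes[/cci], and [cci]needs_transcode[/cci] are assumptions made up for illustration; the post doesn't say how the file was stored.

```python
import json
import os


def load_hashes(db_path):
    """Return the {flac_path: md5_hex} table, or an empty one on the first run."""
    if not os.path.exists(db_path):
        return {}
    with open(db_path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_hashes(db_path, hashes):
    """Write to a temp file and rename it, so a crash can't leave half a database."""
    tmp_path = db_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(hashes, f)
    os.replace(tmp_path, db_path)


def needs_transcode(hashes, flac_path, current_hash):
    """True when the file is new or its hash differs from the stored one."""
    return hashes.get(flac_path) != current_hash
```

Since each track is a separate key and only one worker ever touches a given track, plain dictionary lookups like this are enough.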

But this was still slow for a few reasons:

  • I wasn’t ordering the list of files intelligently at all. It would make sense to try to process the most recently changed files first, wouldn’t it?
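  • Python’s [cci]threading[/cci] module isn’t actually capable of running Python code on multiple processors. Because of CPython’s global interpreter lock, only one thread executes Python bytecode at a time, so the threads effectively share a single core.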
  • Running [cci]md5sum[/cci] on an entire FLAC is slow and therefore dumb.

So, I redesigned the whole script to use the [cci]multiprocessing[/cci] module’s [cci]Pool[/cci] abstraction, where you can map a function over a list with a pool of workers and then gather the results. Now, each worker returns either an indication that the file didn’t change or a new hash for the file. The system tallies up all the new hashes at the end, updates the database, saves it to disk, and exits. Oh, and when it first starts up and finds all the FLACs in my music directory, it sorts them so that the most recently changed files are first.
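The shape of that design can be sketched roughly as follows. This is an illustration under assumptions, not the real script: [cci]find_flacs[/cci], [cci]fingerprint[/cci], [cci]check_track[/cci], and [cci]sync[/cci] are hypothetical names, the fingerprint here hashes the whole file for simplicity, and the transcode step is left as a comment.

```python
import hashlib
import os
from multiprocessing import Pool


def find_flacs(root):
    """Collect every .flac under root, most recently modified first."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".flac"):
                paths.append(os.path.join(dirpath, name))
    paths.sort(key=os.path.getmtime, reverse=True)
    return paths


def fingerprint(path):
    """Hash the file contents (whole file here, purely to keep the sketch short)."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def check_track(job):
    """Worker: return (path, new_hash), or (path, None) if nothing changed."""
    path, known_hash = job
    current = fingerprint(path)
    if current == known_hash:
        return path, None
    # A real worker would transcode the track here before reporting back.
    return path, current


def sync(root, hashes, workers=None):
    """Check every track in parallel, then fold the new hashes into the table."""
    jobs = [(path, hashes.get(path)) for path in find_flacs(root)]
    with Pool(workers) as pool:
        results = pool.map(check_track, jobs)
    for path, new_hash in results:
        if new_hash is not None:
            hashes[path] = new_hash
    return hashes
```

Only the parent process writes to the table, so no locking is needed. On platforms that start workers by spawning a fresh interpreter, the call to [cci]sync[/cci] should sit under an [cci]if __name__ == "__main__":[/cci] guard.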

Moreover, I had been lazy and was just calling the [cci]md5sum[/cci] program on the entire FLAC, so I switched to Python’s [cci]hashlib[/cci] module and now only take the MD5 of the first 4096 bytes of the FLAC. This is pretty okay because the header information is almost always entirely contained there.
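A minimal sketch of the partial hash, with [cci]header_md5[/cci] and [cci]HEADER_BYTES[/cci] as made-up names:

```python
import hashlib

HEADER_BYTES = 4096  # the prefix size mentioned above


def header_md5(path, size=HEADER_BYTES):
    """MD5 of only the first `size` bytes, so the cost doesn't grow with file length."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read(size)).hexdigest()
```

The trade-off is that anything past the first 4096 bytes is invisible to the check, so metadata that spills beyond that point (a large embedded cover image, say) could change without being noticed.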

The result is that I can fly through my entire music library in like a second (okay some of that is coming from disk cache — I haven’t tried it on a cold boot yet). Transcoding on my computer (piping [cci]flac[/cci] to [cci]lame[/cci] and then copying tags over) takes about 15 seconds on average.
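For reference, the [cci]flac[/cci]-to-[cci]lame[/cci] pipe can be sketched with [cci]subprocess[/cci] like this. The function names and the [cci]-V 2[/cci] quality setting are assumptions (the post doesn't say which encoder settings were used), and the tag-copying step is left out entirely.

```python
import subprocess


def build_commands(flac_path, mp3_path, quality="2"):
    """Return the decoder and encoder command lines for one track."""
    # flac decodes to stdout; lame reads from stdin ("-") and writes the MP3.
    decode = ["flac", "--decode", "--stdout", "--silent", flac_path]
    encode = ["lame", "--quiet", "-V", quality, "-", mp3_path]
    return decode, encode


def transcode(flac_path, mp3_path):
    """Pipe flac into lame; True if both programs exited cleanly."""
    decode_cmd, encode_cmd = build_commands(flac_path, mp3_path)
    decoder = subprocess.Popen(decode_cmd, stdout=subprocess.PIPE)
    encoder = subprocess.Popen(encode_cmd, stdin=decoder.stdout)
    # Drop our copy of the pipe so flac sees a closed pipe if lame dies early.
    decoder.stdout.close()
    encoder_rc = encoder.wait()
    decoder_rc = decoder.wait()
    return decoder_rc == 0 and encoder_rc == 0
```

Both programs need to be installed and on the [cci]PATH[/cci] for [cci]transcode[/cci] to work.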

So checking an already-synced database went from 30 minutes to a few seconds. Pretty good speedup.

I guess now I should go to work.

Controlling a Mouse with a MIDI Controller

I’ve been doing a lot of editing of large documents recently and some of them require lots of scrolling in either horizontal or vertical directions. Scrolling vertically is pretty easy thanks to my Logitech MX Revolution, which has an inertial scroll wheel — quite possibly the best invention ever. But quickly scrolling horizontally is basically impossible because it just has a rocker.

I glanced over at my MIDI controller and thought that it would be pretty great if I could use it to emulate scrolling. The Numark TotalControl has two jog wheels that can spin freely in an inertial manner.

I then discovered PyMouse and PyGame’s MIDI module. Using these two libraries, I cooked up this example, which uses the left wheel to scroll vertically and the right wheel to scroll horizontally.

The script simply initializes and finds a suitable MIDI controller. Then, it connects to that controller and periodically polls to see if there have been any events. If so, it processes the event to see if it’s something to scroll with. I implemented an overly simple “low-pass filter” which doesn’t even take advantage of the fact that the TotalControl outputs a number that increases as you scroll faster on the jog wheels, but it works well enough. Future iterations will have more complicated algorithms. Theoretically, this script is cross-platform since the libraries it uses are.
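The poll-and-filter loop could look roughly like the sketch below. A lot of it is assumed: the control numbers, the way wheel direction is encoded in the value byte, and the tick-counting filter are all hypothetical stand-ins, not the TotalControl's real mapping or the filter from the original script. Scrolling itself is left to a [cci]scroll(axis, step)[/cci] callback so that no particular PyMouse call is implied.

```python
import time

CONTROL_CHANGE = 0xB0
LEFT_WHEEL = 0x19   # hypothetical control number for the left jog wheel
RIGHT_WHEEL = 0x18  # hypothetical control number for the right jog wheel


def wheel_direction(value):
    """Assumed encoding: values below 64 turn one way, 64 and up the other."""
    return 1 if value < 64 else -1


class TickAccumulator:
    """Crude smoothing: emit one scroll step per `ticks_per_step` wheel ticks."""

    def __init__(self, ticks_per_step=4):
        self.ticks_per_step = ticks_per_step
        self.total = 0

    def feed(self, direction):
        self.total += direction
        if abs(self.total) >= self.ticks_per_step:
            step = 1 if self.total > 0 else -1
            self.total = 0
            return step
        return 0


def make_filters(ticks_per_step=4):
    """Left wheel scrolls vertically, right wheel horizontally."""
    return {
        LEFT_WHEEL: ("vertical", TickAccumulator(ticks_per_step)),
        RIGHT_WHEEL: ("horizontal", TickAccumulator(ticks_per_step)),
    }


def handle_event(event, filters, scroll):
    """Decode one pygame.midi event and call scroll(axis, step) when warranted."""
    (status, control, value, _unused), _timestamp = event
    if (status & 0xF0) != CONTROL_CHANGE or control not in filters:
        return
    axis, accumulator = filters[control]
    step = accumulator.feed(wheel_direction(value))
    if step:
        scroll(axis, step)


def run(scroll, poll_interval=0.01):
    """Find the first MIDI input device and poll it until interrupted."""
    import pygame.midi  # imported here so the logic above works without pygame

    pygame.midi.init()
    device_id = None
    for i in range(pygame.midi.get_count()):
        _interf, _name, is_input, _is_output, _opened = pygame.midi.get_device_info(i)
        if is_input:
            device_id = i
            break
    if device_id is None:
        pygame.midi.quit()
        raise RuntimeError("no MIDI input device found")
    midi_in = pygame.midi.Input(device_id)
    filters = make_filters()
    try:
        while True:
            if midi_in.poll():
                for event in midi_in.read(16):
                    handle_event(event, filters, scroll)
            else:
                time.sleep(poll_interval)
    finally:
        midi_in.close()
        pygame.midi.quit()
```

To try the decoding without any hardware, pass [cci]handle_event[/cci] a fake event such as [cci][[0xB0, LEFT_WHEEL, 1, 0], 0][/cci] and a callback that just prints its arguments.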

At any rate, when you run this script, you can use the jog wheels to scroll. You need to patch PyMouse, though. The developer forgot to include support for horizontal scrolling, but this patch changes a whopping two lines to add it (in reality, all of that code needs to be redone because it isn’t very well designed, but I’m lazy).

I’m sure MIDI mappers have been made before, but I’d like to make one where you can just script the controller in Python. You would just register a bunch of hooks (functions based on controller and control number) with calls to whatever functions you want to design. A project for when I’m bored and without internet access, I guess.
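One way the hook registration might look, purely as a sketch of the idea. [cci]MidiHooks[/cci], [cci]on[/cci], and [cci]dispatch[/cci] are invented names, and nothing like this exists yet.

```python
class MidiHooks:
    """Map (controller, control number) pairs to lists of handler functions."""

    def __init__(self):
        self._hooks = {}

    def on(self, controller, control):
        """Decorator that registers a handler for one control on one controller."""
        def decorator(func):
            self._hooks.setdefault((controller, control), []).append(func)
            return func
        return decorator

    def dispatch(self, controller, control, value):
        """Call every handler registered for this control; return how many ran."""
        handlers = self._hooks.get((controller, control), [])
        for handler in handlers:
            handler(value)
        return len(handlers)


hooks = MidiHooks()


@hooks.on("TotalControl", 0x19)  # hypothetical controller name and control number
def left_wheel(value):
    print("left wheel moved:", value)
```

A polling loop would then call [cci]hooks.dispatch(controller, control, value)[/cci] for each incoming event and never need to know what the handlers do.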