...so take it easy.
My name is Michal Migurski. Until December 2012, I was technology head at Stamen, a San Francisco design and development studio focused on data visualization and map-making. You might remember me from such recent projects as Oakland Crimespotting, Walking Papers, Maps From Scratch, Digg Labs and API, Modest Maps, Mappr, or Reblog. Below, you will find my weblog, tecznotes, my link blog (high-frequency, short posts), and a collection of smaller and older things I've worked on.
Background photo by Fred.
May 11, 2013 1:52pm
week 1,851: week one
I started my new gig this week, chief-technology-officering for Code For America. People have said nice things on Twitter, and I feel deeply welcomed.
I’ll write more about the actual thing soon, but for now I’m just basking in the weirdness of being in an office again. I have a desk and a calendar and colleagues and blooming, buzzing confusion. The code I’ve written this week is more angry, productive birds than TileStache or Extractotron, and that’s a funny feeling. There’s a shower at work, so I can ride my bike in and not offend people. The office is in SOMA, so I’ve started bringing my lunch. No one knew me when I was 25 (well, almost) and the organization is young, so the potential feels sky-high.
Soon, I will wear my new track jacket:
I’ve known Jen, Abhi, Meghan and the CfA team for many years, but I also spent most of April doing a big, serious grown-up job search. This was an entirely new experience for me, and very educational. I learned that recruiting is a real job for actual smart people at companies looking for talent, and that some people are mediocre at it while others are amazing. I learned that I enjoy technical interviews, the ones with whiteboards or pair-coding machines and one hour to solve a technical problem. Every single one that I did was fun, as long as I remembered to narrate my process and say “I don’t actually know what I’m doing here” where appropriate. I talked to some of the smartest, most interesting people I’ve ever dealt with and got to spend a whole month playing what-if with a variety of employers. I don’t know if this kind of abundance is something I’ll ever experience again, so I tried to savor it as much as possible.
Apr 21, 2013 11:30pm
tilestache 0.7% better
TileStache, the map tile rendering server I’ve been working on since 2010, hit version 1.47 this weekend. The biggest change comes from Seth, who streamlined and expanded TileStache’s HTTP chops with the new TheTileLeftANote exception. The documentation needs an update, but the gist is that it’s now possible to customize tile HTTP responses from deep inside the rendering pipeline, with control over headers, status codes, and content. I’m excited that this didn’t require a backwards-incompatible change to the API, and that it’s now possible to tweak behavior in concert with Apache X-Sendfile or Nginx X-Accel.
Apr 10, 2013 3:45pm
south end of lake merritt construction
Google Maps gives a nice unintentional before & after view of the construction along the south end of Lake Merritt in Oakland, if you turn the 45° aerials off and on.
The gated-up and piss-drenched pedestrian tunnels are gone. The connection to the bay is wider. There’s a separate pedestrian bridge, more grass, and proper crosswalks to the courthouse and museum.
Apr 5, 2013 7:44pm
network time machine backups
I’ve been getting my house in order, computer-wise. I’ve maintained a continuous backup since Mac OS X introduced Time Machine several years ago, and I’ve grown increasingly uncomfortable with it just being a USB drive that I sometimes remember to attach when I’m at home. I researched network backups for the tiny home server (equivalent to a Raspberry Pi), and after struggling with a few of the steps I’ve got a basically-working encrypted backup RAID that runs transparently on my network and keeps my Mac OS X 10.6.8 Snow Leopard machine safe.
RAID
For durability, I wanted everything duplicated across two physical hard drives so that I could swap in new ones when failure made it necessary. RAID 1 is a standard for mirroring data to multiple redundant disks, and many manufacturers produce disk enclosures that do mirroring internally. I selected the NT2 from inXtron and two 2TB 3.5” hard drives, a total cost of ~$300.
The enclosure exposes a plain USB disk to Linux, identical to any other plug-in hard drive like the 2.5” one I was using previously. Unfortunately, the larger drives seem to require a fan in contrast to my previous silent drive. It’s not terribly loud, and a small price to pay for additional peace of mind.
udev
When connected, Linux assigns a drive letter to a USB volume, so that (for example) you can partition and mount from /dev/sda, /dev/sdb, etc. Unfortunately, these letters can be somewhat arbitrary, and you never know exactly where your connected drive will show up. This can be a real problem if you want the volume to be reliably findable every time. If you simply format the drive you can use the volume’s UUID instead of the drive letter, but I was interested in using Logical Volume Manager (LVM) so I needed it in a predictable place.
Fred Wenzel provided some hints on how to use udev, the device manager for the Linux kernel:
The solution for the crazily jumping dev nodes is the udev system, which is part of Linux for quite a while now, but I never really had a need to play with it yet. But the howto is pretty nice and easy to apply.
The idea is that you find some property of the device, like its manufacturer or product ID, and use that to create a stable link to the drive. With my drive temporarily at /dev/sda, I ran this udevadm command to read off its properties:
udevadm info -a -p /sys/block/sda/sda1
Running down the lengthy list that came back, I found three entries that looked meaningful:
- ATTRS{manufacturer}=="inXtron, Inc."
- ATTRS{product}=="NT2"
- ATTRS{serial}=="0123456789"
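A small filter can pull those entries out of the long listing. This is a minimal sketch: the function name pick_attrs is hypothetical, and it assumes the three attribute names shown above are the ones of interest.

```shell
#!/bin/sh
# Hypothetical helper: keep only the identifying attributes from
# `udevadm info -a` output read on stdin, trimming leading whitespace.
pick_attrs() {
    grep -E 'ATTRS\{(manufacturer|product|serial)\}' | sed 's/^[[:space:]]*//'
}

# Usage (assumes the drive is temporarily at /dev/sda):
#   udevadm info -a -p /sys/block/sda/sda1 | pick_attrs
```

Because udevadm walks up through parent devices, the filtered output may still include entries for USB hubs and controllers; the ones belonging to the enclosure are the ones to match on.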
This whole process was difficult and confusing, and I didn’t understand quite what I was doing until I started using udev’s PROGRAM/RUN functionality to log events and inspect them. I created a rule that matched all events with a “*”, and then had that log to a file in /tmp that I could periodically watch. It wasn’t necessary to reboot the server when testing, which was a big relief.
The rule I ended up with in /etc/udev/rules.d/10-local.rules looks like this:
ATTRS{product}=="NT2", KERNEL=="sd*1", SYMLINK="raid"
It causes any one of /dev/sda1, /dev/sdb1, etc. with the product name “NT2” to be symlinked to /dev/raid. I could add the serial number, but this minimal rule works for now.
LVM
Logical Volume Manager makes it possible to do all kinds of neat tricks with hard drives, such as having a single volume span many physical disks or freely resize volumes and move them around after they are created. Setting up LVM requires three steps:
- pvcreate /dev/raid to make a physical volume from /dev/raid.
- vgcreate lvmraid /dev/raid to create a new volume group called “lvmraid” from the /dev/raid physical disk.
- lvcreate -L 360g -n tmachine lvmraid to create a new 360GB logical volume at /dev/mapper/lvmraid-tmachine, which I want to use for my backup volume.
At this point, it would be possible to make a filesystem on /dev/mapper/lvmraid-tmachine and have a 360GB volume available. I’ve got more logical volumes than this, but I’m just showing the one.
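The three steps can be collected into one function. This is a sketch under assumptions: the function name setup_tmachine_lv is hypothetical, the final lvs call is an added check that is not part of the original steps, and the function is only defined here, not called, because it is destructive and must run as root.

```shell
#!/bin/sh
# Sketch of the three LVM steps as a single function (hypothetical name).
# Each step runs only if the previous one succeeded.
setup_tmachine_lv() {
    device=${1:-/dev/raid}      # stable symlink created by the udev rule
    pvcreate "$device" &&
    vgcreate lvmraid "$device" &&
    lvcreate -L 360g -n tmachine lvmraid &&
    lvs lvmraid                 # added check: list logical volumes in the group
}

# Usage, as root:
#   setup_tmachine_lv /dev/raid
```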
Volume encryption
I wanted my backup to be safely encrypted, so I followed advice from Robin Bowes who shows how to use cryptsetup and Linux Unified Key Setup (LUKS):
- cryptsetup -y --cipher aes-cbc-essiv:sha256 --key-size 256 luksFormat /dev/mapper/lvmraid-tmachine
- cryptsetup luksOpen /dev/mapper/lvmraid-tmachine lvbackup
- mkfs.ext3 -j -O ^ext_attr,^resize_inode /dev/mapper/lvbackup
The first step encrypts the volume, where you’ll assign a secret passphrase. The second step opens the volume at /dev/mapper/lvbackup, where you’ll have to provide the passphrase. The third creates a filesystem on the new volume; I’ve included some mkfs flags that omit features which might make it hard to resize the volume later.
I mount the new volume at /time-machine, and confirm that I can read and write files to it. I will need to run the luksOpen step every time I want to mount this volume after a reboot, so it’s useful to save a two-line script in /time-machine/mount.sh for reference.
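The two lines might look something like this, wrapped in a function so nothing runs until it is called. The function name mount_time_machine is hypothetical; the device and mount point names are the ones used above.

```shell
#!/bin/sh
# Sketch of a post-reboot mount helper (hypothetical function name).
mount_time_machine() {
    # Prompts for the LUKS passphrase and maps the decrypted volume
    # to /dev/mapper/lvbackup.
    cryptsetup luksOpen /dev/mapper/lvmraid-tmachine lvbackup &&
    # Mount the decrypted filesystem only if the open step succeeded.
    mount /dev/mapper/lvbackup /time-machine
}

# Usage, as root after a reboot:
#   mount_time_machine
```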
Netatalk and AFPD
This was the second hard part. I’ve tried running Apple File exchange before and gave up; this time I figured out how to make it write meaningful logs so I could debug the process. The default installation of netatalk from apt-get mostly works, with a couple of small changes:
- Add “-setuplog "CNID LOG_INFO" -setuplog "AFPDaemon LOG_INFO"” to afpd.conf, to watch CNID and AFPD log useful progress to /var/log/syslog.
- Replace the default uamlist in /etc/netatalk/afpd.conf, changing it from “uams_clrtxt.so,uams_dhx.so” to “uams_dhx2.so” so that Mac OS X can correctly provide a password. Until I did this, I was consistently seeing failed login attempts.
Finally, I added this line to /etc/netatalk/AppleVolumes.default:
/time-machine TimeMachine allow:migurski cnidscheme:cdb options:usedots,upriv
Now I have a working Apple File server.
Time Machine
Apple’s Time Machine is picky about the format of the volume it writes its backups to, preferring HFS+ to anything else. I initially looked at setting up /time-machine as an actual HFS volume, but stopped when I started reading words like “recompile” and “kernel”. Matthias Kretschmann offers a better way with Disk Utility. His netatalk advice is useful above, and I simply skipped all the Avahi steps. The important part of his article is under Configure Time Machine: ask Time Machine to show unsupported network volumes, and create your own sparsebundle disk image to back up to:
In short, you have to create the backup disk image on your Desktop and copy it to your mounted Time Machine volume. But Time Machine creates a unique filename for the disk image and we can find out this name with a little trick…
Follow his advice on the name of the file and volume exactly, before copying to the AppleTalk share. My computer is named “Null Island”, so my sparse bundle file is called “Null-Island_xxxxxxxxxxxx.sparsebundle”. The x’s come from the hardware ethernet address, which you can find by running ifconfig en0 on the command line.
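The naming pattern can be sketched as a small function. This is an illustration inferred from the “Null Island” example only: the function name sparsebundle_name is hypothetical, and the rules it applies (spaces become hyphens, colons are dropped from the ethernet address) are assumptions. The linked article remains the authority on the exact name Time Machine expects.

```shell
#!/bin/sh
# Hypothetical helper: build a sparse bundle file name from a computer
# name and a colon-separated hardware ethernet address.
sparsebundle_name() {
    name=$(printf '%s' "$1" | tr ' ' '-')    # assumed: spaces become hyphens
    mac=$(printf '%s' "$2" | tr -d ':')      # assumed: colons are removed
    printf '%s_%s.sparsebundle\n' "$name" "$mac"
}

# Usage, with a placeholder address rather than a real one:
#   sparsebundle_name "Null Island" "00:11:22:aa:bb:cc"
#   prints Null-Island_001122aabbcc.sparsebundle
```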
AutoBackup
Finally, in my case I don’t actually want Time Machine running at all hours of the day. When you switch to a network backup, everything takes longer than USB. I added these two lines to my crontab, causing AutoBackup to be kept off during the day, and kept on late at night:
- */5 23,0-8 * * * defaults write /Library/Preferences/com.apple.TimeMachine AutoBackup -bool true
- */5 9-22 * * * defaults write /Library/Preferences/com.apple.TimeMachine AutoBackup -bool false
With this in place, I don’t saturate the network with backup traffic during the day, and I can guarantee that my data is safe by keeping the computer on overnight. Time Machine keeps Apple File credentials, so it’s capable of mounting the network drive on its own. I just need to have the computer on after 11pm and before 9am.
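Together the two crontab entries split the day into an "on" window (23:00 through 8:59) and an "off" window (9:00 through 22:59). A minimal sketch of the same decision as a function, where the name autobackup_setting is hypothetical and the hour is passed in as a plain integer from 0 to 23:

```shell
#!/bin/sh
# Hypothetical helper: print the AutoBackup value the crontab entries
# would set during the given hour of the day (0-23).
autobackup_setting() {
    hour=$1
    if [ "$hour" -ge 9 ] && [ "$hour" -le 22 ]; then
        echo false    # daytime: keep Time Machine off
    else
        echo true     # 23:00 through 8:59: allow backups
    fi
}

# Usage:
#   defaults write /Library/Preferences/com.apple.TimeMachine AutoBackup \
#       -bool "$(autobackup_setting 23)"
```

Keeping two plain crontab lines, as above, has the advantage of needing no script at all; the function is only a way to check that the hour ranges cover the whole day without overlapping.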
Apr 3, 2013 1:23pm
week 1,846: ladders
I finished Evgeny Morozov’s mega-screed The Meme Hustler (“Tim O’Reilly’s crazy talk”) yesterday. If you can squeeze uncomfortably past the acid-drenched ad hominem opener, Evgeny recounts the history of the Open Source vs. Free Software memetic war of the late 1990s and its relationship to political power:
Ranking your purchases on Amazon or reporting spammy emails to Google are good examples of clever architectures of participation. Once Amazon and Google start learning from millions of users, they become “smarter” and more attractive to the original users. This is a very limited vision of participation. It amounts to no more than a simple feedback session with whoever is running the system. You are not participating in the design of that system, nor are you asked to comment on its future. There is nothing “collective” about such distributed intelligence; it’s just a bunch of individual users acting on their own and never experiencing any sense of solidarity or group belonging. Such “participation” has no political dimension; no power changes hands. … There’s a very explicit depoliticization of participation at work here.
This morning, Matt said this, in response to Wave’s comment/question about empowerment:
@drwave once you’re “there”: making sure you don’t pull up the ladder, making new, better ladders, admitting there was a ladder.
The image of the ladder sticks with me. I entered into awareness of Open-vs.-Free in 1999, when it looked like another Vi-vs.-Emacs thing, an interminable pissing match for nerds. In Morozov’s retelling, the power politics of Freedom and Openness look newly fresh, important all over again. It touches the question of who the ladder is for, who’s inside the tribe deserving of help, and how to think about equity. The Free Software side of the argument framed a bigger community, consisting of users and developers together. GNU's four freedoms name use and study before they name distribution and modification. Order matters; ladders are for everyone.
I spoke at Ragi Burhum’s Geomeetup yesterday, about my recent work on vector tiles for Mapnik. The slides are here in PDF form. One of the subtexts to my OSM work for the past few years has been the ladder-making that Matt describes: a way to make datasets like OSM available to more people who might not otherwise choose to learn the full set of tools needed to work with the raw stuff, but still have important things to say. That includes professional message-makers like journalists but also enthusiasts like Stephanie May or Burrito Justice (on tacos and history). There are commercial answers to this question from companies like Google or Mapbox, but in addition to those it should always be possible to take your message into your own hands, most especially if your message is likely to get under someone’s fingernails. Free software and free data work as one kind of ladder, continually looking back as well as forward, assimilating innovation and passing it down to where it wouldn’t otherwise reach. I’m tempted to call this “trickle down”, but it occurs to me that the pull of gravity is all wrong in that image. Things don’t move from the core to the gap like water flowing downhill, but quite the opposite. Left alone, innovation and capital accrue to where they are already in highest concentration. Collective work and effort are the only forces that can counteract gravity with any regularity.
Here is the data. Please tell me if you find it interesting, useful, or need help.
big things
Digg
We are Digg's visualization partner, and helped launch the new Labs experimental area on their site, including Stack and Swarm! I also designed the Digg API with Shawn and Steve.
Modest Maps
Modest Maps is a BSD-licensed display and interaction library for tile-based maps in Adobe Flash 7+, written in ActionScript. This is an active project I'm working on with Darren, Shawn, and Tom.
Mappr
Mappr is a geographic browser of Flickr's photo collection. I wrote a large portion of this application with Tomas and Eric, notably the place-name matching and geolocation bits, and pretty much the entire back-end.
Reblog
Reblog is a server-side RSS aggregator that doubles as a quick publishing mechanism for syndicated news. I wrote it with Eyebeam R+D fellow Michael Frumin.
small things
Giant-ass image viewer
Javascript pan and zoom interface for very large images, with Python code for creating required tiles. Similar in spirit to Google Maps and Zoomify. Strangely popular.
http://video.teczno.com
Distribution site for my ongoing, occasional experiments with video production. Everything there is free.
Jitter and 3D Geometry
Updated experiments in 3D geometry handling using OpenGL and PHP.
Rooftop photos
Photos taken from the roof of the SOMA-SF warehouse space I lived in, summer of 2002.
Freeway Interchanges
Collages of freeway satellite imagery to satisfy a fetish for complex interchanges.
Visible Humans
Meat!
Quickdraw and basic 3D
Rough experiments in 3D rendering basics and matrix math.
old things
moveon: fahrenheit 9/11 national town meeting
/ part of a nationally-broadcast conversation between Michael Moore and MoveonPAC directors.
stamen google news visualizer
/ data visualisation experiment intended to give a high-level view of who's making news at the moment, and who made the news at specified times in the past.
bmw design priorities
/ rich internet application development in collaboration with DesignworksUSA Advanced Communications Group
moveon: bush uncovered
/ map of moveon.org's bush uncovered event series
naral/pro-choice america
/ map of the march for women's lives
sflnc
/ web dev political activism on behalf of the san francisco late night community
bipole
/ audio-video synchronicity courtesy of me & andy w.
video riot
/ “an edgy electronic tailgate party and a real-time drive-in multiplex”
viberation
/ event production, multimedia installations, dancing all night
h&k global and h&k u.s.
/ website, day job, web applications developer
code
Map Projection
/ a collection of classes used to project GPS data points onto maps, implemented in PHP 4
JSON-PHP
/ PHP 4 implementation of JSON, lightweight data-interchange format optimized for efficient javascript/server communication
OSC hub
/ PHP-based client and server for Open Sound Control, optimized for use with Max/MSP implementation.
flash component of the H&K global website, a database-driven worldwide office map
coho
/ content management display component, for Apache/PHP/MySQL
sordid
/ command-line mp3 sorting utility for mac OS X, unix




