tecznotes

Michal Migurski's notebook, listening post, and soapbox.

Apr 22, 2013 6:30am

tilestache 0.7% better

TileStache, the map tile rendering server I’ve been working on since 2010, hit version 1.47 this weekend. The biggest change comes from Seth, who streamlined and expanded TileStache’s HTTP chops with the new TheTileLeftANote exception. The documentation needs an update, but the gist is that it’s now possible to customize tile HTTP responses from deep inside the rendering pipeline, with control over headers, status codes, and content. I’m excited that this didn’t require a backwards-incompatible change to the API, and that it’s now possible to tweak behavior in concert with Apache X-Sendfile or nginx X-Accel.
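TheTileLeftANote’s real constructor isn’t reproduced here. As a rough illustration of the general pattern, here is a minimal Python sketch built on a hypothetical TileResponseNote class. Rendering code deep in the stack raises an exception that carries status, headers, and body, and a single top-level handler turns it into the HTTP response. The zoom limit, the cached-tile set, and the X-Accel-Redirect path are all made up for the example.

```python
"""Sketch of the pattern: deep rendering code dictates the HTTP response.

Hypothetical names throughout; this is not TileStache's actual API.
"""

# Hypothetical set of tiles already rendered to disk.
CACHED_TILES = {(1, 0, 0)}


class TileResponseNote(Exception):
    """Hypothetical stand-in for TheTileLeftANote: carries a full HTTP response."""

    def __init__(self, status_code=200, headers=None, content=b""):
        super().__init__(status_code)
        self.status_code = status_code
        self.headers = dict(headers or {})
        self.content = content


def render_tile(z, x, y):
    """Return PNG bytes normally, or raise a note to override the response."""
    if z > 18:
        raise TileResponseNote(404, {"Content-Type": "text/plain"}, b"No tiles past zoom 18")
    if (z, x, y) in CACHED_TILES:
        # Empty body: nginx serves the file named in X-Accel-Redirect itself.
        raise TileResponseNote(200, {"X-Accel-Redirect": f"/tile-cache/{z}/{x}/{y}.png"})
    return b"\x89PNG\r\n\x1a\n"  # placeholder for real rendering output


def handle_request(z, x, y):
    """Top-level handler: the only place that builds (status, headers, body)."""
    try:
        body = render_tile(z, x, y)
    except TileResponseNote as note:
        return note.status_code, note.headers, note.content
    return 200, {"Content-Type": "image/png"}, body


print(handle_request(19, 0, 0)[0])  # 404
print(handle_request(1, 0, 0)[1])   # {'X-Accel-Redirect': '/tile-cache/1/0/0.png'}
```

Because the exception passes untouched through every intermediate layer, none of those layers needs a new return type. That is one way such a feature can be added without breaking an existing API.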

Apr 10, 2013 10:45pm

south end of lake merritt construction

Google Maps gives a nice unintentional before & after view of the construction along the south end of Lake Merritt in Oakland, if you turn the 45° aerials off and on.

The gated-up and piss-drenched pedestrian tunnels are gone. The connection to the bay is wider. There’s a separate pedestrian bridge, more grass, and proper crosswalks to the courthouse and museum.

More on the lake here and here.

Apr 6, 2013 2:44am

network time machine backups

I’ve been getting my house in order, computer-wise. I’ve maintained a continuous backup since Mac OS X introduced Time Machine several years ago, and I’ve grown increasingly uncomfortable with it just being a USB drive that I sometimes remember to attach when I’m at home. I researched network backups for my tiny home server (equivalent to a Raspberry Pi), and after struggling with a few of the steps I’ve got a basically working encrypted backup RAID that runs transparently on my network and keeps my Mac OS X 10.6.8 Snow Leopard machine safe.

RAID

For durability, I wanted everything duplicated across two physical hard drives so that I could swap in new ones when failure made it necessary. RAID 1 is a standard for mirroring data to multiple redundant disks, and many manufacturers produce disk enclosures that do mirroring internally. I selected the NT2 from inXtron and two 2TB 3.5” hard drives, a total cost of ~$300.

The enclosure exposes a plain USB disk to Linux, identical to any other plug-in hard drive like the 2.5” one I was using previously. Unfortunately, the larger drives seem to require a fan in contrast to my previous silent drive. It’s not terribly loud, and a small price to pay for additional peace of mind.

udev

When connected, Linux assigns a drive letter to a USB volume, so that (for example) you can partition and mount from /dev/sda, /dev/sdb, etc. Unfortunately, these letters can be somewhat arbitrary, and you never know exactly where your connected drive will show up. This can be a real problem if you want the volume to be reliably findable every time. If you simply format the drive you can use the volume’s UUID instead of the drive letter, but I was interested in using Logical Volume Manager (LVM) so I needed it in a predictable place.

Fred Wenzel provided some hints on how to use udev, the device manager for the Linux kernel:

The solution for the crazily jumping dev nodes is the udev system, which is part of Linux for quite a while now, but I never really had a need to play with it yet. But the howto is pretty nice and easy to apply.

The idea is that you find some property of the device, like its manufacturer or product ID, and use that to create a stable link to the drive. With my drive temporarily at /dev/sda, I ran this udevadm command to read off its properties:

udevadm info -a -p /sys/block/sda/sda1

Running down the lengthy list that came back, I found three entries that looked meaningful:

  • ATTRS{manufacturer}=="inXtron, Inc."
  • ATTRS{product}=="NT2"
  • ATTRS{serial}=="0123456789"
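The long `udevadm info -a` listing is easier to scan when it is reduced to key/value pairs. The Python sketch below assumes the usual `ATTRS{key}=="value"` line shape and runs against a made-up, trimmed excerpt modeled on the three entries above, not real output from this drive. udevadm walks from the device up through its parents, so the same key can show up more than once. This sketch keeps the first, closest occurrence.

```python
import re

# Matches lines such as:    ATTRS{product}=="NT2"
ATTR_LINE = re.compile(r'^\s*(ATTRS?)\{([^}]+)\}=="(.*)"\s*$')


def parse_udevadm_attrs(text):
    """Collect ATTR/ATTRS key-value pairs from `udevadm info -a` output.

    The same key may appear on several parent devices; keep the first one seen,
    which belongs to the device closest to the partition.
    """
    attrs = {}
    for line in text.splitlines():
        match = ATTR_LINE.match(line)
        if match:
            _, key, value = match.groups()
            attrs.setdefault(key, value)
    return attrs


# Made-up excerpt for illustration; real output is much longer.
SAMPLE = '''
  looking at device '/devices/.../block/sda/sda1':
    KERNEL=="sda1"
    ATTR{partition}=="1"

  looking at parent device '/devices/.../usb1/1-1':
    ATTRS{manufacturer}=="inXtron, Inc."
    ATTRS{product}=="NT2"
    ATTRS{serial}=="0123456789"

  looking at parent device '/devices/.../usb1':
    ATTRS{product}=="EHCI Host Controller"
'''

print(parse_udevadm_attrs(SAMPLE)["product"])  # NT2
```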

This whole process was difficult and confusing, and I didn’t understand quite what I was doing until I started using udev’s PROGRAM/RUN functionality to log events and inspect them. I created a rule that matched all events with a “*”, and then had that log to a file in /tmp that I could periodically watch. It wasn’t necessary to reboot the server when testing, which was a big relief.
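The author’s logging rule isn’t shown. Here is one possible shape for the program side, written as a hypothetical Python helper. It assumes a catch-all rule along the lines of `KERNEL=="*", RUN+="/usr/local/bin/udev-log.py"` (script path hypothetical). It relies on udev passing each event’s properties to RUN programs as environment variables. The log path and the chosen keys are assumptions.

```python
#!/usr/bin/env python3
"""Hypothetical udev RUN helper: append one line per event to a log file.

udev passes each event's properties to RUN programs as environment variables.
"""
import os
import time

LOG_PATH = "/tmp/udev-events.log"  # hypothetical location; any writable path works
KEYS = ("ACTION", "SUBSYSTEM", "DEVNAME", "DEVTYPE", "DEVPATH")


def log_event(env, log_path=LOG_PATH):
    """Append the interesting properties of one event; skip any that are absent."""
    fields = [f"{key}={env[key]}" for key in KEYS if key in env]
    line = time.strftime("%Y-%m-%d %H:%M:%S ") + " ".join(fields) + "\n"
    with open(log_path, "a") as log:
        log.write(line)
    return line


# Only log when udev invoked us (ACTION is set); running by hand does nothing.
if __name__ == "__main__" and "ACTION" in os.environ:
    log_event(os.environ)
```

Plugging the drive in and out while running `tail -f` on the log shows which events fire and what they are called. Because rule files are re-read, this works without rebooting.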

The rule I ended up with in /etc/udev/rules.d/10-local.rules looks like this:

ATTRS{product}=="NT2", KERNEL=="sd*1", SYMLINK="raid"

It causes any one of /dev/sda1, /dev/sdb1, etc. with the product name “NT2” to be symlinked to /dev/raid. I could add the serial number, but this minimal rule works for now.
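To see what the rule does and does not match, here is a rough Python approximation of udev’s matching. It is not udev’s implementation. KERNEL is treated as a shell-style glob via `fnmatchcase`, and ATTRS as a search up the device chain where every listed key must match on the same device. The device chain below is invented for illustration, and `rule_matches` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def rule_matches(kernel_name, devices, required, kernel_glob="sd*1"):
    """Approximate udev matching for KERNEL=="sd*1" plus ATTRS{...} keys.

    devices: one dict of sysfs attributes per device, from the partition itself
    up through its parents. As in udev, every ATTRS key must match on the same
    device, though that device can be any one along the chain.
    """
    if not fnmatchcase(kernel_name, kernel_glob):
        return False
    return any(
        all(attrs.get(key) == value for key, value in required.items())
        for attrs in devices
    )


# Made-up chain modeled on the NT2 attributes listed earlier.
NT2_CHAIN = [
    {"partition": "1"},
    {"manufacturer": "inXtron, Inc.", "product": "NT2", "serial": "0123456789"},
]

print(rule_matches("sdb1", NT2_CHAIN, {"product": "NT2"}))   # True
print(rule_matches("sdb2", NT2_CHAIN, {"product": "NT2"}))   # False: not partition 1
print(rule_matches("sdb1", NT2_CHAIN, {"product": "NT2", "serial": "0123456789"}))  # True
print(rule_matches("sda11", NT2_CHAIN, {"product": "NT2"}))  # True: the glob also allows this
```

The last line is a reminder that `sd*1` is looser than it looks. Adding the serial number narrows the match to one physical enclosure, and it only works because product and serial live on the same USB device.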

LVM

Logical Volume Manager makes it possible to do all kinds of neat tricks with hard drives, such as having a single volume span many physical disks, or freely resizing volumes and moving them around after they are created. Setting up LVM requires three steps:

  1. pvcreate /dev/raid to make a physical volume from /dev/raid.
  2. vgcreate lvmraid /dev/raid to create a new volume group called “lvmraid” from the /dev/raid physical disk.
  3. lvcreate -L 360g -n tmachine lvmraid to create a new 360GB logical volume at /dev/mapper/lvmraid-tmachine, which I want to use for my backup volume.
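The three steps can be sketched as a Python dry run that only prints the commands, along with the device-mapper path that results. The helper names are hypothetical. `mapper_path` follows the device-mapper convention of joining volume group and logical volume names with a hyphen and doubling any hyphens inside either name, which is why lvmraid plus tmachine becomes lvmraid-tmachine.

```python
import shlex


def mapper_path(vg, lv):
    """Device-mapper node for a logical volume: /dev/mapper/<vg>-<lv>.

    Hyphens inside either name are doubled so the separator stays unambiguous.
    """
    return "/dev/mapper/{}-{}".format(vg.replace("-", "--"), lv.replace("-", "--"))


def lvm_setup_commands(pv, vg, lv, size):
    """Return the three setup commands as argument lists, without running them."""
    return [
        ["pvcreate", pv],                       # mark the disk as a physical volume
        ["vgcreate", vg, pv],                   # build a volume group on it
        ["lvcreate", "-L", size, "-n", lv, vg], # carve out a logical volume
    ]


for cmd in lvm_setup_commands("/dev/raid", "lvmraid", "tmachine", "360g"):
    print(shlex.join(cmd))
print(mapper_path("lvmraid", "tmachine"))  # /dev/mapper/lvmraid-tmachine
```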

At this point, it would be possible to make a filesystem on /dev/mapper/lvmraid-tmachine and have a 360GB volume available. I’ve got more logical volumes than this, but I’m just showing the one.

Volume encryption

I wanted my backup to be safely encrypted, so I followed advice from Robin Bowes who shows how to use cryptsetup and Linux Unified Key Setup (LUKS):

  1. cryptsetup -y --cipher aes-cbc-essiv:sha256 --key-size 256 luksFormat /dev/mapper/lvmraid-tmachine
  2. cryptsetup luksOpen /dev/mapper/lvmraid-tmachine lvbackup
  3. mkfs.ext3 -j -O ^ext_attr,^resize_inode /dev/mapper/lvbackup

The first step formats the volume for LUKS encryption and asks you to choose a secret passphrase (twice, because of -y). The second step opens the volume at /dev/mapper/lvbackup, which requires that passphrase. The third creates a filesystem on the newly opened volume; I’ve included some mkfs flags that omit features which might make it hard to resize the volume later.

I mount the new volume at /time-machine, and confirm that I can read and write files to it. I will need to run the luksOpen step every time I want to mount this volume after a reboot, so it’s useful to save a two-line script in /time-machine/mount.sh for reference.
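The contents of mount.sh aren’t shown. Assuming its two lines are the luksOpen step followed by a mount, a hypothetical Python equivalent might look like this. It prints the commands by default and runs them only when `dry_run` is turned off, which needs root.

```python
import shlex
import subprocess

# Names from the steps above.
LUKS_DEVICE = "/dev/mapper/lvmraid-tmachine"
MAPPING_NAME = "lvbackup"
MOUNT_POINT = "/time-machine"


def mount_commands(device=LUKS_DEVICE, name=MAPPING_NAME, mount_point=MOUNT_POINT):
    """The two steps needed after each reboot: unlock, then mount."""
    return [
        ["cryptsetup", "luksOpen", device, name],  # prompts for the passphrase
        ["mount", "/dev/mapper/" + name, mount_point],
    ]


def main(dry_run=True):
    for cmd in mount_commands():
        if dry_run:
            print(shlex.join(cmd))
        else:
            # Inherits the terminal, so cryptsetup can ask for the passphrase;
            # check=True stops before mounting if unlocking fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main(dry_run=True)  # pass dry_run=False (as root) to actually run the commands
```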

Netatalk and AFPD

This was the second hard part. I had tried running an Apple file server before and given up; this time I figured out how to make it write meaningful logs so I could debug the process. The default installation of netatalk from apt-get mostly works, with a couple of small changes:

  • Add “-setuplog "CNID LOG_INFO" -setuplog "AFPDaemon LOG_INFO"” to /etc/netatalk/afpd.conf, so that CNID and AFPD log useful progress to /var/log/syslog.
  • Replace the default uamlist in /etc/netatalk/afpd.conf, changing it from “uams_clrtxt.so,uams_dhx.so” to “uams_dhx2.so” so that Mac OS X can correctly provide a password. Until I did this, I was consistently seeing failed login attempts.

Finally, I added this line to /etc/netatalk/AppleVolumes.default:

/time-machine TimeMachine allow:migurski cnidscheme:cdb options:usedots,upriv

Now I have a working Apple File server.

Time Machine

Apple’s Time Machine is picky about the format of the volume it writes its backups to, preferring HFS+ to anything else. I initially looked at setting up /time-machine as an actual HFS volume, but stopped when I started reading words like “recompile” and “kernel”. Matthias Kretschmann offers a better way with Disk Utility. His netatalk advice is also useful for the steps above, though I skipped all the Avahi steps. The important part of his article is under Configure Time Machine: ask Time Machine to show unsupported network volumes, and create your own sparsebundle disk image to back up to:

In short, you have to create the backup disk image on your Desktop and copy it to your mounted Time Machine volume. But Time Machine creates a unique filename for the disk image and we can find out this name with a little trick…

Follow his advice closely on naming the file and the volume before copying the image to the AFP share. My computer is named “Null Island”, so my sparse bundle file is called “Null-Island_xxxxxxxxxxxx.sparsebundle”. The x’s are the computer’s hardware Ethernet (MAC) address with the colons removed, which you can find by running ifconfig en0 on the command line.
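The naming rule can be sketched in Python: pull the address from the `ether` line of `ifconfig en0` output, drop the colons, and join it to the hyphenated computer name. The sample ifconfig text and its address are made up. As an assumption, only spaces in the computer name are replaced. As a defensive choice, octets printed without a leading zero are padded back to two digits.

```python
import re

# Matches the "ether" line of macOS ifconfig output; octets may lack leading zeros.
ETHER_LINE = re.compile(
    r"^\s*ether\s+((?:[0-9a-fA-F]{1,2}:){5}[0-9a-fA-F]{1,2})\b", re.MULTILINE
)


def mac_from_ifconfig(output):
    """Pull the hardware address out of `ifconfig en0` output."""
    match = ETHER_LINE.search(output)
    if match is None:
        raise ValueError("no 'ether' line in ifconfig output")
    return match.group(1)


def sparsebundle_name(computer_name, mac):
    """Build <Computer-Name>_<12 hex digits>.sparsebundle.

    Assumption: only spaces in the computer name need replacing.
    """
    hex_digits = "".join(octet.zfill(2) for octet in mac.split(":"))
    return "{}_{}.sparsebundle".format(computer_name.replace(" ", "-"), hex_digits)


# Made-up ifconfig output for illustration.
SAMPLE_IFCONFIG = """en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
\tether 00:11:22:33:44:55
\tstatus: active
"""

print(sparsebundle_name("Null Island", mac_from_ifconfig(SAMPLE_IFCONFIG)))
# Null-Island_001122334455.sparsebundle
```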

AutoBackup

Finally, in my case I don’t actually want Time Machine running at all hours of the day. Over the network, everything takes longer than it does over USB. I added these two lines to my crontab, which keep AutoBackup off during the day and on late at night:

  • */5 23,0-8 * * * defaults write /Library/Preferences/com.apple.TimeMachine AutoBackup -bool true
  • */5 9-22 * * * defaults write /Library/Preferences/com.apple.TimeMachine AutoBackup -bool false
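A quick way to check that the two hour fields split the day cleanly is to expand them in Python. `parse_hour_field` is a hypothetical helper that handles only commas, ranges, and `*`, not step syntax. The check confirms that every hour falls in exactly one schedule, with AutoBackup on from 11pm until 8:59am.

```python
def parse_hour_field(field):
    """Expand a cron hour field like "23,0-8" into a set of hours (no step syntax)."""
    hours = set()
    for part in field.split(","):
        if part == "*":
            hours.update(range(24))
        elif "-" in part:
            start, end = (int(n) for n in part.split("-"))
            hours.update(range(start, end + 1))
        else:
            hours.add(int(part))
    return hours


ON_HOURS = parse_hour_field("23,0-8")   # hours when AutoBackup is set to true
OFF_HOURS = parse_hour_field("9-22")    # hours when AutoBackup is set to false

# Every hour of the day falls in exactly one of the two schedules.
assert ON_HOURS | OFF_HOURS == set(range(24))
assert not ON_HOURS & OFF_HOURS


def autobackup_setting(hour):
    """The value the crontab keeps writing (every 5 minutes) during this hour."""
    return hour in ON_HOURS


print(sorted(ON_HOURS))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 23]
```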

With this in place, I don’t saturate the network with backup traffic during the day, and I can guarantee that my data is safe by keeping the computer on overnight. Time Machine keeps Apple File credentials, so it’s capable of mounting the network drive on its own. I just need to have the computer on after 11pm and before 9am.

Apr 3, 2013 8:23pm

week 1,846: ladders

I finished Evgeny Morozov’s mega-screed The Meme Hustler (“Tim O’Reilly’s crazy talk”) yesterday. If you can squeeze uncomfortably past the acid-drenched ad hominem opener, Evgeny recounts the history of the Open Source vs. Free Software memetic war of the late 1990s and its relationship to political power:

Ranking your purchases on Amazon or reporting spammy emails to Google are good examples of clever architectures of participation. Once Amazon and Google start learning from millions of users, they become “smarter” and more attractive to the original users. This is a very limited vision of participation. It amounts to no more than a simple feedback session with whoever is running the system. You are not participating in the design of that system, nor are you asked to comment on its future. There is nothing “collective” about such distributed intelligence; it’s just a bunch of individual users acting on their own and never experiencing any sense of solidarity or group belonging. Such “participation” has no political dimension; no power changes hands. … There’s a very explicit depoliticization of participation at work here.

This morning, Matt said this, in response to Wave’s comment/question about empowerment:

@drwave once you’re “there”: making sure you don’t pull up the ladder, making new, better ladders, admitting there was a ladder.

The image of the ladder sticks with me. I first became aware of Open-vs.-Free in 1999, when it looked like another Vi-vs.-Emacs thing, an interminable pissing match for nerds. In Morozov’s retelling, the power politics of Freedom and Openness look fresh and important all over again. It touches on the question of who the ladder is for, who’s inside the tribe deserving of help, and how to think about equity. The Free Software side of the argument framed a bigger community, consisting of users and developers together. GNU’s four freedoms name use and study before they name distribution and modification. Order matters; ladders are for everyone.

I spoke at Ragi Burhum’s Geomeetup yesterday, about my recent work on vector tiles for Mapnik. The slides are here in PDF form. One of the subtexts to my OSM work for the past few years has been the ladder-making that Matt describes: a way to make datasets like OSM available to more people who might not otherwise choose to learn the full set of tools needed to work with the raw stuff, but still have important things to say. That includes professional message-makers like journalists but also enthusiasts like Stephanie May or Burrito Justice (on tacos and history). There are commercial answers to this question from companies like Google or Mapbox, but in addition to those it should always be possible to take your message into your own hands, most especially if your message is likely to get under someone’s fingernails. Free software and free data work as one kind of ladder, continually looking back as well as forward, assimilating innovation and passing it down to where it wouldn’t otherwise reach. I’m tempted to call this “trickle down”, but it occurs to me that the pull of gravity is all wrong in that image. Things don’t move from the core to the gap like water flowing downhill, but quite the opposite. Left alone, innovation and capital accrue to where they are already in highest concentration. Collective work and effort are the only forces that can counteract gravity with any regularity.

Here are the slides.

Here is the data. Please tell me if you find it interesting, useful, or need help.
