...so take it easy.
My name is Michal Migurski. I play the role of CTO for Code for America, a Bay Area non-profit organization helping make government digital services beautiful, simple, and easy to use. Until December 2012, I was technology head at Stamen, a San Francisco design and development studio focused on data visualization and map-making. Below, you will find my weblog, tecznotes, and a collection of smaller and older things I've worked on.
Background photo by Fred.
For bike day trips, keep a six-pack of beer cans cold and three burritos hot with this insulated rack-mountable box.
I switched the Schwinn touring bike from a front basket to a rear-mounted rack, and assembled this insulated box using $20 in parts from a hardware store and an art supply store. It should work for any rear-mounted bicycle rack with a flat top.
The box is a Rubbermaid 3 Gallon tote. Home Depot sells them for $5.27:
The insulation is plain styrofoam. I had a few sheets of 1" foam around the house, but it was slightly too thick to fit into the box leaving room for the six-pack (5.2" × 7.8" × 4.83"). Michael’s Art Supplies sells 5/8" sheets for an extortionate $5 apiece; two 12" × 12" sheets were enough for the sides:
Use a sharp utility knife to cut the foam. The tote tapers slightly toward the bottom and has curved corners, so use the knife to trim off a bit of the bottom and corners of each of the four side panels until a six-pack fits snugly into the box:
I labeled each of the panels so I wouldn’t be confused about how to fit them in, and wrapped the cut edges with clear packing tape so bits of styrofoam wouldn’t fly all over the place.
To mount the box to the rack, mark four points on the bottom of the tote box up against the inside corners of the rack top; it will look like this from the bottom when you’re done:
Each mount point is made from three washers, a screw, and a wingnut.
The screw here is fairly long, to make it possible to attach and detach the tote box without removing the wingnuts completely. They can simply be unscrewed, and the natural flexibility of the tote container will make it possible to bend the large washers around the rack. I use large fender washers to catch a complete interior corner of the rack on the bottom, and to distribute the stress on the bottom of the tote box. Smaller washers will distort the box and allow the screws to bend until they detach from the box while riding over bumps.
Make a sandwich from the washers, and use the tiny one to capture the screw head at top:
I used metric M5 screws for everything, to match most other bike components.
When you drill the four holes you’ve marked on the bottom of the box, use a bit slightly narrower than the screw threads so everything fits tightly. This should keep water from splashing up into the box from below, if you ride in the wet.
I tested the box on a day-long ride in July. We started out with cans cold from the fridge and burritos hot from Burrito Express at 10am, and ascended Palomares Road over the next couple hours. We descended into Fremont, and made our way west along Alameda Creek to Coyote Hills Park where we stopped to eat around 3pm. After five hours in near 90° heat, the burritos were deliciously hot and the beers were acceptably cold.
Similar to my quick adaptation of OSM data the other day, Kate Elswit recently asked for data and mapping help with her Moving Bodies, Moving Culture project. MBMC is a series of exploratory visualizations all based on the 1941 South American tour of American Ballet Caravan. The goal here was to adapt a Rand McNally road atlas from 1940 as a base map for the tour data, “to see how different certain older maps might be, given some of the political upheaval in South America in the past 80 years.”
Kate is using Github to store the data, and I’ve written up a document explaining how to do simple map warping with a known source projection to get an accurate base map.
This is a re-post of the process documentation.
Vector Data and QGIS
This downloads a file called 5969008.jpg.
Download vector data from Natural Earth, selecting a few layers that match the content of the 1940 map:
ogr2ogr -t_srs '+proj=moll +lon_0=-59' \
    ne_50m_admin_0_countries-moll.shp \
    ne_50m_admin_0_countries.shp

ogr2ogr -t_srs '+proj=moll +lon_0=-59' \
    ne_50m_populated_places-moll.shp \
    ne_50m_populated_places.shp

ogr2ogr -t_srs '+proj=moll +lon_0=-59' \
    ne_50m_graticules_5-moll.shp \
    ne_50m_graticules_5.shp
In QGIS, the location can be read from the coordinates display:
Warping The Map
Back in the 1940 map, corresponding pixel coordinates can be read from Adobe Photoshop’s info panel:
Using GDAL, define a series of ground control points (GCP) centered on cities in the Natural Earth data and the 1940 map. Use gdal_translate to describe the downloaded map and then gdalwarp to bend it into shape:
gdal_translate -a_srs '+proj=moll +lon_0=-59' \
    -gcp 1233 1249 -1655000 775000 \
    -gcp 4893 2183 2040000 -459000 \
    -gcp 2925 5242 52000 -4176000 \
    -gcp 1170 3053 -1788000 -1483000 \
    -gcp 2256 6916 -767000 -6044000 \
    -of VRT 5969008.jpg 5969008-moll.vrt

gdalwarp -co COMPRESS=JPEG -co JPEG_QUALITY=50 \
    -tps -r cubic 5969008-moll.vrt 5969008-moll.tif
Opening the result in QGIS and comparing it to the 5° graticules shows that the Mollweide guess was probably wrong:
The exactly horizontal lines of latitude in the original map suggest a pseudocylindrical projection, and a look at a list of examples shows that Sinusoidal might be better. Try it all again with a different PROJ.4 string:
ogr2ogr -t_srs '+proj=sinu +lon_0=-59' \
    ne_50m_admin_0_countries-sinu.shp \
    ne_50m_admin_0_countries.shp

ogr2ogr -t_srs '+proj=sinu +lon_0=-59' \
    ne_50m_populated_places-sinu.shp \
    ne_50m_populated_places.shp

ogr2ogr -t_srs '+proj=sinu +lon_0=-59' \
    ne_50m_graticules_5-sinu.shp \
    ne_50m_graticules_5.shp
The pixel coordinates will be identical, but the locations will be slightly different and must be read from QGIS again:
gdal_translate -a_srs '+proj=sinu +lon_0=-59' \
    -gcp 1233 1249 -1838000 696000 \
    -gcp 4893 2183 2266000 -414000 \
    -gcp 2925 5242 52000 -3826000 \
    -gcp 1170 3053 -1970000 -1329000 \
    -gcp 2256 6916 -711000 -5719000 \
    -of VRT 5969008.jpg 5969008-sinu.vrt

gdalwarp -co COMPRESS=JPEG -co JPEG_QUALITY=50 \
    -tps -r cubic 5969008-sinu.vrt 5969008-sinu.tif
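How those projected locations relate to latitude and longitude can be sketched in a few lines of Python. This is only a spherical approximation of the Sinusoidal projection, using the same central meridian as the PROJ.4 string above. The function name and the radius constant are assumptions for illustration, and PROJ defaults to an ellipsoid, so values read from QGIS will differ somewhat from these.

```python
import math

# Assumed sphere radius in meters (the WGS84 semi-major axis). PROJ's default
# ellipsoidal math will give somewhat different values.
EARTH_RADIUS = 6378137.0


def sinusoidal(lon, lat, lon_0=-59.0, radius=EARTH_RADIUS):
    """Project degrees of longitude and latitude to spherical Sinusoidal meters."""
    phi = math.radians(lat)
    lam = math.radians(lon - lon_0)
    # x shrinks with the cosine of latitude; y depends on latitude alone.
    x = radius * lam * math.cos(phi)
    y = radius * phi
    return x, y


if __name__ == "__main__":
    # A point near Buenos Aires, for rough comparison with a value read from QGIS.
    print(sinusoidal(-58.4, -34.6))
```

Because y depends only on latitude here, every line of latitude comes out exactly horizontal, which is the property that pointed toward this projection in the first place.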
The result looks pretty good:
For web map display, convert the warped map to map tiles using gdal2tiles.py, for map zoom levels 0 through 6:
gdal2tiles.py -w openlayers -z 0-6 \
    -c 'Rand McNally 1940' -t 'Map of South and Central America' \
    5969008-sinu.tif tiles
Convert all generated PNG tiles to smaller JPEG images using Python and convert:
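The script for this step is not reproduced here. The following is a minimal sketch of one way to do it, assuming ImageMagick's convert is on the path and that tiles live in the tiles directory written by gdal2tiles.py above. The function names and the quality setting of 75 are assumptions, not the values originally used.

```python
import os
import subprocess


def convert_command(png_path, quality=75):
    """Build an ImageMagick command that writes a JPEG next to a PNG tile."""
    jpg_path = os.path.splitext(png_path)[0] + ".jpg"
    return ["convert", png_path, "-quality", str(quality), jpg_path]


def find_png_tiles(tiles_dir):
    """Yield paths of all PNG files under the tile directory."""
    for dirpath, _dirnames, filenames in os.walk(tiles_dir):
        for filename in sorted(filenames):
            if filename.lower().endswith(".png"):
                yield os.path.join(dirpath, filename)


def convert_tiles(tiles_dir, quality=75):
    """Convert every PNG tile to JPEG, leaving the PNG files in place."""
    for png_path in find_png_tiles(tiles_dir):
        subprocess.check_call(convert_command(png_path, quality))


if __name__ == "__main__":
    convert_tiles("tiles")
```

JPEG has no transparency, so any empty area around the warped map will be filled with a solid color in the converted tiles.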
Here it is in CartoDB:
Scott Murray wrote the other day asking about getting Church data out of OpenStreetMap:
What is the easiest way to extract a list of a specific type of features from OSM for a particular area? For example, say I want all of the churches ( feature type: building / church ) in London, and the name, lat, and lon for each. Ideally all of this would end up in a simple CSV. This would be a one-time extract, and I don’t need to update it again later.
It was a pretty quick process, so I wrote it up for him and asked his permission to re-post here. I figured others might run into the same need. Today, Steven Vance in Chicago posted a response to a near-identical question with a different approach. There are many ways to skin this cat, and possibly not enough guides on this kind of retail data extraction from OpenStreetMap.
This is what I sent to Scott:
Since you’re asking for a major urban area, I would expect that London is part of the Mapzen metro extracts.
I downloaded a copy of the London OSM2PGSQL SHP data, because I know that it tends to be a closer (and often messier) representation of what’s in the OSM source database. If I was looking for roads or something else that I felt confident was already a defined and separate layer, I would download the IMPOSM SHP data. If I was looking for something outside a covered city, then I’d need to go digging in the Planet and I would be sad (edit: I’d follow Steven’s advice).
Next I looked on the OSM wiki to see how churches are tagged. The suggested tag is amenity = place_of_worship.
Then I used ogr2ogr, a tool in the GDAL family, to quickly peel out all the tag matches. I could do this interactively in QGIS as well, but I find the command line to be a speedier way to get what I want. ogr2ogr can be a pain in the butt to install, but I’ve found that it’s something of a secret hidden easter egg in Postgres.app, so if you install that you can find ogr2ogr hidden inside.
Here’s the conversion to get the OSM ID and the name for all places of worship:
ogr2ogr \
    -select 'osm_id, name' \
    -where "amenity = 'place_of_worship'" \
    london_england_osm_point-amenity-place_of_worship.shp \
    london_england_osm_point.shp
At this point I opened the shapefile in QGIS to see what’s there, and saw this:
That looks right, so I convert it to a CSV preserving the geometry in (Y, X) fields, also using ogr2ogr:
ogr2ogr \
    -f CSV -lco GEOMETRY=AS_YX \
    london_england_osm_point-amenity-place_of_worship.csv \
    london_england_osm_point-amenity-place_of_worship.shp
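To get from that CSV to the name, lat, and lon that Scott asked for, something like the following Python sketch should work. It assumes the geometry columns are named Y and X, as the AS_YX option suggests. The function name is hypothetical and the sample row is a placeholder, not real OSM data.

```python
import csv
import io


def read_places(csv_file):
    """Return (name, lat, lon) tuples from a CSV with Y and X columns."""
    places = []
    for row in csv.DictReader(csv_file):
        # Y holds latitude and X holds longitude in unprojected data.
        places.append((row["name"], float(row["Y"]), float(row["X"])))
    return places


if __name__ == "__main__":
    # Placeholder row for illustration only, not real OSM data.
    sample = io.StringIO("Y,X,osm_id,name\n51.5,-0.1,1,Example Church\n")
    print(read_places(sample))
```

For the real file, open london_england_osm_point-amenity-place_of_worship.csv with newline="" and pass the file object to read_places.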
This week, I’ll be speaking at two Bay Area tech events:
On Wednesday, IxDA San Francisco is hosting Making Data Meaningful, where I’ll be joining designers from Facebook, Automatic, and Jawbone to talk about meaning and data.
On Thursday, SAP is hosting Accelerating Smart Cities in Palo Alto, where I’ll be joining a (regrettably all-male) group of technology and government experts to talk about smartness and cities.
This is me trying to get some thoughts straight while I prepare.
When I first met my now-boss Jen Pahlka and got excited about Code for America, it was 2010 and Tim O’Reilly was starting to unveil his “government as a platform” meme. For tech people like me, it’s an evocative and potent image and I’ve been wondering why. The UK’s Government Digital Service made this video to attempt an explanation, and it misses the mark:
I’ve been re-reading Science in Action. Eight years later, there are a lot of ideas in Latour’s book directly applicable to what makes a platform and what’s missing from the GDS video. Latour uses the Roman two-faced god Janus as a recurring illustration, contrasting ready-made science that you learn in a school textbook with science-in-the-making that you learn in the news.
Uncertainty, people at work, decisions, competition, controversies are what one gets when making a flashback from certain, cold, unproblematic black boxes to their recent past. If you take two pictures, one of the black boxes and the other of the open controversies, they are utterly different. They are as different as the two sides, one lively, the other severe, of a two-faced Janus. “Science in the making” on the right side, “all made science” or “ready made science” on the other; such is Janus bifrons, the first character that greets us at the beginning of our journey.
The Janus illustration appears repeatedly, showing the difference between settled facts on the left and the process by which they’re made on the right:
On the right is where the messy controversies of science and technology happen, and usually they’re in the form of suggested truths being put to a test. When things “hold,” they work for new people in new contexts. The chemist’s double-helix shape of DNA is used by the biologist to explain how genetic information is copied. Pasteur’s work on bacterial vaccines is used by farmers to keep their sheep and cows alive. The GDS video shows the platforminess of technology as a settled truth with neatly-shaped blocks, but without those other people using the platform for support it doesn’t mean anything.
So, nothing is a platform until it’s used as one.
Meanwhile, there are a few potential visions of what a government platform might look like. Specific actors work on the right side of Janus developing and promoting visions to make them real. The winning bingo words are “big data,” “smart cities,” “internet of things,” and so on.
Adam Greenfield (author of Against The Smart City) ties a few of these threads together in a recent edition of his weekly email letter:
So the idea that we will somehow use the data we garner to “make wiser decisions” about our own lives is something I find naive at best. If other parties will almost always better be able to use data to act in ways that are counter to my interests (and even do me harm!) than I will be able to marshal the time, effort and energy to use them in ways that advance my interests, then the house always wins. And this is particularly problematic as one failing “smart city” initiative after another gets reframed and repositioned as an “urban IoT” project.
The initiatives fail to hold, and are recast into new initiatives.
Government has always had high potential for running platforms, because platforms are essentially made of regulations. The web platform is a set of rules for how markup, addresses, and state transfers work together. The Amazon services platform is a set of rules for how computers, networks, and credit cards work together. The Interstate Highway platform is a set of rules for how roads, tax dollars, and cars work together. All those pieces can be swapped out, but the rules that bind them hold. In the GDS video metaphor, rules might specify the acceptable size and weight of a block but not its material or color. The video should reference the idea of other people in the picture, the potential for new actors to use those blocks for support.
Where government has failed at platforms is in delivery. Outgoing GDS director Mike Bracken immortalized their approach with his “Strategy is Delivery” meme, jumping the gap between isolated rule-setting and the services that deliver those rules. Implementation is a pre-digital, book-length exploration on the same theme from 1973: “If one wishes to assure a reasonable prospect of program implementation, he had better begin with a high probability that each and every actor will cooperate. The purpose of bureaucracy is precisely to secure this degree of predictability.” Delivery secures predictability by making things real. Dan Hon has published CfA’s point of view on service delivery with respect to technology procurement, with a special focus on connecting that bureaucratic probability machine to the original user needs that set it in motion.
Mikey Dickerson of USDS illustrates this using his modified hierarchy of needs. It’s his response to the lack of platform thinking at Healthcare.gov in 2013, and the platform metaphor is right there in the picture of the pyramid:
Without a foundation of monitoring or incident response, it was impossible to know that Healthcare.gov worked. The policy intent was not being delivered. The individual components were all being individually monitored by the contractors responsible for them, but little effort was spent securing predictability by enforcing coordination so that outages could have an agreed-upon boundary. Without the platform of common language about the system, the pyramid is just a Tower of Babel.
Faced with these same challenges, the private sector often simply folds (pivots). Phil Gyford runs a cherished record of tone-deaf service shutdowns called Our Incredible Journey. Markets get resegmented, teams sell themselves to bigger companies, and engagement gets prioritized over service delivery. However, public servants don’t pivot.
Returning to the topic of the Accelerating Smart Cities and Making Data Meaningful events, “smart” and “meaningful” can only be used in retrospect as a judgement of success. A city government is mandated to meet certain needs of its residents. Having met those needs is the only way in which it can be said to be smart. Urban informatics dashboards, mass data collection, and coordinated networks of magic talking light poles are not user needs. Having been available, current, and usable is the only way in which data can be said to be meaningful (to borrow from Renee Sieber’s definition of open data). Having supported novel uses by other people is the only way in which a government can be said to be a platform.
We’ve been working on an update to the technology behind OpenAddresses, and it’s now being used in public.
OpenAddresses is a global repository for open address data. In good open source fashion, OpenAddresses provides a space to collaborate. Today, OpenAddresses is a downloadable archive of address files, it is an API to ingest those address files into your application and, more than anything, it is a place to gather more addresses and create a movement: add your government’s address file and if there isn’t one online yet, petition for it. —Launching OpenAddresses.
Timely feedback is vital to a shared data project like OA, something I learned many years ago when I started contributing to OpenStreetMap. In 2006, tiles rendered many days after edits were made, and the impossibility of seeing the results of your own work gated participation. Today, the infrastructure behind OSM makes it easy to see changes immediately and incorporate them into other projects, feedback vital to keeping editors motivated.
Last year, we automated the OpenAddresses process to cut the update interval from weeks or months to days. Now, we’d like to cut that interval from days to hours or minutes.
OpenAddresses is run from Github. If you host a code project there and you’re serious about code quality, it’s likely that you’ve configured Travis or Circle to automatically run your test suite as you work. For external contributors sending pull requests to a project, CI services make it easy to see whether changes will work:
OA has always used Travis CI to verify the syntax correctness of submitted data, and it will tell you that your JSON is valid and that you’re using the right tags. We wanted to be able to see the true results of integrating that data into OA. Ordinarily, Travis itself might be a good tool for this job, but OA sources can take many hours to run, and a single PR might include changes to many sources. So, we needed to roll our own continuous integration service.
Creating our own service to run OA source submissions required three parts:
- A web service to listen for events from Github.
- A pool of workers to act on those events.
- Communication back to Github.
The web service is the easiest part, and consists of a simple Flask-based application listening for events from Github. These events can be signed with a secret to ensure that only real requests are acted on. There are dozens of event types to choose from, but we care about just two: push events when data in the OA repository is changed, and pull request events when a contributor suggests new data from outside. Events come in the form of JSON data, and it takes a bit of rooting around in the Github API to determine what actual files were affected. Git’s underlying data model (more, easier) is helpful here, with commits linked to directory trees and trees linked to individual file blobs. Each event from Github is turned into a job of added or changed source data, and each individual source is queued up for work. Nelson chose PQ for the queue implementation since we’re already using Python and PostgreSQL, and it’s been working very well.
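The signature check mentioned above can be sketched with the Python standard library alone. This is not the OA code. It assumes Github's X-Hub-Signature-256 header, which carries an HMAC-SHA256 hex digest of the request body prefixed with "sha256=", and the function name is hypothetical.

```python
import hashlib
import hmac


def signature_is_valid(secret, body, signature_header):
    """Check a Github webhook body (bytes) against its X-Hub-Signature-256 header."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = signature_header[len("sha256="):]
    # compare_digest avoids leaking timing information about the match.
    return hmac.compare_digest(expected.encode(), received.encode())
```

In a Flask view this would be called with the shared secret as bytes, request.get_data(), and request.headers.get("X-Hub-Signature-256"), rejecting the request when it returns False.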
The worker pool is tricky. It’s wasteful to keep a lot of workers standing around and waiting, but you still want to act quickly on new submissions so people don’t get antsy. There are also a lot of interesting things that can go wrong. Amazon’s EC2 service is a big help here, with a few useful features to use. Auto Scaling Groups make it easy to spin up new workers to do big jobs in parallel. We’ve set up a few triggers based on the size of the queue backlog to determine how large a group is needed. When there have been tasks waiting for a worker for longer than a few minutes, the pool grows. When no new tasks have been waiting for a few hours, the pool shrinks. We use Amazon Cloudwatch to continuously communicate the size of the queue. We have struck a balance here, aiming for results within hours or minutes rather than seconds, so we only grow the pool to a half-dozen workers or so.
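The grow and shrink rules can be written down as a small decision function. In the real setup this logic lives in Cloudwatch alarms and Auto Scaling policies rather than in application code, so treat this as a sketch of the rule only. The exact thresholds and names are assumptions standing in for “a few minutes,” “a few hours,” and “a half-dozen workers or so.”

```python
from datetime import timedelta

# Hypothetical thresholds; the text gives only rough figures.
GROW_AFTER = timedelta(minutes=5)
SHRINK_AFTER = timedelta(hours=3)
MAX_WORKERS = 6


def desired_pool_size(current, oldest_wait, idle_time):
    """Return the worker count the pool should move to.

    current: number of workers running now
    oldest_wait: how long the oldest queued task has waited, or None if the queue is empty
    idle_time: how long it has been since any task was waiting
    """
    if oldest_wait is not None and oldest_wait > GROW_AFTER:
        # Tasks are backing up, so add a worker up to the cap.
        return min(current + 1, MAX_WORKERS)
    if oldest_wait is None and idle_time > SHRINK_AFTER:
        # Nothing has needed a worker for a long time, so drop one.
        return max(current - 1, 0)
    return current
```

Changing the pool by one worker at a time keeps it from swinging wildly when a single large pull request lands.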
Finally, Github needs to know about the work being done. As each task is completed, a status of "pending", "success", or "failure" is communicated back to Github where it is shown to a user along with a link to a detailed page. Commercial CI services use the Commit Status API to integrate with Github, and it’s available to anyone. The tricky part here is how to differentiate between failed jobs and ones that simply take a long time. In our case, we have a hard limit of three hours for any job, and judge a job to have failed when it’s been AWOL for more than three hours. Right now, we’re not retrying failed jobs.
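A sketch of that status logic follows, with the three-hour limit from above. The function names and the context string are hypothetical. The payload fields (state, target_url, description, context) are those of Github's Commit Status API, which accepts them as a JSON body POSTed to /repos/{owner}/{repo}/statuses/{sha}.

```python
from datetime import datetime, timedelta

JOB_TIMEOUT = timedelta(hours=3)  # the hard limit described above


def job_state(started, now, succeeded=None):
    """Return "pending", "success", or "failure" for one job.

    succeeded stays None until the worker reports a result.
    """
    if succeeded is True:
        return "success"
    if succeeded is False:
        return "failure"
    if now - started > JOB_TIMEOUT:
        # No word from the worker for too long: treat the job as failed.
        return "failure"
    return "pending"


def status_payload(state, target_url, description):
    """Build the JSON body for a Github commit status."""
    return {
        "state": state,
        "target_url": target_url,
        "description": description,
        "context": "openaddresses/hypothetical-ci",  # assumed label
    }


if __name__ == "__main__":
    started = datetime(2015, 1, 1, 12, 0)
    state = job_state(started, started + timedelta(hours=1))
    print(status_payload(state, "http://example.com/job", "Still working"))
```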
There are still bugs and weird behaviors in the CI service, so I’m shaking those out as I watch it in action.
We are continuing to run the batch job process. There’s nothing else that can generate the summary page at data.openaddresses.io for the time being, so the new continuous integration feature is being used solely to inform data contributors within pull requests. I’d like to replace the batch job with a smaller one that schedules missing sources, renders maps, and summarizes output. Then we can kill off the old batch process.
Modest Maps is a BSD-licensed display and interaction library for tile-based maps in Adobe Flash 7+, written in ActionScript. This is an active project I'm working on with Darren, Shawn, and Tom.
Mappr is a geographic browser of Flickr's photo collection. I wrote a large portion of this application with Tomas and Eric, notably the place-name matching and geolocation bits, and pretty much the entire back-end.
Jitter and 3D Geometry
Updated experiments in 3D geometry handling using OpenGL and PHP.
Photos taken from the roof of the SOMA-SF warehouse space I lived in, summer of 2002.
Collages of freeway satellite imagery to satisfy a fetish for complex interchanges.
Quickdraw and basic 3D
Rough experiments in 3D rendering basics and matrix math.
moveon: fahrenheit 9/11 national town meeting / part of a nationally-broadcast conversation between Michael Moore and MoveonPAC directors.
stamen google news visualizer / data visualisation experiment intended to give a high-level view of who's making news at the moment, and who made the news at specified times in the past.
bmw design priorities / rich internet application development in collaboration with DesignworksUSA Advanced Communications Group
moveon: bush uncovered / map of moveon.org's bush uncovered event series
naral/pro-choice america / map of the march for women's lives
sflnc / web dev political activism on behalf of the san francisco late night community
bipole / audio-video synchronicity courtesy of me & andy w.
video riot / “an edgy electronic tailgate party and a real-time drive-in multiplex”
viberation / event production, multimedia installations, dancing all night
Map Projection / a collection of classes used to project GPS data points onto maps, implemented in PHP 4
OSC hub / PHP-based client and server for Open Sound Control, optimized for use with Max/MSP implementation.
flash component of the H&K global website, a database-driven worldwide office map
coho / content management display component, for Apache/PHP/MySQL
sordid / command-line mp3 sorting utility for mac OS X, unix