So it was OpenTech weekend. I wasn’t presenting anything (although I’m kicking myself for not having done a talk on Tropo and Phono) but of course I was there. This year’s was, I think, a bit better than last year’s – the schedule filled up late on, and there were a couple of really good workshop sessions. As usual, it was also the drinking conference with a code problem (the bar was full by the end of the first session).
Things to note: everyone loves Google Refine, and I really enjoyed the Refine HOWTO session, which was also the one where the presenter asked if anyone present had ever written a screen-scraper and 60-odd hands reached for the sky. Basically, it lets you slurp up any even vaguely tabular data and identify transformations you need to clean it up – for example, identifying particular items, data formats, or duplicates – and then apply them to the whole thing automatically. You can write your own functions for it in several languages and have the application call them as part of the process. Removing cruft from data is always incredibly time consuming and annoying, so it’s no wonder everyone likes the idea of a sensible way of automating it. There’s been some discussion on the ScraperWiki mailing list about integrating Refine into SW in order to provide a data-scrubbing capability and I wouldn’t be surprised if it goes ahead.
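To give a flavour of the duplicate-spotting side of this, here is a rough Python sketch of key-collision clustering, in the spirit of the "fingerprint" method Refine offers. It is an illustration rather than Refine's own code: the function names `fingerprint` and `cluster` and the sample values are made up for the example.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Reduce a messy string to a normalised key.

    Strip whitespace, lowercase, drop ASCII punctuation, then sort and
    de-duplicate the remaining tokens so word order stops mattering.
    """
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values whose fingerprints collide; return groups of 2 or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample data: three spellings of one department, one odd one out.
names = [
    "Department of Health",
    "department of health.",
    "Health, Department of",
    "Scottish Office",
]
likely_duplicates = cluster(names)
```

Each group in `likely_duplicates` is a candidate for merging into one canonical value, which is the sort of transformation you would then apply across the whole column.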
Tim Ireland’s presentation on the political uses of search-engine optimisation was typically sharp and typically amusing – I especially liked his point that the more specific a search term, the less likely it is to lead the searcher to a big newspaper website. Also, he made the excellent point that mass audiences and target audiences are substitutes for each other, and the ultimate target audience is one person – the MP (or whoever) themselves.
The Sukey workshop was very cool – much discussion about propagating data by SMS in a peer-to-peer topology, on the basis that everyone has a bucket of inclusive SMS messages and this beats paying through the nose for Clickatell or MBlox to send out bulk alerts. They are facing a surprisingly common mobile tech issue, which is that when you go mobile, most of the efficient push-notification technologies you can use on the Internet stop being efficient. If you want to use XMPP or SIP messaging, your problem is that the users’ phones have to maintain an active data connection and/or recreate one as soon after an interruption as possible. Mobile networks analogise an Internet connection to a phone call – the terminal requests a PDP (Packet Data Protocol) context, in effect a data call, from the network – and as a result, the radio in the phone stays in an active state as long as the “call” is going on, whether any data is being transferred or not.
This is the inverse of the way they handle incoming messages or phone calls – in that situation, the radio goes into a low power standby mode until the network side signals it on a special paging channel. At the moment, there’s no cross-platform way to do this for incoming Internet packets, although there are some device-specific ways of getting around it at a higher level of abstraction. Hence the interest in using SMS (or indeed MMS).
Their other main problem is the integrity of their data – even without deliberate disinformation, there’s plenty of scope for drivel, duplicates, cockups etc to get propagated, and a risk of a feedback loop in which the crap gets pushed out to users, they send it to other people, and it gets sucked up from Twitter or whatever back into the system. This intersects badly with their use cases – it strikes me, and I said as much, that moderation is a task that requires a QWERTY keyboard, a decent-sized monitor, and a shirt-sleeve working environment. You can’t skim-read through piles of comments on a 3″ mobile phone screen in the rain, nor can you edit them on a greasy touchscreen, and you certainly can’t do either while looking out that you don’t get hit over the head by the cops.
Fortunately, there is no shortage of armchair revolutionaries on the web who could actually contribute something by reviewing batches of updates, and once you have reasonably large buckets of good stuff and crap you can use Bayesian filtering to automate part of the process.
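As a minimal sketch of what that Bayesian step could look like, here is a tiny naive Bayes classifier in plain Python. It is not anything Sukey is known to run: the class name `UpdateFilter`, the two bucket labels and the sample messages are all invented for illustration, and a real system would want far more training data and better tokenising.

```python
import math
from collections import Counter


class UpdateFilter:
    """Toy naive Bayes filter over two human-reviewed buckets."""

    LABELS = ("good", "crap")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = Counter()

    def train(self, text, label):
        """Record one reviewed message in the given bucket."""
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def log_score(self, text, label):
        """Log of (smoothed prior * smoothed word likelihoods) for a label."""
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        # The extra 1 reserves probability mass for words never seen before.
        vocab_size = len(vocabulary) + 1
        total_words = sum(self.word_counts[label].values())
        total_messages = sum(self.message_counts.values())
        score = math.log(
            (self.message_counts[label] + 1) / (total_messages + len(self.LABELS))
        )
        for word in text.lower().split():
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def classify(self, text):
        """Return whichever bucket gives the message the higher score."""
        return max(self.LABELS, key=lambda label: self.log_score(text, label))


# Hypothetical reviewed batches, standing in for the moderators' work.
update_filter = UpdateFilter()
update_filter.train("police kettle forming at oxford circus", "good")
update_filter.train("kettle at parliament square exits blocked", "good")
update_filter.train("free iphone click here", "crap")
update_filter.train("click here win free tickets", "crap")
```

The point of the design is that the humans only have to fill the two buckets; the filter then pre-sorts new updates so the reviewers can spend their time on the borderline ones.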
Francis Davey’s OneClickOrgs project is coming along nicely – it automates the process of creating an organisation with legal personality and a constitution and what not, and they’re looking at making it able to set up co-ops and other types of organisation.
I didn’t know that OpenStreetMap is available through multiple different tile servers, so you can make use of Mapquest’s CDN to serve out free mapping.
OpenCorporates is trying to make a database of all the world’s companies (they’re already getting on for four million), and the biggest problem they have is working out how to represent inter-company relationships, which have the annoying property that they are a directed graph but not a directed acyclic graph – it’s perfectly possible and indeed common for company X to own part of company Y which owns part of company X, perhaps through the intermediary of company Z.
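To illustrate why the cycles matter, here is a small Python sketch that looks for a circular shareholding with a depth-first search. The dictionary layout and the function name `find_ownership_cycle` are assumptions made for the example and say nothing about how OpenCorporates actually stores its data.

```python
def find_ownership_cycle(owns):
    """Return one ownership cycle as a list of companies, or None.

    `owns` maps each company to the companies it holds a stake in.
    The returned list starts and ends with the same company.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(company):
        state[company] = IN_PROGRESS
        path.append(company)
        for owned in owns.get(company, ()):
            owned_state = state.get(owned, UNSEEN)
            if owned_state == IN_PROGRESS:
                # We have walked back onto our own trail: that is a cycle.
                return path[path.index(owned):] + [owned]
            if owned_state == UNSEEN:
                cycle = visit(owned)
                if cycle:
                    return cycle
        path.pop()
        state[company] = DONE
        return None

    for company in list(owns):
        if state.get(company, UNSEEN) == UNSEEN:
            cycle = visit(company)
            if cycle:
                return cycle
    return None


# X owns part of Y, Y owns part of Z, and Z owns part of X.
circular = {"X": ["Y"], "Y": ["Z"], "Z": ["X"]}
# A plain parent/subsidiary tree with no loops.
tree = {"Parent": ["Sub A", "Sub B"], "Sub A": [], "Sub B": []}
```

Anything that assumes a tree (a naive recursive "who ultimately owns this?" walk, say) will loop forever on the first structure, which is why the representation needs to cope with cycles from the start.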
OpenTech’s precursor, Notcon, was heavier on the hardware/electronics side than OT usually is, but this year there were quite a few hardware projects. However, I missed the one that actually included a cat.
What else? LinkedGov is a bit like ScraperWiki but with civil servants and a grant from the Technology Strategy Board. Francis Maude is keen. Kumbaya is an encrypted, P2P online backup application which has the feature that you only have to store data from people you trust. (Oh yes, and apparently nobody did any of this stuff two years ago. Time to hit the big brown bullshit button.)
As always, the day after is a bit of an enthusiasm killer. I’ve spent part of today trying to implement monthly results for my lobby metrics project and it looks like it’s much harder than I was expecting. Basically, NetworkX is fundamentally node-oriented and the dates of meetings are edge properties, so you can’t just subgraph nodes with a given date. This may mean I’ll have to rethink the whole implementation. Bugger.
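One possible way round this, sketched here under assumptions rather than taken from the project's real code, is to filter on the edges instead and let NetworkX derive the nodes from them. The sketch assumes a `MultiGraph` (so repeat meetings between the same pair survive), a `date` attribute on each edge holding a `datetime.date`, and a helper name, `month_subgraph`, invented for the example.

```python
import datetime

import networkx as nx


def month_subgraph(graph, year, month):
    """Return a copy of the graph holding only that month's meetings.

    Nodes are included only if they sit on at least one matching edge.
    """
    wanted = [
        (u, v, key)
        for u, v, key, when in graph.edges(keys=True, data="date")
        if when is not None and when.year == year and when.month == month
    ]
    # edge_subgraph returns a read-only view; copy() makes it independent.
    return graph.edge_subgraph(wanted).copy()


# Hypothetical meetings data, purely for illustration.
meetings = nx.MultiGraph()
meetings.add_edge("Minister A", "Lobby X", date=datetime.date(2011, 5, 3))
meetings.add_edge("Minister A", "Lobby Y", date=datetime.date(2011, 5, 17))
meetings.add_edge("Minister B", "Lobby X", date=datetime.date(2011, 6, 2))

may_meetings = month_subgraph(meetings, 2011, 5)
```

Whatever per-node metrics the project computes could then be run on `may_meetings` as on the full graph, although whether that fits the existing implementation is another matter.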
I’m also increasingly tempted to scrape the competition’s meetings database into ScraperWiki as there doesn’t seem to be any way of getting at it without the HTML wrapping. Oddly, although they’ve got the Department of Health’s horrible PDFs scraped, they haven’t got the Scottish Office although it’s relatively easy, so it looks like this wouldn’t be a 100% solution. However, their data cleaning has been much more effective – not surprising as I haven’t really been trying. This has some consequences – I’ve only just noticed that I’ve hugely underestimated Oliver Letwin’s gatekeepership, which should be 1.89 rather than 1.05. Along with his network degree of 2.67 (the eighth highest) this suggests that he should be a highly desirable target for any lobbying you might want to do.