In a perfectly normal Jamie Kenny comments thread, weird machines are seen, circling the skies of West Yorkshire. What’s up is that someone has been reading Richard Aldrich’s book on GCHQ (my five-part unread series of posts starts here and refers here).
Basically, the intelligence services maintain various capabilities to acquire electronic intelligence. As well as ground-based and maritime systems, these include the (temporarily reprieved) Nimrod R1s, the Shadow R1 based on the Beechcraft King Air, and a group of three Islander planes which seem to be based in the UK permanently. Aldrich describes these as being used to hoover up mobile phone traffic, and claims that voiceprint data collected in Afghanistan from Taliban radio intercepts is compared to the take in an effort to identify returnees.
However, he also suggests that the interception is of backhaul, rather than access, traffic. This is unlikely to yield much in the UK, because backhaul here runs over wires and fibre rather than over microwave links an aircraft could listen in on: typical cell sites were originally set up with anything from a pair to a dozen E-1 (2Mbps) leased lines, depending on planned capacity. For many years, Vodafone was BT’s single biggest customer. More recently, a lot of these have been replaced with fibre-optic cable, usually Gigabit Ethernet, quite often owned by the mobile operator. O2 got some microwave assets in the demerger from BT, so they may use more microwave than the others. But in general, 3G operators have been pulling fibre since 2005 or thereabouts.
I would therefore tend to guess that it’s the access side. There are good reasons to do it this way – notably, requesting surveillance of someone’s phone via the Regulation of Investigatory Powers Act, or via the alternative Dodgy Ex-Copper Down the Pub route, usually requires that you know who you’re looking for quite specifically. That is to say, you need to know an identity that is likely to be in a given phone company’s database. Also, in some use-cases you might want imperfect but live coverage rather than a giant pile of data weeks later.
Listening in to radio doesn’t work like that, and could be done more secretly as well. I’m not particularly convinced by the idea of trying to match “voiceprints”, though – it sounds a bit Nemesysco. In this case, the sampled voice would first have gone through whatever radio system the Taliban were using, which will have filtered out or simply lost some of the information and added noise and artefacts of its own; the target’s voice, meanwhile, would have been filtered by the voice codec used on their phone, which throws away quite a bit, as well as by the network’s acoustic echo cancellation if the call is inbound. Also, they might be speaking a different language, which may or may not make a difference but certainly won’t help.
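To make that concrete, here is a rough sketch (mine, not anything from Aldrich) of what a narrowband radio or telephone channel does to a voice sample before any comparison can even start: band-limiting to roughly 300–3400 Hz, resampling to 8 kHz, and added noise. It uses numpy/scipy, assumes a mono recording, and the file name is hypothetical.

```python
# Illustrative sketch only: simulate the information loss of a narrowband
# voice channel (telephone-style band-pass, 8 kHz sample rate, channel noise).
import numpy as np
from scipy import signal
from scipy.io import wavfile

def narrowband_degrade(x, fs, lo=300.0, hi=3400.0, target_fs=8000, snr_db=15.0):
    """Band-limit to telephone bandwidth, resample to 8 kHz, add noise."""
    # Telephone-style band-pass filter (roughly 300-3400 Hz)
    sos = signal.butter(6, [lo, hi], btype="bandpass", fs=fs, output="sos")
    x = signal.sosfiltfilt(sos, x)
    # Resample to the 8 kHz rate typical of narrowband voice codecs
    x = signal.resample_poly(x, target_fs, fs)
    # Add white noise at a chosen SNR to stand in for channel artefacts
    noise = np.random.randn(len(x))
    noise *= np.sqrt(np.mean(x**2) / (np.mean(noise**2) * 10**(snr_db / 10)))
    return x + noise, target_fs

# Hypothetical mono input file
fs, clean = wavfile.read("speaker_sample.wav")
degraded, fs8k = narrowband_degrade(clean.astype(float), fs)
```

Anything sitting above 3.4 kHz, or encoded in fine spectral detail, simply isn’t there any more by the time anyone tries to match it.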
Perhaps they have some magic, or perhaps this is a cover story. This happens to be the most difficult case of a speaker identification system – it’s identification rather than verification (so the number of possible alternatives scales with the size of the population), it’s an open set process (no bounds on who could be in either group), and it’s wholly text independent in both samples (no way of knowing what they are going to say, and no reason to think they will say it twice). There are methodologies based on high-level statistical analysis, but these require long-term sampling of a speaker to train the algorithm, which gives you a chicken-and-egg problem – you need to know that you’re listening to the same speaker before you can train the identification system. Of course, other sources of information could be used to achieve that, but this makes it progressively harder to operationalise.
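For what it’s worth, the textbook way of doing text-independent speaker identification is to extract MFCC features and fit a Gaussian mixture model per enrolled speaker, then score an unknown recording against each model. The sketch below – using librosa and scikit-learn, with hypothetical file names, and emphatically not a claim about what GCHQ actually runs – shows where the chicken-and-egg problem bites: you need labelled speech from each candidate speaker before you can identify anyone.

```python
# Minimal sketch of MFCC + per-speaker GMM speaker identification.
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path, sr=8000):
    """Load audio at a narrowband rate and return per-frame MFCC vectors."""
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # shape (frames, 13)

# Enrolment: each known speaker needs enough labelled speech to train a model –
# this is the step the post describes as the sticking point.
enrolment = {
    "speaker_a": "speaker_a_enrolment.wav",   # hypothetical files
    "speaker_b": "speaker_b_enrolment.wav",
}
models = {
    name: GaussianMixture(n_components=16, covariance_type="diag").fit(mfcc_features(path))
    for name, path in enrolment.items()
}

# Identification: score an unknown intercept against every enrolled model.
# In a true open-set problem the speaker may not be enrolled at all, so a
# score threshold (not just the best-scoring model) would also be needed.
test = mfcc_features("unknown_intercept.wav")
scores = {name: gmm.score(test) for name, gmm in models.items()}
print(max(scores, key=scores.get), scores)
```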
Anyway, doing some background reading, it turns out that a) speech perception is a really interesting topic and b) the problem isn’t so much the quality of the intercept (because speech information is very robust to even deliberate interference) as just the concept of voiceprint identification in general. Out of Google-inspired serendipity, it turns out Language Log has covered this.
In lab conditions with realistic set-ups (i.e. different microphones and so on, but not tactical conditions and not primarily with multiple languages), it looks like you could expect an equal-error rate – that is to say, the point where the false-negative and false-positive rates are equal – of between 3% and 10%. However, the confidence intervals are sizeable (around 10 percentage points, on a scale running from 0 to 40%, for the best-performing cross-channel case). Obviously, a 3% false positive rate in an environment where there are very few terrorists is not that useful.
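A back-of-envelope Bayes calculation shows why. Suppose – purely for illustration – that one caller in ten thousand is actually of interest, and take the optimistic 3% equal-error rate at face value (97% true positive rate, 3% false positive rate):

```python
# Base-rate arithmetic: P(target | system says match) by Bayes' rule.
# The prevalence figure is illustrative, not from any source.
def posterior(p_prior, true_positive_rate, false_positive_rate):
    hit = true_positive_rate * p_prior
    false_alarm = false_positive_rate * (1.0 - p_prior)
    return hit / (hit + false_alarm)

print(posterior(1e-4, 0.97, 0.03))   # ~0.0032
```

In other words, on those assumptions well over 99% of the “matches” would be innocent people, and real-world tactical conditions would only make that worse.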