September 23, 2007 by yorksranter 4 comments

Pathetic Python Blogging

Dear Lazyweb – can anyone work out why I can’t get useful data out of this page with BeautifulSoup and Python 2.5?

The information is in an HTML table, enclosed by td tags nested in tr tags, and governed by three CSS classes, “flight-data”, “data-head” and “data-row2”. The latter pair are used only within the first. So you would think something like this would work:

for item in soup.findAll('td', {'class': 'flight-data'}): ...output.append(item)

The ellipsis is there to make the indentation obvious in this post, and soup is an instance of BeautifulSoup that’s been fed the webpage as a file-like object. But it doesn’t work; it does grab some of the data, but it also grabs much of the webpage as raw HTML, including the header and a gaggle of JavaScript. And it’s slow, dammit. I can’t be too far off beam, because I’m successfully parsing another very similar website using a near-identical parse command.

I’ve tried various interlocking restrictions, and searching for both data-head and data-row2, but these usually find nothing.

4 Comments on "Pathetic Python Blogging"

arvind1
September 23, 2007 7:13 pm

[[td.string for td in tr.findAll('td') if td.string] for tr in soup.findAll('tr', {'class': 'data-row2'})]

arvind1
September 23, 2007 7:54 pm

Oh yes, as for speed, first:

soup = soup.find('table', {'id': 'dgArrivals', 'class': 'flight-data'})

(only ‘id’ is enough, though)
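The two suggestions in this thread (narrow to the one table first, then read only the data-row2 rows inside it) can be sketched without any third-party packages. What follows is a rough Python 3 illustration using the standard library's html.parser as a stand-in for BeautifulSoup, not the code that was actually run here. The sample markup, the RowExtractor class and the extract_rows helper are all made up for the example; only the dgArrivals id and the class names come from the discussion above.

```python
from html.parser import HTMLParser

# Hypothetical sample markup. The real page's structure is an assumption
# based on the id and class names mentioned in this thread.
SAMPLE = """
<html><head><script>var x = 1;</script></head>
<body>
<table id="other"><tr class="data-row2"><td>ignore me</td></tr></table>
<table id="dgArrivals" class="flight-data">
  <tr class="data-head"><td>Flight</td><td>From</td></tr>
  <tr class="data-row2"><td>BA123</td><td>Leeds</td><td></td></tr>
  <tr class="data-row2"><td>LH456</td><td>Frankfurt</td></tr>
</table>
</body></html>
"""


class RowExtractor(HTMLParser):
    """Collect cell text from matching rows inside one table (no nested tables)."""

    def __init__(self, table_id, row_class):
        super().__init__()
        self.table_id = table_id
        self.row_class = row_class
        self.in_table = False  # True while inside the wanted table
        self.row = None        # cells of the row being read, or None
        self.cell = None       # text fragments of the cell being read, or None
        self.rows = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table" and attrs.get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            classes = (attrs.get("class") or "").split()
            self.row = [] if self.row_class in classes else None
            self.cell = None
        elif self.row is not None and tag == "td":
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self.cell is not None:
            text = "".join(self.cell).strip()
            if text:  # skip empty cells, like the `if td.string` filter
                self.row.append(text)
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
        elif tag == "table" and self.in_table:
            self.in_table = False


def extract_rows(html, table_id="dgArrivals", row_class="data-row2"):
    """Return a list of rows, each a list of non-empty cell strings."""
    parser = RowExtractor(table_id, row_class)
    parser.feed(html)
    parser.close()
    return parser.rows


print(extract_rows(SAMPLE))
```

Everything outside the chosen table is skipped, so the header, scripts and other tables never reach the output. That is the same effect as rebinding soup to the table before searching for rows.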

if you need more speed, you’d want to use lxml.

Alex
September 24, 2007 2:53 pm

Hey, I tried that; it eventually sporked the python interpreter, not before producing reams of unparsed html.

Alex
September 24, 2007 3:02 pm

OK; slight change; try again – works!!
