Thursday, October 7, 2010

Extracting article summaries

I made my own version of this technique to extract article summaries.
Source code can be found here.

The idea is simple - extract the biggest text block - but it performs well.
Here are some test results:

http://www.nytimes.com/2010/03/23/technology/23google.html?_r=1
The decision to shut down google.cn will have a limited financial impact on Google, which is based in Mountain View, Calif. China accounted for a small fraction of Google’s $23.6 billion in global revenue last year. Ads that once appeared on google.

http://www.theregister.co.uk/2010/09/29/novell_suse_appliance_1_1/
Being able to spin up appliance images for EC2 and spit them out onto the Amazon cloud meshes with Novell's EC2-based SUSE Linux licensing, which was announced back in August. Novell is only selling priority-level (24x7) support contract for SUSE Linux li

http://blog.sitescraper.net/2010/08/best-website-for-freelancers.html
However with Elance there is a high barrier to entry: you have to pass a test, receive a phone call to confirm your identity, and pay money for each job you bid on. Often I see jobs on Elance with no bids because it requires obscure experience - people we
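
The biggest-text-block idea can be sketched in a few lines of Python. This is a rough illustration using only the standard library, not the source code linked above: the names BlockCollector and summarize, the tag lists, and the 250-character default cutoff are all my own assumptions for the example.

```python
from html.parser import HTMLParser

# Tags whose text is never article content (assumed list).
SKIP_TAGS = {"script", "style", "noscript"}
# Tags treated as block boundaries (assumed, not exhaustive).
BLOCK_TAGS = {"p", "div", "td", "li", "article", "section", "blockquote"}


class BlockCollector(HTMLParser):
    """Collects the text found between block-level tag boundaries."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []      # finished text blocks
        self._parts = []      # text fragments of the current block
        self._skip_depth = 0  # > 0 while inside a SKIP_TAGS element

    def _flush(self):
        # Join the fragments and collapse runs of whitespace.
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append(text)
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def close(self):
        super().close()
        self._flush()  # keep any trailing text after the last block tag


def summarize(html, max_chars=250):
    """Return the start of the longest text block in the page (hypothetical helper)."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return ""
    return max(parser.blocks, key=len)[:max_chars]
```

Calling summarize(page_html) on a downloaded page returns the opening characters of its longest block. Inline tags such as b or a do not split a block, so a paragraph with links still counts as one piece of text. A real page would probably need more care, for example ignoring long comment threads or footers.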
