Thursday, April 15, 2010

Why Google App Engine

In the previous post I covered three alternative approaches to regularly scrape a website for a client, with the most common one being in the form of a web application. However hosting the web application on either my own or the clients server has problems.

My solution is to host the application on a neutral third party platform - Google App Engine (GAE). Here is my overview of deploying on GAE:

Pros:
  • provides a stable and consistent platform that I can use for multiple applications
  • both the customer and I can login and manage it, so we do not need to expose our servers
  • has generous free quotas, which I rarely exhaust
Cons:
  • only supports pure Python (or Java), so libraries that rely on C such as lxml are not supported (yet)
  • limitations on maximum job time and interacting with the database
  • have to trust Google with storing our scraped data
Often deploying on GAE works well for both the client and me, but it is not always practical/possible. I'm still looking for a silver bullet!

No comments:

Post a Comment