Sunday, November 6, 2011

How to automatically find contact details

I often find that businesses hide their contact details behind layers of navigation. I guess they want to cut down their support costs.

This wastes my time, so I use this snippet to automatically extract whatever emails are available:


import sys
from webscraping import download


def get_emails(website, max_depth):
    """Returns a list of emails found at this website
    
    max_depth is how deep to follow links
    """
    D = download.Download()
    return D.get_emails(website, max_depth=max_depth)
    

if __name__ == '__main__':
    try:
        website = sys.argv[1]
        max_depth = int(sys.argv[2])
    except (IndexError, ValueError):
        # missing argument or a max depth that is not an integer
        print('Usage: %s <URL> <max depth>' % sys.argv[0])
    else:
        print(get_emails(website, max_depth))

Example use:
>>> get_emails('http://www.sitescraper.net', 1)
['richard@sitescraper.net']
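
For anyone curious what a crawl like this involves, here is a rough standalone sketch using only the Python 3 standard library. It is not how the webscraping library implements get_emails. The names extract_emails, crawl_emails and fetch are made up for illustration, the email pattern is deliberately simple, and I am assuming max_depth counts link hops from the start page.

```python
import re
from urllib.parse import urljoin, urlparse

# Simplified patterns for illustration only: the email regex is not RFC complete
# and will miss obfuscated addresses such as "name [at] domain".
EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')
LINK_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def extract_emails(html):
    """Return the unique lowercased emails in html, in order of first appearance."""
    emails = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        if email not in emails:
            emails.append(email)
    return emails


def crawl_emails(start_url, max_depth, fetch):
    """Breadth-first crawl of same-domain links, collecting emails.

    fetch is a hypothetical callable that takes a URL and returns its HTML
    as a string (or '' on failure). max_depth is assumed to be the number
    of link hops to follow from start_url.
    """
    domain = urlparse(start_url).netloc
    seen_urls = {start_url}
    queue = [(start_url, 0)]
    emails = []
    while queue:
        url, depth = queue.pop(0)
        html = fetch(url)
        for email in extract_emails(html):
            if email not in emails:
                emails.append(email)
        if depth < max_depth:
            for href in LINK_RE.findall(html):
                link = urljoin(url, href)
                # stay on the same domain and avoid revisiting pages
                if urlparse(link).netloc == domain and link not in seen_urls:
                    seen_urls.add(link)
                    queue.append((link, depth + 1))
    return emails
```

Passing fetch in as an argument keeps the sketch testable without touching the network. A real version could wrap urllib.request.urlopen and return an empty string when a download fails.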
