I often find businesses hide their contact details behind layers of navigation. I guess they want to cut down their support costs.
This wastes my time so I use this snippet to automate extracting the available emails:
Example use:
This wastes my time so I use this snippet to automate extracting the available emails:
import sys from webscraping import common, download def get_emails(website, max_depth): """Returns a list of emails found at this website max_depth is how deep to follow links """ D = download.Download() return D.get_emails(website, max_depth=max_depth) if __name__ == '__main__': try: website = sys.argv[1] max_depth = int(sys.argv[2]) except: print 'Usage: %s <URL> <max depth>' % sys.argv[0] else: print get_emails(website, max_depth)
Example use:
>>> get_emails('http://www.sitescraper.net', 1)
['richard@sitescraper.net']
No comments:
Post a Comment