Saturday 30 May 2015

[Tutorial] Scrape proxies using Scrapebox


This is a method I used to scrape proxies back when I couldn't afford to buy proxies for Google harvesting.
Google has become a lot stricter about banning public proxies, but this method should still yield enough proxies to do some harvesting.

1. First of all, you need some proxies to start with. Head over to the BHW proxy list subforum and grab all the proxy lists that were added today and yesterday. Load them into the Scrapebox proxy manager and start testing them. I use 100 connections and a 30-second timeout for this test, but you may want to adjust these depending on your internet connection.

2. Once that is done, save all the proxies that passed the Google test and put them in the harvester as the keyword list. After that, create a txt file which contains
Code:
"%KW%"
and then use it as a custom footprint. Also set the harvester to scrape URLs which were created or updated in the last 24 hours. You can scrape without proxies this time because there won't be enough searches to get your IP banned.

3. Now take all of the harvested URLs and save them to a txt file. Then open the proxy harvester and press "Add source" → "By importing a list of urls". Now harvest proxies from all of the URLs.

4. Now you have a bunch of proxies and you need to test them. To make this quicker, tick "No google test" in the proxy manager configuration. I used 200 connections and a 30s timeout. You may want to use a lower connection count if your internet connection is slow. Since there are a lot of proxies in your list, checking will take quite a while. I checked 40k proxies and it took me a bit less than 1 hour.

5. Now keep only the anonymous proxies and save them to a file. If you don't want to use these proxies for search engine scraping, you are done. If you do, go into the proxy manager configuration, enable testing against Google, set the connection count to 50 or less and the timeout to 45s, then check the proxies again.

6. Now you have a list of proxies which you can use for scraping. Remember to keep the anonymous proxy list, though. After scraping for a while, some of the proxies will get banned. When that happens, take the anonymous proxy list and recheck it: some proxies which were previously banned may have been unbanned.
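
For anyone who wants to script parts of this outside Scrapebox, here is a rough Python sketch of what steps 3 and 4 boil down to: pulling ip:port strings out of harvested page text, removing duplicates, and testing the result with a fixed connection count and timeout. It is not how Scrapebox works internally. The function names, the test URL and the liveness check are all assumptions made for illustration, and the check only tells you that a proxy answers. It does not test anonymity or whether Google has banned it.

```python
import re
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Matches candidates such as 203.0.113.7:8080 anywhere in a page's text.
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})\b")


def extract_proxies(text):
    """Return unique, plausible ip:port strings in order of first appearance."""
    seen = set()
    result = []
    for ip, port in PROXY_RE.findall(text):
        octets_ok = all(0 <= int(part) <= 255 for part in ip.split("."))
        port_ok = 1 <= int(port) <= 65535
        proxy = f"{ip}:{port}"
        if octets_ok and port_ok and proxy not in seen:
            seen.add(proxy)
            result.append(proxy)
    return result


def http_check(proxy, timeout=30, test_url="http://example.com/"):
    """Hypothetical liveness test: can the proxy fetch test_url in time?"""
    handler = urllib.request.ProxyHandler({"http": f"http://{proxy}"})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return response.status == 200
    except Exception:
        # Dead, slow and misbehaving proxies all count as failed.
        return False


def check_proxies(proxies, check=http_check, connections=100):
    """Run check on every proxy, with at most `connections` running in parallel."""
    with ThreadPoolExecutor(max_workers=connections) as pool:
        results = list(pool.map(check, proxies))
    return [proxy for proxy, ok in zip(proxies, results) if ok]
```

A typical use would be `check_proxies(extract_proxies(page_text), connections=200)`. The `check` argument can be swapped for a stricter test of your own, which is also what you would do when rechecking the saved list later.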

BONUS
If the proxies from BlackHatWorld aren't enough, here is a list of proxy sources which get updated every day. You can use these lists in the Scrapebox proxy harvester.

Code:
http://checkerproxy.net/all_proxy
http://www.ultraproxies.com/anonymous.html?datatype=Medium
http://www.ultraproxies.com/high-anonymous.html?datatype=High
http://www.ultraproxies.com/https.html?datatype=HTTPS=1
http://www.ultraproxies.com/http.html?datatype=HTTP=1
http://proxyserverlist.blogspot.com/
http://www.scrapeboxproxies.net/
http://www.pr0xies.org/
http://new-fresh-proxies.blogspot.com/
http://ssl-proxy-server.blogspot.com/
http://proxies.my-proxy.com/proxy-list-s2.html
http://proxies.my-proxy.com/proxy-list-s1.html
http://www.freeproxy.ch/proxylight.txt
http://proxy-level.blogspot.com/
I hope that you find this useful. Let me know if you have any questions.
