While doing the restructuring, I am testing in more depth as I change the code, and I am trying to grok how the proxy options work; specifically, how the proxy list works (or does not work).

There is code in the main function that randomly selects proxies from a list, but it never uses the result. This was noticed in #292. The only place where the proxy list appears to be used is when a proxy error occurs during get_response(): in that case a new random proxy is chosen. However, nothing ensures that we do not get the same proxy that just errored out. It seems like problematic proxies should be blacklisted after that type of failure.

Moreover, a check earlier in the code does not allow the proxy list and the proxy command line option to be used simultaneously. So I can see no way that the proxy list has any functionality: if you do define the proxy list, there is no way to kick off the general request with a proxy.

I also noticed that the recursive get_response() call does not pass its return tuple back up the call chain, so the existing code would never benefit from the switchover to an alternate proxy, even if the other problems mentioned above were resolved.

For now, I am removing the support. This feature may be looked at after the restructuring is done.

pull/350/head
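The commit message names two gaps in the removed failover logic: a failed proxy could be picked again, and the result of the retry was never returned to the caller. A minimal sketch of one way to close both gaps is below. It is not the project's code: `fetch_with_failover`, `NoProxiesLeft`, and the `fetch` callable are hypothetical names, and the built-in `ConnectionError` stands in for whatever proxy error the real request layer raises.

```python
import random


class NoProxiesLeft(Exception):
    """Raised when every proxy in the list has failed (hypothetical name)."""


def fetch_with_failover(fetch, proxies, rng=random):
    """Try proxies in random order, never retrying one that has failed.

    `fetch` is a hypothetical callable that takes one proxy and returns a
    response, raising the built-in ConnectionError on a proxy failure.
    Real code using `requests` would catch its proxy exception instead.
    """
    failed = set()  # proxies blacklisted for the rest of this call
    while True:
        candidates = [p for p in proxies if p not in failed]
        if not candidates:
            raise NoProxiesLeft("every proxy in the list failed")
        proxy = rng.choice(candidates)
        try:
            # A loop instead of recursion: the result goes straight
            # back to the caller, so it cannot be lost on a retry.
            return proxy, fetch(proxy)
        except ConnectionError:
            failed.add(proxy)
```

Using a loop rather than a recursive call is a design choice made here so that there is only one return path; a recursive version would work equally well as long as each level returns the value of the nested call.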
parent
9e8e1a5aa4
commit
6114ca263d
@@ -1,89 +0,0 @@
import csv
import requests
import time
from collections import namedtuple
from colorama import Fore, Style


def load_proxies_from_csv(path_to_list):
    """
    A function which loads proxies from a .csv file, to a list.

    Inputs: path to .csv file which contains proxies, described by fields: 'ip', 'port', 'protocol'.

    Outputs: list containing proxies stored in named tuples.
    """
    Proxy = namedtuple('Proxy', ['ip', 'port', 'protocol'])

    with open(path_to_list, 'r') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        proxies = [Proxy(line['ip'],line['port'],line['protocol']) for line in csv_reader]

    return proxies


def check_proxy(proxy_ip, proxy_port, protocol):
    """
    A function which test the proxy by attempting
    to make a request to the designated website.

    We use 'wikipedia.org' as a test, since we can test the proxy anonymity
    by check if the returning 'X-Client-IP' header matches the proxy ip.
    """
    full_proxy = f'{protocol}://{proxy_ip}:{proxy_port}'
    proxies = {'http': full_proxy, 'https': full_proxy}
    try:
        r = requests.get('https://www.wikipedia.org',proxies=proxies, timeout=4)
        return_proxy = r.headers['X-Client-IP']
        if proxy_ip==return_proxy:
            return True
        else:
            return False
    except Exception:
        return False


def check_proxy_list(proxy_list, max_proxies=None):
    """
    A function which takes in one mandatory argument -> a proxy list in
    the format returned by the function 'load_proxies_from_csv'.

    It also takes an optional argument 'max_proxies', if the user wishes to
    cap the number of validated proxies.

    Each proxy is tested by the check_proxy function. Since each test is done on
    'wikipedia.org', in order to be considerate to Wikipedia servers, we are not using any async modules,
    but are sending successive requests each separated by at least 1 sec.

    Outputs: list containing proxies stored in named tuples.
    """
    print((Style.BRIGHT + Fore.GREEN + "[" +
           Fore.YELLOW + "*" +
           Fore.GREEN + "] Started checking proxies."))
    working_proxies = []

    # If the user has limited the number of proxies we need,
    # the function will stop when the working_proxies
    # loads the max number of requested proxies.
    if max_proxies != None:
        for proxy in proxy_list:
            if len(working_proxies) < max_proxies:
                time.sleep(1)
                if check_proxy(proxy.ip,proxy.port,proxy.protocol) == True:
                    working_proxies.append(proxy)
            else:
                break
    else:
        for proxy in proxy_list:
            time.sleep(1)
            if check_proxy(proxy.ip,proxy.port,proxy.protocol) == True:
                working_proxies.append(proxy)

    if len(working_proxies) > 0:
        print((Style.BRIGHT + Fore.GREEN + "[" +
               Fore.YELLOW + "*" +
               Fore.GREEN + "] Finished checking proxies."))
        return working_proxies

    else:
        raise Exception("Found no working proxies.")