Change "response_url" detection strategy completely.

Previously, there was a problem with sites that redirect an attempt to view a non-existent username to the main site. For example, if you try to go to https://devrant.com/users/dfoxxxxxxxxx (a username that does not exist), you are redirected to https://devrant.com/, the root of the site. But the "response_url" checking algorithm only looked for the configured error URL being included in the response URL. So, these sites always indicated that the username was not found.
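
To make the failure mode concrete, here is a minimal sketch of the old check. The function name, the error URL value, and the "someuser" profile are assumptions chosen for illustration; they are not taken from the project's configuration.

```python
def old_response_url_check(final_url: str, error_url: str) -> bool:
    """Pre-change logic: report "found" unless the error URL is in the final URL."""
    return error_url not in final_url


# Assumed configuration value, for illustration only.
error_url = "https://devrant.com/"

# Missing user: the site redirects to its root, so the final URL is the root.
missing = old_response_url_check("https://devrant.com/", error_url)

# Hypothetical existing user: no redirect happens, but the profile URL
# still contains the root URL as a substring, so the check fails as well.
existing = old_response_url_check("https://devrant.com/users/someuser", error_url)

print(missing, existing)
```

Under that assumption both calls report "not found", which matches the behaviour described above.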

Update the "response_url" detection method so that the request does not allow redirects. If we get a 2xx response, then the username has been found. However, if we get something like a 302, then we know that the username was not found, since we are being redirected.
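
As a rough sketch of the new approach, the snippet below separates the status test from the network call. The helper names `is_success` and `username_exists` are hypothetical, and the real code goes through a session object rather than calling `requests.get` directly.

```python
def is_success(status_code: int) -> bool:
    """Return True for any 2xx status code."""
    return 200 <= status_code < 300


def username_exists(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical helper: request the profile URL without following redirects."""
    # Third-party dependency, imported here so is_success works without it.
    import requests

    # With allow_redirects=False a 301/302 is returned as-is instead of
    # being followed, so the status code reflects the original request.
    response = requests.get(url, allow_redirects=False, timeout=timeout)
    return is_success(response.status_code)
```

A call such as `username_exists("https://devrant.com/users/dfoxxxxxxxxx")` would then return False whenever the site answers with a redirect or an error status.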

This whole method seems fragile, but I did exhaustively test all of the supported sites, and they all work.  So, this change is clearly an improvement.
pull/135/head
Christopher K. Hoadley 6 years ago
parent bb66d6a992
commit 65b38592c4

@@ -208,13 +208,27 @@ def sherlock(username, site_data, verbose=False, tor=False, unique_tor=False, pr
 if net_info["errorType"] == 'status_code':
     request_method = session.head
 
+if net_info["errorType"] == "response_url":
+    #Site forwards request to a different URL if username not
+    #found. Disallow the redirect so we can capture the
+    #http status from the original URL request.
+    allow_redirects = False
+else:
+    #Allow whatever redirect that the site wants to do.
+    #The final result of the request will be what is available.
+    allow_redirects = True
+
 # This future starts running the request in a new thread, doesn't block the main thread
 if proxy != None:
     proxies = {"http": proxy, "https": proxy}
-    future = request_method(
-        url=url, headers=headers, proxies=proxies)
+    future = request_method(url=url, headers=headers,
+                            proxies=proxies,
+                            allow_redirects=allow_redirects
+                            )
 else:
-    future = request_method(url=url, headers=headers)
+    future = request_method(url=url, headers=headers,
+                            allow_redirects=allow_redirects
+                            )
 
 # Store future in data for access later
 net_info["request_future"] = future
@@ -290,9 +304,13 @@ def sherlock(username, site_data, verbose=False, tor=False, unique_tor=False, pr
         exists = "no"
 
 elif error_type == "response_url":
-    error = net_info.get("errorUrl")
-    # Checks if the redirect url is the same as the one defined in data.json
-    if not error in r.url:
+    # For this detection method, we have turned off the redirect.
+    # So, there is no need to check the response URL: it will always
+    # match the request. Instead, we will ensure that the response
+    # code indicates that the request was successful (i.e. no 404, or
+    # forward to some odd redirect).
+    if (r.status_code >= 200) and (r.status_code < 300):
+        #
         print_found(social_network, url, response_time, verbose)
         write_to_file(url, f)
         exists = "yes"
