Limit Number Of Parallel Requests To 20 (Instead Of Number Of Sites)

The previous code allocated as many workers as there were sites. As the number of sites has grown, there has no longer been enough memory to allocate all of those requests. In practice, running every request in parallel does not speed up processing: on my computer, a query across all of the sites took 1 minute 10 seconds before the change and 1 minute 9 seconds after it. Limiting the number of workers to 10 increased the query time to 1 minute 17 seconds. I am not sure whether that is just variation in network traffic, but I will leave the limit at 20 for now. Note that with the limit of 20, my query detected more sites than it did previously. It appears that some requests were failing on my computer for memory reasons, rather than because of the actual detection result for the site.
5 years ago · e0d2102810
parent 4b6d2c1166
commit e0d2102810
1 changed file with 10 additions and 6 deletions
--- a/sherlock.py
+++ b/sherlock.py
@@ -180,9 +180,6 @@ def sherlock(username, site_data, verbose=False, tor=False, unique_tor=False,
     """
     print_info("Checking username", username, color)
-    # Allow 1 thread for each external service, so `len(site_data)` threads total
-    executor = ThreadPoolExecutor(max_workers=len(site_data))
     # Create session based on request methodology
     if tor or unique_tor:
         #Requests using Tor obfuscation
@@ -193,9 +190,16 @@ def sherlock(username, site_data, verbose=False, tor=False, unique_tor=False,
         underlying_session = requests.session()
         underlying_request = requests.Request()
-    # Create multi-threaded session for all requests. Use our custom FuturesSession that exposes response time
-    session = ElapsedFuturesSession(
-        executor=executor, session=underlying_session)
+    #Limit number of workers to 20.
+    #This is probably vastly overkill.
+    if len(site_data) >= 20:
+        max_workers=20
+    else:
+        max_workers=len(site_data)
+    #Create multi-threaded session for all requests.
+    session = ElapsedFuturesSession(max_workers=max_workers,
+                                    session=underlying_session)
     # Results from analysis of all sites
     results_total = {}
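
For illustration only, the same cap can be written with `min()` and tried out on a plain standard-library thread pool. This is a sketch, not Sherlock code: `capped_workers` and `check_all` are hypothetical names, the `check` callable stands in for a real HTTP request, and the pool is `concurrent.futures.ThreadPoolExecutor` rather than the `ElapsedFuturesSession` used in the diff.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 20  # the limit chosen in this commit


def capped_workers(site_count, limit=MAX_WORKERS):
    """Return one worker per site, but never more than `limit`.

    Hypothetical helper: equivalent to the if/else in the diff, plus a
    floor of 1 because ThreadPoolExecutor rejects max_workers=0.
    """
    return max(1, min(site_count, limit))


def check_all(sites, check):
    """Run `check(site)` for every site on a bounded thread pool.

    `check` is a placeholder for whatever performs the real request.
    """
    with ThreadPoolExecutor(max_workers=capped_workers(len(sites))) as pool:
        # Executor.map() yields results in the same order as `sites`.
        return dict(zip(sites, pool.map(check, sites)))


if __name__ == "__main__":
    sites = [f"site{i}" for i in range(50)]
    results = check_all(sites, len)  # `len` is a stand-in for a request
    print(capped_workers(len(sites)), len(results))
```

With more sites than workers, the extra tasks simply wait in the executor's queue, so every site is still checked while at most 20 requests are in flight at once.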