From e0d2102810ab122ef6dba0f7be81edbd90b31d95 Mon Sep 17 00:00:00 2001
From: "Christopher K. Hoadley" <chris.hoadley@gmail.com>
Date: Tue, 24 Dec 2019 16:56:10 -0600
Subject: [PATCH] Limit Number Of Parallel Requests To 20 (Instead Of Number Of
 Sites)

The previous code created one worker thread per site.  The problem is that, as the number of sites has grown, there is no longer enough memory to allocate all of those requests at once.  In practice, running all of these requests in parallel does not noticeably speed up processing: on my computer, a query for all of the sites took 1 minute 10 seconds before the change and 1 minute 9 seconds after it.
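
A minimal sketch of the idea. It uses the standard library's `concurrent.futures.ThreadPoolExecutor` directly instead of Sherlock's `ElapsedFuturesSession`, which the patch configures through `max_workers`. `capped_worker_count`, `check_all` and `fake_check` are hypothetical names chosen for illustration, and `fake_check` sleeps instead of making a real request. The pool size is `min(number of sites, 20)`, so at most 20 checks run at once however many sites are listed. The extra `max(1, ...)` is an assumption added in this sketch, because `ThreadPoolExecutor` rejects `max_workers=0`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 20  # same cap as the patch


def capped_worker_count(n_sites, limit=MAX_WORKERS):
    """Return min(n_sites, limit), but never less than 1.

    Assumption of this sketch: ThreadPoolExecutor raises ValueError for
    max_workers <= 0, so an empty site list still gets one worker.
    """
    return max(1, min(n_sites, limit))


def check_all(sites, check_site):
    """Run check_site(site) for every site on a bounded thread pool.

    Returns (results, peak), where peak is the largest number of checks
    that were running at the same time.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(site):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return check_site(site)
        finally:
            with lock:
                running -= 1

    workers = capped_worker_count(len(sites))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # Every site is submitted up front; at most `workers` threads run
        # them, and the remaining tasks wait in the executor's queue.
        futures = {site: executor.submit(tracked, site) for site in sites}
        results = {site: f.result() for site, f in futures.items()}
    return results, peak


def fake_check(site):
    """Hypothetical stand-in for one HTTP request: sleep, then echo the site."""
    time.sleep(0.01)
    return site


sites = [f"site{i}" for i in range(100)]
results, peak = check_all(sites, fake_check)
print(len(results), "sites checked; peak concurrency:", peak)
```

All 100 sites are still submitted, but the executor only ever starts `workers` threads (each with its own stack), so the number of threads stays fixed as the site list grows.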

Limiting the number of workers to 10 increased the query time to 1 minute 17 seconds.  I am not sure whether that is just variation in network traffic, but I will leave the limit at 20 for now.
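
One plausible reason the worker count matters less than expected: total wall time can never drop below the slowest single request. The sketch below assumes a made-up workload (one site taking 0.3 s and 100 sites taking 0.01 s each) and uses `time.sleep` in place of network I/O. `timed_run` is a hypothetical helper, and the sketch is not meant to reproduce the timings above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(latencies, workers):
    """Sleep for each latency on a pool of `workers` threads; return wall time in seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # list() collects every result, so all tasks have finished here
        list(executor.map(time.sleep, latencies))
    return time.perf_counter() - start


# Hypothetical workload: one slow "site" (0.3 s) and 100 fast ones (0.01 s).
latencies = [0.3] + [0.01] * 100

timings = {w: timed_run(latencies, w) for w in (1, 10, 20, len(latencies))}
for w, t in timings.items():
    print(f"{w:>3} workers: {t:.2f} s")
```

With this workload, every run except the single-worker one should finish in roughly 0.3 s: once a handful of threads can clear the fast tasks while one thread waits on the slow task, adding more workers no longer shortens the run.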

Note that with the limit of 20, my query detected more sites than it did before.  It appears that some requests were failing on my computer because of memory limits, not because of anything the sites themselves returned.
---
 sherlock.py | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/sherlock.py b/sherlock.py
index cfd4f296..9e32eb07 100755
--- a/sherlock.py
+++ b/sherlock.py
@@ -180,9 +180,6 @@ def sherlock(username, site_data, verbose=False, tor=False, unique_tor=False,
     """
     print_info("Checking username", username, color)
 
-    # Allow 1 thread for each external service, so `len(site_data)` threads total
-    executor = ThreadPoolExecutor(max_workers=len(site_data))
-
     # Create session based on request methodology
     if tor or unique_tor:
         #Requests using Tor obfuscation
@@ -193,9 +190,16 @@ def sherlock(username, site_data, verbose=False, tor=False, unique_tor=False,
         underlying_session = requests.session()
         underlying_request = requests.Request()
 
-    # Create multi-threaded session for all requests. Use our custom FuturesSession that exposes response time
-    session = ElapsedFuturesSession(
-        executor=executor, session=underlying_session)
+    # Limit the number of workers to 20.
+    # Even 20 is probably more than needed.
+    if len(site_data) >= 20:
+        max_workers = 20
+    else:
+        max_workers = len(site_data)
+
+    # Multi-threaded session; ElapsedFuturesSession also records response time.
+    session = ElapsedFuturesSession(max_workers=max_workers,
+                                    session=underlying_session)
 
     # Results from analysis of all sites
     results_total = {}