permanent memory with pinecone ready for alpha

Kaveen Kumarasinghe 1 year ago
parent a8b209c212
commit 720fad0ee7

@@ -21,9 +21,9 @@
A big shoutout to `CrypticHeaven-Lab` for hitting our first sponsorship goal!
**PERMANENT MEMORY FOR CONVERSATIONS WORK IS STILL UNDERWAY, APOLOGIES FOR THE DELAY, COMING SOON!**
# Recent Notable Updates
- **Permanent memory with embeddings and PineconeDB finished!** - An initial alpha version of permanent memory is now done! It lets you hold indefinitely long, accurate conversations with GPT3 while saving tokens, by using embeddings. *Please read the Permanent Memory section for more information!*
- **Multi-user, group chats with GPT3** - Multiple users can converse with GPT3 in a chat now, and it will know that there are multiple distinct users chatting with it!
@@ -36,13 +36,10 @@ A big shoutout to `CrypticHeaven-Lab` for hitting our first sponsorship goal!
- Custom conversation openers from https://github.com/f/awesome-chatgpt-prompts were integrated into the bot, check out `/gpt converse opener_file`! The bot now has built in support to make GPT3 behave like various personalities, such as a life coach, python interpreter, interviewer, text based adventure game, and much more!
- Autocomplete for settings and various commands to make it easier to use the bot!
# Features
- **Directly prompt GPT3 with `/gpt ask <prompt>`**
- **Have conversations with the bot, just like chatgpt, with `/gpt converse`** - Conversations happen in threads that get automatically cleaned up!
- **Have long term, permanent conversations with the bot, just like chatgpt, with `/gpt converse`** - Conversations happen in threads that get automatically cleaned up!
- **DALL-E Image Generation** - Generate DALL-E AI images right in discord with `/dalle draw <prompt>`! It even supports multiple image qualities, multiple images, creating image variants, retrying, and saving images.
@@ -122,6 +119,32 @@ These commands are grouped, so each group has a prefix but you can easily tab co
- This uses the OpenAI Moderations endpoint to check messages. Requests are sent to the moderations endpoint with a MINIMUM gap of 0.5 seconds between them, to ensure you don't get blocked and to ensure reliability.
- The bot uses numerical thresholds to determine whether a message is toxic, and I have manually tested and fine-tuned these thresholds to a point that I think is good; please open an issue if you have any suggestions for the thresholds! (A rough sketch of this kind of threshold check is shown below.)
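For illustration, here is a minimal sketch of what a threshold check against the Moderations response could look like. The category names and threshold values below are placeholders for illustration only, not the bot's actual tuned values:
```python
# Hypothetical thresholds -- the bot's real, hand-tuned values live in its moderation code.
TOXICITY_THRESHOLDS = {"hate": 0.005, "hate/threatening": 0.005, "violence": 0.08}

def is_toxic(moderation_response: dict) -> bool:
    # The Moderations endpoint returns a score between 0 and 1 for each category;
    # flag the message if any score crosses its threshold.
    scores = moderation_response["results"][0]["category_scores"]
    return any(scores.get(category, 0.0) >= limit for category, limit in TOXICITY_THRESHOLDS.items())
```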
# Permanent Memory
Permanent memory has now been implemented in the bot, using the OpenAI Ada embeddings endpoint and Pinecone DB.
PineconeDB is a vector database, and the OpenAI Ada embeddings endpoint turns pieces of text into embeddings (vectors). The feature works by embedding both the user prompts and the GPT3 responses, storing them in a pinecone index, and then retrieving the most relevant pieces of past conversation whenever a new prompt is sent in a conversation.
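As a rough illustration of that flow, here is a minimal sketch of the embed → store → retrieve loop. This is not the bot's actual service code: the index name matches the one created below, but the environment value, metadata keys, and helper functions are assumptions for illustration only:
```python
import openai
import pinecone

# Assumed setup -- the bot reads these values from its .env file.
pinecone.init(api_key="<PINECONE_TOKEN>", environment="<your pinecone environment>")
index = pinecone.Index("conversation-embeddings")

def embed(text: str) -> list[float]:
    # The Ada embeddings endpoint returns one 1536-dimensional vector per input text.
    response = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return response["data"][0]["embedding"]

def remember(thread_id: int, text: str, timestamp: int) -> None:
    # Store a line of conversation, tagging it with the thread id in the metadata.
    index.upsert([(text, embed(text), {"conversation_id": thread_id, "timestamp": timestamp})])

def recall(thread_id: int, prompt: str, n: int = 5) -> list[str]:
    # Pull the n most relevant past lines for this thread to prepend to the new prompt.
    result = index.query(
        vector=embed(prompt),
        top_k=n,
        filter={"conversation_id": thread_id},
        include_metadata=True,
    )
    return [match["id"] for match in result["matches"]]
```
Because only the retrieved lines (rather than the full history) are sent with each prompt, conversations can run indefinitely without the prompt growing past the token limit.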
**You do NOT need to use pinecone: if you do not define a `PINECONE_TOKEN` in your `.env` file, the bot will default to not using pinecone, and will use conversation summarization as the long-term conversation method instead.**
To enable permanent memory with pinecone, you must define a `PINECONE_TOKEN` in your `.env` file as follows (along with the other variables too):
```env
PINECONE_TOKEN="87juwi58-1jk9-9182-9b3c-f84d90e8bshq"
```
To get a pinecone token, sign up for a free pinecone account here: https://app.pinecone.io/ and click the "API Keys" section in the left navbar to find your key. (I am not affiliated with pinecone.)
After signing up, you need to create an index in pinecone. To do this, go to the pinecone dashboard and click "Create Index" in the top right.
<img src="https://i.imgur.com/L9LXVE0.png"/>
Then, name the index `conversation-embeddings`, set the dimensions to `1536`, and set the metric to `DotProduct`:
<img src="https://i.imgur.com/zoeLsrw.png"/>
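If you'd rather script this step than use the dashboard, the same index can be created with the pinecone Python client. This is just a sketch; the environment value is a placeholder for whatever your pinecone project shows in its console:
```python
import pinecone

pinecone.init(api_key="<your PINECONE_TOKEN>", environment="<your pinecone environment>")

# Same settings as the screenshot above: Ada embeddings are 1536-dimensional,
# and retrieval uses the dot product metric.
pinecone.create_index("conversation-embeddings", dimension=1536, metric="dotproduct")
```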
One important thing to keep in mind: pinecone indexes are currently not cleared automatically by the bot, so if things start getting too slow you will eventually need to clear the index manually through the pinecone website (although it should be a very long time until this happens). Entries in the index are keyed on the `metadata` field using the id of the conversation thread.
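Until auto-clearing is added, one hypothetical way to clear out a single finished conversation is to delete by metadata filter, assuming (as described above) that each stored vector carries the conversation thread id in its metadata. Note that metadata-filtered deletes may not be supported on every pinecone plan:
```python
import pinecone

pinecone.init(api_key="<your PINECONE_TOKEN>", environment="<your pinecone environment>")
index = pinecone.Index("conversation-embeddings")

# Remove every vector tagged with this (hypothetical) conversation thread id.
index.delete(filter={"conversation_id": 1234567890123456789})
```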
Permanent memory using pinecone is still in alpha. I will be working on cleaning up this work, adding auto-clearing, and optimizing for stability and reliability; any help and feedback is appreciated (**add me on Discord Kaveen#0001 for pinecone help**)! If at any time you're having too many issues with pinecone, simply remove the `PINECONE_TOKEN` line from your `.env` file and the bot will revert to using conversation summarization.
# Configuration

@@ -571,7 +571,6 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
                    )
                    self.conversation_threads[after.channel.id].count += 1

                print("Doing the encapsulated send")
                await self.encapsulated_send(
                    id=after.channel.id,
                    prompt=edited_content,
@@ -615,7 +614,7 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
        # GPT3 command
        if conversing:
            # Extract all the text after the !g and use it as the prompt.
            prompt = content # dead store but its okay :3
            prompt = content

            await self.check_conversation_limit(message)
@@ -642,6 +641,7 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
                await self.deletion_queue.put(deletion_message)
                return

            if message.channel.id in self.awaiting_thread_responses:
                message = await message.reply(
                    "This thread is already waiting for a response from GPT3. Please wait for it to respond before sending another message."
@@ -659,106 +659,41 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
                return

            print("BEFORE PINECONE SERVICE CHECK")
            if self.pinecone_service:
                # The conversation_id is the id of the thread
                conversation_id = message.channel.id
                print("Conversation id is", conversation_id)

                # Create an embedding and timestamp for the prompt
                prompt = prompt.encode("ascii", "ignore").decode()
                prompt_less_author = f"{prompt} <|endofstatement|>\n"
                prompt_with_gpt_instead = f"GPTie: {prompt} <|endofstatement|>\n"
                prompt = f"\n'{message.author.display_name}': {prompt} <|endofstatement|>\n"
                # print("Creating embedding for ", prompt)

                # Print the current timestamp
                timestamp = int(str(datetime.datetime.now().timestamp()).replace(".", ""))
                print("Timestamp is ", timestamp)

                starter_conversation_item = EmbeddedConversationItem(str(self.conversation_threads[message.channel.id].history[0]), 0)
                self.conversation_threads[message.channel.id].history[0] = starter_conversation_item

                new_prompt_item = EmbeddedConversationItem(prompt, timestamp)
                self.conversation_threads[conversation_id].history.append(new_prompt_item)

                # Create and upsert the embedding for the conversation id, prompt, timestamp
                embedding = await self.pinecone_service.upsert_conversation_embedding(self.model, conversation_id, prompt, timestamp)

                embedding_prompt_less_author = await self.model.send_embedding_request(prompt_less_author)

                # Now, build the new prompt by getting the 10 most similar with pinecone
                similar_prompts = self.pinecone_service.get_n_similar(conversation_id, embedding_prompt_less_author, n=5)

                # When we are in embeddings mode, only the pre-text is contained in self.conversation_threads[message.channel.id].history, so we
                # can use that as a base to build our new prompt
                prompt_with_history = []
                prompt_with_history.append(self.conversation_threads[message.channel.id].history[0])

                # Append the similar prompts to the prompt with history
                prompt_with_history += [EmbeddedConversationItem(prompt, timestamp) for prompt, timestamp in similar_prompts]

                # iterate UP TO the last 5 prompts in the history
                for i in range(1, min(len(self.conversation_threads[message.channel.id].history), 3)):
                    prompt_with_history.append(self.conversation_threads[message.channel.id].history[-i])

                # remove duplicates from prompt_with_history
                prompt_with_history = list(dict.fromkeys(prompt_with_history))

                # Sort the prompt_with_history by increasing timestamp
                prompt_with_history.sort(key=lambda x: x.timestamp)

                # Ensure that the last prompt in this list is the prompt we just sent (new_prompt_item)
                if prompt_with_history[-1] != new_prompt_item:
                    try:
                        prompt_with_history.remove(new_prompt_item)
                    except ValueError:
                        pass
                    prompt_with_history.append(new_prompt_item)

                prompt_with_history = "".join([item.text for item in prompt_with_history])

                print("The prompt with history is", prompt_with_history)

                self.awaiting_responses.append(message.author.id)
                self.awaiting_thread_responses.append(message.channel.id)
                self.conversation_threads[message.channel.id].count += 1

                original_message[message.author.id] = message.id

                await self.encapsulated_send(
                    message.channel.id,
                    prompt_with_history,
                    message,
                )

                return

            self.awaiting_responses.append(message.author.id)
            self.awaiting_thread_responses.append(message.channel.id)

            original_message[message.author.id] = message.id

            self.conversation_threads[message.channel.id].history.append(
                f"\n'{message.author.display_name}': {prompt} <|endofstatement|>\n"
            )
            if not self.pinecone_service:
                self.conversation_threads[message.channel.id].history.append(
                    f"\n'{message.author.display_name}': {prompt} <|endofstatement|>\n"
                )

            # increment the conversation counter for the user
            self.conversation_threads[message.channel.id].count += 1

            # Send the request to the model
            # If conversing, the prompt to send is the history, otherwise, it's just the prompt
            if self.pinecone_service or message.channel.id not in self.conversation_threads:
                primary_prompt = prompt
            else:
                primary_prompt = "".join(
                    self.conversation_threads[message.channel.id].history
                )

            await self.encapsulated_send(
                message.channel.id,
                prompt
                if message.channel.id not in self.conversation_threads
                else "".join(self.conversation_threads[message.channel.id].history),
                primary_prompt,
                message,
            )
    def cleanse_response(self, response_text):
        # Strip the "GPTie" speaker tag and the end-of-statement token from the model output.
        # The variant with a trailing space is replaced before the bare "GPTie:" so that no
        # stray leading space is left behind.
        response_text = response_text.replace("GPTie:\n", "")
        response_text = response_text.replace("GPTie: ", "")
        response_text = response_text.replace("GPTie:", "")
        response_text = response_text.replace("<|endofstatement|>", "")
        return response_text
    # ctx can be of type AppContext(interaction) or Message
    async def encapsulated_send(
        self,
@@ -776,12 +711,80 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
        from_context = isinstance(ctx, discord.ApplicationContext)
        tokens = self.usage_service.count_tokens(new_prompt)

        try:
            tokens = self.usage_service.count_tokens(new_prompt)

            # This is the EMBEDDINGS CASE
            if self.pinecone_service and not from_g_command:
                # The conversation_id is the id of the thread
                conversation_id = ctx.channel.id

            # This is the NO-EMBEDDINGS-SUMMARIZE CASE
            if (
                # Create an embedding and timestamp for the prompt
                new_prompt = prompt.encode("ascii", "ignore").decode()
                prompt_less_author = f"{new_prompt} <|endofstatement|>\n"

                user_displayname = ctx.user.name if isinstance(ctx, discord.ApplicationContext) else ctx.author.display_name

                new_prompt = f"\n'{user_displayname}': {new_prompt} <|endofstatement|>\n"

                # print("Creating embedding for ", prompt)
                # Print the current timestamp
                timestamp = int(str(datetime.datetime.now().timestamp()).replace(".", ""))

                starter_conversation_item = EmbeddedConversationItem(
                    str(self.conversation_threads[ctx.channel.id].history[0]), 0)
                self.conversation_threads[ctx.channel.id].history[0] = starter_conversation_item

                new_prompt_item = EmbeddedConversationItem(new_prompt, timestamp)
                self.conversation_threads[conversation_id].history.append(new_prompt_item)

                # Create and upsert the embedding for the conversation id, prompt, timestamp
                embedding = await self.pinecone_service.upsert_conversation_embedding(self.model, conversation_id,
                                                                                      new_prompt, timestamp)

                embedding_prompt_less_author = await self.model.send_embedding_request(prompt_less_author)  # Use the version of
                # the prompt without the author's name for better clarity on retrieval.

                # Now, build the new prompt by getting the X most similar with pinecone
                similar_prompts = self.pinecone_service.get_n_similar(conversation_id, embedding_prompt_less_author,
                                                                      n=self.model.num_conversation_lookback)

                # When we are in embeddings mode, only the pre-text is contained in self.conversation_threads[message.channel.id].history, so we
                # can use that as a base to build our new prompt
                prompt_with_history = [self.conversation_threads[ctx.channel.id].history[0]]

                # Append the similar prompts to the prompt with history
                prompt_with_history += [EmbeddedConversationItem(prompt, timestamp) for prompt, timestamp in
                                        similar_prompts]

                # iterate UP TO the last X prompts in the history
                for i in range(1, min(len(self.conversation_threads[ctx.channel.id].history), self.model.num_static_conversation_items)):
                    prompt_with_history.append(self.conversation_threads[ctx.channel.id].history[-i])

                # remove duplicates from prompt_with_history
                prompt_with_history = list(dict.fromkeys(prompt_with_history))

                # Sort the prompt_with_history by increasing timestamp
                prompt_with_history.sort(key=lambda x: x.timestamp)

                # Ensure that the last prompt in this list is the prompt we just sent (new_prompt_item)
                if prompt_with_history[-1] != new_prompt_item:
                    try:
                        prompt_with_history.remove(new_prompt_item)
                    except ValueError:
                        pass
                    prompt_with_history.append(new_prompt_item)

                prompt_with_history = "".join([item.text for item in prompt_with_history])

                new_prompt = prompt_with_history

                tokens = self.usage_service.count_tokens(new_prompt)

            # Summarize case
            elif (
                id in self.conversation_threads
                and tokens > self.model.summarize_threshold
                and not from_g_command
@@ -822,7 +825,6 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
                    return

            # Send the request to the model
            print("About to send model request")
            response = await self.model.send_request(
                new_prompt,
                tokens=tokens,
@@ -833,9 +835,7 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
            )

            # Clean the request response
            response_text = str(response["choices"][0]["text"])
            response_text = response_text.replace("GPTie: ", "")
            response_text = response_text.replace("<|endofstatement|>", "")
            response_text = self.cleanse_response(str(response["choices"][0]["text"]))

            if from_g_command:
                # Append the prompt to the beginning of the response, in italics, then a new line
@@ -850,7 +850,6 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
                )

            # If the user is conversing, add the GPT response to their conversation history.
            # Don't append to the history if we're using embeddings!
            if id in self.conversation_threads and not from_g_command and not self.pinecone_service:
                self.conversation_threads[id].history.append(
                    "\nGPTie: " + str(response_text) + "<|endofstatement|>\n"
@@ -859,24 +858,22 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
            # Embeddings case!
            elif id in self.conversation_threads and not from_g_command and self.pinecone_service:
                conversation_id = id
                print("Conversation id is", conversation_id)

                # Create an embedding and timestamp for the prompt
                response_text = "\nGPTie: " + str(response_text) + "<|endofstatement|>\n"
                response_text = response_text.encode("ascii", "ignore").decode()
                print("Creating embedding for ", response_text)

                # Print the current timestamp
                timestamp = int(str(datetime.datetime.now().timestamp()).replace(".", ""))
                print("Timestamp is ", timestamp)

                self.conversation_threads[conversation_id].history.append(EmbeddedConversationItem(response_text, timestamp))

                # Create and upsert the embedding for the conversation id, prompt, timestamp
                embedding = await self.pinecone_service.upsert_conversation_embedding(self.model, conversation_id,
                                                                                      response_text, timestamp)
                print("Embedded the response")

            # Cleanse
            response_text = self.cleanse_response(response_text)

            # If we don't have a response message, we are not doing a redo, send as a new message(s)
            if not response_message:
@@ -1141,20 +1138,22 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
        self.awaiting_responses.append(user_id_normalized)
        self.awaiting_thread_responses.append(thread.id)

        self.conversation_threads[thread.id].history.append(
            f"\n'{ctx.user.name}': {opener} <|endofstatement|>\n"
        )
        if not self.pinecone_service:
            self.conversation_threads[thread.id].history.append(
                f"\n'{ctx.user.name}': {opener} <|endofstatement|>\n"
            )

        self.conversation_threads[thread.id].count += 1

        await self.encapsulated_send(
            thread.id,
            opener
            if thread.id not in self.conversation_threads
            if thread.id not in self.conversation_threads or self.pinecone_service
            else "".join(self.conversation_threads[thread.id].history),
            thread_message,
        )

        self.awaiting_responses.remove(user_id_normalized)
        self.awaiting_thread_responses.remove(thread.id)

        self.conversation_thread_owners[user_id_normalized] = thread.id

@@ -34,4 +34,4 @@ GPTie: [RESPONSE TO MESSAGE 1] <|endofstatement|>
GPTie: [RESPONSE TO MESSAGE 2] <|endofstatement|>
...
You're a regular discord user, be friendly, casual, and fun, speak with "lol", "haha", and other slang when it seems fitting, and use emojis in your responses in a way that makes sense, avoid repeating yourself at all costs.
You're a regular discord user, be friendly, casual, and fun, speak with "lol", "haha", and other slang when it seems fitting, and use emojis in your responses in a way that makes sense, avoid repeating yourself at all costs. Never say "GPTie" when responding.

@@ -27,7 +27,7 @@ from models.message_model import Message
from models.openai_model import Model
from models.usage_service_model import UsageService
__version__ = "3.1.2"
__version__ = "4.0"
"""
The pinecone service is used to store and retrieve conversation embeddings.

@@ -27,12 +27,17 @@ class Settings_autocompleter:
        ctx: discord.AutocompleteContext,
    ): # Behaves a bit weird if you go back and edit the parameter without typing in a new command
        values = {
            "max_conversation_length": [str(num) for num in range(1,500,2)],
            "num_images": [str(num) for num in range(1,4+1)],
            "mode": ["temperature", "top_p"],
            "model": ["text-davinci-003", "text-curie-001"],
            "low_usage_mode": ["True", "False"],
            "image_size": ["256x256", "512x512", "1024x1024"],
            "summarize_conversastion": ["True", "False"],
            "summarize_conversation": ["True", "False"],
            "welcome_message_enabled": ["True", "False"],
            "num_static_conversation_items": [str(num) for num in range(5,20+1)],
            "num_conversation_lookback": [str(num) for num in range(5,15+1)],
            "summarize_threshold": [str(num) for num in range(800, 3500, 50)]
        }
        if ctx.options["parameter"] in values.keys():
            return [value for value in values[ctx.options["parameter"]]]

@@ -56,6 +56,8 @@ class Model:
        self._summarize_threshold = 2500
        self.model_max_tokens = 4024
        self._welcome_message_enabled = True
        self._num_static_conversation_items = 6
        self._num_conversation_lookback = 10

        try:
            self.IMAGE_SAVE_PATH = os.environ["IMAGE_SAVE_PATH"]
@@ -81,6 +83,32 @@ class Model:
    # Use the @property and @setter decorators for all the self fields to provide value checking
    @property
    def num_static_conversation_items(self):
        return self._num_static_conversation_items

    @num_static_conversation_items.setter
    def num_static_conversation_items(self, value):
        value = int(value)
        if value < 3:
            raise ValueError("num_static_conversation_items must be >= 3")
        if value > 20:
            raise ValueError("num_static_conversation_items must be <= 20, this is to ensure reliability and reduce token wastage!")
        self._num_static_conversation_items = value

    @property
    def num_conversation_lookback(self):
        return self._num_conversation_lookback

    @num_conversation_lookback.setter
    def num_conversation_lookback(self, value):
        value = int(value)
        if value < 3:
            raise ValueError("num_conversation_lookback must be >= 3")
        if value > 15:
            raise ValueError("num_conversation_lookback must be <= 15, this is to ensure reliability and reduce token wastage!")
        self._num_conversation_lookback = value

    @property
    def welcome_message_enabled(self):
        return self._welcome_message_enabled
@@ -190,9 +218,9 @@ class Model:
        value = int(value)
        if value < 1:
            raise ValueError("Max conversation length must be greater than 1")
        if value > 30:
        if value > 500:
            raise ValueError(
                "Max conversation length must be less than 30, this will start using credits quick."
                "Max conversation length must be less than 500, this will start using credits quick."
            )
        self._max_conversation_length = value
@@ -337,6 +365,7 @@ class Model:
        try:
            return response["data"][0]["embedding"]
        except Exception as e:
            print(response)
            traceback.print_exc()
            return
