AI Moderation!

Kaveen Kumarasinghe 2 years ago
parent c338d2aea3
commit 369c836a72

@ -15,6 +15,7 @@
<p align="center">
<img src="https://i.imgur.com/KeLpDgj.png"/>
<img src="https://i.imgur.com/jLp1T0h.png"/>
<img src="https://i.imgur.com/9XC95Lu.png"/>
</p>
@ -22,10 +23,15 @@
# Recent Notable Updates
- **AI-BASED SERVER MODERATION** - GPT3Discord now has a built-in AI-based moderation system that can automatically detect and remove toxic messages from your server. This is a great way to keep your server safe and clean, and it's completely automatic and **free**! Check out the commands section to learn how to enable it!
- **AUTOMATIC CHAT SUMMARIZATION!** - When the context limit of a conversation is reached, the bot uses GPT3 itself to summarize the conversation, which reduces the token count and lets you keep chatting for a long time!
- Custom conversation openers from https://github.com/f/awesome-chatgpt-prompts were integrated into the bot; check out `/gpt converse opener_file`! The bot now has built-in support for making GPT3 behave like various personalities, such as a life coach, Python interpreter, interviewer, text-based adventure game, and much more!
- Autocomplete for settings and various commands to make it easier to use the bot!
# Features
@ -39,6 +45,8 @@
- **Redo Requests** - A simple button after the GPT3 response or DALL-E generation allows you to redo the initial prompt you asked. You can also redo conversation messages by just editing your message!
- **Automatic AI-Based Server Moderation** - Moderate your server automatically with AI!
- Automatically re-send your prompt and update the response in place if you edit your original prompt!
- Async and fault tolerant, **can handle hundreds of users at once**, if the upstream API permits!
@ -55,6 +63,8 @@ These commands are grouped, so each group has a prefix but you can easily tab co
`/help` - Display help text for the bot
### (Chat)GPT3 Commands
`/gpt ask <prompt> <temp> <top_p> <frequency penalty> <presence penalty>` - Ask the GPT3 Davinci 003 model a question. Optional overrides are available.
`/gpt converse` - Start a conversation with the bot, like ChatGPT
@ -73,10 +83,14 @@ These commands are grouped, so each group has a prefix but you can easily tab co
`/gpt end` - End a conversation with the bot.
### DALL-E2 Commands
`/dalle draw <prompt>` - Have DALL-E generate images based on a prompt
`/dalle optimize <image prompt text>` - Optimize a given prompt text for DALL-E image generation.
### System and Settings
`/system settings` - Display settings for the model (temperature, top_p, etc)
`/system settings <setting> <value>` - Change a model setting to a new value. Has autocomplete support; certain settings have autocompleted values too.
@ -91,6 +105,19 @@ These commands are grouped, so each group has a prefix but you can easily tab co
`/system clear-local` - Clear all the local DALL-E images.
### Automatic AI Moderation
`/system moderations status:on` - Turn on automatic chat moderations.
`/system moderations status:off` - Turn off automatic chat moderations.
`/system moderations status:on alert_channel_id:<CHANNEL ID>` - Turn on moderations and set the alert channel to the channel ID you specify in the command.
- The bot needs Administrative permissions for this. To receive alerts about moderated messages, set `MODERATIONS_ALERT_CHANNEL` in your .env file to the ID of the desired channel, or pass `alert_channel_id` in the command.
- This uses the OpenAI Moderations endpoint to check messages. Requests are sent to the moderations endpoint with a MINIMUM gap of 0.5 seconds between them, to avoid getting blocked and to ensure reliability.
- The bot uses numerical thresholds to determine whether a message is toxic. I have manually tested and fine-tuned these thresholds to a point that I think is good; please open an issue if you have any suggestions for the thresholds!
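As a rough illustration of the threshold check described above, here is a minimal sketch. The threshold values mirror the ones added in this commit; the helper name `exceeds_thresholds` and the sample scores are hypothetical and only serve the example. The real bot reads the scores from `response["results"][0]["category_scores"]`.

```python
# Per-category thresholds; a message counts as toxic if any score is above its limit.
THRESHOLDS = {
    "hate": 0.005,
    "hate/threatening": 0.05,
    "self-harm": 0.05,
    "sexual": 0.75,
    "sexual/minors": 0.1,
    "violence": 0.01,
    "violence/graphic": 0.1,
}


def exceeds_thresholds(category_scores, thresholds=THRESHOLDS):
    """Return the categories whose score is strictly above its threshold.

    Categories missing from category_scores are treated as a score of 0.0.
    """
    return [
        category
        for category, limit in thresholds.items()
        if category_scores.get(category, 0.0) > limit
    ]


# Hypothetical scores, shaped like the "category_scores" object of a moderations response.
sample_scores = {"hate": 0.001, "violence": 0.2, "sexual": 0.1}
print(exceeds_thresholds(sample_scores))
```

Returning the list of offending categories, rather than a plain boolean, makes it easier to see why a message was flagged when tuning the thresholds.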
# Configuration
All the model parameters are configurable inside discord. Type `/system settings` to view all the configurable parameters, and use `/system settings <param> <value>` to set parameters.
@ -130,6 +157,8 @@ DALLE_ROLES="Admin,Openai,Dalle,gpt"
# People with the roles in GPT_ROLES can use commands like /gpt ask or /gpt converse
GPT_ROLES="openai,gpt"
WELCOME_MESSAGE="Hi There! Welcome to our Discord server. We hope you'll enjoy our server and we look forward to engaging with you!" # This is a fallback message if gpt3 fails to generate a welcome message.
# This is the channel that auto-moderation alerts will be sent to
MODERATIONS_ALERT_CHANNEL="977697652147892304"
```
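If you want to sanity-check the `MODERATIONS_ALERT_CHANNEL` value before starting the bot, a small sketch like the following may help. The helper `read_alert_channel_id` is hypothetical and not part of the bot; the bot's own `EnvService.get_moderations_alert_channel()` simply returns the raw string from the environment.

```python
import os


def read_alert_channel_id(env=None):
    """Return the alert channel ID as an int, or None if it is unset or not a number.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    raw = (env.get("MODERATIONS_ALERT_CHANNEL") or "").strip()
    if not raw:
        return None
    try:
        return int(raw)
    except ValueError:
        # Not a numeric channel ID, so treat it as "no alert channel configured".
        return None


channel_id = read_alert_channel_id()
if channel_id is None:
    print("No valid MODERATIONS_ALERT_CHANNEL set; moderation alerts are disabled.")
else:
    print(f"Moderation alerts will be sent to channel {channel_id}.")
```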
**Permissions**

@ -1,3 +1,4 @@
import asyncio
import datetime
import json
import re
@ -12,6 +13,7 @@ from pycord.multicog import add_to_group
from models.deletion_service_model import Deletion
from models.env_service_model import EnvService
from models.message_model import Message
from models.moderations_service_model import Moderation
from models.user_model import User, RedoUser
from models.check_model import Check
from models.autocomplete_model import Settings_autocompleter, File_autocompleter
@ -60,6 +62,10 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
self.users_to_interactions = defaultdict(list)
self.redo_users = {}
self.awaiting_responses = []
self.moderation_queues = {}
self.moderation_alerts_channel = EnvService.get_moderations_alert_channel()
self.moderation_enabled_guilds = []
self.moderation_tasks = {}
try:
conversation_file_path = data_path / "conversation_starter_pretext.txt"
@ -243,8 +249,6 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
)
# Close all conversation threads for the user
channel = self.bot.get_channel(self.conversation_threads[normalized_user_id])
if normalized_user_id in self.conversation_threads:
thread_id = self.conversation_threads[normalized_user_id]
self.conversation_threads.pop(normalized_user_id)
@ -478,6 +482,13 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
# A listener for message edits to redo prompts if they are edited
@discord.Cog.listener()
async def on_message_edit(self, before, after):
# Moderation
if after.guild.id in self.moderation_queues and self.moderation_queues[after.guild.id] is not None:
# Create a timestamp that is 0.5 seconds from now
timestamp = (datetime.datetime.now() + datetime.timedelta(seconds=0.5)).timestamp()
await self.moderation_queues[after.guild.id].put(Moderation(after, timestamp))
if after.author.id in self.redo_users:
if after.id == original_message[after.author.id]:
response_message = self.redo_users[after.author.id].response
@ -501,8 +512,9 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
)
self.conversating_users[after.author.id].count += 1
print("Doing the encapsulated send")
await self.encapsulated_send(
after.author.id, edited_content, ctx, response_message
user_id=after.author.id, prompt=edited_content, ctx=ctx, response_message=response_message
)
self.redo_users[after.author.id].prompt = after.content
@ -516,6 +528,12 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
content = message.content.strip()
# Moderations service
if message.guild.id in self.moderation_queues and self.moderation_queues[message.guild.id] is not None:
# Create a timestamp that is 0.5 seconds from now
timestamp = (datetime.datetime.now() + datetime.timedelta(seconds=0.5)).timestamp()
await self.moderation_queues[message.guild.id].put(Moderation(message, timestamp))
conversing = self.check_conversing(
message.author.id, message.channel.id, content
)
@ -650,6 +668,7 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
return
# Send the request to the model
print("About to send model request")
response = await self.model.send_request(
new_prompt,
tokens=tokens,
@ -949,6 +968,59 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
self.conversation_threads[user_id_normalized] = thread.id
@add_to_group("system")
@discord.slash_command(
name="moderations-test",
description="Used to test a prompt and see what threshold values are returned by the moderations endpoint",
guild_ids=ALLOWED_GUILDS,
)
@discord.option(
name="prompt",
description="The prompt to test",
required=True,
)
@discord.guild_only()
async def moderations_test(self, ctx: discord.ApplicationContext, prompt: str):
await ctx.defer()
response = await self.model.send_moderations_request(prompt)
await ctx.respond(response['results'][0]['category_scores'])
await ctx.send_followup(response['results'][0]['flagged'])
@add_to_group("system")
@discord.slash_command(
name="moderations",
description="The AI moderations service",
guild_ids=ALLOWED_GUILDS,
)
@discord.option(name="status", description="Enable or disable the moderations service for the current guild (on/off)", required = True)
@discord.option(name="alert_channel_id", description="The channel ID to send moderation alerts to", required=False)
@discord.guild_only()
async def moderations(self, ctx: discord.ApplicationContext, status: str, alert_channel_id: str):
await ctx.defer()
status = status.lower().strip()
if status not in ["on", "off"]:
await ctx.respond("Invalid status, please use on or off")
return
if status == "on":
# Create the moderations service.
self.moderation_queues[ctx.guild_id] = asyncio.Queue()
if self.moderation_alerts_channel or alert_channel_id:
moderations_channel = await self.bot.fetch_channel(self.moderation_alerts_channel if not alert_channel_id else alert_channel_id)
else:
moderations_channel = self.moderation_alerts_channel # None
self.moderation_tasks[ctx.guild_id] = asyncio.ensure_future(Moderation.process_moderation_queue(self.moderation_queues[ctx.guild_id], 1, 1, moderations_channel))
await ctx.respond("Moderations service enabled")
elif status == "off":
# Cancel the moderations service.
self.moderation_tasks[ctx.guild_id].cancel()
self.moderation_tasks[ctx.guild_id] = None
self.moderation_queues[ctx.guild_id] = None
await ctx.respond("Moderations service disabled")
@add_to_group("gpt")
@discord.slash_command(
name="end",

@ -120,3 +120,13 @@ class EnvService:
except:
welcome_message = "Hi there! Welcome to our Discord server!"
return welcome_message
@staticmethod
def get_moderations_alert_channel():
# MODERATIONS_ALERT_CHANNEL is the ID of the channel that moderation alerts are sent to.
# The value can be blank, but this is not advised. If the variable is not set, os.getenv returns None and no alerts are sent.
try:
moderations_alert_channel = os.getenv("MODERATIONS_ALERT_CHANNEL")
except:
moderations_alert_channel = None
return moderations_alert_channel

@ -0,0 +1,113 @@
import asyncio
import os
import traceback
from datetime import datetime
from pathlib import Path

import discord

from models.openai_model import Model
from models.usage_service_model import UsageService

usage_service = UsageService(Path(os.environ.get("DATA_DIR", os.getcwd())))
model = Model(usage_service)


class Moderation:
    def __init__(self, message, timestamp):
        self.message = message
        self.timestamp = timestamp

    @staticmethod
    def build_moderation_embed():
        # Create a discord embed to send to the user when their message gets moderated
        embed = discord.Embed(
            title="Your message was moderated",
            description="Our automatic moderation systems detected that your message was inappropriate and has been deleted. Please review the rules.",
            colour=discord.Colour.red(),
        )
        # Set the embed thumbnail
        embed.set_thumbnail(url="https://i.imgur.com/2oL8JSp.png")
        embed.set_footer(text="If you think this was a mistake, please contact the server admins.")
        return embed

    @staticmethod
    def build_admin_moderated_message(moderated_message, response_message):
        # Create a discord embed to send to the alert channel when a message gets moderated
        embed = discord.Embed(
            title=f"A message was moderated in the {moderated_message.message.guild.name} server",
            description=f"Message from {moderated_message.message.author.mention} was moderated: {moderated_message.message.content}",
            colour=discord.Colour.yellow(),
        )
        # Link to the bot's reply, since the moderated message itself has been deleted
        link = f"https://discord.com/channels/{response_message.guild.id}/{response_message.channel.id}/{response_message.id}"
        # Add the link to the embed
        embed.add_field(name="Moderated message link", value=link, inline=False)
        return embed

    @staticmethod
    def determine_moderation_result(response):
        HATE_THRESHOLD = 0.005
        HATE_VIOLENCE_THRESHOLD = 0.05
        SELF_HARM_THRESHOLD = 0.05
        SEXUAL_THRESHOLD = 0.75
        SEXUAL_MINORS_THRESHOLD = 0.1
        VIOLENCE_THRESHOLD = 0.01
        VIOLENCE_GRAPHIC_THRESHOLD = 0.1

        thresholds = [HATE_THRESHOLD, HATE_VIOLENCE_THRESHOLD, SELF_HARM_THRESHOLD, SEXUAL_THRESHOLD, SEXUAL_MINORS_THRESHOLD, VIOLENCE_THRESHOLD, VIOLENCE_GRAPHIC_THRESHOLD]
        threshold_iterator = ['hate', 'hate/threatening', 'self-harm', 'sexual', 'sexual/minors', 'violence', 'violence/graphic']

        category_scores = response['results'][0]['category_scores']

        # Compare each category score to its threshold; any score above its threshold flags the message
        for category, threshold in zip(threshold_iterator, thresholds):
            if category_scores[category] > threshold:
                return True
        return False

    # This function will be called by the bot to process the message queue
    @staticmethod
    async def process_moderation_queue(
        moderation_queue, PROCESS_WAIT_TIME, EMPTY_WAIT_TIME, moderations_alert_channel
    ):
        while True:
            try:
                # If the queue is empty, sleep for a short time before checking again
                if moderation_queue.empty():
                    await asyncio.sleep(EMPTY_WAIT_TIME)
                    continue

                # Get the next message from the queue
                to_moderate = await moderation_queue.get()

                # Only process the message once its scheduled moderation time has passed
                if datetime.now().timestamp() > to_moderate.timestamp:
                    response = await model.send_moderations_request(to_moderate.message.content)
                    moderation_result = Moderation.determine_moderation_result(response)

                    if moderation_result:
                        # Tell the author that their message was moderated, then delete it
                        response_message = await to_moderate.message.reply(embed=Moderation.build_moderation_embed())
                        await to_moderate.message.delete()

                        # Send to the moderation alert channel
                        if moderations_alert_channel:
                            await moderations_alert_channel.send(embed=Moderation.build_admin_moderated_message(to_moderate, response_message))
                else:
                    # Not due yet, so put the message back on the queue
                    await moderation_queue.put(to_moderate)

                # Sleep for a short time before processing the next message
                # This will prevent the bot from spamming messages too quickly
                await asyncio.sleep(PROCESS_WAIT_TIME)
            except Exception:
                # Log the error and keep the loop alive. asyncio.CancelledError is not an
                # Exception subclass (Python 3.8+), so cancelling the task still stops the loop.
                traceback.print_exc()

@ -317,6 +317,22 @@ class Model:
+ str(response["error"]["message"])
)
async def send_moderations_request(self, text):
# Use aiohttp to send the above request:
async with aiohttp.ClientSession() as session:
headers={
"Content-Type": "application/json",
"Authorization": f"Bearer {self.openai_key}",
}
payload = {"input": text}
async with session.post(
"https://api.openai.com/v1/moderations",
headers=headers,
json=payload,
) as response:
return await response.json()
async def send_summary_request(self, prompt):
"""
Sends a summary request to the OpenAI API
