AI Moderation!

2 years ago · 369c836a72
parent c338d2aea3
commit 369c836a72
5 changed files with 243 additions and 3 deletions
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@
 <p align="center">
 <img src="https://i.imgur.com/KeLpDgj.png"/>
 <img  src="https://i.imgur.com/jLp1T0h.png"/>
+<img src="https://i.imgur.com/9XC95Lu.png"/>

 </p>

@@ -22,10 +23,15 @@

 # Recent Notable Updates

+- **AI-BASED SERVER MODERATION** - GPT3Discord now has a built-in AI-based moderation system that can automatically detect and remove toxic messages from your server. It keeps your server safe and clean, runs completely automatically, and is **free**! Check out the commands section to learn how to enable it!
+
+
 - **AUTOMATIC CHAT SUMMARIZATION!** - When the context limit of a conversation is reached, the bot uses GPT3 itself to summarize the conversation, reducing the token count, and then continues conversing with you. This allows you to chat for a long time!

+
 - Custom conversation openers from https://github.com/f/awesome-chatgpt-prompts were integrated into the bot; check out `/gpt converse opener_file`! The bot now has built-in support for making GPT3 behave like various personalities, such as a life coach, Python interpreter, interviewer, text-based adventure game, and much more!

+
 - Autocomplete for settings and various commands to make it easier to use the bot!

 # Features
@@ -39,6 +45,8 @@

 - **Redo Requests** - A simple button after the GPT3 response or DALL-E generation allows you to redo the initial prompt you asked. You can also redo conversation messages by just editing your message!

+- **Automatic AI-Based Server Moderation** - Moderate your server automatically with AI!
+
 - Automatically re-send your prompt and update the response in place if you edit your original prompt!

 - Async and fault tolerant, **can handle hundreds of users at once**, if the upstream API permits!
@@ -55,6 +63,8 @@ These commands are grouped, so each group has a prefix but you can easily tab co

 `/help` - Display help text for the bot

+### (Chat)GPT3 Commands
+
 `/gpt ask <prompt> <temp> <top_p> <frequency penalty> <presence penalty>` - Ask the GPT3 Davinci 003 model a question. Optional overrides are available.

 `/gpt converse` - Start a conversation with the bot, like ChatGPT
@@ -73,10 +83,14 @@ These commands are grouped, so each group has a prefix but you can easily tab co

 `/gpt end` - End a conversation with the bot.

+### DALL-E2 Commands
+
 `/dalle draw <prompt>` - Have DALL-E generate images based on a prompt

 `/dalle optimize <image prompt text>` - Optimize a given prompt text for DALL-E image generation.

+### System and Settings
+
 `/system settings` - Display settings for the model (temperature, top_p, etc)

 `/system settings <setting> <value>` - Change a model setting to a new value. Has autocomplete support; certain settings also have autocompleted values.
@@ -91,6 +105,19 @@ These commands are grouped, so each group has a prefix but you can easily tab co

 `/system clear-local` - Clear all the local DALL-E images.

+### Automatic AI Moderation
+
+`/system moderations status:on` - Turn on automatic chat moderation.
+
+`/system moderations status:off` - Turn off automatic chat moderation.
+
+`/system moderations status:on alert_channel_id:<CHANNEL ID>` - Turn on moderation and send alerts to the channel with the ID you specify in the command.
+
+- The bot needs Administrator permissions for this. To receive alerts about moderated messages, set `MODERATIONS_ALERT_CHANNEL` in your .env file to the ID of the desired channel, or pass `alert_channel_id` when turning moderation on.
+- This uses the OpenAI Moderations endpoint to check messages. Requests are sent to the moderations endpoint with a MINIMUM gap of 0.5 seconds between them, to avoid getting blocked and to keep the service reliable.
+- The bot uses per-category numerical thresholds to determine whether a message is toxic. I have manually tested and fine-tuned these thresholds to values that I think work well; please open an issue if you have any suggestions for them!
+
+
 # Configuration

 All the model parameters are configurable inside Discord. Type `/system settings` to view all the configurable parameters, and use `/system settings <param> <value>` to set parameters.
@@ -130,6 +157,8 @@ DALLE_ROLES="Admin,Openai,Dalle,gpt"
 # People with the roles in GPT_ROLES can use commands like /gpt ask or /gpt converse
 GPT_ROLES="openai,gpt"
 WELCOME_MESSAGE="Hi There! Welcome to our Discord server. We hope you'll enjoy our server and we look forward to engaging with you!" # This is a fallback message if gpt3 fails to generate a welcome message.
+# This is the channel that auto-moderation alerts will be sent to
+MODERATIONS_ALERT_CHANNEL="977697652147892304"
 ```

 **Permissions**
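The README says the bot decides whether a message is toxic by comparing it against numerical thresholds. As a rough illustration, the sketch below reuses the per-category thresholds from `determine_moderation_result` in `models/moderations_service_model.py`. Unlike that function, it reports *which* categories crossed their limit, which may help when tuning the values. The names `THRESHOLDS`, `exceeded_categories` and `should_moderate` are hypothetical. Treating a category missing from the response as a score of 0.0 is also an assumption.

```python
from typing import Dict, List

# Per-category thresholds, copied from determine_moderation_result in this commit.
# Keys are the category names returned by the OpenAI moderations endpoint.
THRESHOLDS: Dict[str, float] = {
    "hate": 0.005,
    "hate/threatening": 0.05,
    "self-harm": 0.05,
    "sexual": 0.75,
    "sexual/minors": 0.1,
    "violence": 0.01,
    "violence/graphic": 0.1,
}


def exceeded_categories(
    category_scores: Dict[str, float], thresholds: Dict[str, float] = THRESHOLDS
) -> List[str]:
    """Return the categories whose score is strictly greater than their threshold.

    Assumption: a category missing from category_scores counts as a score of 0.0.
    """
    return [
        category
        for category, limit in thresholds.items()
        if category_scores.get(category, 0.0) > limit
    ]


def should_moderate(category_scores: Dict[str, float]) -> bool:
    # A message is moderated as soon as any single category crosses its threshold.
    return bool(exceeded_categories(category_scores))


if __name__ == "__main__":
    # Shaped like response["results"][0]["category_scores"]
    scores = {"hate": 0.001, "violence": 0.02, "sexual": 0.3}
    print(exceeded_categories(scores))  # ['violence']
    print(should_moderate(scores))  # True
```

Returning the list of tripped categories, rather than a bare boolean, keeps the same decision rule while making it easier to see which threshold caused a deletion.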
--- a/cogs/gpt_3_commands_and_converser.py
+++ b/cogs/gpt_3_commands_and_converser.py
@@ -1,3 +1,4 @@
+import asyncio
 import datetime
 import json
 import re
@@ -12,6 +13,7 @@ from pycord.multicog import add_to_group
 from models.deletion_service_model import Deletion
 from models.env_service_model import EnvService
 from models.message_model import Message
+from models.moderations_service_model import Moderation
 from models.user_model import User, RedoUser
 from models.check_model import Check
 from models.autocomplete_model import Settings_autocompleter, File_autocompleter
@@ -60,6 +62,10 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
        self.users_to_interactions = defaultdict(list)
        self.redo_users = {}
        self.awaiting_responses = []
+        self.moderation_queues = {}
+        self.moderation_alerts_channel = EnvService.get_moderations_alert_channel()
+        self.moderation_enabled_guilds = []
+        self.moderation_tasks = {}

        try:
            conversation_file_path = data_path / "conversation_starter_pretext.txt"
@@ -243,8 +249,6 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
            )

        # Close all conversation threads for the user
-        channel = self.bot.get_channel(self.conversation_threads[normalized_user_id])
-
        if normalized_user_id in self.conversation_threads:
            thread_id = self.conversation_threads[normalized_user_id]
            self.conversation_threads.pop(normalized_user_id)
@@ -478,6 +482,13 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
    # A listener for message edits to redo prompts if they are edited
    @discord.Cog.listener()
    async def on_message_edit(self, before, after):
+
+        # Moderation
+        if after.guild and self.moderation_queues.get(after.guild.id) is not None:
+            # Create a timestamp that is 0.5 seconds from now
+            timestamp = (datetime.datetime.now() + datetime.timedelta(seconds=0.5)).timestamp()
+            await self.moderation_queues[after.guild.id].put(Moderation(after, timestamp))
+
        if after.author.id in self.redo_users:
            if after.id == original_message[after.author.id]:
                response_message = self.redo_users[after.author.id].response
@@ -501,8 +512,9 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
                    )
                    self.conversating_users[after.author.id].count += 1

+                print("Doing the encapsulated send")
                await self.encapsulated_send(
-                    after.author.id, edited_content, ctx, response_message
+                    user_id=after.author.id, prompt=edited_content, ctx=ctx, response_message=response_message
                )

                self.redo_users[after.author.id].prompt = after.content
@@ -516,6 +528,12 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):

        content = message.content.strip()

+        # Moderations service
+        if message.guild and self.moderation_queues.get(message.guild.id) is not None:
+            # Create a timestamp that is 0.5 seconds from now
+            timestamp = (datetime.datetime.now() + datetime.timedelta(seconds=0.5)).timestamp()
+            await self.moderation_queues[message.guild.id].put(Moderation(message, timestamp))
+
        conversing = self.check_conversing(
            message.author.id, message.channel.id, content
        )
@@ -650,6 +668,7 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):
                    return

            # Send the request to the model
+            print("About to send model request")
            response = await self.model.send_request(
                new_prompt,
                tokens=tokens,
@@ -949,6 +968,59 @@ class GPT3ComCon(discord.Cog, name="GPT3ComCon"):

        self.conversation_threads[user_id_normalized] = thread.id

+    @add_to_group("system")
+    @discord.slash_command(
+        name="moderations-test",
+        description="Used to test a prompt and see what threshold values are returned by the moderations endpoint",
+        guild_ids=ALLOWED_GUILDS,
+    )
+    @discord.option(
+        name="prompt",
+        description="The prompt to test",
+        required=True,
+    )
+    @discord.guild_only()
+    async def moderations_test(self, ctx: discord.ApplicationContext, prompt: str):
+        await ctx.defer()
+        response = await self.model.send_moderations_request(prompt)
+        await ctx.respond(response['results'][0]['category_scores'])
+        await ctx.send_followup(response['results'][0]['flagged'])
+
+    @add_to_group("system")
+    @discord.slash_command(
+        name="moderations",
+        description="The AI moderations service",
+        guild_ids=ALLOWED_GUILDS,
+    )
+    @discord.option(name="status", description="Enable or disable the moderations service for the current guild (on/off)", required=True)
+    @discord.option(name="alert_channel_id", description="The channel ID to send moderation alerts to", required=False)
+    @discord.guild_only()
+    async def moderations(self, ctx: discord.ApplicationContext, status: str, alert_channel_id: str):
+        await ctx.defer()
+
+        status = status.lower().strip()
+        if status not in ["on", "off"]:
+            await ctx.respond("Invalid status, please use on or off")
+            return
+
+        if status == "on":
+            # Stop a worker that is already running for this guild instead of leaking it.
+            existing_task = self.moderation_tasks.get(ctx.guild_id)
+            if existing_task:
+                existing_task.cancel()
+
+            # Create the moderations service.
+            self.moderation_queues[ctx.guild_id] = asyncio.Queue()
+            channel_id = alert_channel_id or self.moderation_alerts_channel
+            # fetch_channel expects an integer ID; with no channel configured, alerts are skipped.
+            moderations_channel = await self.bot.fetch_channel(int(channel_id)) if channel_id else None
+
+            self.moderation_tasks[ctx.guild_id] = asyncio.ensure_future(Moderation.process_moderation_queue(self.moderation_queues[ctx.guild_id], 1, 1, moderations_channel))
+            await ctx.respond("Moderations service enabled")
+
+        elif status == "off":
+            # Cancel the moderations service; do nothing if it was never enabled.
+            existing_task = self.moderation_tasks.pop(ctx.guild_id, None)
+            if existing_task:
+                existing_task.cancel()
+            self.moderation_queues[ctx.guild_id] = None
+            await ctx.respond("Moderations service disabled")
+
    @add_to_group("gpt")
    @discord.slash_command(
        name="end",
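A per-guild on/off toggle has a few edge cases worth handling:

- Enabling twice should not leave the first worker running.
- Disabling a guild that was never enabled should be a no-op rather than a `KeyError`.
- Direct messages have no guild (`message.guild` is `None`), so the listeners must not read `message.guild.id` unconditionally.

Below is a minimal sketch of one way to keep that state consistent, assuming one `asyncio.Queue` and one worker task per guild. `GuildModerationRegistry` and its methods are hypothetical names, not part of GPT3Discord or Pycord. The sketch uses `asyncio.create_task` where the cog uses `asyncio.ensure_future`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

Worker = Callable[[asyncio.Queue], Awaitable[None]]


class GuildModerationRegistry:
    """Hypothetical helper: at most one queue and one worker task per guild."""

    def __init__(self, worker: Worker) -> None:
        self._worker = worker
        self.queues: Dict[int, asyncio.Queue] = {}
        self.tasks: Dict[int, asyncio.Task] = {}

    def enable(self, guild_id: int) -> asyncio.Queue:
        # Cancel any worker already running for this guild so it is not leaked.
        self.disable(guild_id)
        queue: asyncio.Queue = asyncio.Queue()
        self.queues[guild_id] = queue
        self.tasks[guild_id] = asyncio.create_task(self._worker(queue))
        return queue

    def disable(self, guild_id: int) -> bool:
        # Returns False (instead of raising KeyError) if the guild was never enabled.
        self.queues.pop(guild_id, None)
        task = self.tasks.pop(guild_id, None)
        if task is None:
            return False
        task.cancel()
        return True

    def queue_for(self, guild_id: Optional[int]) -> Optional[asyncio.Queue]:
        # Direct messages have no guild, so callers may pass None here.
        if guild_id is None:
            return None
        return self.queues.get(guild_id)


async def demo() -> None:
    async def worker(queue: asyncio.Queue) -> None:
        while True:
            print("would moderate:", await queue.get())

    registry = GuildModerationRegistry(worker)
    registry.enable(1)
    old_task = registry.tasks[1]
    registry.enable(1)  # re-enabling replaces the old worker
    await asyncio.sleep(0)  # let the event loop process the cancellation
    print(old_task.cancelled())  # True
    print(registry.disable(2))  # False: guild 2 was never enabled
    print(registry.queue_for(None))  # None: e.g. a direct message
    registry.disable(1)


if __name__ == "__main__":
    asyncio.run(demo())
```

Keeping the queue and task dictionaries behind `enable`/`disable` means the listeners only ever call `queue_for(...)`, and a missing or disabled guild simply yields `None`.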
--- a/models/env_service_model.py
+++ b/models/env_service_model.py
@@ -120,3 +120,13 @@ class EnvService:
        except:
            welcome_message = "Hi there! Welcome to our Discord server!"
        return welcome_message
+
+    @staticmethod
+    def get_moderations_alert_channel():
+        # MODERATIONS_ALERT_CHANNEL is the ID of the channel that moderation alerts are sent to.
+        # It can be left blank, but this is not advised. If the variable is not set in the .env file,
+        # None is returned (os.getenv does not raise for a missing variable).
+        return os.getenv("MODERATIONS_ALERT_CHANNEL")
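`os.getenv` returns `None` for a missing variable rather than raising. Discord channel IDs are integers, while environment variables are always strings. The hypothetical helper below is a variant of `EnvService.get_moderations_alert_channel` that also converts the value to an `int`. It returns `None` for a missing, blank, or non-numeric value, so callers only need a single `is not None` check. The function name and the "must be positive" rule are assumptions for illustration.

```python
import os
from typing import Optional


def get_alert_channel_id(var_name: str = "MODERATIONS_ALERT_CHANNEL") -> Optional[int]:
    """Hypothetical helper: read a Discord channel ID from the environment as an int.

    Returns None if the variable is unset, blank, not a number, or not positive.
    """
    raw = os.getenv(var_name)  # None when unset; os.getenv does not raise
    if raw is None:
        return None
    try:
        channel_id = int(raw.strip())
    except ValueError:  # blank or non-numeric value
        return None
    return channel_id if channel_id > 0 else None


if __name__ == "__main__":
    os.environ["MODERATIONS_ALERT_CHANNEL"] = "977697652147892304"  # value from the README's sample .env
    print(get_alert_channel_id())  # 977697652147892304
```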
--- a/models/moderations_service_model.py
+++ b/models/moderations_service_model.py
@@ -0,0 +1,113 @@
+import asyncio
+import os
+import traceback
+from datetime import datetime
+from pathlib import Path
+
+import discord
+
+from models.openai_model import Model
+from models.usage_service_model import UsageService
+
+usage_service = UsageService(Path(os.environ.get("DATA_DIR", os.getcwd())))
+model = Model(usage_service)
+
+class Moderation:
+
+    def __init__(self, message, timestamp):
+        self.message = message
+        self.timestamp = timestamp
+
+    @staticmethod
+    def build_moderation_embed():
+        # Create a discord embed to send to the user when their message gets moderated
+        embed = discord.Embed(
+            title="Your message was moderated",
+            description="Our automatic moderation systems detected that your message was inappropriate and has been deleted. Please review the rules.",
+            colour=discord.Colour.red(),
+        )
+        # Set the embed thumbnail
+        embed.set_thumbnail(
+            url="https://i.imgur.com/2oL8JSp.png"
+        )
+        embed.set_footer(text="If you think this was a mistake, please contact the server admins.")
+        return embed
+
+    @staticmethod
+    def build_admin_moderated_message(moderated_message, response_message):
+
+        # Create a discord embed to send to the moderation alert channel when a message gets moderated
+        embed = discord.Embed(
+            title="A message was moderated in the "+moderated_message.message.guild.name+" server",
+            description=f"Message from {moderated_message.message.author.mention} was moderated: {moderated_message.message.content}",
+            colour=discord.Colour.yellow(),
+        )
+        # Link to the bot's reply, since the moderated message itself has been deleted
+        link = f"https://discord.com/channels/{response_message.guild.id}/{response_message.channel.id}/{response_message.id}"
+        # set the link of the embed
+        embed.add_field(name="Moderated message link", value=link, inline=False)
+        return embed
+
+
+    @staticmethod
+    def determine_moderation_result(response):
+        HATE_THRESHOLD = 0.005
+        HATE_VIOLENCE_THRESHOLD = 0.05
+        SELF_HARM_THRESHOLD = 0.05
+        SEXUAL_THRESHOLD = 0.75
+        SEXUAL_MINORS_THRESHOLD = 0.1
+        VIOLENCE_THRESHOLD = 0.01
+        VIOLENCE_GRAPHIC_THRESHOLD = 0.1
+
+        thresholds = [HATE_THRESHOLD, HATE_VIOLENCE_THRESHOLD, SELF_HARM_THRESHOLD, SEXUAL_THRESHOLD, SEXUAL_MINORS_THRESHOLD, VIOLENCE_THRESHOLD, VIOLENCE_GRAPHIC_THRESHOLD]
+        threshold_iterator = ['hate','hate/threatening','self-harm','sexual','sexual/minors','violence','violence/graphic']
+
+        category_scores = response['results'][0]['category_scores']
+
+        flagged = response['results'][0]['flagged']
+
+        # Iterate the category scores using the threshold_iterator and compare the values to thresholds
+        for category, threshold in zip(threshold_iterator, thresholds):
+            if category_scores[category] > threshold:
+                return True
+
+        return False
+
+    # This function will be called by the bot to process the message queue
+    @staticmethod
+    async def process_moderation_queue(
+        moderation_queue, PROCESS_WAIT_TIME, EMPTY_WAIT_TIME, moderations_alert_channel
+    ):
+        while True:
+            try:
+                # If the queue is empty, sleep for a short time before checking again
+                if moderation_queue.empty():
+                    await asyncio.sleep(EMPTY_WAIT_TIME)
+                    continue
+
+                # Get the next message from the queue
+                to_moderate = await moderation_queue.get()
+
+                # Only moderate the message once its scheduled timestamp has passed
+                if datetime.now().timestamp() > to_moderate.timestamp:
+                    response = await model.send_moderations_request(to_moderate.message.content)
+                    moderation_result = Moderation.determine_moderation_result(response)
+
+                    if moderation_result:
+                        # Take care of the flagged message
+                        response_message = await to_moderate.message.reply(embed=Moderation.build_moderation_embed())
+                        # Delete the original message now that the user has been notified
+                        await to_moderate.message.delete()
+
+                        # Send to the moderation alert channel
+                        if moderations_alert_channel:
+                            await moderations_alert_channel.send(embed=Moderation.build_admin_moderated_message(to_moderate, response_message))
+
+                else:
+                    await moderation_queue.put(to_moderate)
+                # Sleep for a short time before processing the next message
+                # This will prevent the bot from spamming messages too quickly
+                await asyncio.sleep(PROCESS_WAIT_TIME)
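+            except Exception:
+                # Catch Exception rather than everything: asyncio.CancelledError is a BaseException
+                # (Python 3.8+), so task.cancel() from "/system moderations status:off" can stop this loop.
+                traceback.print_exc()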
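A worker that loops forever must let `asyncio.CancelledError` propagate, otherwise `task.cancel()` cannot stop it. On Python 3.8+, `CancelledError` derives from `BaseException`, so a bare `except:` swallows it while `except Exception:` does not.

Below is a minimal sketch of a delayed moderation queue built on that rule. It assumes each item carries a `due` time (here `time.monotonic()` plus a delay, comparable to the commit's "0.5 seconds from now" timestamp) and that items arrive in due order. Under that assumption, the worker can wait on the head of the queue instead of re-queueing items that are not yet due. `QueuedItem`, `process_delayed_queue` and `handle` are hypothetical names; `handle` stands in for the moderation request plus delete/alert logic.

```python
import asyncio
import time
import traceback
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, List


@dataclass
class QueuedItem:
    due: float  # time.monotonic() value after which the item may be processed
    payload: Any


async def process_delayed_queue(
    queue: asyncio.Queue,
    handle: Callable[[Any], Awaitable[None]],
    process_wait: float,
) -> None:
    """Run forever, handling each queued item once it is due.

    Assumes items are enqueued with non-decreasing `due` values, so waiting on the
    head of the FIFO queue never delays an item that is already due.
    """
    while True:
        item = await queue.get()  # suspends until an item arrives; no polling needed
        try:
            delay = item.due - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)
            await handle(item.payload)
        except Exception:
            # Log and keep going. CancelledError is not an Exception subclass (3.8+),
            # so it is not caught here and task.cancel() still stops the worker.
            traceback.print_exc()
        await asyncio.sleep(process_wait)  # minimum gap between handled items


async def demo() -> List[str]:
    handled: List[str] = []

    async def handle(text: str) -> None:
        if text == "bad":
            raise RuntimeError("simulated API error")
        handled.append(text)

    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(process_delayed_queue(queue, handle, process_wait=0.01))
    now = time.monotonic()
    for text in ("first", "bad", "second"):
        await queue.put(QueuedItem(due=now + 0.05, payload=text))
    await asyncio.sleep(0.3)
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    print(worker.cancelled())  # True
    return handled


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['first', 'second']
```

The failing "bad" item is logged and skipped, while cancelling the task still ends the loop. Keeping those two behaviours distinct is what lets the "off" command actually stop the service.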
--- a/models/openai_model.py
+++ b/models/openai_model.py
@@ -317,6 +317,22 @@ class Model:
                + str(response["error"]["message"])
            )

+    async def send_moderations_request(self, text):
+        # Send the text to the OpenAI moderations endpoint using aiohttp
+        async with aiohttp.ClientSession() as session:
+            headers = {
+                "Content-Type": "application/json",
+                "Authorization": f"Bearer {self.openai_key}",
+            }
+            payload = {"input": text}
+            async with session.post(
+                "https://api.openai.com/v1/moderations",
+                headers=headers,
+                json=payload,
+            ) as response:
+                return await response.json()
+
+
    async def send_summary_request(self, prompt):
        """
        Sends a summary request to the OpenAI API