csv, powerpoint, image, mp3 support

1 year ago · 464f332915
parent 3ed55b556f
commit 464f332915
4 changed files with 48 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -24,7 +24,7 @@ SUPPORT SERVER FOR BOT SETUP: https://discord.gg/WvAHXDMS7Q (You can try out the

 # Recent Notable Updates

- **CUSTOM INDEXES** - This is a huge update. You can now upload files to your server and use them as custom context when asking GPT3 questions. You can also use links to use webpages as context, and you can even use discord channels, or your entire discord server's messages as context! Read more in the 'Custom Indexes' section below.
+- **CUSTOM INDEXES** - This is a huge update. You can now upload files to your server and use them as custom context when asking GPT3 questions. You can also use webpage links, images, full documents, CSVs, PowerPoints, audio files, and even **YouTube videos** as context! Read more in the 'Custom Indexes' section below.

 # Features
 - **Directly prompt GPT3 with `/gpt ask <prompt>`**
@@ -97,7 +97,18 @@ These commands are grouped, so each group has a prefix but you can easily tab co

 ### Custom Indexes Commands

-TODO
+`/index add file:<file> or link:<link>` - Use a document or a link to create or add to your indexes. If you provide a YouTube link, the transcript of the video will be used. If you provide a web URL, the contents of the webpage will be used. If you provide an image, the text in the image will be extracted and used!
+
+`/index query query:<prompt>` - Query your current index for a given prompt. GPT will answer based on your current document/index.
+
+`/index load index:<index>` - Load a previously created index to query
+
+`/index reset` - Reset and delete all of your saved indexes
+
+`/index add_discord channel:<discord channel>` - Create and add an index based on a discord channel
+
+`/index discord_backup` - Use the last 3000 messages of every channel on your discord server as an index
+

 ### System and Settings

@@ -238,6 +249,15 @@ For example, if I wanted to change the number of images generated by DALL-E by d


 # Requirements
+**For OCR and document functionality**:
+`pip3 install torch==1.9.1+cpu torchvision==0.10.1+cpu -f https://download.pytorch.org/whl/torch_stable.html`
+or
+`python3.9 -m pip install torch==1.9.1+cpu torchvision==0.10.1+cpu -f https://download.pytorch.org/whl/torch_stable.html`
+
+**For audio extraction for indexing from .mp3 and .mp4 files**:
+`python3.9 -m pip install git+https://github.com/openai/whisper.git`
+
+**All other dependencies**:
 `python3.9 -m pip install -r requirements.txt`

 **I recommend using python 3.9!**
@@ -319,6 +339,7 @@ python3.9 get-pip.py

 # Install project dependencies
 python3.9 -m pip install --ignore-installed PyYAML
+python3.9 -m pip install torch==1.9.1+cpu torchvision==0.10.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
 python3.9 -m pip install -r requirements.txt
 python3.9 -m pip install .

@@ -375,6 +396,7 @@ python3.9 -m pip install .
 With python3.9 installed and the requirements installed, you can run this bot anywhere. 

 Install the dependencies with:
+`pip3 install torch==1.9.1+cpu torchvision==0.10.1+cpu -f https://download.pytorch.org/whl/torch_stable.html`
 `python3.9 -m pip install -r requirements.txt`

 Then, run the bot with:
--- a/models/index_model.py
+++ b/models/index_model.py
@@ -113,10 +113,23 @@ class Index_handler:
            os.environ["OPENAI_API_KEY"] = user_api_key
    
        try:
+            print(file.content_type)
            if file.content_type.startswith("text/plain"):
                suffix = ".txt"
            elif file.content_type.startswith("application/pdf"):
                suffix = ".pdf"
+            # Allow for images too
+            elif file.content_type.startswith("image/png"):
+                suffix = ".png"
+            elif file.content_type.startswith("image/"):
+                suffix = ".jpg"
+            elif "csv" in file.content_type:
+                suffix = ".csv"
+            elif "vnd." in file.content_type:
+                suffix = ".pptx"
+            # Catch all audio files and suffix with "mp3"
+            elif file.content_type.startswith("audio/"):
+                suffix = ".mp3"
            else:
                await ctx.respond("Only accepts txt or pdf files")
                return
@@ -128,7 +141,7 @@ class Index_handler:
            file_name = file.filename
            self.index_storage[ctx.user.id].add_index(index, ctx.user.id, file_name)

-            await ctx.respond("Index added to your indexes")
+            await ctx.respond("Index added to your indexes.")
        except Exception:
            await ctx.respond("Failed to set index")
            traceback.print_exc()
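
The content-type branching in the hunk above could also be written as a small standalone lookup, which is easier to test in isolation. The following is only a sketch and not the code in this commit: `suffix_for_content_type` and `_EXACT_SUFFIXES` are hypothetical names, and the MIME strings in the table are assumptions about what an upload reports. Unlike the `"vnd." in file.content_type` check, this version maps only the PowerPoint MIME type to `.pptx`, so other `vnd.` document types are rejected rather than treated as presentations.

```python
from typing import Optional

# Hypothetical lookup table (not part of the commit). The exact MIME strings
# reported for an uploaded attachment are an assumption here.
_EXACT_SUFFIXES = {
    "text/plain": ".txt",
    "application/pdf": ".pdf",
    "text/csv": ".csv",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation": ".pptx",
    "image/png": ".png",
}


def suffix_for_content_type(content_type: Optional[str]) -> Optional[str]:
    """Return a temp-file suffix for a MIME type, or None if unsupported."""
    if not content_type:
        # Assumption: the content type may be missing on some uploads.
        return None
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in _EXACT_SUFFIXES:
        return _EXACT_SUFFIXES[mime]
    # Mirror the diff's catch-alls: other images become .jpg, any audio .mp3.
    if mime.startswith("image/"):
        return ".jpg"
    if mime.startswith("audio/"):
        return ".mp3"
    return None
```

A caller would respond with the "unsupported file" message whenever the function returns `None`, keeping the handler itself free of the `elif` chain.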
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -34,7 +34,12 @@ dependencies = [
  "gpt-index",
  "PyPDF2",
  "youtube_transcript_api",
+  "sentence-transformers",
+  "sentencepiece",
+  "protobuf",
+  "python-pptx",
 ]
+
 dynamic = ["version"]
 [project.scripts]
 gpt3discord = "gpt3discord:init"
--- a/requirements.txt
+++ b/requirements.txt
@@ -12,4 +12,8 @@ flask==2.2.2
 beautifulsoup4==4.11.1
 gpt-index==0.3.4
 PyPDF2==3.0.1
-youtube_transcript_api==0.5.0
+youtube_transcript_api==0.5.0
+sentencepiece==0.1.97
+sentence-transformers==2.2.2
+protobuf==3.20.0
+python-pptx==0.6.21
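
Since this commit adds several exact pins to `requirements.txt`, a quick check that the running environment matches them can help when debugging install problems. The sketch below is not part of the commit: `parse_pins` and `find_mismatches` are hypothetical helpers, and they only understand simple `name==version` lines (no extras, markers, or version ranges).

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Collect simple name==version pins from requirements-style text."""
    pins = {}
    for raw in requirements_text.splitlines():
        # Ignore comments and anything that is not an exact pin.
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Map each pinned package to (wanted, installed) when they differ.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

For example, `find_mismatches(parse_pins(open("requirements.txt").read()))` would return an empty dict when every pinned package is installed at the pinned version.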