Since OpenAI released the gpt-3.5-turbo model, I have been wondering how companies work around the token limit to extract insights from a large corpus of text without having to retrain the model.
To understand how it works, I put together an open-source Jupyter notebook that builds a chatbot using vector embeddings (which is the workaround for the token limit!). The chatbot connects Zendesk's knowledge base to ChatGPT, which answers natural-language questions using only the relevant context.
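For anyone curious, the core idea boils down to something like this. It's a simplified sketch rather than the exact notebook code: the article texts are placeholders, and the embedding model name is just one option that happens to produce 1536-dimensional vectors.

```python
# Sketch of embedding-based retrieval: embed the knowledge-base articles once,
# embed the question at query time, pick the closest article(s), and pass only
# those as context to the chat model. Assumes the openai>=1.0 Python client.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder articles standing in for the Zendesk knowledge base
articles = [
    "How to reset your password: go to Settings > Security ...",
    "Billing runs on the first of each month ...",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

article_vectors = embed(articles)

def answer(question, top_k=1):
    q_vec = embed([question])[0]
    # Cosine similarity against every article -- fine for a demo-sized corpus
    sims = article_vectors @ q_vec / (
        np.linalg.norm(article_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = "\n\n".join(articles[i] for i in np.argsort(sims)[::-1][:top_k])
    chat = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return chat.choices[0].message.content

print(answer("How do I reset my password?"))
```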
I hope this can be helpful to anyone who is playing around with AI models in general :)
Have you considered using a proper vector database like Qdrant (https://qdrant.tech)? In a demo it might be fine to calculate the distance to every article in the knowledge base, but in a real-world scenario you'll sooner or later run into scaling issues.
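For instance, a rough sketch of what replacing the brute-force similarity scan might look like with the Qdrant Python client; the collection name, vector size, and dummy vectors here are just placeholders for illustration:

```python
# Index the article embeddings in Qdrant and let it do the nearest-neighbour search
import numpy as np
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

articles = ["How to reset your password ...", "Billing runs monthly ..."]
vectors = np.random.rand(len(articles), 1536)  # stand-in for real embeddings

client = QdrantClient(":memory:")  # in-process mode; point at a server via url=... in production

client.recreate_collection(
    collection_name="kb_articles",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

client.upsert(
    collection_name="kb_articles",
    points=[
        PointStruct(id=i, vector=vec.tolist(), payload={"text": articles[i]})
        for i, vec in enumerate(vectors)
    ],
)

# At question time: embed the query the same way, then ask Qdrant for the neighbours
hits = client.search(
    collection_name="kb_articles",
    query_vector=np.random.rand(1536).tolist(),  # stand-in for the query embedding
    limit=3,
)
context = "\n\n".join(hit.payload["text"] for hit in hits)
```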
Also, libraries such as Langchain are integrated with both LLMs and vector databases, so you can prototype even faster.
I hadn't come across Qdrant before, but I will definitely check it out. I've been experimenting with milvus.io lately, but these vector databases weren't on my radar when I first started exploring embeddings.
Langchain looks incredibly fascinating too! At first glance, it seems like it could be my go-to library for prototyping. I appreciate your suggestion.