> "Do you mean to ask if I have a cutoff date for the data I was trained on? If so, the answer is yes. My training data includes text and code from various sources, and the most recent data I was trained on was from July 2023."
That can be true if it is using “tools” [1] and/or retrieval-augmented generation. Something doesn’t have to be in the training set to be returned to you and used in generation, as long as the model knows that a particular tool will be useful in responding to a particular prompt.
[1] This is what people call plugins that provide additional context to a GPT model.
They (Google) are probably using tools in a different way. I would imagine that if you ask Bard/Gemini something, it also runs a Google search at the same time and provides those results as potential context that the chatbot can use to answer with. So it runs a Google search for every question but doesn't always use the results.
ChatGPT, by contrast, only uses tools when it thinks it needs them. If it needs to do a search, it first has to respond with a search function call; that call runs the search and feeds the results back to the chatbot as context, and the chatbot can then respond from that data.
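That loop can be sketched in a few lines. This is a toy illustration of the pattern described above, not any vendor's actual implementation: the "model" is a stub, and `web_search` is a hypothetical tool.

```python
# Toy sketch of the tool-use loop: the model only requests a search when it
# decides it needs one; otherwise it answers from its training data.

def web_search(query):
    # Hypothetical tool: a real implementation would call a search API.
    return f"Top results for {query!r}: ..."

def fake_model(prompt, context=None):
    # Stub model: requests the tool only when it lacks context and the
    # prompt looks like it needs fresh information.
    if context is None and "latest" in prompt:
        return {"tool_call": {"name": "web_search", "arguments": {"query": prompt}}}
    return {"content": f"Answer based on: {context or 'training data'}"}

def chat(prompt):
    reply = fake_model(prompt)
    if "tool_call" in reply:                         # model decided it needs the tool
        call = reply["tool_call"]
        result = web_search(**call["arguments"])     # run the tool
        reply = fake_model(prompt, context=result)   # second pass, tool output as context
    return reply["content"]

print(chat("What is 2+2?"))          # answered directly
print(chat("latest Gemini news"))    # triggers the search round-trip
```

The key point is the extra round-trip: when the tool fires, the model is called twice, which is why tool-using answers feel slower.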
I think this is possibly true, but if it is, it blows GPT-4's use of "tools" out of the water. GPT-4 browsing the web is much slower and doesn't feel as well integrated. It feels about the same speed as me opening the page myself and reading it. Whatever Gemini did, it was significantly faster.
I don't know how they've specifically done it, either, but this is an area where Google has a ridiculous advantage over pure-play AI shops. It's highly likely they architected it for use cases like this from the outset, since the primary application of Gemini will be within Google's own products. They'll publish APIs, of course, and embed it within Vertex AI on Google Cloud, but since the primary utility of Gemini will be to improve Search, Maps, Travel, YouTube, etc., I'd imagine they had a first-class business requirement from the beginning along the lines of "must be easy to plug into existing Google data sources & products."
When Bard inserts that information unasked (as in something like "I'm sorry, but I don't have that information due to my training data cutoff being ..."), it may quote other, later dates. I got a response with "October 2023" at least once so far.
That error you got the first time means your query contains words that triggered the OpenAI content filter.
I agree with the other comments on the hallucinations in the content, which is why I included a disclaimer at the bottom of every page. This project is something I did just to test out the idea of an encyclopedia-like UI on top of GPT.
I found myself frequently using ChatGPT to learn about new topics. I prefer it over Wikipedia because when I don't understand something, I can just ask, and it will clarify until I get it. However, I found the chat UI not ideal for this sort of thing, so I created this website with a UX aimed at educational use.
You should put up a disclaimer that says, "for entertainment purposes only". Calling this an encyclopedia and marketing it as educational just seems like a bad idea. The whole idea behind a traditional encyclopedia is usually that it is written and vetted by experts.
Honestly, you'd be way better off just using a basic RAG architecture. When a user asks for a topic, simply mirror the Wikipedia article and put up a chat sidebar so the user can ask questions about it. At least by locking down your context window, you could minimize the number of hallucinations, which, judging from some of the other comments, sounds like it's already an issue.
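A minimal sketch of that idea, with stubs standing in for the article fetch and the LLM call (both function names here are made up for illustration):

```python
# Toy sketch of the suggested RAG setup: fetch one Wikipedia article, then
# answer questions only from that text. Locking the model's context to the
# article is what keeps hallucinations down.

def fetch_article(topic):
    # Stand-in for a real fetch, e.g. via the Wikipedia REST API.
    articles = {"Banana": "The banana is an elongated, edible fruit."}
    return articles.get(topic, "")

def answer_from_context(question, context):
    # Stand-in for the LLM call; the point is that the model only ever sees
    # `context`, so it can't pull facts from outside the article.
    if not context:
        return "No article found for that topic."
    return f"(answering only from the article) {context}"

article = fetch_article("Banana")
print(answer_from_context("What is a banana?", article))
```

In a real version, the prompt would instruct the model to say "not in the article" when the answer isn't in the provided text, rather than improvising.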
Sorry guys, but it seems the server has crashed due to a sudden influx of traffic, and I'm attending a funeral service at the moment so I don't have access to my laptop. Will try to get the site back up asap!
Sorry to hear you lost someone close to you. I’ve prayed that Jesus Christ provides comfort to you and others involved. Do what you need to do and we’ll look at the site whenever it’s back up. No rush.
Edit: Forgot to add, about the site crashing under heavy traffic: you might want to consider a CDN. Cloudflare is No. 1 and has a free option. StackPath was great but just shut down their CDN. I'm trying BunnyCDN now since it's pennies per GB.
I'm just new to devops stuff because the things I usually build don't get that much traffic, and a single server did the job without the need for CDNs, load balancers, etc. I had to figure this stuff out just now over the past few hours to help the site cope with all the load.
The images are real images from the web. In most cases they match the topic you search for but in some cases they turn out to be unrelated. (I already have an idea on how to try and improve accuracy here).
As for the conclusions, you're right now that you point it out: I don't recall coming across conclusion sections in other encyclopedias I've read. It's a format GPT (the underlying LLM I'm using) seems to use by default. I didn't disable that behavior because I figured a conclusion to wrap everything up for the reader isn't a bad thing?
TIL there are real blue bananas. I thought the image was generated by AI.
About the conclusions, I guess it's the soulless standard high-school essay format that must have a conclusion at the bottom. I think it's better to remove the conclusions so it looks more like Wikipedia, but if you like them you can (obviously) keep them.
I'll test out a conclusion-less format and see how my friends find it.
> I thought the image was generated by AI.
That was my initial plan, but I found AI-generated images to be more entertaining than informative, especially when the topic is new to the reader.
P.S. I don't know if you already tried this, but if you highlight any snippet of text in an entry, you can start a realtime chat about that text (without having to provide any context yourself).
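Under the hood, a feature like that just needs to pack the selection and the entry title into the prompt on the user's behalf. A rough sketch (the prompt wording and function name are my guesses, not the site's actual implementation):

```python
# Hypothetical sketch of the highlight-to-chat feature: the selected snippet
# and its entry title are bundled into the prompt automatically, so the user
# never has to restate the context themselves.

def build_chat_prompt(entry_title, selected_text):
    return (
        f"The user is reading an encyclopedia entry titled {entry_title!r} "
        f"and has highlighted the following passage:\n\n"
        f'"{selected_text}"\n\n'
        "Answer follow-up questions about this passage."
    )

print(build_chat_prompt("Blue Java banana", "The fruit has a blue-green peel."))
```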
And does something like this exist but for the web?