right now we support txt, csv, tsv, docx, xlsx, pdf, png, tif, tiff, jpg, and jpeg file types. the document store the files are read from can be either local files or AWS S3, so we can work with your messy files in S3 as they are, without any standardizing!
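to make the list concrete, here's a rough sketch of checking which files in a folder or bucket listing would be picked up. this is not our actual code: `SUPPORTED_EXTENSIONS`, `is_supported`, and `split_supported` are made-up names for illustration, and the case-insensitive matching is an assumption.

```python
from pathlib import Path

# Extensions taken from the list above.
SUPPORTED_EXTENSIONS = {
    "txt", "csv", "tsv", "docx", "xlsx", "pdf",
    "png", "tif", "tiff", "jpg", "jpeg",
}


def is_supported(filename: str) -> bool:
    """Return True if the file's extension is in the supported list.

    Assumption: matching is case-insensitive (e.g. "scan.TIFF" counts).
    """
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


def split_supported(filenames):
    """Split local paths or S3 keys into (supported, skipped) lists."""
    supported = [f for f in filenames if is_supported(f)]
    skipped = [f for f in filenames if not is_supported(f)]
    return supported, skipped
```

since it only looks at the name, the same check works on local paths and on S3 object keys.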
that's a good question. for this use case, we have a custom models feature where the user can define one or more entity types they're interested in, and then a custom NER model is trained to identify the specified types. for this, a combination of LLMs and user input is used to generate training data for the NER model.
I tried out Amazon Bedrock and used Tonic Validate to do a head-to-head comparison of very simple RAG systems built using the embedding and text models available in Amazon Bedrock. I compared Amazon Titan's embedding and text models to Cohere's embedding and text models in RAG systems that use Amazon Bedrock Knowledge Bases as the vector db and retrieval components of the system.
If the staff (which includes you) is all volunteers, there’s no reason to expect or want people to have a “get shit done” attitude. You should all appreciate that you’re putting in any time together towards a shared goal.
They show that decoder-only transformers (which GPTs are) are RNNs with infinite hidden state size. Infinite hidden state size is a pretty strong thing! Sounds interesting to me.
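Here's a toy sketch of how I read that claim (my own illustration, not the paper's construction): treat the cache of past keys and values as the RNN's hidden state. Each step appends one entry, so the state grows with the sequence instead of being a fixed size. To keep it tiny I'm assuming a single attention head with identity projections for queries, keys, and values; all function names are made up.

```python
import math


def attend(query, keys, values):
    """Softmax attention of one query vector over all given keys/values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def rnn_step(state, x):
    """One recurrent step. The state is the (keys, values) cache, and it
    grows by one entry per token: this is the 'unbounded hidden state'."""
    keys, values = state
    new_state = (keys + [x], values + [x])  # toy: identity q/k/v projections
    return new_state, attend(x, *new_state)


def run_recurrent(tokens):
    """Process tokens one at a time, RNN style."""
    state, outputs = ([], []), []
    for x in tokens:
        state, y = rnn_step(state, x)
        outputs.append(y)
    return outputs, state


def run_parallel(tokens):
    """Causal attention computed directly: position t sees tokens[0..t]."""
    return [attend(tokens[t], tokens[:t + 1], tokens[:t + 1])
            for t in range(len(tokens))]
```

Both functions give the same outputs; the only difference is that the recurrent version makes the ever-growing state explicit.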
Where we are today is a world where people do not generally worry about nuclear bombs being dropped. So seems like a pretty good outcome in that example.
The nuclear arms race led to the Cold War, not a "good outcome" IMO. It wasn't until nations started imposing those regulations that we got to the point we're at today with nuclear weapons.