Show HN: Get your unstructured data AI-ready in minutes (tonic.ai)
38 points by icoe on May 28, 2024 | 7 comments
Hey HN! We're excited to announce the launch of Tonic Textual, the secure data lakehouse for LLMs.

Simply stated, Tonic Textual allows you to build generative AI systems on your own unstructured data without having to spend time extracting and standardizing your data. In minutes you can build automated, scalable unstructured data pipelines that extract, centralize, standardize, and enrich data from your documents into an AI-optimized format ready for embedding, fine-tuning, and ingesting into a vector database. While in-flight, we also scan for sensitive information and protect it via redaction or synthetic data replacement so your data is never at risk of leaking.
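To make the two protection modes concrete, here is a minimal, generic sketch of redaction vs. synthetic replacement. It is not Tonic Textual's API. It assumes the sensitive entities have already been detected as hypothetical `(start, end, label)` character spans, and the fake-value pool and the `protect` function name are illustrative only.

```python
import random

# Hypothetical entity span: (start, end, label), as an NER step might report it.
Span = tuple[int, int, str]

# Hypothetical pools of fake values used for synthetic replacement.
FAKE_VALUES = {
    "NAME": ["Alex Morgan", "Jordan Lee"],
    "EMAIL": ["user@example.com"],
}

def protect(text: str, spans: list[Span], mode: str = "redact", seed: int = 0) -> str:
    """Replace each span with a label token ("redact") or a fake value ("synthesize").

    Assumes spans do not overlap; overlapping spans would need merging first.
    """
    rng = random.Random(seed)  # seeded so synthetic output is reproducible
    out, cursor = [], 0
    for start, end, label in sorted(spans):  # process spans left to right
        out.append(text[cursor:start])  # keep the untouched text before the span
        if mode == "redact":
            out.append(f"[{label}]")
        elif mode == "synthesize":
            # Fall back to a label token if no fake values exist for this label.
            out.append(rng.choice(FAKE_VALUES.get(label, [f"[{label}]"])))
        else:
            raise ValueError(f"unknown mode: {mode}")
        cursor = end
    out.append(text[cursor:])  # keep any text after the last span
    return "".join(out)
```

For example, `protect("Contact Jane Doe at jane@corp.com.", [(8, 16, "NAME"), (20, 33, "EMAIL")])` returns `"Contact [NAME] at [EMAIL]."`. Synthetic replacement keeps the text readable for embedding or fine-tuning, while redaction makes it obvious that something was removed.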

You can try Tonic Textual completely free today – sign up here: https://www.tonic.ai/textual

We'd love to hear your feedback and comments after you try it out!

Docs: https://docs.tonic.ai/textual
Demo: https://www.youtube.com/watch?v=pCKqz_9IfIk



What types of files and databases does this integrate with? Most of my files are in S3, and many of them are messy PDFs made from scanning physical documents. Do I need to convert them all to TXT or CSV files or something to get them to work right?


Right now we support TXT, CSV, TSV, DOCX, XLSX, PDF, PNG, TIF, TIFF, JPG, and JPEG files. Files can be read from either local storage or AWS S3 as the document store, so we can work with your messy files in S3 as they are, without any standardizing!
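If you want a quick pre-flight check of an S3 bucket against that list, a small sketch like the one below can help. It works on plain key names, so you could feed it keys from any listing (for example, boto3's `list_objects_v2` paginator). The helper name `split_supported` is hypothetical, and matching purely on file extension is an assumption, not how the product itself decides what it can read.

```python
from pathlib import PurePosixPath

# Extensions from the list above, lowercased for case-insensitive matching.
SUPPORTED = {"txt", "csv", "tsv", "docx", "xlsx", "pdf", "png", "tif", "tiff", "jpg", "jpeg"}

def split_supported(keys):
    """Partition object keys into (supported, skipped) by file extension."""
    supported, skipped = [], []
    for key in keys:
        # S3 keys use "/" separators, so PurePosixPath parses them correctly.
        ext = PurePosixPath(key).suffix.lower().lstrip(".")
        (supported if ext in SUPPORTED else skipped).append(key)
    return supported, skipped
```

For instance, `split_supported(["scans/2019/Contract.PDF", "notes", "data/q1.xlsx", "archive.tar.gz"])` puts the PDF and XLSX keys in the first list and the extensionless and `.gz` keys in the second.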


I'm curious what types of sensitive data you scan for, besides personally identifiable information? For example, how do you scan for IP that might be custom to an enterprise?


That's a good question. For this use case, we have a custom models feature: the user defines the entity type(s) they're interested in, and a custom NER model is trained to identify them. The training data for that NER model is generated from a combination of LLMs and user input.
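For readers unfamiliar with NER training data: one common representation is token-level BIO tags (B = beginning of an entity, I = inside, O = outside). The sketch below converts labeled character spans, such as ones an LLM or a user might produce, into that format. It is a generic illustration, not Tonic's training pipeline; the `to_bio` function, the simple regex tokenizer, and the `CODENAME` entity type are all hypothetical.

```python
import re

def to_bio(text, spans):
    """Convert (start, end, label) character spans into tokens and BIO tags."""
    tokens, tags = [], []
    started = set()  # indices of spans whose first token has already been tagged
    # Naive tokenizer: runs of word characters, or single punctuation marks.
    for m in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for i, (s, e, label) in enumerate(spans):
            if m.start() >= s and m.end() <= e:  # token lies fully inside the span
                tag = ("I-" if i in started else "B-") + label
                started.add(i)
                break
        tokens.append(m.group())
        tags.append(tag)
    return tokens, tags
```

For example, labeling an enterprise-specific code name in `"Project Bluebird ships in Q3."` with the span `(0, 16, "CODENAME")` yields the tags `B-CODENAME, I-CODENAME, O, O, O, O` for the tokens `Project, Bluebird, ships, in, Q3, .`.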


This is a significant advancement for securely managing unstructured data for generative AI. The ability to build automated, scalable data pipelines in minutes while ensuring data protection is impressive. Amazing work!


The ability to leverage unstructured data in this way is a game changer! Unstructured data has always been the elephant in the room when we talk about data protection in the enterprise. It is great to see someone (you guys!) tackling this head on and, even better, aligning it with LLMs.

The FWD View team has been busy trying out the trial and is excited to start bringing this capability to its clients.

Well done Team Tonic!


Thanks. When we talk to customers who are getting started with generative AI, we usually hear the two biggest concerns are how to avoid embarrassing data leaks and how to move quickly. We sincerely hope this makes a dent in both for everyone.



