I've been building a voice+vision AI assistant for tradesmen, industrial services, and DIY consumers for the last 8 months.
It streams vision+voice+text+spatial data to a multimodal LLM to help users solve problems on the job. Basically a "Cursor" for tradesmen. Most of my work lately has been figuring out what to put in the context window, when to put it there, and how to get the best deterministic response.
This assistant is a bridge between where we are now (little to no tech for these guys) and where the future will be (robotic automation). I think the in-between stage will last significantly longer than people realize, and I hope my app can provide value to these guys while they work. www.camerasearch.ai