alextttty's comments

alextttty · on Oct 17, 2024

Point is to actually avoid hallucinations, Think I didnt focus on it as much as I wanted. To avoid hallucinations, make sure you use strict prompt when creating the chatbot/api key. Combination of rag + strict prompt will make sure that AI does not go out of bounds of it contextualised knowledge and avoids using LLM's internal knowledge.

az09mugen · on Oct 18, 2024

On what basis you think you will avoid hallucination(s) ? LLM will always hallucinate. People did already some research, among which : https://arxiv.org/abs/2409.05746 and https://arxiv.org/abs/2401.11817 for example.

alextttty · on Aug 14, 2024

Made action to gen csvs + json Working on the mui data table frontend right now

alextttty · on Aug 14, 2024

Indeed!

fsndz · on Aug 14, 2024

and it changes the dynamics of the generative AI space completely ! absolutely exciting to watch. I am bullish on generative AI even if I think scaling laws will generate diminishing returns going forward.

alextttty · on Aug 14, 2024

Amazing! Please dont hesitate to open an issue or a PR Will update our dataset and add it.

alextttty · on Aug 14, 2024

Think its just dominating the industry with cheaper access to gpu's probably even subsidised pricing

alextttty · on Aug 14, 2024

Exactly, always best to rely on your hardware, we need to collect/add more data from self hosted models on different gpus/clouds to compare

edgoode · on Aug 15, 2024

We have 15+ clouds you can try on our platform if you're looking for a place to compare inference engines

Email me at ed at shadeform dot ai if we can help

alextttty · on Aug 14, 2024

Honestly this is very very fresh, I was tinkering with hosting some models and wanted to optimize costs, tried few inference engines. Just want to collaborate on organizing data.

Agree, we will add a MUI table very soon. Also some charts.

I genuinely want someone to roast the way I did my benchmark process described there. Want something good enough yet easy to run.

alextttty · on Aug 14, 2024

Thank you! yeah great source. Do they track throughput for open source models? and inference engines Thats kind of data I want to find as well

Deathmax · on Aug 14, 2024

For throughput data, well, you need to actually run prompts to gather the data which racks up costs fast and performance can vary based on input prompt lengths. The two sources I use are OpenRouter's provider breakdown [1] and Unify's runtime benchmarks [2].

[1]: https://openrouter.ai/models/meta-llama/llama-3.1-70b-instru...

[2]: https://unify.ai/benchmarks/llama-3.1-70b-chat

alextttty · on Aug 14, 2024

Yeah we want to do exactly this, benchmark and add more data from differnt gpus/cloud providers, will appreciate your help a lot! There are many inference engines which can be tested and updated to find best inference methods

Whiteshadow12 · on Aug 14, 2024

Goodluck, companies would love that. Don't get depressed unlike my tool I think you should charge, that might keep you motivated to keep doing the work.

It's a lot of work, your target users is companies that use Runpod and AWS/GCP/Azure, not Fireworks and Together, they are in the game of selling tokens, you are selling the cost of running seconds on GPUs.

agcat · on Aug 14, 2024

This is true especially if you are deploying custom or fine-tuned models. Infact, for my company i also ran benchmark tests where we tested cold-starts, performance consistency, scalability, and cost-effectiveness for models like Llama2 7Bn & Stable Diffusion across different providers - https://www.inferless.com/learn/the-state-of-serverless-gpus... Can save months of evaluation time. Do give it a read.

P.S: I am from Inferless.

alextttty · on Aug 14, 2024

Thank you!