More

backend-dev-33 · on April 10, 2025

After 100+ comments discussions here are quite difficult to read. So some kind of automatic summary of the popular discussions would be great. Somebody already did it: https://extraakt.com/

P.S. I am not the author: just found the link in the Llama 4 thread.

backend-dev-33 · on Oct 8, 2024

I was just told about this thing: https://aws.amazon.com/ec2/instance-types/g4/

one NVIDIA T4 GPU, 16 GB RAM, and, this is an EC2 instance, it means "install anything" all this for $0.526 /Hour

do you see any hidden gotchas?

backend-dev-33 · on Oct 8, 2024

UPDATE: how to do the same classification task using some hosting provider with GPU?

Let us discuss it here -> https://news.ycombinator.com/item?id=41768088

backend-dev-33 · on Oct 8, 2024

Thanks @Terretta

Well, the categories I use - do not overlap at all with the list of 1092 categories in Google Content Categories.

> it handles other classifications as well

hm... I highly doubt that. First of all - I do not see API to upload list of MY categories. Second: Can somebody with Google Cloud account try it? I have no account and when creating it - it asks for credit card...

backend-dev-33 · on Oct 7, 2024

thanks for Text Embedding Inference (never heard about it before)

> you can always run a zero shot pipeline in HF with a simple Flask/FastAPI application.

Yeah, sometimes things that are right in front of your nose, you don't see them. you mean this? https://huggingface.co/docs/api-inference/index

pilotneko · on Oct 10, 2024

Sorry, life got busy and I haven’t been able to get back to you. I was referring to pipelines in the Transformers package from Hugging Face. https://huggingface.co/docs/transformers/v4.45.2/en/main_cla...

These are essentially function calls for you to run pre-trained models. If you want to continue this conversation elsewhere, feel free to shoot me an e-mail. It’s just my username @ gmail.

backend-dev-33 · on Oct 7, 2024

Well, looking at Triton Inference Server + OpenVINO backend [1]...uff... as you said: "significant amount of development effort". Not easy to handle when you do it first time.

Is ONMX runtime + OpenVINO [2] a good idea ? Seems easier to install and to use: Pre-built Docker image and Python package... Not sure about performance (the hardware-related performance improvements - they are in OpenVINO anyway, right?).

[1] https://github.com/triton-inference-server/openvino_backend

[2] https://onnxruntime.ai/docs/execution-providers/OpenVINO-Exe...

kkielhofner · on Oct 8, 2024

Hah, it actually gets worse. What I was describing was the Triton ONNX backend with the OpenVINO execution accelerator[0] (not the OpenVINO backend itself). Clear as mud, right?

Your issue here is model performance with the additional challenge of offering it over a network socket across multiple requests and doing so in a performant manner.

Triton does things like dynamic batching[1] where throughput is increased significantly by aggregating disparate requests into one pass through the GPU.

A docker container for torch, ONNX, OpenVINO, etc isn't even natively going to offer a network socket. This is where people try to do things like rolling their own FastAPI API implementation (or something) only to discover it completely falls apart at any kind of load. That's development effort as well but it's a waste of time.

[0] - https://github.com/triton-inference-server/onnxruntime_backe...

[1] - https://docs.nvidia.com/deeplearning/triton-inference-server...

backend-dev-33 · on Oct 9, 2024

> additional challenge of offering it over a network socket across multiple requests and doing so in a performant manner.

@kkielhofner thanks a lot! now I realize it. I see, there is even GRPC support in Triton, so it make sense.

kkielhofner · on Oct 12, 2024

Make sure to check out the existing Triton client libraries:

https://github.com/triton-inference-server/client

backend-dev-33 · on April 9, 2023

is it possible to use SetFit through some command-line interface of some API?

backend-dev-33 · on July 13, 2019

And here Part1 as link: https://news.ycombinator.com/item?id=20256681

backend-dev-33 · on June 27, 2019

Well, these OSS developers created... 53 different commenting systems: https://lisakov.com/projects/open-source-comments/

Learning a new programming language or database? let's write a commenting system! )))

backend-dev-33 · on Aug 22, 2018

And Hetzner CX11 is €2.49 monthly and you have peace of mind regarding pricing (the price if FIXED).

Probably I have to rephrase the question to something like "Alternatives and side effects of Hetzner CX11 ?"

viraptor · on Aug 22, 2018

The per-hour pricing I mentioned is fixed for the micro instance. (Well... It can go lower :-) ) I'm taking about normal EC2 instances here, so that is a cx11 alternative.