mdeichmann's comments | Hacker News

Langfuse (https://langfuse.com) | Backend Engineer, Product Engineer, Design Engineer | Berlin, Germany | in-person | Full-time

I'm Max, co-founder and CTO of Langfuse (YC W23). We're building open source LLMOps dev tooling (GitHub: https://github.com/langfuse/langfuse/). We started with observability and have branched out into more workflows over time (evals, prompt management, playground, testing...). We have a good amount of traction and are looking for our fourth to sixth hires to scale the product and build feature depth. We're hiring in person (4-5 days/week) in Berlin, Germany (salary range for each role 70k-130k, up to 0.35% equity).

We value engineering quality at Langfuse: we try to find elegant solutions to complex engineering challenges, and we have invested significant effort in our production setup (infrastructure as code, a top-notch observability setup, and more).

If you'd love shipping in open source, writing about what you work on, and working hard alongside super interesting and sophisticated customers (devs), reach out!

More info: https://langfuse.com/careers


Through one of these posts I found this blog post - https://langfuse.com/blog/2024-12-langfuse-v3-infrastructure... - which I found very interesting. Maybe you should include it in your job description.


Are you open to sponsorship?


No.


Thank you! If these builders have some feedback to share, ask them to reach out to us :)


Langfuse (YC W23) | https://langfuse.com | Full-Time | Berlin, Germany | on-site | LLM Observability and Analytics

Langfuse is an open source [1] observability and analytics tool for LLM applications - think Amplitude and Datadog for LLM apps. Our users rely on Langfuse to understand what happens in production and use those insights to improve their applications. We built a number of LLM applications during the last YC winter batch and realized how hard it is to debug and improve them and to move beyond an MVP.

Details on the job:

- We work in person in our office in Berlin, Germany.

- We heavily use the T3 stack [2] (Next.js, Prisma, Tailwind, tRPC, shadcn), have client SDKs in TypeScript and Python, and expect you to have experience with full-stack TypeScript projects.

- You will work on topics such as improving the DX of the SDKs, thinking about and implementing architecture improvements, or rethinking how we visualize LLM traces in our UI.

- We want to build a tool that is recommended here on HN: you can build a tool you would want to use yourself.

Please see more details here: https://langfuse.com/careers or reach out directly to me: [email protected]

[1] https://github.com/langfuse/langfuse

[2] https://create.t3.gg/


Thanks a lot! We already see teams adopt Langfuse quite early. Say you have one or two engineers working on a rather complex LLM feature: they look for a solution like Langfuse in a test environment before going to production. The majority observes their LLM features in production, though. We don't see test-driven development as much, but we do think that model- and rule-based evals will become more important in the future, with CI only passing if a certain score is achieved.
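
To make the CI idea concrete, here is a minimal, hypothetical sketch of such an eval gate in Python; the test cases, scoring rule, and 0.8 threshold are illustrative assumptions, not part of Langfuse:

    # Hypothetical CI gate: run a simple rule-based eval over a small test set
    # and fail the build (non-zero exit) if the aggregate score drops too low.
    import sys

    TEST_CASES = [
        # e.g. exported prompt/completion pairs plus a simple expectation
        {"completion": "Paris is the capital of France.", "must_contain": "paris"},
        {"completion": "The answer is 42.", "must_contain": "42"},
    ]

    def score(case: dict) -> float:
        # Trivial rule-based check; a model-based eval could be swapped in here.
        return 1.0 if case["must_contain"] in case["completion"].lower() else 0.0

    def main() -> None:
        avg = sum(score(c) for c in TEST_CASES) / len(TEST_CASES)
        print(f"eval score: {avg:.2f}")
        if avg < 0.8:  # assumed threshold
            sys.exit(1)  # failing exit code makes the CI job fail

    if __name__ == "__main__":
        main()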


This reads like a book, thank you so much for putting this together!

> About value prop: Thanks for the feedback! We are already trying to be as vocal about it as possible, by writing great docs etc., but we can probably do better.

> PLG & OSS: thanks for the hint, we will be careful around managing deployments within customer VPCs.

> Pricing: We currently picked storage as the first metric to price on, as it varies a lot across users. Some use Langfuse to track complex embedding processes with a lot of context; others track just simple chat messages with relatively low-context, low-value events.

> OTel: We looked into it but did not go into all the details. We wanted to get a product out there fast and liked the experience of e.g. the PostHog SDKs. I might reach out to you concerning this topic after investing more time in it. Thanks for the offer!

> OLAP: Agreed, I also learned to tackle scaling issues once they appear, and so far we are good. Interesting that Supabase has no horizontal scaling - this would be one of the main reasons to use it IMO.


Re: Supabase and TimescaleDB.

Just to make it a bit clearer: Supabase can do some distribution via replication, but it isn't a true multi-master DB.

TimescaleDB does support a multi-node config (https://docs.timescale.com/self-hosted/latest/multinode-time...) on top of Postgres, but that isn't in the open source Apache-licensed version; it is only in Timescale's community BSL version, which isn't license-compatible with Supabase.

And yeah, please don't hesitate to reach out in regards to OTel... lots of opportunity but also not as simple ;)


Thanks for the suggestion. We love Tremor as it fits perfectly into our React/Tailwind setup. Cube is great for collecting data from multiple sources, caching aggregates, and providing an API to call from our React FE. I think this could be a solution for the future in case we run into performance issues or end up having data stored in different databases. I am rather wondering how we can provide our users with a Datadog-like dashboard experience. We would love to provide many different graphs, the ability to select and filter data, and maybe even SQL-like queries from the FE.


Thank you so much - we fully share your sentiment on this and have aligned our domain language with OpenTelemetry. Currently, users add lots of metadata and configuration details to a trace by manually instrumenting it using the SDKs (or via the LangChain integration). We are thinking about integrating OpenTelemetry, as this would be a step-function improvement in making integrations with apps easier. However, we haven't had the time yet to figure out how to capture all the metadata that's relevant as context for the trace.
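
To illustrate what manual instrumentation looks like in practice, here is a rough sketch; the Tracer class and its method names are hypothetical stand-ins for an SDK like Langfuse's, not its actual API:

    # Hypothetical sketch of manually attaching metadata to a trace around an
    # LLM call. Class and method names are illustrative, not the real SDK.
    import time
    import uuid

    class Tracer:
        def start_trace(self, name: str, user_id: str, metadata: dict) -> str:
            trace_id = str(uuid.uuid4())
            print("trace", trace_id, name, user_id, metadata)
            return trace_id

        def log_generation(self, trace_id: str, model: str, prompt: str,
                           completion: str, latency_ms: float) -> None:
            print("generation", trace_id, model, round(latency_ms, 1))

    tracer = Tracer()

    def answer(question: str, user_id: str) -> str:
        trace_id = tracer.start_trace(
            name="support-qa", user_id=user_id,
            metadata={"feature": "support-bot", "env": "prod"},  # assumed fields
        )
        start = time.time()
        completion = "stubbed LLM answer"  # call your model provider here
        tracer.log_generation(trace_id, model="gpt-3.5-turbo", prompt=question,
                              completion=completion,
                              latency_ms=(time.time() - start) * 1000)
        return completion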


Makes sense! If you're curious, I added an autoinstrumentation library for openai's python client here: https://github.com/cartermp/opentelemetry-instrument-openai-...

The main challenge I see is that since there's no standard across LLM providers for inputs/outputs (let alone retrieval APIs!), any kind of automatic instrumentation will need a bunch of adapters. I suppose LangChain helps here, but even then, with so many folks ripping it out for production, you're still in the same place.
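
As a rough illustration of that adapter problem, here is a hedged sketch using the OpenTelemetry Python API; the attribute names and response shapes are made up, precisely because no shared semantic convention exists:

    # Sketch: per-provider adapters mapping LLM responses onto a common set of
    # OTel span attributes. Attribute names are invented for illustration.
    from opentelemetry import trace

    tracer = trace.get_tracer("llm-instrumentation-sketch")

    def record_openai_chat(span, response: dict) -> None:
        # Adapter for an OpenAI-style chat completion response (shape assumed).
        span.set_attribute("llm.vendor", "openai")
        span.set_attribute("llm.model", response.get("model", "unknown"))
        usage = response.get("usage", {})
        span.set_attribute("llm.usage.prompt_tokens", usage.get("prompt_tokens", 0))
        span.set_attribute("llm.usage.completion_tokens", usage.get("completion_tokens", 0))

    def record_anthropic_completion(span, response: dict) -> None:
        # A different provider needs its own adapter to fill the same attributes.
        span.set_attribute("llm.vendor", "anthropic")
        span.set_attribute("llm.model", response.get("model", "unknown"))

    def call_llm(prompt: str) -> dict:
        with tracer.start_as_current_span("llm.chat") as span:
            # Stubbed response; a real call to the provider would go here.
            response = {"model": "gpt-3.5-turbo",
                        "usage": {"prompt_tokens": 12, "completion_tokens": 34}}
            record_openai_chat(span, response)
            return response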

Happy to collaborate on any design thinking for how to incorporate OTel support!


Yes, we were thinking about the lack of standards as well. I would be super happy to have a design discussion around the topic - I will reach out to you.


This is Max, one of the co-founders. We appreciate existing observability tools, as they have saved us a lot of time in the past already. Excited to get your view on this!

We've found many observability demands to be quite different when working on LLM applications. Mainly: input is unpredictable (users enter free-form text that cannot be fully tested for), control flow is highly dynamic when it depends on the textual output of a previous step, and the quality of the output is not known at runtime (to the application it is just text). Many teams manually read through the LLM inputs and outputs to get a feeling for correctness, or ask for user feedback.

In addition, we are currently working on an abstraction for model-based evals to make it simple to try which one works best for a use case and to automatically run it on all production prompts/completions. One user described the difference as follows: they usually use observability to know that nothing is going wrong, whereas they use Langfuse many hours per day to understand how best to improve the application and navigate cost/latency/quality trade-offs.
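
For readers unfamiliar with the idea, here is a minimal, hypothetical sketch of a model-based (LLM-as-judge) eval over production prompt/completion pairs; the judge prompt, the 0-1 scale, and call_judge_model are assumptions for illustration, not Langfuse's implementation:

    # Hypothetical LLM-as-judge eval run over production prompt/completion pairs.
    JUDGE_PROMPT = (
        "Rate the following answer for factual correctness on a scale from 0 to 1.\n"
        "Question: {question}\nAnswer: {answer}\nScore:"
    )

    def call_judge_model(prompt: str) -> str:
        return "0.9"  # placeholder; call an actual LLM here

    def judge(question: str, answer: str) -> float:
        raw = call_judge_model(JUDGE_PROMPT.format(question=question, answer=answer))
        try:
            return max(0.0, min(1.0, float(raw.strip())))
        except ValueError:
            return 0.0  # an unparsable judge response counts as a failure

    production_pairs = [("What is 2+2?", "4"), ("Capital of France?", "Paris")]
    scores = [judge(q, a) for q, a in production_pairs]
    print(f"mean correctness: {sum(scores) / len(scores):.2f}")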


Hi, this is Max - one of the founders of Langfuse - and I'm super excited to show Langfuse to HN today. Thanks a lot for the suggestion. I had not heard of Tinybird, but it seems like a great product. It could be valuable to use their materialized views to calculate aggregates for our analytics UI. We will need to discuss whether we can use them, as they are not open source. However, for anyone reading this: they use ClickHouse under the hood and have created a knowledge base (https://github.com/tinybirdco/clickhouse_knowledge_base). I will browse it to learn more.
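
To sketch what such an aggregate could look like on plain ClickHouse (via the clickhouse-driver Python client), here is a hedged example; the table and column names are invented and are not Langfuse's or Tinybird's actual schema:

    # Hypothetical materialized view pre-aggregating trace data for a dashboard.
    from clickhouse_driver import Client  # pip install clickhouse-driver

    client = Client(host="localhost")

    client.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS daily_token_usage
        ENGINE = SummingMergeTree()
        ORDER BY (project_id, day)
        AS
        SELECT
            project_id,
            toDate(created_at) AS day,
            sum(prompt_tokens + completion_tokens) AS total_tokens,
            count() AS generations
        FROM generations
        GROUP BY project_id, day
    """)

    # The analytics UI can then read cheap pre-aggregated rows instead of
    # scanning raw events on every page load.
    rows = client.execute(
        "SELECT day, total_tokens FROM daily_token_usage "
        "WHERE project_id = %(p)s ORDER BY day",
        {"p": "demo-project"},
    )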


GPT wrappers 🤝 ClickHouse wrappers

