
Pgserver is not maintained. I had to fork it as pgembed, compile recent versions, and bundle the BM25 and vector extensions.

Thank you for the fork! Back in November I made a PR to pgserver for postgres 18, but your version with vector extensions is even better!

Pgembed is pglite for native code.

Which one? https://github.com/Ladybug-Memory/pgembed or https://github.com/faokunega/pg-embed ?

Both will either download or ship the postgres binary with your app and run it in a separate process. Pglite is different: it's actually a library (they have stripped out some parts of the postgres source code), and it could in principle be ported to a native library (meaning you link to it and run as a single program).
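A minimal sketch of the two models in Python; the child process here is a trivial stand-in, not a real postgres binary:

```python
import subprocess
import sys

# pg-embed/pgembed model: the database is a separate process that your
# app launches at startup and tears down at exit. A stand-in child
# process is used here instead of a bundled postgres binary.
server = subprocess.Popen(
    [sys.executable, "-c", "import time; print('ready', flush=True); time.sleep(60)"],
    stdout=subprocess.PIPE,
    text=True,
)
assert server.stdout.readline().strip() == "ready"  # wait for startup
# ... the app would now talk to the server over a socket ...
server.terminate()
server.wait()
server.stdout.close()

# pglite model (in principle): no child process at all -- the database
# engine is linked into your program and called like any other library.
```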

There's even a draft PR to do this, but it became stale:

https://github.com/electric-sql/pglite/pull/842

Right now, what exists is grabbing the pglite wasm build and running it natively via WASI:

https://github.com/f0rr0/pglite-oxide


Experimenting with the pglite way, this time using vlang here:

https://github.com/adsharma/vpg

  v run cmd/vpg_test.v

The first one. It's forked from pgserver.

Yes, what you get is a multi-process postgres under the covers. But for many users, the convenience of "uv pip install ..." and auto cleanup via a context manager is the higher-order bit, the same convenience that takes them to SQLite.
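That convenience claim can be sketched like this (a minimal sketch with a hypothetical `EmbeddedPostgres` stand-in; pgembed's real API may differ):

```python
import contextlib

events = []  # records lifecycle steps so the cleanup is visible

# Hypothetical stand-in for an embedded-postgres handle; pgembed's
# real API may differ. The point is the lifecycle, not the names.
class EmbeddedPostgres:
    def start(self):
        events.append("started")
        return self

    def stop(self):
        events.append("stopped")

@contextlib.contextmanager
def embedded_postgres():
    db = EmbeddedPostgres().start()
    try:
        yield db
    finally:
        db.stop()  # cleanup happens even if the block raises

# SQLite-like ergonomics: no install step, no leftover server to clean up.
with embedded_postgres() as db:
    pass  # run queries against db here
```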

Of course the bundled extensions are the key differentiator.

With enough distribution and market opportunities we (yes, there is a for profit company behind it) can invest in making it truly embedded.


Frick, this has gotta be the tenth time I've read about pglite here, and I always go looking for a native library. Shame the PR became stale; webapps simply aren't always my end game. Thanks for putting pglite-oxide on my radar though, this is an interesting almost-there I can tinker with.

I am always checking out this project too, looking for a native build. I am quite happy that they added support for extensions though

There is a question of what benefit it would bring even if it's open sourced.

Static Python can be transpiled to Mojo. I haven't seen an argument for which concepts can only be expressed in Mojo and not in static Python.

Borrow checker? For sure. But I'm not convinced most people need it.

Mojo is therefore a great intermediate language to transpile to, at the same level of abstraction as Golang and Rust.
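For a concrete sense of the "static Python" subset: fully type-annotated code like this is ordinary Python today, and mechanically translatable because every type is known at compile time (the function here is just an illustration, not taken from py2many's test suite):

```python
# A fully type-annotated function: valid Python as-is, and translatable
# to Mojo/Rust/Go because every variable has a statically known type.
def fib(n: int) -> int:
    a: int = 0
    b: int = 1
    for _ in range(n):
        a, b = b, a + b
    return a

if __name__ == "__main__":
    print(fib(10))  # 55
```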


Python has a performance problem. Most people may not need it, but many people do. Languages like Rust and Go are heavily adopted by Python programmers either trying to understand low-level concepts or looking for something more performant.

And this is before we talk about the real selling point, which is enabling portable heterogeneous compute.


This is why transpilers exist.

py2many can compile static Python to Mojo, in addition to Rust and Golang.

Is it comprehensive? No. But it's deterministic. In the age of LLMs, with sufficient GPU you can either:

  * Get the LLM to enhance the transpiler to cover more of the language/stdlib
  * Accept the non-determinism for the hard cases

The way Mojo solves it is by stuffing two languages into one: there are two ways to write a function, for example (`def` and `fn`).

I don't think the cost imposed by a transpiler is worse. In fact, it gets better over time. As the transpiler improves, you stop thinking about the generated code.


Amazing that people still keep discovering it, while Google search fails to surface working implementations.

"Python to rust transpiler" -> pyrs (py2many is a successor)

"Python to go transpiler" -> pytago

Grumpy was written around a time when people thought golang would replace python. Google stopped supporting it a decade ago.

Even the 2022 project by a high school student got more SEO:

https://github.com/py2many/py2many/issues/518


Cython uses the C-API. This one doesn't.


Static python as described in this skill.

https://github.com/py2many/static-python-skill


Do you have LongMemEval numbers for pgvector vs pgvector+ hybrid search?


Thank you for the shout out! I looked into your benchmark setup a bit. Two things going on:

- Ladybug by default allocates 80% of the physical memory to the buffer pool. You can limit it. This wasn't the main reason.

- Much of the RSS is in ladybug native memory connected to the python connection object. I noticed that you keep the connection open between benchmark runs. For whatever reason, python is not able to garbage collect the memory.

We ran into similar lifetime issues with the golang and nodejs bindings as well: many race conditions where the garbage collector releases memory while another thread still holds a reference to native memory. We now require that the connection be closed for the memory to be released.

  https://github.com/LadybugDB/ladybug/issues/320
  https://github.com/LadybugDB/go-ladybug/issues/7
  https://github.com/LadybugDB/ladybug-nodejs/pull/1
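The resulting usage pattern looks roughly like this (a sketch with a stand-in connection class, not Ladybug's actual bindings): tie the native memory to an explicit `close()` rather than to the garbage collector.

```python
class Connection:
    """Stand-in for a binding that owns native (non-GC) memory."""

    def __init__(self):
        self.native_buffer = bytearray(1024)  # pretend native allocation

    def close(self):
        # Deterministic release: don't wait for (or trust) the GC.
        self.native_buffer = None

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

# Keeping a connection open across benchmark runs keeps the native
# memory alive; closing it releases the memory deterministically.
with Connection() as conn:
    assert conn.native_buffer is not None
```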


This is not just a random idea.

AlexNet -> Transformers -> ChatGPT -> Claude Code -> Small LMs serving KBs

Large LLMs could have a role in efficiently producing such KBs.


So this thing is based on Kiwix, which is based on the ZIM file format.

Meanwhile, wikipedia ships wikidata, which uses RDF dumps (probably 8x less compressed than they should be).

https://www.wikidata.org/wiki/Wikidata:Database_download

There is room for a third option leveraging commercial columnar database research.

https://adsharma.github.io/duckdb-wikidata-compression/


And for those who are only vaguely familiar, this ZIM file format is not the same as the https://zim-wiki.org one.


I am actually only vaguely familiar, and I was wondering about that every time I saw the format referenced but never bothered to check. Your comment is informative!


Yeah, I'm a long time user/disciple of https://zim-wiki.org ; it was basically Obsidian but 15-20 years early. To do some of the things that are now trivially easy with Obsidian I learned scripting and such, so I'm familiar with this very weird coincidence/name collision.


> and probably 8x less compressed than it should be

ZIM uses zstd, so it is pretty well compressed; the thing that takes a lot of room is actually the full-text search index built into each ZIM file.

Unfortunately the UI of kiwix-serve search doesn't take full advantage of this and the search experience kinda sucks...

Have you done anything useful with RDF? It seems like just one of those things universities spend money on that doesn't really do anything.


I'm really curious about what the world of archival formats is like. Is there consensus? Are the most-used formats actually any good, well-supported, and self-documenting?


Library of Congress has some well considered recommendations for archival. https://www.loc.gov/preservation/resources/rfs/TOC.html

For web content they recommend gzipped WARC. This is great for retaining the content, but isn’t easy to search or render.

I do WARC dumps then convert those to ZIM for easier access.

