The missing standard library for multithreading in JavaScript (github.com/w4g1)
109 points by W4G1 18 hours ago | 31 comments




I like this, but unfortunately it doesn't solve one annoying problem: lexical scope doesn't work and it will fail in an unexpected way.

If you reference something lexically, your code fails at runtime. Want to use an import? You have to use import() inside the closure you pass to spawn(). TypeScript doesn't know this. Your language server doesn't know this. Access a variable that shadows a built-in global? Now you're accessing the built-in global.
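To make the failure mode concrete, here is a minimal single-threaded sketch. It assumes spawn() ships the function as source text (as the Function.toString() discussion further down suggests) and imitates that with `new Function`, which only sees global scope. `rebuildInFreshScope` and `demo` are hypothetical names for illustration, not part of the library.

```javascript
// Hypothetical helper: turn a function into source text and rebuild it in a
// scope that lacks the original closure, roughly what a worker would see.
function rebuildInFreshScope(fn) {
  // Functions created with new Function() can only reach global scope.
  return new Function(`return (${fn.toString()});`)();
}

function demo() {
  const secret = 41;            // lexically captured local
  const parseInt = () => 999;   // local that shadows a built-in global

  const usesLocal = () => secret + 1;
  const usesShadow = () => parseInt("7");

  const results = {
    localDirect: usesLocal(),    // closure intact
    shadowDirect: usesShadow(),  // calls the local parseInt
    localRebuilt: null,
    // No error here: the rebuilt function quietly calls the global parseInt.
    shadowRebuilt: rebuildInFreshScope(usesShadow)(),
  };

  try {
    rebuildInFreshScope(usesLocal)();
  } catch (err) {
    results.localRebuilt = err.name; // `secret` does not exist in the new scope
  }
  return results;
}

console.log(demo());
// { localDirect: 42, shadowDirect: 999, localRebuilt: 'ReferenceError', shadowRebuilt: 7 }
```

The second case is the nastier one: nothing throws, the result is just different.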

The only way this could even be addressed is by having a full-on parser. Even then, you can't guarantee things will work.

I think the only "fix" is for JS to introduce a new syntax to have a function that can't access lexical scope, returning a value that either extends a subclass of Function or has a cheeky symbol set on it. At least then, it'll fail at compile time.


There is a simple solution to this problem, but it's not very popular: do what Workers do and require a separate file. All the tooling works out of the box, you have no issues with lexical scoping, etc. The only downside is it's (currently) clunky to work with, but that can be fixed with better interfaces.

I've been using a functionally identical implementation of this since I wrote it in my startup's codebase a decade ago. It's really handy, but definitely not without edge case issues. I've occasionally had to put in workarounds for false positive TypeScript/lint errors or a tool in the bundling pipeline trying to be too clever and breaking the output.

Overall it's great, and I'm glad to see a generic implementation of it which will hopefully become a thriving open source project, but ultimately it's a kludge. What's really needed is for JS to introduce a native standardized version of this construct which TypeScript and the rest of the ecosystem have to play nice with.


A linter rule provided by the library could be helpful here. I know it's just a workaround but probably easier than going for a solution that does compile time checks.


This should be the expected behavior when multithreading. It is the expected behavior when executing a child process, such as node’s child_process.fork.

Fork and normal worker threads always enter a script, so there's clearly no shared lexical scope. This spawn method executes a function, but that function can't interact with the scope outside it.

While I agree with GP that this should be the expected behavior, your comment raises what I think is a large problem/wild-goose-chase in ‘modern’ language designs implementing concurrency.

The push from language designers (this applies across the high/low level spectrum and at all ranges of success for languages) to make concurrent code ‘look just like’ linearly read, synchronous, single-threaded code is pervasive and seems to avoid large pushback by users of the language. The complaints that should be made against this syntax design become complaints that code doesn’t do what developers think it should.

My position is that concurrent (and parallel) code IS NOT sequential code and languages should embrace those differences. The move to or design of async/await is often explicitly argued for from this position. But the semantic differences in concurrent code IMO should not be obscured or obfuscated by seeking to conform that code to sequential code’s syntax.


As soon as I read your username, I had to read it out loud to my girlfriend. Why is it so funny?

I’d love a way to be able to specify that sort of thing. I wrote a little server-side JSX rendering layer, and event handlers were serialized to strings, and so they had similar restrictions.

Related tangent: Platformatic's Node.js "Watt" server^1 supports parallelization with kernel-level load balancing across CPU cores. It looks like a game-changer for Node performance and efficiency in production. Apparently the new AWS "Lambda Managed Instances" do something similar (though I'm short on details).

1. https://www.platformatichq.com/watt


Not "missing" at all. Stuff like this has been available for a decade, both as a library and as a compile-time optimization (which is arguably better).

One such example: https://github.com/developit/workerize-loader


This part is beautiful:

> Serialization Protocol: The library uses a custom "Envelope" protocol (PayloadType.RAW vs PayloadType.LIB). This allows complex objects like Mutex handles to be serialized, sent to a worker, and rehydrated into a functional object connected to the same SharedArrayBuffer on the other side.

It's kinda "well, yes, you can't share objects, but you can share memory. So make objects that are just thin wrappers around shared memory"
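A rough sketch of that idea, under stated assumptions: the envelope shape, the `kind` field, the `PayloadType` values and the `toEnvelope`/`fromEnvelope` helpers below are all made up for illustration, and the library's real protocol will differ. The point is only that the object is disposable while the SharedArrayBuffer is what actually carries the state.

```javascript
// Hypothetical envelope tags; the library's actual values are not shown in the thread.
const PayloadType = { RAW: 0, LIB: 1 };

// A thin wrapper: all state lives in one Int32 slot of shared memory.
class Mutex {
  constructor(buffer = new SharedArrayBuffer(4)) {
    this.buffer = buffer;
    this.state = new Int32Array(buffer); // state[0]: 0 = unlocked, 1 = locked
  }
  tryLock() {
    // Atomically flip 0 -> 1; succeeds only if the previous value was 0.
    return Atomics.compareExchange(this.state, 0, 0, 1) === 0;
  }
  unlock() {
    Atomics.store(this.state, 0, 0);
    Atomics.notify(this.state, 0, 1); // wake one waiter, if any
  }
}

// Sender side: replace the wrapper with a plain description plus the buffer.
function toEnvelope(value) {
  if (value instanceof Mutex) {
    return { type: PayloadType.LIB, kind: "Mutex", buffer: value.buffer };
  }
  return { type: PayloadType.RAW, value };
}

// Receiver side: build a fresh wrapper around the same shared memory.
function fromEnvelope(envelope) {
  if (envelope.type === PayloadType.LIB && envelope.kind === "Mutex") {
    return new Mutex(envelope.buffer);
  }
  return envelope.value;
}

const original = new Mutex();
const rehydrated = fromEnvelope(toEnvelope(original));
console.log(original.tryLock());   // true
console.log(rehydrated.tryLock()); // false: same lock, different object
original.unlock();
```

In a real setup the envelope would cross a postMessage() boundary, where a SharedArrayBuffer is shared rather than copied; here both sides run on one thread to keep the sketch self-contained.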


I'd be interested to see a comparison with https://piscinajs.dev/ - does this achieve more efficient data passing for example?

Lack of easy shared memory has always felt like a problem to me in this space, as often the computation I want to off-load requires (or returns) a lot of data.


This looks great. If it works as well as the readme suggests, this’ll let me reach for Bun in some of the scenarios where I currently reach for Go. TypeScript has become my favorite language, but the lack of efficient multithreading is sometimes a deal breaker.

Exactly my thoughts. The only incompatibility with Bun is the unavailability of the `using` keyword:

> If you are using Bun (which doesn't natively support using and uses a transpiler which is incompatible with this library)...

I skimmed Bun's issues but couldn't find anything about this except for: https://github.com/oven-sh/bun/discussions/4325


I added a bit more information about Bun compatibility:

> While Bun is supported and Bun does support the `using` keyword, its runtime automatically creates a polyfill for it whenever Function.toString() is called. This transpiled code relies on specific internal globals made available in the context where the function is serialized. Because the worker runs in a different isolated context where these globals are not registered, code with `using` will fail to execute.


I’ve played around with web workers and could never seem to get past the latency issues.

Interesting. Are you talking about the latency to spawn new workers, or getting data from the main thread to the worker? To give you an idea, this library uses a lazily initialized thread pool (thread-per-core by default), where tasks are shared between workers (like the Tokio library in Rust). This means workers only need to be initialized once, and passing data via structured clone is usually very fast and optimized in most engines. Better yet is to use ArrayBuffer or SharedArrayBuffer, which can be transferred or shared between threads without any serialization overhead.
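For anyone who wants to see the three options side by side, here is a small single-threaded sketch. It uses the global structuredClone() (available in recent Node versions and modern browsers) as a stand-in for what postMessage() does when data crosses to a worker; the function names are only for illustration and have nothing to do with the library's API.

```javascript
// 1. Structured clone: the receiver gets an independent copy.
function cloneCopy() {
  const original = { samples: new Float64Array([1, 2, 3]) };
  const copy = structuredClone(original);
  copy.samples[0] = 99;
  return original.samples[0]; // still 1, the copy is separate
}

// 2. Transfer: ownership of the ArrayBuffer moves, nothing is copied,
//    and the sender's reference is detached afterwards.
function transferBuffer() {
  const buffer = new ArrayBuffer(1024);
  const moved = structuredClone(buffer, { transfer: [buffer] });
  return { oldLength: buffer.byteLength, newLength: moved.byteLength };
}

// 3. Shared memory: both sides look at the same bytes.
function shareBuffer() {
  const shared = new SharedArrayBuffer(8);
  const mainView = new Int32Array(shared);
  const workerView = new Int32Array(shared); // stands in for the worker's view
  Atomics.store(mainView, 0, 7);
  return Atomics.load(workerView, 0);
}

console.log(cloneCopy());      // 1
console.log(transferBuffer()); // { oldLength: 0, newLength: 1024 }
console.log(shareBuffer());    // 7
```

With a real worker, the transfer case corresponds to passing the buffer in the transfer list of postMessage(), and the shared case to posting the SharedArrayBuffer itself.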

It usually came from serializing and deserializing objects, which here would go through a shared JSON buffer? But even then there’s a serialization bottleneck, right? You’d have to be mindful about how the context and closures work across boundaries. Then there’s also spinning up the workers, but I suppose you could do this ahead of time. Maybe my complaint is self-inflicted and is ultimately avoidable, but the complexity begins to mount.

There’s also the queuing and blocking nature of web workers. I wish they could process messages asynchronously the same way JS I/O works, but that’s not the case. Rather, you are batching full units of work. The mental model is different.

Anecdotally in Firefox I must have run into some memory leak issues and had to hard restart.

Ultimately I ended up going with service workers, which yes sounds strange but I found to be much easier to work with. Cancellable requests, async, long living in the background … but maybe it just works best for me ;)


This is incredible! The SharedJsonBuffer got me all excited!

Writing module bundlers in JavaScript had diminishing returns from multithreading because of the overhead of serializing and deserializing ASTs.

I wonder how far something like this would push the ceiling. Would love to see some benchmarks of this thing hauling ASTs around.


This seems very much worth a look!

(I suspect, to paraphrase Greenspun's rule, any sufficiently complicated app using Web Workers contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of this library...)


I'm confused why drop() is a function that you have to import inside the closure instead of a method.

It was a design decision to make the syntax feel as familiar to Rust as possible. But I do agree that it's a bit verbose and that it won't hurt to add a .dispose() handle to the objects themselves.
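The two styles can coexist fairly cheaply. Below is a minimal sketch, not the library's code: `Handle` and its fields are hypothetical, and this `drop` is a stand-in that simply delegates to the method. `Symbol.dispose` is the hook that the `using` keyword calls, so it is wired up only when the runtime defines it.

```javascript
// Hypothetical resource handle; not taken from the library.
class Handle {
  constructor(name) {
    this.name = name;
    this.released = false;
  }
  // Method style: handle.dispose()
  dispose() {
    if (this.released) return false; // already released, nothing to do
    this.released = true;
    return true;
  }
}

// Rust-style free function: drop(handle). It just delegates to the method.
function drop(handle) {
  return handle.dispose();
}

// Let `using` pick up the same cleanup where the runtime supports it.
if (typeof Symbol.dispose === "symbol") {
  Handle.prototype[Symbol.dispose] = function () {
    this.dispose();
  };
}

const guard = new Handle("lock-guard");
console.log(drop(guard));     // true
console.log(guard.dispose()); // false: second release is a no-op
```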

From an overall system point of view, this is the current pinnacle of footgun design.

The OS does thread management and scheduling, facilitates IPC, locking, etc. All of this is one big largely-solved problem (at least for the kind of things most people are doing in JavaScript today). But because of history, we now have a very popular language and runtimes that are trying to replicate all these features, reinventing wheels, and adding layers of inefficiency to the overall execution.

Sigh.


I don’t disagree with you about the additional inefficiency that is very likely to accumulate as JS adds more and more ‘features’ (via the language, frameworks, or libraries). But as a genuine question, isn’t this reimplementation (or any comparable library for multithreading) required by JavaScript’s position on sandboxing? I would be suspicious of intent if browsers were allowed to spawn any number of threads to execute non-trusted scripts at the level typically seen from more native application code.

Allowing access to native threading doesn’t imply that the API provided by the language is unrestricted. There is a (very wide) middle zone to land in.

Documentation here is exceptionally well written for a JS project, although move() doing different things depending on the type of data you pass to it feels like a foot-gun. Also, how is it blocking access to arrays you pass to it?


The implementation of the shared JSON buffer is nuts.

This is cool! Hope we can get multi-threaded wasm some time soon.


