Python has a list of issues that are fundamentally broken in the language itself, and it relies heavily on integrated library bindings to reach reasonable speed and accuracy.
Julia allows embedding both R and Python code, and has some very nice tools for drilling down into datasets.
It is the first language I've seen in decades that reduces entire paradigms to single-character syntax, often outperforming both C and NumPy. =3
Griefers ranting on a blog about years-old _closed_ tickets against v1.0.5 as some sort of proof of lameness... is a poorly structured argument. Julia includes regression-testing features built into even its plotting library output, so issues usually stay resolved thanks to pedantic reproducibility. Also, running sanity checks on code in any LLVM language is usually wise.
To be blunt: Moore's law is now effectively dead, and chasing the monolithic philosophy with lazy monads will eventually limit your options.
Languages like Julia handle conditional parallelism much more cleanly with the broadcast operator, and offer transparent remote-host process instancing over ssh (though this still needs a lot of work to reach OTP-like cluster functionality).
Much like Go, porting library resources into the native language quietly moves devs away from the same polyglot issues that hit Python.
Python's threading and computational-errata issues go back a long time. It is a popular integration "glue" language, but it leans on SWIG-style wrappers to work around its many unresolved/unsolvable problems.
Not a "smear", but rather a well known limitation of the language. Perhaps your environment context works differently than mine.
It is bizarre people get emotionally invested in something so trivial and mundane. Julia is at v1.12.2 so YMMV, but Queryverse is a lot of fun =3
Technically, the entire point of a Venn diagram is to visualize all possible set relations among N sets; that is explicitly its "practical" use.
In popular terminology they are very often confused with Euler diagrams [0], which represent only the meaningful relations among sets rather than all possible ones. You shouldn't create Euler diagrams this complex, but the raison d'être of Venn diagrams is to visualize the full complexity of set relations.
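The "all possible relations" point can be made concrete: an N-set Venn diagram must depict every one of the 2^N − 1 non-empty membership regions, whereas an Euler diagram draws only the ones that actually occur. A quick stdlib sketch (set names are hypothetical) enumerating those regions:

```python
from itertools import combinations

def venn_regions(sets):
    """Enumerate every non-empty membership pattern an N-set Venn
    diagram must show: each region lies inside some non-empty subset
    of the sets and outside all the rest."""
    names = list(sets)
    regions = []
    for r in range(1, len(names) + 1):
        for inside in combinations(names, r):
            regions.append(set(inside))
    return regions

# Three sets -> 2**3 - 1 = 7 regions, matching the classic diagram;
# five sets -> 31, which is why 5-way Venn diagrams look so hairy.
print(len(venn_regions(["A", "B", "C"])))  # 7
print(len(venn_regions(["A", "B", "C", "D", "E"])))  # 31
```

This region count grows exponentially, which is exactly why complex Venn diagrams are hard to read while remaining complete.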
There is always the complicated-wires puzzle from "Keep Talking and Nobody Explodes", where a 5-way Venn diagram encodes the action you need to take for a given state.
I agree it is a profound question. My thesis is fairly boring.
For any given clustering task of interest, there is no single value of K.
Clustering & unsupervised machine learning is as much about creating meaning and structure as it is about discovering or revealing it.
Take the case of biological taxonomy: what K will best segment the animal kingdom?
There is no true value of K. If your answer is for a child, maybe it's six, corresponding to what we're taught in school: mammals, birds, reptiles, amphibians, fish, and invertebrates.
If your answer is for a zoologist, obviously this won’t do.
Every clustering task of interest is like this. And I say "of interest" because clustering things like digits in the classic MNIST dataset is better posed as a classification problem: the categories are defined analytically.
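The "no single K" point shows up even on toy data: the same points can be partitioned coarsely or finely, and both answers are internally coherent. A minimal stdlib sketch (the data and the 1-D k-means below are hypothetical, just for illustration):

```python
import random

def kmeans_1d(points, k, iters=50, seed=0):
    """Tiny 1-D k-means, stdlib only, purely illustrative."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Three lumps that also read perfectly well as two groups.
data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]
print(kmeans_1d(data, 2))  # a coarse, "for a child" partition
print(kmeans_1d(data, 3))  # a finer, "for a zoologist" partition
```

Both runs converge to defensible clusterings; nothing in the data alone tells you which K is "true" — that depends on what the clustering is for.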
What use-cases do you see for the 270M’s embeddings, and should we be sticking to token embeddings or can we meaningfully pool for sentence/document embeddings?
Do we need to fine-tune for the embeddings to be meaningful at the sentence/document level?
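One common baseline for the sentence/document question is masked mean pooling over token embeddings: average only the non-padding token vectors into a single vector. Whether that vector is *meaningful* without fine-tuning is exactly the open question above. A minimal sketch with hypothetical toy vectors (no specific model or library assumed):

```python
def mean_pool(token_embeddings, attention_mask):
    """Masked mean pooling: average only the real (non-padding) token
    vectors to produce one sentence/document embedding."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for j in range(dim):
                total[j] += vec[j]
    return [t / count for t in total]

# Hypothetical 4-token, 3-dim sequence with one padding position.
tokens = [[1.0, 0.0, 2.0],
          [3.0, 4.0, 0.0],
          [5.0, 2.0, 1.0],
          [0.0, 0.0, 0.0]]   # padding slot, excluded by the mask
mask = [1, 1, 1, 0]
print(mean_pool(tokens, mask))  # -> [3.0, 2.0, 1.0]
```

Masking matters because averaging padding vectors in would drag the pooled embedding toward zero for short sequences.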