Julia is a nice language; it's just tough to compete with Python.
- The beginner experience in Julia is still much worse than it is in Python. Stuff that should work intuitively sometimes doesn't, and when you get a cryptic error message, it's difficult to find relevant help online. And when you do find help, some of it is out of date because the language has changed over the past few years.
- You can squeeze a lot of performance out of Python and the ecosystem of libraries is hard to beat.
- Julia has to be way better than Python to give people an incentive to switch. Being just marginally better in some aspects of the language isn't enough. And it's very difficult to be much better than Python, especially in usability and ecosystem.
> Julia has to be way better than Python to give people an incentive to switch.
A language doesn't necessarily have to give all the old programmers an incentive to switch, if it can position itself as a good language for new programmers to learn.
For example: at our institute (computational biology), we had a PhD student who was an early Julia adopter and wrote his model in that. Several students have since joined the project he started, so obviously they're now writing Julia too. That project's experiences with the language were so good, it soon became obvious that for our use case, Julia was superior to any other language we'd used so far. So pretty much the whole research group has now shifted to Julia, and that's what we teach new students. Slowly, other groups in our institute became interested, and more and more people are adopting it, which in turn means that their new students will also end up learning it in future.
If you work with a lot of data, Julia is already a 10-100x improvement over Python.
Being able to iterate and mangle huge columns with real lambdas and without having to marshal arguments to/from C++ is a huge advantage.
Where I used to spend hours in aggregate searching through docs for pandas/numpy, for stupid shit like "how do I shift but also skip NaNs", now I just write a for-loop in a couple minutes and get on with my work.
There's a whole subclass of tasks in R/pandas to work around the interpreter that just aren't needed in Julia.
For me at least it's well worth the syntactical warts and slow interpreter.
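For anyone curious what the "shift but also skip NaNs" dance looks like on the pandas side, here's a minimal sketch (variable names are made up for illustration); the trick is to shift only the non-NaN values and leave the NaN positions alone:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.0, np.nan, 3.0])

# "shift, but skip NaNs": shift the non-NaN values forward while
# keeping the NaN slots exactly where they were
mask = s.notna()
shifted = s.copy()
shifted[mask] = s[mask].shift()
# shifted is now [NaN, NaN, 1.0, NaN, 2.0]
```

None of this is hard once you know the idiom, but finding the idiom is the part that costs hours; in a language where plain loops are fast, you'd just write the loop.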
As an experienced python/data science user, this (creating fast complex column-wise transforms) is rarely a problem for me.
The truly huge advantage for Julia is how it plays with parallelism. The GIL makes it an absolute pain to do parallelism in Python. It always ends up in threading hacks with numba or joblib, or multiprocessing, which has its own unfixable flaws.
Examples: Basic and C (both to a degree), Visual Basic, PHP, JavaScript, Python. I'm probably missing some. These displaced older languages just by being adopted by newbies.
This is an insightful and level-headed comment that applies equally to R.
Although Julia is a growing alternative to Fortran/C/etc for long-running computations, it remains awkward and unpleasant for interactive analysis. Users familiar with Python/R/etc must weigh the benefits of Julia against its slow library startup, its cryptic error messages, and its thin documentation.
Also, the lack of a community repository for well-vetted Julia libraries can limit uptake by professional researchers who must be able to trust their tools. A real strength of R (in comparison not just to Julia but also to python) is that such a repository exists, and that it has automated testing across a range of computer architectures and versions of R, including not just unit testing within individual libraries, but also testing of related libraries.
> Julia is a growing alternative to Fortran/C/etc for long-running computations, it remains awkward and unpleasant for interactive analysis
Did you mean to write short-running scripts? If anything, the Julia dev workflow is biased towards interactive analysis in a REPL a la R or IPython/Jupyter. I don't mean to imply that there's no startup overhead, but how often are you restarting the REPL when doing EDA? Unless it's more than once every few minutes (which is a very odd workflow), then startup overhead is effectively amortized.
> A real strength of R (in comparison not just to Julia but also to python) is that such a repository exists
CRAN is certainly a cut above many other package repos here, but I'm not sure "trust their tools" can apply to all packages on there. Anecdotally, I've had a lot of issues with compiled dependencies and missing/out of date external assets on less well-trodden packages. There's a reason MRAN, Conda and JuliaBinaryWrappers exist after all.
For whatever reason, Julia package maintainers also seem more receptive to making their work compatible with other libraries. This goes beyond just multiple dispatch--imagine if the tidyverse/non-tidyverse divide weren't such a hard split.
There are warts with the beginner experience with Python, principally the awful situation with packaging.
If you care about performance in code that mixes together several packages in nontrivial ways, Julia is way better than Python.
There's a far broader range of libraries in Python than in Julia, but that won't prevent adoption of Julia when its performance advantages are crucial, because of the excellent facilities for using Python from Julia.
I'm not sure the package problem is really a problem for beginners. Just within the last year I've seen people in undergraduate classes, in graduate classes, and at work try Python for the first time, and the default install of Anaconda worked for them in every case. The classes were taught by different professors, and they all suggested Anaconda independently and were not Python programmers.
It is overkill/brute force to install all the Anaconda default packages when a beginner is not going to use over maybe 5-10 libraries, but it's a solution that has worked flawlessly for beginners from my experience watching non-software engineers and "non-technical" people using Linux, Windows, and MacOS try Python for the first time in math and data science classes.
You can, but Pluto is (I would argue) a better notebook than Jupyter. Regardless, the latency experienced with Julia applies equally whether you use Pluto, Jupyter, or any other front-end.
You absolutely can use regular jupyter notebooks for julia! Pluto has some advantages, like being stored as a normal julia file. The julia startup time issues affect both.
Oh, man, this is indeed a major feature. My main point of friction with jupyter notebooks is the stupid json ipynb format. Why can't it be just a regular language file with comments?
> They contain code, rendered Markdown, images, plots, video players, widgets, etc.
The code could be verbatim Python code (or whatever language the notebook uses), and the rest could be embedded inside comments. I don't see any problem with that (besides the very concept of "rendered Markdown" being totally out of order). The fact that they save it as JSON by default seems to be more laziness by the developers than a well-thought-out solution; it could be just a straightforward serializer.
>and the rest could be embedded inside comments. I don't see any problem with that
Do you mean embedding images and plots inside comments? If yes, please elaborate on how you see that happening in the real world.
>The fact that they are saving it as json by default seems more to be laziness by the developers than a well thought-out solution, that could be just a straightforward serializer.
So, how would that well-thought-out solution in the form of a "straightforward serializer" work? I have a flat file, and I want to display images, plots that you can zoom into and out of, figures, etc. as comments. How would that happen?
>At the very least, you could put the whole json stuff inside a comment. It's already plain text, isn't it?
So instead of having the whole file as JSON, which is lazy and not well thought-out, we'll put all content in JSON, then put that JSON inside a comment in a plain text file. Do I read you correctly?
I feel we're making progress faster than these lazy Jupyter org bandits.
> we'll put all content in JSON, then put that JSON inside a comment in a plain text file
Only the "output" content. The code inside the cells is verbatim, and the markdwon cells are regular text comments.
See, I'm not arguing just for the sake of it. I have a legitimate problem with ipynb: very often I want to run the code of a notebook from the command line, or import it from another Python program. This is quite cumbersome with ipynb, but it would be trivial if it were a simple program with comments.
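To be fair to the "it's just JSON" side: pulling the runnable code back out of an .ipynb is only a few lines, since code cells are entries with `cell_type == "code"`. A minimal sketch (the function name is made up):

```python
import json

def extract_code(ipynb_path):
    """Concatenate the code cells of a notebook into one runnable script."""
    with open(ipynb_path) as f:
        nb = json.load(f)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            # "source" is a list of lines (or a single string) in the ipynb format
            src = cell["source"]
            chunks.append("".join(src) if isinstance(src, list) else src)
    return "\n\n".join(chunks)
```

Still more friction than `python notebook.py`, which is the complaint, but it's a workaround rather than a wall.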
I believe people reading this are not detecting the sarcasm. I'm demonstrating that the Jupyter folks are not lazy engineers, and the "obvious" solutions people come up with are not that well thought-out when you start actually thinking about them.
You can also use VS Code notebooks, and Julia support in VS Code keeps getting better. As a newcomer to Julia I am super impressed with the experience. There's no getting around loading the Plots package, but producing a high-quality plot and getting the data there is a much more enjoyable experience than pandas + numpy + Matplotlib + whatever tensor framework you've sworn allegiance to.
Do you still have latency issues in Julia 1.6? The latency improvements in the last 3 versions of julia have been so significant that I do not really notice it anymore. Supposedly there are additional speedups planned for 1.7.
I also recently tried the beta version of Julia 1.6, and the speed improvements for installing/loading packages are quite impressive. Essentially, packages get precompiled after installation using multiple threads.
Besides this, if you only infrequently install/update packages, you can use PackageCompiler.jl. I use it for PyPlot.jl (based on matplotlib), DataFrames.jl, ... and plotting some data is quasi-instantaneous, as it is in Python (even the very first time in a session).
Julia focuses on scientific and numerical computing, and is overtaking the Python/numpy combo in that niche. In addition to being considerably faster than Python, it also has quite a few innovative libraries in the area. This can also extend into machine learning, where Python has been the go-to language despite its limitations.
For other areas, like web programming, there is no sign of Julia replacing Python in the foreseeable future.
I second this. Python is actually starting to get significant traction in the scientific community. Depending on the field, R, Fortran and Matlab (and even C++) still have a huge lead.
It's nice that Julia is getting noticed, but it's a distant blip on the radar.
The sci community is really hard to move away from existing battle-tested and performant libraries.
I don’t have much insight on the scientific computing landscape in general, but here’s one notable data point: I worked on the CMS experiment of LHC (Large Hadron Collider) for a while, which is one of the highest profile experiments in experimental physics. The majority of CMS code is C++, which you can check for yourself at https://github.com/cms-sw/cmssw (yes, much/most? of the code is open source). What I worked on specifically was prototyped in Python, then ported to C++ and plugged into the massive data processing pipeline where performance is critical due to the sheer amount of data. So I probably wouldn’t put C++ in parentheses.
This need to rewrite, of course, is what Julia is trying to avoid. My workflow is exactly the same, and I’d love to be able to write code in a high-level language like Python and then use that directly instead of having to rewrite.
However, in my case the reason for rewriting isn’t just performance, but also to be able to build compiled binaries. Julia aims to be as high-level as Python but faster - is there a language that’s as high-level as Python but AOT-compiled?
Cython - in fact I think in 2021 if you want to write a pure C or pure C++ program, Cython is the best way to go, and just disable use of CPython.
The “need to rewrite” is actually a sort of advantage with Cython. You only target small pieces of your program to be compiled to C or C++ for optimization, and the rest where runtime is already fast enough or otherwise doesn’t matter, you seamlessly write in plain Python.
Using extension modules is just a time-tested, highly organized, modular, robust design pattern.
Julia and others do themselves a disservice by trying to make "the whole language automatically optimized," which counter-intuitively is worse than making the language overall optimized for flexibility instead of speed, yet with an easy system to patch optimization modules anywhere they are needed.
I have been using Pythran for the last year, and the nice thing is that you hardly have to rewrite anything but get speeds which are often as fast as (or sometimes faster than) C modules.
The problem with cython is that to really get the performance benefits your code looks almost like C.
I agree with you on the optimize the bits that matter, often the performance critical parts are very small fractions of the overall code base.
> Using extension modules is just a time-tested, highly organized, modular, robust design pattern.
I really don't get this. I'm fully on the side that limitations may increase design quality. E.g. I accept the argument that Haskell immutability often leads to good design, and I believe the same is true for Rust ownership rules (they often force a design where components have a well-defined responsibility: this component only manages resource X starting from { until }).
But having a performance boundary between components, why would that help?
E.g. This algorithm will be fast with floats but will be slow with complex numbers. Or: You can provide X,Y as callback function to our component, it will be blessed and fast, but providing your custom function Z it will be slow.
So you should implement support for callback Z in a different layer but not for callback X,Y, and you should rewrite your algorithm in a lower level layer just to support complex numbers.
Will this really lead to a better design?
> “But having a performance boundary between components, why would that help?”
It helps precisely so you don’t pay premature abstraction costs to over-generalize the performance patterns.
One of my biggest complaints with Julia is that zealots for the language insist these permeating abstractions are costless, but they totally aren’t. Sometimes I’m way better off if not everything up the entire language stack is differentiable and carries baggage with it needed for that underlying architecture. But Julia hasn’t given me the choice of this little piece that does benefit from it vs that little piece that, by virtue of being built on top of the same differentiability, is just bloat or premature optimization.
> “you should rewrite your algorithm in a lower level layer just to support complex numbers.”
Yes, precisely. This maximally avoids premature abstraction and premature extensibility. And if, like in Cython, the process of “rewriting” the algorithm is essentially instantaneous, easy, pleasant to work with, then the cost is even lower.
2. Allow each to pursue optimization independently, with clear boundaries and API constraints if you want to hook in
3. When possible, automate large classes of transpilation from outside the separate restricted computation domains to inside them (eg JITs like numba), but never seek a pan-everything JIT that destroys the clear boundaries
4. For everything else (eg cases where you deliberately don’t want a JIT auto-optimizing because you need to restrict the scope or you need finer control), use Cython and write your Python modules seamlessly with some optimization-targeting patches in C/C++ and the rest in just normal, easy to use Python.
> One of my biggest complaints with Julia is that zealots for the language insist these permeating abstractions are costless, but they totally aren’t.
This sounds like it might be interesting, but your later comments about overhead and abstraction costs sound like you maybe don't understand what Julia's JIT is actually doing and how it leverages multiple dispatch and unboxing. Could you be a bit more concrete?
No I think that’s what I’m saying. When raising the issue that using multiple dispatch this way is premature abstraction that has intrinsic costs, all I get is the religious pamphlet about multiple dispatch.
In practice the multiple dispatch overhead is elided by the compiler. If it can’t be you’re doing something truly dynamic, which is generally unavoidably slower. It’s still a better place to be than everything being a generic Object type.
The nice thing about Cython is that you can have both - all the multiple dispatch you want with fused types, or escape that paradigm to do other things if you desire. It gives a lot of surgical control.
I don’t think that is true. As far as I know, Cython lets you do function overloading and single dispatch via class inheritance. I think you also miss out on the type inference that lets you do things like pipe dual numbers through functions without any dispatch-related overhead.
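For readers who haven't met the terminology: "single dispatch" picks the implementation from the type of one argument only. Python's own stdlib version makes the limitation easy to see (this is a plain-Python sketch of the concept, not Cython, and the function names are made up):

```python
from functools import singledispatch

@singledispatch
def combine(a, b):
    # fallback when no registered type matches
    return f"generic({a}, {b})"

@combine.register
def _(a: int, b):
    return f"int-first({a}, {b})"

@combine.register
def _(a: str, b):
    return f"str-first({a}, {b})"

# dispatch looks only at the FIRST argument's type;
# multiple dispatch (as in Julia) would consider both
print(combine(1, "x"))   # int-first(1, x)
print(combine("x", 1))   # str-first(x, 1)
print(combine(1.5, 1))   # generic(1.5, 1)
```

Julia's method selection considers the types of all arguments at once, which is what makes things like dual-number piping work without manual plumbing.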
Does compiling with Cython decrease the FFI overhead of the calls into native code? My problem with numpy has always been that I have to make a lot of calls on small bits of data, and the FFI overhead eats all my performance gains. If I put more logic on the native side and made fewer, bigger calls it would be faster, but that often doesn't make sense, or it's a slippery slope where moving logic to the native side pulls over a data structure, then another related bit of logic, until I just have a tiny bit of Python left.
Probably. Cython compiles a C-style superset of Python into C. Then a C compiler compiles that to a Python-importable DLL/.so. So, the overhead to call a C function is no more than declaring its types (programmer person overhead) and then, in the generated C, the native C-linkage function can be called like any other. Now, just one C function calling another from another translation unit (i.e. object file or shared lib) can be "high" overhead (nothing like Py FFI), but you may also be able to eliminate that with modern compilers with link-time-optimization with some build environment care.
Just for reference, my experience is mostly computational genomics. R is king of analysis, and most of the actual "meat" is implemented in C++. But I work with other teams as well, so the experience is a bit more varied if you look across different areas.
It's all about which "bubble" you're in. Many people posting here work for startups using micro services (for which Go is a decent fit) and for companies close to the whole Docker/Kubernetes ecosystem, which is based on Go. So naturally they assume Go is huge.
My anecdata kind of tells me that Go is reasonably big, but it's not yet near .NET and Java, worldwide. But it could get there in a few years, I've seen/heard about some enterprises adopting it.
True, but I'm not talking about simple users. I'm talking about companies extending Kubernetes or building adjacent software. Even if their service doesn't necessarily integrate with Kubernetes, there is frequently a temptation to "follow your heroes".
Look at the whole Cloud Native Foundation thing, I think most of their projects are developed using Go.
So if you're using that stack, it's easy to assume that all new development everywhere is in Go.
It will probably balance out once the newness wears off Go (I think this is already happening).
It's not overtaking at all. It's seen growth in some areas.
The issue with regards to web programming/other programming is important, because sometimes it's useful to make a website/build another tool as a scientist. Python can do both easily.
They're not exactly mature frameworks yet though, which is more the point I'm making. Of course, you can do most things in Julia, but does it provide a good experience for it yet?
There are so many tools coming up around this in Julia that it is arguably a problem.
There was a whole session at last JuliaCon that was just on web-dashboard tools like Dash.jl and Stipple.jl and several others.
And there was another half-session worth of other talks about web related things.
Seriously?!? Julia has no hope of overtaking Python in numerical computing by 2032, expecting movement by 2022 is just delusional. Here is a better prediction: by 2022 people using Python for numerical computing who started doing so in the previous year will exceed the number of people who have ever downloaded Julia since it was first released.
No, not seriously. But also seriously. The article is based on % change which is of course ridiculous because the % increase of a small population isn't at all interesting. And GP also has a ridiculous claim. So I'm offering a stake in the ground to determine if Julia is on the track that GP claims. If GP wants to come back and discuss where the stake should be, it will be an interesting conversation.
You can call Python directly from Julia https://github.com/JuliaPy/PyCall.jl so much of the Python library ecosystem (say, matplotlib) is available to be used in Julia programs.
That helps the adoption story quite a bit. You can do the number-crunching in Julia where performance counts, and then analyse and present the results using Python.
- Using Python directly is a better experience than calling Python from Julia
- I've never run into unsolvable performance issues with Python
So I guess I'm not in the target audience unless I just happen to be curious about a new language? That's kind of my overall point - even if Julia is a good language on its own and I work in data science, I don't have reasons to pick it over Python.
If you haven't hit a brick wall with python, it is just because you haven't run into the right problem. I was doing something that required lots of conditional operations on small matrices. The FFI into numpy's native library really bogged it down. I didn't have permission to install a compiler on that machine so I wrote it in vba in excel. It was 11x faster.
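The brick wall here is usually per-call overhead, not numpy itself. A toy sketch of the difference between many tiny calls and one batched call (shapes and sizes are arbitrary, chosen only for illustration):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
mats = [rng.random((3, 3)) for _ in range(10_000)]

# many tiny calls: each one pays Python dispatch / FFI overhead
t0 = time.perf_counter()
traces_loop = [float(np.trace(m)) for m in mats]
t_loop = time.perf_counter() - t0

# one batched call over a stacked array amortizes that overhead
stacked = np.stack(mats)                      # shape (10000, 3, 3)
t0 = time.perf_counter()
traces_batched = np.trace(stacked, axis1=1, axis2=2)
t_batch = time.perf_counter() - t0

# same results; the batched version is typically much faster
print(f"loop: {t_loop:.4f}s  batched: {t_batch:.4f}s")
```

When the operations are conditional per-matrix, as in the comment above, this batching trick stops being available, and that's exactly where a fast-loop language wins.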
I said something similar in another thread, but for me it doesn't have to be better than Python, as that is largely going to be subjective; the package ecosystem just has to grow and have some offering, at all, for the things that I do.
It's still very bare-bones compared to Torch/TF/Flax, and I would be hamstringing myself by switching to Julia even if I find the language otherwise attractive.
Thanks for this, I will definitely follow along there. Yeah if they can just check a few of those boxes I'm much more likely to at least try to more regularly work with Julia.
If it's a pure function. Oh, and if you have state-based control flow you have to turn off the JIT. Etc. If you take a standard library like some thermodynamics simulator and throw Jax at it, do you expect it to work without modification? Most of the time it'll fail right at the start by using the wrong implementation of numpy. So no, those are not "ordinary functions": those are functions where people consciously put in the effort to rewrite years of work onto Jax, which is very different.
I found the beginner installation/package installation experience a million times better than python (except that it’s tricky to explain that you type ] to enter the package manager but you don’t see the ] that you typed)
I think your "ecosystem of python libraries" is the key point. Python got a lot of mileage for a mass adoption from ML. Its libraries provided an "easy ML" for masses at the time ML got popular in science and job market, which quickly brought it into mainstream and built up its network effect.
A similar enabler in a new field could help Julia burst in as a general language. My 2c.
Autodiff is a place where there is a gulf between Julia and Python, one that I think can't be bridged well: JuliaDiff is astonishingly flexible and performant.
I linked to the website (which was updated in May, but its contents could do with more work) because it has examples of how well the suite fits together.
I don't know much about Jax. I've seen competent benchmarks showing an order-of-magnitude benefit from using ReverseDiff (from the JuliaDiff suite) over Autograd, which is what PyTorch uses for reverse-mode autodiff.
I think you are confusing 1995 with 2005. Perl was in decline by 2000 and by 2005 it was terminal; you could probably count the number of perl shops of any consequence in that year on the fingers of one hand.
I don't think that is the case.
Sure Perl may have been in decline for ages, but people were not comparing Perl to Python for that long.
Simply because python hasn't existed that long.
Python 2.0 was released in 2000.
Python 1.0 was 1994, and Python 0.9 (first public release?) was 1991.
People like to substitute "10x better" here, but I think the real number is 100,000x better, aka it's not possible by default. Q: What would it take to replace Windows? A: The iPhone was a new product category that targeted a new market.
It does happen, though. C has mostly replaced FORTRAN for scientific applications. Not entirely, FORTRAN is (infamously) still used, but I don't know anyone who has started a new project with FORTRAN.
Just 6 years ago, I was taught Perl in my Introduction to Bioinformatics course. The teachers were still using Perl because it used to be the go-to language for bioinformaticians. The year after, and every year since, they've taught using Python.
We started a very large, computationally intensive project in Fortran. It is still easier to do maths in Fortran than in C/C++, and now Fortran has a wonderful C binding system allowing direct calls into C .so/.dll files if you want to do some SQL or other kind of data input/output.
The idea that people were being forced to learn Perl just for bioinformatics as recently as 6 years ago fascinates me.
Python has gotten exceptionally lucky. I'm sure the two or three remaining Perl users on the planet are also on HN and ready to jump to its defense, but to me this just goes to show how heavy the switching cost is for something like this, and also how lucky Python was to be the best language to switch to at that point. It was in the right place at the right time for a lot of these switches away from older languages in obvious decline, and then it was able to leverage numpy and scikit to pick up a lot of additional momentum in ML and data science tasks. It is almost never the "best" language for the job, but coming in as second choice on most tasks is a huge win.
Jack of all trades, master of none, but oftentimes better than some are at one.