Lux is similar to Flax (Jax) in that the parameters are kept in a variable separate from the model definition and are passed in on the forward pass. Notably, this design choice allows Lux to accept parameters built with ComponentArrays.jl, which can be especially helpful when working with libraries that expect flat vectors of parameters.
Flux lies somewhere between Jax and PyTorch. Like PyTorch, the parameters are stored as part of the model. Unlike traditional PyTorch, Flux has “functional” conventions, e.g. `g = gradient(loss, model)` vs. `loss.backward()`. Similar to Flax, the model is a tree of parameters.
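A minimal sketch of the difference, assuming recent Lux and Flux releases (exact layer constructors and `setup`/`gradient` signatures may vary between versions):

```julia
using Lux, Flux, Random

x = rand(Float32, 2, 8)                    # 2 features, batch of 8

# Lux: parameters (and state) live outside the model and are passed in explicitly.
lux_model = Lux.Dense(2 => 3, tanh)
ps, st = Lux.setup(Random.default_rng(), lux_model)
y_lux, st = lux_model(x, ps, st)           # forward pass takes ps/st as arguments
# Since `ps` is just a nested NamedTuple, it can be wrapped in a ComponentArray
# when a library wants a flat parameter vector.

# Flux: parameters are stored inside the model, but gradients are taken functionally.
flux_model = Flux.Dense(2 => 3, tanh)
y_flux = flux_model(x)                     # forward pass only needs the input
grads = Flux.gradient(m -> sum(abs2, m(x)), flux_model)   # g = gradient(loss, model)
```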
You are also allowed to have bike share, buses, subway/light rail, and taxis. No one is saying trains are the only mode of transportation. I’ve lived in midwestern states with miles of farmland. In the town center, a bus network and bike paths meant I almost never needed a car. Folks who lived outside of town had the option to drive in and park their car at commuter lots.
I’m not the main dev on FastAI.jl, but I work on the Julia ML community team that supported this project.
> Since FastAI.jl uses Flux, and not PyTorch, functionality has to be reimplemented.
We are looking to offer a high level API for ML in Julia similar to fastai for PyTorch. The goal is to enrich the Flux ecosystem, so just calling into Python fastai wouldn’t be appropriate. FastAI.jl is built on top of several lower level packages that can be used separately from FastAI.jl. These packages help build out the ecosystem not just for FastAI.jl, but any ML framework or workflow in Julia.
> What does this mean for the development of fastai?
FastAI.jl is “unofficial” in that Jeremy and the fastai team did not develop it. But Jeremy knows about the project, and we have kept in touch with the fastai team for feedback. FastAI.jl doesn’t affect the development of Python fastai in any way.
> FastAI.jl has vision support but no text support yet.
> What is the timeline for FastAI.jl to achieve parity?
We’re working to add more out-of-the-box support for other learning tasks. Currently, we have tabular support on the way, but the timeline for text is not decided.
Note that the framework itself could already support a text learning method, but you’d have to implement the high level interface functions for it yourself. We just don’t have built-in defaults like vision. You can check out https://fluxml.ai/FastAI.jl/dev/docs/learning_methods.md.htm... for a bit more on what I mean.
> When should I choose FastAI.jl vs fastai?
It depends on what you need. PyTorch and fastai are more mature, but Julia and Flux tend to be more flexible to non-standard problems in my experience. If you’re interested, then give Julia/Flux/FastAI.jl a try. If we’re missing a mission critical feature for you, then please let us know so we can prioritize it.
You would have to add a learning method to tell it how to encode/decode graph data, but the framework is agnostic to the model choice. So any Flux model is supported.
It is not surprising to me that there is no text support yet, given the issues with integrating 3D (batch x sequence x features) RNN support (for cuDNN). This issue has persisted so long that I came to believe it is impossible to integrate with the current Flux.jl stack.
Transformers.jl and TextAnalysis.jl already provide quite a bit of functionality for NLP, though to my knowledge neither makes use of RNNs. You may be interested in commenting on https://github.com/FluxML/Flux.jl/issues/1678.
You should check out Makie. Getting it set up can be a bit frustrating if things don’t go right, and there is a small learning curve for using `@lift`, but it is an absolute joy to use once you ramp up.
I use it for my research by default. You can pan, zoom, etc. The subplot/layout system is frankly a lot better than Matlab's (and I enjoyed Matlab for plotting!). The best part is that I can insert sliders and drop-downs into my plot easily, which means I don’t need to waste time figuring out the best static, 2D plot for my experiment. I just dump all the data into some custom logging struct and use sliders to index into the correct 2D plot (e.g. for a heat map changing over time, I just save all the matrices and use the slider to get the heat map at time t).
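For example, a rough sketch of that slider-indexed heat map idea, assuming GLMakie (the `frames` array and the layout are just illustrative):

```julia
using GLMakie

# Pretend we logged a 50x50 field at 100 time steps during an experiment.
frames = [rand(50, 50) for _ in 1:100]

fig = Figure()
ax = Axis(fig[1, 1], title = "Field at time t")
sl = Slider(fig[2, 1], range = 1:length(frames), startvalue = 1)

# @lift re-evaluates whenever the slider's Observable changes,
# so the heat map always shows the frame at the selected time step.
frame_at_t = @lift(frames[$(sl.value)])
heatmap!(ax, frame_at_t)

fig
```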
On the NVidia A100, the standard FP32 performance is 20 TFLOPs, but if you use the tensor cores and all the ML features available then it peaks out at 300+ TFLOPs. Not exactly your question, but a simple reference point.
Now the accelerator in the M1 is only 11 TFLOPs. So it’s definitely not trying to compete as an accelerator for training.
In a more complex example where you actually take a variable, do some operations to it, then reassign it, Pluto.jl encourages you to separate that into multiple cells. The reason is that each cell marks a distinct node in the dependency graph. If you split your code across cells, the notebook can be smarter about which parts actually need to be re-run and which don't.
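A toy illustration, assuming Pluto's usual one-expression-per-cell convention (cell boundaries shown as comments):

```julia
## Cell 1: load the raw data
raw = rand(1_000)

## Cell 2: transform it (depends on `raw`)
scaled = (raw .- minimum(raw)) ./ (maximum(raw) - minimum(raw))

## Cell 3: summarize (depends on `scaled`)
summary_stats = (mean = sum(scaled) / length(scaled), max = maximum(scaled))
```

If you edit only Cell 2, Pluto's dependency graph knows Cell 1 doesn't need to re-run, only Cells 2 and 3. Collapsing all three statements into one cell forces everything to re-run on any change.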
A downside to using multiple cells is vertical spacing/visual noise. This is something that the package authors are currently thinking about addressing.
Most of that is not fundamental to Julia or Flux itself. It’s the difference between a monolithic package like TF and source-to-source AD in Julia. The former allows the designers to use their own data structures and external libraries to do optimizations. Source-to-source relies on the underlying IR used by Julia, making optimizations challenging without some compiler assistance. But all of that is in the pipeline with stopgap solutions on the way.
As with most things in Julia, the code developers don’t just want to hack changes that work, but make changes that are flexible, extensible, and can solve many problems at once. So, Flux isn’t ready for prime time yet, but it is definitely worth keeping your eye on it.
I have been using Flux for a year (or more?) and I have never found it to be slower than PyTorch or TF. Granted, I am training at most ResNet-20 and mostly smaller models, so maybe there are larger training workloads where people have issues. Every single one of these deep learning libraries is mapping to CUDA/BLAS calls under the hood. If the framework is written correctly, the performance difference should not be drastic, and Flux doesn’t have much overhead. My lab mate uses PyTorch to train the same models as me, and his performance is consistently the same or worse.
As for features, I think this is because people coming from TF or PyTorch are used to one monolithic package that does everything. That’s intentionally not how Flux or the Julia ecosystem is designed. I’ll admit that there are a lot of preprocessing utility functions that could be better in the larger Julia ML community, but for the most part, the preprocessing required for ML research is available. This is mostly the community's fault for not having a single document explaining to new users how all the packages work together.
Where the difference between Flux and other ML frameworks becomes apparent is when you try to do anything other than a vanilla deep learning model. Flux is extensible in a way that the other frameworks are just not. A simple example: the same lab mate and I were trying to recreate a baseline from a paper that involved drawing from a distribution at inference time based on a layer’s output, then applying a function to that layer based on the samples drawn. I literally implemented the pseudo code from the paper, because in Flux everything is just a function, and the layers of a chain can be looped over like an array. Dumb pseudo-code-like statements where you just write for loops are just as fast in Julia, and they were here. Meanwhile my friend’s code came to a grinding halt. He had to resort to numerical approximations for drawing from the distribution because he was forced to use only samplers that “worked well” in PyTorch. This is the disadvantage of a monolithic ML library. I didn’t use “Flux distributions”; I just used the standard Distributions package in Julia.
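Something roughly like the following, a hypothetical sketch of the pattern (not the actual baseline from the paper), assuming recent Flux and Distributions.jl; `sampled_forward` and the choice of noise scale are made up for illustration:

```julia
using Flux, Distributions

# At inference time, draw noise whose scale depends on each layer's output,
# then apply it to that output before moving to the next layer.
function sampled_forward(model::Chain, x)
    h = x
    for layer in model.layers                    # a Chain's layers iterate like an array
        h = layer(h)
        noise_scale = abs.(h) .+ 1f-3            # per-unit scale derived from the layer output
        noise = rand.(Normal.(0f0, noise_scale)) # plain Distributions.jl, no framework wrapper
        h = h .+ Float32.(noise)
    end
    return h
end

model = Chain(Dense(4 => 16, relu), Dense(16 => 2))
y = sampled_forward(model, rand(Float32, 4))
```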
This disadvantage to TF and PyTorch will become even more apparent when you do model-based RL. Flux was designed to be simple and extensible from the start. TF and PyTorch were not.
OK, my fault, I was writing from the perspective of an ML engineer, not a researcher. (I've been using Julia for 1.5 years now, and the researchers I work with prefer pure Julia solutions because they're easier to write: you can use symbols, skip OOP, etc.)
But for production-ready models, PyTorch and TF are miles ahead. First of all: NLP, audio, and vision packages and model-building frameworks (attention layers, vocoders, etc.).
Then you have the option to compile models with XLA and run on TPUs (about 2-3x cheaper than GPUs for most of our models [audio and NLP]).
Next, inference performance (I don't know about now, maybe this has changed, but about ~8 months ago Flux was roughly 15-20% slower [tested on VGG and ResNets] than PyTorch 1.0 without XLA).
Time to get to production: sure, maybe writing a model from scratch can take a bit longer in PyTorch than in Flux (if you're not using the built-in torch layers), but getting it into production is a lot faster. First of all, you can compile the model (something not possible in Flux), and you can use it anywhere from Azure and AWS to GCP and Alibaba Cloud, make a REST API with Flask/FastAPI, etc., or just use ONNX.
Don't get me wrong, I love Julia and Flux, but there is still a LONG way to go before most people can even consider using Flux in a production environment rather than for research or some MVP.
I have no special insight into ML or Julia (though I love it), but one thing I can confirm from experience is that there is a huge difference between getting a model to work once in an academic or research setting and having something reliable and scalable working in production day after day. Mind-boggling, totally different challenges.
Great summary! At the end you mention the difficulty extrapolating beyond an HH neuron model. I think curious readers will find the work of Jim Smith (https://fcrc.acm.org/plenary-speakers/james-e-smith-plenary) interesting in this regard. His work starts with the possible information representation scheme (temporal coding <=> binary coding) and a compute unit (SRM0 neuron <=> transistor) and builds up the equivalent of Boolean logic/algebra from there.
As opposed to a neuroscientist understanding a processor, Jim is a computer architect using his techniques to understand the brain.
There is https://github.com/rejuvyesh/PyCallChainRules.jl which makes this possible. But using some of the native Julia ML libraries that others have mentioned is preferable.