Fun fact: if you don't care about the auto-regressive part of NeuralProphet (it's turned off by default), you can implement the core of NeuralProphet/Prophet (piecewise linear trend plus Fourier terms for weekly/daily seasonality) in about 60 lines of code, with no dependency beyond either torch or numpy+scipy.optimize, and without having to deal with Stan or NeuralProphet's poorly chosen heuristics.
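For reference, a minimal numpy-only sketch of the piecewise-linear-trend half of that (the changepoint grid and toy data are made up for illustration; adding an L1 penalty on the hinge coefficients would mimic Prophet's changepoint sparsity):

```python
import numpy as np

# Toy series: slope 0.5, with an extra slope of 2.0 kicking in at t=50
rng = np.random.default_rng(0)
t = np.arange(100.0)
y = 0.5 * t + 2.0 * np.maximum(t - 50, 0) + rng.normal(0, 0.1, size=t.size)

# Design matrix: intercept, base slope, and one hinge column per changepoint
changepoints = np.array([25.0, 50.0, 75.0])
X = np.column_stack([np.ones_like(t), t] +
                    [np.maximum(t - c, 0) for c in changepoints])

# Plain least squares; swap in scipy.optimize with an L1 term to thin changepoints
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
trend = X @ coef
```

The Fourier seasonality part is just more columns (sin/cos at harmonics of the period) appended to the same design matrix.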
Another thing that both NeuralProphet and Prophet do extremely wrong by default is uncertainty estimation. The coverage probabilities are way off.
It's literally what I did at work last week, which is why I found this submission timely. I'd have to check with my employer if it can be made public. I don't see any reason why not, there's not much to it.
What did you use to implement the regularization of the trend breakpoints? Prophet by default lays them out on a regular grid and thins them out via a sparsity-inducing prior fit in Stan. I couldn't find a quick regularization replacement in numpy/scipy/statsmodels with equivalent performance. (I don't want to drag in another huge dependency with Torch or TF.)
Not directly from a machine learning perspective, more about stability in production setups. At VictoriaMetrics we allow Prophet as one of the models for time series anomaly detection in our vmanomaly product. In cloud environments, Prophet, which uses `cmdstanpy` under the hood, gives little to no control over the backend. This results in unexpected crashes on read-only filesystems such as Red Hat OpenShift, where the backend tries to create assets in the /tmp directory during the model fit stage. Dependencies like this can limit the use of a product in real-world scenarios.
This is interesting to me. Do you use a library to estimate the Fourier series of a data series, or have you implemented it from scratch? I've searched for this in the past but always got results about Fourier transforms, not series.
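For what it's worth, the "Fourier series" in Prophet-style models isn't a transform at all: you build sin/cos regression columns at harmonics of a known period and fit their coefficients like any other regressor. A from-scratch sketch (the period and order values are illustrative):

```python
import numpy as np

def fourier_features(t, period, order):
    """sin/cos columns at harmonics 1..order of a known period (Prophet-style)."""
    k = np.arange(1, order + 1)
    angles = 2 * np.pi * np.outer(t, k) / period   # shape (len(t), order)
    return np.hstack([np.sin(angles), np.cos(angles)])

t = np.arange(0, 365.0)                            # e.g. daily timestamps
X = fourier_features(t, period=365.25, order=10)   # yearly seasonality, 20 columns
# Fit coefficients against your series y with ordinary least squares:
# beta, *_ = np.linalg.lstsq(X, y, rcond=None); seasonal = X @ beta
```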
As others have pointed out, Prophet is not a particularly good model for forecasting, and has been superseded by a multitude of other models. If you want to do time series forecasting, I'd recommend using Darts: https://github.com/unit8co/darts. Darts implements a wide range of models and is fairly easy to use.
The problem with time series forecasting tools in general is that they make a lot of assumptions about the shape of your data, and you'll find yourself spending a lot of time reshaping your data to fit them. For example, they expect your data to arrive at a regular interval. This is fine if it's, say, data from a weather station. It doesn't work well in clinical settings (imagine a patient admitted to the ER -- there is a burst of data, followed by no data).
That said, there's some interesting stuff out there that I've been experimenting with that seems to be more tolerant of irregular time series and can be quite useful. If you're interested in exchanging ideas, drop me a line (email in my profile).
> they expect that your data comes at a very regular interval
Does prophet rely on this assumption? For health timeseries data the tool of choice is survival analysis - typically using Cox proportional hazards regression or similar regression tools that are able to handle irregular or censored data.
I've seen some moves towards using fancy bayesian or fancier machine learning stuff for clinical trials but a big issue is that they are very difficult to communicate to their intended audience.
I tried Prophet via Darts, and all the models in Darts assume a regular time series.
Re: "fancier machine learning" -- I've seen different flavors of RNNs & LSTMs have some success in analyzing time series data. I've struggled to get them to work on real-world (i.e., messy) data, but have had some encouraging results with a transformer encoder-only NN.
What does Darts do that a multibillion-dollar entity with an excellent open-sourcing track record misses? Perhaps it addresses a niche case well. Genuinely curious.
Darts isn't a specific model, it's a wrapper API for a wide variety of forecasting models, and Prophet is one of them. Other models may or may not outperform Prophet depending on the nature of your specific application and your time series data. You really have to test them to know. And Darts facilitates testing many models on the same data by putting them all behind the same API.
Also, Prophet was developed by a very small number of individuals at Facebook, it's not something they invested massive resources into.
A common strategy is interpolation. The challenge is that forecasting itself is a form of interpolation. So you're forecasting based on forecasted data.
Prophet is such an appealing package because it promises to abstract away all the difficult parts of forecasting. However, in practice it does not fulfill its promises. I think this is a good discussion of the problems: https://www.microprediction.com/blog/prophet
As others have pointed out, it is a good idea to encode domain knowledge in your time series model through specification and priors. Prophet rarely beats a well-specified GLM or SARIMA in real-world applications, especially when uncertainty estimates are needed. Professionally, I have successfully applied Gaussian Processes to many such cases.
A GP is an intuitive and expressive way to encode time covariance in a model. A famous example is the birthdays model (relative number of births by day of year), discussed by Gelman et al. in Bayesian Data Analysis and here [1].
This library is old news? Is there anything new that they've added that's noteworthy to take it for another spin?
[disclaimer I'm a maintainer of Hamilton] Otherwise FYI Prophet gels well with https://github.com/DAGWorks-Inc/hamilton for setting up your features and dataset for fitting & prediction[/disclaimer].
I'm no time series expert, but from my experience and what I've heard, using Prophet for time series forecasting isn't recommended. It often leads to less-than-ideal results.
Curiously, in Medium-like (i.e. low-effort) publications it's still the recommended way to tackle a forecasting problem. The promise of a model that can solve any time series problem sounds great, but not all that glitters is gold, and as you gain experience you discover that solutions like this usually don't work.
I used Prophet and personally did not have any problems, but I agree with the criticism that the tool is so focused on ergonomics that it abstracts away important aspects which could be used to build better models [1].
I thought the biggest issue wasn't with the models themselves, but how Zillow decided to apply and act on them, which is why it didn't work in practice.
So on average their predictions may have been pretty good, but since each transaction also depends on the other party to accept their offer, and whether they get outbid, most of their predictions where the offer actually goes through would be on the tail end of where they slightly overestimated the price.
I was lucky to make and learn from that mistake pretty quickly with some algorithmic trading on much smaller amounts. With housing transactions being much larger and slower, you wouldn't learn this lesson until it was too late. Models never perform as well in practice as they do in theory, and you need to remember to account for both known unknowns and unknown unknowns.
Great comments! I've learned a lot from them. I'm just getting started with algorithmic trading and time series modeling, so I appreciate your insights.
I've honestly had consistently better results with standard regression models. I really love the idea of it, and maybe I need to be tuning it better somehow, but overall I haven't had a great experience.
Every time I, or someone at work with more experience than me, have tried Prophet, it has ended in changing the approach and trying a different technique. In my experience with time series, hand-crafted recipes tend to work much better than out-of-the-box solutions.
I agree completely. We always end up moving away from Prophet every time. The results from Prophet are just not very good, although it can be useful for a proof-of-concept.
I'm a Data Engineer at a large consulting company and I have been incredibly impressed with AutoGluon for forecasting. You can build and train a model in around 10 lines of code, and it frequently gets into the top 3-4% of Kaggle competitions without much data pre-processing.
Yes. I've tried using it for pretty straightforward time series forecasts, and I struggled to make it into something useful in a business context.
I'll disclaim that I'm just a finance dude and not a data scientist or programmer. But the documentation leads me to believe that I am in the target audience. I felt like I could grasp the basic mechanics after reading the paper, but I wish the documentation helped someone like me be more intelligent about 'tuning' the model. I could never get the average error below 15%, which is too large for my use case.
Probably user ignorance, but that's my experience.
You are the primary audience. Time series forecasting with deep learning is fraught with inconsistency. Someone on r/ML went pretty hard on detailing a survey and the stuff that was SOTA 10 years ago still is. Wish I saved that thread. The dude was well published.
I updated my comment with the thread but it was actually about time series anomaly detection. Turns out it was the same dude in your second link, and your comment includes forecasting in the first link as well. Thank you!
aaaaand I just spent 3 hours watching that, trying to remember some parts of calculus, and reading all of the Wikipedia articles and "see also" links that were grey on white in the video. Then I fell asleep, but I wanted to thank you, as I also thanked the prof. who made that video (on Reddit).
This looks to me like something they’d be using for internal capacity planning. If so, they’d be asking it questions like, “how much capacity do we build out for the upcoming holiday rush?” I wouldn't be surprised if financial datasets are very noisy compared to service capacity metrics. I didn’t read the paper though, maybe this is addressed and maybe I’m wrong about the use case! But stuff like the below from the docs reads like capacity planning tool to me:
> As an example, let’s look at a time series of the log daily page views for the Wikipedia page for Peyton Manning. We scraped this data using the Wikipediatrend package in R. Peyton Manning provides a nice example because it illustrates some of Prophet’s features, like multiple seasonality, changing growth rates, and the ability to model special days (such as Manning’s playoff and superbowl appearances).
I'm sad to see no one has responded with a solution to your problem. You are absolutely the target audience, and in my experience, Prophet is "as good as it gets" to generalized forecasting.
While using Prophet in a purely "forecasting" setup might not guarantee consistently high-quality results out of the box, especially for noisy and complicated time series data, at VictoriaMetrics we found it practically useful for the anomaly detection task:
In our vmanomaly product, Prophet is one of the go-to models for anomaly detection in metrics data, and it usually requires little tuning to achieve reasonable results. The main purpose of using Prophet or similar forecasting models is to reformulate the anomaly detection task:
- given fitted model M, ground truth Y_i for particular data point X_i, we produce forecast Yhat_i and its uncertainty estimate [Yhat_lb, Yhat_ub]
- if ground truth Y_i falls beyond the range of [Yhat_lb, Yhat_ub], we consider this point an anomaly
- the further Y_i is from the range, the higher the anomaly score would be. In our particular implementation for easier alerting purposes, anomaly_score > 1 means "anomaly"
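The scoring above can be sketched in a few numpy lines (this is just one possible scaling, not necessarily vmanomaly's exact one: the score is the distance from the band center in units of the half-width, so it exceeds 1 exactly when the point falls outside the band):

```python
import numpy as np

def anomaly_score(y, yhat_lb, yhat_ub):
    """>1 exactly when y falls outside [yhat_lb, yhat_ub]; grows with distance."""
    center = (yhat_lb + yhat_ub) / 2
    half_width = np.maximum((yhat_ub - yhat_lb) / 2, 1e-12)  # guard zero-width bands
    return np.abs(y - center) / half_width

scores = anomaly_score(
    y=np.array([10.0, 15.0, 30.0]),          # ground truth Y_i
    yhat_lb=np.array([8.0, 11.0, 9.0]),      # lower band
    yhat_ub=np.array([14.0, 17.0, 15.0]),    # upper band
)
# the third point lies far above its band, so its score exceeds 1
```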
Based just on the documentation, it seems there are some assumptions they expect the data to adhere to, and if they don't apply then it would not produce good results.
I have not been able to get good results either, but I have not tried it in the past year. I also tried many of the architectures in Darts. I have found that fairly straightforward architectures work well. That is, I can iterate on my own design for my own specific data (with all its specific covariates) and get better results than I could with Darts or Prophet.
Wondering how many people are now downloading this and other libraries like Darts and trying to do stock market prediction or crypto price forecasting. Most of the devs I know, myself included, have dabbled in coding up trading algorithms at some point.
The hard part isn't the stats. It's all the information that people buy and setting up those ingest pipelines! If I had a satellite telling me when a certain big company has a lot of cars parked in the lot after hours, I could make a zillion bucks too!
First: Prophet is not actually "one model", it's closer to a non-parametric approach than just a single model type. This adds a lot of flexibility on the class of problems it can handle. With that said, Prophet is "flexible" not "universal". A time series of entirely random integers selected from range(0,10) will be handled quite poorly, but fortunately nobody cares about modeling this case.
Second: the same reason that only a small handful of possible stats/ML models get used on virtually all problems. Most problems which people solve with stats/ML share a number of common features, which makes it appropriate to use the same model on them (the model's "assumptions"). Applications which don't have these features get treated as edge cases and ignored, or you write a paper introducing a new type of model to handle them. Consider any ARIMA-type time series model. These are used all the time across many different problem spaces, and are going to do reasonably well on "most" "common" stochastic processes you encounter in "nature", because they're constructed to resemble many types of natural processes. It's possible (trivial, even) to conceive of a stochastic process which ARIMA can't really handle (any non-stationary process will work), but in practice most things that ARIMA utterly fails on are not very interesting to model, or we have models that work better for that case.
These insights are really awesome, thank you -- a real wake-up call. It reminds me of the common aphorism in statistics: "All models are wrong, but some are useful."
Disclaimer: I haven't looked at the linked library at all, but this is a theoretical discussion which applies to any task of signal prediction.
Out of all possible inputs, there are some that the model works well on and others that it doesn't work well on. The trick is devising an algorithm which works well on the inputs that it will actually encounter in practice.
At the obvious extremes: this library can probably do a great job at predicting linear growth, but there's no way it will ever be better than chance at predicting the output of /dev/random. And in fact, it probably does worse than a constant-zero predictor when applied to a random unbiased input signal.
Except that it's also usually possible to detect such trivially unpredictable signals (obvious way: run the prediction model on all but the last N samples and see how it does at predicting the final N), and fall back to a simpler predictor (like "the next value is always zero" or "the next value is always the same as the previous one") in such cases.
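A hedged sketch of that fallback idea (the extrapolation model and toy series are made up for illustration): hold out the last N points, score the model against a naive persistence predictor, and keep whichever wins.

```python
import numpy as np

def choose_predictor(y, model_forecast, n_holdout=20):
    """Backtest on the last n_holdout points; fall back to persistence
    ("next value = previous value") if the model is no better."""
    train, test = y[:-n_holdout], y[-n_holdout:]
    model_err = np.mean((model_forecast(train, n_holdout) - test) ** 2)
    persistence = np.full(n_holdout, train[-1])          # naive fallback
    naive_err = np.mean((persistence - test) ** 2)
    return "model" if model_err < naive_err else "persistence"

def linear_extrapolate(train, n):
    """Toy model: fit a line to history, extend it n steps forward."""
    t = np.arange(train.size)
    slope, intercept = np.polyfit(t, train, 1)
    future = np.arange(train.size, train.size + n)
    return intercept + slope * future

y_trend = np.arange(100.0)     # perfectly linear series: the model should win
choice = choose_predictor(y_trend, linear_extrapolate)
```

On pure noise the same harness would pick persistence instead, which is exactly the point of backtesting.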
But that algorithm also fails on some class of inputs, like "the signal is perfectly predictable before time T and then becomes random noise". The core insight of the "No Free Lunch" theorem is that when summed across all possible input sequences, no algorithm works any better than another, but the crucial point is that you don't apply signal predictors to all possible inputs.
Another place this pops up is in data compression. Many (arguably all) compressors work by having a prediction or probability distribution over possible next values, plus a compact way of encoding which of those values was picked. Proving that it's impossible to predict all possible input signals correctly is equivalent to proving that it's impossible to compress all possible inputs.
Another way of thinking about this: Imagine that you're the prediction algorithm. You receive the previous N datapoints as input and are asked for a probability distribution over possible next values. In a theoretical sense every possible value is equally likely, so you should output a uniform distribution, but that provides no compression or useful prediction. Your probabilities have to sum to 1, so the only way you can increase the probability assigned to symbol A is to decrease the weight of symbol B by an equal amount. If the next symbol is A then congratulations, you've successfully done your job! But if the next symbol was actually B then you have now done worse (by any reasonable error metric) than the dumb uniform distribution. If your performance is evaluated over all possible inputs, the win and the loss balance out and you've done exactly as well as the uniform prediction would have.
Tried it once. Its promise is to take the dataset's seasonal trend into account, which makes sense for Facebook's original use case.
We ran it on such a dataset and found out that directly using https://github.com/karpathy/minGPT consistently gives a better result. So we ended up using the output of Prophet as an input feature to a neural network, but the result was not improved in any significant way.
From my own experience, a properly cross-validated lasso regression over a wide range of autoregressive features beats FB Prophet by a good margin and offers nearly the same degree of automation.
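A sketch of that approach, assuming scikit-learn for the cross-validated lasso and a toy AR(1) series standing in for real data:

```python
import numpy as np
from sklearn.linear_model import LassoCV

def make_lagged(y, n_lags):
    """Feature rows [y[t-1], ..., y[t-n_lags]] for each target y[t]."""
    X = np.column_stack([y[n_lags - j : len(y) - j] for j in range(1, n_lags + 1)])
    return X, y[n_lags:]

# Toy AR(1) process: y[t] = 0.8 * y[t-1] + noise
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.8 * y[t - 1] + rng.normal(0, 1.0)

X, target = make_lagged(y, n_lags=10)
model = LassoCV(cv=5).fit(X, target)   # cross-validated L1 penalty
# The penalty should shrink most lags toward zero and keep lag 1 dominant
```

The same design matrix extends naturally with calendar dummies or Fourier seasonality columns, which is roughly where Prophet's automation stops being an advantage.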
I am intrigued on how this would perform on astronomical data.
If anyone is not aware there are many periodic phenomena in astronomy - e.g. variable stars which can have periods from minutes to hundreds of days.
The description of this library sounds like it's very tied to the human world - talking about yearly, weekly and daily seasonality.
[Weirdly though, we do sometimes see variability on 'human' timescales in astronomical data series. If maintenance is carried out weekly on a Monday that can add a signal into the data through missing datapoints.]
On this topic, does anyone know of a suitable time-series forecaster for multivariate analysis? Eg 8 independent/input variables, and one output variable? I've been using multiple linear regression (which works impressively!) but it doesn't take into account the time series, only the single prior day of inputs. Thanks :)
Not really sure what you are looking for, but the easiest might be to just add lags of your input variables in the same linear model that you are using.
If you are looking for an actual time series method, I would check out either darts [0] or statsforecast [1]. They are currently the most mature time series packages.
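The "add lags" suggestion is mechanical with pandas (column names and the 3-day window here are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"x1": range(10), "x2": range(10, 20), "y": range(5, 15)})

# Add lagged copies of each input so the linear model sees the last 3 days
for col in ["x1", "x2"]:
    for lag in (1, 2, 3):
        df[f"{col}_lag{lag}"] = df[col].shift(lag)

df = df.dropna()  # the first rows have no lagged history
# df now feeds directly into your existing multiple linear regression
```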
In Machine Learning conference papers, a common approach is to model relationships between variables using Graph Neural Networks (GNNs). Using GNNs is a powerful and flexible way to go. Maybe you can give it a try!
Thanks! I take it that this means a Generalised Linear Model? Could i ask for a link to a relevant article to get me started on the flavour of GLM that you recommend?
It's the mechanism used by Grafana's forecasting feature.
It's still not well explained, and in many cases it's hard for users to understand its results, as forecasts might go below zero for data that can only be positive (requests per second, for instance).
Prophet has gotten a lot of attention since being released in 2017, I think because the idea of a fully automatic solution is very appealing to people. One of the original developers, Sean Taylor, recently posted a nice retrospective on the project's successes and failures:
https://medium.com/@seanjtaylor/a-personal-retrospective-on-... He quotes one of his earlier tweets:
> If I could build it again, I’d start with automating the evaluation of forecasts. It’s silly to build models if you’re not willing to commit to an evaluation procedure. I’d also probably remove most of the automation of the modeling. People should explicitly make these choices.
Having worked on similar Bayesian time-series forecasting tools at Google, this matches my experience (though I've never used Prophet seriously, so please don't take this as any direct judgement of it as a software package). There is a lot of value in a framework that lets you easily experiment with different model structures (our version of this was the structural time series tools in TensorFlow Probability, see, e.g., https://blog.tensorflow.org/2019/03/structural-time-series-m...). But if you're forecasting something you actually care about, it's usually worth the time to try to understand yourself what structure makes sense for your problem, and do a careful evaluation on held-out data with respect to whatever metric you're really trying to optimize. A fully automated search over model structures is cute, but even when it works, it mostly just ends up rediscovering properties of the data you could or should have already known (e.g., of course traffic to your work-related website will have a day-of-week effect), so the cases where it really adds practical value are harder to find than you might like.
Even in the age of deep learning, I do think these relatively classical Bayesian models have a lot of value for many applications. Time-series forecasting tends to be a case where:
- you don't have a ton of iid data points (often, only a single time series),
- you'd like forecasts with principled uncertainty estimates, e.g., credible intervals, giving you a range of scenarios to plan for,
- you often do have a pretty good idea of what features are relevant to the process you're predicting, and
- you want to understand in detail what features the forecast is accounting for (and what it might be missing),
all of which play to the strengths of more classical, structured statistical models, compared to more data-hungry black-box deep learning models. So the basic ideas in Prophet and similar tools do still have a lot of relevance going forward, IMHO.
You mention classical models but Bayesian deep learning is a thing too. One can even retrofit existing DL models to obtain uncertainty estimates, at the expense of increasing (possibly doubling) the number of model parameters.
The quality of the uncertainty estimates is a question though.
I'd be curious to see how it performs on economics data compared to mainstream models (say DSGE) whose results have never impressed me with their predictive power.
Facebook developers are doing some really great stuff. For some reason it doesn't translate into a really great facebook or instagram. The experience is worse compared to 10 years ago. If they hired 10,001 of the best developers not working at facebook I think their products would be the same or worse. Is there a single person responsible for the vision?
They recommend checking out these for cutting-edge time series forecasting:
https://neuralprophet.com/
https://nixtla.github.io/statsforecast/