So, I have posted, and scored, many forecasts on this blog. But it may be difficult for you to determine whether my forecasts are good. For one thing, humans are bad at judging forecasts because we rarely keep records.
For instance, you may hear someone claim that he or a friend is really successful at playing scratch-off tickets. With the exception of the few people who luck into a major jackpot, it is a mathematical near-impossibility for a regular scratch-off player to win more than he spends. But neither of you may see it that way, because he will report, and primarily remember, the wins rather than the many small losses. That's not an issue for judging my forecasts: I keep comprehensive records and, where possible, report my forecasts and results publicly, in a manner that allows verification.
The issue with judging my forecasts is that there is no baseline error rate against which to judge them. No one else makes specific, verifiable, and public wine grape price forecasts. You could track one hundred sports pundits and judge their results against each other (and, if you're judging their picks against the spread, you'd probably find them to be in the same ballpark as a coin toss). But you can't do that for wine grape price forecasts.
So, what benchmarks should one use? Well, it depends on your situation. Let me outline the options I would recommend:
1) First, I believe you should judge my forecasts by the expectations I set. I give probability distributions. Do my outcomes conform to them? That is, if I say that 60% of the time outcomes will fall between X and Y, do they? A good forecaster understands, measures, and communicates how reliable his forecasts are. The downside is that this approach takes some effort, time, and statistical literacy.
2) If you or your organization makes its own forecasts, you've got a great benchmark for me. Am I better than you? If so, by how much?
3) If you would benefit from accurate forecasts, you can estimate how accurate a forecast must be before it is worth paying for. That gives you a binary decision: should you pay me to forecast for you?
4) If you're reading these forecasts more casually, then let me give you this benchmark. Every few years, the forecasting community holds an "M-Competition", the world's premier forecasting competition. The most recent, the M-4 competition, saw teams forecast 100,000 time series. There is no way to use domain knowledge: all of the data is presented as bare numbers with no context. On the other hand, the teams show up with machine learning augmentation and cutting-edge forecasting techniques and software.
The M-4 competition was won by a one-man team, Slawek Smyl of Uber. His average absolute error rate across all time horizons (1 to 6 periods out) was around 13%. By that standard, I'm doing pretty well, although we can't really compare until I'm 6 periods out. The average absolute error for the top 50 teams (of nearly 300) was 16%.
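For readers who want to compute that kind of error rate for themselves, the M-competitions score point forecasts with, among other metrics, the symmetric mean absolute percentage error (sMAPE). Here is a minimal sketch; the price figures are made up for illustration:

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.

    For each period, the absolute error is scaled by the average of the
    actual and forecast magnitudes, then averaged across periods.
    """
    assert len(actual) == len(forecast)
    terms = [
        200.0 * abs(f - a) / (abs(a) + abs(f))
        for a, f in zip(actual, forecast)
        if (abs(a) + abs(f)) > 0  # skip periods where both values are zero
    ]
    return sum(terms) / len(terms)

# Hypothetical grape prices ($/ton) versus forecasts
actual = [1500, 1600, 1550]
forecast = [1400, 1700, 1500]
print(round(smape(actual, forecast), 2))  # 5.41
```

A score around 13% on this metric means the forecasts were off by roughly 13% of the price level, on average.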
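The calibration check described in point 1 — do my stated 60% intervals contain the outcome about 60% of the time? — is also simple to run yourself. A minimal sketch, with hypothetical forecast records:

```python
def interval_coverage(records):
    """Fraction of outcomes that landed inside the stated [low, high] interval."""
    hits = sum(1 for low, high, outcome in records if low <= outcome <= high)
    return hits / len(records)

# Hypothetical records: (interval low, interval high, actual outcome)
records = [
    (900, 1100, 1000),
    (900, 1100, 1250),
    (1400, 1700, 1500),
    (1400, 1700, 1350),
    (2000, 2400, 2100),
]
print(interval_coverage(records))  # 0.6
```

For 60% intervals, a coverage rate near 0.6 over many forecasts suggests the stated probabilities are honest; much higher means the intervals are too wide, much lower means overconfidence.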