“Future Leaks”

No, the Terminator isn’t coming through a time warp from the future, but “Future Leaks” can be destructive, too.

Imagine you have used advanced AI technologies to build a model that predicts future stock price movements with uncanny accuracy. As you stare at your screen, a cold sweat comes over you while your hypothetical profits shoot to the sky… You are going to be a millionaire… no, a BILLIONAIRE! Visions of yachts… “This technology is amazing! I’ve found the holy grail!” you exclaim.

But reality is not far away. If it seems too good to be true, it is. The clue is clear: excellent predictions. Too excellent.

A future leak occurs when information from the future gets into the inputs of a predictive model. It can sneak in a variety of ways. The most direct is a forward-shifted input, such as putting tomorrow’s information into today’s row, but that may be obvious enough to catch. Sneakier is when you, or someone who supplied your data, ran a series of calculations with a forward-shifted variable. Even though that variable isn’t used directly, everything computationally downstream of it carries the future information. This is one of the risks of massive “data lakes” and “warehouses”. You had best know the entire pedigree of each element of data you use as a model input, including its “temporal pedigree”.
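To see how the sneaky, downstream kind of leak manufactures “uncanny accuracy”, here is a minimal Python sketch on synthetic data. It assumes prices follow a pure random walk, so tomorrow’s direction should be a coin flip. The honest feature uses only today’s price change. The leaky feature is a centered 3-day moving average, a common smoothing step that quietly averages in tomorrow’s price. Neither feature is tomorrow’s price itself, yet the second one carries tomorrow’s information. All function names are illustrative, not taken from any particular library.

```python
import random

def simulate_random_walk(n, seed=0):
    """Synthetic prices: each day adds an independent N(0, 1) step, so nothing is predictable."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n):
        prices.append(prices[-1] + rng.gauss(0.0, 1.0))
    return prices

def direction_accuracy(prices, feature):
    """Fraction of days where the sign of feature(t) matches the direction of tomorrow's move."""
    hits = total = 0
    for t in range(1, len(prices) - 1):
        up_tomorrow = prices[t + 1] > prices[t]
        hits += (feature(prices, t) > 0) == up_tomorrow
        total += 1
    return hits / total

def honest_feature(prices, t):
    # Today's price change: uses only days t-1 and t, both known at time t.
    return prices[t] - prices[t - 1]

def leaky_feature(prices, t):
    # Centered 3-day moving average minus today's price.
    # The average includes prices[t + 1], i.e. tomorrow: a future leak.
    centered_ma = (prices[t - 1] + prices[t] + prices[t + 1]) / 3.0
    return centered_ma - prices[t]

prices = simulate_random_walk(20000)
print(f"honest: {direction_accuracy(prices, honest_feature):.3f}")
print(f"leaky:  {direction_accuracy(prices, leaky_feature):.3f}")
```

On this random walk the honest feature should land near 0.50, while the leaky one lands near 0.75: an “edge” that exists only because tomorrow’s price was smuggled in through the smoothing step. Swapping the centered average for a trailing one (days t-2, t-1 and t) removes both the leak and the edge.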

Here, “future” is relative. It is measured against the timestamp of each row of input data, that is, the time of the most recent input that was actually available at that moment. If a row describes a past moment and one or more of its inputs contain information that was not yet available at that moment, it’s a future leak. For example, suppose a row is stamped two days ago and the model predicts one day ahead of it, i.e., yesterday’s outcome. If any input in that row carries information from yesterday, that’s a future leak: two days ago, yesterday had not happened yet.
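A row-level check makes the relative definition concrete. This minimal sketch assumes each input is stored with the date its information actually became available; the `available_at` field, the `Input` and `Row` classes, and the input names are hypothetical, and the values are illustrative. It flags any input whose information postdates the row’s own timestamp, using the two-days-ago example above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Input:
    name: str
    value: float
    available_at: date  # hypothetical: when this information actually became known

@dataclass
class Row:
    timestamp: date  # the moment the model is supposed to be "standing" at
    inputs: list

def future_leaks(row):
    """Return the names of inputs whose information postdates the row's timestamp."""
    return [i.name for i in row.inputs if i.available_at > row.timestamp]

today = date(2024, 1, 10)  # arbitrary reference date for the example
two_days_ago = today - timedelta(days=2)
one_day_ago = today - timedelta(days=1)

row = Row(
    timestamp=two_days_ago,
    inputs=[
        Input("close_price", 101.5, available_at=two_days_ago),
        Input("volume_avg", 3.2e6, available_at=two_days_ago),
        Input("analyst_revision", 0.4, available_at=one_day_ago),  # known only yesterday
    ],
)

print(future_leaks(row))  # ['analyst_revision']
```

The check is only as good as the `available_at` stamps, so the hard part in practice is recording when each piece of information was truly known, not when it was later stored or revised.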

Sound confusing? That’s why future leaks are such tricky, sneaky things. You had best have a clear understanding of what data you are using and everything that has been done to it before it enters a model.
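One way to keep the temporal pedigree intact through a chain of calculations is to carry, alongside every value, the latest day whose information went into it, and to take the maximum whenever values are combined. This minimal sketch assumes integer day indices; the `Timed` class and `find_leaks` helper are hypothetical, and the prices are illustrative. It shows an upstream smoothed price that uses a future day tainting a downstream feature, even though the future price itself never reaches the model.

```python
from dataclasses import dataclass

NO_TIME = float("-inf")  # constants carry no time information

@dataclass(frozen=True)
class Timed:
    """A value plus the latest day whose information went into it (its temporal pedigree)."""
    value: float
    info_day: float  # day index, larger = later; float so constants can use -inf

    def _combine(self, other, op):
        # A result depends on both operands, so it inherits the later of the two info days.
        if not isinstance(other, Timed):
            other = Timed(other, NO_TIME)
        return Timed(op(self.value, other.value), max(self.info_day, other.info_day))

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __sub__(self, other):
        return self._combine(other, lambda a, b: a - b)

    def __truediv__(self, other):
        return self._combine(other, lambda a, b: a / b)

def find_leaks(features, as_of_day):
    """Return names of model inputs carrying information from after `as_of_day`."""
    return [name for name, f in features.items() if f.info_day > as_of_day]

# Prices on days 9, 10 and 11 (illustrative numbers). The row being built is day 10.
p9, p10, p11 = Timed(99.0, 9), Timed(100.0, 10), Timed(102.0, 11)

# Upstream step, perhaps done by a data supplier: a "smoothed" price that uses day 11.
smoothed = (p9 + p10 + p11) / 3
# Downstream step: p11 is never used directly, but its information rides along.
momentum = smoothed - p9

features = {"momentum": momentum, "last_change": p10 - p9}
print(find_leaks(features, as_of_day=10))  # ['momentum']
```

This only tracks arithmetic done through the class. In a real pipeline the same bookkeeping has to survive every join, aggregation and export, which is where pedigree is most easily lost.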

The problem is not limited to finance. It applies in any setting: industrial, marketing, everything.

– – –

Carl Cook is the founder and president of BioComp Systems, IntelliDynamics®, and has been practicing prediction and forecasting since the late 1970s. He can be reached at cmcook@intellidynamics.net.

