Lane-risk forecasting explained

Essay № iii · Filed by: Engineering desk · Date: February MMXXVI · Length: 3,100 words

Lane-risk forecasting: how we turned eight noisy signals into one decision-ready number.

Disruption forecasting is hard not because we lack data but because we have too much. The Signal Routes score is the answer to a smaller question: what would an operator do on a Monday morning if they had only one number for this lane?

§I

The eight inputs.

A lane-risk model with one input is a thermometer. A lane-risk model with eighty inputs is a stage. Signal Routes uses eight. The eight were chosen against a single criterion: did the input, in the historical record of disruption events from 2014 onwards, materially shift the score in a direction that, in hindsight, the operator would have acted on? Inputs that failed that test — and we tested twenty-six candidates — did not make it in. The remaining eight are below.

№	Input	Source class	Lag
1	Customs flow against baseline	Customs declarations · resolved entities	D + 1 to D + 14
2	Sanctions list movement on the lane	UK · EU · US · UN · 23 others	Real-time
3	Carrier schedule integrity	Carrier feeds, partnered	Hourly
4	Port queue and dwell time	AIS · port-authority feeds	Hourly
5	Bank chain stress on lane counterparties	Resolved bank graph	Daily
6	Inland haulage capacity	Carrier · partner feeds · scraped capacity boards	Daily
7	Weather and ocean state	ECMWF · NOAA · lane-specific	6-hourly
8	Regulatory window changes	Customs authority bulletins · 41 territories	D + 0 to D + 3

None of the eight is, on its own, a forecast. The customs flow against baseline is a measurement. The sanctions list movement is a feed. The carrier schedule integrity is a derivative. Each input carries its own noise, its own lag, its own systematic bias against particular lane classes. The job of the model is to fuse the eight into a number that an operator can act on, with the contributors named so that the operator can see why the number moved.

§II

Why a single black-box model fails.

The temptation, given eight inputs and a target, is to fit a single model (a gradient-boosted tree, a transformer, a recurrent network) against the joint history and ship the output. In the first prototype we shipped exactly that. It was wrong about the things we needed it to be right about, and right about the things we did not. It was right on average; it was wrong on the events.

The diagnosis was straightforward. A single model, trained against a joint history, learns the average behaviour of the joint history. The average behaviour is dominated by the periods in which no input is doing much; the events — the periods in which the operator needs the model most — are the long tail. The single model, in the long tail, regressed to the mean. The mean was not useful.

The fix was an ensemble that does not vote on the score but on the contributors. Each input has its own light model, fit against its own history with its own loss function. The eight light models produce eight contribution estimates per lane per week. The ensemble layer is a constrained sum, not a learned weighting, with the constraints derived from the methodology — sanctions list movement may move the score by up to so many points; weather may move it by up to so many; and so on. The score is the sum. The contributors are the eight terms.
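A minimal sketch of the constrained sum, for illustration only. The input names, the cap values and the choice to clamp negative estimates to zero are assumptions; the real constraints live in the versioned methodology and are not stated in this essay.

```python
from dataclasses import dataclass

# Hypothetical caps: the most points each input may add to the score.
# The real values are part of the versioned methodology, not this sketch.
CAPS = {
    "customs_flow": 15.0,
    "sanctions": 25.0,
    "carrier_schedule": 10.0,
    "port_queue": 15.0,
    "bank_stress": 10.0,
    "inland_haulage": 10.0,
    "weather": 10.0,
    "regulatory_window": 5.0,
}


@dataclass(frozen=True)
class LaneScore:
    score: float
    contributors: dict  # input name -> capped contribution, in points


def constrained_sum(estimates: dict, caps: dict = CAPS) -> LaneScore:
    """Clamp each light-model estimate to [0, cap], then add the terms."""
    missing = set(caps) - set(estimates)
    if missing:
        raise ValueError(f"missing contribution estimates: {sorted(missing)}")
    contributors = {
        name: min(max(estimates[name], 0.0), cap) for name, cap in caps.items()
    }
    return LaneScore(score=sum(contributors.values()), contributors=contributors)


# One lane, one week: eight raw estimates from the eight light models.
week = {
    "customs_flow": 4.0, "sanctions": 40.0, "carrier_schedule": 3.5,
    "port_queue": 12.0, "bank_stress": 0.0, "inland_haulage": 9.0,
    "weather": -2.0, "regulatory_window": 1.0,
}
result = constrained_sum(week)
print(result.score)
print(result.contributors)
```

Because nothing is learned at this layer, the eight terms returned alongside the score are exactly the quantities that were added up, which is what lets the contributors be named to the operator.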

§III

Our ensemble.

The eight light models, in current production, are six gradient-boosted trees, one Bayesian state-space model (for carrier schedule integrity), and one rule-based scorer (for regulatory window changes). The choice of model class per input was made against the historical record: which class produced the lowest residual against the held-out test set, with the smallest swings under input noise. The Bayesian state-space model is on the list because carrier schedules carry a structural lag that a gradient-boosted tree does not handle gracefully. The rule-based scorer is on the list because regulatory window changes are sparse, important, and best read as a structured event rather than a continuous signal.

The ensemble layer is a constrained sum. The constraints — the maximum contribution each input may make to the score — are part of the methodology and are versioned with it. When a constraint is changed, the change is published in the Library four release cycles in advance, with the calibration that motivated the change.

Ensemble at a glance

6 · gradient-boosted trees
1 · Bayesian state-space
1 · rule-based scorer
Σ · constrained sum, not learned weights
Constraints versioned · published

§IV

Calibration discipline.

A score of 70 must mean a 70 % probability of a disruption event of declared magnitude within the declared horizon. That is not a target; it is the definition. Calibration is verified against a rolling 52-week window per lane class — maritime container, maritime bulk, air freight, rail, road. The calibration plots are published in the Library against the corresponding release tag.
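One way to check that definition is a binned reliability table: group past scores into bands and compare the mean score in each band with the share of cases in which the event actually occurred. The sketch below assumes a 0 to 100 score, binary outcomes and 10-point bands; the function name and the banding are illustrative, not the published procedure.

```python
def reliability_table(scores, outcomes, bin_width=10):
    """Compare mean forecast with observed event frequency, per score band.

    scores   -- forecasts on a 0-100 scale
    outcomes -- 1 if a disruption of the declared magnitude occurred within
                the declared horizon, else 0
    Returns rows of (band_start, count, mean_score, observed_pct, gap).
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    bands = {}
    for s, o in zip(scores, outcomes):
        # A score of exactly 100 is folded into the top band.
        start = min(int(s // bin_width) * bin_width, 100 - bin_width)
        bands.setdefault(start, []).append((s, o))
    rows = []
    for start in sorted(bands):
        members = bands[start]
        n = len(members)
        mean_score = sum(s for s, _ in members) / n
        observed_pct = 100.0 * sum(o for _, o in members) / n
        rows.append((start, n, mean_score, observed_pct, observed_pct - mean_score))
    return rows


# Toy history: ten lanes scored 70 (seven disrupted), ten scored 30 (three disrupted).
scores = [70] * 10 + [30] * 10
outcomes = [1] * 7 + [0] * 3 + [1] * 3 + [0] * 7
table = reliability_table(scores, outcomes)
for row in table:
    print(row)
```

A perfectly calibrated history gives a gap of zero in every band; the size of the gap is the quantity a tolerance would be set against.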

Where the calibration drifts beyond a published tolerance — twice, so far, in the life of the product — new scores are suspended on the affected lane class, a notice is posted in the Library, and the score does not resume until the calibration is back within tolerance. We have not shipped an out-of-calibration score and let the operator find out from the disruption, and we will not. The two historical suspensions are documented; the dates, lane classes, durations and root causes are public.
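The suspension rule can be read as a simple gate on the weekly calibration gap. The sketch below is an assumption-laden illustration: the gap series, the tolerance value and the function name are hypothetical, and the real tolerance is the published one.

```python
def gate_scores(weekly_gaps, tolerance):
    """Decide, week by week, whether new scores are published or suspended.

    weekly_gaps -- calibration gap per week, in percentage points
    tolerance   -- largest absolute gap at which scores are still published
    Returns (status_per_week, notices), where notices records each change
    of state as (week_number, "suspended" | "resumed").
    """
    status, notices = [], []
    suspended = False
    for week, gap in enumerate(weekly_gaps, start=1):
        out_of_tolerance = abs(gap) > tolerance
        if out_of_tolerance and not suspended:
            notices.append((week, "suspended"))
        elif not out_of_tolerance and suspended:
            notices.append((week, "resumed"))
        suspended = out_of_tolerance
        status.append("suspended" if suspended else "published")
    return status, notices


# Hypothetical gaps for one lane class against a hypothetical 5-point tolerance.
status, notices = gate_scores([1.0, 2.5, 6.0, 7.5, 3.0], tolerance=5.0)
print(status)
print(notices)
```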

Horizon	MAPE · current	Calibration window	Released
1 week	1·8 %	Rolling 52 wk	v3.2
4 weeks	3·4 %	Rolling 52 wk	v3.2
12 weeks	8·1 %	Rolling 52 wk	v3.2
26 weeks	13·9 %	Rolling 52 wk	v3.2 · in test
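For reference, MAPE is the mean of the absolute errors, each taken as a percentage of the realised value. The sketch below uses the standard definition; which realised quantity the table's figures are measured against is not stated here, so the sample numbers are purely illustrative.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so that case is rejected.
    """
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actuals):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actuals, forecasts))
    return 100.0 * total / len(actuals)


# Illustrative values only: errors of 2 %, 3 % and 2 %.
example = mape([100.0, 200.0, 400.0], [98.0, 206.0, 392.0])
print(round(example, 2))
```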

§V

What operators actually do with it.

The most common move, observed across the carrier and forwarder workspaces, is the early conversion of marginal volume. A lane whose 4-week score moves from 30 to 65 with port queue and inland haulage named as the contributors is, for an experienced lane planner, a Monday-morning cue to convert a fraction of the lane's volume to an alternate while the conversion is still cheap. The 4-week MAPE of 3·4 % is, importantly, low enough that this move pays. Above a MAPE of roughly 6 % on the planning horizon, the conversion cost dominates the expected loss; below it, the conversion pays.
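The comparison a planner is making can be sketched as expected loss against conversion cost. Everything in the example is hypothetical (the exposed value, the loss rate, the conversion cost and the function name), and the sketch does not model how forecast error enters the decision; it only shows why the same lane can flip from "hold" to "convert" as the score moves.

```python
def conversion_pays(score, exposed_value, loss_rate, conversion_cost):
    """True when the expected loss on the volume exceeds the cost of moving it.

    score           -- lane score on a 0-100 scale, read as a probability
    exposed_value   -- value of the volume that could be converted
    loss_rate       -- share of that value lost if the disruption occurs
    conversion_cost -- cost of moving the volume to the alternate lane
    """
    expected_loss = (score / 100.0) * exposed_value * loss_rate
    return expected_loss > conversion_cost


# Hypothetical lane: 200,000 of exposed value, 10 % lost on disruption,
# 9,000 to convert. Expected loss is about 6,000 at a score of 30
# and about 13,000 at a score of 65.
before = conversion_pays(30, 200_000, 0.10, 9_000)
after = conversion_pays(65, 200_000, 0.10, 9_000)
print(before, after)
```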

The second-most common move is the L/C pricing adjustment. A trade-finance underwriter who reads the score and the contributors directly into the pricing model can write a facility whose terms reflect the lane state on the day the deal closes, with a documented audit trail back to the methodology and the calibration window. The contributors are stable across releases, so the pricing model does not need to be re-fit when the score moves; only the inputs change.

The 4-week MAPE of 3·4 % is the line beneath which the early conversion pays. The line is not editorial; it is calibrated. — Lane Risk Methodology v3.2 · §6

Read further.

For the customs-data layer that produces the inputs, see The hidden economy of customs data. For the operational discipline behind a Signal Routes briefing, see Practice. For the calibration plots, the change history of the constraints, and the historic suspensions, see the Library.
