Disruption forecasting is hard not because we lack data but because we have too much. The Signal Routes score is the answer to a smaller question: what would an operator do on a Monday morning if they had only one number for this lane?
A lane-risk model with one input is a thermometer. A lane-risk model with eighty inputs is a stage. Signal Routes uses eight. The eight were chosen against a single criterion: did the input, in the historical record of disruption events from 2014 onwards, materially shift the score in a direction that, in hindsight, the operator would have acted on? Inputs that failed that test — and we tested twenty-six candidates — did not make it in. The remaining eight are below.
| № | Input | Source class | Lag |
|---|---|---|---|
| 1 | Customs flow against baseline | Customs declarations · resolved entities | D + 1 to D + 14 |
| 2 | Sanctions list movement on the lane | UK · EU · US · UN · 23 others | Real-time |
| 3 | Carrier schedule integrity | Carrier feeds, partnered | Hourly |
| 4 | Port queue and dwell time | AIS · port-authority feeds | Hourly |
| 5 | Bank chain stress on lane counterparties | Resolved bank graph | Daily |
| 6 | Inland haulage capacity | Carrier · partner feeds · scraped capacity boards | Daily |
| 7 | Weather and ocean state | ECMWF · NOAA · lane-specific | 6-hourly |
| 8 | Regulatory window changes | Customs authority bulletins · 41 territories | D + 0 to D + 3 |
None of the eight is, on its own, a forecast. The customs flow against baseline is a measurement. The sanctions list movement is a feed. The carrier schedule integrity is a derivative. Each input carries its own noise, its own lag, its own systematic bias against particular lane classes. The job of the model is to fuse the eight into a number that an operator can act on, with the contributors named so that the operator can see why the number moved.
The temptation, given eight inputs and a target, is to fit a single joint model (a gradient-boosted tree, a transformer, a recurrent network) against the joint history and ship the output. We did, in the first prototype, ship exactly that. It was wrong about the things we needed it to be right about, and right about the things we did not. It was right on average; it was wrong on the events.
The diagnosis was straightforward. A single model, trained against a joint history, learns the average behaviour of the joint history. The average behaviour is dominated by the periods in which no input is doing much; the events — the periods in which the operator needs the model most — are the long tail. The single model, in the long tail, regressed to the mean. The mean was not useful.
The fix was an ensemble that works on the contributors rather than voting on the score. Each input has its own light model, fit against its own history with its own loss function. The eight light models produce eight contribution estimates per lane per week. The ensemble layer is a constrained sum, not a learned weighting, with the constraints derived from the methodology: sanctions list movement may move the score by up to so many points, weather by up to so many, and so on. The score is the sum. The contributors are the eight terms.
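A minimal sketch of a constrained sum follows. It assumes each light model emits a non-negative point estimate and that each input's constraint is an upper cap on its contribution. The input keys, the `CAPS` values and the `constrained_sum` helper are all hypothetical; the real caps are versioned with the methodology and are not given here.

```python
from typing import Dict, Tuple

# Hypothetical caps: the most points each input may add to the score.
# The real caps are versioned with the methodology and are not given in
# the text; these placeholders are chosen only so that they sum to 100.
CAPS: Dict[str, float] = {
    "customs_flow": 15.0,
    "sanctions_movement": 20.0,
    "carrier_schedule": 10.0,
    "port_queue": 15.0,
    "bank_stress": 10.0,
    "inland_haulage": 10.0,
    "weather": 10.0,
    "regulatory_window": 10.0,
}


def constrained_sum(raw: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Clip each light model's estimate to [0, cap] and add the terms.

    Returns the score together with the named contributors, so the caller
    can show why the number moved.
    """
    missing = CAPS.keys() - raw.keys()
    if missing:
        raise ValueError(f"missing contributions: {sorted(missing)}")
    contributors = {name: min(max(raw[name], 0.0), cap) for name, cap in CAPS.items()}
    return sum(contributors.values()), contributors


# Usage: the port-queue estimate overshoots its cap and is clipped to 15.
raw = {name: 0.0 for name in CAPS}
raw.update(port_queue=22.0, inland_haulage=8.5, weather=3.0)
score, parts = constrained_sum(raw)
top = sorted(parts.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(score, top)  # 26.5 [('port_queue', 15.0), ('inland_haulage', 8.5)]
```

Because nothing is learned in this layer, the contributors returned alongside the score are exactly the terms that were added. That is what lets an operator read off why the number moved.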
The eight light models, in current production, are six gradient-boosted trees, one Bayesian state-space model (for carrier schedule integrity), and one rule-based scorer (for regulatory window changes). The choice of model class per input was made against the historical record: which class produced the lowest residual against the held-out test set, with the smallest swings under input noise. The Bayesian state-space model is on the list because carrier schedules carry a structural lag that a gradient-boosted tree does not handle gracefully. The rule-based scorer is on the list because regulatory window changes are sparse, important, and best read as a structured event rather than a continuous signal.
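The selection criterion can be sketched as a two-stage rule: shortlist the classes whose held-out residual is close to the best, then keep the one that moves least when its inputs are nudged. The mean absolute residual, the relative `tolerance`, and the deterministic ±`sigma` perturbation are assumptions standing in for whatever residual and noise model were actually used. The function names are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# A fitted model is treated as a plain function from one feature row to a prediction.
Model = Callable[[Sequence[float]], float]


def held_out_residual(model: Model, X: List[Sequence[float]], y: List[float]) -> float:
    """Mean absolute residual on the held-out test set."""
    return mean(abs(model(x) - target) for x, target in zip(X, y))


def noise_swing(model: Model, X: List[Sequence[float]], sigma: float) -> float:
    """Mean absolute change in prediction when every input is shifted by -sigma and +sigma.

    A deterministic stand-in for 'swings under input noise'.
    """
    swings = []
    for x in X:
        base = model(x)
        for delta in (-sigma, sigma):
            shifted = [v + delta for v in x]
            swings.append(abs(model(shifted) - base))
    return mean(swings)


def select_model_class(candidates: Dict[str, Model], X: List[Sequence[float]],
                       y: List[float], tolerance: float = 0.05, sigma: float = 0.1) -> str:
    residuals = {name: held_out_residual(m, X, y) for name, m in candidates.items()}
    best = min(residuals.values())
    # Shortlist every class whose residual is within `tolerance` (relative) of the best.
    shortlist = [name for name, r in residuals.items() if r <= best * (1 + tolerance)]
    # Among the shortlist, prefer the class that swings least under input noise.
    return min(shortlist, key=lambda name: noise_swing(candidates[name], X, sigma))


X = [[1.0], [2.0], [3.0]]
y = [2.0, 4.0, 6.0]
candidates = {
    "linear": lambda x: 2 * x[0],          # exact on the test set, but moves with every wobble
    "stepped": lambda x: 2 * round(x[0]),  # equally exact, and flat under small noise
}
print(select_model_class(candidates, X, y))  # stepped
```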
The ensemble layer is a constrained sum. The constraints — the maximum contribution each input may make to the score — are part of the methodology and are versioned with it. When a constraint is changed, the change is published in the Library four release cycles in advance, with the calibration that motivated the change.
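One way to make the four-cycle notice rule checkable is to store each cap change with both the release in which it was announced and the release from which it applies. This is a sketch under the assumption that release cycles can be numbered as integers. `CapChange` and `cap_in_force` are hypothetical names, not part of any published schema.

```python
from dataclasses import dataclass
from typing import Iterable

NOTICE_CYCLES = 4  # from the text: changes are published four release cycles ahead


@dataclass(frozen=True)
class CapChange:
    """One published change to an input's cap. Field names are hypothetical."""
    input_name: str
    new_cap: float
    announced_in: int    # release cycle in which the change was published in the Library
    effective_from: int  # release cycle from which the new cap applies

    def respects_notice(self) -> bool:
        return self.effective_from - self.announced_in >= NOTICE_CYCLES


def cap_in_force(changes: Iterable[CapChange], input_name: str,
                 initial_cap: float, release: int) -> float:
    """Return the cap for `input_name` at `release`, applying changes in effective order."""
    cap = initial_cap
    for change in sorted(changes, key=lambda c: c.effective_from):
        if change.input_name != input_name or change.effective_from > release:
            continue
        if not change.respects_notice():
            raise ValueError(f"{change} was not published {NOTICE_CYCLES} cycles in advance")
        cap = change.new_cap
    return cap


weather_cut = CapChange("weather", new_cap=8.0, announced_in=12, effective_from=16)
print(cap_in_force([weather_cut], "weather", 10.0, release=15))  # 10.0
print(cap_in_force([weather_cut], "weather", 10.0, release=16))  # 8.0
```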
A score of 70 must mean a 70 % probability of a disruption event of declared magnitude within the declared horizon. That is not a target; it is the definition. Calibration is verified against a rolling 52-week window per lane class — maritime container, maritime bulk, air freight, rail, road. The calibration plots are published in the Library against the corresponding release tag.
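Checking that definition amounts to a reliability table. Bin the scores issued in the rolling window, then compare each bin's mean predicted probability with the fraction of lane-weeks in which the event actually occurred. The sketch below would be run once per lane class. The record shape, the fixed 10-point bins and the helper names are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Tuple

# One scored lane-week: (week, score on 0-100, did a disruption of the declared
# magnitude occur within the declared horizon?). The record shape is hypothetical.
Record = Tuple[date, float, bool]


def reliability_table(records: List[Record], as_of: date, window_weeks: int = 52,
                      bin_width: int = 10) -> Dict[int, Tuple[float, float, int]]:
    """Map each score bin's lower edge to (mean predicted probability, observed rate, count)."""
    start = as_of - timedelta(weeks=window_weeks)
    bins: Dict[int, List[Tuple[float, bool]]] = defaultdict(list)
    last_bin = 100 // bin_width - 1  # a score of exactly 100 joins the top bin
    for week, score, event in records:
        if start < week <= as_of:  # rolling window ending at as_of
            b = min(int(score // bin_width), last_bin)
            bins[b].append((score / 100.0, event))
    table = {}
    for b, rows in sorted(bins.items()):
        predicted = sum(p for p, _ in rows) / len(rows)
        observed = sum(1 for _, hit in rows if hit) / len(rows)
        table[b * bin_width] = (predicted, observed, len(rows))
    return table


def max_calibration_gap(table: Dict[int, Tuple[float, float, int]]) -> float:
    """Largest absolute gap between predicted probability and observed frequency."""
    return max(abs(pred - obs) for pred, obs, _ in table.values())


# Usage: ten recent lane-weeks scored 70, seven of which saw the event.
as_of = date(2024, 6, 3)
recent = [(as_of - timedelta(weeks=i), 70.0, i < 7) for i in range(10)]
stale = [(as_of - timedelta(weeks=60), 70.0, False)]  # outside the window, ignored
table = reliability_table(recent + stale, as_of)
print(table[70][2], round(max_calibration_gap(table), 6))  # 10 0.0
```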
Where the calibration drifts beyond a published tolerance — twice, so far, in the life of the product — new scores are suspended on the affected lane class, a notice is posted in the Library, and the score does not resume until the calibration is back within tolerance. We have not shipped an out-of-calibration score and let the operator find out from the disruption, and we will not. The two historical suspensions are documented; the dates, lane classes, durations and root causes are public.
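The suspension rule can be sketched as a small per-lane-class gate. It blocks new scores while the calibration gap exceeds the tolerance and records a dated notice on each transition, which is what would later document dates and durations. The tolerance value, the notice text and `LaneClassGate` itself are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class LaneClassGate:
    """Suspends new scores for one lane class while calibration is out of tolerance."""
    lane_class: str
    tolerance: float  # the published tolerance; its value is not given in the text
    suspended: bool = False
    notices: List[Tuple[date, str]] = field(default_factory=list)

    def check(self, as_of: date, calibration_gap: float) -> bool:
        """Record any state change and return True if new scores may be published."""
        out_of_tolerance = calibration_gap > self.tolerance
        if out_of_tolerance and not self.suspended:
            self.suspended = True
            self.notices.append((as_of, f"{self.lane_class}: new scores suspended"))
        elif not out_of_tolerance and self.suspended:
            self.suspended = False
            self.notices.append((as_of, f"{self.lane_class}: scores resumed"))
        return not self.suspended


gate = LaneClassGate("rail", tolerance=0.05)  # hypothetical tolerance
weekly_gaps = [0.02, 0.08, 0.07, 0.03]
allowed = [gate.check(date(2024, 1, 1 + 7 * i), g) for i, g in enumerate(weekly_gaps)]
print(allowed)            # [True, False, False, True]
print(len(gate.notices))  # 2
```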
| Horizon | MAPE · current | Calibration window | Released |
|---|---|---|---|
| 1 week | 1·8 % | Rolling 52 wk | v3.2 |
| 4 weeks | 3·4 % | Rolling 52 wk | v3.2 |
| 12 weeks | 8·1 % | Rolling 52 wk | v3.2 |
| 26 weeks | 13·9 % | Rolling 52 wk | v3.2 · in test |
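For reference, mean absolute percentage error at horizon $h$ is conventionally defined as below. This assumes the standard definition. The text does not say which realised quantity $A_t$ the forecast $F_{t,h}$ is scored against, so both symbols are placeholders, and $n$ is the number of scored periods in the calibration window.

```latex
\mathrm{MAPE}_h = \frac{100\,\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_{t,h}}{A_t} \right|
```

Lower is better, which is why the error in the table grows as the horizon lengthens from 1 week to 26 weeks.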
The most common move, observed across the carrier and forwarder workspaces, is the early conversion of marginal volume. A lane whose 4-week score moves from 30 to 65 with port queue and inland haulage named as the contributors is, for an experienced lane planner, a Monday-morning cue to convert a fraction of the lane's volume to an alternate while the conversion is still cheap. The 4-week MAPE of 3·4 % is, importantly, low enough that this move pays. Above a MAPE of roughly 6 % on the planning horizon, the conversion cost dominates the expected loss; below it, the conversion pays.
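The basic arithmetic behind the move can be sketched as an expected-value rule. Because a calibrated score reads directly as a probability, convert the marginal share when the probability times the loss per disrupted unit exceeds the cost of converting that unit. The cost figures, the 15 % marginal share and `convert_share` are hypothetical. The sketch does not reproduce the roughly 6 % MAPE threshold, which depends on lane economics the text does not give.

```python
def convert_share(score: float, loss_per_unit: float, cost_per_unit: float,
                  marginal_share: float) -> float:
    """Share of lane volume to move to an alternate under a simple expected-value rule.

    Converts the marginal share only when the expected loss avoided per unit
    (probability times loss) exceeds the cost of converting that unit.
    """
    probability = score / 100.0  # a calibrated score reads directly as a probability
    expected_loss_avoided = probability * loss_per_unit
    return marginal_share if expected_loss_avoided > cost_per_unit else 0.0


# Hypothetical lane economics: 1,000 lost per disrupted unit, 500 to convert one.
for s in (30, 65):
    print(s, convert_share(s, loss_per_unit=1000.0, cost_per_unit=500.0, marginal_share=0.15))
# 30 0.0
# 65 0.15
```

With these numbers the break-even score is 100 × 500 / 1,000 = 50, so a move from 30 to 65 crosses it.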
The second-most common move is the L/C pricing adjustment. A trade-finance underwriter who reads the score and the contributors directly into the pricing model can write a facility whose terms reflect the lane state on the day the deal closes, with a documented audit trail back to the methodology and the calibration window. The contributors are stable across releases, so the pricing model does not need to be re-fit when the score moves; only the inputs change.
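A sketch of why stable contributors spare the re-fit: if the pricing model prices each named contributor with a fixed loading, only the contributor values change from deal to deal. The per-term breakdown can then be kept as the audit trail. The loadings, the base margin and `lc_margin_bps` are hypothetical and are not drawn from any real underwriting model.

```python
from typing import Dict, Tuple

# Hypothetical loadings in basis points of margin per contributor point. Because
# the contributors are stable across releases, these stay fixed; only the
# contributor values fed in change from deal to deal.
LOADINGS_BPS: Dict[str, float] = {
    "sanctions_movement": 4.0,
    "bank_stress": 6.0,
    "port_queue": 1.5,
    "weather": 0.5,
}
BASE_MARGIN_BPS = 150.0  # hypothetical base margin


def lc_margin_bps(contributors: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return the facility margin and a per-contributor breakdown for the audit trail."""
    breakdown = {
        name: LOADINGS_BPS.get(name, 0.0) * points for name, points in contributors.items()
    }
    return BASE_MARGIN_BPS + sum(breakdown.values()), breakdown


margin, trail = lc_margin_bps({"bank_stress": 10.0, "port_queue": 4.0, "weather": 2.0})
print(margin)  # 217.0
```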
For the customs-data layer that produces the inputs, see The hidden economy of customs data. For the operational discipline behind a Signal Routes briefing, see Practice. For the calibration plots, the change history of the constraints, and the historic suspensions, see the Library.