Working with prediction samples

/v1/predict returns many sampled trajectories. This guide turns them into the things you actually need: a point forecast and a confidence band.

Why many trajectories

Temporis generates forecasts by sampling: it reads your history and draws a plausible continuation, the way a language model continues text. Each draw is one possible future, not the future. Request count draws and the response's predictions array contains count independent trajectories.

The point of having many is the spread. Where the trajectories agree, the model is confident; where they fan out, the future is genuinely uncertain. That spread is your uncertainty estimate — which is why a single sample is rarely what you want.

Set count high enough to resolve that spread. A handful of draws gives a jittery, unreliable picture; somewhere in the range of 20 to 100 resolves the distribution well for most series. More is smoother but not free.
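To see why a handful of draws gives a jittery picture, here is a minimal sketch that makes no API call. It uses synthetic standard-normal draws as a stand-in for the sampled values at one timestep (the distribution, the seed, and the p90_wobble helper are assumptions for illustration) and measures how much a p90 estimate moves from one batch of samples to the next at different values of count.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable


def p90_wobble(count, repeats=500):
    """Std. dev. of the p90 estimate across `repeats` independent batches of `count` draws."""
    draws = rng.standard_normal((repeats, count))  # synthetic stand-in for one timestep
    estimates = np.percentile(draws, 90, axis=1)   # one p90 estimate per batch
    return float(estimates.std())


wobble = {count: p90_wobble(count) for count in (5, 20, 50, 100)}
for count, sd in wobble.items():
    print(f"count={count:>3}  std of p90 estimate={sd:.3f}")
```

The wobble shrinks as count grows, but with diminishing returns, which is the trade-off behind stopping somewhere in the 20 to 100 range.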

Count costs tokens

Prediction is metered per model token, and every trajectory generates a full horizon of tokens. Doubling count roughly doubles the cost of the call. Use enough samples to see the spread clearly, then stop.
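As a back-of-the-envelope sketch of that scaling: the helper below and its numbers (a 24-step horizon, 4 tokens per step) are hypothetical, not published Temporis figures, and it counts only generated tokens, ignoring anything else that may be metered.

```python
def estimated_output_tokens(count, horizon_steps, tokens_per_step):
    """Rough estimate: every trajectory generates a full horizon of tokens."""
    return count * horizon_steps * tokens_per_step


# Hypothetical numbers for illustration only.
base = estimated_output_tokens(count=50, horizon_steps=24, tokens_per_step=4)
doubled = estimated_output_tokens(count=100, horizon_steps=24, tokens_per_step=4)

print(base, doubled, doubled / base)
```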

A point forecast

To collapse the samples into one number per future timestamp, aggregate across trajectories at each timestep. The median is a good default — it's robust to the occasional wild trajectory; the mean works too if you prefer it.
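A quick illustration of that robustness, using made-up values for a single timestamp across seven samples, one of which is a wild trajectory:

```python
from statistics import mean, median

# Made-up values at one timestamp; the last sample is a wild trajectory.
values = [98, 101, 99, 103, 100, 102, 400]

print("median:", median(values))         # barely notices the outlier
print("mean:  ", round(mean(values), 1))  # pulled well above the typical samples
```

The median stays with the bulk of the samples, while the single outlier drags the mean far above every other value.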

Each prediction's data rows share the same timestamps in the same order, so you can line them up by index. Here we request 50 samples from the hourly_orders profile and compute a median orders forecast per timestamp:

import os, requests
from statistics import median

token = os.environ["TEMPORIS_TOKEN"]

resp = requests.post(
    "https://api.temporis.co/v1/predict",
    headers={"Authorization": f"Bearer {token}"},
    json={"data_profile": "hourly_orders", "count": 50, "temperature": 1.0, "top_p": 0.9},
    timeout=60,  # fail instead of hanging if the server never responds
)
resp.raise_for_status()
body = resp.json()

# Each prediction's rows are in the same timestamp order, so transpose by index.
preds = body["predictions"]            # length == count (50)
rows = list(zip(*[p["data"] for p in preds]))  # group row i across all samples

point_forecast = []
for step in rows:
    ts = step[0]["timestamp"]          # shared timestamp for this step
    values = [row["orders"] for row in step]
    point_forecast.append({"timestamp": ts, "orders": median(values)})

for f in point_forecast[:3]:
    print(f["timestamp"], round(f["orders"], 1))

Confidence bands

The median gives you a center line; a band gives you the range around it. Compute per-timestamp quantiles across the samples — the 10th, 50th, and 90th percentiles are a common choice. The result is three lines: a low edge (p10), the center (p50), and a high edge (p90).

Read the band as "about 80% of sampled futures fall between p10 and p90 at this timestamp." A narrow band means the model is confident; a band that widens as you look further ahead is the model honestly telling you the far future is less certain than the near future.

import os, requests
import numpy as np

token = os.environ["TEMPORIS_TOKEN"]

resp = requests.post(
    "https://api.temporis.co/v1/predict",
    headers={"Authorization": f"Bearer {token}"},
    json={"data_profile": "hourly_orders", "count": 50, "temperature": 1.0, "top_p": 0.9},
    timeout=60,  # fail instead of hanging if the server never responds
)
resp.raise_for_status()
body = resp.json()

preds = body["predictions"]
timestamps = [row["timestamp"] for row in preds[0]["data"]]

# Shape: (count, horizon) — one row per sample, one column per timestep.
matrix = np.array([[row["orders"] for row in p["data"]] for p in preds])

p10, p50, p90 = np.percentile(matrix, [10, 50, 90], axis=0)

band = [
    {"timestamp": ts, "p10": lo, "p50": mid, "p90": hi}
    for ts, lo, mid, hi in zip(timestamps, p10, p50, p90)
]
for b in band[:3]:
    print(b["timestamp"], round(b["p10"], 1), round(b["p50"], 1), round(b["p90"], 1))
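To check the "about 80%" reading and to see a band widen with horizon, the sketch below swaps the API response for a synthetic (count, horizon) matrix of random walks. The data is an assumption for illustration, chosen because a random walk's uncertainty grows over time; the percentile logic is the same as above.

```python
import numpy as np

# Synthetic stand-in for the (count, horizon) matrix: 200 random walks of 24 steps.
rng = np.random.default_rng(0)
matrix = 100 + rng.standard_normal((200, 24)).cumsum(axis=1)

p10, p90 = np.percentile(matrix, [10, 90], axis=0)
width = p90 - p10  # band width at each timestep

# Share of samples that fall inside the band at each timestep.
inside = ((matrix >= p10) & (matrix <= p90)).mean(axis=0)

print("width at first and last step:", round(float(width[0]), 2), round(float(width[-1]), 2))
print("mean share inside band:", round(float(inside.mean()), 3))
```

Tracking width per timestep is a simple way to quantify how quickly confidence decays over the horizon.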

Choosing count, temperature, top_p in practice

Three knobs shape the samples, and they do different jobs:

  • count controls resolution — how finely the spread is sampled. It does not change the shape of the distribution, only how clearly you see it.
  • temperature (greater than 0) controls spread — higher values draw more diverse, wider-ranging futures; lower values cluster the samples near the model's most-likely path. About 1.0 is neutral.
  • top_p (0 to 1) trims the tails — lowering it cuts the unlikely extremes, so you get fewer surprising jumps; 1.0 keeps the full distribution.
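The ranges above can be checked before a request is sent. This is a minimal sketch: build_predict_payload is a hypothetical helper, not part of any Temporis client, and whether the API accepts the exact endpoints of the top_p range is an assumption.

```python
def build_predict_payload(data_profile, count=50, temperature=1.0, top_p=0.9):
    """Hypothetical helper: validate the sampling knobs, then return the JSON body."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if temperature <= 0:
        raise ValueError("temperature must be greater than 0")
    if not 0 <= top_p <= 1:  # assumption: endpoints treated as valid
        raise ValueError("top_p must be between 0 and 1")
    return {
        "data_profile": data_profile,
        "count": count,
        "temperature": temperature,
        "top_p": top_p,
    }


payload = build_predict_payload("hourly_orders", count=50, temperature=1.0, top_p=0.9)
print(payload)
```

The returned dictionary has the same shape as the json argument used in the requests above.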

Sensible starting points, depending on how jumpy your series is:

Series character                            count    temperature   top_p
Stable, smooth (steady demand)              20–30    0.8           0.9
Typical                                     50       1.0           0.9
Volatile, spiky (prices, bursty traffic)    80–100   1.1           0.95

Treat these as a place to begin. Draw a band, look at whether it's too tight to be honest or so wide it's useless, and adjust temperature first.
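One way to judge whether a band is too tight or too wide is to backtest it: forecast a period you already have actuals for and count how often the actuals land between p10 and p90. The sketch below assumes you have done that; the coverage helper is hypothetical and the numbers are made up. A share well below 0.8 hints that the band is too narrow, and a share well above it hints that the band is wider than it needs to be.

```python
def coverage(actuals, p10, p90):
    """Share of actual values that fall inside the [p10, p90] band."""
    hits = [lo <= y <= hi for y, lo, hi in zip(actuals, p10, p90)]
    return sum(hits) / len(hits)


# Made-up backtest: ten timesteps of actuals and the band forecast for them.
actuals = [102, 98, 110, 95, 120, 101, 99, 130, 97, 104]
p10 = [95, 94, 96, 93, 95, 94, 93, 95, 92, 96]
p90 = [108, 107, 109, 106, 110, 108, 107, 111, 106, 109]

share = coverage(actuals, p10, p90)
print(f"coverage: {share:.0%}")
```

Ten points is far too few to conclude much in practice; a longer backtest window gives a steadier coverage number.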

Ranking by loss

Sometimes you want a single representative path rather than a band — for a demo, a sanity check, or a quick "what does a typical future look like." Every trajectory carries a loss, and lower loss means the model considered that path more typical. Sort by it and take the front of the list:

# body is the parsed /v1/predict response
ranked = sorted(body["predictions"], key=lambda p: p["loss"])

most_typical = ranked[0]               # lowest loss = most representative path
for row in most_typical["data"][:3]:
    print(row["timestamp"], row["orders"])

# Or trim outliers: keep the most-typical 80% before computing a band.
keep = ranked[: int(len(ranked) * 0.8)]

For an actual forecast, prefer the median and a band over any single trajectory — the lowest-loss path is one plausible future, not an average of them. Ranking is most useful for picking a clean example or filtering outliers.
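Putting the outlier filter together with the band computation, as a sketch on synthetic data: the response below is fabricated in the same shape used above (a loss plus data rows with timestamp and orders), and the placeholder timestamps and band helper are assumptions for illustration.

```python
import numpy as np

# Synthetic response: 50 predictions, each with a loss and 6 data rows.
rng = np.random.default_rng(1)
body = {
    "predictions": [
        {
            "loss": float(rng.uniform(0.5, 2.0)),
            "data": [
                {"timestamp": f"t{i}", "orders": float(100 + rng.normal(0, 5))}
                for i in range(6)
            ],
        }
        for _ in range(50)
    ]
}

ranked = sorted(body["predictions"], key=lambda p: p["loss"])
keep = ranked[: int(len(ranked) * 0.8)]  # most-typical 80% (40 of 50)


def band(preds):
    """Per-timestep p10, p50, p90 across the given predictions."""
    matrix = np.array([[row["orders"] for row in p["data"]] for p in preds])
    return np.percentile(matrix, [10, 50, 90], axis=0)


p10, p50, p90 = band(keep)
print(len(keep), "trajectories kept;", "first-step band:", round(float(p10[0]), 1), round(float(p90[0]), 1))
```

A band computed after trimming no longer describes 80% of everything the model sampled, so say so wherever the trimmed band is shown.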
