Description
I'm attempting to backtest using Databento continuous-symbology data on CME futures, specifically MNQ. The built-in convert function, called like `databento.convert("mnq.zst", symbol="MNQ.v.0", output_filename="mnq.npz")`, appears to run successfully, but I cannot get a backtest to run on the output. I have also tried `symbol=None`; either way, the recorder never records any events and the elapsed time is zero.
Reading the file like so:

```python
import numpy as np

data = np.load("data/mnq.npz")["data"]
print("Data dtype:\n", data.dtype)
print("\nFirst row:\n", data[0])
```

OUTPUT>>>

```
Data dtype:
 [('ev', '<u8'), ('exch_ts', '<i8'), ('local_ts', '<i8'), ('px', '<f8'), ('qty', '<f8'), ('order_id', '<u8'), ('ival', '<i8'), ('fval', '<f8')]

First row:
 (3489660938, 1751979600000662016, 1751979600002752000, 22939.75, 1.0, 6865165450905, 130, 0.0)
```

Here is some additional code that checks the file:
```python
data = np.load("data/mnq_l3.npz")["data"]
print("Total events:", len(data))
print("Unique event types:", np.unique(data["ev"]))
print("Timestamp range:", data["exch_ts"][0], "→", data["exch_ts"][-1])
```

OUTPUT>>>

```
Total events: 492544398
Unique event types: [1342177290 1342177291 1610612747 2415919114 2415919115 2684354571
 3221225474 3221225475 3221225485 3489660930 3489660938 3489660939
 3489660940 3489660941 3758096386 3758096394 3758096395 3758096396
 3758096397]
Timestamp range: 1751979600000662016 → 1754827206396576768
```

I am using this code to attempt to backtest. With the default converted file, the backtest loop is never entered at all, indicating that no time elapses and zero records are recorded.
```python
import numpy as np
from numba import njit

from hftbacktest import (
    BacktestAsset,
    ROIVectorMarketDepthBacktest,
    Recorder,
    GTC,
    LIMIT,
)
from hftbacktest.stats import LinearAssetRecord

TICK_SIZE = 0.25
LOT_SIZE = 1
FEED_LATENCY = 30_000_000
ORDER_LATENCY = 30_000_000
DATA_FILE = 'data/mnq_l3.npz'

asset = (
    BacktestAsset()
    .data([DATA_FILE])
    .linear_asset(1.0)
    .l3_fifo_queue_model()
    .constant_order_latency(FEED_LATENCY, ORDER_LATENCY)
    .trading_qty_fee_model(0.01, 0.11)
    .tick_size(TICK_SIZE)
    .lot_size(LOT_SIZE)
)
hbt = ROIVectorMarketDepthBacktest([asset])
recorder = Recorder(1, 5_000_000)

@njit
def book_skew_strategy(hbt, recorder, skew_threshold):
    asset_no = 0
    tick_size = hbt.depth(asset_no).tick_size
    half_spread = tick_size * 4
    order_qty = 1.0
    next_order_id = 1
    while True:
        hbt.clear_inactive_orders(asset_no)
        depth = hbt.depth(asset_no)
        recorder.record(hbt)
        if hbt.elapse(100_000_000) != 0:
            break
        if depth.best_bid <= 1e-9 or depth.best_ask <= 1e-9:
            continue
        mid_price = (depth.best_bid + depth.best_ask) / 2.0
        bid_qty = depth.best_bid_qty
        ask_qty = depth.best_ask_qty
        if bid_qty <= 1e-9 or ask_qty <= 1e-9:
            continue
        skew = np.log(bid_qty) - np.log(ask_qty)
        orders = hbt.orders(asset_no)
        values = orders.values()
        while values.has_next():
            order = values.get()
            if order.cancellable:
                hbt.cancel(asset_no, order.order_id, False)
        if skew > skew_threshold:
            price = round((mid_price - half_spread) / tick_size) * tick_size
            hbt.submit_buy_order(asset_no, next_order_id, price, order_qty, GTC, LIMIT, False)
            next_order_id += 1
        elif skew < -skew_threshold:
            price = round((mid_price + half_spread) / tick_size) * tick_size
            hbt.submit_sell_order(asset_no, next_order_id, price, order_qty, GTC, LIMIT, False)
            next_order_id += 1
    return True

book_skew_strategy(hbt, recorder.recorder, 0.5)
hbt.close()
print("completed")

data = recorder.get(0)
stats = LinearAssetRecord(data).contract_size(5).stats(book_size=100_000_000)
print(stats.summary())
stats.plot()
```

The symbol is MNQ.v.0; with the dbn CLI tool I get this output:
```
$ dbn -m -J mnq.zst | jq
{
  "version": 3,
  "dataset": "GLBX.MDP3",
  "schema": "mbo",
  "start": "1751979600000000000",
  "end": "1754856000000000000",
  "limit": null,
  "stype_in": "continuous",
  "stype_out": "instrument_id",
  "ts_out": false,
  "symbol_cstr_len": 71,
  "symbols": [
    "MNQ.v.0"
  ],
  "partial": [],
  "not_found": [],
  "mappings": [
    {
      "raw_symbol": "MNQ.v.0",
      "intervals": [
        {
          "start_date": 20250708,
          "end_date": 20250811,
          "symbol": "42003472"
        }
      ]
    }
  ]
}
```

I tried manually loading the npz file with numpy and feeding it into the data array, to no avail, but the first few rows look like so:
```
[(3489660938, 1751979600000662016, 1751979600002752000, 22939.75, 1., 6865165450905, 130, 0.)
 (3489660939, 1751979600003959040, 1751979600004080128, 22939.75, 1., 6865165450905, 130, 0.)
 (3489660939, 1751979600003977984, 1751979600004089088, 22939.75, 1., 6865165450842, 130, 0.)
 (3489660939, 1751979600004066048, 1751979600004156160, 22939.75, 1., 6865165450791, 130, 0.)
 (3489660939, 1751979600004101888, 1751979600004200960, 22940.  , 1., 6865165450792, 130, 0.)]
```
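As an aside, the `ev` values in these rows can be decoded by hand. The sketch below assumes hftbacktest's event-flag layout (`EXCH_EVENT = 1 << 31`, `LOCAL_EVENT = 1 << 30`, `BUY_EVENT = 1 << 29`, `SELL_EVENT = 1 << 28`, with the event-type code in the low bits); `decode_ev` is just an illustrative helper, not part of the library.

```python
# Illustrative decoder for hftbacktest's 'ev' bitfield (assumed layout:
# top four bits are EXCH/LOCAL/BUY/SELL flags, low byte is the type code).
EXCH_EVENT = 1 << 31
LOCAL_EVENT = 1 << 30
BUY_EVENT = 1 << 29
SELL_EVENT = 1 << 28

def decode_ev(ev):
    return {
        "exch": bool(ev & EXCH_EVENT),
        "local": bool(ev & LOCAL_EVENT),
        "buy": bool(ev & BUY_EVENT),
        "sell": bool(ev & SELL_EVENT),
        "type": ev & 0xFF,
    }

# First row above: 3489660938 == 0xD000000A
print(decode_ev(3489660938))  # exch + local + sell, type code 10
```

If hftbacktest's L3 type codes apply here, a type code of 10 would be an add-order event, which is at least consistent with MBO data. Notably, the first rows all carry both the exchange and local flags.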
Printing `hbt.current_timestamp` shows `i64::MAX`, which is also a bit confusing. I checked the converted file for `u64::MAX` or `i64::MAX` in the timestamp columns and found zero instances. The Databento documentation says that `u64::MAX` indicates an invalid or NaN timestamp.
I can provide a smaller file to test with, but I would just like some guidance on getting this working.

Could it be the SOD snapshots from Databento causing this?

I have not tried a normal (non-continuous) MBO file, but I will try now and update this issue post; I do eventually want to get this working on continuous files. I have also tried generating a CSV, splitting it by day, and converting each day to an npz, but the result is the same: the current timestamp shows i64::MAX and no records are recorded.
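For reference, the sentinel/ordering check described above can be sketched like this (assuming `i64::MAX` is the relevant sentinel and that events should be nondecreasing in the timestamp columns; `diagnose` is an illustrative name, shown here on synthetic data rather than the real file):

```python
import numpy as np

I64_MAX = np.iinfo(np.int64).max  # value also seen in hbt.current_timestamp

def diagnose(ts):
    # Count sentinel timestamps and out-of-order entries in a timestamp column.
    sentinels = int(np.count_nonzero(ts == I64_MAX))
    out_of_order = int(np.count_nonzero(np.diff(ts) < 0))
    return sentinels, out_of_order

# Synthetic column: one out-of-order timestamp, no sentinels.
ts = np.array([10, 20, 15, 30], dtype=np.int64)
print(diagnose(ts))  # → (0, 1)
```

On the real file this would be run against both `data["exch_ts"]` and `data["local_ts"]` from the npz.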