Standards and Conventions
Data Normalization
Normalization simplifies data by standardizing formats, making it easier to work with. However, the process can also introduce errors of its own. This article explores these issues, the trade-offs involved, and the reasoning behind Blockhouse's schema design.
1. On Orderbook: Price Impact Regression
Orderbook data processing transforms raw order book snapshots into a structured format that supports meaningful market analyses such as price impact regression.
Feature Engineering
The data is augmented with calculated metrics such as order flow imbalance (OFI), mid-prices, and market depth. This involves measuring buying and selling pressure, taking the midpoint between the highest bid and lowest ask prices, and calculating the size of orders resting at different levels of the book.
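A minimal sketch of this step, assuming level-1 snapshots in a pandas DataFrame with hypothetical columns bid_price, bid_size, ask_price, and ask_size (the actual schema, and the exact OFI definition used, may differ):

```python
import numpy as np
import pandas as pd


def add_orderbook_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add mid-price, top-of-book depth, and level-1 order flow imbalance."""
    out = df.copy()

    # Mid-price: midpoint between the best bid and the best ask.
    out["mid_price"] = (out["bid_price"] + out["ask_price"]) / 2

    # Market depth at the top of the book: total size resting at level 1.
    out["depth"] = out["bid_size"] + out["ask_size"]

    # Level-1 OFI: signed size changes at the best bid and ask between
    # consecutive snapshots (one common definition, used here for illustration).
    prev = out.shift(1)
    bid_flow = (
        np.where(out["bid_price"] >= prev["bid_price"], out["bid_size"], 0)
        - np.where(out["bid_price"] <= prev["bid_price"], prev["bid_size"], 0)
    )
    ask_flow = (
        np.where(out["ask_price"] <= prev["ask_price"], out["ask_size"], 0)
        - np.where(out["ask_price"] >= prev["ask_price"], prev["ask_size"], 0)
    )
    out["ofi"] = bid_flow - ask_flow

    return out
```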
Resampling
A key normalization step involves resampling the data to a fixed time interval (e.g. 1s, 5s, or 10s). This further standardizes the data, smoothing out noise from high-frequency updates and ensuring the dataset has uniform spacing.
For instance, data points within each interval are grouped together, with the latest price, the sum of order flow, and the average market depth captured to represent that time period.
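Continuing the sketch above, and assuming the snapshot DataFrame (here called snapshots, a hypothetical name) is indexed by timestamp, the bucketing might look like this; the 1-second interval and aggregation choices are illustrative:

```python
# Resample to a fixed 1-second grid: the latest mid-price, the summed
# order flow, and the average depth represent each interval.
bars = add_orderbook_features(snapshots).resample("1s").agg(
    {"mid_price": "last", "ofi": "sum", "depth": "mean"}
)

# Keep the grid uniform when an interval contains no updates:
# carry the last known price forward and treat missing flow as zero.
bars["mid_price"] = bars["mid_price"].ffill()
bars["ofi"] = bars["ofi"].fillna(0)
```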
Regression Feature Creation
After the data is normalized, additional regression-friendly features are created. This involves calculating forward returns to estimate future price movements and generating lagged versions of the normalized OFI to capture short-term trends. For example, the data is shifted forward to represent future returns and backward to include past observations, creating a comprehensive set of predictors.
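A sketch of these features under the same assumptions, using an illustrative 10-bar return horizon and 5 OFI lags (the actual horizon and lag count are not specified here):

```python
HORIZON = 10  # forward-return horizon in resampled bars (illustrative)
N_LAGS = 5    # number of OFI lags to include (illustrative)

# Forward return: percentage change in mid-price over the next HORIZON bars,
# obtained by shifting the price series so each row sees its own future.
bars["fwd_return"] = bars["mid_price"].shift(-HORIZON) / bars["mid_price"] - 1

# Lagged OFI: past order-flow observations aligned to the current row.
for lag in range(1, N_LAGS + 1):
    bars[f"ofi_lag_{lag}"] = bars["ofi"].shift(lag)

# Drop rows at the edges of the sample where features are incomplete.
regression_data = bars.dropna()
```

A price impact regression can then fit fwd_return against the contemporaneous and lagged OFI columns.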