Backtesting on Hyperliquid: How to Validate a Strategy Against 6 Months of Real Data

By CMM Team - 20-Apr-2026

Every trader has a thesis. The Money Printer cohort goes net long, you follow, you make money. Funding spikes above 0.03% for twelve hours, you fade it, the rate normalizes, you pocket the carry. Liquidation risk crosses 70 on an asset where retail is over-leveraged, you wait for the cascade, you buy the flush. These are good hypotheses. The problem is that none of them are strategies until you test them against data that already happened and measure what would have actually occurred.

A backtest is the difference between "I think this works" and "this worked 63% of the time across 847 trades over six months with a Sharpe of 1.4 and a max drawdown of 8.2%." The first is a thesis. The second is an edge you can size.

This guide covers what data Hyperliquid makes available for backtesting, how to pull it through the API, how to structure a backtest that does not lie to you, the three ways backtests fail in practice (overfitting, lookahead bias, survivorship bias), how to run walk-forward validation to separate real edges from curve-fitted noise, and a complete worked example testing a cohort divergence strategy against six months of fills. By the end you will have the framework for testing any signal the API produces before you risk a dollar on it.

What historical data is available

Backtesting is only as good as the data behind it. On Hyperliquid, everything is onchain, which means the historical record is complete. HyperTracker exposes this data through several endpoints with different lookback windows.

Fills (raw trades). The /fills endpoint returns individual trade records going back roughly 6.5 months (from late July 2025 onward). Each fill includes the coin, side, size, price, timestamp, and the wallet that executed it. This is the rawest data available and the foundation for any price-based backtest. You can filter by coin, by wallet address, and by time window.

Position metrics. The /position-metrics/coin/{coin}/segment/{segmentId} endpoint returns per-coin, per-cohort aggregate metrics with up to 4 weeks of lookback. This is the data you need for cohort-based strategy testing: how was each segment positioned at each point in time? The 4-week window is shorter than fills, which means cohort-based backtests are limited to roughly a month of data per pull. For longer tests, you need to collect and store the data over time.

Positions (snapshots). The /positions endpoint returns position snapshots filtered by time, coin, wallet, and cohort. These are point-in-time snapshots of open positions, useful for reconstructing what the market looked like at any given moment. Note: positions are snapshots at irregular intervals, not a continuous feed. They capture state, not transitions.

Funding rates. The /funding/latest endpoint returns current funding. For historical funding, you would need to collect it over time or source it from Hyperliquid's own historical API. Funding is critical for any strategy that holds positions for more than a few hours, because the carry cost directly impacts P&L.

What you cannot backtest (yet). Cohort bias history is limited to a rolling 12-hour window. Historical order flow snapshots (stops, TPs, limit clusters) are current-only with no archival. If your strategy depends on these signals, you need to start collecting the data now and backtest once you have enough history.
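If you do need those signals later, the collection habit is simple: snapshot the endpoint on a schedule and append to an archive you control. A minimal sketch of that archive layer, assuming JSONL files on disk (the storage format and function names here are illustrative, not part of the HyperTracker API):

```python
import json
import time
from pathlib import Path

def append_snapshot(path, payload, ts=None):
    """Append one timestamped JSON record to a JSONL archive file."""
    record = {"ts": ts if ts is not None else int(time.time()), "data": payload}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

def load_archive(path):
    """Read back all snapshots, oldest first."""
    with open(path) as f:
        return [json.loads(line) for line in f]
```

Run the snapshot call from a cron job (hourly is plenty for cohort bias), pass each API response to append_snapshot, and in a few months you have a backtestable history that the API alone cannot give you.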

How to structure a backtest that does not lie to you

Most backtests are wrong. Not because the code has bugs, but because the structure of the test introduces biases that make losing strategies look profitable. Here are the rules for building a backtest you can trust.

Rule 1: No lookahead bias. At every point in the backtest, the strategy should only have access to data that was available at that moment. If your signal uses cohort positioning from 14:00 UTC, the entry cannot happen at 13:55 UTC. This sounds obvious, yet it is the most common backtest error. It creeps in when a DataFrame's signal column is calculated from future data, or when the same bar's close price is used both to generate the signal and to fill the entry.

Rule 2: Account for execution costs. Every entry and exit has a cost: trading fees (maker and taker), slippage (the difference between the price you wanted and the price you got), and funding payments if you hold through settlement intervals. A strategy with 0.3% average return per trade looks profitable until you subtract 0.05% maker fee on entry, 0.05% taker fee on exit (if you stop out), and 0.05% of slippage on each leg. Your 0.3% edge is now 0.1%, and one bad fill wipes a week of gains.

# Execution cost model
MAKER_FEE = 0.0002    # 0.02% maker
TAKER_FEE = 0.0005    # 0.05% taker
SLIPPAGE = 0.0003     # 0.03% estimated slippage per side

def apply_costs(entry_price, exit_price, side, holding_hours, funding_rate_per_hour):
    """Adjust P&L for realistic execution costs."""
    entry_cost = entry_price * (MAKER_FEE + SLIPPAGE)
    exit_cost = exit_price * (TAKER_FEE + SLIPPAGE)
    funding_cost = entry_price * funding_rate_per_hour * holding_hours

    if side == "long":
        raw_pnl = exit_price - entry_price
        net_pnl = raw_pnl - entry_cost - exit_cost - funding_cost
    else:
        raw_pnl = entry_price - exit_price
        net_pnl = raw_pnl - entry_cost - exit_cost + funding_cost  # shorts receive positive funding

    return net_pnl

Rule 3: Use out-of-sample data. Split your historical data into two periods. Use the first period (in-sample) to develop and tune the strategy. Use the second period (out-of-sample) to validate it. If the strategy works in-sample but fails out-of-sample, it was curve-fitted to the training data and has no real edge. A common split: use the first 4 months for development, the last 2 months for validation.
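The 4-month/2-month split from the rule above can be expressed as a one-cutoff partition. A minimal sketch, assuming your records are dicts carrying a sortable timestamp (ISO-8601 strings compare correctly as plain strings; the field name `ts` is an assumption):

```python
def split_by_date(rows, cutoff, key="ts"):
    """Split time-ordered records into in-sample (< cutoff)
    and out-of-sample (>= cutoff) halves."""
    in_sample = [r for r in rows if r[key] < cutoff]
    out_of_sample = [r for r in rows if r[key] >= cutoff]
    return in_sample, out_of_sample
```

The discipline matters more than the code: tune everything on `in_sample`, then run the frozen strategy once on `out_of_sample`. Peeking at the validation set, even once, turns it back into training data.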

Rule 4: Measure the right metrics. Win rate alone is meaningless. A strategy with 90% win rate and a 10:1 loss-to-win ratio loses money. The metrics that matter:

  • Sharpe ratio. Risk-adjusted return. Above 1.0 is good. Above 2.0 is excellent. Below 0.5 means the edge is too thin to trade after costs.
  • Maximum drawdown. The largest peak-to-trough decline. If you cannot stomach a 15% drawdown, do not trade a strategy with a 15% max drawdown in the backtest, because live drawdowns are always worse.
  • Trade count. A strategy with a 3.0 Sharpe on 12 trades is statistically meaningless. You need at least 30 trades for basic significance, 100+ for confidence.
  • Profit factor. Gross profits divided by gross losses. Above 1.5 is solid. Below 1.2 means the edge is razor-thin and execution variance will eat it.
  • Win rate + average win/loss ratio. Together these tell you the shape of the P&L distribution. A 40% win rate with a 3:1 win/loss ratio is a trend-following strategy. A 70% win rate with a 0.5:1 ratio is a mean-reversion strategy. Both can be profitable. Neither is obvious from win rate alone.
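All five metrics fall out of the same list of per-trade returns. A minimal sketch, assuming fractional returns (0.01 = +1%) and a naive one-trade-per-day annualization for the Sharpe (swap `periods_per_year` for your actual trade frequency):

```python
import math

def backtest_metrics(trade_returns, periods_per_year=365):
    """Compute core backtest metrics from per-trade fractional returns."""
    n = len(trade_returns)
    wins = [r for r in trade_returns if r > 0]
    losses = [r for r in trade_returns if r < 0]
    mean = sum(trade_returns) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in trade_returns) / n)
    # Max drawdown on the compounded equity curve
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in trade_returns:
        equity *= 1 + r
        peak = max(peak, equity)
        max_dd = max(max_dd, (peak - equity) / peak)
    gross_profit = sum(wins)
    gross_loss = -sum(losses)
    return {
        "trades": n,
        "win_rate": len(wins) / n,
        "sharpe": (mean / std) * math.sqrt(periods_per_year) if std > 0 else float("inf"),
        "max_drawdown": max_dd,
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else float("inf"),
        "avg_win_loss": ((gross_profit / len(wins)) / (gross_loss / len(losses))
                         if wins and losses else float("inf")),
    }
```

Feed it the cost-adjusted returns from apply_costs, never the raw ones, or every metric will flatter the strategy.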

Backtest Metrics Dashboard

The three ways backtests fail

Overfitting

You tune the divergence threshold from 0.25 to 0.27 because 0.27 produces a better Sharpe in the backtest. Then you adjust the holding period from 24 hours to 18 hours because 18 is slightly better. Then you add a filter for time-of-day because the strategy works better during Asian hours. Each adjustment improves the backtest by a fraction. Together they produce a strategy that is perfectly fitted to the historical data and will fail immediately in live trading because none of those parameter choices generalize.

The fix: minimize the number of free parameters. A strategy with one threshold and one holding period has 2 parameters. A strategy with 7 filters and 5 thresholds has 12 parameters and is almost certainly overfit. If your strategy needs more than 3 free parameters to be profitable, it is not a strategy. It is a description of the past.

Lookahead bias

Your signal function accidentally uses data from the future. The most common version: you calculate the daily cohort divergence using end-of-day data, but your entry happens at the start of the day. In the backtest this looks like a perfect predictor because the signal uses information the strategy would not have had at entry time.

The fix: lag every signal by at least one bar. If your data is 5-minute bars, the signal calculated at bar N should only generate an entry at bar N+1 at the earliest. Build this into the data pipeline, not as an afterthought.
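Building the lag into the pipeline can be as simple as shifting the signal series before it ever meets the price series. A minimal sketch (function names are illustrative):

```python
def lag_signals(signals, bars=1):
    """Shift a signal series forward so the signal computed at bar N
    can only act at bar N + bars."""
    return [None] * bars + signals[:-bars] if bars > 0 else list(signals)

def long_entries(bar_opens, lagged_signals):
    """Pair each actionable long signal with the open of the bar
    where it is allowed to execute."""
    return [(i, bar_opens[i]) for i, s in enumerate(lagged_signals) if s == "long"]
```

Because the shift happens once, upstream, every downstream consumer of the signal series is lookahead-safe by construction instead of by per-strategy discipline.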

Survivorship bias

You backtest a strategy on BTC and ETH because they are the most liquid assets. But you are only testing them because they survived and grew. You are not testing the assets that were liquid 6 months ago and are dead today. The strategy might only work on assets that went up, which is not a strategy but a filter for past winners.

The fix: include all assets that were tradeable during the backtest period, not just the ones that are tradeable today. If an asset was listed on Hyperliquid in August 2025 and delisted in December 2025, it should still be in your backtest universe for that period.
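Constructing that universe is an interval-overlap check: an asset belongs in the backtest if its listed lifetime overlaps the test window at all. A minimal sketch, assuming a hypothetical mapping of asset to (listed, delisted) timestamps that you maintain yourself:

```python
def tradeable_universe(listings, window_start, window_end):
    """Return every asset live at any point inside the backtest window,
    including assets delisted since.

    listings: {asset: (listed_ts, delisted_ts_or_None)} -- structure assumed,
    not an API response format."""
    universe = []
    for asset, (listed, delisted) in listings.items():
        alive_until = delisted if delisted is not None else float("inf")
        if listed <= window_end and alive_until >= window_start:
            universe.append(asset)
    return sorted(universe)
```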

Walk-forward validation

Walk-forward is the gold standard for backtesting because it simulates how the strategy would actually be deployed: train on historical data, trade on new data, then retrain as more data arrives.

The process:

  1. Window 1 (train). Use months 1 through 3 to optimize your strategy parameters.
  2. Window 1 (test). Apply those parameters to month 4 without changing anything. Record the results.
  3. Window 2 (train). Slide forward: use months 2 through 4 to re-optimize.
  4. Window 2 (test). Apply to month 5. Record results.
  5. Repeat until you run out of data.

The final performance is the aggregate of all test windows. This is the closest approximation to live performance because each test window uses parameters that were fitted to data the strategy had never seen.

If the walk-forward Sharpe is above 1.0 and the degradation from in-sample to out-of-sample is less than 50%, the strategy has a real edge. If the walk-forward Sharpe drops below 0.5 or the degradation exceeds 50%, the in-sample performance was noise.

def walk_forward(data, train_months=3, test_months=1):
    """Run walk-forward validation and return aggregate results.

    `data` is assumed to be one row per day; `optimize`, `run_strategy`,
    and `aggregate_results` are strategy-specific and defined elsewhere.
    """
    results = []
    total_months = len(data) // 30  # approximate: 30 daily rows per month

    for start in range(0, total_months - train_months - test_months + 1):
        train_start = start * 30
        train_end = (start + train_months) * 30
        test_start = train_end
        test_end = (start + train_months + test_months) * 30

        train_data = data[train_start:train_end]
        test_data = data[test_start:test_end]

        # Optimize parameters on the training window only
        best_params = optimize(train_data)

        # Evaluate those frozen parameters on the unseen test window
        test_result = run_strategy(test_data, best_params)
        results.append(test_result)

    return aggregate_results(results)
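The acceptance thresholds described above (walk-forward Sharpe at least 1.0, degradation under 50%) can be encoded as a small verdict function so the decision is mechanical rather than mood-driven. A sketch, using the same cutoffs as the text:

```python
def walk_forward_verdict(in_sample_sharpe, walk_forward_sharpe):
    """Classify a walk-forward result using the thresholds from the text."""
    degradation = (1 - walk_forward_sharpe / in_sample_sharpe
                   if in_sample_sharpe > 0 else 1.0)
    if walk_forward_sharpe >= 1.0 and degradation < 0.5:
        return "real edge", degradation
    if walk_forward_sharpe < 0.5 or degradation >= 0.5:
        return "noise", degradation
    return "marginal", degradation
```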

Walk Forward Timeline

Worked example: testing cohort divergence

Here is a concrete backtest of the simplest cohort signal: go long when the Money Printer cohort is significantly more long than the Giga-Rekt cohort, go short when the opposite is true.

Setup:

  • Asset: BTC
  • Signal: cohort divergence (Money Printer vs Giga-Rekt) threshold 0.25
  • Entry: market order at the next 5-minute bar after signal triggers
  • Exit: 24-hour time-based exit (close at any price after 24 hours)
  • Position sizing: 1% risk per trade, stop at 3% from entry
  • Costs: 0.02% maker entry, 0.05% taker exit, 0.03% slippage each way, hourly funding applied

import requests
from datetime import datetime, timedelta

API_BASE = "https://ht-api.coinmarketman.com/api/external"
HEADERS = {"Authorization": "Bearer YOUR_JWT_TOKEN"}

def fetch_cohort_history(coin, segment_id, start, end):
    """Pull historical cohort positioning. 4-week max per call."""
    return requests.get(
        f"{API_BASE}/position-metrics/coin/{coin}/segment/{segment_id}",
        headers=HEADERS,
        params={"start": start, "end": end},
    ).json()

def fetch_fills(coin, start, end):
    """Pull historical fills for price data."""
    return requests.get(
        f"{API_BASE}/fills",
        headers=HEADERS,
        params={"coin": coin, "start": start, "end": end},
    ).json()

# Pull data
money_printer = fetch_cohort_history("BTC", 8, "2026-03-01T00:00:00Z", "2026-03-28T00:00:00Z")
giga_rekt = fetch_cohort_history("BTC", 15, "2026-03-01T00:00:00Z", "2026-03-28T00:00:00Z")

# Align timestamps, compute divergence, generate signals
# ... (alignment logic depends on the response format)

# For each signal:
#   entry = next bar's open
#   exit = 24 hours later
#   apply costs
#   record P&L

The code above is the data-fetching skeleton. The full backtest engine (signal alignment, position tracking, cost model, metrics calculation) adds roughly 150 lines. The point is not to ship a production backtester in a blog post. It is to show that the data exists, the endpoints are real, and the framework for testing a hypothesis is straightforward once you have the structure right.
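The elided alignment-and-signal step could look something like this. Note the assumptions: each cohort series is reduced to a mapping of timestamp to net-long ratio in [-1, 1], and only timestamps present in both series are compared (the exact field names in the real response may differ):

```python
def divergence_signals(mp_series, gr_series, threshold=0.25):
    """Compute cohort divergence per timestamp and emit long/short signals.

    mp_series / gr_series: {timestamp: net_long_ratio in [-1, 1]}
    -- shape assumed for illustration, not the actual API response format."""
    signals = {}
    for ts in sorted(set(mp_series) & set(gr_series)):  # aligned timestamps only
        div = mp_series[ts] - gr_series[ts]
        if div >= threshold:
            signals[ts] = "long"   # Money Printers notably more long
        elif div <= -threshold:
            signals[ts] = "short"  # Money Printers notably more short
    return signals
```

Each signal would then be lagged one bar, filled at the next bar's open, held 24 hours, and passed through apply_costs before its P&L is recorded.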

What to expect from results. A well-constructed cohort divergence backtest on BTC typically shows a moderate positive Sharpe (0.8 to 1.5) on 4-hour to daily timeframes, with drawdowns in the 5% to 12% range. The signal works best during trending periods and underperforms during ranges. The walk-forward degradation is usually 20% to 40%, which is within the acceptable range for a real edge.

If your backtest shows a Sharpe above 3.0 or a win rate above 80%, something is wrong. Either the cost model is missing, there is lookahead bias, or the parameter space is overfit. Real edges are modest. The traders who compound them are the ones who trust the numbers over their intuition and size accordingly.

From backtest to live: the deployment checklist

A strategy that passes walk-forward validation is ready for paper trading, not live deployment. Here is the bridge:

Paper trade for 2 weeks. Run the strategy with real signals but no execution. Compare the theoretical fills to what the market actually offered at entry time. If the paper P&L is more than 30% worse than the backtest, the cost model needs adjustment.

Deploy at quarter size. Start with 25% of the intended position size. Run for 2 more weeks. This catches execution issues (partial fills, API latency, order rejections) that paper trading cannot simulate.

Scale to full size. If the quarter-size results are within 20% of the backtest, scale up. If they are worse, investigate why before adding size.

Monitor continuously. Compare rolling 30-day live performance to the backtest benchmark. If live Sharpe drops below 50% of the backtest Sharpe for more than 2 weeks, halt and re-evaluate. The regime may have changed.
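That halt condition can also be mechanized. A sketch, assuming daily live returns and comparing a naive per-period Sharpe against a floor (note this per-period Sharpe is deliberately unannualized, so compute the backtest benchmark the same way):

```python
import math

def sharpe(returns):
    """Naive per-period Sharpe (no annualization) for monitoring."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / len(returns)
    return mean / math.sqrt(var) if var > 0 else float("inf")

def should_halt(live_daily_returns, backtest_sharpe, window=30, floor=0.5):
    """Halt if the rolling-window live Sharpe falls below
    floor * backtest Sharpe, per the checklist above."""
    if len(live_daily_returns) < window:
        return False  # not enough live history yet
    return sharpe(live_daily_returns[-window:]) < floor * backtest_sharpe
```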

HyperTracker's free tier allows 100 requests per day: enough to pull a month of cohort positioning data for one asset in a single session. For production backtesting across multiple assets and longer timeframes, the Pulse tier ($179/mo, 50K requests) is the natural starting point.

Get free API access

Closing thoughts

A thesis is not a strategy. A strategy is not an edge. An edge is not a system. Each step requires the one before it, and the step that separates hobbyists from professionals is the backtest. Not because backtests are always right, but because a properly constructed backtest tells you what a strategy can do, what it cannot do, and how much it costs to find out.

The data is there. Six months of fills. Four weeks of cohort positioning per pull. Real prices, real trades, real wallets. The only thing between a hypothesis and an answer is the test.

The traders who trust a backtest over their gut are not less intuitive. They are more honest about what they do not know. The backtest does not tell you the future. It tells you what happened the last 847 times this signal fired. That is enough.