working notes

@ PL Capital

June 11, 2026

Market Gaps Outline

Goal: identify gaps in web3/crypto I’d want to write an RFS thesis on – plural, a point of view on where the opportunities are. Map the space, find gaps, each framed as a problem with a “why not”.

Intro: framing around the market structure / mechanism design / institutional-lag thesis – gaps coherent around a worldview. List 4-5 gaps each as the problem (i.e., who’s bleeding), why now (catalyst), + shape of the solution / why it’s hard. One could be focused on batch-settlement for event markets. Close with directionality (where to start).

Options:

Mechanism layer for on-chain event / prediction markets
MEV / extraction at the application layer beyond spot DEX
On-chain market-structure data/analytics infra
Cross-venue settlement/clearing as prediction markets fragment
something in stablecoin/RWA settlement-rails
something in prover/coprocessor infra

ATS structures

June 22, 2026

post-FBA/proof-of-neutrality empirics; future directions

Thin token makers are severely bled (-5.5 —> -24.5¢, 100% negative, deepen with greater horizon – ∆2 —> ∆30). These LPs are economically small in dollar figures only because the markets they act on are themselves quite shallow.

In a thicker market, adverse selection is a transfer from LP to informed trader – a cost, what my initial thesis was focused on preventing.

In the tail market cases I’m now considering – pure-information markets with little uninformed hedging flow – Milgrom-Stokey applies: with common priors and no liquidity motive, rational traders won’t trade, because every counterparty is trying to pick you off. These markets can’t form under a CLOB at all. They can only exist if someone is willing to lose money to buy information – i.e., a subsidizer (that’s what LS-LMSR is and was designed for).

Here, the adverse selection isn’t extraction to be minimized – it’s the sponsor purchasing information aggregation. The informed trader bleeding the market maker is doing the sponsor a favor: revealing their private signal into the price the sponsor is paying to discover.

So, the neutrality/anti-extraction frame inverts. The tail thesis isn’t “minimize adverse selection.” It’s “maximize information revealed per subsidy dollar.”

Potential Directions Based on this Framing: market formation for the sponsored tail

Markout-adaptive liquidity – my monitor becomes a control signal, not a sales tool. LS-LMSR’s open problem is that it ties the liquidity parameter b to volume, but volume ≠ information – markets can have high uninformed churn or low informed trade, and b-on-volume can’t tell the two apart (it also breaks the proper-scoring-rule property and lets prices sum > 1).
- Markout is a live estimate of how informed the flow actually is. An AMM whose liquidity adapts to a realtime adverse-selection signal – tightening for uninformed flow, repricing depth when markout goes persistently negative – could be a new primitive.
- The monitor I’ve just launched could help measure that input. This is the cleanest way my measurement work could survive the negative result – acting as a sensor in a loop.
Subsidy-optimal scoring-rule selection, made verifiable on-chain.
- we could reframe the design object: given budget B, choose the cost-function market maker that maximizes mutual information between final price and outcome.
  - i.e., Abernethy-Frongillo’s framework, which unifies scoring rules and convex cost functions
- A sponsor would specify that they have “$X” and want the most informative price on outcome “Y”, and get an optimally-parameterized AMM.
- Crypto primitives aforementioned would be particularly useful, unlike batch-clearing where the zk-proof was solution hunting for a problem.
Resolution cost as the real bottleneck.
- the on-chain LMSR attempts (Augur, Gnosis/Omen) didn’t die on the liquidity primitive – they died on resolution, gas, and UX. The marginal cost of trustworthy resolution does not scale down to micro-markets; UMA/Kleros don’t get you to per-market cost ~ 0.
- Could create a settlement-layer mechanism – amortized/batch disputes, or markets that resolve to their own pre-resolution price as a Schelling point unless bonded-disputed.
  - This is higher leverage than anything at the clearing layer.
Flow segmentation as mechanism design.
- I personally find depth-on-uninformed-hedging-flow quite societally useless, but at the end of the day that’s the flow that pays LPs – the two flow types simply require different mechanisms.
  - Hedgers want cheap immediacy (CLOB/CPMM is fine)
  - informed traders want a discovery mechanism whose price others believe
- Most venues run one mechanism for both. The defensible primitive here is the router / self-selection mechanism that sorts flow by information content and clears each on its own rail.
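A minimal sketch of the markout-adaptive liquidity idea from the first direction above, under loud assumptions. The LMSR cost function C(q) = b · ln Σ exp(q_i / b) and its softmax prices are standard, and the sponsor's worst-case loss is b · ln(n) for n outcomes, which is what ties a budget B to a choice of b. Everything in the `MarkoutAdaptiveB` controller is hypothetical: the rolling window, the step size, and the direction of the rule (shrink b when rolling maker markout is negative, grow it otherwise) are one possible reading of the idea, not a tested design. Changing b mid-market also changes the cost function, so the usual bounded-loss guarantee no longer applies as-is.

```python
import math
from collections import deque


def lmsr_cost(q, b):
    """Standard LMSR cost C(q) = b * ln(sum_i exp(q_i / b)), computed stably."""
    m = max(q)
    return m + b * math.log(sum(math.exp((qi - m) / b) for qi in q))


def lmsr_prices(q, b):
    """Instantaneous LMSR prices (softmax of q / b); they always sum to 1."""
    m = max(q)
    w = [math.exp((qi - m) / b) for qi in q]
    z = sum(w)
    return [wi / z for wi in w]


class MarkoutAdaptiveB:
    """Hypothetical controller: adjust b from a rolling maker-side markout signal."""

    def __init__(self, b0, b_min, b_max, window=50, step=0.1):
        self.b, self.b_min, self.b_max, self.step = b0, b_min, b_max, step
        self.markouts = deque(maxlen=window)

    def update(self, maker_markout_cents):
        self.markouts.append(maker_markout_cents)
        if len(self.markouts) < self.markouts.maxlen:
            return self.b  # not enough observations yet, leave b unchanged
        avg = sum(self.markouts) / len(self.markouts)
        # Assumed rule: persistently negative markout -> less depth; else more.
        factor = (1 - self.step) if avg < 0 else (1 + self.step)
        self.b = min(self.b_max, max(self.b_min, self.b * factor))
        return self.b
```

The sensor input (`maker_markout_cents`) is where the monitor would plug in; the controller itself is deliberately trivial so the loop structure is visible.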

The canonical instance of each: the biotech milestone market

No uninformed hedging flow, requires a subsidizing maker (potentially LS-LMSR), hard to resolve (scientific endpoint oracle), and – the part that closes the loop – the sponsor wants the information because the milestone probability re-prices the regulated security on my other layer (tokenized).

The dual-layer architecture might be related – it could act as a worked example where the security layer is the sponsor with a quantifiable economic reason to fund information aggregation, and the price feeds back into asset valuation.

Full thesis: information aggregation —> asset repricing —> more subsidy

Final Thesis:

verifiable, sponsored aggregation in a high-value vertical with an objective resolution & a closed loop to a regulated security

June 22, 2026

Proof of Neutrality

Splits into a measurement tool and a cryptographic guarantee

“here’s how much the LPs are bleeding on Polymarket / Kalshi right now, measured from live data”
verifiable-neutral-clearing primitive (prove that a batch cleared fairly on-chain, sealed-bid so the block producer (BP) can’t front-run / extract MEV).

Data Accessibility

Recon Prompt: open the kalshi-polymarket-microstructure repo, report exactly how it ingests Polymarket data at the moment (API vs. on-chain, what fields, whether aggressor direction is present or inferred), identify the cheapest path to on-chain CTF Exchange fills with ground-truth aggressor sign, and sketch the minimal historical-backtest pipeline to compute aggregate maker-side adverse-selection markout – sketch only, no build.

Results:

Part 1 didn’t observe real trades at all. It computed fills by a price-through proxy on hypothetical passive orders over 30-second L2 book snapshots – so the negative-markout finding is real only within that proxy, and it’s two inference layers removed from ground truth: it never saw a real trade, and it never had a real aggressor sign. That’s a huge gap – my historical proof is a model of adverse selection on hypothetical orders, not a measurement of it on actual ones.
- The live on-chain tool isn’t a productization of a proven result – it’s the first actual measurement of the thing I’ve only modeled thus far.
Architectural Verdict: fusion is required. On-chain OrderFilled gives us ground-truth fills, aggressor side, and maker addresses – but no mid-price. Markout requires a mid to mark against, and the mid only exists in the CLOB API book. So the tool must fuse two feeds: ON-CHAIN (ground-truth aggressor + addresses + match time) and the API Book (the mid reference), time-aligned by tx/order hash. And the agent flagged the sharp edge: on-chain block_timestamp is settlement time, which lags the off-chain match by a little bit. If I align the mid to settlement time instead of match time, my markout is measuring the wrong window – contaminated in exactly the way the sim’s terminal-vs-fill-time markout was.
- The bridge (join on-chain OrderFilled to CLOB /trades by tx/order hash to recover the true match time) is what has to be right.
v1 CTF Exchange is legacy; v2 (live since late-April, address ...B996B) is the build target, plus a separate Neg Risk Exchange for multi-outcome markets whose address the recon could not confirm – missing it means under-counting multi-outcome fills. So the build targets v2, and confirming the Neg Risk Exchange address is a pre-build task.

Next Steps:

Build 1: on-chain v2 OrderFilled decode —> ground-truth maker/taker/side —> aggregate maker-side markout against a trade-price proxy, in-memory.

Stage 0: Contract Verification

verified everything against primary sources, caught a real error in the prior recon, and surfaced 2 findings that change the Stage 1 plan.

Contract Verification PASSED. Both v2 addresses are confirmed against PolygonScan name tags and official Polymarket docs (primary sources, not blog guesses – exactly the bar CLAUDE.md §2 set). It also caught that the prior recon’s Neg Risk candidate was the v1 address – the v2 pair is 0xE111.../0xe2222... with matched vanity prefixes. It even recorded the deprecated v1 addresses in contracts.py to prevent accidentally subscribing to them.

Findings:

The two target contracts share the same v2 topic0, because both the CTF Exchange v2 and the Neg Risk CTF Exchange v2 run the same CTFExchange v2 source. Covering both multi-outcome and binary markets is quite simple – one event signature, one topic0, subscribed across two addresses. The multi-outcome coverage I insisted on costs nothing extra.
v2 makes aggressor direction explicit on-chain. The maker’s side is now an explicit uint8 field in the event (BUY = 0, SELL = 1), so direction is ground truth. The agent worked out the exact recovery rule (aggressor = taker; maker = SELL --> aggressor bought; price = amount ratio).
- The tokenId --> (market, YES vs NO) mapping needs OFF-CHAIN METADATA but the agent correctly notes that’s not needed for direction or price (both fully on-chain), so it’s properly out of scope for Build #1.

Risks Moving Forward:

maker-event vs taker-event de-duplication. A single match emits N per-maker OrderFilled events plus a taker aggregate plus 1 OrdersMatched.

Stage 1: Live On-Chain Stream + Decode (NO markout yet).

Task 1: de-duplication confirmation on a live tx – gated first on purpose. The PolygonScan ABI API now needs an Etherscan v2 key, so the agent couldn’t solve this in Stage 0. Stage 1 routes around that by using the RPC directly (I have the WS endpoint). If I build the stream and THEN discover that a match emits both per-maker and taker-aggregate OrderFilled events, every fill I’ve been counting would be doubled, any markout I compute later is inflated, and I’d have built the whole loop on a wrong counting basis. Confirming the emit set on one real transaction before writing the loop locks the counting rule against reality. When it reports, the thing to check is that the de-duplication rule isn’t just asserted but confirmed against what the live tx actually emitted (i.e., that the agent looked at a real match and verified the event set matches the Stage 0 expectation).

Task 2: sample decoded fills with correct direction – when the stream runs, look at the same fills…

do the aggressor directions look plausible?

I’ll commit Stage 1 when Task 1 confirms the de-duplication rule against a live tx and Task 2 shows real de-duped fills with same ground-truth directions from both exchanges.

Stage 1 Findings:

IT WORKS. A 90-second live run decoded 2,882 de-duplicated aggressor matches (4,169 CTF + 3,146 NegRisk/multi-outcome lines) from both exchanges, both directions, with ground-truth aggressor sign – and the per-maker legs reconcile to the taker aggregate.

real fills with real aggressor direction, de-duped correctly

Task 1 Outcome: confirmed the de-dup rule on a live tx

Another Material Finding: The Stage 0 direction rule (“aggressor = opposite of maker’s side on the same tokenId”) is incomplete, and the agent only found it by sampling live data. It sampled 8 live matches and found that 7 of 8 are mint/merge – the per-maker and taker legs reference different, complementary tokenIds whose prices sum to exactly 1.000. Only 1 of 8 was a same-token swap. So the Stage 0 rule – which assumed the aggressor and maker trade the same token – would have mislabeled the aggressor’s token for the majority of matches.

Correct Rule: read the aggressor’s leg directly from the taker-aggregate log (maker-field = aggressor, side = aggressor’s own side, tokenId/amounts = the aggressor’s actual leg) rather than inferring it as “opposite of the per-maker leg.”
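A minimal sketch of the corrected rule, assuming the logs for one match have already been decoded and grouped by transaction. The `FillLog` field names and the `is_taker_aggregate` flag are hypothetical (how the taker-aggregate log is identified in the real v2 event is not specified here and is left to the decoder); the only things taken from the notes are the side encoding (BUY = 0, SELL = 1), the rule that the aggressor's side, tokenId, and amounts come from the taker-aggregate log itself, and price = amount ratio. The ratio assumes collateral and outcome-token amounts share the same decimals.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

BUY, SELL = 0, 1  # side encoding as stated in the notes


@dataclass(frozen=True)
class FillLog:                 # hypothetical decoded OrderFilled log
    tx_hash: str
    order_maker: str           # address in the log's maker field
    side: int                  # the order's own side (BUY or SELL)
    token_id: int
    token_amount: int          # outcome-token base units
    collateral_amount: int     # collateral base units (same decimals assumed)
    is_taker_aggregate: bool   # set upstream by the decoder (assumption)


@dataclass(frozen=True)
class AggressorFill:
    tx_hash: str
    aggressor: str
    direction: str             # "BOUGHT" or "SOLD"
    token_id: int
    price: float
    size: int


def aggressor_fill(logs: Iterable[FillLog]) -> Optional[AggressorFill]:
    """Collapse one match's logs into a single de-duplicated aggressor fill.

    Reads the aggressor's leg from the taker-aggregate log only; per-maker
    legs are ignored here, so a mint/merge match (complementary tokenIds on
    the maker side) cannot mislabel the aggressor's token.
    """
    agg = [log for log in logs if log.is_taker_aggregate]
    if len(agg) != 1 or agg[0].token_amount == 0:
        return None  # unexpected emit set: surface it rather than guess
    t = agg[0]
    return AggressorFill(
        tx_hash=t.tx_hash,
        aggressor=t.order_maker,
        direction="BOUGHT" if t.side == BUY else "SOLD",
        token_id=t.token_id,
        price=t.collateral_amount / t.token_amount,
        size=t.token_amount,
    )
```

Returning `None` on an unexpected emit set keeps the counting rule honest: a match that does not look like the confirmed pattern is reported, not silently counted.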

Stage 2: Aggregate Maker-Side Adverse-Selection Markout (PROXY MID).

Three objectives:

mint/merge leg handling
sign convention and whether the aggregate comes out negative
drop rate – trade-price proxy needs a subsequent trade in the same token within the horizon to mark against – and thin markets may not have one.
- If the drop rate is high (i.e., > 50%), my markout is computed on a non-random subset (only the liquid tokens that traded again quickly), which biases the aggregate toward active markets. That’s the single biggest weakness of the proxy approach.
- This will be fixed by the API-mid integration in the next build

Stage 2 Findings: NUMBER CAME BACK NULL

The mint/merge gate passed cleanly. The worked example marks against the aggressor’s actual token (token X), keyed by each log’s own tokenId, with per-maker legs (token Y) feeding Y’s path, not X’s – “so the sign can’t invert via the wrong leg.”
The sign is null, and reported as null. Median = 0.000¢ at every horizon, mean tiny, and sign-unstable across runs (∆10 was -0.759¢ in run 1, +0.197¢ in run 2, +0.030¢ in run 3). When the sign flips run to run, you don’t have a signal, you have noise.
The drop rate is ~15-17%, plus ~1,300 fills left incomplete at shutdown – reported separately, never counted as marked.

The null finding is expected and is not a failure. The agent diagnosed WHY the proxy can’t see the signal. A trade-price reference prints at the bid (for sells) and the ask (for buys), so the proxy conflates spread capture with adverse selection. The one semi-consistent pattern (SOLD slightly positive, BOUGHT slightly negative) is “partly a spread artifact,” not a real markout.

The proxy is TOO CRUDE to separate “the maker earned the spread” from “the maker got picked off,” and those two effects roughly cancel in the aggregate, leaving ~0.
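A small worked example of the artifact, with made-up numbers (a static 49¢/51¢ book, nothing from the live data). It assumes s is the aggressor's sign (+1 when the aggressor bought), which is the reading under which a negative value means the maker was adversely selected. With no information arriving at all, the proxy's answer depends only on which side the next trade happens to print on, while a mid reference gives the half-spread every time.

```python
def maker_markout(aggressor_sign: int, fill_price: float, ref_price: float) -> float:
    """Maker-side markout: negative means the price moved against the maker."""
    return -aggressor_sign * (ref_price - fill_price)


# Illustrative static book, in cents. The book never moves in this example.
bid, ask = 49.0, 51.0
mid = (bid + ask) / 2

fill = ask  # aggressor bought at the ask, so s = +1 and the maker sold at 51

# Trade-price proxy: the reference is whatever the next trade printed at.
proxy_next_is_sell = maker_markout(+1, fill, bid)  # next print at the bid
proxy_next_is_buy = maker_markout(+1, fill, ask)   # next print at the ask

# Mid reference: independent of the next print's side.
mid_based = maker_markout(+1, fill, mid)

print(proxy_next_is_sell, proxy_next_is_buy, mid_based)
```

The proxy swings between the full spread and zero on an unchanged book, which is consistent with a median near zero and a sign that flips between runs.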

The stage 2 number does NOT answer the demand question of whether flow originators have LP clients being adversely selected. It’s null because the instrument is too crude, not necessarily because there’s no extraction.

Build 2 – API-mid integration – is necessary before I have a demand number.

Internal Contradiction:

My simulation found a large, robust adverse-selection effect (the ~13σ result), but my live proxy measurement found null. Three potential explanations:

The effect is real but the proxy is too crude to see it (build 2’s API mid will reveal it)
The effect is real in the sim’s parameter regime but smaller in live Polymarket than the sim suggested – the mid will show a real but modest number
live Polymarket LPs are NOT meaningfully adversely selected in the current regime – in which case the demand argument is dead

Build #2 – API /CLOB mid integration

Real mid-based maker-side markout. Stages 0-2 of Build #1 (ground-truth on-chain fills + price-proxy markout) committed.

The proxy returned null because a subsequent-TRADE price prints at bid (sells) / ask (buys), so it conflates spread capture with adverse selection. The fix is to mark each fill against the order-book MID at horizon ∆, which removes the spread artifact. This build is the real demand test: does a mid-based markout reveal net adverse selection that the proxy couldn’t see?

Task A: mid source, recon + decide. We need the Polymarket order-book mid per tokenId, time-aligned to fills. Before building, resolve the data path (web-search/fetch) as needed:

Polymarket’s CLOB API: the endpoint(s) for the orderbook / best bid/ask per token (market/asset id). REST snapshot polling vs. the CLOB WS (book/price_change channel). State what’s available, rate limits, and whether the WS gives a live book we can maintain a mid from.
KEY MAPPING PROBLEM: on-chain fills key by ERC-1155 tokenId; the CLOB API keys by its own token/asset identifier (and conditionId/market). I’ll ask the agent to confirm how to map an on-chain tokenId —> the CLOB API’s token id so we fetch the RIGHT book for each fill. If this mapping needs Gamma/CTF metadata, I’ll ask it to state exactly what call resolves it. If it can’t be resolved cleanly, STOP and report – a wrong mapping marks against the wrong token’s mid (silent corruption, worse than the proxy).
Recommend the cheapest correct approach: maintain live books via CLOB websocket for the tokens we see fills in, vs. REST-poll a mid at fill-time+∆. State the tradeoff.

Task A Results: green light for Task B

mapping is identity. the CLOB token_id is the on-chain ERC-1155 tokenId, same integer, no translation, no Gamma lookup.

agent took a real on-chain fill (tokenId, decoded BOUGHT @0.25) and queried the public CLOB directly: /midpoint?token_id=...773047 returned a mid, and /book?token_id=...773047 echoes back asset_id = the same on-chain tokenId. That’s the “verify against reality before trusting” discipline applied to the exact thing that would have silently corrupted the number. The wrong token mid risk is dead.
The mid source is public, no auth and gives me the off-chain clock I need.

Recommendation: WS market channel for a continuous per-token mid timeline (I need historical mid at fill-time and fill time + ∆, which REST polling can’t recover – it only gives mid at present moment), seeded with a REST /book snapshot per newly-seen token so a mid exists immediately. The comparison table makes the case cleanly.

Task B: fusion + mid markout

subscribe to / poll the CLOB book for tokens we’re seeing on-chain fills in; maintain a current mid per tokenId (and its timestamp).
TIME ALIGNMENT (load-bearing – the settlement-lag problem):
- on-chain block_timestamp is SETTLEMENT time, which lags the off-chain match
- fill-time mid (mid at/near the match) and mid at match + ∆
- Use the CLOB book timestamps, not the chain settlement time, as the mid clock. Measure and REPORT the settlement-vs-mid-clock offset empirically (this is the error-budget number Stage 1 left a TODO spot for)
Recompute maker-side markout against the MID: maker_markout(∆) = -s · (mid_{t+∆} - fill_price), same sign convention as Stage 2 (negative --> maker adversely selected).
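A minimal sketch of the fusion step above, assuming book updates arrive in time order and carry the CLOB's own timestamps. `MidTimeline`, `mid_markout`, and `settlement_lag` are hypothetical names; s is taken to be the aggressor's sign (+1 bought, -1 sold), the reading consistent with "negative means maker adversely selected". The as-of lookup returns the last mid known at or before the requested time, so the horizon is measured on the match clock rather than on block_timestamp.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MidTimeline:
    """Per-token mid history keyed by CLOB book timestamps (seconds)."""
    ts: List[float] = field(default_factory=list)
    mids: List[float] = field(default_factory=list)

    def append(self, t: float, best_bid: float, best_ask: float) -> None:
        # Assumes updates are appended in non-decreasing time order.
        self.ts.append(t)
        self.mids.append((best_bid + best_ask) / 2)

    def asof(self, t: float) -> Optional[float]:
        """Last mid observed at or before time t, or None if none exists yet."""
        i = bisect.bisect_right(self.ts, t) - 1
        return self.mids[i] if i >= 0 else None


def mid_markout(timeline: MidTimeline, match_ts: float, fill_price: float,
                aggressor_sign: int, delta: float) -> Optional[float]:
    """maker_markout(delta) = -s * (mid_{t+delta} - fill_price), t = match time."""
    mid_later = timeline.asof(match_ts + delta)
    if mid_later is None:
        return None  # report as dropped, never as marked
    return -aggressor_sign * (mid_later - fill_price)


def settlement_lag(block_timestamp: float, match_ts: float) -> float:
    """Empirical settlement-vs-match offset, to be reported per fill."""
    return block_timestamp - match_ts
```

Keeping `settlement_lag` as its own reported quantity makes the error budget explicit instead of folding it into the markout window.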

Build #2 Results (Task B):

The build succeeded – the mid removed the spread artifact. The worked example is clean – same fill, proxy markout +5.0¢, mid markout +6.5¢, and the -1.5¢ difference is the bid/ask print artifact the mid strips out.

The mechanism I’d hypothesized (trade-price prints at bid/ask, conflating spread with adverse selection) is confirmed and removed. The mid produces a stable median of +0.500¢ at every horizon, every run – vs the proxy’s median ~0 with a sign that flipped each run.

The stable signal is +0.50¢/share – positive – meaning the typical maker is NOT net adversely selected. They’re capturing roughly half the spread, which is what a healthy market-maker is SUPPOSED to earn. A positive maker-side markout means the LP comes out ahead at the typical fill. So the headline finding from live Polymarket is…

typical LP is not getting picked off – they’re earning the spread, as designed

This isn’t what my simulation had predicted (the ~13σ AS result), and it’s not what the company thesis had assumed (“LPs bleed to informed flow, sell them a neutral clearing layer”.)

Why the thesis still works:

the signal isn’t uniformly “No adverse selection”

the mean is tail-dominated and sign-unstable (∆30 mean ranges -0.2¢ to +3.8¢ across runs). The median is +0.5 and stable, but the mean swings – which means a few large-move fills carry heavy adverse selection even though the typical fill doesn’t
- in other words: most fills are benign spread capture, but there’s a heavy tail of fills where the maker got run over
The direction asymmetry is consistent across runs: BOUGHT positive, SOLD negative – aggressive selling adversely-selects makers more than aggressive buying. That’s a repeatable signature, small-n and noisy, but directionally stable.

At the typical fill price, Polymarket LPs are not adversely selected; they earn ~half the spread. Adverse selection is real but lives in a heavy tail of large-move fills and shows up as a persistent aggressive-sell-side negativity. Broad net extraction is not present in these windows.

Build #3 – per-maker attribution

per-maker / per-address adverse-selection attribution. At this point, Builds 0-2 are committed (verified contracts, ground-truth decode + de-duplication, corrected mint/merge direction, mid-based markout).

Build #2’s aggregate said the TYPICAL maker is NOT broadly adversely selected (stable +0.5¢/share median = half-tick spread capture), BUT adverse selection is real and CONCENTRATED – a heavy tail (drives an unstable mean) + a persistent aggressive-SELL-side negativity. The aggregate hides the distribution. This build answers the one question that decides whether there’s a viable business model: does the tail extraction CONCENTRATE on identifiable, specific market-makers (—> a customer) or SMEAR across everyone (—> the +0.5¢ median is the whole story, no concentrated extraction)?

Build 3 Results: concentrated AS/extraction

It’s not a smear, it’s bimodal. The distribution of medians erodes and widens with horizon: at ∆30 you get 29 makers with medians < -1¢ AND 33 makers ≥ +1.5¢, distinct from the +0.5¢ spread-capture bulk. ~40 of ~109 qualifying makers carry persistently negative, horizon-deepening medians – separable from the bulk.
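A minimal sketch of the per-maker attribution at a single horizon. The -1¢ and +1.5¢ cut-offs are the ones quoted above; `min_fills` is a hypothetical stand-in for whatever makes a maker "qualifying", and the sample rows in the usage lines are invented for illustration.

```python
from collections import defaultdict
from statistics import median


def per_maker_medians(rows, min_fills=20):
    """rows: iterable of (maker_address, maker_markout_cents) at one horizon.

    Returns {maker: median markout} for makers with at least min_fills fills.
    min_fills is an assumed qualification threshold, not the one used in the run.
    """
    by_maker = defaultdict(list)
    for maker, markout_c in rows:
        by_maker[maker].append(markout_c)
    return {m: median(v) for m, v in by_maker.items() if len(v) >= min_fills}


def classify(median_c, bleed_below=-1.0, win_at_or_above=1.5):
    """Bucket a maker by median markout, using the cut-offs quoted in the notes."""
    if median_c < bleed_below:
        return "bleeding"
    if median_c >= win_at_or_above:
        return "winning"
    return "bulk"


# Illustrative usage with invented rows.
rows = [("0xA", -2.0), ("0xA", -3.0), ("0xA", -1.5),
        ("0xB", 0.5), ("0xB", 0.5), ("0xB", 0.4),
        ("0xC", 2.0), ("0xC", 1.5), ("0xC", 3.0),
        ("0xD", -5.0)]  # too few fills to qualify
buckets = {m: classify(v) for m, v in per_maker_medians(rows, min_fills=3).items()}
```

Running this per horizon (∆2 through ∆30) and tracking whether a maker stays in the "bleeding" bucket is what would separate persistent, horizon-deepening cases from noise.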

Adverse Selection Mechanism & Why These Specific Makers

A resting LP posts a quote in a thin-token market, an informed aggressor takes it, and the price drifts against the LP afterward – deepening with time (∆2 --> ∆30). So why specifically thin-token / tail-market makers?

The BUY-side concentration is a complementarity effect, not a behavioral one.
- Aggressors mostly buy, and in a mint/merge match the LP ends up holding the complementary YES/NO leg, so the victims show up as BUY-side LPs.
The thin-token concentration is the real economics behind the A/S.
- Very few LPs compete in thin tail markets, so one maker repeatedly absorbs the informed flow with no one to share the adverse selection.

Potential Build #4 – coverage expansion

The thin-token concentration collides with my measurement’s primary limitation – the 200-token WS cap means I’m systematically under-sampling the thin tokens, which is where the bulk of bleeding makers exist. My current count of ~40 bled makers is likely a floor, not a ceiling. Two refinements in build #4 specifically –

Coverage (refinement 1) directly attacks the floor-not-ceiling problem.
Volume-weighting (refinement 2) decides whether we have a good customer model.
- Current bleeding-set is identified by per-fill medians, and those makers are individually very small (≤ 1% of volume each, 1-4 tokens). A business made of “many small diffuse victims” is a hard sell.
- Volume-weighting would ask the question in $$ figures – is there a large maker bleeding real money, hidden in the tail because they trade few-but-big fills?
- If yes, that’s a concentrated, high-value customer and a far easier sell.
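A minimal sketch of the volume-weighting refinement, assuming each marked fill carries a maker address, a markout in cents per share, and a share count. The function and field names are hypothetical, and `markout_usd` is markout expressed in dollars at the chosen horizon, not realized PnL. The invented example shows the case the refinement is looking for: a maker whose per-fill median looks like benign spread capture while one large fill dominates the dollar figure.

```python
from collections import defaultdict
from statistics import median


def maker_summary(fills):
    """fills: iterable of (maker, markout_cents_per_share, shares), shares > 0."""
    by_maker = defaultdict(list)
    for maker, markout_c, shares in fills:
        by_maker[maker].append((markout_c, shares))
    out = {}
    for maker, rows in by_maker.items():
        total_shares = sum(s for _, s in rows)
        total_cents = sum(m * s for m, s in rows)
        out[maker] = {
            "median_c": median(m for m, _ in rows),      # per-fill view
            "vw_markout_c": total_cents / total_shares,  # volume-weighted view
            "markout_usd": total_cents / 100.0,          # dollar view
        }
    return out


# Invented example: two small benign fills and one large adverse fill.
summary = maker_summary([
    ("0xabc", 0.5, 10),
    ("0xabc", 0.5, 10),
    ("0xabc", -3.0, 1000),
])
```

Here the median is positive while the volume-weighted markout and dollar figure are negative, which is exactly the disagreement between the two views that a per-fill median alone would hide.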

Build #4 Results:

Design decisions (coverage expansion): the empirical cap probe finding the silent freeze at ~755 (with the detail that price_change keeps flowing so frame-level liveness misses it); the C = 500 safe cap; the eviction reframe (Build #3’s gap was append-without-eviction, not concurrency – ~122 late thin tokens dropped); the N = 2-shard pool with dynamic routing and 120s idle eviction; the three-layer book-level watchdog; and the zero-Alchemy-quota confirmation.
What the live run did: fresh 360s run, 38,736 marked rows to a separate file (baseline preserved), peak 1,500 tokens across 3 shards, 0 freezes, 14 evictions – plus the honest caveat that it hit the 1,500 ceiling and dropped 60 tokens.
What it found: 94.3% coverage with the full bucket breakdown and zero stale contamination; floor-not-ceiling NOT confirmed (with the time-window confound stated); volume-weighted showing net-profitable LPs, the biggest maker as biggest winner, -$799 total bleed vs +$18,810 PnL; the side-by-side 27-in-both / 18-hidden breakdown; pUSD decimals = 6 verified; fragility flagged.

Build #5 – Regime-Window Experiment

Goal: Build #4 (quiet window) came back partly-negative for LP-protection: LPs net profitable (+$18.8k vw), biggest maker biggest winner, bleed -$799 / ~4% of profit, smeared across tiny makers (aggregate vw markout ∆10 = +0.776¢/share). The open confound is now regime. This build instruments a single continuous feed that spans pre-spike --> event-spike --> post-spike, regime-tags every fill online, and runs a pre-committed decision rule to a verdict – KILL, REGIME-CONDITIONAL KEEP, or FULL KEEP – for the maker-protection thesis.

The point of a continuous feed like this is to de-confound regime from the time-window (Build #4 compared different time windows and couldn’t separate coverage lift from regime).

Task A Results:

Part 2 of Proof-of-Neutrality: necessary crypto primitives

Everything I’ve done so far measures extraction / AS, doesn’t prevent it.

Necessary Guarantees for “Neutral Clearing”

operator honesty: whoever runs the batch auction must clear it according to the stated uniform-price rule and not, say, insert their own order at the clearing price, reorder, or selectively exclude.
- this platform would act as the adversary – the clearing operator.
- the property I need here is verifiability: anyone can check the clearing was done correctly (mid was calculated fairly) without trusting me and without re-running it themselves
pre-clearing privacy / front-running resistance: even if the operator is perfectly honest, on a blockchain the sequencer or block producer sees transactions before they’re final and can insert their own ahead of the batch (MEV).
- the adversary here is the chain itself
- the property I need is that order contents are hidden until the batch clears, so there’s nothing to front run

These two are independent – you can have a verifiable auction that’s still front-run. You can also have a sealed-bid auction run by a dishonest operator (nobody front-ran it, but the operator cheated the clear).

CoW protocol has neither from what I can tell – it gets economic neutrality through solver competitions (many solvers bid to settle the batch, the best execution wins, so no single party can cheat for long). That works, but it’s “neutral because competition disciplines it,” not “neutral because it’s cryptographically impossible to cheat.”

My Goal: move from economically-disciplined neutrality to cryptographically-guaranteed neutrality.

Primitives:

verifiability for operator honesty
- zk-SNARK or zk-STARK – the clearing computation (collect orders, find the uniform price where supply meets demand, allocate fills, apply pro-rata rationing) is just a deterministic function. I could run it off-chain, then publish a succinct proof that I ran that exact function on those committed inputs and got this output – without revealing the inputs if I don’t want to, and without anyone having to re-execute it.
- On-chain, a verifier contract checks the proof in a matter of milliseconds. This is the ‘verifiable’ part of verifiable clearing. Auction clearing is a well-structured computation, so it maps naturally onto a circuit: the circuit is the clearing rule. The proof-system tradeoff: SNARKs have smaller proofs and are cheaper to verify but typically need a trusted setup; STARKs are bigger but transparent (no trusted setup) and post-quantum.
pre-clearing privacy for preventing MEV / front-running. three potential primitives to implement
- commit-reveal (simplest) – everyone submits a hash of their order (commit), then after the submission window closes, reveals the actual order. Nothing to front-run during commit because it’s just hashes.
  - weakened by the ‘griefing problem’: a participant who sees the early reveals can choose NOT to reveal their own (they “abort”), selectively withdrawing based on information – which is itself a form of value extraction, and I’d need to introduce penalties/bonds to discipline it. SOLVABLE
- threshold encryption (most production ready) – orders are encrypted to a committee’s shared public key, the sequencer orders the encrypted blobs (so it’s ordering noise – nothing to front-run), and only after the order is decided does a threshold of the committee (e.g., 13 of 19) collaborate to decrypt.
  - ex: Shutter Network – live, on Ethereum.
  - tradeoff: trust assumption moves to the committee – you’d be trusting that fewer than the threshold collude to decrypt early. Not entirely ‘trustless’.
- verifiable delay functions (VDFs) / time-lock encryption (most elegant, least production-tested) – orders are encrypted such that they can only be decrypted after a fixed wall-clock delay (a computation that’s inherently sequential and can’t be parallelized away), removing the need for any committee at all.
  - nobody can decrypt early because the math forbids it until the delay elapses
  - tradeoff: VDFs are heavier, the tooling is less tested, and getting the delay parameter right against improving hardware is a live research question.
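A minimal sketch of the simplest of the three options, commit-reveal with a bond to discipline selective non-reveal. This is an off-chain toy, not a contract: the class and method names are hypothetical, the commitment is a plain SHA-256 over a fixed-length random salt plus the order bytes, and the bond amount and forfeiture rule are placeholders for whatever penalty design is chosen.

```python
import hashlib
import secrets


def commit(order: bytes, salt: bytes) -> str:
    """Hiding, binding commitment to an order: H(salt || order).

    Assumes a fixed-length (32-byte) random salt, so the concatenation
    is unambiguous and low-entropy orders cannot be brute-forced.
    """
    return hashlib.sha256(salt + order).hexdigest()


class CommitRevealBatch:
    """Toy two-phase batch: commits first, reveals after the window closes."""

    def __init__(self, bond: int):
        self.bond = bond
        self.commits = {}   # trader -> commitment hex
        self.revealed = {}  # trader -> revealed order bytes

    def submit_commit(self, trader: str, commitment: str) -> None:
        self.commits[trader] = commitment

    def reveal(self, trader: str, order: bytes, salt: bytes) -> bool:
        ok = self.commits.get(trader) == commit(order, salt)
        if ok:
            self.revealed[trader] = order
        return ok

    def forfeited_bonds(self) -> dict:
        """Traders who committed but never validly revealed lose their bond."""
        return {t: self.bond for t in self.commits if t not in self.revealed}


# Illustrative usage.
salt_a, salt_b = secrets.token_bytes(32), secrets.token_bytes(32)
batch = CommitRevealBatch(bond=10)
batch.submit_commit("alice", commit(b"BUY 100 @ 0.42", salt_a))
batch.submit_commit("bob", commit(b"SELL 50 @ 0.40", salt_b))
alice_ok = batch.reveal("alice", b"BUY 100 @ 0.42", salt_a)
bob_ok = batch.reveal("bob", b"SELL 50 @ 0.41", salt_b)  # altered order fails
```

The bond only makes aborting costly; whether it is large enough depends on how much an informed non-reveal is worth, which is the part the notes flag as solvable but not yet solved here.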

Initial Thoughts on Architecture:

threshold encrypted sealed bids —> ordered while encrypted —> decrypted after ordering —> cleared by the uniform-price rule —> zk proof published that the clear was correct.

This addresses both properties: nobody front-ran (encryption till ordered), and nobody can cheat the clear (the proof).
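A minimal sketch of the clearing step in that pipeline, i.e. the deterministic function a proof would attest to. Several choices here are assumptions made for illustration: candidate prices are restricted to submitted limit prices, ties are broken by smallest imbalance and then lowest price, and the heavy side is rationed pro-rata across all eligible orders (a production rule would more likely fill strictly better-priced orders first and ration only at the margin).

```python
def clear_uniform_price(bids, asks):
    """Uniform-price batch clear.

    bids, asks: lists of (limit_price, quantity).
    Returns (price, volume, bid_fills, ask_fills), or None if nothing crosses.
    """
    prices = sorted({p for p, _ in bids} | {p for p, _ in asks})
    best = None
    for p in prices:
        demand = sum(q for lp, q in bids if lp >= p)  # bids willing to pay p
        supply = sum(q for lp, q in asks if lp <= p)  # asks willing to sell at p
        volume = min(demand, supply)
        # Maximize volume, then minimize imbalance, then prefer the lower price.
        key = (volume, -abs(demand - supply), -p)
        if volume > 0 and (best is None or key > best[0]):
            best = (key, p, demand, supply)
    if best is None:
        return None
    _, p, demand, supply = best
    volume = min(demand, supply)
    # Simplified pro-rata rationing across every eligible order on each side.
    bid_fills = [q * volume / demand if lp >= p else 0.0 for lp, q in bids]
    ask_fills = [q * volume / supply if lp <= p else 0.0 for lp, q in asks]
    return p, volume, bid_fills, ask_fills


# Illustrative batch.
result = clear_uniform_price(
    bids=[(0.60, 100), (0.55, 50), (0.50, 100)],
    asks=[(0.45, 80), (0.52, 40), (0.58, 100)],
)
```

Because the function is pure and order-independent within the batch, every participant clears at the same price regardless of arrival time, and the same code path is what a circuit would need to encode.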

June 16, 2026

pred markets <> us equities

Current thesis: we’re seeing institutionalization of prediction markets, but primarily on the participant and access side (i.e., Wintermute streaming two-sided quotes, FCM membership, OTC desks, ICE’s data moves, etc.). The mechanisms themselves are still continuous CLOBs – we haven’t yet seen batching/auction theory explicitly. The one mechanism intervention on record – Polymarket’s probability-dependent taker fee – was a tax on the rent, not a redesign of how orders are matched (this is the point on time-bounding ≠ batching the matching in my FBA substack article).

Equities got institutional participants AND eventually a mechanism response (FBAs in theory, smart markets in practice); prediction markets have the participants now arriving and still no mechanism response.

What a smart market aims to do:

Removing latency rents is a substrate (batching, randomization). What makes it truly ‘smart’ is the optimization layer: clearing for maximal API under expressive, multi-security constraints.

Future directions:

A counterfactual batching study. I now have ~500k rows of orderbook snapshots across Kalshi & Polymarket. I could replay them under a hypothetical randomized periodic auction (varying the interval from 100ms to 500ms to 1s) and estimate how much of the measured pathology disappears. Pathologies to watch include how many crossed-book states would clear at a uniform price, and how much of the negative post-fill markout is sniping-rent that batching deletes vs. genuine adverse selection that survives.
- Goal: batch-matching would have improved retrospective markouts by X in this asset class
Drop an FBA venue into my simulator. My orderbook/AMM/hybrid sweep already stress-tests venue designs against the same Bayesian trader population – adding a randomized-batch-auction arm is incremental engineering, and it gives me an experimental version of #1: same agents, continuous vs. batched, measure adverse selection, LP markout, and informed-trader welfare.
diff-in-diff of polymarket dynamic-fee launch.
- Goal: did taxing probability actually improve quote stability, depth, & markouts on the 15-minute markets – or did it just move the race?
legging risk in event markets. Correlated contracts are everywhere in prediction markets: mutually exclusive outcomes that should sum to one, the same event on two venues, conditional structures. Anyone trading those spreads legs in sequentially today.
- Method: measure the legging cost (how much the second leg moves against you after the first fills). If it’s material, that’s the demand case for atomic multi-contract execution in event markets.
Synthetic-NBBO / consensus-quality piece. Event markets have no consolidated reference price – I could construct one across venues and measure how often it’s crossed, locked, or fading around catalysts.
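
A minimal sketch of the synthetic-NBBO idea, assuming each venue's top of book arrives as a simple bid/ask pair. `Quote`, `synthetic_nbbo`, and the example prices are hypothetical, and fees are ignored, so "crossed" here means gross-crossed only.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Tuple


@dataclass(frozen=True)
class Quote:
    """Top of book for one venue (hypothetical structure)."""
    bid: Decimal
    ask: Decimal


def synthetic_nbbo(quotes: Dict[str, Quote]) -> Tuple[Decimal, str, Decimal, str, str]:
    """Return (best_bid, bid_venue, best_ask, ask_venue, state) across venues."""
    bid_venue = max(quotes, key=lambda v: quotes[v].bid)
    ask_venue = min(quotes, key=lambda v: quotes[v].ask)
    best_bid = quotes[bid_venue].bid
    best_ask = quotes[ask_venue].ask
    if best_bid > best_ask:
        state = "crossed"   # someone bids more than someone else asks
    elif best_bid == best_ask:
        state = "locked"
    else:
        state = "normal"
    return best_bid, bid_venue, best_ask, ask_venue, state


# Illustrative numbers only, not measured data.
snapshot = {
    "kalshi": Quote(bid=Decimal("0.55"), ask=Decimal("0.57")),
    "polymarket": Quote(bid=Decimal("0.58"), ask=Decimal("0.60")),
}
nbbo = synthetic_nbbo(snapshot)
```

Running this per snapshot and counting the share of crossed or locked states would give the first cut of the consensus-quality measure.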

Company Thesis:

Every market that acquires sophisticated participants eventually needs a mechanism response. Prediction and on-chain markets are hitting that point now; the next wave of participants is machine rather than human (which is bad for continuous matching); and the response is batch/optimizing clearing – “neutral matching”. I want to sell that clearing layer to the flow originators who are bleeding, proven first on event markets where the extraction is easily measurable.

Mechanism: Batch/optimization clearing removes the within-batch speed advantage and clears everyone in a round at one uniform price. That’s neutrality. CoW is the proof as a DEX ($35B+ in lifetime volume, freezing the book every ~30s and auctioning settlement to competing solvers, with the batching itself preventing MEV).

Model: Not a venue, it’s a clearing/settlement layer. The end-customer is the flow originator (frontend, wallet, aggregator, or agent framework whose users or agents are getting picked off), who routes flow to us for protection.

Beachhead: event/on-chain markets.

Expansion: agent markets, where neutrality becomes structural

Refined Pitch:

Verifiable Neutrality: A batch auction off-chain requires you to trust the operator to actually run the batch fairly (i.e., they didn’t audit the orders, didn’t insert their own, they picked the clearing price fairly, and didn’t reorder).

On-chain, the clearing computation can be verifiable: the batch, the orders, the objective function, and the resulting uniform price are all attestable and anyone can recompute that the clear was neutral.

Neutrality is enforced by the settlement layer.

Most MEV solutions are mitigations; encrypted batching kills MEV at the mechanism level. On a continuous on-chain book, the adversary isn’t just a fast trader – it’s a block producer, who sees your order in the mempool and, if they want to, can front-run, back-run, or sandwich it. Batch auctions already collapse intra-batch ordering (everyone clears at one price, so transaction order inside the batch stops being extractable).

Taking it one step further, you could encrypt the orders till clear time (this is possible thanks to FHE & ZK). A batch auction is uniquely suited to this because it already collects orders over an interval and reveals nothing till the uniform print – so threshold-encrypted or commit-reveal submission means the block producer can’t see what they’d front-run.

It’s essentially a sealed-bid-batch. Off-chain mechanisms don’t need this because they hide the book by being a trusted operator. This is one of the biggest gripes with on-chain markets.
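
A minimal sketch of the commit-reveal variant of sealed submission, using only a salted SHA-256 hash. This illustrates the hiding step, not threshold encryption or FHE; the order string format and the function names are hypothetical. A real design would also need to handle traders who commit and then decline to reveal.

```python
import hashlib
import hmac


def commit(order: str, nonce: bytes) -> str:
    """Commitment published during the batch interval; reveals nothing about the order."""
    return hashlib.sha256(nonce + order.encode("utf-8")).hexdigest()


def verify_reveal(commitment: str, order: str, nonce: bytes) -> bool:
    """At clear time, check that the revealed order matches the earlier commitment."""
    return hmac.compare_digest(commitment, commit(order, nonce))


# Fixed nonce for reproducibility; in practice each order would use a fresh random nonce.
nonce = bytes(16)
sealed = commit("BUY 10 @ 0.42", nonce)
honest = verify_reveal(sealed, "BUY 10 @ 0.42", nonce)
tampered = verify_reveal(sealed, "BUY 10 @ 0.43", nonce)
```

The block producer only sees `sealed` before the clear, so there is no order content to front-run.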

------------------

Prediction and on-chain event markets are institutionalizing fast — Wintermute streaming two-sided quotes across Kalshi and Polymarket, Galaxy’s OTC desk, Clear Street as the first institutional FCM, ICE distributing Polymarket data — but they still clear on continuous order books, the same matching rule Budish-Cramton-Shim showed manufactures a speed race that taxes resting liquidity. I measured it: across 15 cross-listed markets, the visible cross-venue edge clears only at an institutional fee tier neither venue offers, and post-fill markout on the resting side is uniformly negative — the apparent free spread is adverse selection paid to whoever’s fastest. Equities took a decade and a Flash Crash to answer this with frequent batch auctions; these markets have the sophisticated participants now and no mechanism response yet.

I’m building that response — a batch-clearing and settlement layer for event contracts. Orders collect over a short interval and clear at a single uniform price, so there’s no within-batch speed advantage to extract; CoW Protocol is the live proof this attracts flow on-chain. The part that makes it decentralized infrastructure rather than a batch auction on a new venue: on-chain, the operator-trust a continuous market forces on you can be replaced with cryptography — the clear becomes verifiably neutral (anyone can recompute the uniform price was honest, no trusted operator), and sealed-bid submission encrypted until clear time removes the information MEV feeds on rather than mitigating it after the fact. Specialized for event-contract microstructure — bounded [0,1] prices, discrete resolution, cross-market correlation — and sold to the flow originators whose users are bleeding, not as a competing venue, which dissolves the cold-start. I’ve proven the rent exists mechanically; the experiment running now is whether batching removes it under adaptive behavior — preregistered with a kill condition: if extraction doesn’t fall as the batch interval grows, the primitive isn’t worth building.

----------

Prediction and on-chain event markets are institutionalizing fast but still clear orders on continuous orderbooks – essentially the same mechanism that hands a rent to whoever’s fastest (i.e., MMs) and taxes everyone who provides resting liquidity through unfilled orders. We saw this in equities back when HFT became popular.

I’m working on an analogous mechanism solution for on-chain event markets – essentially a batch-clearing and settlement layer that clears every order at one uniform price so there’s no speed advantage to extract, and minimizes reliance on trustworthy block producers via decentralized infrastructure. So, think of verifiably neutral clearing that anyone can recompute, and encrypted order submissions that the block producer can’t front-run with their own orders.

June 14, 2026

necessary properties of crypto infrastructure in 2026 1. Verifiable Contracts – agreements whose correct execution anyone can check cryptographically, so you can trust the…

1. Verifiable Contracts – agreements whose correct execution anyone can check cryptographically, so you can trust the math rather than trusting a counterparty or intermediary to honor the deal.
2. Neutral Markets – venues where no participant gets a structural edge baked into the mechanism (no privileged access, no rent extracted by being faster to deploy capital, or by gatekeeping), so the rules apply equally to everyone.
3. Trustless Systems – systems whose correctness is guaranteed by design (cryptography, consensus) rather than by trusting the operator, so you don’t have to believe anyone is honest for the system to work properly.

1 & 3 already have a strong primitive base – the foundational technologies that make verifiability and trustlessness real.

(e.g., Fully Homomorphic Encryption, which lets a computer run computations directly on encrypted data and return an encrypted result without ever decrypting it, so a server can process data without seeing it; Zero-Knowledge Proofs, which let one party prove a statement is true without revealing the underlying data; Decentralized Compute, networks that coordinate distributed machines to train or run models and do general computation without routing through a single cloud provider, so you get compute without surrendering control to any hyperscaler; Crypto-native Identity, portable, user-controlled identity and credentials that live with the user rather than inside a platform’s database, so users own their identity and data across apps instead of re-creating it inside each walled garden)

Missing primitives under property 2 – neutral markets: neutral matching for AI-agent prone markets; neutral settlement rails for on-chain event markets; verifiable execution-quality infrastructure; pre-trade privacy for agentic order flow.

Neutral Matching and Batch-Settlement Rails both rely on the claim that batch clearing measurably reduces extraction versus continuous matching on event-contract flow. The mechanical counterfactual (batched settlement with flow held fixed) proves the rent exists, but doesn’t prove that batching removes the rent under endogenous behavior.

Integrating the FBA auction acts as a gate to building the matching primitive – tells us whether batching helps, at what pi, at what immediacy cost, and for whom. If the pi-curve returns flat, batching doesn’t help on event flow.

June 11, 2026

FBA-sim: fixing maker-fill on the CLOB Problem: when a resting limit order is filled by an incoming market order, the maker's side is never recorded as a…

Problem: when a resting limit order is filled by an incoming market order, the maker’s side is never recorded as a TradeRecord – venues/clob.py _execute_buy/_execute_sell just shrink the (agent_id, qty, oid) tuple in the deque. Only the taker side reaches ‘_on_trade’. This means maker-side PnL and markout are undercounted on CLOB and Hybrid today, which would make any FBA-vs-CLOB comparison invalid.

Goal: make maker fills recorded symmetrically with taker fills, WITHOUT changing matching/economics behavior – purely a recording fix.

Essentially, a trade has two sides: a taker (the person who crosses the spread to trade now) and a maker (the person whose resting limit order was sitting there waiting). Right now my simulator only writes down the taker’s side of each trade. When someone’s resting order gets filled, the maker’s half just quietly disappears from the book without being recorded.

The fix is to record both sides every time, so each trade produces two entries (maker + taker) instead of just taker.

The whole point of the FBA simulation is measuring how badly liquidity providers get picked off – and liquidity providers are makers. If the maker side isn’t recorded, I’d be blind to the exact people I’m trying to quantify. Even worse, the new FBA venue would record both sides while the old CLOB records one, so any difference between them could just be a bookkeeping mismatch rather than a real effect of batching. The fix makes both venues count trades the same way, so the comparison is honest.

The test that proves it worked is simple: total quantity bought should equal total quantity sold across a run. Every trade has two equal sides, so they must balance – if they don’t, something’s still being dropped.
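
A minimal sketch of that conservation check, assuming each recorded fill carries a side and a quantity. The `Fill` tuple is a hypothetical stand-in for the simulator's TradeRecord, not its actual schema.

```python
from typing import Iterable, NamedTuple


class Fill(NamedTuple):
    """Hypothetical stand-in for one recorded side of a trade."""
    agent_id: int
    side: str        # "buy" or "sell"
    quantity: int
    liquidity: str   # "maker" or "taker"


def volume_imbalance(fills: Iterable[Fill]) -> int:
    """Total bought minus total sold; zero when both sides of every trade are recorded."""
    fills = list(fills)
    bought = sum(f.quantity for f in fills if f.side == "buy")
    sold = sum(f.quantity for f in fills if f.side == "sell")
    return bought - sold


# One trade of 10 contracts: taker buys from a resting maker.
both_sides = [Fill(1, "buy", 10, "taker"), Fill(2, "sell", 10, "maker")]
taker_only = both_sides[:1]  # the pre-fix behaviour: maker leg dropped
```

With both legs recorded the imbalance is 0; with the maker leg dropped it equals the missing quantity, which is the signal the test looks for.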

How it Works:

Venue -> environment fill channel. New MakerFill dataclass and venue.drain_maker_fills() (default [], so AMMs are untouched and the abstract methods are unchanged). The CLOB’s 4 crossing loops now buffer one MakerFill per resting order consumed – agent, maker side, qty, the resting limit price, order id, and the venue mid bracketing the individual consumption (including level cleanup). HybridVenue delegates the drain, so LP (-2) and bootstrap (-1) executions hit the tape too. Matching arithmetic is untouched – only bookkeeping lines were added inside the loops.
Recording. _on_trade (&& execute_market_order) drains maker fills into TradeRecords with liquidity=“maker”, fees_paid = 0, capital_committed = 0 – maker capital was committed at rest time, so the fill charges nothing. Maker legs are appended before the aggregate taker record so the mid trajectory still ends on the post-execution mid. TradeRecord gained liquidity: str = “taker” so the two sides are distinguishable downstream.
The zero-qty / capital decoupling. The old zero-qty rows couldn’t simply be dropped because _sync_costs reconciled agent budgets from trade_log – a fully-resting limit’s capital charge lived only in its zero-qty row. So, capital flows moved to a new cost_log (one CostEntry per intent, identical amounts and timing to before); _sync_costs and the exhaustion metric now read that, and trade_log holds fills only. This guarantees maker records can never double-charge or spuriously clear pending_cost – verified bit-identical deployed/pending_cost in the comparison.
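
A toy sketch of the buffer-and-drain pattern described above, reduced to a single ask level. `ToyAskLevel` and the field names on `MakerFill` are assumptions for illustration; the real dataclass also carries the bracketing venue mids, which are omitted here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass(frozen=True)
class MakerFill:
    """Hypothetical shape of one maker-side fill record."""
    agent_id: int
    side: str
    quantity: int
    price: float
    order_id: int


class ToyAskLevel:
    """One ask price level holding FIFO (agent_id, qty, oid) tuples."""

    def __init__(self, price: float) -> None:
        self.price = price
        self.queue: Deque[Tuple[int, int, int]] = deque()
        self._maker_fills: List[MakerFill] = []

    def execute_buy(self, qty: int) -> int:
        """Consume resting asks; the matching arithmetic is the usual FIFO walk."""
        filled = 0
        while qty > 0 and self.queue:
            agent_id, rest_qty, oid = self.queue[0]
            take = min(qty, rest_qty)
            # Bookkeeping only: buffer the maker's side of this consumption.
            self._maker_fills.append(MakerFill(agent_id, "sell", take, self.price, oid))
            if take == rest_qty:
                self.queue.popleft()
            else:
                self.queue[0] = (agent_id, rest_qty - take, oid)
            qty -= take
            filled += take
        return filled

    def drain_maker_fills(self) -> List[MakerFill]:
        """Hand buffered maker fills to the caller and reset the buffer."""
        fills, self._maker_fills = self._maker_fills, []
        return fills


level = ToyAskLevel(price=0.60)
level.queue.extend([(7, 5, 101), (8, 5, 102)])
taker_filled = level.execute_buy(7)
maker_fills = level.drain_maker_fills()
```

The drain returns one record per resting order touched, so the maker quantities sum to the taker's filled quantity.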

June 10, 2026

FBA sim – FBA venue build The FBA venue must be a resting limit book cleared by a periodic uniform-price call auction every pi ticks – this is…

The FBA venue must be a resting limit book cleared by a periodic uniform-price call auction every pi ticks – this is the standard design from Budish et al.

Limit orders rest across batches till filled or cancelled (since that’s how LPs provide the depth); market orders are single-batch and expire if there’s no opposite interest that clears.

Prediction and event-contract markets are moving on-chain and are about to be traded primarily by autonomous agents, not humans. The mechanisms they run today (i.e., continuous central-limit orderbooks & automated market makers) reproduce the exact latency race and adverse-selection pathologies that broke equity markets and FX markets back in 2010.

State & Cadence:

Constructor takes tau_ticks: int (clear interval) plus whatever the base Venue needs. Maintain a resting limit book (reuse the CLOB’s book structures – do not invent a new price type).
tick() increments an internal counter; when (counter % tau_ticks == 0), run a clear. Orders submitted during [last_clear, this_clear) participate in this clear. The sweep already fires venue.tick() at priority -50 before signals/decisions/trades, so submits at step t land in t+1’s clear – rely on that, don’t add a scheduler.

Submit Semantics (deferred)

submit_limit_order: add to resting book, return an OrderResult with filled_quantity=0, remaining_quantity=qty, a real order_id (“accepted/pending” convention from Entry 1 dataclass). No synchronous fill.
submit_market_order: queue as a market-side participant for the next clear.
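
A skeleton of the cadence and deferred-submit semantics above, with the matching step left as a placeholder. `ToyFBAVenue` and this `OrderResult` are simplified, hypothetical stand-ins rather than the repo's classes; only `tau_ticks`, `filled_quantity`, and `remaining_quantity` come from the notes.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Tuple


@dataclass
class OrderResult:
    """Simplified stand-in: every submit returns 'accepted, pending'."""
    order_id: int
    filled_quantity: float
    remaining_quantity: float


class ToyFBAVenue:
    def __init__(self, tau_ticks: int) -> None:
        if tau_ticks < 1:
            raise ValueError("tau_ticks must be >= 1")
        self.tau_ticks = tau_ticks
        self._counter = 0
        self._ids = count(1)
        self.resting_limits: List[Tuple[int, str, float, float]] = []  # persist across batches
        self.pending_markets: List[Tuple[int, str, float]] = []        # single-batch only
        self.clears = 0

    def submit_limit_order(self, side: str, price: float, qty: float) -> OrderResult:
        oid = next(self._ids)
        self.resting_limits.append((oid, side, price, qty))
        return OrderResult(oid, 0.0, qty)  # no synchronous fill

    def submit_market_order(self, side: str, qty: float) -> OrderResult:
        oid = next(self._ids)
        self.pending_markets.append((oid, side, qty))
        return OrderResult(oid, 0.0, qty)  # queued for the next clear

    def tick(self) -> bool:
        """Advance one step; return True when this tick ran a clear."""
        self._counter += 1
        if self._counter % self.tau_ticks == 0:
            self._clear()
            return True
        return False

    def _clear(self) -> None:
        # Placeholder: the uniform-price match would run here.
        self.clears += 1
        self.pending_markets.clear()  # unfilled market orders expire with their batch


venue = ToyFBAVenue(tau_ticks=3)
limit_result = venue.submit_limit_order("buy", 0.40, 10)
venue.submit_market_order("sell", 5)
clear_pattern = [venue.tick() for _ in range(6)]
```

Because fills only materialize inside `_clear()`, the recording path has to accept TradeRecords emitted at tick time rather than at submit time.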

Clearing Process (native reimpl of `batch_counterfactual/auction.py`)

candidate prices are distinct resting / arriving **limit** prices (market orders participate at every price but define no candidates – with both F and S step functions breaking only at limit prices, interior volumes never exceed candidate volumes, so scanning candidates finds the max).
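
A minimal sketch of the candidate scan under the max-volume objective, assuming integer quantities and ignoring fees and rationing. The tuple layout, the function name, and the midpoint tie-break without tick rounding are simplifications of my own, not the code in `batch_counterfactual/auction.py`.

```python
from decimal import Decimal
from typing import List, Optional, Tuple

Limit = Tuple[str, Decimal, int]  # (side, limit_price, quantity)


def clear_max_volume(
    limits: List[Limit], market_buy_qty: int = 0, market_sell_qty: int = 0
) -> Tuple[Optional[Decimal], int]:
    """Scan distinct limit prices and return (clearing_price, matched_volume)."""
    candidates = sorted({price for _, price, _ in limits})
    best_volume = 0
    best_prices: List[Decimal] = []
    for p in candidates:
        # Market orders participate at every price; limits only where acceptable.
        demand = market_buy_qty + sum(q for s, px, q in limits if s == "buy" and px >= p)
        supply = market_sell_qty + sum(q for s, px, q in limits if s == "sell" and px <= p)
        volume = min(demand, supply)
        if volume > best_volume:
            best_volume, best_prices = volume, [p]
        elif volume == best_volume and volume > 0:
            best_prices.append(p)
    if best_volume == 0:
        return None, 0
    # Tie-break: midpoint of the lowest and highest volume-maximizing candidates.
    return (best_prices[0] + best_prices[-1]) / 2, best_volume


book = [
    ("buy", Decimal("0.60"), 10),
    ("buy", Decimal("0.55"), 5),
    ("sell", Decimal("0.50"), 8),
    ("sell", Decimal("0.58"), 10),
]
price, volume = clear_max_volume(book)
```

In this example the candidates 0.58 and 0.60 both match 10 contracts, so the sketch prints at their midpoint.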

June 9, 2026

FBA-sim build plan Design Requirements:

Design Requirements:

FBA arm has to prove that a batched venue reduces extraction: less negative LP markout (the Thesis-B bleeder stops bleeding; users of on-chain event-contract markets).
FBA arm has to prove that there’s less value lost by uninformed takers, vs. continuous CLOB & AMM, on event-contract-style flow.
Mechanism only works if there’s a speed asymmetry to neutralize – MOST IMPORTANT. Batching protects no one if sniping isn’t already happening at scale.
- The agent population must have a latency differential:
  - slow liquidity providers who post two-sided quotes and update with lag;
  - fast/informed traders who react to news first and pick off stale LP quotes before they can cancel orders (‘sharps’);
  - noise takers who provide benign flow
- On a CLOB, the fast trader wins the race after every news move; on an FBA, the jump and the LP’s repricing land in the same batch, so there’s no pickoff.
- first dev task is to incorporate latency-differentiated agents
Event-contract flow, not generic. Fair value bounded in [0,1] since it’s a probability, slow diffusion punctuated by Poisson news shocks – discrete jumps are what create stale quotes and give batching something to protect against. If my fair-value process is a plain random walk today, I need to add a jump component and the [0,1] bound.
The central knob is the batch interval pi (INDEPENDENT VARIABLE)
- sweep pi from continuous (existing CLOB) up through ~30s (CoW-like), and show extraction collapsing as pi grows. The headline figure is sniper PnL (or LP Markout) vs. pi per venue – the batching dividend as a curve.
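
A rough sketch of the fair-value process the event-contract requirement describes: slow diffusion, occasional news jumps, and a hard [0,1] bound. All parameter values are placeholders, the per-step Bernoulli draw approximates Poisson arrivals, and clamping is the crudest way to enforce the bound (a logit-space walk would be one alternative).

```python
import random
from typing import List


def simulate_fair_value(
    n_steps: int,
    p0: float = 0.5,
    sigma: float = 0.002,      # per-step diffusion scale (placeholder)
    jump_prob: float = 0.01,   # per-step chance of a news shock (placeholder)
    jump_scale: float = 0.10,  # std dev of a news jump (placeholder)
    seed: int = 0,
) -> List[float]:
    """Return a fair-value path of length n_steps + 1, kept inside [0, 1]."""
    rng = random.Random(seed)
    path = [p0]
    for _ in range(n_steps):
        step = rng.gauss(0.0, sigma)
        if rng.random() < jump_prob:
            step += rng.gauss(0.0, jump_scale)  # discrete news shock
        path.append(min(1.0, max(0.0, path[-1] + step)))
    return path


fair_value_path = simulate_fair_value(1_000, seed=42)
```

The jump steps are the moments at which a slow LP's resting quote would be stale relative to fair value.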

Existing State of Orderbook-Hybrid-Amm-Sim repo:

venue.tick() fires once per timestep at priority –50, before signals/decisions/trades, which makes it a clean batch-clear seam – the FBA venue accumulates submissions during the step and clears on its tick. And rerun_clob_and_merge is an exact precedent for an additive FBA-only run, so I wouldn’t recompute 900 cells.

Decisions:

The price process has no jumps: I designed the whole arm around news shocks creating stale quotes that batching protects against. In reality, truth is a static latent-factor draw that never moves; “news” is a Poisson stream of noisy signals about a fixed truth, not jumps in the truth.
- This means that fast traders can’t pick off a stale quote after the fair value jumps (because the fair value never really jumps).
- What does exist is informational: the tail-signal agents receive more precise signals.
Latency mechanism exists but is zeroed: observation_delay schedules decisions at now + delay but the sweep sets it to 0 for everyone, so ordering falls through to heap-insertion/agent-list order. Clearly FBA needs delays on to have anything to neutralize. But given finding #1, I’ll be precise about what the delay represents: it’s not “reaction time to a price jump,” it’s “how many steps till an agent acts on the latest signal round.” The fast/informed agents get short delay, the LPs get longer delay – so informed flow acts on fresh signals while LP quotes reflect older ones, and the batch is what collapses that ordering advantage. So the delay differential and the signal-tier structure (routine vs. tail signals) have to be wired together – fast access to precise tail signals is the edge batching gets rid of.
- This is the experimental knob
No markout, no effective spread – but mid_price_before/after per fill exists, both are post-hoc computable. This is confirmed.
- Non-negotiable to add since markout is the core headline – it’s the metric that maps to the bleeder I identified before (LPs). The raw material’s there, I’ll compute markout at ∆ and effective spread from TradeRecord after the runs, not inside the hot loop.
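
A minimal sketch of the two post-hoc metrics, using the usual definitions: markout is the signed move of the mid away from the fill price after ∆ steps, and effective spread is twice the signed distance between fill price and the pre-trade mid. The per-step `mids` list and the function names are assumptions; the actual TradeRecord fields may differ.

```python
from typing import Sequence


def signed(side: str) -> float:
    """+1 for a buy, -1 for a sell."""
    return 1.0 if side == "buy" else -1.0


def markout(side: str, fill_price: float, fill_step: int,
            mids: Sequence[float], delta: int) -> float:
    """PnL per contract of the fill marked to the mid `delta` steps later.

    Negative values mean the mid moved against the filled party.
    The horizon is clamped to the last available mid.
    """
    future_step = min(fill_step + delta, len(mids) - 1)
    return signed(side) * (mids[future_step] - fill_price)


def effective_spread(taker_side: str, fill_price: float, mid_before: float) -> float:
    """Twice the taker's signed distance from the pre-trade mid."""
    return 2.0 * signed(taker_side) * (fill_price - mid_before)


# Illustrative numbers: a maker sells at 0.52 at step 1, then the mid drifts up.
mids = [0.50, 0.50, 0.53, 0.57]
maker_markout = markout("sell", 0.52, fill_step=1, mids=mids, delta=2)
taker_spread = effective_spread("buy", 0.52, mid_before=mids[1])
```

Here the maker's markout is negative because the mid rose after they sold, which is the pickoff signature the LP metric is meant to capture.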

The log showed OrderResult returning synchronously from submit_* and MarketEnvironment._on_trade builds the TradeRecord from it – so a deferred FBA fill (submitted during the step, filled on tick()) breaks the synchronous-return assumption. That’s the single trickiest integration point. The FBA venue has to return an “accepted, pending” OrderResult on submit and emit the actual TradeRecord at clear time. We need to see how _on_trade and the trade-recording path handle a fill that arrives at tick rather than at submit, or I’ll silently drop FBA fills from the metrics.

The maker-fill bug is an important finding after scraping the old repo: when a resting limit gets hit, the maker’s fill is never recorded – the CLOB just shrinks the deque tuple, no TradeRecord, no callback. So the existing CLOB/hybrid PnL already undercounts maker-side executions. This matters for two reasons:

First, it’s a pre-existing caveat in the published-adjacent sim that I should note in build_log regardless of FBA.
Second (more importantly), if the CLOB silently drops maker fills but the new FBA venue records both sides (which it needs to in order to compute LP markout), then FBA vs CLOB is not an apples-to-apples comparison – FBA would show more recorded volume and different PnL purely as a recording artifact, not a mechanism effect. That would be a fatal confound in exactly the headline result.

June 9, 2026

forward looking thoughts for counterfactual – pred market ATS smart market? Thoughts on building an ATS Smart Market for event contracts / on-chain markets, since institutional markets (equities…

Thoughts on building an ATS Smart Market for event contracts / on-chain markets, since institutional markets (equities / FX) are dominated by incumbent ATS (i.e., OneChronos).

Caveats to building an ATS Smart Market – easy to build a matching engine (optimization-based auction; commoditized) – moat is regulatory licensure (SEC reg., broker-dealer status, FINRA, Form ATS-N), liquidity network effects (empty venue is worthless, and it’s hard to bootstrap two-sided institutional flow from my desk), institutional trust and track record, connectivity system (FIX gateways, OMS/EMS integration, broker routers). These things structurally favor incumbents.

Really, I need to switch from a mechanism / participant-side approach to market structure to solving a problem that I care about. Whose pain is acute enough that they’d route flow to me, switch venues, or adopt my system?

Since equities / FX are dominated by existing ATS, might be worth exploring event contracts / on-chain markets for two reasons: there’s no entrenched smart-market incumbent there, and on-chain settlement changes the substrate (may not need to be a registered US ATS at all; it’s a different regulatory surface).

Logical Next Step – FBA-sim on hybrid orderbook repo

Other thoughts:

CoW Protocol settles batch auctions where orders are grouped into batches and auctioned to solvers who compete to find the best settlement, with the grouping itself enforcing rules that prevent MEV. Thus, there is a live, large-scale batch-auction surface that attracts real flow by protecting it from extraction.

The Bleeder: whoever the current mechanism taxes

Batch auctions help 2 parties, and hurt 1.
- They help the liquidity provider being sniped (the LP whose resting quote goes stale on news and gets picked off by faster capital before they can cancel their order – this is demonstrated by my Thesis B finding, where LP-edge markets show negative post-fill markout, i.e., market makers bleeding to adverse selection).
- They also help the taker getting extracted – front-run, sandwiched, or paying the latency tax baked into spreads.
- They hurt the latency arbitrageur – whose entire edge is being fastest.
- Thus the bleeders are LPs and extracted takers; enemy is fast arbitrageurs.

The Switcher: whoever has the pain and the agency to move flow and the ability to do it without needing everyone else to move first

The professional market maker who isn’t the fastest. Strong pain, real agency, and the mechanism is designed for their fail-point. The issue is that a single MM moving to an empty venue doesn’t yield the necessary liquidity. MMs also follow liquidity rather than create it. So, the MM is the SECOND SWITCHER.
The flow originator – as shown by the CoW protocol. The party with pain && agency && the ability to move unilaterally IS NOT A TRADER; it’s whoever owns or routes the order flow: a prediction-market frontend, a wallet, an aggregator, a Telegram trading bot, a “smart-money” copy-trading app – any surface where end-users initiate event-contract trades and currently eat the extraction.
- flow originators can move flow the instant they integrate, without waiting for a liquidity network to form, because in the CoW protocol you don’t run liquidity – **solvers/MMs compete to fill the flow you bring.** The originator integrates one API; their users stop getting picked off; the originator can even capture surplus to share back. That removes the cold-start problem that kills the venue version of this: when a protocol or application integrates CoW, their users’ trades become intents in the same batch auction, and the same MEV protection and uniform clearing prices apply – which is what makes it useful as infrastructure, not just a consumer product.

THUS: the ideal user of an event-contract / on-chain market ATS smart market is whoever brings flows: a prediction-market / event-contract flow originator whose users are currently bleeding to extraction and stale-quote pickoff

a frontend
wallet
trading-bot layer that sits on top of Polymarket-style on-chain markets
anything that integrates my batch settlement layer so their users get protected, surplus-returning fills.

The end-bleeder (not switcher) is that flow originator’s retail/semi-pro user losing the markout that I previously quantified.

This is the protective settlement layer in front of existing flow, and MMs compete to fill it.

single-sided wedge
integrate one partner at a time
no liquidity to bootstrap

The CoW model is the founder-tractable version of this idea, and it’s the one whose local ideal user I can identify.

Caveats:

The on-chain event-market flow may be too small today. Polymarket is the whole game and its volume is spiky around big events; outside election season the addressable extraction might not support a business.
- this is TAM issue, not mechanism
Polymarket already runs its own off-chain CLOB and could batch internally or add protection themselves – the same “incumbent will build it” objection.
- Defense: I sit across venues and frontends as neutral infrastructure, which a single venue won’t do because it competes with the others.
- Is that neutrality valuable?
Spot-DeFi batch settlement is crowded (CoW, UniswapX, 1inch Fusion, dozens of intent protocols). My only defense is event-contract specialization – the fact that these markets have discrete resolution, news-driven stale-quote dynamics, and cross-venue fragmentation that generic spot solvers don’t model.

Bridge to the Build: designing the FBA-sim arm to answer this user’s question not a generic one.

The flow originator needs to believe that routing its users’ flow through a batch auction measurably reduces what they lose.
Headline output of the simulator should be the change in taker markout / extraction and LP adverse-selection markout under a CoW-style batched-solver venue vs. continuous CLOB & AMM baselines, on event-contract shaped flow (with discrete resolution, news shocks, the stale-quote dynamics from my Part 1/2 work measured on real books).
- I’ve built the empirical case a flow originator would need to integrate – and the single most on-thesis artifact I could build. The simulation becomes the proof-of-value for the named user, beyond a mechanism study.

June 6, 2026

[Counterfactual Batching] – Exp 1 – episode detection + first-clearance This is the first experimental arm on the frozen panel. The goal is to detect every crossed episode across the 10…

This is the first experimental arm on the frozen panel. The goal is to detect every crossed episode across the 10 included pairs, run a counterfactual uniform-price call at each episode’s first cycle under 5 fee regimes (gross / retail / retail + rebate / institutional / zero), & quantify who could have cleared what.

Design Decisions:

Episode = contiguous gross-crossed cycles; ends on uncross, gap, or termination. EPISODE_GAP_MAX = 600s bridging tolerance, decoupled from book.py’s unchanged 90s staleness bound – a 424s daemon gap mid-cross isn’t a sign that the market’s uncrossing. Bridged episodes carry gap_adjacent + bridged-seconds (606 episodes bridge a gap); methods state the assumption: ≤ 600s unobserved within an episode assumed continuously crossed. The sensitivity table (90/300/600s —> 300/267/241 NYK episodes) is now published; parameter doesn’t drive results.
Full-capture semantics, no duration floor, no merging across genuine uncrosses – fleeting crossings are the latency-race phenomenon, not noise. Duration-stratified reporting instead (<1m / 1-5m / 5-30m / > 30m).
per-contract vs size-weighted always labeled, never mixed unlabeled. The caveat that flow’s held fixed heads the writeup.
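
A simplified sketch of the episode rule above: contiguous crossed cycles, ended by an uncross, a gap longer than EPISODE_GAP_MAX, or the end of data. The input shape, the dict fields, and the choice to count a gap as bridged when it exceeds the 90s staleness bound are my assumptions for illustration; this is not the logic in arm_a_clearance.py.

```python
from typing import Dict, List, Optional, Tuple

EPISODE_GAP_MAX = 600  # seconds of unobserved time tolerated inside an episode
STALE_BOUND = 90       # gaps above this are recorded as bridged (assumption)


def detect_episodes(cycles: List[Tuple[int, bool]]) -> List[Dict[str, int]]:
    """cycles: time-sorted (timestamp_seconds, gross_crossed) observations."""
    episodes: List[Dict[str, int]] = []
    current: Optional[Dict[str, int]] = None
    for ts, crossed in cycles:
        if current is not None:
            gap = ts - current["end"]
            if crossed and gap <= EPISODE_GAP_MAX:
                if gap > STALE_BOUND:
                    current["bridged_seconds"] += gap  # assumed crossed throughout
                current["end"] = ts
                current["n_cycles"] += 1
                continue
            episodes.append(current)  # ended by an uncross or an over-long gap
            current = None
        if crossed:
            current = {"start": ts, "end": ts, "n_cycles": 1, "bridged_seconds": 0}
    if current is not None:
        episodes.append(current)      # ended by termination of the data
    return episodes


# Synthetic cycles: a short cross, an uncross, a cross bridging a 424s gap,
# then a 756s gap that splits off a final single-cycle episode.
cycles = [(0, True), (30, True), (60, False), (90, True),
          (514, True), (544, True), (1300, True)]
episodes = detect_episodes(cycles)
```

No duration floor is applied, so single-cycle episodes survive, in line with the full-capture semantics.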

Findings:

1,289 episodes / 10 pairs. The flagship NYK 15.15h window is the extreme tail, not the typical state: NYK alone is crossed in 56.5% of cycles across 9 days, in 241 distinct episodes. “Crossed” is a recurring condition of this market pair, and is detailed for the first time.
Duration concentration: >30min bucket = 18% of episodes but 86% of crossed-minutes – persistence dominated by long adverse-selection-priced states, with a large population of fleeting crossings beneath.
Fee-cliff holds at episode level: clearable fractions gross 1.00 / retail 0.057 / rebate 0.130 / institutional 0.774. Part 1’s access narrative holds through the upgrade from snapshots to episodes.
Size-weighted (real ladders): flagship episode = $468 across 132k contracts at first clearance; clearable fractions tie out exactly with the per-contract path.
Objective disagreement: max-volume and max-aggregate-PI choose different clearing prices in 1,864 sized rows – real-world books where the smart-market objective is not the textbook call auction.
Kelce-market anomaly + reconciliation: kelce ~52% retail-clearable vs. Part 1’s “0 of 15.” Resolved: ~85% vintage (regime emerged ~06-02, post-dating Part 1’s 05-28 snapshot; clearable fraction 21% pre-essay -> 99% post), ~0% instrument (pm_micro.arb walker reproduced 0.519 identically), ~15% fee-structure conditioner (median C=0.03; retail wall 1.10¢ in the tail vs 3.50¢ central; clearability needs both wide cross and low wall – each alone clears 0%). Part 1 is unedited; regime change is purely a Part 2 finding.

New Infrastructure Built:

arm_a_clearance.py (episode engine, stratified summaries, sensitivity, reframed sanity gate – NYK longest episode = flagship, passing), extract_ladders.py (scoped gz extraction, 100% both-venue coverage at all 1,289 episode starts; reconstructed top of book ties out to book.py exactly), arm_a_sized.py (full clear() on real ladders, both objectives, per tier), reconcile_kelce.py (reproducible anomaly adjudication), figures a1-a5 + reconciliation figure, RESULTS_A.md + RECONCILIATION_KELCE.md.

Caveats:

mechanical counterfactual, flow is fixed; per-contract primary, size-weighted where ladders extracted; mid-markout proxy reserved for Exp 2 (no trade tape); 30s sampling aliasing on fleeting episodes (sub-minute bucket is a lower bound); bridging assumption as above; first-clearance only – repeated-auction dynamics within long episodes not yet modeled (that’s the interval-dial question, Exp 3 logic at panel tier).

What this cross-state analysis (Exp 1) sets up: the episode table is the spine for 2 (markout decomposition per episode – now also the adjudicator of whether kelce’s retail-clearable crosses survive adverse selection, i.e., the deeper test of Part 1’s thesis), 3 (interval sweep), 4 (joint cross-venue clearance at 30s tier). Ladder extraction de-risks all sized claims downstream.

June 6, 2026

[Counterfactual Batching] – AUCTION ENGINE

**NOTE** The smoke test implemented in this auction-engine build is the whole point – the new engine has to derive my published finding (crossed gross, dead at retail fees, alive at the institutional tier) from independent machinery. When that table prints AGREES, I’ll have closed the loop between the Book Reconstruction phase’s empirical findings & this engine, and the fee-blocked-vs-uncrossed distinction it spits out per cycle is a core figure for any write-up.

The goal of building this auction engine & cutoff generation feature was the counterfactual math itself. It’s essentially a deterministic uniform-price call auction that can replay at any moment of the frozen panel and answer whether a crossed-market state would have cleared, at what price, with what price improvement, and under which fee regime. There’s no raw data porting; it’s based entirely on the frozen data collected during the DATA AUDIT phase.

auction.py – three execution paths:

clear(orders, objective, fee_tier) – full quantity-aware single-market call auction. Feasibility is fee-adjusted per leg by the resting order’s venue (with leg fee f, a buy clears at p if p + f ≤ limit; a sell if p – f ≥ limit). Two objectives:
- max_volume (textbook call auction)
- max_agg_pi (aggregate price improvement, API – this is the smart-market objective)
- Tie-break: midpoint of the optimal price interval, rounded to the finer venue tick (ASSUMPTION-1). Marginal rationing: pro-rata by quantity, no time priority, largest-remainder rounding (ASSUMPTION-2). Pure functions, Decimal throughout, no RNG.
clear_joint(orders_a, orders_b, ...) – both venues merged into one book for a paired market: the primitive for Exp 4 (joint cross-venue auction thesis). Settlement-rail impossibility remains a writeup caveat, not code.
clearance_bounds(book_a, book_b, fee_tier) – the price-only path for the 30s panel (where sizes are NONE): does a uniform price exist that both venues’ best quotes accept after fees, what’s the feasible interval, per-contract PI to each side at midpoint.
- Infeasibility returns a reason – gross-uncrossed vs. fee-blocked – and that distinction is itself a finding to note.
- book_to_orders raises if sizes are NONE, structurally preventing invented quantities from ever entering the full engine.
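A minimal sketch of the price-only path described above, to make the gross-uncrossed vs. fee-blocked distinction concrete. This is not the code in auction.py: the function name, the Bounds record, and the flat per-contract leg fees are all assumptions for illustration (the real fee models are price-dependent).

```python
from decimal import Decimal
from typing import NamedTuple, Optional


class Bounds(NamedTuple):
    feasible: bool
    reason: str                        # "ok", "gross-uncrossed", or "fee-blocked"
    lo: Optional[Decimal] = None       # lowest price the resting ask accepts after fees
    hi: Optional[Decimal] = None       # highest price the resting bid accepts after fees
    pi_buy: Optional[Decimal] = None   # per-contract PI to the bid at the midpoint
    pi_sell: Optional[Decimal] = None  # per-contract PI to the ask at the midpoint


def clearance_bounds_sketch(best_bid: Decimal, best_ask: Decimal,
                            buy_fee: Decimal, sell_fee: Decimal) -> Bounds:
    """Hypothetical price-only check: best bid from one venue, best ask from the other.

    Assumes flat per-contract fees, which is a simplification.
    """
    if best_bid < best_ask:
        # No uniform price exists even before fees.
        return Bounds(False, "gross-uncrossed")
    lo = best_ask + sell_fee   # seller needs p - f >= limit
    hi = best_bid - buy_fee    # buyer needs p + f <= limit
    if lo > hi:
        # Crossed gross, but fees eat the whole cross.
        return Bounds(False, "fee-blocked")
    mid = (lo + hi) / 2
    return Bounds(True, "ok", lo, hi,
                  best_bid - (mid + buy_fee),
                  (mid - sell_fee) - best_ask)
```

With a 0.5¢ gross cross and zero fees the interval is non-empty and the midpoint splits the cross evenly. Adding 1¢ per leg turns the same book state into a fee-blocked one.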

cutoffs.py – fixed_grid and seeded randomized (gaps ~U[0.5T, 1.5T]) generators, parameterized T from 30s to 60min for panel runs (sub-second is reserved for the Polymarket-only case study). Gap/outage-aware: a cutoff landing where book_state returns NONE emits an explicit SkippedAuction record with reasoning – never an unmarked skip. This is fully reproducible under seed.
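A rough sketch of the two generators and the explicit-skip rule, assuming times are plain seconds and that a caller supplies a book-availability check. The function names and the record shape are hypothetical, not the ones in cutoffs.py.

```python
import random
from typing import Callable, List, Tuple


def fixed_grid_sketch(start: float, end: float, T: float) -> List[float]:
    """Cutoffs at start + T, start + 2T, ... up to and including end."""
    out, k = [], 1
    while start + k * T <= end:
        out.append(start + k * T)
        k += 1
    return out


def randomized_sketch(start: float, end: float, T: float, seed: int) -> List[float]:
    """Cutoffs with gaps drawn from U[0.5T, 1.5T]; reproducible under seed."""
    rng = random.Random(seed)  # private RNG so global random state is untouched
    out, t = [], start
    while True:
        t += rng.uniform(0.5 * T, 1.5 * T)
        if t > end:
            return out
        out.append(t)


def label_cutoffs(cutoffs: List[float],
                  book_available: Callable[[float], bool]) -> List[Tuple[float, str]]:
    """Every cutoff gets a record; a missing book is marked, never dropped."""
    return [(t, "auction") if book_available(t) else (t, "skipped: no book state")
            for t in cutoffs]
```

Using a dedicated random.Random(seed) instance is what makes two runs with the same seed produce identical cutoff lists.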

Validation: 46 tests passing (15 new): hand-computed 4-order clearings on both objectives; a fixture where the two objectives choose different prices; gross-uncrossed vs. fee-blocked at different tiers; Kalshi parabolic-fee feasibility flips at extreme C; pro-rata + remainder; tie-break/tick rounding; a clear_joint case executing volume neither venue could alone; cutoff determinism and proper outage skips.

Smoke-Test: Running clearance_bounds over every cycle of the frozen NYK window: 100% clearable gross with median per-contract API ~0.20¢ (Kalshi side) + ~0.30¢ (Polymarket side) = 0.50¢ published cross, split at midpoint; 0% clearable at retail fees; 100% at the institutional tier. AGREES. The engine independently re-derived my initial finding that the spread is real, retail-dead, institutionally-accessible – through entirely new machinery (this auction engine).

June 6, 2026

[Counterfactual Batching] – BOOK RECONSTRUCTION

Goal was to produce a single, trusted interface between raw daemon data and all downstream analysis: every arm queries book.py rather than touching individual CSV data. It was built entirely against the frozen replay set (1.06M rows, content-hashed in FROZEN_MANIFEST.json), so every number downstream is reproducible from a frozen dataset.

BookState: venue, market_id, ts, best bid/ask.

Prices were normalized to Decimal probability in [0,1] for both venues (Kalshi cents and Polymarket decimals converted; raw values + tick sizes preserved as attributes). YES-side convention everywhere; NO views derived on demand, never stored – kills an entire class of side-convention bugs. A valid BookState requires both sides present; one-sided or error rows are treated as gaps, and are never fabricated.
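A small sketch of the normalize-at-the-boundary idea, assuming raw prices arrive as strings, that the venue labels are "kalshi" and "polymarket", and that Kalshi quotes are whole cents. These helper names are hypothetical and are not the ones in book.py.

```python
from decimal import Decimal
from typing import Tuple


def normalize_price(venue: str, raw: str) -> Decimal:
    """Convert a raw venue price to a Decimal probability in [0, 1]."""
    p = Decimal(raw) / 100 if venue == "kalshi" else Decimal(raw)
    if not (Decimal(0) <= p <= Decimal(1)):
        raise ValueError(f"price out of range for {venue}: {raw}")
    return p


def no_view(yes_bid: Decimal, yes_ask: Decimal) -> Tuple[Decimal, Decimal]:
    """Derive the NO-side (bid, ask) on demand from the stored YES side."""
    # Buying NO is selling YES, so the sides swap: NO bid = 1 - YES ask.
    return Decimal(1) - yes_ask, Decimal(1) - yes_bid
```

Because the NO view is computed rather than stored, there is only one copy of each quote that can ever be wrong.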

Lookups:

book_state(venue, market, t) – last snapshot at or before t, R9-safe parsing (the embedded-newline error rows from the audit). Returns NONE – never a stale fabrication – if t falls in a gap beyond the 90s staleness tolerance or inside the known 10.1h outage (gap windows loaded from the audit inventory).
paired_state(pair_id, t) – both venues’ legs via the existing pairing maps; NONE if either leg is unavailable.
is_crossed(pair, fee_tier) / cross_size() – gross and fee-adjusted variants.
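The at-or-before lookup with honest absence can be sketched with bisect. This assumes snapshots are held as a sorted list of timestamps in seconds alongside a parallel list of states, and that outage windows are (start, end) pairs. The signature is illustrative, not the real book_state().

```python
import bisect
from typing import Optional, Sequence, Tuple

STALENESS_S = 90.0  # staleness tolerance from the notes above


def book_state_sketch(ts: Sequence[float], states: Sequence[dict], t: float,
                      outages: Sequence[Tuple[float, float]] = ()) -> Optional[dict]:
    """Return the last snapshot at or before t, or None if that would be stale."""
    if any(a <= t <= b for a, b in outages):
        return None                      # inside a known outage window
    i = bisect.bisect_right(ts, t) - 1   # index of last timestamp <= t
    if i < 0 or t - ts[i] > STALENESS_S:
        return None                      # before first snapshot, or gap too long
    return states[i]
```

bisect_right is what gives the "at or before" semantics: a query exactly on a snapshot timestamp returns that snapshot, not the previous one.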

fees.py: verbatim port of the Part 1 fee models – Kalshi parabolic, Polymarket with category tiers – wrapped in four study tiers (retail, retail + PM-rebate, institutional 0.30/0.20, zero) behind one leg_fee() signature. Pure functions, table-tested against hand-computed values at C = 0.05 / 0.50 / 0.95.

Validation: 31 unit tests (normalization round-trips, at or before semantics on a synthetic gap fixture, None-inside-outage, R9 multi-line fixture, fee tables, paired lookup with missing leg). The smoke test rebuilt the full NYK window through the new layer – 1,749 cycles resolved, zero NONE in-window, 100% crossed gross, median cross 0.50¢, max 1.5¢ – AGREES with both the raw spike and the published Part 1 finding. The layer independently reproduces known ground truth.

Known Limitation (load-bearing): bid_sz / ask_sz are NONE from the 30s panel – top-of-book prices only. Real-data claims are per-contract until the per-episode gz ladder extraction lands; the full ladders exist in raw gz (Kalshi 1¢ ticks, PM 0.1¢) but aren’t queryable yet. Size-weighted / depth-distribution claims are gated on that extraction.

Design principles worth remembering: honest absence over fabricated presence (NONE > stale state); normalize once at the boundary, analyze in one unit space; gaps are first-class data, not exceptions; every layer must reproduce a known result before anything new is built on it.

This has all been committed & pushed. Next step is building out the math for the auction layer.

June 6, 2026

[Counterfactual Batching] – DATA AUDIT

Audit Goals:

row counts per venue • market • day; find the gaps (deploy downtime, market resolution)
for one Kalshi and one Polymarket row: every field, and which timestamp is whose – exchange stamped, my receipt-stamped, or both?
- Both venues’ APIs differ here; this decides how experiment C handles the ±40ms jitter test
depth: full ladders, top-N levels, or top-of-book only – per venue? Arm A’s “price improvement distributed” claim needs at least a few levels
trades: did the daemon capture prints, or quotes only?
- quotes-only is fine – my markout method already models fills – but it must be stated in the writeup
WebSocket windows: list every sub-second capture window I have – market, venue, start/end, message type (deltas vs. state snapshots). This inventory is the boundary of what Exp. C can claim
NYK market specifically: confirm the 14.8h window’s rows are intact end-to-end on both venues, since it’s about to become my flagship figure

Audit Results:

Exp 1 (crossed-state clearance) is stronger. full ladders in the raw gz means “PI distributed across depth” is a claim I should focus on, at the cost of an extraction pass. the NYK figure is stronger: clean window, zero nulls, self-check agrees with the published numbers. I can use that chart.

Exp 2 (markout decomp) must be labeled “mid-markout” rather than “realized PnL”, with horizon sensitivity (1/5/15 min) since frozen books bias the short end. That’s a labeling thing.

Exp 3 (Interval Dial – WS tier) is now a Polymarket-only case study from one 127-minute window – an illustrative appendix, not a headline.

Exp 4 (joint cross-venue auction) isn’t dead, though, and this is important to note: the audit blocked the sub-second joint auction, but the joint cross-venue auction at the 30s tier runs fine off the panel – and at 30s intervals my 40ms cross-venue timestamp error is just noise.

Thesis arm survives; it lives at T ≥ 30s.

Built Thus Far:

full data inventory (DATA_AUDIT.md) – 3 capture paths mapped from writer code and verified against storage: the 30s REST panel (1.06M venue-rows, 172MB) plus 1GB of raw gz with full depth ladders (meaning Exp 1 gets real depth claims), a 5s event overlay, and one WS window. Clean on dupes, monotonicity, schema drift; honest on the gaps: 97.99% frozen consecutive books, 209 gaps including a 10.1h outage, several dead market-legs.
reproducible replay set – 1.06M rows, content-hashed (266K gz files indeed). Any number in the eventual writeup can be regenerated from a pinned, verifiable dataset.
Flagship figure – knicks_spike.png. raw rows to chart, self-checked against the published finding (1,749 snapshots/venue, zero nulls, 100% crossed, median 0.5¢).
one real bug caught and killed

Future Directions

episode-collapse semantics (crossed states, not snapshot counts)
mid-markout proxy labeling throughout
exclusion rules applied (10 markets in, 6 out, appendix’d)
outage scrubbed from denominators
exp statuses
still need to build every line of analysis logic, including the book layer, fees, auction engine, cutoffs, arms

June 5, 2026

counterfactual pred-market batching study

Build Plan

Caveat: my panel is comprised of 30s REST snapshots + sub-second WS windows, and those two resolutions support different levels of claim. I could structure the study so every result is labeled by the data tier that earned it.

Claim: Replaying observed orderbooks under counterfactual batch auctions, X% of crossed-book time clears, $Y of displayed edge converts to price improvement for resting participants, and post-fill markout improves by Z – with the effect rising/saturating at interval length T.

Caveat: this is a mechanical counterfactual held under fixed order flow. Under a real batched ATS, participant behavior would adapt accordingly.

Data Audit: need to inventory exactly what the daemon has. Per venue, need the following:

top of book only or depth ladders?
trades captured or quotes only?
cancel/replace deltas in the WS stream or state snapshot?
timestamp provenance (exchange-stamped vs. receipt-stamped)?

Then write the tier map: 30s panel —> auction intervals ≥ 30s and snapshot-level clearance. WS windows —> sub-second interval simulation, but only inside those windows.

Book Reconstruction: One module: book_state(venue, market, t) returning best bid/ask (+ depth where I have it), built from the snapshot panel with WS deltas overlaid where they exist.

Normalize the two venues into a common schema with fees attached as per-venue functions.
Deliverable is a tested layer that my old markout code & new auction engine both sit on.

Auction Engine: Small, well-tested uniform-price call auction: take all resting interest at a cutoff, compute the clearing price, allocate fills.

implement two objective functions:
- classic max-executable-volume
- max aggregate price improvement
Differences between them on the same data are themselves a finding.
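To see how the two objectives can disagree on the same book, here is a fee-free sketch under simplifying assumptions: candidate prices are the limit prices only, rationing is pro-rata with fractional fills (no remainder rounding), and ties are broken at the midpoint of the tied candidates with no tick rounding. All names are hypothetical. This is an illustration of the idea, not the engine.

```python
from fractions import Fraction as F
from typing import List, Optional, Tuple

Order = Tuple[str, F, F]  # (side, limit, qty) with side in {"buy", "sell"}


def evaluate(orders: List[Order], p: F) -> Tuple[F, F]:
    """Return (volume, aggregate PI) if the batch clears at uniform price p."""
    buys = [(lim, q) for s, lim, q in orders if s == "buy" and lim >= p]
    sells = [(lim, q) for s, lim, q in orders if s == "sell" and lim <= p]
    demand = sum((q for _, q in buys), F(0))
    supply = sum((q for _, q in sells), F(0))
    vol = min(demand, supply)
    if vol == 0:
        return F(0), F(0)
    # Pro-rata: each eligible order fills qty * vol / (its side's total).
    pi = sum(((lim - p) * q * vol / demand for lim, q in buys), F(0))
    pi += sum(((p - lim) * q * vol / supply for lim, q in sells), F(0))
    return vol, pi


def clearing_price(orders: List[Order], objective: str) -> Optional[F]:
    """Pick the candidate price maximizing volume or aggregate PI."""
    idx = 0 if objective == "max_volume" else 1
    cands = sorted({lim for _, lim, _ in orders})
    scored = [(evaluate(orders, p)[idx], p) for p in cands]
    best = max(score for score, _ in scored)
    if best == 0:
        return None  # nothing clears
    tied = [p for score, p in scored if score == best]
    return (min(tied) + max(tied)) / 2


orders = [("buy", F("0.60"), F(10)), ("buy", F("0.52"), F(10)), ("sell", F("0.50"), F(12))]
```

On this three-order book max-volume clears 12 contracts at 0.51, while max-aggregate-PI clears only 10 at 0.60. Under pro-rata rationing, the lower price lets the marginal 0.52 bid dilute the fill of the 0.60 bid, so excluding it raises total price improvement from 0.72 to 1.00.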

Individual Experiments:

Crossed-state clearance. Every snapshot is a candidate auction. For each, I need to ask whether a uniform-price call would clear the crossed/dislocated states, at what price, and how many dollars of price improvement would be distributed & to whom.
- use the NYK market as a core case study: 15 hours of crossed book becomes – under a single counterfactual call – an instant clear at a price inside the cross, with the $166 of apparent edge being paid to the resting parties as API instead of sitting un-takeable
Markout Decomposition. Re-run my fill-realism model with counterfactual fills occurring at batch clearing prices, then compute the same 5-minute post-fill markouts. The delta between continuous-modeled and batch-modeled markout is ~= the sniping rent that batching deletes; whatever negative markout survives is genuine informed flow.
- I showed all 8 of the curated markets had negative markout; this experiment answers how much of that is mechanism and how much is information.
Interval Dial (WS tier). Sweep T inside my sub-second windows: markout improvement and clearance rates vs. staleness cost (clearing-price deviation from the continuous mid-path, time-to-fill for marketable interest). This would output the race-resistance-vs-responsiveness thing.
Joint Cross-Venue Auction (thesis). One batch clearing across both venues’ merged, fee-adjusted books. This tests the ‘mechanism that lives above both venues’ thesis I previously argued is required. I could measure dislocation collapse and total PI unlocked vs. the per-venue arms. Settlement-rail impossibility acknowledged explicitly (USD vs. USDC rails); the point is sizing the prize, not claiming deployability.

Robustness:

behavioral endogeneity; snapshot aliasing (use WS windows to bound what 30s sampling misses); displayed-liquidity-only assumption; fee-tier sensitivity (retail vs. institutional counterfactual); clock-sync sensitivity (re-run with ±40ms cutoff jitter and show stability); cancel-behavior assumption (resting interest at cutoff treated as firm).

June 5, 2026

core definitions

Dark ATS – ATS are alternative trading systems. An SEC/FINRA-regulated trading venue that isn’t an exchange. “Dark” = zero pre-trade transparency: no displayed quotes, no order-book feed; trades only become public when printed to the tape post-execution.

Multilateral auction – many-to-many matching in one event: several buyers can clear against several sellers in the same security simultaneously, rather than the bilateral one-buyer-one-seller fills of a continuous CLOB.

Aggregate Notional Price Improvement – for one order, price improvement equals your limit price minus the clearing price you got (for a buy; the reverse for a sell). Essentially, how much better you did than the worst you said you’d accept; multiplied by the shares filled to get dollars; summed across every order and every security in the auction. That sum is the objective function the optimizer maximizes.
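Written out, the definition above looks roughly like the following. The symbols are my own shorthand, not OneChronos notation: S is the set of symbols in the auction, B_s and A_s are the filled buy and sell orders in symbol s, ℓ is an order's limit price, q its filled shares, and p_s the single clearing price for s (which assumes the base environment, where each symbol clears at one price).

```latex
\mathrm{API} \;=\; \sum_{s \in S} \left[ \sum_{i \in B_s} (\ell_i - p_s)\, q_i \;+\; \sum_{j \in A_s} (p_s - \ell_j)\, q_j \right]
```

Each term is non-negative for a filled order, since a buy only fills at or below its limit and a sell at or above it.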

NBBO – National Best Bid and Offer: the best displayed bid and best displayed ask across all U.S. exchanges, consolidated via the SIP feeds. It’s the regulatory reference price; dark venues must execute at or within it.

Single price per symbol – symbol is the ticker (i.e. one security, maybe AAPL). every base-environment fill in that security in that auction clears at one uniform price – like a call auction – instead of a ladder of different prices as orders walk the book

Firm orders – if matched, you’re filled. No last look, no fading after the fact.

Optimization – which orders fill, how much, at what prices, to maximize API

Auction-time inputs (what Bidder Logic can condition on, computed at auction time):

Imbalance – excess buy vs. sell interest in the symbol within the auction
Spread – NBBO ask minus bid at that moment
Price dislocation – how far the current price/mid sits from a comparison value (e.g., recent mid) – a “deviation from fair” gauge
Passing volume – the executable volume flowing through the auction(s); an activity-level signal
Quote fade / NBBO consensus – instability of the reference quote: how much the NBBO is flickering/canceling, and how much the exchanges agree. Effectively a staleness detector – the equities version of the stale-quote problem aforementioned in prediction markets.

Combinatorial Optimization – optimization where solutions are discrete combinations (which subset of orders, in what amounts) rather than a smooth dial; the solution space explodes exponentially and the problem is generally NP-hard.

A combinatorial auction is one where bids reference bundles or joint constraints, so winner determination is itself a combinatorial optimization – similar to the spectrum-auction lineage that OneChronos runs.

Limit Order – fill me at price X or better.

Midpoint Peg – my price floats at the NBBO midpoint, whatever it is at auction time

Target order – a OneChronos term – the real FIX order(s) that pre-registered Bidder Logic is based on

PoP – physical network node where subscribers connect. Orders are timestamped there at the network edge on receipt, and that timestamp determines auction eligibility

Crossed markets – best bid above best ask (locked = equal).

Base environment – the default all-to-all auction pool: every order not directed into a Nexus.

Omnimarket – an umbrella spanning two or more of your Nexuses: the order is represented in all of them at once, the optimizer fills it wherever API is maximized, and fills can span multiple Nexuses at different prices

June 5, 2026

nexus – private rooms (custom counterparty offering)

A ‘Nexus’ (custom group) lets a subscriber or their client designate orders to interact only with a specified user or group – or only with their own orders – instead of the base environment

OneChronos creates it on request with written counterparty consent, and orders carry the NexusID.

On top of the Nexus is an Omnimarket, which spans multiple Nexuses and the order executes whichever Nexus(es) best match API.

fills can span Nexuses at distinct prices, with ties broken randomly
An order can try its Nexus first, then expose residual quantity to the base environment within the same auction cycle, repeating every cycle of its life

Bridge to prediction market auction-theory work I’ve done

Nexus is the on-venue answer to the off-book migration I wrote about in the ‘future directions’ post. Thesis #3 was that arbitrage gradually migrates off-book

institutional flow exits the public book via FCM membership, blocks, OTC desks

In equities, the same demand exists (curated counterparties, bilateral trust, private rooms) – and OneChronos’s approach seems to be internalizing that demand inside the auction mechanism rather than losing it to bilateral OTC.

June 5, 2026

Order Lifecycle (One Auction Cycle)

Entry – broker sends a Limit or Midpoint Peg order – or a Target Order that references pre-registered Bidder Logic, making it an Expressive Bid.
- Orders are firm; book is dark – no feeds, no pre-trade transparency.
Timestamping at the edge. Orders are timestamped at the network level at the Point of Presence (PoP) as received. THAT timestamp (not arrival at the matching engine) determines auction eligibility.
Cutoff – each auction’s cutoff time is drawn AT RANDOM (20-200ms after the previous auction completes), cannot be gamed.
Pre-match checks
- locked/crossed market tests
- subscriber-configured risk checks
- erroneous-order flagging
Solve (matching engine) – optimization determines the configuration of buys and sells that maximizes total Aggregate Price Improvement
- (limit price minus clearing price) • filled quantity
- summed across all orders and all securities in the auction
- One clearing price per symbol per auction in the base environment
- multilateral fills.
- an NP-hard combinatorial solve runs 10 times a second because the engine runs probabilistic search and RL techniques to scale and allocate compute
Symmetric Release – post-matching, results broadcast to all PoPs, and each PoP holds them till a predetermined moment before the next cutoff, then disseminates execution reports simultaneously
- speed is neutralized at both ends: random entry cutoff going in, synchronized release coming out
- NOBODY CAN LEARN THE AUCTION’S RESULTS EARLY ENOUGH TO RACE THE NEXT ONE
Settlement

June 5, 2026

OneChronos product surface

OneChronos operates a dark U.S. equities ATS that hosts auctions on average 10-20 times a second throughout the trading day, matching orders independently of when they arrive – time-randomized periodic auctions designed to reduce gamability of markets.

Mechanically: rather than matching continuously (i.e. as is done in a continuous CLOB), it periodically holds multilateral auctions seeking an optimal match (via a novel solve algorithm) across all eligible orders, with each auction’s cutoff time (release time) drawn at random 20-200ms after the previous one.

Match priority is based on aggregate notional price improvement, & all matches clear within the NBBO at a single price per symbol; fully non-displayed, no data feeds, firm orders only.

Smart Market – combinatorial auction techniques (2020 Economics Nobel) + AI, running at a speed and scale that wasn’t previously possible.

----------------

OneChronos is essentially operating a production version of the batch-based markets I theorized for prediction markets. The time-randomization + network buffer is the answer to the tension I was thinking about (race-resistant vs. responsive tradeoff).

----------------

U.S. Equities product surface – Expressive Bidding, Conditionals, Nexus

Newer product surfaces: a spot FX venue bringing the optimization-based auction model to currencies + European Equities.

Primitives – Neutral Markets

June 19, 2026

FBA Venue Integration into [orderbook-hybrid-amm-sim]

A Frequent Batch Auction venue simulation integrated into the existing orderbook clearing comparison repo tells us whether batching even helps, at what pi, at what immediacy cost, and for whom. If the pi-curve comes back flat, then batching doesn’t help on event flow.

Workflow for June 16, 17.

Finish FBA Arm first – nearest-term, highest-certainty piece, venue’s already built and tested, and it directly validates Gaps 1 and 2 (neutral matching; neutral settlement rails for on-chain event markets).
Latency/information wiring & pi-curve
The proof tells us which primitive to build.
- If batching demonstrably protects LPs on event flow, the matching/settlement primitive is the next build.
Gaps 3 & 4 (execution-quality infrastructure; pre-trade privacy) will remain as thesis points in my neutral-markets paper for now.

§9 – Read-Only Recon before Latency/Information Differentiation Wiring (§5.2):

Specifically interested in one runtime uncertainty: that the sweep zeroes observation_delay for everyone. Need to know how a non-zero delay actually propagates through the event heap.

Got sidetracked yesterday, state as of June 17:

4.1 – 4.3 is done and correct.

Venue ABC (venues/base.py), 6 methods: including submitting market orders, submitting limit orders, canceling orders, get_state() —> VenueState, estimate_impact(side, qty), tick().
- AMM, CLOB, & hybrid all implement these methods
Clock/cadence: the sweep fires venue.tick() once per integer timestamp via a venue_clock event at priority -50.
- pi = clear every N ticks; just a counter in the venue, not a scheduler.
Agents: no base class; they satisfy a PopulationAgent Protocol (observes / decide / review / fire_noise + fields observation_delay, review_interval, arrival_rate_per_unit).
- Naive Gaussian Belief Agent
- Tail Aware Gaussian Belief Agent
- Aggregated Evidence Agent (cross-market)
- Joint Factor Fair Value Agent (joint factor posterior)
- Event Driven Noise Agent
Latency plumbing exists but is currently OFF: observation_delay schedules a decision at now + delay, but the sweep sets it to 0 for everyone, so same-timestamp ordering falls through to heap insertion order (agent-list order)
- turning the delays on is what would give the FBA venue something to actually neutralize
Fair value – NO price process: truth is a static log-linear latent-factor draw at t = 0, anchored to opening mids, never moves, unbounded, no jumps
Metrics
Sweep
Clearing – only continuous matching today
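The latency point in the state list above (zero delay means same-timestamp ordering falls through to insertion order) can be reproduced with a toy heap. The tuple layout (fire_time, priority, insertion_seq, name) is an assumption about how such a heap is usually keyed, not a reading of the sim's actual event type.

```python
import heapq
from itertools import count
from typing import Iterable, List, Tuple


def firing_order(events: Iterable[Tuple[str, float, int]]) -> List[str]:
    """events: (name, fire_time, priority). Returns names in the order they pop."""
    heap, seq = [], count()
    for name, fire_time, priority in events:
        # The insertion counter is the final tie-break, so equal (time, priority)
        # pairs pop in the order they were pushed.
        heapq.heappush(heap, (fire_time, priority, next(seq), name))
    order = []
    while heap:
        order.append(heapq.heappop(heap)[3])
    return order
```

With every delay at 0 the agents fire in list order behind the clock event. Giving the LP a longer delay than the informed agent is what flips the order and creates something for a batch to neutralize.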

4.4 FBA venue – DONE, built + tested, committed & pushed

venues/fba.py – canonical Budish-Cramton-Shim: resting limit book cleared by a periodic uniform-price call every tau-ticks.
Deferred submits: submit_limit_order / submit_market_order; these both return an “accepted, pending” OrderResult (filled = 0, remaining = qty, real order_id), no synchronous fill.
- Limits rest across batches till filled or cancelled.
_solve_clear: candidates from limit prices, max-volume objective
_run_clear: capture mid_before, solve, apply book mutations, capture mid_after, stamp the same pre/post-clear mids on every fill in the batch, tag liquidity (“maker” = limit leg, “taker” = market leg).
Drain wiring: extended the Entry-2 channel rather than a parallel one – MakerFill gained liquidity & fees_paid (defaults keep CLOB/hybrid byte-identical); env drain generalized to drain_venue_fills(sim) —> _record_drained_fills.
get_state() mid = resting-book best-bid/ask midpoint between clears. estimate_impact = approximate clearing-price move from adding qty.
VERIFICATION:
- 12 FBA tests pass (hand-computed clear, uniform price, midpoint tie-break, conservation w/ rationing, pending semantics, resting persistence + partial remainder, market expiry, determinism, clear-time-vs-submit-time mids, pi = 1 batches every tick, estimate_impact, env-drain both legs w/ clear timestamp).
- 22 pass.
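A stripped-down sketch of the pending-then-clear behaviour described for the FBA venue: submits return an accepted-but-unfilled result, a tick counter triggers a uniform-price clear every tau ticks, and unfilled remainder keeps resting. Simplifications are assumptions on my part: limit orders only, the lowest max-volume candidate price instead of a midpoint tie-break, fractional pro-rata fills, and a plain dict in place of OrderResult.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Resting:
    order_id: int
    side: str  # "buy" or "sell"
    price: float
    qty: float


@dataclass
class FbaVenueSketch:
    tau: int  # clear every tau ticks
    book: List[Resting] = field(default_factory=list)
    fills: List[Tuple[int, float, float]] = field(default_factory=list)  # (id, price, qty)
    ticks: int = 0
    next_id: int = 1

    def submit_limit_order(self, side: str, price: float, qty: float) -> dict:
        """Accept the order as pending; nothing fills synchronously."""
        oid = self.next_id
        self.next_id += 1
        self.book.append(Resting(oid, side, price, qty))
        return {"order_id": oid, "filled": 0.0, "remaining": qty}

    def tick(self) -> None:
        self.ticks += 1
        if self.ticks % self.tau == 0:  # cadence is just a counter
            self._run_clear()

    def _eligible(self, side: str, p: float) -> List[Resting]:
        if side == "buy":
            return [o for o in self.book if o.side == "buy" and o.price >= p]
        return [o for o in self.book if o.side == "sell" and o.price <= p]

    def _run_clear(self) -> None:
        best_p, best_v = None, 0.0
        for p in sorted({o.price for o in self.book}):
            v = min(sum(o.qty for o in self._eligible("buy", p)),
                    sum(o.qty for o in self._eligible("sell", p)))
            if v > best_v:  # strict: keeps the lowest max-volume price
                best_p, best_v = p, v
        if best_p is None:
            return
        for side in ("buy", "sell"):
            elig = self._eligible(side, best_p)
            total = sum(o.qty for o in elig)
            for o in elig:
                fill = o.qty * best_v / total  # pro-rata share of cleared volume
                o.qty -= fill
                self.fills.append((o.order_id, best_p, fill))
        self.book = [o for o in self.book if o.qty > 1e-12]  # remainder keeps resting
```

With tau=2, a crossed pair submitted at tick 0 produces no fills after the first tick and clears at one price on the second.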

As of now, the venue is correct and cleanly recorded, but inert: with observation_delay = 0, there’s no speed/info asymmetry for batching to neutralize, so an FBA-vs-CLOB run right now would show ~null difference by construction. Wiring that asymmetry is the next task.

Dev Tasks td:

Turn the sim into an apparatus that takes (venue mechanism, agent population w/ latency/info structure) and outputs

LP markout
taker markout
adverse-selection cost
and price-discovery lag per mechanism.

Headline figure: pi-curve: extraction (LP markout / sniper-equivalent PnL) seen falling as the batch interval pi grows, plotted against the immediacy cost of waiting. This yields an optimal pi.

Claim to Test: a batched venue reduces information-asymmetry extraction vs. CLOB and AMM on event-contract shaped flow, at a quantifiable immediacy cost.

5.2 – Design Point – latency/information differentiation

Turn observation-delay on, differentiated by agent role: better-informed agents (tail-signal recipients) act on fresh signal rounds at short delay; LP-style agents quote at longer delay so their resting quotes reflect staler beliefs.
Wire the signal-tier structure (routine vs. tail signals) together with the delay so that fast access to precise tail signals is the edge a batch erases (everyone’s acting on the same signal-round, and clear at a uniform price).

5.3 – Markout Metrics

Compute markout at ∆ and effective spread from TradeRecord.md_price_before/after + the liquidity tag, all outside the hot loop.
- Markout is the headline metric because it maps to the bleeder that we really care about (the adversely-selected LP).
- I’ll make sure FBA fills (clear-time mids) and CLOB/hybrid fills (now symmetric after the maker fix) are computed on the same definition so cross-venue comparison is honest.
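For reference, one common sign convention for markout, sketched as a post-processing pass outside the hot loop. The record shape (liquidity tag, side, fill price, mid at fill time + ∆) is a hypothetical stand-in for whatever TradeRecord exposes. Negative maker markout means the resting side was adversely selected.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Fill = Tuple[str, str, float, float]  # (liquidity_tag, side, fill_price, mid_after)


def markout(side: str, fill_price: float, mid_after: float) -> float:
    """Signed per-contract markout from the filled party's point of view."""
    sign = 1.0 if side == "buy" else -1.0
    return sign * (mid_after - fill_price)


def mean_markout_by_tag(fills: Iterable[Fill]) -> Dict[str, float]:
    """Average markout per liquidity tag (e.g. "maker" vs. "taker")."""
    sums, counts = defaultdict(float), defaultdict(int)
    for tag, side, px, mid_after in fills:
        sums[tag] += markout(side, px, mid_after)
        counts[tag] += 1
    return {tag: sums[tag] / counts[tag] for tag in sums}
```

Running every venue's fills through the same function is one way to keep the cross-venue comparison on a single definition.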

5.4 – Endogenous LP spread (this sim’s edge over the frozen 1.06M row study)

Let LP spread respond to expected markout: an LP sniped less under batching should quote tighter, so the welfare gain shows up as a narrower spread for everyone, not just redistributed PnL. This is what lets the sim answer the question that the mechanical FBA counterfactual couldn’t – what would participants actually do under batching.

5.5 – Honest piece

batching is NOT free – it trades immediacy for protection (orders have to wait up to pi ticks before clearing). I’ll have to report the tradeoff, not just the dividend: if FBA cuts extraction but reduces volume or delays informed price discovery, that cost goes in the headline next to the benefit.

The most credible framing of the result is that extraction falls faster than immediacy cost rises, but only up to some pi. That turning point is a real optimum.

LP / Market-Maker Agent – Read-Only Recon

The book and matching layer already support the channel we want: price priority & no self-trade prevention means a tight LP quote gets hit first. The only real blocker is capital accounting: ‘deployed’ is monotonic and never released, so a continuously-requoting LP exhausts its budget by construction. That’s the one thing a spec must solve.

green light: the book + fill path, & bootstrap ladder both confirm that a real LP gets picked off. Price priority is decisive (the match loops always take _ask_prices[0] / _bid_prices[-1]), there’s no self-trade prevention and no agent_id filter, so an LP quoting inside the bootstrap’s ±0.1% owns best price and is hit first. Thus, adding the LP agent is enough; we don’t have to rework the bootstrap.

finding: the LP isn’t blocked by matching, it’s blocked by capital accounting. deployed is monotonic – it only ever grows (_sync_costs accumulates, nothing decrements on fill/cancel/offset). This is mostly fine for belief agents that fire occasionally; but it’s fatal for a two-sided LP that reposts both legs every review tick, since each requote commits fresh collateral and cancel releases nothing. The LP exhausts its budget and goes silent after a few requotes.

fix options:

unbounded budget
release on cancel_order – how real MMs work (cancel a quote & get your collateral back)
- risk: touches the cost-log/_sync_costs invariant that the Entry-2 maker-fix made byte-identical, so it might break the zero-delay baseline reproducibility.
net-inventory margin, LP-only
- only option that’s both economically faithful and preserves reproducibility.
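A toy comparison of the failure mode and the third fix option. Both classes are hypothetical: the monotonic one mimics "deployed only ever grows", and the net-inventory one charges margin only on the net filled position, using a worst-case loss for a contract priced in [0, 1] as an assumed margin rule. Neither is the sim's actual accounting.

```python
class MonotonicBudget:
    """Every requote commits fresh collateral; cancel releases nothing."""

    def __init__(self, budget: float) -> None:
        self.budget, self.deployed = budget, 0.0

    def can_quote(self, collateral: float) -> bool:
        return self.deployed + collateral <= self.budget

    def on_quote(self, collateral: float) -> None:
        self.deployed += collateral

    def on_cancel(self, collateral: float) -> None:
        pass  # the bug being illustrated: nothing is ever decremented


class NetInventoryMargin:
    """Margin depends only on net filled inventory, not on resting quotes."""

    def __init__(self, budget: float, mark: float = 0.5) -> None:
        self.budget, self.mark, self.net_qty = budget, mark, 0.0

    def on_fill(self, side: str, qty: float) -> None:
        self.net_qty += qty if side == "buy" else -qty

    def margin(self) -> float:
        # Worst-case loss: a long loses its mark, a short loses (1 - mark).
        if self.net_qty >= 0:
            return self.net_qty * self.mark
        return -self.net_qty * (1.0 - self.mark)

    def can_quote(self, collateral: float) -> bool:
        return self.margin() <= self.budget

    def on_quote(self, collateral: float) -> None:
        pass

    def on_cancel(self, collateral: float) -> None:
        pass


def requotes_before_silent(acct, collateral_per_requote: float, cycles: int) -> int:
    """Count how many cancel-and-requote cycles the account can afford."""
    n = 0
    for _ in range(cycles):
        if not acct.can_quote(collateral_per_requote):
            break
        acct.on_cancel(collateral_per_requote)
        acct.on_quote(collateral_per_requote)
        n += 1
    return n
```

With a budget of 100 and 20 of collateral per requote, the monotonic account goes silent after 5 requotes, while the net-inventory account with no fills keeps quoting for the whole run.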

remaining uncertainties:

capital-release model: i’ll implement a net-inventory margin for the LP only
Claude.md says “informed-as-maker fills = 0,” but handoff §4.3 reports hybrid informed_pnl moved from -6.878 —> -7.614 from “informed resting limits that got hit.” so it’s not exactly 0
no self-trade prevention —> the spec must keep LP bid < ask every quote
own builder vs. bolting the LP onto the existing two – the pi curve population may want a dedicated LP-vs-informed builder

proposal: a 5th dataclass LpMarketMakerAgent quoted via review() (long review_interval, and per §5.2 a longer observation_delay/staler belief than FAST informed): each requote it cancels prior orders, recomputes a deliberately stale mid, and returns a 2-element list, quoting inside the bootstrap’s ±0.1% to own priority and absorb informed takes. gets its own ROLE_LP bucket so its markout is reported separately.

delta is the knob that §5.4’s endogenous-spread arm later makes respond to realized markout. The gating decision here is to resolve UNCERTAINTY 1 first – without capital release the LP can’t requote, which silently kills the channel.

BUILD §5.4 – LP / market-maker agent actual implementation

I prompted the LP to do two things that can’t both happen – ‘cancel old quotes’ && ‘just return a list of orders’. In order to cancel an order you need its ID, but the architecture throws the ID away when an order is placed, and there’s no way to look it up later. So an LP that only returns order-lists could never cancel – it’d pile up 160+ stale quotes over a run and get picked off on garbage prices.

Decisions:

option C: LP places and cancels its own orders directly on the venue inside its requote step, keeping the IDs itself. This mirrors how the existing bootstrap book already works, no new venue code.
option C’: keep ‘return a list’ literally, but add a new venue method to cancel-by-agent-id. this costs an API addition and splits the LP’s actions awkwardly across two timing phases.

Decided on option C. manage quotes directly on the venue inside review(), mirroring the bootstrap ladder; return []. Proceed – build LP + builder + ROLE_LP bucket + smoke gates straight through.

What’s built thus far:

Diagnosed why 4a came back null: latency wiring was inert because every agent trades against the deep static bootstrap book – nobody hits anybody else’s quotes, so there’s no extraction to measure. The fix is a dependency the handoff had scheduled later (§5.4), so the build order got inverted: the quoting mechanism has to come before latency produces any signal.
- built a dedicated LP/market-maker agent (B) – the bleeding liquidity provider that gets picked off – rather than just thinning the bootstrap book. This way, we’re modeling the actual object my thesis is about.
- ran a read-only recon that confirmed the green light for building the LP/market-maker agent – a competitively-priced LP will get filled first (price priority, no self-trade prevention). Also, noted that capital accounting only ever grows, so a continuously-requoting LP would exhaust its budget and go silent.
- made the capital call (option 3): isolated net-inventory margin, walled inside the LP class so the four existing agents’ baselines stay byte-identical (protects my G1 reproducibility guard by construction).

Post LP/market-maker agent build: well-built overall

built the complete LpMarketMakerAgent class
verified the LP’s review() will actually fire (first one at now + review_interval), and that the LP never enters the shared _sync_cost/cost-log path (its maker fills carry capital_committed = 0), so it owns its own deployed field – i.e., capital isolation holds, Gap 1 (neutral matching) is protected

What the code actually does:

quotes a two-sided bid/ask just inside the bootstrap (half_spread_pct forced < 0.001, so it always sits in front and gets hit first) – this is the extraction channel
updates its belief on a long delay (observation_delay = 50, vs. FAST informed being faster) – so it’s deliberately stale and gets picked off.
decide() only updates belief and returns nothing: THE LP NEVER TAKES, only rests.
tracks its own net inventory by reading the shared trade log read-only and margining the net position, never touching shared accounting.
in review(): cancels its prior quotes by ID, checks a solvency gate (sit out if capital used ≥ budget), reposts both legs directly on the venue, returns [].
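A minimal sketch of the review() flow described above, assuming a toy venue that hands out order IDs. `ToyVenue`, `LpSketch`, and every field name here are hypothetical stand-ins, not the actual LpMarketMakerAgent or venue API; the margin rule (|net inventory| × mid) is also an assumption about what “margining the net position” means.

```python
from dataclasses import dataclass, field


class ToyVenue:
    """Hypothetical venue: returns an ID on placement and cancels by ID."""

    def __init__(self):
        self._next_id = 0
        self.resting = {}  # order_id -> (side, price, size)

    def place_limit(self, side, price, size):
        self._next_id += 1
        self.resting[self._next_id] = (side, price, size)
        return self._next_id

    def cancel(self, order_id):
        self.resting.pop(order_id, None)


@dataclass
class LpSketch:
    venue: ToyVenue
    budget: float
    half_spread_pct: float = 0.0005  # kept < 0.001 so quotes sit inside the bootstrap
    size: float = 1.0
    stale_mid: float = 0.5           # belief, refreshed elsewhere on a long delay
    net_inventory: float = 0.0       # signed position, tracked privately by the LP
    live_ids: list = field(default_factory=list)

    def capital_used(self):
        # Assumed margin rule: only the net position ties up capital.
        return abs(self.net_inventory) * self.stale_mid

    def review(self):
        # 1) cancel prior quotes using the IDs the LP kept itself
        for order_id in self.live_ids:
            self.venue.cancel(order_id)
        self.live_ids = []
        # 2) solvency gate: sit out once capital used reaches the budget
        if self.capital_used() >= self.budget:
            return []
        # 3) repost both legs; bid < ask always holds because half_spread_pct > 0
        bid = self.stale_mid * (1 - self.half_spread_pct)
        ask = self.stale_mid * (1 + self.half_spread_pct)
        self.live_ids = [
            self.venue.place_limit("buy", bid, self.size),
            self.venue.place_limit("sell", ask, self.size),
        ]
        return []  # nothing goes through the return-a-list path
```

Calling review() twice leaves exactly two resting quotes rather than four, which is the behavior option C was chosen to guarantee.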

LP/market-maker agent not actually bleeding:

these findings came after the smoke tests were run. the LP isn’t bleeding (return sign was positive, which is a modeling insight, not an artifact of the inputs I provided).

G1 Byte-Identical – the diverse/clob baseline is provably unchanged (working tree vs. committed, via stash). informed_pnl_total, lp_rent_total=-742.36, n_trades=60 all identical. Capital isolation held; the incumbent path was untouched (_sync_cost/cost-log)
G2(a) – passed – 18 LP fills (was ~0). the channel exists
G2(b) – failed – LP PnL +0.87 (profitable). the ‘failure’
G3 – passed – 61% fills in 2nd half. solvent
G4 – passed – deterministic

core finding from new sweep w/ LP-market-maker-agent:

observation_delay against a static (frozen) truth doesn’t make the LP wrong, it makes it slower to converge to the right answer. A delayed-but-unbiased belief still centers on true fair value. and a market maker quoting a spread around an approximately-correct fair value earns the spread by construction – it buys below fair, sells above fair.

The LP only bleeds if it’s filled disproportionately on the wrong side, which requires its belief to be biased, not just lagged. Two ways to manually bias it for this backtest:

truth moves and the LP’s quote goes stale
the LP is genuinely worse-informed than the takers, so the takers systematically know something the LP doesn’t
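A toy check of the lagged-vs-biased argument, under strong simplifying assumptions: a single fair value, balanced noise flow, and an informed taker who knows fair value exactly and only lifts a quote that is through it. `lp_markout` and its parameters are illustrative, not part of the simulator.

```python
def lp_markout(lp_mid, true_fair, half_spread, noise_sides, informed_takes=3):
    """Total LP markout against true_fair.

    noise_sides: iterable of +1 (noise taker buys from the LP) or -1 (sells to it).
    informed_takes: how many times the informed taker hits a mispriced quote.
    """
    bid, ask = lp_mid - half_spread, lp_mid + half_spread
    pnl = 0.0
    for side in noise_sides:
        # LP sells at the ask when the taker buys, buys at the bid when the taker sells.
        pnl += (ask - true_fair) if side > 0 else (true_fair - bid)
    if ask < true_fair:
        # Ask is below fair value: the informed taker buys, the LP sold too cheap.
        pnl += informed_takes * (ask - true_fair)
    elif bid > true_fair:
        # Bid is above fair value: the informed taker sells, the LP paid too much.
        pnl += informed_takes * (true_fair - bid)
    return pnl


unbiased = lp_markout(0.50, 0.50, 0.01, [+1, -1])  # mid centered on fair value
biased = lp_markout(0.45, 0.50, 0.01, [+1, -1])    # mid 5 cents below fair value
```

With the mid centered on fair value the informed taker never trades and the LP keeps the spread from noise flow; with the mid biased by more than the half-spread, the one-sided informed fills outweigh it.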

Information Asymmetry >> Observation Delay / Latency

My thesis is that verifiable batching reduces information-asymmetry extraction, not ‘latency arbitrage’ alone. This sweep with a dedicated LP/market-maker agent w/ a dedicated observation-delay parameter proves that latency-without-information-asymmetry produces NO EXTRACTION.

When we do build the FBA arm and (hopefully) show that batching reduces the bleed for liquidity providers, I’ll know the bleeding is coming from the right source, not any modeling/parameter artifact I can hard code into the simulation.

Reverting Frozen Truth – NO RANDOM WALK

We initially froze truth so that convergence to true probability is a well-defined target and markout is clean. If truth wanders, I’d have to decide what the LP is even being marked against – terminal fair? fair-at-fill-time? – and adverse selection vs ‘the walk just moved against the agent’ becomes a hard question to answer.

A walk also has a volatility parameter, which introduces a new free knob.

Options for Reverting Frozen Truth Effect on Reverse Adverse Selection:

Path A – Moving Truth – test whether the latency channel already produces adverse selection once truth moves, before adding any information asymmetry. This is more faithful to my thesis and tests the mechanism I’ve already built. There’s no precision-degradation parameter.
- Cost: bigger blast radius (shared env, not an isolated class); needs a fresh baseline definition; needs the markout-window decision to avoid confounding adverse selection with inventory risk; adds a walk-vol parameter to deal with
Path B – Option 1, degrade information – isolated to the LP class, keeps the frozen-truth baseline intact, smaller and cleaner change, directly instantiates that “informed agents have better signals”.
- Cost: different extraction mechanism (information asymmetry) than the one I’ve built (latency), and it sidesteps the question of whether my latency mechanism works at all.

I’ll follow Path A since I’ve already identified a meaningful gap in the world.

BUILD – Phase: markout-at-fill-time rework.

Per the moving-truth recon, _trade_mtm_pnl (rent.py) and rent_and_pnl/pnl_by_role take fair_prices_by_market as ONE scalar per market and apply to every fill, IGNORING rec.timestamp. Against any non-static truth this measures inventory risk, not adverse-selection – and even in the frozen world, marking the LP’s fills against terminal fair rather than fair-at-fill-time muddies the adverse-selection read. This phase makes markout time-aware. It changes the MEASUREMENT only – no agent behavior, no truth process, no world change. Truth is still frozen this phase, but that’ll soon change as well.

Recap:

Built: time-indexed fair-value marking. FairValueAt accessor + frozen_fair_value() factory in rent.py; each fill is now marked against fair(market, timestamp) instead of a flat scalar; sweep.py threads a frozen accessor through; new test_markout_fill_time.py. Terminal-fair marking still available (AMM lp_rent deliberately keeps it).
Gates:
- G-MARK byte-identical (the load-bearing no-op invariant).
- G-REG 29 passed / 1 xfailed / 1 failed = the known G2b LP-positive, unchanged.
- G-DET deterministic
- G-FUTURE the artificial two-point series proves time-indexing actually fires (fill@t4000 marks at 110, total +10 – not terminal’s 20, not t0’s 0). All green.
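A minimal sketch of time-indexed marking for a single market, assuming a piecewise-constant fair-value series. `step_fair_value` and `mark_fills` are hypothetical helpers, not the FairValueAt accessor in rent.py, and the numbers are illustrative only.

```python
from bisect import bisect_right


def step_fair_value(points):
    """points: (timestamp, fair) pairs sorted by timestamp.

    Returns fair_at(t): the fair value of the last point at or before t.
    """
    times = [t for t, _ in points]
    values = [v for _, v in points]

    def fair_at(t):
        i = bisect_right(times, t) - 1
        return values[max(i, 0)]  # before the first point, use the first value

    return fair_at


def mark_fills(fills, fair_at):
    """fills: (timestamp, signed_qty, price); positive qty means bought."""
    return sum(qty * (fair_at(ts) - px) for ts, qty, px in fills)


series = [(0, 100.0), (4000, 110.0), (8000, 120.0)]
fills = [(4000, 1.0, 100.0)]  # bought 1 unit at 100 at t=4000

at_fill_time = mark_fills(fills, step_fair_value(series))  # marks at 110
at_terminal = mark_fills(fills, lambda t: 120.0)           # flat terminal scalar
at_t0 = mark_fills(fills, lambda t: 100.0)                 # flat t=0 scalar
```

The same fill scores three different ways depending on the mark, which is why a flat scalar cannot separate adverse selection from later drift.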

‘Variable Truth’ Build

(a) the markout rework is now done

(b) a deterministic walk path + threading the timestamp into signals is a contained build

(d) the belief-model decision is still a bit uncharted, and touches every agent

I’ll pursue building a more faithful environment since a frozen truth fundamentally cannot express the thesis around FBAs in multi-agent worlds & LP adverse selection.

Right now, every agent uses gaussian_scalar_nif_update as the underlying belief model, which assumes a stationary target. If I were to run that model against a moving truth, every agent’s posterior would lag and over-weight stale signals – so the LP bleeds, but so does the interpretation, since we can’t tell if the LP was picked off by better-informed takers, or if everyone’s mis-specified filter cannot track a moving target.
Recon:
- In this existing belief model, lag is governed by accumulated precision (more precision —> lower gain —> more lag)
  - precision accumulates at variable rates per class because obs_precision is deliberately class specific
    - The Tail agent’s obs_precision is ~2k–40k per signal, so after one tail draw, its precision explodes, its gain collapses to ~0, and its belief freezes – so against a moving truth, my best-informed agent tracks the moving truth the worst
    - The Naive/LP agents (fixed ~0.5-0.7k per signal) stay responsive and track better.
- Under the old model, a moving truth would show the LP outperforming the informed agents – not because of any real information/latency asymmetry, but because the informed agents’ filters freeze faster (just math). The bleed is inverse.
faithful environment requires a Kalman step
- can’t get an interpretable moving-truth result from the precision-only filter. better approach is to add a time-aware update with process variance q (decay prior precision by q•∆t before absorbing each signal), at both update sites (the scalar helper and joint-factor’s own ∆-shrink), gated on q>0 with a hard short-circuit at q=0 so the frozen world stays byte-identical.
- each agent’s q equal to the truth’s actual walk variance, making them correctly-specified trackers
- then a delayed agent is an optimal tracker that simply updates later, so its staleness is genuine (the thesis) rather than a filter pathology
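A sketch of the time-aware scalar update, assuming “decay prior precision by q•∆t” means the standard Kalman predict step (prior variance grows by q·∆t). `scalar_update_sketch` is a hypothetical name, not the real gaussian_scalar_nif_update, and the precision values are picked from the ranges quoted above purely for illustration.

```python
def scalar_update_sketch(mean, precision, signal, obs_precision, *, q=0.0, dt=0.0):
    """Gaussian precision-weighted update with an optional process-noise step."""
    if q > 0.0 and dt > 0.0:
        # Predict: prior variance grows by q*dt, so prior precision shrinks.
        precision = 1.0 / (1.0 / precision + q * dt)
    # With q == 0 the branch above is skipped entirely, so the stationary
    # (frozen-truth) arithmetic is left untouched.
    post_precision = precision + obs_precision
    post_mean = (precision * mean + obs_precision * signal) / post_precision
    return post_mean, post_precision


# An agent whose precision already exploded to 20,000 barely moves on a new signal...
frozen_mean, _ = scalar_update_sketch(0.5, 20_000.0, 0.6, 600.0)
# ...but with process variance it forgets stale precision and tracks again.
tracking_mean, _ = scalar_update_sketch(0.5, 20_000.0, 0.6, 600.0, q=1e-4, dt=50)
```

The gain on the new signal is obs_precision / post_precision: roughly 0.03 in the first call and roughly 0.75 in the second, which is the filter freeze and its fix in miniature.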

Kalman time-aware scalar belief update

gaussian_scalar_nif_update gains keyword-only q=0.0, dt=0.0 with a hard short-circuit; the 4 scalar agents (Naive, Tail, Agg, LP) each get a q field, a _last_update_tick dict, and update_posterior(signal, now) computing per-market ∆t; sweep.py threads belief_process_var (default 0) through to the scalar constructors. Joint-factor is untouched
At this point, the markout process is time-indexed, and the Kalman belief model is committed
- next step is to build the joint-factor matrix ∆-shrink, and then the walk itself with q = walk-variance so agents are correctly specified trackers, then finally a convergence-metric retarget

Build: markout re-point to the walk path (fair-at-fill-time against a moving truth).

The markout time-indexing rework (commit 9b2b8d5) built a FairValueAt accessor; now it’s fed frozen-fair-value (constant per market). Phase B left markout still marking against the t=0 truth. Against a MOVING truth, however, marking a fill at terminal/t=0 fair instead of fair-at-the-fill’s timestamp measures INVENTORY RISK (the LP held a position while the walk drifted), not adverse selection. This step points the markout accessor at the walk PATH so fills are marked against path[m, t_fill] – the prerequisite for an interpretable LP bleed.
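The distinction above can be written as an identity: terminal marking equals fill-time marking plus the drift after the fill. A small sketch, with `decompose_markout` as a hypothetical helper and made-up numbers:

```python
def decompose_markout(signed_qty, price, fair_at_fill, fair_terminal):
    """Split a terminal-marked PnL into adverse selection and inventory drift.

    signed_qty > 0 means the agent bought; < 0 means it sold.
    """
    adverse_selection = signed_qty * (fair_at_fill - price)        # mispricing at the fill
    inventory_drift = signed_qty * (fair_terminal - fair_at_fill)  # walk moved afterwards
    return adverse_selection, inventory_drift


# LP sold 1 unit at 100 when fair was already 110; truth later walked to 120.
adverse, drift = decompose_markout(-1.0, 100.0, 110.0, 120.0)
terminal_mark = -1.0 * (120.0 - 100.0)
```

Only the first term is the picked-off component; marking at terminal or t=0 fair folds the second term into it.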

RECAP:

markout is re-pointed to read the walk path. New helper _walk_path_fair_value(info_env, until_ts) returns a FairValueAt giving true fair price at each fill’s tick (reuses the log_fair_value_at accessor from Phase B); run_single_simulation swaps it in when walk_var>0, keeps frozen_fair_value at walk_var=0. sweep.py +18/-5, new test.
All gates green: G-ID byte-identical at walk_var=0 (full precision), G-MARK-PATH (accessor genuinely reads path[m, t_fill], late fills differ from t=0 by > 1e-3), G-INV-vs-AS check, G-REG 45 passed / 1 xfailed / known-G2b, G-DET deterministic

Integrating the FBA venue as a runnable sweep mechanism.

“fba” added to MechanismName/guard; _fba_venues_from_truth(τ) builder; tau_ticks threaded through; drain_venue_fills wired into _pulse; and – the P1 catch – FBAVenue added to the route_log_space_trade dispatch so informed agents can trade on it
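For reference, a generic sketch of uniform-price batch clearing. This is one common call-auction convention (maximize matched volume, break ties by smallest imbalance, then lowest price) and is an assumption, not the FBAVenue’s actual rule; `clear_batch` and the order tuples are hypothetical.

```python
def clear_batch(orders):
    """orders: list of (side, limit_price, qty) with side in {"buy", "sell"}.

    Returns (clearing_price, matched_volume), or (None, 0.0) if nothing crosses.
    """
    best_price, best_volume, best_imbalance = None, 0.0, float("inf")
    for p in sorted({px for _, px, _ in orders}):
        demand = sum(q for side, px, q in orders if side == "buy" and px >= p)
        supply = sum(q for side, px, q in orders if side == "sell" and px <= p)
        volume = min(demand, supply)
        imbalance = abs(demand - supply)
        better_tie = volume == best_volume and volume > 0 and imbalance < best_imbalance
        if volume > best_volume or better_tie:
            best_price, best_volume, best_imbalance = p, volume, imbalance
    return best_price, best_volume


batch = [
    ("sell", 0.50, 1.0),  # stale LP ask
    ("buy", 0.48, 1.0),   # stale LP bid
    ("sell", 0.55, 1.0),  # another seller
    ("buy", 0.60, 1.0),   # informed taker
    ("buy", 0.60, 1.0),   # second informed taker
]
price, volume = clear_batch(batch)
```

In this toy batch the stale ask at 0.50 fills at the uniform price 0.55 instead of at its own limit, which is the per-fill protection a continuous book does not give.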

Measurement – the FBA result, paired by seed.

The wiring probe (single-seed, unconfirmed) already showed that under the walk, FBA (pi = 1) bled ~8.5x less than CLOB (-19.8 vs. -170), and the bleed was roughly FLAT across pi (pi = 1: -19.8, 10: -19.4, 50: -21.4, 200: -18.3). So the protection looks like a DISCRETE STEP at the mechanism switch (continuous CLOB —> uniform-price batch), largely pi-independent – not a smooth pi-slope. This measurement will confirm or deny that, properly, paired across seeds.

Core Question: does switching from the continuous CLOB matching to FBA uniform-price clearing robustly REDUCE the LP’s latency-driven bleed, measured as a paired-by-seed difference?

Secondary Question: does longer batch interval pi reduce it FURTHER, or is it flat in pi (as the probe suggests).

FINDINGS:

switching from continuous CLOB matching to FBA uniform-price batch clearing robustly reduces the LP’s latency-driven bleed.

paired reduction +86.6, SEM 6.7, ~13σ, 95% of seeds reduced. The LP’s markout goes from ~-92 (CLOB) to ~-5 (FBA). The pairing did exactly what it was designed to do: the ±72 per-seed walk noise collapsed to a paired SEM of 6.7, so the reduction is overwhelmingly significant even though the absolute bleeds are noisy.
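The headline statistic is a paired difference: 86.6 / 6.7 ≈ 12.9, which is where the ~13σ comes from. A sketch of how such a summary is computed, using made-up per-seed values rather than the sweep output; `paired_summary` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, stdev


def paired_summary(clob, fba):
    """clob[i] and fba[i] are LP markouts for the same seed under each mechanism."""
    diffs = [f - c for c, f in zip(clob, fba)]  # positive = FBA bled less
    n = len(diffs)
    d_mean = mean(diffs)
    sem = stdev(diffs) / sqrt(n)                # standard error of the paired mean
    share_improved = sum(d > 0 for d in diffs) / n
    return d_mean, sem, d_mean / sem, share_improved


# Illustrative numbers only.
clob_markout = [-100.0, -50.0, -150.0, -80.0]
fba_markout = [-10.0, 0.0, -20.0, -5.0]
d_mean, sem, z, share = paired_summary(clob_markout, fba_markout)
```

Differencing within a seed removes the walk noise that both arms share, which is why the paired SEM can be far smaller than the per-seed spread.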

The kill condition was not met – batching curbs the extraction. The thesis is proven end-to-end in a moving truth environment.

Validity Checks:

the shape is a clean step, and the agent confirmed it’s genuine per-fill protection, not an artifact of the parameters I set.

The transfer check is honest. As the LP bleeds less (+87), the informed gain shrinks (~-50), so the extraction is prevented, not relocated – directionally. The agent flagged it’s not 1:1: the LP improves +87 while informed gain drops only ~50; the gap goes to lower traded volume and the bootstrap/noise counterparties. So the honest read is “directionally a prevention, not a clean conservation.”

batching prevents extraction, but we can’t yet claim a clean wealth-conservation identity.

in a faithful moving-truth simulation with correctly-specified agents, switching from continuous matching to frequent batch auction reduces the liquidity provider’s latency-driven adverse selection by ~95% (paired reduction +86.6 ± 6.7, 13σ, 95% of seeds), with the protection coming from uniform-price clearing itself rather than the batching interval, and the extraction prevented rather than relocated.

VSA Markets

June 26, 2026

Behavior Optimization / ACF

Yesterday, worked around the circular-input issue with the agentic trader population underpinning the market maker itself (ended up quantifying minority pivotality & thresholds for manipulation costs for neutrality proofs).

My population is essentially CredentialedTrader, which samples a static signal once at init and pushes it through the LS-LMSR cost function; NoiseTrader is directionless. No learning, no strategy, no response to book state beyond C(q). They’re parametric stochastic processes whose only job is to move price via the cost function. “Agentic” is fundamentally generous – it’s a liquidity-generating sampling process.

In order for my market structure to behave like a real market, I could follow two paths:

(A) Behavioral realism of agents – making them act like real traders (order-splitting, momentum, belief-updating). This would actively break the thesis. Pillar 3’s result – sponsors can’t set the price, substrate reverts to 0.5, only informed flow moves it – depends on the noise layer being dumb and directionless.

(B) Microstructural face-validity of the market’s output – showing the price series and order flow my venue emits carry the universal signatures of real markets (square-root price impact, volatility clustering, signed-order-flow autocorrelation). This is defensible, demo-useful, and requires no changes to the agents intrinsic behavior – it’s a measurement layer on the venue.

Constraints on B:

Regimes: my real order-flow data is liquid (NBA Finals at $32-41M OI, the cross-venue-eligible panel, active Polymarket books). VSA targets the illiquid tail, where essentially no real trade tape exists. I can’t calibrate to real magnitudes without calibrating to the wrong regime.
- What would I need to claim that my thin-market regime matches real thin-market data?
  - A trade tape – not L2 snapshots – from genuinely thin, resolved markets. My frozen Kalshi/PM set is L2 snapshots of liquid markets, wrong on both counts. The one place I already hold the right thing is the neutrality monitor: on-chain OrderFilled events are a real trade tape, and the thin-token makers are genuine thin-regime.
  - Breadth: thin markets are sparse by construction, so per-market power is low; anything finer than markout needs a panel – tens to low-hundreds of thin Polymarket markets – to get CIs that mean anything.
  - Flow labeling: “A small informed cohort drives price” is a claim about who, so I’d need wallet-level informed/noise separation. My monitor already does this via persistent markout sign.
  - Outcomes, for any convergence claim. polymarket resolves (UMA); my frozen Kalshi set largely doesn’t.
What I do have tail-relevant ground truth on is my Polymarket neutrality monitor, which already measures the adverse-selection signature in the thin-token regime (bimodal markout, ~40/109 makers persistently negative, BUY-side thin-token toxicity). That’s the closest I have to real tail-market behavior.
- If my synthetic venue, under rising informed share, reproduces that signature – maker markout going negative exactly as informed flow rises – I’ll have shown that the substrate generates the same pathology I measured live, in the regime that matters, and I’d have tied the sim back to the empirical work I did via Polymarket WebSocket.
- In the VSA model, the maker is the subsidized LMSR, so that markout is the sponsor’s subsidy cost.
- This is objectively a good next step.
- Framing:
  - Scope of Facts: price-impact concavity, order-flow autocorrelation, markout-vs-informed-share
  - Cross-venue framing: name the distinction between LMSR & CLOB
Autocorrelation. Signed order-flow autocorrelation (long-memory, Lillo-Farmer) is a deep microstructure fact. The catch is that long-memory flow comes from order-splitting of large meta-orders and from herding – two behaviors I excluded to keep the agent population neutral. My agents draw once and trade once; their flow is near-IID by construction. I’ll see at most short-range sign persistence while the informed cohort is active, decaying fast – not the slow power-law of real flow.
- The visual will likely show a mismatch, and fixing that would mean adding order-splitting/herding, which reintroduces the manipulability I already decided not to include. The null is the result – “My substrate carries no exploitable order-flow memory, consistent with neutrality.”
- Including autocorrelation comparison makes the ACF stronger: if real Polymarket flow shows long memory (it will – order-splitting & herding live in real flow) and my substrate is near-IID, the side-by-side is the point. “We predicted near-IID flow from the neutral-substrate design, and that’s what we measured.” ACF needs the real signed-fill sequence, so the recon I’ll run shortly checks for that explicitly alongside markout findings from my Polymarket-Neutrality-Monitor.
Recon: is there frozen per-fill tape, does it carry (or join to) a mid, and can I reconstruct a time-ordered signed sequence.
- Findings:
  - Markout – intact. Per-fill, signed, mid-based, with the {2,10,30}s horizon dimension preserved, so the horizon-deepening discriminator – my honesty test on the real side – still applies. I’ll use maker_markout_b4.jsonl (38,736 rows, 1,045 tokens, 1,880 makers, ~94% coverage). This ships with a real overlay.
  - Price Impact – dead. Fill price and mid are both absent, and the agent’s correction is right: three markout equations in four unknowns is underdetermined, so I can recover inter-horizon mid increments but never an absolute mid or fill price. Realized impact-vs-size needs an absolute mid I don’t have. This visual runs synthetic-only – keep it as “the LMSR produces a concave impact curve,” drop any real-match claim.
  - ACF – salvageable, but not from the raw rows. The data is per-maker-leg: one taker sweeping N resting makers becomes N identically-signed rows at the same second. Feed that raw into an autocorrelation and I manufacture positive short-lag structure that’s an attribution artifact, not order-flow memory. The fix is to aggregate to per-(token, second) net-signed flow first, collapsing each N-maker event to one observation. Whole-second timestamps then make it a coarse, conservative proxy – which actually strengthens the contrast: if a conservatively-built real series still shows memory while my substrate is near-IID, the “we predicted neutral, near-IID flow and that’s what we see” framing lands better.
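A sketch of the aggregation fix for the ACF, assuming each per-maker-leg row can be reduced to (token, second, sign, size). That row shape is an assumption for illustration; the real maker_markout_b4.jsonl schema may differ, and `net_signed_flow` / `sign_acf` are hypothetical helpers.

```python
from collections import defaultdict


def net_signed_flow(rows):
    """rows: (token, second, sign, size) per maker leg, sign in {+1, -1}.

    Collapses every (token, second) to one observation: the sign of net flow.
    """
    net = defaultdict(float)
    for token, second, sign, size in rows:
        net[(token, second)] += sign * size
    series = defaultdict(list)
    for token, second in sorted(net):       # time order within each token
        flow = net[(token, second)]
        if flow != 0:
            series[token].append(1 if flow > 0 else -1)
    return dict(series)


def sign_acf(signs, lag):
    """Sample autocorrelation of a sign series at the given lag."""
    n = len(signs)
    if n - lag <= 0:
        return float("nan")
    mu = sum(signs) / n
    var = sum((s - mu) ** 2 for s in signs)
    if var == 0:
        return float("nan")
    cov = sum((signs[i] - mu) * (signs[i + lag] - mu) for i in range(n - lag))
    return cov / var


# One taker sweeping three makers at second 10 becomes three same-signed raw rows.
rows = [("tok", 10, +1, 5.0), ("tok", 10, +1, 3.0), ("tok", 10, +1, 2.0),
        ("tok", 11, -1, 4.0), ("tok", 12, +1, 1.0)]
flow = net_signed_flow(rows)
```

Run on the raw rows, the three legs at second 10 would read as a run of buys; after aggregation they count once.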
Measurement-only build: produce three real-vs-synthetic microstructure comparisons for the demo. AUDIT-FIRST.
Next Steps for Thin-Market Behavior:
- Microstructure – does the synthetic venue reproduce the statistical signatures of real price formation (impact shape, LP markout, flow memory)? This is a data capture question.
  - Does it match milestone markets specifically?
1. Upgrade Data Capture – one redesigned capture would fix most of what broke: persist the CLOB mid joinable to fills (unlocks real price-impact-vs-size – the figure that failed – and proper mid-to-mid markout, killing the 3-equation/4-unknown problem); persist the taker-aggregate tape instead of per-maker-leg (kills the ACF attribution artifact); capture millisecond WS-receipt timestamps (lets ACF resolve sub-second splitting); run for days across the thin-token universe (turns n=3 adverse makers into a powered set). My build-4 shared pool already has the coverage infrastructure.
  - Max claim I can make is that we’re reproducing the microstructure of a real on-chain tail panel.
2. Closer Analogs – scan for traded markets structurally nearer a milestone binary: FDA-decisions, drug-approval, clinical-readout markets on PM/Kalshi – discrete resolution, small informed cohort, thin uninformed flow.
  - BLOCKED
    - Polymarket prunes intraday price granularity for old/resolved markets, so the historical mid is daily-only nine months out. That’s not a tooling gap I can work around – the data is structurally gone at the venue. The “horizons” {2,10,30}s collapse to one daily point, markout sits at sub-cent tick-noise on a near-zero CRL’d token (+0.0005, IQR around 0), and I can’t even assess horizon-deepening with one horizon.
    - The microstructure-match-to-thin-markets claim is now dead on all three regimes I could possibly access (liquid cross-venue, live Polymarket makers, resolved FDA catalysts).
    - The per-fill mid I need for markout is never durably available on these venues, by design.
3. Characterize the realism/neutrality frontier: instead of brute-forcing a match, quantify the tension I just ran into. Use the harness to progressively inject strategic flow – order-splitting, then momentum, then belief-updating – and measure both axes at each step: how much realism it buys (does concave impact appear, does flow memory appear) against how much it costs (Pillar-2 cost-to-manipulate falls, Pillar-3 seed-neutrality degrades). The deliverable is a frontier: realism vs manipulation-resistance. This is consistent with, not a reversal of, the earlier “don’t sophisticate the agents” decision – the strategic agents live in a controlled experiment to prove the trade-off, then stay out of production. It converts “I couldn’t match real behavior” into “matching real behavior provably requires the strategic flow that breaks neutrality – here’s the trade-off quantified,” which is sharper to hand a technical reader than a single matched figure, costs no new capture, and reuses my existing code.

3: Realism/Neutrality Claims

I’m going to prove that the realism my simulation ‘failed’ to show via comparison to real thin-market data is a feature, not a bug – by demonstrating that the only way to add it is to break the property that makes our model trustworthy.

I spent a whole session chasing the claim that my synthetic market behaves like a real market. I tried to show it in three ways:

markout
order-flow memory
price impact

Each failed. The synthetic price impact came out linear instead of curved; the synthetic order flow had no memory; the real-data overlays I needed to prove a match turned out to be unattainable on Polymarket (the mid is pruned to daily granularity, the tape is too thin). Four figures, still no match to real thin markets. At this point it’s easy to read this as: real markets are richer, and my simulation isn’t realistic enough.

**In reality, my simulation lacks those realistic textures for a good reason. I deliberately built the agents to be *dumb*.** My noise traders are directionless; my informed traders draw a signal once and trade it; nobody splits large orders to hide them, nobody chases a trend, nobody updates their belief by watching the price. This is what my neutrality claim rests on. Pillar 3 showed the sponsor can’t move the price because the crowd is non-strategic and reverts to 0.5. Pillar 2 showed an attacker can’t lead the market because there’s no reflexive flow to lead.

What this build will do: Instead of arguing this purposeful lack of strategy, I’ll measure it. I’ll take my existing neutral sim as the floor – call it Level 0 – and then add strategic behavior back in, one rung at a time, in increasing order of how much “mind” I give the agents.

Rung 1: order-splitting – agents slice big trades into streams of small ones. This is the gentlest addition, and it’s the one that should make price impact curve the way real markets do.
Rung 2: momentum – some agents start trading in the direction the price is already moving. Now the flow has memory, like real flow. But momentum is reflexive: a market that chases itself can be pushed.
Rung 3: belief-updating – agents start inferring information from the price itself. This is the most realistic and the most dangerous: it’s the herding mode I flagged a while ago, where a manipulated price gets treated as signal and the crowd piles in behind an attacker.

Measurements: does the realism appear (does impact curve, does flow gain memory – my old measurement layer), and does the neutrality hold (can the sponsor move the price now, can an attacker – my Pillar 2 and 3 tests).

Deliverable: single curve – realism on one axis, manipulation-resistance on the other, one point per rung. The expected shape is a clean downward trade-off – as I climb toward realistic behavior, neutrality falls, and the belief-updating rung is where it falls off a cliff. That curve says something we can’t wave off – “Yes, our market is less ‘realistic’ than a real exchange – necessarily so, because the realism comes from exactly the strategic flow that a sponsor or attacker would exploit. Our substrate agents trade realism for neutrality on purpose, and here’s the frontier that proves the price of each strategic addition.”

FINDINGS:

L0 reproduced the baseline to full float precision, the injection is byte-identical at intensity 0, every strategic variant draws the same RNG count as the agent it replaces, so strategy is the only varying factor.
Phase 4 confirms no neutral agent, vendored file, or cost function was touched.
Concave impact: restored by nothing. Exponent stays ~=1 at every rung. The agent’s mechanism note is correct and worth absorbing – the square-root law is a meta-order property, and my per-fill impact measurement structurally can’t see it; slicing a meta-order just adds small same-signed locally-linear fills. So FIG-4’s “failure” was never going to be fixed by order-splitting - it was a measurement-level mismatch, not a deficiency in the agent substrate.
- Clean answer to “why no concave impact”
L2 momentum is the one real realism win – flow memory flips from -0.13 to +0.83 – and it costs Pillar 3, not Pillar 2 (seed-reversion deviation 0.003 —> 0.078). That’s my thesis, intact, on one signature: restoring flow memory provably degrades neutrality.
L1 & L3 erode Pillar 2 while buying almost no realism. This is the against-expectation result that matters most: order-splitting disarms the reactive defense (my credentialed traders become scheduled executors that don’t lean in when the adversary pushes), so neutrality falls before you get any realism.
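To make the “exponent stays ~1” reading concrete: a sketch that fits a log-log impact exponent for small trades against a binary LMSR. It assumes a fixed liquidity parameter b (plain LMSR, not the liquidity-sensitive variant) and a market starting at price 0.5; `lmsr_impact` and `loglog_slope` are hypothetical helpers.

```python
from math import exp, log


def lmsr_impact(x, b):
    """Price move from 0.5 after buying x YES shares in a fixed-b binary LMSR."""
    return 1.0 / (1.0 + exp(-x / b)) - 0.5


def loglog_slope(xs, ys):
    """OLS slope of log(y) on log(x): the impact exponent."""
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (c - my) for a, c in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


sizes = [1.0, 2.0, 5.0, 10.0, 20.0]
exponent = loglog_slope(sizes, [lmsr_impact(x, b=100.0) for x in sizes])
sqrt_reference = loglog_slope(sizes, [x ** 0.5 for x in sizes])
```

For fills that are small relative to b the fitted exponent sits close to 1, against 0.5 for a square-root law, so a per-fill measurement on this kind of cost function reads as linear whatever the agents do upstream.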

June 25, 2026

neutral agents codebase / LOI prep

Letters of Intent

Weakpoints:

does an auditor accept a thin sponsored price as a Level-3 input
is the convertible math robust
is the agent population neutral

Segments:

auditor / valuation-MD (most-binding): need a technical opinion – “would you accept this as an ASC-820 Level-3 input, and under what conditions?”
sponsor side (biotech CFO / BD / royalty holders): “we’d run a milestone market on asset X if the mechanism worked.”
capital side (financiers who’d price the note premium): need them to validate the premium-compression claim – would they genuinely charge a lower premium against a verified price?
mechanism-credibility (microstructure / market-design people): need them to validate the agent population & neutrality.

Agent Population – Codebase

Inherently, a sponsored market whose price is set largely by my agentic traders is theoretically the opposite of neutral. Agents must be calibrated to converge to truth, demonstrably, & verifiably – measured against real off-chain / on-chain market behavior and provable to a third party.

agent classes;
benchmark that compares the agent-driven price path against real analogous market behavior;
A/B / ablation tests to show convergence is robust and not an artifact of a particular agent mix or seed;
neutrality/manipulation-resistance demonstration – show the price converges to truth even when an adversarial or sponsor-aligned agent tries to push it, which is the literal answer to “can the sponsor rig it.”

Recon Prompt: For convergence-and-neutrality validation around the existing two-class agent population (CredentialedTrader + NoiseTrader).

Goal: Prove that the agent population produces TRUSTWORTHY, well-calibrated, manipulation-resistant prices – provably, and benchmarked against real market behavior – i.e., the answer to the skeptical question of “how do you know your agents converge to truth?”

Findings: circularity problem is universal and unsolvable by any internal metric. Every Brier score in each of my previous builds is calculated as (price – p_star)^2, scored against the latent parameter the signals were drawn from. A market converging to its own true_p is mechanically guaranteed, not evidenced. So, what’s not circular?

Adversarial truth-restoration should be the lead

Current Population:

CredentialedTrader (sim/agents/credentialed.py) – confirmed uses a static-signal PROXY, drawn once at init, never re-sampled.
- decide() reads only price_yes and b; edge = signal – price_yes; trades size = aggressiveness • |edge| • b toward the signal if |edge| ≥ min_edge.
- Likely to be contested proxies:
  - no belief updating – real informed traders Bayes-update on order flow, this one holds a fixed point estimate forever
  - no budget/inventory/risk limits – unbounded repeated trading
  - size ∝ b is a modeling convenience, not a microfounded demand
  - homogenous signal precision (one σ)
  - no strategic/timing behavior (no order-splitting, no adverse-selection avoidance)
NoiseTrader (sim/agents/credentialed.py) – uniform direction rng.integers(0, 2). Proxy: zero-information symmetric, memoryless – a stand-in for “uninformed flow,” not a calibrated noise model.

Both traders move the LS-LMSR price only through execute_trade(), where (price = softmax(q/b)); there is NO ORDER BOOK, so “manipulation resistance” here means cost to move the cost-function price, not book spoofing.
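A sketch of what “cost to move the cost-function price” means, assuming a binary market with a fixed liquidity parameter b and the standard LMSR cost C(q) = b·log(Σ exp(q_i/b)). The real venue is LS-LMSR, where b varies with quantity, so these helpers are hypothetical and the numbers indicative only.

```python
from math import exp, log


def lmsr_cost(q_yes, q_no, b):
    """C(q) = b * log(exp(q_yes/b) + exp(q_no/b)), computed stably."""
    m = max(q_yes, q_no)
    return m + b * log(exp((q_yes - m) / b) + exp((q_no - m) / b))


def lmsr_price_yes(q_yes, q_no, b):
    """YES price = softmax(q/b) for the two-outcome case."""
    return 1.0 / (1.0 + exp((q_no - q_yes) / b))


def cost_to_move(p_target, b):
    """Cost of pushing the YES price from 0.5 to p_target, starting at q = (0, 0)."""
    shares = b * log(p_target / (1.0 - p_target))  # YES shares needed
    return lmsr_cost(shares, 0.0, b) - lmsr_cost(0.0, 0.0, b), shares


cost, shares = cost_to_move(0.8, b=100.0)
```

Under these assumptions the cost scales linearly with b, so a deeper subsidy raises the price of pushing the quote, and unwinding the push hands that cost back to whoever trades against it.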

Circularity Concern:

The claim that “the market converges to true_p” is circular by construction. signal ~ N(true_p, σ) —> price = size-weighted mean of active signals —> true_p by Law of Large Numbers. Brier/convergence are then scored against true_p, the same parameter the signals came from. A single market hitting its own true_p is mechanically guaranteed; it’s not evidence of truth-finding. Every Brier in every prior repo is (price – p_star)^2, marked against the latent parameter, never a realized outcome. No repo samples Bernoulli(p_star). So the circularity is universal, not specific to our minimal vendored subset.
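A sketch that makes the circularity visible, under simplifying assumptions: price is an equal-weighted (not size-weighted) mean of signals, and each market also resolves one Bernoulli(true_p) outcome. `latent_vs_realized_brier` is a hypothetical helper, not code from any of the repos.

```python
import random


def latent_vs_realized_brier(true_p, sigma, n_signals, n_markets, seed=0):
    """Average Brier scored against the latent true_p vs against realized outcomes."""
    rng = random.Random(seed)
    latent, realized = 0.0, 0.0
    for _ in range(n_markets):
        signals = [rng.gauss(true_p, sigma) for _ in range(n_signals)]
        price = min(max(sum(signals) / n_signals, 0.0), 1.0)
        outcome = 1.0 if rng.random() < true_p else 0.0
        latent += (price - true_p) ** 2     # scored against the generator's own parameter
        realized += (price - outcome) ** 2  # scored against an actual resolution
    return latent / n_markets, realized / n_markets


latent_brier, realized_brier = latent_vs_realized_brier(
    true_p=0.7, sigma=0.1, n_signals=200, n_markets=500
)
```

The latent score collapses toward zero by the Law of Large Numbers however the market works, while the realized score stays near true_p·(1 − true_p), so only the second carries information about the mechanism.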

Potential Work-Arounds:

Adversarial truth-restoration ~ STRONGEST, genuinely non-circular.
- Introduce a manipulator not seeded from true_p which pushes price toward a target away from truth; measure whether the informed population restores truth, how fast, and at what cost to the manipulator.
- Non-circular because it tests truth winning a contest against a non-truth force – the “can the sponsor rig-it” question
Real-market dynamics benchmark – THIN, non-circular.
- The sim’s convergence shape/speed/event-response compared to real resolved markets, whose dynamics are independent of our generator. Genuinely external, but the data is severely limited – usable for a qualitative “does it move like a real market” sanity check, not a powered statistical claim.
Calibration across a suite – NECESSARY, not sufficient, semi-circular.
- Run many markets at varied true_p, resolve Bernoulli(true_p), score price vs realized outcomes (Brier/reliability curve). This can falsify – it catches systematic miscalibration from LS-LMSR spread bias, finite-N signal-mean error, or maker skew – but a pass is weak evidence because price ≈ true_p and outcome ~ Bernoulli(true_p) make calibration hold nearly by construction. Frame it as a bias detector / falsification gate, not proof.
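A sketch of the calibration gate as a bias detector, under stated assumptions: the "market price" is true_p itself (the best case) or true_p plus a constant 0.05 offset standing in for a systematic spread bias. The offset, the bin count, and the function name are hypothetical.

```python
import numpy as np

def reliability_table(prices, outcomes, n_bins=5):
    """Per-bin (mean price, realized hit rate, count) over equal-width bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(prices, edges) - 1, 0, n_bins - 1)
    rows = []
    for k in range(n_bins):
        mask = idx == k
        if mask.any():
            rows.append((float(prices[mask].mean()),
                         float(outcomes[mask].mean()),
                         int(mask.sum())))
    return rows

def max_gap(rows):
    """Largest |mean price - hit rate| across bins."""
    return max(abs(price - hit) for price, hit, _ in rows)

rng = np.random.default_rng(1)
true_p = rng.uniform(0.05, 0.95, size=20_000)      # a suite of markets
outcomes = rng.binomial(1, true_p)                 # resolve Bernoulli(true_p)

fair_prices = true_p                               # best case: price == true_p
biased_prices = np.clip(true_p + 0.05, 0.0, 1.0)   # assumed systematic skew

brier_fair = float(np.mean((fair_prices - outcomes) ** 2))
brier_biased = float(np.mean((biased_prices - outcomes) ** 2))
gap_fair = max_gap(reliability_table(fair_prices, outcomes))
gap_biased = max_gap(reliability_table(biased_prices, outcomes))
```

The fair arm passing says little, since it passes by construction; the biased arm failing is the useful part, which matches the falsification-gate framing.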

In the circular setup, the market seed, informed agents’ beliefs, and everything else all pointed at true_p, so convergence was mechanical. The fix is to deliberately separate them so that “price tracks the informed cohort” and “price tracks the agents/seed” make different, distinguishable predictions – then show which one wins.

seed the market at a neutral or deliberately-wrong value
give the informed agents a belief that differs from the seed
make the noise agents directionless (not centered on the informed belief)
ask: does the resting price move to the informed belief, or stay at the seed / wander with the noise?
- If informed-controlled and agent-controlled predicted the same price, the test proves nothing
- Forcing them apart is what converts it from tautology into a real, falsifiable experiment about who controls the price

Build 1: Validation Harness (post-recon).

Non-circular design principle: market seed, informed agents’ belief (p_informed), and the noise distribution MUST be set independently so that “price tracks informed” and “price tracks seed/agents” predict DIFFERENT prices.

Seed the market at a NEUTRAL or deliberately-OFFSET value (e.g., 0.50, or a value far from p_informed); give CredentialedTraders a belief p_informed that DIFFERS from the seed; keep NoiseTraders directionless (not centered on p_informed). If the experiment is set up so informed-controlled and agent-controlled give the same prediction, it proves nothing and must be rejected. Never seed at p_informed. Never let the agent default/seed coincide with the informed belief.

Results: the mechanism’s core claim – a thin informed minority controls a stable price that a neutral agentic substrate makes liquid – is demonstrated non-circularly, with the threshold (~2%) honestly reported as a best-case floor that Pillars 2-4 will pressure-test upward.

A small informed minority is pivotal – it controls the market’s resting price against a large directionless agentic majority. Price settled at ~0.748 from a 0.50 seed toward the informed belief of 0.75.
The pivotal threshold (control ≥ 0.9) is ~2% of the population – but this is the floor, an artifact of two favorable conditions: a near-perfect informed signal (informed_sigma = 0.01) and directionless (mean-zero) noise, so the informed cohort is the only directional force in the market
The decline is monotone – even at 1% (2 agents) price is pulled 82% of the way to the informed belief, though it stops settling tightly.
Depth changes stability, not control – as liquidity/subsidy rises, the control metric stays flat (~0.984) but path volatility and worst single-tick jump fall quickly. This reproduces the initial cold-start finding: with no depth the first trade moves price unilaterally (0.32 jump); deep markets move smoothly and hold.
the experimental separation is real, not nominal – seed (0.50) ≠ informed belief (0.75), offset enforced in code (post_init refuses to run if they coincide), noise genuinely directionless, and the control metric distinguishes “tracks informed” (1.0) from “stuck at seed” (0.0). Thus, the result is non-circular.
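A hedged sketch of the two pieces that make the separation checkable. The class name, the min_offset tolerance, and the linear form of the control metric are assumptions. The notes only state that the metric reads 1.0 for "tracks informed" and 0.0 for "stuck at seed", and a linear interpolation between the two anchors is the simplest form with that property.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PivotalityConfig:
    p_seed: float = 0.50
    p_informed: float = 0.75
    min_offset: float = 0.05        # hypothetical tolerance

    def __post_init__(self):
        # Refuse to run a circular setup where seed and belief coincide.
        if abs(self.p_informed - self.p_seed) < self.min_offset:
            raise ValueError("seed and informed belief must differ")

def control_metric(resting_price, cfg):
    """0.0 = stuck at seed, 1.0 = sitting on the informed belief."""
    return (resting_price - cfg.p_seed) / (cfg.p_informed - cfg.p_seed)

cfg = PivotalityConfig()
control_at_settle = control_metric(0.748, cfg)   # settle price from the notes

try:
    PivotalityConfig(p_seed=0.75, p_informed=0.75)
    coincident_rejected = False
except ValueError:
    coincident_rejected = True
```

Under this form the reported settle at ~0.748 reads as roughly 0.99 control, and the "82% of the way" figure at 1% informed reads as 0.82.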

Flagged Caveats:

informed flow is still a static-signal proxy (sampled once at init) – carried forward, not hidden.
2% threshold reflects non-adversarial noise only; the real “can the sponsor rig it” test is Pillar 2 (adversary pushing against the informed cohort), where the threshold will rise
A new dependency (pandas) was pulled in by the mandated vendoring of convergence_tick – flagged, pinned, added to requirements.txt
b-runaway observed at low informed fractions (b grew x140-243); labeled, doesn’t affect the pivotality conclusion (price still settles)
informed_sigma = 0.01 models experts as near-identical and near-correct – unrealistic; real experts disagree. (This is why I initially flagged adding a realistic-dispersion robustness cut to Pillar 4).

Build 2: Adversarial Manipulation-Resistance

This tests whether the informed cohort WINS a contest against a non-informed pusher – non-circular by construction. Relabeling holds (p_informed = informed cohort belief, never “truth”).

Setup – 3 distinct points so the resting price discriminates among outcomes:

seed at p_seed (e.g., 0.50). informed believe p_informed (e.g., 0.75). adversary pushes toward p_target chosen DISTINCT from BOTH (e.g., 0.30 or 0.95) so “informed wins” (price -> 0.75), “adversary wins” (price -> p_target), and “stuck” are all distinguishable. assert that p_seed, p_informed, and p_target are pairwise distinct in code (a chained p_target ≠ p_informed ≠ p_seed check would not rule out p_target = p_seed).
extend the vendored single-shot AdversarialTrader to a SUSTAINED, TARGET-SEEKING pusher: it trades repeatedly to drive and HOLD price at p_target, with a configurable capital/size budget.
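A sketch of the three-anchor setup and a budgeted pusher, assuming hypothetical names throughout (check_anchors, classify, SustainedPusher are not the vendored AdversarialTrader API). The pusher only decides direction and spend per tick; executing against the market is left out.

```python
from itertools import combinations

def check_anchors(p_seed, p_informed, p_target, min_gap=0.05):
    """Require all three anchors to be pairwise distinct (min_gap is assumed)."""
    anchors = {"stuck": p_seed, "informed_wins": p_informed,
               "adversary_wins": p_target}
    for (name_a, a), (name_b, b) in combinations(anchors.items(), 2):
        if abs(a - b) < min_gap:
            raise ValueError(f"{name_a} and {name_b} anchors are too close")
    return anchors

def classify(resting_price, anchors):
    """Label the outcome by the nearest anchor."""
    return min(anchors, key=lambda k: abs(resting_price - anchors[k]))

class SustainedPusher:
    """Keeps pushing toward p_target each tick until the budget runs out."""

    def __init__(self, p_target, budget, per_tick, tolerance=0.005):
        self.p_target = p_target
        self.budget = budget
        self.per_tick = per_tick
        self.tolerance = tolerance

    def next_order(self, price):
        """Return (direction, spend): +1 buy YES, -1 sell YES, 0 idle."""
        gap = self.p_target - price
        if abs(gap) <= self.tolerance or self.budget <= 0:
            return 0, 0.0
        spend = min(self.per_tick, self.budget)
        self.budget -= spend
        return (1 if gap > 0 else -1), spend

anchors = check_anchors(0.50, 0.75, 0.95)
pusher = SustainedPusher(p_target=0.95, budget=10.0, per_tick=4.0)
spends = [pusher.next_order(0.75)[1] for _ in range(4)]
```

Classifying by nearest anchor only works because the anchors are forced apart first, which is why the distinctness check runs before anything else.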

Result:

Answer to “can a sponsor rig it” – theoretically no. The frontier discriminates correctly (3 distinct anchors: seed 0.50 < informed 0.75 < target 0.95, so informed-win and adversary-win land at different prices).

At any finite budget tested (≤1000), the informed cohort defeats the adversary at every fraction down to 2% – the adversary exhausts its capital. With unlimited capital, the adversary wins only when informed falls below ~5-10%; at ≥20% informed, the price holds even against an infinite-budget attacker.

The agent proved the experiment isn’t rigged toward the informed by showing the adversary genuinely wins in the unlimited-capital + tiny-informed corner. That’s the credibility feature – a test where the defender always wins is not a good test; this one involves the defender losing exactly where theory says it should.

Build 3: Substrate Neutrality

Testing whether my own agents are secretly setting the price. Once this lands, I’d have closed all three versions of the skeptical questions I outlined earlier today: outsiders can’t rig the price, the minority of informed traders that should control price do, and my own population imposes no meaningful direction.

Build 4: multi-seed CI bands; realistic-dispersion pivotality cut; fixed-b market variant; widened adversary budget to determine the finite-budget win-transition

Cost-to-manipulate came back LOW, and the agent reported it straight instead of softening it. Net cost-to-hold is ~$154-$483; the adversary holds fair-valued inventory, so the real barrier to rigging is gross capital (~$8.5k), not a net loss they absorb. In other words, an attacker doesn’t lose much to manipulate – they tie up capital in a position that’s worth roughly what they paid, and the ~$20k finite-budget figure flips a 2% market.

What’s found to be defensible: who wins is NOT identical between fixed-b and standard. The LS-LMSR b-growth – the thing that looked like an accounting feature – is itself part of the defense: as the adversary trades heavily, b grows, which amplifies informed flow’s ability to push back.

So in a real (standard) market, the adversary’s ~$20k flips a 2% market but a 5% market only partially flips even at $100k.

The net cost to manipulate is low, but the gross-capital barrier plus LS-LMSR’s b-growth defense means that beyond ~5% informed participation, even a $100k adversary can’t fully rig the price.
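To make the gross-vs-net distinction concrete, here is a sketch for a plain binary LMSR with fixed b, which is an assumption: it matches the fixed-b instrument in spirit only, and it ignores the LS-LMSR b-growth that the standard market adds. For a YES-only push from p0 to p1, the cash paid is b·ln((1 – p0)/(1 – p1)) and the shares acquired are b·(logit p1 – logit p0). The b and price values are illustrative and do not reproduce the sim's dollar figures.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def push_cost_fixed_b(p0, p1, b):
    """Buy YES to move a binary fixed-b LMSR price from p0 up to p1.

    Returns (shares acquired, gross cash paid to the maker).
    """
    shares = b * (logit(p1) - logit(p0))
    gross = b * math.log((1.0 - p0) / (1.0 - p1))
    return shares, gross

def net_cost_to_hold(p0, p1, b, fair_p):
    """Gross cash minus the inventory marked at fair_p (an assumed mark)."""
    shares, gross = push_cost_fixed_b(p0, p1, b)
    return gross - shares * fair_p

# Illustrative: push from the 0.50 seed to a 0.95 target with b = 100,
# marking the inventory at the informed belief of 0.75.
shares, gross = push_cost_fixed_b(0.50, 0.95, b=100.0)
net = net_cost_to_hold(0.50, 0.95, b=100.0, fair_p=0.75)
```

In this toy case the attacker ties up about 230 units of capital but is out only about 9 once the position is marked, the same shape as the finding above: the barrier is gross capital, not absorbed loss.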

RECAP

Initial Goals:

Compile / Build out the agent-trader population – the most proprietary part of the model and the part we’d own end to end.
Treat integrity, A/B testing, and neutrality vs. analogous markets as the core concern, because a sponsor setting the price is a valid point of concern.
Decision made up front: prove convergence / control with the existing two-class population before adding complexity (benchmark-first, no new archetypes).

Key Reframes & Conceptual Shifts:

I realized that any synthetic agent population converging to a true_p it was seeded from is fundamentally tautological. In a fully synthetic sim, “truth” is the generator, so no internal metric escapes circularity.
The solution: the sim’s job isn’t to prove the price is correct. The VSA market doesn’t originate truth – it aggregates and amplifies the small real informed population that already exists in a TA/modality. The agents + sponsor subsidy are a liquidity substrate, not an information source.
So the claim changed from “our agents converge to truth” (circular, indefensible claim) to “a small informed minority controls a manipulation-resistant price against a large neutral agentic majority”
- inherently a mechanism-rooted claim, non-circular, more defensible
Relabeling enforced everywhere: true_p —> p_informed (“the informed cohort’s belief”); never “truth” in code, output, or artifacts.

What the recon found (before building)

lmsr-preclinical-markets already contained most of the validation scaffolding (Brier, k-consecutive convergence, attack/recovery metrics, a single-shot adversarial agent, a 2,400 run sweep) – none were previously vendored into this repo.
The circularity was universal, baked into every repo’s metrics (all Brier scored vs the latent parameter, never a realized outcome) – not specific to my code
The current population is two classes: CredentialedTrader (informed, static signal, sampled once at init – just a labeled proxy) + NoiseTrader (directionless). Both move price only through the LS-LMSR cost function (THERE’S NO ORDER BOOK).
The real-market comparator (kalshi-polymarket-microstructure) is severely data-limited (no recorded outcomes, no trade tape) – supports only an illustrative “looks market-like” check, not a powered claim. This was intentionally dropped from the build because it tests correctness, not mechanism.

What was built – the four-pillar harness (validation/)

Pillar 1 – informed-minority pivotality: sweep informed:agent ratio with seed ≠ informed belief (enforced in code), directionless noise, a control metric that distinguishes “tracks informed” (1.0) from “stuck at seed” (0.0). Plus a depth/stability test.
Pillar 2 – Adversarial manipulation-resistance: extended the single-shot AdversarialTrader into a sustained, target-seeking pusher; ran a 2D frontier (informed fraction x adversary budget) of who wins; restoration metrics repointed to p_informed; a fixed-b variant built as an instrument to de-entangle cost from b-runaway.
Pillar 3 – Substrate Neutrality: zero-informed (pure-agent) test for directional drift + graceful-degradation test.
Pillar 4 – Robustness: multi-seed % CI bands (idiom attributed from compute_ci_band, reimplemented not vendored – the source was truth-coupled) + a numpy-only bootstrap A/B comparator. Plus realistic dispersion and widened-budget cuts.
Vendored with provenance: metrics.py, adversarial.py (verbatim, repointed to p_informed). New dep: pandas (pulled in by the vendored convergence_tick; flagged, pinned). Committed: 15 files, private repo, data/ gitignored (artifacts regenerate from seed).
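A sketch of what a numpy-only percentile bootstrap for Pillar 4 could look like. This is not the reimplemented compute_ci_band idiom. The function names, the 0.95 default level, the resample count, and the synthetic per-seed values are all assumptions for illustration, not sim output.

```python
import numpy as np

def bootstrap_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean of per-seed results."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(means, [tail, 1.0 - tail])
    return float(lo), float(hi)

def bootstrap_ab(a, b, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b), arms resampled independently."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    mean_a = a[rng.integers(0, len(a), size=(n_boot, len(a)))].mean(axis=1)
    mean_b = b[rng.integers(0, len(b), size=(n_boot, len(b)))].mean(axis=1)
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(mean_a - mean_b, [tail, 1.0 - tail])
    return float(lo), float(hi)

# Synthetic control-metric values across 30 seeds for two arms (made up).
data_rng = np.random.default_rng(42)
arm_tight = data_rng.normal(0.98, 0.01, size=30)
arm_dispersed = data_rng.normal(0.80, 0.05, size=30)

band = bootstrap_ci(arm_tight)
diff_band = bootstrap_ab(arm_tight, arm_dispersed)
```

If the difference band excludes zero, the two arms are distinguishable at that level; a band that straddles zero would mean the seed count is too small to say.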

Discovered:

Pivotality threshold is ~2-5% under a near-perfect signal – but this is a FLOOR, not the headline. At realistic expert disagreement (σ = 0.10) the CI bands widen sharply – pivotal in expectation, but a tight per-instance guarantee needs a larger cohort.
The adversary genuinely wins in the unlimited-capital + tiny-informed corner – proving the test isn’t rigged toward the defender
Cost-to-Manipulate is LOW. The fixed-b instrument showed net cost-to-hold is only ~$154-$483; the adversary holds fair-valued inventory, so the real barrier is gross capital (~$8.5k), not a loss they absorb.

June 25, 2026

product architecture / thesis

Mechanism

Sponsor subsidizes a continuous LS-LMSR market to discover the probability of a discrete R&D milestone; that price reprices a financial instrument tied to the firm’s asset/IP; the repricing benefit justifies continued subsidy.

Value Prop: coupling between the instrument the market trades (milestone position) and the instrument that gets repriced. Every layer below exists to facilitate that coupling and enforce verifiability.

This is an issuer model, not a protocol.

Layers

Asset – turns a milestone claim into a sponsor-funded tradable position, and issues the repriced instrument it’s coupled to.
- dual-layer / RWA tokenization
- LS-LMSR for illiquid markets
- Open Question:
  - Which instrument to originate? CVR-like royalties, SPV tranche, milestone note?
Trust – cryptographically prove the discovered price is neutral – not self-dealt by the sponsor, not picked off by faster capital.
- microstructure (builds 1-5)
- verifiable / neutral clearing mechanism
- Open Question:
  - Is the proof legible to a non-crypto buyer (i.e., auditor, CFO, etc.)?
Market behavior – how the market prices, behaves, and resists manipulation; regime-condition clearing at catalyst windows
- event-contract microstructure
- FBA counterfactual
- Open Question:
  - Does a thin, sponsored market price hold up against a real catalyst spike?
Resolution – how a milestone market ends – wiring the endpoint to an authoritative source, with a dispute path
- Open Question:
  - who adjudicates an ambiguous readout, and on what evidence?
Sponsor Economics – who pays, why, and whether the price is worth more than the subsidy. This is the demand side.
- Tetlock-Hahn applied
- coupling test
- Open Question:
  - Does the sponsor benefit exceed the LMSR max loss for a real asset?

Decision Point – Coupling Mechanism

“Price discovery reprices an instrument” can mean a ton of different things. Essentially, this coupling mechanism (and its neutrality / verifiability) is the true value proposition here.

Informational Coupling – price acts as a reference number which the sponsor may choose to consider. Essentially selling a forecast. Customer in this case is an IR/comms team.
- quite weak
Valuation-standard Coupling – the price feeds a recognized fair-value methodology (e.g., ASC 820 Level 3, 409A, fund NAV).
- defensible on an otherwise-unmarkable asset.
- Customer in this case is a controller / auditor / fund CFO
- Illiquid biotech R&D presents the issue where ‘marks’ are just speculations
Contractual / mechanical Coupling – the instrument’s terms reference the price – conversion, coupon, NAV mandate, CVR interim transfer value. This way, real money moves with the probability.
- Customer in this case is a CFO / treasurer
- This is the strongest case as it requires an issuer to manage it. It makes the prediction market load-bearing by construction.

You can’t bind a contractual coupling onto an instrument you didn’t structure yourself. The path to being an issuer and the path to building a strong coupling mechanism are the same.

Looking at the asset layer, I’m not proposing to ‘tokenize a milestone claim,’ but rather to ‘originate a repriced instrument whose valuation is methodologically or contractually bound to the discovered price.’

Decision Point – Repriced Instrument

CVR (contingent value right).
- Reprices via interim transfer value.
- This makes sense if you want the purest milestone-coupled security and a secondary-trading use case
- This doesn’t make sense if you need a primary financing event – CVRs are deal byproducts, not capital raises
Milestone-contingent note / SAFE-like
- conversion price or coupon steps on the probability
- This makes sense if the sponsor is raising capital now against the milestone (clean cost-of-capital narrative)
- This doesn’t make sense if the sponsor isn’t actively financing; nothing to attach to
- ONLY OPTION WHERE COUPLING DRIVES A LIVE FINANCING DECISION && THE CONVERSION-REFERENCES-PRICE MECHANISM IS CONTRACTUALLY CLEAR
Single-asset royalty interest
- NAV / mark for the holder
- This makes sense if the holder is a fund needing defensible marks (recurring buyer)
- This doesn’t make sense if the asset has no royalty structure yet – we’d be creating cash-flow rights from scratch
Single-asset SPV equity tranche
- NAV mandate
- This makes sense if you want full control of the wrapper (closest to the dual-layer model I previously built) and a captive sponsor
- This doesn’t make sense if the regulatory/setup cost is too high for an initial model

The royalty/SPV are the recurring-revenue versions to potentially expand into; the CVR is a weak commercial thesis.

Regulatory & Accounting Considerations

Of course, a market that reprices a security is making a fair-value claim with securities-law and audit consequences. I.e., the stronger the coupling between layers, the heavier the regulatory concern.

Tetlock-Hahn, made concrete, is as follows: the sponsor’s benefit must exceed the LMSR max loss they fund.

The benefit of that price discovery takes one of three forms (matching the customer classes above).

usable forecast
cost-of-capital reduction
defensible Level-3/NAV mark

Notably, the subsidy side is quite favorable in biotech specifically: a strong base-rate prior (clinical phase-success by indication) means the maker only has to price the residual uncertainty, so the max loss is small compared to the value of the decision/valuation produced.
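One way to put numbers on the base-rate argument, sketched for a plain binary LMSR with fixed b seeded at prior p0 (an assumption: LS-LMSR bounds differ, and the function names are hypothetical). If the price ends at 0 or 1 on resolution, the maker loses b·ln(1/p0) when YES resolves and b·ln(1/(1 – p0)) when NO resolves. The expected loss under the prior is then b times the prior's entropy, which shrinks as the prior gets stronger; the worst case moves the other way, so it matters which of the two the sponsor underwrites against.

```python
import math

def worst_case_loss(b, p0):
    """Loss if the outcome the prior considered less likely resolves true."""
    return b * math.log(1.0 / min(p0, 1.0 - p0))

def expected_loss_under_prior(b, p0):
    """b * H(p0): the maker's expected loss if the prior is well calibrated."""
    entropy = -(p0 * math.log(p0) + (1.0 - p0) * math.log(1.0 - p0))
    return b * entropy

b = 1.0
uniform_expected = expected_loss_under_prior(b, 0.5)   # ln 2
strong_expected = expected_loss_under_prior(b, 0.9)    # smaller than ln 2
strong_worst = worst_case_loss(b, 0.9)                 # ln 10
```

Read this way, "the maker only has to price the residual uncertainty" is a statement about expected subsidy cost, with the tail case being a confident prior that turns out wrong.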

Decision: underwriting on cost-of-capital makes the most sense, as the defensible mark is inherently complementary. It’s hard to quantify ‘usable forecast’. CoC is the only metric that’s large, quantifiable, and recurring

a credible continuous probability on a milestone lets the sponsor raise milestone-contingent capital at a tighter spread, or mark an existing instrument higher, and that ∆ is promising.

ASC-820 (US fair-value accounting): ranks valuation inputs in three tiers:

Level 1 = quoted prices in active markets (a liquid stock)
Level 2 = observable inputs other than quoted prices (comparable trades, yield curves, etc)
Level 3 = unobservable inputs (mgmt’s internal models & assumptions)
- all early-stage R&D assets, CVRs, or private royalties are Level 3; marked by a DCF with an internally-derived PoS the holder picks (or an advisor picks, e.g., an investment bank)
- the value-proposition here is that the market-price derived by an on-chain, verifiable / neutral prediction market sponsored by the holder can replace that internally-derived PoS with an observed one

Potential Revenue Model

Some combination of issuance fee, subsidy management, and a take on the market.

Risks

The issuer / mechanical-price-coupling model is a high regulatory load. 3 major risks:

Auditor Acceptance – if a thin sponsored price is not a recognized fair-value input, the valuation-standard and mechanical products both collapse to a ‘forecast’.
Conflicted-Issuer Problem – you issue the instrument, run the market, and the sponsor funds the subsidy
- verifiable neutrality is key to make that trustworthy, and even a clean proof may not beat the optics for a conservative counterparty.
Issuer Model Concentrates Regulatory & Legal Cost

Integrated Codebase for v0:

Goal: a self-contained end-to-end simulation of the sponsored-aggregation loop – a sponsor funds an LS-LMSR subsidy; agents trade a milestone market; price path; a milestone-contingent NOTE reprices via conversion (contractual coupling); sponsor’s realized cost-of-capital ∆ vs. a no-market baseline; a neutrality-proof artifact.

Findings:

The verdict is KILL in both regimes tested. Under heavy informed flow, the maker did start being adversely selected (deepening flipped True), but it stayed immaterial in dollar-figures (~$2.79) because an LMSR reprices on every trade, so a sharp informed wall makes the price efficient fast rather than leaving the sustained post-fill drift that large negative markout needs.

In other words, my simulated sponsored LMSR maker is structurally more robust to adverse-selection than the CLOB makers my initial monitor measured on live Polymarket orderflow.

Live Deployment: https://aggregation-demo-cyan.vercel.app/

Clarifications

What “issuer, not protocol” means.
- Issuing the note means creating and selling the security – acting as the legal entity that structures the instrument, defines its terms and places it with investors. A protocol, by contrast, just provides infrastructure others use to issue (i.e., Uniswap doesn’t own the tokens traded on it).
- We can only write a binding coupling onto an instrument we control the terms of.
Sponsor Self-Dealing – what that actually looks like
- sponsor’s paying for and standing up the exact market whose price determines how much their asset is worth.
  - motive & means to push price in their favor by seeding trades
  - e.g., seeding the market at an inflated prob.; trading into their own market through a sock-puppet to lift the price before a financing; choosing market parameters (liquidity level, resolution source, when the market opens/closes) to bias the outcome; quietly halting/restarting if the price moves against them
  - proved via ‘material harm to the maker’ / post-fill markout
Sponsorship/Liquidity Flow – how companies can fund a working market.
- An LS-LMSR market doesn’t match buyers to sellers as in a CLOB; an AMM quotes both sides continuously, and someone has to fund that maker’s worst-case/base loss. That funder is the sponsor.
- Mechanics:
1. the sponsor deposits a subsidy (the LMSR’s bounded max loss – in my simulation ~$2.86 for the chosen liquidity parameter; in reality a function of how much depth the sponsor wants for credibility’s sake).
2. That subsidy capitalizes the automated maker, which then stands ready to take the other side of any trade at the LMSR-determined price.
3. Now the agent trader population transacts against the maker: the informed human traders (TA specialists, analysts, people with any read) buy or sell based on their private view, moving the price toward their belief; the agentic/noise flow trades for non-informational reasons & provides churn.
  - The maker absorbs all of it – its price updates after every trade per the LMSR cost function. The sponsor’s subsidy is what makes this possible – without it, there’s no counterparty and (per Milgrom-Stokey), no market.
4. If the milestone resolves and the maker ran a profit, that returns to the sponsor. If it lost up to the bound, that loss is the price the sponsor paid to buy a credible probability.
  - The subsidy is the cost of information, not a trading loss. (Tetlock-Hahn)
How Cost of Capital ∆ is measured.
- WACC is just the rate of return a financier demands to provide money, expressed as a %. If a biotech raises $1M and the investor requires 15% return, the cost of that capital is $150k. Lower cost of capital is obviously in the best interest of the company.
- In this model WACC is calculated as r = r_base + π, where r_base is the baseline financing rate (what any borrower pays) and π is the risk/information premium – the extra return the investor demands because they’re uncertain about, and distrustful of, the milestone probability their note is priced on. The ∆ (the saving) is measured as a straight before/after comparison of the financing cost of the same note:
  - Baseline: the note is priced off the sponsor’s self-reported prior probability, and because the investor distrusts a self-reported number, they charge a high premium π_high (12% in the sim – although that’s an average WACC, or Ke, for any clinical-stage private biotech, in fact on the lower end). Financing cost = note_value(p0) • r/(1+r) with that high rate —> $202,474 in the sim.
  - With-market: the same note is priced off the independently verified market probability, the investor trusts it, charges a lower premium π_low (4%, improbable for biotech, needs to be adjusted for realism) —> $133,373 in the sim.
  - ∆ = baseline – with-market = $69,101. That’s the proposed saving for the sponsor.
- The saving splits into two channels: the level channel (the probability itself changed, p0 0.70 -> p_mkt 0.728), which moves the note’s value – generic to any forecast, and in my simulation slightly negative – and the premium channel (12% -> 4% at the same probability), which only exists because the estimate is verifiable.
  - The saving lives in the premium, not the probability itself. A better point estimate isn’t that valuable, trust in the estimate is what generates value.
  - Important: note_value • r / (1+r) is a stylized one-period cost of carry, not how a real convertible’s WACC is computed (which really involves the conversion option’s value, dilution, time to milestone, discount rate). My simulation is a mechanistic demonstration, not a true pricing model.
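The before/after comparison can be sketched in a few lines, using the note_value(p) form, V_yes / V_no values, and ratchet endpoints given elsewhere in these notes. Two things are assumptions: the ratchet g(p) is taken to be linear between its stated endpoints, and r_base = 8% is inferred rather than stated. With those choices the baseline comes out at the $202,474 figure; the with-market cost lands within about $50 of $133,373, presumably because p_mkt is rounded to 0.728 here.

```python
V_YES, V_NO = 1_400_000.0, 550_000.0   # conversion values from the notes
R_BASE = 0.08                          # ASSUMPTION: inferred, not stated
PI_HIGH, PI_LOW = 0.12, 0.04           # premiums used in the sim

def g(p):
    """Conversion ratchet; ASSUMED linear from 0.97 at p=0 to 1.10 at p=1."""
    return 0.97 + 0.13 * p

def note_value(p):
    """g(p) times the probability-weighted conversion value."""
    return g(p) * (p * V_YES + (1.0 - p) * V_NO)

def financing_cost(p, premium):
    """Stylized one-period cost of carry: note_value * r / (1 + r)."""
    r = R_BASE + premium
    return note_value(p) * r / (1.0 + r)

p0, p_mkt = 0.70, 0.728

baseline_cost = financing_cost(p0, PI_HIGH)
market_cost = financing_cost(p_mkt, PI_LOW)
total_saving = baseline_cost - market_cost

# Decomposition: change the probability first, then the premium.
level_channel = baseline_cost - financing_cost(p_mkt, PI_HIGH)
premium_channel = financing_cost(p_mkt, PI_HIGH) - market_cost
```

The level channel comes out slightly negative (a higher probability raises the note's value and hence its carry), and the premium channel carries more than the whole saving, consistent with the point that the saving lives in the premium.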
Verifiable Neutrality & Use-Case
- In the orderbook clearing context, neutrality is used to prove the maker wasn’t picked off (adversely selected), or prove the mid price was fairly computed.
- In a stakeholder market, however, “neutral” doesn’t mean “the maker didn’t bleed value”, it means “the sponsor didn’t rig the price they’re about to reprice their own asset against”.
  - Verifiable Neutrality is the cryptographic answer to every self-dealing vector considered:
    - prove the market was seeded at a declared prior and not silently re-seeded;
    - prove the sponsor didn’t trade into their own market;
    - prove the resolution followed the pre-committed source and rule;
    - prove the parameters weren’t changed mid-flight;
    - prove the price the note references is the actual market-cleared price and not a cherry-picked snapshot of trade convergence
- In other prediction markets (Polymarket, Kalshi), no participant is pricing their own security off the result, so neutrality is a nice-to-have feature. Here, the sponsor is, so the price is only worth anything to an external party – the investor or auditor – if that party can verify the incentive-crossed sponsor didn’t author it.
  - Without proof, you can’t claim a “market price,” just a number the sponsor produced internally.
  - Verifiable Neutrality is what converts π_high into π_low.
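Two of the vectors above (seeded at a declared prior, parameters not changed mid-flight) can be illustrated with the simplest possible primitive, a salted hash commitment published before the market opens. This is a sketch of the idea only, not the neutrality proof itself: it says nothing about whether the sponsor traded, and the parameter names and values are hypothetical.

```python
import hashlib
import hmac
import json
import secrets

def commit(params, salt):
    """SHA-256 over the salt plus a canonical JSON encoding of the parameters."""
    payload = json.dumps(params, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(salt + payload).hexdigest()

def verify(params, salt, commitment):
    """Recompute the commitment from the revealed parameters and compare."""
    return hmac.compare_digest(commit(params, salt), commitment)

# Hypothetical declared parameters, fixed before the market opens.
declared = {
    "p_seed": 0.70,
    "liquidity_param": 4.0,
    "resolution_source": "pre-committed registry readout",
    "opens": "2026-07-01T00:00:00Z",
    "closes": "2026-12-31T00:00:00Z",
}
salt = secrets.token_bytes(16)
published = commit(declared, salt)      # published at market open

# At audit time the sponsor reveals (declared, salt); anyone can re-check.
honest_ok = verify(declared, salt, published)
tampered = dict(declared, p_seed=0.80)  # a silent re-seed
tamper_ok = verify(tampered, salt, published)
```

An auditor only needs the published digest, the revealed parameters, and the salt, which is roughly the level of legibility the "non-crypto buyer" question asks about.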
Note mechanics / repricing logic
- Traditional milestone-contingent convertibles (SAFE-style): a sponsor raises cash now (face F) by selling a note that doesn’t pay interest like debt; instead it converts into equity at a future event – here, the milestone resolution.
  - The note specifies what the holder gets in each outcome: if the milestone hits (YES) the note converts into equity worth more (because the asset is de-risked); if it misses (NO) it converts into less.
  - Traditionally, the conversion terms are fixed in the contract, negotiated up front off the parties’ guesses about the PoS.
- Value Proposition: repricing off the continuous market. Instead of fixing conversion off a guessed probability, the note’s conversion economics reference the live market-implied probability p.
  - note_value(p) = g(p) • [ p • V_yes + (1-p) • V_no ]
  - That last part in [ ] is the probability-weighted expected conversion value – what the equity you convert into is worth on average given probability p (V_yes = $1.4M, V_no = $550k).
  - g(p) is the conversion ratchet – a multiplier that runs from 0.97 (at p = 0) to 1.10 (at p = 1), so a higher market-implied probability earns the holder slightly more favorable conversion. As the market price p moves, note_value(p) moves with it – that’s the contractual coupling, and it’s why panel 3 of the demo lets the user scrub p and watch the value reprice.
    - The point is that conversion isn’t a negotiated guess anymore; it’s bound to an observed, continuously-updating, verified probability.
- A verified price for an asset is cheaper to borrow against because when an investor buys a note, they’re taking on the risk that the probability it’s priced at is wrong. Two sources of uncertainty, science & valuation integrity. If the probability is the sponsor’s self-reported figure, the investor assumes it’s optimistically biased, so they pad their required return with a large premium to protect against being lied to (π_high). If the probability comes from a market the investor can verify was neutral – not seeded by the sponsor, not traded by the sponsor, resolved by a pre-committed source – it’s trustworthy.
  - Not enough uninformed hedging flow for a completely unsubsidized LS-LMSR market, not enough liquidity.
  - So we can try to build a subsidized prediction market on an asset that’s verifiably neutral, so as to have the same valuation effectiveness & trust as an unsponsored true market.
  - Lower premium = lower required return = lower WACC = sponsor raises the same money for cheaper.
- Quiet vs. stress/catalyst regime:
  - two distinct market conditions, different levels of informed trading
  - Quiet: ordinary times – little new information, mostly noise/agentic flow, price drifts near prior
  - Stressed/Catalyst: high-information period – trial readout, PDUFA date, data release – when informed traders with real signal pile in and the price moves sharply.
  - I’m proving that an LMSR maker holds up without material adverse selection / negative markout in both regimes – this is the value of credibility compression
  - The regimes are my stress-test, and passing both is what showed me the differentiator isn’t “we protect the market under fire,” it’s “we make the price trustworthy regardless.”

Public Sim Change Log (@ vsamarkets.dev)

Change header line, currently “A five-step walkthrough for a sponsor / CFO. Every figure below comes from one simulation run,” to something more descriptive of the larger goal.
- Changed to “Price discovery as infrastructure – using prediction markets to reprice the illiquid, tail assets that traditional valuation can only estimate.”
- Thesis line: “We believe a market’s purpose is truth convergence and the repricing it enables — not liquidity for its own sake. Most prediction markets end at the bet; ours begin there, turning a verified probability into a repricing of real assets. We build for the illiquid tail others skip — milestone-driven, low-uninformed-flow markets like early-stage life-sciences R&D — using agentic trading, LS-LMSR market-making, and cryptographic neutrality as the ingredients for societal repricing where traditional valuation can only guess.”
Cosmetic: color scheme

kalshi-polymarket microstructure

June 2, 2026

latency thesis based in crypto, FX, & HFT precedents

Physical path in prediction-market latency arbitrage

Most content in this space exists in vendor blogs, repos, and a live latency-monitoring service, not academia/research. Academic rigor exists in adjacent domains – HFT equities, FX, etc. Doesn’t seem to be any formal bridge to Kalshi/Polymarket.

hyperlatency.glassnode.com

exists to help traders decide where to co-locate. probes RTT worldwide to crypto exchanges, blockchain validators, oracle gateways, & prediction markets directly.

publishes the server origins I’d otherwise have to reverse-engineer (i.e. PM’s CLOB at clob.polymarket.com originates from AWS eu-west-2 in London; Kalshi at api.elections.kalshi.com originates from AWS us-east-2 in Ohio)
explains our calibration smoke test results – my machine measured Kalshi ~20ms / Polymarket ~95ms – and now I know why: Kalshi’s servers are in Ohio and PM’s are in London. The ~75ms differential isn’t noise, it’s geography.
co-loc problem is quite difficult to address

vendor ecosystem: VPS providers selling sub-millisecond proximity

definitely demand here. multiple providers sell prediction-market-specific VPS hosting, and their published numbers are concrete and consistent:

from a US East Coast connection, round-trip latency to PM’s CLOB averages ~130ms; from Dublin it drops under 5ms; from Amsterdam, ~10ms
a server in the same facility as PM’s infra can reach it in 1-5ms, versus 20-50ms for a VPS in the wrong region and 150ms+ for home internet. They cite Equinix LD4 in London at 0.56ms and NY4 in New York at 0.36ms as the co-location prime.
one provider claimed under 0.5ms latency to PM’s CLOB from a Dublin VPS, marketing Dublin as the closest unrestricted location to PM’s London AWS

A Dublin or London cloud VM would collapse my PM RTT from ~95ms to single digits. However, it’d also increase Kalshi RTT: moving closer to London means moving farther from Ohio.
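To sanity-check that the differential is geographic, a rough lower bound on RTT follows from great-circle distance and the speed of light in fiber. This is a minimal sketch, assuming approximate coordinates for the two regions (Columbus, Ohio and London) and a propagation speed of about two thirds of c. Real routes are longer than the great circle, so measured RTTs should sit above this bound.

```python
import math

C_KM_PER_S = 299_792.458   # speed of light in vacuum
FIBER_FACTOR = 2 / 3       # assumption: light in fiber travels at ~2/3 c


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points on a spherical Earth (km)."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km):
    """Round-trip time over a straight fiber run of the given length."""
    return 2 * distance_km / (C_KM_PER_S * FIBER_FACTOR) * 1000


OHIO = (39.96, -83.00)    # approx. Columbus, OH (assumed site for us-east-2)
LONDON = (51.51, -0.13)   # approx. London (assumed site for eu-west-2)

distance_km = great_circle_km(*OHIO, *LONDON)
floor_ms = min_rtt_ms(distance_km)
print(f"{distance_km:.0f} km, RTT floor {floor_ms:.1f} ms")
```

No single vantage can be near both venues at once, so some version of this bound applies wherever the measuring host sits.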

Market design caveat!

Latency-arb used to be real and large enough that the venue itself intervened. On Polymarket’s zero-fee 15-minute crypto markets, bots monitored small delays between PM’s internal pricing and spot prices on Binance/Coinbase, entering near 50/50 and exiting once prices converged – at least one wallet executed thousands of trades in a single month with high success rate.

PM introduced dynamic taker fees in early 2026 specifically to curb latency arb on 15-minute crypto markets.

dynamic taker fees then funded the maker rebates program – i.e., they taxed the latency takers to pay the liquidity providers, which is the market-design move predicted in Budish et al.

Adjacent Findings:

Budish, Cramton & Shim – “The High-Frequency Trading Arms Race” (QJE 2015)

Aquilina, Budish & O’Neill – “Quantifying the HFT Arms Race” (2020).

CLOB is a flawed design: at high-frequency horizons, cross-market correlations break down, creating mechanical arb opps for whoever’s fastest, and competition doesn’t shrink the opportunities, it just raises the speed barrier needed to capture them
empirical paper found latency-arb races are very frequent (about one per minute per symbol for FTSE 100 stocks), extremely fast (modal race 5-10 millionths of a second), account for a substantial portion of trading volume, and are concentrated – the top 6 firms account for ~80% of races. each race is worth ~half a tick, but volume increases the stakes

Budish’s proposed solution: frequent batch auctions – discretize time so tiny speed advantages stop mattering. PM’s 15-minute and 5-minute markets are already a crude form of time-discretization. So there’s a novel research angle here – do prediction markets’ short-duration-binary structures function as accidental frequent-batch-auctions, and does that change the latency-arb dynamics versus a continuous CLOB?
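For reference, the core of a batch auction can be sketched in a few lines. This is an illustrative uniform-price clearing rule, not Budish et al.'s full specification: `clear_batch` and its lowest-price tie-break are hypothetical choices. Orders collected during one interval are cleared together, so arrival order inside the interval carries no advantage.

```python
import random


def clear_batch(bids, asks):
    """Uniform-price clearing: pick the price that maximizes matched volume.

    bids / asks are lists of (limit_price, quantity) collected during one
    batch interval. Arrival order inside the interval is deliberately ignored.
    Ties on volume resolve to the lowest candidate price (an arbitrary choice).
    """
    prices = sorted({p for p, _ in bids} | {p for p, _ in asks})
    best_price, best_volume = None, 0
    for p in prices:
        demand = sum(q for bp, q in bids if bp >= p)   # buyers willing to pay p
        supply = sum(q for ap, q in asks if ap <= p)   # sellers willing to accept p
        volume = min(demand, supply)
        if volume > best_volume:
            best_price, best_volume = p, volume
    return best_price, best_volume


bids = [(0.55, 10), (0.52, 5)]
asks = [(0.50, 8), (0.54, 10)]
price, volume = clear_batch(bids, asks)

# Shuffling arrival order changes nothing: speed within the batch is worthless.
shuffled_bids, shuffled_asks = bids[:], asks[:]
random.shuffle(shuffled_bids)
random.shuffle(shuffled_asks)
assert clear_batch(shuffled_bids, shuffled_asks) == (price, volume)
```

A short-duration binary that still matches continuously inside its window has no such order-invariance, which is the distinction the research question turns on.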

Infrastructure Approaches:

PTP hardware timestamping and kernel-bypass are now available on commodity cloud. AWS published a detailed tick-to-trade latency guide for digital-asset trading that documents the following:

PTP Hardware Clock (PHC) on supported EC2 instances tightens clock error to typically under 40 microseconds, with hardware packet timestamping attaching 64-bit nanosecond-precision timestamps at the Nitro NIC level.
Kernel-bypass via DPDK, AF_XDP zero-copy, and SR-IOV, plus network-optimized instances that cut p99.9 tail latency by up to 85%

No longer need dedicated physical fiber: a c6in/m6in EC2 instance with PHC enabled and DPDK configured is enough. For measurement purposes, PTP hardware timestamping on a cloud VM is the single highest-leverage upgrade. It could replace the software clock (the source of the jitter that sets the existing ~100ms floor) with sub-40-microsecond hardware time, collapsing the clock-uncertainty component of my floor by orders of magnitude.

The network-path differential would remain, but I could measure it with hardware-grade timestamps rather than jittery software ones.

Prior Open-Source

Binance -> Polymarket latency-arb bot that exploits the 2-10 second lag between Binance real-time BTC prices and Polymarket’s 5-minute BTC up/down odds, with explicit advice to deploy on a Dublin or London VPS for lowest latency. architecture is similar to mine: separate feed handlers, test/live executor split, results.csv logging
Polymarket x Kalshi systematic-arb writeup on dev.to with a “production-grade architecture” section, focused on short-horizon BTC markets

Takeaways:

~100ms floor is now explained and bounded by geography / physical datacenter locations + software-clock jitter. PTP hardware timestamping on a cloud VM would kill the jitter component to tens of microseconds; it can’t kill the network-path component, because the two venues are on different continents and no access point is close to both. Even this build leaves us with an irreducible geographic differential we must measure and subtract.
goal now is to document the floor dynamically – there’s a real gap. The latency-arb-on-prediction-markets material is all vendor marketing and bot repos; the academic coverage stops at equities & FX.
- Could address the cross-venue latency structure of Kalshi vs. Polymarket, document the irreducible transatlantic floor, and explain why retail can’t close it. conclude with how it relates to Budish’s arms-race and to Polymarket’s dynamic-fee countermeasure
practical upgrade: incremental and budget-scaled
- software continuous-probing + exchange-timestamp arbiter
- cloud VM at a chosen vantage to measure from a known network position instead of home connection
- PTP-enabled EC2 instance to kill clock jitter

Looking Back –

Initially interested in market-structure/liquidity layer theses. This week’s work on Kalshi/Polymarket has focused on characterizing the microstructure of existing prediction-market venues and discovering where edge could live – arb dead at accessible fees, LP edge dead to adverse selection, latency edge gated by co-location and an irreducible transatlantic floor. Not necessarily a deviation from the market-structure focus, but an empirical foundation.

I can’t credibly propose a better market-structure layer for a new asset class without first understanding why the existing venues’ microstructure produces the inefficiencies it does.

Kalshi-Polymarket formalizes what’s broken and why.

Retail-vs-Institutional Latency-Arb Question: Measurements found that cross-venue edge exists but only at the infrastructure/fee tiers that retail can’t reach – institutional fees, co-location, sub-ms timestamping. This is an empirical finding about access asymmetry, and it’s the seed of a thesis: if the edge is structurally gated to institutions, is there a market-structure design (layer) that either democratizes access to it or eliminates the rent entirely? This would be a bridge from measurement to infrastructure.

Substack Framing:

Equities/FX-native formalization already exists (Budish, Cramton & Shim; Aquilina, Budish & O’Neill). The continuous limit order book mechanically generates latency-arbitrage rents that accrue to the fastest participant, and competition raises the speed bar rather than competing the rent away. FBAs are a potential fix: they discretize time so that speed advantages below the batch interval stop mattering. The empirical FX/equities work quantifies the races (≈1/min/symbol, modal race in microseconds, top firms win ~80% of races in these markets).

I’m interested in whether prediction markets’ short-duration-binary structures function as accidental frequent-batch-auctions.

Polymarket’s 5-minute and 15-minute crypto markets are time-bounded – since they resolve at a fixed instant. It’s essentially a crude form of the time-discretization that Budish prescribes as the fix to the latency arms race.
They’re still not pure batch auctions – still match continuously within the window via a CLOB. New question is whether the short resolution horizon changes the latency-arb dynamics relative to a continuous, indefinite CLOB – even though the matching engine is still continuous?
I’ve found so far that Polymarket already introduced dynamic taker fees specifically to kill latency-arb on these short-horizon markets. That’s a venue choosing a fee-based countermeasure rather than a batch-auction one – which is itself a datapoint about whether the short-horizon structure suffices. Presumably it didn’t, since they needed to implement the fee.

New Question: Does a short-duration binary’s fixed resolution horizon attenuate latency-arbitrage rents relative to a continuous CLOB – and if not, what does that tell us about whether time-bounding is a substitute for, or merely a complement to, the frequent-batch-auction fix prescribed in Budish et al?

May 31, 2026

Kalshi websocket client build

Rationale behind setting up a WS client for Kalshi orderbook data & messages is to remove the 1.5s sampling asymmetry so I can make sub-second lead-lag claims. However, sub-second claims require the previously discussed clock-sync cross-check to pass. And on a symmetric WS-vs-WS capture I’m comparing two venues’ local-receive timestamps, so the network-latency differential between me <-> Kalshi and me <-> Polymarket becomes the dominant uncertainty at sub-second scale.

We previously saw ~340ms PM network jitter while monitoring the Colombia resolution event. If Kalshi’s path is even ~100ms different, a 200ms Kalshi lead could be attributed to pure network geography. Symmetric websockets result in finer sampling, but don’t automatically yield trustworthy sub-second latency figures. I’d still need to characterize the network-latency differential before any sub-second lead is defensible.
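The defensibility test above can be written down as a simple guard. This sketch assumes a conservative worst-case model in which the path differential and the jitter add. `lead_is_defensible` is a hypothetical helper, not something in the existing codebase.

```python
def lead_is_defensible(observed_lead_ms, path_differential_ms, jitter_ms):
    """True only if the observed lead exceeds what the network alone could produce.

    Worst-case model (assumption): the unexplained path differential and the
    observed jitter stack in the same direction.
    """
    network_uncertainty_ms = abs(path_differential_ms) + abs(jitter_ms)
    return abs(observed_lead_ms) > network_uncertainty_ms


# The case from the notes: a 200ms lead against a ~100ms path differential
# and ~340ms of observed PM jitter is not defensible.
print(lead_is_defensible(200, 100, 340))
```

Under this model only leads well above the ~440ms combined uncertainty would count as real, which is why the differential has to be characterized first.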

An initial WS auth client can be built with RSA-PSS signing and tested on Kalshi’s demo environment first, per their guidance, confirming a clean handshake and a real book message on quiet markets. I could then run the reconnect self-test. The real deliverable is a validated symmetric client, not that any event was actually captured tonight.

Design Constraints:

Demo-first vs straight to prod: Kalshi’s docs recommend developing against a demo. I’m going to build and validate the auth handshake against demo-api.kalshi.co first, which will tell us whether the RSA-PSS signing is correct and whether my account permissions are right. Then, I’ll switch to prod for deployment. The only caveat is that demo books might be thin and simulated, so the handshake can validate on demo, but the true capture runs on prod. I’ll take this two-stage approach to the WS setup.
Kalshi’s official SDK vs. hand-rolling the signing: Kalshi docs also mention an official async Python SDK (kalshi-python-async) which handles RSA-PSS signing for me. Hand-rolling gives me control and no dependency, but signing bugs are a significant concern and the SDK would eliminate them. Thus, I’ll use the official SDK for auth/signing, wrapping its WS in my existing connection-manager pattern (reconnect, heartbeat, dual-logging) so it slots into ws_leadlag.py.
Scope – replace Kalshi’s REST fallback in the existing ws_leadlag.py, or build a separate client? I’ll just modify ws_leadlag.py so Kalshi has a real WS path and keeps the REST fallback (if WS auth fails at runtime, degrade to the 1.5s REST that already works so as to not lose any capture).

May 31, 2026

[EXP 4b] – latency / lead-lag thesis

EXP-4 requires websockets, which I currently don’t have set up. Everything till now has been REST polling at 30s intervals. Latency/lead-lag measurements require sampling faster than the lag I’m measuring. I.e., if Kalshi leads Polymarket by even 5-20 seconds, a 30s poller is too slow to capture any lead.
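A toy model makes the sampling constraint concrete. It assumes both venues are polled on the same schedule and that a move is first seen at the next poll. `observed_time` and `observed_lead` are illustrative names.

```python
import math


def observed_time(true_time_s, interval_s, phase_s=0.0):
    """Time of the first poll at or after the true move time."""
    k = math.ceil((true_time_s - phase_s) / interval_s)
    return phase_s + k * interval_s


def observed_lead(t_leader_s, t_follower_s, interval_s):
    """Lead as seen by a poller sampling both venues on the same schedule."""
    return observed_time(t_follower_s, interval_s) - observed_time(t_leader_s, interval_s)


# A true 10s lead under a 30s poller shows up as either 0s or 30s,
# depending on where the moves fall relative to the poll grid.
print(observed_lead(100, 110, 30))   # both moves land in the same poll
print(observed_lead(115, 125, 30))   # moves straddle a poll boundary
print(observed_lead(100, 110, 1))    # a 1s poller recovers the true lead
```

The observed lead is quantized to multiples of the poll interval, so the interval has to be well below the lag being measured.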

Engineering risk associated with building websockets – requires persistent dual connections, reconnection logic, clock synchronization between two venue feeds, local-vs-exchange timestamp handling, message-ordering guarantees. Note that any clock-sync issue would produce incorrect lead-lag numbers.

Clock synchronization – to claim that one venue moved before the other, I’d need both timestamps on the same clock. I could either use my local receive-time for both, or use exchange-provided timestamps.

the lead-lag signal I’m trying to capture is probably quite large relative to network-latency differences (on the order of tens of milliseconds), but not large relative to possible inter-venue clock skew.
I’ll use local receive-time as the initial measure, but log exchange timestamps as well (where available), and make the build report both plus the implied discrepancy. Local receive-time is always available and is the honest measure of what a co-located agent would actually observe (which is more relevant for a trading agent since you trade on what you receive, not the venue’s internal clock).
- Then, the exchange timestamps can be used as a cross-check. if local-time lead and exchange-time lead agree directionally, I can trust it. if they diverge, i’d have found a network feature and wouldn’t subsequently trust the magnitude.
- Depending on any discrepancy, I can then determine if I’ll need to implement skew-calibration.
Dead/tail markets (CLE delisted; retirements at 0.02) wouldn’t generate enough updates to measure lead-lag, so I’ll focus on the ~6-8 live markets to stream.
Run duration / trigger:
- Option 1: stream continuously and extract move-events post-hoc, or
- Option 2: stream specifically around known catalysts (Colombia today, NBA Finals games). given today’s event is imminent and is a guaranteed repricing catalyst, the websocket client should be built and tested beforehand, then run live during the event. that also makes it the higher-resolution complement to F.1’s 5s REST capture of the same event.
when elections open today at 6pm ET, I’ll run this command: --start-utc 2026-06-01T05:00:00+00:00
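The local-time vs exchange-time cross-check described in the clock-synchronization notes above reduces to a sign agreement plus a discrepancy. A minimal sketch follows. The sign convention (positive means Kalshi moved first) and the 0.5s skew tolerance are assumptions, not values from the build.

```python
def cross_check(local_lead_s, exchange_lead_s, skew_tolerance_s=0.5):
    """Compare lead estimates from local-receive and exchange timestamps.

    Positive lead = Kalshi moved first (assumed convention).
    skew_tolerance_s is a hypothetical threshold for flagging skew calibration.
    """
    discrepancy_s = local_lead_s - exchange_lead_s
    return {
        "agree_directionally": local_lead_s * exchange_lead_s > 0,
        "discrepancy_s": discrepancy_s,
        "needs_skew_calibration": abs(discrepancy_s) > skew_tolerance_s,
    }


result = cross_check(local_lead_s=2.0, exchange_lead_s=1.8)
print(result)
```

Directional agreement supports the sign of the lead only. The discrepancy field is what decides whether the magnitude can be trusted.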

A poll check returned a 1.28% real-error rate; these errors are a failure mode the websocket would have to survive.

ConnectError: connection reset by peer (1728x)
nodename nor servname not known (572x – DNS failures)
RemoteProtocolError: Server disconnected (56x)
as well as a 1h22m gap.

This is essentially 3.5 days of a mature, tested REST poller with retry logic. The network connection to these venues drops connections, loses DNS, and has multiple-hour gaps. A websocket client built and tested in the last 3 hours before a resolution event, with reconnection logic that hasn’t yet proved survivability with real drops, deployed as a primary capture on a one-shot event, is risky.

I’ll label the experimental websocket build as such – so as to maintain project credibility. This is a websocket lead-lag measurement from a client that hasn’t been validated against a clock-sync ground truth.

Plan:

run both during tonight’s resolution event (F.1 – tested + experimental websocket setup)
F.1 is the primary capture since it’s validated.
I’ll label the websocket data as provisional til the clock-sync cross-check passes.
At 6pm ET (22:00 UTC), I’ll launch F.1’s Colombia capture.
Simultaneously, I’ll launch the newer websocket client on the selected 6-8 catalyst-active markets, with reconnection logic, logging both local-receive and exchange timestamps.
Both will capture the full count window.
Tomorrow, I’ll build F.2 / EXP-4 analysis, cross-checking the websocket’s local-time lead against exchange-time lead. If they agree, the websocket data will have been validated, acting as the primary-source lead-lag figure, which I’ll deploy on the NBA resolution events thereafter.

Dual-venue websocket client build for cross-venue lead-lag (EXP-4b)

Goal: stream live orderbook updates from Kalshi & Polymarket on 6-8 active markets, logging both LOCAL-RECEIVE timestamps (primary endpoint) and EXCHANGE timestamps (cross-check). Built to run tonight during the Colombia first-round count, ALONGSIDE the F.1 REST capture which remains the capture of record. This client is experimental until its clock-sync cross-check passes.

Time constraint: needs to be deployable in <60min. Should favor a correct minimal client that survives reconnection errors over a feature-rich one that doesn’t. Reconnection resilience is NON-NEGOTIABLE (the REST daemon logged 1728 connection-reset + 572 DNS failures over 3.5 days – connection should be presumed to drop tonight).
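A minimal sketch of the reconnect loop, using the 0.5/1/2/4/8s capped schedule from the client requirements. The stream coroutine and the sleep function are injected so the loop can be exercised without a network. `run_with_reconnect` and `connect_and_stream` are hypothetical names, and a real client would also catch its websocket library's own close exceptions.

```python
import asyncio


def backoff_delays(base=0.5, cap=8.0):
    """Yield 0.5, 1, 2, 4, 8, 8, ... forever (infinite retries, capped)."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


async def run_with_reconnect(connect_and_stream, log_event, sleep=asyncio.sleep):
    """Keep one venue's feed alive until it shuts down cleanly.

    connect_and_stream: coroutine function that connects, subscribes, and
    streams; it returns on clean shutdown and raises on a drop.
    Each venue runs this in its own task, so one venue's drop cannot kill the other.
    """
    delays = backoff_delays()
    attempts = 0
    while True:
        attempts += 1
        try:
            await connect_and_stream()
            return attempts
        except OSError as exc:   # covers connection resets and DNS failures
            delay = next(delays)
            log_event({"event": "RECONNECT", "error": repr(exc), "retry_in_s": delay})
            await sleep(delay)
```

A production version would likely reset the delay after a healthy connection and record the gap duration in the RECONNECT row. Both are omitted here to keep the sketch short.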

Markets (6-8 active/catalyst, read condition_ids/tickers from markets.yaml):

Colombia: intl_president_co_aesp, intl_president_co_pval, intl_president_r1_co_icas (tonight’s catalysts)
NBA live: nba_finals_okc, nba_finals_sas (active series, frequent price moves)
Peru: intl_president_pe_rpal (deepest book, moves)
Optional: intl_mayor_kr_oseh, us_mayor_la_kbas

Venue WEBSOCKET endpoints – verify current endpoints/subscribe formats from each venue’s API docs before building (no assumptions):

Kalshi: websocket orderbook/ticker channel (needs auth? confirm – if it requires API key auth and we don’t have ws auth set up, fall back to fast REST poll loop for Kalshi at 1-2s and note; don’t want to block the build)
Polymarket: CLOB websocket (market channel). Confirm subscribe message shape and whether it pushes full book or deltas

If either venue’s ws can’t be stood up in time, that venue degrades to a 1-2s REST loop and the client logs which mode each venue is in. A working asymmetric capture is better than a broken symmetric capture.

Client Requirements (scripts/ws_leadlag.py):

Two concurrent connections (asyncio). Each writes every update to a shared append-only log
per message logged: local_recv_utc (tz-aware, time.time() at receipt), exchange_ts (venue-provided, nullable), venue, market_id, best_bid, best_ask, mid, raw_seq/update_id if provided.
Reconnection: on disconnect/reset/DNS-fail, exponential backoff reconnect (0.5, 1, 2, 4, 8s cap, infinite retries), re-subscribe on reconnect, log a RECONNECT event row with gap duration. A dropped feed must self-heal, not die. Shouldn’t let one venue’s drop kill the other
Heartbeat: every 30s log a STATUS line per venue (msgs received since last status, connection state, seconds since last msg). This is how I’ll be able to tell a market that’s quiet but alive from one that’s silently dead. This is crucial since pre-results books may be flat.
Output: data/raw/ws_leadlag/colombia_r1/.jsonl (append-only, crash-safe – flush per write). gitignored
Graceful SIGINT: flush and exit, log session summary (total msgs/venue, reconnect count, total downtime).
caffeinate-wrapped run command for the live session
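The per-message row and the crash-safe append from the requirements above could look like the following. Field names follow the list. `make_row` and `append_row` are hypothetical helpers, and the timestamp uses a tz-aware `datetime` rather than a bare `time.time()` float so the zone is explicit in the log.

```python
import json
from datetime import datetime, timezone


def make_row(venue, market_id, best_bid, best_ask, exchange_ts=None, seq=None):
    """Build one log row; local_recv_utc is stamped at call time (tz-aware UTC)."""
    mid = None
    if best_bid is not None and best_ask is not None:
        mid = (best_bid + best_ask) / 2
    return {
        "local_recv_utc": datetime.now(timezone.utc).isoformat(),
        "exchange_ts": exchange_ts,   # venue-provided, may be None
        "venue": venue,
        "market_id": market_id,
        "best_bid": best_bid,
        "best_ask": best_ask,
        "mid": mid,
        "raw_seq": seq,               # sequence/update id if the venue sends one
    }


def append_row(fh, row):
    """Append one JSON line and flush, so a crash loses at most the current row."""
    fh.write(json.dumps(row) + "\n")
    fh.flush()
```

Opening the file in append mode and calling `append_row` from both venue tasks keeps a single shared log. Under asyncio the writes are not interleaved, because each write completes before control returns to the event loop.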

Build Analysis: clean

reconnect gate passed (0.757s recovery, re-subscribe, one venue drop doesn’t kill the other)
both venues are live
timestamps tz-aware and dual-logged
trusted as experiment: Yes
ready to run during resolution event tonight

**NOTE: Kalshi’s websocket requires API-key auth even for public orderbook data, so Kalshi fell back to a 1.5s REST poll.**

This means the two venues are being sampled at different rates and through different mechanisms. Polymarket is being sampled via true-push websocket (sub-second, event-driven); Kalshi = 1.5s REST poll. This asymmetry could directly skew the lead-lag measurement –

If Kalshi appears to “lag”, i can’t tell whether Kalshi actually repriced slower or whether the 1.5s poll just observed the move up to 1.5s after it happened while the Polymarket websocket caught its move instantly.
The measurement floor on any “Kalshi lags” signal is ~1.5s, and any lead smaller than that is unmeasurable / just an artifact of the measurement asymmetry
Conversely, a signal that Polymarket lags Kalshi is cleaner, because the Kalshi poll can only make Kalshi look slower, not faster – so if PM lags even with this handicap working against that finding, I can likely trust it more.
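The asymmetry argument above can be expressed as bounds on the true lag. This sketch assumes the Polymarket websocket delay is negligible and that the poll observes a Kalshi move between 0 and 1.5s late. The function names are illustrative.

```python
def true_kalshi_lag_bounds(observed_lag_s, poll_interval_s=1.5):
    """Bounds on (t_kalshi_move - t_pm_move) given a polled Kalshi feed.

    The poll can only see a Kalshi move late, by up to poll_interval_s,
    so the true lag lies in [observed - interval, observed].
    """
    return observed_lag_s - poll_interval_s, observed_lag_s


def verdict(observed_lag_s, poll_interval_s=1.5):
    """Classify an observed lag given the one-sided sampling handicap."""
    low, high = true_kalshi_lag_bounds(observed_lag_s, poll_interval_s)
    if low > 0:
        return "kalshi_lags"
    if high < 0:
        return "pm_lags"
    return "ambiguous"


print(verdict(1.0))    # Kalshi appears 1.0s late: could be pure poll delay
print(verdict(-0.2))   # Kalshi appears first despite the handicap
print(verdict(2.0))    # Kalshi appears later than the poll could explain
```

Any observed Kalshi lag under 1.5s lands in the ambiguous band, while any observed Kalshi lead survives because the handicap only ever works against it.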

Live Update: 21:15 UTC

Realized that polls close at 4pm UTC-5, which is 5pm ET = 21:00 UTC, not the initially thought 6pm ET I’d been waiting for. I ran commands at 5:12pm ET.

Found that preliminary results will begin flowing on the Registraduría platform shortly after the 4pm close. Counting is apparently pretty fast, with results clear before sundown.

I immediately launched the capture commands, with an immediate start each. I’ll deal with implementing the Kalshi API Key credentials for a websocket client build for Kalshi tomorrow, after the resolution event / repricing period ends (sometime tonight ET).

launched the websocket for polymarket (starts capturing on launch, no scheduled start): caffeinate -i uv run python scripts/ws_leadlag.py --out data/raw/ws_leadlag/colombia_r1
then launched F.1 with start-utc = now, not 22:00 as initially scripted: caffeinate -i uv run python scripts/poll_event_window.py --markets intl_president_co_aesp,intl_president_co_pval,intl_president_r1_co_icas --interval-sec 5 --start-utc 2026-05-31T21:15:00+00:00 --end-utc 2026-06-01T05:00:00+00:00 --label colombia_r1
the 30s daemon is already capturing – that’s the backstop in this analysis.

After ~10 minutes, both terminal commands were running with zero errors, zero reconnects. At this point I’m recording the event at three different resolutions.

Important Signal: PM message volume in the first ~2-3 minutes of polling (via websocket client for PM) was quite low and intermittent – last_msg gaps of 5.6s, 20.5s, 35.6s. That means the Polymarket books on the Colombia markets weren’t repricing much at this point.

A few minutes later, I saw +N values of 12, 18, 36, and 115, in a single 30s heartbeat with last_msg 0.1s ago. That was a 6x jump from the +18 a moment earlier. This showed PM repricing pretty hard. The onset was ~21:18 UTC; the sharp acceleration was in the last minute or two, and this spike is exactly the event I was trying to capture tonight, now being recorded at PM sub-second resolution.

Later, when I deal with implementing the API key credentials for websocket client build on Kalshi, I’ll note ~21:18 – 21:20 UTC as when the results began moving the books. F.2 will want that as --catalyst-utc; and I’ll pin it exactly from the jsonl later for to-the-second catalyst timing.
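Pinning the catalyst from the jsonl could be as simple as finding the first row whose mid departs from the opening mid by some threshold. This is a sketch under assumptions: the rows carry the `market_id`, `mid`, and `local_recv_utc` fields from the client requirements, and the 2¢ threshold is a placeholder, not a calibrated value.

```python
import json


def find_catalyst_onset(lines, market_id, min_move=0.02):
    """Return local_recv_utc of the first row whose mid has moved at least
    min_move away from the first observed mid for this market, else None.

    lines: iterable of JSON strings (e.g. an open .jsonl file).
    """
    baseline = None
    for line in lines:
        row = json.loads(line)
        if row.get("market_id") != market_id or row.get("mid") is None:
            continue   # other markets, STATUS/RECONNECT rows, empty books
        if baseline is None:
            baseline = row["mid"]
        elif abs(row["mid"] - baseline) >= min_move:
            return row["local_recv_utc"]
    return None
```

Because the function takes any iterable of lines, it can be pointed at the capture file directly with `open(path)`. A rolling baseline would be more robust than the first mid if the book drifts before the event.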

For now, I’ll let the websocket + REST poller at 1.5s intervals run. The asymmetry to keep in mind is as follows: PM is catching every sub-second tick of this repricing, while Kalshi is sampled every 1.5s. So I’ll be able to measure “did Kalshi’s 1.5s-sampled mid move before or after Polymarket’s tick-level mid” at seconds-scale resolution – which is the right scale for this catalyst. I’m interested in whether PM lags Kalshi (this would survive the sampling handicap) or the reverse (Kalshi lags PM, which the 1.5s REST interval could artificially generate).

May 28, 2026

[EXP 12] – LP backtest using existing daemon data

Scope

Exp-12 was initially framed as a feasibility study using the E.1 dataset and EXP-3a’s direction-correct engine. That’d function as a paper backtest of Liquidity Provisioning strategy – essentially post passive at level xyz, wait, & then simulate fills. But the goal is to model realistic-fill behavior, which is arguably a little harder since it involves queue priority, adverse selection, fade-on-cross, etc. These are the unmodelled assumptions that most LP value figures rest on at the moment. The daemon dataset will be the primary source of data here.

EXP-12a: fill realism model + applying to the existing LP edges

Goal here is to build a probabilistic-fill model (queue position, adverse-selection cost, fade probability), apply it to the 8 LP-edge markets from EXP-3a/c, and replace each exclusive-fill at whatever displayed depth dollar figure with a fill-probability-adjusted expected $/contract figure. This should theoretically result in HONEST LP-edges.

First gate is calibration data. A fill-probability model needs ground truth (essentially moments where a resting order would have actually filled or not). I don’t have resting orders in this dataset as is. I do, however, have 1,745 snapshots of both books at 30s resolution (which in hindsight was a good interval decision), every observed price change between snapshots (which is evidence about what would have happened to a resting order), and per-snapshot top-of-book depth, which lets us proxy queue position.

A resting order at price X fills only if the market trades through X or to X with size depleting the queue ahead. I can then reconstruct these events between snapshots – although imperfectly, since I won’t be able to see intra-snapshot ticks, but still well enough to estimate fill probability as a function of distance from mid, queue depth ahead, time-to-event-catalyst, and volatility regime.
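The fill rule above can be sketched for a resting bid using two consecutive snapshots. Several assumptions are baked in and would need checking against real data: prices are integer cents to avoid float equality, any drop in displayed depth at the level is treated as traded volume rather than cancels, and `bid_would_fill` is a hypothetical helper.

```python
def bid_would_fill(price_c, queue_ahead, depth_before, next_best_bid_c, depth_after):
    """Infer whether a resting bid at price_c filled between two snapshots.

    price_c / next_best_bid_c: prices in integer cents.
    queue_ahead: displayed size assumed to sit ahead of our order at price_c.
    depth_before / depth_after: displayed size at price_c in each snapshot.
    """
    if next_best_bid_c < price_c:
        # Traded through: the whole level, including our order, was consumed.
        return True
    if next_best_bid_c == price_c:
        # Traded to our level: fill only if enough size ahead of us was depleted.
        depleted = max(0, depth_before - depth_after)
        return depleted >= queue_ahead
    # Best bid moved above our price: the market moved away, no fill inferred.
    return False


print(bid_would_fill(30, 40, 100, 30, 50))    # 50 depleted vs 40 ahead
print(bid_would_fill(30, 100, 100, 30, 50))   # joined the back of the queue
```

The cancel-vs-trade ambiguity biases this toward overstating fills, so the resulting probabilities are better read as an upper bound.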

Actually, thinking about it now, the 30s interval might not be fine enough. The capture window from F.1 should help calibrate around catalysts where intra-second dynamics matter the most, but that data won’t be collected and grepped till a few days from now for the sake of a thorough dataset. So, for now, I’ll base this fill model on 30s-resolution data, noting that calibration around catalysts is preliminary until F.1 5s data is grepped.

Now I need to think about adverse selection. When the resting order fills, we need to ask WHY it filled. If it’s because uninformed flow crossed the spread, I can keep the spread. If it’s because informed flow knows the price is moving and lifts our order, I’d lose. Empirically, we’re testing whether the mid moved against us in the seconds/minutes after the hypothetical fill we’re modeling. I can measure this directly from the dataset. I’ll mark out at +30s, +5min, +30min post-fill.

If the markouts are systematically negative, any “edge” is adverse-selection-paid spread, which doesn’t really count.
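A markout at a fixed horizon is the signed distance between the fill price and the mid observed at or after fill time plus the horizon. A minimal sketch, assuming sorted snapshot times in seconds and a hypothetical `markout` helper:

```python
from bisect import bisect_left

HORIZONS_S = (30, 300, 1800)   # +30s, +5min, +30min


def markout(side, fill_price, fill_time_s, times_s, mids, horizon_s):
    """Signed markout: positive means the mid moved in our favor after the fill.

    times_s must be sorted ascending; mids[i] is the mid at times_s[i].
    Returns None if no snapshot exists at or after fill_time_s + horizon_s.
    """
    i = bisect_left(times_s, fill_time_s + horizon_s)
    if i == len(times_s):
        return None
    move = mids[i] - fill_price
    return move if side == "buy" else -move


def markout_profile(side, fill_price, fill_time_s, times_s, mids):
    """Markouts at every horizon in HORIZONS_S, keyed by horizon in seconds."""
    return {h: markout(side, fill_price, fill_time_s, times_s, mids, h) for h in HORIZONS_S}
```

A buy that is followed by a falling mid produces increasingly negative values across horizons, which is the adverse-selection signature being tested for.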

Findings:

adverse selection is pervasive – all 8 markets show negative net 5min markout. The cross-venue LP ‘edge’ is primarily adverse-selection-paid spread
1 REAL_EDGE (provisional) / 2 MARGINAL / 3 ADVERSE-SELECTED / 2 SUB-FILL
The 1 real-edge (co_aesp, +2.67x cross) survives only on 4 genuine fill events (crossed ~4.3% of the time per EXP-3c) – statistically too thin to mean anything.
Exclusive-fill figures (EXP-3) all overstate realized edge by ~1-2¢/contract – i.e. a markout haircut
nyk’s gross edge is negative before markout (Kalshi maker fee exceeds the 0.4c spread) – confirms the maker-fee-bind
arod/kr_oseh are SUB-FILL: edge is behind a 1-2¢ half-spread, P(both legs fill) < 2%.
79/79 green

I wanted to test for regime-impact. After slicing the EXP-12a markouts by regime to test whether adverse selection is conditional, I found the following:

no market has a tradeable regime – zero of 8 show non-negative net 5min markout in any regime bin with ≥ 20 fills. Adverse selection seems to be unconditional within this window
hour-of-day: underpowered (≤112 fills spread across 24 bins), 4 non-negative bins all fail the fill floor – noise, not candidates
Catalyst proximity: 0 fills within 2h of any catalyst – nearest catalysts (Colombia May 31, Seoul June 3) are soon, so near-catalyst behavior is yet to be characterized, not negative. this is gated by F.1’s denser data grep.
Volatility: only 2 markets have evaluable high-vol bins (co_pval, kr_oseh); both negative – consistent with “where high-vol is measurable, it’s adverse.”
79/79 green
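The regime test above amounts to a group-by with a fill floor. This sketch assumes each hypothetical fill is a `(regime_label, net_markout)` pair and uses the mean as the per-bin aggregate, which is an assumption about how the bins were scored.

```python
from collections import defaultdict
from statistics import mean

MIN_FILLS = 20   # fill floor from the notes


def tradeable_bins(fills, min_fills=MIN_FILLS):
    """Return {regime: mean markout} for bins that clear the fill floor
    and have a non-negative mean net markout.

    fills: iterable of (regime_label, net_markout) pairs.
    """
    by_bin = defaultdict(list)
    for label, net_markout in fills:
        by_bin[label].append(net_markout)
    return {
        label: mean(values)
        for label, values in by_bin.items()
        if len(values) >= min_fills and mean(values) >= 0
    }
```

Bins that look positive but fall under the floor are dropped rather than reported, which matches treating the hour-of-day results as noise.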

Clear caveat to note is that markets.yaml resolution dates are year-offset to 2027/2028 – the agent used the correct F.1 event dates as a workaround for the catalyst slice. That’s a latent data bug that’s worth fixing later (the resolution_date field in markets.yaml is wrong), although it didn’t affect this analysis since the agent used real dates.

May 28, 2026

[EXP 3] – fee model correction & sensitivity sweep

Findings from 3a

stage-1 gate is now answered: corrected-taker = 0/15 flips. THERE IS NO EXECUTABLE ARB THAT SURVIVES REALISTIC TAKER FEES.
execution-mode gradient: 0 taker —> 1 mixed —> 7 both-maker. this is the EXP-12 thesis being quantified. Edge doesn’t exist if the agent’s simply crossing the spread, it only exists as it moves to passive execution.
- The opportunity on these venues is liquidity provision, not arbitrage.
Need to look into arod re-characterization, because Build D found that arod’s +5.85¢ paper edge clears the fee threshold. This now shows that +5.85¢ was mid-discrepancy, not at-the-touch spread – the executable at-the-touch spread was ~1.1¢, which doesn’t clear anything.
- Proper finding isn’t that “depth binds before fees”, it’s that fees bind under taker; depth only binds once the PM leg flips to maker.

updated thesis about agentic arb: no capturable edge exists, at least on the taker side at any market in the panel under the correct fee structure (simulated). However, there may be opportunity to predict liquidity provision / maker side (EXP-12), or take advantage of latency / lead-lag (EXP-4), where the edge isn’t to cross the spread for free, but to be on the right side before the slower venue reprices.

Findings from 3b

goal here was to gain a multi-snapshot persistence understanding of the previously identified takeable subset. I then prompted to convert the single-snapshot $73-at-institutional-fee-structure result into a frequency-characterized one: across the full daemon history, how often is each of the 8 crossed markets actually crossed, how big is the takeable edge when it is, and how does it correlate across markets (i.e. single regime or independent).
Results:
- 100% of snapshots have ≥ 1 takeable market crossed at institutional fees across the 14.5-hour measured window (Median total $190.82).
- 5 persistent (nyk, kelce, co_pval, pe_rpal, la_kbas), 2 INTERMITTENT (kr_oseh, arod), 1 RARE (co_aesp), 0 snapshot only.
- nyk seems to be driving most of the headline – $165.89 median, 100% crossed for 14 hr straight, max $406.
- Median |corr| across the 6 variable markets was 0.15, which indicates edges are largely independent, not a single regime. This is actually the most favorable structural outcome of the sweep.
- co_aesp’s EXP-3b $23 figure doesn’t seem to replicate here, only 4.3% intraday, clustered in the 04Z window. The D.2 snapshot caught a transient event, not a recurring window.
- arod showed mild TOD patterns (high 04-07Z, drop between 09-11Z, recovers 13-16Z) – partial fit to “US-business-hours liquidity.” 51/51 tests still pass
Concerns:
- nyk at $165 median, 100% crossed for 14 hours isn’t real institutional arb, of course. no actor with 0.30% access would let that sit for multiple seconds even, let alone hours. there are two possible explanations I can think of
  - (a) nobody currently has 0.30%/0.20% access on these venues – institutional fees are counterfactual, so the ‘edge’ sits there because no one can take it. the arb is real only at a fee tier that doesn’t really exist for any market participant on kalshi/polymarket today
  - (b) adverse selection – K bid 0.30 / PM ask 0.285 are informed quotes; lifting them is structurally a losing trade because the quoters know something. the ‘exclusive-fill’ assumption inflates the dollar figure.
- The key assumption now is exclusive-fill at displayed depth. that’s the same category error EXP-3a’s direction fix caught, one level deeper: direction-correct but adversity-blind.
- also, the co_aesp non-replication suggests the original figure was a single-snapshot outlier
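To see how thin the institutional-fee edge is even before adverse selection, the quoted K bid 0.30 / PM ask 0.285 example can be run through a simple fee model. This is an illustrative sketch only: it assumes fees are a flat percentage of each leg's notional, and it assigns 0.30% to the sell leg and 0.20% to the buy leg. Neither assumption is confirmed as the venues' actual fee schedule.

```python
def net_edge_per_contract(sell_bid, buy_ask, sell_fee_rate, buy_fee_rate):
    """Edge from buying at buy_ask on one venue and selling at sell_bid on the
    other, net of proportional fees on each leg's notional (assumed fee model).
    """
    gross = sell_bid - buy_ask
    fees = sell_bid * sell_fee_rate + buy_ask * buy_fee_rate
    return gross - fees


# Quotes from the notes; fee-to-venue assignment is an assumption.
edge = net_edge_per_contract(sell_bid=0.30, buy_ask=0.285,
                             sell_fee_rate=0.003, buy_fee_rate=0.002)
print(f"net edge per contract: {edge * 100:.3f} cents")
```

Under these assumptions most of the 1.5¢ gross spread survives fees, so a 5-minute markout of -1 to -2¢ per contract would be enough to erase it. That is consistent with explanation (b).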

Key outcomes for the rest of the broader ‘agentic PM terminal’ build are as follows:

institutional-fee arb result is structurally real (8 markets, mostly persistent, independent) but the persistence itself is evidence that it’s not freely takeable. if 0.30% access existed for any market participant, nyk’s 14-hour $165 would have been taken.
- possibilities for why this is happening were discussed above (that fee tier doesn’t actually exist, or it exists but quotes are informed/adversely-selected). In the latter case, the agent’s edge is adverse-selection avoidance, not fee-tier-access.
the agentic arbitrage thesis is essentially null at this point.
new focuses of the build are as follows:
- EXP-4 (exploiting latency / lead-lag). the Knicks market confirmed that Kalshi leads Polymarket in most cases; i’ll test this with the upcoming Colombia election market resolution point this Sunday. I like this thesis because lead-lag doesn’t require fee-tier access, it just requires being faster than the slower venue’s pricing.
- EXP-12 (liquidity provisioning – LP layer). EXP-3a & c provided clear anchors for this – 8 markets with provideable spread, and with the PM rebate active co_aesp goes from -0.000c to +0.670c per contract when posting passive. This is a different agent than an arb-focused trader: a market-making agent with fill risk and inventory.
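A minimal sketch of how the EXP-4 lead-lag check could be framed, assuming two mid-price series sampled on the same clock at equal intervals. The function names and the toy series are hypothetical, not taken from the repo; the idea is simply to correlate one venue's price changes against the other's at a range of lags and see which offset fits best.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def diffs(series):
    """First differences of a price series."""
    return [b - a for a, b in zip(series, series[1:])]

def best_lag(leader, follower, max_lag=5):
    """Return (lag, corr) maximizing corr(leader change at t, follower change at t + lag).

    A positive lag means `leader` moves first; a negative lag means it moves second.
    """
    dl, df = diffs(leader), diffs(follower)
    best = (0, -2.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = dl[: len(dl) - lag], df[lag:]
        else:
            a, b = dl[-lag:], df[: len(df) + lag]
        if len(a) < 3:
            continue  # too few overlapping points to say anything
        c = pearson(a, b)
        if c > best[1]:
            best = (lag, c)
    return best

# Toy data (hypothetical): Polymarket replays Kalshi's mid two polls later.
kalshi = [0.50, 0.50, 0.52, 0.52, 0.49, 0.49, 0.53, 0.53, 0.51, 0.51, 0.55, 0.55]
polymarket = [kalshi[0]] * 2 + kalshi[:-2]
lag, corr = best_lag(kalshi, polymarket)
```

On real data the peak would be much noisier, so the lag estimate is only as good as the polling interval allows: a 30s poller cannot resolve a lead shorter than 30s.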

May 28, 2026

[F] – Event Driven Dynamics Test

This dev task’s timing depends on when the Colombia catalyst actually happens. First round election is Sunday May 31, 2026. Note that the catalyst is not a single instant. Polls open at 8:00AM (local) and preliminary results are typically available by 7:00PM (local). So, the resolution window is a several-hour evening of vote counting.

Goal: build the event-driven capture harness (F.1) for the Colombia first-round presidential election, Sunday 2026-05-31. This is a targeted high-frequency overlay + windowing harness, NOT a second 16-market poller. The existing E.1 daemon keeps running at 30s across all markets; F.1 adds dense capture on the Colombia markets only during the event window, plus the analysis scaffold to window data around the catalyst.

May 27, 2026

[E] – time-of-day/day-of-week autonomous polling agent

This is a background build – the idea is to open the poller now and hold it open through upcoming resolution events relating to the 16 markets in the updated dataset.

Design choices:

include all 16 markets, including degenerate ones. whether a lighthoused book ever revives over a week is itself a datapoint. I’ll tag these markets so analysis can segment active vs. degenerate markets.
interval is 30s. it’s free on rate-limits and increases time-of-day resolution significantly. only cost is the scheduler.
metrics: reusing fetch_snapshot.py machinery (normalize —> microstructure —> arb mid-discrepancy), append one long-format row per market per snapshot to data/processed/timeofday_poll.csv, plus raw JSON dumps to the gitignored data/raw/ as recompute insurance. no parallel fetch path.
timestamps will be stored as UTC, tz-aware throughout. this is quite load-bearing.
Failures should show as a null-row + error string, so as to not crash the series without detection.
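A minimal sketch of the null-row-on-failure and tz-aware UTC choices above. The `fetch` callable, the field names, and the book dictionary shape are all hypothetical stand-ins for the real fetch_snapshot.py machinery, which is not shown in these notes.

```python
from datetime import datetime, timezone

FIELDS = ["ts_utc", "market", "venue", "mid", "spread", "error"]

def poll_row(market, venue, fetch):
    """Build one long-format row for a single market/venue poll.

    A failed fetch yields a row with null metrics plus the error string,
    so the time series keeps its cadence and the failure stays visible.
    """
    # tz-aware UTC timestamp, serialized with an explicit +00:00 offset
    ts = datetime.now(timezone.utc).isoformat()
    row = {"ts_utc": ts, "market": market, "venue": venue,
           "mid": None, "spread": None, "error": ""}
    try:
        book = fetch(market, venue)  # hypothetical: returns best_bid / best_ask
        row["mid"] = (book["best_bid"] + book["best_ask"]) / 2
        row["spread"] = book["best_ask"] - book["best_bid"]
    except Exception as exc:  # deliberately broad: the poller must not die mid-series
        row["error"] = f"{type(exc).__name__}: {exc}"
    return row

def healthy_fetch(market, venue):
    return {"best_bid": 0.45, "best_ask": 0.46}

def broken_fetch(market, venue):
    raise ConnectionError("connection dropped")

ok_row = poll_row("okc", "kalshi", healthy_fetch)
bad_row = poll_row("nyk", "polymarket", broken_fetch)
```

Keeping the error as a string column means the error rate can be computed from the same CSV as the metrics, without a separate log.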

At this point, the poller is running. max_cycles=unlimited, and I have some clean daemon cycles logged with 16/16 markets, 0 errors. I’ll CTRL+C out of the log, but will come back in an hour to check that per-poll elapsed time isn’t creeping higher, since we only have 20s of headroom before it collides with the 30s interval.

Update (+11hrs): largest gap is still the same value as the night before; 4 errors / 93k rows = ~0.00% error rate. The 4 errors resolved themselves into a known quantity. They’re all PolyApiException[status_code=None, ... Request excep[tion]]; status code None means these are client-side network blips – transient connection drops to Polymarket, not API rejections or rate limits, which would be much more concerning.

Overall – poller is healthy after extended timeframe.

May 27, 2026

[D] – expand market coverage

settling for 10-15 pairs in this initial sweep of build D; selection criteria include a minimum $50k combined cross-venue OI as the volume floor (rids the $52 World Cup markets from Phase 2), explicit stratification across probability ranges (some central 0.3-0.7, some tail < 0.1 to test lighthouse/delisting at scale), and excluding anything resolving in < 2 days; no additions to markets.yaml for now.

Agent round 1: building scripts and producing candidate table for review

patch: edited src/pm_micro/discovery.py and scripts/validate_markets_yaml.py, replacing any ‘assert len(token_id) == 77’ with ‘assert 76 <= len(token_id) <= 78 and token_id.isdigit(), f"Token ID malformed: len={len(token_id)}, value={token_id[:20]}…"’
- reran scripts/discover_markets.py from scratch (no cache layer expected).
- regenerated data/processed/discovery_candidates.md.
reports
- number of previously score-zeroed rows that are now scored
- new top-10 by match_score
- confirmation that NBA Finals trio scores are unchanged
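A minimal sketch of the loosened token-ID check from the patch above, written as a standalone helper. The function name is hypothetical, and it raises ValueError instead of using a bare assert (a swapped-in choice, since asserts are stripped when Python runs with -O); the length bounds and the digit check are the ones stated in the patch.

```python
def validate_token_id(token_id: str) -> str:
    """Return token_id unchanged if it looks like a full-length numeric token ID.

    Accepts 76-78 decimal digits, per the loosened rule in the patch.
    """
    if not (76 <= len(token_id) <= 78 and token_id.isdigit()):
        raise ValueError(
            f"Token ID malformed: len={len(token_id)}, value={token_id[:20]}..."
        )
    return token_id

def is_valid_token_id(token_id: str) -> bool:
    """Boolean convenience wrapper around validate_token_id."""
    try:
        validate_token_id(token_id)
        return True
    except ValueError:
        return False

full_length = "1" * 77          # synthetic example, not a real token
truncated = "12345678901234"    # 14 characters, the shape of a truncated ID
```

A truncated ID fails loudly here rather than producing a silent 404 downstream.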
WORKED:
- NYK bucket drift shows a real-time confirmation of Finding 3. Knicks-clinch repricing was still active during the re-run – Polymarket-Kalshi midprice continues to move in alignment ~ 36 hours after the signal. This is a clear datapoint for the “silent tracking through some channel that isn’t trade execution” finding.
- Top-10 contamination: tells us the matcher’s ceiling. out of 10 rows: 1 verified true pair (Kelce), 2 plausible (CA Gov primary at #8/#10 – Becerra and Swalwell in the same race, so token-set is right for the wrong reason), 7 false positives (5 XRP date-coincidences, Trump+World Cup, Bernie+Alaska Senate). That’s expected – token_set_ratio without semantic gating will keep doing this on finance/sports/geographic terms. Unfortunately, adding more compute to fuzzy matching won’t help; i’ll need to review manually.
- The recovered 6 pairs are the real output, since they span CA-11 House primary (2 candidates), AK Senate, tariff macro, Aaron Rodgers retire, and Trump attending NBA Finals. That’s the category diversity the original top-10 didn’t have – politics + sports + macro all in one batch.
follow-on task (curation): produced a curated, eyeball friendly view of the 92 candidates for manual selection. didn’t modify markets.yaml. didn’t re-run discovery so as to keep same set of pairs identified per selection criteria. Read from data/processed/discovery_candidates.md as the source of truth.
- Results:
  - 19 same_event is exactly the actionable set. the 55 shared_domain_only + 13 shared_entity_only confirm what I’d suspected initially, that token_set_ratio is a noisy signal, but with semantic tagging, we can pull the real candidates cleanly. The 0 same_race_diff_side is itself a finding – pm venues converge on the same candidate set per race, so there’s no naturally-occurring cross-venue “A on Kalshi, B on Polymarket” structure within a single contest. this is worth noting.
  - The 13 shared_entity_only politics rows (primary-vs-general) are a different research inquiry. These rows enable a conditional pricing analysis that’s structurally different from the same-orderbook-two-venues setup behind all current findings.
    - This is good D3 build content
  - So, total 13 new additions —> 16 in markets.yaml, good outcome from additional sweep

Agent round 2: add picked entries to markets.yaml and rerun Ph 3/4. keeping curation decisions gated by me.

Goals for round 2 are as follows:

expand markets.yaml from 3 —> 16 entries with the 13 picks from below, fixing the validator’s 3 outstanding 404 errors via explicit delisted markers, and re-run Ph3/4 on the expanded dataset.
Guardrails: existing 3 markets.yaml entries must remain functional, so I don’t want to alter their condition_id, token_ids, or category listing. I’m only going to add the delisted markers per the spec.
- Use the new discovery.py helpers; don’t introduce parallel fetch paths
- use the loosened ID validation (76 ≤ len(token_id) ≤ 78 AND isdigit)
- no new src/pm_micro/ modules. new files allowed under scripts/, data/, notebooks/ per my allowance for new code organization
- on any fetch failure, the agent should stop and report rather than automatically retrying the fetch.
To-dos:
- write scripts/expand_markets_yaml.py (fetch + validate 13 picks, resolve LA/Cultural picks, build merged yaml)
- run expansion script; inspect resulting markets.yaml
- run validate_markets_yaml.py – must exit 0
- run fetch_snapshot.py + compute_arb.py --fresh
- run pytest to ideally return 6/6
- report + commit the added 13 to markets.yaml

May 27, 2026

follow-on builds

Non-exhaustive list of additive empirical studies & simulations to run that build off the findings of the executable arb analysis.

Current Claim: paper discrepancies of 0.5–1¢ sit below the ~3¢ conservative fee threshold, so executable arb is zero, so value of a terminal is in observability, not capture. Is this true?

This claim holds for the specific markets I’ve sampled, at the specific times I’ve sampled them at, under the specific fee assumptions I’ve used. It’s one data point against a thesis that’s already been executed (Lean.xyz).

Below is a thorough map of where the experimentation can go. Organized from cheapest / most-direct to broadest.

Layer 1 – Re-examining current assumptions

A. Sample markets at higher frequency. With minute-level or sub-minute polling, I could observe intraday spread movement that’s not visible under the existing design.

Hyp: paper discrepancies briefly spike above the fee threshold during news events, then decay without notice.
Method: Poll both venues every 15s for 4 hours during a known volatility event and look at the right tail of the discrepancy distribution
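A minimal sketch of the right-tail look described in the method above, assuming the discrepancies have already been collected as a list of cents values (one per 15s poll, so 960 polls over 4 hours). The function name and the toy data are hypothetical; the 3¢ default mirrors the conservative fee threshold in the current claim.

```python
from math import ceil

def right_tail_summary(discrepancies_cents, fee_threshold_cents=3.0):
    """Summarize the upper tail of |discrepancy| and the share above the fee threshold."""
    abs_d = sorted(abs(d) for d in discrepancies_cents)
    n = len(abs_d)
    if n == 0:
        raise ValueError("no observations")

    def quantile(p):
        # nearest-rank quantile: smallest value with at least p of the mass at or below it
        idx = max(0, min(n - 1, ceil(p * n) - 1))
        return abs_d[idx]

    above = sum(1 for d in abs_d if d > fee_threshold_cents)
    return {
        "n": n,
        "p50": quantile(0.50),
        "p99": quantile(0.99),
        "max": abs_d[-1],
        "share_above_fee": above / n,
    }

# Toy data (hypothetical): 950 quiet polls at 0.5c and 10 spike polls at 4c.
toy = [0.5] * 950 + [-4.0] * 10
summary = right_tail_summary(toy)
```

If the hypothesis holds, the median stays under the threshold while p99 and the maximum sit above it, and share_above_fee gives the fraction of polls in which a spike was live.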

B. Calibrate fees properly instead of using conservative defaults. I’d assume that real arbitrageurs on prediction market systems likely aim for closer to 0.5–1¢ round-trip fees on volume (Polymarket has maker rebates & volume tiers to optimize with; Kalshi has market-maker programs with reduced per-contract fees). I could re-run the executable arb computation under a realistic fee model – maker-side on both venues with volume-tier discounts. If the threshold drops, some of the observations become marginal arb opportunities.

C. Account for latency, partial fills, and adverse selection in execution simulation. The current compute_executable_arb() function assumes simultaneous fills at observed prices. Real cross-venue execution faces sub-second routing, race conditions with other arbitrageurs, partial fills on one leg, and the risk that the price moves between when I’d read the orderbook and when the order actually arrives.

Method: Build a stochastic execution model with realistic latencies (e.g., 50ms – 300ms per venue) and quote-staleness, then re-run.
Expected outcome: even paper-arb opportunities lose significant edge to execution friction (e.g., 30%–70%)
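A minimal sketch of the stochastic execution model in item C, under loudly stated assumptions: per-leg latency is uniform on 50-300 ms, the slower leg sets the exposure time, the mid drifts as a driftless Gaussian random walk, and any drift is treated as adverse (a pessimistic simplification). The function name and the volatility parameter are hypothetical.

```python
import random
from math import sqrt

def simulate_retained_edge(paper_edge_cents, sigma_cents_per_sqrt_s,
                           n_trials=10_000, latency_ms=(50.0, 300.0), seed=0):
    """Monte Carlo estimate of the fraction of paper edge that survives latency.

    Assumptions (not from the repo): uniform per-leg latency, exposure equal to
    the slower leg, Gaussian price drift, and all drift counted against the trade.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        slow_leg_s = max(rng.uniform(*latency_ms), rng.uniform(*latency_ms)) / 1000.0
        drift = abs(rng.gauss(0.0, sigma_cents_per_sqrt_s * sqrt(slow_leg_s)))
        total += max(0.0, paper_edge_cents - drift)  # edge cannot go below zero here
    return total / (n_trials * paper_edge_cents)

no_vol = simulate_retained_edge(1.0, 0.0)     # frozen quotes: nothing is lost
calm = simulate_retained_edge(1.0, 0.5)
volatile = simulate_retained_edge(1.0, 2.0)
```

One minus the returned fraction is the share of edge lost to friction, which is the quantity the expected-outcome range refers to. Partial fills and queue races are not modeled in this sketch.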

Layer 2 – Scale the dataset being tested

D. [COMPLETE] Expand market coverage. Three NBA Finals markets is statistically thin. I could expand to 20-50 cross-venue pairs across categories – election markets, Fed meetings, crypto price levels, sports finals across multiple leagues. Even if the coverage discovery work is painful (which I know from the previous build), a larger dataset could surface patterns that a curation of 3 markets simply cannot.

Hyp: Arb opportunities concentrate by category – political markets may show larger and more persistent discrepancies than sports because the venues attract different trader demographics.

E. Time-of-day, day-of-week effect.

Hyp: cross-venue discrepancies are larger during low-liquidity hours (US nights & weekends) and tighter during peak liquidity. A terminal could highlight the predictable hours when arb capture is most accessible.
Method: use the same market set, retain snapshots of metrics per market per venue every 2 hours for a week.

F. Event-driven dynamics. Capture orderbook state immediately before, during, and after discrete events.

Hyp: discrepancies blow out for tens of minutes after an event as one venue re-prices faster than the other, then converge.
A terminal’s value proposition is “be in position before the other venue catches up.” This would be a directly product-relevant finding.

Layer 3 – Mechanism and venue-architecture studies

G. Decompose where the spread actually comes from. My current spread observations are aggregates. I could decompose them by the following criteria: how much of a venue’s spread is structural (rule-based price tick size, fee structure), how much is market-maker inventory cost, and how much is adverse selection? This determines whether spread is something a terminal user can avoid or just absorb.

H. Market-maker presence detection. I could identify which markets on each venue have institutional vs. retail-only quote provision. Polymarket and Kalshi each have known market-maker programs.

Hyp: markets with active MMs on both sides show tight spreads but limited cross-venue discrepancy; markets with MM on only one side show wider spreads but persistent cross-venue gaps.

I. Synthetic-arb opportunity surface. My existing dataset showed direct (YES vs YES) and synthetic (YES_kalshi + NO_polymarket) structures. I could expand this approach using triangular arbs across multiple markets that resolve on related events (e.g., Lakers championship + Lakers conference + Lakers division – these are all dependent events … parlays).

Hyp: structural inefficiencies exist not just within a single market across venues, but across related markets on the same venue.

J. Orderbook imbalance as a leading indicator. I could compute the imbalance between bid and ask sizes at top-of-book on each venue.

Hyp: persistent imbalance on one venue could predict the direction the cross-venue spread will move in.
A terminal could surface this in real-time as a leading indicator of WHERE TO POSITION
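A minimal sketch of the top-of-book imbalance in item J, using the common normalized form (bid size minus ask size, over their sum). The function names, the persistence rule, and the threshold values are hypothetical choices for illustration.

```python
def top_of_book_imbalance(bid_size, ask_size):
    """Normalized imbalance in [-1, 1]; positive means more size resting on the bid."""
    total = bid_size + ask_size
    if total == 0:
        return 0.0  # empty top-of-book: treat as balanced
    return (bid_size - ask_size) / total

def is_persistent(imbalances, threshold=0.5, min_run=3):
    """True if the last `min_run` readings all exceed `threshold` on the same side."""
    tail = imbalances[-min_run:]
    if len(tail) < min_run:
        return False
    return all(x > threshold for x in tail) or all(x < -threshold for x in tail)

readings = [top_of_book_imbalance(b, a)
            for b, a in [(100, 100), (300, 100), (400, 50), (900, 100)]]
```

Testing the hypothesis would then mean regressing the next change in cross-venue discrepancy on these readings, which this sketch does not attempt.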

Layer 4 – Simulation && What-if studies

K. Counterfactual MM strategy on the current data. I could take my existing snapshots and ask: if a market-maker were placing both-sided quotes on both venues, what spread would they need to charge to break even after adverse selection?

Method: run a simulated MM agent through my orderbook history.

L. Simulate a terminal-style trader walking the cross-venue surface.

Method:
- Build a backtest where an agent observes both venues, has a terminal’s hypothetical latency profile (e.g., 50ms cross-venue routing), and tries to capture observed discrepancies.
- Compare against a naive trader using each venue alone.
- The performance gap is the dollar value of cross-venue infrastructure.
This is the direct answer to whether a terminal for cross-venue prediction market arb is actually valuable.

M. Stress-test the fee threshold sensitivity. Plot executable arb opportunity volume as a function of round-trip fee.
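A minimal sketch of the sensitivity curve in item M, assuming each observed opportunity has already been reduced to a (gross edge in cents, contracts available) pair. The function name and the toy observations are hypothetical.

```python
def arb_volume_by_fee(opportunities, fee_grid_cents):
    """Contracts that remain profitable at each round-trip fee level.

    opportunities: iterable of (gross_edge_cents, contracts).
    Returns {fee: total contracts whose gross edge strictly exceeds the fee}.
    """
    opportunities = list(opportunities)
    return {
        fee: sum(size for edge, size in opportunities if edge > fee)
        for fee in fee_grid_cents
    }

# Toy observations (hypothetical): mostly sub-1c edges plus one wide outlier.
toy_opps = [(0.5, 200), (0.8, 150), (1.0, 100), (3.5, 20)]
curve = arb_volume_by_fee(toy_opps, [0.0, 0.5, 1.0, 3.0, 4.0])
```

Plotting the dictionary's values against its keys gives the curve; the interesting region is where it falls off between the calibrated fee level and the conservative ~3¢ threshold.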

Layer 5 – Broader venue-design research

N. Inventory-aware quoting on cross-venue books. A terminal-style MM has visibility into both venues. Question is, how does optimal quoting change when I know the inventory on Venue A while quoting on Venue B?

O. Quote update latency across venues. Measure how quickly each venue’s orderbook responds to trades on the other venue.

If Polymarket updates within 100ms of a Kalshi print but Kalshi takes 5s to react to a Polymarket print, the asymmetry itself is exploitable and a terminal can surface it.

P. Resolution risk model. When CLE delisted on Polymarket but Kalshi maintained stub liquidity, that was structural information.

Method: Build a model that predicts which markets on which venues are likely to delist or enter lighthouse mode based on liquidity and probability state.
This is essentially the kind of risk-surface model a professional trader terminal would be designed to expose.

Q. Bridging on-chain (Polymarket) and off-chain (Kalshi) settlement risk. Kalshi settles in dollars same-day; Polymarket settles in USDC on Polygon with on-chain risk and gas costs. A cross-venue trader has to bear settlement-currency risk that’s invisible to my current execution model.

Quantifying this is really hard, but important.

Layer 6 – What this connects to in my broader work

R. Connect to my venue-mechanism sweep. My orderbook-amm-hybrid-sim finding was that hybrid CLOB + LP venues reduce agent-to-agent trade volume by 50% by absorbing noise flow. The cross-venue work is the empirical version of that question: does a cross-venue MM layer (which is effectively what Lean is) produce the same noise-absorption property? I have both halves of the question; the synthesis could be more valuable than either alone.

S. Reproduce the pm-AMM analytical results empirically. The pm-AMM paper derives volatility shape phi • 39.9 / sqrt(steps_remaining) from Gaussian score dynamics. My data – essentially just multiple snapshots of the same market approaching resolution – is exactly the kind of data I’d need to verify their analytical result empirically. Whether real prediction markets actually exhibit Gaussian score dynamics is itself an open empirical question.

May 26, 2026

Results & Documentation

Note that all exec_* fields are 0.0 across every market & every structure. The full book-walk produced zero fillable contracts for every market-structure combination. Every cell of the executable arb table is 0.0.

What’s different than expected is that the NYK row has discrepancy_direct = -0.15 cents, meaning Polymarket YES at 0.2585 is cheaper than Kalshi YES at 0.260, but executable direct is still 0.0. This is odd, but makes sense because lockable spread is computed at top-of-book, and the executable computation requires crossed books (best bid on one venue > best ask on the other). A 0.15-cent mid-discrepancy doesn’t necessarily produce a crossed book.

To summarize: mid-discrepancy ≠ executable arb opportunity. Even though the NYK market showed a 0.15c paper discrepancy in the mids, neither best bid crosses the other market’s best ask (polymarket best bid 0.258 < Kalshi ask 0.27; Kalshi best bid < polymarket best ask 0.259).
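A minimal sketch of the NYK comparison above, showing a nonzero mid-discrepancy alongside an uncrossed book. The Polymarket quotes and the Kalshi ask are the figures quoted in the summary; the Kalshi best bid of 0.25 is an assumption, inferred from the stated 0.260 mid and 0.27 ask. The function name is hypothetical.

```python
def crossed_edges(pm_bid, pm_ask, k_bid, k_ask):
    """Per-contract locked spread before fees, for each direction.

    A value <= 0 means that direction is not crossed and nothing can be locked in.
    """
    return {
        "buy_kalshi_sell_polymarket": pm_bid - k_ask,
        "buy_polymarket_sell_kalshi": k_bid - pm_ask,
    }

pm_bid, pm_ask = 0.258, 0.259      # Polymarket NYK YES top-of-book
k_bid, k_ask = 0.25, 0.27          # k_bid is an inferred assumption (see lead-in)

mid_disc = (pm_bid + pm_ask) / 2 - (k_bid + k_ask) / 2   # Polymarket mid minus Kalshi mid
edges = crossed_edges(pm_bid, pm_ask, k_bid, k_ask)
any_crossed = any(edge > 0 for edge in edges.values())
```

Both directions come out negative even though the mids differ, which is the whole point: Kalshi's 2¢ spread comfortably contains Polymarket's 0.1¢ spread.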

This tells us that no edge exists at all at the top-of-book level in the first place, because the spreads on at least one venue are wide enough that no cross-venue match occurs.

Thus, a trading platform can benefit a user by observing pricing convergence and venue-specific liquidity events in real time, since the edge that exists at the mid level never crosses the book and so never offers an executable trade.

Full Project Recap

The goal was to build a real-data artifact that explores cross-venue prediction-market trading infrastructure. I wanted to operate on native Kalshi / Polymarket data.

I already had a venue-mechanism sweep in pure simulation to stress-test hybrid vs. amm vs. clob orderbook designs, and an LS-LMSR liquidity model built. Both were technical, but neither operated on actual prediction market data.

What was built:

A new repository – kalshi-polymarket-microstructure - public.

I started with a pipeline validation block. Repo was scaffolded with uv + pyproject.toml. I then built two thin venue clients: clients/kalshi.py (public /markets/{ticker}/orderbook endpoint, no authentication necessary) and clients/polymarket.py (read-only ClobClient + Gamma API for market discovery). I aimed to generate one validation notebook pulling one orderbook from each venue.

The next phase of the build (Ph2) involved mapping existing public markets. The initial goal was to curate 8 cross-venue pairs across 4 categories (sports, macro/Fed, politics, crypto). This discovery process surfaced that Kalshi’s /markets?status=open&limit=500 returns auto-generated parlay markets (KXMVESPORTSMULTIGAMEEXTENDED with 418 entries) that swamp the result set. I then pivoted to the Kalshi /series endpoint for proper catalog browsing. This revealed the cross-venue universe was much thinner than I’d initially anticipated – many Kalshi series (Fed, Politics) had no Polymarket equivalents, and many Polymarket markets (GTA VI tail bets) had no Kalshi equivalents. I ended up settling on 3 NBA Finals 2026 markets – OKC, CLE, NYK – as the only category with substantial bilateral volume ($110 combined open interest). I added a helper function – kalshi.search_markets() – and documented the edge cases (NYK NO token 404, Polymarket silent 404 on malformed IDs). The ‘asymmetry beyond NBA Finals’ finding came out of this part of the project, which ended up being a headline finding.

Then, I started building the actual analysis library – src/pm_micro/normalize.py (a dataclass-based unified book representation; reconstructed Kalshi asks by complementarity) and src/pm_micro/microstructure.py (spread/depth/mid computations). This library involved 6 pytest tests covering the load-bearing math – a snapshot script fetched all three markets, normalized, and produced a CSV with 11 rows of metrics. The three figures in the repository visualize spread by venue, depth within 1 cent of mid, & book shape (price levels). 3-regime findings: OKC was clean, NYK showed a 20x spread asymmetry, CLE demonstrated a one-sided tail.

Phase 4 – cross-venue arb. I implemented src/pm_micro/arb.py with 3 layers (paper mid-discrepancy, naive crossed-book, executable-after-fees) and 2 structures (direct YES vs YES, synthetic YES_Kalshi + NO_Polymarket). It also involved integrating the conservative fee model (2% Polymarket taker; $0.02/contract on Kalshi), which required a driver script supporting both --fresh and snapshot-replay modes. The fresh run revealed that OKC’s paper discrepancy had decayed from 1 cent -> 0.5 cent -> 0.0 cent over ~36 hours, and that Polymarket’s NYK book had disappeared entirely between snapshots. Executable arb was 0 across every market-structure combination.

24 hr update to market states: new 404s

Today, the following features were meant to be run / implemented:

snapshot ledger – create data/processed/snapshot_ledger.yaml; populated with the three known historical observations plus a fourth from a new fresh fetch
run one more fresh fetch (uv run python scripts/compute_arb.py --fresh). i then examined the output and appended a 4th entry to snapshot_ledger.yaml using the actual timestamp of this run and the actual OKC discrepancy from the output. the status of the NYK YES Polymarket book is whatever the run reports (likely still a 404; i’ll document the actual observation). the idea was that if the 4th OKC discrepancy is 0.0 (i.e. convergence held), the finding would be strengthened. if it had reverted (i.e. back to 0.5cents or 1 cent) the finding changes and I’d document that.
i’d then update scripts/compute_arb.py to append the new ledger writes. I actually modified the script so that each --fresh run automatically appends an entry to snapshot_ledger.yaml. Specifically, at the end of main() when running with fresh=True, the script would read the existing ledger, append a new entry built from the actual run output for OKC, and write back.
lastly, I intended to create notebook/04_writeup_figures.ipynb to document everything

Here’s what actually happened after implementing all these new builds / changes:

The CLE Polymarket book had also 404’d. This was pretty substantial. Going in, the assumption was that NYK was the only market that Polymarket had withdrawn from. The 4th fresh run shows that CLE’s Polymarket YES is now 404 too – same ‘No orderbook exists for the requested token id’ error as I’d previously gotten for NYK. The last snapshot had it active (138 ask levels, pricing the tail at ~0.4%). This meant one of my findings just got stronger: the de-listing pattern isn’t NYK-specific – it’s two of the three Polymarket books in the dataset I curated. Polymarket is systematically withdrawing liquidity from these markets as the Finals approach resolution, but Kalshi seems to be maintaining books on all three throughout. This is a much sharper venue-resilience claim than I’d originally had, if it’s correct.
OKC convergence seems to have held – and both venues moved together. Between observation 3 (0.00¢) and observation 4 (0.00¢), both venues’ OKC mids dropped from 0.485 to 0.455 (a 3-cent re-pricing of the championship probability, presumably from an actual NBA Finals game between fetches). The discrepancy stayed at 0 through that move. The venues didn’t just equilibrate once and freeze – they’re tracking each other through subsequent price changes.
The auto-append spec bug is correctly fixed. The agent (Opus 4.7) caught that the literal "404" if nyk_row["Polymarket_yes_mid"] is None else "active" check would mislabel snapshot-fallback rows as “active” because the fallback preserves the snapshot’s non-None mid value. They fixed it by extending the check to also flag data_source == "snapshot_fallback" as 404. This is good because the bug would have polluted future ledger entries.
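A minimal sketch of the corrected status check described above. The function name and the example rows are hypothetical; the two conditions (a None mid, or a data_source of "snapshot_fallback") are the ones named in the fix.

```python
def polymarket_book_status(row):
    """Label a Polymarket row for the ledger as "404" or "active".

    A snapshot-fallback row keeps the old snapshot's non-None mid, so checking
    the mid alone would mislabel it as active; the data_source check closes that gap.
    """
    if row.get("data_source") == "snapshot_fallback":
        return "404"
    if row.get("Polymarket_yes_mid") is None:
        return "404"
    return "active"

live_row = {"Polymarket_yes_mid": 0.455, "data_source": "fresh"}        # hypothetical values
fallback_row = {"Polymarket_yes_mid": 0.004, "data_source": "snapshot_fallback"}
missing_row = {"Polymarket_yes_mid": None, "data_source": "fresh"}
```

The "fresh" label for a live data_source is an assumption; only "snapshot_fallback" is named in the notes.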

Concern

I wasn’t sure that Polymarket’s NYK book had disappeared – wanted to briefly verify all of the aforementioned claims about snapshot discrepancies. Ran a terminal command to directly query Polymarket for the NYK YES book using the markets.yaml token, and also re-fetch the canonical token ID via Gamma API to compare. This does a few things: prints the token IDs I’d previously stored in markets.yaml; fetches the orderbook directly from those tokens; & re-discovers the Knicks market via Gamma API and shows the current canonical token IDs, plus whether the market is active / closed.

Outcome: The finding was pretty concerning. The returned Gamma API tokens DID NOT match the stored tokens. This means either Polymarket re-issued tokens for the Knicks market since I initially curated, or (more likely) the values stored in markets.yaml are subtly different from the canonical ones.

Essentially the token IDs I was using to query Polymarket data were stored as truncated 14 character strings in markets.yaml. This was noted while creating the script, but a failsafe of “if Cursor or the validation step flags a length mismatch, re-fetch data” was implemented. Unfortunately, neither method caught the mismatch. Every Polymarket query since (each Phase 3 snapshot row for NYK & fresh fetch for NYK + the recent diagnosis) has been hitting Polymarket with 14-character long truncated token IDs for market search, of course then returning 404s, which I’ve been wrongfully interpreting as ‘Polymarket withdrawing liquidity’.

To address this, I parsed through markets.yaml to see what it actually contained – the damage was contained. Looking at the actual state, the OKC tokens are full-length and validated & CLE tokens are full length and validated. All three NYK IDs are truncated. I then fixed the markets.yaml NYK entry and adjusted it to use the canonical token IDs from the diagnostic just run. Note that I used yes_token_id and no_token_id ending in the values from the diagnostic. Then I retested CLE to verify whether the CLE Polymarket query genuinely 404s or if there’s a different bug.

CLE re-query: CLE tokens are actually full length (77 chars each) – and YES still returns 404 from Polymarket’s CLOB. This is arguably the cleanest possible diagnostic outcome for understanding what’s happening. Thus, the CLE 404 is real, not a token bug – Polymarket withdrew CLE (the extreme tail probability market, ~0.4% YES probability at Phase 3 snapshot). NYK appeared to disappear because of a stored-token truncation bug; the canonical NYK book is likely still live (pending agent’s fix & run).

After re-running the CLOB verification on the proper NYK tokens, NYK YES returned 125 bids & 90 asks, confirming a real, verified book and the suspected token bug. NYK NO returned 404 – genuinely delisted.

However, there’s some nuance left to consider in the NYK YES book. the 125-bid / 90-ask figure looks healthy, but looking at the prices – best_bid = 0.001, best_ask = 0.999 – the book has a 99.8-cent spread on a market where midpoint should be ~26¢. NYK is structurally active but in a degenerate state with quotes parked at the extremes – market-makers are no longer quoting a meaningful mid because everyone knows the answer, but they’re leaving boundary quotes in place to catch any flow.

The Knicks have presumably been eliminated, market-makers have packed up the meaningful quotes, and what’s left is degenerate boundary liquidity. This is an interesting microstructure finding: Polymarket leaves “lighthouse” quotes at 0.001/0.999 on effectively-resolved markets, rather than fully delisting (like they did with CLE and NYK NO).

Next Steps: re-run & regenerate

Step 1 is to re-run the fresh fetch. This pulls new orderbook data for NYK (now with correct tokens), and appends a fresh entry to snapshot_ledger.yaml, and regenerates arb_results_fresh.csv with correct NYK data.

worked, meaningful result
- OKC – mid_disc direct 0.00¢ synthetic=-0.00¢ – full convergence holds. 5th data point is at 0.
- CLE – still 404 Polymarket fallback to snapshot. This confirms CLE is genuinely delisted from Polymarket
- NYK – new data point: mid_disc direct=0.25¢ synthetic=n/a. Polymarket NO failed (404, structural – confirms my Phase 1 finding). But the direct discrepancy of 0.25¢ is important. Kalshi NYK and Polymarket NYK YES are both producing real prices at the moment.
  - Specifically, this means that Polymarket YES is in the aforementioned “lighthouse mode” – best_bid 0.001 / best_ask 0.999 – and Kalshi is pricing NYK YES somewhere around 25¢ from the prior snapshot. The 0.25¢ figure is a midpoint-of-mids value. Polymarket’s midpoint of (0.001 + 0.999) / 2 = 0.500, vs. Kalshi mid ~0.26. This makes sense given the mid_disc direct=0.25¢ means $0.0025, which is a conventional spread value. If so, Polymarket & Kalshi NYK mids are within 0.25¢ of each other.

Step 2 is to regenerate the writeup figures. This re-executes the writeup-figures notebook against the new ledger and arb CSV, producing fresh PNGs.

initially failed on a fixable bug
ValueError: time data "2026-05-26T20:54:50.706060+00:00" doesn't match format "%Y-%m-%dT%H:%M:%S%z"
- Pandas’ pd.to_datetime is getting stuck on the snapshot ledger’s timestamps because they contain microseconds, but the format string in the notebook didn’t allow for them.
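A minimal sketch of the timestamp mismatch using only the standard library, with the exact string from the traceback. The helper name is hypothetical. Either adding %f to the format or using datetime.fromisoformat handles the microseconds; recent pandas versions also accept format="ISO8601" in pd.to_datetime, though the notebook's actual fix is not recorded here.

```python
from datetime import datetime, timedelta

ts = "2026-05-26T20:54:50.706060+00:00"   # ledger timestamp from the traceback

def parses_with(fmt):
    """True if `ts` matches the given strptime format, False on ValueError."""
    try:
        datetime.strptime(ts, fmt)
        return True
    except ValueError:
        return False

# The notebook's original format has no slot for fractional seconds.
strict_ok = parses_with("%Y-%m-%dT%H:%M:%S%z")
# Adding .%f accepts the microseconds.
micro_ok = parses_with("%Y-%m-%dT%H:%M:%S.%f%z")
# fromisoformat handles both shapes without a format string.
parsed = datetime.fromisoformat(ts)
```

Since the ledger writes tz-aware UTC ISO strings, a format-free ISO parser is the less brittle option if some entries ever land exactly on a whole second.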

After re-fetching, found that Kalshi’s NYK book showed differently than initially thought. Kalshi NYK YES has 4.65M contracts bid at $0.01. And NO has 8.15M contracts bid at $0.01. That’s substantial stub liquidity at the extreme tails on both sides of the market. This is essentially the same “lighthouse mode” Polymarket showed, but at a much larger scale and on Kalshi. Both venues have effectively packed up the meaningful quotes on NYK and left only boundary liquidity. Market’s resolving towards no, likely because the Knicks have been effectively eliminated, or so I thought.

Note that the spread between the actual best bids on both sides of Kalshi tells a more interesting story. YES bid at $0.01, NO bid at $0.01 – meaning YES ask (=1 – NO bid) = $0.99. So Kalshi best_bid = $0.01, best_ask = $0.99 on YES. That’s identical to Polymarket’s degenerate state.
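A minimal sketch of the complementarity step used above: Kalshi exposes bids on YES and NO, and the YES ask is recovered as 1 minus the NO bid. The function name and the rounding to four decimals are hypothetical choices, not taken from normalize.py.

```python
def kalshi_yes_top_of_book(yes_bid, no_bid):
    """Reconstruct the YES top-of-book from Kalshi's two bid sides.

    Someone bidding `no_bid` for NO is implicitly offering YES at 1 - no_bid.
    """
    yes_ask = round(1.0 - no_bid, 4)
    return {
        "best_bid": yes_bid,
        "best_ask": yes_ask,
        "mid": round((yes_bid + yes_ask) / 2, 4),
        "spread": round(yes_ask - yes_bid, 4),
    }

# NYK lighthouse state from the notes: YES bid $0.01, NO bid $0.01.
nyk = kalshi_yes_top_of_book(0.01, 0.01)
```

The reconstructed mid of 0.50 is an artifact of the boundary quotes rather than a price, which is why a mid-based discrepancy between two lighthouse books carries no information.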

Both venues are in the same degenerate state. Both have the same midpoint by simple averaging: 0.50. The “0.25¢” mid-discrepancy reported by compute_arb.py --fresh is the difference between Polymarket’s $0.50 mid and Kalshi $0.50 mid – ~0. The 0.25¢ can be attributed to rounding noise, fee model, or size-weighted-mid drift. It’s not a real cross-venue opportunity.

At the end-of-event, both Kalshi & Polymarket converge to the same degenerate lighthouse state, which I find to be a really interesting microstructure observation. The venues agree even when they’ve stopped pricing meaningfully – boundary quotes at $0.01/$0.99 on both venues, no real flow possible.

May 25, 2026

Ph4 – executable arb table

Design Choices before building:

what actually counts as ‘arb’, and on what side?

three things could be on the arb table

paper mid-discrepancy only. for each market, compute polymarket_mid – kalshi_mid. rank markets by absolute discrepancy. doesn’t account for execution cost at all
naive crossed-book arb. for each market, check if Polymarket’s best bid > Kalshi’s best ask (i could buy on Kalshi, sell on Polymarket, lock in spread). then, compute the lockable spread and the size available at top-of-book. this is a bit crude, but still defensible
full executable arb after fees. for each market, compute the spread I can actually lock in after round-trip transaction costs on both venues: Polymarket maker rebates, Polymarket taker fees, Kalshi $0.02/contract execution fee. Essentially, walk the orderbook: if there’s a crossed market, how many contracts can actually be filled before the spread closes?

I’ll follow all three, in order, with the last as the headline output.
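The third option, walking the book until the spread closes, could look roughly like this. It is a sketch under stated assumptions: books are plain (price, size) tuples in dollars, fees are a single flat per-contract number, and the quotes in the example are invented for illustration.

```python
def walk_crossed_book(asks, bids, fee_per_contract=0.0):
    """asks: [(price, size)] on the buy venue; bids: [(price, size)] on the sell venue.

    Matches the cheapest asks against the richest bids while the per-contract
    edge after fees stays positive. Returns (contracts_filled, total_profit).
    """
    asks = sorted(asks)                    # cheapest first
    bids = sorted(bids, reverse=True)      # richest first
    i = j = 0
    ask_left = asks[0][1] if asks else 0
    bid_left = bids[0][1] if bids else 0
    filled, profit = 0, 0.0
    while i < len(asks) and j < len(bids):
        edge = bids[j][0] - asks[i][0] - fee_per_contract
        if edge <= 0:
            break                          # the spread has closed
        qty = min(ask_left, bid_left)
        filled += qty
        profit += qty * edge
        ask_left -= qty
        bid_left -= qty
        if ask_left == 0:                  # ask level exhausted, move up the book
            i += 1
            ask_left = asks[i][1] if i < len(asks) else 0
        if bid_left == 0:                  # bid level exhausted, move down the book
            j += 1
            bid_left = bids[j][1] if j < len(bids) else 0
    return filled, profit

# Invented quotes: only the first level pair survives a $0.02 fee.
filled, profit = walk_crossed_book(
    asks=[(0.46, 100), (0.48, 200)],
    bids=[(0.49, 150), (0.47, 300)],
    fee_per_contract=0.02,
)
print(filled, round(profit, 4))
```

The top-of-book version (option two) is the same loop stopped after the first level pair.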

How should I handle the YES/NO token asymmetry?

Polymarket has separate YES and NO tokens, each with its own orderbook. Kalshi treats YES and NO as a single market. So “buy YES on Kalshi” and “buy YES on Polymarket” are comparable, but the separate NO token also opens a second arb pathway: buy YES on Kalshi and simultaneously buy NO on Polymarket. If the combined price is < $1.00 after fees, we’ve locked in a risk-free profit, in theory.

This is crucial, because it’s the arb structure that exists on Polymarket but not on Kalshi (since Polymarket NO is its own tradable token). Can Lean’s terminal surface this type of cross-token cross-venue trade?

To sum, Ph4 should compute two arb structures per market:

DIRECT: YES_Kalshi vs YES_Polymarket
Synthetic: YES_Kalshi + NO_Polymarket should sum to $1.00; profit if sum < $1.00 after fees
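The two structures reduce to two small formulas. The sketch below works per contract at top-of-book; the function name, argument names, and example prices are all hypothetical, and fees is a single round-trip dollar amount rather than a real fee schedule.

```python
def arb_edges(kalshi_yes_bid, kalshi_yes_ask,
              poly_yes_bid, poly_yes_ask, poly_no_ask, fees=0.0):
    """Per-contract edge in dollars for each structure; positive means lockable profit."""
    # Direct: buy YES on the cheaper venue, sell YES on the richer one.
    direct = max(poly_yes_bid - kalshi_yes_ask,
                 kalshi_yes_bid - poly_yes_ask) - fees
    # Synthetic: YES on Kalshi + NO on Polymarket pays $1.00 whichever way it resolves.
    synthetic = 1.00 - (kalshi_yes_ask + poly_no_ask) - fees
    return {"direct": direct, "synthetic": synthetic}

# Invented top-of-book prices for a roughly 48% market.
edges = arb_edges(kalshi_yes_bid=0.47, kalshi_yes_ask=0.49,
                  poly_yes_bid=0.48, poly_yes_ask=0.49, poly_no_ask=0.52)
print({k: round(v, 4) for k, v in edges.items()})
```

Here both edges are negative before fees, which is the normal, uncrossed case.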

How precise should the Fee model be?

Real fees are pretty messy. Polymarket charges 0 fees for makers, ~2% for takers. Kalshi charges $0.01/contract on execution + a fee scaled to settlement; CFTC fees on top. Two paths:

Conservative: Use round-number assumptions (2% Polymarket taker, $0.02 Kalshi execution, no rebates). Documented as conservative; real fees might be lower.
Calibrated: Look up actual current fee schedules and code them precisely.

The conservative approach should be fine for Phase 4. I’ll document the assumption – fees are approximate anyway. I’ll also write a short table of fee assumptions in the README and a single function apply_fees(side, venue, price, size) to keep it auditable.
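A minimal sketch of what apply_fees could look like under the conservative assumptions. The signature comes from the note above; everything else is an assumption: the 2% is applied to notional (price × size), all orders are treated as taker orders, and the return value is a signed net cash flow.

```python
POLYMARKET_TAKER_RATE = 0.02      # assumed: 2% of notional
KALSHI_FEE_PER_CONTRACT = 0.02    # assumed: flat dollars per contract

def apply_fees(side: str, venue: str, price: float, size: float) -> float:
    """Net cash flow in dollars: negative when buying, positive when selling."""
    notional = price * size
    if venue == "polymarket":
        fee = POLYMARKET_TAKER_RATE * notional
    elif venue == "kalshi":
        fee = KALSHI_FEE_PER_CONTRACT * size
    else:
        raise ValueError(f"unknown venue: {venue!r}")
    if side == "buy":
        return -(notional + fee)
    if side == "sell":
        return notional - fee
    raise ValueError(f"unknown side: {side!r}")

# Buy 100 YES on Kalshi at $0.47, sell 100 YES on Polymarket at $0.50.
buy_leg = apply_fees("buy", "kalshi", 0.47, 100)
sell_leg = apply_fees("sell", "polymarket", 0.50, 100)
print(round(buy_leg + sell_leg, 4))
```

In this example a 3-cent paper edge is fully absorbed: $2.00 of Kalshi fees plus $1.00 of Polymarket fees on 100 contracts.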

Outputs?

I’ll follow the same approach as Phase 3

src/pm_micro/arb.py – currently empty placeholder, gets implemented this phase
scripts/compute_arb.py – reads the Phase 3 snapshot, computes arb, writes results
data/processed/arb_results.csv – headline table, with the fee assumptions too
notebooks/03_cross_venue.ipynb – loads the CSV, prints the headline table, produces 2-3 figures (mid-discrepancy across markets, arb-size-after-fees, sensitivity of arb to fee assumptions)

Should I use the existing Phase 3 snapshot, or re-fetch?

I can either use existing, which is faster and all the data is right there & perfectly aligned with Ph3 figures. Or, I can re-fetch fresh, since markets may have moved, especially with the NBA Finals actively playing – could produce different findings.

I’ll use both – run Ph4 on the existing Ph3 snapshot first to validate the pipeline works, then re-fetch fresh data and re-run to get current numbers. The re-fetched run becomes the version embedded in the Ph5 writeup. Total extra cost is ~30 API calls.

Results: markets.yaml documented. notebook rendered correctly. two figures and a summary table.

Figure 1: ‘paper mid-discrepancy by market and structure’ shows essentially nothing at OKC (full convergence), nothing at CLE (incomplete data), and -$0.15 at NYK (negative direct discrepancy, meaning Polymarket YES is cheaper than Kalshi YES).

Figure 2: ‘executable arb after conservative fees’ shows flat-line zero across all three markets, both structures. thus, paper edge exists, executable edge doesn’t.

May 25, 2026

Ph2 – Market mapping – hand-curated markets.yaml. 6-10 markets that exist on both venues.

Market Curation: Cross-Venue Market Mapping

Goal: build a curated set of 8 binary prediction markets that are listed on both Kalshi & Polymarket, spanning 4 categories (Macro/Fed, Crypto, Politics, Sports).

Methodology: agent proposes candidate pairings per category; user approves each before it lands in markets.yaml. The idea is that manual curation beats fuzzy text matching because question wording diverges sharply between venues (Polymarket is conversational, Kalshi is formal).

Output: markets.yaml with 8 approved entries.

Design choices:

Since phase 2 is meant to produce markets.yaml (the cross-venue mapping), how should I select pairings?
- manual curation, agent assists. this way, the agent helps discover candidates (high-volume markets on each venue, fuzzy-matches on question text) and proposes pairings, but I’d have to manually approve each entry before it lands in markets.yaml. The agent builds the discovery infrastructure; i’ll make the curation decisions
- this way, no mismatched pairings will occur (e.g., Kalshi’s “Fed hike by Dec 2026” paired with Polymarket’s “Fed cut by Dec 2026” – same underlying question, opposite resolution).
How many markets & what categories?
- 8 mapped markets across 4 categories, two per category for some category-level analysis
  - Macro/Fed (already have one)
  - Crypto
  - Politics
  - Sports
Appropriate Ph2 output?
- markets.yaml + discovery notebook + a kalshi.search_markets() function added to the client
- helper is needed, and the discovery notebook becomes an audit history.

BugFixes before Curating Markets Manually:

Issue 1: ValueError in every Polymarket cell (ValueError: Unknown format code ‘f’ for object of type ‘str’). This is likely happening because Polymarket’s Gamma API returns ‘volume’ as a string, not a float. This causes every Polymarket candidate listing to crash before printing the candidates. Thus, we can’t see Polymarket markets to choose from.

Issue 2: All 4 Kalshi sports series returned empty

Issue 3: All Kalshi politics series were empty. Same root cause as #2 – wrong series ticker guesses.

All issues fixed.
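For Issue 1, the kind of coercion that resolves the error can be sketched as follows. This is not necessarily the exact fix applied; format_volume is a hypothetical helper name, and the "n/a" fallback is an assumption.

```python
def format_volume(raw) -> str:
    """Format a volume that may arrive as a str, a number, or None."""
    try:
        return f"${float(raw):,.0f}"
    except (TypeError, ValueError):
        return "n/a"

# Reproduce the original crash: a float format code applied to a str.
try:
    f"{'657824':.0f}"
    crashed = False
except ValueError:
    crashed = True

print(crashed, format_volume("657824"), format_volume(None))
```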

Rerun – no more format errors, but there are deeper structural issues in the Jupyter notebook.

Macro/Fed: Kalshi has FEDHIKE markets, but Polymarket has no markets mentioning ‘Fed’. This isn’t a search-string issue; there genuinely doesn’t appear to be any active Polymarket Fed market at the moment.

Crypto: Kalshi has 50 KXBTC markets, but every single one is a 24-hr Bitcoin price range market resolving today (May 25, 2026) with vol = 0.0. Same series, every market’s dead. Polymarket has 1 BTC market – the “$1M before GTA VI” lottery ticket I already came across. This isn’t a true match: one is a daily intraday price range, the other is a multi-year tail bet.

Politics: 0 Kalshi politics markets found via keyword filter (across 500 results). Polymarket showed 1 result (“Trump out as President before GTA VI?” with $657,824 volume), but no Kalshi equivalent exists right now.

Sports: 0 Kalshi sports markets via keyword filter. Polymarket has 3 NBA Finals markets (OKC, Cleveland, Knicks).

This means I have one real cross-venue pairing showing up right now – the Fed hike market – and I don’t even have a Polymarket equivalent for it. The other three categories have 0 viable pairings. Here’s what I think could be happening:

Possibility 1 – seasonal thinness – it’s late May 2026 so maybe just a factor of the timing.

Possibility 2 – keyword filter is too narrow. The Kalshi /markets?status=open&limit=500&min_volume=5000 query is returning 500 markets, but my keyword filter is finding 0 Sports / 0 Politics. That suggests that either the keywords don’t match Kalshi’s title conventions, or Kalshi’s high-volume markets right now are entirely in categories I’m not searching (crypto daily ranges, weather, financials, etc.). I ran a diagnostic terminal cmd to show what series of Kalshi markets are actually open right now without any keyword filter or volume filter, essentially showing the real shape of the data – what Kalshi has and doesn’t have.

Output:

Total returned: 500

Top 20 series by market count: KXMVESPORTSMULTIGAMEEXTENDED 418 markets; KXMVECROSSCATEGORY 82 markets.

This means the 500-limit is hit before any relevant markets show up. The output is heavily dominated by one or two series. Therefore, I need to use pagination, or set the min_volume filter higher, to skip the noise. The right Kalshi discovery pattern is per-series, not unfiltered. This worked in the FEDHIKE query earlier – when we specify the series_ticker, we get a clean set of markets. The unfiltered query is useless for this purpose; the series-filtered query is the right approach.

Ran another diagnostic to query Kalshi’s /series endpoint (the catalog of series, not individual markets within them) and filter by my keywords. The output gave real series tickers for politics, sports, & crypto. Kalshi turned out to have way more series than I was initially guessing at. So, I dumped the full series list into a file.
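The series-count diagnostic can be sketched as below. It assumes the series is the ticker prefix before the first hyphen, which matches the KXNBA tickers in these notes but is not confirmed for every Kalshi series. The multigame ticker in the example is a made-up placeholder.

```python
from collections import Counter

def series_counts(tickers):
    """Count markets per series, taking the series as the prefix before the first '-'."""
    return Counter(t.split("-", 1)[0] for t in tickers)

tickers = [
    "KXNBA-26-SAS", "KXNBA-26-NYK", "KXNBA-26-OKC", "KXNBA-26-CLE",
    "KXMVESPORTSMULTIGAMEEXTENDED-EXAMPLE",   # hypothetical placeholder ticker
]
counts = series_counts(tickers)
for series, n in counts.most_common(20):
    print(f"{series}: {n} markets")
```

Running this over an unfiltered 500-market page is what shows whether one or two series are crowding out everything else.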

After putting the JSON on disk, here were the series that looked most cross-venue-eligible based on what’s already on Polymarket:

Macro/Politics – Kalshi: KXTRUMPOUT – matches because Polymarket has “Trump out before GTA VI” at $657k volume.
Sports/NBA – Kalshi: KXNBA (NBA championship) – matches because Polymarket has 3 NBA Finals markets (OKC/Cleveland/Knicks)
Sports/NBA – Kalshi: KXTEAMSINBAEF – same matchup, different framing – could be a second NBA pair
Macro/Fed – Kalshi: KXFEDCHAIRNOM – matches because Polymarket may have Fed Chair markets
Crypto – Kalshi: BTCRESERVESTATES, KXETHATG – tail-event crypto – Polymarket has GTA-VI-style tail bets
Sports/World Cup – Kalshi: KXWCCONTINENT, KXWCFURTHESTADVANCING – both relate to WC 2026

Before manually curating, I ran another brief check to confirm that KXNBA and KXTRUMPOUT actually have active markets with adequate volume.

The check returned 4 active markets under KXNBA with substantial volume:

KXNBA-26-SAS: San Antonio, $30.2M
KXNBA-26-NYK: New York, $24.8M
KXNBA-26-OKC: Oklahoma City, $18.6M
KXNBA-26-CLE: Cleveland, $18.5M

Polymarket showed 3 NBA Finals markets earlier: OKC ($13.75M), CLE ($18.59M), Knicks ($16.01M). This gives us three direct cross-venue matchups, which seem to be the strongest cross-venue pairs available in the entire dataset. High volume on both sides, identical resolution criterion, same event. Polymarket doesn’t have a “San Antonio wins” market in those 3 results, but Kalshi does – that’s fine for the 3 that I can pair.

Even if I were to get zero pairs from any other category, these 3 NBA Finals markets alone would make a credible cross-venue microstructure study because of the volume and the timing: the Finals are happening right now, so the orderbooks are at their most active.

Key issue to flag is rate limiting: 429 Too Many Requests on the 5th call. Kalshi’s public market data is capped at about 30 requests per second – but separate from that, there’s likely a per-minute or per-hour quota that I hit because of the bulk discovery work across multiple terminals and the agent’s verification calls.
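One common way to stay under an unknown quota is to retry with exponential backoff. The sketch below is generic: RateLimited and fetch_with_backoff are hypothetical names, the caller supplies the fetch function and raises RateLimited on a 429, and the delays are illustrative rather than tuned to Kalshi’s limits.

```python
import time

class RateLimited(Exception):
    """Raised by the caller-supplied fetch function on an HTTP 429."""

def fetch_with_backoff(fetch, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying on RateLimited with delays of 1s, 2s, 4s, ..."""
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_retries:
                raise                          # out of retries, surface the 429
            sleep(base_delay * 2 ** attempt)

# Demo with a fake fetch that is rate-limited twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return "ok"

delays = []
result = fetch_with_backoff(flaky, sleep=delays.append)   # record delays instead of sleeping
print(result, delays)
```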

I’m going to skip the full notebook curation. Will continue with three NBA Finals pairs, populated directly. The Polymarket condition_ids and token IDs need to be looked up still.

May 25, 2026

Ph3 – microstructure metrics from orderbook data

Design choices:

Snapshot vs. polling: phase 3 is meant to produce microstructure metrics from orderbook data, which can be approached in two ways.

snapshot (1 fetch per market) – pull each orderbook once, compute spread/depth/mid, write to processed CSV. this way, we can capture exact moments on the books.
polled snapshots (N fetches over a window) – pull orderbooks every 30s for 30 min while I’m working. Captures variability – spread tightness across time, depth fluctuations, quote-update frequency. ~30 min runtime.

in theory, the polled version will produce meaningfully more interesting findings (quote stability is itself a microstructure variable), but it requires me to be present and isn’t safe against rate limits. I’ve already hit a 429 today. I’ll take a snapshot approach to Ph3, and consider polling for Ph4 (where the cross-venue comparison benefits most from temporal data). At the end of the day, Phase 3 is about establishing per-venue baselines.

Another design choice is what metrics to optimize for. Per market per venue, I’ll look for the following:

best bid / best ask / mid (mid could be a simple mean, volume-weighted, or time-weighted – recall the TODO I added in Ph2)
spread: absolute (cents) and relative (% of mid)
top-of-book depth: size at best bid + size at best ask
depth @ 1 cent from mid: for context on how the book deepens over time
number of price levels populated
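The metric list above can be sketched as a single function. Assumptions: each side is a list of (price, size) tuples in dollars, mid is the simple mean, and “depth @ 1 cent from mid” is read as total size resting within one cent of mid. The function name, keys, and example book are illustrative.

```python
def book_metrics(bids, asks, band=0.01):
    """Per-book metrics; two-sided fields stay None when one side is empty."""
    best_bid = max((p for p, _ in bids), default=None)
    best_ask = min((p for p, _ in asks), default=None)
    out = {
        "best_bid": best_bid, "best_ask": best_ask, "mid": None,
        "spread_cents": None, "spread_pct_of_mid": None,
        "top_depth": None, "depth_within_band": None,
        "n_levels": len(bids) + len(asks),
    }
    if best_bid is None or best_ask is None:
        return out                                   # one-sided book
    mid = (best_bid + best_ask) / 2
    out["mid"] = mid
    out["spread_cents"] = (best_ask - best_bid) * 100
    out["spread_pct_of_mid"] = (best_ask - best_bid) / mid * 100
    out["top_depth"] = (sum(s for p, s in bids if p == best_bid)
                        + sum(s for p, s in asks if p == best_ask))
    # Small tolerance so levels exactly one band away are not lost to float error.
    out["depth_within_band"] = sum(
        s for p, s in list(bids) + list(asks) if abs(p - mid) <= band + 1e-12
    )
    return out

m = book_metrics(bids=[(0.48, 500), (0.47, 300)], asks=[(0.49, 400), (0.50, 200)])
one_sided = book_metrics(bids=[], asks=[(0.01, 73)])
print(m["mid"], m["top_depth"], one_sided["mid"])
```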

Decision 3: output shape – three options. I could use a CSV per market in data/processed/, plus an aggregate data/processed/microstructure_summary.csv. I could just return the aggregate CSV. OR, I could return an aggregate CSV + a notebook (02_microstructure.ipynb) that loads the CSV, prints the table, and produces 2-3 figures (spread distribution, depth comparison, etc.). I’ll take this last path.

Lastly, need to determine where the normalization lives. Currently have src/pm_micro/normalize.py as an empty placeholder from Ph1. Need to write two functions:

normalize_kalshi_orderbook(raw: dict) --> NormalizedBook, which takes Kalshi’s orderbook_fp and returns a unified (bids, asks) representation in dollar prices, both sides reconstructed
normalize_polymarket_orderbook(book: OrderBookSummary) --> NormalizedBook takes Polymarket’s OrderBookSummary and returns the same unified format

Note that NormalizedBook is a dataclass with bids: list[PriceLevel] and asks: list[PriceLevel], sorted appropriately.
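A sketch of the dataclasses and the Kalshi side of the normalization. The input shape is an assumption made for illustration: a dict with "yes" and "no" lists of [price_in_cents, size] bid levels. The real orderbook_fp payload may be shaped differently, so the parsing lines would need adjusting.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class PriceLevel:
    price: float   # dollars
    size: float    # contracts

@dataclass
class NormalizedBook:
    bids: list[PriceLevel] = field(default_factory=list)   # best (highest) first
    asks: list[PriceLevel] = field(default_factory=list)   # best (lowest) first

def normalize_kalshi_orderbook(raw: dict) -> NormalizedBook:
    """Build a YES-side book from bid-only data (assumed input shape, see above)."""
    bids = [PriceLevel(c / 100, s) for c, s in raw.get("yes") or []]
    # A NO bid at c cents is equivalent to a YES ask at (100 - c) cents.
    asks = [PriceLevel((100 - c) / 100, s) for c, s in raw.get("no") or []]
    return NormalizedBook(
        bids=sorted(bids, key=lambda lvl: lvl.price, reverse=True),
        asks=sorted(asks, key=lambda lvl: lvl.price),
    )

# Invented sizes; prices chosen to give a 0.25 bid and 0.27 ask on YES.
book = normalize_kalshi_orderbook({"yes": [[24, 50], [25, 100]], "no": [[70, 20], [73, 80]]})
empty_yes = normalize_kalshi_orderbook({"yes": None, "no": [[99, 73]]})
print(book.bids[0], book.asks[0], empty_yes.bids)
```

Keeping an empty side as an empty list, rather than raising, lets a one-sided book flow through to the metrics step.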

Then src/pm_micro/microstructure.py (also empty from Ph1) is functional: takes NormalizedBook, returns the metrics dict. The goal is to produce an actual analysis library of pm_micro. Phases 1-2 were purely scaffolding and market selection.

POST AGENT

Key finding was that OKC was the only clean pair.

This is a clear cross-venue pair. Both venues are pricing OKC at 47-49 cents probability with spreads inside ~210 bps (2.1% of mid). And critically, complementarity holds across venues: Polymarket YES ($0.48 bid) + Polymarket NO ($0.51 bid) = $0.99, which is correct assuming the missing $0.01 is the spread itself.

This means I now have comparable spreads, comparable prices, both venues active.

Looking at CLE, there’s structural one-sidedness on Kalshi.

Kalshi YES: incomplete book

Polymarket YES: bid=0.004 ask=0.005 spread_bps=2222

Polymarket NO: bid=0.995 ask=0.996 spread_bps=10

What the agent caught is that Kalshi has 73 NO bids and zero YES bids on CLE. Polymarket is pricing CLE at ~0.4% probability (Cleveland’s effectively out). Nobody on Kalshi will even bid a fraction of a cent for YES – the market’s so lopsided that the YES book is empty, and the asks (reconstructed from NO bids) start at $0.01. This is structural one-sidedness in extreme-probability markets. And it’s a real microstructure finding: in tail-probability markets, one venue can have a fully one-sided book while another still has both sides priced (because Polymarket’s deeper retail flow generates symmetric activity even at extreme probabilities; Kalshi’s lower retail activity doesn’t).

Also wanted to note the 2222 bps spread on Polymarket YES (22% spread). This shows the lopsided-book midpoint problem manifesting empirically – at 0.4% probability the “midpoint” formula is meaningless and the spread expressed as bps of mid is mathematically misleading.
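The distortion is easy to reproduce from the two CLE quotes above. A minimal check, assuming spread_bps is defined as (ask - bid) / mid × 10,000 with a simple-average mid:

```python
def spread_bps(bid: float, ask: float) -> int:
    """Quoted spread in basis points of the simple mid."""
    mid = (bid + ask) / 2
    return round((ask - bid) / mid * 10_000)

cle_yes = spread_bps(0.004, 0.005)   # tail side of the market
cle_no = spread_bps(0.995, 0.996)    # complementary side
print(cle_yes, cle_no)
```

Both books are one $0.001 tick wide. The large gap in bps comes entirely from dividing by a mid near zero on the YES side, which is why an absolute spread in cents is the safer comparison at extreme probabilities.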

Looking at the NYK finding, the Polymarket NO token book is clearly missing.

Kalshi YES: bid=0.25 ask=0.27 spread_bps=769

Polymarket YES: bid=0.258 ask=0.259 spread_bps=39

Polymarket NO: PolyApiException[status_code=404, … ‘No orderbook exists’]

For one, the spread asymmetry is massive on NYK. Polymarket has institutional market-maker flow on this contract that Kalshi doesn’t, or NYK is just way more actively traded on Polymarket.

Also, both venues price NYK ~25.8% probability. Same fair value, very different liquidity profiles.

Finally, Polymarket NO token returns 404. Per the agent’s interpretation, the token_id might be wrong, or the NO token genuinely has no book. Either way this is a markets.yaml data quality issue that I’ll need to look into.

What’s most unexpected is that the agent says CLE wrote both rows (with None for the missing side fields), but the total is 11 not 12. The math is 3 markets × 4 books (kalshi-yes, kalshi-no, polymarket-yes, & polymarket-no). NYK’s polymarket-no failed the fetch entirely, so it returned 11 instead of 12. CLE did write both Kalshi rows even though one side was empty – the metric fields are just None for the empty side.

The goal up till now was cross-venue arb analysis across 3 markets. After this phase, I see that the dataset is more textured than that:

OKC: the genuine cross-venue arb candidate (clean data on both sides)
CLE: the tail-probability one-sided-book case study (interesting microstructure, but no cross-venue arb because the price is essentially 0/100)
NYK: the spread-asymmetry case study (20x difference between venues) + a data quality bug to fix.

Overall, three different findings rather than three replicates of the same finding.

May 25, 2026

Ph1 – repo scaffolding + market selection setup

repo, two thin client modules (clients/kalshi.py, clients/polymarket.py), one notebook that pulls one known market from each venue and prints the orderbook. Validates the data layer end-to-end before any mapping work.

clone the GitHub repo into ~/Downloads/Projects/
Scaffold the directory tree (src/pm_micro/, notebooks/, data/, test/)
Write pyproject.toml with uv, including kalshi-python, py-clob-client, pandas, httpx, jupyter, matplotlib, pyyaml, pytest
Write thin client wrappers (src/pm_micro/clients/kalshi.py and polymarket.py) – enough to call get_orderbook on each
Write notebooks/00_pipeline_validation.ipynb that fetches one hardcoded market from each venue and prints both orderbooks side-by-side
Write a stub README pointing at the notebook
Initial commit, push to main

The goal of the full project, again, is empirical cross-venue microstructure analysis of Kalshi and Polymarket prediction markets.

May 22, 2026

cross-venue empirical work on Kalshi & Polymarket data

output: notebook + write-up + repo

goal: quantify executable arb surface across [x] markets. the venue sweep (AMM / CLOB / hybrid) showed that LP layers in hybrid orderbooks absorb noise flow; the cross-venue data would show where that flow gets routed inefficiently in current markets.

kalshi’s market data is public (no auth required); trading requires JWT auth. the GET /markets/{ticker}/orderbook endpoint is no-auth, and there’s an official kalshi-python SDK. Note that kalshi’s orderbook only returns bid data due to the reciprocal pricing model – asks on YES are constructable as 1 – bid_on_NO. thus, i’ll need to reconstruct the full book

polymarket’s public endpoints like the orderbook and market prices don’t require auth. the py-clob-client package can be initialized read-only without a private key for get_order_book, get_midpoint, get_price. Markets are discovered via Gamma API at gamma-api.polymarket.com/markets. Each market has a condition_id (the question) and two clobTokenIds (one for YES, one for NO) – each token has its own orderbook.

this removes gating risk – no API key wait, no waiting on Polymarket support.

From the market-making strategy repo starred (https://github.com/octavi42/prediction-market-maker), it’s a strategy for a simulated environment – not real Kalshi/Polymarket data. some useful points, however –

regime-based methodology (the +$40/sim inflection from discovering ‘monopoly mode’ – when the competitor’s quote disappears, true prob is extreme, and the strategy flips). this could help notice regime shifts in real cross-venue data.
volatility shape phi_factor • 39.9 / sqrt(steps_remaining) independently arriving at the pm-AMM analytical form – pattern of ‘empirical sweep landing on the analytical answer,’ which is also the shape of the write-up i’m hoping to produce
formatting

Build plan – kalshi-polymarket-microstructure

decision choices:

pull current orderbooks for somewhere between 5-10 mapped markets, run analysis on snapshots. a short polling window (every 30s for 30 min on 2-3 markets)
two metrics – microstructure (spread, top-of-book depth, mid) per venue per market; cross-venue (mid discrepancy, executable arb after fees). headline would be the arb table.

Phase 1 – scaffold + pipeline validation. create repo with pyproject.toml, two thin client modules (clients/kalshi.py, clients/polymarket.py), one notebook that pulls one known market from each venue and prints the orderbook. validates the data layer end-to-end before any mapping work

Phase 2 – market mapping. markets.yaml. 6-10 markets that exist on both venues. ideas:

2026 world cup winner (Spain / Brazil / Argentina / France / England – each is its own binary, ~5-6 cross-listed)
Fed rate decision next meeting
Bitcoin price > $X by end of year
Recession in 2026
USA midterm Senate control (Nov 2026)
Possibly: ChatGPT-5 release, Germany / Brazil election outcomes (per the Paradigm Predictions piece)

Phase 3 – microstructure extraction. for each market, per venue, determine: best bid/ask, mid, top-3-level depth, implied spread in cents. Normalize Polymarket’s two-token structure to a single YES-price view to match Kalshi’s framing

Phase 4 – cross-venue + arb. mid-price discrepancy in cents. for each market, compute executable arb size after fees (polymarket maker orders have zero fees and earn rebates; takers pay trading fees on executed orders; Kalshi has the $0.02/contract execution fee + 7% on winnings model – for cross-venue arb I’d be modeling the transaction cost, not settlement cost). output is a ranked table of arb opportunities.

Phase 5 – README + figures

prediction-market-infra-general

June 4, 2026

project scoping

Landscape as of Today:

Wintermute announced on May 29 that it’s now streaming two-sided quotes on event contracts across Polymarket & Kalshi, and notably quotes dynamically between the two venues to facilitate positioning without sharp price swings.
Jump Trading, SIG, & Galaxy Digital are already active as market makers, and prime brokers Clear Street and Marex have built clearing on-ramps for hedge fund clients.
- Clear Street became the first institutional FCM to join Kalshi’s exchange and clearing house.
- Galaxy launched an institutional OTC prediction market on June 2, clearing its first $10M order on Kalshi.
- The ICE/Polymarket play is explicitly about data: ICE is the exclusive distributor of Polymarket’s event data to institutional capital markets and launched a Signals and Sentiment tool for institutional clients back in February.
At this point, my Daemon poller has ~500K rows straddling all of these entry dates.

Potential Future Directions:

Institutional-entry event study – Wintermute, Clear Street, Galaxy OTC are all dated treatments; my Daemon is the before/after panel for each. Outcomes that I’ve already computed: cross-venue overlap frequency and size ($191-median surface); spreads; depths; LP markout.
- Falsifiable: does institutional MM entry compress the cross-venue dislocation, and does markout get worse for remaining passive makers (sniping intensifies as Budish predicts)? This converts both closed negative theses into the “before” arm of a diff-in-diff – the negative results become the baseline, which is the best possible result for them.
Cross-venue coupling detection – if one inventory (Wintermute) is quoting both books dynamically, the venues stop being independent. lead-lag should theoretically collapse toward the ~100ms floor and quote-update correlation should increase. I could use the detection-system built in EXP-4a.
Arb migrates off-book – my counterfactual 0.30% / 0.20% tier (not offered) is now arriving through a different door: FCM membership, block trades, Galaxy’s OTC desk.
- Hypothesis: the 8/15 institutional-tier arb doesn’t get competed away on-screen, it gets internalized off-screen. Detectable shadow: depth or inventory shifts in your book data without corresponding on-book prints. Hard, but even a clean negative (“no on-book footprint”) supports the rent-privatization thesis.
Accidental-FBA cross-sectional test – I could formalize the FBA convergence thesis: if short-horizon binaries are accidental batch auctions, adverse selection per unit volume should increase with time-to-resolution. Group daemon markets by horizon, regress markout on horizon. This turns the essay’s load-bearing claim into a falsifiable result.
Consensus-quality / societal angle – Wintermute’s own framing is that tighter spreads should improve the quality of probability signals, making the venues resemble derivatives markets rather than side bets – and ICE is monetizing exactly that signal.
- Methodology: Brier-score calibration of venue mids vs. outcomes, pre/post institutional entry, plus cross-venue disagreement as a consensus-quality metric. If calibration improves while rents concentrate (per #1 and #3), that tension – better public signal & privatized extraction – is the question I’d like to explore this summer, presented in a single chart, and it’s a clean hook.
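The Brier-score comparison in the last direction can be sketched as below. The probabilities and outcomes are invented placeholders, not daemon data; in practice probs would be venue mids sampled at a fixed horizon before resolution, split by pre/post entry date.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Placeholder panels: uninformative mids before entry, sharper mids after.
pre = brier_score([0.50, 0.50], [1, 0])
post = brier_score([0.80, 0.20], [1, 0])
print(round(pre, 4), round(post, 4))
```

A pre/post drop in the score would be the “better public signal” half of the tension; the rent-concentration half has to come from the markout and overlap measures.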

Brief Definitions:

adverse-selection: people who choose to trade against your resting order systematically know something you don’t. You end up filling when filling hurts your PnL. Uninformed flow trades against you randomly; informed flow trades against you selectively.
markout: a way to measure AS. take the fill price, then look at the market price some fixed interval later (i.e. I prev. used 5-min intervals in my kalshi-polymarket analysis) and compute the PnL of the position over that window.
- negative markout = price systematically moved against you after fills. you’re being adversely selected
- LP Markout = the post-fill PnL of your hypothetical resting quote
sniping: a mechanism outlined in Budish et al. when public news moves the true value, a race starts between the MM trying to cancel/update its now-stale quote and fast traders trying to hit it first. the fast trader who wins “snipes” the stale quote. It’s adverse selection delivered at latency speed; in continuous markets, whoever’s microseconds faster will collect.
rents: an economist’s term for profits earned from a structural position rather than from creating value. The latency rent: profits that exist only because you’re faster, or fee-tiered cheaper, than others.
- Budish et al. argue that the continuous-market mechanism creates these rents, and the arms race to capture them is socially wasteful. My fee-cliff result is a rent gated by access, not trading skill.
FCM (Futures Commission Merchant): the CFTC-regulated intermediary category, essentially a “broker for derivatives”: holds customer funds, guarantees trades to the clearinghouse, nets margin.
- FCM Membership (at Kalshi for example) means a firm like Clear Street can now clear trades on behalf of clients (hedge funds), instead of every participant needing a direct retail-style account. It’s the mechanism that lets institutional size flow participate.
Block Trades: large trades negotiated privately between two parties, then reported to the exchange and cleared there, bypassing the public orderbook altogether. exists because dumping institutional size into a thin book would move the price against you.
OTC (over the counter): trades arranged directly between counterparties off any exchange entirely. An OTC desk (like Galaxy Digital’s) is a firm that stands ready to be your counterparty for size. You call them, they quote you a price, the trade happens bilaterally. Block trades clear on-exchange; pure OTC may not touch it at all.
ICE (Intercontinental Exchange): the exchange operator that owns the NYSE, among many other clearinghouses & exchanges; its biggest profit engine is actually selling market data, which is why its Polymarket investment being structured around exclusive data distribution rather than trading is the tell: it values prediction markets as a signal product.

To sum, spread is what MMs earn, adverse selection / sniping is what they lose, markout measures the losing amount, rents are who structurally captures the difference, and FCM/Block Trades/OTC are the new institutional doors through which that capture may be gradually moving off my instrument’s radar.
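The markout definition above reduces to a signed price difference per fill. A minimal sketch from the maker’s side, with invented fills and the later mid supplied directly (in the real analysis it would be looked up at fill time plus the chosen interval, e.g. 5 minutes):

```python
def markout(side: str, fill_price: float, later_mid: float) -> float:
    """Per-contract PnL of a maker fill, marked to the mid some fixed interval later."""
    if side == "buy":
        return later_mid - fill_price      # bought, so a falling price hurts
    if side == "sell":
        return fill_price - later_mid      # sold, so a rising price hurts
    raise ValueError(f"unknown side: {side!r}")

def mean_markout(fills) -> float:
    """fills: iterable of (side, fill_price, later_mid, size). Size-weighted average markout."""
    fills = list(fills)
    total_size = sum(size for *_, size in fills)
    return sum(markout(s, p, m) * size for s, p, m, size in fills) / total_size

# Invented fills: the maker buys before a drop and sells before a rise.
fills = [("buy", 0.48, 0.45, 100), ("sell", 0.52, 0.55, 100)]
avg = mean_markout(fills)
print(round(avg, 4))
```

A persistently negative average like this is the adverse-selection signature: fills happen mostly when the price is about to move against the resting quote.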

ATS structures

Potential Directions Based on this Framing: market formation for the sponsored tail

The canonical instance of each: the biotech milestone market

Final Thesis:

Proof of Neutrality

Data Accessibility

Stage 0: Contract Verification

Stage 1: Live On-Chain Stream + Decode (NO markout yet).

Stage 1 Findings:

Stage 2: Aggregate Maker-Side Adverse-Selection Markout (PROXY MID).

Stage 2 Findings: NUMBER CAME BACK NULL

Internal Contradiction:

Build #2 – API /CLOB mid integration

Build #2 Results (Task B):

Why the thesis still works >

Build #3 – per-maker attribution

Build 3 Results: concentrated AS/extraction

Potential Build #4 – coverage expansion

Build #4 Results:

Build #5 – Regime-Window Experiment

Part 2 of Proof-of-Neutrality: necessary crypto primitives

Necessary Guarantees for “Neutral Clearing”

What a smart market aims to do:

Future directions:

Company Thesis:

Refined Pitch:

How it Works:

State & Cadence:

Submit Semantics (deferred)

Clearing Process (native reimpl of batch_counterfactual/auction.py)

How it Works:

Design Requirements:

Existing State of Orderbook-Hybrid-Amm-Sim repo:

Decisions:

Logical Next Step – FBA-sim on hybrid orderbook repo

Other thoughts:

Design Decisions:

Findings:

New Infrastructure Built:

Caveats:

Lookups:

Audit Goals:

Audit Results:

Built Thus Far:

Future Directions

Build Plan

Bridge to prediction market auction-theory work I’ve done

Primitives – Neutral Markets

Workflow for June 16, 17.

§9 – Read-Only Recon before Latency/Information Differentiation Wiring (§5.2):

Got sidetracked yesterday, state as of June 17:

Dev Tasks td:

LP / Market-Maker Agent – Read-Only Recon

BUILD §5.4 – LP / market-maker agent actual implementation

What’s built thus far:

Post LP/market-maker agent build: well-built overall

LP/market-maker agent not actually bleeding:

Information Asymmetry >> Observation Delay / Latency

Reverting Frozen Truth – NO RANDOM WALK

BUILD – Phase: markout-at-fill-time rework.

‘Variable Truth’ Build

Kalman time-aware scalar belief update

Build: markout re-point to the walk path (fair-at-fill-time against a moving truth).

Integrating the FBA venue as a runnable sweep mechanism.

Measurement – the FBA result, paired by seed.

FINDINGS:

VSA Markets

3: Realism/Neutrality Claims

Letters of Intent

Agent Population – Codebase

Recon Prompt: For convergence-and-neutrality validation around the existing two-class agent population (CredentialedTrader + NoiseTrader).

Build 1: Validation Harness (post-recon).

Build 2: Adversarial Manipulation-Resistance

Build 3: Substrate Neutrality

Build 4: multi-seed CI bands; realistic-dispersion pivotality cut; fixed-b market variant; widened adversary budget to determine the finite-budget win-transition

RECAP

Discovered:

Mechanism

Clearing Process (native reimpl of `batch_counterfactual/auction.py`)