1. Dataset Description

1.1 Source and Collection of Raw Data

All swap events emitted by the Uniswap v3 ETH/USDC 0.05% fee-tier pool (0x88e6A0c2dDD26FEEb64F039a2c41296FcB3f5640) on Ethereum mainnet were collected between blocks 24,384,021 and 24,469,780 (4–16 February 2026, 11.98 days). Collection used a custom JSON-RPC collector that processes blocks sequentially and caches transaction receipts per block to minimise RPC overhead.

Each record corresponds to one swap event and contains fifteen fields: block number, log index, timestamp, three address fields, trade side, base and quote quantities, execution price, price impact, gas used, gas price, transaction fee, and transaction hash. The structure intentionally mirrors proprietary trader logs used in CEX microstructure research, enabling direct application of order-flow analysis methods to decentralised venue data.

Address decomposition. Three distinct roles are captured per swap: tx_sender (the account that signed the transaction), swap_sender (the msg.sender of the pool's swap() call, typically a router), and recipient (the address credited with the output tokens). In direct swaps all three coincide; in router-mediated swaps they diverge. This decomposition is the basis for the forensic separation of MEV bots from retail users and routing contracts in Section 4.

Derived fields. Execution price is decoded from the sqrtPriceX96 value in each Swap event log. Price impact (impact_pct) is computed sequentially within each block: each trade's impact is measured against the execution price of the immediately preceding swap in the same block, consistent with the sandwich-attack literature. The gas_price_gwei field records effectiveGasPrice from the transaction receipt. Although under EIP-1559 this receipt field is defined as base fee plus priority tip, the recorded values behave as the priority tip alone: 61.3% of transactions record sub-1-Gwei values, inconsistent with mainnet base fees during the period (1–20 Gwei). Gas fields are therefore treated as signals of inclusion urgency rather than absolute cost.
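To make the two derived fields concrete, the following is a minimal Python sketch, not the authors' collector. It assumes the pool's token0 is USDC (6 decimals) and token1 is WETH (18 decimals). The `Swap` class and both function names are hypothetical. Impact is taken as an absolute percentage change, which is an assumption because no sign convention is stated. The first swap in each block has no predecessor, so it is left without an impact value.

```python
from __future__ import annotations

from dataclasses import dataclass
from fractions import Fraction
from itertools import groupby

# Assumption: token0 = USDC (6 decimals), token1 = WETH (18 decimals).
TOKEN0_DECIMALS = 6
TOKEN1_DECIMALS = 18
Q96 = 2 ** 96


def usdc_per_eth(sqrt_price_x96: int) -> float:
    """Decode a Swap event's sqrtPriceX96 into a USDC-per-ETH price."""
    # (sqrtPriceX96 / 2**96) ** 2 = raw token1 units per raw token0 unit
    raw = Fraction(sqrt_price_x96 * sqrt_price_x96, Q96 * Q96)
    # Rescale raw units to whole tokens: ETH per USDC
    eth_per_usdc = raw * Fraction(10 ** TOKEN0_DECIMALS, 10 ** TOKEN1_DECIMALS)
    # Invert to quote the price as USDC per ETH
    return float(1 / eth_per_usdc)


@dataclass
class Swap:
    block_number: int
    log_index: int
    price: float  # USDC per ETH, e.g. from usdc_per_eth()


def within_block_impact(swaps: list[Swap]) -> dict[tuple[int, int], float | None]:
    """Absolute % price change of each swap versus the preceding swap in the
    same block; the first swap in a block has no reference price (None)."""
    impact: dict[tuple[int, int], float | None] = {}
    ordered = sorted(swaps, key=lambda s: (s.block_number, s.log_index))
    for _, block in groupby(ordered, key=lambda s: s.block_number):
        prev: Swap | None = None
        for s in block:
            key = (s.block_number, s.log_index)
            if prev is None:
                impact[key] = None
            else:
                impact[key] = abs(s.price - prev.price) / prev.price * 100
            prev = s
    return impact
```

For example, `usdc_per_eth(20000 * 2**96)` returns 2500.0. The squared ratio is 4 × 10⁸ raw WETH units per raw USDC unit, which rescales by 10⁻¹² to 1/2500 ETH per USDC. `Fraction` keeps the squaring of the Q64.96 value exact until the final conversion to float.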
1.2 Sample Overview

Thirty records with sub-wei ETH quantities and zero USDC volume were excluded as contract dust artefacts. The cleaned sample is summarised below.

| Parameter | Value |
|---|---|
| Block range | 24,384,021 – 24,469,780 |
| Block span | 85,759 blocks |
| Duration | 11.98 days (287.6 hours) |
| Unique blocks with swaps | 55,309 (64.5% utilisation) |
| Total swaps (cleaned) | 121,258 |
| Unique recipient wallets | 3,801 |
| Buy-side transactions | 60,625 (50.0%) |
| Sell-side transactions | 60,663 (50.0%) |

The 50/50 directional split and near-equal buy/sell volume ($2.49B vs $2.51B, <0.4% imbalance) serve as internal validity checks. The slight sell-side excess is consistent with the 25% ETH price decline over the window ($2,222 → $1,661).

| Statistic | Value (USDC) |
|---|---|
| Total volume | $5,000,640,038 |
| Mean trade | $41,229 |
| Median trade | $2,642 |
| Std. deviation | $117,598 |
| 90th percentile | $114,297 |
| 99th percentile | $488,827 |
| Maximum | $6,687,762 |

The mean/median ratio of 15.6× indicates a strongly right-skewed distribution: a small number of large block trades dominate volume, while the majority of swaps are retail-sized.

1.3 Structural Break: 5–6 February 2026

| Date | Txs | Volume | Median trade |
|---|---|---|---|
| 2026-02-04 | 6,093 | $372M | $6,712 |
| 2026-02-05 | 17,796 | $1,145M | $9,774 |
| 2026-02-06 | 16,043 | $828M | $5,000 |
| 2026-02-07 | 8,950 | $421M | $2,806 |
| 2026-02-08–16 (avg/day) | ~8,600 | ~$262M | ~$1,350 |

The dataset documents a market crash. Volume on 5 February was 2.9× the post-crash daily average, and median trade size was 6.2× the final-week median. The five highest within-block price impacts (3.09%–2.04%) all fall within blocks 24,392,963–24,406,027. This pattern is characteristic of a liquidation cascade driving anomalously large trades and elevated MEV opportunity. The observation window therefore contains two regimes: the crash episode (4–6 Feb, $2.35B) and post-crash normalisation (7–16 Feb, $2.65B). Robustness of findings across sub-periods is verified in Appendix A.

1.4 Wallet Categories

Recipient addresses were cross-referenced against Etherscan labels.
| Category | Txs | % Txs | Wallets | Volume | % Vol | Median trade |
|---|---|---|---|---|---|---|
| MEV-labelled | 41,175 | 33.9% | 154 | $2,697M | 53.9% | $11,253 |
| Uniswap routers | 14,564 | 12.0% | 7 | $192M | 3.8% | $682 |
| Unlabelled | 65,519 | 54.1% | 3,640 | $2,112M | 42.3% | $2,104 |

The 154 MEV-labelled wallets account for 53.9% of pool volume despite contributing only 33.9% of transactions. Their median trade ($11,253) is 5.3× that of unlabelled wallets ($2,104) and 4.3× the pool-wide median ($2,642), consistent with selective targeting of large trades where extractable surplus exceeds gas costs. The seven Uniswap routers mediate small retail orders (median $682). The behavioural analysis in Sections 4–5 is applied to the full sample without pre-filtering by label, using only on-chain observables.

2. Processed Data

2.1 Unit of Analysis and Sample Restriction

The unit of analysis is the individual recipient wallet. We retain only wallets with at least 5 swaps over the observation window, applying both bounds before feature construction. The lower bound ensures sufficient per-wallet observations for distributional fitting. After filtering, the analytical sample comprises 321 wallets.

2.2 Feature Engineering

We compute 26 per-wallet features grouped into five categories: address-based, temporal, volume, block-level, and CEX-derived. Where features overlap with those in Niedermayer et al. (2024), we follow their definitions; features specific to the on-chain builder-detection context are defined below.

2.2.1 Address-Based Features

Following Niedermayer et al. (2024), n_leading_zeros counts leading zeros in the wallet's 40-character hexadecimal address. Addresses with more leading zeros are typically mined deliberately to reduce gas costs in smart contract interactions, because zero bytes in calldata are charged less gas than non-zero bytes. They therefore serve as a proxy for sophisticated, contract-based actors.

2.2.2 Temporal Features

sleepiness_hr is the maximum gap between consecutive trades, in hours. This departs from the interval-averaged formulation of Niedermayer et al. (2024), who average maximum gaps across two-day windows; we take the single global maximum over the observation period. A high value indicates a wallet that goes dormant for extended periods between activity bursts, which is characteristic of event-driven bots. A low value indicates continuous market monitoring.

sender_diversity is the ratio of unique tx_sender addresses to total trades. A value near zero means all trades were submitted by a single caller (proprietary infrastructure). A value near one means each trade came from a distinct sender, which is typical of retail users routing through public interfaces.

self_swap_ratio is the fraction of trades where tx_sender equals recipient, that is, trades in which the wallet submitted and received its own swap without an intermediary router. Each trade contributes a binary indicator of direct, self-initiated execution, and the ratio is the mean of that indicator. A ratio of 1.0 combined with a high avg_log_index (Section 2.2.4) is treated here as the signature of synthetic volume. Self-swapping is often ignored in the MEV literature. In our sample, however, it serves as a filter for wash traders who inflate multichain valuations while operating at the lowest-priority tiers of the block (median log index > 300).

2.2.3 Volume and Price Features

value_clustering_score is the fraction of ETH trade sizes that qualify as round numbers, defined as values whose string representation has fewer than four significant decimal digits. Following Niedermayer et al. (2024) and Cong et al. (2021), round-number clustering reflects human cognitive reference points. Its absence in a bot wallet is expected, and its presence may indicate wash trading or manual intervention.

mean_base_qty_eth, mean_quote_qty_usdc, mean_gas_price_gwei, and mean_tx_fee_eth are arithmetic means of the corresponding per-trade quantities. Gas price is interpreted as an inclusion-urgency signal rather than an absolute cost measure, consistent with the EIP-1559 recording issue described in Section 3.2.
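Several of the features above can be illustrated with the following minimal Python sketch. It assumes each wallet's trades are already available as in-memory records and is not the authors' pipeline. The `Trade` class, its field names and the `is_round` helper are hypothetical. In particular, `is_round` encodes one possible reading of "fewer than four significant decimal digits": at most three digits after the decimal point once trailing zeros are removed. This may differ from the exact rule used in the study. Addresses are compared case-insensitively because they may appear in mixed-case checksum form (EIP-55).

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Trade:
    timestamp: int     # Unix seconds
    tx_sender: str
    recipient: str
    base_qty_eth: str  # logged ETH quantity, kept as text for the rounding check


def n_leading_zeros(address: str) -> int:
    """Leading '0' characters in the 40-character hex part of an address."""
    hex_part = address.lower().removeprefix("0x")
    return len(hex_part) - len(hex_part.lstrip("0"))


def sleepiness_hr(trades: list[Trade]) -> float:
    """Single largest gap between consecutive trades, in hours."""
    ts = sorted(t.timestamp for t in trades)
    if len(ts) < 2:
        return 0.0
    return max(b - a for a, b in zip(ts, ts[1:])) / 3600


def sender_diversity(trades: list[Trade]) -> float:
    """Unique tx_sender addresses divided by total trades."""
    return len({t.tx_sender.lower() for t in trades}) / len(trades)


def self_swap_ratio(trades: list[Trade]) -> float:
    """Mean of the per-trade indicator tx_sender == recipient."""
    hits = sum(t.tx_sender.lower() == t.recipient.lower() for t in trades)
    return hits / len(trades)


def is_round(qty: str, max_decimals: int = 3) -> bool:
    """Hypothetical rule: at most `max_decimals` digits after the decimal
    point once trailing zeros are removed."""
    exponent = Decimal(qty).normalize().as_tuple().exponent
    # A non-negative exponent means an integer value with no decimal digits
    return -exponent <= max_decimals


def value_clustering_score(trades: list[Trade]) -> float:
    """Fraction of trades whose ETH size passes is_round()."""
    return sum(is_round(t.base_qty_eth) for t in trades) / len(trades)
```

Under this reading, 1.5 and 2 count as round but 0.123456 does not. Quantities are parsed from their logged string form with `Decimal`, which keeps binary floating-point artefacts such as 0.30000000000000004 from being counted as non-round values.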
net_usdc_flow is the signed sum of USDC flows over the wallet's trades in the observation window:

$$\text{net\_usdc\_flow}(w) = \sum_{t} \text{flow}_t, \quad \text{where} \quad \text{flow}_t = \begin{cases} +\text{usdc}_t & \text{if SELL\_ETH} \\ -\text{usdc}_t & \text{if BUY\_ETH} \end{cases}$$

A positive value indicates net USDC extraction from the pool (the wallet is a net seller of ETH). A negative value indicates net USDC injection (the wallet is a net buyer). This is the primary profitability indicator.

2.2.4 Block-Level Features

Five features capture intra-block positioning and dominance. They are computed from the pool's transaction log, indexed by block number and log index.

avg_log_index is the mean position of the wallet's transactions within their respective blocks. Lower values indicate earlier placement. In the context of a Uniswap pool this is a direct signal of block-construction access: ordinarily submitted transactions land at positions determined by mempool ordering, while block builders can place their own transactions first.

block_capture_rate is the fraction of blocks in which the wallet accounts for more than 50% of the pool's total swap volume. A wallet with a majority volume share in a block has effectively dominated that block's price discovery.

avg_block_share is the mean fraction of per-block pool volume attributable to the wallet, across all blocks in which it appears.

avg_txs_per_block and multi_tx_rate measure the intensity and prevalence of multi-transaction execution within single blocks. avg_txs_per_block is the mean number of swaps placed by the wallet in a block; multi_tx_rate is the fraction of blocks containing more than one such swap. Values of 2 or more on avg_txs_per_block are consistent with atomic sandwich execution, in which a frontrun and a backrun by the same wallet bracket a victim trade.

2.2.5 CEX-Derived Feature

alpha_reaction_rate measures directional alignment between the wallet's DEX trades and concurrent Kraken price movements. Tick-level trade data for the XETHZUSD pair is fetched from the Kraken REST API, resampled to one-second intervals, and forward-filled with a 60-second cap to avoid attaching stale prices to DEX events. Price returns are expressed in basis points. A CEX signal is defined as a one-second interval wi