# PipeShard + VLMOpt — MLSys'26 Artifact Evaluation (Source Code)

Complete source code, build system, and reproduction scripts for the MLSys'26 paper: "Efficient, VRAM-Constrained XLM Inference on Clients".

## What's included

- **PipeShard engine** — pipeline-sharded LLM inference that schedules layers across GPU and CPU with concurrent PCIe transfers, enabling models larger than VRAM to run on consumer hardware.
- **VLMOpt** — complementary VRAM-reduction optimizations for vision encoders (CPU offload, tiled-Q attention, dynamic CLIP resolution), enabling high-resolution VLM inference under tight VRAM budgets.
- **Hardware profilers** — `concurrent_profiler` (CPU + PCIe bandwidth under concurrent GPU load) and `gpu_profiler` (GPU kernel throughput), used by the solver to select optimal execution strategies.
- **Full reproduction suite** — automated scripts (PowerShell + Bash) to reproduce Tables 4, 8, and 9 and Figures 2 and 7 from the paper with a single command.
- **Docker support** — pre-built container image for one-command Linux reproduction.
- **Model download automation** — scripts to fetch and verify all required GGUF models.

## Tested platforms

| Platform | GPU | OS | Status |
|----------|-----|----|--------|
| Desktop client | NVIDIA RTX 5090 | Windows 11 | Primary development; reproduction of paper results |
| Server / workstation | NVIDIA A100-PCIe-40GB | Ubuntu (Linux) | Verified — full build + reproduction suite |

## Quick start

```bash
# Build from source
git clone https://github.com/deepshnv/pipeshard-mlsys26-ae.git
cd pipeshard-mlsys26-ae
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j$(nproc)

# Download models + run all 5 reproduction scripts
# (Linux shown; use the .ps1 equivalents in Windows PowerShell)
chmod +x ./download_models.sh ./run_all_repro.sh ./paper_results/*.sh
./download_models.sh
./run_all_repro.sh
```
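The model-download automation fetches and verifies GGUF files. As an illustration of the verification step only, here is a minimal checksum-check sketch — the filename and checksum below are placeholders for demonstration, not the repository's actual model names or hashes (the demo hash is simply `sha256("hello")`):

```shell
#!/usr/bin/env sh
# Sketch of GGUF download verification. File name and expected
# checksum are placeholders, NOT values from this repository.
set -eu

verify() {
  file="$1"; expected="$2"
  actual=$(sha256sum "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "OK  $file"
  else
    echo "BAD $file" >&2
    return 1
  fi
}

# Demo: create a throwaway file and verify it against a known hash.
printf 'hello' > /tmp/demo-model.gguf
verify /tmp/demo-model.gguf \
  "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
```

In practice `download_models.sh` would be run as-is; this sketch is only meant to show what "fetch and verify" means for large GGUF artifacts.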
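The quick-start build uses `-j$(nproc)`, which is Linux-specific. If you are adapting the build command outside Linux, one portable way to pick a parallelism level is sketched below (the macOS `sysctl` fallback and the default of 4 are assumptions, not part of the repository's scripts):

```shell
#!/usr/bin/env sh
# Sketch: choose a portable -j value for `cmake --build`.
# nproc is Linux; sysctl -n hw.ncpu is the macOS fallback;
# fall back to 4 jobs if neither tool is available.
jobs=$( (nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4) | head -n1 )
echo "building with -j$jobs"
# cmake --build build --config Release -j"$jobs"
```

On Windows, the provided `.ps1` scripts are the intended path; this snippet is only for POSIX shells.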