## New Features

- **Atomic Scatter Operations**: Added `dr.scatter_cas()` (atomic compare-and-swap) and `dr.scatter_exch()` (atomic exchange). On the CUDA backend, these map to native PTX instructions; the LLVM implementation uses a loop over the vectorization width. (Dr.Jit PR #450, Dr.Jit-Core PR #177)
- **AdamW Optimizer**: Added the `dr.opt.AdamW` optimizer with built-in weight decay, equivalent to PyTorch's implementation. (PR #449)
- **AMSGrad for Adam/AdamW**: The `dr.opt.Adam` and `dr.opt.AdamW` optimizers now support an optional `amsgrad` parameter. AMSGrad keeps a running maximum of the second moments, which can help improve stability near local minima. (PR #467)
- **Functions in IR (`dr.func`)**: A new function decorator that forces a Python function to also become a callable in the generated IR. This can improve compilation times: without it, Dr.Jit emits the function body's IR every time the function is called within a single kernel; with `@dr.func`, each call resolves to a function call in the IR, and the body is emitted only once. (Dr.Jit PR #473, Dr.Jit-Core PR #183)
- **Oklab Color Space Conversion**: Added `dr.linear_srgb_to_oklab()` and `dr.oklab_to_linear_srgb()` for perceptually uniform color space conversion. (PR #453)
- **Pickling Support**: Dr.Jit arrays can now be pickled and unpickled natively via Python's `pickle` module. (PR #448)
- **Bounded Integer RNG**: Added `dr.rng().integers()` to generate uniformly distributed integers on a given interval. (commit cb09caa)
- **Symbolic RNG Mode**: `dr.rng()` now accepts a `symbolic` argument that yields a purely symbolic sampler. (commit 51bacbf)
- **ArrayX Initialization from Tensors**: Nested array types with multiple dynamic dimensions (like `ArrayXf`) can now be initialized from Dr.Jit tensors or NumPy arrays. (commit e7e1339)
- **Type Trait**: Added the `dr.replace_shape_t()` convenience type trait for writing generic functions that need to reshape array types. (commit 4643452)
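The per-lane semantics of the new compare-and-swap scatter can be sketched with a plain-Python reference (`scatter_cas_ref` is a hypothetical helper for illustration, not the Dr.Jit API; the real operation works on JIT arrays and performs each update atomically, while the sequential loop below mirrors the LLVM backend's strategy):

```python
def scatter_cas_ref(target, index, compare, value):
    # Reference semantics of a compare-and-swap scatter: for each lane,
    # compare target[index] against `compare` and store `value` on a
    # match. The previous contents are always returned, so callers can
    # tell whether their swap succeeded.
    old = []
    for i, c, v in zip(index, compare, value):
        prev = target[i]
        old.append(prev)
        if prev == c:
            target[i] = v
    return old

buf = [0, 0, 0]
prev = scatter_cas_ref(buf, index=[0, 1, 0], compare=[0, 1, 0], value=[7, 9, 5])
# Lane 0 matches (0 == 0) and writes 7; lane 1 fails (0 != 1);
# lane 2 sees the updated buf[0] == 7, so its compare against 0 fails.
# buf is now [7, 0, 0] and prev is [0, 0, 7].
```

An atomic exchange (`dr.scatter_exch()`) behaves like the above with the comparison removed: it unconditionally stores the new value and returns the old one.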
## Hardware/Platform-Specific Features

- **NVIDIA Blackwell (SM120+)**: Added support for wide packet loads, gathers, and atomics on NVIDIA Blackwell GPUs (SM120+). (commit 879c103)
- **Python 3.14 Compatibility**: Fixed compatibility with PEP 649 deferred annotation evaluation, ensuring Dr.Jit works correctly on Python 3.14. (commit 7fa6eb4)
- **Linux ARM Wheels**: Added `ubuntu-24.04-arm` to the wheels pipeline. (PR #461, contributed by Merlin Nimier-David)

## Performance Improvements

- **Simplified Single-Target Virtual Calls**: When a virtual function call has only a single target (as is the case for `@dr.func`), the JIT backend now eliminates the indirection/dispatch loop and calls the function directly, producing simpler IR. (Dr.Jit-Core PR #183)
- **AD Early Exit for Zero Derivatives**: The AD graph traversal now skips edges with zero-valued derivatives, avoiding unnecessary computation. (commit 06b0a9d)
- **GIL Release in `__getitem__`**: `dr.ArrayBase.__getitem__()` now releases the GIL while waiting, improving multi-threaded performance. (commit c24be70)

## Bug Fixes

- Fixed a bug where constructing a cooperative vector inside a `dr.suspend_grad()` scope could raise an exception. (PR #475, contributed by Christian Döring)
- Fixed a crash when calling a frozen function with a re-seeded random number generator whose seed was a Python integer. (PR #471, contributed by Christian Döring)
- Fixed a bug in the C++ `transform_compose()` function where the translation was placed in the last row of the matrix rather than the last column. (PR #451, contributed by Delio Vicini)
- Fixed multiple issues in the Dr.Jit-Core gather re-indexing logic: the mask stack is now correctly applied during re-indexing, and nested gather masks are combined rather than overwritten. (Dr.Jit-Core PR #178)
- Fixed a bug in virtual call analysis when a target contained a symbolic loop: the analysis now accounts for loop state variables that were eliminated or optimized out. (Dr.Jit-Core PR #184)
- Fixed LLVM backend compilation of wavefront loops with scalar masks. (commit 16a81d0)
- Fixed lost tensor shapes when a loop or conditional is replayed for AD passes, with more robust inference of tensor output shapes. (commit 9d201f2)
- Fixed a regression in ArrayX initialization from tensors and NumPy ndarrays (wrong shape-hint order for flipped axes and a broken shift loop). (commit df4cf48)
- Fixed `Texture::eval_fetch_cuda` to handle double-precision queries gracefully by casting to single precision when a hardware-accelerated texture is requested. (commits 83083d8, 054d115)
- Fixed symbolic loop size computation to also account for side-effect sizes. (Dr.Jit-Core commit c6dfc83)
- Fixed a spurious warning when freezing functions with very wide literals. (PR #455)

## Other Improvements

- Updated to nanobind v2.10.2.
- Improved documentation and log messages for textures, including clarifications regarding numerical precision and extra diagnostics for migrated textures. (commit 4edae0a)
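For reference, the update rule behind the `dr.opt.AdamW` optimizer and the new `amsgrad` option listed under New Features can be sketched in plain Python for a single scalar parameter. This is a simplified illustration of the standard decoupled weight-decay formulation, not the Dr.Jit implementation, which operates on Dr.Jit arrays:

```python
import math

def adamw_step(theta, grad, state, lr=0.1, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01, amsgrad=False):
    # One AdamW step: the weight-decay term is decoupled from the
    # gradient-based update rather than folded into the gradient.
    state['t'] += 1
    state['m'] = beta1 * state['m'] + (1 - beta1) * grad
    state['v'] = beta2 * state['v'] + (1 - beta2) * grad * grad
    m_hat = state['m'] / (1 - beta1 ** state['t'])   # bias correction
    v_hat = state['v'] / (1 - beta2 ** state['t'])
    if amsgrad:
        # AMSGrad: use the running maximum of the second moment, which
        # prevents the effective step size from growing again later.
        state['vmax'] = max(state['vmax'], v_hat)
        v_hat = state['vmax']
    theta = theta * (1 - lr * weight_decay)          # decoupled decay
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)

state = {'t': 0, 'm': 0.0, 'v': 0.0, 'vmax': 0.0}
theta = adamw_step(1.0, grad=1.0, state=state, amsgrad=True)
# First step: m_hat and v_hat both bias-correct to ~1, so
# theta = 1 * (1 - 0.001) - 0.1 * 1 / (1 + eps), roughly 0.899.
```

With `amsgrad=False` the branch is skipped and the step uses the current bias-corrected second moment directly, recovering plain AdamW.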