Swizzle and Shuffle Weights before calling FlashInfer's Fused MoE Kernel
We discuss how to swizzle and shuffle the expert weights before calling FlashInfer's fused MoE kernel to satisfy the memory layout requirements.
Centralizing context switch logic in PendSV simplifies concurrency reasoning and enhances system performance by reducing register preservation overhead of all other exceptions.
Changing a Rust enum variant through a mutable reference can be achieved either by wrapping the variant's payload in `Option`, or better, by introducing an `Undef` …
Although `ArrayQueue` contains no explicit spin lock or mutex, its code structure forms a big spin lock and thus deadlock is possible.
SVC will be pended if a higher priority exception arrives during stacking. If the priority of SVC is raised above the previously higher priority exception inside its handler, …
Read-modify-write on the `cr1` register of I²C may generate two consecutive start conditions and hang the peripheral.
The Rust HAL library `stm32f4xx-hal` failed to set the `LAST` bit in I²C `CR2` before initiating a DMA read. We tracked down the problem with a logic analyzer and identified the …
The I²C bus may be stuck busy after a software reset if the slave peripheral is holding the SDA line upon reset. Manually generating some clock pulses on the SCL line and sending a …
STM32F4xx I²C can get stuck after a start condition if the stop bit is set while the I²C bus is already idle.
Calling a Rust closure from assembly code involves erasing the type and converting everything into raw pointers before passing them to the assembly code, and then reconstructing Rust …
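
The type-erasure idea in the closure summary above can be sketched roughly as follows. This is a minimal illustration, not the code from the post: the names `Trampoline`, `trampoline`, `erase`, `call_from_asm`, and `demo` are hypothetical, and the assembly routine is replaced by a plain Rust function that only forwards a function pointer and a data pointer, which is all an assembly caller would see.

```rust
use std::ffi::c_void;

/// Signature visible to the (hypothetical) assembly side: a plain function
/// pointer plus an untyped data pointer, with no generics involved.
pub type Trampoline = unsafe extern "C" fn(*mut c_void, u32) -> u32;

/// Monomorphized per closure type `F`. It reconstructs the concrete closure
/// type from the raw pointer and then calls the closure.
unsafe extern "C" fn trampoline<F: FnMut(u32) -> u32>(data: *mut c_void, arg: u32) -> u32 {
    // Assumption: `data` was produced by `erase::<F>` and is still alive.
    let closure = unsafe { &mut *(data as *mut F) };
    closure(arg)
}

/// Erase the closure type: return a function pointer and a raw data pointer.
pub fn erase<F: FnMut(u32) -> u32>(closure: &mut F) -> (Trampoline, *mut c_void) {
    (trampoline::<F> as Trampoline, closure as *mut F as *mut c_void)
}

/// Stand-in for the assembly routine. Real assembly would branch to `func`
/// with `data` and `arg` in the argument registers.
pub unsafe fn call_from_asm(func: Trampoline, data: *mut c_void, arg: u32) -> u32 {
    unsafe { func(data, arg) }
}

/// Calls a capturing closure through the erased pointers and returns the
/// closure's result together with the number of times it ran.
pub fn demo() -> (u32, u32) {
    let mut calls = 0u32;
    let mut add_and_count = |x: u32| {
        calls += 1;
        x + 10
    };
    let (func, data) = erase(&mut add_and_count);
    // The closure outlives this call, so the raw pointer stays valid.
    let result = unsafe { call_from_asm(func, data, 32) };
    (result, calls)
}
```

Calling `demo()` returns `(42, 1)`: the closure ran once through the erased pointers and still updated its captured counter. The generic `trampoline::<F>` is the only place that knows the concrete closure type, which is why the caller can stay entirely untyped.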