Neural processing unit - Wikipedia

On consumer devices, the NPU is intended to be small and power-efficient, yet reasonably fast when running small models. To achieve this, NPUs are designed to support low-bitwidth operations using data types such as INT4, INT8, FP8, and FP16.
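To illustrate what a low-bitwidth operation looks like in practice, the following is a minimal sketch of symmetric INT8 quantization followed by an integer dot product. It is written in plain Python for readability and does not represent any particular NPU's hardware or toolchain; the function names (`quantize_int8`, `int8_dot`) and the sample values are hypothetical, and the per-tensor symmetric scheme is just one common choice assumed here.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale.

    Assumption: symmetric per-tensor quantization, chosen for illustration.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid division by zero for an all-zero (or empty) input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(q_a, scale_a, q_b, scale_b):
    """Dot product with integer multiply-accumulate, rescaled once at the end."""
    accumulator = sum(x * y for x, y in zip(q_a, q_b))  # integer arithmetic only
    return accumulator * scale_a * scale_b


# Hypothetical sample data standing in for model weights and activations.
weights = [0.5, -1.0, 0.25, 0.75]
activations = [1.0, 2.0, -0.5, 0.0]

q_weights, w_scale = quantize_int8(weights)
q_activations, a_scale = quantize_int8(activations)

approx = int8_dot(q_weights, w_scale, q_activations, a_scale)
exact = sum(w * a for w, a in zip(weights, activations))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The multiply-accumulate loop works only on small integers, and the floating-point scales are applied once at the end. The result is close to, but generally not identical to, the full-precision dot product, which is the accuracy trade-off that low-bitwidth data types accept in exchange for smaller, more power-efficient arithmetic.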