[2511.18822] DiP: Taming Diffusion Models in Pixel Space

Diffusion models face a fundamental trade-off between generation quality and computational efficiency. Latent Diffusion Models (LDMs) offer an efficient solution but suffer from potential information loss and non-end-to-end training. In contrast, existing pixel space models bypass VAEs but are computationally prohibitive for high-resolution synthesis. To resolve this dilemma, we propose DiP.