I am currently pursuing a Ph.D. at KAUST, supervised by Prof. Juergen Schmidhuber. My research focuses on multimodal generative models, particularly video and image generation, with the long-term goal of developing Physical AI (world models). I have completed one research internship at Tencent and two at Meta AI, where I built large-scale prototype foundation models from scratch, including autoregressive diffusion models and native LLM-driven multimodal generation. I am (co-)first author of over 10 papers at top-tier venues, with more than 1k citations. Several of my favorite co-authored projects include BoxDiff, an early training-free controllable diffusion model; NLSOM, a pioneering position paper on natural-language agentic systems; TGATE, the first caching system for diffusion transformers; MarDini, an early but powerful autoregressive video diffusion model; SANA-Video, an efficient video diffusion transformer with linear attention; and MoS, a native LLM-driven multimodal generation model. I am also a core member of several large-scale pre-training projects, including scaling models across over 1,000 GPUs, and my internship work has been integrated into multiple commercial products. My curriculum vitae can be found here.
I’m open to future collaborations, whether co-founding a venture or pursuing full-time opportunities in industry or academia. Feel free to reach out via email: haozhe.liu[at]kaust.edu.sa
🔥 News
- 2025.05: 🎉 One paper is accepted by TMLR!
- 2025.02: 🎉 One paper is accepted by CVPR!
- 2025.02: 🎉 One paper is accepted by TMLR!
- 2024.11: 🎉🎉 I will join Meta (MPK) as a Research Scientist Intern in Summer 2025, focusing on GenAI-related topics!
- 2024.04: 🎉 Promoted to Ph.D. Candidate!
- 2024.02: 🎉 One paper is accepted by CVPR’2024!
- 2024.02: 🎉🎉 I will join Meta (London) as a Research Scientist Intern working on Efficient Video Generation in Summer 2024!
- 2023.12: 🎉 NLSOM is recognized as the best paper at the NeurIPS’2023 Workshop on Robustness of Few-shot/Zero-shot Learning in Foundation Models!
- 2023.09: 🎉 One paper is accepted by NeurIPS’2023!
- 2023.07: 🎉🎉 Two papers are accepted by ICCV’2023!
- 2023.02: 🎉🎉 Two papers are accepted by CVPR’2023!
- 2022.11: 🎉🎉 One paper is accepted by AAAI’2023 (Oral).
- 2022.08: 🎉🎉 I joined the AI Initiative at KAUST to pursue a Ph.D. under the supervision of Juergen Schmidhuber!
- 2022.08: 🎉 Our team ranked 4th/40 in the NICO Challenge (invited workshop paper at ECCV’2022).
- 2022.07: 🎉 One paper is accepted by ECCV’2022!
- 2022.06: 🎉 Two papers are accepted by MICCAI’2022!
- 2021.07: 🎉 One paper is accepted by ICCV’2021!
📝 Publications
Journals: TMLR x 2, IEEE TIP x 1, IEEE TCYB x 1, IEEE TNNLS x 1, IEEE TIFS x 1, IEEE TIM x 1, MIA x 1, PR x 3.
Conferences: NeurIPS x 1, CVPR x 5, ICCV x 3, ECCV x 1, MICCAI x 2, AAAI x 1.
Selected Publications:
- Liu, H., Liu, D., Zhuge, M., Zhou, Z., Xie, T., He, S., … & Schmidhuber, J. (2025). Mixture of States: Routing Token-Level Dynamics for Multimodal Generation. Technical Report.
- Liu, H., Liu, S., Zhou, Z., Xu, M., Xie, Y., Han, X., … & Pérez-Rúa, J. M. (2024). MarDini: Masked Autoregressive Diffusion for Video Generation at Scale. TMLR.
- Liu, H., Zhang, W., Xie, J., Faccio, F., Xu, M., Xiang, T., … & Schmidhuber, J. (2024). Faster Diffusion via Temporal Attention Decomposition. ICLR & TMLR.
- Liu, H., Zhang, W., Li, B., Ghanem, B., & Schmidhuber, J. Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable. Technical Report.
- Liu, H., Zhuge, M., Li, B., Wang, Y., Faccio, F., Ghanem, B., & Schmidhuber, J. Learning to Identify Critical States for Reinforcement Learning from Videos. ICCV’2023.
- Liu, H., Zhang, W., Li, B., Wu, H., He, N., Huang, Y., Li, Y., Ghanem, B., & Zheng, Y. AdaptiveMix: Improving GAN Training via Feature Space Shrinkage. CVPR’2023.
- Liu, H., Li, B., Wu, H., Liang, H., Huang, Y., Li, Y., … & Zheng, Y. Combating Mode Collapse in GANs via Manifold Entropy Estimation. AAAI’2023 (Oral).
- Liu, H., Wu, H., Xie, W., Liu, F., & Shen, L. Group-wise Inhibition-based Feature Regularization for Robust Classification. ICCV’2021.
- Liu, H., Zhang, W., Liu, F., Wu, H., & Shen, L. (2021). Fingerprint Presentation Attack Detector Using Global-Local Model. IEEE TCYB.
- Liu, H., Zhang, W., Xie, J., Wu, H., Li, B., Zhang, Z., Li, Y., Huang, Y., Ghanem, B., & Zheng, Y. Decoupled Mixup for Out-of-Distribution Visual Recognition. ECCV’2022 Workshop.
- Zhang, W., Liu, H.#, Xie, J., Faccio, F., Shou, M. Z., & Schmidhuber, J. (2024). Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models. Technical Report. (# Corresponding Author)
- Zhang, W.*, Liu, H.*, Li, B., Xie, J., Huang, Y., Li, Y., Zheng, Y., & Ghanem, B. Dynamically Masked Discriminator for Generative Adversarial Networks. NeurIPS’2023. (* Equal Contribution)
- Zhuge, M.*, Liu, H.*, Faccio, F.*, Ashley, D. R.*, Csordás, R., Gopalakrishnan, A., … & Schmidhuber, J. (2023). Mindstorms in Natural Language-Based Societies of Mind. Position Paper; Best Paper at NeurIPS’2023 Workshop. (* Equal Contribution)
- Wu, H.*, Chen, K.*, Liu, H.*, Zhuge, M.*, Li, B., …, & Ghanem, B. NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation. CVPR’2023. (* Equal Contribution)
- Ji, H.*, Liu, H.*, Li, Y.*, Xie, J., He, N., Huang, Y., Dong, W., Chen, X., Shen, L., & Zheng, Y. Point Beyond Class: A Benchmark for Weakly Semi-Supervised Abnormality Localization in Chest X-Rays. MICCAI’2022. (* Equal Contribution)
- Zhang, W.*, Liu, H.*, Liu, F., Ramachandra, R., & Busch, C. Effective Presentation Attack Detection Driven by Face Related Task. ECCV’2022. (* Equal Contribution)
🎖 Honors and Awards
- 2023 Best Paper Award at the NeurIPS Workshop on Robustness of Few-shot/Zero-shot Learning in Foundation Models
- 2022 Outstanding Graduate Award (rate < 5%)
- 2021 China National Scholarship (rate < 0.02%)
📖 Research Experience
Meta AI (MPK)
Research Scientist Intern, working with Ding Liu.
- Research Topic: Foundational Training, Text-to-Image Generation.
- Publication Records: Under Review x 5
- Co-developing a commercial 7B text-to-image model trained on over 1B image–text pairs, deployed in several widely used products.
- Built a 20B native LLM-driven multimodal generation prototype from scratch.
- Contributed to developing a native unified model for both visual understanding and generation tasks.
Meta AI (London)
Research Scientist Intern, working with Juan-Manuel Pérez-Rúa.
- Research Topic: Foundational Training, Image-to-Video Generation, Text-to-Image Generation.
- Publication Records: TMLR x 2; CVPR x 1; Under Review x 2
- Co-developed a foundational text-to-image model (model size >5B; >1B image–text pairs) that supports several well-known products.
- Scaled autoregressive diffusion to video generation.
AI Initiative (KAUST)
PhD Candidate supervised by Prof. Juergen Schmidhuber.
- Research Topic: Neural networks with multi-step inference, e.g., diffusion models, autoregressive models, and RL agents.
- Publication Records: ICCV x 1; CVPR x 2; NeurIPSW x 1; TMLR x 2; Under Review x 2.
- Highlight: NLSOM won the Best Paper Award at the NeurIPS’2023 Ro-FoMo Workshop.
- Highlight: TGATE has been merged into the Diffusers library and has received over 300 stars on GitHub.
Jarvis Lab (Tencent)
Research Intern, mentored by Dr. Yawen Huang, Dr. Nanjun He, and Dr. Yuexiang Li, under Director Dr. Yefeng Zheng.
- Research Topic: Generative Model and Medical Imaging.
- Publication Records: NeurIPS x 1; CVPR x 1; ICCV x 1; AAAI x 1; MICCAI x 2; MIA x 1; PR x 1; ECCVW x 1.
- Highlight: MaF-GAN was recognized as an Oral paper at AAAI’2023.
- Highlight: Ranked 4th in the ECCV’2022 NICO Challenge.