About Me
I am Kaiwei Che (Richard), a third-year Ph.D. student at Peking University (2023–2027), advised by Prof. Yonghong Tian and Prof. Li Yuan. My research focuses on AI Infra and Efficient AI, including model sparsity and quantization, training and inference acceleration, and brain-inspired algorithms. I have published three first-author CCF-A papers (ICML Spotlight, NeurIPS Spotlight, and AAAI) and two SCI journal papers, with 360+ citations in total.
Before this, I received my Master’s degree from Southern University of Science and Technology (SUSTech) in 2023, advised by Prof. Qinghu Meng. I received my Bachelor’s degree from Shenzhen University in 2020.
I have interned at Huawei (2012 Lab, Research Intern, 2021–2023) and DJI (RoboMaster, 2018), gaining experience in both algorithm research and engineering practice.
Research Interests
- AI Infra & Efficient AI: model sparsity and quantization, training/inference acceleration, brain-inspired algorithms
- Spiking Neural Networks (SNNs): energy-efficient deep learning, neuromorphic computing
- LLM/VLM: post-training alignment (SFT, RLHF), RAG, agent systems
- Event-based Vision: neuromorphic sensing, efficient event processing
Publications
First Author
ICML 2026, Spotlight Efficiently Training Time-to-First-Spike Spiking Neural Networks from Scratch
Kaiwei Che, Wei Fang, Zhengyu Ma, Yifan Huang, Peng Xue, Li Yuan, Timothée Masquelier, Yonghong Tian
AAAI 2026 Parallel Training Time-to-First-Spike Spiking Neural Networks
Kaiwei Che, Wei Fang, Peng Xue, Yifan Huang, Zhengyu Ma, Yonghong Tian
NeurIPS 2022, Spotlight Differentiable Hierarchical and Surrogate Gradient Search for Spiking Neural Networks
Kaiwei Che, Luziwei Leng, Kaixuan Zhang, Jianguo Zhang, Qinghu Meng, Jie Cheng, Qinghai Guo, Jianxing Liao
MM 2026, under review Deep-TTFS: Scaling Time-to-First-Spike Neural Networks to ImageNet
Kaiwei Che, Wei Fang, Zhengyu Ma, Peng Xue, Li Yuan, Yonghong Tian
Frontiers in Neuroscience 2024 Auto-Spikformer: Spikformer Architecture Search
Kaiwei Che, Zhaokun Zhou, Jun Niu, Zhengyu Ma, Wei Fang, Yanqi Chen, Shuaijie Shen, Li Yuan, Yonghong Tian
Transactions on Artificial Intelligence 2024 Spatial-Temporal Search for Spiking Neural Networks
Kaiwei Che, Zhaokun Zhou, Li Yuan, Jianguo Zhang, Yonghong Tian, Luziwei Leng
Intelligence & Robotics 2024 A Deep Learning-based System for Accurate Detection of Anatomical Landmarks in Colon Environment
Kaiwei Che, Chengwei Ye, Yibing Yao, Nachuan Ma, Ruo Zhang, Jiankun Wang, Max Q-H Meng
Co-Author
Neural Networks 2025 Spatially-enhanced Spiking Neural Network for Efficient Point Cloud Analysis
Yijie Lu, Zhiyi Pan, Renrui Zhang, Yanhao Jia, Kaiwei Che, Zhaokun Zhou
NeurIPS 2024 Spiking Transformer with Experts Mixture
Zhaokun Zhou, Yijie Lu, Yanhao Jia, Kaiwei Che, Jun Niu, Liwei Huang, Xinyu Shi, Yuesheng Zhu, Guoqi Li, Zhaofei Yu
TIP, under review Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket
Zhaokun Zhou, Kaiwei Che, Wei Fang, Keyu Tian, Yuesheng Zhu, Shuicheng Yan, Yonghong Tian, Li Yuan
TNNLS 2024 Accurate and Efficient Event-based Semantic Segmentation Using Adaptive Spiking Encoder-Decoder Network
Rui Zhang, Luziwei Leng, Kaiwei Che, Hu Zhang, Jie Cheng, Qinghai Guo, Jianxing Liao, Ran Cheng
IEEE TCDS 2024 Automotive Object Detection via Learning Sparse Events by Spiking Neurons
Hu Zhang, Yanchen Li, Luziwei Leng, Kaiwei Che, Qian Liu, Qinghai Guo, Jianxing Liao, Ran Cheng
IEEE TCDS 2024 Spatial-Temporal Spiking Feature Pruning in Spiking Transformer
Zhaokun Zhou, Kaiwei Che, Jun Niu, Man Yao, Guoqi Li, Li Yuan, Guibo Luo, Yuesheng Zhu
ICANN 2024 A Multi-modal Spiking Meta-learner with Brain-Inspired Task-Aware Modulation Scheme
Jun Niu, Zhaokun Zhou, Kaiwei Che, Li Yuan
CVPR 2022 Discrete Time Convolution for Fast Event-Based Stereo
Kaixuan Zhang, Kaiwei Che, Jianguo Zhang, Jie Cheng, Ziyang Zhang, Qinghai Guo, Luziwei Leng
ROBIO 2021 Motion Planning for Hexapod Robots in Dynamic Rough Terrain Environments
Bingyi Xia, Kaiwei Che, Zhilong Tang, Jiankun Wang, Max Q-H Meng
Research Projects & Experience
AI Infra & Efficient AI
NPUSlim: LLM Quantization, Sparse & Acceleration Framework on Huawei NPU (Provincial Key Project)
A framework for quantization, sparsity, and inference acceleration on Huawei NPUs. It supports integration and deployment of mainstream models via vLLM / vLLM-ascend, adapts 2:4 structured sparse operators end to end on the NPU, and integrates GPTQ, SparseGPT, and other PTQ algorithms. A layer-block streaming quantization pipeline enables offline quantization of trillion-parameter-scale models on a single machine.
Winner-Take-All Spiking Transformer for Language Modeling
Proposes WTA-based attention to replace softmax, making computation fully spike-driven and free of floating-point operations. Develops both decoder-only and encoder-only SNN Transformers, achieving SNN SOTA across multiple datasets at only 1/14 the energy of Qwen-1.5B.
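For readers unfamiliar with the 2:4 structured sparsity pattern mentioned under NPUSlim, the following is a minimal NumPy sketch of the pattern itself: in every group of four consecutive weights, at most two stay non-zero. The function name `prune_2_4` and the magnitude-based selection rule are illustrative assumptions, not NPUSlim's actual operators or selection criterion.

```python
import numpy as np


def prune_2_4(weights: np.ndarray) -> np.ndarray:
    """Zero the two smallest-magnitude values in each group of four.

    Groups are formed along the last axis. Magnitude-based selection is
    an assumption made for illustration only.
    """
    if weights.shape[-1] % 4 != 0:
        raise ValueError("last dimension must be divisible by 4")
    groups = weights.reshape(-1, 4).copy()
    # Column indices of the two smallest |w| in every group of four.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, drop, 0.0, axis=1)
    return groups.reshape(weights.shape)


w = np.array([[0.9, -0.1, 0.3, -0.7, 0.2, 0.05, -0.6, 0.4]])
sparse_w = prune_2_4(w)
```

The fixed two-of-four layout is what allows hardware to store only the surviving values plus small per-group indices, which is why the pattern is called structured.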
Agent
SF-RAG: Selective Filtering for Retrieval-Augmented Generation
Addresses context overload and intermediate information issues in traditional RAG. Proposes a lightweight Reasoner for parallel inference and CoT chunk-level utility scoring, enabling dynamic filtering of retrieved documents. Outperforms SELF-RAG, SURE, and other baselines on 5 open-domain QA benchmarks. The Reasoner is a LLaMA-8B model fine-tuned via SFT on GPT-4o synthetic data.
Personal Time Management Agent
A multi-workflow agent built on LangGraph that routes weekly/daily planning and review workflows by intent. Uses interrupt/resume for user confirmation and information completion, which prevents the LLM from modifying plans without authorization and improves controllability.
Technical Blogs
LLM/VLM Technical Blog
Systematically covers VLM technical directions, including discriminative models (CLIP, BLIP-2, Flamingo, Qwen-VL) and generative models (DDPM, LDM, DALL-E, DiT). Includes multimodal fine-tuning experiments using LLaMA-Factory with Qwen3-VL SFT.
RLHF Technical Blog
Covers RL alignment methods (PG, TRPO, PPO, GRPO) from both mathematical principles and engineering implementation perspectives. Includes multimodal RL alignment experiments on Geo3K using verl with Qwen2.5-VL and LoRA GRPO.
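As a small illustration of the GRPO method listed above, the sketch below computes group-relative advantages: each sampled response's reward is normalized by the mean and standard deviation of the rewards for the same prompt, so no learned value function is needed. The function name `grpo_advantages`, the `eps` constant, and the example rewards are assumptions for illustration; this is not code from the blog or from verl.

```python
import numpy as np


def grpo_advantages(rewards, eps: float = 1e-6) -> np.ndarray:
    """Group-relative advantages for rewards of shape (num_prompts, group_size).

    Each row holds the rewards of all responses sampled for one prompt.
    """
    r = np.asarray(rewards, dtype=np.float64)
    mean = r.mean(axis=1, keepdims=True)
    std = r.std(axis=1, keepdims=True)
    # eps avoids division by zero when every response in a group ties.
    return (r - mean) / (std + eps)


# Two prompts, four sampled responses each, with binary correctness rewards.
rewards = [[1.0, 0.0, 0.0, 1.0],
           [0.0, 0.0, 0.0, 1.0]]
adv = grpo_advantages(rewards)
```

Responses that beat their group's average receive positive advantages and the rest receive negative ones; a group in which all rewards are equal yields zero advantage and therefore contributes no gradient signal.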
Academic Service
I have reviewed for high-impact AI conferences and journals, including:
Conferences: NeurIPS, ICML, ICLR, CVPR, AAAI
Journals: TIP, TNNLS, Pattern Recognition, Scientific Reports
Skills
- Programming Languages & Frameworks: Python, C++, Triton; PyTorch
- Post-training & Alignment: SFT, RLHF (PPO, GRPO); Frameworks: LLaMA-Factory, verl
- Agent Systems: RAG, Tool use; Framework: LangGraph
- Inference Optimization: Quantization, sparsity, inference acceleration, NPU deployment; Framework: vLLM
- Open Source: SpikingJelly (2k+ stars), NPUSlim, RoboRTS (800+ stars)
- Languages: English (CET-6)
Contact
- E-mail: chekaiwei@stu.pku.edu.cn
- Google Scholar: Kaiwei Che
