Can Jin
Ph.D. Candidate in Computer Science, Rutgers University
I am a Ph.D. Candidate in Computer Science at Rutgers University, New Brunswick, advised by Professor Dimitris N. Metaxas. My research interests include Pre-training/Post-training/Inference of Foundation Models, with a focus on Efficiency and Reasoning/Coding Capabilities. Please feel free to reach out via email if you share similar interests and would like to collaborate.
I earned my M.S. and B.S. degrees in Mathematics from the University of Science and Technology of China (USTC). Before my doctoral studies, I worked as a Machine Learning Engineer at Meituan Dianping Corporation. Recently, as a Research Intern at Adobe Research, I focused on the efficient pre-training of large foundation models (LLMs/DiTs/omni-models) via Dense/MoE architectures.
I am actively seeking research internship opportunities for Summer 2027, focusing on Pre-training/Post-training/Inference of Foundation Models. You can find my CV here.
Research
My research is organized around three core stages of foundation model development: pre-training, post-training, and inference. Across these directions, I work on making large models more capable, efficient, and reliable for tasks such as reasoning, coding, and alignment.
Pre-training
- Focus: Investigating foundation model pre-training through efficient architecture design, generalizable training strategies, and scaling laws.
- DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training. In Forty-third International Conference on Machine Learning, 2026. Propose DTop-p MoE, a dynamic routing mechanism that utilizes a Proportional-Integral controller and dynamic routing normalization to precisely control expert activation sparsity while adapting to varying token difficulty. DTop-p outperforms Top-k and Top-p MoE across Large Language Models and Diffusion Transformers.
- Learning from Teaching Regularization: Generalizable Correlations Should be Easy to Imitate. In Advances in Neural Information Processing Systems, 2024. Develop Learning from Teaching (LoT), a novel regularization technique for deep neural networks that enhances model generalization by training a teacher model to prioritize features that are easier for a student model to imitate, thereby filtering out spurious correlations.
Post-training
- Focus: Investigating post-training techniques such as reinforcement learning, on-policy distillation, supervised fine-tuning, prompt-based adaptation, and pruning for efficient reasoning, coding, alignment, and adaptation.
- DARE: Difficulty-Adaptive Reinforcement Learning with Co-Evolved Difficulty Estimation. 2026. Introduce DARE, a difficulty-adaptive Reinforcement Learning framework that co-evolves policy-aligned difficulty estimation with dynamic data selection and difficulty-specific optimization, improving training efficiency, final accuracy, and inference-token efficiency for LLM reasoning.
- Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight. In 3rd AI for Math Workshop: Toward Self-Evolving Scientific Agents, 2026. Propose weak-critic strong oversight and on-policy critique distillation (OPCD), showing that weak models can guide stronger models through useful critiques while improving alignment and reasoning performance.
- Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety. In The 64th Annual Meeting of the Association for Computational Linguistics, 2026. Propose CADA, a case-augmented deliberative alignment framework that leverages reinforcement learning on self-generated reasoning chains to transition from rigid rule enforcement to flexible case-based reasoning, significantly reducing over-refusal while enhancing robustness against jailbreak attacks.
- LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation. In The Thirteenth International Conference on Learning Representations, 2025. Design LoR-VP, a low-rank visual prompting technique for efficient vision model adaptation that reduces trainable parameters while outperforming full fine-tuning and standard visual prompting methods on object detection and segmentation benchmarks.
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025. Propose VPNs, a novel data-model co-design framework that simultaneously optimizes visual prompts and network sparsity, significantly enhancing the performance and transferability of sparse vision models.
Inference
- Focus: Investigating inference-time techniques such as test-time search, refinement/critiquing, prompt engineering, and multi-agent systems for improving reasoning, coding, retrieval, and agentic performance.
- Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning. In Workshop on Scaling Environments for Agents, 2025. Develop MAS-TTS, a framework that integrates a specialized multi-agent training pipeline with an adaptive CEO agent to orchestrate collaborative reasoning, effectively optimizing test-time scaling for complex tasks.
- APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking (Best Paper Award @ RelWeb). In Companion Proceedings of the ACM Web Conference 2025, Sydney, NSW, Australia, 2025. Propose APEER, a novel automatic prompt engineering algorithm that iteratively generates and refines prompts to enhance the performance and transferability of Large Language Models in information retrieval reranking tasks.
- RankFlow: A Multi-Role Collaborative Reranking Workflow Utilizing Large Language Models. In Companion Proceedings of the ACM Web Conference 2025, Sydney, NSW, Australia, 2025. Design RankFlow, an LLM-driven reranking framework that utilizes multi-role collaboration to enhance retrieval accuracy, demonstrating superior performance over existing baselines in extensive empirical studies.
- Your reward function for RL is your best PRM for search: Unifying RL and search-based TTS. arXiv preprint arXiv:2508.14313, 2025. Introduce AIRL-S, which unifies Reinforcement Learning and search-based Test-Time Scaling (TTS), demonstrating that RL reward functions can serve as optimal Process Reward Models (PRMs) for guiding search in complex reasoning tasks.
Academic Services
Teaching Assistant
- Rutgers University: CS344: Algorithms (Spring 2026), CS211: Computer Architecture (Fall 2025), CS534: Computer Vision (Spring 2025), CS210: Data Management for Data Science (Fall 2024)
Peer Review
- Conference: NeurIPS 25/26, ICLR 25/26, ICML 24/26, CVPR 25/26, ECCV 26, AAAI 26, etc.
- Journal: Alexandria Engineering Journal, Information Fusion, Pattern Recognition, Signal Processing