2025-04-07 Papers


Paper 1

ZClip: Adaptive Spike Mitigation for LLM Pre-Training

Published: 2025-04-03

Link: http://arxiv.org/pdf/2504.02507

1. 📘 Topic and Domain: The paper focuses on gradient clipping techniques for large language model (LLM) pre-training, specifically addressing training stability in deep learning.
2. 💡 Previous Research and New Ideas: Building on traditional gradient clipping methods (fixed-threshold and norm-based), the paper proposes ZClip, a new adaptive gradient clipping algorithm that dynamically adjusts its clipping threshold based on the statistical properties of recent gradient norms.
3. ❓ Problem: The paper aims to solve the problem of loss spikes and gradient instability during LLM training, which can lead to catastrophic divergence and require costly checkpoint restoration.
4. 🛠️ Methods: ZClip uses z-score-based anomaly detection with exponential moving averages (EMA) to track gradient norm statistics and dynamically adjust clipping thresholds during training.
5. 📊 Results and Evaluation: Testing on a 1B parameter LLaMA model showed ZClip eliminated loss spikes, enabled higher learning rates, achieved 35% faster convergence compared to baseline methods, and improved downstream task performance on HellaSwag and WinoGrande benchmarks.

ZClip workflow (figure): each training step computes the gradient norm gt, updates the EMA statistics μt = αμt-1 + (1-α)gt and σt = √(ασ²t-1 + (1-α)(gt-μt)²), and computes the z-score zt = (gt-μt)/σt. If zt exceeds the threshold, reciprocal clipping g*t = μt + (z²thres/zt)σt is applied before the model parameters are updated.
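The workflow above can be sketched as a small Python class. This is a minimal illustration of z-score-based adaptive clipping, not the paper's implementation: the hyperparameter values (alpha, z_thresh) and the choice to update the EMA with the clipped norm are assumptions.

```python
import math

class ZClip:
    """Sketch of z-score-based adaptive gradient-norm clipping.
    Hyperparameter defaults are assumed, not taken from the paper."""

    def __init__(self, alpha=0.97, z_thresh=2.5):
        self.alpha = alpha        # EMA smoothing factor (assumed value)
        self.z_thresh = z_thresh  # z-score spike threshold (assumed value)
        self.mean = None          # EMA of the gradient norm (mu_t)
        self.var = None           # EMA of the gradient-norm variance (sigma_t^2)

    def clip_norm(self, grad_norm):
        """Return the norm to rescale the gradient to (possibly reduced)."""
        if self.mean is None:
            # Initialize statistics from the first observed norm
            self.mean, self.var = grad_norm, 0.0
            return grad_norm
        std = math.sqrt(self.var) + 1e-12
        z = (grad_norm - self.mean) / std
        if z > self.z_thresh:
            # Reciprocal clipping from the figure: g* = mu + (z_thres^2 / z) * sigma
            grad_norm = self.mean + (self.z_thresh ** 2 / z) * std
        # Update EMA statistics with the (clipped) norm
        self.mean = self.alpha * self.mean + (1 - self.alpha) * grad_norm
        self.var = self.alpha * self.var + (1 - self.alpha) * (grad_norm - self.mean) ** 2
        return grad_norm
```

In a training loop, the returned value would be used to rescale the gradient tensor (e.g. multiply by clipped_norm / raw_norm) before the optimizer step.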
Q1
1. What is the main advantage of ZClip over traditional fixed-threshold gradient clipping methods?
A. It completely eliminates the need for gradient clipping
B. It dynamically adjusts the clipping threshold based on statistical properties
C. It reduces the computational cost of training by 50%
Q2
2. In the experiments, what unexpected result was observed when using ZClip with a learning rate of 3.0×10^-3?
A. The model failed to converge completely
B. Training time increased significantly
C. The model reached the best baseline validation loss 35% faster than traditional methods
Q3
3. What statistical method does ZClip use to identify gradient anomalies?
A. Chi-square test
B. Z-score based anomaly detection
C. Moving average convergence divergence (MACD)

Paper 2

AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction

Published: 2025-04-01

Link: http://arxiv.org/pdf/2504.01014

1. 📘 Topic and Domain: The paper focuses on creating an infinite anime life simulation game system using AI, specifically in the domain of generative game development and character animation.
2. 💡 Previous Research and New Ideas: Prior research used large language models (LLMs) to generate static images for games, while this paper introduces a novel approach using Multimodal Large Language Models (MLLMs) to generate dynamic animation shots with contextual consistency.
3. ❓ Problem: The paper addresses the limitations of existing methods that lack visual context consistency and can only generate static images, which results in less engaging gameplay experiences.
4. 🛠️ Methods: The authors developed AnimeGamer, which uses MLLMs to generate game states and incorporates action-aware multimodal representations that can be decoded into video clips using a video diffusion model.
5. 📊 Results and Evaluation: Through both automated metrics and human evaluations, AnimeGamer outperformed existing methods in instruction following, contextual consistency, character consistency, style consistency, and overall gaming experience.

AnimeGamer workflow (figure): user language instructions pass through an animation shot encoder (CLIP + T5 embeddings) to produce action-aware representations; the MLLM, conditioned on historical context, predicts the next game state, including updated character states (stamina, social, and entertainment values); a video diffusion model with motion-scope control then generates the dynamic animation output.
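The game-state structure described above can be illustrated with a toy Python sketch. Everything here is a hypothetical stand-in: the field names follow the summary (stamina, social, entertainment), but the value ranges, the update rules, and the string placeholder for the animation shot are invented for illustration; the real system predicts states with an MLLM and decodes animation with a video diffusion model.

```python
from dataclasses import dataclass, field

@dataclass
class CharacterState:
    # Character-state values named in the summary; defaults and ranges are assumed
    stamina: int = 100
    social: int = 50
    entertainment: int = 50

@dataclass
class GameState:
    animation: str  # placeholder for the decoded animation clip
    character: CharacterState = field(default_factory=CharacterState)

def next_game_state(history, instruction):
    """Toy stand-in for next-game-state prediction: the real MLLM consumes
    historical multimodal representations; here we only thread the last
    character state through a hypothetical update rule."""
    prev = history[-1].character if history else CharacterState()
    new = CharacterState(
        stamina=max(prev.stamina - 10, 0),               # acting costs stamina (assumed)
        social=prev.social + (5 if "friend" in instruction else 0),  # assumed rule
        entertainment=prev.entertainment,
    )
    shot = f"animation for: {instruction}"  # placeholder for the diffusion decoder
    return GameState(animation=shot, character=new)
```

A game session would then be a loop appending each predicted state to the history, so later predictions stay consistent with earlier ones.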
Q1
1. What is the main innovation of AnimeGamer compared to previous approaches?
A. It uses AI to generate static images of anime characters
B. It generates dynamic animation shots with contextual consistency using MLLMs
C. It creates pre-defined game rules for anime characters
Q2
2. What components make up a game state in AnimeGamer?
A. Only character animations and background music
B. Only character states like stamina and social values
C. Both dynamic animation shots and character states (stamina, social, entertainment values)
Q3
3. How does AnimeGamer maintain visual consistency across game states?
A. By using pre-recorded anime clips from existing games
B. By taking historical multimodal representations as context for generating new states
C. By limiting characters to a single fixed pose throughout the game

Paper 3

Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

Published: 2025-04-03

Link: http://arxiv.org/pdf/2504.02542

1. 📘 Topic and Domain: The paper focuses on talking head video generation using a video diffusion model that can be controlled by both audio and visual signals simultaneously.
2. 💡 Previous Research and New Ideas: Whereas existing video diffusion models allow control by only a single signal, this paper proposes a novel framework that enables multiple signals to control different facial regions without conflicts.
3. ❓ Problem: The paper addresses the challenge of generating portrait videos that can be controlled by both audio and facial motion signals simultaneously while preventing control conflicts between signals.
4. 🛠️ Methods: The paper introduces ACTalker, an end-to-end framework featuring a parallel-control mamba layer with multiple branches and mask-drop strategy to enable region-specific control by different signals, along with a gating mechanism for flexible control.
5. 📊 Results and Evaluation: The method outperforms existing approaches in both single-signal and multi-signal control scenarios, achieving superior lip synchronization scores and video quality metrics while demonstrating natural facial expressions and smooth transitions.

ACTalker workflow (figure): the source image, audio, and motion inputs are processed by encoders (a VAE encoder, an identity encoder, and motion/audio encoders); a parallel-control mamba layer runs an audio branch and a motion branch, each a Mask-SSM with its own region mask (audio mask, motion mask); SVD layers with spatial-temporal convolution and attention then produce the generated video.
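The region-splitting idea behind the parallel branches can be illustrated with a toy additive stand-in. This is not the paper's Mask-SSM layer: the additive combination, the binary masks, and the scalar gates are simplifying assumptions, but they show how non-overlapping masks let two signals control disjoint facial regions without conflict, with gates toggling branches (set randomly during training per the summary, fixed at inference).

```python
import numpy as np

def parallel_control(features, audio_ctrl, motion_ctrl,
                     audio_mask, motion_mask,
                     audio_gate=1.0, motion_gate=1.0):
    """Toy sketch of conflict-free multi-signal control.
    Each control signal is masked to its own facial region, so the two
    signals never write to the same feature tokens."""
    assert not np.any(audio_mask * motion_mask), "control regions must not overlap"
    out = features.copy()
    out += audio_gate * audio_mask * audio_ctrl    # e.g. mouth region driven by audio
    out += motion_gate * motion_mask * motion_ctrl  # e.g. rest of the face by motion
    return out
```

Setting a gate to 0.0 drops that branch entirely, mimicking single-signal control within the same layer.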
Q1
1. What is the key innovation of ACTalker compared to previous talking head generation methods?
A. Higher resolution video output
B. Simultaneous control by multiple signals without conflicts
C. Faster generation speed
Q2
2. What is the purpose of the mask-drop strategy in the ACTalker framework?
A. To improve facial recognition accuracy
B. To reduce video file size
C. To direct model focus to relevant facial regions and prevent control conflicts
Q3
3. During training, how does ACTalker ensure flexible control over generated videos?
A. By randomly setting gate variables in each branch
B. By using larger training datasets
C. By increasing model parameters