2025-04-14 Papers

Paper 1

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Published: 2025-04-11

Link: http://arxiv.org/pdf/2504.08685

1. 📘 Topic and Domain: The paper presents Seaweed-7B, a cost-effective video generation foundation model with 7 billion parameters, focusing on efficient training strategies in the domain of AI-generated video.
2. 💡 Previous Research and New Ideas: The paper builds on prior video generation models like Sora and MovieGen, proposing that medium-sized models can match or exceed larger models through optimized architecture, training strategies, and data curation.
3. ❓ Problem: The paper addresses the excessive computational costs of training and deploying video generation models, which typically require thousands of GPUs and substantial resources.
4. 🛠️ Methods: The authors trained a 7B-parameter diffusion transformer with a hybrid-stream architecture, using multi-stage training on mixed-resolution data, specialized variational autoencoder designs, and model optimization techniques to maximize efficiency.
5. 📊 Results and Evaluation: Seaweed-7B achieved performance comparable to or better than larger models trained with substantially more resources, ranking second in image-to-video generation in Elo ratings while requiring only 665,000 H100 GPU hours (27.7 days on 1,000 GPUs).
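The training-cost figure in point 5 can be sanity-checked with a quick back-of-the-envelope calculation:

```python
# Sanity check: 665,000 H100 GPU hours spread across 1,000 GPUs.
total_gpu_hours = 665_000
num_gpus = 1_000

wall_clock_hours = total_gpu_hours / num_gpus  # 665 hours of wall-clock time
wall_clock_days = wall_clock_hours / 24        # about 27.7 days

print(f"{wall_clock_days:.1f} days on {num_gpus} GPUs")  # → 27.7 days on 1000 GPUs
```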

Seaweed-7B Methodological Flowchart

1. Data Curation & Processing: raw video sources go through splitting, cropping, and quality filtering; balancing, deduplication, and synthetic data augmentation; and video captioning (CLIP + LLM, distillation) with system prompts. The curated data is produced by a high-throughput pipeline built on BMF and Ray.
2. VAE Training: a causal 3D convolutional architecture trained on mixed-resolution data (images -> videos) at high compression ratios (e.g., 64x); training is stabilized with an adversarial loss and SpectralNorm.
3. Diffusion Transformer (DiT) Training: a hybrid-stream DiT (full attention, MM-RoPE, AdaSingle) operating in the VAE latent space. Pre-training is multi-stage (low -> high resolution) and multi-task (image-only -> joint T2V/I2V); post-training applies SFT (aesthetics) plus DPO/RLHF (motion/structure).
4. Optimization & Infrastructure: training uses 3D parallelism (FSDP, Ulysses), runtime balancing, MLAC, and fused kernels (target: 38% MFU); inference uses distillation (TSCD, CFG), VAE optimization, and a prompt rephraser.
5. Output: the Seaweed-7B foundation model (7B parameters, cost-effective training) and applications enabled by lightweight finetuning or zero-shot use: image/text-to-video, human video (OmniHuman-1), subject-consistent generation (Phantom), video-audio generation (CAVP), long video/story (LCT), real-time generation (Seaweed-APT), super-resolution (SeedVR), camera control (CameraCtrl II), and video editing/transitions.
Q1. What is the primary innovation of Seaweed-7B compared to other video generation models?
- Using a new type of neural architecture never seen before in video generation
- Achieving competitive performance with a medium-sized model using significantly fewer computational resources
- Being the first model to generate videos directly from audio input

Q2. How many H100 GPU hours were required to train the Seaweed-7B model?
- 665,000 hours (equivalent to 27.7 days on 1,000 GPUs)
- 1.2 million hours (equivalent to 50 days on 1,000 GPUs)
- 6.5 million hours (equivalent to 270 days on 1,000 GPUs)

Q3. Which architectural design choice did the authors find most beneficial for efficient video generation?
- Using window attention instead of full attention for all transformer layers
- Compressing sequences within the VAE instead of using DiT patchification
- Training exclusively on low-resolution videos rather than mixed-resolution data

Paper 2

C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing

Published: 2025-04-10

Link: http://arxiv.org/pdf/2504.07964

1. 📘 Topic and Domain: The paper introduces C3PO (Critical-Layer, Core-Expert, Collaborative Pathway Optimization), a test-time optimization method for Mixture-of-Experts (MoE) Large Language Models to improve expert pathway selection.
2. 💡 Previous Research and New Ideas: The paper builds on MoE architectures and test-time adaptation techniques, proposing novel collaborative pathway optimization that leverages successful reference samples to re-mix expert weights during inference.
3. ❓ Problem: The paper addresses sub-optimal expert pathways in MoE LLMs, where the naive routing learned during pretraining leaves a 10-20% accuracy gap that better expert selection could recover.
4. 🛠️ Methods: The authors optimize expert routing weights at test time using three surrogate objectives: mode-finding, kernel regression, and neighborhood gradient descent, focusing only on critical layers and core experts to balance performance and efficiency.
5. 📊 Results and Evaluation: C3PO consistently improves MoE base models by 7-15% in accuracy across six benchmarks, outperforming test-time learning baselines like in-context learning and prompt tuning, and enabling MoE LLMs with 1-3B active parameters to outperform dense LLMs of 7-9B parameters.

C3PO: Test-Time Expert Re-Mixing Workflow

Input: a test sample x; the goal is to improve the model's prediction for x. The pretrained MoE LLM generates an initial pathway ω_initial, which is suboptimal (a 10-20% accuracy gap). A reference set {(xi, yi, ωi)} records samples on which the model succeeded (xi: sample, yi: label, ωi: pathway).

1. Find successful neighbors N(x): compute embeddings E(x) and E(xi), then select neighbors via kNN or an ε-ball in embedding space.
2. Collaborative Pathway Optimization (CPO), choosing one of three methods:
   A) Neighborhood gradient descent (NGD): define a surrogate loss L(ω) as the weighted average loss of the neighbors xi under the current pathway ω, then update iteratively, ω ← ω - λ∇ω L(ω) (requires backpropagation).
   B) Kernel regression: estimate a target pathway ω̂ as the weighted average of the neighbor pathways ωi with weights K(xi, x), then interpolate ω ← α*·ω_initial + (1 - α*)·ω̂ (gradient-free).
   C) Mode finding (mean-shift): locate a dense region in pathway space by computing a local average ω̄ weighted by pathway similarity K(ωi, ω), then interpolate ω ← α·ω_initial + (1 - α)·ω̄ (gradient-free).
3. Efficiency enhancement: apply the selected optimization only to critical layers (e.g., the last 5), core experts (e.g., the top-20), and the last token's pathway weights.
4. Output the optimized pathway ω_optimized: refined expert weights for the test sample x.
5. Final inference: f(x, ω_optimized).
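The gradient-free kernel-regression variant (method B in the workflow above) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: it assumes a Gaussian kernel over embedding distances and a fixed interpolation coefficient alpha, and the function and parameter names are hypothetical.

```python
import numpy as np

def kernel_regression_remix(x_emb, neighbor_embs, neighbor_pathways,
                            omega_init, alpha=0.5, bandwidth=1.0):
    """Re-mix expert routing weights for a test sample via kernel regression.

    x_emb:             (d,)   embedding of the test sample x
    neighbor_embs:     (k, d) embeddings of successful reference neighbors
    neighbor_pathways: (k, p) their recorded pathway (routing) weights
    omega_init:        (p,)   the model's initial pathway for x
    """
    # Gaussian kernel weights K(xi, x) from embedding distances
    sq_dists = np.sum((neighbor_embs - x_emb) ** 2, axis=1)
    k = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    w = k / k.sum()

    # Target pathway: kernel-weighted average of the neighbor pathways
    omega_hat = w @ neighbor_pathways

    # Gradient-free interpolation toward the estimated target
    return alpha * omega_init + (1.0 - alpha) * omega_hat
```

In C3PO this kind of update would be applied only to the critical layers and core experts identified in step 3, not to every routing weight in the model.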
Q1. What is the main innovation of C3PO compared to traditional test-time adaptation methods for LLMs?
- It fine-tunes all parameters in the MoE model during inference
- It optimizes expert routing weights based on similar successful samples
- It adds new experts to the model dynamically during test time

Q2. According to the paper's findings, which layer optimization strategy yielded the best performance in C3PO?
- Optimizing all 16 layers of the MoE model
- Optimizing only the first 5 layers (early layers)
- Optimizing only the last 5 layers (deep layers)

Q3. What surprising efficiency finding did the authors discover about expert selection in MoE models?
- Optimizing all 64 experts per layer is necessary for maximum performance
- Optimizing only the top-20 experts achieves the same performance as optimizing all 64 experts
- Random expert selection performs just as well as router-based selection

Paper 3

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Published: 2025-04-11

Link: http://arxiv.org/pdf/2504.08736

1. 📘 Topic and Domain: Scaling visual tokenizers to 3 billion parameters for autoregressive image generation.
2. 💡 Previous Research and New Ideas: Based on vector-quantized tokenizer research; proposes semantic regularization to overcome the reconstruction vs. generation dilemma when scaling tokenizers.
3. ❓ Problem: Solving the dilemma where naively scaling visual tokenizers improves reconstruction quality but degrades downstream generation performance.
4. 🛠️ Methods: Introduces GigaTok with semantic regularization that aligns tokenizer features with pretrained visual representations; uses 1D tokenizers with a hybrid CNN-Transformer architecture, prioritizes decoder scaling, and employs an entropy loss.
5. 📊 Results and Evaluation: GigaTok achieves state-of-the-art performance in reconstruction, downstream autoregressive generation, and representation quality on ImageNet, with the 2.9B tokenizer enabling a 1.4B AR model to outperform previous approaches.
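Point 4 mentions an entropy loss used to keep the codebook well utilized at billion-parameter scale. A minimal sketch, assuming the loss maximizes the entropy of the batch-averaged codebook assignment distribution (the paper's exact formulation may differ, and the function name is hypothetical):

```python
import numpy as np

def codebook_entropy_loss(assign_probs, eps=1e-8):
    """Negative entropy of average codebook usage.

    assign_probs: (n, K) soft assignments of n tokens over K codebook entries.
    Minimizing the returned value pushes average usage toward uniform,
    i.e., encourages higher codebook utilization.
    """
    avg_usage = assign_probs.mean(axis=0)                    # (K,)
    entropy = -np.sum(avg_usage * np.log(avg_usage + eps))   # H(avg usage)
    return -entropy
```

With uniform assignments the loss reaches its minimum of -log(K); with all tokens collapsed onto one code it rises to about 0.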

GigaTok Methodology Flowchart

Problem: the reconstruction vs. generation dilemma: scaling tokenizers improves reconstruction but hurts generation. Investigation tool: AR probing (evaluation with a lightweight AR model). Finding: increased latent-space complexity hinders AR learning. Core solution: semantic regularization, which aligns tokenizer features with DINOv2.

1. Design & scaling: a hybrid CNN-Transformer VQ tokenizer supporting 1D (Q-Former) and 2D (ViT) backbones. Key scaling practices: (1) prefer 1D tokenizers (better scalability); (2) asymmetric scaling that prioritizes decoder size; (3) an entropy loss that stabilizes billion-scale training. Semantic regularization is applied during training to mitigate latent complexity and enable scaling.
2. Training: Stage 1 trains the GigaTok tokenizer (VQGAN loss + semantic regularization + entropy loss); Stage 2 trains the downstream AR model.
3. Evaluation: tokenizer reconstruction (rFID, LPIPS); AR probing as a proxy (gFID, validation loss, linear accuracy); and large AR models (system-level gFID, linear accuracy).
4. Outcome: GigaTok (up to 3B parameters) resolves the reconstruction vs. generation dilemma, achieving SOTA reconstruction, AR generation, and representation quality.
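The semantic regularization at the core of GigaTok aligns tokenizer features with features from a frozen pretrained encoder (DINOv2). A minimal sketch of such an alignment term, assuming a cosine-similarity formulation (the paper's exact loss form and feature projection may differ, and the function name is hypothetical):

```python
import numpy as np

def semantic_alignment_loss(tokenizer_feats, dino_feats, eps=1e-8):
    """1 - mean cosine similarity between tokenizer and frozen target features.

    tokenizer_feats: (n, d) features from the tokenizer (after any projection)
    dino_feats:      (n, d) target features from the frozen DINOv2 encoder
    """
    t = tokenizer_feats / (np.linalg.norm(tokenizer_feats, axis=-1, keepdims=True) + eps)
    s = dino_feats / (np.linalg.norm(dino_feats, axis=-1, keepdims=True) + eps)
    return 1.0 - float(np.mean(np.sum(t * s, axis=-1)))
```

During Stage 1 a term like this would be added to the VQGAN and entropy losses, pulling the tokenizer's latent space toward the semantics of the pretrained representation instead of letting its complexity grow unchecked.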
Q1. What is the key innovation in GigaTok that helps solve the reconstruction vs. generation dilemma?
- Using larger codebook sizes for vector quantization
- Semantic regularization that aligns tokenizer features with pretrained visual representations
- Implementing a pure Transformer architecture without CNN components

Q2. According to the paper, when scaling tokenizers, which architectural design choice proved most effective?
- Using 1D tokenizers with symmetric encoder-decoder scaling
- Using 2D tokenizers with larger encoders than decoders
- Using 1D tokenizers with asymmetric scaling that prioritizes decoder size

Q3. What critical component did the authors find necessary to enable convergence when training billion-scale tokenizers?
- Layer normalization in the CNN modules
- Entropy loss to encourage higher codebook utilization
- Dropout in the Transformer layers