2025-09-25 Papers


Paper 1

SIM-CoT: Supervised Implicit Chain-of-Thought

Published: 2025-09-24

Link: http://arxiv.org/pdf/2509.20317

1. 📘 Topic and Domain: The paper focuses on improving implicit Chain-of-Thought (CoT) reasoning in Large Language Models within the domain of natural language processing and machine learning.
2. 💡 Previous Research and New Ideas: Building on existing implicit CoT methods such as Coconut and CODI, it proposes SIM-CoT, a novel approach that introduces step-level supervision to stabilize and enrich the latent reasoning space.
3. ❓ Problem: The paper addresses the latent instability issue in implicit CoT approaches, where increasing the number of implicit reasoning tokens leads to training instability and performance collapse.
4. 🛠️ Methods: The authors implement a plug-and-play training module: an auxiliary decoder aligns each implicit token with its corresponding explicit reasoning step during training, and the decoder is removed at inference so it adds no runtime overhead.
5. 📊 Results and Evaluation: SIM-CoT improved performance across multiple models and benchmarks, achieving +8.2% improvement over Coconut on GPT-2, +3.0% over CODI on LLaMA-3.1 8B, and surpassing explicit CoT baseline by 2.1% with 2.3× greater token efficiency.
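The training-only auxiliary decoder and the combined objective L = λ_step·L_step + λ_lm·L_ans-lm can be illustrated with a minimal pure-Python sketch. All names here (`sim_cot_loss`, `toy_decoder`) are illustrative, not the paper's implementation, which operates on transformer hidden states:

```python
import math

def cross_entropy(probs, target_idx):
    """Negative log-likelihood of the target index under a distribution."""
    return -math.log(probs[target_idx])

def sim_cot_loss(latents, step_targets, answer_probs, answer_idx,
                 aux_decoder=None, lam_step=1.0, lam_lm=1.0):
    """L = lam_step * L_step + lam_lm * L_ans-lm.

    The auxiliary decoder maps each latent z_k to a distribution over the
    k-th explicit reasoning step; it exists only during training, so at
    inference (aux_decoder=None) only the answer loss path remains.
    """
    l_step = 0.0
    if aux_decoder is not None:  # step-level supervision, training only
        for z_k, s_k in zip(latents, step_targets):
            l_step += cross_entropy(aux_decoder(z_k), s_k)
    return lam_step * l_step + lam_lm * cross_entropy(answer_probs, answer_idx)

# Stand-in decoder: maps any latent to a fixed step distribution.
def toy_decoder(z):
    return [0.8, 0.1, 0.1]

latents = [[0.0], [0.0]]  # two implicit reasoning tokens
train_loss = sim_cot_loss(latents, [0, 0], [0.1, 0.9], 1, aux_decoder=toy_decoder)
infer_loss = sim_cot_loss(latents, [0, 0], [0.1, 0.9], 1)
```

Dropping the decoder at inference removes the step term entirely, which is why SIM-CoT keeps the token efficiency of implicit CoT.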

Workflow diagram (summarized):
- Problem analysis: latent instability — information loss, semantic homogenization, shifted distance, training collapse.
- Training data: GSM8K-Aug, 385k mathematical-reasoning examples.
- SIM-CoT method: an implicit phase builds latents z_k = H_θ(U^(k-1)); an explicit phase decodes the answer p_θ(a|x, z_1:K); a training-only auxiliary decoder p_φ(s_k|z_k) provides step-level supervision with loss L = λ_step·L_step + λ_lm·L_ans-lm.
- Baselines: Coconut (answer-level), CODI (trajectory-level), SFT-CoT (explicit), iCoT.
- Evaluation: in-domain GSM8K-Aug test set; out-of-domain GSM-Hard, MultiArith, SVAMP; model scales GPT-2 and LLaMA 1B/3B/8B.
- Key results: +8.2% over Coconut, +3.0% over CODI, +2.1% over SFT-CoT with 2.3× token efficiency; stable training that scales to 8-16 latent tokens while maintaining diversity and semantic grounding; interpretable, human-readable step visualization; plug-and-play with no inference overhead.
Q1. What is the main problem that SIM-CoT aims to solve?
- High computational costs of explicit Chain-of-Thought reasoning
- Latent instability when scaling implicit reasoning tokens
- Poor performance on mathematical word problems

Q2. How does SIM-CoT maintain efficiency during inference?
- By using smaller language models
- By compressing the reasoning steps
- By removing the auxiliary decoder after training

Q3. What unique advantage does SIM-CoT provide compared to previous implicit CoT methods?
- It achieves better performance with fewer training examples
- It provides interpretability by projecting latent tokens onto explicit reasoning vocabulary
- It eliminates the need for any supervision during training

Paper 2

EmbeddingGemma: Powerful and Lightweight Text Representations

Published: 2025-09-24

Link: http://arxiv.org/pdf/2509.20354

1. 📘 Topic and Domain: Development of EmbeddingGemma, a lightweight text embedding model for natural language processing, focusing on efficient text representation.
2. 💡 Previous Research and New Ideas: Builds on the Gemma 3 language-model family and encoder-decoder models; proposes a training recipe combining encoder-decoder initialization, geometric embedding distillation, and spread-out regularization.
3. ❓ Problem: The trade-off between model capability and computational cost in text embedding models, where state-of-the-art models are too large and expensive for real-world applications.
4. 🛠️ Methods: Uses a 308M parameter model initialized from T5Gemma encoder, trained with noise-contrastive estimation loss, spread-out regularizer, and embedding matching loss, combined with model souping from multiple finetuned checkpoints.
5. 📊 Results and Evaluation: Achieves state-of-the-art results on MTEB benchmarks for models under 500M parameters, outperforming larger models and maintaining performance even with quantization and embedding truncation.
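The noise-contrastive estimation loss mentioned in point 4 can be sketched in plain Python. This is a toy InfoNCE-style stand-in over single vectors (the real model computes similarities over batched embeddings, and `nce_loss`/`tau` here are illustrative names):

```python
import math

def cosine(a, b):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nce_loss(query, positive, negatives, tau=0.05):
    """InfoNCE-style contrastive loss: the positive document competes
    against negatives under a temperature tau (log-sum-exp stabilized)."""
    logits = [cosine(query, positive) / tau]
    logits += [cosine(query, n) / tau for n in negatives]
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)

q = [1.0, 0.0]
# Loss is near zero when the positive is close to the query...
good = nce_loss(q, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.2]])
# ...and large when a negative is closer than the positive.
bad = nce_loss(q, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.2]])
```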

Training pipeline (summarized):
- Initialization: Gemma 3 → T5Gemma encoder-decoder trained with the UL2 objective; the final architecture is a 24-layer transformer with bidirectional attention.
- Pre-finetuning on 314B tokens of large-scale unsupervised data, then finetuning on a high-quality 20B-token mixture.
- Embedding generation: query plus task prompt → mean pooling → linear projections.
- Losses: NCE contrastive loss, spread-out regularizer, and a distillation loss from a Gemini teacher.
- Robustness: model souping (parameter averaging over finetuned checkpoints) and quantization-aware training (int4/int8/mixed).
- Model: 308M parameters, 768-dim embeddings with MRL support for dimension truncation.
- Evaluation: MTEB (Multilingual, English, Code), XOR-Retrieve (cross-lingual), XTREME-UP (low-resource); state-of-the-art for models under 500M parameters, competitive with models 2× larger, robust to quantization.
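The MRL-style dimension truncation can be sketched as follows. `truncate_embedding` is a hypothetical helper; the point is that Matryoshka-trained embeddings are designed so a renormalized prefix of the vector remains a usable embedding:

```python
import math

def truncate_embedding(embedding, dim):
    """Matryoshka-style truncation: keep the first `dim` coordinates
    and rescale the result back to unit L2 norm."""
    head = embedding[:dim]
    n = math.sqrt(sum(x * x for x in head))
    return [x / n for x in head]

# Stand-in 768-dim embedding (the model's native output dimension).
full = [math.sin(i + 1) for i in range(768)]
small = truncate_embedding(full, 128)  # 6x smaller index footprint
```

Truncating to 128 dimensions trades a small amount of quality for much cheaper storage and similarity search, which is central to the on-device use case.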
Q1. What is the main innovation that allows EmbeddingGemma to achieve better performance compared to previous models of similar size?
- Using larger batch sizes during training
- Combining encoder-decoder initialization with geometric embedding distillation
- Adding more attention layers to the architecture

Q2. When EmbeddingGemma's embeddings are truncated to just 128 dimensions, what happens to its performance?
- It completely fails to work
- It maintains state-of-the-art performance for its size class
- It performs worse than random chance

Q3. What real-world application challenge does EmbeddingGemma specifically address?
- The need for models that can only work with English text
- The need for massive computing infrastructure
- The need for efficient, on-device deployment for privacy-sensitive applications

Paper 3

EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning

Published: 2025-09-24

Link: http://arxiv.org/pdf/2509.20360

1. 📘 Topic and Domain: A unified framework called EditVerse for both image and video editing/generation using in-context learning in computer vision.
2. 💡 Previous Research and New Ideas: In contrast to previous fragmented approaches to image/video editing, proposes a unified architecture that represents all modalities (text, image, video) as a single token sequence.
3. ❓ Problem: Addresses the fragmentation and data scarcity in video editing by creating a unified framework that can transfer knowledge from image to video domain.
4. 🛠️ Methods: Uses a transformer architecture with full self-attention, interleaved text/vision inputs, 4D rotary positional embeddings, and a scalable data pipeline generating 232K video editing samples.
5. 📊 Results and Evaluation: Achieves state-of-the-art performance on EditVerseBench (their proposed benchmark), surpassing existing open-source methods and commercial models in both automated metrics and user studies.
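The unified token stream with 4D positions can be sketched as a toy illustration. `build_sequence`, the marker strings, and the position layout are hypothetical, not EditVerse's actual tokenizer; they only show the idea of interleaving text and vision tokens, each carrying a (height, width, sequential, temporal) index for 4D RoPE:

```python
def build_sequence(text_tokens, frames, height, width):
    """Build one interleaved token stream: text tokens, then a vision
    span wrapped in start/end markers. Each token gets a 4D position
    (height, width, sequential, temporal); text only advances the
    sequential axis, while vision patches also use h, w, and t."""
    tokens, positions = [], []
    seq = 0
    for tok in text_tokens:
        tokens.append(tok)
        positions.append((0, 0, seq, 0))
        seq += 1
    tokens.append("<vision_start>")
    positions.append((0, 0, seq, 0))
    seq += 1
    for t in range(frames):          # temporal axis
        for h in range(height):      # spatial axes
            for w in range(width):
                tokens.append(f"patch_{t}_{h}_{w}")
                positions.append((h, w, seq, t))
    seq += 1
    tokens.append("<vision_end>")
    positions.append((0, 0, seq, 0))
    return tokens, positions

toks, pos = build_sequence(["edit", "the", "sky"], frames=2, height=2, width=2)
```

Because text and vision share one sequence, full self-attention lets every token attend to every other, which is what enables in-context transfer from image editing data to video editing.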

Framework overview (summarized):
- Inputs: text, image, and video at arbitrary resolutions; vision is tokenized by a VAE, text by T5.
- Unified representation: a single interleaved token stream with start/end vision tokens and 4D RoPE over height, width, sequential, and temporal dimensions.
- Model: a 2B dense transformer with full self-attention, trained with a flow-matching objective; in-context learning enables cross-modal knowledge transfer from image to video.
- Data pipeline: 232K video editing, 6M image editing, 4M video generation, and 2M image generation samples, filtered with a VLM.
- Tasks: object add/remove/change, style transfer, camera movement, mask detection and propagation; emergent abilities beyond the training tasks include material change, weather effects, multi-task combination, and reference insertion.
- EditVerseBench: 100 videos (horizontal and vertical), 20 editing categories, 200 editing pairs; evaluation via VLM scoring and user studies; surpasses open-source methods and is competitive with commercial models.
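The flow-matching training objective can be illustrated in one dimension. This is a minimal sketch of the standard linear-interpolation formulation (x_t = (1-t)·x_0 + t·x_1 with velocity target x_1 - x_0), not the paper's exact training code:

```python
def flow_matching_loss(v_model, x0, x1, t):
    """Linear-path flow matching in 1D: the model predicts the velocity
    of the probability path at (x_t, t); for a linear path the target
    is the constant velocity x1 - x0."""
    x_t = (1 - t) * x0 + t * x1
    target = x1 - x0
    return (v_model(x_t, t) - target) ** 2

x0, x1 = 0.0, 2.0
perfect = lambda x, t: x1 - x0         # predicts the true velocity
biased = lambda x, t: 0.5 * (x1 - x0)  # underestimates it by half
loss_perfect = flow_matching_loss(perfect, x0, x1, 0.3)
loss_biased = flow_matching_loss(biased, x0, x1, 0.3)
```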
Q1. What is the key innovation in EditVerse's architectural design that enables effective knowledge transfer between image and video domains?
- Using separate neural networks for image and video processing
- Representing all modalities as a unified token sequence with interleaved design
- Implementing a cascaded pipeline of specialized models

Q2. How did EditVerse address the challenge of limited video editing training data?
- By using only synthetic data generation
- By collecting manual annotations from experts
- By developing a pipeline that generates and filters 232K video editing samples combined with image editing data

Q3. What is a key limitation of EditVerse according to the paper?
- High computational cost due to full self-attention on long sequences
- Inability to handle high-resolution videos
- Limited support for different video formats