2025-05-06 Papers


Paper 1

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Published: 2025-05-05

Link: http://arxiv.org/pdf/2505.02707

1. 📘 Topic and Domain: Voice-language foundation models for real-time autonomous interaction and voice role-play, focusing on AI-human voice communication.
2. 💡 Previous Research and New Ideas: Builds on traditional pipeline systems (e.g., Siri, Alexa) and end-to-end audio-language models, introducing a new full-duplex architecture that enables simultaneous listening and speaking, with voice customization capabilities.
3. ❓ Problem: Addressing limitations of current voice AI systems including high latency, loss of vocal nuances, and rigid turn-based interactions that prevent natural, autonomous conversations.
4. 🛠️ Methods: Implemented hierarchical Transformer architecture with streaming audio encoding, multi-scale Transformers consisting of LLM backbone and hierarchical audio generator, trained end-to-end with extensive audio-text data.
5. 📊 Results and Evaluation: Achieved 195ms response latency (faster than human average), outperformed baselines in ASR (2.7% WER) and TTS (2.8% WER) tasks, and demonstrated superior performance on the Voila Benchmark across multiple domains.
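The hierarchical audio representation in point 4 rests on residual vector quantization (RVQ): each codebook level quantizes what the previous level left unexplained (in Voila's tokenizer, level 1 carries semantic content and levels 2-4 acoustic detail). A minimal NumPy sketch of RVQ encoding, with toy codebook sizes and dimensions that are illustrative assumptions, not Voila's actual tokenizer:

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Residual vector quantization: each codebook level quantizes the
    residual left by the previous level (in Voila, level 1 is semantic,
    levels 2-4 acoustic)."""
    tokens, residual = [], x
    for cb in codebooks:                                   # cb: (num_codes, dim)
        dists = np.linalg.norm(residual[:, None, :] - cb[None, :, :], axis=-1)
        idx = dists.argmin(axis=1)                         # nearest code per frame
        tokens.append(idx)
        residual = residual - cb[idx]                      # pass residual down
    return tokens

rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 8))                           # 5 frames, dim 8 (toy)
codebooks = [rng.normal(size=(16, 8)) for _ in range(4)]   # 4 RVQ levels
levels = rvq_encode(frames, codebooks)
print(len(levels), levels[0].shape)  # 4 (5,)
```

Each frame thus becomes a stack of four discrete tokens, one per level, which is what lets the generator predict coarse semantics before fine acoustics.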


Voila: Methodological Flow (Voila-e2e)

- Inputs: user speech, text instructions (e.g., a persona), and an audio sample as voice reference.
- Encoding: a streaming audio encoder feeds the Voila tokenizer (encoder), which converts the audio signal into discrete tokens — level-1 RVQ tokens carry semantic content, levels 2-4 carry acoustic detail. Wespeaker (a speaker encoder) produces a voice embedding from the reference audio; the instructions become text tokens.
- Core model: a hierarchical multi-scale Transformer. Text and audio tokens (plus embeddings) are interleaved and aligned, then processed by the voice-language LLM backbone, which handles the semantic information conditioned on the persona and voice embedding; an Audio Transformer (the hierarchical audio generator) predicts the output audio tokens.
- Decoding: the Voila tokenizer (decoder) reconstructs the audio signal from the semantic and acoustic tokens, producing the voice response.

Voila-autonomous extension (full-duplex interaction): the user's audio stream and Voila's own audio stream are each tokenized and embedded, the embeddings are fused (e.g., by averaging), and the fused stream goes to the LLM backbone; the rest of the flow is as above.
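The full-duplex extension fuses the two embedded audio streams before the backbone, with averaging named as one option. A minimal NumPy sketch of that fusion step — the codebook table, dimensions, and `embed` lookup are illustrative assumptions, not Voila's actual interfaces:

```python
import numpy as np

def embed(tokens: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Look up embeddings for a sequence of discrete audio tokens."""
    return table[tokens]                      # (seq_len, dim)

def fuse_streams(user_tokens, voila_tokens, table):
    """Fuse the user's and Voila's own audio streams by averaging their
    embeddings timestep-by-timestep (one fusion option the paper notes)."""
    user_emb = embed(user_tokens, table)
    voila_emb = embed(voila_tokens, table)
    return (user_emb + voila_emb) / 2.0       # fed on to the LLM backbone

rng = np.random.default_rng(0)
table = rng.normal(size=(1024, 8))            # toy codebook: 1024 tokens, dim 8
user = rng.integers(0, 1024, size=16)         # 16 aligned timesteps per stream
voila = rng.integers(0, 1024, size=16)
fused = fuse_streams(user, voila, table)
print(fused.shape)  # (16, 8)
```

Because both streams are embedded before fusion, the backbone sees a single sequence while the model keeps listening even as it speaks.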

Paper 2

RM-R1: Reward Modeling as Reasoning

Published: 2025-05-05

Link: http://arxiv.org/pdf/2505.02387

1. 📘 Topic and Domain: The paper introduces RM-R1, a new approach to reward modeling for large language models that frames it as a reasoning task, focusing on improving model evaluation and preference learning.
2. 💡 Previous Research and New Ideas: Building on existing scalar-based and generative reward models, it proposes integrating explicit reasoning capabilities into reward modeling through Chain-of-Rubrics prompting and structured evaluation.
3. ❓ Problem: The paper addresses the lack of interpretability and reliability in current reward models, which either produce opaque scalar scores or generate superficial judgments without deep reasoning.
4. 🛠️ Methods: Uses a two-stage training pipeline: first distilling high-quality reasoning traces from teacher models, then applying reinforcement learning with verifiable rewards (RLVR), while implementing a Chain-of-Rubrics framework for structured evaluation.
5. 📊 Results and Evaluation: RM-R1 achieved state-of-the-art or near state-of-the-art performance across multiple benchmarks (RewardBench, RM-Bench, RMB), outperforming larger models like Llama3.1-405B and GPT-4o by up to 13.8% in accuracy while providing more interpretable judgments.


RM-R1: Method Flow

Starting points: an instruction-tuned LLM (e.g., Qwen-2.5-Instruct), which lacks specialized reward-modeling reasoning, or an existing reasoning model (e.g., DeepSeek-R1-distilled), which already has strong reasoning from prior distillation.

Stage 1: Distillation of reasoning traces
- Goal: bootstrap reasoning ability for reward modeling.
- Subsample preference data D_sub from D, and synthesize high-quality structured reasoning traces r with oracle models (e.g., Claude, O3).
- Construct the ground truth y_trace = r ⊕ preferred_response, build the distillation dataset D_distill, and fine-tune the model with an NLL loss on D_distill.
- Output: a distilled reasoning reward model.

Stage 2: Reinforcement learning with verifiable rewards (RLVR)
- Objective: max E[R(x, j)] − β · D_KL(π_θ ‖ π_ref).
- Chain-of-Rubrics (CoR) rollout: system prompts elicit reasoning — a detailed prompt for instruct models (Fig. 3), a simpler one for reasoning models (Fig. 4). Instruct models first classify the task: for chat, generate rubrics, justify them, evaluate against them, and answer; for reasoning tasks, self-solve first (generate a <solution>), then evaluate and answer.
- Reward design: correctness-based. R(x, j | y_a, y_b) = +1 if the predicted label l̂ equals the true label l, and −1 otherwise; simplified from DeepSeek-R1 by omitting the format reward for efficiency.
- Optimization: Group Relative Policy Optimization (GRPO), a PPO variant needing no explicit value function; the baseline is the average reward of multiple sampled outputs for the same prompt, and the policy maximizes the GRPO objective (Eq. 7).

Final output: the RM-R1 model family (7B to 32B), achieving SOTA performance with highly interpretable reasoning traces and outperforming larger open-weight and proprietary models.
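The reward design and GRPO baseline described above are simple enough to sketch directly. Below, the ±1 correctness reward and the group-mean baseline follow the flow's description; note that GRPO implementations typically also normalize advantages by the group's standard deviation, which is omitted here for brevity:

```python
import numpy as np

def correctness_reward(predicted_label: str, true_label: str) -> float:
    """RM-R1's reward: +1 if the judged preference matches the ground-truth
    label, -1 otherwise (no separate format reward)."""
    return 1.0 if predicted_label == true_label else -1.0

def grpo_advantages(rewards):
    """GRPO baseline: the mean reward over a group of rollouts sampled for
    the same prompt; the advantage of each rollout is its reward minus
    that baseline (std normalization omitted for brevity)."""
    r = np.asarray(rewards, dtype=float)
    return r - r.mean()

# Four sampled judgments for one preference pair whose true label is "A"
group = ["A", "B", "A", "A"]
rewards = [correctness_reward(p, "A") for p in group]
adv = grpo_advantages(rewards)
print(rewards)       # [1.0, -1.0, 1.0, 1.0]
print(adv.tolist())  # [0.5, -1.5, 0.5, 0.5]
```

Because the baseline is computed from the group itself, no learned value function is needed — the key simplification GRPO offers over PPO.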
Q1
1. According to the paper, what is a major limitation of existing reward models that RM-R1 aims to overcome?
They can only process text data, not multimodal inputs.
They often produce opaque scalar scores or superficial judgments, lacking interpretability and deep reasoning.
They require an excessive amount of human feedback data compared to RM-R1.
Q2
2. The training pipeline for RM-R1 involves two key stages. What are they?
Supervised fine-tuning on human preferences followed by active learning.
Distillation of high-quality reasoning chains followed by reinforcement learning with verifiable rewards.
Pre-training on a large text corpus followed by direct preference optimization (DPO).
Q3
3. Based on the paper's analysis (Section 5), how does scaling affect RM-R1's performance?
Scaling has minimal impact on reasoning reward models, unlike traditional LLMs.
Larger model sizes and increased inference-time computation budgets lead to greater performance improvements.
Scaling primarily benefits the model's ability to generate rubrics but not its final judgment accuracy.

Paper 3

Practical Efficiency of Muon for Pretraining

Published: 2025-05-04

Link: http://arxiv.org/pdf/2505.02222

1. 📘 Topic and Domain: The paper explores the practical efficiency of Muon, a second-order optimizer, for pretraining large language models, in the domain of machine learning optimization.
2. 💡 Previous Research and New Ideas: Building on previous research on the AdamW optimizer and maximal update parameterization (muP), the paper proposes Muon as a more efficient alternative and introduces a novel "telescoping" algorithm for hyperparameter tuning.
3. ❓ Problem: The paper aims to solve two practical challenges in language model pretraining: finding an optimizer that delivers the best tradeoff between compute and time resources, and developing an efficient way to tune that optimizer without excessive computational cost.
4. 🛠️ Methods: The authors conducted extensive experiments comparing Muon and AdamW across different model sizes (100M-4B parameters), analyzed compute-time tradeoffs using Pareto frontiers, and implemented a telescoping algorithm for hyperparameter optimization.
5. 📊 Results and Evaluation: Results showed that Muon expands AdamW's Pareto frontier on the compute-time plane, requires 10-15% fewer tokens to reach identical loss, maintains efficiency at large batch sizes, and successfully works with muP for hyperparameter transfer up to 3.7B-parameter models.
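The 10-15% token saving in point 5 is measured as a token ratio R_L(B) = T_{L,A}(B) / T_{L,M}(B): the tokens AdamW needs over the tokens Muon needs to reach the same loss L at batch size B. A small illustration with made-up counts (not measured values from the paper):

```python
def token_ratio(tokens_adamw: float, tokens_muon: float) -> float:
    """R_L(B) = T_{L,A}(B) / T_{L,M}(B); a value > 1 favors Muon."""
    return tokens_adamw / tokens_muon

# Illustrative only: if Muon needs 10% fewer tokens than AdamW,
# the ratio comes out around 1.11.
print(round(token_ratio(1.0e9, 0.9e9), 2))  # 1.11
```

The paper's finding that R_L(B) is non-decreasing in B is what makes Muon attractive specifically in the large-batch regime.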


Paper Methodology: Practical Efficiency of Muon for Pretraining

Part 1: Muon vs. AdamW — compute-time tradeoff
1. Define the Muon optimizer. Core idea: matrix steepest descent under spectral-norm regularization, with an SVD-based update O_t = U V^T. In practice: Newton-Schulz iterations (instead of an explicit SVD), Nesterov momentum, learning-rate scaling, and weight decay.
2. Experimental setup: Transformer models up to 4B parameters, text and code data, Muon vs. AdamW, trained on TPU v5p.
3. Compute-time tradeoff study. Method: plot iso-loss frontiers (time vs. number of devices / batch size for 500M-parameter models). Finding: Muon expands the Pareto frontier over AdamW.
4. Relative data-efficiency analysis. Metric: the token ratio R_L(B) = T_{L,A}(B) / T_{L,M}(B) for a 1B model. Finding: R_L(B) > 1 and non-decreasing, i.e., Muon is more data-efficient at large batch sizes.

Part 2: Hyperparameter Tuning for Muon
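The Newton-Schulz step in Part 1 approximates the polar factor U V^T of the update matrix without computing an SVD. A minimal NumPy sketch using the textbook cubic Newton-Schulz iteration — Muon's actual implementation uses a tuned quintic variant, but the idea is the same:

```python
import numpy as np

def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 30) -> np.ndarray:
    """Approximate O = U V^T from the SVD G = U S V^T without an explicit SVD.
    Cubic Newton-Schulz: scale so singular values lie in (0, 1], then iterate
    x <- 1.5 x - 0.5 x x^T x, which drives every singular value toward 1."""
    x = g / (np.linalg.norm(g) + 1e-12)   # Frobenius-norm scaling
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

g = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                # toy "gradient" matrix
o = newton_schulz_orthogonalize(g)
u, _, vt = np.linalg.svd(g, full_matrices=False)
print(np.allclose(o, u @ vt, atol=1e-6))  # True
```

Replacing the magnitude information in G with a pure rotation U V^T is what makes the update "matrix steepest descent under the spectral norm", and the iteration costs only a few matrix multiplies per step, which maps well onto accelerators.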
Q1
1. According to the paper, what is the main advantage of Muon over AdamW demonstrated through the compute-time tradeoff analysis?
Muon significantly reduces the total number of FLOPs required for training compared to AdamW.
Muon explicitly expands the Pareto frontier over AdamW, offering more flexible resource allocation options.
Muon achieves lower training loss than AdamW but only at the cost of much longer training times.
Q2
2. What is the key mechanism identified in the paper that allows Muon to outperform AdamW, especially at large batch sizes?
Muon uses a novel method to automatically adjust the learning rate based on the gradient magnitude, unlike AdamW.
Muon maintains better data efficiency than AdamW in the large batch size regime, requiring fewer tokens to reach the same loss.
Muon parallelizes gradient computation across devices more effectively than AdamW, leading to faster wall-clock time.
Q3
3. The paper introduces a "telescoping" algorithm primarily for what purpose in the context of pretraining with Muon?
To reduce the memory footprint of large models during training by compressing weights.
To efficiently manage errors and conduct hyperparameter tuning using muP across different model scales.
To automatically determine the optimal number of training steps required for convergence.