1. 📘 Topic and Domain: The paper presents Qwen3-TTS, a family of advanced multilingual, controllable, robust, and streaming text-to-speech models in the speech synthesis domain.
2. 💡 Previous Research and New Ideas: The paper builds on discrete speech tokenization methods and autoregressive language modeling for TTS, proposing a novel dual-track LM architecture with two new speech tokenizers (25Hz semantic-focused and 12Hz ultra-low-latency multi-codebook) for real-time synthesis.
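The two frame rates quoted above translate directly into token budgets per second of audio. A minimal sketch of that arithmetic, assuming the 25Hz tokenizer emits one token per frame and the 12Hz multi-codebook tokenizer emits one token per codebook per frame (the codebook count below is a hypothetical placeholder, not a figure from the paper):

```python
def tokens_for_duration(frame_rate_hz: float, seconds: float, codebooks: int = 1) -> int:
    """Number of discrete tokens a clip of `seconds` of audio yields,
    given the tokenizer's frame rate and codebooks per frame."""
    return int(frame_rate_hz * seconds) * codebooks

# A 10-second clip under the 25Hz semantic-focused tokenizer:
semantic_tokens = tokens_for_duration(25, 10)  # 250 tokens

# The same clip under the 12Hz tokenizer with a hypothetical 4 codebooks:
multi_codebook_tokens = tokens_for_duration(12, 10, codebooks=4)  # 480 tokens
```

The lower 12Hz frame rate means fewer autoregressive steps per second of output, which is the lever behind the ultra-low-latency claim.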
3. ❓ Problem: The paper aims to solve the challenges of achieving stable, controllable, and human-like speech synthesis with low latency while supporting multiple languages, voice cloning, and fine-grained control through natural language instructions.
4. 🛠️ Methods: The authors use a dual-track autoregressive architecture with two custom tokenizers, train on over 5 million hours of speech data across 10 languages, employ a three-stage pre-training process followed by post-training with DPO (Direct Preference Optimization) and GSPO (Group Sequence Policy Optimization), and implement streaming through block-wise attention mechanisms.
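Block-wise attention is the standard mechanism for this kind of streaming: each position attends within its own block and to all earlier blocks, so a block can be decoded and its audio emitted before later blocks exist. A minimal sketch of such a mask, assuming a generic block-causal scheme rather than Qwen3-TTS's exact formulation:

```python
import numpy as np

def block_causal_mask(seq_len: int, block_size: int) -> np.ndarray:
    """Boolean mask where entry [i, j] is True if position i may attend
    to position j: allowed within the current block and to all earlier
    blocks, never to future blocks."""
    blocks = np.arange(seq_len) // block_size  # block index of each position
    return blocks[None, :] <= blocks[:, None]

mask = block_causal_mask(seq_len=6, block_size=2)
# Positions 0-1 form block 0, 2-3 block 1, 4-5 block 2; e.g. position 1
# can attend to position 0 and 1, but not to position 2 in the next block.
```

Compared with strict token-level causality, attending bidirectionally inside a block improves local coherence while still bounding how much future audio must be buffered before the first packet is sent.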
5. 📊 Results and Evaluation: Qwen3-TTS achieves state-of-the-art zero-shot voice cloning (lowest WER on the Seed-TTS benchmark), higher speaker similarity than commercial baselines across all 10 evaluated languages, and strong cross-lingual synthesis (a 66% error reduction in Chinese-to-Korean); it can generate over 10 minutes of natural speech and reaches first-packet latency as low as 97ms.