Daily AI News Brief - July 9, 2025
Ten major AI developments including Alibaba's ThinkSound chain-of-thought audio generation, Google Veo3 photo-to-video upgrade, Hugging Face SmolLM3 with 128K context, Alibaba's WebSailor web agent, M...
AIToolery
Published Jul 09, 2025
July 9, 2025 brings ten significant AI developments spanning audio generation, video creation, language models, web agents, customer support, enterprise applications, education initiatives, and advanced reasoning capabilities.
1️⃣ Alibaba Tongyi Open Sources ThinkSound: Chain-of-Thought Audio Generation Model
Alibaba's speech AI team has open-sourced ThinkSound, the world's first audio generation model to support chain-of-thought reasoning. By introducing chain-of-thought reasoning, the model breaks through the limitations of traditional video-to-audio technology, achieving high-fidelity, synchronized spatial audio generation and marking AI audio's evolution from simple sound matching to structured scene understanding.
Revolutionary Technical Features:
- Multimodal Integration: ThinkSound is the first to combine a multimodal large language model with a unified audio generation architecture for precise audio synthesis
- AudioCoT Dataset: The research team built a dataset containing 2,531.8 hours of high-quality samples, strengthening the model's ability to handle complex instructions
- Superior Performance: ThinkSound outperforms mainstream methods across multiple test sets, with code and pre-trained weights open-sourced for free developer access
Available on GitHub (https://github.com/FunAudioLLM/ThinkSound), Hugging Face (https://huggingface.co/spaces/FunAudioLLM/ThinkSound), and ModelScope (https://www.modelscope.cn/studios/iic/ThinkSound), the model offers three scalable sizes from 533M to 1.3B parameters, supporting deployment across high-performance servers to edge devices for diverse creative applications.
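For developers who want to try the released checkpoints locally, the standard Hugging Face Hub client can fetch them. The sketch below is a minimal illustration only: the repo_id is assumed from the Space URL above, and the project's GitHub README remains the authoritative setup guide.

```python
# Minimal sketch: downloading ThinkSound checkpoints with huggingface_hub.
# The repo_id is an assumption inferred from the Space linked above;
# verify the actual repository name and layout in the project README.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="FunAudioLLM/ThinkSound",    # assumed model repository id
    local_dir="./thinksound_weights",    # destination folder for the checkpoints
)
print(f"ThinkSound checkpoints downloaded to {local_path}")
```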
2️⃣ Google Veo3 Major Upgrade: Static Photo to Dynamic Video Support
Google has announced a major upgrade to its AI video generation tool Veo3, enabling users to generate high-quality audio and video content by uploading a single static photo. The upgrade demonstrates AI's tremendous potential in creative fields, with core functionality including character consistency across multiple shots and rich camera movement features such as dolly shots.
Enhanced Capabilities:
- Photo-to-Video Generation: The upgraded Veo3 can generate high-quality dynamic videos from a single static image
- Camera Movement Support: Supports camera moves such as dolly-in shots, giving videos a more professional feel
- Quality Options: Users can choose between generation models of different quality tiers, each consuming a corresponding number of credits
The photo-to-video capability transforms still images into dynamic eight-second video clips with sound, rolling out to Google AI Pro and Ultra subscribers in select countries worldwide, with over 40 million Veo3 videos generated across platforms.
3️⃣ Hugging Face Releases New Generation Small Parameter Model SmolLM3: 128K Context, Dual-Mode Reasoning
Hugging Face has released SmolLM3, a small open-source model with 3 billion parameters that outperforms Llama-3.2-3B and Qwen2.5-3B. The model supports multilingual processing with dual-mode reasoning functionality while openly sharing architectural details to promote research and optimization.
Model Specifications:
- Parameter Excellence: SmolLM3 packs 3 billion parameters yet outperforms comparable open-source models, with multilingual support
- Dual Reasoning Modes: Offers both a deep-thinking mode and a non-thinking mode, letting users trade reasoning depth for speed as needed
- Advanced Architecture: Built on a transformer decoder architecture with three-stage mixed training to strengthen capabilities
Available on Hugging Face (https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base), the fully open model offers long context support up to 128K tokens using YaRN, multilingual support for six languages, and hybrid reasoning capabilities with both think and no-think modes for diverse application scenarios.
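As a rough illustration of how the 3B base checkpoint cited above can be run, the sketch below uses the standard transformers text-generation API. It assumes a recent transformers release with SmolLM3 support; the comment about the instruct variant's reasoning flags reflects the model card's description and has not been verified here.

```python
# Minimal sketch: running the SmolLM3-3B base checkpoint with transformers.
# Assumes a recent transformers version that includes SmolLM3 support.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B-Base"  # base checkpoint cited above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "A 128K-token context window is useful because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# The instruct variant (reportedly HuggingFaceTB/SmolLM3-3B) is described as
# toggling its deep-thinking mode via "/think" and "/no_think" system-prompt
# flags; check the model card before relying on that behavior.
```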
4️⃣ Alibaba Open Sources WebSailor with Powerful Reasoning and Retrieval Capabilities
Alibaba Tongyi has open-sourced the web agent WebSailor, which performs strongly on the BrowseComp evaluation sets for both Chinese and English tasks, surpassing models such as DeepSeek R1 and Grok-3 in reasoning and retrieval. Galaxy Securities notes that the AI agent economy has fully launched and recommends focusing on leading SaaS enterprises.
Performance Achievements:
- Superior Benchmarks: The open-sourced WebSailor leads the BrowseComp (English and Chinese) evaluation sets, demonstrating excellent reasoning and retrieval capabilities
- Market Recognition: Galaxy Securities says the AI agent economy has fully launched and recommends focusing on leading SaaS enterprises
- Industry Applications: Companies such as Focus Technology and CSSC Financial hold notable advantages in agent technology applications
Available on GitHub (https://github.com/Alibaba-NLP/WebAgent), WebSailor uses a novel training methodology focused on high-uncertainty problems, enabling it to navigate vast digital landscapes to find answers, which the team frames as a key step toward superhuman reasoning capabilities.
5️⃣ Moonvalley Releases Marey Realism v1.5: Native 1080P AI Video Model with Zero Copyright Risk
Moonvalley has launched the Marey Realism v1.5 AI video generation model, delivering comprehensive upgrades in image quality, creative freedom, and legal compliance. Its native 1080P video generation, training data based entirely on licensed content, and ability to accurately interpret complex prompts provide safer and more efficient tools for film production and advertising creative work.
Key Features:
- Native 1080P Generation: Produces native 1080P video, delivering a visual experience close to real filming
- 100% Licensed Training: Trained entirely on licensed data, eliminating copyright risk
- Flexible Generation: Supports both text-to-video and image-to-video generation, enhancing creative flexibility
Available in ComfyUI with native API nodes, the model was trained exclusively on licensed content ensuring commercial safety while delivering stunning clarity without upscaling, featuring prompt accuracy, cinematic motions, and excellent physics and lighting.
6️⃣ Vidu Q1 Shocking Upgrade: Reference-to-Video Supports Up to Seven Images
Vidu Q1's reference-to-video feature lets users upload up to seven reference images and generates 1080p videos with very high visual consistency. Semantic fusion keeps elements from multiple images consistent within the video, solving the scene breaks and character distortion common in traditional AI video generation.
Advanced Capabilities:
- Seven Image Support: Supports up to seven reference images, enhancing video creation flexibility
- Semantic Fusion Technology: Ensures multi-image elements maintain high consistency in videos
- Multi-Subject Consistency: Multi-subject consistency technology delivers a coherent visual experience even in complex scenes
The professional-grade video generation model delivers superior video quality with refined visual effects, ideal for creators and teams seeking high-quality, efficient AI video production with ultra-fast text parsing and scene synthesis capabilities.
7️⃣ Apple Develops ChatGPT-Style AI Customer Support Assistant
Apple is developing an AI-based Support Assistant aimed at giving users a more intelligent and efficient customer service experience. The feature was discovered in code within the Apple Support app and will let users receive AI-generated solutions before contacting a human agent, improving service efficiency.
Assistant Capabilities:
- AI Support Development: Apple is developing an AI-based support assistant to improve customer service efficiency
- Pre-Contact Solutions: Users can receive AI-generated problem solutions before contacting customer service, reducing wait times
- File Upload Support: The assistant may also allow file uploads, enriching the interaction
The ChatGPT-style generative AI assistant uses generative models to provide answers related to Apple products and services, with clear disclaimers about potential inaccuracies and recommendations not to rely solely on AI for professional advice.
8️⃣ Feishu Launches Multiple AI Products, Building Enterprise-Level Doubao
Feishu has released multiple AI products, including knowledge Q&A, AI meetings, Aily, and Feishu Miaoda, aiming to accelerate AI adoption in enterprise applications. Feishu also launched the industry's first AI application maturity model to help enterprises evaluate the actual effectiveness of AI products.
Product Suite:
- Multiple AI Products: Feishu launched several AI products to help enterprises operate more intelligently
- Maturity Model: Released an AI application maturity model to sharpen enterprises' judgment of AI products
- Performance Enhancement: Feishu's multidimensional tables achieve a dual leap in performance and AI capability, supporting large-scale data processing
The comprehensive AI suite aims to provide enterprises with truly usable and implementable AI products, helping achieve comprehensive intelligence in the AI era with intelligent assistant capabilities that understand enterprise needs.
9️⃣ Microsoft, OpenAI, and Anthropic Launch AI Training Center for Educators
The American Federation of Teachers (AFT) has partnered with Microsoft, OpenAI, and Anthropic to establish the National Academy for AI Instruction, providing free AI tool training for teachers to help them better use artificial intelligence. The project is backed by $23 million in funding to drive technological transformation in education.
Training Program:
- Teacher AI Training: Teachers will master new technologies through AI training, helping them keep a leading role in the classroom
- Funding Support: Microsoft, OpenAI, and Anthropic provide $23 million funding support for AI education projects
- Education Democratization: The academy is committed to democratizing AI in education, ensuring the technology serves students and teachers
The collaborative initiative aims to ensure teachers maintain leadership roles in education while embracing AI capabilities, with comprehensive training programs designed to integrate AI tools effectively into educational workflows and classroom management.
🔟 Kunlun Wanwei Launches Skywork-R1V3.0: Multimodal Reasoning Approaching Human Expert Level
Kunlun Wanwei has released Skywork-R1V3.0, which demonstrates excellent multimodal reasoning: trained on relatively few samples yet delivering outstanding performance, it reaches human-expert level on complex reasoning tasks and in multimodal understanding scenarios.
Expert-Level Performance:
- Multimodal Excellence: Skywork-R1V3.0 demonstrates exceptional multimodal reasoning capabilities approaching human expert performance
- Efficient Training: Achieves outstanding results with relatively few training samples, demonstrating efficient learning capabilities
- Human-Level Reasoning: Reaches human expert levels in complex multimodal reasoning and analysis tasks
The model represents significant advancement in AI reasoning capabilities, bridging the gap between artificial and human intelligence in complex analytical tasks while maintaining efficiency in training and deployment across diverse application scenarios.