Daily AI News Brief - July 9, 2025
Ten major AI developments including Alibaba's ThinkSound chain-of-thought audio generation, Google Veo3 photo-to-video upgrade, Hugging Face SmolLM3 with 128K context, Alibaba's WebSailor web agent, M...
AIToolery
Published Jul 09, 2025
July 9, 2025 brings ten significant AI developments spanning audio generation, video creation, language models, web agents, customer support, enterprise applications, education initiatives, and advanced reasoning capabilities.
1️⃣ Alibaba Tongyi Open Sources ThinkSound: Chain-of-Thought Audio Generation Model
Alibaba's speech AI team has open-sourced ThinkSound, the world's first audio generation model to support chain-of-thought reasoning. By introducing chain-of-thought reasoning, the model breaks through the limitations of traditional video-to-audio technology, achieving high-fidelity, synchronized spatial audio generation and marking AI audio's evolution from simple sound matching to structured scene understanding.
Revolutionary Technical Features:
- Multimodal Integration: ThinkSound is the first to combine a multimodal large language model with a unified audio generation architecture for precise audio synthesis
- AudioCoT Dataset: The research team built a dataset containing 2,531.8 hours of high-quality samples, strengthening the model's ability to handle complex instructions
- Superior Performance: ThinkSound outperforms mainstream methods across multiple test sets, with code and pre-trained weights open-sourced for free developer access
Available on GitHub (https://github.com/FunAudioLLM/ThinkSound), Hugging Face (https://huggingface.co/spaces/FunAudioLLM/ThinkSound), and ModelScope (https://www.modelscope.cn/studios/iic/ThinkSound), the model offers three scalable sizes from 533M to 1.3B parameters, supporting deployment across high-performance servers to edge devices for diverse creative applications.
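For developers who want to try the released checkpoints locally, the standard Hugging Face Hub client can fetch them. The sketch below is a minimal illustration only: the repo_id is assumed from the Space URL above, and the project's GitHub README remains the authoritative setup guide.

```python
# Minimal sketch: downloading ThinkSound checkpoints with huggingface_hub.
# The repo_id is an assumption inferred from the Space linked above;
# verify the actual repository name and layout in the project README.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="FunAudioLLM/ThinkSound",    # assumed model repository id
    local_dir="./thinksound_weights",    # destination folder for the checkpoints
)
print(f"ThinkSound checkpoints downloaded to {local_path}")
```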
2️⃣ Google Veo3 Major Upgrade: Static Photo to Dynamic Video Support
Google has announced a major upgrade to its AI video generation tool Veo3, enabling users to generate high-quality audio and video content by uploading a single static photo. The upgrade demonstrates AI's tremendous potential in creative fields, with core functionality including character consistency across multiple shots and rich camera movement features such as dolly shots.
Enhanced Capabilities:
- Photo-to-Video Generation: The upgraded Veo3 can generate high-quality dynamic videos from a single static image
- Camera Movement Support: Supports camera moves such as dolly-in shots, giving videos a more professional feel
- Quality Options: Users can choose between generation models of different quality tiers, each consuming a corresponding number of credits
The photo-to-video capability transforms still images into dynamic eight-second video clips with sound, rolling out to Google AI Pro and Ultra subscribers in select countries worldwide, with over 40 million Veo3 videos generated across platforms.
3️⃣ Hugging Face Releases New Generation Small Parameter Model SmolLM3: 128K Context, Dual-Mode Reasoning
Hugging Face has released SmolLM3, a small open-source model with 3 billion parameters that outperforms Llama-3.2-3B and Qwen2.5-3B. The model supports multilingual processing with dual-mode reasoning functionality while openly sharing architectural details to promote research and optimization.
Model Specifications:
- Parameter Excellence: SmolLM3 packs 3 billion parameters yet outperforms comparable open-source models, with multilingual support
- Dual Reasoning Modes: Offers both a deep-thinking mode and a non-thinking mode, letting users trade reasoning depth for speed as needed
- Advanced Architecture: Built on a transformer decoder architecture with three-stage mixed training to strengthen capabilities
Available on Hugging Face (https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base), the fully open model offers long context support up to 128K tokens using YaRN, multilingual support for six languages, and hybrid reasoning capabilities with both think and no-think modes for diverse application scenarios.
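As a rough illustration of how the 3B base checkpoint cited above can be run, the sketch below uses the standard transformers text-generation API. It assumes a recent transformers release with SmolLM3 support; the comment about the instruct variant's reasoning flags reflects the model card's description and has not been verified here.

```python
# Minimal sketch: running the SmolLM3-3B base checkpoint with transformers.
# Assumes a recent transformers version that includes SmolLM3 support.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B-Base"  # base checkpoint cited above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "A 128K-token context window is useful because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# The instruct variant (reportedly HuggingFaceTB/SmolLM3-3B) is described as
# toggling its deep-thinking mode via "/think" and "/no_think" system-prompt
# flags; check the model card before relying on that behavior.
```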
4️⃣ Alibaba Open Sources WebSailor with Powerful Reasoning and Retrieval Capabilities
Alibaba Tongyi has open-sourced the web agent WebSailor, which performs strongly on the BrowseComp evaluation sets for both Chinese and English tasks, surpassing models such as DeepSeek R1 and Grok-3 in reasoning and retrieval. Galaxy Securities notes that the AI agent economy has fully launched and recommends focusing on leading SaaS enterprises.
Performance Achievements:
- Superior Benchmarks: The open-sourced WebSailor leads the BrowseComp (English and Chinese) evaluation sets, demonstrating excellent reasoning and retrieval capabilities
- Market Recognition: Galaxy Securities says the AI agent economy has fully launched and recommends focusing on leading SaaS enterprises
- Industry Applications: Companies such as Focus Technology and CSSC Financial hold notable advantages in agent technology applications
Available on GitHub (https://github.com/Alibaba-NLP/WebAgent), WebSailor uses a novel training methodology focused on high-uncertainty problems, enabling it to navigate vast digital landscapes to find answers, which the team frames as a key step toward superhuman reasoning capabilities.
5️⃣ Moonvalley Releases Marey Realism v1.5: Native 1080P AI Video Model with Zero Copyright Risk
Moonvalley has launched the Marey Realism v1.5 AI video generation model, delivering comprehensive upgrades in image quality, creative freedom, and legal compliance. Its native 1080P video generation, training data based entirely on licensed content, and ability to accurately interpret complex prompts provide safer and more efficient tools for film production and advertising creative work.
Key Features:
- Native 1080P Generation: Produces native 1080P video, delivering a visual experience close to real filming
- 100% Licensed Training: Trained entirely on licensed data, eliminating copyright risk
- Flexible Generation: Supports both text-to-video and image-to-video generation, enhancing creative flexibility
Available in ComfyUI with native API nodes, the model was trained exclusively on licensed content ensuring commercial safety while delivering stunning clarity without upscaling, featuring prompt accuracy, cinematic motions, and excellent physics and lighting.
6️⃣ Vidu Q1 Shocking Upgrade: Reference-to-Video Supports Up to Seven Images
Vidu Q1's reference-to-video feature lets users upload up to seven reference images and generates 1080p videos with very high visual consistency. Semantic fusion keeps elements from multiple images consistent within the video, solving the scene breaks and character distortion common in traditional AI video generation.
Advanced Capabilities:
- Seven Image Support: Supports up to seven reference images, enhancing video creation flexibility
- Semantic Fusion Technology: Ensures multi-image elements maintain high consistency in videos
- Multi-Subject Consistency: Multi-subject consistency technology delivers a coherent visual experience even in complex scenes
The professional-grade video generation model delivers superior video quality with refined visual effects, ideal for creators and teams seeking high-quality, efficient AI video production with ultra-fast text parsing and scene synthesis capabilities.
7️⃣ Apple Develops ChatGPT-Style AI Customer Support Assistant
Apple is developing an AI-based Support Assistant aimed at giving users a more intelligent and efficient customer service experience. The feature was discovered in code within the Apple Support app and will let users receive AI-generated solutions before contacting a human agent, improving service efficiency.
Assistant Capabilities:
- AI Support Development: Apple is developing an AI-based support assistant to improve customer service efficiency
- Pre-Contact Solutions: Users can receive AI-generated problem solutions before contacting customer service, reducing wait times
- File Upload Support: The assistant may also allow file uploads, enriching the interaction
The ChatGPT-style generative AI assistant uses generative models to provide answers related to Apple products and services, with clear disclaimers about potential inaccuracies and recommendations not to rely solely on AI for professional advice.
8️⃣ Feishu Launches Multiple AI Products, Building Enterprise-Level Doubao
Feishu has released multiple AI products, including knowledge Q&A, AI meetings, Aily, and Feishu Miaoda, aiming to accelerate AI adoption in enterprise applications. Feishu also launched the industry's first AI application maturity model to help enterprises evaluate the actual effectiveness of AI products.
Product Suite:
- Multiple AI Products: Feishu launched several AI products to help enterprises operate more intelligently
- Maturity Model: Released an AI application maturity model to sharpen enterprises' judgment of AI products
- Performance Enhancement: Feishu's multidimensional tables achieve a dual leap in performance and AI capability, supporting large-scale data processing
The comprehensive AI suite aims to provide enterprises with truly usable and implementable AI products, helping achieve comprehensive intelligence in the AI era with intelligent assistant capabilities that understand enterprise needs.
9️⃣ Microsoft, OpenAI, and Anthropic Launch AI Training Center for Educators
The American Federation of Teachers (AFT) has partnered with Microsoft, OpenAI, and Anthropic to establish the National Academy for AI Instruction, providing free AI tool training for teachers to help them better use artificial intelligence. The project is backed by $23 million in funding to drive technological transformation in education.
Training Program:
- Teacher AI Training: Teachers will master new technologies through AI training, helping them keep a leading role in the classroom
- Funding Support: Microsoft, OpenAI, and Anthropic provide $23 million funding support for AI education projects
- Education Democratization: The academy is committed to democratizing AI in education, ensuring the technology serves students and teachers
The collaborative initiative aims to ensure teachers maintain leadership roles in education while embracing AI capabilities, with comprehensive training programs designed to integrate AI tools effectively into educational workflows and classroom management.
🔟 Kunlun Wanwei Launches Skywork-R1V3.0: Multimodal Reasoning Approaching Human Expert Level
Kunlun Wanwei has released Skywork-R1V3.0, which demonstrates excellent multimodal reasoning: trained on relatively few samples yet delivering outstanding performance, it reaches human-expert level on complex reasoning tasks and in multimodal understanding scenarios.
Expert-Level Performance:
- Multimodal Excellence: Skywork-R1V3.0 demonstrates exceptional multimodal reasoning capabilities approaching human expert performance
- Efficient Training: Achieves outstanding results with relatively few training samples, demonstrating efficient learning capabilities
- Human-Level Reasoning: Reaches human expert levels in complex multimodal reasoning and analysis tasks
The model represents significant advancement in AI reasoning capabilities, bridging the gap between artificial and human intelligence in complex analytical tasks while maintaining efficiency in training and deployment across diverse application scenarios.