Muon Particle - Search

wikipedia.org
https://en.wikipedia.org › wiki › Muon
Muon - Wikipedia
It is classified as a lepton. As with other leptons, the muon is not thought to be composed of any simpler particles. The muon is an unstable subatomic particle with a mean lifetime of 2.2 μs, …
zhihu.com
https://zhuanlan.zhihu.com
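Muon: A New-Generation Neural Network Optimizer - Zhihu
This idea of "customizing the optimizer for different network modules" is part of the latest trend in deep learning optimization. The Muon optimizer focuses on two-dimensional weight matrix parameters (for example, the weight matrix of a fully connected layer, or a convolution kernel tensor flattened into a matrix), and by applying to these …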
spaces.ac.cn
https://spaces.ac.cn › archives
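Muon Sequel: Why Did We Choose to Try Muon? - 科学空间|Scientific …
Feb 27, 2025 · This post walks through our latest technical report, "Muon is Scalable for LLM Training", which shares, for the Muon optimizer we introduced earlier in "Appreciating the Muon Optimizer: The Essential Leap from Vectors to Matrices", a …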
zhihu.com
https://zhuanlan.zhihu.com
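Appreciating the Muon Optimizer: The Essential Leap from Vectors to Matrices - Zhihu
Dec 19, 2024 · Tracing the origins: Muon also has an older piece of related work, "Shampoo: Preconditioned Stochastic Tensor Optimization", a 2018 paper that proposed an optimizer named Shampoo, which, in relation to Muon, …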
github.com
https://github.com › KellerJordan › Muon
GitHub - KellerJordan/Muon: Muon is an optimizer for hidden …
Muon is an optimizer for the hidden weights of a neural network. Other parameters, such as embeddings, classifier heads, and hidden gains/biases should be optimized using standard …
csdn.net
https://blog.csdn.net › shizheng_Li › article › details
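An In-Depth Analysis of the Muon Optimizer (Part 1): From Basic Principles to the Kimi K2 Large Model's …
Aug 14, 2025 · It makes training faster and more stable, and is especially well suited to large models. Put simply, Muon lets a model descend to a lower valley in fewer "steps" (compute resources), a 1.3x to 2x gain in efficiency. Why is Muon popular? At small …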
spaces.ac.cn
https://spaces.ac.cn › archives
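Appreciating the Muon Optimizer: The Essential Leap from Vectors to Matrices - 科学空间|Scientific …
Dec 10, 2024 · The most thought-provoking aspect of Muon is really the intrinsic difference between vectors and matrices, and how that difference affects optimization. The update rules of common optimizers such as SGD, Adam, and Tiger are element-wise; that is, regardless of whether the parameters are vectors or matrices, …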
zhihu.com
https://zhuanlan.zhihu.com
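Muon Sequel: Why Did We Choose to Try Muon? - Zhihu
Mar 5, 2025 · Work on optimizers is neither abundant nor scarce, so why did we choose Muon as a new direction to try? Given an Adam optimizer whose hyperparameters are already tuned, how can we quickly switch over to Muon for a trial? Once the model is scaled up …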
zhihu.com
https://zhuanlan.zhihu.com
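Moonshot AI (月之暗面) Open-Sources an Improved Muon Optimizer, with Compute Requirements Relative to AdamW …
Feb 23, 2025 · While training an 800-million-parameter model to 100B tokens (about 5x the compute-optimal budget), the team compared AdamW, Muon without weight decay, and Muon with weight decay. The results show that Muon with weight decay …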
arxiv.org
https://arxiv.org › abs
[2502.16982] Muon is Scalable for LLM Training - arXiv.org
Feb 24, 2025 · We identify two crucial techniques for scaling up Muon: (1) adding weight decay and (2) carefully adjusting the per-parameter update scale. These techniques allow Muon to …
