DeepSeek uploaded its newest model, Prover V2, to the hosting service Hugging Face on April 30. Released under the permissive open-source MIT license, the model is aimed at verifying mathematical proofs.
Chinese artificial intelligence development company DeepSeek has released a new large language model (LLM) on the hosting service Hugging Face.
The latest model, Prover V2, is being released under the permissive open-source MIT license. It is a continuation of the Prover V1 line, first announced in August 2024. The first version of the model was presented in a paper titled “Prover: A Large Language Model for Compressing Mathematical Knowledge and Programming Lean 4.”
Prover V1 was trained to translate math competition problems into the Lean 4 programming language, which was developed at Microsoft Research and is used for proving theorems. The model was based on DeepSeek’s seven-billion-parameter DeepSeekMath model and was fine-tuned on synthetic data. Synthetic data is training data that was itself generated by AI models, an approach that has become more common as human-generated data is increasingly seen as a scarce source of high-quality training material.
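To give a sense of what that target language looks like, below is a minimal, hypothetical Lean 4 snippet (not taken from DeepSeek’s paper): a simple arithmetic statement formalized as a theorem that Lean’s checker can verify mechanically.

```lean
-- Hypothetical illustration: a simple statement formalized in Lean 4.
-- A prover model's job is to emit statements and proofs in this form so
-- that the Lean kernel can check them mechanically.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```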
Prover V1.5, in turn, improved on the previous version by optimizing both training and execution and achieving higher accuracy on several common benchmarks.
The new Prover V2 model has 671 billion parameters and weighs in at approximately 650 GB; because of its size, it is expected to be run from RAM or VRAM. To get the weights down to this size, they have been quantized to 8-bit floating-point precision, meaning each parameter is approximated so that it takes half the space of the usual 16 bits (a bit being a single digit in binary). This effectively halves the model’s bulk.
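As a rough back-of-the-envelope check of those figures (a sketch assuming one byte per parameter at 8-bit precision and two bytes at 16-bit, ignoring any non-weight overhead):

```python
# Rough size estimate for a 671-billion-parameter model at different precisions.
params = 671_000_000_000

size_fp16_gb = params * 2 / 1e9   # 16-bit floats: 2 bytes per parameter -> ~1342 GB
size_fp8_gb = params * 1 / 1e9    # 8-bit floats: 1 byte per parameter  -> ~671 GB

print(f"16-bit: ~{size_fp16_gb:.0f} GB")
print(f" 8-bit: ~{size_fp8_gb:.0f} GB")  # in the same ballpark as the ~650 GB figure
```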
So far, the improvements introduced by Prover V2 are unclear, as no research paper or other information has been published at the time of writing. The number of parameters in the Prover V2 weights suggests that it is likely to be based on the company’s previous R1 model. When it was first released, R1 made waves in the AI space with its performance comparable to the then state-of-the-art OpenAI’s o1 model.
The importance of open weights
Publicly releasing the weights of LLMs is a controversial topic. On one side, it is a democratizing force that allows the public to access AI on their own terms without relying on private company infrastructure.
On the other side, it means that the company cannot step in and prevent abuse of the model by enforcing certain limitations on dangerous user queries. The release of R1 in this manner also raised security concerns, and some described it as China’s “Sputnik moment.”
Open-source proponents rejoiced that DeepSeek picked up where Meta left off with its LLaMA series of open-source AI models, proving that open AI is a serious contender to OpenAI’s closed AI. The accessibility of those models is also constantly improving.
Now, even users without access to a supercomputer that costs more than the average home in much of the world can run LLMs locally. This is primarily thanks to two AI development techniques: model distillation and quantization.
Distillation refers to training a compact “student” network to replicate the behavior of a larger “teacher” model, so you keep most of the performance while cutting parameters to make it accessible to less powerful hardware. Quantization consists of reducing the numeric precision of a model’s weights and activations to shrink size and boost inference speed with only minor accuracy loss.
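As an illustration of the distillation idea, here is a generic PyTorch-style sketch (not DeepSeek’s actual training code; the models, batch, and temperature are placeholders) in which a student is trained to match a teacher’s softened output distribution:

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, batch, optimizer, temperature=2.0):
    """One knowledge-distillation step: train the student to match the
    teacher's softened output distribution via KL divergence."""
    with torch.no_grad():
        teacher_logits = teacher(batch)      # larger, frozen "teacher" model
    student_logits = student(batch)          # smaller "student" model being trained

    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2                       # standard temperature scaling of the loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```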
An example is Prover V2’s reduction from 16-bit to 8-bit floating-point numbers, and further savings are possible by halving the bit width again. Both techniques have consequences for model performance, but they usually leave the model largely functional.
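For a concrete sense of what quantization does, here is a simplified sketch using symmetric 8-bit integer quantization of a single weight tensor (a common textbook scheme, not the 8-bit floating-point format the Prover V2 checkpoint actually uses):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric 8-bit quantization: store int8 values plus one float scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([0.42, -1.30, 0.07, 2.15], dtype=np.float32)
q, s = quantize_int8(w)
print(q)                  # int8 values, one byte each instead of four
print(dequantize(q, s))   # close approximation of the original weights
```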
DeepSeek’s R1 was distilled into versions with retrained LLaMA and Qwen models ranging from 70 billion parameters to as low as 1.5 billion parameters. The smallest of those models can even reliably be run on some mobile devices.