Brown University Researchers Develop an AI Model That Can Generate Motion in Robots and Animated Figures

2025/05/09 03:08

Researchers at Brown University have developed an artificial intelligence model that can generate movement in robots and animated figures in much the same way that AI models like ChatGPT generate text.

The model, called MotionGlot, enables users to simply type an action — “walk forward a few steps and take a right”— and the model can generate accurate representations of that motion to command a robot or animated avatar.

The model’s key advance, according to the researchers, is its ability to “translate” motion across robot and figure types, from humanoids to quadrupeds and beyond. That enables the generation of motion for a wide range of robotic embodiments and in all kinds of spatial configurations and contexts.

“We’re treating motion as simply another language,” said Sudarshan Harithas, a Ph.D. student in computer science at Brown, who led the work. “And just as we can translate languages — from English to Chinese, for example — we can now translate language-based commands to corresponding actions across multiple embodiments. That enables a broad set of new applications.”

The research, which was supported by the Office of Naval Research, will be presented later this month at the 2025 International Conference on Robotics and Automation in Atlanta. The work was co-authored by Harithas and his advisor, Srinath Sridhar, an assistant professor of computer science at Brown.

Large language models like ChatGPT generate text through a process called “next token prediction,” which breaks language down into a series of tokens, or small chunks, like individual words or characters. Given a single token or a string of tokens, the language model makes a prediction about what the next token might be. These models have been incredibly successful in generating text, and researchers have begun using similar approaches for motion. The idea is to break down the components of motion— the discrete position of legs during the process of walking, for example — into tokens. Once the motion is tokenized, fluid movements can be generated through next token prediction.
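
The tokenize-then-predict idea described above can be sketched in a few lines. This is a toy illustration, not MotionGlot's actual code: the motion vocabulary and the hand-written transition table are invented stand-ins for a learned model, but the loop shows how a fluid gait emerges from repeatedly predicting the next motion token.

```python
# Toy sketch of next-token prediction over discrete motion tokens.
# The vocabulary and transition weights are hypothetical, standing in
# for the learned model the article describes.
import random

MOTION_VOCAB = ["<start>", "lift_left_leg", "plant_left_leg",
                "lift_right_leg", "plant_right_leg", "<end>"]

# Hand-written transition table playing the role of a trained model:
# given the last token, it scores each candidate next token.
TRANSITIONS = {
    "<start>": {"lift_left_leg": 1.0},
    "lift_left_leg": {"plant_left_leg": 1.0},
    "plant_left_leg": {"lift_right_leg": 0.9, "<end>": 0.1},
    "lift_right_leg": {"plant_right_leg": 1.0},
    "plant_right_leg": {"lift_left_leg": 0.9, "<end>": 0.1},
}

def generate_motion(max_len=12, seed=0):
    """Autoregressively sample a walking-gait token sequence."""
    rng = random.Random(seed)
    tokens = ["<start>"]
    while len(tokens) < max_len:
        candidates = TRANSITIONS.get(tokens[-1], {"<end>": 1.0})
        choices, weights = zip(*candidates.items())
        nxt = rng.choices(choices, weights=weights)[0]
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return tokens[1:]  # drop the <start> marker

print(generate_motion())
```

In a real system the transition table would be replaced by a transformer conditioned on the text prompt, and each token would decode to a full-body pose rather than a named leg movement.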

One challenge with this approach is that motions for one body type can look very different for another. For example, when a person is walking a dog down the street, the person and the dog are both doing something called “walking,” but their actual motions are very different. One is upright on two legs; the other is on all fours. According to Harithas, MotionGlot can translate the meaning of walking from one embodiment to another. So a user commanding a figure to “walk forward in a straight line” will get the correct motion output whether they happen to be commanding a humanoid figure or a robot dog.
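
The cross-embodiment "translation" can be illustrated with a minimal sketch: the same abstract command resolves to different concrete token sequences depending on the target body. The vocabularies and mapping below are invented for illustration and are not MotionGlot's actual representation.

```python
# Hypothetical per-embodiment motion vocabularies: the same abstract
# action ("walk_forward") maps to different concrete token sequences.
EMBODIMENT_TOKENS = {
    "humanoid": {"walk_forward": ["step_left", "step_right"]},
    "quadruped": {"walk_forward": ["trot_front_left", "trot_rear_right",
                                   "trot_front_right", "trot_rear_left"]},
}

def translate_command(command: str, embodiment: str) -> list[str]:
    """Resolve an abstract action into the token sequence of one embodiment."""
    try:
        return EMBODIMENT_TOKENS[embodiment][command]
    except KeyError:
        raise ValueError(f"no motion defined for {command!r} on {embodiment!r}")

# The same "walking" meaning yields different concrete motions:
print(translate_command("walk_forward", "humanoid"))
print(translate_command("walk_forward", "quadruped"))
```

The point mirrors the dog-walking example: "walk forward" is one meaning with two embodiment-specific realizations, which is what lets one model drive both a humanoid figure and a robot dog.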

To train their model, the researchers used two datasets, each containing hours of annotated motion data. QUAD-LOCO features dog-like quadruped robots performing a variety of actions along with rich text describing those movements. A similar dataset called QUES-CAP contains real human movement, along with detailed captions and annotations appropriate to each movement.

Using that training data, the model reliably generates appropriate actions from text prompts, even actions it has never specifically seen before. In testing, the model was able to recreate specific instructions, like “a robot walks backwards, turns left and walks forward,” as well as more abstract prompts like “a robot walks happily.” It can even use motion to answer questions. When asked “Can you show me movement in cardio activity?” the model generates a person jogging.

“These models work best when they’re trained on lots and lots of data,” Sridhar said. “If we could collect large-scale data, the model can be easily scaled up.”

The model’s current functionality and the adaptability across embodiments make for promising applications in human-robot collaboration, gaming and virtual reality, and digital animation and video production, the researchers say. They plan to make the model and its source code publicly available so other researchers can use it and expand on it.

Disclaimer: info@kdj.com

The information provided is not trading advice. kdj.com assumes no liability for any investments made based on the information provided in this article. Cryptocurrencies are highly volatile; research thoroughly and invest with caution.

If you believe content used on this website infringes your copyright, please contact us immediately (info@kdj.com) and we will remove it promptly.
