Researchers at Brown University have developed an artificial intelligence model that can generate movement in robots and animated figures in much the same way that AI models like ChatGPT generate text.
The model, called MotionGlot, enables users to simply type an action — “walk forward a few steps and take a right” — and the model can generate accurate representations of that motion to command a robot or animated avatar.
The model’s key advance, according to the researchers, is its ability to “translate” motion across robot and figure types, from humanoids to quadrupeds and beyond. That enables the generation of motion for a wide range of robotic embodiments and in all kinds of spatial configurations and contexts.
“We’re treating motion as simply another language,” said Sudarshan Harithas, a Ph.D. student in computer science at Brown, who led the work. “And just as we can translate languages — from English to Chinese, for example — we can now translate language-based commands to corresponding actions across multiple embodiments. That enables a broad set of new applications.”
The research, which was supported by the Office of Naval Research, will be presented later this month at the 2025 International Conference on Robotics and Automation in Atlanta. The work was co-authored by Harithas and his advisor, Srinath Sridhar, an assistant professor of computer science at Brown.
Large language models like ChatGPT generate text through a process called “next token prediction,” which breaks language down into a series of tokens, or small chunks, like individual words or characters. Given a single token or a string of tokens, the language model makes a prediction about what the next token might be. These models have been incredibly successful in generating text, and researchers have begun using similar approaches for motion. The idea is to break down the components of motion — the discrete positions of the legs during the process of walking, for example — into tokens. Once the motion is tokenized, fluid movements can be generated through next token prediction.
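As a rough illustration of that pipeline, here is a minimal Python sketch: continuous poses are quantized against a codebook, and motion is then generated one token at a time. Everything in it is an illustrative assumption — `MotionTokenizer`, the codebook shape, and the `predict_next` callback are hypothetical stand-ins, not MotionGlot's actual tokenizer or model.

```python
import numpy as np

class MotionTokenizer:
    """Quantizes continuous poses into discrete tokens using a fixed
    codebook (in practice the codebook would be learned, e.g. with a VQ-VAE)."""

    def __init__(self, codebook: np.ndarray):
        self.codebook = codebook  # shape: (vocab_size, pose_dim)

    def encode(self, poses: np.ndarray) -> list[int]:
        # Assign each pose (one row per frame) to its nearest codebook entry.
        dists = np.linalg.norm(poses[:, None, :] - self.codebook[None, :, :], axis=-1)
        return dists.argmin(axis=1).tolist()

    def decode(self, tokens: list[int]) -> np.ndarray:
        # Map tokens back to poses; a real decoder would also smooth the result.
        return self.codebook[np.array(tokens)]

def generate_motion(predict_next, prompt: list[int], steps: int) -> list[int]:
    """Greedy next-token prediction — the same loop a text model uses."""
    tokens = list(prompt)
    for _ in range(steps):
        logits = predict_next(tokens)          # hypothetical model call -> (vocab_size,)
        tokens.append(int(np.argmax(logits)))  # greedy decoding for brevity
    return tokens

# Toy usage with a random codebook and a dummy predictor, just to show the flow.
rng = np.random.default_rng(0)
tokenizer = MotionTokenizer(rng.normal(size=(256, 34)))   # 256 tokens, 34-D poses
prompt = tokenizer.encode(rng.normal(size=(4, 34)))       # 4 seed frames
poses = tokenizer.decode(generate_motion(lambda t: rng.normal(size=256), prompt, 16))
```

The key point is the loop at the end: once motion is expressed as tokens, the generation machinery is identical to a language model's.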
One challenge with this approach is that motions for one body type can look very different for another. For example, when a person is walking a dog down the street, the person and the dog are both doing something called “walking,” but their actual motions are very different. One is upright on two legs; the other is on all fours. According to Harithas, MotionGlot can translate the meaning of walking from one embodiment to another. So a user commanding a figure to “walk forward in a straight line” will get the correct motion output whether they happen to be commanding a humanoid figure or a robot dog.
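One way to picture that cross-embodiment translation is a prompt that tags the target body before the command, so the same text decodes into a different gait vocabulary. This is a hypothetical sketch — the article does not describe MotionGlot's actual token scheme, and the tag IDs below are made up:

```python
# Hypothetical embodiment tags reserved in the token vocabulary; the IDs and
# the prompt layout are illustrative assumptions, not MotionGlot's real scheme.
EMBODIMENT = {"humanoid": 1000, "quadruped": 1001}

def build_prompt(text_tokens: list[int], embodiment: str) -> list[int]:
    """Prefix the command with an embodiment tag so next-token prediction
    lands in the gait vocabulary of that body type."""
    return [EMBODIMENT[embodiment]] + text_tokens

walk_forward = [17, 42, 7]  # made-up token IDs for "walk forward in a straight line"
humanoid_prompt = build_prompt(walk_forward, "humanoid")    # two-legged gait
quadruped_prompt = build_prompt(walk_forward, "quadruped")  # four-legged gait
```

Fed to the same generator, the two prompts would yield motions that mean the same thing — walking forward — while differing in mechanics.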
To train their model, the researchers used two datasets, each containing hours of annotated motion data. QUAD-LOCO features dog-like quadruped robots performing a variety of actions along with rich text describing those movements. A similar dataset called QUES-CAP contains real human movement, along with detailed captions and annotations appropriate to each movement.
Using that training data, the model reliably generates appropriate actions from text prompts, even actions it has never specifically seen before. In testing, the model was able to recreate specific instructions, like “a robot walks backwards, turns left and walks forward,” as well as more abstract prompts like “a robot walks happily.” It can even use motion to answer questions. When asked “Can you show me movement in cardio activity?” the model generates a person jogging.
“These models work best when they’re trained on lots and lots of data,” Sridhar said. “If we could collect large-scale data, the model can be easily scaled up.”
The model’s current functionality and its adaptability across embodiments make for promising applications in human-robot collaboration, gaming and virtual reality, and digital animation and video production, the researchers say. They plan to make the model and its source code publicly available so other researchers can use it and expand on it.