After showing off its cooking skills, the robot developed by a team of Chinese researchers at Stanford appeared in a new video, "A Day in the Life of Mobile ALOHA", released early this morning Beijing time. The video shows dozens of household skills, including watering flowers, cleaning a room, making coffee, shaving its owner, washing dishes, playing with a cat, taking out the garbage, doing laundry, changing quilt covers, and putting clothes away. It could fairly be called an "all-round housekeeper".
One netizen commented, "The rarest thing is that there is life in its eyes."
Researchers have long been troubled by "Moravec's paradox", the counterintuitive observation that "tasks that humans find easy are extremely difficult for artificial intelligence, and vice versa."
In other words, a robot that can do housework is a rare thing.
But don't celebrate too soon. Although Mobile ALOHA has "life in its eyes", its movements in the video are still controlled by humans (see the picture below); it is not operating fully autonomously.
One of the team leaders said that human control is temporary and that the team is already studying how to bridge the gap between human teleoperation and autonomous control. Tony Z. Zhao, another leader of the team, was optimistic: "2024 will be the year of the robot, and this (housekeeping robot) is just the beginning!"
The greater significance of Mobile ALOHA is that its motion control is more capable than that of comparable systems costing 5 to 10 times as much, which demonstrates the feasibility of general-purpose robots. A cheap, easy-to-use home robot may arrive soon.
ALOHA stands for "A Low-cost Open-source Hardware System for Bimanual Teleoperation". It is a low-cost, open-source hardware system for two-handed teleoperation, in other words an open-source robotic arm setup. Its algorithm, Action Chunking with Transformers (ACT), is built on the Transformer neural network architecture and gives the system its imitation learning capability. With just 15 minutes of demonstrations, the robotic arms can learn a task, performing end-to-end imitation learning directly from real demonstrations collected through a custom teleoperation interface.
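The "action chunking" in ACT's name refers to predicting a short sequence of future actions from one observation instead of a single action per step. The following is a minimal sketch of that idea only, not the team's implementation: `toy_policy` is a hypothetical stand-in for the learned Transformer, the "environment" is a toy in which the state simply equals the last executed action, and all names are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Action = float  # stand-in for a real vector of joint targets


def toy_policy(observation: float, chunk_size: int) -> List[Action]:
    """Hypothetical stand-in for a learned policy.

    Given one observation, predict the next `chunk_size` actions.
    Here the "prediction" is just a sequence of unit increments.
    """
    return [observation + step + 1 for step in range(chunk_size)]


def run_with_chunking(
    policy: Callable[[float, int], List[Action]],
    start: float,
    total_steps: int,
    chunk_size: int,
) -> Tuple[List[Action], int]:
    """Execute `total_steps` actions, querying the policy once per chunk."""
    observation = start
    executed: List[Action] = []
    queries = 0
    while len(executed) < total_steps:
        chunk = policy(observation, chunk_size)
        queries += 1
        remaining = total_steps - len(executed)
        # Execute the predicted chunk open-loop, truncated at the horizon.
        for action in chunk[:remaining]:
            executed.append(action)
            observation = action  # toy environment: state = last action
    return executed, queries


if __name__ == "__main__":
    actions, queries = run_with_chunking(toy_policy, 0.0, 10, 4)
    print(f"executed {len(actions)} actions with {queries} policy queries")
```

In this toy setup, a chunk size of 4 covers 10 steps with 3 policy queries, where a chunk size of 1 would need 10. Fewer decision points over the same horizon is the intuition behind chunking; a real system would replace `toy_policy` with a trained model and camera observations.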
According to the team, Mobile ALOHA can complete a variety of complex tasks with only 50 demonstrations each. The system costs only US$32,000 (about 220,000 yuan), and both the software and the hardware are open source.
The team describes the hardware configuration of Mobile ALOHA in detail in the paper. The most expensive components are the robotic arms and the mobile base, and the mobile base is a relatively cheap one among similar products. The sensors comprise 2 wrist cameras and 1 top camera. Power and compute are both onboard, with a 1.26 kWh battery weighing 14 kg. All computation during data collection and inference was performed on a consumer-grade laptop with an Nvidia 3070 Ti GPU (8 GB VRAM) and an Intel i7-12800H.
High-value parts in the bill of materials
Jim Fan, an "Internet celebrity researcher" at Nvidia and the first intern at OpenAI, had earlier predicted optimistically that 2024 would be the year the artificial intelligence community mounts a full-scale challenge to Moravec's paradox. "We will not win immediately, but we will be on the road to victory."
This is more than momentary excitement. New developments keep emerging across the industry. Jim listed the 2023 progress on foundation models and platforms for future robots:
1. Large multimodal models that use robotic arms as physical input/output devices: VIMA, PerAct, RVT (NVIDIA), RT-1, RT-2, PaLM-E (Google), RoboCat (DeepMind), Octo (Berkeley, Stanford, CMU), etc.;
2. Algorithms that bridge the gap between System 1 (responsible for low-level control) and System 2 (responsible for high-level reasoning), so that a system can make decisions through high-level reasoning and turn those decisions into actual operations and control: Eureka (NVIDIA), Code as Policies (Google), etc.;
3. Impressive progress in capable hardware: Tesla Optimus, Figure, 1X, Apptronik, Sanctuary, Agility + Amazon, Unitree, etc.;
4. Data has always been the Achilles heel of robotics. The research community is jointly working toward the next ImageNet (the dataset that was key to the deep learning breakthrough), for example by releasing the Open X-Embodiment (RT-X) dataset. Although the dataset is not yet diverse enough, it is an important step;
5. Simulation and synthetic data will play a key role in solving robot dexterity and even computer vision as a whole.
The foundation is being laid step by step. At the start of 2024, there is good reason to look forward to the debut of more capable robots.