-
HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation arXiv.cs.RO Pub Date : 2024-03-15 Carmelo Sferrazza, Dun-Ming Huang, Xingyu Lin, Youngwoon Lee, Pieter Abbeel
Humanoid robots hold great promise in assisting humans in diverse environments and tasks, due to their flexibility and adaptability leveraging human-like morphology. However, research in humanoid robots is often bottlenecked by the costly and fragile hardware setups. To accelerate algorithmic research in humanoid robots, we present a high-dimensional, simulated robot learning benchmark, HumanoidBench
-
Reconfigurable Robot Identification from Motion Data arXiv.cs.RO Pub Date : 2024-03-15 Yuhang Hu, Yunzhe Wang, Ruibo Liu, Zhou Shen, Hod Lipson
Integrating Large Language Models (LLMs) and Vision-Language Models (VLMs) with robotic systems enables robots to process and understand complex natural language instructions and visual information. However, a fundamental challenge remains: for robots to fully capitalize on these advancements, they must have a deep understanding of their physical embodiment. The gap between AI models' cognitive capabilities
-
Lifelong LERF: Local 3D Semantic Inventory Monitoring Using FogROS2 arXiv.cs.RO Pub Date : 2024-03-15 Adam Rashid, Chung Min Kim, Justin Kerr, Letian Fu, Kush Hari, Ayah Ahmad, Kaiyuan Chen, Huang Huang, Marcus Gualtieri, Michael Wang, Christian Juette, Nan Tian, Liu Ren, Ken Goldberg
Inventory monitoring in homes, factories, and retail stores relies on maintaining data despite objects being swapped, added, removed, or moved. We introduce Lifelong LERF, a method that allows a mobile robot with minimal compute to jointly optimize a dense language and geometric representation of its surroundings. Lifelong LERF maintains this representation over time by detecting semantic changes and
-
Stimulate the Potential of Robots via Competition arXiv.cs.RO Pub Date : 2024-03-15 Kangyao Huang, Di Guo, Xinyu Zhang, Xiangyang Ji, Huaping Liu
It is common to feel pressure in a competitive environment, which arises from the desire to succeed relative to other individuals or opponents. Although the pressure can make us anxious, it can also drive us to realize our full potential in order to keep up with others. Inspired by this, we propose a competitive learning framework which is able to help individual
-
Online Concurrent Multi-Robot Coverage Path Planning arXiv.cs.RO Pub Date : 2024-03-15 Ratijit Mitra, Indranil Saha
Recently, centralized receding horizon online multi-robot coverage path planning algorithms have shown remarkable scalability in thoroughly exploring large, complex, unknown workspaces with many robots. In a horizon, the path planning and the path execution interleave, meaning when the path planning occurs for robots with no paths, the robots with outstanding paths do not execute, and subsequently
-
Partially Observable Task and Motion Planning with Uncertainty and Risk Awareness arXiv.cs.RO Pub Date : 2024-03-15 Aidan Curtis, George Matheos, Nishad Gothoskar, Vikash Mansinghka, Joshua Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
Integrated task and motion planning (TAMP) has proven to be a valuable approach to generalizable long-horizon robotic manipulation and navigation problems. However, the typical TAMP problem formulation assumes full observability and deterministic action effects. These assumptions limit the ability of the planner to gather information and make decisions that are risk-aware. We propose a strategy for
-
H-MaP: An Iterative and Hybrid Sequential Manipulation Planner arXiv.cs.RO Pub Date : 2024-03-15 Berk Cicek, Cankut Bora Tuncer, Busenaz Kerimgil, Ozgur S. Oguz
This study introduces the Hybrid Sequential Manipulation Planner (H-MaP), a novel approach that iteratively does motion planning using contact points and waypoints for complex sequential manipulation tasks in robotics. Combining optimization-based methods for generalizability and sampling-based methods for robustness, H-MaP enhances manipulation planning through active contact mode switches and enables
-
SculptDiff: Learning Robotic Clay Sculpting from Humans with Goal Conditioned Diffusion Policy arXiv.cs.RO Pub Date : 2024-03-15 Alison Bartsch, Arvind Car, Charlotte Avra, Amir Barati Farimani
Manipulating deformable objects remains a challenge within robotics due to the difficulties of state estimation, long-horizon planning, and predicting how the object will deform given an interaction. These challenges are the most pronounced with 3D deformable objects. We propose SculptDiff, a goal-conditioned diffusion-based imitation learning framework that works with point cloud state observations
-
Collaborative Aquatic Positioning system Utilising Multi-beam Sonar and Depth Sensors arXiv.cs.RO Pub Date : 2024-03-15 Xueliang Cheng, Barry Lennox, Keir Groves
Accurate positioning of underwater robots in confined environments is crucial for inspection and mapping tasks and is also a prerequisite for autonomous operations. Presently, there are no positioning systems available that are suited for real-world use in confined underwater environments, are unconstrained by environmental lighting and water turbidity levels, and have sufficient accuracy for reliable and
-
Revolutionizing Packaging: A Robotic Bagging Pipeline with Constraint-aware Structure-of-Interest Planning arXiv.cs.RO Pub Date : 2024-03-15 Jiaming Qi, Peng Zhou, Pai Zheng, Hongmin Wu, Chenguang Yang, David Navarro-Alarcon, Jia Pan
Bagging operations, common in packaging and assisted living applications, are challenging due to a bag's complex deformable properties. To address this, we develop a robotic system for automated bagging tasks using an adaptive structure-of-interest (SOI) manipulation approach. Our method relies on real-time visual feedback to dynamically adjust manipulation without requiring prior knowledge of bag
-
An Investigation of the Factors Influencing Evolutionary Dynamics in the Joint Evolution of Robot Body and Control arXiv.cs.RO Pub Date : 2024-03-15 Léni K. Le Goff, Edgar Buchanan, Emma Hart
In evolutionary robotics, jointly optimising the design and the controller of robots is a challenging task due to the huge complexity of the solution space formed by the possible combinations of body and controller. We focus on the evolution of robots that can be physically created rather than just simulated, in a rich morphological space that includes a voxel-based chassis, wheels, legs and sensors
-
EasyCalib: Simple and Low-Cost In-Situ Calibration for Force Reconstruction with Vision-Based Tactile Sensors arXiv.cs.RO Pub Date : 2024-03-15 Mingxuan Li, Lunwei Zhang, Yen Hang Zhou, Tiemin Li, Yao Jiang
For elastomer-based tactile sensors, represented by visuotactile sensors, routine calibration of mechanical parameters (Young's modulus and Poisson's ratio) has been shown to be important for force reconstruction. However, the reliance on existing in-situ calibration methods for accurate force measurements limits their cost-effective and flexible applications. This article proposes a new in-situ calibration
-
Ultra-Wideband Positioning System Based on ESP32 and DWM3000 Modules arXiv.cs.RO Pub Date : 2024-03-15 Sebastian Krebs, Tom Herter
In this paper, an Ultra-Wideband (UWB) positioning system is introduced that leverages six identical custom-designed boards, each featuring an ESP32 microcontroller and a DWM3000 module from Qorvo. The system is capable of achieving localization with an accuracy of up to 10 cm by utilizing Two-Way-Ranging (TWR) measurements between one designated tag and five anchor devices. The gathered distance
-
Grasp Anything: Combining Teacher-Augmented Policy Gradient Learning with Instance Segmentation to Grasp Arbitrary Objects arXiv.cs.RO Pub Date : 2024-03-15 Malte Mosbach, Sven Behnke
Interactive grasping from clutter, akin to human dexterity, is one of the longest-standing problems in robot learning. Challenges stem from the intricacies of visual perception, the demand for precise motor skills, and the complex interplay between the two. In this work, we present Teacher-Augmented Policy Gradient (TAPG), a novel two-stage learning framework that synergizes reinforcement learning
-
Comparative Analysis of Programming by Demonstration Methods: Kinesthetic Teaching vs Human Demonstration arXiv.cs.RO Pub Date : 2024-03-15 Bruno Maric, Filip Zoric, Frano Petric, Matko Orsag
Programming by demonstration (PbD) is a simple and efficient way to program robots without explicit robot programming. PbD enables unskilled operators to easily demonstrate and guide different robots to execute tasks. In this paper, we present a comparison of demonstration methods with a comprehensive user study. Each participant had to demonstrate drawing a simple pattern with human demonstration using virtual
-
Do Visual-Language Maps Capture Latent Semantics? arXiv.cs.RO Pub Date : 2024-03-15 Matti Pekkanen, Tsvetomila Mihaylova, Francesco Verdoja, Ville Kyrki
Visual-language models (VLMs) have recently been introduced in robotic mapping by using the latent representations, i.e., embeddings, of the VLMs to represent the natural language semantics in the map. The main benefit is moving beyond a small set of human-created labels toward open-vocabulary scene understanding. While there is anecdotal evidence that maps built this way support downstream tasks,
-
Autonomous Monitoring of Pharmaceutical R&D Laboratories with 6 Axis Arm Equipped Quadruped Robot and Generative AI: A Preliminary Study arXiv.cs.RO Pub Date : 2024-03-15 Shunichi Hato, Nozomi Ogawa
This paper presents a proof-of-concept study that examines the utilization of generative AI and mobile robotics for autonomous laboratory monitoring in the pharmaceutical R&D laboratory. The study investigates the potential advantages of anomaly detection and automated reporting by multi-modal model and Vision Foundation Model (VFM), which have the potential to enhance compliance and safety in laboratory
-
Belief Aided Navigation using Bayesian Reinforcement Learning for Avoiding Humans in Blind Spots arXiv.cs.RO Pub Date : 2024-03-15 Jinyeob Kim, Daewon Kwak, Hyunwoo Rim, Donghan Kim
Recent research on mobile robot navigation has focused on socially aware navigation in crowded environments. However, existing methods do not adequately account for human robot interactions and demand accurate location information from omnidirectional sensors, rendering them unsuitable for practical applications. In response to this need, this study introduces a novel algorithm, BNBRL+, predicated
-
Agile and Safe Trajectory Planning for Quadruped Navigation with Motion Anisotropy Awareness arXiv.cs.RO Pub Date : 2024-03-15 Wentao Zhang, Shaohang Xu, Peiyuan Cai, Lijun Zhu
Quadruped robots demonstrate robust and agile movements in various terrains; however, their navigation autonomy is still insufficient. One of the challenges is that the motion capabilities of the quadruped robot are anisotropic along different directions, which significantly affects the safety of quadruped robot navigation. This paper proposes a navigation framework that takes into account the motion
-
HeR-DRL: Heterogeneous Relational Deep Reinforcement Learning for Decentralized Multi-Robot Crowd Navigation arXiv.cs.RO Pub Date : 2024-03-15 Xinyu Zhou, Songhao Piao, Wenzheng Chi, Liguo Chen, Wei Li
Crowd navigation has received significant research attention in recent years, especially DRL-based methods. While single-robot crowd scenarios have dominated research, they offer limited applicability to real-world complexities. The heterogeneity of interaction among multiple agent categories, as in decentralized multi-robot pedestrian scenarios, is frequently disregarded. This "interaction blind
-
GeoPro-VO: Dynamic Obstacle Avoidance with Geometric Projector Based on Velocity Obstacle arXiv.cs.RO Pub Date : 2024-03-15 Jihao Huang, Xuemin Chi, Jun Zeng, Zhitao Liu, Hongye Su
Optimization-based approaches are widely employed to generate optimal robot motions while considering various constraints, such as robot dynamics, collision avoidance, and physical limitations. It is crucial to efficiently solve the optimization problems in practice, yet achieving rapid computations remains a great challenge for optimization-based approaches with nonlinear constraints. In this paper
-
Towards Embedding Dynamic Personas in Interactive Robots: Masquerading Animated Social Kinematics (MASK) arXiv.cs.RO Pub Date : 2024-03-15 Jeongeun Park, Taemoon Jeong, Hyeonseong Kim, Taehyun Byun, Seungyoon Shin, Keunjun Choi, Jaewoon Kwon, Taeyoon Lee, Matthew Pan, Sungjoon Choi
This paper presents the design and development of an innovative interactive robotic system to enhance audience engagement using character-like personas. Built upon the foundations of persona-driven dialog agents, this work extends the agent application to the physical realm, employing robots to provide a more immersive and interactive experience. The proposed system, named the Masquerading Animated
-
Language to Map: Topological map generation from natural language path instructions arXiv.cs.RO Pub Date : 2024-03-15 Hideki Deguchi, Kazuki Shibata, Shun Taguchi
In this paper, a method for generating a map from path information described using natural language (textual path) is proposed. In recent years, robotics research has mainly focused on vision-and-language navigation (VLN), a navigation task based on images and textual paths. Although VLN is expected to facilitate user instructions to robots, its current implementation requires users to explain the details
-
CLOSURE: Fast Quantification of Pose Uncertainty Sets arXiv.cs.RO Pub Date : 2024-03-15 Yihuai Gao, Yukai Tang, Han Qi, Heng Yang
We investigate uncertainty quantification of 6D pose estimation from keypoint measurements. Assuming unknown-but-bounded measurement noises, a pose uncertainty set (PURSE) is a subset of SE(3) that contains all possible 6D poses compatible with the measurements. Despite being simple to formulate and its ability to embed uncertainty, the PURSE is difficult to manipulate and interpret due to the many
-
Interactive Distance Field Mapping and Planning to Enable Human-Robot Collaboration arXiv.cs.RO Pub Date : 2024-03-15 Usama Ali, Lan Wu, Adrian Mueller, Fouad Sukkar, Tobias Kaupp, Teresa Vidal-Calleja
Human-robot collaborative applications require scene representations that are kept up-to-date and facilitate safe motions in dynamic scenes. In this letter, we present an interactive distance field mapping and planning (IDMP) framework that handles dynamic objects and collision avoidance through an efficient representation. We define interactive mapping and planning as the process of creating
-
Advancing Object Goal Navigation Through LLM-enhanced Object Affinities Transfer arXiv.cs.RO Pub Date : 2024-03-15 Mengying Lin, Yaran Chen, Dongbin Zhao, Zhaoran Wang
In object goal navigation, agents navigate towards objects identified by category labels using visual and spatial information. Previously, solely network-based methods typically rely on historical data for object affinities estimation, lacking adaptability to new environments and unseen targets. Simultaneously, employing Large Language Models (LLMs) for navigation as either planners or agents, though
-
Design and Control Co-Optimization for Automated Design Iteration of Dexterous Anthropomorphic Soft Robotic Hands arXiv.cs.RO Pub Date : 2024-03-15 Pragna Mannam, Xingyu Liu, Ding Zhao, Jean Oh, Nancy Pollard
We automate soft robotic hand design iteration by co-optimizing design and control policy for dexterous manipulation skills in simulation. Our design iteration pipeline combines genetic algorithms and policy transfer to learn control policies for nearly 400 hand designs, testing grasp quality under external force disturbances. We validate the optimized designs in the real world through teleoperation
-
Right Place, Right Time! Towards ObjectNav for Non-Stationary Goals arXiv.cs.RO Pub Date : 2024-03-14 Vishnu Sashank Dorbala, Bhrij Patel, Amrit Singh Bedi, Dinesh Manocha
We present a novel approach to tackle the ObjectNav task for non-stationary and potentially occluded targets in an indoor environment. We refer to this task as Portable ObjectNav (or P-ObjectNav), and in this work, present its formulation, feasibility, and a navigation benchmark using a novel memory-enhanced LLM-based policy. In contrast to ObjectNav where target object locations are fixed for each episode
-
DTG : Diffusion-based Trajectory Generation for Mapless Global Navigation arXiv.cs.RO Pub Date : 2024-03-14 Jing Liang, Amirreza Payandeh, Daeun Song, Xuesu Xiao, Dinesh Manocha
We present a novel end-to-end diffusion-based trajectory generation method, DTG, for mapless global navigation in challenging outdoor scenarios with occlusions and unstructured off-road features like grass, buildings, bushes, etc. Given a distant goal, our approach computes a trajectory that satisfies the following goals: (1) minimize the travel distance to the goal; (2) maximize the traversability
-
Visual Inertial Odometry using Focal Plane Binary Features (BIT-VIO) arXiv.cs.RO Pub Date : 2024-03-14 Matthew Lisondra, Junseo Kim, Riku Murai, Kourosh Zareinia, Sajad Saeedi
Focal-Plane Sensor-Processor Arrays (FPSPs) are an emerging technology that can execute vision algorithms directly on the image sensor. Unlike conventional cameras, FPSPs perform computation on the image plane -- at individual pixels -- enabling high frame rate image processing while consuming low power, making them ideal for mobile robotics. FPSPs, such as the SCAMP-5, use parallel processing and
-
Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting arXiv.cs.RO Pub Date : 2024-03-14 Aiden Swann, Matthew Strong, Won Kyung Do, Gadiel Sznaier Camps, Mac Schwager, Monroe Kennedy III
In this work, we propose a novel method to supervise 3D Gaussian Splatting (3DGS) scenes using optical tactile sensors. Optical tactile sensors have become widespread in their use in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable to directly supervise a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly
-
Constrained Passive Interaction Control: Leveraging Passivity and Safety for Robot Manipulators arXiv.cs.RO Pub Date : 2024-03-14 Zhiquan Zhang, Tianyu Li, Nadia Figueroa
Passivity is necessary for robots to fluidly collaborate and interact with humans physically. Nevertheless, due to the unconstrained nature of passivity-based impedance control laws, the robot is vulnerable to infeasible and unsafe configurations upon physical perturbations. In this paper, we propose a novel control architecture that allows a torque-controlled robot to guarantee safety constraints
-
MultiGripperGrasp: A Dataset for Robotic Grasping from Parallel Jaw Grippers to Dexterous Hands arXiv.cs.RO Pub Date : 2024-03-14 Luis Felipe Casas Murrilo, Ninad Khargonkar, Balakrishnan Prabhakaran, Yu Xiang
We introduce a large-scale dataset named MultiGripperGrasp for robotic grasping. Our dataset contains 30.4M grasps from 11 grippers for 345 objects. These grippers range from two-finger grippers to five-finger grippers, including a human hand. All grasps in the dataset are verified in Isaac Sim to classify them as successful and unsuccessful grasps. Additionally, the object fall-off time for each grasp
-
NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices arXiv.cs.RO Pub Date : 2024-03-15 Zhiyong Zhang, Huaizu Jiang, Hanumant Singh
Real-time high-accuracy optical flow estimation is a crucial component in various applications, including localization and mapping in robotics, object tracking, and activity recognition in computer vision. While recent learning-based optical flow methods have achieved high accuracy, they often come with heavy computation costs. In this paper, we propose a highly efficient optical flow architecture
-
Thermal-NeRF: Neural Radiance Fields from an Infrared Camera arXiv.cs.RO Pub Date : 2024-03-15 Tianxiang Ye, Qi Wu, Junyuan Deng, Guoqing Liu, Liu Liu, Songpengcheng Xia, Liang Pang, Wenxian Yu, Ling Pei
In recent years, Neural Radiance Fields (NeRFs) have demonstrated significant potential in encoding highly-detailed 3D geometry and environmental appearance, positioning themselves as a promising alternative to traditional explicit representation for 3D scene reconstruction. However, the predominant reliance on RGB imaging presupposes ideal lighting conditions: a premise frequently unmet in robotic
-
RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception arXiv.cs.RO Pub Date : 2024-03-15 Ruiyang Hao, Siqi Fan, Yingru Dai, Zhenlin Zhang, Chenxi Li, Yuntian Wang, Haibao Yu, Wenxian Yang, Jirui Yuan, Zaiqing Nie
The value of roadside perception, which could extend the boundaries of autonomous driving and traffic management, has gradually become more prominent and acknowledged in recent years. However, existing roadside perception approaches only focus on the single-infrastructure sensor system, which cannot realize a comprehensive understanding of a traffic area because of the limited sensing range and blind
-
Towards Comprehensive Multimodal Perception: Introducing the Touch-Language-Vision Dataset arXiv.cs.RO Pub Date : 2024-03-14 Ning Cheng, You Li, Jing Gao, Bin Fang, Jinan Xu, Wenjuan Han
Tactility provides crucial support and enhancement for the perception and interaction capabilities of both humans and robots. Nevertheless, the multimodal research related to touch primarily focuses on visual and tactile modalities, with limited exploration in the domain of language. Beyond vocabulary, sentence-level descriptions contain richer semantics. Based on this, we construct a touch-language-vision
-
BOP Challenge 2023 on Detection, Segmentation and Pose Estimation of Seen and Unseen Rigid Objects arXiv.cs.RO Pub Date : 2024-03-14 Tomas Hodan, Martin Sundermeyer, Yann Labbe, Van Nguyen Nguyen, Gu Wang, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, Jiri Matas
We present the evaluation methodology, datasets and results of the BOP Challenge 2023, the fifth in a series of public competitions organized to capture the state of the art in model-based 6D object pose estimation from an RGB/RGB-D image and related tasks. Besides the three tasks from 2022 (model-based 2D detection, 2D segmentation, and 6D localization of objects seen during training), the 2023 challenge
-
Neuromorphic force-control in an industrial task: validating energy and latency benefits arXiv.cs.RO Pub Date : 2024-03-13 Camilo Amaya, Evan Eames, Gintautas Palinauskas, Alexander Perzylo, Yulia Sandamirskaya, Axel von Arnim
As robots become smarter and more ubiquitous, optimizing the power consumption of intelligent compute becomes imperative towards ensuring the sustainability of technological advancements. Neuromorphic computing hardware makes use of biologically inspired neural architectures to achieve energy and latency improvements compared to conventional von Neumann computing architecture. Applying these benefits
-
GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping arXiv.cs.RO Pub Date : 2024-03-14 Yuhang Zheng, Xiangyu Chen, Yupeng Zheng, Songen Gu, Runyi Yang, Bu Jin, Pengfei Li, Chengliang Zhong, Zengmao Wang, Lina Liu, Chao Yang, Dawei Wang, Zhen Chen, Xiaoxiao Long, Meiqing Wang
Constructing a 3D scene capable of accommodating open-ended language queries is a pivotal pursuit, particularly within the domain of robotics. Such technology facilitates robots in executing object manipulations based on human language directives. To tackle this challenge, some research efforts have been dedicated to the development of language-embedded implicit fields. However, implicit fields (e
-
Scalable Autonomous Drone Flight in the Forest with Visual-Inertial SLAM and Dense Submaps Built without LiDAR arXiv.cs.RO Pub Date : 2024-03-14 Sebastián Barbas Laina, Simon Boche, Sotiris Papatheodorou, Dimos Tzoumanikas, Simon Schaefer, Hanzhi Chen, Stefan Leutenegger
Forestry constitutes a key element for a sustainable future, while it is supremely challenging to introduce digital processes to improve efficiency. The main limitation is the difficulty of obtaining accurate maps at high temporal and spatial resolution as a basis for informed forestry decision-making, due to the vast area forests extend over and the sheer number of trees. To address this challenge
-
ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models arXiv.cs.RO Pub Date : 2024-03-14 Runyu Ma, Jelle Luijkx, Zlatan Ajanovic, Jens Kober
In image-based robot manipulation tasks with large observation and action spaces, reinforcement learning struggles with low sample efficiency, slow training speed, and uncertain convergence. As an alternative, large pre-trained foundation models have shown promise in robotic manipulation, particularly in zero-shot and few-shot applications. However, using these models directly is unreliable due to
-
Enhancing Trust in Autonomous Agents: An Architecture for Accountability and Explainability through Blockchain and Large Language Models arXiv.cs.RO Pub Date : 2024-03-14 Laura Fernández-Becerra, Miguel Ángel González-Santamarta, Ángel Manuel Guerrero-Higueras, Francisco Javier Rodríguez-Lera, Vicente Matellán Olivera
The deployment of autonomous agents in environments involving human interaction has increasingly raised security concerns. Consequently, understanding the circumstances behind an event becomes critical, requiring the development of capabilities to justify their behaviors to non-expert users. Such explanations are essential in enhancing trustworthiness and safety, acting as a preventive measure against
-
PaperBot: Learning to Design Real-World Tools Using Paper arXiv.cs.RO Pub Date : 2024-03-14 Ruoshi Liu, Junbang Liang, Sruthi Sudhakar, Huy Ha, Cheng Chi, Shuran Song, Carl Vondrick
Paper is a cheap, recyclable, and clean material that is often used to make practical tools. Traditional tool design either relies on simulation or physical analysis, which is often inaccurate and time-consuming. In this paper, we propose PaperBot, an approach that directly learns to design and use a tool in the real world using paper without human intervention. We demonstrated the effectiveness and
-
Development of control algorithms for mobile robotics focused on their potential use for FPGA-based robots arXiv.cs.RO Pub Date : 2024-03-14 Andrés-David Suárez-Gómez, Andres A. Hernandez Ortega
This paper investigates the development and optimization of control algorithms for mobile robotics, with a keen focus on their implementation in Field-Programmable Gate Arrays (FPGAs). It delves into both classical control approaches such as PID and modern techniques including deep learning, addressing their application in sectors ranging from industrial automation to medical care. The study highlights
-
MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion arXiv.cs.RO Pub Date : 2024-03-14 Arul Selvam Periyasamy, Sven Behnke
Cluttered bin-picking environments are challenging for pose estimation models. Despite the impressive progress enabled by deep learning, single-view RGB pose estimation models perform poorly in cluttered dynamic environments. Incorporating the rich temporal information contained in the video of scenes has the potential to enhance models' ability to deal with the adverse effects of occlusion and the dynamic
-
Pushing in the Dark: A Reactive Pushing Strategy for Mobile Robots Using Tactile Feedback arXiv.cs.RO Pub Date : 2024-03-14 Idil Ozdamar, Doganay Sirintuna
For mobile robots, navigating cluttered or dynamic environments often necessitates non-prehensile manipulation, particularly when faced with objects that are too large, irregular, or fragile to grasp. The unpredictable behavior and varying physical properties of these objects significantly complicate manipulation tasks. To address this challenge, this manuscript proposes a novel Reactive Pushing Strategy
-
THÖR-MAGNI: A Large-scale Indoor Motion Capture Recording of Human Movement and Robot Interaction arXiv.cs.RO Pub Date : 2024-03-14 Tim Schreiter, Tiago Rodrigues de Almeida, Yufei Zhu, Eduardo Gutierrez Maestro, Lucas Morillo-Mendez, Andrey Rudenko, Luigi Palmieri, Tomasz P. Kucner, Martin Magnusson, Achim J. Lilienthal
We present a new large dataset of indoor human and robot navigation and interaction, called THÖR-MAGNI, that is designed to facilitate research on social navigation: e.g., modelling and predicting human motion, analyzing goal-oriented interactions between humans and robots, and investigating visual attention in a social interaction context. THÖR-MAGNI was created to fill a gap in available datasets
-
BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation arXiv.cs.RO Pub Date : 2024-03-14 Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang, Gabrael Levine, Wensi Ai, Benjamin Martinez, Hang Yin, Michael Lingelbach, Minjune Hwang, Ayano Hiranaka, Sujay Garlanka, Arman Aydin, Sharon Lee, Jiankai Sun, Mona Anvari, Manasi Sharma, Dhruva Bansal, Samuel Hunter, Kyu-Young Kim, Alan Lou, Caleb R Matthews, Ivan Villa-Renteria, Jerry Huayang
We present BEHAVIOR-1K, a comprehensive simulation benchmark for human-centered robotics. BEHAVIOR-1K includes two components, guided and motivated by the results of an extensive survey on "what do you want robots to do for you?". The first is the definition of 1,000 everyday activities, grounded in 50 scenes (houses, gardens, restaurants, offices, etc.) with more than 9,000 objects annotated with
-
Cellular-enabled Collaborative Robots Planning and Operations for Search-and-Rescue Scenarios arXiv.cs.RO Pub Date : 2024-03-14 Arnau Romero, Carmen Delgado, Lanfranco Zanzi, Raúl Suárez, Xavier Costa-Pérez
Mission-critical operations, particularly in the context of Search-and-Rescue (SAR) and emergency response situations, demand optimal performance and efficiency from every component involved to maximize the success probability of such operations. In these settings, cellular-enabled collaborative robotic systems have emerged as invaluable assets, assisting first responders in several tasks, ranging
-
Efficient Lexicographic Optimization for Prioritized Robot Control and Planning arXiv.cs.RO Pub Date : 2024-03-14 Kai Pfeiffer, Abderrahmane Kheddar
In this work, we present several tools for efficient sequential hierarchical least-squares programming (S-HLSP) for lexicographical optimization tailored to robot control and planning. As its main step, S-HLSP relies on approximations of the original non-linear hierarchical least-squares programming (NL-HLSP) to a hierarchical least-squares programming (HLSP) by the hierarchical Newton's method or
-
Safe Road-Crossing by Autonomous Wheelchairs: a Novel Dataset and its Experimental Evaluation arXiv.cs.RO Pub Date : 2024-03-13 Carlo Grigioni, Franca Corradini, Alessandro Antonucci, Jérôme Guzzi, Francesco Flammini
Safe road-crossing by self-driving vehicles is a crucial problem to address in smart cities. In this paper, we introduce a multi-sensor fusion approach to support road-crossing decisions in a system composed of an autonomous wheelchair and a flying drone featuring a robust sensory system made of diverse and redundant components. To that aim, we designed an analytical danger function based on explainable
-
3D-VLA: A 3D Vision-Language-Action Generative World Model arXiv.cs.RO Pub Date : 2024-03-14 Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan
Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the broader realm of the 3D physical world. Furthermore, they perform action prediction by learning a direct mapping from perception to action, neglecting the vast dynamics of the world and the relations between actions and dynamics. In contrast, human beings are endowed with world models that depict imagination
-
OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments arXiv.cs.RO Pub Date : 2024-03-14 Yinan Deng, Jiahui Wang, Jingyu Zhao, Xinyu Tian, Guangyan Chen, Yi Yang, Yufeng Yue
Environment maps endowed with sophisticated semantics are pivotal for facilitating seamless interaction between robots and humans, enabling them to effectively carry out various tasks. Open-vocabulary maps, powered by Visual-Language models (VLMs), possess inherent advantages, including multimodal retrieval and open-set classes. However, existing open-vocabulary maps are constrained to closed indoor
-
Enabling Waypoint Generation for Collaborative Robots using LLMs and Mixed Reality arXiv.cs.RO Pub Date : 2024-03-14 Cathy Mengying Fang, Krzysztof Zieliński, Pattie Maes, Joe Paradiso, Bruce Blumberg, Mikkel Baun Kjærgaard
Programming a robot is a complex task, as it requires the user to have a good command of specific programming languages and awareness of the robot's physical constraints. We propose a framework that simplifies robot deployment by allowing direct communication using natural language. It uses large language models (LLMs) for prompt processing, workspace understanding, and waypoint generation. It also
-
VDNA-PR: Using General Dataset Representations for Robust Sequential Visual Place Recognition arXiv.cs.RO Pub Date : 2024-03-14 Benjamin Ramtoula, Daniele De Martini, Matthew Gadd, Paul Newman
This paper adapts a general dataset representation technique to produce robust Visual Place Recognition (VPR) descriptors, crucial to enable real-world mobile robot localisation. Two parallel lines of work on VPR have shown, on one side, that general-purpose off-the-shelf feature representations can provide robustness to domain shifts, and, on the other, that fused information from sequences of images
-
CART: Caltech Aerial RGB-Thermal Dataset in the Wild arXiv.cs.RO Pub Date : 2024-03-13 Connor Lee, Matthew Anderson, Nikhil Raganathan, Xingxing Zuo, Kevin Do, Georgia Gkioxari, Soon-Jo Chung
We present the first publicly available RGB-thermal dataset designed for aerial robotics operating in natural environments. Our dataset captures a variety of terrains across the continental United States, including rivers, lakes, coastlines, deserts, and forests, and consists of synchronized RGB, long-wave thermal, global positioning, and inertial data. Furthermore, we provide semantic segmentation
-
Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning arXiv.cs.RO Pub Date : 2024-03-13 Peihong Yu, Manav Mishra, Alec Koppel, Carl Busart, Priya Narayan, Dinesh Manocha, Amrit Bedi, Pratap Tokekar
Multi-Agent Reinforcement Learning (MARL) algorithms face the challenge of efficient exploration due to the exponential increase in the size of the joint state-action space. While demonstration-guided learning has proven beneficial in single-agent settings, its direct applicability to MARL is hindered by the practical difficulty of obtaining joint expert demonstrations. In this work, we introduce a
-
SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion using a 3D Recurrent U-Net arXiv.cs.RO Pub Date : 2024-03-13 Helin Cao, Sven Behnke
We introduce SLCF-Net, a novel approach for the Semantic Scene Completion (SSC) task that sequentially fuses LiDAR and camera data. It jointly estimates missing geometry and semantics in a scene from sequences of RGB images and sparse LiDAR measurements. The images are semantically segmented by a pre-trained 2D U-Net and a dense depth prior is estimated from a depth-conditioned pipeline fueled by Depth
-
People Attribute Purpose to Autonomous Vehicles When Explaining Their Behavior arXiv.cs.RO Pub Date : 2024-03-11 Balint Gyevnar, Stephanie Droop, Tadeg Quillien
A hallmark of a good XAI system is explanations that users can understand and act on. In many cases, this requires a system to offer causal or counterfactual explanations that are intelligible. Cognitive science can help us understand what kinds of explanations users might expect, and in which format to frame these explanations. We briefly review relevant literature from the cognitive science of explanation