Abstract
This work explores how to use an unmanned ground vehicle (UGV) to offload the physical burdens of equipment from humans. This work formulates dynamic alignment following and compares it to position-based following techniques. We describe the control strategies of both following methods and implement them in a dynamic simulation and a physical prototype. We test the performance of the two following methods and show that dynamic alignment following can reduce robot positional error and interaction force between the human and the robot. We then analyze the energetics and the performance of the human–UGV team for candidate transportation tasks. The presence of the robot can make some tasks take longer to perform. Nonetheless, the results show that for the candidate tasks, the robot can reduce human average metabolic power and average overall task energy.
1 Introduction
Workers involved in industrial, first-responder, or military roles may be required to carry or wear heavy equipment. This additional weight can place significant restrictions on workers and result in additional energy expenditure [1]. To address this issue, devices or systems capable of distributing the load have the potential to reduce workers’ energy consumption and enhance their productivity. These offloading solutions should enable workers to decrease both their energy expenditure rate (power) and their overall energy consumption during a given task. However, if an off-loading strategy significantly hampers task performance, it may inadvertently increase task-related energy expenditure because even a reduced power is integrated over a longer completion time.
Common off-loading strategies include external push–pull devices, exoskeletons, and robotic vehicles. External push–pull devices (e.g., wagons, dollies) are pervasive in manual labor fields. However, these devices can restrict a worker’s movements and be physically intensive [2]. An alternative approach is to use wearable devices such as exoskeletons, but these devices are unable to completely offload workers [3]. A third approach is to use a robotic vehicle capable of following a person to help share the human load.
The current research in human-following robot perception, control, and design is summarized in Ref. [4]. That work details that many human-following robots focus on accurately tracking human position to prescribe robot velocity commands. This can be achieved through image-based (i.e., visual) servoing with a camera [4]. This is an attractive human tracking method because it relies only on sensing onboard the robot and allows computer vision and machine learning techniques to identify and track a human target [5]. Once the position of the human target is identified, a feedback controller (e.g., a proportional–integral–derivative, or PID, controller) can generate robot velocity commands that cause the robot follower to track a human path [5] or position [6]. In this work, we design a robot follower based on prior works [4–6] that tracks human position and prescribes robot velocities based on the human’s current position. We call this position-based following.
When a human follows another human, they first attempt to match the velocity of the human leader and then adjust that velocity depending on the perceived distance to the leader [7,8]. In the same way, we can first command the robot follower to copy the velocity of a human target and then adjust that velocity based on image-based servoing. We call this dynamic alignment following, and we achieve it through wearable sensing. In this work, we compare this human-inspired following technique (i.e., dynamic alignment following) with position-based human-following techniques using an RGB-D camera (i.e., position-based following).
This work benchmarks human–unmanned ground vehicle (UGV) efficiency and human energetics during transportation tasks. We believe that the overall goal of human–UGV teams in many working environments should be to reduce overall human effort without overly inhibiting human task performance. We achieve this goal through dynamic alignment following. This goal separates our work from prior work in human-following robots, which focuses on minimizing the positional error of the human–UGV team relative to a desired offset [4,6]. Additionally, several works have used wearable sensing to assist robots in following humans [9,10], but none attempt to implement human-inspired alignment dynamics for equipment transportation.
The contributions of this work are as follows: (1) creating a dynamic alignment-based coordination system that combines robot vision and wearable velocity estimation, (2) demonstrating the quantitative benefits of dynamic alignment-based human–robot coordination, and (3) experimentally demonstrating how human–robot coordination can reduce human energy consumption for example tasks.
2 Modeling and Simulation of Human–Robot Team
2.1 Dynamic Simulation.
2.2 Low-Level Control.
2.3 Position-Based Following Control Law.
Position-based following is a relatively simple way to achieve human tracking. However, noise and relatively low-rate data from the camera hinder the ability to determine the relative velocity between the human and the robot (see Fig. 3(c)).
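To make the control law concrete, the sketch below shows a minimal position-based follower of the kind described above: the tracked position of the human in the robot frame is mapped to linear and angular velocity commands through PID-style feedback on following distance and bearing. The gains, desired distance, and update rate are illustrative assumptions, not the values used on our system.

```python
import numpy as np

class PositionBasedFollower:
    """Minimal sketch of position-based following (Sec. 2.3).
    Gains, desired distance, and rate are placeholder assumptions."""

    def __init__(self, kp_lin=0.8, kd_lin=0.05, kp_ang=1.5, d_des=2.0, dt=0.05):
        self.kp_lin, self.kd_lin, self.kp_ang = kp_lin, kd_lin, kp_ang
        self.d_des = d_des            # desired following distance [m]
        self.dt = dt                  # 20 Hz control period [s]
        self.prev_err = 0.0

    def update(self, human_xy):
        """Map the human's position (robot frame, x forward) to (v, omega)."""
        x, y = human_xy
        dist = np.hypot(x, y)                       # range to the human
        bearing = np.arctan2(y, x)                  # angle to the human

        err = dist - self.d_des                     # following-distance error
        err_rate = (err - self.prev_err) / self.dt
        self.prev_err = err

        v = self.kp_lin * err + self.kd_lin * err_rate   # linear velocity command
        omega = self.kp_ang * bearing                     # angular velocity command
        return v, omega
```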
2.4 Dynamic Alignment Control Law.
In Eq. (7), the feed-forward term contains the x and y velocity of the human relative to the robot, and the feed-forward gain matrix allows us to scale the controller’s dependence on the x and y velocity of the human target. If this gain matrix is set to zero, then position-based following and dynamic alignment following are identical. For our model, we chose gains that feed forward the true human velocity estimates.
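A minimal sketch of this idea is shown below, assuming a planar robot frame with x forward: the PID feedback on position error (only the proportional term is shown) is augmented by a feed-forward gain matrix acting on the estimated human velocity relative to the robot. The symbol names and gain values are assumptions for illustration; setting the feed-forward gain matrix to zero recovers position-based following, as noted above.

```python
import numpy as np

def dynamic_alignment_command(pos_err_xy, human_vel_xy,
                              kp=0.8, kp_ang=1.5, K_ff=np.eye(2)):
    """Sketch of dynamic alignment following (Sec. 2.4).

    pos_err_xy   : human position minus the desired offset, robot frame [m]
    human_vel_xy : estimated human velocity relative to the robot [m/s]
    K_ff         : feed-forward gain matrix; K_ff = 0 gives position-based following
    """
    feedback = kp * np.asarray(pos_err_xy, dtype=float)       # P term of the PID feedback
    feedforward = np.asarray(K_ff) @ np.asarray(human_vel_xy, dtype=float)
    vx, vy = feedback + feedforward                            # desired planar velocity

    v = vx                                  # forward speed along the robot's heading
    omega = kp_ang * np.arctan2(vy, vx)     # steer toward the desired velocity direction
    return v, omega
```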
2.5 Simulated Results.
The aforementioned dynamic models allow us to simulate a robot capable of following a human with complex trajectories. In this work, our predefined trajectories are a zig-zag trajectory, a straight-line trajectory with variable speed, and a circular trajectory (shown in Fig. 1). During the simulation, we added Gaussian noise to the position and velocity estimates of the human to simulate sensor noise. The PID gains were experimentally tuned to minimize positional error for position-based following. For comparison purposes, the PID gains were the same for dynamic alignment following, and the starting position of the robot was the same across all the conditions. The sampling rate of the human’s position estimates was fixed at 20 Hz, and the rate of the human’s velocity estimates was fixed at 50 Hz. The results of the positional error using both control methods are shown in Fig. 1.
The results show that the velocity feed-forward controller lowers the root-mean-square error (RMSE) between the desired and actual following distances for each predefined trajectory. With these results in mind, we move on to designing a prototype to test these controllers on a physical system.
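The 1-D toy simulation below illustrates why the feed-forward term lowers RMSE under the same sensing assumptions (20 Hz noisy position estimates, 50 Hz noisy velocity estimates). It is not the full dynamic model used in this work; the trajectory, noise levels, and gains are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 0.01, 60.0                                    # 100 Hz simulation, 60 s horizon
t = np.arange(0.0, T, dt)
human_v = 1.0 + 0.3 * np.sin(2 * np.pi * t / 10.0)    # straight line, variable speed
human_x = np.cumsum(human_v) * dt

def simulate(kff, kp=1.2, d_des=2.0):
    """Follow the 1-D trajectory; kff = 0 is position-based following."""
    robot_x = human_x[0] - d_des
    meas_pos, meas_vel = human_x[0], human_v[0]
    errs = []
    for k in range(len(t)):
        if k % 5 == 0:                                # 20 Hz noisy position estimate
            meas_pos = human_x[k] + rng.normal(0.0, 0.05)
        if k % 2 == 0:                                # 50 Hz noisy velocity estimate
            meas_vel = human_v[k] + rng.normal(0.0, 0.05)
        err = (meas_pos - robot_x) - d_des            # measured following-distance error
        robot_v = kp * err + kff * meas_vel           # P feedback + velocity feed-forward
        robot_x += robot_v * dt
        errs.append((human_x[k] - robot_x) - d_des)   # true following-distance error
    return np.sqrt(np.mean(np.square(errs)))          # RMSE of following distance [m]

print("position-based RMSE [m]:", simulate(kff=0.0))
print("dynamic alignment RMSE [m]:", simulate(kff=1.0))
```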
3 Physical Prototype
3.1 Position-Based Following.
We design a system capable of following a human based on positional estimates. This design is derived from current research [4] and is used to compare traditional human-following techniques (position-based following) with dynamic alignment following. We choose to track a target with a depth camera and a deep neural network trained to identify humans in an image [4]. Additionally, we choose a human identifier that provides the skeletal structure of a human so that we can identify human positions and gestures to provide further information to the robot.
We obtain positional estimates of a human through the Intel Realsense D415 RGB-D camera and Nvidia’s Posenet (pretrained resnet18-body) [12]. Posenet is used because it runs easily on Jetson Xavier products and is accelerated by TensorRT, which allows for real-time human detection [13]. We supplement the Posenet tracking with color-based segmentation tracking to provide redundancy in the event the human is unidentified in a frame [4]. Once a target is identified, we can find the position of the target relative to the robot through the camera’s depth measurements and focal lengths. In addition, we mount the RGB-D camera on a 2 degree-of-freedom gimbal to help overcome the camera’s field-of-view limitations.
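As a sketch of this back-projection step, the function below converts a tracked keypoint’s pixel coordinates and aligned depth into a 3-D point in the camera frame using the pinhole model. The intrinsic parameters are placeholders that would come from the camera’s calibration, and a further rotation by the gimbal angles would express the point in the robot frame.

```python
import numpy as np

def pixel_to_camera_frame(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a tracked keypoint to a 3-D point in the camera frame.

    (u, v)         : pixel coordinates of the tracked human keypoint
    depth_m        : aligned depth at that pixel [m]
    fx, fy, cx, cy : camera intrinsics (focal lengths and principal point),
                     placeholders standing in for the calibrated values
    """
    x = (u - cx) * depth_m / fx       # right of the optical axis
    y = (v - cy) * depth_m / fy       # below the optical axis
    z = depth_m                       # forward along the optical axis
    return np.array([x, y, z])
```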
Once the position of a human target is found, linear and angular velocity commands are computed through the same method as presented in Sec. 2.3. Note that due to noise in the positional tracking, variability in human movement, and the robot’s close following distance, the derivative and integral gains were chosen to be very small, so the control is governed primarily by the current positional error.
To realize the robotic following strategies, we use two robot platforms: the Rover Robotics Rover Zero 2 and the Husarion Panther. Each platform is equipped with a Jetson Xavier and an Intel Realsense D415 camera. Additionally, a Lord Microstrain 3DMGX5-AHRS IMU is placed on each platform to estimate the heading of the robot. The robot prototypes are shown in Fig. 2.
3.2 Dynamic Alignment Following.
The second type of following that we implement uses both the human’s position and velocity states. As shown in Fig. 3(c), it is difficult to estimate the real-time velocity of the target from the visual estimate of the person due to latency and noise. Wearable sensors have been shown to predict real-time human walking speed through strap-down integration methods [14] and machine learning methods [15]. In this work, we use machine learning methods to predict walking speed because of the availability of labeled IMU walking data [16] and because machine learning methods have been shown to outperform integration methods in certain scenarios [15]. We use a recurrent neural network (RNN), specifically a long short-term memory (LSTM) network, that takes in foot IMU data and outputs a continuous walking speed (depicted in Fig. 3(b)). The use of an RNN is aligned with the current literature [17].

(a) Walking speed estimator prototype, (b) supervised domain adaptation, and (c) human velocity estimates with the onboard camera versus wearable sensors
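A minimal PyTorch sketch of such an estimator is given below, assuming a fixed-length window of foot IMU channels as input; the channel count, hidden size, and window handling are illustrative assumptions rather than our exact architecture.

```python
import torch
import torch.nn as nn

class WalkingSpeedEstimator(nn.Module):
    """Sketch of an LSTM walking speed estimator (Sec. 3.2).
    Input size, hidden size, and windowing are placeholder assumptions."""

    def __init__(self, n_channels=12, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_channels, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)        # linear output layer (retrained in the adaptation step described next)

    def forward(self, imu_window):              # imu_window: (batch, time, n_channels)
        features, _ = self.lstm(imu_window)
        return self.head(features[:, -1, :])    # continuous walking speed [m/s]
```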
Initially, we train our network on an open-source dataset of IMU data collected during treadmill walking [16]. Then, we perform transfer learning (supervised domain adaptation) on a custom dataset to account for the domain shift caused by differences in IMUs. Supervised domain adaptation (SDA) was performed by freezing the RNN and retraining the linear output layer. This process is depicted in Fig. 3(b). Consent was obtained from the participants according to the protocol (H22402) approved by the Georgia Institute of Technology’s Institutional Review Board (IRB).
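Sketched below is one way to implement this freeze-and-retrain step under the architecture assumed above; the optimizer, learning rate, and epoch count are assumptions.

```python
import torch

def supervised_domain_adaptation(model, loader, epochs=20, lr=1e-3):
    """Freeze the recurrent layers and retrain only the linear output layer
    on the custom-IMU dataset (Sec. 3.2). Hyperparameters are placeholders."""
    for p in model.lstm.parameters():
        p.requires_grad = False                         # keep the pretrained RNN fixed
    optimizer = torch.optim.Adam(model.head.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for imu_window, speed in loader:                # batches of (window, true speed)
            optimizer.zero_grad()
            loss = loss_fn(model(imu_window), speed)
            loss.backward()
            optimizer.step()
    return model
```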
The SDA was evaluated by its ability to lower RMSE on novel subjects. This was done by performing a K-folds error analysis across subjects. The results of this analysis are presented in Table 1. The table shows that SDA lowers the RMSE only slightly at lower speeds (from 0.10 to 0.09 m/s at 0.75 m/s) and substantially at higher speeds (from 0.18 to 0.07 m/s at 1.5 m/s). Additionally, the performance of this walking speed prediction is similar to the performance of other works [15,18]. For the rest of this article, the RNN used in the walking speed estimator (WSE) is trained with the open-source data and corrected via SDA using all five additional subjects.
RMSE (m/s) between actual speed and the output of the walking speed estimator
Speed (m/s) | 0.75 | 1.00 | 1.25 | 1.5 |
---|---|---|---|---|
Before SDA | 0.10 | 0.14 | 0.14 | 0.18 |
After SDA | 0.09 | 0.08 | 0.06 | 0.07 |
The walking speed estimate is used by the feed-forward controller discussed in Sec. 2.4. Note that in the simulation, we used the true x and y velocity of the human. To find these velocity components on the physical system, we place an additional IMU that provides heading estimates on the person’s torso and use it with the robot’s IMU to estimate the person’s heading relative to the robot.
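A sketch of this decomposition is shown below, under the simplifying assumption that the person walks along their torso heading; the scalar speed from the WSE and the two IMU headings yield planar velocity components in the robot frame.

```python
import numpy as np

def human_velocity_in_robot_frame(speed_mps, human_heading_rad, robot_heading_rad):
    """Resolve the WSE's scalar walking speed into planar components in the
    robot frame, assuming the person moves along their torso heading."""
    rel_heading = human_heading_rad - robot_heading_rad
    vx = speed_mps * np.cos(rel_heading)     # component along the robot's heading
    vy = speed_mps * np.sin(rel_heading)     # lateral component
    return vx, vy
```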
Providing the feed-forward term improved the robot’s following: as soon as the human target transitions from the stance to the swing phase of walking, the robot begins receiving forward velocity commands even while the positional error is still small.
The physical prototype of the walking speed estimator is shown in Fig. 3(a). It consists of three Lord Microstrain 3DMGX5-AHRS IMUs: one on each foot for inputs into the RNN and one on the torso used only for heading estimates. The RNN runs on a Jetson Xavier AGX, which is placed in a waist pack and publishes velocity and heading estimates over a private network. Additionally, vibrotactile motors are placed on the waist pack to alert the human that the robot is visually searching for them (e.g., when they are being initialized as the targeted human or have been visually lost).
4 Human–Robot Coordination Experimental Results
We design an experiment to compare position-based following and dynamic alignment following (approved under IRB protocol H22402). In this experiment, a subject is asked to walk two laps of a figure-8 (each loop with a radius of 1.9 m), while the robot prototype (Rover Zero 2) is tethered to them and follows them at a distance of 2 m. The tether is adjusted to a consistent length across subjects, and the PID controller of the position-based following was tuned to produce consistent following of a target with low following error. For comparison purposes, the PID gains were kept the same for dynamic alignment following, and both controllers ran at 20 Hz. While walking the figure-8, the subject uses the WSE output, displayed on a hand-held LCD screen, to maintain a walking speed of 1 m/s.
During this experiment, the human subject performs ten trials, each consisting of two laps of the figure-8. In five of the trials, the robot follows the human subject with the position-based following algorithm, and in the other five trials, the robot follows with the dynamic alignment following algorithm. The following method is switched after each trial for each subject. The positions of the human and the robot are tracked through an HTC VIVE tracking system, and the interaction force through the tether is measured via a Mini45 load cell placed under the oxygen tank.
Figure 4 reveals that dynamic alignment following can reduce root-mean-square positional error and human–robot interaction forces (peak force and average force). This shows that dynamic alignment following can improve the desired following performance (RMSE) and reduce undesirable task-specific metrics such as high forces on fragile human–UGV connections (i.e., oxygen tube connections). Additionally, subjects stated that they physically felt the difference between the following methods, and each subject preferred dynamic alignment following, possibly because of the lower interaction force. By including velocity in the control scheme, the robot is more responsive to human motion and allows the human to move more freely in an overground space.

Root-mean-squared position error, peak force, and average force during human–robot coordination experiment. Colored points represent individual subject data.
5 Energetics of Human–Robot Coordination During a Candidate Task
The next step is to evaluate the practical impact of human–robot coordination. We created two sample pick-and-place tasks in which a subject is asked to perform a transportation task with and without the robot assistant (approved under IRB protocol H22402). The experiments are described in the following sections and in the accompanying video.
5.1 Energetics of Human–Robot Coordination for a Fixed Transport Experiment.
In the first transportation task, a subject is asked to transport 50 balls from a home location to a randomly chosen goal location 7.6 m (25 ft) away. When transporting the balls with the robot teammate, the robot carries the oxygen tank and follows using dynamic alignment following because of the enhanced agility it provides. Additionally, the robot and human are tethered together to represent the oxygen tube connection. When the subject performs the transportation task without the robot teammate, the subject must carry the oxygen tank on their back. Because of its size and weight, the Rover Zero 2 was approved by Georgia Tech’s IRB (protocol H22402) for closer following distances than the Husarion Panther; the Rover Zero 2 was therefore used in this task because longer tethers may cause tripping or tangling.
Metabolic rate is recorded using a COSMED K5 device and is calculated using Brockway’s equation [19], with a standing baseline used to normalize across subjects. The average metabolic rate during each condition is found by averaging the rates during the last 6 min of continuous movement. The first ten balls were used to allow the subject’s metabolic expenditure to reach steady state, and the last 40 balls were used to calculate net energy expenditure by integrating the metabolic rate through time. After the conclusion of both experimental conditions, subjects were asked to rank the conditions in terms of exertion.
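The calculation can be sketched as follows: breath-by-breath gas exchange rates are converted to metabolic power with the Brockway equation (neglecting the urinary nitrogen term, as is common), the standing baseline is subtracted, and the result is integrated over the task window. Variable names and sampling details here are assumptions.

```python
import numpy as np

def brockway_power_w(vo2_ml_min, vco2_ml_min):
    """Metabolic power from the Brockway equation [19]:
    16.58 J per mL of O2 plus 4.51 J per mL of CO2 (nitrogen term neglected)."""
    return 16.58 * (vo2_ml_min / 60.0) + 4.51 * (vco2_ml_min / 60.0)   # watts

def net_task_energy_j(power_w, time_s, baseline_w):
    """Subtract the standing baseline and integrate power over the task window
    (trapezoidal rule)."""
    p = np.asarray(power_w, dtype=float) - baseline_w
    t = np.asarray(time_s, dtype=float)
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(t)))
```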
Figure 5(d) shows that a robot teammate can assist in lowering average exertion power and overall task energy despite increasing the task completion time. Analyzing the data collected from the eight subjects, the robot teammate decreased average metabolic power by 39.5% and average overall task energy by 13%. Note that these are conservative savings because we do not include the extra energy savings that accrue from task completion until the subject’s metabolic expenditure returns to equilibrium. Despite the increased energy efficiency, the robot added 57 s per ten-ball interval to the task. This increase in time is attributed to limitations of vehicle speed/acceleration and positional and velocity tracking errors. Additionally, each subject perceived a decrease in exertion during the robot-assisted condition versus the no-assistance condition.

Experiment layout, offloading strategies with payloads boxed in red, and the transport-fixed and time-fixed experimental results. Points overlaid on the bar plots represent individual subject data.
In this work, the experimental results were analyzed through a two-sided paired t-test in IBM’s SPSS Statistics toolbox (version 29.0.1.0) [20] to determine statistical significance. Across subjects, we saw a statistically significant decrease in metabolic rate when using the robot teammate and a statistically significant increase in completion time when using the robot teammate. Despite this time penalty, we saw a statistically significant decrease in overall exertion energy. For the overall exertion energy, each subject decreased their task energy by 9.8–21.7% with robot assistance, except for subject 3, whose task energy increased.
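An equivalent check can be run outside SPSS; the snippet below performs the same two-sided paired t-test with SciPy on per-subject arrays for the two conditions (the function and argument names are placeholders).

```python
from scipy import stats

def paired_comparison(no_assist, robot_assist, alpha=0.05):
    """Two-sided paired t-test across subjects, mirroring the SPSS analysis.
    Returns the t statistic, the p-value, and whether p < alpha."""
    t_stat, p_value = stats.ttest_rel(no_assist, robot_assist)
    return t_stat, p_value, bool(p_value < alpha)
```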
5.2 Energetics of Human–Robot Coordination for a Fixed-Time Interval.
In the fixed-time interval experiment, subjects are instructed to transport a 25-lb weight 350 ft (five laps to and from an end zone, shown in Fig. 5(e)). In this experiment, the subjects have 20 min to transport as many 25-lb weights as they can. In the no-assistance case, the subject carries one 25-lb weight during one 350-ft trip. In the robot-assisted case, the robot carries four 25-lb weights (100 lbs) in a single trip; we therefore leverage the robot’s strength and endurance to carry four times the weight a person can carry on their own. During the experiment, the total weight transported is recorded. Additionally, energy expenditure is recorded by integrating the metabolic rate through the last 18 min of the task; the first 2 min are used to bring the subjects to a steady-state metabolic rate. Along the path, the robot avoids obstacles through vector field histograms [21] to emulate real-world environments and potentially inhibit task performance. The robot used in this experiment is the Husarion Panther due to its high payload capacity.
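For reference, the snippet below sketches the core of a vector-field-histogram step: obstacle readings are binned into angular sectors, and the robot steers toward the free sector closest to the goal direction. It omits the smoothing and valley-selection details of the full method [21], and the parameter values are assumptions.

```python
import numpy as np

def vfh_steering(scan_ranges, scan_angles, goal_angle,
                 n_sectors=36, max_range=3.0, threshold=1.0):
    """Simplified vector-field-histogram steering: bin obstacle proximity into
    angular sectors and pick the free sector closest to the goal direction.
    Parameters and the scoring rule are illustrative assumptions."""
    scan_ranges = np.asarray(scan_ranges, dtype=float)
    scan_angles = np.asarray(scan_angles, dtype=float)

    hist = np.zeros(n_sectors)
    sector = ((scan_angles + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    proximity = np.clip(max_range - scan_ranges, 0.0, max_range)      # closer -> larger
    np.add.at(hist, sector, proximity)                                # polar obstacle histogram

    centers = -np.pi + (np.arange(n_sectors) + 0.5) * 2 * np.pi / n_sectors
    free = hist < threshold                                           # candidate sectors
    if not np.any(free):
        return None                                                   # fully blocked: stop or replan
    ang_dist = np.abs(np.angle(np.exp(1j * (centers - goal_angle))))  # wrapped distance to goal
    ang_dist[~free] = np.inf
    return float(centers[int(np.argmin(ang_dist))])                   # steering direction [rad]
```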
In general, the results in Fig. 5(h) show a decrease in power, a decrease in energy, and an increase in performance. There is a wide spread in the power and energy data points for the robot-assisted condition, mainly due to differences in how quickly subjects performed the task and due to the effort of loading the robot with weights. For example, subject 3 had nearly the same power and energy exertion for the robot-assisted and no-assistance conditions, which can largely be explained by an increase in the number of plates transported during the 20 min (moving faster). This result shows that a performance increase and an energy decrease are possible by leveraging the robot’s strength and endurance.
6 Conclusion
This work analyzed a UGV’s ability to offload equipment from a human. Specifically, we demonstrated how wearable technologies enable better tracking through dynamic alignment following. In addition, we showed that a human–robot team can leverage dynamic alignment following to share the load of heavy equipment. The human–robot team demonstrated statistically significant decreases in average metabolic rate and total energy exerted in our transport-fixed experiment, although we did see a statistically significant increase in task completion time. In the time-fixed experiment, the average metabolic rate decreased, overall exertion energy decreased, and performance (number of plates transported) increased in the robot-assisted case. These results illustrate that the human–robot team provides energetic benefits. However, they also show that for certain tasks the robot reduces task efficiency, which means that further improving human–robot coordination remains an exciting area for research.
Acknowledgment
The authors would like to thank the National Defense Science and Engineering Graduate (NDSEG) Fellowship for their support and funding.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.