Elsevier

Acta Astronautica

Volume 171, June 2020, Pages 1-13

Research paper
Terminal adaptive guidance via reinforcement meta-learning: Applications to autonomous asteroid close-proximity operations

https://doi.org/10.1016/j.actaastro.2020.02.036

Highlights

  • Adaptive guidance system optimized using Reinforcement Meta-Learning.

  • System maps sensor output directly to actuator commands.

  • System autonomously completes a landing maneuver with pinpoint accuracy.

  • We test system in high fidelity 6-DOF simulator.

  • Simulator models time-varying dynamics, actuator failure, and sensor distortion.

Abstract

Current practice for asteroid close proximity maneuvers requires extremely accurate characterization of the environmental dynamics and precise spacecraft positioning prior to the maneuver. This creates a delay of several months between the spacecraft's arrival and the ability to safely complete close proximity maneuvers. In this work we develop an adaptive integrated guidance, navigation, and control system that can complete these maneuvers in environments with unknown dynamics, with initial conditions spanning a large deployment region, and without a shape model of the asteroid. The system is implemented as a policy optimized using reinforcement meta-learning. The lander is equipped with an optical seeker that locks to either a terrain feature, reflected light from a targeting laser, or an active beacon, and the policy maps observations consisting of seeker angles and LIDAR range readings directly to engine thrust commands. The policy implements a recurrent network layer that allows the deployed policy to adapt in real time to both environmental forces acting on the agent and internal disturbances such as actuator failure and center of mass variation. We validate the guidance system through simulated landing maneuvers in a six degrees-of-freedom simulator. The simulator randomizes the asteroid's characteristics such as solar radiation pressure, density, spin rate, and nutation angle, requiring the guidance and control system to adapt to the environment. We also demonstrate robustness to actuator failure, sensor bias, and changes in the lander's center of mass and inertia tensor. Finally, we suggest a concept of operations for asteroid close proximity maneuvers that is compatible with the guidance system.

Introduction

Current practice for asteroid close proximity operations involves first making several passes past the asteroid in order to collect images and LIDAR data that allow the creation of a shape model of the asteroid [1]. In addition, these maneuvers can be used to estimate the environmental dynamics in the vicinity of the asteroid, which allows calculation of a burn that will put the spacecraft into a safe orbit [2]. Once in this safe orbit, statistical orbit determination techniques [3] augmented by optical navigation are used to create a model that can estimate the spacecraft's orbital state (position and velocity) as a function of time. This model is extremely accurate provided the spacecraft remains in the safe orbit. While the spacecraft is in the safe orbit, both the asteroid shape model and the model of the asteroid's dynamics are refined. Mission planners then use the orbit model, along with an estimate of the forces acting on the spacecraft, to plan an open loop maneuver that will bring the spacecraft from a given point in the safe orbit to some desired location and velocity. An example of such a trajectory is the OSIRIS-REx Touch and Go (TAG) sample collection maneuver [2]. If the dynamics model used to plan the open loop maneuver is not completely accurate, errors can accumulate over the trajectory, resulting in a large error ellipse for the contact position between the asteroid and spacecraft [4]. For this reason, multiple rehearsal maneuvers are typically planned and executed in order to ensure that the dynamics model is accurate in the TAG region of interest. To be clear, since the OSIRIS-REx CONOPS uses an open loop trajectory for the TAG maneuver, even if the landing site were tagged with a targeting laser, the spacecraft GNC system would not be capable of employing such a navigation aid to enhance landing precision or tolerate larger initial condition uncertainty. The latter is due to constraints in the overall system design.
Importantly, a partial closed-loop approach had been initially considered to improve the overall accuracy. Berry et al. [4] devised an algorithm based on a second-order surface response method where two individual laser measurements, specifically executed to estimate range-to-go via limb detection and one single altitude, may be employed to correct the timing and magnitude of the intermediate maneuver, thus improving the accuracy. However, as a back-up plan, an autonomous navigation system called Natural Feature Tracking (NFT [5]) has been devised and implemented on the OSIRIS-REx spacecraft computer in an attempt to comply with the more stringent requirements for Bennu sampling (less than 5 m radius) due to the unexpected rough terrain observed on the asteroid surface [6]. Finally, note that current practice is not compatible with completely autonomous missions, where the spacecraft can conduct operations on and around an asteroid without human supervision. For example, the Hayabusa 2 mission has devised a hybrid ground/onboard-based navigation system to navigate the initial descent from a home (hovering) position [7,8]. Named GCP-NAV, the system autonomously controls the vertical descent velocity using the on-board LIDAR, whereas the horizontal position and velocity are determined on the ground by an operator who processes the navigation camera outputs [9]. Subsequently, a horizontal maneuver is planned on the ground and uploaded to the on-board computer to command the spacecraft descent along a predefined path. Note, however, that the last two phases of the descent, i.e., surface-relative descent and touchdown, are fully autonomous and rely on the deployment of a target marker which is released from the main spacecraft at 100 m altitude. During such a phase, the spacecraft GNC system attempts to track the marker using both the optical navigation camera and the Flash Lamp (FLASH).

Now consider an adaptive guidance, navigation and control (GNC) system that after a short interaction with the environment can adapt to that environment's ground truth dynamics, limited only by the spacecraft's thrust capability. Such a system would allow a paradigm shift for mission design, as highlighted by the comparison in Table 1 between current practice and what might be possible using the proposed system. Of course there are scientific reasons for characterizing an asteroid's environment as accurately as possible, but the proposed innovation gives mission flexibility. For example, a mission might involve visiting multiple asteroids and collecting samples, and the orbits of the asteroids might make it necessary to spend only a short time at each one. Or the mission goal might not be scientific at all, but rather to identify resource rich asteroids for future mining operations. For a given level of accuracy with respect to the environmental dynamics model, the ability to adapt in real time when the environment diverges from the model should provide a significant reduction in mission risk.

Coupling a suitable navigation system with a traditional closed loop guidance and control law can potentially improve maneuver accuracy. However, if the asteroid's environmental dynamics are not well characterized, accuracy will still be compromised due to errors stemming from both the dynamics model used in the state estimation algorithm and the potential inability of the guidance and control law to function optimally in an environment with unknown dynamics. Indeed, an optimal trajectory generated from an inaccurate dynamics model may be infeasible (impossible to track with a controller given control constraints) in the actual environment. Moreover, our initial research into this area [10] has shown that traditional closed loop guidance laws such as DR/DV [11] are not robust to actuator failure, unknown dynamics, and navigation system errors, whereas the proposed GNC system is. Finally, note that integration of the navigation system allows the system to quickly adapt to sensor bias.

It is worth noting that for the 3-DOF case, we have demonstrated landing on an asteroid using a policy that maps LIDAR altimeter readings directly to thrust commands [10], but the performance was suboptimal. We have also demonstrated an asteroid TAG maneuver in 3-DOF where a particle filter [12] uses an asteroid shape model to infer the spacecraft's state in the asteroid body-fixed reference frame, and an energy-optimal guidance law [11] maps this estimated state to a thrust command. Here performance was acceptable, but the GNC system required knowledge of the asteroid's environmental dynamics.

Recent work by others in the area of adaptive guidance algorithms includes [13], which demonstrates an adaptive control law for a UAV tracking a reference trajectory, where the adaptive controller adapts to external disturbances. Its limitations are the linear dynamics model, which may not be accurate, and the requirement that the frequency of the disturbance be known. Ref. [14] develops a fault identification system for Mars entry phase control using a pre-trained neural network, with a fault controller implemented as a second Gaussian neural network. Importantly, the second network requires on-line parameter update during the entry phase, which may not be possible to implement in real time on a flight computer. Moreover, the adaptation is limited to known actuator faults as identified by the first network. Finally, in Ref. [15] the authors develop an adaptive controller for spacecraft attitude control using reaction wheels. This approach is also limited to actuator faults, and the architecture does not adapt to either state estimation bias or environmental dynamics.

In this work we develop an adaptive and integrated GNC system applicable to asteroid close proximity maneuvers that allows a lander deployed by the spacecraft (see Section 2) to accurately and robustly land at a designated site. The system is optimized using reinforcement meta-learning (RL meta-learning), and implements a global policy over the region of state space defined by the deployment region and potential landing sites. The policy maps observations to actions, with the observations consisting of angles and range readings from the lander's seeker, changes in lander attitude since the start of the maneuver, and lander rotational velocity. The policy actions consist of on/off thrust commands to the lander's thrusters. In order to reduce mission risk, we present a concept of operations (CONOPS) where a hovering spacecraft tags the landing site with a targeting laser, providing an obvious target for the lander's seeker camera. However, future work will investigate the effectiveness of using terrain features as targets, and the use of surface beacons. In the RL framework, the seeker can be considered an attention mechanism, determining what object in the agent's field of regard the policy should target during the maneuver. In the case where we want to target a terrain feature rather than a tagged landing site, the landing site would be identified by the seeker, rather than the guidance policy. Both seeker design and laser-aided guidance are mature technologies, with seekers being widely used in guided missiles [16], and laser-aided guidance used in certain types of missiles and guided bombs. Reinforcement Learning (RL) has recently been successfully applied to landing guidance problems [17–20].

Adaptability is achieved through RL meta-learning, where different environmental dynamics, sensor noise, actuator failure, and changes in the lander's center of mass and inertia tensor are treated as a range of partially observable Markov decision processes (POMDPs). In each POMDP, the policy's recurrent network hidden state will evolve differently over the course of an episode, capturing information regarding hidden variables that are useful in minimizing the cost function, i.e., external forces, changes in the lander's internal dynamics, and sensor bias. By optimizing the policy over this range of POMDPs, the trained policy will be able to adapt to novel POMDPs encountered during deployment. Specifically, even though the policy's parameters are fixed after optimization, the policy's hidden state will evolve based on the current POMDP, thus adapting to the environment.
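
The adaptation mechanism in the preceding paragraph can be illustrated with a minimal sketch. It assumes a plain tanh recurrent layer as a stand-in for the policy's recurrent layer, which this excerpt does not specify. The class name, layer sizes, observation dimension, thruster count, and random (untrained) weights are all hypothetical. The only point shown is that frozen parameters combined with an evolving hidden state yield environment-dependent behavior.

```python
import numpy as np


class ToyRecurrentPolicy:
    """Hypothetical stand-in policy: weights are frozen, only the hidden state evolves."""

    def __init__(self, obs_dim=9, hidden_dim=16, n_thrusters=4, seed=0):
        # All sizes are illustrative assumptions, not values from the paper.
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(0.0, 0.3, (hidden_dim, obs_dim))
        self.w_rec = rng.normal(0.0, 0.3, (hidden_dim, hidden_dim))
        self.w_out = rng.normal(0.0, 0.3, (n_thrusters, hidden_dim))
        self.hidden = np.zeros(hidden_dim)

    def reset(self):
        """Clear the hidden state at the start of an episode."""
        self.hidden = np.zeros_like(self.hidden)

    def step(self, obs):
        """Update the hidden state from one observation, return on/off commands."""
        self.hidden = np.tanh(self.w_in @ obs + self.w_rec @ self.hidden)
        return (self.w_out @ self.hidden > 0.0).astype(int)


def rollout(policy, obs_sequence):
    """Run one episode; return the final hidden state and the action history."""
    policy.reset()
    actions = np.array([policy.step(obs) for obs in obs_sequence])
    return policy.hidden.copy(), actions


# Two episodes that differ only by a constant observation offset, used here
# as a crude proxy for two different POMDPs (e.g. different sensor bias).
rng = np.random.default_rng(1)
nominal_obs = rng.normal(0.0, 1.0, (20, 9))
biased_obs = nominal_obs + 0.3

policy = ToyRecurrentPolicy()
hidden_nominal, actions_nominal = rollout(policy, nominal_obs)
hidden_biased, actions_biased = rollout(policy, biased_obs)
```

The weights are identical in both rollouts, yet the final hidden states differ because they summarize different observation histories. In the trained system, that hidden state is what carries information about the unobserved dynamics.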

The policy uses approximately 16,000 32-bit network parameters, and requires approximately 64 KB of memory. The policy takes approximately 1 ms to run the mapping between estimated state and thruster commands (four small matrix multiplications) on a 3 GHz processor. Since in this work the mapping is updated every 6 s, we do not see any issues with running this on the current generation of space-certified flight computers. A diagram illustrating how the policy interfaces with peripheral lander components is shown in Fig. 1.
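
The resource figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text, with one exception: the `slowdown` factor is a hypothetical knob, introduced here to ask how the compute budget would look on a processor slower than the 3 GHz reference. Function and constant names are likewise illustrative.

```python
N_PARAMS = 16_000          # approximate parameter count stated in the text
BYTES_PER_PARAM = 32 // 8  # 32-bit parameters
INFERENCE_TIME_S = 1e-3    # about 1 ms per policy evaluation at 3 GHz
UPDATE_PERIOD_S = 6.0      # the mapping is updated every 6 s


def policy_memory_bytes(n_params, bytes_per_param):
    """Memory needed to store the network parameters."""
    return n_params * bytes_per_param


def compute_duty_cycle(inference_time_s, update_period_s, slowdown=1.0):
    """Fraction of each update period spent evaluating the policy.

    `slowdown` is a hypothetical multiplier for a slower flight processor.
    """
    return inference_time_s * slowdown / update_period_s


memory = policy_memory_bytes(N_PARAMS, BYTES_PER_PARAM)            # 64000 bytes
nominal_load = compute_duty_cycle(INFERENCE_TIME_S, UPDATE_PERIOD_S)
slow_load = compute_duty_cycle(INFERENCE_TIME_S, UPDATE_PERIOD_S, slowdown=100.0)
```

At the stated figures the policy occupies 64,000 bytes and uses under 0.02% of each update period. Even with an assumed hundredfold slowdown the load stays below 2%.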

One advantage of our proposed CONOPS and GNC system as compared to current practice is that the environmental dynamics need not be accurately characterized prior to the maneuver, removing an element of mission risk. Compared to completely passive optical navigation approaches, our method has the additional advantage that it is insensitive to lighting conditions and does not rely on the asteroid having sufficient terrain diversity to enable navigation. Moreover, the system can adapt to sensor bias and actuator failure, further reducing mission risk. The downside is that fuel efficiency will be inferior to that of an optimal trajectory generated using knowledge of the environmental dynamics. It would be possible to improve the fuel efficiency by observing the movement of the target location from an inertial reference frame, and using this information to put the lander on a collision triangle heading with the target landing site. Instead of heading for the target site, the lander would head towards the point where the target site will be at the completion of the maneuver. In this approach, the agent would be rewarded for keeping the seeker angles at their value at the start of a maneuver, which will keep the lander on the collision triangle with the moving target, as described in more detail in Ref. [21].
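
The link between constant seeker angles and a collision triangle can be checked numerically. The sketch below is a simplified illustration and not the formulation of Ref. [21]. It assumes a target moving at constant inertial velocity, whereas a site on a rotating asteroid would not, and a lander coasting without external forces. The function names, the azimuth/elevation convention, and all numerical values are hypothetical.

```python
import numpy as np


def seeker_angles(lander_pos, target_pos):
    """Azimuth and elevation of the line of sight in an inertial frame (rad)."""
    los = np.asarray(target_pos, float) - np.asarray(lander_pos, float)
    azimuth = np.arctan2(los[1], los[0])
    elevation = np.arctan2(los[2], np.hypot(los[0], los[1]))
    return np.array([azimuth, elevation])


def collision_triangle_velocity(lander_pos, target_pos, target_vel, closing_speed):
    """Lander velocity that matches the target's motion and closes along the line of sight."""
    los = np.asarray(target_pos, float) - np.asarray(lander_pos, float)
    return np.asarray(target_vel, float) + closing_speed * los / np.linalg.norm(los)


def angle_hold_reward(angles, initial_angles):
    """Hypothetical shaping term: penalize drift of the seeker angles from their initial value."""
    return -float(np.linalg.norm(np.asarray(angles) - np.asarray(initial_angles)))


# Illustrative scenario (all values assumed): range of roughly 1 km.
lander_pos0 = np.array([-800.0, -500.0, 300.0])   # m
target_pos0 = np.array([0.0, 0.0, 0.0])           # m
target_vel = np.array([0.0, 0.2, 0.0])            # m/s
lander_vel = collision_triangle_velocity(lander_pos0, target_pos0, target_vel, 1.0)

initial_angles = seeker_angles(lander_pos0, target_pos0)
rewards = []
for t in (0.0, 300.0, 600.0, 900.0):              # seconds, all before intercept
    angles = seeker_angles(lander_pos0 + lander_vel * t, target_pos0 + target_vel * t)
    rewards.append(angle_hold_reward(angles, initial_angles))
```

Under these assumptions the relative position shrinks along a fixed direction, so the seeker angles stay at their initial values and the shaping term remains at zero up to rounding. A lander that drifted off the collision triangle would see the angles change and the term turn negative.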

We demonstrate that the system can complete maneuvers from a large deployment region and without knowledge of the local environmental dynamics, and successfully adapt to sensor distortion, changes in the lander's center of mass and inertia tensor, and actuator failure. In this work, we will focus on a maneuver that begins approximately 1 km from the desired landing site, with a deployment region spanning approximately 1 cubic km. The goal is to reach a position within 1 m of a target location 10 m above the designated landing site, with velocity magnitude less than 10 cm/s, and negligible rotational velocity. What happens next will be mission specific. To illustrate a scenario, a hovering guidance and control system using a LIDAR altimeter could take over at that point, bringing the lander to an attitude consistent with the deployment of a robotic arm, and collect a sample, with the hovering controller compensating for the disturbance created by the arm pushing against the surface. Alternatively, the lander could release a rover from this altitude.
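
The terminal goal stated above translates directly into a success check. The sketch below uses the position and velocity tolerances from the text. Two things are assumptions made only for illustration: the rotational velocity threshold, because the text says only "negligible", and the reading of "above" as along the local surface normal. Function names are hypothetical.

```python
import numpy as np

POS_TOL_M = 1.0            # within 1 m of the target location (from the text)
VEL_TOL_MPS = 0.10         # velocity magnitude below 10 cm/s (from the text)
TARGET_ALTITUDE_M = 10.0   # target is 10 m above the landing site (from the text)
ROT_VEL_TOL_RADPS = 0.01   # ASSUMPTION: the text does not quantify "negligible"


def target_location(landing_site, surface_normal, altitude_m=TARGET_ALTITUDE_M):
    """Point `altitude_m` above the site, measured along the (assumed) surface normal."""
    normal = np.asarray(surface_normal, float)
    return np.asarray(landing_site, float) + altitude_m * normal / np.linalg.norm(normal)


def maneuver_succeeded(position, velocity, rot_velocity, target,
                       rot_tol=ROT_VEL_TOL_RADPS):
    """True if the terminal state meets all three goal conditions."""
    pos_err = np.linalg.norm(np.asarray(position, float) - np.asarray(target, float))
    speed = np.linalg.norm(np.asarray(velocity, float))
    rot_speed = np.linalg.norm(np.asarray(rot_velocity, float))
    return bool(pos_err < POS_TOL_M and speed < VEL_TOL_MPS and rot_speed < rot_tol)


# Example terminal state, expressed in a frame fixed to the landing site.
target = target_location(landing_site=[0.0, 0.0, 0.0], surface_normal=[0.0, 0.0, 1.0])
ok = maneuver_succeeded(position=[0.5, 0.0, 10.2],
                        velocity=[0.05, 0.0, 0.0],
                        rot_velocity=[0.0, 0.0, 0.0],
                        target=target)
```

In this example the position error is about 0.54 m and the speed is 5 cm/s, so the check passes. Raising the speed to 20 cm/s would fail it.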

Concept of operations (CONOPS)

The GNC system described in this work uses a camera-based optical seeker. In order for the optical seeker to lock onto the desired landing site, the landing site must be appropriately marked. There are multiple methods that could be used to mark the landing site, including reflective markers dropped on the asteroid's surface by a hovering spacecraft [22]. We will propose two new methods for tagging the landing site using a targeting laser on board a hovering spacecraft. Once the landing site is

Lander configuration

The lander is modeled as a uniform density cube with height h=2m, width w=2m, and depth d=2m, with inertia matrix given in Eq. (1), where m is the lander's mass. The lander has a wet mass ranging from 450 to 500 kg. The thruster configuration is shown in Table 2, where x, y, and z are the body frame axes. Roll is about the x-axis, yaw is about the z-axis, and pitch is about the y-axis. When two thrusters on a given side of the cube are fired concurrently, they provide translational thrust

RL overview

In the RL framework, an agent learns through episodic interaction with an environment how to successfully complete a task by learning a policy that maps observations to actions. The environment initializes an episode by randomly generating a ground truth state, mapping this state to an observation, and passing the observation to the agent. These observations could be a corrupted version of the ground truth state (to model sensor noise) or could be raw sensor outputs such as Doppler radar

Experiments

Once we have a chance to commercialize the technology, we will post the Python code that allows reproducing our results on our GitHub site; the repository will be indexed at github.com/Aerospace-AI/Aerospace-AI.github.io.

Implementation considerations

In this work we considered an ideal seeker that perfectly tracked the target from a stabilized (inertial) platform. When this guidance system is implemented on a small lander, miniaturization of the seeker hardware is critical. First, note that since the GNC system only requires changes in attitude that have accumulated from the start of a maneuver, rather than use a star tracker, we can measure the difference between a gyroscope stabilized reference frame and the lander body frame. This should

Conclusion

We formulated a particularly difficult problem: precision maneuvers around an asteroid with unknown dynamics, starting from a large range of initial condition uncertainty, accounting for actuator failure, center of mass variation, and sensor noise, and using raw sensor measurements. We created a high fidelity 6-DOF simulator that synthesized asteroid models with randomized parameters, where the asteroid is modeled as a uniform density ellipsoid that in general is not rotating about a principal

Declaration of competing interest

We have no competing interests.

References (37)

  • B. Gaudet, R. Linares, Adaptive Guidance with Reinforcement Meta-Learning, arXiv preprint...
  • C. D'Souza et al.

    An optimal guidance law for planetary landing

  • S. Thrun et al.

    Probabilistic Robotics

    (2005)
  • N. Prabhakar et al.

    Trajectory-driven adaptive control of autonomous unmanned aerial vehicles with disturbance accommodation

    J. Guid. Contr. Dynam.

    (2018)
  • Y. Han et al.

    Adaptive fault-tolerant control of spacecraft attitude dynamics with actuator failures

    J. Guid. Contr. Dynam.

    (2015)
  • G.M. Siouris

    Missile Guidance and Control Systems

    (2004)
  • R. Furfaro et al.

    Deep learning for autonomous lunar landing

  • R. Furfaro et al.

    A recurrent deep architecture for quasi-optimal feedback guidance in planetary landing
