Scaling Learning Based Policy Optimization for Temporal Tasks via Dropout

Date:


In this presentation, I discuss our latest research on a model-based approach for training feedback controllers for autonomous agents in nonlinear environments. The work uses discrete-time Signal Temporal Logic (DT-STL) to specify task objectives and to train policies that satisfy them.

The core of our research addresses the difficulty of training recurrent neural networks (RNNs) for long-horizon task objectives. We introduce a new gradient approximation algorithm and propose smooth semantics for DT-STL. Together, these aim to overcome the inefficiencies of existing methods and to make backpropagation scalable over long time horizons and high-dimensional state spaces.
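To give a feel for what "smooth semantics" means here, the following is a minimal sketch of a differentiable robustness score for two DT-STL operators, "always" and "eventually". It assumes the widely used log-sum-exp smoothing of min and max, which is not necessarily the semantics proposed in this work. The function names, the sharpness parameter `beta`, and the example trajectory are all hypothetical and chosen only for illustration.

```python
import math

def soft_min(values, beta=10.0):
    """Smooth lower bound on min(values) via log-sum-exp (assumed smoothing)."""
    m = min(values)  # shift by the true min for numerical stability
    return m - math.log(sum(math.exp(-beta * (v - m)) for v in values)) / beta

def soft_max(values, beta=10.0):
    """Smooth upper bound on max(values) via log-sum-exp (assumed smoothing)."""
    m = max(values)  # shift by the true max for numerical stability
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta

def always(margins, beta=10.0):
    """Smoothed robustness of G(phi): phi must hold at every time step."""
    return soft_min(margins, beta)

def eventually(margins, beta=10.0):
    """Smoothed robustness of F(phi): phi must hold at some time step."""
    return soft_max(margins, beta)

# Hypothetical 1-D trajectory of an agent over five time steps.
trajectory = [0.0, 1.0, 2.5, 4.0, 3.5]

# Signed margins of two predicates: positive means satisfied at that step.
stay_below_5 = [5.0 - x for x in trajectory]  # predicate x_t < 5
exceed_3 = [x - 3.0 for x in trajectory]      # predicate x_t > 3

safety_score = always(stay_below_5)   # smooth score for G(x_t < 5)
reach_score = eventually(exceed_3)    # smooth score for F(x_t > 3)
print(safety_score, reach_score)
```

With this smoothing, `soft_min` never exceeds the true minimum and `soft_max` never falls below the true maximum, and both differ from the exact value by at most `log(n) / beta` for a trajectory of `n` steps. Only the "always" score is therefore conservative: a positive smoothed value implies a positive exact robustness. A larger `beta` tightens both approximations but sharpens the gradients, which is one face of the trade-off that makes long horizons hard.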