
Decision-making optimization 

Decision-making optimization in realistic engineering settings must account for operational constraints. An efficient policy is not necessarily one that optimizes an objective with 'all options on the table', but rather one that can adapt nimbly to the constraints imposed by the exogenous environment.


Such constraints can be either 'hard', deterministically limiting the decision-maker's degrees of freedom according to a known capacity (e.g. a fixed budget), or 'soft', where a known capacity is satisfied probabilistically or in expectation (e.g. an acceptable level of failure probability). In addition, given the temporal and sequential nature of the decision problem, both types of constraints generally correspond to long-term metrics (e.g. a 5-year budget or a 50-year probability of failure, respectively).
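
As a rough illustration of the distinction above, the two constraint types can be written next to a generic long-term objective. The notation is an assumption introduced here, not taken from the project: r_t is the reward at step t, c_t the intervention cost, g_t a risk-related cost, B the budget, alpha the acceptable risk level, gamma a discount factor, and T the horizon.

```latex
\begin{align}
  \pi^{*} &= \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t} r_{t}\right] \\
  \text{hard:}\quad & \sum_{t=0}^{T} c_{t} \le B \quad \text{for every realization} \\
  \text{soft:}\quad & \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} g_{t}\right] \le \alpha
\end{align}
```

The hard constraint must hold along every trajectory, whereas the soft constraint only bounds an expectation (or, equivalently for an indicator cost, a probability) over trajectories.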

To account for such constraints, the multi-agent actor-critic DRL concept introduced in the DRL-POMDP project is extended through state augmentation and Lagrange multipliers, and a 10-component deteriorating system is examined under various constrained scenarios. The figure below shows statistical features of the learned policies for two cases of 5-year budget constraints and for the unconstrained case (from right to left).
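
The two mechanisms named above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's implementation: it assumes that state augmentation serves the hard budget constraint (the remaining budget is appended to the observation) and that a Lagrange multiplier serves the soft constraint (the constraint cost is priced into the reward). All names, such as `BudgetAugmentedState` and `update_multiplier`, are hypothetical, and periodic budget renewal (e.g. every 5 years) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class BudgetAugmentedState:
    """Hypothetical augmented state: original observation plus remaining budget."""
    observation: tuple
    remaining_budget: float


def affordable_actions(state, action_costs):
    """Return the indices of actions whose cost fits within the remaining budget."""
    return [a for a, cost in enumerate(action_costs) if cost <= state.remaining_budget]


def step_budget(state, next_observation, action_cost):
    """Advance the augmented state after paying for the chosen action."""
    if action_cost > state.remaining_budget:
        raise ValueError("action cost exceeds the remaining budget")
    return BudgetAugmentedState(next_observation, state.remaining_budget - action_cost)


def lagrangian_reward(reward, constraint_cost, multiplier):
    """Penalize the reward by the multiplier-weighted constraint cost."""
    return reward - multiplier * constraint_cost


def update_multiplier(multiplier, estimated_constraint_value, threshold, step_size):
    """Projected dual ascent: raise the multiplier when the constraint is violated,
    lower it otherwise, and never let it drop below zero."""
    return max(0.0, multiplier + step_size * (estimated_constraint_value - threshold))
```

In a training loop under these assumptions, the policy would only sample from `affordable_actions`, learn from `lagrangian_reward`, and have its multiplier adjusted by `update_multiplier` after each batch of episodes using the current estimate of the long-term constraint value.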

[Figure: Budget Constrained Policy Stats]

Besides the apparent diversification of the different component policies, a notable pattern in this risk-constrained case is that the agents develop opportunistic strategies when failure events occur in components of the same link. Similarly, intervention actions, which generally cause partial or total link closures, are synchronized for same-link components. The agents develop this behavior because it minimizes maintenance-induced network disruptions. As with the re-prioritization of inspection and maintenance resources under budget constraints shown previously, this behavior is learned autonomously by the agents, without any user-defined enforcement or implicit reward-based penalization or motivation.


Andriotis, C.P., and Papakonstantinou, K.G., “Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints”, Reliability Engineering & System Safety (under review), arXiv preprint arXiv:2007.01380, 2020.

Andriotis, C.P., and Papakonstantinou, K.G., “Managing engineering systems with large state and action spaces through deep reinforcement learning”, Reliability Engineering & System Safety, 191 (11), 106483, 2019.

Andriotis, C.P., and Papakonstantinou, K.G., “Life-cycle policies for large engineering systems under complete and partial observability”, 13th International Conference on Applications of Statistics and Probability in Civil Engineering (ICASP), Seoul, South Korea, 2019.





Faculty of Architecture & the Built Environment

Delft University of Technology

Julianalaan 134, 2628 BL, Delft 

email: c.andriotis [at] tudelft [dot] nl

Copyright © 2020-21 by C.P. Andriotis
