Decision-making optimization

Dynamic decisions for deteriorating systems through Partially Observable Markov Decision Processes

Inspection & maintenance planning for structures and infrastructure against hazards and deterioration constitutes a complex sequential decision-making optimization problem. Significant challenges stem from the fact that the optimization process needs to (i) ensure optimality over multiple decision steps while (ii) efficiently incorporating environment uncertainties along the way.

Discrete stochastic optimal control through Partially Observable Markov Decision Processes (POMDPs) provides a cohesive framework that accommodates both of these features concurrently. The high-level concept of POMDPs is straightforward. The optimization follows stochastic dynamic programming principles which, under mild regularity conditions, guarantee globally optimal solutions in sequential decision settings. The decisions themselves are conditioned on the state probability distribution (belief) of the system, which is continuously updated through Bayesian filtering based on the underlying hidden Markov model.

These Bayesian updates recursively generate a new belief from the previous one, the selected action, and the received observation. Although the true state of the system is unknown to the decision-maker in a POMDP, this recursive property allows the POMDP problem to be equivalently expressed as a belief-based MDP. This approach is conceptualized in the figure below (shaded nodes represent hidden random variables).
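The recursive update can be written as b'(s') ∝ P(o | s', a) Σ_s P(s' | s, a) b(s), normalized by P(o | b, a). A minimal sketch for a discrete hidden Markov model follows. The three-level deterioration model, the transition matrix `T` and the inspection likelihood matrix `O` (both for one fixed action) are hypothetical numbers, not values from the referenced studies.

```python
from typing import List

Matrix = List[List[float]]


def belief_update(belief: List[float], T: Matrix, O: Matrix, obs: int) -> List[float]:
    """Return the posterior belief after one transition and one observation.

    belief : prior probabilities over hidden states
    T      : T[s][s2] = P(next state s2 | state s) for the selected action
    O      : O[s2][o] = P(observation o | next state s2) for the selected action
    obs    : index of the received observation
    """
    n = len(belief)
    # Prediction step: propagate the belief through the Markov transition model.
    predicted = [sum(belief[s] * T[s][s2] for s in range(n)) for s2 in range(n)]
    # Correction step: weight each state by the likelihood of the received observation.
    unnormalized = [predicted[s2] * O[s2][obs] for s2 in range(n)]
    evidence = sum(unnormalized)  # P(obs | belief, action)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / evidence for p in unnormalized]


# Hypothetical 3-level deterioration model (intact, damaged, severe) under "do nothing".
T = [[0.8, 0.2, 0.0],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]]
# Hypothetical imperfect inspection with 3 outcomes, mostly reporting the true level.
O = [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]

b0 = [1.0, 0.0, 0.0]
b1 = belief_update(b0, T, O, obs=1)  # inspection reports "damaged"; b1 == [0.5, 0.5, 0.0]
```

Each call uses only the previous belief, the action-specific matrices and the new observation, which is exactly the property that lets the belief replace the full action-observation history.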

This basic structure can be extended to any type of dynamic Bayesian network formulation. For example, a general structure of dependencies specific to deteriorating systems is depicted below. In this network, component states consist of the 'deterioration rate', the 'deterioration condition or level', and the 'performance state'. These vary across the Nc different system components and are all conditioned on the 'environment parameters', which express statistical dependencies and correlations among components. 'System performance states' express structural dependencies among components, reflecting a global response metric of the system. Together, all these variables constitute the overall 'state' in the compact POMDP representation of the previous figure. Along the same lines, hidden variables and/or unknown environment parameters of the system are probabilistically inferred from the observations as the control policy is executed.
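The coupling role of the 'environment parameters' can be illustrated with a single time slice of such a network. The sketch below is hypothetical throughout: two components whose damage probabilities both depend on one binary environment parameter (mild or aggressive), an imperfect inspection of component 1 only, and illustrative probabilities. Exact inference by enumeration shows that inspecting component 1 also changes the predicted condition of the uninspected component 2.

```python
from itertools import product

# Hypothetical environment parameter: 0 = mild, 1 = aggressive.
P_THETA = {0: 0.5, 1: 0.5}
# P(component damaged | theta); components are conditionally independent given theta.
P_DAMAGE = {0: 0.1, 1: 0.4}
# Hypothetical imperfect inspection: P(report "damaged" | true state), 0 = intact, 1 = damaged.
P_REPORT_DAMAGED = {0: 0.2, 1: 0.9}


def joint(theta: int, c1: int, c2: int) -> float:
    """Joint prior probability of the environment parameter and both component states."""
    p1 = P_DAMAGE[theta] if c1 else 1.0 - P_DAMAGE[theta]
    p2 = P_DAMAGE[theta] if c2 else 1.0 - P_DAMAGE[theta]
    return P_THETA[theta] * p1 * p2


def posterior_given_report_on_c1(report_damaged: bool):
    """Condition on an inspection of component 1 only; return P(theta=1) and P(c2 damaged)."""
    weights = {}
    for theta, c1, c2 in product((0, 1), repeat=3):
        like = P_REPORT_DAMAGED[c1] if report_damaged else 1.0 - P_REPORT_DAMAGED[c1]
        weights[(theta, c1, c2)] = joint(theta, c1, c2) * like
    total = sum(weights.values())
    p_aggressive = sum(w for (t, _, _), w in weights.items() if t == 1) / total
    p_c2_damaged = sum(w for (_, _, c2), w in weights.items() if c2 == 1) / total
    return p_aggressive, p_c2_damaged


prior_c2 = sum(joint(t, c1, 1) for t in (0, 1) for c1 in (0, 1))  # 0.25
post_theta, post_c2 = posterior_given_report_on_c1(report_damaged=True)
# post_theta ≈ 0.64 and post_c2 ≈ 0.292: component 2 is now predicted to be worse off
```

Enumeration is only practical for tiny networks; it is used here solely to make the dependency structure explicit.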

A policy is a rule according to which actions are decided. In POMDPs, a policy is a map from beliefs to actions. Deciding on this basis is optimal, since the current belief, as inferred from past actions and observations within a Bayesian context, contains all the information collected up to the current step (it is a sufficient statistic of the history of actions and observations). Such a policy is induced by the optimal value function, i.e. the expected life-cycle cost, which is therefore also defined over the belief space.
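In standard notation, the belief-MDP recursion behind this statement can be written as below. The symbols are assumptions introduced here for illustration: S, A and Ω are the state, action and observation sets, γ is a discount factor, and R(s,a) = -C(s,a) is the one-step reward written as the negative of the one-step cost, so that the expected life-cycle cost from an initial belief b_0 equals -V*(b_0).

```latex
\begin{aligned}
Q^*(b,a) &= \sum_{s \in S} b(s)\,R(s,a) + \gamma \sum_{o \in \Omega} P(o \mid b,a)\,V^*\!\left(b^{a,o}\right),\\
V^*(b) &= \max_{a \in A} Q^*(b,a), \qquad \pi^*(b) = \arg\max_{a \in A} Q^*(b,a),\\
P(o \mid b,a) &= \sum_{s' \in S} P(o \mid s',a) \sum_{s \in S} P(s' \mid s,a)\, b(s),\\
b^{a,o}(s') &= \frac{P(o \mid s',a) \sum_{s \in S} P(s' \mid s,a)\, b(s)}{P(o \mid b,a)}.
\end{aligned}
```

The policy π* depends on the history only through b, which is the sufficient-statistic argument above expressed as an equation.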

Although beliefs are defined on a continuous space, the optimal value function can be represented by a finite set of vectors (often called α-vectors) forming a piecewise linear and convex hyper-surface. Computing these vectors by directly extending dynamic programming value iteration to this space is feasible; however, its complexity grows exponentially with the number of possible observations at every step. Point-based value iteration alleviates this complexity by operating on the equivalent belief-MDP problem and performing backups only at a finite set of belief points, thus leveraging the polynomial complexity of value iteration in MDPs.
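The piecewise linear and convex representation can be illustrated with a few α-vectors. The sketch below assumes the cost minimization is written as reward maximization (rewards are negative costs), so V(b) is the maximum of the dot products α·b, and each vector carries the action that generated it. The two vectors and their action labels are hypothetical numbers chosen for illustration.

```python
from typing import List, Tuple

AlphaVector = Tuple[List[float], str]  # (vector over states, associated action)


def value_and_action(belief: List[float], alphas: List[AlphaVector]) -> Tuple[float, str]:
    """Evaluate V(b) = max_alpha alpha . b and return the maximizing vector's action."""
    best_value, best_action = float("-inf"), ""
    for vector, action in alphas:
        value = sum(a * p for a, p in zip(vector, belief))
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


# Hypothetical alpha-vectors for a 2-state problem (intact, damaged); entries are
# negative expected discounted costs, so larger is better.
ALPHAS = [([-10.0, -100.0], "do-nothing"),
          ([-30.0, -50.0], "repair")]

print(value_and_action([0.5, 0.5], ALPHAS))  # (-40.0, 'repair')
```

With these numbers, repair becomes preferable once the probability of damage exceeds 2/7 ≈ 0.29, where the two planes intersect. The value at b = [0.5, 0.5] (-40) lies below the average of the endpoint values (-30), as convexity requires.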

The basic concept of point-based value iteration is to: (i) start from an initial probability distribution over states and/or parameters; (ii) traverse a trajectory of posterior beliefs by sampling actions and observations, or selecting them according to a predefined rule; and (iii) perform vector backups over the collected beliefs to update the value function. Pruning of dominated vectors and the use of upper and lower value bounds to guide the exploration are also common in this approach. The basic steps are illustrated below.
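The three steps can be sketched as a basic point-based value iteration loop in the style of PBVI. This is a minimal illustration, not the solver used in the referenced studies: the two-state model (intact/damaged), the do-nothing/inspect/repair actions, all costs and probabilities, the discount factor `GAMMA` and every helper name are hypothetical. Beliefs are collected by sampling random actions and observations from the initial belief, and only duplicate vectors are pruned.

```python
import random
from typing import List, Tuple

Vector = List[float]

# Hypothetical 2-state deterioration POMDP; every number is an illustrative assumption.
# States: 0 = intact, 1 = damaged. Actions: 0 = do nothing, 1 = inspect, 2 = repair.
# Observations: 0 = "no damage indicated", 1 = "damage indicated".
ACTIONS = ["do-nothing", "inspect", "repair"]
GAMMA = 0.95
DETERIORATE = [[0.9, 0.1], [0.0, 1.0]]
T = [DETERIORATE, DETERIORATE, [[0.9, 0.1], [0.9, 0.1]]]      # T[a][s][s2]
UNINFORMATIVE = [[0.5, 0.5], [0.5, 0.5]]
O = [UNINFORMATIVE, [[0.9, 0.1], [0.2, 0.8]], UNINFORMATIVE]  # O[a][s2][o]
R = [[0.0, -20.0], [-2.0, -22.0], [-15.0, -15.0]]             # R[a][s] = -cost
N_S, N_A, N_O = 2, 3, 2


def dot(x: Vector, y: Vector) -> float:
    return sum(p * q for p, q in zip(x, y))


def update(b: Vector, a: int, o: int) -> Tuple[Vector, float]:
    """Bayesian belief update; returns the posterior belief and P(o | b, a)."""
    un = [O[a][s2][o] * sum(b[s] * T[a][s][s2] for s in range(N_S)) for s2 in range(N_S)]
    p_o = sum(un)
    return ([x / p_o for x in un] if p_o > 0 else list(b)), p_o


def collect_beliefs(b0: Vector, n_traj: int, depth: int, rng: random.Random) -> List[Vector]:
    """Steps (i)-(ii): from b0, sample actions and observations to reach posterior beliefs."""
    beliefs = [list(b0)]
    for _ in range(n_traj):
        b = list(b0)
        for _ in range(depth):
            a = rng.randrange(N_A)
            p_obs = [update(b, a, o)[1] for o in range(N_O)]
            o = rng.choices(range(N_O), weights=p_obs)[0]
            b = update(b, a, o)[0]
            beliefs.append(b)
    # Keep each distinct belief point once.
    return [list(t) for t in dict.fromkeys(tuple(round(x, 12) for x in b) for b in beliefs)]


def project(alphas: List[Vector]) -> List[List[List[Vector]]]:
    """proj[a][o]: every alpha-vector propagated one step back through T and O for (a, o)."""
    return [[[[sum(T[a][s][s2] * O[a][s2][o] * al[s2] for s2 in range(N_S)) for s in range(N_S)]
              for al in alphas] for o in range(N_O)] for a in range(N_A)]


def backup(b: Vector, proj) -> Tuple[Vector, int]:
    """Step (iii): point-based Bellman backup at belief b; returns (alpha-vector, action)."""
    best_vec, best_a = None, -1
    for a in range(N_A):
        vec = list(R[a])
        for o in range(N_O):
            g = max(proj[a][o], key=lambda v: dot(v, b))  # best projection at this belief
            vec = [vec[s] + GAMMA * g[s] for s in range(N_S)]
        if best_vec is None or dot(vec, b) > dot(best_vec, b):
            best_vec, best_a = vec, a
    return best_vec, best_a


def point_based_vi(b0: Vector, n_iter: int = 100, seed: int = 0) -> List[Vector]:
    beliefs = collect_beliefs(b0, n_traj=10, depth=5, rng=random.Random(seed))
    r_min = min(min(row) for row in R)
    alphas = [[r_min / (1.0 - GAMMA)] * N_S]  # trivial lower bound on the value function
    for _ in range(n_iter):
        proj = project(alphas)
        new = [backup(b, proj)[0] for b in beliefs]
        # Minimal pruning: drop duplicates (practical solvers also remove dominated vectors).
        alphas = [list(v) for v in dict.fromkeys(tuple(round(x, 9) for x in v) for v in new)]
    return alphas


def act(b: Vector, alphas: List[Vector]) -> str:
    """Greedy action at belief b via a one-step lookahead on the alpha-vector set."""
    return ACTIONS[backup(b, project(alphas))[1]]


alphas = point_based_vi([1.0, 0.0])
```

In this toy model, `act([0.0, 1.0], alphas)` selects repair, since leaving a surely damaged component unrepaired keeps incurring the higher damage cost. Starting from a trivial lower bound keeps every backed-up vector a lower bound on the true value function, which is one reason this initialization is common.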

A policy optimized through the process described above (for Nc = 1) is shown below for a corroding reinforced concrete port deck slab. Owing to the stochasticity of corrosion propagation and the uncertainty in structural measurements, this is only one possible realization of the optimal policy, which overall encompasses an optimal action mapping for every admissible combination of action and observation histories attainable throughout the operating life of the system.

As shown in the video, the type of action chosen at every time step depends on the dynamically updated state probability distribution (belief) over the system's damage conditions. This dynamic updating is key to long-term optimality and equips the decision-making process with real-time adaptivity. The notion of belief-conditioned policies is also central to the numerical approach used to learn this policy, i.e. point-based POMDP value iteration, as explained above.
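The belief-driven execution loop can be sketched as follows. The two-state model (intact/damaged), the three actions and especially the three α-vectors defining the policy are hypothetical, hand-picked numbers, not an optimized policy. The point is only the loop itself: the hidden state evolves, an observation is received, the belief is updated by Bayes' rule, and the next action is read off the updated belief. Different random seeds yield different realizations, in the same sense that a single video is one of many possible histories.

```python
import random
from typing import List, Tuple

# Hypothetical model and alpha-vectors (illustrative assumptions, not an optimized policy).
DETERIORATE = [[0.9, 0.1], [0.0, 1.0]]
T = {"do-nothing": DETERIORATE, "inspect": DETERIORATE, "repair": [[0.9, 0.1], [0.9, 0.1]]}
UNINFORMATIVE = [[0.5, 0.5], [0.5, 0.5]]
O = {"do-nothing": UNINFORMATIVE, "inspect": [[0.9, 0.1], [0.2, 0.8]], "repair": UNINFORMATIVE}
ALPHAS = [([-10.0, -100.0], "do-nothing"), ([-14.0, -70.0], "inspect"), ([-40.0, -40.0], "repair")]


def policy(b: List[float]) -> str:
    """Map the current belief to the action of the maximizing alpha-vector."""
    return max(ALPHAS, key=lambda av: sum(x * p for x, p in zip(av[0], b)))[1]


def sample(probs: List[float], rng: random.Random) -> int:
    return rng.choices(range(len(probs)), weights=probs)[0]


def rollout(horizon: int, seed: int) -> List[Tuple[int, str, int, float]]:
    """Simulate one realization; returns (time step, action, observation, P(damaged))."""
    rng = random.Random(seed)
    state, belief, history = 0, [1.0, 0.0], []
    for t in range(horizon):
        a = policy(belief)
        # The hidden state evolves, then an observation is emitted from the new state.
        state = sample(T[a][state], rng)
        obs = sample(O[a][state], rng)
        # Bayesian update that uses only the action and the observation, never the true state.
        un = [O[a][s2][obs] * sum(belief[s] * T[a][s][s2] for s in range(2)) for s2 in range(2)]
        belief = [x / sum(un) for x in un]
        history.append((t, a, obs, round(belief[1], 3)))
    return history
```

With these vectors, the policy does nothing while the damage probability is below about 0.12, inspects between about 0.12 and 0.46, and repairs above that. For example, `for step in rollout(horizon=20, seed=1): print(step)` lists one realization step by step.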

In cases where some state variables are observed with certainty and others only through uncertain observations (e.g. component age vs corrosion penetration, respectively), state-based value function decompositions are also applicable through Mixed Observability Markov Decision Processes (MOMDPs), which further reduce dimensionality and allow for more compact state representations of the problem.
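A minimal MOMDP-style sketch, assuming a fully observed component age x and a partially observed corrosion level y: because x is known exactly, the belief is kept only over y, and the age selects which (hypothetical) age-dependent transition matrix is applied. The discretization sizes, the transition rule and the measurement likelihoods are all illustrative assumptions.

```python
from typing import List, Tuple

N_AGE, N_CORR = 50, 4  # hypothetical discretization: 50 age steps, 4 corrosion levels


def corrosion_transition(age: int) -> List[List[float]]:
    """Hypothetical age-dependent transition matrix for the hidden corrosion level."""
    p = min(0.05 + 0.01 * age, 0.5)  # chance of advancing one level in this step
    T = [[0.0] * N_CORR for _ in range(N_CORR)]
    for i in range(N_CORR - 1):
        T[i][i], T[i][i + 1] = 1.0 - p, p
    T[-1][-1] = 1.0  # the most severe level is absorbing
    return T


def momdp_step(age: int, b_corr: List[float], obs_lik: List[float]) -> Tuple[int, List[float]]:
    """Advance one step: age is observed exactly, so only the corrosion belief is updated.

    obs_lik[y] = P(received measurement | corrosion level y) for a noisy measurement.
    """
    T = corrosion_transition(age)
    pred = [sum(b_corr[i] * T[i][j] for i in range(N_CORR)) for j in range(N_CORR)]
    un = [pred[j] * obs_lik[j] for j in range(N_CORR)]
    z = sum(un)
    return age + 1, [u / z for u in un]


# A flat POMDP belief over (age, corrosion) pairs vs a MOMDP belief over corrosion only.
pomdp_belief_size = N_AGE * N_CORR  # 200 probabilities
momdp_belief_size = N_CORR          # 4 probabilities, indexed by the known age

age, b = momdp_step(10, [1.0, 0.0, 0.0, 0.0], [0.7, 0.2, 0.1, 0.0])
```

In MOMDP formulations the value function is correspondingly represented by a separate set of α-vectors over y for each value of x, so each backup works in the smaller belief space.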

References:

Andriotis, C.P., and Papakonstantinou, K.G., “Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints”, Reliability Engineering & System Safety (under review), arXiv preprint arXiv:2007.01380, 2020.

Andriotis, C.P., Papakonstantinou, K.G., and Chatzi, E.N., “Value of structural health information in partially observable stochastic environments”, Structural Safety (under review), arXiv preprint arXiv:1912.12534, 2020.

Papakonstantinou, K.G., Andriotis, C.P., and Shinozuka, M., “POMDP and MOMDP solutions for structural life-cycle cost minimization under partial and mixed observability”, Structure and Infrastructure Engineering, 14 (7), 869-882, 2018.

Papakonstantinou, K.G., Andriotis, C.P., and Shinozuka, M., “Point-based POMDP solvers for life-cycle cost minimization of deteriorating structures”, Proceedings of the 5th International Symposium on Life-Cycle Civil Engineering, Delft, The Netherlands, 2016.


Contact

Faculty of Architecture & the Built Environment

Delft University of Technology

Julianalaan 134, 2628 BL, Delft

email: c.andriotis [at] tudelft [dot] nl
