Title:
Examples in Markov Decision Processes.
Author:
Piunovskiy, A. B.
ISBN:
9781848167940
Physical Description:
1 online resource (308 pages)
Series:
Imperial College Press Optimization Series ; v.2
Contents:
Preface
1. Finite-Horizon Models
  1.1 Preliminaries
  1.2 Model Description
  1.3 Dynamic Programming Approach
  1.4 Examples
    1.4.1 Non-transitivity of the correlation
    1.4.2 The more frequently used control is not better
    1.4.3 Voting
    1.4.4 The secretary problem
    1.4.5 Constrained optimization
    1.4.6 Equivalent Markov selectors in non-atomic MDPs
    1.4.7 Strongly equivalent Markov selectors in non-atomic MDPs
    1.4.8 Stock exchange
    1.4.9 Markov or non-Markov strategy? Randomized or not? When is the Bellman principle violated?
    1.4.10 Uniformly optimal, but not optimal strategy
    1.4.11 Martingales and the Bellman principle
    1.4.12 Conventions on expectation and infinities
    1.4.13 Nowhere-differentiable function v_t(x); discontinuous function v_t(x)
    1.4.14 The non-measurable Bellman function
    1.4.15 No one strategy is uniformly ε-optimal
    1.4.16 Semi-continuous model
2. Homogeneous Infinite-Horizon Models: Expected Total Loss
  2.1 Homogeneous Non-discounted Model
  2.2 Examples
    2.2.1 Mixed strategies
    2.2.2 Multiple solutions to the optimality equation
    2.2.3 Finite model: multiple solutions to the optimality equation; conserving but not equalizing strategy
    2.2.4 The single conserving strategy is not equalizing and not optimal
    2.2.5 When strategy iteration is not successful
    2.2.6 When value iteration is not successful
    2.2.7 When value iteration is not successful: positive model I
    2.2.8 When value iteration is not successful: positive model II
    2.2.9 Value iteration and stability in optimal stopping problems
    2.2.10 A non-equalizing strategy is uniformly optimal
    2.2.11 A stationary uniformly ε-optimal selector does not exist (positive model)
    2.2.12 A stationary uniformly ε-optimal selector does not exist (negative model)
    2.2.13 Finite-action negative model where a stationary uniformly ε-optimal selector does not exist
    2.2.14 Nearly uniformly optimal selectors in negative models
    2.2.15 Semi-continuous models and the blackmailer's dilemma
    2.2.16 Not a semi-continuous model
    2.2.17 The Bellman function is non-measurable and no one strategy is uniformly ε-optimal
    2.2.18 A randomized strategy is better than any selector (finite action space)
    2.2.19 The fluid approximation does not work
    2.2.20 The fluid approximation: refined model
    2.2.21 Occupation measures: phantom solutions
    2.2.22 Occupation measures in transient models
    2.2.23 Occupation measures and duality
    2.2.24 Occupation measures: compactness
    2.2.25 The bold strategy in gambling is not optimal (house limit)
    2.2.26 The bold strategy in gambling is not optimal (inflation)
    2.2.27 Search strategy for a moving target
    2.2.28 The three-way duel ("Truel")
3. Homogeneous Infinite-Horizon Models: Discounted Loss
  3.1 Preliminaries
  3.2 Examples
    3.2.1 Phantom solutions of the optimality equation
    3.2.2 When value iteration is not successful: positive model
    3.2.3 A non-optimal strategy π for which v_x^π solves the optimality equation
    3.2.4 The single conserving strategy is not equalizing and not optimal
    3.2.5 Value iteration and convergence of strategies
    3.2.6 Value iteration in countable models
    3.2.7 The Bellman function is non-measurable and no one strategy is uniformly ε-optimal
    3.2.8 No one selector is uniformly ε-optimal
    3.2.9 Myopic strategies
    3.2.10 Stable and unstable controllers for linear systems
    3.2.11 Incorrect optimal actions in the model with partial information
    3.2.12 Occupation measures and stationary strategies
    3.2.13 Constrained optimization and the Bellman principle
    3.2.14 Constrained optimization and Lagrange multipliers
    3.2.15 Constrained optimization: multiple solutions
    3.2.16 Weighted discounted loss and (N,∞)-stationary selectors
    3.2.17 Non-constant discounting
    3.2.18 The nearly optimal strategy is not Blackwell optimal
    3.2.19 Blackwell optimal strategies and opportunity loss
    3.2.20 Blackwell optimal and n-discount optimal strategies
    3.2.21 No Blackwell (Maitra) optimal strategies
    3.2.22 Optimal strategies as β → 1− and MDPs with the average loss - I
    3.2.23 Optimal strategies as β → 1− and MDPs with the average loss - II
4. Homogeneous Infinite-Horizon Models: Average Loss and Other Criteria
  4.1 Preliminaries
  4.2 Examples
    4.2.1 Why lim sup?
    4.2.2 AC-optimal non-canonical strategies
    4.2.3 Canonical triplets and canonical equations
    4.2.4 Multiple solutions to the canonical equations in finite models
    4.2.5 No AC-optimal strategies
    4.2.6 Canonical equations have no solutions: the finite action space
    4.2.7 No AC-ε-optimal stationary strategies in a finite-state model
    4.2.8 No AC-optimal strategies in a finite-state semi-continuous model
    4.2.9 Semi-continuous models and the sufficiency of stationary selectors
    4.2.10 No AC-optimal stationary strategies in a unichain model with a finite action space
    4.2.11 No AC-ε-optimal stationary strategies in a finite action model
    4.2.12 No AC-ε-optimal Markov strategies
    4.2.13 Singular perturbation of an MDP
    4.2.14 Blackwell optimal strategies and AC-optimality
    4.2.15 Strategy iteration in a unichain model
    4.2.16 Unichain strategy iteration in a finite communicating model
    4.2.17 Strategy iteration in semi-continuous models
    4.2.18 When value iteration is not successful
    4.2.19 The finite-horizon approximation does not work
    4.2.20 The linear programming approach to finite models
    4.2.21 Linear programming for infinite models
    4.2.22 Linear programs and expected frequencies in finite models
    4.2.23 Constrained optimization
    4.2.24 AC-optimal, bias optimal, overtaking optimal and opportunity-cost optimal strategies: periodic model
    4.2.25 AC-optimal and average-overtaking optimal strategies
    4.2.26 Blackwell optimal, bias optimal, average-overtaking optimal and AC-optimal strategies
    4.2.27 Nearly optimal and average-overtaking optimal strategies
    4.2.28 Strong-overtaking/average optimal, overtaking optimal, AC-optimal strategies and minimal opportunity loss
    4.2.29 Strong-overtaking optimal and strong*-overtaking optimal strategies
    4.2.30 Parrondo's paradox
    4.2.31 An optimal service strategy in a queueing system
Afterword
Appendix A Borel Spaces and Other Theoretical Issues
  A.1 Main Concepts
  A.2 Probability Measures on Borel Spaces
  A.3 Semi-continuous Functions and Measurable Selection
  A.4 Abelian (Tauberian) Theorem
Appendix B Proofs of Auxiliary Statements
Notation
List of the Main Statements
Bibliography
Index
Abstract:
This invaluable book provides approximately eighty examples illustrating the theory of controlled discrete-time Markov processes. Apart from applications of the theory to real-life problems such as the stock exchange, queues, gambling and optimal search, the main attention is paid to counter-intuitive, unexpected properties of optimization problems. Such examples illustrate the importance of the conditions imposed in the theorems on Markov decision processes: when studying or using mathematical methods, the researcher must understand what can happen if some of the conditions imposed in rigorous theorems are not satisfied. Many examples confirming the importance of such conditions were published earlier in journal articles or textbooks that are often difficult to find; this book collects them together in one reference volume, along with several new examples, and should be considered a complement to existing monographs on Markov decision processes. The book is self-contained and unified in presentation: the main theoretical statements and constructions are provided, and particular examples can be read independently of the others. It also indicates the areas where Markov decision processes can be used. Examples in Markov Decision Processes is an essential source of reference for mathematicians and all those who apply optimal control theory to practical purposes, for active researchers assessing the applicability of mathematical methods and theorems, and for graduate and research students seeking a better understanding of the theory.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.