### Overview of Stochastic Programming

Stochastic programming is a framework for modeling optimization problems that involve uncertainty. Whereas deterministic optimization problems are formulated with known parameters, real world problems almost invariably include some unknown parameters. When the parameters are known only within certain bounds, one approach to tackling such problems is called robust optimization. Here the goal is to find a solution which is feasible for all such data and optimal in some sense. Stochastic programming models are similar in style but take advantage of the fact that probability distributions governing the data are known or can be estimated. The goal here is to find some policy that is feasible for all (or almost all) the possible data instances and maximizes the expectation of some function of the decisions and the random variables. More generally, such models are formulated, solved analytically or numerically, and analyzed in order to provide useful information to a decision-maker.

To find out more about stochastic programming a good place to start is A Tutorial on Stochastic Programming by Alexander Shapiro and Andy Philpott. This tutorial is aimed at readers with some acquaintance with optimization and probability theory; for example graduate students in operations research, or academics/ practitioners from a different field of operations research.

The older Stochastic Programming Introduction by Andy  Philpott is aimed at readers with a less formal background in operations research, for example managers in industry who want to know more about what stochastic programming might offer them without delving too deeply into details.

In addition, tutorials on current research areas are being developed. The main idea is that those who've read the introduction and want to find out more about specific topics will have a good place to start. COSP will be inviting experts to write pages for each of these areas. This collection of introductions is edited by David Morton, Andy Philpott, and Maarten van der Vlerk.

Of course, what constitutes current research will continue to evolve, and so we've incorporated a mechanism to periodically revise and add to the areas themselves.

Other Introductions to SP are available under SP Resources. Many of these are linked to from within this collection of introductions.

### Introduction

Stochastic programming is a framework for modelling optimization problems that involve uncertainty. Whereas deterministic optimization problems are formulated with known parameters, real world problems almost invariably include some unknown parameters. When the parameters are known only within certain bounds, one approach to tackling such problems is called robust optimization. Here the goal is to find a solution which is feasible for all such data and optimal in some sense. Stochastic programming models are similar in style but take advantage of the fact that probability distributions governing the data are known or can be estimated. The goal here is to find some policy that is feasible for all (or almost all) the possible data instances and maximizes the expectation of some function of the decisions and the random variables. More generally, such models are formulated, solved analytically or numerically, and analyzed in order to provide useful information to a decision-maker.

The most widely applied and studied stochastic programming models are two-stage linear programs. Here the decision maker takes some action in the first stage, after which a random event occurs affecting the outcome of the first-stage decision. A recourse decision can then be made in the second stage that compensates for any bad effects that might have been experienced as a result of the first-stage decision. The optimal policy from such a model is a single first-stage policy and a collection of recourse decisions (a decision rule) defining which second-stage action should be taken in response to each random outcome. A good introductory example of such a model is the gas-company example on the NEOS web site, which we summarize below.

Example
The gas company example has a planning horizon of two years. In the first year the gas company buys gas from the market, delivers some to its customers right away and puts the rest in storage for next year. The following year the company can supply from storage or buy from the market. The decision variables are:

1) how much gas to purchase and deliver,
2) how much gas to purchase and store, and
3) how much gas to take from storage and deliver to customers.

The decision depends on the price of gas in year 1 and year 2, the storage cost (say $1 per unit per year), the size of the storage facility, and the demand in each period. With this information the problem can be modelled as a simple linear program with the objective to minimize overall cost. In practice the price and demand in year 2 will be uncertain. Suppose that year 1 is a normal year and that year 2 can be one of three equally likely scenarios: normal, cold, or very cold. Each of these scenarios has different data as shown in the following table: Scenario Probability Gas Cost ($) Demand (units)
Normal 1/3 5.0 100
Cold 1/3 6.0 150
Very Cold 1/3 7.5 180

Forming and solving the stochastic linear program gives the following solution:

 Year  Purchase to Use  Purchase to Store  Storage  Cost ($) Scenario 1 Normal 100 100 100 1100 2 Normal 0 0 0 0 2 Cold 50 0 0 300 2 Very Cold 80 0 0 600 Expected Cost = \$1400

Although stochastic programming encompasses a wide range of methodologies, the two-stage gas-company example illustrates some important general differences between stochastic programming models and deterministic models. In the gas-company example there are three equally likely scenarios. A common approach adopted by planners is to seek an optimal policy by computing an optimal solution for each scenario separately. The candidate solutions here are to store either 0 or 180 units of fuel for the next stage. The optimal policy (as delivered by the stochastic program) is to store 100 units. This does not correspond to the optimal solution in any of the scenarios. (It is also different from the storage policy of 143.33 units obtained by solving an optimization problem with averaged data.) The solution from the stochastic program is well-hedged, building in some flexibility to meet the uncertain demand in the second stage.

A second important observation for the gas-company model is that the sequencing of decisions and observations is important. In constructing a stochastic programming model, it is not enough just to specify the decision variables: the modeller must also construct the model in such a way that prevents decisions that anticipate future uncertain events. In the example, if the company could anticipate demand then it would store 0, 0, or 180 units in the first stage depending on the future weather outcome. However this is not an implementable policy.

A third observation about the example is that the objective function in this case does not account for the variation in outcomes. The model minimizes an expected cost, and its optimal policy gives costs of \$1100, \$1400, and \$1700 under each scenario. If this were a one-shot decision then this spread of outcomes might be seen as unacceptable for the gas company owner if they are unwilling to accept some outcome that costs \$1700. Modelling different attitudes to risk is possible in the stochastic programming framework by using piecewise linear, or more general nonlinear, objective functions.

Although two-stage stochastic linear programs are often regarded as the classical stochastic programming modelling paradigm, the discipline of stochastic programming has grown and broadened to cover a wide range of models and solution approaches. Applications are widespread, from finance to fisheries management. An alternative modelling approach uses so-called chance constraints. These do not require that our decisions are feasible for (almost) every outcome of the random parameters, but require feasibility with at least some specified probability. For details of this approach see the Introduction to Chance-Constrained Programming by Rene Henrion.

One natural generalization of the two-stage model extends it to many stages. Here each stage consists of a decision followed by a set of observations of the uncertain parameters which are gradually revealed over time. In this context stochastic programming is closely related to decision analysis, optimization of discrete event simulations, stochastic control theory, Markov decision processes, and dynamic programming.

How does stochastic programming differ from these models? In general terms the discipline combines the power of mathematical programming with advanced probability techniques, to attack optimization problems that involve uncertainty. A mathematical programming approach has important benefits: the tools of convex analysis and duality theory can be applied to yield important insights and develop solution techniques based on decomposing large problems into manageable pieces. The tools of mathematical programming are also indispensible in handling general constraints on states and decision variables. (The addition of constraints is often a serious impediment to dynamic programming techniques as it increases the dimension of the state space, which can lead to an intractable problem.) An important (current) restriction for stochastic programming problems - in contrast to dynamic programming problems - is that the probability distributions of the random parameters are assumed to be given, and cannot depend on the decisions taken.

For an excellent introduction to stochastic programming and a discussion of its relationship to related areas see the lecture notes Optimization under Uncertainty by R.T. Rockafellar.

### Applications

Stochastic programming has been applied in the following areas:

• Agriculture
• Capacity planning
• Energy
• Finance
• Fisheries management
• Forestry
• Military
• Production control
• Scheduling
• Sport
• Telecommunications
• Transportation
• Water management

### Solving stochastic programs

Solution approaches to stochastic programming models are driven by the type of probability distributions governing the random parameters. A common approach to handling uncertainty is to define a small number of scenarios to represent the future. For example in the gas company example the random outcomes were modelled by three scenarios. In this case it is possible to compute a solution to the stochastic programming problem by solving a deterministic equivalent linear program. These problems are typically very large scale problems, and so, much research effort in the stochastic programming commmunity has been devoted to developing algorithms that exploit the problem structure, in particular in the hope of decomposing large problems into smaller more tractable components. Here convexity is a key property.

When the probability distributions of random parameters are continuous, or there are many random parameters, one is faced with the problem of constructing appropriate scenarios to approximate the uncertainty. One approach to this problem constructs two different deterministic equivalent problems, the optimal solutions of which provide upper and lower bounds on the optimal value z* of the original problem. For details see the tutorial talk by Chanaka Edirisinghe, Bounding Techniques in Stochastic Programming, SP98, Vancouver.

An alternative solution methodology replaces the random variables by a finite random sample and solves the resulting (deterministic) mathematical programming problem as one would do for the finite scenario case (see above). This is often called an external sampling method. Under fairly mild conditions one can obtain a statistical estimate of the optimal solution value that converges to z* as the sample size increases. External sampling methods typically take one sample before applying a mathematical programming method. A number of algorithmic procedures (see the second half of the Birge paper) have been developed to take repeated samples during the course of the algorithm. This is often called an internal sampling method. Details of the convergence properties of external and internal sampling methodologies can be found in the papers by David Morton (PS) and Alexander Shapiro (PDF).

Stochastic integer programming models arise when the decision variables are required to take on integer values. In most practical situations this entails a loss of convexity and makes the application of decomposition methods problematic. Techniques for solving stochastic integer programming models is an active research area (see the tutorial paper by Rüdiger Schultz (PDF)).

For an overview of SP research, see the list of current research areas which provides links to separate pages for each subject.

There are several sites that the reader may seek further information (and other introductory documents) on stochastic programming. These are listed in SP Resources.

This Official COSP Stochastic Programming Introduction was developed by Andy Philpott with the encouragement and support of COSP.

## Introduction

This document is part of the Stochastic Programming Community Page (sponsored by the The Committee on Stochastic Programming - COSP) and provides a first introduction to the challenging and exciting field of stochastic integer programming (SIP). Familiarity with basic mathematical programming concepts is assumed. For an introduction to general stochastic programming, please visit the Stochastic Programming Introduction webpage. We illustrate some standard SIP modelling principles using a simple example; discuss the challenges in solving SIP problems; and briefly mention some of the progress in SIP theory and algorithms. For detailed discussion of SIP, the reader is referred to the textbook [5], or the survey articles [13], [16], [21] and [23].

## 2  An Example

The SunDay Icecream Co. is planning the location and capacity of its distribution centers to serve the demands from its retailers in N cities. The locations and capacities of the distribution centers should be such that retailer demands can be satisfied economically. In addition to the location and capacity decisions for the distribution centers, SunDay also assigns retailers to distribution centers. That is, each retailer's demand is to be supplied from one designated distribution center. This assignment incurs a fixed cost regardless of the retailer's demand.

Unfortunately, the demands of the various retailers are quite uncertain, and SunDay only has information on the probability distribution of the retailer demands. How should SunDay decide on the optimal location and capacity of the distribution centers, as well as the optimal assignment of distribution centers to the retailers? The following subsections describe some alternative optimization models for making this decision.

We shall use the following notation in describing our models. Let yi denote the 0-1 variable indicating whether a distribution center is to be located in city i (i=1,...,N), and xi denote the non-negative variable indicating the capacity of the distribution center located in city i. The maximum capacity that can be located in city i is denoted by Ui. The cost parameters ai and bi denote the per-unit capacity cost and the fixed location cost for locating a distribution center in city i, respectively. Let zij denote the 0-1 variable indicating the assignment of retailer j to distribution center i and gij denote the associated fixed assignment cost. Let $\tilde{d}_j$ be the uncertain demand of the retailer in city j (j=1,...,N). All costs are assumed to be amortized to a weekly basis, and the demands and capacities are in tonnes of ice-cream.

### A simple integer recourse model

Since the demand is uncertain, once the location, capacity and assignment decisions are made, SunDay might find itself in the undesirable situation that the total demand of the retailers assigned to a particular distribution center exceeds capacity of that distribution center. Suppose, in this case, contractual obligations require SunDay to pay a penalty on the shortfall. Furthermore, the penalty cost for a shortfall in a distribution center located in city i is qi dollars per ton of ice-cream shortage rounded up to the next ton. Think of this shortage penalty as the cost to outsource the ice-cream shortfall using one-ton capacity ice-cream trucks.

Note that the locations, capacities and assignments of the distribution centers have to be decided prior to observing the actual demand, whereas the shortage penalty is incurred upon observing the actual demand volume in the future. Consequently, as of now, the shortage penalty is an uncertain parameter, whose actual value depends on the uncertain future demand volume. A natural goal may be to decide on the location, capacity and assignments of the distribution centers here-and-now such that the sum of the location-capacity-assignment costs and the expected future shortage penalty is minimized. An optimization formulation for this problem is then:

$$\min \sum_{i=1}^N (a_i x_i + b_i) + \sum_{i=1}^N \sum_{j=1}^N g_{ij} z_{ij} + \mathbb{E}[Q(x,z,\tilde{d})]$$

subject to

$\begin{array}{ll} 0 \leq x_i \leq U_i y_i & i=1,\ldots,N \\ \sum_{i=1}^N z_{ij} = 1 & j=1,\ldots,N \\ \sum_{j=1}^N z_{ij} \leq y_i & i=1,\ldots,N \\ y_i, z_{ij} \in \{0,1\} & i,j=1,\ldots,N \end{array}$

here $x := (x_1,\ldots,x_N)^\top, z := (z_{11},z_{12},\ldots,z_{NN})^\top,\tilde{d} := (\tilde{d}_1,\ldots,\tilde{d}_N)^\top$,

$Q(x,z,\tilde{d}) := \sum_{i=1}^N \Bigl\lceil \max\Bigl\{ \sum_{j=1}^N \tilde{d}_j z_{ij} - x_i, 0 \Bigr\} \Bigr\rceil,$

and $\lceil a \rceil$ denotes the ceiling or integer round-up of $a$.

The above problem is an example of a two-stage stochastic program with simple integer recourse. The first-stage decisions constitutes location-capacity-assignment of the distribution centers prior to observing the exact demands, while the second-stage decisions constitute a simple recourse action of paying a penalty on the shortfall once demand information becomes available. Since the penalty is paid on integer shortfall values, the recourse action is "integer."

### A general integer recourse model

Suppose now that SunDay has the flexibility of postponing the assignment decisions until actual demand information becomes available. In this case, a natural model is to minimize the sum of location-capacity costs and the expected future assignment and shortage penalty costs. The formulation becomes

$\begin{array}{rll} \min \ & \sum_{i=1}^N (a_i x_i + b_i y_i) + \mathbb{E}[Q(x,\tilde{d})] \\ \text{s.t. } & 0 \leq x_i \leq U_i y_i & i=1,\ldots,N \\ & y_i \in \{0,1\} & i=1,\ldots,N \end{array}$

where

$\begin{array}{rll} Q(x,\tilde{d}) := \min \ & \sum_{i=1}^N \sum_{j=1}^N g_{ij} z_{ij} + \sum_{i=1}^N q_i w_i \\ \text{s.t. } & \sum_{j=1}^N \tilde{d}_j z_{ij} - w_i \leq x_i & i=1,\ldots, N \\ & \sum_{i=1}^N z_{ij} = 1 & j=1,\ldots,N \\ & w_i \in \mathbb{Z}_+, z_{ij} \in \{0,1\} & i,j=1,\ldots,N \end{array}$

The above problem is an example of a two-stage stochastic program with general integer recourse. Here the second-stage or recourse action constitute optimally deciding the assignment of the retailers to distribution centers and computing any shortages based on the capacity installed in the first-stage and actual demand realized.

### A chance-constrained model

Suppose that, as in the simple integer recourse example, SunDay has to make the location-capacity-assignment decisions prior to observing demand. However, instead of paying a penalty on shortages, it wishes to guarantee that the probability of a shortage at any of the distribution centers is small ( $\leq \epsilon$). The problem formulation now becomes:

$\begin{array}{rll} \min \ & \sum_{i=1}^N (a_i x_i + b_i) + \sum_{i=1}^N \sum_{j=1}^N g_{ij} z_{ij} \\ \text{s.t. } & 0 \leq x_i \leq U_i y_i & i=1,\ldots,N \\ & \sum_{i=1}^N z_{ij} = 1 & j=1,\ldots,N \\ & \sum_{j=1}^N z_{ij} \leq y_i & i=1,\ldots,N \\ & y_i, z_{ij} \in \{0,1\} & i,j=1,\ldots,N \\ & \mathbb{P}\left\{\sum_{=1}^N \tilde{d}_j z_{ij} \leq x_i, \ i=1,\ldots, N \right\} \geq 1 - \epsilon \end{array}$

The last constraint in the above model bounds the probability of a shortage from above. Such constraints are known as chance constraints or probabilistic constraints. Unless the underlying random parameter distributions and/or the constraint structure is of very specific form, chance-constraints are extremely difficult to deal with algorithmically. For a discussion of general chance-constrained stochastic programs, see [18]. Various special classes of chance-constrained SIPs have been studied, see e.g. [3] and [4].

Owing to the difficulties with chance-constrained SIPs, much of the progress in SIP theory and algorithms has been for recourse models. For the remainder of this article, we shall concentrate on recourse SIP models.

## 3  Algorithmic Challenges

In this section, we discuss some of the computational challenges involved in solving stochastic programs with integer recourse. A standard form for a two-stage stochastic program with integer recourse is as follows:

$\begin{array}{rl} \min \ & c^\top x + \mathbb{E}[Q(x,w)] \\ \text{s.t. } & Ax=b, \\ &x \in \mathbb{R}_+^{n_1 - p_1}\times \mathbb{Z}_+^{p_1} \end{array}$

where

$\begin{array}{rl} Q(x,w) := \min \ & q^\top y \\ \text{s.t. } & Wy = h - Tx, \\ &y \in \mathbb{R}_+^{n_2-p_2} \times \mathbb{Z}_+^{p_2} \end{array}$

Above x represents the first-stage decisions and y represents the second-stage decisions, and w represents the uncertain data for the second-stage (the parameters (q,W,h,T) are actual realization of the random data). Note that both first- and second- stage variables are restricted to be mixed-integer (R and Z denotes reals and integers respectively).

There are three levels of difficulty in solving stochastic integer programs of the above form.

1. Evaluating the second-stage cost for a fixed first-stage decision and a particular realization of the uncertain parameters. Note that this involves solving an instance of the second-stage problem which may be an NP-hard integer program and involve significant computational difficulties.

2. Evaluating the expected second-stage cost for a fixed first-stage decision. If the uncertain parameters have continuous distribution, this involves integrating the value function Q(x,·) of an integer program, and is in general impossible. If the uncertain parameters have a discrete distribution, this involves solving a (typically huge) number of similar integer programs.

3. Optimizing the expected second-stage cost. It is well known that the value function of an integer program is non-convex and often discontinuous. Consequently, the expected second-stage cost function E[Q(·,w)] is non-convex in x. Figure 1 illustrates the non-convex nature of the objective function cTx + E[Q(x,w)] for a small stochastic integer programming problem with two first-stage variables. The optimization of such a complex objective function poses severe difficulties.

Figure 1: Objective function of a small SIP

## 4  Theory and Algorithmic Progress

In this section, we briefly mention some of the theoretical and algorithmic progress towards addressing the afore-mentioned difficulties in two-stage stochastic integer programming.

### Difficulty 1

It is typically assumed that a single evaluation of the second-stage problem is somehow tractable. Without this assumption, very little progress in optimizing the expected value of this integer program is possible. There has been some work in obtaining approximate solutions to SIPs through approximate solutions to the integer second-stage problem, e.g. in [9].

### Difficulty 2

Let us now consider the difficulty of evaluating the expected value of the second-stage integer program E[Q(x,w)] for a given first-stage decision x. As mentioned earlier, if the distribution of the uncertain parameters is continuous or if, in case of discrete distributions, the number of possible realizations is extremely large, then it is practically impossible to evaluate E[Q(x,w)] exactly. In this case, one has to resort to approximating the underlying probability distribution by a manageable distribution.

For example, if the underlying distribution is continuous one may approximate it via discretization. Theoretical stability results for SIPs (see [19,20]) suggest that if the approximate distribution is not too "far" from the true distribution, then the optimal solution to the SIP involving the approximate distribution is close to the true optimal solution.

Alternatively, one may use statistical estimates of the expected value function via Monte Carlo sampling. This can be done in one of two ways. In interior sampling approaches, the estimation of E[Q(x,w)] is carried within the algorithm used to optimize this function. For example, in the stochastic branch and bound algorithm [17], the feasible domain of the first-stage variables x is recursively partitioned into subsets, and statistical upper and lower bounds on the objective function cTx + E[Q(x,w)] over these subsets are obtained via sampling. These bounds are used to discard inferior subsets of the feasible domain, and further partition the promising subsets to eventually isolate a subset containing an approximate optimal solution. In exterior sampling approaches, the sampling and optimization are decoupled. A Monte Carlo sample of the uncertain parameters is generated, and the expectation objective in the problem is replaced by a sample average. The resulting approximation of the problem is then solved, and its solution serves as a candidate solution to the true problem. By repeating the sampling-optimization procedure several times, it is possible to obtain statistical confidence intervals on the obtained candidate solutions. Surprisingly, it can be shown that the number of samples needed to get a fairly accurate solution with high probability is quite small. Discussion of theoretical and algorithmic issues pertaining to the above approach in the context of SIP can be found in [14,1].

Regardless of how the underlying distribution is approximated, an evaluation of the expected second-stage objective value (under the approximate distribution) requires solving many similar integer programs. Owing to the absence of a computationally useful duality theory for integer programming, it is very difficult to take advantage of the similarities in the different second-stage IPs. When the second-stage variables are pure integer, several proposals for using Groebner basis and other test set based methods from computational algebra for exploiting IP problem similarity have been put forth [10,22,26]. For the case of mixed-integer subproblems, if a cutting plane method is used, then under some conditions it is possible to transform a cut (or a valid inequality) derived for one of the second-stage subproblems into a cut for another subproblem by exploiting similarity [6,11].

### Difficulty  3

Much of the development in SIP has been towards the difficulty of optimizing f(x) : = cTx + E[Q(x,w)], i.e., the sum of the first-stage and the expected second-stage costs. We classify these developments as follows.

#### Convex approximations of the value function

In case of SIP with simple integer recourse, a single evaluation of f(x) is easy, however owing to the non-convex nature of E[Q(x,w)], the function f(x) is difficult to optimize. Fortunately, it has been shown [12] that an expectation of the continuous counterpart of the simple integer recourse function Q(x,w) (i.e. where the recourse variables are continuous instead of being discrete valued) under a particular class of distributions of w provides the tightest convex under-estimator for E[Q(x,w)] over its entire domain. Recently, similar results for constructing convex approximations of general integer recourse functions (SIPs involving pure integer second-stage variables) by perturbing the underlying distribution have been obtained [27]. These convex approximating functions are amenable for optimization and can be used to provide strong lower bounds within some of the under-mentioned algorithms for optimizing f(x).

#### Stage-wise decomposition algorithms

This class of algorithms adopt the natural viewpoint of optimizing the objective function f(x) : = cTx + E[Q(x,w)] over the set of feasible first-stage decisions (say denoted by X). Note that E[Q(x,w)] is not available in closed-form, nor is it suited for direct optimization. Typical algorithms in this class progress in the following manner. In an iteration i, the algorithm builds and/or refines a computationally tractable approximation (typically an under-estimator) $\hat{Q}_i$(x) of E[Q(x,w)]. The under-estimating function cTx +$\hat{Q_i}$(x) is optimized with respect to the first-stage variables (this optimization problem is often referred to as the master problem) to obtain a lower bound on the true optimal objective value as well as to provide a candidate first-stage solution xi. Corresponding to the candidate solution, the second-stage expected value function E[Q(xi,w)] is evaluated. Assuming that the distribution of w is discrete, this step involves independent solution of the second-stage problems for each realization of w, allowing for a computationally convenient decomposition. The value cTxi +E[Q(xi,w)] provides an upper bound on the optimal objective value. The evaluation of E[Q(xi,w)] also provides information on how the approximation $\hat{Q}_i$ is to be updated/refined to $\hat{Q}_{i+1}$ for the master problem of iteration i+1. The process continues until the bounds have converged. The details of the various stage-wise decomposition algorithms differ mainly in how the approximation $\hat{Q}_i$ is constructed and updated.

For SIPs with binary first-stage variables and mixed-integer second-stage variables, the integer L-shaped method [15] approximates the second-stage value function by linear "cuts" that are exact at the binary solution where the cut is generated and are under-estimates at other binary solutions. The optimization of the master problem, i.e. optimizing cTx + $\hat{Q}_i$(x) with respect to the first-stage binary variables, is carried out via a branch-and-bound strategy. As soon as a new first-stage binary solution is encountered in the branch-and-bound search, the second-stage subproblems are solved to generate a new cut and to refine the approximation $\hat{Q}_i$. The integer L-shaped method requires that the second-stage integer problems (corresponding to the current candidate first-stage solution xi) are all solved to optimality before a valid cut can be generated. Recall that typical integer programming algorithms progress by solving a sequence of intermediate linear programming problems. Using disjunctive programming techniques, it is possible to derive cuts from the solutions to these intermediate LPs that are valid under-estimators of E[Q(x,w)] at all binary first-stage solutions [11,24]. This avoids solving difficult integer second-stage problems to optimality in all iterations of the algorithm, providing significant computational advantage.

For SIPs where the first-stage variables are not necessarily all binary, dual functions from the second-stage integer program can, in principle, be used to construct cuts to build the approximation $\hat{Q}_i$ [8]. Owing to the non-convex nature of IP dual functions, the cuts are no longer linear, resulting in a non-convex master problem. If the second-stage variables are pure integer (and the first-stage variables are mixed-integer), then it can be shown that E[Q(x,w)] is piece-wise constant over subsets that form a partitioning of the feasible region of x [22]. Optimization of cTx +E[Q(x,w)] over such a subset is easy. This leads immediately to a scheme where the subsets are enumerated, and the one over which the objective function value is least is chosen. By exploiting certain monotonicity properties, the subsets can be enumerated efficiently within a branch-and-bound strategy [2].

#### Scenario-wise decomposition

Assuming the distribution of w is discrete, i.e. the random parameter takes one of a finite set of values (scenarios) {w1,...,wS} having probabilities {p1,...,pS}, the two-stage SIP can be re-formulated as follows

$\begin{array}{rll} \min \ & \sum_{s=1}^S p_s (c^\top x_s + q^\top_sy_s) \\ \text{s.t. } & Ax_s = b & s=1,\ldots,S \\ & T_sx_s + W_sy_s = h_s & s=1,\ldots,S \\ & x_s \in \mathbb{R}_+^{n_1-p_1}\times \mathbb{Z}_+^{p_1} & s=1,\ldots,S \\ & y_s \in \mathbb{R}_+^{n_2-p_2} \times \mathbb{Z}_+^{p_2} & s=1,\ldots,S \\ & x_1=x_2=\cdots = x_S \end{array}$

[Top of page]Note that copies of the first-stage variable have been introduced for each scenario. The last constraint, known as the non-anticipativity constraints guarantee that the first-stage variables are identical across the different scenarios. Consider the Lagrangian dual problem obtained by relaxing the non-anticipativity constraints through the introduction of Lagrange multipliers. Note that for a given set of multipliers, the problem is separable by scenarios, thus the dual function can be evaluated in a decomposed manner. Optimization of the dual function can be performed using standard non-smooth optimization techniques. However, owing to the non-convexities, there exists a duality gap, and one needs to resort to a branch-and-bound strategy to prove optimality [7].

## 5  Concluding Remarks

This article offers a very limited view of some of the general modelling and algorithmic concepts in stochastic integer programming. The concepts alluded to here have been significantly enriched and extended in recent years. One particularly important area not discussed here is the multi-stage extension of two-stage SIP. We have also not mentioned the large number of important developments in application-specific areas of SIP (see, e.g., [25] for a bibliography of applications of SIP). We hope that this simple introduction will pique the readers interest towards further exploration of SIP.

## References

[1]
S. Ahmed and A. Shapiro. The sample average approximation method for stochastic programs with integer recourse. Optimization-online, available at http://www.optimization-online.org , 2002.

[2]
S. Ahmed, M. Tawarmalani, and N. V. Sahinidis. A finite branch and bound algorithm for two-stage stochastic integer programs. Stochastic Programming E-Print Series, available at http://www.speps.info , 2000.

[3]
P. Beraldi and A. Ruszczy\'nski. A branch and bound method for stochastic integer problems under probabilistic constraints. Optimization Methods and Software, 17(3):359-382, 2002.

[4]
P. Beraldi and A. Ruszczy\'nski. The probabilistic set-covering problem. Operations Research, 50(6):956-967, 2002.

[5]
J. R. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer, New York, NY, 1997.

[6]
C. C. Caroe. Decomposition in stochastic integer programming. PhD thesis, University of Copenhagen, 1998.

[7]
C. C. Caroe and R.Schultz. Dual decomposition in stochastic integer programming. Operations Research Letters, 24:37-45, 1999.

[8]
C. C. Caroe and J. Tind. L-shaped decomposition of two-stage stochastic programs with integer recourse. Mathematical Programming, 83:451-464, 1998.

[9]
M. A. H. Dempster, M. L. Fisher, L. Jansen, B. J. Lageweg, J. K. Lenstra, and A. H. G. Rinnooy Kan. Analytical evaluation of hierarchical planning systems. Operations Research, 29:707-716, 1981.

[10]
R. Hemmecke and R. Schultz. Decomposition of test sets in stochastic integer programming. Mathematical Programming, 94:323-341, 2003.

[11]
J. L. Higle and S. Sen. The C3 theorem and a D2 algorithm for large scale stochastic integer programming: Set convexification. Working paper, University of Arizona, 2000.

[12]
W. K. Klein Haneveld, L. Stougie, and M. H. van der Vlerk. On the convex hull of the simple integer recourse objective function. Annals of Operational Research, 56:209-224, 1995.

[13]
W. K. Klein Haneveld and M. H. van der Vlerk. Stochastic integer programming: General models and algorithms. Annals of Operational Research, 85:39-57, 1999.

[14]
A. J. Kleywegt, A. Shapiro, and T. Homem-De-Mello. The sample average approximation method for stochastic discrete optimization. SIAM Journal of Optimization, 12:479-502, 2001.

[15]
G. Laporte and F. V. Louveaux. The integer L-shaped method for stochastic integer programs with complete recourse. Operations Research Letters, 13:133-142, 1993.

[16]
F.V. Louveaux and R. Schultz. Stochastic Integer Programming. In Handbooks in Operations Research and Mangament Science 10: Stochastic Programming. A. Ruszczy\'nski and A. Shapiro (Eds.). North-Holland, 2003 (To appear).

[17]
V. I. Norkin, G. C. Pflug, and A. Ruszczy\'nski. A branch and bound method for stochastic global optimization. Mathematical Programming, 83:425-450, 1998.

[18]
A. Prekopa. Stochastic Programming. Kluwer Academic Publishers, Netherlands, 1995.

[19]
R. Schultz. Continuity properties of expectation functions in stochastic integer programming. Mathematics of Operations Research, 18(3):578-589, 1993.

[20]
R. Schultz. On structure and stability in stochastic programs with random technology matrix and complete integer recourse. Mathematical Programming, 70(1):73-89, 1995.

[21]
R. Schultz, L. Stougie, and M. H. van der Vlerk. Two-stage stochastic integer programming: A survey. Statistica Neerlandica. Journal of the Netherlands Society for Statistics and Operations Research, 50(3):404-416, 1996.
(Online version)

[22]
R. Schultz, L. Stougie, and M. H. van der Vlerk. Solving stochastic programs with integer recourse by enumeration: A framework using Gröbner basis reductions. Mathematical Programming, 83:229-252, 1998.

[23]
S. Sen. Algorithms for stochastic mixed-integer programming models. Technical report, MORE Institute, SIE Department, University of Arizona, Tucson, AZ, 2003.
URL: http://tucson.sie.arizona.edu/MORE/papers/SIPHbook.pdf

[24]
H. D. Sherali and B. M. P. Fraticelli. A modification of Benders' decomposition algorithm for discrete subproblems: An approach for stochastic programs with integer recourse. Journal of Global Optimization, 22:319-342, 2002.

[25]
L. Stougie and M. H. van der Vlerk. Stochastic integer programming. In Annotated Bibliographies in Combinatorial Optimization, M. Dell'Amico et al (eds), John Wiley & Sons, New York, pages 127-141, 1997.

[26]
S. R. Tayur, R. R. Thomas, and N. R. Natraj. An algebraic geometry algorithm for scheduling in the presence of setups and correlated demands. Mathematical Programming, 69(3):369-401, 1995.

[27]
M.H. van der Vlerk. Convex approximations for complete integer recourse models. Technical Report 02A21, SOM, University of Groningen, The Netherlands, 2002. Also: Stochastic Programming E-Print Series, available at http://www.speps.info

## 1  Introduction

Chance Constrained Programming belongs to the major approaches for dealing with random parameters in optimization problems. Typical areas of application are engineering and finance, where uncertainties like product demand, meteorological or demographic conditions, currency exchange rates etc. enter the inequalities describing the proper working of a system under consideration. The main difficulty of such models is due to (optimal) decisions that have to be taken prior to the observation of random parameters. In this situation, one can hardly find any decision which would definitely exclude later constraint violation caused by unexpected random effects. Sometimes, such constraint violation can be balanced afterwards by some compensating decisions taken in a second stage. For instance, making recourse to pumped storage plants or buying energy on the liberalized market is an option for power generating companies that are faced with unforeseen peaks of electrical load. As long as the costs of compensating decisions are known, these may be considered as a penalization for constraint violation. This idea leads to the important class of twostage or multistage stochastic programs [2,16,28].

In many applications, however, compensations simply do not exist (e.g., for safety relevant restrictions like levels of a water reservoir) or cannot be modeled by costs in any reasonable way. In such circumstances, one would rather insist on decisions guaranteeing feasibility 'as much as possible'. This loose term refers once more to the fact that constraint violation can almost never be avoided because of unexpected extreme events. On the other hand, when knowing or approximating the distribution of the random parameter, it makes sense to call decisions feasible (in a stochastic meaning) whenever they are feasible with high probability, i.e., only a low percentage of realizations of the random parameter leads to constraint violation under this fixed decision. A generic way to express such a probabilistic or chance constraint as an inequality is

$P(h(x,\xi) \geq 0) \geq p \tag{1}$

Here, $x$ and $\xi$ are decision and random vectors, respectively, "$h(x,\xi)\geq 0$" refers to a finite system of inequalities and P is a probability measure. The value $p \in [0,1]$ is called the probability level, and it is chosen by the decision maker in order to model the safety requirements. Sometimes, the probability level is strictly fixed from the very beginning (e.g., p=0.95,0.99 etc.). In other situations, the decision maker may only have a vague idea of a properly chosen level. Of course, he is aware that higher values of p lead to fewer feasible decisions x in (1), hence to optimal solutions at higher costs. Fortunately, it turns out that usually p can be increased over quite a wide range without affecting too much the optimal value of some problem, until it closely approaches 1 and then a strong increase of costs becomes evident. In this way, models with chance constraints can also give a hint to a good compromise between costs and safety.

Among the numerous applications of chance constrained programming one may find areas like water resource management, energy production, circuit manufacturing, chemical engineering, telecommunications, and finance. Rather than listing all possible work in this context, we refer to the overview contained in [23]. The present document is devoted to a brief introduction to models, algorithmic aspects and theory of chance constrained programming and to motivate the reader to further consult deeper treatments of this subject area such as [21,28].

## 2  An Example

### 2.1  Deterministic Version

As an illustrating example, we consider the following cash matching problem taken from [8] and [26]: the pension fund of a company has to make certain payments for the next 15 years which shall be financed by buying three types of bonds. The assumed data are collected in Table 1. An initial capital K=250,000 is given. The amount of cash which is available at the end of the considered time horizon is to be maximized under the condition of meeting the payment restrictions in each year. For $i=1,\ldots,n:=3$ and $j=1,\ldots,m:=15$, we put:

$\begin{array}{rl} a_{ij} := & \text{yield per bond of type i in year j}; \\ b_j := & \text{payment required in year j;} \\ g_i := & \text{cost per bond of type i;} \\ x_i := & \text{number of bonds of type i to be bought;} \end{array}$

The amount of cash available in the fund at the end of year j is

$\underbrace{K - \sum_{i=1}^n g_i x_i}_{\text{cash after buying bonds}} + \underbrace{\sum_{k=1}^j \sum_{i=1}^n a_{ik}x_i}_{\text{cumulative yields of bonds}} - \underbrace{\sum_{k=1}^j b_k}_{\text{cumulative payments}}$

Introducing the quantities $a_{ij} := \sum_{k=1}^j a_{ik}x_i$ and $b_j := \sum_{k=1}^j b_k - K$,  the optimization problem may be written as

$\max \ \sum_{i=1}^n a_{im} x_i \text{ subject to } \sum_{i=1}^n a_{ij} x_i \geq b_j, \ (j=1,\ldots,m) \tag{2}$

where the system of inequality constraints models the condition of positive cash in year j and the objective function corresponds to the final cash (up to a constant).

Table 1: Data for the cash matching problem

 Yields per bond of type: Year Payments 1 2 3 1 11,000 0 0 0 2 12,000 60 65 75 3 14,000 60 65 75 4 15,000 60 65 75 5 16,000 60 65 75 6 18,000 1060 65 75 7 20,000 0 65 75 8 21,000 0 65 75 9 22,000 0 65 75 10 24,000 0 65 75 11 25,000 0 65 75 12 30,000 0 1060 75 13 31,000 0 0 75 14 31,000 0 0 75 15 31,000 0 0 1075 cost per bond: 980 970 1050

Allowing for fractional numbers of bonds to be bought, the solution of the linear programming problem (2) with the given data is

 x1=31.1,   x2=55.5,   x3=147.3,   final amount of cash: 127,332.

The amount of cash as a function of time is illustrated by the thick line in Figure 1 a.

### 2.2  Stochastic Version: individual chance constraints

As one can see from Figure 1 a (thick line), the cash profile of the fund reaches zero several times. Formally, this does not contradict feasibility of the solution, but it strongly depends on the exactness of payment data from Table 1. It is not hard to believe that slight changes of these data would result in infeasible cash profiles under the solution found above. Indeed, there is good reason to assume random payment data in our problem due to demographic uncertainty in a future time period. Therefore, let us assume now, that the bj introduced above are not the (deterministic) payments themselves but rather expected values of random payments $\tilde{b}_j$.

Figure 1: Illustration of the cash matching problem (deterministic version versus individual chance constraints). For details see text.

To be more precise, we suppose that the $\tilde{b}_j$ are independently normally distributed with expectation $b_j$ and standard deviation $s_j := 500 \cdot j$ for $j=1,\ldots,m$ (growing uncertainty over time). When simulating 100 payment profiles according to these assumptions and applying the deterministic solution from above, one arrives at 100 cash profiles illustrated in Figure 1 a (thin lines). As is to be expected, many of the cash profiles fall below zero in particular at the 'sharp times' when the deterministic profile reaches zero. Evidently, the deterministic solution is no longer satisfactory as it violates the constraint of positive cash with high risk. A clearer view is obtained when plotting for each year the percentage of simulated cash profiles yielding constraint violation (Fig. 1 c (thick line)). Several times, this value approaches 50%. In order to arrive at a more robust solution, one could impose the restriction that with each year fixed, the probability of having positive cash exceeds say p=0.95. Passing to the stochastic counterpart $\xi_j := \sum_{k=1}^j \tilde{b}_j - K$ of the deterministic quantity bj introduced above, problem (2) then turns into an optimization problem with individual chance constraints:

$\max \sum_{i=1}^n a_{im} x_i \quad \text{subject to} \quad P\left( \sum_{i=1}^n a_{ij} x_i \geq \xi_j \right) \geq p, \quad (j=1,\ldots,m) . \tag{3}$

The term 'individual' relates to the fact that each of the (stochastic) constraints $\sum_{i=1}^n a_{ij} x_i \geq \xi_j$ is transformed into a chance constraint individually. A different - and more realistic - version will be presented later. Before solving (3), we note that, due to the distribution assumptions on $\tilde{b}_j$, the random vector $\xi$ has a multivariate normal distribution with mean vector b and with covariance matrix

$\left( \begin{array}{cccc} s_1^2 & s_1^2 & \cdots & s_1^2 \\ s_1^2 & s_1^2 + s_2^2 & \cdots & s_1^2 + s_2^2 \\ \vdots & \vdots & & \vdots \\ s_1^2 & s_1^2 + s_2^2 & \cdots & s_1^2 + \cdots + s_m^2 \end{array} \right) \tag{4}$

In particular, $\xi_j$ has variance $\tilde{s}_j^2 = \sum_{k=1}^j s_k^2$. Now, we pass to the standardized random variables

$h_j := \tilde{s}_j^{-1}(\xi_j - b_j),$

which are normally distributed with zero mean and unit standard deviation. Then,

$P\left( \sum_{i=1}^n a_{ij} x_i \geq \xi_j \right) = P \left( \tilde{s}_j^{-1} \left(\sum_{i=1}^n a_{ij} x_i - b_j \right) \geq h_j \right) .$

Denoting by qp the p-quantile of the standard normal distribution (i.e., q0.95=1.65), we have the following equivalence:

$P \left( \sum_{i=1}^n a_{ij} x_i \geq \xi_j \right) \geq p \Leftrightarrow \sum_{i=1}^n a_{ij} x_i \geq \tilde{b}_j := b_j + \tilde{s}_j q_p .$

In other words, the chance constrained problem (3) becomes a simple linear programming problem again, but now with inequality constraints having higher right hand side values than in (2). The difference $\tilde{s}_j q_p$  may be interpreted as a safety term. The solution of (3) with p=0.95 is

 x1=62.8,   x2=72.6,   x3=101.1,   final amount of cash: 103,925.

This solution is evidently more in favour of short term bonds and it realizes a smaller final amount of cash. On the other hand, applying this solution to the same 100 payment profiles used above, it leads to cash profiles with quite rare constraint violations (see Fig. 1 b). The thin line in Fig. 1 c shows that the maximum percentage of negative cash is around 5% in good coincidence with the chosen probability level. The gain in reliability may be considered as substantial when compared to the loss of final cash. More insight is provided here by Fig. 1 d, where the optimal value of problem (3) is plotted as a function of the probability level p. The decrease of the optimal value is quite moderate over a wide range of p, and it is only beyond 0.95 that a further gain in safety is bought by drastic losses in final cash.

When dealing with stochastic constraints, one could be led to the simple idea of replacing the random parameter by its expectation. In our example, this approach would reduce to the deterministic problem (since the deterministic payments were taken as expectations for the stochastic payments). Therefore, the solution based on chance constraints illustrates at the same time the superiority with respect to the reliability/costs ratio over the expected value solution.

### 2.3  Stochastic Version: joint chance constraints

The model with individual chance constraints is appealing for its simple solvability. It has, however, the substantial drawback of not reflecting the proper safety requirements. It guarantees that, for each year fixed, the probability of negative cash is small. Yet, the probability of having negative cash at least once in the considered period may remain high. To illustrate this argument, Figure 2 plots again the 100 simulated cash profiles with those profiles being emphasized as dark lines which become negative at least one time. In case of the expected value solution (Figure 2 a) there are 82 profiles with occasional negative cash. This is an even more striking argument against the use of expected value solutions than the one of the preceding section. For the solution of individual chance constraints (Figure 2 b) it turns out that 16 profiles fall below zero at certain times. In other terms: the probability of maintaining positive cash over the whole period is around 0.84 and certainly significantly lower than the level of 0.95 chosen for the indivdual chance constraints. Obviously different subsets of profiles violate the constraint at different times. Of course, in practice, one has to make sure that - with high probability - cash remains positive over the whole time interval.

Figure 2: Illustration of the cash matching problem (deterministic version versus individual and joint chance constraints). For details see text.

In order to take care of what one could call 'uniform safety', one would have to replace the m individual chance constraints in (3) by one single joint chance constraint of the form

$P \left( \sum_{i=1}^n a_{ij} x_i \geq \xi_j \ (j=1,\ldots, m) \right) \geq p . \tag{5}$

Observe the difference between (5) and the constraint in (3) which is given by the position of the the counter $j=1,\ldots,m$. At first glance, dealing with a single rather than many inequalities, seems to be a simpler task. However, the opposite is true as the calculation of (5) requires dealing with multidimensional distributions (see Section 3 below). The solution of the cash matching problem with joint chance constraints is

 x1=65.8,   x2=83.7,   x3=86.2,   final amount of cash: 98,160.

Comparing this with the previous solutions, once again there is some shift from long term to short term bonds, and the final amount of cash has slightly decreased again. On the other hand, robustness is significantly improved: Just 3 out of 100 cash profiles fall below zero, i.e. 97% of the profiles stay positive over the whole period (see Figure 2 c). This is in good coincidence with the chosen probability level p=0.95 and illustrates the difference from individual chance constraints.

Although the cash matching problem is a prototype example for models with joint chance constraints, there are certain situations where the use of individual chance constraints is appropriate. This applies, in particular, when proper working of some system (power generation, telecommunication) is not affected as long as only a sufficiently small number of components of that system fails. Then it does not make much sense to insist on proper working of a large majority of components over the whole period considered (joint chance constraints). Rather, at each time, a possibly varying large subset of components should be out of failure.

Finally, we emphasize that our presentation of the cash matching problem is just a first approximation to a real life model. Better models would include the possibility of short term borrowing in case of constraint violation. This leads to a mixture of probabilistic constraints and multistage programs which is very challenging but goes beyond the purpose of illustration here. Also, to justify the simplified setting, one could imagine that, for some reason, borrowing is excluded or future conditions of borrowing are extremely uncertain. It has to be mentioned that our model has a similar mathematical structure as many problems in engineering dealing with so called storage level constraints [14,21,24].

## 3  Algorithmic Challenges and Theory

Not surprisingly, there does not exist a general solution method for chance constrained programs. The choice strongly depends on how random and decision variables interact in the constraint model. Sometimes a linear programming solver will do the job (e.g., the cash matching problem with individual chance constraints, see Section 2.2). In other models, one has to have access to values and gradients of multidimensional distribution functions (e.g., cash matching problem with joint chance constraints, see Section 2.3). Of particular interest is the application of algorithms from convex optimization. Convexity of chance constraints, however, does not only depend on convexity properties of the constraint function h in (1) but also of the distribution of the random parameter x. The question of whether this distribution is continuous or discrete is another crucial aspect for algorithmic treatment. The biggest challenges from the algorithmic and theoretical points of view arise in chance constraints where random and decision variables cannot be decoupled.

In contrast to conventional optimization problems, those including chance constraints introduce the presence of two kinds of approximations: first, the distribution of the random parameter is almost never known exactly and has to be estimated from historical data. Secondly, even a given multivariate distribution (such as multivariate normal) cannot be calculated exactly in general but has to be approximated by simulations or bounding arguments. Both types of imprecision motivate the discussion of stability in programs with chance constraints. All issues discussed up to now illustrate the close tie between algorithmic, structural and stability aspects. Some of these shall be briefly presented in the following sections. For the numerical solution of problems including chance constraints with random right hand side, we refer to the SLP-IOR model management system (see [18] and Section 5on web links). If more general models are considered, specially designed algorithms are required.

### 3.1  Structural Aspects

Starting from a system of stochastic inequalities $h_j(x,\xi) \geq 0$, we have introduced before individual chance constraints as

$a_j(x) \geq p \ (j=1,\ldots,m), \ \text{where } a_j(x) := P(h_j(x,\xi) \geq 0) \ (j=1,\ldots,m)$

and joint chance constraints as

$a(x) \geq p, \ \text{where } a(x) := P\bigl(h_j(x,\xi) \geq 0, (j=1,\ldots,m) \bigr) .$

Formally, in both cases, one arrives at constraints on the decision variables as in usual optimization problems. However, $a$ or the $a_j$ are not given by an explicit formula but rather defined as probabilities of some implicitly defined regions in the space of the random parameter x. Calculating values, gradients and possibly Hessians of these functions remains the main challenge in chance-constrained programming. The degree of difficulty in doing so, will mainly depend on the following items:

• Model of chance constraints (individual or joint)

• Assumptions on the random vector (e.g., continuous or discrete distribution, independent components)

• Type of stochastic inequalities (e.g., linear, convex, random right hand side)

The most favorable situation arises when $h_j(x,\xi) = g_j(x) - \xi_j$ $(j=1,\ldots,m)$ for some functions gj. In that case, the stochastic inequalities take the form $g_j(x) \geq \xi_j$ so that the random parameter appears on the right hand side. Using the distribution function $F_X(z) := P(X \leq z)$ for some random variable $X$, we may then write the individual chance constraints as

$a_j(x) = P(g_j(x) \geq \xi_j) = F_{\xi_j}(g_j(x)) \quad (j=1,\ldots,m)$

We recall that, for a one-dimensional random variable $X$, the following relation holds true with respect to the p- quantile qp of $F_X$

$F_X(t) \geq p \Leftrightarrow t \geq q_p := \inf\{ t \mid F_X(t) \geq p \}.$

Consequently, the individual chance constraints can be rewritten as

$a_j(x) \geq p \Leftrightarrow g_j(x) \geq q^{(j)}_p, \quad (j=1,\ldots,m),$

where $q^{(j)}_p$  is the p- quantile of $F_{\xi_j}$. In other words: individual chance constraints with random right hand side inherit their structure from the underlying stochastic constraint. If the latter was linear (as in the cash matching problem) then the induced individual chance constraints will be linear too. Unfortunately, this observation does not generalize to joint chance constraints. For random right hand side, the constraint function may still be written as a composition

$a(x) = P\left( g_j(x) \geq \xi_j \ (j=1,\ldots,m) \right) = F_{\xi}(g(x)), \tag{6}$

where $g=(g_1,\ldots,g_m)$ and $F_{\xi}$ is the distribution function of the random vector $\xi$. However, now the distribution function is multidimensional and simple quantile arguments can no longer be applied. Evidently, in order to calculate values and derivatives of a one has to be able to do so for multidimensional distribution functions (as g is given by an analytic formula in general). From the structural point of view, the composition formula $a = F_{\xi} \cdot g$ allows one to transfer analytic properties like continuity, (local or global) Lipschitz continuity or smoothness from $F_{\xi}$ and $g$ to $a$. As far as convexity is concerned, we refer to Section 3.3.

If the random vector $\xi$ has independent components, then the calculation of $a$ breaks down to one dimensional distribution values again:

$a(x) = F_{\xi_1}(g_1(x))\cdots F_{\xi_m}(g_m(x)).$

Although the constraint $a(x) \geq p$ cannot be further simplified to an explicit constraint involving just the $g_j$ (as was the case for individual chance constraints), one may benefit from the fact that one dimensional distributions are usually easy to calculate. On the other hand, the assumption of independent components is not justified in many applications. For instance, in the cash matching problem, the original random data (payments) were assumed to be independent, yet the cumulative data - which appeared in the chance constraint models - are definitely not independent as is evident from the nondiagonal covariance matrix in (4).

### 3.2  Random right-hand side with nondegenerate multivariate normal distribution

Perhaps the most important special case in practical applications arises from joint chance constraints with random right-hand side having a nondegenerate multivariate normal distribution. This assumption was made for instance, in the cash matching problem of Section 2.3. According to (6) the constraint takes the form $F_{\xi}(g(x)) \geq p$ then. As g is explicitly given by a formula, in general, the evaluation of such constraints by optimization algorithms requires the calculation of $F_{\xi}$, $\nabla F_{\xi}, \ldots,$  i.e., of values and (higher) derivatives of a nondegenerate multivariate normal distribution function. Fortunately, gradients of such distribution functions can be reduced analytically to some lower dimensional multivariate normal distribution functions (see [21], p. 204). Thus, proceeding by induction for higher order derivatives, the whole optimization issue hinges upon the evaluation of nondegenerate normal distribution functions in this situation. Much progress has been made here when the dimension of x is moderate. Two basic approaches can be distinguished: the first one is a combination of simulation and bounding techniques. Efficient bounds on probabilities of the intersection of events (special case: distribution functions) are based on improvements of classical Bonferroni bounds and on sophisticated graph theoretical derivations (e.g., [3,4,19]). Starting from simulated samples, these bounds may be used for constructing a variety of unbiased estimators for the probability to be approximated. An optimal convex combination of these estimators produces a final estimator having much less variance and, hence, yielding much more precise approximations than the original ones (e.g., [31]). A simulation method specially designed for multivariate normal distributions has been proposed in [6]. The second approach, which seems to be more precise in low dimensions, relies on numerical integration (e.g., [11,29]). A Fortran code is available at [10]. A hybrid method combining the two approaches is discussed in [9].

### 3.3  Convexity

Convexity is a basic issue for theory (structure, stability) and algorithms (convergence towards global solutions) in any optimization problem. In chance constrained programming, the first question one could deal with is convexity of the feasible set defined say by a very simple probabilistic constraint of the type

$\left\{ x \mid P(\xi \leq x) \geq p \right\} = \{ x \mid F_{\xi}(x) \geq p \}. \tag{7}$

It is well known that such a set is convex if $F_{xi}$ is a quasiconcave function. Although distribution functions can never be concave or convex (due to being bounded by zero and one) it turns out that many of them are quasiconcave. The left plot of Figure 3 shows the graph of the bivariate normal distribution function with independent components. It is neither concave nor convex, but all of its upper level sets are convex (the boundary of the upper level set corresponding to the level p=0.5 is depicted by a curve on the graph).

Figure 3: Bivariate normal distribution function (left) and standard normal distribution and its logarithm (right).

For algorithmic purposes it is often desirable to know that the function defining an inequality constraint of type ' ³ ' is not just quasiconcave but actually concave. As mentioned above, this cannot hold for inequalities of type (7). However, a suitable transformation might do the job. Indeed, it turns out that most of the prominent multivariate distribution functions (e.g., multivariate normal, uniform distribution on convex compact sets, Dirichlet, Pareto, etc.) share the property of being log-concave, i.e., $\log F_{\xi}$ is concave (an illustration for the one-dimensional normal distribution and its log is given in the right plot of Figure 3). The key for verifying such a nontrivial property for the distribution function is to check the same property of log-concavity for the density of $F_\xi$, if it exists. The latter task is easy in general. For instance, a nondegenerate normal density is proportional to the exponential of a concave function, hence multivariate normal distributions are logarithmically concave. The mentioned result is a consequence of a famous theorem due to Prékopa [21]. Now, when $F_{\xi}$ is log-concave, (7) may be equivalently rewritten as a concave inequality constraint $\{ x : \log F_{\xi}(x) \geq \log p \}$.

Nothing changes of course for more general constraints of the type $F_{\xi}(Ax) \geq p$ where A is a matrix. This allows us, for instance, to solve the cash matching problem with joint chance constraints by means of convex optimization algorithms, such as the supporting hyperplane method. Care has to be taken here, however, of the imprecise nature of $F_{\xi}$ which is calculated by the methods mentioned in Section 3.2. For details of adapted algorithms, we refer to [18,21,30].

A challenging question with importance both for algorithms and stability (see Section 3.4) is to characterize strong concavity of the log of distribution functions. Examples of distributions sharing this property are the uniform distribution on rectangles [13] and the multivariate normal distribution with independent components [15]. It is interesting to observe that uniform distributions on arbitrary polytopes may lack strong log-concavity (e.g., on conv {(0,1),(1,1)(1,0)}). The problem of normal distributions with correlated components is open.

### 3.4  Stability

In stochastic programming models, it is usually assumed that the probability distribution of the random parameter is known. In practice, however, this is rarely the case and one is faced with the need to approximate this distribution by parameter estimation or empirically. Solving the problem with approximating distributions yields solutions and optimal values that differ from those of the "true" problem. The main concern then is whether small approximation errors may lead to large deviations between solutions, or, expressed the other way around, whether it pays to spend large efforts in obtaining good approximations in order to arrive at solutions of high precision. The example of the 'value-at-risk' (see below) confirms that even the most simple chance constrained problems may fail to have stable solutions. In multistage stochastic programming this issue is central, for instance, in the context of scenario reduction/construction. [25] offers a general introduction to stability of stochastic programming problems. Recent results on stability in chance constrained programming can be found in [13,15,17]. To give a (strongly simplified) idea of these results, consider a program of type

$\min \{ f(x) \mid F_{\xi}(Ax) \geq p \} . \tag{8}$

In particular the cash matching problem fits this setting upon rewriting (5) in terms of the distribution function $F_{\xi}$. We denote by $v(\xi)$ the optimal value associated with (8), where $\xi$ is considered as a varying random parameter close to some fixed $\zeta$ (the theoretical random vector). The following result holds:

Theorem 1 In (8), let f be convex and let $\zeta$ have a log-concave distribution function $F_{\zeta}$. If there exists some $\xi$ such that $F_{\zeta}(A \xi) > p$, and if the solution set for (8) at $\xi = \zeta$ is nonempty and bounded, then one has a local Lipschitz estimate of the type

$| v(\xi) - v(\zeta) | \leq L \sup_{z \in \mathbb{R}^s} | F_{\xi}(z) - F_{\zeta}(z) | . \tag{9}$

At the same time, the solution set mapping is upper semicontinuous at $\zeta$.

To achieve quantitative results (Hölder- or Lipschitz stability) for solution sets, one has to impose further structural conditions (strong log-concavity, strict complementarity, C1,1-differentiability, see [15]).

Let us apply the theorem above to the well-known p-value-at-risk ($p \in [0,1]$) which is defined for a one dimensional random variable $\xi$ as the quantity

$\mathrm{VaR}_p(\xi) = \inf \{ x \mid P(\xi \geq x) \geq p \}.$

This is clearly a special instance of (8) with $f(\xi) = \xi)$, A=1, so VaRp is the optimal value of a special chance constrained program. In particular, if $\zeta$ has a log-concave distribution and $p \in (0,1)$ , then the assumptions of the stability result in Theorem 1, and we may derive from (9) the following Lipschitz continuity result for VaRp:

$\mathrm{VaR}_p(\xi) - \mathrm{VaR}_p(\zeta) \leq L \sup_{z \in \mathbb{R}^s} | F_{\xi}(z) - F_{\zeta}(z) | . \tag{10}$

Note that, in general, the value-at-risk does not depend continuously on $\xi$ (e.g., for discrete distributions). Hence, (10) strongly depends on the assumption of $\zeta$ having a log-concave distribution.

### 3.5  Beyond the classical setting

The setting of joint chance constraints with random right-hand side and nondegenerate multivariate normal distribution enjoys many desirable features such as differentiability or convexity (via log-concavity). Of course, other settings may have practical importance too. For instance, the distribution of the random right-hand side could be other than normal. The cases of multivariate Gamma or Dirichlet distributions are discussed in [21], Section 6.6. Here, log-concavity remains an important tool.

Things become different, however, when passing to discrete distributions. These are of interest for at least two reasons: first, the problem to be solved could have been directly modeled by discrete random variables (see, e.g., [1]). Second, there may be a need to approximate continuous distributions (e.g., multivariate normal) by discrete ones, for instance when treating probabilistic constraints in two stage models with scenario formulations [27]. The key issue in discrete chance constrained programming is finding the so called p-efficient points (introduced in [20]) of the distribution function $F_{\xi}$ of $\xi$. These are points z such that $F_{\xi}(z) \geq p$ and the relations $F_{\xi}(y) \geq p, y \leq z$ (partial order of vectors) imply that $y=z$. One easily observes that all the information about the p-level set of $F_\xi$ is contained in these points because

$\{ y \mid F_{\xi}(y) \geq p \} = \bigcup_{z \in E} (z + \mathbb{R}_+^s),$

where E is the set of p-efficient points and R+s is the positive orthant in the space of the random vector. In the case of $\xi$ having integer-valued components and $p \in (0,1)$, $E$ is a finite set (see Theorem 1 in [7]). Algorithms for enumerating or generating p-efficient points are described, for instance, in [1,7,22,23]. It is interesting to note that the log-concavity concept, even if not directly applicable, can be adapted with useful consequences to discrete distributions as well [7].

When aiming at the more general case of probabilistic constraints, where the random parameter x does not necessarily appear on the right hand side of the inequalities, it is mainly the convex or polyhedral case under normal distribution which has some good chance of efficient treatment. More precisely, the function $h(x,\xi)$ is assumed to be quasiconcave in $\xi$ so that the probability of the convex set $\{\xi \mid h(x,\xi) \geq 0 \}$ is to be calculated then. This can be done, for each given x, by the already mentioned method proposed in [6] which is designed to calculate normal probabilities of convex sets. However, unlike the case of normal distribution functions, the derivative of $P(h(\cdot,\xi) \geq 0)$ no longer reduces analytically to values of the distribution function itself. A general derivative formula for probabilities of parameter dependent inequalities has been established in [32], but its practical use may be limited.

There is more hope in the specific polyhedral case $h(x,\xi) = A(x)\xi - b(x)$, where A(x) and b(x) are matrix and vector functions, respectively. For each x fixed, (1) amounts to the calculation of the probability of some polyhedron. Apart from treating polyhedra as special convex sets and applying [6] again, one could alternatively pass to the transformed random vector $h_{\xi} := -A(x)\xi$ so that (1) can be equivalently written in terms of the distribution function

$F_{h_{\xi}}(-b(x)) \geq p .$

Then, apparently, one seems to be back to the classical setting discussed in Section 3.2. For instance, if $\xi$ had a multivariate normal distribution, then so does $h_{\xi}$ (as a linear transform). However, in many important applications (like capacity expansion of networks with stochastic demands, see [21], section 14.3), the number of inequalities in the system $A(x)\xi \geq b(x)$ will typically exceed the dimension of $\xi$ even after elimination of redundance. As a consequence, $F_{h_{\xi}}$ will have a singular normal distribution which can no longer be calculated by the methods mentioned in Section 3.2. An algorithm for calculating singular normal distributions is proposed in [12]. A different approach, which is more efficient in moderate dimension, relies on directly calculating (regular) normal distributions of polyhedra. Based on a specific inclusion-exclusion formula for polyhedra proved in [5], the problem can be reduced to the calculation of a sum of regular normal distribution functions. At the same time, this offers the possibility of obtaining gradients too.

## 4  Concluding remarks

Chance constraints offer a way to model reliability in optimization problems. This brief introduction could neither provide a survey on the whole subject nor give a representative list of references. There are many topics of interest not covered by this paper. Research on theory and algorithms of chance constraints is quickly progressing with a focus on risk aversion (e.g., integrated chance constraints or stochastic dominance) which is important in finance applications. This in turn introduces additional interesting structures like "semi-infinite chance constraints" (infinite number of continuously indexed inequalities).

## References

[1]
P. Beraldi and A. Ruszczynski. A branch and bound method for stochastic integer problems under probabilistic constraints. Optimization Methods and Software, 17:359-382, 2002.

[2]
J.R. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer, New York, 1997.

[3]
E. Boros and A. Prékopa. Closed form two-sided bounds for probabilities that exactly r and at least r out of n events occur. Mathematics of Operations Research, 14:317-342, 1989.

[4]
J. Bukszár and T. Szántai. Probability bounds given by hypercherry trees. Optimization Methods and Software, 17:409-422, 2002.

[5]
J. Bukszár, R. Henrion, M. Hujter and T. Szántai. Polyhedral inclusion-exclusion. Preprint No. 913, Weierstrass Institute Berlin, 2004, submitted.

[6]
I. Deák. Computing probabilities of rectangles in case of multidimensional distribution. Journal of Statistical Computation and Simulation, 26:101-114, 1986.

[7]
D. Dentcheva, A. Prékopa and A. Ruszczynski. Concavity and efficient points for discrete distributions in stochastic programming, Mathematical Programming, 89:55-79, 2000.

[8]
D. Dentcheva, B. Lai and A. Ruszczynski. Efficient point methods for probabilistic optimization problems. Stochastic Programming E-Print Series (SPEPS), No. 25, 2003.

[9]
H.I. Gassmann, I. Deák and T. Szántai. Computing multivariate normal probabilities: A new look. Journal of Computational and Graphical Statistics, 11:920-949, 2002.

[10]
A. Genz. http://www.sci.wsu.edu/math/faculty/genz/homepage

[11]
A. Genz. Numerical computation of the multivariate normal probabilities. Journal of Computational and Graphical Statistics, 1:141-150, 1992.

[12]
A. Genz and K. Kwong. Numerical evaluation of Singular Multivariate Normal Distributions, Journal of Statistical Computation and Simulation, 68:1-21, 2000.

[13]
R. Henrion and W. Römisch. Metric regularity and quantitative stability in stochastic programs with probabilistic constraints. Mathematical Programming, 84:55-88, 1999.

[14]
R. Henrion and A. Möller. Optimization of a continuous distillation process under random inflow rate. Computers&Mathematics with Applications, 45:247-262, 2003.

[15]
R. Henrion and W. Römisch. Hölder and Lipschitz stability of solution sets in programs with probabilistic constraints. Mathematical Programming, 100:589-611, 2004.

[16]
P. Kall and S. Wallace. Stochastic Programming. Wiley, New York, 1994. Online available, see SP Resources at stoprog.org

[17]
V. Kanková. A note on multifunctions in stochastic programming. Lecture Notes in Economics and Mathematical Systems, 458:154-168, 1998.

[18]
J. Mayer. Stochastic Linear Programming Algorithms. Gordon and Breach, Amsterdam, 1998.

[19]
A. Prékopa. Boole-Bonferroni inequalities and linear programing. Operations Research, 36:145-162, 1988.

[20]
A. Prékopa. Sharp bound on probabilities using linear programming. Operations Research, 38:227-239, 1990.

[21]
A. Prékopa. Stochastic Programming. Kluwer, Dordrecht, 1995.

[22]
A. Prékopa, B. Vizvári and T. Badics. Programming under probabilistic constraint with discrete random variable. In (F. Giannessi et al. eds.): New Trends in Mathematical Programming, pp. 235-255, Kluwer, Dordrecht, 1998.

[23]
A. Prékopa. Probabilistic programming. In: [28] (Chapter 5).

[24]
A. Prékopa and T. Szántai. Flood control reservoir system design using stochastic programming. Mathematical Programming Study, 9:138-151, 1978.

[25]
W. Römisch. Stability of Stochastic Programming Problems. In: [28] (Chapter 8).

[26]
A. Ruszczynski. http://www.rusz.rutgers.edu

[27]
A. Ruszczynski. Probabilistic programming with discrete distributions and precedence constrained knapsack polyhedra. Mathematical Programming, 93:195-215, 2002.

[28]
A. Ruszczynski and A. Shapiro. Stochastic Programming. Handbooks in Operations Research and Management Science, Vol. 10. Elsevier, Amsterdam, 2003.

[29]
M. Schervish. Multivariate normal probabilities with error bound. Applied Statistics, 33:81-87, 1984.

[30]
T. Szántai. A computer code for solution of probabilistic-constrained stochastic programming problems. In (Y. Ermoliev and R.J.-B. Wets eds.): Numerical Techniques for Stochastic Optimization, pp. 229-235, Springer, 1988.

[31]
T. Szántai and A. Habib. On the k-out-of-r-from-n:F System with Unequal Element Probabilities. In (F. Giannessi et al. eds.): New Trends in Mathematical Programming, pp. 289-303, Kluwer, Dordrecht, 1998.

[32]
S. Uryasev. Derivatives of probability functions and integrals over sets given by inequalities. Journal of Computational and Applied Mathematics 56:197-223, 1994.

File translated from TEX by TTH, version 3.49.
On 04 Sep 2004, 00:14.

### Current Research Areas

Modelling & Applications Algorithms & Computation Theory
Chance-constrained models x x x
Decision-dependent models x
Deterministic approximations   x x
Duality     x
Multi-stage programs x x x
Risk x   x
Scenario generation x   x
Scheduling x x x
Simple recourse x x x
Software/modeling systems/languages x x x
Stability     x
Statistical approximations   x x
Stochastic equilibria,
multi-person stochastic games
x   x
Stochastic integer programming x x x
Two-stage programs x x x
Application areas:
agriculture x
capacity planning x
energy x x
engineering x
finance x x x
fisheries management x
forestry x
interdiction x
military x
production control x
sport x
telecommunications x
transportation x
water management x