Linear Regions
I talked a fair bit about the binary vector that results from embedding a ReLU-activated neural network into a MIP, denoted $\mathscr{Z}(x)$ throughout the presentation. But where does that come from?
Since the ReLU function is just the max between 0 and its input, we can model it with big-M constraints. For a binary variable $z \in \{0, 1\}$ and a sufficiently large constant $M$: $out \ge 0$, $out \ge in$, $out \le in + Mz$, $out \le M(1-z)$. When $z = 0$ the unit is active ($out = in \ge 0$); when $z = 1$ it is inactive ($out = 0$ and $in \le 0$). Collecting the $z$ values across all neurons gives the binary vector $\mathscr{Z}(x)$.
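As a sanity check, the big-M constraints above can be verified by brute force: for any input, exactly one choice of $z$ makes $out = \max(0, in)$ feasible, and no choice of $z$ admits a value of $out$ that differs from the ReLU output. This is a minimal sketch (the function name and $M = 100$ are my own choices, and $M$ must upper-bound $|in|$ for the encoding to be valid):

```python
def relu_bigm_feasible(x, out, z, M=100.0):
    # The four big-M constraints for one ReLU unit: out = max(0, x)
    # z = 0 means the unit is active (out = x), z = 1 means inactive (out = 0)
    return (out >= 0 and out >= x and out <= x + M * z and out <= M * (1 - z))

# For each input, exactly one binary choice makes out = max(0, x) feasible
for x in [-3.0, 0.5, 2.0]:
    relu = max(0.0, x)
    assert any(relu_bigm_feasible(x, relu, z) for z in (0, 1))
    # A wrong output value is infeasible for BOTH settings of z
    assert not any(relu_bigm_feasible(x, relu + 1.0, z) for z in (0, 1))
```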
Extending to Joint and Multiple Chance Constraints
I only really cover the case in which we have a single chance constraint, but I mentioned that it extends to joint and multiple chance constraints. How does this happen?
Multiple Chance Constraints
To handle multiple chance constraints, you treat each one as its own constraint. Since they don't interact, you simply add a separate set of constraints for each chance constraint you want to enforce.
For neural networks, we can use multiple outputs, with each output node corresponding to a different chance constraint. You then add a feasibility constraint on each output individually. For example, if $\mathscr{N}_1(x)$ (the first element of the vector produced by $\mathscr{N}(x)$) is the probability that $x$ violates the first chance constraint, you add $\mathscr{N}_1(x) \le \varepsilon_1$ for chance constraint 1, $\mathscr{N}_2(x) \le \varepsilon_2$ for chance constraint 2, and so on.
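The per-output feasibility check can be sketched in a few lines. Everything here is hypothetical: the "network" is a made-up linear map standing in for a trained surrogate $\mathscr{N}(x)$, and the risk levels $\varepsilon_i$ are arbitrary:

```python
import numpy as np

# Hypothetical trained surrogate: each output estimates the probability
# that x violates the corresponding chance constraint (weights are made up)
W = np.array([[0.1, 0.0], [0.0, 0.2]])
b = np.array([0.05, 0.01])

def violation_probs(x):
    # Clip to [0, 1] since outputs represent probabilities
    return np.clip(W @ np.asarray(x) + b, 0.0, 1.0)

eps = np.array([0.10, 0.05])  # one risk level per chance constraint

def feasible(x):
    # Enforce N_i(x) <= eps_i separately for every chance constraint
    return bool(np.all(violation_probs(x) <= eps))
```

In the MIP itself each inequality $\mathscr{N}_i(x) \le \varepsilon_i$ becomes a linear constraint on the corresponding output variable of the embedded network; this snippet just shows the feasibility logic.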
Joint Chance Constraints
I’d strongly recommend reading more formal literature if you are interested in joint chance constraints in the SAA case. The rough idea is that the chance constraint has multiple components: one “scenario” means that all two or three (or however many) components of the joint chance constraint are generated together, and every one of them must be satisfied for the scenario to count as satisfied. We can abstract most of this away, since for the neural network we just need a probability that the joint chance constraint is violated.
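The "all components must hold per scenario" logic can be sketched with a Monte Carlo estimate of the joint violation probability. The two component constraints and the Gaussian uncertainty below are purely illustrative assumptions, not anything from the presentation:

```python
import numpy as np

rng = np.random.default_rng(0)

def joint_violation_rate(x, n_scenarios=10_000):
    # Each scenario draws ALL random components jointly
    xi = rng.normal(size=(n_scenarios, 2))  # assumed 2-component uncertainty
    comp1 = x[0] + xi[:, 0] <= 2.0          # illustrative component constraint 1
    comp2 = x[1] + xi[:, 1] <= 2.0          # illustrative component constraint 2
    # A scenario is satisfied only if every component holds simultaneously
    satisfied = comp1 & comp2
    return 1.0 - satisfied.mean()
```

Note that the joint violation rate exceeds either component's individual violation rate, which is exactly why the components cannot be treated as independent chance constraints.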
In-Sample vs Out-Of-Sample
This idea is somewhat more common in machine learning than in stochastic optimization. Essentially, the goal is to check how well the solution we get generalizes to the underlying distribution. One way to do this is to take another set of samples from the underlying distribution and check the error on them. For our purposes, we generate the same number of scenarios to produce a solution from the MIP (in-sample scenarios) as we use to estimate the violation probability on a separate dataset drawn from the underlying distribution (out-of-sample scenarios). This gives some measure of how well our solution generalizes to the underlying distribution.
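The in-sample vs out-of-sample comparison can be sketched as follows. The constraint, the distribution, and the fixed solution `x_star` are all stand-in assumptions; in practice `x_star` would come from solving the SAA MIP on the in-sample scenarios:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5_000  # same number of in-sample and out-of-sample scenarios

def empirical_violation(x, scenarios):
    # Fraction of scenarios in which the (illustrative) constraint x + xi <= 3 fails
    return float(np.mean(x + scenarios > 3.0))

in_sample = rng.normal(size=N)   # scenarios used to build the MIP
out_sample = rng.normal(size=N)  # fresh draws from the same distribution

x_star = 1.0  # stand-in for a solution obtained from the SAA MIP
in_rate = empirical_violation(x_star, in_sample)
out_rate = empirical_violation(x_star, out_sample)
```

If the out-of-sample violation rate is much higher than the in-sample one, the solution has effectively overfit to the particular scenarios used in the MIP.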