I talked a fair bit about the binary vector that results from embedding a ReLU-activated neural network into a MIP, denoted $\mathscr{Z}(x)$ throughout the presentation. But where does it come from?
Since the ReLU function is just the max between 0 and its input, $\mathrm{ReLU}(in) = \max(0, in)$, we can model it using big-M constraints, where $M$ is a bound on $|in|$ and $z \in \{0, 1\}$ is a binary indicator variable. The general MIP formulation of a single ReLU unit is $$ out \ge in, \quad out \ge 0, \quad out \le in + Mz, \quad out \le M(1-z), \quad z \in \{0, 1\}. $$ When $z = 0$, the last two constraints force $out = in$ (the unit is active); when $z = 1$, they force $out = 0$ (the unit is inactive). Collecting the $z$ variable of every neuron for a given input $x$ gives exactly the binary vector $\mathscr{Z}(x)$.
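To see that these constraints really pin down the ReLU output, here is a small sketch (plain Python, no solver, with the names `relu_bigM_feasible`, `inp`, and the bound `M=10` chosen for illustration) that checks feasibility of a candidate $(out, z)$ pair against the four constraints:

```python
def relu_bigM_feasible(out, inp, z, M=10.0, tol=1e-9):
    """Check the big-M ReLU constraints: out >= inp, out >= 0,
    out <= inp + M*z, out <= M*(1 - z), with z in {0, 1}."""
    return (out >= -tol and
            out >= inp - tol and
            out <= inp + M * z + tol and
            out <= M * (1 - z) + tol)

for inp in [-3.0, 0.0, 2.5]:
    relu = max(0.0, inp)
    z = 0 if inp > 0 else 1          # z = 0: unit active, z = 1: unit inactive
    # The true ReLU value is feasible with the right choice of z...
    assert relu_bigM_feasible(relu, inp, z)

# ...while a wrong output value is infeasible for BOTH choices of z.
assert not relu_bigM_feasible(1.0, -3.0, 0)   # z = 0 needs out <= inp = -3
assert not relu_bigM_feasible(1.0, -3.0, 1)   # z = 1 needs out <= 0
```

In a real MIP embedding, a solver branches on each $z$, and each feasible assignment of all the $z$ variables corresponds to one activation pattern of the network.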
This is my first post using Hugo.