MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582
Neural networks have demonstrated considerable success on a wide variety of real-world problems. However, neural networks can be fooled by adversarial examples: slightly perturbed inputs that are misclassified with high confidence. Verification of networks enables us to gauge their vulnerability to such adversarial examples. We formulate verification of piecewise-linear neural networks as a mixed integer program. Our verifier finds minimum adversarial distortions two to three orders of magnitude more quickly than the state-of-the-art. We achieve this via tight formulations for non-linearities, as well as a novel presolve algorithm that makes full use of all information available. The computational speedup enables us to verify properties on convolutional networks with an order of magnitude more ReLUs than had been previously verified by any complete verifier, and we determine for the first time the exact adversarial accuracy of an MNIST classifier to perturbations with bounded l∞ norm ε = 0.1. On this network, we find an adversarial example for 4.38% of samples, and a certificate of robustness for the remainder. Across a variety of robust training procedures, we are able to certify more samples than the state-of-the-art and find more adversarial examples than a strong first-order attack for every network.
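To make the phrase "mixed integer program" concrete, the sketch below shows one common way a single ReLU, y = max(x, 0), can be written with linear constraints and one binary variable when bounds l < 0 < u on the input x are known. This is an illustrative assumption, not necessarily the exact formulation used in the thesis; the names l, u, a and both helper functions are hypothetical. The constraints are y >= 0, y >= x, y <= x - l(1 - a), and y <= u*a, with a in {0, 1}.

```python
def feasible_y_interval(x, l, u, a, tol=1e-9):
    """Interval of y allowed by the four linear constraints for a fixed
    input x and a fixed binary choice a, or None if no y is feasible.
    (Hypothetical helper for illustration only.)"""
    lower = max(0.0, x)                      # y >= 0 and y >= x
    upper = min(x - l * (1 - a), u * a)      # y <= x - l(1 - a) and y <= u*a
    return (lower, upper) if lower <= upper + tol else None


def relu_solutions(x, l, u):
    """Enumerate both values of the binary variable and collect the
    feasible intervals for y. Assumes l < 0 < u and l <= x <= u."""
    assert l < 0 < u and l <= x <= u
    intervals = []
    for a in (0, 1):
        interval = feasible_y_interval(x, l, u, a)
        if interval is not None:
            intervals.append(interval)
    return intervals


if __name__ == "__main__":
    # Assumed bounds on the pre-activation, chosen only for this demo.
    l, u = -1.0, 2.0
    for x in (-1.0, -0.5, 0.0, 0.5, 2.0):
        print(x, relu_solutions(x, l, u))
```

For every x within the bounds, each feasible interval collapses to the single point max(x, 0), so the constraints encode the ReLU exactly. Tighter bounds l and u shrink the linear relaxation obtained when a is allowed to be fractional, which is one reason bound quality matters to a solver.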
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.