Publications

DNNV: A Framework for Deep Neural Network Verification

David Shriver, Sebastian Elbaum, Matthew B. Dwyer

Despite the large number of sophisticated deep neural network (DNN) verification algorithms, DNN verifier developers, users, and researchers still face several challenges. First, verifier developers must contend with the rapidly changing DNN field to support new DNN operations and property types. Second, verifier users have the burden of selecting a verifier input format to specify their problem. Due to the many input formats, this decision can greatly restrict the verifiers that a user may run. Finally, researchers face difficulties in re-using benchmarks to evaluate and compare verifiers, due to the large number of input formats required to run different verifiers. Existing benchmarks are rarely in formats supported by verifiers other than the one for which the benchmark was introduced. In this work we present DNNV, a framework for reducing the burden on DNN verifier researchers, developers, and users. DNNV standardizes input and output formats, includes a simple yet expressive DSL for specifying DNN properties, and provides powerful simplification and reduction operations to facilitate the application, development, and comparison of DNN verifiers. We show how DNNV increases the support of verifiers for existing benchmarks from 30% to 74%.
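
DNNV's property DSL is embedded in Python. As a rough illustration, a local-robustness property might be specified as in the sketch below; the identifier names follow the style of DNNV's DSL, but the input path, perturbation radius, and exact API details are assumptions rather than the tool's documented interface.

    # Sketch of a local-robustness property in a DNNV-style Python DSL.
    # This file would be parsed by the tool rather than executed directly;
    # the path and epsilon below are illustrative assumptions.
    from dnnv.properties import *

    N = Network("N")                        # network under verification
    x = Image("inputs/example_input.npy")   # concrete input (hypothetical path)
    epsilon = 0.01                          # perturbation radius (assumption)

    Forall(
        x_,
        Implies(
            ((x - epsilon) <= x_) & (x_ <= (x + epsilon)),
            argmax(N(x_)) == argmax(N(x)),
        ),
    )

A specification like this, together with a network in a standard exchange format such as ONNX, is what DNNV translates into whichever input format the selected verifier expects.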

Cite:
David Shriver, Sebastian Elbaum, Matthew B. Dwyer. 2021. DNNV: A Framework for Deep Neural Network Verification. To Appear in CAV 2021.

Download: [Paper] [Tool]

Reducing DNN Properties to Enable Falsification with Adversarial Attacks

David Shriver, Sebastian Elbaum, Matthew B. Dwyer

Deep Neural Networks (DNNs) are increasingly being deployed in safety-critical domains, from autonomous vehicles to medical devices, where the consequences of errors demand techniques that can provide stronger guarantees about behavior than just high test accuracy. This paper explores broadening the application of existing adversarial attack techniques for the falsification of DNN safety properties. We contend and later show that such attacks provide a powerful repertoire of scalable algorithms for property falsification. To enable the broad application of falsification, we introduce a semantics-preserving reduction of multiple safety property types, which subsume prior work, into a set of equivalid correctness problems amenable to adversarial attacks. We evaluate our reduction approach as an enabler of falsification on a range of DNN correctness problems and show its cost-effectiveness and scalability.
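
To make the role of attacks as falsifiers concrete, the sketch below uses a gradient step in the style of FGSM to search for a violation of a simple robustness-style correctness problem. The two-layer network, the numerical gradient, and the epsilon value are illustrative assumptions standing in for the networks, attacks, and properties studied in the paper.

    # Minimal sketch: an FGSM-style adversarial attack used as a falsifier
    # for the property "the network's prediction is stable within an
    # epsilon-ball around x0". Network, data, and epsilon are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
    W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)

    def forward(x):
        hidden = np.maximum(W1 @ x + b1, 0.0)
        return W2 @ hidden + b2

    def margin(x, target):
        # Margin of the target class over the best competing class.
        out = forward(x)
        return out[target] - np.delete(out, target).max()

    def margin_gradient(x, target, delta=1e-5):
        # Numerical gradient of the margin; a real attack would use autodiff.
        grad = np.zeros_like(x)
        for i in range(x.size):
            step = np.zeros_like(x)
            step[i] = delta
            grad[i] = (margin(x + step, target) - margin(x - step, target)) / (2 * delta)
        return grad

    def fgsm_falsify(x0, target, epsilon=0.1):
        # Step against the margin gradient to look for a counterexample.
        x_adv = x0 - epsilon * np.sign(margin_gradient(x0, target))
        return x_adv, int(np.argmax(forward(x_adv))) != target

    x0 = rng.normal(size=4)
    label = int(np.argmax(forward(x0)))
    counterexample, violated = fgsm_falsify(x0, label)
    print("property falsified:", violated)

Whether this particular run finds a violation depends on the random network; the point is only that once a property has been reduced to this form, an off-the-shelf attack can serve as the falsification engine.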

Cite:
David Shriver, Sebastian Elbaum, Matthew B. Dwyer. 2021. Reducing DNN Properties to Enable Falsification with Adversarial Attacks. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). 275-287. https://doi.org/10.1109/ICSE43902.2021.00036

Download: [Paper] [Artifact] [Tool] [Video]

Systematic Generation of Diverse Benchmarks for DNN Verification

Dong Xu, David Shriver, Matthew B. Dwyer, Sebastian Elbaum

The field of verification has advanced due to the interplay of theoretical development and empirical evaluation. Benchmarks play an important role in this by supporting the assessment of the state-of-the-art and comparison of alternative verification approaches. Recent years have witnessed significant developments in the verification of deep neural networks, but diverse benchmarks representing the range of verification problems in this domain do not yet exist. This paper describes a neural network verification benchmark generator, GDVB, that systematically varies aspects of problems in the benchmark that influence verifier performance. Through a series of studies, we illustrate how GDVB can assist in advancing the sub-field of neural network verification by more efficiently providing richer and less biased sets of verification problems.
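
As a rough sketch of what "systematically varying aspects of problems" can look like, the snippet below enumerates combinations of a few hypothetical factors (depth, neuron scaling, input scaling, property radius). GDVB itself derives factors from a seed verification problem and selects a covering design over them rather than the full cross product shown here.

    # Sketch: systematically varying factors of a DNN verification problem.
    # Factor names and levels are hypothetical; GDVB selects a covering
    # array over such factors rather than this full cross product.
    from itertools import product

    factors = {
        "num_layers":  [2, 4, 8],         # relative network depth
        "neurons":     [0.25, 0.5, 1.0],  # fraction of a seed network's neurons
        "input_size":  [0.5, 1.0],        # scaling of the input dimension
        "prop_radius": [0.01, 0.1],       # radius of the robustness property
    }

    benchmark = [
        dict(zip(factors, levels))
        for levels in product(*factors.values())
    ]

    for problem in benchmark[:3]:
        print(problem)
    print(f"... {len(benchmark)} verification problems in total")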

Cite:
Dong Xu, David Shriver, Matthew B. Dwyer, Sebastian Elbaum. 2020. Systematic Generation of Diverse Benchmarks for DNN Verification. In Computer Aided Verification - 32nd International Conference, CAV 2020, Los Angeles, CA, USA, July 21-24, 2020, Proceedings, Part I. 97-121. https://doi.org/10.1007/978-3-030-53288-8_5

Download: [Paper]

Refactoring Neural Networks for Verification

David Shriver, Dong Xu, Sebastian Elbaum, Matthew B. Dwyer

Deep neural networks (DNNs) are growing in capability and applicability. Their effectiveness has led to their use in safety-critical and autonomous systems, yet there is a dearth of cost-effective methods available for reasoning about the behavior of a DNN. In this paper, we seek to expand the applicability and scalability of existing DNN verification techniques through DNN refactoring. A DNN refactoring defines (a) the transformation of the DNN’s architecture, i.e., the number and size of its layers, and (b) the distillation of the learned relationships between the input features and function outputs of the original to train the transformed network. Unlike with traditional code refactoring, DNN refactoring does not guarantee functional equivalence of the two networks, but rather it aims to preserve the accuracy of the original network while producing a simpler network that is amenable to more efficient property verification. We present an automated framework for DNN refactoring, and demonstrate its potential effectiveness through three case studies on networks used in autonomous systems.
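
The sketch below illustrates the distillation half of this process under stated assumptions: a smaller, hypothetical architecture is trained to reproduce the outputs of the original network. The two architectures, the random surrogate inputs, and the training schedule are placeholders, not the networks from the case studies.

    # Sketch of the distillation step of DNN refactoring: a transformed
    # (smaller) network is trained to mimic the original network's outputs.
    # Architectures and the random training inputs are illustrative.
    import torch
    import torch.nn as nn

    original = nn.Sequential(               # stands in for the trained DNN
        nn.Linear(16, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 4),
    )
    refactored = nn.Sequential(             # simpler architecture to verify
        nn.Linear(16, 32), nn.ReLU(),
        nn.Linear(32, 4),
    )

    optimizer = torch.optim.Adam(refactored.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for step in range(200):
        x = torch.randn(64, 16)             # surrogate for training inputs
        with torch.no_grad():
            target = original(x)            # learned behavior to preserve
        loss = loss_fn(refactored(x), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Because the refactored network only approximates the original, its accuracy on the original task still needs to be checked before properties verified on it are taken as evidence about the system.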

Cite:
David Shriver, Dong Xu, Sebastian Elbaum, Matthew B. Dwyer. 2019. Refactoring Neural Networks for Verification. arXiv:1908.08026. https://arxiv.org/abs/1908.08026

Download: [Paper]

Evaluating Recommender System Stability with Influence-Guided Fuzzing

David Shriver, Sebastian Elbaum, Matthew B. Dwyer, David S. Rosenblum

Recommender systems help users to find products or services they may like when lacking personal experience or facing an overwhelming set of choices. Since unstable recommendations can lead to distrust, loss of profits, and a poor user experience, it is important to test recommender system stability. In this work, we present an approach based on inferred models of influence that underlie recommender systems to guide the generation of dataset modifications to assess a recommender’s stability. We implement our approach and evaluate it on several recommender algorithms using the MovieLens dataset. We find that influence-guided fuzzing can effectively find small sets of modifications that cause significantly more instability than random approaches.
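
The sketch below gives a flavor of the idea under simplifying assumptions: an influence score is estimated for each item (here, just its rating count), and a handful of ratings on the most influential items are perturbed to produce a modified dataset. The paper's actual influence models are inferred from the recommender's outputs and training data rather than this popularity proxy.

    # Sketch of influence-guided dataset fuzzing for a recommender: rank
    # items by a simple influence proxy and perturb a small number of
    # ratings on the most influential items. All details are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    num_users, num_items = 50, 20
    ratings = rng.integers(0, 6, size=(num_users, num_items))  # 0 = unrated

    # Influence proxy: items with more ratings tend to shift recommendations more.
    influence = (ratings > 0).sum(axis=0)
    targets = np.argsort(influence)[-3:]          # three most influential items

    fuzzed = ratings.copy()
    for item in targets:
        raters = np.flatnonzero(ratings[:, item] > 0)
        chosen = rng.choice(raters, size=min(5, raters.size), replace=False)
        fuzzed[chosen, item] = 6 - ratings[chosen, item]   # invert 1..5 ratings

    changed = np.count_nonzero(fuzzed != ratings)
    print(f"modified {changed} ratings out of {np.count_nonzero(ratings)}")

Retraining the recommender on the fuzzed dataset and comparing its recommendation lists against the originals then gives a measure of instability attributable to a small, targeted modification.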

Cite:
David Shriver, Sebastian Elbaum, Matthew B. Dwyer, David S. Rosenblum. 2019. Evaluating Recommender System Stability with Influence-Guided Fuzzing. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019. 4934-4942. https://doi.org/10.1609/aaai.v33i01.33014934

Download: [Paper] [Poster]

Toward the development of richer properties for recommender systems

David Shriver

The performance of recommender systems is commonly characterized by metrics such as precision and recall. However, these metrics can only provide a coarse characterization of the system, as they offer limited intuition and insights on potential system anomalies, and may fail to provide a developer with an understanding of the strengths and weaknesses of a recommendation algorithm. In this work, we start to describe a model of recommender systems that defines a space of properties. We begin exploring this space by defining templates that relate to the properties of coverage and diversity, and we demonstrate how instantiated characteristics offer complementary insights to precision and recall.
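
As an illustration, the snippet below instantiates two such properties, catalog coverage and inter-list diversity, over a handful of hypothetical top-3 recommendation lists; the particular metric definitions are simplifying assumptions rather than the templates from the paper.

    # Sketch of two instantiated properties, coverage and diversity,
    # evaluated over top-k recommendation lists. Lists and metric
    # definitions are illustrative.
    from itertools import combinations

    recommendations = {            # user -> top-3 recommended item ids
        "u1": [3, 7, 9],
        "u2": [3, 7, 1],
        "u3": [2, 7, 9],
    }
    catalog_size = 10

    def coverage(recs, num_items):
        # Fraction of the catalog recommended to at least one user.
        recommended = {item for items in recs.values() for item in items}
        return len(recommended) / num_items

    def diversity(recs):
        # Average pairwise Jaccard dissimilarity between users' lists.
        lists = [set(items) for items in recs.values()]
        dissimilarities = [
            1 - len(a & b) / len(a | b) for a, b in combinations(lists, 2)
        ]
        return sum(dissimilarities) / len(dissimilarities)

    print(f"coverage:  {coverage(recommendations, catalog_size):.2f}")
    print(f"diversity: {diversity(recommendations):.2f}")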

Cite:
David Shriver. 2018. Toward the development of richer properties for recommender systems. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018. 173-174. http://doi.acm.org/10.1145/3183440.3195082

Download: [Paper] [Poster]

Assessing the Quality and Stability of Recommender Systems

David Shriver

Recommender systems help users to find products they may like when lacking personal experience or facing an overwhelmingly large set of items. However, assessing the quality and stability of recommender systems can present challenges for developers. First, traditional accuracy metrics for validating the quality of recommendations, such as precision and recall, offer only a coarse, one-dimensional view of the system performance. Second, assessing the stability of a recommender system requires generating new data and retraining the system, which is expensive. In this work, we present two new approaches for assessing the quality and stability of recommender systems to address these challenges. We first present a general and extensible approach for assessing the quality of the behavior of a recommender system using logical property templates. The approach is general in that it defines recommendation systems in terms of sets of rankings, ratings, users, and items on which property templates are defined. It is extensible in that these property templates define a space of properties that can be instantiated and parameterized to characterize a recommendation system. We study the application of the approach to several recommendation systems. Our findings demonstrate the potential of these properties, illustrating the insights they can provide about the different algorithms and evolving datasets. We also present an approach for influence-guided fuzz testing of recommender system stability. We infer influence models for aspects of a dataset, such as users or items, from the recommendations produced by a recommender system and its training data. We define dataset fuzzing heuristics that use these influence models for generating modifications to an original dataset, and we present a test oracle based on a threshold of acceptable instability. We implement our approach and evaluate it on several recommender algorithms using the MovieLens dataset, and we find that influence-guided fuzzing can effectively find small sets of modifications that cause significantly more instability than random approaches.

Cite:
David Shriver. 2018. Assessing the Quality and Stability of Recommender Systems. MS Thesis, University of Nebraska-Lincoln.

Download: [Paper]

At the End of Synthesis: Narrowing Program Candidates

David Shriver, Sebastian G. Elbaum, Kathryn T. Stolee

Program synthesis is succeeding in supporting the generation of programs within increasingly complex domains. The use of weaker specifications, such as those consisting of input/output examples or test cases, has helped to fuel the success of program synthesis by lowering adoption barriers. Yet, employing weaker specifications has the side effect of generating a potentially large number of candidate programs. This was not a problem for simpler and smaller program domains, but it is becoming evident that differentiating among many synthesized programs is a challenge that needs addressing. We sketch an approach to mitigate this challenge, requiring less effort from the user while automatically identifying inputs that can differentiate clusters of synthesized programs. The approach has the potential to more cost-effectively narrow the space of candidate programs in a range of synthesis applications.
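
A minimal sketch of the idea, with made-up candidate programs: search a small input space for an input on which the candidates disagree, and cluster them by the output they produce, so that a single user judgment about the correct output can eliminate entire clusters at once.

    # Sketch of narrowing synthesized candidates: find an input on which
    # candidates disagree and group them by output. Candidate programs
    # and the input space are illustrative.
    from collections import defaultdict

    candidates = {
        "abs_diff": lambda a, b: abs(a - b),
        "plain_diff": lambda a, b: a - b,
        "max_minus_min": lambda a, b: max(a, b) - min(a, b),
    }

    def distinguishing_input(cands, inputs):
        # Return the first input that splits candidates into >1 output cluster.
        for x in inputs:
            clusters = defaultdict(list)
            for name, f in cands.items():
                clusters[f(*x)].append(name)
            if len(clusters) > 1:
                return x, dict(clusters)
        return None, None

    search_space = [(a, b) for a in range(-3, 4) for b in range(-3, 4)]
    x, clusters = distinguishing_input(candidates, search_space)
    print("distinguishing input:", x)
    print("output clusters:", clusters)
    # The user labels the correct output for x; candidates in the other
    # clusters are discarded, narrowing the candidate space in one step.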

Cite:
David Shriver, Sebastian G. Elbaum, Kathryn T. Stolee. 2017. At the End of Synthesis: Narrowing Program Candidates. In 39th IEEE/ACM International Conference on Software Engineering: New Ideas and Emerging Technologies Results Track, ICSE-NIER 2017, Buenos Aires, Argentina, May 20-28, 2017. 19-22. https://doi.org/10.1109/ICSE-NIER.2017.7

Download: [Paper]