Symbolic executions (and their recent variants called dynamic symbolic
executions) are an important technique in automated testing. Instead
of analysing only concrete executions of a program, one could treat
such executions symbolically (i.e. with some variables that are not
bound to concrete values) and use constraint solvers to determine this
(symbolic) path feasibility so as to guide the path explorations of
the system under test, which in combination with dynamic analysis
gives the best possible path coverage. For string-manipulating
programs, solvers need to be able to handle constraints over the
string domain. This gives rise to the following natural question: what
is an ideal decidable logic over strings for reasoning about path
feasibility in a program with strings? This is a question that is
connected to a number of difficult results in theoretical computer
science (decidability of the theory of strings with concatenations,
a.k.a., word equations) and long-standing open problems (e.g.
decidability of word equations with length constraints). Worse yet,
recent examples from cross-site scripting vulnerabilities suggest that
more powerful string operations (e.g. finite-state transducers) might
be required as first class citizens in string constraints. Even though
putting all these string operations in a string logic leads to
undecidability, recent results show that there might be a way to
recover decidability while retaining expressivity for applications in
symbolic execution. In this talk, I will present one such result from
my POPL’16 paper (with P. Barcelo). The string logic admits
concatenations, regular constraints, finite-state transductions,
letter-counting and length constraints (which can consequently express
charAt operator, and string disequality). I will provide a number of
examples from the cross-site scripting literature that shows how a
string logic can, for the first time, be used to discover a bug in or
prove correctness of the programs. I will conclude by commenting on a
new decision procedure for the logic that leads to an efficient
implementation (POPL’18 with L. Holik, P. Janku, P. Ruemmer, and T.
Vojnar) and a recent attempt to incorporate the fully-fledged
replace-all operator into a string logic (POPL’18 with T. Chen, Y.
Chen, M. Hague, and Z. Wu).
Buggy and insecure software could have serious consequences including
the loss of human lives, financial losses, and confidential
information leakage, to name a few. Program analysis is a field that
concerns the problem of analysing the behaviour of programs especially
with respect to the issue of correctness. Over the years computational
logic has played an important role in program analysis particularly in
the development of precise automatic methods for verifying the
correctness and optimising the performance of programs. In this talk I
will illustrate how logic can help program analysis, drawing examples
from my own research inspired by challenges in web security (e.g. how
to detect/prevent cross-site scripting vulnerabilities), web
performance optimisation (e.g. how to remove code redundancies in web
pages), and verification of distributed protocols. A theme that will
emerge during the talk is that there is often a tight connection
between logic and automata that can be exploited when designing (often
theoretically optimal) algorithms.
The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blendthe inferential and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in Data Science is apparent from their sharply divergent nature at an elementary level—in computer science, the growth of the number of data points is a source of “complexity” that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of “simplicity” in that inferences are generally stronger and asymptotic results can be invoked. On a formal level, the gap is made evident by the lack of a role for computational concepts such as “runtime” in core statistical theory and the lack of a role for statistical concepts such as “risk” in core computational theory. I present several research vignettes aimed at bridging computation and statistics, including the problem of inference under privacy and communication constraints, and including a surprising cameo role for symplectic geometry.
Machine learning (ML) models, e.g., deep neural networks
(DNNs), are vulnerable to adversarial examples: malicious inputs
modified to yield erroneous model outputs, while appearing unmodified
to human observers. Potential attacks include having malicious content
like malware identified as legitimate or controlling vehicle
behavior. Yet, most existing adversarial example attacks require
knowledge of either the model internals or its training data. We will
describe a black-box attack on ML models. These algorithms yield
adversarial examples misclassified by Amazon and Google at rates of
96.19% and 88.94%. We also find that this black-box attack strategy is
capable of evading defense strategies previously found to make
adversarial example crafting harder.
Data nets are yet another extension of Petri Nets in which the relations between consumed and produced tokens are very restricted. Their subclasses like, Unordered Data Petri Nets (UDPN), from the theory perspective are natural and easy to define. Furthermore, similarly
to Petri Nets they have a lot of structure to explore. During the talk, we will start form defining Data Nets and formulating the state equation, a generalization of one of the simplest and most important equations for Petri Nets.
Next, I will present a sketch of the proof of the correctness of the NP-time algorithm, to solve the equation in case of Unordered Data Petri Nets. Finally, I will mention some novel results for other classes of Data Nets and open problems that we are working on.
The talk will base on a joint work with Patrick Totzke and Jerome Leroux
“Linear Combinations of Unordered Data Vectors” published at LICS-2017
and on unpublished results with Sławomir Lasota.
Reachability analysis is a useful tool for checking whether a cyber-physical system satisfies a given safety property. For instance, one could ask whether an electro-magnetic braking system brings a car to a standstill within a given time frame. In set-based reachability, one takes a given set of initial states (ranges for the position and speed of the car) and computes the image of the set of states as it evolves over time. Even for simple types of systems, this so-called reach set can only be computed approximately, and accuracy comes at an extremely steep cost. A highly scalable way to approximate the reach set is known for the special case of linear dynamics. It is based on template polyhedra, which are polyhedra (sets bounded by linear constraints) with normal vectors from a given finite set. Simple instances of template polyhedra are boxes or octagons. A template instance that tightly bounds the reach set is found by solving a set of optimization problems. The accuracy of the approximation can be improved by adding more normal vectors to the template.
In this talk, we propose an approach that extends this idea from linear to nonlinear dynamics. We linearize the system around a reference trajectory and solve ODEs to obtain templates that bound the reach set. The ODEs are particular in that they involve an optimization problem constrained by the template itself. We show how, similarly to the linear case, the template can be adapted over time to match the dynamics of the system. For both static and dynamic templates, we identify conditions that guarantee convergence. The potential of the approach is discussed on several benchmarks.
In this talk, I will address talk about two projects. The first project deals with the question how we can reason about asynchronous programs. We present a program logic and a corresponding type system that allow us to reason locally about programs with asynchronous concurrency and mutable state; we instantiate this model for OCaml using the Lwt library. The key innovation is to introduce the concept of a “wait permission”, which describes the resources that a given task will yield on termination. An earlier version of this work was published at ECOOP ’15. The second project deals with the question how we can perform a kind of “asynchronous parallelization” optimization, where we are given a (more-or-less) sequential program and rewrite it to make use of asynchronous concurrency. We use a set of program rewriting rules, most notably replacing synchronous I/O operations with asynchronous counterparts in a safe way, and pushing wait statements as far back as possible. As it turns out, proving the soundness of these rewriting rules is surprisingly tricky; I will sketch a reasoning approach that allows us to show refinement, in the the following sense: Let $e$ be a program, and $e’$ the result of rewriting $e$ using the given rules. For every terminating execution of $e’$, there is a corresponding terminating execution of $e$ that ends in an equivalent state.
Interactive program synthesizers enable a user to communicate his/her intent via input-output examples. Unfortunately, such synthesizers only guarantee that the synthesized program is correct on the provided examples. A user that wishes to guarantee correctness for all possible inputs has to manually inspect the synthesized program, an error-prone and challenging task.
In this talk, I will present a novel synthesis framework that communicates only through (abstract) examples and guarantees that the synthesized program is correct on all inputs.
The main idea is to use abstract examples—a new form of examples that represent a potentially unbounded set of concrete examples. An abstract example captures how part of the input space is mapped to corresponding outputs by the synthesized program. Our framework uses a generalization algorithm to compute abstract examples which are then presented to the user. The user can accept an abstract example, or provide a counterexample in which case the synthesizer will explore a different program. When the user accepts a set of abstract examples that covers the entire input space, the synthesis process is completed.
I will also show some experimental results to demonstrate that our synthesizer communicates with the user effectively.
Joint work with Sharon Shoham and Eran Yahav.
The central problem addressed in this talk is the following: given the design of a hybrid system, conclude if the system is stable. Hybrid systems are apt to model cyber-physical systems since they capture the mixed discrete-continuous behaviour which appears naturally in cyber-physical systems. Stability is a fundamental property in control system design, which demands that small perturbations in the input to the system lead to small variations in the future behaviour. The classical approach to establishing stability involves finding a particular kind of function, called Lyapunov function. In relation to this method, there is some effort towards automation which consists of a template-based search that uses sum-of-squares solvers. However, the shortcomings with this template-based method are that the choice of a correct template requires designer’s ingenuity and that a template failure does not provide instability reasons or guidance for the choice of the next template.
We propose an alternate algorithmic approach for the stability verification of hybrid systems, which is a counterexample-guided abstraction refinement (CEGAR) framework. The key idea is that given a hybrid system, its stability can be determined by constructing an abstract weighted graph which over-approximates the behaviour of the former one. The relation between the abstraction and the hybrid system is such that if every cycle in the graph has weight below 1, then the hybrid system is stable. In the case that the graph has a cycle with weight greater than 1, this cycle can be evaluated to infer instability of the hybrid system or guide the choice of subsequent templates. This CEGAR approach addresses the shortcomings of the template-based search.
We have implemented the CEGAR algorithm in the tool AVERIST, and we report experimental evaluations on some examples for demonstrating the feasibility of the approach.