Software Analysis презентация

Содержание

Слайд 2

Software Analysis Analysis tool Model checking Static analysis Deductive reasoning

Software Analysis

Analysis tool
Model checking
Static analysis
Deductive reasoning
Testing
Runtime monitoring

Product M

Specification S

Program P

Logics/automata
Ad-hoc patterns
Implicit

(built in tool)
Program annotations

Automata-theoretic Verification
P: Generator for possible executions
S: Acceptor for (in)correct executions
Model checking: Language inclusion
Runtime monitoring: Membership

Слайд 3

do { KeAcquireSpinLock(); nPacketsOld = nPackets; if(request){ request = request->Next;

do {
KeAcquireSpinLock();
nPacketsOld = nPackets;
if(request){
request = request->Next;
KeReleaseSpinLock();
nPackets++;
}
} while (nPackets != nPacketsOld);
KeReleaseSpinLock();

SLAM

Verification Example

Does this code
obey the
locking spec?

Слайд 4

Appeal of Regular Languages Well-understood expressiveness: multiple characterizations Deterministic/nondeterministic/alternating finite

Appeal of Regular Languages

Well-understood expressiveness: multiple characterizations
Deterministic/nondeterministic/alternating finite automata
Regular expressions
Monadic

second order logic of linear order
Syntactic congruences
Regular languages are effectively closed under many operations
Union, intersection, complement, conactenation, Kleene-*, homomorphisms…
Algorithms for decision problems
Membership
Determinization and minimization
Language emptiness (single-source graph reachability)
Language inclusion, language equivalence …
Слайд 5

Checking Structured Programs Control-flow requires stack, so (abstracted) program P

Checking Structured Programs

Control-flow requires stack, so (abstracted) program P defines

a context-free language
Algorithms exist for checking regular specifications against context-free models
Emptiness of pushdown automata is solvable
Product of a regular language and a context-free language is context-free
But, checking context-free spec against a context-free model is undecidable!
Context-free languages are not closed under intersection
Inclusion as well as emptiness of intersection undecidable
Existing software model checkers: pushdown models (Boolean programs) and regular specifications
Слайд 6

Are Context-free Specs Interesting? Classical Hoare-style pre/post conditions If p

Are Context-free Specs Interesting?

Classical Hoare-style pre/post conditions
If p holds when

procedure A is invoked, q holds upon return
Total correctness: every invocation of A terminates
Integral part of emerging standard JML
Stack inspection properties (security/access control)
If setuuid bit is being set, root must be in call stack
Interprocedural data-flow analysis
All these need matching of calls with returns, or finding unmatched calls
Recall: Language of words over [, ] such that brackets are well matched is not regular, but context-free
Слайд 7

Checking Context-free Specs Many tools exist for checking specific properties

Checking Context-free Specs

Many tools exist for checking specific properties
Security research

on stack inspection properties
Annotating programs with asserts and local variables
Inter-procedural data-flow analysis algorithms
What’s common to checkable properties?
Both program P and spec S have their own stacks, but the two stacks are synchronized
As a generator, program should expose the matching structure of calls and returns

Solution: Nested words and theory of
regular languages over nested words

Слайд 8

Program Executions as Nested Words Program global int x; main()

Program Executions as Nested Words

Program

global int x;
main() {
x = 3;

if P x = 1 ;
….
}
bool P () {
local int y=0;
x = y;
return (x==0);
}

If a procedure writes
to x, it must later read it

Слайд 9

Words: Data with linear order (Unordered) Trees: Data with hierarchical

Words:
Data with linear order

(Unordered) Trees:
Data with hierarchical

order

Ordered Trees/Hedges:
Data with hierarchical order +
Linear order on siblings

Nested Words (AM06):
Data with linear order +
Nesting edges

Слайд 10

Document Processing HTML Document CSR 2007 Ekaterinburg Park Inn Google

Document Processing

HTML Document



CSR 2007



Ekaterinburg


Park Inn



Google


Microsoft


Query 1: Find documents that contain “Ekaterinburg” followed by “Google”
(refers to linear/word structure)
Query 2: Find documents related to conferences sponsored by Google in Ekaterinburg
(refers to hierarchical/tree structure)

Query Processing

Model a document d as a nested word
Nesting edges from to
Compile query into automata over nested words
Analysis: Membership question
Does document d satisfy query L ?

Слайд 11

Talk Overview Introduction to Nested Words Regular Languages of Nested

Talk Overview

Introduction to Nested Words
Regular Languages of Nested Words

Relation to Pushdown Automata and Tree Automata
Conclusions and Future Work
Слайд 12

Nested Shape: Linear sequence + Non-crossing nesting edges Nesting edges

Nested Shape:
Linear sequence + Non-crossing nesting edges
Nesting edges can be

pending, Sequence can be infinite

Positions classified as:
Call positions: both linear and hierarchical outgoing edges
Return positions: both linear and hierarchical incoming edges
Internal positions: otherwise

Nested word:
Nested shape + Positions labeled with symbols in Σ

Слайд 13

Linguistic Annotated Data Linguistic data stored as annotated sentences (eg.

Linguistic Annotated Data

Linguistic data stored as annotated sentences (eg. Penn Treebank)
Sample

query: Find nouns that follow a verb which is a child of a verb phrase

NP V Det Adj N Prep Det N N
I saw the old man with a dog today

NP

PP

NP

VP

Слайд 14

RNA as a Nested Word Primary structure: Linear sequence of

RNA as a Nested Word

Primary structure: Linear sequence of nucleotides (A,

C, G, U)
Secondary structure: Hydrogen bonds between complementary nucleotides (A-U, G-C, G-U)

In literature, this is modeled as trees.
Algorithmic question: Find similarity between RNAs using edit distances

Слайд 15

Word operations: Prefixes, suffixes, concatenation, reverse

Word operations:
Prefixes, suffixes, concatenation, reverse

Слайд 16

Tree operations: Inserting/deleting well-matched words Well-matched: no pending calls/returns

Tree operations:
Inserting/deleting well-matched words
Well-matched: no pending calls/returns

Слайд 17

Nested Word Automata (NWA) a1 a2 a3 a4 a5 a6

Nested Word Automata (NWA)

a1

a2

a3

a4

a5

a6

a7

a8

a9

States Q, initial state q0, final states F
Reads

the word from left to right labeling edges with states
Transition function:
δc : Q x Σ -> Q x Q (for call positions)
δi : Q x Σ -> Q (for internal positions)
δr : Q x Q x Σ -> Q (for return positions)
Nested word is accepted if the run ends in a final state

q8

q7

q5

q4

q3

q2

q1

q0

q9=δr(q8,q29,a9)

q6=δi(q5,a6)

(q2,q29)=δc(q1,a2)

q29

q47

Слайд 18

Regular Languages of Nested Words A set of nested words

Regular Languages of Nested Words

A set of nested words is

regular if there is a finite-state NWA that accepts it
Nondeterministic automata over nested words
Transition function: δc: QxΣ->2QxQ, δi :Q x Σ -> 2Q, δr:Q x Q x Σ -> 2Q
Can be determinized: blow-up 2n2
Appealing theoretical properties
Effectively closed under various operations (union, intersection, complement, concatenation, prefix-closure, projection, Kleene-* …)
Decidable decision problems: membership, language inclusion, language equivalence …
Alternate characterization: MSO, syntactic congruences
Слайд 19

Determinization Goal: Given a nondeterministic automaton A with states Q,

Determinization

Goal: Given a nondeterministic automaton A with states Q, construct an

equivalent deterministic automaton B
Intuition: Maintain a set of “summaries” (pairs of states)
State-space of B: 2QxQ
Initially, and after every call, state contains q->q, for each q
At any step q->q’ is in B’s state if A can be in state q’ when started in state q at the most recent unmatched call position
Acceptance: must contain q->q’, where q is initial and q’ is final

q->q
q’->q’…

q->u
q’->v…

u’’->u’
v’’->v’…

u’’->w
u’’->w’
v’’->w’’…

q->w
q->w’
q’->w’’…

q->u’’
q’->v’’…

Слайд 20

Closure Properties The class of regular languages of nested words

Closure Properties

The class of regular languages of nested words is effectively

closed under many operations
Intersection: Take product of automata (key: nesting given by input)
Union: Use nondeterminism
Closure under prefixes and suffixes
Complementation: Complement final states of deterministic NWA
Concatenation/Kleene*: Guess the split (as in case of word automata)
Reverse (reversal of a nested word reverses nested edges also)
Слайд 21

Decision Problems Membership: Is a given nested word w accepted

Decision Problems

Membership: Is a given nested word w accepted by

NWA A?
Solvable in polynomial time
If A is fixed, then in time O(|w|) and space O(nesting depth of w)
Emptiness: Given NWA A, is its language empty?
Solvable in time O(|A|3): view A as a pushdown automaton
Universality, Language inclusion, Language equivalence:
Solvable in polynomial-time for deterministic automata
For nondeterministic automata, use determinization and complementation; causes exponential blow-up, Exptime-complete problems
Слайд 22

MSO-based Characterization Monadic Second Order Logic of Nested Words First

MSO-based Characterization

Monadic Second Order Logic of Nested Words
First order variables:

x,y,z; Set variables: X,Y,Z…
Atomic formulas: a(x), X(x), x=y, x < y, x -> y
Logical connectives and quantifiers
Sample formula:
For all x,y. ( (a(x) and x -> y) implies b(y))
Every call labeled a is matched by a return labeled b
Thm: A language L of nested words is regular iff it is definable by an MSO sentence
Robust characterization of regularity as in case of languages of words and languages of trees
Слайд 23

Application: Software Analysis A program P with stack-based control is

Application: Software Analysis

A program P with stack-based control is modeled by

a set L of nested words it generates
If P has finite data (e.g. pushdown automata, Boolean programs, recursive state machines) then L is regular
Specification S given as a regular language of nested words
Allows many properties not specifiable in classical temporal logics
PAL: instrumentation language of C programs (SPIN 2007)
Verification: Does every behavior in L satisfy S ?
Take product of P and complement of S and analyze
Runtime monitoring: Check if current execution is accepted by S (compiled as a deterministic automaton)
Model checking: Check if L is contained in S, decidable when P has finite data (no extra cost, as analysis still requires context-free reachability)
Слайд 24

Writing Program Specifications Intuition: Keeping track of context is easy;

Writing Program Specifications

Intuition: Keeping track of context is easy; just skip

using a summary edge
Finite-state properties of paths, where a path can be a local path, a global path, or a mixture

Sample regular properties:
If p holds at a call, q should hold at matching return
If x is being written, procedure P must be in call stack
Within a procedure, an unlock must follow a lock
All properties specifiable in standard temporal logics (LTL)

Слайд 25

Temporal Logic of Nested Time: CaRet Global paths, Local paths,

Temporal Logic of Nested Time: CaRet

Global paths, Local paths, Caller paths


Three versions of every temporal modality

Sample CaRet formulas:
(if p then local-next q) global-unless r
if p then caller-eventually q
Global-always (if p then local-eventually q)

Слайд 26

So far: Nested words have appealing theoretical properties with possible

So far: Nested words have appealing theoretical properties with possible applications


Common framework: linear encoding using brackets/tags

Coming up: How do finite nested words compare with ordered trees/hedges?

Слайд 27

Linear Encoding of Nested Words Nested word over Σ is

Linear Encoding of Nested Words

Nested word over Σ is encoded as

a word over tagged alphabet <Σ>
For each symbol a, call , internal a
Two views are isomorphic: every word over <Σ> corresponds to a nested word over Σ
Linear view useful for streaming, and word operations such as prefixes
Number of nested words of length k: (3 |Σ|)k

a1

a2

a3

a4

a5

a6

a7

a8

a9

a10

a11

a12