Software Analysis презентация

Содержание

Слайд 2

Software Analysis

Analysis tool
Model checking
Static analysis
Deductive reasoning
Testing
Runtime monitoring

Product M

Specification S

Program P

Logics/automata
Ad-hoc patterns
Implicit (built in

tool)
Program annotations

Automata-theoretic Verification
P: Generator for possible executions
S: Acceptor for (in)correct executions
Model checking: Language inclusion
Runtime monitoring: Membership

Software Analysis Analysis tool Model checking Static analysis Deductive reasoning Testing Runtime monitoring

Слайд 3

do {
KeAcquireSpinLock();
nPacketsOld = nPackets;
if(request){
request = request->Next;
KeReleaseSpinLock();
nPackets++;
}
} while (nPackets != nPacketsOld);
KeReleaseSpinLock();

SLAM Verification Example

Does

this code
obey the
locking spec?

do { KeAcquireSpinLock(); nPacketsOld = nPackets; if(request){ request = request->Next; KeReleaseSpinLock(); nPackets++; }

Слайд 4

Appeal of Regular Languages

Well-understood expressiveness: multiple characterizations
Deterministic/nondeterministic/alternating finite automata
Regular expressions
Monadic second order

logic of linear order
Syntactic congruences
Regular languages are effectively closed under many operations
Union, intersection, complement, conactenation, Kleene-*, homomorphisms…
Algorithms for decision problems
Membership
Determinization and minimization
Language emptiness (single-source graph reachability)
Language inclusion, language equivalence …

Appeal of Regular Languages Well-understood expressiveness: multiple characterizations Deterministic/nondeterministic/alternating finite automata Regular expressions

Слайд 5

Checking Structured Programs

Control-flow requires stack, so (abstracted) program P defines a context-free

language
Algorithms exist for checking regular specifications against context-free models
Emptiness of pushdown automata is solvable
Product of a regular language and a context-free language is context-free
But, checking context-free spec against a context-free model is undecidable!
Context-free languages are not closed under intersection
Inclusion as well as emptiness of intersection undecidable
Existing software model checkers: pushdown models (Boolean programs) and regular specifications

Checking Structured Programs Control-flow requires stack, so (abstracted) program P defines a context-free

Слайд 6

Are Context-free Specs Interesting?

Classical Hoare-style pre/post conditions
If p holds when procedure A

is invoked, q holds upon return
Total correctness: every invocation of A terminates
Integral part of emerging standard JML
Stack inspection properties (security/access control)
If setuuid bit is being set, root must be in call stack
Interprocedural data-flow analysis
All these need matching of calls with returns, or finding unmatched calls
Recall: Language of words over [, ] such that brackets are well matched is not regular, but context-free

Are Context-free Specs Interesting? Classical Hoare-style pre/post conditions If p holds when procedure

Слайд 7

Checking Context-free Specs

Many tools exist for checking specific properties
Security research on stack

inspection properties
Annotating programs with asserts and local variables
Inter-procedural data-flow analysis algorithms
What’s common to checkable properties?
Both program P and spec S have their own stacks, but the two stacks are synchronized
As a generator, program should expose the matching structure of calls and returns

Solution: Nested words and theory of
regular languages over nested words

Checking Context-free Specs Many tools exist for checking specific properties Security research on

Слайд 8

Program Executions as Nested Words

Program

global int x;
main() {
x = 3;
if P

x = 1 ;
….
}
bool P () {
local int y=0;
x = y;
return (x==0);
}

If a procedure writes
to x, it must later read it

Program Executions as Nested Words Program global int x; main() { x =

Слайд 9

Words:
Data with linear order

(Unordered) Trees:
Data with hierarchical order

Ordered

Trees/Hedges:
Data with hierarchical order +
Linear order on siblings

Nested Words (AM06):
Data with linear order +
Nesting edges

Words: Data with linear order (Unordered) Trees: Data with hierarchical order Ordered Trees/Hedges:

Слайд 10

Document Processing

HTML Document



CSR 2007



Ekaterinburg


Park

Inn



Google


Microsoft


Query 1: Find documents that contain “Ekaterinburg” followed by “Google”
(refers to linear/word structure)
Query 2: Find documents related to conferences sponsored by Google in Ekaterinburg
(refers to hierarchical/tree structure)

Query Processing

Model a document d as a nested word
Nesting edges from to
Compile query into automata over nested words
Analysis: Membership question
Does document d satisfy query L ?

Document Processing HTML Document CSR 2007 Ekaterinburg Park Inn Google Microsoft Query 1:

Слайд 11

Talk Overview

Introduction to Nested Words
Regular Languages of Nested Words
Relation to

Pushdown Automata and Tree Automata
Conclusions and Future Work

Talk Overview Introduction to Nested Words Regular Languages of Nested Words Relation to

Слайд 12

Nested Shape:
Linear sequence + Non-crossing nesting edges
Nesting edges can be pending, Sequence

can be infinite

Positions classified as:
Call positions: both linear and hierarchical outgoing edges
Return positions: both linear and hierarchical incoming edges
Internal positions: otherwise

Nested word:
Nested shape + Positions labeled with symbols in Σ

Nested Shape: Linear sequence + Non-crossing nesting edges Nesting edges can be pending,

Слайд 13

Linguistic Annotated Data

Linguistic data stored as annotated sentences (eg. Penn Treebank)
Sample query: Find

nouns that follow a verb which is a child of a verb phrase

NP V Det Adj N Prep Det N N
I saw the old man with a dog today

NP

PP

NP

VP

Linguistic Annotated Data Linguistic data stored as annotated sentences (eg. Penn Treebank) Sample

Слайд 14

RNA as a Nested Word

Primary structure: Linear sequence of nucleotides (A, C, G,

U)
Secondary structure: Hydrogen bonds between complementary nucleotides (A-U, G-C, G-U)

In literature, this is modeled as trees.
Algorithmic question: Find similarity between RNAs using edit distances

RNA as a Nested Word Primary structure: Linear sequence of nucleotides (A, C,

Слайд 15

Word operations:
Prefixes, suffixes, concatenation, reverse

Word operations: Prefixes, suffixes, concatenation, reverse

Слайд 16

Tree operations:
Inserting/deleting well-matched words
Well-matched: no pending calls/returns

Tree operations: Inserting/deleting well-matched words Well-matched: no pending calls/returns

Слайд 17

Nested Word Automata (NWA)

a1

a2

a3

a4

a5

a6

a7

a8

a9

States Q, initial state q0, final states F
Reads the word

from left to right labeling edges with states
Transition function:
δc : Q x Σ -> Q x Q (for call positions)
δi : Q x Σ -> Q (for internal positions)
δr : Q x Q x Σ -> Q (for return positions)
Nested word is accepted if the run ends in a final state

q8

q7

q5

q4

q3

q2

q1

q0

q9=δr(q8,q29,a9)

q6=δi(q5,a6)

(q2,q29)=δc(q1,a2)

q29

q47

Nested Word Automata (NWA) a1 a2 a3 a4 a5 a6 a7 a8 a9

Слайд 18

Regular Languages of Nested Words

A set of nested words is regular if

there is a finite-state NWA that accepts it
Nondeterministic automata over nested words
Transition function: δc: QxΣ->2QxQ, δi :Q x Σ -> 2Q, δr:Q x Q x Σ -> 2Q
Can be determinized: blow-up 2n2
Appealing theoretical properties
Effectively closed under various operations (union, intersection, complement, concatenation, prefix-closure, projection, Kleene-* …)
Decidable decision problems: membership, language inclusion, language equivalence …
Alternate characterization: MSO, syntactic congruences

Regular Languages of Nested Words A set of nested words is regular if

Слайд 19

Determinization

Goal: Given a nondeterministic automaton A with states Q, construct an equivalent deterministic

automaton B
Intuition: Maintain a set of “summaries” (pairs of states)
State-space of B: 2QxQ
Initially, and after every call, state contains q->q, for each q
At any step q->q’ is in B’s state if A can be in state q’ when started in state q at the most recent unmatched call position
Acceptance: must contain q->q’, where q is initial and q’ is final

q->q
q’->q’…

q->u
q’->v…

u’’->u’
v’’->v’…

u’’->w
u’’->w’
v’’->w’’…

q->w
q->w’
q’->w’’…

q->u’’
q’->v’’…

Determinization Goal: Given a nondeterministic automaton A with states Q, construct an equivalent

Слайд 20

Closure Properties

The class of regular languages of nested words is effectively closed under

many operations
Intersection: Take product of automata (key: nesting given by input)
Union: Use nondeterminism
Closure under prefixes and suffixes
Complementation: Complement final states of deterministic NWA
Concatenation/Kleene*: Guess the split (as in case of word automata)
Reverse (reversal of a nested word reverses nested edges also)

Closure Properties The class of regular languages of nested words is effectively closed

Слайд 21

Decision Problems

Membership: Is a given nested word w accepted by NWA A?
Solvable

in polynomial time
If A is fixed, then in time O(|w|) and space O(nesting depth of w)
Emptiness: Given NWA A, is its language empty?
Solvable in time O(|A|3): view A as a pushdown automaton
Universality, Language inclusion, Language equivalence:
Solvable in polynomial-time for deterministic automata
For nondeterministic automata, use determinization and complementation; causes exponential blow-up, Exptime-complete problems

Decision Problems Membership: Is a given nested word w accepted by NWA A?

Слайд 22

MSO-based Characterization

Monadic Second Order Logic of Nested Words
First order variables: x,y,z; Set

variables: X,Y,Z…
Atomic formulas: a(x), X(x), x=y, x < y, x -> y
Logical connectives and quantifiers
Sample formula:
For all x,y. ( (a(x) and x -> y) implies b(y))
Every call labeled a is matched by a return labeled b
Thm: A language L of nested words is regular iff it is definable by an MSO sentence
Robust characterization of regularity as in case of languages of words and languages of trees

MSO-based Characterization Monadic Second Order Logic of Nested Words First order variables: x,y,z;

Слайд 23

Application: Software Analysis

A program P with stack-based control is modeled by a set

L of nested words it generates
If P has finite data (e.g. pushdown automata, Boolean programs, recursive state machines) then L is regular
Specification S given as a regular language of nested words
Allows many properties not specifiable in classical temporal logics
PAL: instrumentation language of C programs (SPIN 2007)
Verification: Does every behavior in L satisfy S ?
Take product of P and complement of S and analyze
Runtime monitoring: Check if current execution is accepted by S (compiled as a deterministic automaton)
Model checking: Check if L is contained in S, decidable when P has finite data (no extra cost, as analysis still requires context-free reachability)

Application: Software Analysis A program P with stack-based control is modeled by a

Слайд 24

Writing Program Specifications

Intuition: Keeping track of context is easy; just skip using a

summary edge
Finite-state properties of paths, where a path can be a local path, a global path, or a mixture

Sample regular properties:
If p holds at a call, q should hold at matching return
If x is being written, procedure P must be in call stack
Within a procedure, an unlock must follow a lock
All properties specifiable in standard temporal logics (LTL)

Writing Program Specifications Intuition: Keeping track of context is easy; just skip using

Слайд 25

Temporal Logic of Nested Time: CaRet

Global paths, Local paths, Caller paths
Three versions

of every temporal modality

Sample CaRet formulas:
(if p then local-next q) global-unless r
if p then caller-eventually q
Global-always (if p then local-eventually q)

Temporal Logic of Nested Time: CaRet Global paths, Local paths, Caller paths Three

Слайд 26

So far: Nested words have appealing theoretical properties with possible applications

Common framework:

linear encoding using brackets/tags

Coming up: How do finite nested words compare with ordered trees/hedges?

So far: Nested words have appealing theoretical properties with possible applications Common framework:

Слайд 27

Linear Encoding of Nested Words

Nested word over Σ is encoded as a word

over tagged alphabet <Σ>
For each symbol a, call , internal a
Two views are isomorphic: every word over <Σ> corresponds to a nested word over Σ
Linear view useful for streaming, and word operations such as prefixes
Number of nested words of length k: (3 |Σ|)k

a1

a2

a3

a4

a5

a6

a7

a8

a9

a10

a11

a12

Linear Encoding of Nested Words Nested word over Σ is encoded as a