1*1f5207b7SJohn Levon- A slightly edited irc discussion with Josh Triplett. 2*1f5207b7SJohn Levon- Describes most data structures used in sparse. 3*1f5207b7SJohn Levon 4*1f5207b7SJohn LevonAs far as the parsing structures go... 5*1f5207b7SJohn LevonThe C parser exists in two main files: parse.c, which parses statements, and expression.c, which parses expressions. 6*1f5207b7SJohn Levonparse.h contains the definition of struct statement, which represents a C statement. 7*1f5207b7SJohn LevonThat includes only those things which can't appear as an expression, which primarily includes control flow statements such as if, loops, switch/case, and goto. 8*1f5207b7SJohn Levonexpression.h contains the definition of struct expression, which represents a C expression. That has a lot more content, since most C constructs can appear in expressions. 9*1f5207b7SJohn LevonA series of statements forms a compound statement (STMT_COMPOUND). 10*1f5207b7SJohn LevonThat appears as another struct statement which has a statement_list member. 11*1f5207b7SJohn LevonA function body consists of a compound statement. 12*1f5207b7SJohn LevonWhen you look at a loop body, if or else body, or case body, you'll notice that they just have a struct statement, not a statement_list; they can have multiple statements by using a compound statement. 13*1f5207b7SJohn LevonAlso note that all loops get turned into a single "iterator" statement. 14*1f5207b7SJohn Levonfor, while, and do-while. 15*1f5207b7SJohn LevonA symbol, then, represents a name in a C file. A symbol might represent a variable, a function, a label, or various other things. 16*1f5207b7SJohn LevonSee symbol.h. 17*1f5207b7SJohn Levon"struct symbol" represents one symbol. 18*1f5207b7SJohn LevonAs with the various other structures, it has some common data and a union of sub-structures for the parts that differ between different types. 19*1f5207b7SJohn LevonMost of the interesting bits come in the NS_SYMBOL case. 20*1f5207b7SJohn LevonAmong other things, it has a struct statement for the body of a function (if any), a list of symbols for the arguments, an expression for a variable initializer, and so on. 21*1f5207b7SJohn LevonTogether, struct symbol, struct statement, and struct expression represent most of the abstract syntax tree for C. 22*1f5207b7SJohn LevonSo, that represents most of the "front-end" of Sparse: parsing C and generating that abstract syntax tree. 23*1f5207b7SJohn LevonThat much occurs in pretty much any program using the Sparse frontend. 24*1f5207b7SJohn LevonThe backend varies among programs. 25*1f5207b7SJohn LevonFor instance, the c2xml backend goes that far, then outputs XML. 26*1f5207b7SJohn LevonThe sparse static analysis backend has a few steps: it generates linearized bytecode, does some evaluation on that, and outputs some warnings. 27*1f5207b7SJohn LevonSeveral other backends run that linearized bytecode stage. 28*1f5207b7SJohn LevonThe linearized bytecode itself has a set of nested structures. 29*1f5207b7SJohn Levonlinearize.h defines all of them. 30*1f5207b7SJohn LevonAt the top level, it has struct entrypoint. 31*1f5207b7SJohn LevonThat represents an entrypoint to the code, which would normally mean a function. 32*1f5207b7SJohn LevonAn entrypoint has a list of basic blocks. 33*1f5207b7SJohn Levonstruct basic_block. 34*1f5207b7SJohn LevonA basic block represents a series of instructions with no branches. 35*1f5207b7SJohn LevonStraight-line code. 36*1f5207b7SJohn LevonA branch only occurs at the end of a basic block, and branches can only target the beginning of a basic block. 37*1f5207b7SJohn LevonTypically, a conditional will consist of a basic block leading up to the branch, a basic block for the true case, a basic block for the false case, and a basic block where the two paths merge back together. 38*1f5207b7SJohn LevonEither the true or the false case may not exist. 39*1f5207b7SJohn LevonA loop will normally have a basic block for the loop body, which can branch to the top at the end or continue to the next basic block. 40*1f5207b7SJohn LevonSo basic blocks represent a node in the control flow graph. 41*1f5207b7SJohn LevonThe edges in that graph lead from one basic block to a basic block which can follow it in the execution of the program. 42*1f5207b7SJohn LevonEach basic block has a series of instructions, "struct instruction". 43*1f5207b7SJohn Levon"enum opcode" lists all the instructions. 44*1f5207b7SJohn LevonFairly high-level instruction set, corresponding directly to bits of C. 45*1f5207b7SJohn LevonSo you have an entrypoint, which has a graph of basic blocks, each of which has a list of instructions. 46*1f5207b7SJohn LevonAn entrypoint also has a pointer to the first instruction. 47*1f5207b7SJohn LevonOne last bit of trickiness: struct pseudo. 48*1f5207b7SJohn LevonHave you ever heard of "static single assignment" or SSA form? 49*1f5207b7SJohn Levonstruct pseudo represents one of those single-assignment variables. 50*1f5207b7SJohn LevonEach one has a pointer to the symbol it represents (which may have many pseudos referencing it). 51*1f5207b7SJohn LevonEach one also has a pointer to the instruction that defines it. 52*1f5207b7SJohn LevonThat covers most of the major data structures in Sparse. 53*1f5207b7SJohn LevonNow, given all that, some of the top-level stuff in sparse.c may make more sense. 54*1f5207b7SJohn LevonFor instance, the context checking works in terms of basic blocks. 55*1f5207b7SJohn LevonHopefully some of that helped you understand Sparse better. 56