
CS 3304 Comparative Languages

  • Lecture 15: Midterm Exam Review
  • 13 March 2012

Why Study Programming Languages? (Ch. 1)

  • Help you choose a language.
  • Make it easier to learn new languages: some languages are similar, and it is easy to walk down a family tree.
  • Help you make better use of whatever language you use:
    • Understand obscure features.
    • Understand implementation costs: choose between alternative ways of doing things, based on knowledge of what will be done underneath.
    • Figure out how to do things in languages that don't support them explicitly.

Language Groups

  • Imperative:
    • von Neumann (Fortran, Pascal, Basic, C).
    • Object-oriented (Smalltalk, Eiffel, C++?).
    • Scripting languages (Perl, Python, JavaScript, PHP).
  • Declarative:
    • Functional (Scheme, ML, pure Lisp, FP).
    • Logic, constraint-based (Prolog, VisiCalc, RPG).
  • Imperative languages, particularly the von Neumann languages, predominate:
    • They will occupy the bulk of our attention.
  • We also plan to spend a lot of time on functional and logic languages.

The von Neumann Architecture (Ch. 2)

  • Fetch-execute cycle (on a von Neumann architecture computer):

        initialize the program counter
        repeat forever
            fetch the instruction pointed to by the counter
            increment the counter
            decode the instruction
            execute the instruction
        end repeat

Programming Methodologies

  • 1950s and early 1960s - Simple applications; worry about machine efficiency.
  • Late 1960s - People efficiency became important; readability, better control structures:
    • Structured programming.
    • Top-down design and step-wise refinement.
  • Late 1970s - Process-oriented to data-oriented:
    • Data abstraction.
  • Middle 1980s - Object-oriented programming:
    • Data abstraction + inheritance + polymorphism.

Compilation vs. Interpretation

  • Not opposites.
  • Not a clear-cut distinction.
  • Interpretation:
    • Greater flexibility.
    • Better diagnostics (error messages).
  • Compilation:
    • Better performance.

Compilation Phases

  • (This slide was a diagram of the compilation phases; it is not reproduced in this text version.)

Defining Languages

  • Recognizers:
    • A recognition device reads input strings over the alphabet of the language and decides whether the input strings belong to the language.
    • Example: syntax analysis part of a compiler (scanning).
  • Generators:
    • A device that generates sentences of a language.
    • One can determine if the syntax of a particular sentence is syntactically correct by comparing it to the structure of the generator.

Regular Expressions

  • A regular expression is one of the following:
    • A character.
    • The empty string, denoted by ε.
    • Two regular expressions concatenated.
    • Two regular expressions separated by | (i.e., or).
    • A regular expression followed by the Kleene star (zero or more repetitions).
  • Numerical literals in Pascal may be generated by the following:
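    • A sketch of such a definition, following the textbook's example (details of real Pascal may differ slightly):

          number   → integer | real
          integer  → digit digit*
          real     → integer exponent | decimal ( exponent | ε )
          decimal  → digit* ( . digit | digit . ) digit*
          exponent → ( e | E ) ( + | - | ε ) integer
          digit    → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9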

Context-Free Grammars

  • Context-Free Grammars:
    • Developed by Noam Chomsky in the mid-1950s.
    • Language generators, meant to describe the syntax of natural languages.
    • Define a class of languages called context-free languages.
  • Backus-Naur Form (1959):
    • Invented by John Backus to describe Algol 58.
    • BNF is equivalent to context-free grammars (CFGs).
  • A CFG consists of:
    • A set of terminals T.
    • A set of non-terminals N.
    • A start symbol S (a non-terminal).
    • A set of productions.
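  • For example, a toy grammar (a hypothetical illustration, not from the slides) with T = { id, +, (, ) }, N = { expr }, and start symbol expr:

        expr → expr + expr
        expr → ( expr )
        expr → id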

BNF Fundamentals

  • In BNF, abstractions are used to represent classes of syntactic structures: they act like syntactic variables (also called nonterminal symbols, or simply nonterminals).
  • Terminals are lexemes or tokens.
  • A rule has a left-hand side (LHS), which is a nonterminal, and a right-hand side (RHS), which is a string of terminals and/or nonterminals.
  • Nonterminals are often italic or enclosed in angle brackets.
    • Examples of BNF rules:
      • <ident_list> → identifier | identifier, <ident_list>
      • <if_stmt> → if <logic_expr> then <stmt>
  • Grammar: a finite non-empty set of rules.
  • A start symbol is a special element of the nonterminals of a grammar.

Scanner Responsibilities

  • Tokenizing source.
  • Removing comments.
  • (Often) dealing with pragmas (i.e., significant comments).
  • Saving text of identifiers, numbers, strings.
  • Saving source locations (file, line, column) for error messages.

Deterministic Finite Automaton

  • Pictorial representation of a scanner for calculator tokens, in the form of a finite automaton.
  • This is a deterministic finite automaton (DFA):
    • Lex, scangen, ANTLR, etc. build these things automatically from a set of regular expressions.
    • Specifically, they construct a machine that accepts the language.

Building Scanners

  • Scanners tend to be built three ways:
    • Ad-hoc.
    • Semi-mechanical pure DFA (usually as nested case statements).
    • Table-driven DFA.
  • Ad-hoc generally yields the fastest, most compact code by doing lots of special-purpose things, though good automatically-generated scanners come very close.
  • Writing a pure DFA as a set of nested case statements is a surprisingly useful programming technique (Figure 2.11):
    • Though for quick one-off jobs it is often easier to use perl, awk, sed, or similar tools.
  • Table-driven DFA is what lex and scangen produce:
    • lex (flex): C code
    • scangen: numeric tables and a separate driver (Figure 2.12).
    • ANTLR: Java code.
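  • A minimal C sketch of the table-driven idea (hypothetical tables for a scanner that accepts unsigned integer literals; real generated scanners add token attributes, buffering, and error recovery):

        #include <stdio.h>

        #define N_STATES 2
        #define N_CHARS  256

        /* State 0: start; state 1: in a number (accepting); -1: reject. */
        static int trans[N_STATES][N_CHARS];
        static int accepting[N_STATES] = { 0, 1 };

        static void init_tables(void) {
            for (int s = 0; s < N_STATES; s++)
                for (int c = 0; c < N_CHARS; c++)
                    trans[s][c] = -1;          /* default: no transition */
            for (int c = '0'; c <= '9'; c++) {
                trans[0][c] = 1;               /* start -> in-number */
                trans[1][c] = 1;               /* stay in-number */
            }
        }

        /* Drive the DFA over s; return 1 if the whole string is accepted. */
        static int scan(const char *s) {
            int state = 0;
            for (; *s; s++) {
                state = trans[state][(unsigned char)*s];
                if (state < 0) return 0;
            }
            return accepting[state];
        }

        int main(void) {
            init_tables();
            printf("%d\n", scan("42"));    /* prints 1 */
            printf("%d\n", scan("4x2"));   /* prints 0 */
            return 0;
        }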

Parsing

  • By analogy to regular expressions and DFAs, a context-free grammar (CFG) is a generator for a context-free language (CFL):
    • A parser is a language recognizer.
  • There are infinitely many grammars for every context-free language:
    • Not all grammars are created equal, however.
  • It turns out that for any CFG we can create a parser that runs in O(n³) time.
  • There are two well-known parsing algorithms that permit this:
    • Earley's algorithm.
    • Cocke-Younger-Kasami (CYK) algorithm.
  • O(n³) is unacceptable for a parser in a compiler: too slow.

Faster Parsing

  • Fortunately, there are large classes of grammars for which we can build parsers that run in linear time:
    • The two most important classes are called LL and LR.
  • LL stands for ‘Left-to-right, Leftmost derivation’.
  • LR stands for ‘Left-to-right, Rightmost derivation’.
  • LL parsers are also called ‘top-down’, or 'predictive' parsers & LR parsers are also called ‘bottom-up’, or 'shift-reduce' parsers.
  • There are several important sub-classes of LR parsers.
    • Simple LR parser (SLR).
    • Look-ahead LR parser (LALR).
  • We won't be going into detail on the differences between them.

LL Parsing

  • Like the bottom-up grammar, this one (sketched after this list) captures associativity and precedence, but most people don't find it as pretty:
    • For one thing, the operands of a given operator aren't in a RHS together!
    • However, the simplicity of the parsing algorithm makes up for this weakness.
  • How do we parse a string with this grammar?
    • By building the parse tree incrementally.
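  • The LL(1) expression grammar in question is not reproduced on this slide; it looks roughly like the textbook's version:

        expr        → term term_tail
        term_tail   → add_op term term_tail | ε
        term        → factor factor_tail
        factor_tail → mult_op factor factor_tail | ε
        factor      → ( expr ) | id | number
        add_op      → + | -
        mult_op     → * | /

    Note how the two operands of an add_op never appear together in a single right-hand side.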

LR Parsing

  • LR parsers are almost always table-driven:
  • Like a table-driven LL parser, an LR parser uses a big loop in which it repeatedly inspects a two-dimensional table to find out what action to take.
  • Unlike the LL parser, the LR driver has non-trivial state (like a DFA), and the table is indexed by current input token and current state.
  • The stack contains a record of what has been seen SO FAR (NOT what is expected).

Core Issues (Ch. 3)

  • The early development of programming languages was driven by two complementary goals, machine independence and ease of programming.
  • Machine Independence: a programming language should not rely on the features of any particular instruction set for its efficient implementation (e.g., Java).
  • Ease of programming: more elusive; more a matter of aesthetics and trial and error than of science.
  • Core issues for the midterm:
    • Names, scopes, and bindings: Chapter 3.
    • Control-flow constructs: Chapter 6.
    • Data types: Chapter 7.

Name, Scope, and Binding

  • A name is a mnemonic character string used to represent something else:
    • Most names are identifiers.
    • Symbols (like '+') can also be names.
  • A binding is an association between two things, such as a name and the thing it names.
  • The scope of a binding is the part of the program (textually) in which the binding is active.
  • Binding Time is the point at which a binding is created or, more generally, the point at which any implementation decision is made.
  • The terms “static” and “dynamic” are generally used to refer to things bound before run time and at run time, respectively.

Storage Allocation Mechanisms

  • Static: objects are given an absolute address that is retained throughout the program’s execution.
  • Stack: objects are allocated and deallocated in last-in, first-out order, usually in conjunction with subroutine calls and returns.
  • Heap: objects may be allocated and deallocated at arbitrary times. They require a more general (and expensive) storage management algorithm.

Static Allocation Examples

  • Global variables: accessible throughout the program.
  • Code: the machine instructions.
  • Static local variables: retain their values from one invocation to the next (see the C sketch after this list).
  • Explicit constants (including strings, sets, etc.):
    • Small constants may be stored in the instructions.
  • Tables: most compilers produce a variety of tables used by runtime support routines (debugging, dynamic-type checking, garbage collection, exception handling).
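  • A minimal C sketch of a statically allocated local variable (assumes any C99 compiler):

        #include <stdio.h>

        /* count lives at a fixed address for the whole run of the
           program, even though it is visible only inside next_id. */
        int next_id(void) {
            static int count = 0;   /* initialized once, retains its value */
            return ++count;
        }

        int main(void) {
            printf("%d\n", next_id());   /* prints 1 */
            printf("%d\n", next_id());   /* prints 2 */
            return 0;
        }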

Stack-Based Allocation

  • Central stack for parameters, local variables and temporaries.
  • Why a stack?
    • Allocate space for recursive routines: not necessary if no recursion.
    • Reuse space: useful in all programming languages, even without recursion.
  • Contents of a stack frame (Figure 3.1): arguments and returns, local variables, temporaries, bookkeeping (saved registers, line number, static link, etc.).
  • Local variables and arguments are assigned fixed offsets from the stack pointer or frame pointer at compile time.
  • Maintenance of the stack is the responsibility of the calling sequence and the subroutine prologue and epilogue:
    • Space may be saved by putting as much as possible in the prologue and epilogue.
    • Time may be saved by putting work in the caller instead, or by combining what is known in both places (interprocedural optimization).

Heap-Based Allocation

  • Heap: a region of storage in which subblocks can be allocated and deallocated at arbitrary times.
  • Heap space management: speed vs. space tradeoffs.
  • Space concerns:
    • Internal fragmentation: allocating a block larger than required.
    • External fragmentation: unused space is fragmented so the ability to meet allocation requests degrades over time.

Static Scoping

  • Static (lexical) scope rules: a scope is defined in terms of the physical (lexical) structure of the program:
    • The determination of scopes can be made by the compiler.
    • All bindings for identifiers can be resolved by examining the program.
    • Typically, we choose the most recent, active binding made at compile time.
    • Most compiled languages, C and Pascal included, employ static scope rules.
  • The classical example of static scope rules is the most closely nested rule used in block structured languages such as Algol 60 and Pascal (nested subroutines):
    • An identifier is known in the scope in which it is declared and in each enclosed scope, unless it is re-declared in an enclosed scope.
    • To resolve a reference to an identifier, we examine the local scope and statically enclosing scopes until a binding is found.
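  • A small C illustration of the most-closely-nested rule (C nests blocks rather than subroutines):

        #include <stdio.h>

        int x = 1;                  /* outermost binding of x */

        int main(void) {
            int x = 2;              /* re-declares x; hides the global */
            {
                int x = 3;          /* innermost binding */
                printf("%d\n", x);  /* prints 3: most closely nested x */
            }
            printf("%d\n", x);      /* prints 2: the enclosing block's x */
            return 0;
        }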

Dynamic Scoping

  • The key idea in static scope rules is that bindings are defined by the physical (lexical) structure of the program.
  • With dynamic scope rules, bindings depend on the current state of program execution:
    • They cannot always be resolved by examining the program because they are dependent on calling sequences.
    • To resolve a reference, we use the most recent, active binding made at run time.
  • Dynamic scope rules are usually encountered in interpreted languages:
    • Early LISP dialects assumed dynamic scope rules.
  • Such languages do not normally have type checking at compile time because type determination isn’t always possible when dynamic scope rules are in effect.
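  • A C sketch of the contrast (C is statically scoped; the comments note what a dynamically scoped language would do instead):

        #include <stdio.h>

        int x = 1;                /* global binding of x */

        void print_x(void) {      /* refers to a non-local x */
            printf("%d\n", x);
        }

        void p(void) {
            int x = 2;            /* new binding of x, active while p runs */
            (void)x;              /* silence the unused-variable warning */
            print_x();
        }

        int main(void) {
            /* Prints 1: static scoping resolves x in print_x to the global.
               Under dynamic scope rules it would print 2, the most recent
               active binding at the time of the call. */
            p();
            return 0;
        }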

Binding of Referencing Environments

  • The referencing environment of a statement at run time is the set of active bindings.
  • A referencing environment corresponds to a collection of scopes that are examined (in order) to find a binding.
  • Scope rules determine that collection and its order.
  • Binding rules determine which instance of a scope should be used to resolve references when calling a procedure that was passed as a parameter:
    • They govern the binding of referencing environments to formal procedures.
  • Shallow binding: the referencing environment is the one in effect when the subroutine is actually called.
  • Deep binding: the referencing environment is the one in effect when the subroutine was passed as a parameter.

Semantic Analyzer (Ch. 4)

  • The principal job of the semantic analyzer is to enforce static semantic rules:
    • Constructs a syntax tree (usually first).
    • Information gathered is needed by the code generator.
    • This interface is a boundary between the front end and the back end.
  • There is a considerable variety in the extent to which parsing, semantic analysis, and intermediate code generation are interleaved.
    • Fully separated phases: a full parse tree, a syntax tree, and semantic check.
    • Fully interleaved phases: no need to build both parse and syntax trees.
  • A common approach interleaves construction of a syntax tree with parsing (no explicit parse tree), follows with separate, sequential phases for semantic analysis and code generation.

Static Analysis

  • Compile-time algorithms that predict run-time behavior.
  • It is precise if it allows the compiler to determine whether a given program will always follow the rules: type checking.
  • Also useful when not precise: a combination of compile time check and code for run time checking.
  • Static analysis is also used for code improvement:
    • Alias analysis: when values can be safely cached in registers.
    • Escape analysis: all references to a value confined to a given context.
    • Subtype analysis: an OO variable is of a certain subtype.
  • Unsafe and speculative optimization.
  • Conservative and optimistic compilers.
  • Some languages have tighter semantic rules to avoid dynamic checking.

Attribute Grammars

  • Both semantic analysis and (intermediate) code generation can be described in terms of annotation, or “decoration” of a parse or syntax tree.
  • Attribute grammars provide a formal framework for decorating such a tree.
  • The attribute grammar serves to define the semantics of the input program.
  • Attribute rules are best thought of as definitions, not assignments.
  • They are not necessarily meant to be evaluated at any particular time, or in any particular order, though they do define their left-hand side in terms of the right-hand side.

Synthesized Attributes

  • The S-attributed grammar uses only synthesized attributes.
  • Its attribute flow (attribute dependence graph) is purely bottom-up.
  • The arguments to semantic functions in an S-attributed grammar are always attributes of symbols on the right-hand side of the current production.
  • The return value is always placed into an attribute of the left hand side of the production.
  • The intrinsic properties of tokens are synthesized attributes initialized by the scanner.
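  • A standard small illustration (not from the slides): an S-attributed grammar for constant expressions, in which the synthesized attribute val flows strictly bottom-up:

        E → E1 + T      E.val := E1.val + T.val
        E → T           E.val := T.val
        T → const       T.val := const.val   (set by the scanner)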

Inherited Attributes

  • Inherited attributes: values are calculated when their symbol is on the right-hand side of the current production.
  • Contextual information flows into a symbol from above or from the side, providing context.
  • Symbol table information is commonly passed by means of inherited attributes.
  • Inherited attributes of the root of the parse tree can be used to represent external environment.
  • Example: left-to-right associativity may create a situation where an S-attributed grammar would be cumbersome to use. By passing attribute values left-to-right in the tree, things are much simpler.

Parsers and Attribute Grammars

  • Each synthesized attribute of a LHS symbol (by definition of synthesized) depends only on attributes of its RHS symbols.
    • A bottom-up parser: in general paired with an S-attributed grammar.
  • Each inherited attribute of a RHS symbol (by definition of L-attributed) depends only on:
    • Inherited attributes of the LHS symbol, or
    • Synthesized or inherited attributes of symbols to its left in the RHS.
    • A top-down parser: in general paired with an L-attributed grammar.
  • There are certain tasks, such as generation of code for short-circuit Boolean expression evaluation, that are easiest to express with non-L-attributed attribute grammars.
  • Because of the potential cost of complex traversal schemes, however, most real-world compilers insist that the grammar be L-attributed.

Translation Scheme

  • There are automatic tools that construct a semantic analyzer (attribute evaluator) for a given attribute grammar. In other words, they generate translation schemes for context-free grammars or tree grammars (which describe the possible structure of a syntax tree):
    • These tools are heavily used in syntax-based editors and incremental compilers.
    • Most ordinary compilers, however, use ad-hoc techniques.
  • Most production compilers use an ad hoc, handwritten translation scheme:
    • Interleave parsing with at least the initial construction of a syntax tree.
    • Possibly all of semantic analysis and intermediate code generation.
    • Since the attributes of each production are evaluated as the production is parsed, there is no need for the full parse tree.

Action Routines

  • An ad-hoc translation scheme that is interleaved with parsing takes the form of a set of action routines:
    • An action routine is a semantic function that we tell the compiler to execute at a particular point in the parse.
    • Semantic analysis & code generation interleaved with parsing: action routines can be used to perform semantic checks and generate code.
  • LL parser generator: an action routine can appear anywhere within a right-hand side.
  • Implementation: when the parser predicts a production, it pushes all of the right-hand side onto the stack.
  • If semantic analysis & code generation are separate phases, then action routines can be used to build a syntax tree:
    • A parse tree could be built completely automatically.
    • We wouldn't need action routines for that purpose.

Space Management for Attributes

  • If there is a parse tree, the attributes can be stored in nodes.
  • For a bottom-up parser with an S-attributed grammar, maintain an attribute stack mirroring the parse stack:
    • Next to every state number is an attribute record for the symbol shifted when entering the state.
    • Entries are pushed and popped automatically.
  • For a top-down parser with an L-attributed grammar:
    • Automatic: an attribute stack that does not mirror the parse stack.
    • Short-cutting copy rules: action routines allocate and deallocate space for attributes explicitly.
  • Contextual information:
    • Symbol table that always represents the current referencing environment.

Chapter 5

  • Not included in the exam.

Control Flow (Ch. 6)

  • Control flow (or ordering) in program execution.
  • Eight principal categories of language mechanisms used to specify ordering:
    • Sequencing.
    • Selection.
    • Iteration.
    • Procedural abstraction.
    • Recursion.
    • Concurrency.
    • Exception handling and speculation.
    • Nondeterminacy.
  • The relative importance of different categories of control flow varies significantly among the different classes of programming languages.

Expression Evaluation

  • Expression:
    • A simple object: e.g., a literal constant, a named variable or constant.
    • An operator or function applied to a collection of operands or arguments, each of which in turn is an expression.
  • Function calls: a function name followed by a parenthesized, comma-separated list of arguments.
  • Operator: built-in function that uses special, simple syntax – one or two arguments, no parentheses or commas.
    • Sometimes operators are “syntactic sugar” for more “normal” looking functions (in C++, a+b is short for a.operator+(b)).
  • Operand: an argument of an operator.

Precedence and Associativity

  • Infix notation requires the use of parentheses to avoid ambiguity.
  • The choice among alternative evaluation orders depends on the precedence and associativity of the operators:
    • C has very rich precedence structure: problems with remembering all the precedence levels (15 levels).
    • Pascal has relatively flat precedence hierarchy (3 levels).
    • APL and Smalltalk: all operators are of equal precedence.
  • Associativity rules specify whether sequences of operators of equal precedence group to the right or to the left:
    • Usually the operators associate left-to-right.
    • Fortran: the exponentiation operator ** associates right-to-left.
    • C: the assignment operator associates right-to-left.
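  • A small C illustration of both cases:

        #include <stdio.h>

        int main(void) {
            int a, b, c;
            c = 10 - 4 - 3;   /* '-' associates left-to-right: (10-4)-3 == 3 */
            a = b = c;        /* '=' associates right-to-left: a = (b = c)   */
            printf("%d %d %d\n", a, b, c);   /* prints: 3 3 3 */
            return 0;
        }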

References and Values

  • Subtle but important differences in the semantics of assignment in different imperative languages.
    • Based on the context, a variable may refer to the value of the variable (r-value) or its location (l-value) – a named container for a value.
  • Value model of variables: an expression can be either an l-value or an r-value, based on the context in which it appears.
    • Built-in types can’t be passed uniformly to methods expecting class type parameters: wrapper classes, automatic boxing/unboxing.
  • Reference model of variables: a variable is a named reference for a value – every variable is an l-value.
    • E.g., integer values (like 2) are immutable.
    • A variable has to be dereferenced to obtain its value.
  • (Figure: after b := 2 and c := b, the value model gives a, b, and c separate boxes holding 4, 2, and 2, while under the reference model b and c both refer to the same object 2.)
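  • A C sketch of the distinction (C itself uses the value model; the pointer version only mimics the reference model):

        #include <stdio.h>

        int main(void) {
            /* Value model: c = b copies the value; b and c stay independent. */
            int b = 2, c = b;
            b = 5;
            printf("%d %d\n", b, c);    /* prints: 5 2 */

            /* Reference model (mimicked with pointers): both names refer
               to the same object, so an update is visible through either. */
            int obj = 2;
            int *p = &obj, *q = p;
            *p = 5;
            printf("%d %d\n", *p, *q);  /* prints: 5 5 */
            return 0;
        }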

Structured and Unstructured Flow

  • Assembly language: conditional and unconditional branches.
  • Early Fortran: relied heavily on goto statements (and labels):

        IF (A .LT. B) GOTO 10
        ...
     10 ...

  • Late 1960s: the move away from goto statements began.
  • Move to structured programming in 1970s:
    • Top-down design (progressive refinement).
    • Modularization of code.
    • Descriptive variable names.
  • Within a subroutine, a well-designed imperative algorithm can be expressed with only sequencing, selection, and iteration.
  • Most of the structured control-flow constructs were introduced by Algol 60.

Sequencing

  • The principal means of controlling the order in which side effects occur.
  • Compound statement: a delimited list of statements.
  • Block: a compound statement optionally preceded by a set of declarations.
  • The value of a list of statements:
    • The value of its final element (Algol 68).
    • The programmer's choice (Common Lisp – not purely functional).
  • Can have side effects; very imperative, von Neumann.
  • There are situations where side effects in functions are desirable: random number generators.
  • Euclid and Turing: functions are not permitted to have side effects.

Selection

  • Selection statement: mostly some variant of if…then…else.
  • Languages differ in the details of the syntax.
  • Short-circuited conditions:
    • The Boolean expression is not used to compute a value but to cause control to branch to various locations.
    • Provides a way to generate efficient (jump) code.
    • Parse tree: inherited attributes of the root inform it of the address to which control should branch. For example, the source

          if ((A > B) and (C > D)) or (E ≠ F) then
              then_clause
          else
              else_clause

      yields jump code such as:

              r1 := A
              r2 := B
              if r1 <= r2 goto L4
              r1 := C
              r2 := D
              if r1 > r2 goto L1
          L4: r1 := E
              r2 := F
              if r1 = r2 goto L2
          L1: then_clause
              goto L3
          L2: else_clause
          L3:

Iteration

  • Iteration: a mechanism that allows a computer to perform similar operations repeatedly.
  • Favored in imperative languages.
  • Mostly some form of loops executed for their side effects (C examples follow this list):
    • Enumeration-controlled loops: executed once for every value in a given finite set.
    • Logically controlled loops: executed until some Boolean condition changes value.
    • Combination loops: combines the properties of enumeration-controlled and logically controlled loops (Algol 60).
    • Iterators: executed over the elements of a well-defined set (often called containers or collections in object-oriented code).
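  • A small C illustration of the first two kinds:

        #include <stdio.h>

        int main(void) {
            /* Enumeration-controlled: once for each value in a finite set. */
            for (int i = 0; i < 3; i++)
                printf("i = %d\n", i);

            /* Logically controlled: until a Boolean condition changes value. */
            int n = 40;
            while (n >= 10)
                n /= 2;
            printf("n = %d\n", n);   /* prints: n = 5 */
            return 0;
        }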

Recursion

  • Recursion requires no special syntax: why?
  • Recursion and iteration are equally powerful.
  • Most languages provide both iteration (more “imperative”) and recursion (more “functional”).
  • Tail-recursive function: additional computation never follows a recursive call. The compiler can reuse the space, i.e., there is no need for dynamic allocation of stack space:

        int gcd(int a, int b) {
            if (a == b) return a;
            else if (a > b) return gcd(a - b, b);
            else return gcd(a, b - a);
        }

  • Sometimes simple transformations are sufficient to produce tail-recursive code: continuation-passing style.

Data Types (Ch. 7)

  • Implicit context for many operations:
    • The programmer does not have to specify the context explicitly.
    • Example: in C, the expression a+b will use integer addition if a and b are integers, floating-point addition if a and b are floating-point values.
  • Limit the set of operations that may be performed in a semantically valid program:
    • Example: prevent adding a character and a record.
    • Type checking cannot prevent all meaningless operations.
    • It catches enough of them to be useful.

Classification of Types

  • Discrete (ordinal) types – countable: integer, boolean, char, enumeration, and subrange.
  • Scalar (simple) types - one-dimensional: discrete, rational, real, and complex.
  • Composite types:
    • Records (structures).
    • Variant records (unions).
    • Arrays; strings are arrays of characters.
    • Sets: the mathematical powerset of their base types.
    • Pointers: l-values.
    • Lists: no notion of mapping or indexing.
    • Files.

Type Systems

  • A type system consists of:
    • A mechanism to define types and associate them with certain language constructs.
    • A set of rules for type equivalence, type compatibility, type inference:
      • Type equivalence: when are the types of two values the same?
      • Type compatibility: when can a value of type A be used in a context that expects type B?
      • Type inference: what is the type of an expression, given the types of the operands?
  • Compatibility is the more useful concept, because it tells you what you can do.
  • Polymorphism results when the compiler finds that it doesn't need to know certain things.
  • Subroutines need to have types if they are first- or second-class values.

Type Checking

  • Type checking is the process of ensuring that a program obeys the language’s type compatibility rules.
  • Type clash: a violation of these rules.
  • Strong typing means that the language prevents you from applying an operation to data on which it is not appropriate.
  • Static typing: compiler can check all at compile time.
  • Examples:
    • Common Lisp is strongly typed, but not statically typed.
    • Ada is statically typed, Pascal is almost statically typed.
    • C is less strongly typed than Pascal, e.g., unions and subroutines with variable numbers of parameters.
    • Java is strongly typed, with a non-trivial mix of things that can be checked statically and things that have to be checked dynamically.
    • Scripting languages are generally dynamically typed.

Type Equivalence

  • Two major approaches – structural and name equivalence:
    • Name equivalence is based on declarations.
    • Structural equivalence is based on some notion of meaning behind those declarations. The exact definition varies.
    • Name equivalence is more fashionable these days.
  • There are at least two common variants on name equivalence:
    • The differences between all these approaches boil down to where you draw the line between important and unimportant differences between type descriptions.
    • In all three schemes described in the book, we begin by putting every type description in a standard form that takes care of “obviously unimportant” distinctions like those on the next slide.

Name Equivalence

  • How about type aliasing (TYPE new_type = old_type;)? Whether the alias is a new type is language dependent.
  • An example where the aliases arguably should not be the same type:

        TYPE celsius_temp    = REAL;
             fahrenheit_temp = REAL;
        VAR  c : celsius_temp;
             f : fahrenheit_temp;
        ...
        f := c;   (* legal under loose name equivalence *)

  • Variants of name equivalence:
    • Strict name equivalence: aliased types are considered distinct.
    • Loose name equivalence: aliased types are considered equivalent.
  • Tricky to implement in the presence of separate compilation.

Type Compatibility

  • Instead of type equivalence, a value’s type must be compatible with that of the context in which it appears:
    • Assignment statement: the right-hand side type must be compatible with that of the left-hand side.
    • The types of an arithmetic operator's operands must be compatible with some common type that supports the arithmetic operation.
  • The definition of type compatibility varies:
    • Ada: type S is compatible with an expected type T if and only if one of the following holds:
      • S and T are equivalent.
      • One is a subtype of the other or both are subtypes of the same base type.
      • Both are arrays, with the same numbers and types of elements in each dimension.

Universal Reference Types

  • Used for systems programming or for general-purpose container objects:
    • C, C++: void *.
    • Clu: any.
    • Java: Object.
  • Arbitrary l-values can be assigned into an object of universal reference type with no concern about type safety, since the compiler will not allow any type-specific operation to be performed through the universal reference.
  • Problems arise with the reverse assignment (back to a specific type):
    • Objects must be made self-descriptive so the type can be checked.
    • If they are not, there is no way to identify their type at run time.
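  • A C sketch (void * accepts anything, but the reverse direction needs an unchecked cast, since C objects are not self-descriptive):

        #include <stdio.h>

        int main(void) {
            int i = 42;
            double d = 3.14;
            void *any = &i;        /* any object's address goes in freely */
            any = &d;              /* ...regardless of its type           */

            /* Coming back out requires a cast; C cannot verify it at
               run time, so a wrong cast here would go undetected. */
            double *pd = (double *)any;
            printf("%f\n", *pd);   /* prints: 3.140000 */
            return 0;
        }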

Composite Data Types

  • Supporting composite data types (arrays, strings, sets, pointers, lists, and files) involves additional syntactic, semantic, and pragmatic issues.
  • Pointer related issues require a more detailed discussion of the value and reference models of variables and of the heap management issues.
  • Input and output mechanisms are important when dealing with files.

Records (Structures)

  • Record types allow related data of heterogeneous types to be stored and manipulated together.
  • Usually laid out contiguously.
  • Possible holes for alignment reasons.
  • Compilers keep track of the offset of each field within each record type.
  • Smart compilers may re-arrange fields to minimize holes (C compilers promise not to).
  • Implementation problems are caused by records containing dynamic arrays but we won't be going into that in any detail.
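  • A C sketch of a record with an alignment hole (exact sizes and padding are implementation dependent):

        #include <stdio.h>

        struct employee {
            char initial;    /* 1 byte                                  */
            /* hole: padding so that salary is suitably aligned         */
            double salary;   /* typically 8 bytes, 8-byte aligned       */
            int age;         /* 4 bytes, often followed by trailing pad */
        };

        int main(void) {
            /* On a typical 64-bit ABI this prints 24, not 1 + 8 + 4 = 13. */
            printf("%zu\n", sizeof(struct employee));
            return 0;
        }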

Arrays

  • Arrays are the most common and important composite data types.
  • Unlike records, which group related fields of disparate types, arrays are usually homogeneous.
  • Semantically, they can be thought of as a mapping from an index type to a component or element type:
    • Index type: integer, or any discrete type.
    • Element type: scalar only (Fortran 77), or any type.
  • Associative arrays (nondiscrete index types):
    • Implemented with hash tables or search trees.
    • Supported by the standard libraries of object-oriented languages.

Pointers

  • Pointers serve two purposes:
    • Efficient (and sometimes intuitive) access to elaborated objects (C).
    • Dynamic creation of linked data structures, in conjunction with a heap storage manager.
  • Pointers are used with a value model of variables:
    • Pointers (high-level concept) are not addresses (low-level concept).
    • They are not needed with a reference model.
  • Several languages (e.g. Pascal) restrict pointers to accessing things in the heap:
    • How and when is storage reclaimed for objects no longer needed?
    • Many languages require the programmer to explicitly reclaim space:
      • Memory leak: failure to reclaim space for objects no longer needed.
      • Dangling reference: space is reclaimed for an object that is still in use.
      • Garbage collection: automatic storage reclamation.
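  • A C sketch of the two failure modes (both are bugs, shown only for illustration):

        #include <stdlib.h>

        int main(void) {
            /* Memory leak: the only pointer to the first block is
               overwritten, so its space can never be reclaimed. */
            int *p = malloc(sizeof *p);
            p = malloc(sizeof *p);

            /* Dangling reference: q still points at reclaimed storage. */
            int *q = p;
            free(p);
            /* *q = 1;  -- would write through the dangling reference */
            (void)q;
            return 0;
        }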

Lists

  • List is defined recursively:
    • The empty list.
    • A pair consisting of an object (list or atom) and another (shorter) list.
  • Suited to functional and logic languages, but also used in imperative languages:
    • ML lists are homogeneous: a chain of blocks, each of which contains an element and a pointer to the next block.
    • Lisp lists are heterogeneous: a chain of cons cells, each containing two pointers, one to the element and one to the next cons cell.
  • List notation:
    • ML: enclosed in square brackets, with elements separated by commas, [a, b, c, d].
    • Lisp: enclosed in parentheses, with elements separated by whitespace, (a b c d).

Midterm Exam

  • The midterm exam is a closed book exam.
  • It covers Chapters 1-7 (except Chapter 5).
  • Two parts:
    • 10 quiz questions, 2 points each, 20 points total.
    • 4 essay/problem questions, 20 points each, 80 points total.
  • Questions related to the homework and programming assignments are possible.
