| Previous Chapter: Appendix - SQL |
This chapter is included in the manual to provide information about the design and implementation issues of Ovrimos. Some technical details are described, but none of the issues addressed here are by any means necessary for the non-technical user.
Communication Model
The communication model of the SQL Server is best represented by the following picture, where the basic modules used for the database implementation appear. The rectangles represent program modules, while the ellipses represent data structures (e.g. the shared memory). The classical cylinder representation is used for data files. No arrows appear in the picture. The connection lines denote a two-way communication.

In the connection with the sqlcore, the grey lines indicate that a request is addressed to the multiplexer, while the solid ones that the request is delegated to and handled by the appropriate processing mechanism. More on communication and request handling is described in the working ina pools section.
Parts of the model
The basic modules which compose the database are the dbmcore, the sqlcore,
the virtual processors (working ina pools), and the SQL and HTTP clients.
The dbmcore and sqlcore modules constitute the server kernel and accept
calls from the client programs. In the following paragraphs, the dbmcore
and sqlcore modules, as well as their communication and exchange methods
are analyzed.
Before proceeding to more detailed descriptions, we give a short definition of the elements used in the model.
dbmcore
It is a set of modules handling low level calls to the data base (filing
system).
sqlcore
It is a set of modules that interpret the SQL calls to low-level calls
and transmit them to the dbmcore. An HTTP server is embedded into the sqlcore,
allowing the module to double as a Web server.
The client-server model, in combination with the fact that the two basic kernel parts reside on the network server, minimizes useless data traffic and maximizes the speed of execution of SQL commands. The communication protocol between the server part and the client applications is TCP/IP.
network endpoints and working ina
pools (wip)
Before reaching the sqlcore, the client calls for initiating a connection
are "caught" by a multiplexer that, depending on the call, delegates it
to an SQL serving mechanism (ina) or an HTTP serving mechanism (ina). The
communication protocol at this point is TCP/IP. Before accepting a call,
the multiplexer checks the working ina pool for a free processing mechanism.
SQL-clients
SQL clients are either SQL terminal programs or any other application
(Java, Perl, etc.) that uses SQL.
HTTP clients
HTTP clients are Web browsers or programs and scripts that use the
HTTP protocol.
The network endpoint communication with the database is achieved through a protocol where a shared memory and three semaphores are used. Details on the use of semaphores are presented in paragraph dbmcore/sqlcore communication protocol.
Connection terminology
Four terms, namely session, session group, statement and connection
are used throughout this technical part of the manual and they are explained
below to clarify an unavoidable overloading of these words in the context
of a database user's guide.
Session and Session group
Every application program retains one or more sessions, which are grouped
in session groups. In its simplest definition, a session is a communication
channel, identified uniquely through a handle (session handle). A group
of sessions is a session group. More on these two terms is written in the
paragraph where the dbmcore communication
protocol is described.
Statement and Connection
For every SQL application there is an active entity through which the
SQL commands are executed. This is called a statement. Every SQL application
may have grouped statements in one or more Connections. The Connection
is a handle for these statements. A statement is a subordinate notion of
connection.
More or less, a statement corresponds to a dbmcore session and a connection
to a dbmcore session group. Therefore, all statements of a connection end
up to sessions of the same session group.
Note!
The term statement is used throughout this chapter to denote
an active entity that executes SQL commands and does not have
the meaning of an SQL statement (i.e. an SQL command). The term SQL
command is used whenever the need arises to talk about one in this
chapter.
In the following discussion, the terms table and data file,
key and index file, and row and record have
the same meaning.
The DBMCORE
The dbmcore is the central module of Ovrimos. Together with the sqlcore
they constitute the kernel of the product. It is a layer that serves low
level calls to the relational filing system.
The basic submodules of the dbmcore are the following:
Dbmcore/Sqlcore communication protocol
The communication to the dbmcore is demand driven. The implementation
of a shared memory data structure mandates the use of a semaphore that
allows access to the shared memory. For the dispatching of a message from
the shared memory to the dbmcore and vice-versa, two more semaphores are
used.
The three semaphores are supported by the operating system where the
dbmcore and the sqlcore reside. Their implementation is operating system
dependent.
The steps for the timing and serving of requests are as follows:
1. The client addresses semaphore 1 (which controls message dispatching to the shared memory). The client is a statement handling ina.
4. The dbmcore reads the message and executes the request (if all other conditions are satisfied) and puts the reply into the shared memory.
5. The client is signaled, via semaphore 3, that the reply is ready and may be accessed. The dbmcore waits again on semaphore 2 for reading.
6. The client quits the critical section and releases, through semaphore 1, the shared memory, which, thus, becomes available to another client.
All messages have a limited length and correspond to a simple function,
which may be executed very fast. The testing with this architecture (in
previous applications with the use of dbmcore) shows that the multiplexing
of many sessions is performed easily and without problems.
Every message is an elementary call to the dbmcore, which is served
immediately. The only exception is when the client requests a wait-lock
on a record. In this case the client may have to repeat the same message
several times, until either a successful return, a time-out, or a definite
failure of the request.
The following picture, app-3.2, shows the main submodules of the dbmcore.
The rectangles denote parts of the program (submodules) and the data in
the hexagon rules and procedures. The arrows denote a dependence (every
rectangle with leaving arrows depends on the rectangles where the arrows
point). Even though it does not appear in the picture, the communication
is two-way, which means that when a submodule delegates the execution of
an operation to one of the submodules it depends on, it will wait for a
reply.
The lower half of the shared memory structure appears on the top of
the picture to remind that it is the only means of communication between
the dbmcore and the sqlcore, which issues the calls.

The coreserv is the upper level of the dbmcore and, basically, it is a wait loop that reads messages, placed by the client applications into a shared memory data structure. See, also the description of communication to the dbmcore via semaphores. Messages to the server are requests for low level processing in the database files, therefore the basic function of coreserv is to delegate the requests to the appropriate submodule.
Most requests to the dbmcore will end up to the database submodule
(file and index manager) for execution. But since the data file manager
should not have to deal with users, access permissions, transactions, etc.,
the coreserv first addresses the submodule dbnet, which, in turn,
addresses the submodule locktrans (lock and transaction manager).
Through the locktrans the commands to the database are integrated in a
wider frame of actions, which result to the positioning of the request
into the proper transaction and the handling of the request in a multi-user
environment.
When the message is accessed from the shared memory and is delegated
to dbnet, the session and the session_group where the message was issued
are known to the locktrans. The session parameters are passed to dbnet.
The dbnet examines if the message belongs to an active transaction. It
also makes a request for a lock (if necessary) to the lock manager. Only
when the lock permission is granted, the dbnet proceeds to pass the request
to the file manager.
The sessions of one session_group have access to the bookmarks of another
and may use the locks in common. More on bookmarks in the Data
and Index Manager paragraph.
If the process fails, then a rollback of the transaction where the
request belongs is performed.
Yet another of the core submodules is the schema. The schema
contains information of the file and index catalog. Many calls to the database
require access to this catalog.
One of the supported SQL client commands is the modification of the
catalog data (data definition). In this case, the schema addresses the
dbnet and requests proper locking actions, in order to implement the requested
modification. The coreserv reserves a session specifically for this purpose.
The lock and transaction manager is one submodule, locktrans, since these functions are tightly interwoven and there is no point in attempting to separate them. The file and index manager and the data integrity mechanisms are also described in the following sections.
The transaction is basic in all data management of Ovrimos. Commit and Rollback are used with their classical meanings and they are not described any further.
Dbmcore transactions are executed under the following rules:

T1 is an example of a simple transaction, without nesting. T2, which contains nested transactions, may be described by the following pseudocode:
Start T2
...
Statements of T2
...
Start T4
...
Statements of T4
...
The nested transaction mechanism is implemented in the dbmcore but is
not yet available to the user through the existing SQL terminal interface.
It may, however be used by sqlapp4 applications. The nesting of transactions
is possible through the nesting of cursors (declaring a cursor within a
cursor).
Checkpoint and nested transactions will be available to SQL Terminal
users in the Gold Edition of Ovrimos.
The lock manager is mandatory in a multi-user
environment for two reasons:
1. For consistency
The data available to one user are not modified (by another user) until
the first user completes any transaction with these data.
2. For integrity
The data in the database files must reflect in the correct order all
updates made to them.
Data locking is performed on the row, but there are many isolation levels, as we will see below. If no locking is done, three unwanted conditions may arise in the process of concurrent data access and modification by client requests.
1. dirty read (reading of non-existing
or incorrect data)
Suppose there is a transaction T1. T1 accesses and updates a field
X from a row.
Before the T1 transaction is committed, another transaction T2 may
'see' the updated field.
T1 is rolled back and all its updates are canceled.
Now T2 has performed a dirty read, because it has read something that
does not really exist.
2. non-repeatable read
Suppose there is a transaction T1. T1 accesses and reads one row.
Suppose there is a second transaction T2, which accesses this row,
updates it and commits the updates.
During the same transaction T1, the same row is accessed, but this
row is now modified.
Now T1 has performed a non-repeatable read, because it could not read
twice the same data from the row.
3. phantoms (sudden appearance of previously
non-existing data)
Suppose there is a transaction T1, which selects, during a query, a
set of rows, all of them satisfying a certain predicate.
Suppose there is a second transaction T2, which inserts a row, that
also satisfies the same predicate.
T1 performs the same query and accesses a set of rows, which contain
a row that was not there before.
Now T1 has read a phantom.
To avoid all these abnormal situations, there exist, as mentioned before, four isolation levels, according to the SQL standard. In particular, the four isolation levels are:
The locking in Ovrimos is made in the record level (table row). There
are two different locks:
1. read-lock
2. write-lock
The four isolation levels are achieved with the use of the locks in the following way:
1. READ UNCOMMITTED
Static information
The lower layer of Ovrimos is a file and index manager. The database
files are structured files, where every record is placed in the first available
blank or after the last record. Every data file consists of a header and
an unlimited number of records.
For every data key there is an index file, where the index is stored
in a B-tree form. The B-tree is a multi-level tree. The nodes may be terminal
or non-terminal. Index information is stored only in the B-tree leaves.
All the non-terminal nodes are used for access to the leaves.
Dynamic information
The data files grow automatically as the stored data increase and the
process is transparent to either developer or user. Similarly, the index
files are reorganized dynamically every time a leaf data exceed its storing
capacity. The leaf becomes automatically a non-terminal node with actions
that are transparent to the user.
The file manager opens files when a request is made for them and closes
the opened files only if there is a handle shortage. A memory map is used
for caching file data and about a thousand B-tree nodes are also cached,
allowing a very fast accessing of data. The data caching uses the virtual
memory facilities of the operating system. It is operating system dependent.
For every dbmcore session there is an anonymous
bookmark in every file, which shows the current record according to the
most recently used access key. If the last access was sequential, the bookmark
is not related with any key. A set of functions is available to the user
(through ODBC, Java, Perl etc., but not SQL terminal) to allow the positioning
of named bookmarks. There is no limit to the number or the place where
these bookmarks are set.
This way, the anonymous bookmark allows only one current position for
each data file (table) per session but as many named bookmarks as the user
needs.
All sessions of one session_group are allowed to have access to the
anonymous bookmarks of each other. We will return to bookmarks when we
talk about the sqlcore.
Server Actions for data integrity
Ovrimos ensures data and referential integrity.
The data integrity is guaranteed by disallowing null values in any
one component of a primary key.
The data file manager of dbmcore examines the record and allows insertion
of the row only when it complies to this rule.
Other integrity rules, such as unique values in the primary key and,
in general, of every key that is defined as having unique values are checked
at this point.
Finally, fields other than keys, defined as not null are checked to
ensure that they comply to the rule.
A log is used to record all actions performed to the database. All actions recorded on the log are based on transactions. The beginning and end points of each transaction, as well as the actions on the data base are recorded on the log and used for a possible roll back of open transactions.
The log is particularly useful to allow on-line backup. Since a previous backup exists and the log is always updated, it is possible to take a new backup any time without having to stop the server from running. A copy of the log and the previous backup are made and may be used in combination if data recovery is required. See also, Database Maintenance.
A backup is automatically taken at shutdown. Every time the database is shutdown, the log is run forward, the backup files are replaced by the new backup, the dirty bit is turned to No and, subsequently, the log is erased. When the database is started again, a new log file starts and the dirty bit is turned to Yes.
It is also necessary to have a log for data recovery from a backup after an abnormal termination, such as power failure. In this case, the log is run and the database returns to the status it was before the crash. Afterwards, all uncommitted transactions are rolled back.
The procedures of restoring the database after a soft or a hard failure
are described in Database Maintenance:
Data recovery.
The SQLCORE
The sqlcore is a set of submodules, where the SQL commands are translated
to low-level calls and are passed to the dbmcore of Ovrimos.
The hub part of the sqlcore is the SQL compiler, which receives the
SQL commands and interprets them to elementary calls to the dbmcore.
The calls to the sqlcore come from either SQL clients or HTTP clients.
Before proceeding to the description of the SQL compiler, we examine the connection of SQL calls and HTTP calls to the sqlcore.
There is a pool of working virtual processors (called inas),
similar to threads, dedicated to handling either SQL commands or HTTP commands
and a multiplexer monitoring active sockets.
The number of inas for each database is a parameter that may be tuned
by the database manager (see also parameters).
When a request for connection, an HTTP request or an SQL message arrives,
the multiplexer examines if there are free inas to handle the request.
If an appropriate free ina exists, then the socket is assigned to it. Otherwise,
(if no free ina exists) the connection call is put on a priority wait-queue.
This mechanism is a built-in TP Monitor and limits the number of inas to
a manageable quantity and the system resources are used more efficiently.
Note! To avoid forcing the casual user to read technical details, inas are called threads throughout the rest of the manual sections.
The handling of an asynchronous SQL request from there on is graphically represented in picture app-3.4
The process of translating each SQL statement to simple calls to the
dbmcore is the compilation of each command followed by the execution, which
generates the elementary calls.
The steps of this translation are similar to the steps taken by a standard
compiler, which generates, instead of machine language instructions, executions
plans. The execution of such a plan generates dbmcore commands. These simple
dbmcore commands are the messages passed by the sqlcore to the shared memory
of the dbmcore (See also, dbmcore/sqlcore
communication).
Translation of an SQL command
Like all compilers, the SQL compiler is based on a number of passes,
where the input and actions of each pass depend on the results of the previous
pass.
The grammar, describing SQL is LALR(1) and is implemented with YACC.
Since YACC uses inherently static internal data structures, the resulting
code is not re-entrant. Thus, a critical section, shared by all the statement
handles (stmthdl) exists. At this point only a few changes are required
to turn the code into re-entrant and they will implemented in a future
version. Since this part of the code is small, the bottleneck is not severe.
During the syntax analysis of a command the semantic tree is also generated. Up to 200 types of tree nodes (terminal and non-terminal) are used at this point.
Next picture, app-3.5 is a general layout of the compilation phases an SQL command has to go through until it becomes an executable code by the dbmcore.

Each one of the four passes is described in detail below.
Syntax Analysis
It has already been mentioned that YACC is used for syntax analysis
of the SQL commands. The syntax analysis leads eventually to the analysis
of semantic data and the generation of the semantic data tree. Early during
syntax analysis, syntax errors and mistyped words are flagged and the user
is notified by an error message about the possible source of the problem.
If no syntax errors are found, then the preparation proceeds to the
rest of the passes, which reform the command to the optimal form it may
take before execution.
For the following discussion, the term expression will be used for an SQL command.
The first pass (preliminary actions)
Also, if an * exists in select, i.e. selection of all columns of a table, the * is replaced by all the columns of the table. These columns are prefixed with the range variable of the appropriate table, e.g. if table ALPHA has columns X, Y, Z and ALPHA has range variable ALPHA_0, then select * from ALPHA where ... is interpreted as: select ALPHA_0.X, ALPHA_0.Y, ALPHA_0.Z from ALPHA_0 where...
If the predicate ANY is used, e.g.
Conditions with EXISTS and UNIQUE are not simplified.
During the second pass the types are checked for compatibility and error messages are issued. If some types are compatible but not the same, they will be converted in the third pass.
The third pass
The actions in the third pass are:
During the fourth pass the translation and the production of the execution
plan is made. From the execution plan the elementary calls to the dbmcore
are generated.
The execution plan is a set of actions which must be done in order
to perform the required action.
Before proceeding any further into the description of the execution
plan production, the notion of the iterator is introduced, as iterators
constitute the building blocks of the execution plan.
For every command that contains a select statement, the user expects
a sequence of results. A set of results is fully defined when we can traverse
it and study its values. The mechanism through which we can traverse the
results is the cursor. More technically, one could say that a cursor
is the name of a pointer to the memory which is used for the execution
of a given command.
The notion of a sequence is a little narrower than the meaning of a
set, because a set has no order in its members, while the elements of a
sequence (at least as long as the traversing cursor is active) must always
be returned in the same order.
A sequence of results, like a set of results, may be described in two
ways, which are either through enumeration of its members e.g. {1,3,5,7,9}
or through a description of the relationship that generates the set, e.g.
{X/X odd, between 1 and 10}.
When the sequence of results is generated from the elements of an existing
table (either permanent or temporary) it is defined as an enumeration
or description by extension. On the other hand, when the sequence is
produced dynamically, as a combination of results that do not physically
exist in a table, then this is defined as a description by intension.
In both cases, the main point is to have a result generation in a way
that does not allow the user (as the final receiver of the results) to
distinguish the way the sequence has been obtained.
To describe the answers to SQL queries, a set of mechanisms, called
iterators, has been defined. The iterators are software mechanisms,
which can be combined recursively to produce the required result sequences.
These combinations are dynamic relationships between iterators and deal
with the way these iterators are embedded one inside the other to set out
the execution plan. Before setting the execution plan of any SQL query,
the compiler must always use an iterator, that describes the results. This
is necessary whenever results are produced, either when the rows physically
exist in a permanent data table, or when they are dynamically calculated
at traversal time. If the table is temporary, then the results are placed
in this table through the use of another iterator.
According to what was previously mentioned, the user should not be
able to distinguish what iterator combinations have been used and in what
way.
The iterators are the building blocks of the execution plan and will be briefly described before going into the details of the execution plan setup.
For the generation of an execution plan, the SQL compiler uses a well-defined set of iterators, which may be combined together to result to all possible sequences. This way, iterators belong to other iterators, but they are all handled in a uniform manner.
There is a cursor for each SQL statement. But each cursor that executes
a command uses the root iterator, which consists of many elementary iterators.
The root iterator is the abstract class iter. The notion of
an abstract class in this context is similar to the object-oriented language
abstract classes. All other classes are generated from an abstract class,
but no instances of this abstract class exist.
Picture app-3.6 is the abstract tree.

The static hierarchy tree shows how a class is generaterd from another and what properties does it own. For the dynamic tree, which deals with the embedding of iterators in the execution plan, we will talk in the execution plan paragraph.
iter
This is the hierarchy root. It defines the mobility rules in the result
sequence. It allows movement to the first, last, previous and next object
of the sequence. All iterators are scrollable.
1. A table that needs evaluation of the select expression.
2. Select with Join (use of many tables).
3. Sequence with the use of an index, where a composite key
exists and the first consituent of the key get their values from other
sequences.
For example, there may be an index with a composite key of two or more
columns, where the first part of the key is filled during the execution
of other sequences.
Every SQL command that results in a table is called table expression. Usually, the commands that have sequences of rows as a result (a table form) are select commands.
The notion of the access path has to do with the way the compiler decides
about the best path for a table expression, i.e. which is the best path,
depending on the conditions, that the compiler will have to follow in order
to traverse the records of a set of tables involved in a select statement.
Since the select statements are, in general, complex and nested, there
are many access paths in one execution plan.
The current paragraph refers to the path generation and the criteria,
according to which a path is defined as optimal.
For every select command there is always an elementary select command, that does not contain the clauses distinct, group by and having. The access path for such a command is called elementary access path.
General rules for finding the elementary access path
For a set of tables, a combination of tables and condition placing on these tables is made. For every possible combination of table access, a path is generated as follows:
Although this strategy is based on combinations and, theoretically, it could be proved very time-consuming, it has, practically, very good results, because the tables involved in a select statement are rarely more than three or four, rendering thus the number of possible combinations very low.
We call subquery the part of an SQL statement that follows a select (i.e. the tables, rows, and conditions involved a query). The order by clause does not belong to the subquery.
The access to every table is obtained through a fileiter and every table
is traversed either sequentially or via an index (seq_iter or indexed_iter).
There is an exception, where the access is made via a compo_iter, if
the creation of the compo_iter is justified by the IN conditions (e.g.
IN (value1, value2, ..., valueN)), when these conditions fill in the first
components of a subquery key or when sets of values (union_iter composed
of value_iter) end up in an indexed_iter. The indexed_iter exploits the
existing indexes and makes optimal use of the database.
As a criterion for the finding of the optimum access path, the minimization
of the number of accesses to the tables is considered. The criterion is
local, in the sense that every access path is examined locally (in isolation
for the select under consideration) and not in combination with any other
select.
The only exception where a non-local criterion may be used is when
an order by clause, which defines a desired sequence of results
exists in the subquery. In this case, it is possible to use the table keys
and avoid the generation of a temporary table. In this case the benefit
of using the keys may be deemed greater than the selection of an optimum
access path.
Calculation of the access path cost
For every access path there is a sequence of access to the tables and selection conditions from these tables: (Table1, Conditions1), ... (TableN, ConditionsN). As mentioned before, the conditions are placed as close to Table1 as possible, i.e. as close to the most external loop as possible.
Range of repetition
We call range, Ni, of each repetition the number of records
that we calculate that have to be examined (accesses to the table). These
accesses are either the size of the table (i.e. the number of rows) for
sequential iterators or a number that is calculated depending on the key
for indexed iterators.
Condition selectivity
We call selectivity of each condition, fi, the evaluated
percentage of records that fulfill the condition. Selectivity is calculated
according to the following criteria:
The = has a 0.01 selectivity
The <> has a 0.99 selectivity
The > and < have a 0.5 selectivity
When the condition is composite and contains the logical operators
AND and OR, then the AND has as selectivity the minimum of the conjunction,
while the OR has as selectivity the maximum of the disjunction.
In general, the selectivity is less than one with the exception of conditions
containing table expressions. In this case the selectivity calculation
will take into account the range of this expression as well.
The following discussion is a very simple example with a few nodes,
where the evaluation of the access path cost is presented.
Suppose there is a select statement, using tables A, B and C and there
exist conditions for each one of the tables.
In picture app-3.7 we can see one of the possible access paths, where
the tables and the conditions are placed and the evaluation of the cost
is performed.

The repetition range of the first level is N1 (i.e.
N1 accesses to table 1). The number of records selected
for the next level (i.e. satisfying the condition f1)
is a total of N1*f1. The repetition range
of the second level is N2, so there will be (N1*f1)*N2
accesses for this level. ((N1*f1)*(N2*f2)
of them will be satisfying this condition and these will be the records
to participate to the record selection of the next level.
In the third level, the repetition range is N3,
so we have
(N1*f1)*(N2*f2)*N3
records.
Condition f3 participates in the definition of
the final number of records that will be selected, but does not limit the
number of accesses to the last level.
The total number of accesses is the sum of accesses to each level,
i.e:
N1+(N1*f1)*N2+(N1*f1)*(N2*f2)*N3
The access path cost is evaluated by the generic formula:
N1+(N1*f1)*N2+(N1*f1)*(N2*f2)*N3+...+(Pi=1,n-1Ni*fi)*Nn
The last selectivity fn is not used in the evaluation,
because it does not have any effect at this point. It does not prevent
the access to the database, it merely shows which records will be selected
because of the condition.
When the cost evaluation of each access path is completed, the best
access path is selected (i.e. the one with the minimum cost).
Iterators and bookmarks
As mentioned before, every cursor corresponds to an SQL statement, which
generates an execution plan, which in its turn is a tree of iterators.
The leaves of this tree always refer to a traversable table (they are
fileiters). The fileiter may be a simple sequence or an indexed iterator,
but in both cases, there is a bookmark in the table and this bookmark corresponds
to the bookmark described in the data and index
manager of the dbmcore.
An SQL command may refer several times to the same table. Every fileiter
has its own position for every table. All in all, a cursor, as a tree of
iterators, has a number of current positions (bookmarks), which correspond
to the leaves of the tree.
The table expression found in the outermost select of the command is
called main table expression. In general, the main table expression
may contain a number of secondary table expressions. Each of them also
contains table expressions. Consequently, all internal select queries must
have been recursively evaluated before executing a select command.
Every table expression is correlated up to a certain degree to those
containing it. For every table expression execution an attempt is made
to place it as close to the outer point as possible (this way it is executed
fewer times).
It is possible, of course, to have an internal expression that is completely
uncorrelated. This expression may be transferred to the outmost part of the
execution plan.
Most of the time, a select expression is executed in two phases:
1. Evaluation of the table expression.
2. Traversal of the results.
The evaluation of a table expression may be considered as producing
a table as a result. In fact, it may either generate a real, temporary
table or refer to a virtual table via an iterator.
In case of a real, temporary table generation, the execution plan determines
the iterator, which will traverse another table and fill the real temporary
table. If prerequisites for the expression execution exist, then
we have a recursive evaluation.
Every SQL statement query is presented abstractly as a tree of nodes. Each of the nodes defines its own iterator. In the end, we define also, a basic operator, a primitive operator and a temporary table iterator.
Basic iterator of a node is the node that traverses the sequence generated by the evaluation of the expression represented by this node.
Primitive iterator is the one traversing a table expression that does not contain distinct, group by and having.
Temporary table iterator is the one that fills the temporary table with values, fetched by the traversal of an expression.
In the following table, the node type (i.e. the query type) is presented in one column and the iterator used to evaluate the query is presented in the other column. We use symbol S for the node, symbol iter(S) for the basic iterator and symbol iter'(S) for the primitive iterator.
The term subquery is used to describe a classical select statement with
syntax:
select <column-name-1>, ..., <column-name-2> | *
from <table-name-1>, ..., <table-name-2>
where condition
| Name | Node S = | iter(S) = |
| a | Select(without order by) with subquery x | iter (x) |
| a' | Delete searched with subquery x | iter(x) |
| a'' | Update searched with subquery x | iter(x) |
| b | Select with order by and subquery x | indexed_iter to a temporary table, generated by iterx(x) or iterx(x) if it is compatible with order by |
| c | Union all of x and y | union_iter (iter(x), iter(y)) |
| d | Union (without all) of x and y | seq_iter in a temporary table generated by union_iter |
| e | Subquery (with aggregation) | seq_iter in a temporary table generated by using iter'(S) for aggregation |
| f | Subquery with distinct (without aggregation) | seq_iter in a temporary distinct table generated by iter'(S) |
| g | Subquery with all (without aggregation) or select for update | iter'(S) |
| g' | Subquery inside a unique condition | No iterator is defined.
iter'(S) is used to fill a temporary table and check the unique condition. |
| h | Subquery count(*) in a non-scalar position | value_iter(S), where execution of S as a scalar select produces a value. |
Since the last line of the iterator table requires some explanation, we note that it is always the case that a select count(*) returns a number and not a table row. If, however, it becomes necessary to have the results returned as a sequence, then we use a value_iter, which returns only one value, but its existence allows a uniform handling of the iterator tree.
For every execution plan of a select statement there is an access path, which defines the order by which the real table accesses are made. If we wish to express concisely the relationship between the execution plan and the access path, we may say that the execution plan is a tree showing what is going to happen, In other words, the execution plan shows the order of the expression evaluation, while the access path shows how the evaluation is to be done, i.e. what is the order of accesses in the tables.
A compilation and execution example
In the following example there is a table (a system table), called _columns,
which has information about the columns of all the database tables.
Every row of the table contains information about the characteristics
of each column defined by the system.
The table has, among others, a composite key the (table_id, column_id),
which will be used in the example.
SQL statement: select * from _columns where table_id = 8 and column_id < 3;
Steps:
Replacement of the * by the table columns.
Column_0: 'table_id' smallint Column 1: 'column_id' smallint Column 2: 'column_name' varchar(60) Column 3: 'data_type' smallint Column 4: 'precision' smallint Column 5: 'length' smallint Column 6: 'scale' smallint Column 7: 'radix' smallint Column 8: 'nullable' bitThe range variable _COLUMNS_0 is used as a prefix to every table column. The types in conditions are checked for consistency and converted if necessary. The condition < is evaluated.
Query:
select _COLUMNS_0.table_id, _COLUMNS_0.column_id, _COLUMNS_0.column_name,
_COLUMNS_0.data_type, _COLUMNS_0.precision, _COLUMNS_0.length, _COLUMNS_0.scale,
_COLUMNS_0.radix, _COLUMNS_0.nullable from _COLUMNS as _COLUMNS_0
where (_COLUMNS_0.TABLE_ID = $cvt (8, smallint) and _COLUMNS_0.COLUMN_ID
< $pred ($cvt(3, smallint)))
where $cvt() is the type conversion function. In the first case, number 8 is converted to smallint to be consistent with the TABLE_ID type and in the second case, number 3 is converted to smallint to be consistent to the COLUMN_ID type. Function $pred() returns the predecessor.
The execution plan requires the proper iterator and shows the form of the row the iterator will select.
Execution Plan:
Composite iterator {
Traversal of table 2 _columns by key 0 {
Bounds:
0: Equal to
$cvt (8, smallint)
1: From
Minimum
To
$cvt (3, smallint)
}
Calculating:
_COLUMNS_0.table_id, _COLUMNS_0.column_id, _COLUMNS_0.column_name,
_COLUMNS_0.data_type, _COLUMNS_0.precision, _COLUMNS_0.length,
_COLUMNS_0.scale, _COLUMNS_0.radix, _COLUMNS_0.nullable
}
The execution plan shows a traversal of table 2 (_columns) with the key
table_id. Since (table_id, column_id) is a key (key 0), the condition is
evaluated using an indexed_iter, with bounds, the table_id = 8 for the
first part of the key (Equal To $cvt(3, smallint)), and column_id
< 3 (1:From Minimum To $cvt(3, smallint)).
Finally, we see the rows returned after finding which rows fulfill the conditions:
*
| 0 | table_id: 8 |
| 1 | column_id: 0 |
| 2 | column_name: 'table_id' |
| 3 | data_type: 5 |
| 4 | precision: (null) |
| 5 | length: 1 |
| 6 | scale: (null) |
| 7 | radix: (null) |
| 8 | nullable: off |
| 0 | table_id: 8 |
| 1 | column_id: 1 |
| 2 | column_name: 'fkey_no' |
| 3 | data_type: 5 |
| 4 | precision: (null) |
| 5 | length: 1 |
| 6 | scale: (null) |
| 7 | radix: (null) |
| 8 | nullable: off |
| 0 | table_id: 8 |
| 1 | column_id: 2 |
| 2 | column_name: 'col_no' |
| 3 | data_type: 5 |
| 4 | precision: (null) |
| 5 | length: 1 |
| 6 | scale: (null) |
| 7 | radix: (null) |
| 8 | nullable: off |
| Previous Chapter: Appendix - SQL |