This site is 100% ad supported. Please add an exception to adblock for this site.

SQL, Storage, Indexing, Design, Tuning


undefined, object
copy deck
The amount of data read or written in one I/O operation.
Same as a block
Blocking factor
The number of physical records per block.
Inputs of physical design phase
Logical (implementation) model
DBMS Characteristics
Response time requirements
Security, Backup, Recovery, Retention, Integrity requirements
Outputs of physical design phase
Produces a Description of the Implementation of the Database on Secondary Storage.

Describes the storage structures and access methods used to achieve efficient access to the data.
Primary & Secondary index differences
Primary: An ordered, fixed-length file that stores the primary key and a pointer.
Each record in the index file corresponds to each block in the ordered data file.
Sparse / nondense index

A secondary index is an ordered file that stores the nonordering field of a data file and a pointer.
If the indexing field is a secondary or candidate key, then the index is dense.
If the indexing field is a nonkey field, then the index is sparse.
Introduces the idea of adding a level of indirection
Clustered vs. Unclustered indexes
If order of data records is the same as, or `close to’, order of data entries, then called clustered index.

A file can be clustered on at most one search key.
Cost of retrieving data records through index varies greatly based on whether index is clustered or not!
Describe hash based indexes
Hash function calculates the address of the page on which the record is stored.
The field that’s used in the hash function is called the hash field.
Called a hash key if the hash field is also a key field.
Good for equality searches
Not good for range searches
Describe B-tree Indexes
If the root is not a leaf node, it must have at least two children.
For a tree of order n, each node (other than root/leaf nodes) must have between n/2 and n pointers and children.
⬦ (other rules)
Tree must always be balanced
Every path from the root to a leaf must have same length.  This is what matters.
This means that it always takes about the same time to access any record.
Leaf pages contain data entries, and are chained (prev & next)
Non-leaf pages contain index entries and direct searches:
Tradeoffs in deciding which indexes to create
Indexes can make queries go faster, updates slower but require disk space, too.
Index selection guidelines
Attributes in WHERE clause are candidates for index keys.
Exact match condition suggests hash index.
Range query suggests tree index.
Clustering is especially useful for range queries; can also help on equality queries if there are many duplicates.
Multi-attribute search keys should be considered when a WHERE clause contains several conditions.
Order of attributes is important for range queries.
Such indexes can sometimes enable index-only strategies for important queries.
For index-only strategies, clustering is not important!
Try to choose indexes that benefit as many queries as possible. Since only one index can be clustered per relation, choose it based on important queries that would benefit the most from clustering.
Rules of thumb for indexes
Index primary key fields
DBMS may do this automatically
Composite keys require composite indexes
Index foreign key fields
Index other fields frequently used in:
WHERE clauses
GROUP BY clauses
ORDER BY clauses
Consider composite indexes when fields are used together in conditions
Advantages/Disadvantages of Clustered Indexes
Efficient storage; faster searches than heap
Faster inserts/deletes than sorted file

Only one clustering factor per file
Advantages/Disadvantages of Unclustered tree indexes
Fast searches, fast inserts/deletes

Slow scans and range searches with many matches
Advantages/Disadvantages of Hashed indexes
Same as unclustered, but faster on equality searches

Does not support range searches
Four steps of transaction analysis
1. Create xact/table cross-reference and/or usage map.
2. Determine which tables are most frequently accessed by xacts
3. Analyze usage data for selected xacts that involve the most frequently-used tables.
4. Design to improve performance of key transactions (indexes)
Which transactions should be analyzed?
May not be able to analyze all xacts
Focus on “important” transactions
80/20 rule
80% of data access comes from 20% of xacts
Also consider high-priority xacts
What information is contained in a cross reference table
Transactions and relations and the operations performed for each transaction
What information should you consider when performing a detailed analysis of a transaction
Tables and attributes used
Type of access (select, update)
Avoid indexes for update xacts
Attributes used in conditional statements
Range vs. equality (cluster vs. hash index)
Attributes used in joins
Attributes used in order/group by
Expected frequency of query
Performance goals:
High freq/priority queries are priority candidates for indexes.
What is denormalization, why would it be considered
Introduce additional redundancy
Improves performance
May lead to anomolies
What is a view? Why are views used?
Virtual table based on base tables in the DB
Only the definition really exists in the DB
Provide a limited “view” of the DB
“Pre-do” joins
Limit tables
Limit columns
Way to implement logical independence
Used for security
Name and briefly describe four design goals for distributed DBMS'
Local transparency
users don’t need to know where data is stored
Replication transparency
user can treat data as if it is stored at only one site, even if it is replicated at several sites
Failure transparency
a transaction is committed or not committed
commit protocol
Concurrency transparency
it appears to the user as their transaction is the only activity on the system

Deck Info