DBMS PART II
Chapter 6
Integrity
- Integrity means completeness
or wholeness of the database.
- Initially each program took care of its own validations and consistency constraints ( so
as to put only valid data into the database ), but as the number of fields and applications
goes on increasing, the database itself starts taking care of the different integrity
constraints instead of each program.
- The integrity constraints
are mainly of two types
Implicit integrity constraints
Explicit integrity constraints
Implicit integrity constraints
Domain Integrity :- These are the constraints through which the set of valid values for a
column or table is defined ( use DDL ). Means fix the upper and lower limit values for the
required fields in the table or tables, so as to achieve completeness of data.
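A minimal sketch of a domain constraint, using Python's built-in sqlite3 module and a hypothetical student table (the table and column names are illustrative, not from the notes). The CHECK clause fixes the upper and lower limits for the marks field:

```python
import sqlite3

# Hypothetical table: marks must lie between 0 and 100 (domain integrity).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        roll_num INTEGER PRIMARY KEY,
        marks    INTEGER CHECK (marks BETWEEN 0 AND 100)
    )
""")
conn.execute("INSERT INTO student VALUES (73, 85)")       # inside the domain: accepted
try:
    conn.execute("INSERT INTO student VALUES (74, 150)")  # outside the domain
except sqlite3.IntegrityError as e:
    print("rejected:", e)                                 # the DBMS, not the program, rejects it
```

Note that the validation lives in the DDL, so every program inserting into the table gets the same check.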
Referential Integrity :- If there are two relations R & S, and an attribute set in R refers
to an attribute set in S which is also the primary key of S, then the attribute set in R
must hold a value which is either available in S or null. Here R is the referencing table
while S is the referenced table ( foreign key concept ).
With referential integrity one should define 3 properties
- Cascade :- Deletion from S will also delete the matching records from R.
- Set to Null :- Values in the referencing table are set to null if the row is deleted
from the referenced table.
- Restrict :- Deletion or update in the referenced table is rejected as long as matching
records exist in the referencing table.
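A sketch of the foreign key concept with the Cascade action, again via sqlite3 and hypothetical dept (referenced, S) and emp (referencing, R) tables. SQLite enforces foreign keys only after the PRAGMA shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")      # SQLite needs this to enforce FKs
conn.execute("CREATE TABLE dept (dno INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE emp (
        eno INTEGER PRIMARY KEY,
        dno INTEGER REFERENCES dept(dno) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO dept VALUES (10)")
conn.execute("INSERT INTO emp VALUES (1, 10)")
conn.execute("DELETE FROM dept WHERE dno = 10")  # cascade removes the emp row too
print(conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0])  # 0
```

Replacing `ON DELETE CASCADE` with `ON DELETE SET NULL` or `ON DELETE RESTRICT` gives the other two properties.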
Entity Integrity :- Every entity must have a valid value in its prime attribute, and that
value should not be Null, i.e. the primary key should not be Null.
Functional Dependency :- If in relation R there is a set of attributes called X and another
set called Y, and for each occurrence of X there is only one occurrence of Y, then Y is
functionally dependent on X.
X → Y
Say for roll num. 73 all other attributes should be the same.
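The definition above can be checked mechanically: X → Y holds when no two rows agree on X but differ on Y. A small illustrative check (the sample rows are invented for the example):

```python
# Returns True when the functional dependency x -> y holds over the rows.
def fd_holds(rows, x, y):
    seen = {}
    for row in rows:
        key = row[x]
        if key in seen and seen[key] != row[y]:
            return False          # same X value, different Y value: FD violated
        seen[key] = row[y]
    return True

students = [
    {"roll": 73, "name": "A"},
    {"roll": 73, "name": "A"},    # same roll, same name: FD still holds
    {"roll": 74, "name": "B"},
]
print(fd_holds(students, "roll", "name"))  # True
```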
Explicit Integrity Constraints
Assertion :- It means to say something forcefully. It is a predicate expressing a condition
that we wish the database always to satisfy. Generally defined by the user through the
program.
Triggers :- A trigger is a statement that is executed automatically by the system as a side
effect of a modification to the database. When a certain event occurs, the corresponding
defined action is taken.
Event - Condition - Action
Chapter 7
Recovery
- Recovery means to get back information which is lost. The recovery system of a DBMS is
designed to anticipate and recover from the following types of failure
1. Failure without loss of data ( mainly a transaction fails before it reaches the commit
state ).
2. Failure with loss of volatile storage :- Mainly due to software and hardware errors,
with loss of data from volatile storage.
3. Failure with loss of non-volatile storage :- Head crash on a disk drive or errors in
writing to non-volatile storage.
4. Failure of stable storage :- Mainly natural or man-made disasters. Recovery from this
type requires manual regeneration of the database.
- Two types of recovery
:- Preventive ( Follow Backup methods )
Corrective ( Load backup after disasters )
- What Types of Backup
1. Hardware Backup :- Tested hardware backup arrangement.
2. Software Backup :- Application software installations.
3. Human Backup :- Manpower backup.
Backup ( Factors to be considered at the time of backup )
- Frequency :- Depends
upon volume of transactions and data.
- Backup storage device
:- Onsite ( backup at same location ) or offsite.
- Backup place :- Offsite
( at other branch )
- Backup media :- Volumewise,
Time required for backup, Cost.
- Backup should be taken
in standard format so it can be useful afterwards.
- Types of backup
Incremental backup Application backup Directory wise backup
Sector by sector backup User wise backup
- Who is to take the backup ( allocation of responsibilities ).
- What records are to be taken as backup ( selected files ). Maintain backup registers.
- Testing of backup :- the backup should be tested regularly; give the responsibility to
someone.
- Backups are for maximum quality of recovery in less time. Compression utilities can be
used, but with a risk of losing data. Maintain generations of backup ( grandparent - parent
- child backup sets ). Floppy sets for a single day's backup.
Transaction
- A transaction is an access to the database. It is a logical unit of work: a change from
one consistent state of the data to another consistent state. ( FIGURE )
- Stages of transaction
Initiation of transaction
Partially committed transaction ( not yet written to physical buffers )
Failures
Committed stage of transaction
Aborted transaction
- Properties of transaction
Atomicity :- It must be all or none.
Correctness :- Consistency should be maintained.
Isolation :- Effects of the transaction should not be visible outside unless and until it
is committed.
Durability :- Once a transaction gets committed, its effects should be permanent or
durable.
Serializability :- Transactions must be serializable. Schedules generated by concurrent
transactions must be serializable.
If volatile memory fails due to the above aborted transactions then there are three
mechanisms of recovery.
- Log Based Recovery
- Shadow Paging Techniques
- Check Point Mechanism
Log Based Recovery
Also called journaling. A log is nothing but a simple data file, maintained on stable
storage. It contains the history of accesses to the database ( history means transaction
no., before image, after image, time, user id., record id., etc. ). All such files are
maintained by the DBMS itself. Normally these are sequential files.
Recovery has two factors: Rollback ( Undo ) and Roll forward ( Redo ).
Two types of log
Write Ahead Log ( the log record is written to stable storage before the change is applied
to the physical database )
Deferred Log ( writes are recorded in the log first, and applied to the database only after
commit )
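A minimal sketch of Undo / Redo recovery from such a log. Each write record carries ( transaction no., record id., before image, after image ); committed transactions are redone in forward order, uncommitted ones undone in reverse order. The data values are invented for the example:

```python
db = {"x": 1, "y": 2}                       # state on disk after the crash
log = [
    ("T1", "x", 1, 10), ("T1", "commit"),   # T1 committed
    ("T2", "y", 2, 20),                     # T2 never committed
]

def recover(db, log):
    committed = {rec[0] for rec in log if len(rec) == 2 and rec[1] == "commit"}
    for rec in log:                         # Roll forward: redo committed writes
        if len(rec) == 4 and rec[0] in committed:
            db[rec[1]] = rec[3]             # restore the after image
    for rec in reversed(log):               # Rollback: undo uncommitted writes
        if len(rec) == 4 and rec[0] not in committed:
            db[rec[1]] = rec[2]             # restore the before image
    return db

print(recover(db, log))  # {'x': 10, 'y': 2}
```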
Check - Point
If there are millions of transactions at a point, then for recovery the method is the check
point mechanism. It is an extension of the log based mechanism. At a decided interval and
frequency, the contents ( which may be in an incomplete stage ) are dumped on non-volatile
storage.
There is a separate check-point record in the log file which holds all contents after, say,
1 hour. It is an efficient mechanism: there is no need to go back through the entire
history. The database administrator will decide the period of the check-point.
Shadow Paging Technique
A page in memory is a segment which is accessible. It is a logical concept. A page table is
an index of pages ( where the pages are on the hard disk is kept in this table ). As the
transaction proceeds, the current page table gets changed. The shadow page table is a copy
of the current page table which is maintained on stable storage. It is alive during the
life of the transaction only, and the current page table becomes the shadow page table for
the next transaction.
The shadow paging technique leads to the problem of garbage and garbage collection
( garbage means wasted space in memory ).
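A toy sketch of the idea, with page contents invented for the example: writes allocate fresh pages instead of overwriting, and commit simply swaps the current table in as the new shadow, leaving the old page behind as garbage.

```python
pages = {0: "A", 1: "B"}         # page id -> contents on disk
shadow = {"p": 0, "q": 1}        # shadow page table on stable storage
current = dict(shadow)           # transaction's current page table (a copy)

# Transaction writes "p": allocate a fresh page rather than overwriting page 0.
pages[2] = "A2"
current["p"] = 2

# Commit: the current page table becomes the shadow for the next transaction.
shadow = current
print(shadow["p"], pages[shadow["p"]])   # page 0 is now unreachable garbage
```

If the transaction aborts instead, `current` is simply discarded and the old shadow table still points at the unmodified pages.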
Utilities
Defrag
Garbage collection routine
Chapter 9
Concurrency Control
When many transactions are executed simultaneously they are called concurrent transactions.
Concurrency is required to increase time efficiency and to utilize resources more fully.
Concurrency control is required to maintain consistency of the data and to avoid the
phantom phenomenon.
Concurrency control mechanisms
Locks Time stamp based mechanism
Optimistic scheduling Multiversion technique
Locks : A lock is nothing but a variable associated with a data item which gives the
status of the data item, i.e. whether it is accessible or not. A lock holds a binary value,
i.e. open or closed / on or off. Locks are of different types such as
Exclusive Lock : No one can read or write the value of the data item except the owner.
Shared Lock : Also called a read only lock; other transactions can read the value but
cannot write it.
Wait Lock : It is put on when the data item is locked by some other transaction. A
separate wait queue is maintained to decide the order.
Two protocols are used to implement the locks
Two Phase Protocol : It has two phases in the locking mechanism.
1. Growing Phase : In which a transaction may obtain locks but may not release any.
2. Shrinking Phase : In which a transaction may release locks but may not obtain any.
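The two-phase rule can be sketched as follows: once a transaction releases any lock, its shrinking phase has begun and further lock requests are refused (the class and item names are illustrative):

```python
class TwoPhaseTxn:
    def __init__(self):
        self.locks = set()
        self.shrinking = False       # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.locks.add(item)         # growing phase

    def unlock(self, item):
        self.shrinking = True        # shrinking phase begins
        self.locks.discard(item)

t = TwoPhaseTxn()
t.lock("x"); t.lock("y")             # growing phase
t.unlock("x")                        # shrinking phase begins
try:
    t.lock("z")                      # no longer allowed
except RuntimeError as e:
    print(e)
```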
Intention locks : provide explicit locking at the lower level of the tree
( finer granularity ) and intention locks on all ancestors.
Time stamp based mechanism : A serial order is created among the concurrent transactions
by assigning to each transaction a unique non-decreasing number, normally the system clock
value at the start of the transaction, hence the name time stamp ordering. This value can
be used to decide the order in which a conflict between two transactions is resolved. It
enforces serializability through the chronological order of the time stamps of concurrent
transactions.
A conflict occurs when an older transaction tries to read a value written by a younger
transaction, or an older transaction tries to modify a value already read or written by a
younger transaction.
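A minimal sketch of the conflict rule for basic timestamp ordering (function names and data are illustrative): each item remembers the largest timestamps that read and wrote it, and an older transaction arriving after a younger one is rejected, i.e. rolled back.

```python
read_ts, write_ts = {}, {}   # item -> largest timestamp that read / wrote it

def ts_write(ts, item):
    # An older writer conflicts with any younger reader or writer.
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        return False         # conflict: this transaction must roll back
    write_ts[item] = ts
    return True

def ts_read(ts, item):
    # An older reader conflicts with a younger writer.
    if ts < write_ts.get(item, 0):
        return False
    read_ts[item] = max(read_ts.get(item, 0), ts)
    return True

print(ts_write(5, "x"))      # True: first write of x, by transaction 5
print(ts_write(3, "x"))      # False: older T3 conflicts with younger T5
```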
Optimistic Scheduling : It assumes that all data items can be successfully updated at the
end of the transaction, and reads in data values without locking. Reading is done, and if
any data item is found inconsistent with respect to its value at the end of the
transaction, then the transaction is rolled back. Optimistic scheduling has 3 steps.
Read phase : Various data items are read and stored in temporary local variables. All
operations are performed on these variables without actually updating the database.
Validation phase : All concurrently accessed data items in the database are checked.
Any change in a value causes the transaction to roll back.
Write phase : If the second phase is passed, the transaction gets committed.
Optimistic scheduling doesn't use any locks, hence it is deadlock free. However, the
problem of starvation of a popular data item may occur.
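The three phases can be sketched in a few lines (names and data are illustrative): work on a local copy, validate that the value has not changed underneath, and only then write back.

```python
db = {"x": 1}

def optimistic_update(db, item, fn):
    snapshot = db[item]           # read phase: copy into a local variable
    result = fn(snapshot)         # all work happens on the local copy
    if db[item] != snapshot:      # validation phase: did anyone change it?
        return False              # yes: roll back, caller may retry
    db[item] = result             # write phase: commit the result
    return True

print(optimistic_update(db, "x", lambda v: v + 1))  # True
print(db["x"])                                      # 2
```

No lock is ever taken, which is why deadlock cannot occur, but a transaction on a hot item may be rolled back repeatedly (starvation).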
Multiversion Technique : In MVT each write or update of a data item is achieved by making
a new copy or version of that data item. This is also called the time domain addressing
scheme. It follows the accounting principle of never overwriting a value: the history of
the evolution of the data item's values is recorded in the database. In the case of a read
operation, the DBMS selects one of the versions for processing.
There are two major problems with MVT
1. It is a little slower in operations
2. Rollbacks are expensive
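A toy sketch of the never-overwrite principle (data invented for the example): every write appends a ( timestamp, value ) version, and a read at timestamp t selects the newest version written at or before t.

```python
versions = {"x": [(0, "v0")]}    # item -> list of (write_ts, value) versions

def mv_write(item, ts, value):
    versions[item].append((ts, value))   # never overwrite: add a new version

def mv_read(item, ts):
    older = [v for v in versions[item] if v[0] <= ts]
    return max(older)[1]                 # newest version at or before ts

mv_write("x", 5, "v5")
mv_write("x", 9, "v9")
print(mv_read("x", 7))   # 'v5': a reader at ts 7 never sees the ts-9 write
```

The growing version lists are also why MVT reads are a little slower and rollbacks ( discarding versions ) are expensive.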
Chapter 10
Distributed Databases
In a distributed database system, the database is stored on several computers. The
computers in a distributed system communicate with one another through various
communication media such as high-speed buses or telephone lines. They do not share main
memory, nor do they share a clock.
A distributed database system consists of a collection of sites, each of which maintains a
local database system. Each site is able to process local transactions, those transactions
that access data only at that single site. In addition, a site may participate in the
execution of global transactions, which access data at several sites. The main difference
between centralized and distributed database systems is that in the former the data
resides in one single location, while in the latter the data resides in several locations.
Advantages of Data distribution
- Ability to share and
access data in a reliable and efficient manner, user at one site may be able
to access data available at another site.
- Reliability and Availability
:- If one site fails in a distributed system, the remaining sites may be able
to continue operating. Thus the failure of a site does not necessarily imply
the shutdown of the system.
- Speedup of query processing
:- If query involves data at several sites, it may be possible to split the
query into sub queries that can be executed in parallel. ( Parallel evaluation
)
- Local Control
Disadvantages of distributed database
- Difficult to implement,
more costly
- Greater potential for
bugs (harder to ensure the correctness of algorithms)
- Increased processing overheads ( more cost of exchange of messages and additional
computation )
Distribution Techniques ( one has to consider the following aspects at the time of
distribution )
- Distribution of Transaction
- Distribution of Design
- Distribution of Data
Activity of distribution
- Concept of data fetching
- Transaction processing
where data resides
- Data from two places is sent somewhere else and then the transaction gets executed
Concept of fragmentation & Replication
Multi database system
Naming techniques
Chapter 11
Security
The data stored in the database needs to be protected from unauthorized access, malicious
destruction or alteration, and the accidental introduction of inconsistency. Misuse of the
database can be categorized as either intentional or accidental. Accidental loss of data
consistency may result from
1. Crashes during transaction processing
2. Anomalies due to concurrent access
3. Logical errors that violate assumptions
Database security usually refers to security from malicious access, while integrity refers
to the avoidance of accidental loss of consistency.
Security measures must be taken at several levels.
1. Physical security : physical security of the system, i.e. armed guards.
2. Human security : users must be authorized carefully to reduce the chance of any such
user giving access to an intruder in exchange for a bribe or other favors.
3. Operating system : security at the operating system level, network security and
software level security.
4. Database security : also called logical security.
Authorization and views
A view can hide data that a user does not need to see. The ability of views
to hide data serves both to simplify usage of the system and to enhance security.
Relational database systems typically provide security at two levels
Relation : a user may be permitted or denied direct access to a relation.
View : a user may be permitted or denied access to data appearing in a view.
Authorization is mainly maintained in the form of a table called an access matrix. It
contains rows called subjects and columns called objects; the cells indicate the types of
access allowed.
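A minimal sketch of an access matrix as a nested mapping (users, objects and modes are all invented for the example): rows are subjects, columns are objects, and each cell holds the permitted access modes.

```python
matrix = {
    "alice": {"emp": {"read", "write"}, "salary_view": {"read"}},
    "bob":   {"emp": {"read"}},
}

def allowed(subject, obj, mode):
    # Missing subject or object means no access of any kind.
    return mode in matrix.get(subject, {}).get(obj, set())

print(allowed("alice", "emp", "write"))   # True
print(allowed("bob", "emp", "write"))     # False: bob has read only
```

Granting access at the view level rather than the relation level is just a matter of which objects appear as columns.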
Security means ( definition )
The assets of the organization should be protected, reconstructable, auditable and tamper
proof. Its users should be identifiable and their actions should be authorized and
monitored.
Physical Security : Protecting the assets of IT from fire, water, energy variations,
cyclones, earthquakes etc.
Logical security
The authorization mechanism prepares a user profile for each user and indicates the
portion of the database accessible to that user and the mode of access allowed. Before
making any request, the user has to identify himself or herself to the system and
authenticate the identification to confirm that the user is the correct person. Various
methods used in authentication are
1. Something known only to the user : password
2. Something in the user's possession : encoded keys or badges
3. Some characteristic of the user : requires the use of special hardware and software to
identify some physical or physiological characteristic of the user.
4. Eye scanners - Retina Scanners
5. Skull scanners
6. Thumb impression - Palm scanners
7. User profile database
8. Grace logins
9. Locking the server for specific Day, Date, Time
Data Encryption
Various authorization provisions may not be sufficient to provide protection for highly
sensitive data; in such cases the data may be encrypted. The science of encryption is
known as cryptology. It is the conversion of plain text into cipher text.
A good encryption technique has the following properties
It is relatively simple for authorized users to encrypt and decrypt data.
The encryption scheme depends not on the secrecy of the algorithm but on a
parameter of the algorithm called as encryption key.
It is extremely difficult for an intruder to determine the encryption key.
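A toy illustration of the key property only (a real system would use a vetted cipher such as AES, not this XOR scheme): the algorithm is public, and secrecy rests entirely on the encryption key.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR each byte with the repeating key; applying it twice decrypts.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plain = b"salary=5000"
cipher = xor_cipher(plain, b"secret-key")
print(cipher != plain)                       # True: unreadable without the key
print(xor_cipher(cipher, b"secret-key"))     # b'salary=5000'
```

The same function encrypts and decrypts, which also shows why the scheme depends on a parameter ( the key ) rather than on keeping the algorithm itself secret.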