DBMS PART II
Chapter 6
Integrity
- Integrity means completeness
or wholeness of the database.
- Initially each program took care of its own validations and consistency constraints ( so
as to put only valid data into the database ), but as the number of fields and applications
goes on increasing, the database itself starts taking care of the different integrity
constraints instead of each program.
- The integrity constraints
are mainly of two types
Implicit integrity constraints
Explicit integrity constraints
Implicit integrity constraints
Domain Integrity :- These are the constraints through which the set of valid values for a
column or table is defined ( use DDL ). Means fix the upper and lower limit values for the
required fields in the table or tables, so as to achieve completeness of data.
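A minimal sketch of a domain constraint, using Python's built-in sqlite3 module and a hypothetical student table (the table and column names are illustrative, not from the notes). The CHECK clause fixes the upper and lower limits for the marks field:

```python
import sqlite3

# Hypothetical table: marks must lie between 0 and 100 (domain integrity).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        roll_num INTEGER PRIMARY KEY,
        marks    INTEGER CHECK (marks BETWEEN 0 AND 100)
    )
""")
conn.execute("INSERT INTO student VALUES (73, 85)")       # inside the domain: accepted
try:
    conn.execute("INSERT INTO student VALUES (74, 150)")  # outside the domain
except sqlite3.IntegrityError as e:
    print("rejected:", e)                                 # the DBMS, not the program, rejects it
```

Note that the validation lives in the DDL, so every program inserting into the table gets the same check.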
Referential Integrity :- If there are two relations R & S, and an attribute set in R refers
to an attribute set in S which is also the primary key of S, then the attribute set in R
must hold a value which is either available in S or null. Here R is the referencing table
while S is the referenced table ( foreign key concept ).
With referential integrity one should define 3 properties
- Cascade :- Deletion from S will also delete the matching records from R.
- Set to Null :- Values in the referencing table are set to null if the row is deleted
from the referenced table.
- Restrict :- Deletion or update in the referenced table is rejected as long as matching
records exist in the referencing table.
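A sketch of the foreign key concept with the Cascade action, again via sqlite3 and hypothetical dept (referenced, S) and emp (referencing, R) tables. SQLite enforces foreign keys only after the PRAGMA shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")      # SQLite needs this to enforce FKs
conn.execute("CREATE TABLE dept (dno INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE emp (
        eno INTEGER PRIMARY KEY,
        dno INTEGER REFERENCES dept(dno) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO dept VALUES (10)")
conn.execute("INSERT INTO emp VALUES (1, 10)")
conn.execute("DELETE FROM dept WHERE dno = 10")  # cascade removes the emp row too
print(conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0])  # 0
```

Replacing `ON DELETE CASCADE` with `ON DELETE SET NULL` or `ON DELETE RESTRICT` gives the other two properties.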
Entity Integrity :- Every entity must have a valid value in its prime attribute, and that
value should not be Null, i.e. the primary key should not be Null.
Functional Dependency :- If in relation R there is a set of attributes called X and another
set called Y, and for each occurrence of X there is only one occurrence of Y, then Y is
functionally dependent on X.
X → Y
Say for roll num. 73 all other attributes should be the same.
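The definition above can be checked mechanically: X → Y holds when no two rows agree on X but differ on Y. A small illustrative check (the sample rows are invented for the example):

```python
# Returns True when the functional dependency x -> y holds over the rows.
def fd_holds(rows, x, y):
    seen = {}
    for row in rows:
        key = row[x]
        if key in seen and seen[key] != row[y]:
            return False          # same X value, different Y value: FD violated
        seen[key] = row[y]
    return True

students = [
    {"roll": 73, "name": "A"},
    {"roll": 73, "name": "A"},    # same roll, same name: FD still holds
    {"roll": 74, "name": "B"},
]
print(fd_holds(students, "roll", "name"))  # True
```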
Explicit Integrity Constraints
Assertion :- It means to say something forcefully. It is a predicate expressing a condition
that we wish the database always to satisfy. Generally defined by the user through the
program.
Triggers :- A trigger is a statement that is executed automatically by the system as a side
effect of a modification to the database. When a certain event occurs, the corresponding
defined action is taken.
Event - Condition - Action
Chapter 7
Recovery
- Recovery means to get back information which is lost. The recovery system of a DBMS is
designed to anticipate and recover from the following types of failure
1. Failure without loss of data ( mainly a transaction fails before it reaches the commit
state ).
2. Failure with loss of volatile storage :- Mainly due to software and hardware errors,
with loss of data from volatile storage.
3. Failure with loss of non-volatile storage :- Head crash on a disk drive or errors in
writing to non-volatile storage.
4. Failure of stable storage :- Mainly natural or man-made disasters. Recovery from this
type requires manual regeneration of the database.
- Two types of recovery
:- Preventive ( Follow Backup methods )
Corrective ( Load backup after disasters )
- What Types of Backup
1. Hardware Backup :- Tested hardware backup arrangement.
2. Software Backup :- Application software installations.
3. Human Backup :- Manpower backup.
Backup ( Factors to be considered at the time of backup )
- Frequency :- Depends
upon volume of transactions and data.
- Backup storage device
:- Onsite ( backup at same location ) or offsite.
- Backup place :- Offsite
( at other branch )
- Backup media :- Volumewise,
Time required for backup, Cost.
- Backup should be taken
in standard format so it can be useful afterwards.
- Types of backup
Incremental backup Application backup Directory wise backup
Sector by sector backup User wise backup
- Who is to take the backup ( allocation of responsibilities ).
- What records are to be taken as backup ( selected files ). Maintain backup registers.
- Testing of backup :- the backup should be tested regularly; give the responsibility to
someone.
- Backups are for maximum quality of recovery in less time. Compression utilities can be
used, but with a risk of losing data. Maintain generations of backup ( grandparent - parent
- child backup sets ). Floppy sets for a single day's backup.
Transaction
- A transaction is an access to the database. It is a logical unit of work: a change from
one consistent state of the data to another consistent state. ( FIGURE )
- Stages of transaction
Initiation of transaction
Partially committed transaction ( not yet written to physical buffers )
Failures
Committed stage of transaction
Aborted transaction
- Properties of transaction
Atomicity :- It must be all or none.
Correctness :- Consistency should be maintained.
Isolation :- Effects of the transaction should not be visible outside unless and until it
is committed.
Durability :- Once a transaction gets committed, its effects should be permanent or
durable.
Serializability :- Transactions must be serializable. Schedules generated by concurrent
transactions must be serializable.
If volatile memory fails due to the above aborted transactions then there are three
mechanisms of recovery.
- Log Based Recovery
- Shadow Paging Techniques
- Check Point Mechanism
Log Based Recovery
Also called journaling. A log is nothing but a simple data file, maintained on stable
storage. It contains the history of accesses to the database ( history means transaction
no., before image, after image, time, user id., record id., etc. ). All such files are
maintained by the DBMS itself. Normally these are sequential files.
Recovery has two factors: Rollback ( Undo ) and Roll forward ( Redo ).
Two types of log
Write Ahead Log ( the log record is written to stable storage before the change is applied
to the physical database )
Deferred Log ( writes are recorded in the log first, and applied to the database only after
commit )
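A minimal sketch of Undo / Redo recovery from such a log. Each write record carries ( transaction no., record id., before image, after image ); committed transactions are redone in forward order, uncommitted ones undone in reverse order. The data values are invented for the example:

```python
db = {"x": 1, "y": 2}                       # state on disk after the crash
log = [
    ("T1", "x", 1, 10), ("T1", "commit"),   # T1 committed
    ("T2", "y", 2, 20),                     # T2 never committed
]

def recover(db, log):
    committed = {rec[0] for rec in log if len(rec) == 2 and rec[1] == "commit"}
    for rec in log:                         # Roll forward: redo committed writes
        if len(rec) == 4 and rec[0] in committed:
            db[rec[1]] = rec[3]             # restore the after image
    for rec in reversed(log):               # Rollback: undo uncommitted writes
        if len(rec) == 4 and rec[0] not in committed:
            db[rec[1]] = rec[2]             # restore the before image
    return db

print(recover(db, log))  # {'x': 10, 'y': 2}
```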
Check - Point
If there are millions of transactions at a point, then for recovery the method is the check
point mechanism. It is an extension of the log based mechanism. At a decided interval and
frequency, the contents ( which may be in an incomplete stage ) are dumped on non-volatile
storage.
There is a separate check-point record in the log file which holds all contents after, say,
1 hour. It is an efficient mechanism: there is no need to go back through the entire
history. The database administrator will decide the period of the check-point.
Shadow Paging Technique
A page in memory is a segment which is accessible. It is a logical concept. A page table is
an index of pages ( where the pages are on the hard disk is kept in this table ). As the
transaction proceeds, the current page table gets changed. The shadow page table is a copy
of the current page table which is maintained on stable storage. It is alive during the
life of the transaction only, and the current page table becomes the shadow page table for
the next transaction.
The shadow paging technique leads to the problem of garbage and garbage collection
( garbage means wasted space in memory ).
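A toy sketch of the idea, with page contents invented for the example: writes allocate fresh pages instead of overwriting, and commit simply swaps the current table in as the new shadow, leaving the old page behind as garbage.

```python
pages = {0: "A", 1: "B"}         # page id -> contents on disk
shadow = {"p": 0, "q": 1}        # shadow page table on stable storage
current = dict(shadow)           # transaction's current page table (a copy)

# Transaction writes "p": allocate a fresh page rather than overwriting page 0.
pages[2] = "A2"
current["p"] = 2

# Commit: the current page table becomes the shadow for the next transaction.
shadow = current
print(shadow["p"], pages[shadow["p"]])   # page 0 is now unreachable garbage
```

If the transaction aborts instead, `current` is simply discarded and the old shadow table still points at the unmodified pages.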
Utilities
Defrag
Garbage collection routine
Chapter 9
Concurrency Control
When many transactions are executed simultaneously they are called concurrent transactions.
Concurrency is required to increase time efficiency and to utilize resources more fully.
Concurrency control is required to maintain consistency of the data and to avoid the
phantom phenomenon.
Concurrency control mechanisms
Locks Time stamp based mechanism
Optimistic scheduling Multiversion technique
Locks : A lock is nothing but a variable associated with a data item which gives the
status of the data item, i.e. whether it is accessible or not. A lock holds a binary value,
i.e. open or closed / on or off. Locks are of different types such as
Exclusive Lock : No one can read or write the value of the data item except the owner.
Shared Lock : Also called a read only lock; other transactions can read the value but
cannot write it.
Wait Lock : It is put on when the data item is locked by some other transaction. A
separate wait queue is maintained to decide the order.
Two protocols are used to implement the locks
Two Phase Protocol : It has two phases in the locking mechanism.
1. Growing Phase : In which a transaction may obtain locks but may not release any.
2. Shrinking Phase : In which a transaction may release locks but may not obtain any.
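The two-phase rule can be sketched as follows: once a transaction releases any lock, its shrinking phase has begun and further lock requests are refused (the class and item names are illustrative):

```python
class TwoPhaseTxn:
    def __init__(self):
        self.locks = set()
        self.shrinking = False       # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.locks.add(item)         # growing phase

    def unlock(self, item):
        self.shrinking = True        # shrinking phase begins
        self.locks.discard(item)

t = TwoPhaseTxn()
t.lock("x"); t.lock("y")             # growing phase
t.unlock("x")                        # shrinking phase begins
try:
    t.lock("z")                      # no longer allowed
except RuntimeError as e:
    print(e)
```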
Intention locks : provide explicit locking at the lower level of the tree
( finer granularity ) and intention locks on all ancestors.
Time stamp based mechanism : A serial order is created among the concurrent transactions
by assigning to each transaction a unique non-decreasing number, normally the system clock
value at the start of the transaction, hence the name time stamp ordering. This value can
be used to decide the order in which a conflict between two transactions is resolved. It
enforces serializability through the chronological order of the time stamps of concurrent
transactions.
A conflict occurs when an older transaction tries to read a value written by a younger
transaction, or an older transaction tries to modify a value already read or written by a
younger transaction.
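A minimal sketch of the conflict rule for basic timestamp ordering (function names and data are illustrative): each item remembers the largest timestamps that read and wrote it, and an older transaction arriving after a younger one is rejected, i.e. rolled back.

```python
read_ts, write_ts = {}, {}   # item -> largest timestamp that read / wrote it

def ts_write(ts, item):
    # An older writer conflicts with any younger reader or writer.
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        return False         # conflict: this transaction must roll back
    write_ts[item] = ts
    return True

def ts_read(ts, item):
    # An older reader conflicts with a younger writer.
    if ts < write_ts.get(item, 0):
        return False
    read_ts[item] = max(read_ts.get(item, 0), ts)
    return True

print(ts_write(5, "x"))      # True: first write of x, by transaction 5
print(ts_write(3, "x"))      # False: older T3 conflicts with younger T5
```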
Optimistic Scheduling : It assumes that all data items can be successfully updated at the
end of the transaction, and reads in data values without locking. Reading is done, and if
any data item is found inconsistent with respect to its value at the end of the
transaction, then the transaction is rolled back. Optimistic scheduling has 3 steps.
Read phase : Various data items are read and stored in temporary local variables. All
operations are performed on these variables without actually updating the database.
Validation phase : All concurrently accessed data items in the database are checked.
Any change in a value causes the transaction to roll back.
Write phase : If the second phase is passed, the transaction gets committed.
Optimistic scheduling doesn't use any locks, hence it is deadlock free. However, the
problem of starvation of a popular data item may occur.
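The three phases can be sketched in a few lines (names and data are illustrative): work on a local copy, validate that the value has not changed underneath, and only then write back.

```python
db = {"x": 1}

def optimistic_update(db, item, fn):
    snapshot = db[item]           # read phase: copy into a local variable
    result = fn(snapshot)         # all work happens on the local copy
    if db[item] != snapshot:      # validation phase: did anyone change it?
        return False              # yes: roll back, caller may retry
    db[item] = result             # write phase: commit the result
    return True

print(optimistic_update(db, "x", lambda v: v + 1))  # True
print(db["x"])                                      # 2
```

No lock is ever taken, which is why deadlock cannot occur, but a transaction on a hot item may be rolled back repeatedly (starvation).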
Multiversion Technique : In MVT each write or update of a data item is achieved by making
a new copy or version of that data item. This is also called the time domain addressing
scheme. It follows the accounting principle of never overwriting a value: the history of
the evolution of the data item's values is recorded in the database. In the case of a read
operation, the DBMS selects one of the versions for processing.
There are two major problems with MVT
1. It is a little slower in operations
2. Rollbacks are expensive
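A toy sketch of the never-overwrite principle (data invented for the example): every write appends a ( timestamp, value ) version, and a read at timestamp t selects the newest version written at or before t.

```python
versions = {"x": [(0, "v0")]}    # item -> list of (write_ts, value) versions

def mv_write(item, ts, value):
    versions[item].append((ts, value))   # never overwrite: add a new version

def mv_read(item, ts):
    older = [v for v in versions[item] if v[0] <= ts]
    return max(older)[1]                 # newest version at or before ts

mv_write("x", 5, "v5")
mv_write("x", 9, "v9")
print(mv_read("x", 7))   # 'v5': a reader at ts 7 never sees the ts-9 write
```

The growing version lists are also why MVT reads are a little slower and rollbacks ( discarding versions ) are expensive.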
Chapter 10
Distributed Databases
In a distributed database system, the database is stored on several computers. The
computers in a distributed system communicate with one another through various
communication media such as high-speed buses or telephone lines. They do not share main
memory, nor do they share a clock.
A distributed database system consists of a collection of sites, each of which maintains a
local database system. Each site is able to process local transactions, those transactions
that access data only at that single site. In addition, a site may participate in the
execution of global transactions, which access data at several sites. The main difference
between centralized and distributed database systems is that in the former the data
resides in one single location, while in the latter the data resides in several locations.
Advantages of Data distribution
- Ability to share and
access data in a reliable and efficient manner, user at one site may be able
to access data available at another site.
- Reliability and Availability
:- If one site fails in a distributed system, the remaining sites may be able
to continue operating. Thus the failure of a site does not necessarily imply
the shutdown of the system.
- Speedup of query processing
:- If query involves data at several sites, it may be possible to split the
query into sub queries that can be executed in parallel. ( Parallel evaluation
)
- Local Control
Disadvantages of distributed database
- Difficult to implement,
more costly
- Greater potential for
bugs (harder to ensure the correctness of algorithms)
- Increased processing overheads ( more cost of exchange of messages and additional
computation )
Distribution Techniques ( one has to consider the following aspects at the time of
distribution )
- Distribution of Transaction
- Distribution of Design
- Distribution of Data
Activity of distribution
- Concept of data fetching
- Transaction processing
where data resides
- Data from two places is sent somewhere else and then the transaction gets executed
Concept of fragmentation & Replication
Multi database system
Naming techniques
Chapter 11
Security
The data stored in the database needs to be protected from unauthorized access, malicious
destruction or alteration, and the accidental introduction of inconsistency. Misuse of the
database can be categorized as either intentional or accidental. Accidental loss of data
consistency may result from
1. Crashes during transaction processing
2. Anomalies due to concurrent access
3. Logical errors that violate assumptions
Database security usually refers to security from malicious access, while integrity refers
to the avoidance of accidental loss of consistency.
Security measures must be taken at several levels.
1. Physical security : physical security of the system, i.e. armed guards.
2. Human security : users must be authorized carefully to reduce the chance of any such
user giving access to an intruder in exchange for a bribe or other favors.
3. Operating system : security at the operating system level, network security and
software level security.
4. Database security : also called logical security.
Authorization and views
A view can hide data that a user does not need to see. The ability of views
to hide data serves both to simplify usage of the system and to enhance security.
Relational database systems typically provide security at two levels
Relation : a user may be permitted or denied direct access to a relation.
View : a user may be permitted or denied access to data appearing in a view.
Authorization is mainly maintained in the form of a table called an access matrix. It
contains rows called subjects and columns called objects; the cells indicate the types of
access allowed.
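A minimal sketch of an access matrix as a nested mapping (users, objects and modes are all invented for the example): rows are subjects, columns are objects, and each cell holds the permitted access modes.

```python
matrix = {
    "alice": {"emp": {"read", "write"}, "salary_view": {"read"}},
    "bob":   {"emp": {"read"}},
}

def allowed(subject, obj, mode):
    # Missing subject or object means no access of any kind.
    return mode in matrix.get(subject, {}).get(obj, set())

print(allowed("alice", "emp", "write"))   # True
print(allowed("bob", "emp", "write"))     # False: bob has read only
```

Granting access at the view level rather than the relation level is just a matter of which objects appear as columns.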
Security means ( definition )
The assets of the organization should be protected, reconstructable, auditable and tamper
proof. Its users should be identifiable and their actions should be authorized and
monitored.
Physical Security : Protecting the assets of IT from fire, water, energy variations,
cyclones, earthquakes etc.
Logical security
The authorization mechanism prepares a user profile for each user and indicates the
portion of the database accessible to that user and the mode of access allowed. Before
making any request, the user has to identify himself or herself to the system and
authenticate the identification to confirm that the user is the correct person. Various
methods used in authentication are
1. Something known only to the user : password
2. Something in the user's possession : encoded keys or badges
3. Some characteristic of the user : requires the use of special hardware and software to
identify some physical or physiological characteristic of the user.
4. Eye scanners - Retina Scanners
5. Skull scanners
6. Thumb impression - Palm scanners
7. User profile database
8. Grace logins
9. Locking the server for specific Day, Date, Time
Data Encryption
Various authorization provisions may not be sufficient to provide protection for highly
sensitive data; in such cases the data may be encrypted. The science of encryption is
known as cryptology. It is the conversion of plain text into cipher text.
A good encryption technique has the following properties
It is relatively simple for authorized users to encrypt and decrypt data.
The encryption scheme depends not on the secrecy of the algorithm but on a
parameter of the algorithm called as encryption key.
It is extremely difficult for an intruder to determine the encryption key.
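A toy illustration of the key property only (a real system would use a vetted cipher such as AES, not this XOR scheme): the algorithm is public, and secrecy rests entirely on the encryption key.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR each byte with the repeating key; applying it twice decrypts.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plain = b"salary=5000"
cipher = xor_cipher(plain, b"secret-key")
print(cipher != plain)                       # True: unreadable without the key
print(xor_cipher(cipher, b"secret-key"))     # b'salary=5000'
```

The same function encrypts and decrypts, which also shows why the scheme depends on a parameter ( the key ) rather than on keeping the algorithm itself secret.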