Wednesday 21 July 2021

Index in SQL Server

Lets understand disk structure and how table data stores on disk.
In above diagram, orange color section is tracks and yellow color section is sector. In my case based on above diagram, I have disk with 6 sectors and 3 tracks. Track 1 (blue color) is inner circle from above diagram with number 1 shown above which is 2nd inner circle.
Gray color section is block. Block can be identified with the help of sector and track. Above gray highlighted block is from sector 4 and track 2.
Each block from disk has specific memory in my case lets assume 100 byte. Each block in disk is of size 100 bytes. Each block contains 100 offset shown below starts from 0-99.
Now lets understand above concept while storing table data. Suppose, you have database table having 100 rows, and for each 5 rows for storing into disk(Rom) requires 1 block.
Total blocks required to store entire table = 20 blocks
To search any row from this table, It might I have to search entire blocks which is called as table scan or heap. I have to go through and check all 20 blocks where table data stored which is time consuming task.
To avoid this, Index come into picture. Index hold primary key value/index column value as key and base address as value.
Index table also stored on disk and each 25 rows from index takes 1 block to store. Total records from index takes 4 blocks.
Now to find any specific records from table if index available, I need to go through 5 block, 4 blocks from index and 1 block is from actual block where row stored. I can directly find specific row address using index block.
Instead of scanning 20 blocks I have to scan 5 block which reduces time but also increase some space in disk to manage index.
Lets understand this with diagram shown below.
But as size of table increase, index size also increase and needs to search entire index, to avoid this multilevel index come into picture.
For storing database records, B-tree is used. Above diagram is just for understanding how actually multilevel index and DB table are in relation to reduce time reading data from database table.
B tree is extension to M-search tree, which is again created based on above concept of multilevel index. Support I want to retrieve
Lets turn diagram and understand with tree structure.


There are two major type of indexes in MS SQL Server
1. Cluster Index
2. Non Cluster Index

Cluster Index : Clustered index stored table data physically in sorted order. As clustered stored table data physically sorted order only one clustered index can be created per table. By default clustered index get created if primary is created for table. Internally it uses B-tree data structure and leaf node of B-tree contains table data.
Insert, update and delete operation become slow as data stored physically in sorted order. For insert or update operation it check free space available in page or not and if space not available in that case it make space available and insert/update data.
To avoid this, while creating index make sure you provide fill factor. Fill factor is percentage of occupied memory in page. Suppose you provide 80 as fill factor in that case 80% page filled and remaining 20% remains free.

Non Clustered Index : Non Clustered index not sort data physically in table. Non Clustered index can be created more than one and max 999 per table. Non clustered index also follow B-tree to stored data but leaf nodes of B-tree contains reference of page extend which contains table data.
Lets create table with index and without index to compare performance.
Right click on above query and select menu option display estimate execution plan.

Sunday 4 July 2021

SQL Join

In this article I will explain about joins in SQL server, join means combine/merge tow or more different table in SQL server based on certain join/filter condition. Join applied on two or more table when logical relation exists between SQL server tables. Logical relation means SQL server tables containing Primary and foreign key relation.

Join can also be applied on tables which may not be in primary and foreign key relation but data in both tables are more of a like parent-child (primary – foreign key relation). Lets understand with simple example.

Types of Join
1. Inner Join
2. Left Join
3. Right Join
4. Full Join
5. Cross Join


Before looking into types on join, lets create required tables and insert data.

CREATE TABLE [dbo].[tblDepartment]
(
[Id] INT NOT NULL PRIMARY KEY,
[Name] VARCHAR(50) NULL
)


CREATE TABLE [dbo].[tblEmployee]
(
[Id] INT NOT NULL PRIMARY KEY,
[Name] VARCHAR(50) NULL,
[DepartmentId] INT NULL
)


I have successfully created tblDepartment and tblEmployee. As discussed, above two tables not following parent-child relationship (primary and foreign key). Let’s insert some dummy data shown below.

Insert into tblDepartment values (1, 'IT')
Insert into tblDepartment values (2, 'Sales')
Insert into tblDepartment values (3, 'HR')
Insert into tblDepartment values (4, 'Finance')
Insert into tblDepartment (Id, Name) values (5, 'Accounts')


Insert into tblEmployee values (1, 'Salim', 4)
Insert into tblEmployee values (2, 'Rohit', 1)
Insert into tblEmployee values (3, 'John', 3)
Insert into tblEmployee values (4, 'Sameer', 2)
Insert into tblEmployee values (5, 'Saddam', 2)
Insert into tblEmployee values (6, 'Sandeep', 3)
Insert into tblEmployee values (7, 'Roma', 1)
Insert into tblEmployee values (8, 'Puja', 4)
Insert into tblEmployee (Id, Name) values (9, 'Ramesh')


Inner Join : It return all the records from both table which satisfy Join condition or common data from both the table based on join condition.

The above query will return common data from both the tables. If you notice I have inserted 9 rows in employee table but only 8 rows showing which are common from both the table and matching join condition. In Department table I have inserted 5 rows out of only 4 rows which are common shown in query result.

Left Join : This join returns all data from left table and common/matching data from both tables.
As shown in above figure, left join returns all data from left table (tblEmployee) and common data from both the tables.

Right Join : This join returns data from right table and common table from both the tables.
As shown in above figure, left join returns all data from right table (tblDepartment) and common data from both the tables.

Full Join : This type returns common data from both tables and also returns remaining data from both left and right tables.
Cross Join : This join is also called as Cartesian join since it returns the Cartesian product of the sets of rows from the joined tables. It means each row from table one combines with each row of the table two.
Number of rows return in cross join is multiplication of number of rows from table one and table two.

Number of rows from table one * Number of rows from table two = Cross Join [ 9 * 5 = 45 ]