2011-09-12

Clustered and Nonclustered Indexes

A common interview question is the difference between a clustered and a nonclustered index in SQL Server. To really understand it, you need to understand how SQL Server stores data...

Clustered indices include both the sorted data columns and the entire row. In SQL Server, you may only have one clustered index, and your clustered index will likely (but not always) be your primary key. The data is added into the data pages (which are logical splits of data records on your disk.

Nonclustered indices, however, copy the data specified into a new data store (a logical table, if you will) that's sorted based on the data specified in your index. This does result in duplicated data and as a result, a larger footprint to store your table on disk than you might expect. Also, if when querying on a non-clustered index, if you return a column not specified in the index, SQL Server will seek your non-clustered index for all of it's data, and then it will perform a lookup to the original data pages (read: Clustered Index), and therefore you'll take a bit extra of an I/O hit.

Now that you understand the basics, here's the super simplified answer:

Clustered - One per table, table data stored in specified index order.
Nonclustered - Any number per table, Index data is copied, and contains a pointer to the Clustered Index.

More details found on books online...


If you're really looking to optimize your data store, you'll really want to get into how SQL Server stores data, and I won't lie, that's one of the driest topics out there...



No comments: