Tuesday, August 14, 2012

Query not using Index... 2 of 4


Index Selectivity - Selectivity decide which data access method is optimum for execution plan. As a guideline index should be created on columns when query is returning 15% or less of table rows. Selectivity ranges from 0 to 1, 1 being ideal and it can be 1 on primary keys or unique keys indexes. The higher the index selectivity( near to 1) the more chances of optimizer using it for index scan.

           Selectivity = No of distinct values in column/Total number of rows.


Selectivity ranges from 0 to 1 and 1 is ideal. Selectivity of 1 exists in primary key or unique keys columns where each value is unique. The poor index selectivity means - the less chance of index being utilized.

Selectivity Example:

SQL>Drop table test;

SQL>Create table test( id number, name varchar(10)) ;

SQL>Insert into test select round(DBMS_RANDOM.VALUE(1,15)), upper(DBMS_RANDOM.STRING('A',10)) NAME from user_objects;

541 Records inserted.

SQL>Create index idx_test_id on test(id);

SQL>Create index idx_test_name on test(name);

SQL>Exec dbms_stats.gather_table_stats(user, 'test', method_opt=>'for all columns size auto' );

SQL>Exec dbms_stats.gather_table_stats(user, 'test' );

SQL>Select index_name,distinct_keys, num_rows from user_indexes where table_name='TEST'
Index_name        distinct_keys num_rows
-------------------  --------------  --------------
IDX_TEST_NAME 541              541
IDX_TEST_ID       15               541 

There are all unique values in IDX_TEST_NAME and selectivity of doing index scan would be 1.

Select id, count(1), round(count(1)/541,2)*100||'%' "% of Total" from test group by id order by id

ID Count %Of Total
-- ----    -------
1  34     6%
2  39     7%
3  33     6%
4  36     7%
5  30     6%
6  47     9%
7 31      6%
8 30      6%
9 36      7%
10 50    9%
11 32    6%
12 45    8%
13 40    7%
14 42    8%
15 16    3% 
 
Select distinct_keys, B.NUM_ROWS, round(distinct_keys/b.num_rows*100)||'%' "Selectivity of using Index" from user_indexes a, user_tables b where a.table_name=b.table_name and a.table_name = 'TEST' and index_name like 'IDX%'; 

Distinct_Keys Num_rows Selectivity of using Index
--------------  ------------ -----------------------------
15               541              3%
541             541             100% 
 
SQL> select * from test where id=1;
Execution Plan
----------------------------------------------------------                     
                                                                               
---------------------------------------------------------------                
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)|                
---------------------------------------------------------------                
|   0 | SELECT STATEMENT  |      |    34 |   476 |     2   (0)|                
|   1 |  TABLE ACCESS FULL| TEST |    34 |   476 |     2   (0)|                
---------------------------------------------------------------                
                                                                               
SQL> select * from test where id=2;
Execution Plan
----------------------------------------------------------                     
                                                                               
---------------------------------------------------------------                
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)|                
---------------------------------------------------------------                
|   0 | SELECT STATEMENT  |      |    39 |   546 |     2   (0)|                
|   1 |  TABLE ACCESS FULL| TEST |    39 |   546 |     2   (0)|                
---------------------------------------------------------------                
                                                                               
SQL> select * from test where id=16;
Execution Plan
----------------------------------------------------------                     
                                                                               
--------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     7 |    98 |     2   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST        |     7 |    98 |     2   (0)|
|   2 |   INDEX RANGE SCAN          | IDX_TEST_ID |     7 |       |     1   (0)|
--------------------------------------------------------------------------------
                                                                               
In case of id=1 or 2 optimizer thinks the cost of doing table scan (2) would be cheaper than cost of index scan( 2(index scan) +2(table scan) =4). Look into the following cost when I am forcing optimizer to use index.

SQL> select /*+ index(test, idx_test_id) */ * from test where id=2;
Execution Plan
----------------------------------------------------------                     
                                                                               
--------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |    34 |   504 |     4   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST        |    34 |   504 |     4   (0)|
|   2 |   INDEX RANGE SCAN          | IDX_TEST_ID |    34 |       |     1   (0)|
--------------------------------------------------------------------------------