Oracle and Hadoop Blogs: August 2012

This is continuation of my previous post : http://vkoracle.blogspot.com/2012/08/query-not-using-index-1-of-4.html

Index Selectivity - Selectivity decide which data access method is optimum for execution plan. As a guideline index should be created on columns when query is returning 15% or less of table rows. Selectivity ranges from 0 to 1, 1 being ideal and it can be 1 on primary keys or unique keys indexes. The higher the index selectivity( near to 1) the more chances of optimizer using it for index scan.

Selectivity = No of distinct values in column/Total number of rows.

Selectivity ranges from 0 to 1 and 1 is ideal. Selectivity of 1 exists in primary key or unique keys columns where each value is unique. The poor index selectivity means - the less chance of index being utilized.

Selectivity Example:

SQL>Drop table test;

SQL>Create table test( id number, name varchar(10)) ;

SQL>Insert into test select round(DBMS_RANDOM.VALUE(1,15)), upper(DBMS_RANDOM.STRING('A',10)) NAME from user_objects;

541 Records inserted.

SQL>Create index idx_test_id on test(id);

SQL>Create index idx_test_name on test(name);

SQL>Exec dbms_stats.gather_table_stats(user, 'test', method_opt=>'for all columns size auto' );

SQL>Exec dbms_stats.gather_table_stats(user, 'test' );

SQL>Select index_name,distinct_keys, num_rows from user_indexes where table_name='TEST'
Index_name        distinct_keys num_rows
------------------- -------------- --------------
IDX_TEST_NAME 541              541
IDX_TEST_ID       15               541　

There are all unique values in IDX_TEST_NAME and selectivity of doing index scan would be 1.

Select id, count(1), round(count(1)/541,2)*100||'%' "% of Total" from test group by id order by id

ID Count %Of Total
-- ----    -------
1 34     6%
2 39     7%
3 33     6%
4 36     7%
5 30     6%
6 47     9%
7 31    6%
8 30      6%
9 36      7%
10 50    9%
11 32    6%
12 45    8%
13 40    7%
14 42 8%
15 16 3%　
　
Select distinct_keys, B.NUM_ROWS, round(distinct_keys/b.num_rows*100)||'%' "Selectivity of using Index" from user_indexes a, user_tables b where a.table_name=b.table_name and a.table_name = 'TEST' and index_name like 'IDX%';　

Distinct_Keys Num_rows Selectivity of using Index
-------------- ------------ -----------------------------
15               541              3%
541            541             100%　
　
SQL> select * from test where id=1;
Execution Plan
----------------------------------------------------------

---------------------------------------------------------------
| Id | Operation         | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------
|   0 | SELECT STATEMENT |      |    34 |   476 |     2   (0)|
|   1 | TABLE ACCESS FULL| TEST |    34 |   476 |     2   (0)|
---------------------------------------------------------------

SQL> select * from test where id=2;
Execution Plan
----------------------------------------------------------

---------------------------------------------------------------
| Id | Operation         | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------
|   0 | SELECT STATEMENT |      |    39 |   546 |     2   (0)|
|   1 | TABLE ACCESS FULL| TEST |    39 |   546 |     2   (0)|
---------------------------------------------------------------

SQL> select * from test where id=16;
Execution Plan
----------------------------------------------------------

--------------------------------------------------------------------------------
| Id | Operation                   | Name        | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     7 |    98 |     2   (0)|
|   1 | TABLE ACCESS BY INDEX ROWID| TEST        |     7 |    98 |     2   (0)|
|   2 |   INDEX RANGE SCAN          | IDX_TEST_ID |     7 |       |     1   (0)|
--------------------------------------------------------------------------------

In case of id=1 or 2 optimizer thinks the cost of doing table scan (2) would be cheaper than cost of index scan( 2(index scan) +2(table scan) =4). Look into the following cost when I am forcing optimizer to use index.

SQL> select /*+ index(test, idx_test_id) */ * from test where id=2;
Execution Plan
----------------------------------------------------------

--------------------------------------------------------------------------------
| Id | Operation                   | Name        | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |    34 |   504 |     4   (0)|
|   1 | TABLE ACCESS BY INDEX ROWID| TEST        |    34 |   504 |     4   (0)|
|   2 |   INDEX RANGE SCAN          | IDX_TEST_ID |    34 |       |     1   (0)|
--------------------------------------------------------------------------------

Oracle and Hadoop Blogs

Tuesday, August 14, 2012

Query not using Index... 2 of 4

Selectivity = No of distinct values in column/Total number of rows.