Tuning Collections in Queries [1]

[To Introduction

I will continue series about tuning collections in queries starting from the simplest case – collection of values joined to one table. Our subject SQL statements look similar to the following two forms.

1)
select table1.*
  from table1
 where table1.f1 in
       (select column_value from table(test_statement.getList1) nlist1);

and

2)
select table1.*
  from table1, table(test_statement.getList1) nlist1
 where t1.object_id = nlist1.column_value;

We will run them in several ways by adding hints and CBO parameters. First test run is dedicated to show default CBO computations. Note cryptic name KOKBF$ in the predicate section of the explain plan. KOKBF$ is a generic alias given the table expressions. COLLECTION ITERATOR PICKLER FETCH is an operation that retrieves values from collections (if collection was instantiated in the query the operation will read: COLLECTION ITERATOR CONSTRUCTOR FETCH).

Test 1.1:

select /*+ gather_plan_statistics */        t1.object_name,t1.object_type   from t1
where t1.object_id in        (select *           from table(cast(test_statement.getList1 as table_of_number)) nlist1)

Plan:

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |
----------------------------------------------------------------------------------------------------------------------------
|   1 |  NESTED LOOPS                       |          |      1 |    255 |   1252K|00:02:07.32 |    5091K|    177K|   2959 |
|   2 |   SORT UNIQUE                       |          |      1 |    255 |   1878K|00:00:31.30 |      31 |   2959 |   2959 |
|   3 |    COLLECTION ITERATOR PICKLER FETCH| GETLIST1 |      1 |        |   2000K|00:00:04.28 |       0 |      0 |      0 |
|   4 |   TABLE ACCESS BY INDEX ROWID       | T1       |   1878K|      1 |   1252K|00:01:31.73 |    5091K|    174K|      0 |
|*  5 |    INDEX UNIQUE SCAN                | T1_I     |   1878K|      1 |   1252K|00:00:37.04 |    3839K|  28328 |      0 |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("T1"."OBJECT_ID"=VALUE(KOKBF$))

About two minute runtime is not completely bad. But let’s see what we are driving into here. Nested Loops and Index Access isn’t very good choice for getting > 10% of the data of the base table. Clearly, the estimated cardinality in bold is the culprit here. The value in row with Id 2 is not the part of the output, rather it was taken from CBO trace figures. The original 10053 trace file doesn’t have definite clue where this figure comes from. For the reason described below, this could be a function of the default block size.

BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: KOKBF$  Alias: KOKBF$  (NOT ANALYZED)
    #Rows: 8168  #Blks:  100  AvgRowLen:  100.00

...
Join order[2]:  KOKBF$[KOKBF$]#1  T1[T1]#0
    SORT resource      Sort statistics
      Sort width:         459 Area size:      402432 Max Area size:    80530432
      Degree:               1
      Blocks to Sort:    2865 Row size:           13 Total Rows:        1800000
      Initial runs:         2 Merge passes:        1 IO Cost / pass:       1554
      Total IO sort cost: 4419      Total CPU sort cost: 1764307007
      Total Temp space used: 28894000
***************
Now joining: T1[T1]#0
***************
NL Join
  Outer table: Card: 255.25  Cost: 4653.33  Resp: 4653.33  Degree: 1  Bytes: 2

The outer table KOKBF$ was assigned default cardinality as num_of_blocks * (block_size – cache_layer) / avg_row_len. In all tests cache_layer is 24. Something tells me it is similar to the 24 bytes per INITRANS, but I still cannot get my head around the fact where the block fixed header size has gone. It might be something else though. Anyway, given the figures, our default cardinality is 100 * (8192 – 24) / 100 = 8168. With 16K block this would equal 16360 etc. When the collection is uniquely sorted for “IN” clause, its computed cardinality is reduced to 255.25 and rounded to 255. With 16K block this becomes 511.5. This is persuasively looks like default_cardinality/32. One more reason to think the value comes from default statistics assumptions shown in the below CBO trace which is generated by this simple query.

select distinct column_value from table(test_statement.getList1);

The CBO trace in this case has following number of distinct values taken from the default statistics of the collection. If this is so, it explains how CBO arrives at the cardinality of the uniquely sorted collection values in the above calculation.

BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: KOKBF$  Alias: KOKBF$  (NOT ANALYZED)
    #Rows: 8168  #Blks:  100  AvgRowLen:  100.00
  Column (#1): KOKBC$(VARCHAR2)  NO STATISTICS (using defaults)
    AvgLen: 1.00 NDV: 255 Nulls: 0 Density: 0.0039177

But I will shut up about it for now until I find out this for certain. Now let’s see what the second form of the testing queries produce.

Test 1.2:

select /*+ gather_plan_statistics */
 t1.object_name,t1.object_type   from t1, table(cast(test_statement.getList1 as table_of_number)) nlist1
where t1.object_id = nlist1.column_value

Plan:

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------
|   1 |  NESTED LOOPS                      |          |      1 |   8168 |   1333K|01:27:04.70 |    5422K|   2692K|
|   2 |   COLLECTION ITERATOR PICKLER FETCH| GETLIST1 |      1 |   8168 |   2000K|00:00:06.30 |       0 |      0 |
|   3 |   TABLE ACCESS BY INDEX ROWID      | T1       |   2000K|      1 |   1333K|01:26:51.90 |    5422K|   2692K|
|*  4 |    INDEX UNIQUE SCAN               | T1_I     |   2000K|      1 |   1333K|00:05:39.05 |    4088K|   1382K|
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."OBJECT_ID"=VALUE(KOKBF$))

In this case the default cardinality of the 8168 rows of the collection was left intact. The plan didn’t change from Nested Loops for obvious reasons. The time and number of reads skyrocketed. This happened because the collection values were randomly chosen and not sorted as they were in the previous example. Access of the base table became more scattered and subsequently more reads took place. If a key value distribution is known and clustered, it might be worth using co-located values in collections.

Now it’s time to move on to the different settings. To change CBO default cardinality estimation this query uses officially unsupported CARDINALITY hint. You should ask Oracle permission to use it in production.

Test 2.1 (adjusted cardinality):

select /*+ gather_plan_statistics */        t1.object_name,t1.object_type   from t1  where
t1.object_id in        (select /*+ cardinality(nlist1 1800000) */ *           from table(cast(test_statement.getList1 as table_of_number)) nlist1)

Plan:

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |
----------------------------------------------------------------------------------------------------------------------------
|   1 |  NESTED LOOPS                       |          |      1 |    255 |   1252K|00:01:37.31 |    5092K|    172K|   2959 |
|   2 |   SORT UNIQUE                       |          |      1 |    255 |   1878K|00:00:27.63 |     160 |   2970 |   2959 |
|   3 |    COLLECTION ITERATOR PICKLER FETCH| GETLIST1 |      1 |        |   2000K|00:00:04.32 |     129 |     11 |      0 |
|   4 |   TABLE ACCESS BY INDEX ROWID       | T1       |   1878K|      1 |   1252K|00:01:04.31 |    5091K|    169K|      0 |
|*  5 |    INDEX UNIQUE SCAN                | T1_I     |   1878K|      1 |   1252K|00:00:23.96 |    3839K|  23944 |      0 |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("T1"."OBJECT_ID"=VALUE(KOKBF$))

No changes in the plan and statistics. What happened?! Did CBO even considered the change? Let’s take a look at the CBO trace.

***********************
Table Stats::
  Table: KOKBF$  Alias: KOKBF$  (NOT ANALYZED)
    #Rows: 8168  #Blks:  100  AvgRowLen:  100.00
***************************************
SINGLE TABLE ACCESS PATH
  No statistics type defined for function TEST_STATEMENT
  Table: KOKBF$  Alias: KOKBF$
    Card: Original: 8168    >> Single Tab Card adjusted from: 8168.00  to: 1800000.00
  Rounded: 1800000  Computed: 1800000.00  Non Adjusted: 8168.00

It has obviously mentioned the hint value. But the final cardinality got value as before for the same reason of collection being processed as part of “IN” sub-query. I’ll get back to this a little later.

Alright, let’s see what the second form would be like.

Test 2.2 (adjusted cardinality):

select /*+ gather_plan_statistics cardinality(nlist1 1800000) */
t1.object_name,t1.object_type   from t1, table(cast(test_statement.getList1 as table_of_number)) nlist1  where t1.object_id = nlist1.column_value

Plan:

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |
---------------------------------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN                         |          |      1 |   1800K|   1333K|00:01:37.68 |     145K|    189K|  45724 |
|   2 |   COLLECTION ITERATOR PICKLER FETCH| GETLIST1 |      1 |   1800K|   2000K|00:00:04.22 |       0 |      0 |      0 |
|   3 |   TABLE ACCESS FULL                | T1       |      1 |     10M|     10M|00:00:31.42 |     145K|    143K|      0 |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."OBJECT_ID"=VALUE(KOKBF$))

Now, the hinted cardinality is in the final execution plan. And no wonder the CBO switched the plan from Nested Loops to Hash Join. Compare the logical and physical reads of this test and tests 1.1 and 2.1. If I would have allowed bigger hash area for this query, it could run in-memory and show even better results. Last test for this post is suppose to fix the query with the sub-query. For this we would need to do an additional modification. This time modification of the CBO parameter “_always_semi_join” is needed.

Test 3.1: (adjusted cardinality, “_always_semi_join”=off):

select /*+ gather_plan_statistics */        t1.object_name,t1.object_type   from t1  where
t1.object_id in        (select /*+ cardinality(nlist1 1800000) */ *           from table(cast(test_statement.getList1 as table_of_number)) nlist1)

Plan:

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |
-----------------------------------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN                           |          |      1 |   1800K|   1252K|00:01:26.75 |     145K|    195K|  50757 |
|   2 |   VIEW                               | VW_NSO_1 |      1 |   1800K|   1878K|00:00:11.52 |       0 |   6825 |   6825 |
|   3 |    HASH UNIQUE                       |          |      1 |   1800K|   1878K|00:00:09.64 |       0 |   6825 |   6825 |
|   4 |     COLLECTION ITERATOR PICKLER FETCH| GETLIST1 |      1 |   1800K|   2000K|00:00:04.24 |       0 |      0 |      0 |
|   5 |   TABLE ACCESS FULL                  | T1       |      1 |     10M|     10M|00:00:31.40 |     145K|    144K|      0 |
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."OBJECT_ID"="COLUMN_VALUE")

Why, you might ask, the Semi Join parameter has any effect on this? In pre evaluation phase CBO tries to apply different query rewrite techniques in order to optimize the query. One of them is unnesting sub-query. In this case it can either do Semi Join or unnest the sub-query as a set (collection of distinct values). Since the Semi Join is disabled it used the later. The sub-query took following form with cardinality hint propagated to the outer query.

SELECT "T1"."OBJECT_NAME" "OBJECT_NAME","T1"."OBJECT_TYPE" "OBJECT_TYPE" FROM  (SELECT /*+ OPT_ESTIMATE (TABLE "KOKBF$" ROWS=1800000.000000 ) */ DISTINCT VALUE(KOKBF$) "COLUMN_VALUE" FROM TABLE(CAST("TEST"."TEST_STATEMENT"."GETLIST1"() AS "TABLE_OF_NUMBER") ) "KOKBF$") "VW_NSO_1","TEST"."T1" "T1" WHERE "T1"."OBJECT_ID"="VW_NSO_1"."COLUMN_VALUE"

Basically the original query was rewritten into following form. It is similar to the second original form with the exception of distinct clause. Cardinality hint was not ignored as it was in the original query and CBO generated desired plan.

select /*+ cardinality(nlist1 1800000) */
       t1.object_name,t1.object_type
  from t1
     , (select distinct column_value from table(cast(test_statement.getList1 as table_of_number))) nlist1
 where t1.object_id = nlist1.column_value;

Few words in conclusion. When tuning these queries, make sure CBO doesn’t rely on default statistics. In most cases CBO will subsequently generate sub-optimal plan, i.e. NLs instead of desired HJ or vice versa. If you cannot disable Semi Join, unnest “IN” sub-query. When tuning “IN” sub-query, use cardinality value close to the number of distinct elements, because that is what CBO will use in its calculations. If distribution of table data is deterministic, co-locate the values of the joined collection whenever possible to increase clustering of the access operation.

Advertisements

6 Responses to “Tuning Collections in Queries [1]”

  1. sukumar Says:

    Tried these hints and they worked great. Really appreciate this great article

  2. Vlad Sadilovskiy Says:

    Hi Sukumar,

    Thanks for the feedback. The credits should also be allowed to Tom Kyte and Jonathan Lewis for the thoughts they had given on asktom.com.

    Now I’m in the process of researching and developing framework that would allow this matter be seamlessly used in applications that use collections via Hibernate/EJB3 infrastructure.

  3. Venkat Says:

    Of the given 2 subject SQL statement, i see that the former(1) is more optimal than (2). (1) has better cardinality than (2).
    I did not use any hints.

    Your thoughts?

  4. Brian Blades Says:

    Had this problem with a query the same as you 1st example (IN query) except the database was Oracle 9.2.0.6. Couldn’t work around it with hints.

  5. jackson kayak Says:

    Fantastic beat ! I would like to apprentice
    while you amend your web site, how can i subscribe for
    a blog website? The account aided me a acceptable deal.
    I had been tiny bit acquainted of this your broadcast provided bright clear idea

  6. Антон Says:

    Множество Ваших партнеров и покупателей ждут Вас!

    ПИШИТЕ ТОЛЬКО НА ЭТУ ПОЧТУ: baza-gorodov(собака)yandex.ru

    Любые базы данных фирм 2016 года городов стран: России, Белоруссии, Украины и Казахстана

    Базы данных НЕ ИЗ 2 ГИС! Собираем базы данных из ЯНДЕКСА !!!

    Стоимость баз от 700 рублей за 1 город!

    В базах есть (формат Ексель): страна, регион, населенный пункт, адрес, телефон, ЕМАЙЛЫ, сайт, город,
    (разделы и рубрики для выбора по фильтру), а так же странички или группы соцсетей фирм: ВК, Твиттер, Фейсбук, ОД

    Пишите какие базы городов Вам надо, так же собираем базы по видам деятельности фирм.

    ГАРАНТИЯ СБОРА 2016 ГОДА, а так же быстрота предоставления!


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: