Channel: SCN : Blog List - SAP on Oracle

[Oracle] Wait event "library cache lock" by (security) design and exploring system state dumps (oradebug)


Introduction

In the last few days I received a few AWR reports by mail from a colleague due to Oracle performance issues in an SAP environment. There were several different issues (mostly caused by inefficient SQL statements and structures), but there was also a pretty interesting behavior around "library cache locks". This blog will focus on this "unusual" library cache issue only and on how to troubleshoot such issues.

 

 

AWR reports

Particular sections of the AWR reports basically looked like this (ignoring details like snapshot time, database time and so on).

awr snap.png

 

Notice the unusually high "connection management call elapsed time" and the intolerable wait event "library cache lock" in this snapshot period. Let's check the official Oracle documentation for these metrics first.

 

library cache lock

This event controls the concurrency between clients of the library cache. It acquires a lock on the object handle so that either:

    • One client can prevent other clients from accessing the same object
    • The client can maintain a dependency for a long time which does not allow another client to change the object

This lock is also obtained to locate an object in the library cache.

 

Library cache lock will be obtained on database objects referenced during parsing or compiling of SQL or PL/SQL statements (table, view, procedure, function, package, package body, trigger, index, cluster, synonym). The lock will be released at the end of the parse or compilation.

 

connection management call elapsed time

Amount of elapsed time spent performing session connect and disconnect calls.

 

The documentation states that a "library cache lock" usually occurs while parsing or compiling a SQL statement, but what is the connection between parsing / compiling a SQL statement and the high wait times for connecting to and disconnecting from the database? Are these two different problems? We need to dig much deeper to get to the root cause of these two issues.

 

 

System state dump

There are several ways to analyze this issue, but the easiest way to get all of the relevant information is a system state dump (if you are not on-site to query all of the different views and X$ structures yourself). Before going into the details, let's explain what a system state dump is:

 

"A system state is a set of process states for all processes on the instance when the dump is taken. A system state dump is useful in determining the interaction between processes. A systemstate dump will report on resources that are being held by each process. The <level> sets the amount of additional information that will be extracted from the processes found by SYSTEMSTATE dump based on the "STATE" of the node.

    • 1 Very basic process information only
    • 2 process + session state objects
    • 10 Most common level - includes state object trees for all processes
    • Level + 256 Adding 256 to the level will try to dump short stack info for each process.

"

 

You need to run an oradebug command (to get a useful system state dump) while that particular library cache lock issue is occurring. I usually start with the common level 10.

 

shell> su - ora<SID>
shell> sqlplus / as sysdba
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug dump systemstate 10

 

The system state dump file is created in the RDBMS ADR (since 11g) or in the user dump destination (up to 10g). Now let's search for the wait event "library cache lock" in that trace file.

 

I just include the interesting parts (corresponding to that "library cache lock" issue) of the system state dump. There is a lot of other useful information in it, but let's keep the focus on our library cache lock issue here.

 

systate_01.png

systate_02.png

 

Both of our sessions (interestingly both from the same OS process) are waiting on the library cache lock. We also see the corresponding library cache handle (7000000ed99d980) and its lock (7000000daa7f108) address. So let's search for these addresses in the trace file.
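Searching a large trace file by hand gets tedious. Under the assumption that the dump uses the usual "PROCESS n:" and "waiting for '<event>'" wording of Oracle trace files (this is a hypothetical helper, not part of the original analysis), the search can be scripted:

```python
import re

# Hypothetical sketch: list which processes in a systemstate dump are
# waiting on a given event. The "PROCESS n:" and "waiting for '<event>'"
# line formats are assumptions based on typical Oracle trace wording.
def find_waiters(trace_text, event="library cache lock"):
    waiters = []
    current_pid = None
    for line in trace_text.splitlines():
        m = re.match(r"\s*PROCESS (\d+):", line)
        if m:
            current_pid = int(m.group(1))
        elif "waiting for '%s'" % event in line:
            waiters.append(current_pid)
    return waiters

sample = """PROCESS 42:
  Current Wait Stack:
   0: waiting for 'library cache lock'
PROCESS 43:
  Current Wait Stack:
   0: waiting for 'db file sequential read'
"""
print(find_waiters(sample))  # -> [42]
```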

systate_03.png

We can examine the details of that library cache object lock and see that it is associated with the namespace "ACCOUNT_STATUS". Judging by the name of the namespace, it seems to be something like the status of a user account (like column ACCOUNT_STATUS in DBA_USERS) held in the library cache.

 

I checked MOS and found a related bug (13496395) with information like this:

"A session logging on keeps hold of a lock on the ACCOUNT_STATUS object for the user unnecessarily"

 

Unfortunately the client was not running this particular database version and the preconditions for hitting this bug did not apply to our case, but at least we verified our assumption about the purpose of that particular library cache lock.

 

After further research I found some information on other blogs and finally MOS note #1309738.1:

"'Library cache lock' can be observed when concurrent users login with wrong password to the database"

 

Wow .. what a feature ;-)) .. a solution was already provided in 11.2.0.2 (luckily the database version the client was running), but the event "28401 trace name context forever, level 1" was not set. Mystery solved.
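For completeness: according to MOS note #1309738.1 the behavior is controlled via that event. Setting it persistently might look like the sketch below (verify the exact syntax against the note for your version before touching a production system, and note that an instance restart is required):

```sql
-- Sketch based on MOS note #1309738.1: disable the failed-logon
-- serialization behavior via event 28401 (instance restart required).
ALTER SYSTEM SET EVENT = '28401 trace name context forever, level 1' SCOPE = SPFILE;
```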

 

 

Summary

Do not brute force your Oracle database, or you will end up badly with "library cache locks" (with default settings). System state dumps can be very helpful if you need all the relevant information while you are not on-site, or if you need to dig deeper into locking or latching issues. Additionally, system state dumps contain a lot of useful details for research, troubleshooting and learning purposes.

 

Further information about that "new security feature" can be found in this blog: Simon Kwek - Logon failures causes "row cache lock" waits

 

If you have any further questions, please feel free to ask or get in contact directly if you need assistance with troubleshooting Oracle database issues.

 

[Oracle] A short SQL*Net research or how-to drill down network related ORA errors


Introduction

In the last few weeks I was involved in an SAP migration project to troubleshoot Oracle database related issues, when my client noticed multiple "ORA-12571: TNS:packet writer failure" errors. This specific error occurred only under high network load, like running R3load in a migration scenario or SGEN for the whole ABAP source code afterwards. Luckily this was not a sporadic issue, as it occurred every time under high network load, so it could be reproduced easily.

 

The R3load error looks like this (trimmed the R3load output to the most important lines):

(RTF) ########## WARNING ###########        Without ORDER BY PRIMARY KEY the exported data may be unusable for some databases

ORADPL:2012.12.20 11:52:36 (ret=-1) ORA-12571: TNS:packet writer failure

ORADPL:2012.12.20 11:52:36 fatal error in line 4357, (status=12571) error loading stream
(IMP) ERROR: ExeFastload: rc = 2
(DB) INFO: disconnected from DB

 

SAP already provides SAP note #534729 with general explanations for such issues, but unfortunately it includes no description or further information on how to troubleshoot such network related ORA errors in SQL*Net, besides using niping, which is mostly useless in such cases. This blog keeps its focus on Unix-based operating systems like AIX or Linux and database version 11g R2.

 

Enable the trace

As previously mentioned, we were running on Oracle 11.2.0.3 with ADR enabled (by default). Please check the references (Diagnostic Parameters in sqlnet.ora) if you want more information about ADR and the SQL*Net tracing possibilities (for older Oracle database releases).

 

We need to set the following parameters in sqlnet.ora (usually located in /sapmnt/<SID>/profile/oracle for newer SAP releases, or specified by the environment variable TNS_ADMIN) to enable the SQL*Net trace on the client side.

 

ADR_BASE = <FOLDER_FOR_TRACE_FILES>
TRACE_LEVEL_CLIENT = SUPPORT

 

Please be aware that these trace files can become very large in a short time, so make sure that you have enough space in the corresponding file system. Each Oracle database connection will create a separate trace file from now on.

 

Reproduce and analyze

After we have enabled the SQL*Net trace, it is time to reproduce the specific ORA error and check the corresponding trace file. In our case we started another R3load run and waited for the "ORA-12571: TNS:packet writer failure" issue.

 

The first section of the trace file contains a summary about the trace options and client settings.

Meta_Trace.png

Afterwards the usual data stream (like an INSERT statement with all of its data content) is traced until the specific network error occurs and the SQL*Net stream is interrupted and aborted. I will omit the whole SQL*Net data stream and jump straight to the error itself.

Network_Issue_Trace.png

The INSERT data stream aborted right before the line "ntt2err". The following trace information points us to the real issue. You find the "generic" network related ORA-12571, but you also see the root cause of this error (line "nioqper: nt (2) err code: 70"). I have marked the underlying operating system error code in blue (in our case 70).

Now let's check the operating system error definition for that code (in my case the system was running on AIX, but it works that way on Linux as well).

 

shell> grep -w 70 /usr/include/errno.h
#define ENETUNREACH 70 /* Network is unreachable */

 

Now we know exactly why the ORA-12571 was raised: the network itself was unreachable. Starting from here we could go on with checking the operating system logs for network issues (like a temporary link down). In my particular case the network traffic was routed over the loopback interface (localhost), so searching the operating system vendor's bug database for TCP/IP stack issues or running tcpdump (and checking the output with Wireshark) would be the better way here.
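As a side note, if you prefer not to grep header files, the symbolic errno names can be resolved portably. The numeric value differs per platform (70 on AIX, for example, while Linux uses 101), but the symbolic name is stable. A small Python sketch:

```python
import errno
import os

# Resolve the symbolic error on the current platform: the number varies
# between operating systems, the meaning does not.
print(errno.ENETUNREACH)               # platform-dependent (e.g. 101 on Linux)
print(os.strerror(errno.ENETUNREACH))  # "Network is unreachable"
```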

 

Summary

Enabling the SQL*Net trace and checking for the root (operating system) cause is the proper way if you cannot find any obvious network outages / errors in your environment when a network related ORA error is raised.

 

If you have any further questions, please feel free to ask or get in contact directly if you need assistance with troubleshooting Oracle database issues.

 


[Oracle] Advanced (performance) troubleshooting with oradebug and stack sampling


Introduction

Every now and then you notice an Oracle performance issue without any obvious reason. In the past year two of my clients hit the following issue without any prior notice. This blog focuses on an "advanced troubleshooting technique" for when all of the usual performance analyses have failed. It describes the performance issue, the root cause (in this particular case an Oracle bug) and the analysis itself using a drill-down approach. It will also introduce several tools and scripts that I use on a regular basis.

 

"Drill-Down Approach"

1. Using the wait interface / additional SQL execution statistics

Sampling or grouping the wait events for a particular SQL via V$SESSION or AWR. For further analysis I use the additional SQL execution statistics, SQL tracing or Real Time SQL Monitoring (since Oracle 11g R2). In many cases you can already provide a solution with this data, but what if the SQL wait interface is not showing anything useful (like CPU usage only)? You can identify the root cause (the execution plan step) of the CPU usage with Real Time SQL Monitoring, but not the "why".

 

2. Using the session statistics / performance counters

Sampling data with the simple but still awesome V$SESSTAT. This view reports what Oracle is doing - not anything about timing (like the wait interface), but you can track down the operations that Oracle is performing. At this point I want to introduce the first tool for sampling this particular data, called Snapper by Tanel Poder. We will use this script later on for troubleshooting the previously mentioned Oracle bug.
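Conceptually, Snapper does nothing magical: it samples V$SESSTAT twice and prints the deltas. A stripped-down sketch of that idea (the counter values here are invented):

```python
# Two samples of per-session performance counters, as a query on
# V$SESSTAT joined to V$STATNAME would return them (values invented).
before = {"consistent gets from cache (fastpath)": 1000,
          "session logical reads": 1200}
after  = {"consistent gets from cache (fastpath)": 951000,
          "session logical reads": 953400}

def deltas(first, second):
    # Only counters that actually moved between the two samples are interesting
    return {name: second[name] - first.get(name, 0)
            for name in second if second[name] != first.get(name, 0)}

for name, d in sorted(deltas(before, after).items(), key=lambda kv: -kv[1]):
    print(f"{name:45} {d:>12,}")
```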

 

3. Using stack traces or system call / signal traces

If all of the previous efforts are still not enough and you still have no clue what is going on and why, one of the last options is to check internals like call stacks and so on. You should not find yourself at this point very often, but if things get really nasty, this is where you end up.

There are several tools for tracing and most of them are OS platform dependent - just to mention a few:

 

That is a lot of theory about approaches and how I usually do it, so let's go on with a real life example and see what it looks like and why it makes sense (to me).

 

The performance issue

My two clients called me because their SAP systems were nearly unusable. In both cases it was a "large" SAP system with thousands of concurrent users, and nearly all work processes were trying to perform a simple INSERT on the same table. This blog explains the approach and the issue using the example of BSIS, but it can occur on any other table as well. Each INSERT (of just one data set) took roughly 60 to 80 seconds, and in consequence all work processes were busy.

 

The SQL looks quite simple like this:

SQL> INSERT INTO "BSIS" VALUES(:A0 ,:A1 ,:A2 ,:A3 ,:A4 ,:A5 ,:A6 ,:A7 ,:A8 ,:A9 ,:A10 ,:A11 ,:A12 ,:A13 ,:A14 ,:A15 ,:A16 ,:A17 ,:A18 
,:A19 ,:A20 ,:A21 ,:A22 ,:A23 ,:A24 ,:A25 ,:A26 ,:A27 ,:A28 ,:A29 ,:A30 ,:A31 ,:A32 ,:A33 ,:A34 ,:A35 ,:A36 ,:A37 ,:A38 ,:A39 ,:A40 ,
:A41 ,:A42 ,:A43 ,:A44 ,:A45 ,:A46 ,:A47 ,:A48 ,:A49 ,:A50 ,:A51 ,:A52 ,:A53 ,:A54 ,:A55 ,:A56 ,:A57 ,:A58 ,:A59 ,:A60 ,:A61 ,:A62 ,
:A63 ,:A64 ,:A65 ,:A66 ,:A67 ,:A68 ,:A69 ,:A70 ,:A71 ,:A72 ,:A73 ,:A74 ,:A75 ,:A76 ,:A77 ,:A78 ,:A79 ,:A80 ,:A81 );

 

The analysis

Let's follow the previously explained approach for troubleshooting this issue. Before we start with the analysis itself, here is a short summary of the corresponding database objects (we will need them later on).

OBJ.png

 

1. Using the wait interface / additional SQL execution statistics

Unfortunately the wait interface (queried via V$SESSION) showed no waiting state (like "db file sequential read" or "enq: TX - index contention") - the SQL statement was just running on CPU. From time to time you could see a wait event called "latch: cache buffers chains", which gave us a first hint as to where the CPU was burned up (by running through the database cache like crazy).

 

The execution plan of this SQL basically looks like this:

XPLAN.png

In this case it makes no sense to run this SQL with additional execution statistics or to monitor it with Real Time SQL Monitoring, because the execution plan is too simple and we already know that it is running on CPU only.

 

2. Using the session statistics / performance counters

We need to dig into the session statistics to check what Oracle is doing while running on CPU, as we do not get any more information from the wait interface. I ran the snapper tool on a particular session that performed that INSERT statement on BSIS.

 

snapper.png

Our previous assumption was right - the CPU is used by running through the database cache only. The statistic "consistent gets from cache (fastpath)" is just a special way (by pinning buffers) of getting the data from the buffer cache. The database reads a lot of data - just for a simple INSERT with 2 indexes.

 

I wanted to be absolutely sure that these buffer gets corresponded to the table and not to the indexes. I flushed the buffer cache and enabled a short SQL trace using the PL/SQL procedure DBMS_MONITOR.SESSION_TRACE_ENABLE for one particular session.

SQLTrace.png

This output is just an extract of the whole SQL trace, but it continues that way. The INSERT statement performed thousands of single block I/Os on object 437114 (which refers to the table BSIS and not to the indexes - check the object summary above) for no obvious reason.
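Instead of eyeballing the raw trace, the single block reads can be aggregated per object. This sketch assumes the usual `WAIT #n: nam='db file sequential read' ... obj#=<id>` line format of a raw SQL trace (the sample lines are invented):

```python
import re
from collections import Counter

# Assumed raw SQL trace wait line format; the obj# field ties each
# single block read back to a database object.
WAIT_RE = re.compile(r"nam='db file sequential read'.*?obj#=(\d+)")

def reads_per_object(trace_lines):
    counts = Counter()
    for line in trace_lines:
        m = WAIT_RE.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts

sample = [
    "WAIT #1: nam='db file sequential read' ela= 500 file#=4 block#=100 blocks=1 obj#=437114",
    "WAIT #1: nam='db file sequential read' ela= 480 file#=4 block#=101 blocks=1 obj#=437114",
    "WAIT #1: nam='db file sequential read' ela= 490 file#=4 block#=900 blocks=1 obj#=437117",
]
print(reads_per_object(sample))  # obj# 437114 dominates -> the table, not an index
```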

 

We now knew the root cause of the high CPU usage, but we still did not know why Oracle performed so many single block I/Os on the table for a simple INSERT statement. There is no logical reason for that in the current context. So we needed to check the call stack for an indicator of that behavior.

 

3. Using stack traces or system call / signal traces

I did not have anything like DTrace for checking the call stack in real time, as the database was running on AIX. I used the manual approach of running oradebug several times and checking the call stack, but you could also use the previously mentioned sampling tool "Oradebug short_stack Profiler" for that. For those who are not familiar with oradebug: you attach to a process with oradebug and then you can perform several actions on/with that process (in my case I attached to a process that performed that particular INSERT, of course).

oradebug.png

That particular session always ran through the code path of the function "ktspscan_bmb". "Scan" sounds like "read" and "bmb" sounds like "bitmap block" to me. The table (and indexes) were stored in an ASSM tablespace, which substantiated the suspicion. I will not dig into ASSM internals (check the references for more information about that), but basically the BMB contains the information for finding free space, which would make sense in our INSERT example as well.

 

My suspicion was confirmed after checking MOS and finding several issues about ASSM and the function ktspscan_bmb. To make a long story short: my clients hit Oracle Bug #13641076. Luckily the patch was already available in an SBP and the issue could be solved.

 

Summary

The wait interface and the session statistics are the "winners" in most cases, but if there is some really strange behavior for no obvious reason, you need to dig deeper. Oradebug is very useful for troubleshooting or for understanding how Oracle works internally (if the OS does not provide such a possibility).

 

If you have any further questions, please feel free to ask or get in contact directly if you need assistance with troubleshooting Oracle database issues.



[Oracle] SQL Execution Engine Part I - Join data by hashing alias Hash Joins


Introduction

We already took a (tiny) closer look at the cost based optimizer in my first series called "DB Optimizer". So I decided to start another tiny series about the SQL execution engine, which executes the calculated and generated execution plan. In the first part of this series I would like to cover a common ABAP coding "mistake" that I notice on a regular basis, and how it affects the execution / processing when hash join(s) are used for two or more tables.

 

The Hash Join

Let's check what hash joins are about before we start to take a look at the execution of hash joins. The following explanation is extracted from the official oracle documentation:

 

Hash Join

A join in which the database uses the smaller of two tables or data sources to build a hash table in memory. In hash joins, the database scans the larger table, probing the hash table for the addresses of the matching rows in the smaller table.

 

The Query Optimizer - Hash Joins

The database uses hash joins to join large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows. This method is best when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.

 

The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions are true:

- A large amount of data must be joined.

- A large fraction of a small table must be joined.

 

Let's omit the topic "Why does the optimizer choose a hash join and not a nested loop (for example)?", as we don't want to talk about the query optimizer this time. The documentation states an important point about how a hash join works, but misses some as well:

 

  1. The smaller of the two tables (= data sources) is used to build a hash table in memory (PGA)
  2. The hash table contains all rows (that means all select columns as well) from the "smaller table" (which is calculated by object statistics and filter expressions) organized by a hash value of the join column(s)
  3. The larger table is probed against the hash table based on the hash value of the join column(s)

 

Basically (without going into too much detail about the PGA and its work areas - check the references if you are interested in the internals), the amount of memory used for a hash join is mainly driven by the amount of data (join column + additional selected columns) that is extracted from the smaller table (there is some overhead for the second table as well, but let's disregard this at this point). We are talking about an optimal execution if all of the data can be handled in memory and we don't need to swap it to the temporary tablespace. It is called a "one pass execution" if we need to dump our hashed and probed data to the temporary tablespace only once. The last and worst case scenario is called a "multi pass execution", in which we need to dump and re-read the data many times to/from the temporary tablespace.
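The build/probe mechanics, and why the selected column list drives the size of the hash table, can be sketched in a few lines (the row data is invented, and this is of course not how Oracle implements it internally):

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, key, build_cols):
    # Build phase: hash the smaller source, keeping ONLY the columns the
    # query actually selects. This is why "SELECT *" inflates the
    # in-memory hash table.
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append({c: row[c] for c in build_cols})
    # Probe phase: scan the larger source, look up matches by the join key.
    for row in probe_rows:
        for match in table.get(row[key], []):
            yield {**match, **row}

tab2 = [{"OBJECT_NAME": "BSIS", "STATUS": "VALID"}]           # build side
tab1 = [{"OBJECT_NAME": "BSIS", "OWNER": "SAPSR3",
         "CREATED": "2013-01-01"}]                            # probe side
rows = list(hash_join(tab2, tab1, "OBJECT_NAME", ["STATUS"]))
print(rows)
```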

 

The common ABAP coding mistake

ABAP development usually starts with tiny data examples, and I see a lot of things like "SELECT * FROM <TAB1> <TAB2> ...." in custom coding, even if not all columns are needed in further processing. The data is loaded into an internal table from the database and processed afterwards. In most ABAP cases this is not an issue, because the joined data set is pretty small, but the amount of work that needs to be performed on the database level can be huge. Maybe this kind of coding is based on convenience, or the data processing is not fully known at this point (something like "maybe I need this data later on"). However, if the Oracle database has already chosen a hash join for those small data examples, it can be pretty fast, because all the data fits into PGA memory, or maybe a completely different execution plan is chosen for that small data set. But what happens if the amount of data gets larger, or the query is executed by many users in parallel, and/or the (PGA) memory is not sufficient anymore?

 

The (query) performance can drop immediately and most of you know the statement: "It is so slow, but we did not change anything"

So let's take a look at the impact of such inefficiently written queries in the case of hash join(s).

 

The demo

The following demos were run on an Oracle 11.2.0.3 database on OEL 6.2.

I will create two sample tables called TAB1 and TAB2, which are copies of DBA_OBJECTS and contain 74,553 rows each. These tables will be used for the hash join demo, but there will be no "smaller" data set in this example, because no filter condition is applied. So Oracle will choose the "smaller" hash table based on the number of selected columns.

 

I will run various SELECTs, each returning the same number of rows, but with a different number of columns. Please check my previous blogs for more details about the gather_plan_statistics hint if you are not familiar with it. The execution plan statistics have a few more columns now, which I have not mentioned in my previous blog posts. So let's start with them first.

 

  • 0Mem - Estimated memory (PGA) size for an optimal execution (in memory)
  • 1Mem - Estimated memory (PGA) size for a one-pass execution (dump data once to the temporary tablespace)
  • Used-Mem - Actual memory used by the work area during the last execution

 

Creating two test data sources

SQL> create table TAB1 as select * from dba_objects;
SQL> create table TAB2 as select * from dba_objects;

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(NULL,'TAB1');
SQL> exec DBMS_STATS.GATHER_TABLE_STATS(NULL,'TAB2');

 

 

"SELECT *" example (SQL ID 3dqxu675n4p2k)

SQL> select /*+ gather_plan_statistics HASH-TEST */ *
       from TAB1 T_00 join TAB2 T_01
         on T_00.OBJECT_NAME = T_01.OBJECT_NAME;

3dqxu675n4p2k.png

  • Table TAB1 is used as the data source for the in-memory hash table (in this case it makes no difference which table is the source, because we select all columns from both tables and join on column OBJECT_NAME).
  • The in-memory hash table is organized by the join column OBJECT_NAME and contains all columns of table TAB1.
  • Rows from table TAB2 are incrementally loaded into memory and probed against the hash value of column OBJECT_NAME (after the complete hash table of TAB1 has been built up).
  • 13 MB of PGA memory is used for that operation.

 

"SELECT TAB1.<COL>, TAB1.<COL>, TAB1.<COL>, TAB2.<COL>" example (SQL ID 61y4ny5c9w034)

SQL> select /*+ gather_plan_statistics HASH-TEST-COL */ T_00.OWNER, T_00.OBJECT_NAME, T_00.CREATED, T_01.STATUS
       from TAB1 T_00 join TAB2 T_01
         on T_00.OBJECT_NAME = T_01.OBJECT_NAME;

61y4ny5c9w034.png

  • Table TAB2 is used as the data source for the in-memory hash table. This approach is reasonable, because it needs much less memory for only two columns (join column OBJECT_NAME and STATUS) instead of four columns (join column OBJECT_NAME plus OWNER, OBJECT_NAME and CREATED).
  • The in-memory hash table is organized by the join column OBJECT_NAME and contains column STATUS of table TAB2.
  • Rows from table TAB1 are incrementally loaded into memory and probed against the hash value of column OBJECT_NAME (after the complete hash table of TAB2 has been built up).
  • 7.6 MB of PGA memory is used for that operation.

 

"SELECT TAB1.<COL>, TAB2.*" example (SQL ID 8tdbtj560y5j0)

SQL> select /*+ gather_plan_statistics HASH-TEST-COL-TAB1 */ T_00.OWNER, T_00.OBJECT_NAME, T_00.CREATED, T_01.*
       from TAB1 T_00 join TAB2 T_01
         on T_00.OBJECT_NAME = T_01.OBJECT_NAME;

8tdbtj560y5j0.png

  • Table TAB1 is used as the data source for the in-memory hash table. This approach is reasonable, because it needs much less memory for only three columns (join column OBJECT_NAME plus OWNER and CREATED) instead of all columns of table TAB2.
  • The in-memory hash table is organized by the join column OBJECT_NAME and contains the columns OWNER and CREATED of table TAB1.
  • Rows from table TAB2 are incrementally loaded into memory and probed against the hash value of column OBJECT_NAME (after the complete hash table of TAB1 has been built up).
  • 7.6 MB of PGA memory is used for that operation.

 

"SELECT TAB1.<COL>, TAB2.<COL>" example (SQL ID 2xpkj8w6h5djd)

SQL> select /*+ gather_plan_statistics HASH-TEST-2COL-TAB1 */ T_00.OWNER, T_01.OBJECT_NAME
       from TAB1 T_00 join TAB2 T_01
         on T_00.OBJECT_NAME = T_01.OBJECT_NAME;

2xpkj8w6h5djd.png

  • Table TAB2 is used as the data source for the in-memory hash table. This approach is reasonable, because it needs much less memory for only one column (join column OBJECT_NAME) instead of two columns (join column OBJECT_NAME and OWNER).
  • The in-memory hash table is organized by the join column OBJECT_NAME and contains no additional columns.
  • Rows from table TAB1 are incrementally loaded into memory and probed against the hash value of column OBJECT_NAME (after the complete hash table of TAB2 has been built up).
  • 6.6 MB of PGA memory is used for that operation.

 

SQL> select SQL_ID, CHILD_NUMBER,
            round(LAST_MEMORY_USED/1024/1024,2) LAST_MEMORY_USED_MB,
            round(LAST_TEMPSEG_SIZE/1024/1024,2) LAST_TEMPSEG_SIZE_MB,
            round(MAX_TEMPSEG_SIZE/1024/1024,2) MAX_TEMPSEG_SIZE_MB,
            MULTIPASSES_EXECUTIONS, ONEPASS_EXECUTIONS, OPTIMAL_EXECUTIONS,
            TOTAL_EXECUTIONS, OPERATION_TYPE, OPERATION_ID
       from V$SQL_WORKAREA
      order by LAST_MEMORY_USED desc, TOTAL_EXECUTIONS desc;

WORKAREA.png

This query output is just another way to check the work area usage for different SQL statements (based on the cached SQLs in the shared pool).

 

Summary

As you can see, the memory consumption of hash joins is mainly driven by the amount of data that needs to be stored in the hash table. In our example the worst case (all columns) was roughly 13 MB and the best case (the join column only) was roughly 6.6 MB of PGA memory.

 

Now you may think: "That's not much memory - why should we consider this at all?" But you have to see this behavior in context:

  • This was just a demo with a small table of 15 columns and 74,553 rows. Not all columns contain data, unlike the tables in the SAP schema (with default values and not nullable columns).
  • I was the only user running this query on this tiny amount of data - just scale this up for a large (data) environment with multiple sessions running this query.
  • CPU resources are needed for creating and probing the hash table.
  • You assign "a lot of" memory to the PGA (usually 20% for OLTP and 40% for OLAP systems), but a single work area is not able to use all of this memory (for the details check the PGA Memory Management Internals document).
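The last bullet can be made concrete with a rule of thumb that floats around for serial work areas. The real limits come from hidden, version-dependent parameters, so treat the 5% figure below as an illustration, not a guarantee:

```python
# Illustration only: a commonly cited, undocumented rule of thumb says a
# single serial work area is capped at roughly 5% of PGA_AGGREGATE_TARGET
# (the actual limits depend on hidden, version-dependent parameters).
def serial_workarea_cap_mb(pga_aggregate_target_mb):
    return pga_aggregate_target_mb * 0.05

# Even with a 10 GB PGA target, one hash join work area gets ~512 MB at most.
print(serial_workarea_cap_mb(10 * 1024))
```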

 

I hope you got the key points here and why it is important to select only the needed data from the database in the case of a hash join.

 

"SELECT only the data you need and use the database engine as effectively as possible."

 

If you have any further questions, please feel free to ask or get in contact directly if you need assistance with troubleshooting Oracle database (performance) issues.

 


[Oracle] SQL*Net researching - Setting Session Data Unit (SDU) size and how it can go wrong


Introduction

I come across this SDU issue from time to time while doing Oracle database consulting work, so I think it is worth writing a blog about it. Basically this blog is about how to verify whether the SDU setting is considered at all. It also covers some basics and SAP suggestions.

 

Let's start with an explanation and some SAP information about the SDU before we go on with researching and testing several settings.

 

Official Oracle 11g R2 documentation

Under typical database configuration, Oracle Net encapsulates data into buffers the size of the session data unit (SDU) before sending the data across the network. Oracle Net sends each buffer when it is filled, flushed, or when an application tries to read data. Adjusting the size of the SDU buffers relative to the amount of data provided to Oracle Net to send at any one time can improve performance, network utilization, and memory consumption. When large amounts of data are being transmitted, increasing the SDU size can improve performance and network throughput.


The amount of data provided to Oracle Net to send at any one time is referred to as the message size. Oracle Net assumes by default that the message size will normally vary between 0 and 8192 bytes, and infrequently, be larger than 8192 bytes. If this assumption is true, then most of the time, the data is sent using one SDU buffer.


The SDU size can range from 512 bytes to 65535 bytes. The default SDU for the client and a dedicated server is 8192 bytes. The default SDU for a shared server is 65535 bytes.


The actual SDU size used is negotiated between the client and the server at connect time and is the smaller of the client and server values. Configuring an SDU size different from the default requires configuring the SDU on both the client and server computers, unless you are using shared servers. For shared servers, only the client value must be changed because the shared server defaults to the maximum value.

 

 

SAP Information / documentation


  • SAPnote #562403 - FAQ: Oracle Net

What do the contents of tnsnames.ora look like?

For example:

<sid>.WORLD= (DESCRIPTION = (SDU = 32767) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = IPC) (KEY = <sid>)) (ADDRESS = (PROTOCOL = TCP) (HOST = <host>) (PORT = <port>))) (CONNECT_DATA = (SID = <sid>)))

 

..  contains the SDU (Session Data Unit) that defines the size of the data packets that were sent at session level. The maximum value is 32767 (Oracle 11.2.0.2 or higher: 65535). The larger the value selected, the fewer the number of packages that have to be sent when larger volumes of data are exchanged.

 

What do the contents of listener.ora look like?

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SDU = 32767)
      (SID_NAME = <sid>)
      (ORACLE_HOME = /oracle/<sid>/817_64)))

 

... contains the SDU that is also contained in tnsnames.ora.

 

What must I take into account with regard to network performance between SAP and Oracle?

The smaller the SDU set in listener.ora or tnsnames.ora, the less data can be transferred per network communication. To avoid unnecessary network communications when there are large volumes of data, you can increase the SDU in listener.ora and tnsnames.ora to 32767 (Oracle 11.2.0.2 or higher: 65535); see the sample files above. However, note that a higher SDU value may result in overhead when smaller volumes of data are transferred.
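The trade-off described above can be made concrete with a quick back-of-the-envelope calculation. This is a simplified model that ignores protocol headers and per-packet overhead, and the 1 MB payload is just an assumed example, not a value from the SAP note:

```python
import math

def sdu_buffers(payload_bytes, sdu_size):
    # number of SDU buffers Oracle Net needs to send the payload
    # (simplified: ignores per-packet protocol overhead)
    return math.ceil(payload_bytes / sdu_size)

payload = 1_000_000  # hypothetical 1 MB result set

assert sdu_buffers(payload, 8192) == 123   # default SDU
assert sdu_buffers(payload, 32767) == 31   # increased SDU
```

Roughly four times fewer buffers for the same payload, which is where the throughput gain for large transfers comes from.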

 

As of Oracle 9.2.0.4 or 10g, you can also adjust the default SDU size. To do this, you must set the DEFAULT_SDU_SIZE=<size_in_byte> parameter in sqlnet.ora. The maximum value is 32767 (Oracle 11.2.0.2 or higher: 65535).

 

  • SAPnote #618868 - FAQ: Oracle performance

It also makes sense to set the maximum SDU size in tnsnames.ora and listener.ora and, if possible, to define the maximum DEFAULT_SDU_SIZE in sqlnet.ora (refer to Note 562403).


  • SAPnote #1100926 - FAQ: Network performance

When you use an Oracle database, ensure in accordance with Note 562403 that the Oracle Net configuration (TCP.NODELAY, SDU, DEFAULT_SDU_SIZE, and so on) is optimal.

 

 

That is a lot of information to start with, but it covers many important topics.

 

Basically we need to remember the following key-points:

  • Setting the SDU size to a higher value can improve performance by avoiding unnecessary network communication and providing better throughput when processing large amounts of data
  • The SDU size is negotiated during the connect handshake between the client and the server (listener), and the lower of the two values is used
  • According to the SAP notes, setting DEFAULT_SDU_SIZE is sufficient (by the way, DEFAULT_SDU_SIZE is already set to 32768 by SAPinst)
  • You can set it to a maximum of 32768 or 65535 bytes, depending on the Oracle release (server and client)
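The negotiation rule from the second key point can be sketched in one line. This is a model of the documented behavior, not Oracle's actual implementation:

```python
def negotiate_sdu(client_sdu, server_sdu):
    # the negotiated SDU is the smaller of the two offers exchanged
    # during the connect handshake
    return min(client_sdu, server_sdu)

assert negotiate_sdu(8192, 8192) == 8192     # both sides at the default
assert negotiate_sdu(32768, 8192) == 8192    # only the client increased its SDU
assert negotiate_sdu(32768, 32768) == 32768  # both sides increased
```

The middle case is exactly what the test cases below will demonstrate: raising the client value alone does not help.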

 

The research

My test environment is an Oracle database server and client (11.2.0.3.2) on OEL 6.2. It is important to mention both components, because I found that different supported combinations (e.g. an Oracle 11g R2 server with a 10g R2 client) behave differently. However, this research is focused on how to verify the negotiated SDU size.

 

I want to provide some information about the listener trace content before we take a look at it:

In the first section you can see the connection data passed in by the client, and in the last section you see the line where the server reports what it can do. If two nsconneg lines show two different values for the SDU size, the client and server have each stated what they can do, and the final result is the lower of the two offers.

 

Default setting (DEFAULT_SDU_SIZE and SDU parameters are not set in tnsnames.ora / listener.ora)

shell> lsnrctl set trc_level admin
shell> lsnrctl show trc_file
LISTENER parameter "trc_file" set to ora_1920_139698349319936.trc
shell> sqlplus system/<PASS>@<SID>

default_sdu.png

... An SDU size of 8192 bytes is used as the default if nothing is set explicitly. This matches the official Oracle documentation and works as expected.

 

 

Setting DEFAULT_SDU_SIZE to 32768 in sqlnet.ora (SDU parameters are not set in tnsnames.ora / listener.ora)

shell> lsnrctl stop
shell> lsnrctl start
shell> lsnrctl set trc_level admin
shell> lsnrctl show trc_file
LISTENER parameter "trc_file" set to ora_1985_139887209207552.trc
shell> sqlplus system/<PASS>@<SID>

DEFAULT_SDU_SIZE.png

... An SDU size of 32768 bytes is requested by the client, but the server can only work with 8192 bytes, so, as documented, only 8192 bytes are used in this case.

 

 

Setting DEFAULT_SDU_SIZE to 32768 in sqlnet.ora and SDU parameter in listener.ora  (SDU parameter is not set in tnsnames.ora)

The following listener.ora setting is done according to SAPnote #562403:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SDU = 32768)
      (SID_NAME = <sid>)
      (ORACLE_HOME = /oracle/<sid>/<vers>)))

shell> lsnrctl stop
shell> lsnrctl start
shell> lsnrctl set trc_level admin
shell> lsnrctl show trc_file
LISTENER parameter "trc_file" set to ora_2139_140549120628480.trc
shell> sqlplus system/<PASS>@<SID>

DEFAULT_SDU_SIZE_SDU.png

... An SDU size of 32768 bytes is requested by the client, but the server can only work with 8192 bytes, so only 8192 bytes are used in this case. But what is wrong here? We followed the second SAP instruction (the first suggestion is to set DEFAULT_SDU_SIZE, the second is to set the SDU parameter in listener.ora as well), but it still does not work as expected.

 

Setting DEFAULT_SDU_SIZE to 32768 in sqlnet.ora and SDU parameters in listener.ora and tnsnames.ora

This combination is not tested, because the client already requests an SDU size of 32768 bytes, and setting it once again in tnsnames.ora makes no difference at all.

 

Setting DEFAULT_SDU_SIZE to 32768 in sqlnet.ora and SDU parameter in listener.ora  (SDU parameter is not set in tnsnames.ora) - Part II

Let's try another configuration that is not mentioned in the Oracle or SAP documentation:

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (SDU = 32768)
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP) (HOST = <host>) (PORT = <port>)))))

shell> lsnrctl stop
shell> lsnrctl start
shell> lsnrctl set trc_level admin
shell> lsnrctl show trc_file
LISTENER parameter "trc_file" set to ora_2245_139786876253952.trc
shell> sqlplus system/<PASS>@<SID>

 

DEFAULT_SDU_SIZE_SDU_CUST.png

... An SDU size of 32768 bytes is requested by the client, and the server is able to work with it as well. Now an SDU size of 32768 bytes is used, as intended.

 

Summary

This blog is not meant to provide a "recommended SDU setting" - it is much more about how you can verify the implemented settings (by SAP) on your own. Unfortunately, the information provided by Oracle and SAP no longer seems to be correct. Maybe the mentioned settings worked in older Oracle versions (just check the mentioned ORACLE_HOME version in SAPnote #562403), but nowadays it seems to have become even more complex. In any case, the default sqlnet.ora configuration deployed by SAPinst is not sufficient on its own for increasing the SDU size.

 

Don't believe everything you read - test and verify it on your own.

 

 

If you have any further questions, please feel free to ask, or get in contact directly if you need assistance with troubleshooting Oracle database issues.

 

References

[Oracle] DB Optimizer Part IV - What the heck ... Troubleshooting why hints are not considered


Introduction

Today I received an e-mail from one of my clients about an issue with Oracle hints. I get asked similar questions from time to time, so I think it makes a good next topic in my "DB optimizer" series. This blog focuses on how to verify that an Oracle hint is considered during execution plan compilation. It provides a brief introduction to Oracle hints as well, but there is already a lot of information out there about hints and I don't want to spend too much time on it. This blog is not about providing recommendations for using hints or any particular hint syntax.

 

The Oracle Hint

Let's check the official Oracle documentation first.

 

A hint is an instruction to the optimizer. When writing SQL code, you may know information about the data unknown to the optimizer. Hints enable you to make decisions normally made by the optimizer, sometimes causing the optimizer to select a plan that it sees as higher cost.

 

Optimizer hints are grouped into the following categories:

    • Hints for Optimization Approaches and Goals
    • Hints for Enabling Optimizer Features
    • Hints for Access Paths
    • Hints for Join Orders
    • Hints for Join Operations
    • Hints for Online Application Upgrade
    • Hints for Parallel Execution
    • Hints for Query Transformations
    • Additional Hints

 

The important word in the previous text is "decision", but unfortunately it often seems to be misinterpreted or skipped entirely. So what does it mean?

Let's assume that you want to use a full table scan on a table and you provide the FULL hint for that. This does not mean "I want to use a full table scan, so search for and generate an execution plan with it". It basically means "If you have to decide between an index and a full table scan - use the full table scan". So you are able to influence the optimizer with Oracle hints only when it has a valid choice, not unconditionally.
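This "decision" semantics can be illustrated with a toy model. It is purely illustrative - the real optimizer works on costed plan trees, and the path names and costs below are made up:

```python
def choose_access_path(valid_paths, costs, hint=None):
    # a hint only steers the choice among access paths the optimizer
    # already considers valid; it cannot force an invalid path
    if hint is not None and hint in valid_paths:
        return hint
    # without an applicable hint, pick the cheapest valid path
    return min(valid_paths, key=lambda p: costs[p])

costs = {"INDEX RANGE SCAN": 1, "FULL TABLE SCAN": 3}

# both paths are valid, so the FULL hint wins despite its higher cost
assert choose_access_path(list(costs), costs, "FULL TABLE SCAN") == "FULL TABLE SCAN"

# if a full table scan is not a valid choice, the hint is simply ignored
assert choose_access_path(["INDEX RANGE SCAN"], costs, "FULL TABLE SCAN") == "INDEX RANGE SCAN"
```

The second assertion mirrors the key point: the hint influences a decision, it does not create one.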

 

You can query the view V$SQL_HINT, if you want to get a list of the available hints (with Oracle 11g R1 or newer).

 

The research

The following demo is run on an Oracle 11.2.0.3 database with OEL 6.2. I will use a pretty simple example for demonstrating and troubleshooting Oracle hints.

Basically, I will create a table with one (nullable) column and an index on it, so that the optimizer has a valid choice (except for the IS NULL predicate).

 

SQL> create table HTAB (a number);
SQL> create index HTABI on HTAB(a);
SQL> begin
       for i in 1 .. 300 loop
         insert into HTAB values(i);
       end loop;
     end;
     /
SQL> commit;

 

So let's run a simple SQL without any hint and check the execution plan.

default.png

 

... so far so good. The index HTABI is used, but we want to "force" a full table scan. So let's use the FULL hint for that.

full.png

 

... perfect. We run a full table scan now. But how can we be sure that this execution plan was compiled due to our provided hint and not due to object changes like dropped indexes (just think about BW environments), newly collected statistics, or anything else?

 

The answers are:

  • CBO trace (Oracle event 10053)
  • SQL plan dumping (Oracle event 10132)
  • PL/SQL procedure DBMS_SQLDIAG.DUMP_TRACE

 

shell> oerr ora 10053
10053, 00000, "CBO Enable optimizer trace"
// *Cause:
// *Action:

shell> oerr ora 10132
10132, 00000, "dump plan after compilation"
// *Cause:
// *Action:  set this event only under the supervision of Oracle development

 

Now let's flush the shared pool and use one of these possibilities (flushing the shared pool forces a hard parse of the same SQL).

 

SQL> alter system flush shared_pool;
SQL> alter session set events '10053 trace name context forever, level 1';
SQL> select /*+ FULL(HTAB) */ * from HTAB where a = 100;
SQL> alter session set events '10053 trace name context off';
SQL> oradebug setmypid
SQL> oradebug tracefile_name
/oracle/T11/oratrace/diag/rdbms/t11/T11/trace/T11_ora_1565.trc

 

Search for "atom_hint" in this trace file and you will find the following section.

atom_hint_01.png

You can verify the following things in that trace file:

  • If a hint was detected (otherwise it is treated like a usual comment section and you will find no "atom_hint" entry at all)
  • If its syntax was ok
  • If it was used

 

In our previous case we can see that the hint was detected, that the syntax was correct, and that it was used.

Now let's add a conflicting INDEX hint to the SQL and verify it once again.

 

SQL> alter session set events '10053 trace name context forever, level 1';
SQL> select /*+ FULL(HTAB) INDEX (HTAB HTABI) */ * from HTAB where a = 100;
SQL> alter session set events '10053 trace name context off';
SQL> oradebug setmypid
SQL> oradebug tracefile_name
/oracle/T11/oratrace/diag/rdbms/t11/T11/trace/T11_ora_1780.trc

 

indx_01.png

... so we might think our hint worked (and the FULL hint was disabled / overridden because it was provided first), as we run an index range scan now - but let's check the trace file for verification.

atom_hint_02_err.png

So in this case the execution plan was compiled without any hint influence (like the first example without any hint at all), because the optimizer was not able to resolve the conflicting instructions and therefore used neither of the hints.
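The observed behavior can be mimicked with another small toy model (again purely illustrative, not Oracle's implementation): conflicting directives for the same table invalidate all of them.

```python
def applicable_hints(hints):
    # hints is a list of (directive, table) pairs; if two hints give
    # conflicting directives for the same table, the optimizer cannot
    # interpret them as reasonable and ignores all of them
    per_table = {}
    for directive, table in hints:
        per_table.setdefault(table, set()).add(directive)
    if any(len(directives) > 1 for directives in per_table.values()):
        return []
    return hints

assert applicable_hints([("FULL", "HTAB")]) == [("FULL", "HTAB")]
assert applicable_hints([("FULL", "HTAB"), ("INDEX", "HTAB")]) == []  # conflict: both dropped
```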

 

Summary

There are many possible reasons why hints are ignored or not considered, but with this troubleshooting procedure and the trace files you are now able to find the root cause.

 

If you have any further questions, please feel free to ask, or get in contact directly if you need assistance with troubleshooting Oracle database issues.

 

References

[Oracle] Unchain database and SQL performance by design (Row Chaining)


Introduction

Have you ever extended an SAP standard table with custom appends or includes and never thought about its performance impact (except for data growth)? Then this blog should be worth reading. Having worked several years in the SAP and Oracle performance area, I have seen a bunch of such implementations, and most of the time it was not well known that extending an Oracle table (beyond a specific limit) can have nasty side effects.

 

This blog will focus on the "Row Chaining" effect, which always occurs beyond a specific implementation point, with a short side trip to some of the used data types, which can intensify this effect as well. I will not discuss the "Row Migration" effect, which can easily be mixed up with "Row Chaining", because it can be solved by adjusting PCTFREE accordingly and reorganizing the table once - so not a real big deal. "Row Chaining" is much more difficult to avoid.

 

Data type usage

This blog section is just a short side trip. You are usually able to specify the column data type when you add custom appends or includes to an existing SAP (standard) table. Let's briefly compare two possible Oracle data types and the space needed for storing the same data.

 

I have no idea why SAP uses the data type VARCHAR2 for numeric-only values (like MATNR, for example) and pads the content with zeros, or why it does not make use of NULL values (instead of using a space character as the default value). Maybe this implementation is due to historical or compatibility reasons or whatever - I have no clue.


Let's verify the effect with VSIZE by checking the space usage in data blocks for a numeric-only column like MATNR, using the default data type VARCHAR2(54).

 

SQL> create table VSIZET (MATNR_VARCHAR VARCHAR2(54),
                          MATNR_NUMBER_LZERO NUMBER,
                          MATNR_NUMBER NUMBER);
SQL> insert into VSIZET values ('000000002403870025', 000000002403870025, 2403870025);

SQL> select vsize(MATNR_VARCHAR),
            vsize(MATNR_NUMBER_LZERO),
            vsize(MATNR_NUMBER)
     from VSIZET;

VSIZE(MATNR_VARCHAR) VSIZE(MATNR_NUMBER_LZERO) VSIZE(MATNR_NUMBER)
-------------------- ------------------------- -------------------
                  18                         6                   6

 

The VARCHAR2 data type needs three times more space than the NUMBER data type (without leading zeros) to store the same information. Since we only need to store numeric values in column MATNR, we should go for the NUMBER approach. So be careful when choosing the ABAP DDIC data type for your data (and the corresponding data type in the Oracle database). This is not always possible (e.g. if you need to join to an existing SAP table on the Oracle database layer that already uses a VARCHAR2 data type), but consider it if the custom information is independent. The wrong data type increases the amount of needed space and, in consequence, migrated or chained rows.
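The VSIZE results can be approximated with a rough model of the two encodings. This sketch assumes positive integers and a single-byte character set; Oracle's real NUMBER format has more cases (negative values, decimals), so treat it as an illustration only:

```python
def varchar2_vsize(value):
    # VARCHAR2 stores one byte per character in a single-byte charset
    return len(value)

def number_vsize(value):
    # simplified NUMBER model: one exponent byte plus one mantissa byte
    # per base-100 digit pair, with trailing zero pairs trimmed
    digits = str(int(value))            # leading zeros vanish here
    if len(digits) % 2:
        digits = "0" + digits           # align pairs to the decimal point
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    while pairs and pairs[-1] == "00":
        pairs.pop()
    return 1 + len(pairs)

assert varchar2_vsize("000000002403870025") == 18
assert number_vsize(2403870025) == 6
assert number_vsize("000000002403870025") == 6  # leading zeros cost nothing
```

The model reproduces the 18 vs. 6 bytes from the VSIZE query above.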

 

Row Chaining

Let's check the official Oracle documentation about "Row Chaining" first:

 

Oracle Database must manage rows that are too large to fit into a single block. The following situations are possible:

    • The row is too large to fit into one data block when it is first inserted.

In row chaining, Oracle Database stores the data for the row in a chain of one or more data blocks reserved for the segment. Row chaining most often occurs with large rows. Examples include rows that contain a column of data type LONG or LONG RAW, a VARCHAR2(4000) column in a 2 KB block, or a row with a huge number of columns. Row chaining in these cases is unavoidable.

 

    • A row has more than 255 columns.

Oracle Database can only store 255 columns in a row piece. Thus, if you insert a row into a table that has 1000 columns, then the database creates 4 row pieces, typically chained over multiple blocks.

 

When a row is chained or migrated, the I/O needed to retrieve the data increases. This situation results because Oracle Database must scan multiple blocks to retrieve the information for the row. For example, if the database performs one I/O to read an index and one I/O to read a nonmigrated table row, then an additional I/O is required to obtain the data for a migrated row.
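The piece count in the documentation's 1000-column example follows directly from the 255-column limit (the trailing columns are split off first, as the block dump later in this blog shows):

```python
import math

def row_pieces(column_count, piece_limit=255):
    # Oracle stores at most 255 columns per row piece
    return math.ceil(column_count / piece_limit)

assert row_pieces(255) == 1    # still a single row piece
assert row_pieces(256) == 2    # chained into two pieces
assert row_pieces(1000) == 4   # the documented example
```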

 

The first root cause ("The row is too large to fit into one data block when it is first inserted") rarely appears in SAP environments, as LONG RAW columns are not recommended anymore. However, it can also happen with LOBs if the stored data is less than 4000 bytes and in-line LOBs are enabled, or with a row whose many columns need the corresponding space.

 

The second root cause ("A row has more than 255 columns") is much more common when SAP standard tables are extended with custom columns for additional business logic. In my experience, tables like VBAP or LIPS with 300 or more columns (through custom extensions) are very common.

 

Such implementations have two nasty side effects that intensify each other, as both result in additional I/Os.

  1. Row chaining occurs for every stored row
  2. Oracle Advanced Compression can not be used on tables with more than 255 columns (SAPnote #1431296)

 

Fast-growing and frequently accessed SAP core tables can not be compressed (to reduce logical and physical I/O) and, in consequence, have chained rows that force additional I/Os. Now just think about a table like VBAP, which is queried at high frequency by the SAP standard business logic, and imagine you have extended such a table over this limit. As your system keeps growing, you may notice the performance impact more and more with every SELECT or DML statement that needs columns from both "row pieces" (even if you access VBAP by primary key most of the time).

 

Here is a graphic of a chained row before we go on with a short demonstration of its effect on logical and physical I/Os.

row_chaining.gif

 

In this case there was not enough space to store the second row piece (e.g. more than 255 columns) in the same block, so it had to be stored in another block. All row pieces of the next inserted row could be stored in the free space of the right block (in this graphic). It is called "intra-block row chaining" if all row pieces are stored in the same block. Chained rows can be caused by INSERTs or UPDATEs (and migrated at the same time); migrated rows, on the other hand, are caused by UPDATEs only.

 

Demonstration

The following demo was done on an Oracle 11.2.0.3 database on OEL 6.2.

I will create two tables (one with 255 and one with 256 columns), each with a unique index, to demonstrate the "row chaining" effect. Several rows are inserted after the tables are created.

 

SQL> create table TAB_255 ( C1 VARCHAR2(10), C2 VARCHAR2(10) DEFAULT 'AAA', ... , C255 VARCHAR2(10) DEFAULT 'AAA');
SQL> create unique index TAB_255_I on TAB_255(C1);

SQL> create table TAB_256 ( C1 VARCHAR2(10), C2 VARCHAR2(10) DEFAULT 'AAA', ... , C256 VARCHAR2(10) DEFAULT 'AAA');
SQL> create unique index TAB_256_I on TAB_256(C1);

SQL> begin
       for i in 1..1000 loop
         insert into TAB_255(C1) values(i);
         insert into TAB_256(C1) values(i);
       end loop;
       commit;
     end;
     /

SQL> exec dbms_stats.gather_table_stats(NULL, 'TAB_255');
SQL> exec dbms_stats.gather_table_stats(NULL, 'TAB_256');

 

SQL> select /*+ gather_plan_statistics */ * from TAB_255 where C1 = '200';

TAB_255_un.png

SQL> select /*+ gather_plan_statistics */ * from TAB_256 where C1 = '200';

TAB_256_un.png

We performed the same amount of physical I/O here (with an empty buffer cache), but one more logical I/O was needed for the table with 256 columns. This additional I/O is caused by intra-block row chaining. In such a case we don't need to read an additional physical block, because all row pieces are stored in the same data block.

 

Let's check a block dump of table TAB_256 for verification:

tab 0, row 0, @0x1b71
tl: 1023 fb: -----L-- lb: 0x1  cc: 255
col  0: [ 3]  41 41 41
....
col 253: [ 3]  41 41 41
col 254: [ 3]  41 41 41
tab 0, row 1, @0x1b64
tl: 13 fb: --H-F--- lb: 0x1  cc: 1
nrid:  0x0041a7a5.0
col  0: [ 3]  39 38 35

 

This verifies that every row needs two row pieces: one row piece with just one column (C1 in our case) and one row piece with the other 255 columns.

 

Now you may wonder "just one more logical I/O - why should I worry about this?", but think about index range scans that extract many rowids from one or more leaf blocks and the corresponding table accesses with logical I/Os only (in the best case) or additional physical I/Os (in the worst case).

 

Unfortunately, intra-block chaining is not tracked by the "table fetch continued row" statistic in my previous test scenario, but you can use the following query in your productive environment. In such scenarios, chained rows (over several blocks) will occur over time due to archiving and usual data processing.

SQL> select s.name, m.value
     from v$statname s, v$mystat m
     where s.statistic# = m.statistic#
       and s.name in ('table fetch continued row', 'table scan rows gotten', 'table fetch by rowid')
     order by s.name;

 

 

However, we can check the extra work that needs to be performed by a full table scan on TAB_256 in this test scenario.

SQL> select s.name, m.value
     from v$statname s, v$mystat m
     where s.statistic# = m.statistic#
       and s.name in ('table fetch continued row', 'table scan rows gotten', 'table fetch by rowid')
     order by s.name;

NAME                                  VALUE
--------------------------------- ---------
table fetch by rowid                      0
table fetch continued row                 0
table scan rows gotten                   72

SQL> select * from TAB_256;
...
1000 rows selected.

SQL> select s.name, m.value
     from v$statname s, v$mystat m
     where s.statistic# = m.statistic#
       and s.name in ('table fetch continued row', 'table scan rows gotten', 'table fetch by rowid')
     order by s.name;

NAME                                  VALUE
--------------------------------- ---------
table fetch by rowid                      0
table fetch continued row               166
table scan rows gotten                 1072

 

The counter "table fetch continued row" is not increased and remains zero if you execute the same procedure on table TAB_255. But what does this statistic mean in our example? The table itself has only 256 blocks allocated (with the HWM at 168 blocks), but we needed 166 extra data block accesses (logical or physical I/O) to find the individual row pieces when fetching 1000 rows from table TAB_256.
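Since v$mystat values are cumulative, only the deltas between the two snapshots are meaningful; the arithmetic behind the numbers above is simply:

```python
# the two v$mystat snapshots from the output above
before = {"table fetch by rowid": 0, "table fetch continued row": 0,
          "table scan rows gotten": 72}
after = {"table fetch by rowid": 0, "table fetch continued row": 166,
         "table scan rows gotten": 1072}

delta = {name: after[name] - before[name] for name in after}

assert delta["table scan rows gotten"] == 1000    # the 1000 rows we fetched
assert delta["table fetch continued row"] == 166  # extra block visits for row pieces
assert delta["table fetch by rowid"] == 0         # full scan, no rowid fetches
```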

 

Summary

Be careful if you want to extend an SAP table that already has a lot of columns (close to 255, 510, ...). Adding custom columns to such tables can impact performance by increasing logical or physical I/O. Also be aware that you can no longer use Oracle Advanced Compression if you extend an SAP table past the 255 column limit. Both effects amplify each other in large SAP systems.

 

Sometimes it is better to create a custom "Z*" table and join it to the SAP standard table(s) if you need to implement custom business logic that is used less frequently than the standard logic. Running a little more logical or physical I/O in just a few cases is better than doing so everywhere.

 

If you have any further questions, please feel free to ask, or get in contact directly if you need assistance with troubleshooting Oracle database performance issues.

 

References

[Oracle] The importance of a correct (performance) test environment and why LOBs don't generate the expected redo content


Introduction

In the past two weeks I was involved in an Oracle database and OS platform scalability and performance test for a non-SAP banking application. The application uses basic LOBs for storing XML messages, and from time to time the end users notice a performance issue when writing the LOB data.

 

My client built up a (performance) test environment with the same Oracle database and OS version to simulate a large write load on basic LOBs with different settings, but the test results did not correlate with production and did not seem logical at all. So we started to discuss the environment, the LOB settings in production, and how the test environment was built. This blog focuses on a special case regarding basic LOBs and their "redo generation" in both environments.

 

 

The LOB definition / issue

The application schema was deployed by the software vendor of the banking / finance application. It uses the default storage settings for basic LOBs (in our case the important ones are "nocache" and enabled in-line storage). The XML message size varies between just a few bytes and more than 4000 bytes, so both in-line and out-of-line storage are used.

 

The LOB data is written via direct path writes (nocache clause), which store the LOB data in its "external" LOB segment (out-of-line storage for data larger than 4000 bytes). I immediately thought about caching the LOB content and eliminating the bottleneck of direct path writes in such a response-time-critical environment, but my client mentioned that they wanted to benefit from the reduced redo generated for directly written LOB data. They provided me with a PDF document from an Oracle consultancy which basically states that you generate almost no redo for out-of-line stored LOBs when using the nocache clause. I immediately doubted that statement, but the document includes a test case, and the results were also reproducible in the (performance) test environment. I did not understand how Oracle would be able to recover such LOB data if no "full" redo record was written for it. So I decided to do further research in my virtual environment with the same database settings as the client.

 

 

The LOB test case

The following tests were run with an Oracle 11.2.0.3 database on OEL 6.2.

 

 

General database and LOB settings

DB_Settings.png

The database is running in "force logging" mode, which means that DML / DDL operations are recorded in the redo log files even if you specify the nologging clause or hint. The LOB itself is created with its default settings for cache, logging and in-line storage enabled.

 

 

The LOB test data

shell> dd if=/oracle/T11/11203/bin/oracle of=/tmp/data.dd bs=1024 count=5

SYS@T11> create or replace directory LOBDATA as '/tmp';
SYS@T11> grant read, write on directory LOBDATA to UTEST;

 

I created a binary file of 5120 bytes at OS level that I want to store in the "external" LOB segment. I used this approach because SQL*Plus limits the size of LOB content by design.

 

 

First test case with Oracle database in NOARCHIVELOG mode (like the test environment of the client)

NOARCHIVE_LOG.png

UTEST@T11>
DECLARE
  SRC_FILE BFILE := BFILENAME('LOBDATA', 'data.dd');
  BLOB_POINTER BLOB;
BEGIN
  INSERT INTO TAB_LOB_NOCACHE VALUES (1, EMPTY_BLOB()) RETURNING DATA INTO BLOB_POINTER;
  DBMS_LOB.OPEN(SRC_FILE, DBMS_LOB.LOB_READONLY);
  DBMS_LOB.LOADFROMFILE(dest_lob => BLOB_POINTER,
                        src_lob  => SRC_FILE,
                        amount   => DBMS_LOB.GETLENGTH(SRC_FILE));
  DBMS_LOB.CLOSE(SRC_FILE);
  COMMIT;
END;
/

 

I ran a script called Snapper (by Tanel Poder) in a separate session while this PL/SQL procedure inserted one record into the table / external LOB segment. The following session statistics regarding redo were captured for one insert of roughly 5120 bytes.

redo_noarchivelog.png

Oracle has written only 52 bytes of redo content for storing more than 5000 bytes of LOB data. That sounds like very little, right? So let's dump and verify the redo log content.

 

SYS@T11> select OBJECT_ID, DATA_OBJECT_ID
         from dba_objects
         where OBJECT_NAME = 'LOB_BASIC_NOCACHE';

 OBJECT_ID DATA_OBJECT_ID
---------- --------------
     83384          83388

SYS@T11> select lf.MEMBER
         from v$logfile lf, v$log lo
         where lf.GROUP# = lo.GROUP# and lo.STATUS = 'CURRENT';
MEMBER
-------------------------------------------------------------------------
/oracle/T11/origlog/redo_T11_g2_01.log
/oracle/T11/mirrlog/redo_T11_g2_02.log

SYS@T11> alter system dump logfile '/oracle/T11/origlog/redo_T11_g2_01.log';

redo_content_noarchivelog.png

Let's verify the redo record length (hex 0x0034 = decimal 52) first. You can see that only 52 bytes are used for that redo record, although we inserted much more LOB data (roughly 5120 bytes). The interesting part is the section called "Direct Loader invalidate block range redo entry". Such a redo record is written in case of a nologging operation, but we have defined the LOB, the table and the tablespace with the logging clause, and the database is running in "force logging" mode in addition.

 

We are not able to run a valid scalability and performance test on external LOB segments if the Oracle database in the (performance) test environment is running in NOARCHIVELOG mode. Let's re-run this test in ARCHIVELOG mode for verification.

 

 

Second test case with Oracle database in ARCHIVELOG mode (like the productive environment of the client)

ARCHIVE_LOG.png

I ran a script called Snapper (by Tanel Poder) in a separate session while this PL/SQL procedure inserted one record into the table / external LOB segment. The following session statistics regarding redo were captured for one insert of roughly 5120 bytes.

redo_archivelog.png

Oracle has now written roughly 8272 bytes of redo content for storing 5120 bytes of LOB data. Roughly 8000 bytes are reasonable, as a complete LOB chunk is written by the DML (even if we only store 5120 bytes), and I used the default chunk size of 1 data block (= 8 KB) on my test database. Let's dump and verify the redo log content for completeness.
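The numbers line up with a simple chunk calculation. Note that the difference between 8192 and 8272 being record overhead is my inference from the dump, not a documented constant:

```python
import math

BLOCK_SIZE = 8192   # default LOB chunk size = 1 data block of 8 KB
lob_bytes = 5120    # payload actually stored

# direct-path DML on a nocache LOB logs whole chunks, not just the payload
chunks = math.ceil(lob_bytes / BLOCK_SIZE)
assert chunks * BLOCK_SIZE == 8192

# the redo record lengths from the two dumps, converted from hex
assert int("0034", 16) == 52    # NOARCHIVELOG: invalidation record only
assert int("2050", 16) == 8272  # ARCHIVELOG: full chunk plus record overhead
```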

 

SYS@T11> select lf.MEMBER
         from v$logfile lf, v$log lo
         where lf.GROUP# = lo.GROUP# and lo.STATUS = 'CURRENT';
MEMBER
---------------------------------------------
/oracle/T11/origlog/redo_T11_g3_01.log
/oracle/T11/mirrlog/redo_T11_g3_02.log

SYS@T11> alter system dump logfile '/oracle/T11/origlog/redo_T11_g3_01.log';

 

redo_content_archivelog.png

Let's verify the redo record length again (hex 0x2050 = decimal 8272). You can see that 8272 bytes are used for that redo record even though we only inserted 5120 bytes of data (check the 00 values at the end of the redo record); this works as designed. Now a valid / full redo record has been written and we are able to recover the database to any point in time.

 

 

Thinking about test results

Why does Oracle behave like that? It seems strange from an application point of view, but it makes absolute sense from a database technology point of view. You never have the possibility to perform a point-in-time recovery if you are running the Oracle database in NOARCHIVELOG mode. Oracle needs the redo log content for instance crash recovery only, and the LOB data is already written to the data files in case of direct path writes. Oracle does not need to log a valid / full redo record for such operations (call it an optimization or whatever).

 

I cross-checked my findings with the PDF document and found a footnote stating that the database was running in NOARCHIVELOG mode as well. Unfortunately it seems the author of that paper did not think through his observations and their consequences: no productive Oracle database should run in NOARCHIVELOG mode, and you generate the expected amount of redo (like full chunks) if you are running the Oracle database in ARCHIVELOG mode.

 

However, my client tests its application with cached LOBs now, as it was proven that the original assumption (and its consequence) about less redo with non-cached LOBs was simply wrong for any Oracle database running in ARCHIVELOG mode.

 

 

Summary

You need to set up your test environment exactly the same way as your production system if you want reliable and meaningful test results. Even a seemingly minor setting like the archive log mode (for a test environment) can impact the database behavior drastically. In the test case above (with NOARCHIVELOG mode) you are able to produce much more LOB I/O load than would ever be possible in production. In production you usually hit log buffer / log file sync issues well before reaching the load of such a test environment.

 

If you have any further questions - please feel free to ask, or get in contact directly if you need assistance with troubleshooting Oracle database performance issues.

 

References


Cut down your storage cost with Oracle ACO (Advanced Compression Option)


As a customer and technical lead I was worried about rapid storage growth alongside our organic growth. Every time there was a need to add new storage, my first thought was "how do we control and stop this in the long term". SAP has its own design for saving data at the database level, no matter which database type the customer uses.

Nowadays database vendors offer many options to store the database in compressed form and save $s on storage. In this rapidly growing technological world people say storage is cheaper than anything - yes, it is. But it has its limitations as well: large storage always needs substantial resources to run.

We pay a big number of $s for our storage (this may be customer specific, not true for all), possibly because of low maintenance cost on other things (yes, we pay more for storage than other comparable ones).

SQL 2008 R2 - a new-generation technology with a mature data store procedure delivered an 86.5% compression ratio on each of our BW instances (it was a BIG saving for us)

Oracle 11gR2 - Advanced Compression Option delivered a 66.6% compression ratio (3X) on our SAP R/3 (4.7) instance (again a BIG saving)

 

It's a mature data store procedure that saves data in an almost magical way by creating local symbol tables in which duplicate values disappear. Really innovative thinking.

The result is high savings on storage/cost and a performance improvement because of the reduced database size.

 

Data store mechanism:

Capture.PNG

82% of compression on top 10 tables:

Capture1.PNG

The ACO implementation saved 7+ TB of storage across our landscape (P, Q and DR systems); it reduced the total database size from 2.7 TB to 0.9 TB, and overall performance improved by 10%.
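The quoted ratios follow directly from those sizes; a quick sanity check (the blog's "66.6%" is the truncated form of 66.67%):

```python
before_tb, after_tb = 2.7, 0.9          # total database size before / after ACO

factor = before_tb / after_tb           # the "3X" compression factor
saved_pct = (1 - after_tb / before_tb) * 100
print(f"{factor:.0f}X, {saved_pct:.2f}% saved")   # 3X, 66.67% saved
```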

Nowadays most Oracle customers are investing in ACO for immediate ROI (ACO needs a separate license if your database license is from Oracle).

 

For more information:

1289494 - FAQ: Oracle compression
1109743 - Use of Index Key Compression for Oracle Databases
1436352 - Oracle 11g Advanced Compression for SAP Systems
1431296 - LOB conversion and table compression with BRSPACE

 

 

 

Enjoy

[Oracle] RMAN (backup) performance with synchronous I/O dependent on OS limitations


Introduction

Have you ever switched from an "Oracle flat file database backup" to RMAN and the read I/O throughput for the backup decreased drastically? Have you ever migrated an Oracle database from a platform that supports asynchronous I/O (like Linux or AIX) to a platform that supports synchronous I/O (like HP-UX) only and the read I/O throughput for the backup decreased drastically? .. or have you ever disabled asynchronous I/O due to performance issues and the read I/O throughput for the backup decreased drastically?

 

If you answer one of these questions with "Yes", then this blog is worth reading and will give you some more insights into synchronous I/O and RMAN backups.

 

I came across this issue recently as one of my clients had to disable asynchronous I/O on Solaris with ZFS due to I/O performance issues and high CPU usage. Basically the root cause for the issue looked like this (by running truss on the specific Oracle processes):

14051/1:     kaio(AIOWAIT, 0xFFFFFFFFFFFFFFFF)      Err#22 EINVAL
14051/1:     kaio(AIOWAIT, 0xFFFFFFFFFFFFFFFF)      Err#22 EINVAL
14051/1:     kaio(AIOWAIT, 0xFFFFFFFFFFFFFFFF)      Err#22 EINVAL

 

My client set the initialization parameter "DISK_ASYNCH_IO" to FALSE to avoid such aio errors and perform preads / pwrites only. As a consequence the backup performance dropped drastically and the backup time windows were no longer sufficient.

 

Let's start with an explanation and some SAP information about DISK_ASYNCH_IO before we go on with research and performance measurements.

 

Official Oracle 11g R2 documentation

 

Parameter DISK_ASYNCH_IO

DISK_ASYNCH_IO controls whether I/O to datafiles, control files, and logfiles is asynchronous (that is, whether parallel server processes can overlap I/O requests with CPU processing during table scans). If your platform supports asynchronous I/O to disk, Oracle recommends that you leave this parameter set to its default value. However, if the asynchronous I/O implementation is not stable, you can set this parameter to false to disable asynchronous I/O. If your platform does not support asynchronous I/O to disk, this parameter has no effect.

If you set DISK_ASYNCH_IO to false, then you should also set DBWR_IO_SLAVES to a value other than its default of zero in order to simulate asynchronous I/O.

 

1.3.4 DISK_ASYNCH_IO Initialization Parameter (HP-UX)

The DISK_ASYNCH_IO initialization parameter determines whether the database files reside on raw disks or file systems. Asynchronous I/O is available only with Automatic Storage Management disk group which uses raw partitions as the storage option for database files. The DISK_ASYNCH_IO parameter can be set to TRUE or FALSE depending on where the files reside. By default, the value is set to TRUE.

Note:

The DISK_ASYNCH_IO parameter must be set to FALSE when the database files reside on file system. This parameter must be set to TRUE only when the database files reside on raw partitions.

 

SAP Information / documentation

 

SAPnote #1431798 - Oracle 11.2.0: Database Parameter Settings

DISK_ASYNCH_IO FALSE (only on HP-UX, only for standard file systems, not for OnlineJFS(VxFS 5.x), not for ASM, not for raw devices, see SAP note 798194)


Performance researching

The following tests were performed on Solaris x86 with an attached enterprise SAN storage and an Oracle 11.2.0.3 database. RMAN VALIDATE was used for simulating the read I/O load by a RMAN database backup (basically the write phase of a RMAN backup is missing - for details check the graphic below). The backup was performed with one channel only (no parallel backup).

 

Let's start with some basic information about how RMAN performs a backup and how we can influence the behavior with different buffer sizes (without hidden parameters) or throughput. In our case we talk about the read and copy phase only as we perform a RMAN VALIDATE of the whole database.

 

Phases of a RMAN backup

RMAN_Buffer.gif

Each channel (in our test case we have only one channel) reads the data into the input buffers, processes the data while copying it from the input buffers to the output buffers, and then writes the data from the output buffers to tape (the write phase is missing by a RMAN VALIDATE). Channel 1 writes the data to a locally attached tape drive, whereas channel 2 sends the data over the network to a remote media server.

 

Disk I/O Slaves

You can control disk I/O slaves by setting the DBWR_IO_SLAVES initialization parameter, which is not dynamic. The parameter specifies the number of I/O server processes used by the database writer process (DBWR). By default, the value is 0 and I/O server processes are not used. If asynchronous I/O is disabled, then RMAN allocates four backup disk I/O slaves for any nonzero value of DBWR_IO_SLAVES.

 

Allocation of Input Disk Buffers / Level of Multiplexing

Less than or equal to 4

The RMAN channel allocates 16 buffers of size 1 megabyte (MB) so that the total buffer size for all the input files is 16 MB.

Greater than 4 but less than or equal to 8

The RMAN channel allocates a variable number of disk buffers of size 512 kilobytes (KB) so that the total buffer size for all the input files is less than 16 MB.

Greater than 8

The RMAN channel allocates 4 disk buffers of 128 KB for each file, so that the total buffer size for each input file is 512 KB
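The three allocation rules above can be captured as a small helper. This is a sketch of the documented rules, not Oracle's actual allocation code, but its results line up with the pread() request sizes in the truss output further below:

```python
MB, KB = 1 << 20, 1 << 10

def input_buffer_size(files_per_channel):
    """Per-buffer size RMAN chooses based on the level of multiplexing."""
    if files_per_channel <= 4:
        return 1 * MB       # 16 buffers of 1 MB shared by all input files
    elif files_per_channel <= 8:
        return 512 * KB     # variable buffer count, total kept below 16 MB
    else:
        return 128 * KB     # 4 buffers of 128 KB per file (512 KB per file)

# MAXOPENFILES = 8 and 4 match the observed 524288 / 1048576 byte reads
print(input_buffer_size(8), input_buffer_size(4))   # 524288 1048576
```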

 

MOS -  RMAN Backup Performance [ID 360443.1]

Disk IO slaves should always be used to simulate asynchronous IO when native asynchronous IO is disabled. There are always 4 slaves spawned initially per channel but slaves will die if idle > 60 secs.

....

RMAN is designed to take advantage of asynchronous io. You cannot expect good performance of any kind if synchronous io is used in which case stop - implement slaves (Note 73354.1: RMAN: I/O Slaves and Memory Usage) or enable native async io and re-assess the situation after doing the backup again.

 

Let's test the various settings and check the corresponding system calls (to understand how it works internally) after we got all that detailed information.

 

RMAN VALIDATE with DISK_ASYNCH_IO = FALSE, MAXOPENFILES = 8 (Default Setting) and DBWR_IO_SLAVES = 0

RMAN> validate database;

shell> truss -p <PID> (<PID> of oracle shadow process for specific RMAN channel)

pread(279, "\0A2\0\0 @E0C101\0\0\0\0".., 524288, 0x3C080000) = 524288
pread(280, "06A2\0\0 @E0C102 ?\nBB01".., 524288, 0x3C080000) = 524288
pread(273, "06A2\0\080E00101 ' {\n\0".., 524288, 0x3C100000) = 524288
pread(274, "  A2\0\080E0 A02 9D39B\0".., 524288, 0x3C100000) = 524288
pread(275, "  A2\0\080E0 A01BA81EF0E".., 524288, 0x3C100000) = 524288
pread(276, "06A2\0\080E08101 dA9\t\0".., 524288, 0x3C100000) = 524288
pread(277, "06A2\0\080E08102 l BD2\0".., 524288, 0x3C100000) = 524288
pread(278, " EA2\0\080E00103 t Q 901".., 524288, 0x3C100000) = 524288

pread(279, "\0A2\0\080E0C101\0\0\0\0".., 524288, 0x3C100000) = 524288 
pread(280, "06A2\0\080E0C102FEF8BA01".., 524288, 0x3C100000) = 524288
pread(273, "06A2\0\0C0E00101 J {\n\0".., 524288, 0x3C180000) = 524288
pread(274, "06A2\0\0C0E0 A0215B296\0".., 524288, 0x3C180000) = 524288
pread(275, "06A2\0\0C0E0 A01 p98 i\r".., 524288, 0x3C180000) = 524288
pread(276, "06A2\0\0C0E08101CDAA\t\0".., 524288, 0x3C180000) = 524288
pread(277, "06A2\0\0C0E08102 jD4BE\v".., 524288, 0x3C180000) = 524288
....

 

If you look closely at the open file handles (273 to 280) and the corresponding read size (524288), you can see that 8 database files are read simultaneously with the system call pread and a request / read size of 512 KB.

 

RMAN VALIDATE with DISK_ASYNCH_IO = FALSE, MAXOPENFILES = 4 and DBWR_IO_SLAVES = 0

RMAN> run
{
allocate channel ch1 type disk maxopenfiles 4;
validate database;
}

shell> truss -p <PID> (<PID> of oracle shadow process for specific RMAN channel)

pread(275, "\0A2\0\080A7 I01\0\0\0\0".., 1048576, 0x134F00000) = 1048576
pread(276, "06A2\0\080A78901B69F A\v".., 1048576, 0x134F00000) = 1048576
pread(273, "06A2\0\0\0A8\t01B584 ,\0".., 1048576, 0x135000000) = 1048576
pread(274, "\0A2\0\0\0A8 I02\0\0\0\0".., 1048576, 0x135000000) = 1048576

pread(275, "\0A2\0\0\0A8 I01\0\0\0\0".., 1048576, 0x135000000) = 1048576
pread(276, "06A2\0\0\0A88901B69F A\v".., 1048576, 0x135000000) = 1048576
pread(273, "  A2\0\080A8\t018984 ,\0".., 1048576, 0x135100000) = 1048576
pread(274, "\0A2\0\080A8 I02\0\0\0\0".., 1048576, 0x135100000) = 1048576
....

 

If you look closely at the open file handles (273 to 276) and the corresponding read size (1048576), you can see that 4 database files are read simultaneously with the system call pread and a request / read size of 1024 KB.

 

RMAN VALIDATE with DISK_ASYNCH_IO = FALSE, MAXOPENFILES = 8 (Default Setting) and DBWR_IO_SLAVES = 4

RMAN> validate database;

shell> truss -p <PID> (<PID> of oracle shadow process for specific RMAN channel)

semtimedop(352321614, 0xFFFFFD7FFFDF7718, 1, 0xFFFFFD7FFFDF7720) = 0
semctl(352321614, 43, SETVAL, 1)                = 0
semtimedop(352321614, 0xFFFFFD7FFFDF7718, 1, 0xFFFFFD7FFFDF7720) = 0
semctl(352321614, 43, SETVAL, 1)                = 0
semtimedop(352321614, 0xFFFFFD7FFFDF7718, 1, 0xFFFFFD7FFFDF7720) = 0
semctl(352321614, 43, SETVAL, 1)                = 0
semtimedop(352321614, 0xFFFFFD7FFFDF7718, 1, 0xFFFFFD7FFFDF7720) = 0
semctl(352321614, 44, SETVAL, 1)                = 0
semtimedop(352321614, 0xFFFFFD7FFFDF7718, 1, 0xFFFFFD7FFFDF7720) = 0
semctl(352321614, 44, SETVAL, 1)                = 0
semtimedop(352321614, 0xFFFFFD7FFFDF7718, 1, 0xFFFFFD7FFFDF7720) = 0
semctl(352321614, 44, SETVAL, 1)                = 0
....

shell> truss -p <PID> (<PID> of one I/O slave process)

semctl(352321614, 38, SETVAL, 1)                = 0
semtimedop(352321614, 0xFFFFFD7FFFDFDCB8, 1, 0xFFFFFD7FFFDFDCC0) = 0
semtimedop(352321614, 0xFFFFFD7FFFDFDCB8, 1, 0xFFFFFD7FFFDFDCC0) = 0
pread(256, "06A2\0\080 > H02 N9A G\v".., 524288, 0x107D00000) = 524288
semctl(352321614, 38, SETVAL, 1)                = 0
semtimedop(352321614, 0xFFFFFD7FFFDFDCB8, 1, 0xFFFFFD7FFFDFDCC0) = 0
semtimedop(352321614, 0xFFFFFD7FFFDFDCB8, 1, 0xFFFFFD7FFFDFDCC0) = 0
pread(260, "06A2\0\080 >\b03 \938D01".., 524288, 0x107D00000) = 524288
semctl(352321614, 38, SETVAL, 1)                = 0
semtimedop(352321614, 0xFFFFFD7FFFDFDCB8, 1, 0xFFFFFD7FFFDFDCC0) = 0
semtimedop(352321614, 0xFFFFFD7FFFDFDCB8, 1, 0xFFFFFD7FFFDFDCC0) = 0
pread(263, "\0A2\0\0C0 >\b01\0\0\0\0".., 524288, 0x107D80000) = 524288
semctl(352321614, 38, SETVAL, 1)                = 0
....

 

No database files are read by the Oracle (RMAN) shadow process itself anymore. The I/O requests are handed over to the I/O slaves, coordinated via semaphore operations (semctl, semtimedop). The I/O slaves perform the same pread system call as the "RMAN channel shadow process" before (check the request / read size, for example).
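The delegation pattern visible in the truss output can be illustrated with a toy sketch. This is not Oracle's implementation - Python threads and a queue stand in for the slave processes and their semaphore coordination - but the division of labor is the same: the coordinator only posts requests and collects results, while 4 "slaves" issue the synchronous reads.

```python
import os, queue, tempfile, threading

KB = 1024
requests, results = queue.Queue(), queue.Queue()

def io_slave():
    for offset in iter(requests.get, None):        # None = shut down
        # each slave performs the same synchronous pread() as before
        results.put(len(os.pread(fd, 512 * KB, offset)))

fd, path = tempfile.mkstemp()
os.truncate(fd, 4 * 512 * KB)                      # 4 x 512 KB of file data

slaves = [threading.Thread(target=io_slave) for _ in range(4)]
for s in slaves: s.start()
for i in range(4):
    requests.put(i * 512 * KB)                     # coordinator posts requests
total = sum(results.get() for _ in range(4))       # ... and collects results
for _ in slaves: requests.put(None)
for s in slaves: s.join()
os.close(fd); os.remove(path)
print(total)   # all 2097152 bytes were read by the slaves, none by the coordinator
```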

 

RMAN VALIDATE with DISK_ASYNCH_IO = FALSE, MAXOPENFILES = 4 and DBWR_IO_SLAVES = 4

RMAN> run
{
allocate channel ch1 type disk maxopenfiles 4;
validate database;
}

shell> truss -p <PID> (<PID> of one I/O slave process)

semctl(352321614, 38, SETVAL, 1)                = 0
semtimedop(352321614, 0xFFFFFD7FFFDFDCB8, 1, 0xFFFFFD7FFFDFDCC0) = 0
times(0xFFFFFD7FFFDFE2D0)                       = 233253482
times(0xFFFFFD7FFFDFE2D0)                       = 233253482
semtimedop(352321614, 0xFFFFFD7FFFDFDCB8, 1, 0xFFFFFD7FFFDFDCC0) = 0
pread(259, "06A2\0\0\0 /9601\v bEB02".., 1048576, 0x2C5E00000) = 1048576
semctl(352321614, 38, SETVAL, 1)                = 0
semtimedop(352321614, 0xFFFFFD7FFFDFDCB8, 1, 0xFFFFFD7FFFDFDCC0) = 0
semtimedop(352321614, 0xFFFFFD7FFFDFDCB8, 1, 0xFFFFFD7FFFDFDCC0) = 0
pread(257, "06A2\0\080 / V02 01998\0".., 1048576, 0x2C5F00000) = 1048576
semctl(352321614, 38, SETVAL, 1)                = 0
semtimedop(352321614, 0xFFFFFD7FFFDFDCB8, 1, 0xFFFFFD7FFFDFDCC0) = 0
semtimedop(352321614, 0xFFFFFD7FFFDFDCB8, 1, 0xFFFFFD7FFFDFDCC0) = 0
pread(256, "06A2\0\0\0 01601 %B0 w\t".., 1048576, 0x2C6000000) = 1048576
semctl(352321614, 38, SETVAL, 1)                = 0
semtimedop(352321614, 0xFFFFFD7FFFDFDCB8, 1, 0xFFFFFD7FFFDFDCC0) = 0
semtimedop(352321614, 0xFFFFFD7FFFDFDCB8, 1, 0xFFFFFD7FFFDFDCC0) = 0
pread(259, "06A2\0\0\0 0960119 yEB02".., 1048576, 0x2C6000000) = 1048576
semctl(352321614, 38, SETVAL, 1)                = 0
....

 

Again, no database files are read by the Oracle (RMAN) shadow process itself. The I/O requests are handed over to the I/O slaves, coordinated via semaphore operations (semctl, semtimedop). The I/O slaves perform the same pread system call as the "RMAN channel shadow process" before (check the request / read size, for example).

 

We have checked how Oracle handles the I/O requests internally with various settings, but how does this affect the I/O read throughput for a classic RMAN database backup?

 

The following table shows the I/O read throughput with one channel only; in the last column DBWR_IO_SLAVES is set to 4.

You can of course scale the I/O read throughput up to the enterprise storage maximum with a parallel RMAN backup, but the main focus here was on increasing the I/O throughput of each single channel.

 

MAXOPENFILES / Input Buffer Size | Average synchronous I/O throughput without I/O slaves | Average synchronous I/O throughput with I/O slaves
8 / 512 KB | 49 MB per second | 125 MB per second
4 / 1024 KB | 47 MB per second | 206 MB per second
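Expressed as speedup factors computed from the measurements above, simulated asynchronous I/O pays off most with the larger 1 MB input buffers:

```python
# (MB/s without slaves, MB/s with slaves) per configuration, from the table above
results = {
    "MAXOPENFILES=8 / 512 KB":  (49, 125),
    "MAXOPENFILES=4 / 1024 KB": (47, 206),
}
for cfg, (without, with_slaves) in results.items():
    print(f"{cfg}: {with_slaves / without:.1f}x")
# MAXOPENFILES=8 / 512 KB: 2.6x
# MAXOPENFILES=4 / 1024 KB: 4.4x
```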

 

 

Summary

 

Using I/O slaves (to "simulate asynchronous I/O") can improve the I/O throughput drastically if you are running an Oracle database on an OS platform that does not support asynchronous I/O at all, or if you need to disable asynchronous I/O for various reasons (like bugs or file system designs).

 

If you have any further questions - please feel free to ask, or get in contact directly if you need assistance with troubleshooting Oracle database performance issues.

 

References

[Oracle] DB Optimizer Part V - Introduction of dynamic sampling and why it is used in SAP BI environments by (SAP) default


Introduction

Have you ever wondered why SAP recommends different cost based optimizer settings (like init parameter OPTIMIZER_DYNAMIC_SAMPLING) for OLTP and OLAP systems? Have you ever wondered why Oracle is not clever enough to handle "complex" predicates very well? I see similar issues with Oracle databases in SAP or non-SAP based environments from time to time and so here we go with part 5 of my DB optimizer blog series.

 

This blog is focused on dynamic sampling and how it can affect the cardinality estimates in execution plans in several cases. It will also demonstrate a common advantage of dynamic sampling in SAP BI environments.

 

Dynamic sampling basics

We should clarify some dynamic sampling basics first, before we dig into a common use case. This section covers only the basics of dynamic sampling. Randolf Geist has already written a detailed blog series about dynamic sampling, its possibilities and how to enable it in several ways. Please check the reference section of this blog if you want to dig deeper by reading Randolf Geist's blog posts.

 

Oracle Documentation


With dynamic sampling, the database augments statistics by issuing recursive SQL to scan a small random sample of table blocks.

 

Dynamic sampling augments missing or insufficient optimizer statistics. Using dynamic sampling the optimizer can improve plans by making better estimates for predicate selectivity. Dynamic sampling can supplement statistics such as table block counts, applicable index block counts, table cardinalities (estimated number of rows), and relevant join column statistics.

 

13.6.2.2 When the Optimizer Uses Dynamic Sampling

During compilation, the optimizer decides whether to use dynamic sampling based on a number of factors, including whether the statements use parallel processing or serial processing.

 

For parallel statements, the optimizer automatically decides whether to use dynamic sampling and which level to use. The decision depends on the size of the tables and the complexity of the predicates. The optimizer expects parallel statements to be resource-intensive, so the additional overhead at compile time is worth it to ensure the best plan. The database ignores the OPTIMIZER_DYNAMIC_SAMPLING setting unless set to a nondefault value, in which case the value is honored.

 

For serially processed SQL statements, the dynamic sampling level depends on the value of the OPTIMIZER_DYNAMIC_SAMPLING parameter and is not triggered automatically by the optimizer. Serial statements are typically short-running, so that any overhead at compile time could have a huge impact on their performance.

 

In both the serial and parallel cases, the database performs dynamic sampling when existing statistics are not sufficient:

Missing statistics

When one or more of the tables in the query do not have statistics, the optimizer gathers basic statistics on these tables before optimization. In this case, the statistics are not as high-quality or as complete as the statistics gathered using the DBMS_STATS package. This tradeoff is made to limit the impact on the compile time of the statement.

 

Collected statistics cannot be used or are likely to lead to poor estimates

For example, a statement may contain a complex predicate expression, but extended statistics are not available (see "Extended Statistics"). Extended statistics help the optimizer get good quality cardinality estimates for complex predicate expressions. Dynamic sampling can compensate for the lack of extended statistics.

 

Note:

If no rows have been inserted, deleted, or updated in the table being sampled, then dynamic sampling is repeatable. This means that the optimizer generates the same statistics each time you run dynamic sampling against the table.

 

Please check the following table for more information about the dynamic sampling levels (the default level 2 and level 6 is used in SAP based environments by default): Dynamic Sampling Levels

 

** Footnote: The information about the block sample sizes is not entirely correct in this table, but we will see this in the CBO trace files later on.

 

SAP Documentation

 

SAPnote #1431798 - Oracle 11.2.0: Database Parameter Settings

OPTIMIZER_DYNAMIC_SAMPLING (OLTP: Do not set! / OLAP: 6)

 

 

Dynamic sampling can be activated at system level (init parameter), session level (init parameter), table level (hint DYNAMIC_SAMPLING) or at SQL level with the help of SQL profiles or SQL patches (hint OPT_PARAM) without modifying the SQL itself.

 

 

Let's summarize all that information related to SAP environments:

  • Dynamic sampling gathers statistics "on-the-fly" if there are no statistics at all, or gathers additional statistics in various scenarios (dependent on the sampling level)
  • Dynamic sampling can have a negative impact on performance due to the recursive queries / sampling performed while compiling SQLs (typically for short-running queries in OLTP environments, not for long-running OLAP queries)
  • Dynamic sampling has the default value 2 in OLTP-based environments and 6 in OLAP environments
  • Dynamic sampling can provide additional information for "complex" queries if predicate values can be evaluated (bind variable peeking and adaptive cursor sharing are disabled in SAP-based environments - keyword "literals", which are commonly used in BI queries)


Common use case

The following use case was created and run on an Oracle 11.2.0.3 database with OEL 6.2.

 

SYS@T11:133> create table DYNTEST (COUNTRY VARCHAR2(40), WERKS VARCHAR(20), TEXT VARCHAR(4000));
SYS@T11:133> exec DBMS_STATS.GATHER_TABLE_STATS('SYS', 'DYNTEST', method_opt => 'FOR ALL COLUMNS SIZE 1');

Demo_Data.png

This example leans on a common SAP BI scenario, but without the extra dimension tables and without parallel execution, to keep it as simple as possible. Basically we have a table called DYNTEST with 3 columns (one for the country, one for the plant number and one for arbitrary text). The most important point to notice is that we have a correlation between the columns COUNTRY and WERKS (plant number).

 

Let's assume that plant number 1200 references a plant in Munich.

Consequently, 1000 data sets can only be relevant in the combination COUNTRY = 'DE' and WERKS = '1200'. There could never be a plant 1200 in the countries US or CH. We (as humans) know that there is a correlation, but the cost based optimizer does not know this by default (basic table / column statistics).

 

So let's check the cardinality estimation, if we combine both columns in the WHERE clause.

 

Default cardinality estimation (without extended statistics, histograms, dynamic sampling, etc.)

DE_1200_DEFAULT.png

The optimizer is not able to calculate the right cardinality (it expects 273, but in reality 1000 rows are returned), because it does not know anything about the correlation. It uses its default formula for calculating the cardinality: 7650 rows * 0.25 (COUNTRY) * 0.142857143 (WERKS) = round(273.2142859875) = 273 rows

US_3400_DEFAULT.png

The optimizer is not able to calculate the right cardinality (it expects 273, but in reality 5000 rows are returned), because it does not know anything about the correlation. It uses its default formula for calculating the cardinality: 7650 rows * 0.25 (COUNTRY) * 0.142857143 (WERKS) = round(273.2142859875) = 273 rows

US_1200_DEFAULT.png

The optimizer is not able to calculate the right cardinality (it expects 273, but in reality 0 rows are returned), because it does not know anything about the correlation. It uses its default formula for calculating the cardinality: 7650 rows * 0.25 (COUNTRY) * 0.142857143 (WERKS) = round(273.2142859875) = 273 rows
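The formula applied three times above can be sketched in a few lines. The per-column selectivities are inferred from the calculation itself (0.25 and 0.142857143 correspond to 4 distinct countries and 7 distinct plants):

```python
num_rows = 7650
sel_country = 0.25           # 1 / num_distinct(COUNTRY)
sel_werks = 0.142857143      # 1 / num_distinct(WERKS)

# independence assumption: multiply the single-column selectivities
estimate = round(num_rows * sel_country * sel_werks)
print(estimate)              # 273, regardless of the actual row count returned
```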

 

 

Cardinality estimation with dynamic sampling set to 4

 

SYS@T11:144> alter session set optimizer_dynamic_sampling = 4;

DE_1200_SAMP4.png

The optimizer samples additional statistics "on-the-fly" to get an idea of the "complex" predicate and its correlation. The optimizer is closer to the real world (1133 rows to 1000 rows) after sampling and adjusting the estimation. It doesn't hit it exactly, but it is much better than before (273 estimated rows).

US_3400_SAMP4.png

The optimizer samples additional statistics "on-the-fly" to get an idea of the "complex" predicate and its correlation. The optimizer is closer to the real world (4635 rows to 5000 rows) after sampling and adjusting the estimation. It doesn't hit it exactly, but it is much better than before (273 estimated rows).

US_1200_SAMP4.png

The optimizer samples additional statistics "on-the-fly" to get an idea of the "complex" predicate and its correlation. The optimizer got it right (E-Rows = 1 means 1 or 0 rows) after sampling and adjusting the estimation.

 

 

Cardinality estimation with dynamic sampling set to 9


SYS@T11:144> alter session set optimizer_dynamic_sampling = 9;

 

DE_1200_SAMP9.png

The optimizer samples additional statistics "on-the-fly" to get an idea of the "complex" predicate and its correlation. The optimizer hit it exactly (E-Rows = 1000 / A-Rows 1000) after sampling and adjusting the estimation with a higher sample size.

 

US_3400_SAMP9.png

The optimizer samples additional statistics "on-the-fly" to get an idea of the "complex" predicate and its correlation. The optimizer hit it exactly (E-Rows = 5000 / A-Rows 5000) after sampling and adjusting the estimation with a higher sample size.

US_1200_SAMP9.png

The result is still correct, but the optimizer already got it correct with a smaller sample size.

 

The CBO trace for dynamic sampling level 4 (COUNTRY = 'US' and WERKS = '3400')

I will post only one CBO trace section for illustration here. I have chosen the query with COUNTRY = 'US' and WERKS='3400'.

US_3400_SAMP4_CBO.png

In the CBO trace you will find the (recursive) dynamic sampling query, which is executed while compiling the SQL / generating the SQL execution plan (= the overhead of dynamic sampling). You will also see the level (level: 4), but for level 4 you will also find a little discrepancy with the official documentation. The documentation states that 64 blocks are sampled at level 4, but in reality the max sample block count is only 32 blocks. In my case the table has 44 blocks (according to the gathered basic statistics - in reality the segment consists of 48 blocks), but only 31 blocks are sampled dynamically.

 

Oracle calculated a new selectivity (0.60589319) after dynamic sampling finished, which results in 4635 estimated rows (7650 rows * 0.60589319 = round(4635.0829035) = 4635 rows)

 

The CBO trace for dynamic sampling level 9 (COUNTRY = 'US' and WERKS = '3400')

I will post only one CBO trace section for illustration here. I have chosen the query with COUNTRY = 'US' and WERKS='3400'.

US_3400_SAMP9_CBO.png

In the CBO trace you will find the (recursive) dynamic sampling query, which is executed while compiling the SQL / generating the SQL execution plan (= the overhead of dynamic sampling). You will also see the level (level: 9), but for level 9 the official documentation and reality do not match in full detail either. The max sample block count is 4096, not 4086 (as documented). However, my demo table is much smaller, so only 44 blocks are sampled anyway.

 

Oracle calculated a new selectivity (0.65359477) after dynamically sampling the whole segment, which results in 5000 estimated rows (7650 rows * 0.65359477 = round(4999.9999905) = 5000 rows)
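Putting both trace results side by side shows how the sampled selectivity feeds the estimate. The level 9 selectivity is simply the exact ratio 5000/7650, since the whole segment was read:

```python
num_rows = 7650

est_level4 = round(num_rows * 0.60589319)   # selectivity from the 31-block sample
est_level9 = round(num_rows * 0.65359477)   # selectivity from the full 44-block scan

print(est_level4, est_level9)               # 4635 5000
# level 9 sampled everything, so its selectivity is the exact ratio
assert abs(0.65359477 - 5000 / num_rows) < 1e-8
```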

 

Summary

We just covered one way dynamic sampling can help the optimizer to get better estimates, but I think it is a pretty good example to get the key point across. You can also create extended statistics (>= Oracle 11) to get a similar effect (without the sampling overhead), but you need to know all possible query combinations to create the corresponding extended statistics. That would be feasible for a limited number of combinations, but just think about dynamic BEx queries with drill-downs and so on - nearly impossible.

 

Check out one of my previous blogs if you don't know why it is so important that the cost based optimizer estimates the correct cardinality.

 

Dynamic sampling has its limitations and issues as well, but covering them would go too far right now (check the mentioned blog posts by Randolf Geist if you want to know more about the different scenarios and solutions).

 

If you have any further questions - please feel free to ask, or get in contact directly if you need assistance with troubleshooting Oracle database performance issues.

 

References

RMAN-20242: specification does not match any archived log in the repository


I played around today on a test system with backups and hit a problem whose solution could help someone.

I restored the database using POINT IN TIME recovery. The full and incremental restores completed, but I had problems with the archive logs: they get backed up to disk and afterwards to tape, so at first I could not restore them using BRTOOLS.

 

I got the error below

 

RMAN-00571: ===========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

RMAN-00571: ===========================================================

RMAN-03002: failure of available command at 06/04/2013 13:50:55

RMAN-20215: backup set not found

RMAN-06159: error while looking up backup set

 

RMAN>

released channel: ORA_MAINT_DISK_1

 

RMAN> 2> 3>

allocated channel: dsk

channel dsk: SID=150 device type=DISK

 

Starting restore at 04-JUN-2013-13:50:55

released channel: dsk

RMAN-00571: ===========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

RMAN-00571: ===========================================================

RMAN-03002: failure of restore command at 06/04/2013 13:50:55

RMAN-20242: specification does not match any archived log in the repository

 

I retrieved the backup sets from tape using NetBackup.

 

Once this is completed you need to catalog the archive log backup pieces in the repository:

 

rman

 

connect target;

 

catalog backuppiece '/tmp/SIDarch1.dbf.S';

cataloged backup piece

 

Once this is completed you can use BRRESTORE to restore the files from the backup set and apply the logs

 

just my thoughts

[Oracle] A deeper insight into stack tracing / sampling for a better understanding and troubleshooting of Oracle issues


Introduction

I am currently working through my blog backlog and the questions / requests of my followers, and it seems that one of my previous blogs - [Oracle] Advanced (performance) troubleshooting with oradebug and stack sampling - raised similar questions about the stack tracing / sampling method.

 

The two most recurring questions about the last part ("3. Using stack traces or system call / signal traces") of this blog are similar to the following ones:

  • What is the difference between a call stack trace and a system call trace?
  • How do you know that the function "ktspscan_bmb" was the problematic one and not the previous recurring functions like "ksedsts, ksdxfstk or ksdxcb and so on"?

 

.. so this blog will be about the details of stack tracing / sampling of Oracle database processes on Linux operating systems (in my case Oracle Enterprise Linux with Unbreakable Enterprise Kernel 2.6.39-100.7.1.el6uek.x86_64) and how the different stack sampling methods can influence the output.

 

 

Question 1: "What is the difference between a call stack trace and a system call trace?"

 

The best starting point is the (official) documentation before demonstrating the difference between a call stack trace and system call trace.

 

System call

The system call is the fundamental interface between an application and the Linux kernel.

 

System calls and library wrapper functions

System calls are generally not invoked directly, but rather via wrapper functions in glibc (or perhaps some other library). For details of direct invocation of a system call, see intro(2). Often, but not always, the name of the wrapper function is the same as the name of the system call that it invokes. For example, glibc contains a function truncate() which invokes the underlying "truncate" system call.

Often the glibc wrapper function is quite thin, doing little work other than copying arguments to the right registers before invoking the system call, and then setting errno appropriately after the system call has returned.

 

Note: system calls indicate a failure by returning a negative error number to the caller; when this happens, the wrapper function negates the returned error number (to make it positive), copies it to errno, and returns -1 to the caller of the wrapper.

 

Sometimes, however, the wrapper function does some extra work before invoking the system call. For example, nowadays there are (for reasons described below) two related system calls, truncate(2) and truncate64(2), and the glibc truncate() wrapper function checks which of those system calls are provided by the kernel and determines which should be employed.
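The negative-return/errno convention from the note above can be sketched in a few lines of Python. This is a simulation of the glibc wrapper behavior only, not real C code - the function names are made up for illustration.

```python
# Simulated glibc-style wrapper: the raw system call reports failure by
# returning a negative error number; the wrapper negates it (to make it
# positive), stores it in errno and returns -1 to the caller.
ENOENT = 2   # "No such file or directory"
errno = 0    # stand-in for the per-thread errno variable

def raw_truncate(path, length):
    """Stand-in for the raw kernel entry: always fail with -ENOENT here."""
    return -ENOENT

def truncate(path, length):
    """Stand-in for the glibc wrapper around the truncate system call."""
    global errno
    ret = raw_truncate(path, length)
    if ret < 0:
        errno = -ret   # negate the returned error number
        return -1      # the caller of the wrapper only ever sees -1
    return ret

result = truncate("/no/such/file", 0)
print(result, errno)   # -1 2
```

This is exactly why C programs check `ret == -1` and then look at `errno`, instead of inspecting the raw return value of the system call.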


Call Stack

A call stack is the list of names of methods called at run time from the beginning of a program until the execution of the current statement.

 

A call stack is mainly intended to keep track of the point to which each active subroutine should return control when it finishes executing. Call stack acts as a tool to debug an application when the method to be traced can be called in more than one context. This forms a better alternative than adding tracing code to all methods that call the given method.

 

 

So the main difference (in our context) is that the call stack trace includes the called methods or functions of an application (Oracle process) and that the system call trace includes only the (function) requests to the operating system kernel.

 

Let's demonstrate the difference with a tiny example by running a simple SELECT statement that reads a lot of data from disk. I will use strace (for tracing the system calls) and pstack (which is a wrapper script for gdb for tracing the call stack), as the Oracle database is running on Linux. I am not using oradebug (for tracing the call stack) right now, because oradebug behaves differently when grabbing the call stack (more details about this are included in the answer to question 2).

 

SYS@T11:133> create table READTEST as select * from DBA_SOURCE;
SYS@T11:133> insert into READTEST SELECT * FROM READTEST;
SYS@T11:133> insert into READTEST SELECT * FROM READTEST;
SYS@T11:133> alter system flush buffer_cache;

shell> ps -fu orat11 | grep LOCAL=NO
orat11    1690     1  2 12:00 ?        00:00:02 oracleT11 (LOCAL=NO)

 

Let's run a SELECT statement in client session 133 with Oracle shadow process pid 1690 on table READTEST now. In parallel I will run strace with option "-cf" (to get a summary of the sys calls at the end) and do the same procedure with pstack on process id 1690 afterwards. Be careful - pstack will have no backtrace information if you run strace and pstack at the same time on the same pid.

 

SYS@T11:133> alter session set "_serial_direct_read" = TRUE;
SYS@T11:133> select count(*) from READTEST;

 

Output pstack (samples of the stack)

Screen_PStack_01.png  Screen_PStack_02.png

 

Output strace (summary of sys calls)

Screen_Strace_01.png

 

Now you can clearly see the difference between a system call trace and a call stack trace. The system call trace (strace) is missing all of the Oracle functions (= the application implementation). You just see the invoked kernel functions like gettimeofday (= get time) or pread (= read from a file handle/descriptor), but not the application code or functions that invoked them.

With pstack, on the other hand, you see the whole call stack (including possible system calls) and which (C) functions are called in which order.

 

A call stack needs to be read bottom up - regarding the previous stack trace example (a C-program always starts with the main function):

main -(calls)-> ssthrdmain -(calls)->opimai_real -(calls)-> sou2o -(calls)-> opidrv -(calls)-> … -(calls)-> gettimeofday (the last function is the currently executed code part of the program)
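The bottom-up reading order can be automated with a tiny parser. The frame names below are taken from the example above; the pstack output format is simplified to one "frame-number address function ()" line per frame, topmost frame first.

```python
# Minimal sketch: turn a (simplified) pstack listing - topmost frame first -
# into the bottom-up call chain described above.
def call_chain(pstack_output):
    """Return the frames in calling order: main first, current function last."""
    frames = []
    for line in pstack_output.strip().splitlines():
        parts = line.split()
        # keep only the function name, e.g. "#1 0x... opidrv ()" -> "opidrv"
        if len(parts) >= 3:
            frames.append(parts[2])
    return " -> ".join(reversed(frames))

sample = """\
#0 0x00007f2a3c1e gettimeofday ()
#1 0x000000000a01 opidrv ()
#2 0x000000000a02 sou2o ()
#3 0x000000000a03 opimai_real ()
#4 0x000000000a04 ssthrdmain ()
#5 0x000000000a05 main ()"""
print(call_chain(sample))
# main -> ssthrdmain -> opimai_real -> sou2o -> opidrv -> gettimeofday
```

Such a helper is handy when aggregating many stack samples, because the interesting part is usually the chain of callers, not the raw frame listing.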

 

 

Question 2: "How do you know that the function "ktspscan_bmb" was the problematic one and not the later recurring functions like "ksedsts, ksdxfstk or ksdxcb and so on"?

 

Before I start to explain and demonstrate it - let's revisit the corresponding stack traces of my previous blog.

oradebug.png

The question is pretty meaningful, as I previously mentioned "the last function is the currently executed code part of the program". So the assumption that the function "ksedsts" is the problematic one seems pretty close, but I marked the function ktspscan_bmb as the indicator. So how did I come to this conclusion?

 

The answer is hidden in the implementation of oradebug. It sends a signal (SIGUSR2) to the corresponding process if you issue a "short_stack" trace request. The function sspuser() is the handler for signal SIGUSR2 and runs further code (depending on the request) - or in other words, "everything to the left / above of function sspuser() is caused by dumping the stack via oradebug and not relevant for troubleshooting". In consequence you will alter the usual code path of Oracle if you run a "short_stack" trace with oradebug. OS tools behave differently, as they usually suspend the process and dump the stack. You will not find any special code path in such cases.

 

Let's do a short demonstration with oradebug (stack_trace and system state dump) and pstack (gdb) to see the difference.

 

I just connect to the Oracle database with SQL*Plus (Oracle shadow process pid 10676) and run an oradebug stack trace (from a different session) and pstack on this OS pid. (The session is just idle, waiting for input via SQL*Net.)

Screen_Oradebug_Pstack_01.png

As you can see oradebug has altered the code path for dumping the stack trace. The OS tool (in my case pstack / gdb) does not need to do this and just dumped the call stack. As a last example let's run a system state dump (with call stacks) for cross-checking the code path in such cases.

 

SYS@T11:16> oradebug setmypid
SYS@T11:16> oradebug dump systemstate 257

 

The following screenshot is just an extract of the whole trace file, but you can always find the corresponding modified code path of Oracle (except for the process that initiated the system state dump).

Systemstate_Dump.png

 

Summary

I hope that this blog clears up the doubts and questions about call stack tracing and system call tracing. There are some risks in getting call stack traces as well, but Tanel Poder has already written several blogs about the different possibilities and impacts. Please check out his blogs in the reference section if you want to know more about the risks on several operating systems and how to avoid them (when possible).

 

If you have any further questions - please feel free to ask, or get in contact directly if you need assistance with troubleshooting Oracle database (performance) issues.

 

 

References

[Oracle] Wrong SQL_ID in view V$SORT_USAGE / V$TEMPSEG_USAGE and how to handle it in 11.2.0.2 or higher


Introduction

This blog is just a short one (with a demo) about a known issue / bug with the views V$SORT_USAGE and V$TEMPSEG_USAGE (both synonyms refer to view V_$SORT_USAGE). I already replied on an SCN thread pointing to this issue, but it should be much clearer with a short test case.

 

Basically this issue happens most of the time when using temporary LOB segments (implicitly or explicitly). Now you may wonder - when or in which scenarios does SAP use temporary LOB segments? Let's check the corresponding SAP documentation for this first.

 

SAPnote #500340 - FAQ: LOBS

17. Where do temporary LOBs come from?

Under some circumstances it can happen that PSAPTEMP is filled with a significant amount of temporary LOBs. This can be verified with the V$TEMPORARY_LOBS view. These temporary LOBs are created in SAP J2EE due to technical reasons (guarantee of atomicity of LOB creation) whenever a new LOB is created. In situations with a high LOB creation rate (e.g. when tables with LOBs are imported in the database) this behaviour can be responsible for a significant filling level or an overflow of PSAPTEMP.


SAPnote #659946 -  FAQ: Temporary tablespaces

Temporary LOBs (for example in J2EE environments)

With BW 2.x, permanent objects are also created in the temporary tablespace in some situations (see Note 216440).

 

You will probably notice this issue heavily in WLS (Oracle WebLogic Server) environments if JDBC connection pool testing is enabled (WebLogic Documentation). You find temporary LOB segments (if created) with a reference SQL statement like "SELECT 1 from DUAL", which obviously cannot be true.

 

Demo

The following demo is performed with an Oracle database (11.2.0.3.2) on OEL 6.2.

 

I will create two SQL*Plus sessions. One that is running the application SQL statements (in my case a temporary LOB creation and a SELECT) and one that is used to query the views V$SORT_USAGE and V$TEMPSEG_USAGE.

 

Initial situation

-- Monitoring session (SID 15)
SYS@T11:15> select * from V$SORT_USAGE;
no rows selected

SYS@T11:15> select * from V$TEMPSEG_USAGE;
no rows selected

 

Create temporary LOB segment (= application behavior)

-- Working session (SID 138)
SYS@T11:138> declare
  TEST_BLOB BLOB;
begin
  DBMS_LOB.CREATETEMPORARY(TEST_BLOB, TRUE, DBMS_LOB.SESSION);
end;
/

 

Check views V$SORT_USAGE and V$TEMPSEG_USAGE

-- Monitoring session (SID 15)
SYS@T11:15> select * from V$SORT_USAGE;
SYS@T11:15> select * from V$TEMPSEG_USAGE;
SYS@T11:15> select SQL_TEXT from V$SQL where SQL_ID = 'cqr10p45ppn1j';

 

Screen_Query_01.png

.. everything is correct so far. Both views show the correct SQL_ID, if no other SQL was running afterwards. So let's run a simple SELECT in session 138 after the creation of the temporary LOB segment.

 

Further SQL

-- Working session (SID 138)
SYS@T11:138> select 1 from dual;

 

Re-check views V$SORT_USAGE and V$TEMPSEG_USAGE

-- Monitoring session (SID 15)
SYS@T11:15> select * from V$SORT_USAGE;
SYS@T11:15> select * from V$TEMPSEG_USAGE;
SYS@T11:15> select SQL_TEXT from V$SQL where SQL_ID = '520mkxqpf15q8';

Screen_Query_02.png

Now you see the SQL_ID for the simple SELECT, but this obviously cannot be true. A SELECT statement like "select 1 from dual" cannot force a temporary LOB segment creation or even use it. There seems to be something wrong. Kerry Osborne has already described this problem in one of his blog posts (check the reference section).

 

He also posted a snippet from an Oracle bug description:

"It looks like this really needs a larger change – something like capturing the SQL_ID etc.. at the time that the temp seg gets created and then exposing that information through some new X$ colums in x$ktsso?"

 

.. and this change was introduced with Oracle version 11.2.0.2, but unfortunately the views have not been adjusted yet.

 

Use custom query with new column in table x$ktsso

-- Monitoring session (SID 15)
SYS@T11:15> select k.inst_id "INST_ID", ktssoses "SADDR", sid "SID", ktssosno "SERIAL#", username "USERNAME", osuser "OSUSER",
ktssosqlid "SQL_ID", ktssotsn "TABLESPACE", decode(ktssocnt, 0, 'PERMANENT', 1, 'TEMPORARY') "CONTENTS",
decode(ktssosegt, 1, 'SORT', 2, 'HASH', 3, 'DATA', 4, 'INDEX', 5, 'LOB_DATA', 6, 'LOB_INDEX' , 'UNDEFINED') "SEGTYPE",
ktssofno "SEGFILE#", ktssobno "SEGBLK#", ktssoexts "EXTENTS", ktssoblks "BLOCKS", round(ktssoblks*p.value/1024/1024, 2) "SIZE_MB",
ktssorfno "SEGRFNO#"
from x$ktsso k, v$session s, v$parameter p
where ktssoses = s.saddr and ktssosno = s.serial#  and p.name = 'db_block_size'
order by sid;
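The SIZE_MB column of the custom query above is plain arithmetic - allocated blocks times the database block size, expressed in MB. A quick sketch (8 KB is assumed as the block size, the usual value in SAP systems):

```python
# Reproduce the SIZE_MB expression of the x$ktsso query:
# round(ktssoblks * db_block_size / 1024 / 1024, 2)
def segment_size_mb(blocks, db_block_size=8192):
    """Temporary segment size in MB for a given number of blocks."""
    return round(blocks * db_block_size / 1024 / 1024, 2)

print(segment_size_mb(128))    # 1.0 -> 128 blocks of 8 KB equal 1 MB
```

This is useful when you only have the raw block counts from x$ktsso (or a trace file) and want to sanity-check the reported segment sizes.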

 

Screen_Query_03.png

Notice how the correct SQL_ID "cqr10p45ppn1j" is returned now (like running the first query on the views V$SORT_USAGE and V$TEMPSEG_USAGE), even if further SQL statements were run afterwards in this session.

 

** Side note: Implicit temporary LOB creation

In the previous test case we created a temporary LOB segment explicitly with the PL/SQL package DBMS_LOB. However, it is also possible to create and use temporary LOB segments implicitly by executing conversion functions like TO_CLOB.

 

For example:

SQL> SELECT TO_CLOB('TEST TEXT') FROM dual;

 

 

Summary

Be careful and think twice if you notice temporary space usage attributed to SQL statements that makes no sense at all.

 

If you have any further questions - please feel free to ask, or get in contact directly if you need assistance with troubleshooting Oracle database issues.

 

References

[Oracle] DB Optimizer Part VI - Effects of disabled bind variable peeking, adaptive cursor sharing and cardinality feedback on the CBO in SAP environments


Introduction

We already talked about dynamic sampling, hints, histograms or extending and interpreting execution plans in the previous blog posts of my Oracle DB Optimizer series (CBO). In many cases I have written something like this: "we will use literals for simplification" or "we will use literals, because SAP treats bind variables very rudimentarily".

 

So this blog is about using bind variables in a SAP environment and how they affect the CBO calculations / decisions. The first important point to mention is that bind variables are absolutely necessary in highly transactional OLTP environments to avoid a high hard parsing rate and the corresponding latches/locks and CPU usage.

 

Bind variables have their side effects as well, but Oracle engineered several optimizer enhancements to work around or avoid them. Before we start with the demo cases - let's check the official documentation first.

 

Oracle Documentation

 

Bind Variable

A placeholder in a SQL statement that must be replaced with a valid value or value address for the statement to execute successfully. By using bind variables, you can write a SQL statement that accepts inputs or parameters at run time.

 

Bind Variable Peeking

In bind variable peeking (also known as bind peeking), the optimizer looks at the value in a bind variable when the database performs a hard parse of a statement.


When a query uses literals, the optimizer can use the literal values to find the best plan. However, when a query uses bind variables, the optimizer must select the best plan without the presence of literals in the SQL text. This task can be extremely difficult. By peeking at bind values the optimizer can determine the selectivity of a WHERE clause condition as if literals had been used, thereby improving the plan.

 

Adaptive Cursor Sharing

The adaptive cursor sharing feature enables a single statement that contains bind variables to use multiple execution plans. Cursor sharing is "adaptive" because the cursor adapts its behavior so that the database does not always use the same plan for each execution or bind variable value.

 

For appropriate queries, the database monitors data accessed over time for different bind values, ensuring the optimal choice of cursor for a specific bind value. For example, the optimizer might choose one plan for bind value 9 and a different plan for bind value 10. Cursor sharing is "adaptive" because the cursor adapts its behavior so that the same plan is not always used for each execution or bind variable value.

 

Note that adaptive cursor sharing does not apply to SQL statements containing more than 14 bind variables.

 

Cardinality feedback

Cardinality feedback was introduced in Oracle Database 11gR2. The purpose of this feature is to automatically improve plans for queries that are executed repeatedly, for which the optimizer does not estimate cardinalities in the plan properly. The optimizer may misestimate cardinalities for a variety of reasons, such as missing or inaccurate statistics, or complex predicates. Whatever the reason for the misestimate, cardinality feedback may be able to help.

 

During the first execution of a SQL statement, an execution plan is generated as usual. During optimization, certain types of estimates that are known to be of low quality (for example, estimates for tables which lack statistics or tables with complex predicates) are noted, and monitoring is enabled for the cursor that is produced. If cardinality feedback monitoring is enabled for a cursor, then at the end of execution, some of the cardinality estimates in the plan are compared to the actual cardinalities seen during execution. If some of these estimates are found to differ significantly from the actual cardinalities, the correct estimates are stored for later use. The next time the query is executed, it will be optimized again, and this time the optimizer uses the corrected estimates in place of its usual estimates.

 

SAP Documentation

 

SAPnote #1431798 - Oracle 11.2.0: Database Parameter Settings

_OPTIM_PEEK_USER_BINDS FALSE (Note 755342)

_OPTIMIZER_ADAPTIVE_CURSOR_SHARING FALSE

_OPTIMIZER_EXTENDED_CURSOR_SHARING_REL NONE

_OPTIMIZER_USE_FEEDBACK FALSE


SAPnote #755342 - Incorrect execution plans with bind variable peeking

From time to time, you may notice incorrect execution plans for SQL statements that otherwise run without any problems. After executing the command "ALTER SYSTEM FLUSH SHARED_POOL;" and other actions that trigger additional parsing, the system often selects a different path.

 

Bind variable peeking is executed, that is, Oracle only decides on an execution plan once it knows the first value of the bind variable.

The phenomenon only occurs as of Oracle Version 9.2 in connection with SAP R3 Release 6.1 or higher, because the Oracle C-interface Version 8 (OCI8) is used there. OCI7 is used for earlier kernel releases. OCI7 does not permit any bind variable peeking.

This way of determining the execution plan is used as standard

   - whenever histograms exist for the columns of the WHERE condition

or

    - whenever there are statistics without histograms for these columns and the operators > or < (or <= and >=) are used or when the columns are marked with "=", but the value lies outside of the min/max area of the relevant column.

 

To deactivate bind variable peeking, select

      _optim_peek_user_binds = false

in init<sid>.ora and restart.

In older SAP kernel releases (lower than 6.1), the system does not take the value of the parameter into account. However you should still set it to avoid problems during upgrades. You can ignore any messages that the system may issue in Early Watch reports for these versions.

 

 

Aren't these Oracle CBO enhancements great? Yes, they are (except for bugs, of course), and they also make several workarounds in non-SAP Oracle environments unnecessary, but unfortunately all of these Oracle features are disabled in a SAP environment by default (referring to the hidden parameters in SAPnote #1431798 above).

 

Why? I know that there were several (nasty) ACS bugs in the early days of Oracle 11g, but I guess that the main reason for SAP nowadays is that it makes (performance) troubleshooting more complex, so clients and SAP support would need much more know-how in this area. Honestly, I really don't know.

 

How does the cost based optimizer behave in a SAP environment when there is no bind variable peeking, adaptive cursor sharing or cardinality feedback? The following demo will show the impact in several cases to get an idea of the basics.


The demo

The following demo was run with an Oracle database (11.2.0.3.2) on OEL 6.2. The test case is a really simple one, but it illustrates the impact clearly.

 

Build up test data

SYS@T11:13> create table BINDTEST (num NUMBER, text VARCHAR2(40));

SYS@T11:13> begin
  for i in 1 .. 10000 loop
    insert into BINDTEST values (i, 'TEST TEXT TO FILL');
  end loop;
  commit;
end;
/

SYS@T11:13> exec DBMS_STATS.GATHER_TABLE_STATS('SYS', 'BINDTEST', method_opt => 'FOR ALL COLUMNS SIZE 1');

Statistics.png

I created a table called BINDTEST with two columns (NUM and TEXT). The values of column NUM are distributed equally (1 to 10,000 for 10,000 rows) and the database knows all about it after the statistics collection. Column TEXT is irrelevant in the following test cases - I just used it to increase the table size a bit. Column NUM could be something like an order or IDoc number in a SAP environment.

 

SQL*Plus environmental and session settings

SYS@T11:13> var A0 NUMBER;
SYS@T11:13> var A1 NUMBER;
SYS@T11:13> var A2 NUMBER;
SYS@T11:13> exec :A0 := 2000;
SYS@T11:13> exec :A1 := 8000;
SYS@T11:13> exec :A2 := 12000;

SYS@T11:13> alter session set "_OPTIM_PEEK_USER_BINDS" = FALSE;
SYS@T11:13> alter session set "_OPTIMIZER_ADAPTIVE_CURSOR_SHARING" = FALSE;
SYS@T11:13> alter session set "_OPTIMIZER_EXTENDED_CURSOR_SHARING_REL" = 'NONE';
SYS@T11:13> alter session set "_OPTIMIZER_USE_FEEDBACK" = FALSE;

 

I created 3 bind variables with 3 different values (2000, 8000 and 12000) and I will also use these values as literals for comparison. I set the corresponding optimizer settings (disabled bind peeking, adaptive cursor sharing and cardinality feedback) on session level to get the same behavior as a database running under a SAP system.

 

 

Test case 1 - Using literals and bind variables with equal operator

Literal_Bind_Equal.png

In case of an equality operator the optimizer calculates the same amount of estimated rows (=cardinality). This estimation is correct as well in our case as the values of column NUM are distributed equally. This is a very common case in SAP OLTP environments and works pretty well.

In case of unequally distributed values we would need a histogram and literals to get a good cardinality estimation (as the optimizer is not able to take a look at the value of the bind variable due to disabled bind peeking).

 

Optimizer calculation:

Cardinality = 10000 rows * 0.0001 = 1 row
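The equality calculation above is just 1/NDV (one divided by the number of distinct column values) times the row count, and can be reproduced in a few lines of Python (reproducing the optimizer arithmetic only, of course - not the optimizer itself):

```python
# Equality predicate "NUM = :A0" on the BINDTEST table:
# selectivity = 1 / num_distinct, cardinality = num_rows * selectivity.
num_rows = 10000
num_distinct = 10000          # column NUM holds the values 1..10000

selectivity = 1 / num_distinct            # 0.0001
cardinality = round(num_rows * selectivity)
print(cardinality)                         # 1
```

With equally distributed values this estimate is exact, which is why the equality case works so well in SAP OLTP systems even without bind peeking.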

 

 

Test case 2 - Using literals and bind variables with unbounded and open range predicate

Literal_Bind_Open_Range.png

We see a huge underestimation of the cardinality with the bind variable in case of an unbounded and open range predicate (> 2000 / :A0). What is the reason for this?

 

Let's start with simple arithmetic for the literal value: The optimizer is aware of the following variables (all values greater than 2000 are requested, the table has 10000 rows, the min value is 1 and the max value is 10000 for column NUM)

 

Optimizer calculation for literal value:

Selectivity = (high_value - limit) / (high_value - low_value) = (10000 - 2000) / (10000 - 1) = 0.8000800080008

Cardinality = 10000 rows * 0.8000800080008 = round(8000.800080008) = 8001 rows

 

... if you check the formula closely it is based on the known literal value 2000. The optimizer is not aware of this value in case of a bind variable with disabled bind peeking. The optimizer needs to make a guess for the bind variable and it simply sets the selectivity to 5 % in such cases.

 

Optimizer calculation for bind variable:

Cardinality = 10000 rows * 0.05 = 500 rows
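Both calculations for the open range case can be reproduced with a few lines of Python (again just the arithmetic, using the known table statistics from above):

```python
# Unbounded open range predicate "NUM > limit" on the BINDTEST table:
# known literal value vs. the fixed 5% guess used when bind peeking is off.
num_rows, low_value, high_value = 10000, 1, 10000

def card_literal(limit):
    """Selectivity = (high_value - limit) / (high_value - low_value)."""
    return round(num_rows * (high_value - limit) / (high_value - low_value))

def card_bind():
    """No peeking: the optimizer falls back to a fixed 5% selectivity."""
    return round(num_rows * 0.05)

print(card_literal(2000), card_bind())   # 8001 500
```

Whatever value the bind variable actually holds, the estimate stays at 500 rows - that is the whole problem with the disabled bind peeking.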

 


Test case 3 - Using literals and bind variables with bounded and closed range predicate

Literal_Bind_Closed_Range.png

We see a huge underestimation of the cardinality with the bind variables in case of a bounded and closed range predicate (between 2000 and 8000 / :A0 and :A1). What is the reason for this?

 

In addition, you can see that the execution plan changed as well, based on the predicates added by query transformation (an additional runtime check - maybe there is no need to run execution plan step 2 at all, depending on the bind variable values).

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT "BINDTEST"."NUM" "NUM","BINDTEST"."TEXT" "TEXT" FROM "SYS"."BINDTEST" "BINDTEST"
WHERE "BINDTEST"."NUM">=:B1 AND "BINDTEST"."NUM"<=:B2 AND :B3<=:B4
kkoqbc: optimizing query block SEL$1 (#0)

 

Let's start with simple arithmetic for the literal values: The optimizer is aware of the following variables (all values between 2000 and 8000 including the limits are requested, the table has 10000 rows, the min value is 1 and the max value is 10000 for column NUM)

 

Optimizer calculation for literal values:

Selectivity = (high limit - low limit) / (high_value - low_value) + 1/num_distinct (= density) + 1/num_distinct (= density) = (8000 - 2000) / (10000 - 1) + 0.0001 + 0.0001 = 0.6002600060006

Cardinality = 10000 rows * 0.6002600060006 = round(6002.600060006) = 6003 rows

 

... if you check the formula closely it is based on the known literal values 2000 and 8000. The optimizer is not aware of these values in case of bind variables with disabled bind peeking. The optimizer needs to make a guess for the bind variables and it simply sets the selectivity to 0.25 % (= 5% of 5%) in such cases.

 

Optimizer calculation for bind variables:

Cardinality = 10000 rows * 0.0025 = 25 rows
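The bounded closed range case can be reproduced the same way (the range fraction plus one density per closed end for the literal values, and the fixed 5% of 5% guess for the binds):

```python
# Bounded closed range "NUM between low and high" on the BINDTEST table.
num_rows, low_value, high_value, density = 10000, 1, 10000, 0.0001

def card_between_literal(low, high):
    """Range fraction plus one density per closed (>=, <=) boundary."""
    sel = (high - low) / (high_value - low_value) + density + density
    return round(num_rows * sel)

def card_between_bind():
    """No peeking: fixed 0.25% guess (5% of 5%) for the two bound predicates."""
    return round(num_rows * 0.05 * 0.05)

print(card_between_literal(2000, 8000), card_between_bind())   # 6003 25
```

6003 versus 25 estimated rows for the identical result set - an estimation error of this size can easily flip the plan from a full table scan to an index range scan, or vice versa.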

 

 

Test case 4 - Using literals and bind variables with unbounded and open outside range predicate

Literal_Bind_Open_Range_Out_Value.png

We see a huge overestimation of the cardinality with the bind variable in case of an unbounded and open outside range predicate (> 12000 / :A2). What is the reason for this?

 

Optimizer calculation for literal value:

This is a very special case, as the requested values lie outside the known value range (1 to 10000), so the optimizer assumes that no rows are returned (1 estimated row means 0 or 1 row returned). The exact calculation is very dependent on the Oracle database version (and on the distance from the known value range) and has changed very often in the past.

 

Optimizer calculation for bind variable:

Cardinality = 10000 rows * 0.05 = 500 rows

 

The optimizer behaves exactly like in test case 2 as it is not aware of the bind value (and that it is outside the known range).

 


Summary

We have seen that the optimizer produces nonsense cardinality calculations in various cases when using bind variables (without bind peeking and all the additional features around it). Be aware of the optimizer calculations if you see cardinality estimations based on 5 % (or a multiple of it) - it could be a guess.

 

"The CBO is only as clever as the provided statistic and runtime data (except bugs and limits)."

 

As SAP suggests disabling the mentioned features and most customers follow that advice of course, the most important question is: "What can I do if I notice that behavior?" ... well, as always - it depends

 

Here are just a few possibilities:

  • Use ABAP hints like "&SUBSTITUTE LITERALS&" or "&SUBSTITUTE VALUES&", if the SQL statements are not executed too frequently with different values
  • Use SQL profiles or SQL patches to force the right execution plan
  • Adjust the statistics so that the optimizer calculates the right cardinality based on its assumptions
  • ... and so on

 

If you have any further questions - please feel free to ask, or get in contact directly if you need assistance with troubleshooting Oracle database (performance) issues.

 

References

 


[Oracle] Myths and common misconceptions about (transparent) huge pages for Oracle databases (on Linux) uncovered


Introduction

In the past years I have worked a lot with mission critical Oracle databases in highly consolidated or centralized environments and noticed several myths and common misconceptions about the memory management for Oracle databases on Linux (mainly SLES and OEL).

 

This blog covers the basics of the relevant memory management for Oracle databases on Linux and tries to clarify several myths. I will just start with the common ones and maybe extend this blog post with several new and interesting little details over time. It should be something like a central and sorted collection of relevant information.

 

 

Definition and insights into huge pages and transparent huge pages

Official Linux Documentation

 

Huge Pages and Transparent Huge Pages

Memory is managed in blocks known as pages. A page is 4096 bytes. 1MB of memory is equal to 256 pages; 1GB of memory is equal to 256,000 pages, etc. CPUs have a built-in memory management unit that contains a list of these pages, with each page referenced through a page table entry.

 

There are two ways to enable the system to manage large amounts of memory:

    • Increase the number of page table entries in the hardware memory management unit
    • Increase the page size

 

The first method is expensive, since the hardware memory management unit in a modern processor only supports hundreds or thousands of page table entries. Additionally, hardware and memory management algorithms that work well with thousands of pages (megabytes of memory) may have difficulty performing well with millions (or even billions) of pages. This results in performance issues: when an application needs to use more memory pages than the memory management unit supports, the system falls back to slower, software-based memory management, which causes the entire system to run more slowly.

 

Red Hat Enterprise Linux 6 implements the second method via the use of huge pages.

 

Simply put, huge pages are blocks of memory that come in 2MB and 1GB sizes. The page tables used by the 2MB pages are suitable for managing multiple gigabytes of memory, whereas the page tables of 1GB pages are best for scaling to terabytes of memory.

 

Huge pages must be assigned at boot time. They are also difficult to manage manually, and often require significant changes to code in order to be used effectively. As such, Red Hat Enterprise Linux 6 also implemented the use of transparent huge pages (THP). THP is an abstraction layer that automates most aspects of creating, managing, and using huge pages.

 

THP hides much of the complexity in using huge pages from system administrators and developers. As the goal of THP is improving performance, its developers (both from the community and Red Hat) have tested and optimized THP across a wide range of systems, configurations, applications, and workloads. This allows the default settings of THP to improve the performance of most system configurations.

 

Note that THP can currently only map anonymous memory regions such as heap and stack space.
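A quick sanity check of the page arithmetic from the quoted passage (this is my addition, not part of the Red Hat documentation). Note that the quote's "256,000 pages per GB" figure is rounded; with 4 KiB pages the exact value is 262,144.

```python
# Page-count arithmetic for the standard 4 KiB page and the 2 MiB huge page.
PAGE = 4096                       # default x86-64 page size in bytes

print((1 * 1024**2) // PAGE)      # pages per MiB -> 256
print((1 * 1024**3) // PAGE)      # pages per GiB -> 262144 (the doc rounds to 256,000)
print((2 * 1024**2) // PAGE)      # 4 KiB pages replaced by one 2 MiB huge page -> 512
```

So a single 2 MiB huge page replaces 512 ordinary page table entries, which is exactly where the management overhead savings come from.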

 

Huge Translation Lookaside Buffer (HugeTLB)

Physical memory addresses are translated to virtual memory addresses as part of memory management. The mapped relationship of physical to virtual addresses is stored in a data structure known as the page table. Since reading the page table for every address mapping would be time consuming and resource-expensive, there is a cache for recently-used addresses. This cache is called the Translation Lookaside Buffer (TLB).


However, the TLB can only cache so many address mappings. If a requested address mapping is not in the TLB, the page table must still be read to determine the physical to virtual address mapping. This is known as a "TLB miss". Applications with large memory requirements are more likely to be affected by TLB misses than applications with minimal memory requirements because of the relationship between their memory requirements and the size of the pages used to cache address mappings in the TLB. Since each miss involves reading the page table, it is important to avoid these misses wherever possible.


The Huge Translation Lookaside Buffer (HugeTLB) allows memory to be managed in very large segments so that more address mappings can be cached at one time. This reduces the probability of TLB misses, which in turn improves performance in applications with large memory requirements.
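A quick back-of-the-envelope calculation shows why larger segments help the TLB - the number of mappings needed for the same memory region shrinks dramatically:

```shell
# Pages needed to map a 1 GB region - and thus the pressure on the TLB -
# with 4 KB regular pages vs. 2 MB huge pages.
REGION=$((1024 * 1024 * 1024))                # 1 GB in bytes
echo "4K pages needed: $((REGION / 4096))"    # 262144 mappings
echo "2M pages needed: $((REGION / 2097152))" # 512 mappings
```

With 512 mappings instead of 262,144, a far larger fraction of the region fits into the fixed number of TLB entries, so TLB misses become much rarer.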

 

*** Side Note: Transparent Huge Pages (THP) support was officially announced with Linux kernel version 2.6.38.

 

Oracle Documentation addition

 

HugePages is a feature integrated into the Linux kernel with release 2.6. This feature basically provides an alternative to the 4K page size (16K for IA64) by providing bigger pages.


Regarding HugePages, there are some other similar terms in use, like hugetlb and hugetlbfs. Before proceeding into the details of HugePages, see the definitions below:


    • Page Table: A page table is the data structure of a virtual memory system in an operating system to store the mapping between virtual addresses and physical addresses. This means that on a virtual memory system, the memory is accessed by first accessing a page table and then accessing the actual memory location implicitly.
    • TLB: A Translation Lookaside Buffer (TLB) is a buffer (or cache) in a CPU that contains parts of the page table. This is a fixed size buffer being used to do virtual address translation faster.
    • hugetlb: This is an entry in the TLB that points to a HugePage (a large/big page larger than the regular 4K and predefined in size). HugePages are implemented via hugetlb entries, i.e. we can say that a HugePage is handled by a "hugetlb page entry". The 'hugetlb' term is also (and mostly) used synonymously with HugePage. In this document the term "HugePage" is going to be used, but keep in mind that "hugetlb" mostly refers to the same concept.
    • hugetlbfs: This is an in-memory filesystem like tmpfs, introduced with the 2.6 kernel. Pages allocated on a hugetlbfs type filesystem are allocated in HugePages.
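To make the hugetlbfs definition a bit more tangible, here is a minimal sketch of mounting it manually. The mount point /mnt/huge is an arbitrary example; this requires root privileges and previously reserved huge pages:

```shell
# Mount hugetlbfs like any other in-memory filesystem:
mkdir -p /mnt/huge
mount -t hugetlbfs none /mnt/huge
# Files created below /mnt/huge are now backed by HugePages:
mount | grep hugetlbfs
```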

 

Graphical illustration of regular (normal) and huge pages


When a single process works with a piece of memory, the pages that the process uses are referenced in a local page table for the specific process. The entries in this table also contain references to the System-Wide Page Table, which actually holds references to physical memory addresses. So a user mode process (i.e. an Oracle process) follows its local page table to access the system page table and can then reference the actual physical memory. As you can see below, it is also possible (and very common for the Oracle RDBMS due to SGA use) that two different O/S processes point to the same entry in the system-wide page table.

Regular_Pages.png

 

When HugePages are in play, the usual page tables are still employed. The very basic difference is that the entries in both the process page table and the system page table carry attributes marking huge pages. So any page in a page table can be a huge page or a regular page. The following diagram illustrates 4096K hugepages, but the diagram would be the same for any huge page size.

 

Huge_Pages.png

 

I guess this should be enough general information about huge pages and transparent huge pages to understand the concepts and basics. Please check the reference section for more detailed information (like performance comparisons), if you are interested.

 

 

Why do we care about such memory handling at all and what are the advantages?

Well, as I have previously mentioned, I have worked with Oracle databases in highly consolidated or centralized environments in the past years, and in such environments there is a lot of thinking about "how to utilize the infrastructure and hardware in the best way". Just imagine a distributed SAP system landscape with a centralized Oracle database infrastructure (like on VMware or whatever). How can you put as many databases as possible on such an infrastructure without them harming each other's performance? "Classical" database or SQL tuning is important for reducing the I/O, CPU and memory load of course, but you can also tune the operating system to get much better utilization and throughput.

 

... and so we get into memory management as well. RAM is still the most expensive and limiting hardware resource, and so we don't want to waste it without a valid reason. So we finally arrive at the use case of regular, huge and transparent huge pages for Oracle databases.

 

Jonathan Lewis has already written a blog post about a memory usage issue after a database migration from a 32-bit to a 64-bit operating system and mentioned a solution called "huge pages" for it.

 

"A client recently upgraded from 32-bit Oracle to 64-bit Oracle because this would allow a larger SGA. At the same time they increased their SGA from about 2GB to 3GB hoping to take more advantage of their 8GB of RAM. The performance of their system did not get better – in fact it got worse.

It is important background information to know that they were running a version of Red Hat Linux and that there were typically 330 processes connected to the database using an average of about 4MB of PGA each.

Using small memory pages (4KB) on a 32-bit operating system the memory map for a 2GB SGA would be: 4 bytes for each of 524,288 pages, totalling 2MB per process, for a grand total of 660MB memory space used for mapping when the system has warmed up. So when the system was running at steady state, the total memory directly related to Oracle usage was: 2GB + 660MB + 1.2GB (PGA) = 3.8GB, leaving about 4.2GB for O/S and file system cache.

Upgrade to a 64-bit operating system and a 3GB SGA and you need 8 bytes for each page in the memory map and have 786,432 pages, for a total of 6MB per process, for a total of 1,980 MB of maps – an extra 1.3GB of memory lost to maps. Total memory directly related to Oracle usage: 3GB + 1.9GB + 1.2GB (PGA) = 6.1GB, leaving about 1.9GB for O/S and file system cache.

"

 

This example is about a pretty tiny SGA - now think about databases with a much larger cache size or a lot of databases with a small cache size (in a highly consolidated environment) and scale it up - i think you get the point here.
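The arithmetic from the quote can be redone for any configuration. A small shell sketch - the two calls reproduce the 32-bit and 64-bit numbers from the quote above:

```shell
# Per-process page map footprint in MB: (pages in SGA) x (pointer size),
# summed over all connected processes.
map_mb() {   # usage: map_mb <sga_bytes> <page_bytes> <ptr_bytes> <process_count>
  echo $(( $1 / $2 * $3 * $4 / 1024 / 1024 ))
}
map_mb $((2 * 1024**3)) 4096 4 330   # 32-bit, 2 GB SGA -> 660 MB
map_mb $((3 * 1024**3)) 4096 8 330   # 64-bit, 3 GB SGA -> 1980 MB
```

Swap the 4096 for 2097152 (2 MB huge pages) and the map overhead shrinks by a factor of 512 - which is exactly the point of huge pages here.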

 

Advantages of huge pages

  1. Larger Page Size and Less # of Pages: Default page size is 4K whereas the HugeTLB size is 2048K. That means the system would need to handle 512 times fewer pages.
  2. Reduced Page Table Walking: Since a HugePage covers a greater contiguous virtual address range than a regular sized page, the probability of getting a TLB hit per TLB entry with HugePages is higher than with regular pages. This reduces the number of times page tables are walked to obtain a physical address from a virtual address.
  3. Less Overhead for Memory Operations: On virtual memory systems (any modern OS) each memory operation is actually two abstract memory operations. With HugePages, since there are fewer pages to work on, the possible bottleneck on page table access is clearly avoided.
  4. Less Memory Usage: From the Oracle Database perspective, with HugePages the Linux kernel will use less memory to create page tables maintaining the virtual to physical mappings for the SGA address range, in comparison to regular sized pages. This makes more memory available for process-private computations or PGA usage.
  5. No Swapping: Swapping must be avoided on Linux at all costs. HugePages are not swappable (whereas regular pages are), so there is no page replacement mechanism overhead. HugePages are universally regarded as pinned.
  6. No 'kswapd' Operations: kswapd will get very busy if there is a very large area to be paged (i.e. 13 million page table entries for 50GB memory) and will use an incredible amount of CPU resources. When HugePages are used, kswapd is not involved in managing them.
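A practical corollary of the points above: before huge pages can be used, the vm.nr_hugepages value has to be derived from the planned SGA sizes. A minimal sketch of that calculation, assuming the 2048 KB page size of x86_64 and one extra page as a small cushion for shared memory segment overhead:

```shell
# How many 2 MB huge pages a 500 MB SGA needs (plus one spare page):
SGA_MB=500
HPSIZE_KB=2048
echo $(( SGA_MB * 1024 / HPSIZE_KB + 1 ))   # -> 251
```

For multiple instances on one host, sum the result over all SGAs before setting vm.nr_hugepages.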

 

 

Myth 1 - We are running a Linux kernel version, that supports transparent huge pages and so the Oracle database already uses huge pages for the SGA

This is a common myth in newer Oracle / Linux system landscapes, but it is unfortunately not true. Starting with RedHat 6, OEL 6, SLES 11 SP 2 and the UEK2 kernels, transparent huge pages are implemented and enabled (by default) in an attempt to improve memory management, but not every kind of memory is currently supported.

 

The following information is grabbed from an Oracle Enterprise Linux 6.2 (2.6.39-100.7.1.el6uek.x86_64) and run with an Oracle database 11.2.0.3.2. The instance uses manual memory management (no AMM or ASMM) to keep it simple.

 

[root@OEL11 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

 

Transparent huge pages are enabled as "always" is included in brackets - so let's verify it with a running Oracle database.

 

*** Before database startup
[root@OEL11 ~]# cat /proc/meminfo | grep AnonHugePages
AnonHugePages:         0 kB

*** Database is started up with db_cache_size=300M, shared_pool_size=200M, 
*** pga_aggregate_target=100M and pre_page_sga=TRUE
SQL> startup 
ORACLE instance started.
Total System Global Area  559575040 bytes

*** After database startup
[root@OEL11 ~]# cat /proc/meminfo | grep AnonHugePages
AnonHugePages:      4096 kB

 

That seems to be pretty strange, right? The SGA is round about 500 MB and fully allocated, but only 4 MB of transparent huge pages are currently used. What's wrong here? Let's check each database process for its memory usage.

 

[root@OEL11 ~]# for PRID in $(ps -o pid= -u orat11)
do
  THP=$(cat /proc/$PRID/smaps | grep AnonHugePages | awk '{sum+=$2} END {print sum}')
  echo "PID: $PRID - AnonHugePages: $THP"
done

PID: 11903 - AnonHugePages: 0
...
PID: 12653 - AnonHugePages: 0
PID: 12655 - AnonHugePages: 4096
PID: 12657 - AnonHugePages: 0
...
PID: 12926 - AnonHugePages: 0

[root@OEL11 ~]# ps -ef | grep 12655
orat11   12655     1  0 15:54 ?        00:00:00 ora_dbw0_T11

 

Only the DBWR is using transparent huge pages - and just 4 MB of them, which is nothing compared to the SGA size, right?

 

In reality there is nothing wrong - it works as designed, if we check the kernel documentation for transparent huge pages:

[root@OEL11 ~]# cat /usr/share/doc/kernel-doc-2.6.39/Documentation/vm/transhuge.txt
...
Transparent Hugepage Support is an alternative means of using huge pages for the backing of virtual 
memory with huge pages that supports the automatic promotion and demotion of page sizes and 
without the shortcomings of hugetlbfs.
Currently it only works for anonymous memory mappings but in the future it can expand over the 
pagecache layer starting with tmpfs.
...

.. and here we go .. transparent huge pages are currently supported for anonymous memory mappings (like the PGA heap) only. So the SGA (shared memory) still uses the regular page size and transparent huge pages are not useful here to reduce the mapping overhead.

 

IMPORTANT HINT: Due to known problems, Oracle does not recommend transparent huge pages at all (not even for the PGA heap) - please check the reference section (MOS ID 1557478.1 or SAPnote #1871318) for details about deactivating this feature.

 

"Because Transparent HugePages are known to cause unexpected node reboots and performance problems with RAC, Oracle strongly advises to disable the use of Transparent HugePages. In addition, Transparent Hugepages may cause problems even in a single-instance database environment with unexpected performance problems or delays. As such, Oracle recommends disabling Transparent HugePages on all Database servers running Oracle."
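A commonly documented way to disable THP at runtime uses the same sysfs interface we queried above (paths as on RHEL/OEL 6). Note that the runtime setting does not survive a reboot, so it is usually made permanent on the kernel boot line as well:

```shell
# Disable THP at runtime:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# Permanent variant: append to the kernel command line in grub.conf:
#   transparent_hugepage=never
```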

 

 

Myth 2 - Huge pages are difficult to manage in highly consolidated and critical environments

This myth used to be true (and still is in various cases nowadays), but Oracle has improved the procedure for allocating huge pages with patchset 11.2.0.3.

 

Let's clarify the root problem first. Imagine a highly consolidated Oracle database system landscape on several physical or virtual hosts. In pre-11.2.0.3 times you needed to calculate and define the amount of huge pages for all databases on each particular host. This works pretty well if you have a stable number of instances/databases (with a fixed memory size), but what if you need to add several new instances/databases to your production server? You would have to adjust the corresponding kernel parameters manually (and maybe reboot the server) just because of a newly deployed instance/database. Otherwise the new instance would allocate its whole SGA in regular pages (4 kb), if its SGA does not fit into the remaining free huge page area. This can cause nasty paging trouble, as the memory calculation is based on using large pages. Or think about automated database provisioning - of course you could size the huge page area so big that you never run into this problem, but then we would have missed the original goal of using the hardware resources as effectively as possible.

 

Let's check out the improvements of Oracle 11.2.0.3 for the huge page handling. The following information is grabbed from an Oracle Enterprise Linux 6.2 (2.6.39-100.7.1.el6uek.x86_64) and run with an Oracle database 11.2.0.3.2. The instance uses manual memory management (no AMM or ASMM) to keep it simple and transparent huge pages are disabled.

 

Initial settings for every parameter setting test

[root@OEL11 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]

[root@OEL11 ~]# cat /proc/meminfo  | grep Huge
HugePages_Total:     150
HugePages_Free:      150
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

 

Transparent huge pages are disabled as "never" is included in brackets and round about 300 MB of memory is assigned to the huge pages "pool" and still free.

 

The SGA of my Oracle instance is still round about 500 MB - so it usually would not be able to allocate all the memory as huge pages. Let's verify the different behaviors with parameter "use_large_pages".

 

Parameter use_large_pages=TRUE (= Default)

Let's check the default behavior of Oracle 11.2.0.3 first.

*** Database is started up with db_cache_size=300M, shared_pool_size=200M, 
*** pga_aggregate_target=100M, pre_page_sga=TRUE and use_large_pages=TRUE
SQL> startup 
ORACLE instance started.
Total System Global Area  559575040 bytes

*** Alert Log
****************** Large Pages Information *****************
Total Shared Global Region in Large Pages = 300 MB (55%)
Large Pages used by this instance: 150 (300 MB)
Large Pages unused system wide = 0 (0 KB) (alloc incr 4096 KB)
Large Pages configured system wide = 150 (300 MB)
Large Page size = 2048 KB

*** After database startup
root@OEL11 ~]# cat /proc/meminfo | grep Huge
AnonHugePages:         0 kB
HugePages_Total:     150
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

root@OEL11 ~]# ipcs -a
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x00000000 65537      orat11     640        12582912   25                      
0x00000000 98306      orat11     640        276824064  25                      
0x00000000 131075     orat11     640        20971520   25                      
0x00000000 163844     orat11     640        4194304    25                      
0x00000000 196613     orat11     640        247463936  25                      
0x4eb56684 229382     orat11     640        2097152    25 

 

As you can see, Oracle used all the available huge pages first and, after they ran out, used regular pages for the rest. Several shared memory segments are created and used as a side effect of this enhancement.
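If you want to monitor across many instances how much of each SGA actually landed in large pages, the percentage can be scraped from the alert log. A toy sketch - the sample log lines are inlined here so the snippet runs standalone; in practice you would point awk at your alert log instead:

```shell
# Extract the large-pages percentage from the "Large Pages Information" section:
awk -F'[(%]' '/Total Shared Global Region in Large Pages/ {print $2}' <<'EOF'
****************** Large Pages Information *****************
Total Shared Global Region in Large Pages = 300 MB (55%)
Large Pages used by this instance: 150 (300 MB)
EOF
```

A value well below 100 is exactly the mixed-pages situation shown above, which you probably want to alert on.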

 

Parameter use_large_pages=ONLY

Let's check the parameter value "use_large_pages=ONLY" and its behavior, if there are not sufficient large pages at database startup.

*** Database is started up with db_cache_size=300M, shared_pool_size=200M, 
*** pga_aggregate_target=100M, pre_page_sga=TRUE and use_large_pages=ONLY
SQL> startup 
ORA-27137: unable to allocate large pages to create a shared memory segment
Linux-x86_64 Error: 12: Cannot allocate memory

*** Alert Log
****************** Large Pages Information *****************
Parameter use_large_pages = ONLY
Large Pages unused system wide = 150 (300 MB) (alloc incr 4096 KB)
Large Pages configured system wide = 150 (300 MB)
Large Page size = 2048 KB

ERROR:
  Failed to allocate shared global region with large pages, unix errno = 12.  Aborting Instance startup.  ORA-27137: unable to allocate Large Pages to create a shared memory segment

 

As you can see we can also force the instance to use large pages only for the whole SGA and the startup fails with an ORA-27137 error, if not enough large pages are available. This setting is usually used to avoid an out of memory situation based on a mix of regular and large pages (like the default behavior).

 

Parameter use_large_pages=AUTO

This option was newly introduced with Oracle 11.2.0.3 - let's verify its impact, if there are not sufficient large pages at database startup.

*** Database is started up with db_cache_size=300M, shared_pool_size=200M, 
*** pga_aggregate_target=100M, pre_page_sga=TRUE and use_large_pages=AUTO
SQL> startup
ORACLE instance started.
Total System Global Area  559575040 bytes

*** Alert Log
DISM started, OS id=1610
****************** Large Pages Information *****************
Parameter use_large_pages = AUTO
Total Shared Global Region in Large Pages = 538 MB (100%)
Large Pages used by this instance: 269 (538 MB)
Large Pages unused system wide = 0 (0 KB) (alloc incr 4096 KB)
Large Pages configured system wide = 269 (538 MB)
Large Page size = 2048 KB
Time taken to allocate Large Pages = 0.025895 sec
***********************************************************

*** After database startup
[root@OEL11 trace]# cat /proc/meminfo | grep Huge
HugePages_Total:     269
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

[root@OEL11 trace]# ipcs -a
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x6c6c6536 0          root       600        4096       0                       
0x00000000 360449     orat11     640        12582912   24                      
0x00000000 393218     orat11     640        549453824  24                      
0x4eb56684 425987     orat11     640        2097152    24   

 

As you can see, Oracle automatically reconfigured the Linux kernel and increased the amount of huge pages (temporarily), so that the complete SGA fits in. This is possible if you have enough free(able) memory. You will also notice an unusual startup comment like "DISM started, OS id=1610", if you look closely at the alert log snippet. DISM is responsible for tasks like increasing the amount of huge pages or increasing the process priority. Root privileges are needed for such tasks - so check for the correct permissions (s-bit and owner) on the oradism binary.

 

 

Myth 3 - Huge pages can not be used for Oracle instances with ASMM (Automatic Shared Memory Management)

This is the most common misconception that I am confronted with. Personally I am not a fan of automatic shared memory management, but some of my clients use it of course. I guess the root cause of this misconception is the naming of two similar memory features called ASMM and AMM. So let's check the official documentation about both features and the huge pages restriction first.

 

Automatic Shared Memory Management (ASMM)

Automatic Shared Memory Management simplifies SGA memory management. You specify the total amount of SGA memory available to an instance using the SGA_TARGET initialization parameter and Oracle Database automatically distributes this memory among the various SGA components to ensure the most effective memory utilization.

 

When automatic shared memory management is enabled, the sizes of the different SGA components are flexible and can adapt to the needs of a workload without requiring any additional configuration. The database automatically distributes the available memory among the various components as required, allowing the system to maximize the use of all available SGA memory.

 

Automatic Memory Management (AMM)

The simplest way to manage instance memory is to allow the Oracle Database instance to automatically manage and tune it for you. To do so (on most platforms), you set only a target memory size initialization parameter (MEMORY_TARGET) and optionally a maximum memory size initialization parameter (MEMORY_MAX_TARGET). The total memory that the instance uses remains relatively constant, based on the value of MEMORY_TARGET, and the instance automatically distributes memory between the system global area (SGA) and the instance program global area (instance PGA). As memory requirements change, the instance dynamically redistributes memory between the SGA and instance PGA.

 

When automatic memory management is not enabled, you must size both the SGA and instance PGA manually.

 

Restrictions for HugePages Configurations

  • The Automatic Memory Management (AMM) and HugePages are not compatible. With AMM the entire SGA memory is allocated by creating files under /dev/shm. When Oracle Database allocates SGA that way HugePages are not reserved. You must disable AMM on Oracle Database to use HugePages.
  • If you are using VLM in a 32-bit environment, then you cannot use HugePages for the Database Buffer cache. HugePages can be used for other parts of SGA like shared_pool, large_pool, and so on. Memory allocation for VLM (buffer cache) is done using shared memory file systems (ramfs/tmpfs/shmfs). HugePages does not get reserved or used by the memory file systems.
  • HugePages are not subject to allocation or release after system startup, unless a system administrator changes the HugePages configuration by modifying the number of pages available, or the pool size. If the space required is not reserved in memory during system startup, then HugePages allocation fails.

 

So basically said - both features provide automatic memory management, but ASMM controls the SGA only, while AMM controls both the SGA and the PGA. If you look closely at the restrictions you will see that only AMM is incompatible with huge pages - ASMM is not. AMM is not based on "classical" shared memory segments - it is implemented using the /dev/shm filesystem.
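This difference is directly visible at the OS level. A rough heuristic for a running Linux host (just an inspection sketch - the exact file names under /dev/shm vary by version):

```shell
# AMM: the SGA is backed by granule files in /dev/shm
ls /dev/shm
# ASMM / manual memory management: the SGA is backed by classical
# SysV shared memory segments
ipcs -m
```

If /dev/shm is full of Oracle granule files, the instance uses AMM and huge pages are out; large SysV segments in ipcs output point to ASMM or manual management, both of which are huge-page compatible.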

 

Initial settings for ASMM huge pages test

[root@OEL11 ~]#  cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]

[root@OEL11 ~]#  cat /proc/meminfo  | grep Huge
HugePages_Total:     150
HugePages_Free:      150
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Transparent huge pages are disabled as "never" is included in brackets and round about 300 MB of memory is assigned to the huge pages "pool" and still free.

 

Using ASMM and check the huge pages behavior

*** Database is started up with sga_target=500M, pga_aggregate_target=100M, 
*** pre_page_sga=TRUE and use_large_pages=AUTO
SQL> startup 
ORACLE instance started.
Total System Global Area  521936896 bytes

*** Alert Log
DISM started, OS id=1400
****************** Large Pages Information *****************
Parameter use_large_pages = AUTO
Total Shared Global Region in Large Pages = 502 MB (100%)
Large Pages used by this instance: 251 (502 MB)
Large Pages unused system wide = 0 (0 KB) (alloc incr 4096 KB)
Large Pages configured system wide = 251 (502 MB)
Large Page size = 2048 KB
Time taken to allocate Large Pages = 0.022167 sec
***********************************************************

*** After database startup
[root@OEL11 trace]#  cat /proc/meminfo  | grep Huge
HugePages_Total:     251
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

[root@OEL11 trace]# ipcs -a
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x6c6c6536 0          root       600        4096       0                       
0x00000000 65537      orat11     640        12582912   23                      
0x00000000 98306      orat11     640        511705088  23                      
0x4eb56684 131075     orat11     640        2097152    23                 

 

As you can see, huge pages and ASMM are fully compatible and even work with the new automatic huge pages extension feature.

 

 

Summary

Wow - this blog has already become quite large, but this topic is very broad and so we needed to cover a lot of the basics first. I will keep extending this blog as soon as I notice new topics or if you ask for something specific.

 

If you have any further questions - please feel free to ask, or get in contact directly if you need assistance with implementing complex Oracle database landscapes or with troubleshooting Oracle (performance) issues.

 

 

References

How to check if Oracle Database instance is up?


Log into the infrastructure instance & log in as sysdba:

 

bash-3.00$ sqlplus '/as sysdba'

 

SQL> select status from v$instance;

 

STATUS

------------------------------------

OPEN

 

____________________________________________________________________________________________________________________________

 

Log into the infrastructure instance & issue the following command:

 

bash-3.00$ ps -ef | grep pmon

 

ocsinfra 11021     1   0 09:15:15 ? 0:09 ora_pmon_orcl

 

ocsinfra 17960 17592   0 10:47:15 pts/2       0:00 grep pmon

 

ps -ef |grep ora
ps -ef |grep ora_
ps -ef |grep pmon

 

R3trans -d ; check the RC - 0000 means the database is active, anything other than 0000 indicates an error

________________________________________________________________________________________________________________________________

In Brtools

 

1. Instance Management
2. 6 - Show instance status
3. 3 - Database instance
4. Enter the instance name (i.e. the database name) and press continue - this shows the status

_____________________________________________________________________________________________________________

ps -ef |grep pmon

If the DB instance is running, the above command returns a running pmon process

_______________________________________________________________________________________________________________

[Oracle] Database cache internals and why one data block can use multiple cache buffers


Introduction

Have you ever heard of the myth "Increase the database cache to the database size and it can be cached completely", or have you ever wondered how the database cache is organized internally? .. then this blog should be worth reading.

 

You usually hear a lot of rules of thumb and received wisdom about topics like database cache sizing or the handling of such caches. Sometimes the cache topic is also used as a "silver bullet" without a deeper investigation of the real root cause, and some customers just follow the recommendations of their consultants without questioning them. But you as a customer should always question the recommendations and understand the idea behind them (maybe there is none).

 

 

Cary Millsap, a well known Oracle performance expert already expressed it this way (and i really like the idea of it):

 

“People think that consultants get paid for having the right answers, but we don’t.

We get paid for convincing people that we have the right answers.

The way you do that, is by showing them the exact process that led to your conclusion."

 

 

Basics of the database buffer cache organization

I must admit that this section is going to disregard a lot of details about the SGA, its structure, working data sets, different linked lists and queues, but otherwise this blog would become very large and complex if I included all of these details here as well. In this blog we just want to explore how one (physical) data block can allocate multiple cache buffers. Please let me know if you are interested in some of the other details as well - I may then write another blog post about them.

 

"A picture tells us more than a thousand words" - let's start with a tiny illustration of the buffer cache organization.

Buffer_Cache_Intro_2.png

The database block buffer headers (which point to the real cached data blocks) are hashed and attached to a corresponding hash bucket in a short double linked list. These hash buckets are protected by latches.

 

Here is a short summary of the most important facts:

  • The hashing algorithm for choosing the correct hash bucket for a data block is basically something like this: Hashing of file number and block number
  • Oracle creates a lot of these hash buckets (hidden parameter _db_block_hash_buckets or sga variable kcbnhb) to keep the double linked buffer header lists as short as possible (for performance reasons like searching for already cached data blocks or checking the content and so on)
  • In newer Oracle database releases one latch covers 32 hash buckets (formula = hidden parameter _db_block_hash_buckets / _db_block_hash_latches)
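The buckets-per-latch formula can be illustrated with example values. The real values are instance-specific hidden parameters, so the numbers below are just plausible placeholders:

```shell
BUCKETS=65536   # example value for _db_block_hash_buckets
LATCHES=2048    # example value for _db_block_hash_latches
echo "hash buckets per cache buffers chains latch: $(( BUCKETS / LATCHES ))"
# -> 32
```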

 

Let's illustrate the basic procedure for reading a data block:

  1. Calculate the correct hash bucket by using the file and block number
  2. Grab the relevant (cache buffers chain) latch
  3. Follow the pointers from the hash bucket (jumping from buffer header to buffer header) to find the corresponding buffer
  4. Do something with the buffer header and data content (if Oracle can already find it in the linked list)
  5. Drop the relevant (cache buffers chains) latch

 

That's it - I will stop explaining here. This should be enough "basic knowledge" to understand the following demo case and answer the question "why one data block can use multiple cache buffers".
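To make step 1 of the procedure a bit more concrete: a data block address combines the file number and the block number, and a hash of that address selects the bucket. The modulo below is only an intuition aid - the real kernel hash function is internal - and the rdba layout (file number shifted left by 22 bits) applies to smallfile tablespaces:

```shell
# Simplified bucket selection for file 1, block 93529 (our demo block later on):
FILE=1; BLOCK=93529; BUCKETS=65536
RDBA=$(( (FILE << 22) | BLOCK ))     # relative data block address
echo "bucket: $(( RDBA % BUCKETS ))"
```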

 

*** Side note: Maybe you have already noticed a wait event called "latch: cache buffers chains" while troubleshooting performance issues, if several processes try to read a lot of data from the database cache (due to bad SQL execution plans or insufficient database structures). This is typically caused by contention at step 2 of the procedure above.

 

Test case - One data block can use multiple cache buffers

The following demo was run with an Oracle database (11.2.0.3.2) on OEL 6.2.

 

Create the test case

SQL> create table BUFFCACHETEST (num number);

SQL> begin
for i in 1..10 loop
  insert into BUFFCACHETEST values(i);
end loop;
commit;
end;
/

SQL> exec DBMS_STATS.GATHER_TABLE_STATS('SYS', 'BUFFCACHETEST');

 

 

Initial settings

Settings.png

The created database object (table BUFFCACHETEST) has the corresponding object id 83552 and allocated 8 contiguous database blocks from block id 93528 upwards.

 

 

Cross-check the data

SQL> select NUM,
            DBMS_ROWID.ROWID_OBJECT(ROWID) as OBJECT_ID,
            DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) as RELATIVE_FNO,
            DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) as BLOCK_NUM,
            DBMS_ROWID.ROWID_ROW_NUMBER(ROWID) as ROW_NUM
     from BUFFCACHETEST;

DBMS_ROWID.png

 

All of the ten rows are stored in the same (physical) data block (93529) in data file 1. Now let's investigate what happens in the database cache (and with the buffer headers) if we update these rows in several ways.

Remember: All of these rows are placed in the same physical 8 KB block, so they only need 8 KB of "physical space".

 

Initialize test case

*** Restart Oracle Instance to initialize cache ***
SQL> startup force

*** Fill buffer cache with that one block again
SQL> select * from BUFFCACHETEST;

*** The following query will be used furthermore for getting the information about the buffer headers
SQL> select HLADDR,
            decode(STATE, 0,'free', 1,'xcur', 2,'scur', 3,'cr', 4,'read',
                   5,'mrec', 6,'irec', 7,'write', 8,'pi', 9,'memory',
                   10,'mwrite', 11,'donated', 12,'protected', 13,'securefile',
                   14,'siop', 15,'recckpt', 16,'flashfree', 17,'flashcur',
                   18,'flashna') as STATE,
            PRV_HASH, NXT_HASH, BA, DBARFIL, DBABLK
       from X$BH
      where OBJ = 83552 and DBABLK = 93529;

Cache01.png

One buffer header (for the physical block in file 1, block 93529) is attached to the linked list, as we just read the data with a full table scan. We should clarify the selected columns of X$BH before we go on with the DML statements.

 

HLADDR = Address of the latch, that protects the corresponding hash bucket

STATE = State of the block like xcur (current version), cr (consistent version, which contains an older version of the block and is available for consistent reads)

PRV_HASH = Address of the previously attached buffer header in the double linked list

NXT_HASH = Address of the following attached buffer header in the double linked list

BA = Address of data block buffer

 

The other 2 columns should be self-explanatory. Let's verify the latch by its address to be absolutely consistent in our assumptions.

SQL> select NAME from V$LATCH_CHILDREN where ADDR = '00000000847E4830';
NAME
----------------------------------------------------------------
cache buffers chains

SQL> select count(*) from V$LATCH_CHILDREN where NAME = 'cache buffers chains';

  COUNT(*)
----------
      1024

As assumed, the latch address corresponds to a "cache buffers chains" latch, and my test instance got 1024 of these latches. This also matches the value of the hidden parameter "_db_block_hash_latches" (check the "Initial settings" section).

 

Finally let's do some arithmetic for cross-checking the "latches per hash bucket" assumption: _db_block_hash_buckets / _db_block_hash_latches = 32768 / 1024 = 32
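The same cross-check, expressed in a few lines of Python (the parameter values are the ones observed on this demo instance, not universal defaults):

```python
# Cross-check of the "latches per hash bucket" arithmetic, using the hidden
# parameter values observed on this demo instance (instance-specific values):
db_block_hash_buckets = 32768   # _db_block_hash_buckets
db_block_hash_latches = 1024    # _db_block_hash_latches

buckets_per_latch = db_block_hash_buckets // db_block_hash_latches
print(buckets_per_latch)  # -> 32: each latch protects 32 hash buckets
```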

 

DML test case

In the following section I will update one row after another and query the buffer header array afterwards. Please look carefully at the columns "HLADDR", "STATE" and "BA".

 

SQL> update BUFFCACHETEST set NUM=11 where NUM=1;

Cache02.png

SQL> update BUFFCACHETEST set NUM=12 where NUM=2;

Cache03.png

SQL> update BUFFCACHETEST set NUM=13 where NUM=3;

Cache04.png

SQL> update BUFFCACHETEST set NUM=NUM+10 where NUM <= 10;

Cache05.png

Only one additional block is needed, even if you update several rows with one DML statement (as long as the changes are done to the same block).

 

SQL> update BUFFCACHETEST set NUM=21 where NUM = 11;

Cache06.png

SQL> update BUFFCACHETEST set NUM=23 where NUM = 12;

Cache07.png

We have seen that every DML statement allocated and used a new buffer in the buffer cache (different values in column BA, but the same physical block) and attached the corresponding buffer header to the doubly linked list. This works until the limit of "_db_block_max_cr_dba" is reached. This limit is implemented to keep the linked lists as short as possible. Notice the state of the blocks - you have one current version of the block and several "older versions" that can be used for consistent reads (for example, by long-running SQLs).

 

*** Side note: I have already observed that the limit of parameter "_db_block_max_cr_dba" is not enforced in every case (maybe due to bugs). However, I was not able to reproduce a corresponding test case on my Oracle environment with database version 11.2.0.3.2.

 

Finally, we have only one assumption left: "All of these blocks are attached to the same hash bucket and linked together". Previously we just queried the HLADDR (latch address) and it was always the same, but one latch covers 32 hash buckets - so we cannot be absolutely sure about this assumption at this point.

 

The easiest way (for me) to cross-check this is dumping the information about the buffers.

 

SQL> oradebug setmypid
SQL> oradebug dump buffers 4;

oradebug_buffers.png

So this particular hash bucket / doubly linked list consists of 8 buffer headers. You can reconstruct the "linked chain" if you follow the blue marked addresses. You will also find the corresponding data buffer addresses as published through X$BH. The current version of the table block is always the first one (of the corresponding database object) as you walk down the linked list (a "performance" optimization).

 

*** Side note: This particular demo case works with DMLs and full table scans only ("switch current to new buffer" technique), but it is the easiest way to demonstrate such behavior in the buffer cache. If you look closely at the BA column, you will see that the newly created "xcur" buffers are copies of the previously current ones.

 

Summary

Statements like "database cache size = database size = fully cached" are not true at all, as we have demonstrated that one physical data block (8 KB) can be stored multiple times in the database cache (in our case 8 KB * 6 = 48 KB).

 

You may wonder why we care about this, as we usually have indexes defined (in a SAP environment) and a DML statement does not use a full table scan at all. This is true (for most well-written application/business logic), but a similar technique is used if you need a read-consistent block for long-running SQLs. Oracle cross-checks the SCN of the current block in the buffer cache (we assume it is already there) when reading a data block, and Oracle needs to create a read-consistent copy of the block if the block SCN is higher than the query SCN.

 

If Oracle notices such an SCN mismatch, it walks further down the linked list, because it is possible that an appropriate read-consistent copy (column "STATE" = cr) already exists. If there is no suitable version available, Oracle reconstructs the block (by cloning the XCUR block and applying undo) and attaches it to the list.
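This decision logic can be sketched as a small, heavily simplified Python model (my own illustration - buffer layout, SCN handling and undo application are reduced to the bare minimum):

```python
# Heavily simplified model of a consistent read on one hash chain - this is
# NOT Oracle's real algorithm. Each buffer is modeled as a dict holding its
# state ("xcur" = current, "cr" = consistent-read copy) and an SCN; applying
# undo is reduced to simply creating a new dict.

def consistent_read(chain, query_scn):
    """Return a buffer version of the block that is valid as of query_scn."""
    current = None
    for buf in chain:                            # walk down the linked list
        if buf["state"] == "xcur":
            current = buf
            if buf["scn"] <= query_scn:          # current block is old enough
                return buf
        elif buf["state"] == "cr" and buf["scn"] == query_scn:
            return buf                           # a suitable CR copy exists
    # No suitable version found: clone the current block and "apply undo"
    cr_copy = {"state": "cr", "scn": query_scn, "cloned_from": id(current)}
    chain.append(cr_copy)                        # attach new CR buffer to chain
    return cr_copy
```

The model only illustrates why the chain can grow: every query SCN without a matching version attaches another buffer for the same physical block.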

 

If you have any further questions - please feel free to ask, or get in contact directly if you need assistance with troubleshooting Oracle database (performance) issues.

[Oracle] Summary - Exploring Oracle 12c (R1) - Is the sky really cloudy or rather shiny?


Introduction

The new Oracle database 12c R1 (12.1.0.1.0) release was finally made available by Oracle on 06/25/2013. SAP usually does not certify R1 database releases for its software (exceptions are always possible), but it is still worth investigating and exploring the new Oracle database release right now for future intended SAP use or current non-SAP use.

 

I downloaded it on the same evening and already tested a few enhancements, features or the new database architecture on Linux (x86_64) and Solaris (x86_64).

 

This blog is intended as a summary of all the features that are interesting (for me) and as documentation of my current and future research cycles. Some of the smaller topics will be explained here; the others will be placed in separate blogs and linked here.

 

... so stay tuned and check this blog from time to time for more results.

 

[UPDATE 1] 07/10/13: Added content to section "1.8.2.2 Tracking I/O Outliers"

[UPDATE 2] 07/11/13: Added content to section "1.5.5.3 Invisible Columns"

[UPDATE 3] 07/15/13: Added new section "1.1.6.11 SQL Translation Framework" and linked content

[UPDATE 4] 07/17/13: Added content to section "1.2.3.5 Partial Indexes for Partitioned Tables"

[UPDATE 5] 07/18/13: Added content to section "1.2.3.1 Asynchronous Global Index Maintenance for DROP and TRUNCATE Partition"

 

 

Exploring enhancements / features

The following feature / enhancement list is based on the Oracle Database New Features Guide 12c Release 1 (12.1), but it covers only the key topics that I am particularly interested in.

 

1.1.6.11 SQL Translation Framework

I already took part in some research and a discussion about this feature on Kerry Osborne's blog, so I will just refer to that external source for more detailed information and a test case of this feature.

 

1.2.3.1 Asynchronous Global Index Maintenance for DROP and TRUNCATE Partition

Global index maintenance is decoupled from the DROP and TRUNCATE partition maintenance operation without rendering a global index unusable. Index maintenance is done asynchronously and can be delayed to a later point in time. Delaying the global index maintenance to off-peak times without impacting the index availability makes DROP and TRUNCATE partition and subpartition maintenance operations faster and less resource intensive at the point in time of the partition maintenance operation. In pre-Oracle 12c times, a DROP or TRUNCATE PARTITION statement made the index unusable or took much more time and resources. This was especially critical in OLTP environments where data was partitioned for performance or data management reasons.

 

The partition maintenance operations DROP PARTITION and TRUNCATE PARTITION are optimized by making the index maintenance for metadata only. Asynchronous global index maintenance for DROP and TRUNCATE is performed by default; however, the UPDATE INDEXES clause is still required for backward compatibility.

 

The following list summarizes the limitations of asynchronous global index maintenance:

  • Only performed on heap tables
  • No support for tables with object types
  • No support for tables with domain indexes
  • Not performed for the user SYS

 

Maintenance operations on indexes can be performed with the automatic scheduler job SYS.PMO_DEFERRED_GIDX_MAINT_JOB to clean up all global indexes. This job is scheduled to run at 2:00 A.M. on a daily basis by default. You can run this job at any time using DBMS_SCHEDULER.RUN_JOB if you want to proactively clean up the indexes. You can also modify the job to run with a different schedule based on your specific requirements. However, Oracle recommends that you do not drop the job.

 

You can also force cleanup of an index needing maintenance using one of the following options:

  • DBMS_PART.CLEANUP_GIDX - This PL/SQL procedure gathers the list of global indexes in the system that may require cleanup and runs the operations necessary to restore the indexes to a clean state.
  • ALTER INDEX REBUILD [PARTITION] – This SQL statement rebuilds the entire index or index partition as is done in releases previous to Oracle Database 12c Release 1 (12.1). The resulting index (partition) does not contain any stale entries.
  • ALTER INDEX [PARTITION] COALESCE CLEANUP – This SQL statement cleans up any orphaned entries in index blocks.
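For example, the cleanup could be triggered manually like this (a sketch only - it assumes sufficient privileges to run the SYS job, and TAB_PART_I is the demo index used later in this blog):

```sql
-- Run the nightly maintenance job right now instead of waiting for 2:00 A.M.
SQL> exec DBMS_SCHEDULER.RUN_JOB('SYS.PMO_DEFERRED_GIDX_MAINT_JOB');

-- Or clean up the orphaned entries of one specific global index only
SQL> ALTER INDEX TAB_PART_I COALESCE CLEANUP;
```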

 

The following demo shows the difference between Oracle 11g R2 (11.2.0.3.6) and Oracle 12c R1 (12.1.0.1) on OEL 6.4 (2.6.39-400.109.1.el6uek.x86_64).

 

-- Oracle 11g R2 & Oracle 12c R1 database structures
SQL> CREATE TABLE TAB_PART (MANDT VARCHAR2(3), TEXT VARCHAR2(40))
     PARTITION BY LIST (MANDT)
     (PARTITION MANDT_000 VALUES ('000'),
      PARTITION MANDT_066 VALUES ('066'),
      PARTITION MANDT_100 VALUES ('100'));

SQL> CREATE INDEX TAB_PART_I ON TAB_PART(MANDT);
SQL> insert into TAB_PART VALUES ('000', 'MAN 000');
SQL> insert into TAB_PART VALUES ('066', 'MAN 066');
SQL> insert into TAB_PART VALUES ('100', 'MAN 100');
SQL> commit;

 

Let's test the DML behavior on Oracle 11g R2 first (as a baseline test)

TEST@T11DB:15> select PARTITION_NAME, PARTITION_POSITION, HIGH_VALUE                                     from DBA_TAB_PARTITIONS where TABLE_NAME = 'TAB_PART';
PARTITION_NAME                           PARTITION_POSITION HIGH_VALUE
------------------------------ ------------------ ------------------------------
MANDT_000                                                  1 '000'
MANDT_066                                                  2 '066'
MANDT_100                                                  3 '100'

TEST@T11DB:15> select OWNER, STATUS, PARTITIONED from DBA_INDEXES                                     where INDEX_NAME = 'TAB_PART_I';
OWNER                         STATUS   PAR
-------------------- -------- ---
TEST                         VALID    NO

-- Let's assume that we want to get rid of the data partition for client 100
TEST@T11DB:15> alter table TAB_PART drop partition MANDT_100 UPDATE INDEXES;

TEST@T11DB:15> select PARTITION_NAME, PARTITION_POSITION, HIGH_VALUE                                     from DBA_TAB_PARTITIONS where TABLE_NAME = 'TAB_PART';
PARTITION_NAME                           PARTITION_POSITION HIGH_VALUE
------------------------------ ------------------ ------------------------------
MANDT_000                                                  1 '000'
MANDT_066                                                  2 '066'

TEST@T11DB:15> select OWNER, STATUS, PARTITIONED from DBA_INDEXES                                     where INDEX_NAME = 'TAB_PART_I';
OWNER                         STATUS   PAR
-------------------- -------- ---
TEST                         VALID    NO

 

The index was maintained, and the DML operation had to perform the "index clean up" work immediately, as no "delay option" like in Oracle 12c is available.

Let's check out what happens if we omit the "UPDATE INDEXES" clause (Use the update_index_clauses to update the indexes on the table as part of the table partitioning operation. When you perform DDL on a table partition, if an index is defined on the table, then Oracle Database invalidates the entire index, not just the partitions undergoing DDL. This clause lets you update the index partition you are changing during the DDL operation, eliminating the need to rebuild the index after the DDL.).

 

TEST@T11DB:15> alter table TAB_PART drop partition MANDT_066;

TEST@T11DB:15> select OWNER, STATUS, PARTITIONED from DBA_INDEXES                                     where INDEX_NAME = 'TAB_PART_I';
OWNER                         STATUS   PAR
-------------------- -------- ---
TEST                         UNUSABLE NO

 

The global index becomes unusable and is no longer valid for data access.

 

Now let's do the same procedures on Oracle 12c R1

TEST@T12DB:175> select PARTITION_NAME, PARTITION_POSITION, HIGH_VALUE                                       from DBA_TAB_PARTITIONS where TABLE_NAME = 'TAB_PART';
PARTITION_NAME               PARTITION_POSITION HIGH_VALUE
-------------------- ------------------ --------------------
MANDT_000                                    1 '000'
MANDT_066                                    2 '066'
MANDT_100                                    3 '100'

TEST@T12DB:175> select OWNER, STATUS, PARTITIONED, INDEXING, ORPHANED_ENTRIES                                       from DBA_INDEXES where INDEX_NAME = 'TAB_PART_I';
OWNER                         STATUS   PAR INDEXIN ORP
-------------------- -------- --- ------- ---
TEST                         VALID    NO  FULL            NO

-- Let's assume that we want to get rid of the data partition for client 100
TEST@T12DB:175> alter table TAB_PART drop partition MANDT_100 UPDATE INDEXES;

TEST@T12DB:175> select PARTITION_NAME, PARTITION_POSITION, HIGH_VALUE                                       from DBA_TAB_PARTITIONS where TABLE_NAME = 'TAB_PART';
PARTITION_NAME               PARTITION_POSITION HIGH_VALUE
-------------------- ------------------ --------------------
MANDT_000                                    1 '000'
MANDT_066                                    2 '066'

TEST@T12DB:175> select OWNER, STATUS, PARTITIONED, INDEXING, ORPHANED_ENTRIES                                       from DBA_INDEXES where INDEX_NAME = 'TAB_PART_I';
OWNER                         STATUS   PAR INDEXIN ORP
-------------------- -------- --- ------- ---
TEST                         VALID    NO  FULL            YES

 

Check the column ORPHANED_ENTRIES (= indicates whether a global index contains stale entries because of deferred index maintenance during DROP/TRUNCATE PARTITION, or MODIFY PARTITION INDEXING OFF operations). The index was not maintained, and the DML operation itself is pretty fast without making the index unusable.

 

-- Manual clean up of orphaned index entries
TEST@T12DB:175> exec DBMS_PART.CLEANUP_GIDX('TEST','TAB_PART');

TEST@T12DB:175> select OWNER, STATUS, PARTITIONED, INDEXING, ORPHANED_ENTRIES                                       from DBA_INDEXES where INDEX_NAME = 'TAB_PART_I';
OWNER                         STATUS   PAR INDEXIN ORP
-------------------- -------- --- ------- ---
TEST                         VALID    NO  FULL            NO

 

Let's check out what happens if we omit the "UPDATE INDEXES" clause (Use the update_index_clauses to update the indexes on table as part of the table partitioning operation. When you perform DDL on a table partition, if an index is defined on table, then Oracle Database invalidates the entire index, not just the partitions undergoing DDL. This clause lets you update the index partition you are changing during the DDL operation, eliminating the need to rebuild the index after the DDL.).

 

TEST@T12DB:175> alter table TAB_PART drop partition MANDT_066;

TEST@T12DB:175> select PARTITION_NAME, PARTITION_POSITION, HIGH_VALUE                                       from DBA_TAB_PARTITIONS where TABLE_NAME = 'TAB_PART';
PARTITION_NAME               PARTITION_POSITION HIGH_VALUE
-------------------- ------------------ --------------------
MANDT_000                                    1 '000'

TEST@T12DB:175> select OWNER, STATUS, PARTITIONED, INDEXING,  ORPHANED_ENTRIES                                       from DBA_INDEXES where INDEX_NAME = 'TAB_PART_I';
OWNER                         STATUS   PAR INDEXIN ORP
-------------------- -------- --- ------- ---
TEST                         UNUSABLE NO  FULL            NO

 

The global index becomes unusable and is no longer valid for data access (the same behavior as in older Oracle releases).

 

At last, here is the time and work difference of the primary DML operation (drop partition update indexes) between Oracle 11g R2 and Oracle 12c R1 on partition "MANDT_100" with 1,000,000 rows in it.

 

-- Oracle 11g R2
TEST@T11DB:15> alter system flush buffer_cache;
TEST@T11DB:15> alter table TAB_PART drop partition MANDT_100 UPDATE INDEXES;

Work_11gR2.png

-- Oracle 12c R1
TEST@T12DB:175> alter system flush buffer_cache;
TEST@T12DB:175> alter table TAB_PART drop partition MANDT_100 UPDATE INDEXES;

Work_12cR1.png

The main advantage of this feature is that it decouples the index maintenance task from the DML operation itself and makes such (primary) DML operations much faster without invalidating the (global) indexes.

 

1.2.3.5 Partial Indexes for Partitioned Tables

A partial index is an index that is correlated with the indexing properties of an associated partitioned table. The correlation enables you to specify which table partitions are indexed. You can turn indexing on or off for the individual partitions of a table. A partial local index does not have usable index partitions for all table partitions that have indexing turned off. A global index, whether partitioned or not, excludes the data from all partitions that have indexing turned off. The database does not support partial indexes for indexes that enforce unique constraints.

 

Restrictions on Partial Indexes:

  • The underlying table of a partial index cannot be a non-partitioned table.
  • Unique indexes cannot be partial indexes. This applies to indexes created with the CREATE UNIQUE INDEX statement and indexes that are implicitly created when you specify a unique constraint on one or more columns.

 

So maybe you wonder when this can be useful? Let's assume that you have partitioned your IDoc table EDIDC by the IDoc status (this is not practical or usable as a real-world implementation - it is just used for a simple illustration of the partial index feature). Nearly all of your queries (based on the status) select IDocs with status "02" (= error), and so you don't want to maintain a global or local index with all the other status values, for performance (DML) reasons or just to save disk space.

 

SYS@T12DB:9> CREATE TABLE TAB_PART (MANDT VARCHAR2(3), IDOC VARCHAR2(16), STATUS VARCHAR2(2))
             INDEXING OFF
             PARTITION BY LIST (STATUS)
             (PARTITION STATUS_ERR_02 VALUES ('02') INDEXING ON,
              PARTITION STATUS_OK_12 VALUES ('12'),
              PARTITION STATUS_OK_13 VALUES ('13') INDEXING OFF,
              PARTITION STATUS_OK_14 VALUES ('14'));

SYS@T12DB:9> insert into TAB_PART values ('100','0000000000000001', '02');
SYS@T12DB:9> insert into TAB_PART values ('100','0000000000000002', '12');
SYS@T12DB:9> insert into TAB_PART values ('100','0000000000000003', '13');
SYS@T12DB:9> insert into TAB_PART values ('100','0000000000000004', '14');
SYS@T12DB:9> commit;

SYS@T12DB:9> select PARTITION_NAME, HIGH_VALUE, PARTITION_POSITION, INDEXING                                from DBA_TAB_PARTITIONS                               where TABLE_NAME = 'TAB_PART';
PARTITION_NAME               HIGH_VALUE             PARTITION_POSITION INDE
-------------------- -------------------- ------------------ ----
STATUS_ERR_02               '02'                                           1 ON
STATUS_OK_12               '12'                                           2 OFF
STATUS_OK_13               '13'                                           3 OFF
STATUS_OK_14               '14'                                           4 OFF

SYS@T12DB:9> create index TAB_PART_STATUS_ALL on TAB_PART(STATUS);

SYS@T12DB:9> select INDEX_NAME, INDEXING from DBA_INDEXES                                where INDEX_NAME = 'TAB_PART_STATUS_ALL';
INDEX_NAME                           INDEXIN
------------------------------ -------
TAB_PART_STATUS_ALL                 FULL

 

In the example above I created a partitioned table called TAB_PART and set the default attribute to "INDEXING OFF" on table level, which means that indexing is not enabled by default for the table partitions. I explicitly set "INDEXING ON" for table partition STATUS_ERR_02, which overrides the table default and enables indexing on this partition. I also explicitly set "INDEXING OFF" for table partition STATUS_OK_13 (for demonstration purposes only), but this is not necessary at all due to the table-level default.

 

You can still create the "old-fashioned" full global index on the status column, even if the new "INDEXING" attribute is set on table or table partition level. But let's go on and check out the new partial indexing feature.

 

SYS@T12DB:9> drop index TAB_PART_STATUS_ALL;
SYS@T12DB:9> create index TAB_PART_STATUS_PART on TAB_PART(STATUS) indexing partial;

SYS@T12DB:9> select INDEX_NAME, INDEXING, PARTITIONED from DBA_INDEXES                                where INDEX_NAME = 'TAB_PART_STATUS_PART';
INDEX_NAME                           INDEXIN PAR
------------------------------ ------- ---
TAB_PART_STATUS_PART                 PARTIAL NO

 

Now only the data with status = '02' is globally indexed and maintained - all the other partitions (and the data) are not included in the index (cross check this with column INDEXING from the query on DBA_TAB_PARTITIONS). Here is just a short verification of that.

 

SYS@T12DB:9> select count(*) from TAB_PART where STATUS = '02';

Status_02.png

SYS@T12DB:9> select count(*) from TAB_PART where STATUS = '14';

Status_14.png

At last let's create a local partial index instead of a global partial index.

 

SYS@T12DB:9> drop index TAB_PART_STATUS_PART;
SYS@T12DB:9> create index TAB_PART_STATUS_PART on TAB_PART(STATUS) local indexing partial;

SYS@T12DB:9> select INDEX_NAME, INDEXING, PARTITIONED from DBA_INDEXES                                where INDEX_NAME = 'TAB_PART_STATUS_PART';
INDEX_NAME                           INDEXIN PAR
------------------------------ ------- ---
TAB_PART_STATUS_PART                 PARTIAL YES

SYS@T12DB:9> select INDEX_NAME, PARTITION_NAME, STATUS from DBA_IND_PARTITIONS                                where INDEX_NAME = 'TAB_PART_STATUS_PART';
INDEX_NAME                           PARTITION_NAME              STATUS
------------------------------ -------------------- --------
TAB_PART_STATUS_PART                 STATUS_OK_14              UNUSABLE
TAB_PART_STATUS_PART                 STATUS_OK_13              UNUSABLE
TAB_PART_STATUS_PART                 STATUS_OK_12              UNUSABLE
TAB_PART_STATUS_PART                 STATUS_ERR_02              USABLE

 

As you can see, the index partition STATUS_ERR_02 is the only one that is usable - the other, unusable partitions have no corresponding database segment and are not maintained at all.

 

You can also create a concatenated partial index like "(STATUS, IDOC)", which can make more sense from an application point of view, but this test case should only provide a basic demo of the partial indexing feature.

 

1.2.3.6 Partition Maintenance Operations on Multiple Partitions

Did you ever need to re-partition some of your SAP BI fact tables after a while, when the "MAXVALUE" partition already contained data? I guess you already know the pretty well-known issues and procedures (like "SAPnote #895539 - Appending/extending partitions to an E fact table" or "SAPnote #1016184 - ORA repartitioning: Attaching partitions by re-partitioning"). Starting with Oracle 12c you can split up one partition (like the MAXVALUE one) into several different ones without a "step by step" or "additional logic" approach - just one SQL command for splitting up the data partition into several ones.

 

Let's assume you have initially created 3 partitions for your E-Fact table based on a monthly partitioning interval and have already inserted data for 5 months now.

SYS@T12DB:182> CREATE TABLE TPART_BI
(MANDT VARCHAR2(3), DATEVALUE VARCHAR2(8), TEXT VARCHAR2(20))
PARTITION BY RANGE (DATEVALUE)
(
PARTITION TPART_BI_RANGE_201301 VALUES LESS THAN ('20130201') 
TABLESPACE USERS,
PARTITION TPART_BI_RANGE_201302 VALUES LESS THAN ('20130301') 
TABLESPACE USERS,
PARTITION TPART_BI_RANGE_MAXVALUE VALUES LESS THAN (MAXVALUE) 
TABLESPACE USERS
) 
TABLESPACE USERS;

SYS@T12DB:182> INSERT INTO TPART_BI VALUES ('100','20130115', 'January');
SYS@T12DB:182> INSERT INTO TPART_BI VALUES ('100','20130215', 'February');
SYS@T12DB:182> INSERT INTO TPART_BI VALUES ('100','20130315', 'March');
SYS@T12DB:182> INSERT INTO TPART_BI VALUES ('100','20130415', 'April');
SYS@T12DB:182> INSERT INTO TPART_BI VALUES ('100','20130515', 'May');
SYS@T12DB:182> commit;

SYS@T12DB:182> select PARTITIONING_TYPE, PARTITION_COUNT                                     from DBA_PART_TABLES where TABLE_NAME = 'TPART_BI';
PARTITION PARTITION_COUNT
--------- ---------------
RANGE                              3

SYS@T12DB:182> select PARTITION_POSITION, PARTITION_NAME, HIGH_VALUE                                    from DBA_TAB_PARTITIONS where TABLE_NAME = 'TPART_BI';
PARTITION_POSITION PARTITION_NAME               HIGH_VALUE
------------------ ------------------------- ---------------
                 1 TPART_BI_RANGE_201301     '20130201'
                 2 TPART_BI_RANGE_201302     '20130301'
                 3 TPART_BI_RANGE_MAXVALUE   MAXVALUE

 

Now we want to reconstruct our previous partitioning logic (one partition per month). This can now be done easily with one SQL command, splitting up the MAXVALUE partition into several ones without the additional logic (as described in SAPnote #1016184) needed before.

 

SYS@T12DB:182> ALTER TABLE TPART_BI SPLIT PARTITION TPART_BI_RANGE_MAXVALUE INTO
( PARTITION TPART_BI_RANGE_201303 VALUES LESS THAN ('20130401'),
  PARTITION TPART_BI_RANGE_201304 VALUES LESS THAN ('20130501'),
  PARTITION TPART_BI_RANGE_MAXVALUE
);

SYS@T12DB:182> select PARTITIONING_TYPE, PARTITION_COUNT                                     from DBA_PART_TABLES where TABLE_NAME = 'TPART_BI';
PARTITION PARTITION_COUNT
--------- ---------------
RANGE                              5

SYS@T12DB:182> select PARTITION_POSITION, PARTITION_NAME, HIGH_VALUE                                    from DBA_TAB_PARTITIONS where TABLE_NAME = 'TPART_BI';
PARTITION_POSITION PARTITION_NAME            HIGH_VALUE
------------------ ------------------------- ---------------
                 1 TPART_BI_RANGE_201301     '20130201'
                 2 TPART_BI_RANGE_201302     '20130301'
                 3 TPART_BI_RANGE_201303     '20130401'
                 4 TPART_BI_RANGE_201304     '20130501'
                 5 TPART_BI_RANGE_MAXVALUE   MAXVALUE

 

Re-partitioning finished with one SQL command and without any additional logic for multiple partitions.

 

1.2.4 Performance With Zero Effort (Full)

... stay tuned - more to come

 

1.3.3 Information Lifecycle Management (Full)

... stay tuned - more to come

 

1.4.1 Database Consolidation (Full)

1.4.1.1 Integrate With Operating System Processor Groups

Some of my clients are running a highly consolidated Oracle database infrastructure (like on VMware, Solaris containers or IBM pSeries). One of the challenges is to "isolate" each database instance as far as possible, so that a "going crazy" database cannot harm the other ones. Mostly this is done by (OS) resource managers like WLM or Solaris resource management. Now you can limit and bind a database instance to a corresponding CPU pool set on Linux or Solaris very easily by using OS (kernel) features, and it is fully integrated.

 

This feature allows the DBA to specify a parameter, PROCESSOR_GROUP_NAME, to bind the database instance to a named subset of the CPUs on the server. On Linux, the named subset of CPUs can be created using a Linux feature called control groups (cgroups). On Solaris, the named subset of CPUs can be created using a Solaris feature called resource pools.

 

The following demo is run on an Oracle Enterprise Linux 6.4 (2.6.39-400.109.1.el6uek.x86_64) with cgroups and 3 vCPUs (Intel Core i7-2675QM CPU @ 2.20GHz) on Oracle Virtual Box.

 

--- The following PL/SQL is run in 3 different sessions / dedicated server processes to use up all vCPUs
SYS@T12DB:182> declare
       x number := 0;
     begin
       loop
         x := x + 1;
       end loop;
     end;
/

 

The result of this "CPU burning" PL/SQL procedure is represented in the following graphic (without setting init parameter PROCESSOR_GROUP_NAME).

vCPU_default.png

Now let's create and assign a cgroup (cpu set) to the Oracle instance. Check the previously mentioned PDF or the kernel documentation for details about cgroups and how to configure them.

 

-- My custom defined cgroup called oracle is allowed to use only 1 vCPU
shell> cat /cgroup/oracle/cpuset.cpus 
1
shell> cat /cgroup/oracle/cpuset.mems
0
-- Both parameters are important for dynamically adding the Oracle PIDs to /cgroup/oracle/tasks at startup
-- Otherwise you will get an unnoticed error like this (traced with strace by Oracle instance startup):
-- open("/cgroup/oracle/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 25
-- write(25, "2260\n", 5)                  = -1 ENOSPC (No space left on device)


SQL> alter system set PROCESSOR_GROUP_NAME = 'oracle' scope=spfile sid='*';
SQL> startup force;

shell> ps -fu oracle
...
oracle    2868     1  2 14:21 ?        00:00:00 oracleT12DB (LOCAL=NO)
...
shell> cat /proc/2868/cpuset
/oracle
-- The dedicated Oracle shadow processes are running in the corresponding croup

 

The result of the previous "CPU burning" PL/SQL procedure is represented in the following graphic (with setting init parameter PROCESSOR_GROUP_NAME to "oracle").

vCPU_1.png

The Oracle database/instance T12DB is restricted to and using only 1 vCPU (even if the PL/SQL procedure is run in multiple sessions like before). But what if we want to adjust these limits on-the-fly? No problem at all, because cgroups are dynamic. Let's verify this at last.

 

shell> echo 0-1 >  /cgroup/oracle/cpuset.cpus

vCPU_2.png

Works as designed - a really cool and dynamic feature for (highly) consolidated Oracle database environments.

 

...stay tuned - more to come for the other sub topics

 

1.4.3.1 Cloning a Database

... stay tuned - more to come

 

1.5.4.1 Oracle ASM Disk Scrubbing

... stay tuned - more to come

 

1.5.5.3 Invisible Columns

With Oracle 12c you can make individual table columns invisible. Any generic access of a table does not show the invisible columns in the table.

For example, the following operations do not display invisible columns in the output:

  • SELECT * FROM statements in SQL
  • DESCRIBE commands in SQL*Plus
  • %ROWTYPE attribute declarations in PL/SQL
  • Describes in Oracle Call Interface (OCI)

 

You can use a SELECT statement to display output for an invisible column only if you explicitly specify the invisible column in the column list. Similarly, you can insert a value into an invisible column only if you explicitly specify the invisible column in the column list of the INSERT statement. If you omit the column list in the INSERT statement, then the statement can only insert values into visible columns. You can make a column invisible during table creation or when you add a column to a table, and you can later alter the table to make the same column visible. You can also alter a table to make a visible column invisible. Virtual columns can be invisible. Also, you can use an invisible column as a partitioning key during table creation. (The last point can enhance the custom partitioning of tables in an SAP environment drastically.)
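As a sketch, the ALTER TABLE variants described above look like this (TAB2 and COL2 are hypothetical names, not objects from the demos below):

```sql
-- Hypothetical names; toggling column visibility after table creation
ALTER TABLE TAB2 ADD (COL2 NUMBER INVISIBLE);  -- add a new column as invisible
ALTER TABLE TAB2 MODIFY (COL2 VISIBLE);        -- make it visible (it is then placed at the end of the column order)
ALTER TABLE TAB2 MODIFY (COL2 INVISIBLE);      -- hide it again
```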

 

The following restrictions apply to invisible columns:

  • The following types of tables cannot have invisible columns:
    • External tables
    • Cluster tables
    • Temporary tables
  • Attributes of user-defined types cannot be invisible.

 

Let's do a short simple demo of such invisible columns with some SAP reference (especially for the new custom partitioning possibility based on virtual and invisible columns).

 

SYS@T12DB:184> create table TAB1 (JINUM VARCHAR2(10),
                                  TEXT VARCHAR2(40),
                                  JINUM2 NUMBER INVISIBLE);

SYS@T12DB:184> desc TAB1;
 Name            Null?    Type
 -------- -------- ----------------
 JINUM                       VARCHAR2(10)
 TEXT                       VARCHAR2(40)

SYS@T12DB:184> INSERT INTO TAB1 VALUES ('0000000010', 'TEST', 10);
ORA-00913: too many values

SYS@T12DB:184> INSERT INTO TAB1 VALUES ('0000000010', 'TEST');
1 row created.

SYS@T12DB:184> select * from TAB1;
JINUM             TEXT
---------- ----------------------------------------
0000000010 TEST

SYS@T12DB:184> select JINUM, TEXT, JINUM2 from TAB1;
JINUM             TEXT                                                   JINUM2
---------- ---------------------------------------- ----------
0000000010 TEST

SYS@T12DB:184> INSERT INTO TAB1(JINUM,TEXT,JINUM2) VALUES ('0000000010', 'TEST', 10);
1 row created.

SYS@T12DB:184> select * from TAB1;
JINUM             TEXT
---------- ----------------------------------------
0000000010 TEST
0000000010 TEST

SYS@T12DB:184> select JINUM, TEXT, JINUM2 from TAB1;
JINUM             TEXT                                                   JINUM2
---------- ---------------------------------------- ----------
0000000010 TEST
0000000010 TEST                                                       10

 

So you can see that an invisible column is completely hidden from the application layer (except when it is explicitly referenced in SQL statements).

 

I have previously mentioned something about enhancing custom partitioning in SAP environments. SAP has written its own partitioning framework called "SAP Partitioning Engine for Oracle", because some Oracle features like interval partitioning were not usable due to the data types used in an SAP environment (like VARCHAR2 for numeric-only values). The other partitioning options need some manual maintenance to keep the partitions current and fitting, so SAP needed to implement an application-based solution.

 

However, by using invisible columns (in combination with virtual columns) you can remove such interval partitioning restrictions without having to think about already existing SQL statements or harming data integrity. Here is just a tiny example of this (you can adapt it to your own business cases, for example on tables like JITIT or BSIS).

 

SYS@T12DB:184> CREATE TABLE TAB1_PART (JINUM VARCHAR2(10),
                                       TEXT VARCHAR2(40),
                                       JINUM2 NUMBER INVISIBLE AS (TO_NUMBER(JINUM)) VIRTUAL)
               PARTITION BY RANGE(JINUM2)
               INTERVAL (5000)
               ( PARTITION P5000 VALUES LESS THAN (5000),
                 PARTITION P10000 VALUES LESS THAN (10000),
                 PARTITION P15000 VALUES LESS THAN (15000)
               );

SYS@T12DB:184> desc TAB1_PART;
 Name            Null?    Type
 -------- -------- ----------------
 JINUM                       VARCHAR2(10)
 TEXT                       VARCHAR2(40)

SYS@T12DB:184> select TABLE_NAME, PARTITION_POSITION, PARTITION_NAME, HIGH_VALUE
               from DBA_TAB_PARTITIONS where TABLE_NAME='TAB1_PART';
TABLE_NAME           PARTITION_POSITION PARTITION_NAME       HIGH_VALUE
-------------------- ------------------ -------------------- --------------------
TAB1_PART                             1 P5000                5000
TAB1_PART                             2 P10000               10000
TAB1_PART                             3 P15000               15000

SYS@T12DB:184> INSERT INTO TAB1_PART VALUES ('0000000010', 'TEST');
SYS@T12DB:184> INSERT INTO TAB1_PART VALUES ('0000006000', 'TEST');
SYS@T12DB:184> INSERT INTO TAB1_PART VALUES ('0000012000', 'TEST');

SYS@T12DB:184> select * from TAB1_PART where JINUM = '0000000010';

PART_CBO.png

The optimizer is not clever enough to rewrite the query at partition level in such cases (partition pruning works only on predicates against column JINUM2), so there is no performance improvement here. However, you can manage your large data at partition level much better (the same goal served by the table-limited SAP Partitioning Engine).
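Pruning does kick in when the predicate is written directly against the virtual column - a quick sketch against the demo table above (verify it via the execution plan, which should show single-partition access):

```sql
-- Filtering on the invisible virtual column JINUM2 allows partition pruning
SELECT * FROM TAB1_PART WHERE JINUM2 = 10;
```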

 

At last let's add data, that does not fit in the currently created partitions.

 

SYS@T12DB:184> INSERT INTO TAB1_PART VALUES ('0000016000', 'TEST');

SYS@T12DB:184> select TABLE_NAME, PARTITION_POSITION, PARTITION_NAME, HIGH_VALUE
               from DBA_TAB_PARTITIONS where TABLE_NAME='TAB1_PART';
TABLE_NAME           PARTITION_POSITION PARTITION_NAME       HIGH_VALUE
-------------------- ------------------ -------------------- --------------------
TAB1_PART                             1 P5000                5000
TAB1_PART                             2 P10000               10000
TAB1_PART                             3 P15000               15000
TAB1_PART                             4 SYS_P461             20000

 

Partition management works perfectly, and by using virtual invisible columns in such "hidden partitioning scenarios" the application (SQL) does not need to be rewritten. The possible performance impact of such an implementation needs to be tested first, of course, but you now have a valid option in SAP environments.

 

1.5.5.5 Metadata-Only DEFAULT Column Values for NULL Columns

... stay tuned - more to come

 

1.5.5.6 Move a Data File Online

Did you ever need to migrate a large Oracle database from an old storage subsystem to a new one (or to ASM) with a nearly zero downtime requirement, while the storage infrastructure was not virtualized so it could not be moved on-the-fly? Did you ever need to redesign the database file system layout on the OS with a nearly zero downtime requirement? I guess every Oracle DBA has had such requirements in the past and usually used Oracle Data Guard for those tasks. Starting with Oracle 12c it is getting much easier to move your database around with nearly zero downtime.

 

A data file can now be moved online while it is open and being accessed. Being able to move a data file online means that many maintenance operations, such as moving data to another storage device or moving databases into Oracle Automatic Storage Management (Oracle ASM), can be performed while users are accessing the system.

 

By default, when you run the ALTER DATABASE MOVE DATAFILE statement and specify a new location for a data file, the statement moves the data file. However, you can specify the KEEP option to retain the data file in the old location and copy it to the new location. In this case, the database only uses the data file in the new location when the statement completes successfully.

 

When you rename or relocate a data file with the ALTER DATABASE MOVE DATAFILE statement, Oracle Database creates a copy of the data file while it is performing the operation. Ensure that there is adequate disk space for the original data file and the copy during the operation.
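The KEEP variant mentioned above can be sketched like this (the target file name is an assumption, following the naming scheme in the demo below):

```sql
-- Copy the data file to the new location but keep the original file on disk;
-- the database switches to the copy only when the statement completes successfully
ALTER DATABASE MOVE DATAFILE '/oracle/T12DB/oradata/users01.dbf'
  TO '/oracle/T12DB/oradata/users03.dbf' KEEP;
```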


SYS@T12DB:18> select NAME, STATUS, ENABLED from V$DATAFILE where TS# = 4;
NAME                                                   STATUS  ENABLED
---------------------------------------- ------- ----------
/oracle/T12DB/oradata/users01.dbf           ONLINE  READ WRITE

SYS@T12DB:18> ALTER DATABASE MOVE DATAFILE '/oracle/T12DB/oradata/users01.dbf'
              TO '/oracle/T12DB/oradata/users02.dbf';

SYS@T12DB:18> select NAME, STATUS, ENABLED from V$DATAFILE where TS# = 4;
NAME                                                   STATUS  ENABLED
---------------------------------------- ------- ----------
/oracle/T12DB/oradata/users02.dbf           ONLINE  READ WRITE


shell> ls -la /oracle/T12DB/oradata/user*
-rw-r----- 1 oracle dba 5251072 Jul  4 18:08 /oracle/T12DB/oradata/users02.dbf

 

1.5.5.9 Single Command REDEF_TABLE to Redefine Table or Partition

... stay tuned - more to come

 

1.5.6.1 Advanced Data Guard Broker Manageability

... stay tuned - more to come

 

1.5.6.4 Single Command Role Transitions

... stay tuned - more to come

 

1.5.6.11 Active Data Guard Far Sync

... stay tuned - more to come

 

1.5.9.2 Cross-Platform Backup and Restore

... stay tuned - more to come

 

1.5.9.6 Network-Enabled RESTORE

... stay tuned - more to come

 

1.6.1.3 Real-Time Database Operations Monitoring

... stay tuned - more to come

 

1.7.1.1 Oracle Flex ASM

... stay tuned - more to come

 

1.8.1 Database Performance Enhancements (Full)

... stay tuned - more to come

 

1.8.2.2 Tracking I/O Outliers

Starting with version 12c, Oracle automatically tracks long-running I/O requests and populates them in 3 different views (V$IO_OUTLIER, V$KERNEL_IO_OUTLIER, V$LGWRIO_OUTLIER). Long-running I/O requests are defined as requests that take more than 500 ms. So this is usually not suitable for investigating I/O variability in the low-millisecond area, but it will help a lot when troubleshooting serious I/O issues. You still have to use tools like dtrace/strace/truss or do a SQL trace with wait event analysis if you need to track down "low latency" I/O variability. You already got some basic information about such I/O outliers in the LGWR trace files (for example) in pre-Oracle 12c times, but it included the timestamp and I/O duration only.

 

One of these three views (V$KERNEL_IO_OUTLIER) is currently useful on Solaris only - the details are explored in the following demo.

 

The following demo was performed on Solaris 11.1 (x86_64) on Oracle Virtual Box, with the database stored in a ZFS pool / file system. I used Solaris for this test so that I am able to demonstrate all of these views. Unfortunately, the view V$KERNEL_IO_OUTLIER was not working as expected and populated no values at all (not even on Solaris). I stressed my I/O "sub system" with an RMAN VALIDATE run to delay the usual database I/Os accordingly.

 

shell> uname -a 
SunOS SOL 5.11 11.1 i86pc i386 i86pc

shell> rman target /
RMAN> VALIDATE DATABASE;

SYS@S12DB:23> select * from V$IO_OUTLIER;

IO_OUTLIER.png

SYS@S12DB:23> select * from V$LGWRIO_OUTLIER;

LGWRIO_OUTLIER.png

SYS@S12DB:23> select * from V$KERNEL_IO_OUTLIER;
no rows selected

 

But wait - what's wrong here? Why is the view V$KERNEL_IO_OUTLIER empty and not populated on Solaris as described? It seems reasonable that this feature is based on DTrace probes, as DTrace is currently available on Solaris only (hopefully it will be ported to OEL as well). So let's verify DTrace for the oracle (instance/database) OS user.

 

-- Just for documentation: this is how the oracle user was created in the past
shell> usermod -K defaultpriv=basic,dtrace_proc,dtrace_user,dtrace_kernel oracle

-- DTrace test
shell> id -a
uid=100(oracle) gid=100(dba) groups=100(dba),101(oper)

shell> /usr/sbin/dtrace -n 'syscall::exece:return'
dtrace: description 'syscall::exece:return' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    113                     exece:return

 

This looks good, as DTrace is working in general for the oracle user. I have started a discussion about this behavior in the comment section of a blog post called "12c I/O debug" by Jonathan Lewis - hopefully we will get some interesting insights into this in the near future.

 

1.9.2.6 Last Login Time Information

In past years I sometimes needed to know when a user was created, a password was changed, a user was locked, or when the user last logged on. The information for the first three demands was already available in previous Oracle releases (columns CTIME, PTIME and LTIME in table USER$), but for the fourth one you needed to create a logon trigger or enable auditing on such actions. Such changes can be very time-consuming in some environments (ITIL, etc.), and even if you are able to implement them immediately, you have no data from before the change. Starting with Oracle 12c such tracking is built-in and can be evaluated at any time.

 

This new information is stored in column SPARE6 in table USER$. Be aware of the time zone - something like "alter session set TIME_ZONE='+02:00'" is done by SQL*Plus on-the-fly at logon.

 

The last login time for database users is recorded in the USER$ table and displayed when connecting to the database using Oracle SQL*Plus.

The last login time for non-SYS users is displayed when you log on via SQL*Plus. This feature is on by default.

 

SYS@T12DB:24> create user TESTME identified by TESTME DEFAULT TABLESPACE USERS;
SYS@T12DB:24> grant connect to TESTME;

SYS@T12DB:24> select CTIME, PTIME, LTIME, SPARE6 from USER$ where NAME = 'TESTME';
CTIME                        PTIME                    LTIME                        SPARE6
------------------- ------------------- ------------------- -------------------
04.07.2013 16:55:39 04.07.2013 16:55:39                         04.07.2013 15:23:00


shell> sqlplus TESTME/TESTME@T12DB…
Last Successful login time: Thu Jul 04 2013 17:23:00 +02:00…

 

 

Summary

If you have any further questions, please feel free to ask, or get in contact directly if you need assistance with implementing complex Oracle database landscapes or new Oracle database releases, or with troubleshooting Oracle (performance) issues.

After the upgrade of the Oracle DB from 10G to 11G, SAP WAS ABAP cannot start


Hello, this is my first contribution in this group. It is quite simple, but it had complicated matters for several hours for a colleague in Basis. After the end of the upgrade from Oracle 10G to 11G it was impossible to start the SAP system. The first check to perform, which sheds light on the error, is to execute R3trans -d, which showed the following:

 

image001.png

 

The error ORA-27101 indicates that the database was down, but it really was up.

 


After reviewing the system I found the cause of the problem: the listener still maintained the old Oracle 10G engine settings (ORACLE_HOME):

 

Dibujo.JPG

 

The problem was solved by changing the listener configuration and restarting the listener:
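As a sketch, the fix boils down to correcting the ORACLE_HOME entry in listener.ora and restarting the listener. The SID and path below are placeholders following the common SAP layout, not the values from the screenshots:

```shell
# In $ORACLE_HOME/network/admin/listener.ora, the SID_DESC entry must point
# at the new 11G home, e.g. (placeholders, adapt to your system):
#   (SID_DESC =
#     (SID_NAME = <SID>)
#     (ORACLE_HOME = /oracle/<SID>/112_64)   # <- was still the old 10G home
#   )
lsnrctl stop
lsnrctl start
lsnrctl status   # verify the database service registers again
```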

 

Dibujo2.JPG

 

Then R3trans -d returned OK and SAP started without problems.

 

Dibujo3.JPG
