LDI Docs – 4 Locking and Performance

4. Locking and Performance

4.1 Locking used by Lucene Domain Index

Operation Base Table (row/table) Index Table (SCHEMA.IDX$T) Queue Table (SCHEMA.IDX$QT)
Insert X/RX (1) NONE NONE
Update X/RX NONE NONE
Delete X/RX NONE NONE
Manually Sync X/RS (2) X/T|X/RX (3) DBMS_AQ.BLOCKED (4)
Automatically Sync X/RS (2) X/T|X/RX (3) DBMS_AQ.BLOCKED (4)
Optimize NONE X/T|X/RX (3) NONE
  1. X = Row exclusive lock at the row being inserted, RX = Table row exclusive lock – if index parameter LockMasterTable=true.
  2. X = Row exclusive lock at the row being indexed, RS = Table row share lock. A select … for update no wait is performed at all rows being added to Lucene Index.
  3. X/T this is a writer lock semaphore of Lucene Index and provide serialize write operations, the write lock is performed using lock table $SCHEMA.IDX$T. X/RX is performed at many rows of this table because Lucene creates and deletes many files.
  4. To perform massive dequeue operations at DBMS AQ queue Sync scan this queue with DBMS_AQ.BLOCKED option.
    1. 4.2 Performance tips

      4.2.1 Index Writer parameters

      Lucene Index Writer class uses several parameters to control his index structure. Lucene Domain Index pass to Index Writer several parameters such as MergeFactor, MaxBufferedDocs among others. As best practice if you want to index thousands of rows you can override default Lucene parameters for other which speed up indexing time. With create index or alter index rebuild you can set MergeFactor to 100 and MaxBufferedDocs to 4000. This parameters increase index performance but then DML operations at the base table will batch small set of rows, so after DDL commands change MergeFactor to 2 and MaxBufferedDocs to 100. A good place to start knowing these parameters behavior is the Wiki page Improving Indexing Speed.

      4.2.2 Auto Tune Memory functionality

      Lucene Domain Index have a parameter called AutoTuneMemory a true value means that for Index Writer operations it will try to use up to 50% of the Java Pool Size configured at the Oracle SGA to adjust how many documents are buffered (MaxBufferedDocs) before call IndexWritter.flush().
      With AutoTuneMemory:true MergeFactor,MaxBufferedDocs,MaxMergeDocs are not required, its calculated using free RAM at the SGA, but you has to set MergeFactor. Due Java Pool Size is global parameter the rule is not valid if you want to create many index with parallel connexions, two connections will try to use 50% of the SGA, so one of them will ran out of memory.

      4.2.3 Keep Index on RAM

      OJVMDirectory replaces Lucene file system storage by a table storage with BLOBs. For every Lucene Domain Index created there is a new table which stores every Lucene file as a row with a BLOB column, see section 6 for more detail, using similar strategy as Oracle Text you can keep this table in RAM. Unlike Oracle Text which uses multiples tables for storing the inverted index, Lucene Domain Index use one table, execute this DDL command to keep Lucene Index on RAM:

      create index source_small_lidx on test_source_small(text)
      indextype is lucene.LuceneIndex parameters(
        'FormatCols:line(0000); ExtraCols:line "line"; Analyzer:org.apache.lucene.analysis.StopAnalyzer; MergeFactor:500');
      alter index source_small_lidx parameters('MergeFactor:100');
      alter table source_small_lidx$t storage (buffer_pool keep) modify lob (data) (storage (buffer_pool keep));
      

      During Index creation use AutoTuneMemory:true (default value). Finally change OJVMDirectory storage table and LOB to keep them in RAM. Be sure that your SGA has a enough RAM to keep it. To know how big your index you can query the table:

      SQL> select sum(file_size) from source_small_lidx$t where deleted='N';
      SUM(FILE_SIZE)
      --------------
              147444
      

      Finally as Tom Kyte say, tkprof, tkprof, …. ;) . You can see Lucene Domain Index IO operations with an “alter session set events ’10046 trace name context forever, level 12′; then you can find operations at Lucene Domain Index table SCHEMA.IDX_NAME$T. Using TKPROF information you can alter table and lob storage parameters manually.

      4.2.4 Compare your execution plan

      To be sure that your Lucene Domain Index is properly used compare your executions plans and try to avoid non necessary filter by or sort order by predicates by using in-line sort or multiples field Query Parser conditions. Here examples of sorting using emails table created in section 3.1.4:

      SQL> explain plan for
        2  SELECT subject FROM emails where lcontains(bodytext,'security',1)>0
        3  order by subject ASC;
      

      PLAN_TABLE_OUTPUT
      -----------------------------------------------------------------------------------
      Plan hash value: 1542204867
      Id   Operation                    Name           Rows  Bytes  Cost (%CPU)  Time
      0    SELECT STATEMENT                            1     4016   3 (34)       00:00:01
      1    SORT ORDER BY                               1     4016   3 (34)       00:00:01
      2    TABLE ACCESS BY INDEX ROWID  EMAILS         1     4016   2  (0)       00:00:01
      * 3  DOMAIN INDEX                 EMAILBODYTEXT
      
      Predicate Information (identified by operation id):
      -----------------------------------------------------------------------------------
      3 - access("LUCENE"."LCONTAINS"("BODYTEXT",'security',1)>0)
      

      Above execution plan tells that you are using Lucene Domain Index but you can get a better optimizer plan by using lcontains sort:

      SQL> explain plan for
        2  SELECT /*+ DOMAIN_INDEX_SORT */ subject FROM emails
        3  where lcontains(bodytext,'security','subject:ASC',1)>0;
      

      PLAN_TABLE_OUTPUT
      -----------------------------------------------------------------------------------
      Plan hash value: 1450245214
      Id   Operation                    Name           Rows  Bytes  Cost (%CPU)  Time
      0    SELECT STATEMENT                            1     4016   2 (0)        00:00:01
      1    TABLE ACCESS BY INDEX ROWID  EMAILS         1     4016   2 (0)        00:00:01
      * 2  DOMAIN INDEX                 EMAILBODYTEXT
      
      Predicate Information (identified by operation id):
      -----------------------------------------------------------------------------------
      2 - access("LUCENE"."LCONTAINS"("BODYTEXT",'security','subject:ASC',1)>0)
      

      Here we have a better optimizer plan and lower cost.

      4.2.5 Filtering and sorting at index level

      This functionality only available on Oracle 11g is valid for Lucene Domain Index in 10g/11g databases and also for standard edition version. The performance improvement is done when most of the rows can be filtered and sorted at index level, to do that you have to push the value of the column(s) involved in filter by or order by at index level during index creation. The syntax differs from Oracle Text but the performance improve is similar, let see an example:

      -- Oracle Text 11g syntax
      create index source_big_idx on test_source_big(text) indextype is ctxsys.context
        filter by line
        order by line;
      -- Lucene Domain Index syntax
      create index source_big_lidx on test_source_big(text) indextype is lucene.luceneindex parameters(
        'PerFieldAnalyzer:line(org.apache.lucene.analysis.KeywordAnalyzer),TEXT(org.apache.lucene.analysis.SimpleAnalyzer);
        FormatCols:line(0000);
        ExtraCols:line "line"');
      

      Note that in both cases we choose line as the filter/order by column. Now let see the execution plan and auto trace for an equivalent query using 11g syntax and Lucene Domain Index syntax.

      To see the real impact on the performance of using a filter by at index level let see the time involved of an equivalent query:

      select count(line) from test_source_big
        where lcontains(text,'varchar2 AND line:[2600 TO 9000]')>0;
        2
      COUNT(LINE)
      -----------
      587
      Elapsed: 00:00:00.03
      
      select count(line) from test_source_big
        where lcontains(text,'varchar2')>0 and line>=2600;
        2
      COUNT(LINE)
      -----------
      587
      Elapsed: 00:00:00.89
      

      The point here is that into the first example Lucene Domain Index performs the two operations:

      1. find all the rows which contains the word varchar2
      2. filter the rows that only have line in a range 2600 to 9000

      returning only the rows (587) that match both sentence, for the second example the RDBMS:

      1. look for the rowid that contains the word varchar2 (19963),
      2. visit above rows looking for the value of the column line and filter all that are >=2600

      the difference between the rows visited by the RDBMS is the difference on the performance.

      4.2.6 OnLine mode, ParallelDegree and IndexOnRam

      Starting with 2.9.2.1.1 and 3.0.1.1.0 version inserts are performed in parallel if ParallelDegree is greater than 1 and SyncMode:OnLine, in that case an AQ slave process will create temporary Lucene index adding the rows being indexed, this index is created in RAM if IndexOnRam:true or in disk otherwise, once the index contains the batch of rows added the temporary slave index is merged with the master storage. This speed up massive index additions such as index rebuild, index creation or insert .. into .. select .. from DML operations. Parallel index operations are important in servers which have multiples cores or RAC installations because Oracle AQ starts parallel process doing the job, an SQL trace of WikiPedia dump indexing shows that mostly of time is involved in the scan of table loading the data which is indexed, so using parallel indexing increase the throughput of Lucene Domain Index in multi-core chips.
      Oracle 11g AQ implementation checks how many milliseconds are consumed by an AQ Callback, so choosing a BatchCount too bigger causes that no other slave process is started by Oracle AQ engine, the experience with BatchCount values in a range starting with 100 to 500 is good value to guaranty a correct parallel operation. Following screenshot shows multiple AQ process indexing WikiPedia dump:

      4.2.7 Parallel Shared Slave Index Scan (available since 3.0.2.1.0)

      Starting with 3.0.2.1.0 version a parallel shared slave index scan process is started automatically when your DB start, this process is accepting RMI connections from the other Oracle internal process, that is, once a connection from a client is accepted by the RDBMS a dedicated or shared server is started to performs the SQL operations, this process have the internal OJVM associated to execute the LDI operations. Previous to 3.0.2.1.0 each OJVM have his internal Lucene structures to query the inverted index, due each OJVM is isolated from another if two concurrent connections executes the same SQL operations on LDI each process will load the inverted index structure on RAM and performs the Hit collector operation. The new process now do the same operations but is shared by all the OJVM process which connect to him using RMI, the consequence is that only the first query will load the inverted index structure on RAM, next queries coming from the same or different OJVM process will re-use these structures on RAM. This architectural change increase a lot the Cache Hit rate decreasing the time to performs lcontains operations and reducing the RAM usage by LDI. The screenshot below shows this concept in action:

      There are several process named oracletest (LOCAL=NO) these are dedicated process associated to each client connections, there is one process named ora_j009_test, this process is the slave shared server which is performing the search operations on Lucene Index on behalf of the others. The parallel slave search process is started and stopped automatically by two instance triggers registered at SYS schema, these triggers are after startup and before shutdown events. If you disable these triggers Lucene Domain Index back to previous functionality which means every OJVM process has his own memory structures and do the index scan without dispatching RMI calls.

      4.3 Know caveats

      1. Lucene Domain Index uses Java Util Logging API and RMI to connect to the search process, it means that a grant is required to create and operate on LDI, for example:
        grant LUCENEUSER to scott;
        
      2. SyncMode:OnLine should be reserved only for index which a number of update/insert/delete operation are too small compared to select operations, because each message process requires almost open an IndexWriter/IndexReader on the associated Lucene Index by a background process, except for bulk collect operation or “insert into … select … from” which are processed in batch off 150 rows. Tables with many insert/update operations by seconds should use LuceneDomainIndex.sync(idx) procedure called by DBMS_JOB periodically or by the application.
      3. Syntax for Inline pagination is only supported at the beginning of the Query, it means that if you want to perform pagination using lcontains() query syntax it must start with “rownum:[n TO m] AND” note that this syntax is case sensitive. Also this extraction is performed by splitting the query by position and does not take into account grouping operator, so this query “rownum:[1 TO 10] AND word1 OR word2″ will be passed to Lucene’s Query Parser as “word1 OR word2″ which is not semantically the original one if you look to the precedence operator. We can try to modify Query Parser class in a future to solve this semantic issues.
      4. Columns name are case sensitive in ExtraCols and FormatCols parameters using traditional SQL behavior, it means that for this DDL index creation:
        create index it1 on t1(f2)
          indextype is lucene.LuceneIndex
            parameters('Stemmer:English;FormatCols:F2(zzzzzzzzzzzzzzz),F3(00.00);ExtraCols:F3');
        

        You can use ExtraCols with f3 or F3 but FormatCols should be F3 because f3 is returned by the SQL select operation as F3 during the table full scan, also Lucene Index will have a document with a Field F3 instead of f3. If you want to use f3 as is you can re-write DDL index creation with:
        create index it1 on t1(f2)
          indextype is lucene.LuceneIndex
            parameters('Stemmer:English;FormatCols:F2(zzzzzzzzzzzzzzz),f3(00.00);ExtraCols:F3 "f3"');
        

        With this sentence Lucene will create documents with two field F2 and f3, F2 is uppercase because is the master column of the index and his passed as “F2″ by ODCI API but, due is the default Field of the query, you can omit his name at lcontains syntax, F3 now is lowercase and will be indexed as a Field “f3″.
      5. Index parameters are pre-cached in memory for faster response. Due isolation behaviour of Oracle JVM sessions, if you call to alter index or re-create a new one in another session you need to close all SQL session that have a pre-load index parameter storage.
        Calling to LuceneDomainIndex.getParameter(‘owner.index_name’,'parameter_name’) you can see the values of any parameter passed to the ODCI API either by calling create index or alter index. Otherwise you can call to LuceneDomainIndex.refreshParameterCache stored procedure.
      6. If you re-install Lucene Domain Index without deleting existing indexes you can manually drop resources associated to and old index. For example:
        drop index source_big_lidx force;
        Index dropped.
        select table_name from tabs;
        
        TABLE_NAME
        ------------------------------
        DEPT
        EMP
        BONUS
        SALGRADE
        SOURCE_BIG_LIDX$QT
        DR$SOURCE_BIG_IDX$I
        DR$SOURCE_BIG_IDX$R
        SOURCE_BIG_LIDX$T
        TEST_SOURCE_BIG
        DR$SOURCE_BIG_IDX$N
        DR$SOURCE_BIG_IDX$K
        
        11 rows selected.
        
        drop table SOURCE_BIG_LIDX$T;
        Table dropped.
        
        conn / as sysdba
        connected.
        
        exec DBMS_AQADM.STOP_QUEUE ('SCOTT.SOURCE_BIG_LIDX$Q');
        PL/SQL procedure successfully completed.
        
        exec DBMS_AQADM.DROP_QUEUE ('SCOTT.SOURCE_BIG_LIDX$Q');
        PL/SQL procedure successfully completed.
        
        exec DBMS_AQADM.DROP_QUEUE_TABLE(queue_table  => 'SCOTT.SOURCE_BIG_LIDX$QT', force=>true);
        PL/SQL procedure successfully completed.
        
        exit
        

        Note that “drop index … force” will de-register Lucene Domain Index from Oracle’s system views, then Lucene Domain Index storage’s table is manually dropped, finally connected as SYS Lucene Domain Index AQ’s table is dropped.
      7. Oracle 11g have a know bug “6445561 – ORA-00600 [26599] [62] DUE TO INCORRECT PERSISTENCE OF BY INVOKER PIN” please apply patch number p6445561_111060_LINUX.zip available at Metalink, this bug affects select count(*) with a large results.
      8. Up to Lucene Domain Index 2.9.0 there is known problem with the WhereCondition parameter using OR SQL operator, see section A.3.3 to see the workaround.

      Doc Links

      Previous / LDI Docs – 3 Procedures, Functions, Operators and Examples
      Next / LDI Docs – Appendix A (Parameter reference and syntax)

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.