<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Lucene Domain Index</title>
	<atom:link href="http://ludoix.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://ludoix.wordpress.com</link>
	<description>Just another text search implementation</description>
	<lastBuildDate>Tue, 24 Jan 2012 22:56:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='ludoix.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Lucene Domain Index</title>
		<link>http://ludoix.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://ludoix.wordpress.com/osd.xml" title="Lucene Domain Index" />
	<atom:link rel='hub' href='http://ludoix.wordpress.com/?pushpress=hub'/>
		<item>
		<title>LDI Docs – Appendix E (Project Change Log)</title>
		<link>http://ludoix.wordpress.com/2011/03/11/ldi-docs-%e2%80%93-appendix-e-project-change-log/</link>
		<comments>http://ludoix.wordpress.com/2011/03/11/ldi-docs-%e2%80%93-appendix-e-project-change-log/#comments</comments>
		<pubDate>Fri, 11 Mar 2011 17:44:49 +0000</pubDate>
		<dc:creator>ludoix</dc:creator>
				<category><![CDATA[Documentation]]></category>

		<guid isPermaLink="false">http://ludoix.wordpress.com/?p=79</guid>
		<description><![CDATA[E Project Change Log 3.5.0.1.0 Second Release based on Lucene 3 (3.5.0, 23/Jan/12) core base Fully customizable Searcher and Updater process to enable JMX monitoring and time outs Updated to latest spellchecker interfaces Fixed missing changes to work with backported version on 10g 3.4.0.1.0 Second Release based on Lucene 3 (3.4.0, 25/Nov/11) core base Use latest merge policy implementation [...]]]></description>
			<content:encoded><![CDATA[<h2>E Project Change Log</h2>
<p><strong>3.5.0.1.0 Second Release based on <em>Lucene</em> 3 (3.5.0, 23/Jan/12) core base</strong></p>
<ul>
<li>Fully customizable Searcher and Updater processes to enable JMX monitoring and timeouts</li>
<li>Updated to the latest spellchecker interfaces</li>
<li>Fixed missing changes needed for the backported version to work on 10g</li>
</ul>
<p><strong>3.4.0.1.0 Second Release based on <em>Lucene</em> 3 (3.4.0, 25/Nov/11) core base</strong></p>
<ul>
<li>Uses the latest merge policy implementation, <em>TieredMergePolicy</em></li>
<li>Uses the total RAM reported by getJavaPoolSize() when setting <em>MaxBufferedDocs</em></li>
<li>Better error reporting when an Analyzer is not found</li>
<li>Replaced execute immediate with open-fetch-close functionality to avoid a core dump on 10g when double-checking for a deleted <em>rowid</em></li>
<li>Included a version of JUnit 4 backported to JDK 1.4 for 10g releases</li>
<li>Added a parallel updater process; when working in OnLine mode, this process performs write operations on the LDI structures on behalf of the AQ process</li>
<li>Deletes no longer require an exclusive write lock on the index storage; like inserts and updates, deletes are now enqueued</li>
<li>Updated source to Lucene 3.4.0 code, removed some deprecated API usage</li>
</ul>
<p><strong>3.0.2.1.0 Initial Release based on Lucene 3 (3.0.2, 14/Sep/10) core base</strong></p>
<ul>
<li>Added long-awaited functionality: a parallel/shared/slave search process used during start-fetch-close and by the CountHits function</li>
<li>Added the lfreqterms ancillary operator, returning the freq terms array of the rows visited</li>
<li>Added the lsimilarity ancillary operator, returning a computed Levenshtein distance for the row visited</li>
<li>Added an ldidyoumean pipeline table function using the DidYouMean.indexDictionary storage</li>
<li>Added tests using SQLUnit</li>
</ul>
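<p>The lsimilarity operator returns a Levenshtein distance, i.e. the minimum number of single-character insertions, deletions or substitutions needed to turn one string into another. The Java sketch below shows the classic dynamic-programming computation for reference; it is not taken from the LDI sources, and the EditDistance class name is hypothetical.</p>
<pre class="brush: java;">
/** Hypothetical helper: classic Levenshtein (edit) distance, for illustration only. */
final class EditDistance {
    private EditDistance() {}

    /** Minimum number of single-character insertions, deletions or substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int n = a.length();
        int m = b.length();
        // at the start of row i, prev[j] is the distance between the first i-1 chars of a and the first j chars of b
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // turning "" into the first j chars of b needs j insertions
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // turning the first i chars of a into "" needs i deletions
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }
}
</pre>
<p>Keeping only two rows of the distance matrix limits memory to the length of the second string; for example, levenshtein("kitten", "sitting") is 3.</p>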
<p><strong>3.0.1.1.0 Initial Release based on Lucene 3 (3.0.1) core base</strong></p>
<ul>
<li>Added IndexOnRam functionality, which uses a RAMDirectory for building an intermediate index after merging a new set of rows</li>
<li>RewriteScore and SimilarityMethod can be used to get better results when using the wildcard operator</li>
<li>Added autocomplete functionality using the lautocomplete pipeline table function</li>
<li>Removed all deprecated method usage and implementations</li>
<li>New CVS repository, compatible only with 11g; the 10g version is produced using a retro-translator</li>
</ul>
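<p>One common way to implement prefix autocompletion, assumed here because this change log does not describe how lautocomplete works internally, is to binary-search a sorted term dictionary for the prefix and scan forward while terms still start with it. A minimal Java sketch with a hypothetical PrefixCompleter class:</p>
<pre class="brush: java;">
import java.util.Arrays;

/** Hypothetical prefix completion over a sorted term dictionary, for illustration only. */
final class PrefixCompleter {
    private final String[] sortedTerms;

    PrefixCompleter(String[] terms) {
        sortedTerms = terms.clone();
        Arrays.sort(sortedTerms); // binary search requires natural (lexicographic) order
    }

    /** Returns at most max terms starting with prefix, in lexicographic order. */
    String[] complete(String prefix, int max) {
        int pos = Arrays.binarySearch(sortedTerms, prefix);
        int start = pos >= 0 ? pos : -pos - 1; // insertion point when prefix is not itself a term
        int end = start;
        while (end < sortedTerms.length && end - start < max && sortedTerms[end].startsWith(prefix)) {
            end++;
        }
        return Arrays.copyOfRange(sortedTerms, start, end);
    }
}
</pre>
<p>For the terms apache, app, apple, application and banana, complete("app", 10) returns app, apple and application.</p>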
<p><strong>2.9.2.1.0 Production Release based on Lucene 2.9 (2.9.2) core base</strong></p>
<ul>
<li>Added elapsed time information when log level is INFO</li>
<li>Removed deprecated usage of LUCENE_CURRENT constant</li>
<li>Fixed a facets inconsistency caused by ignoring the internal parameter ColName</li>
<li>Initial implementation of DidYouMean functionality contributed by Pedro Pinheiro</li>
<li>Temporary fix until Lucene defines clear semantics for Directory.fileLength (see Lucene issue 2316)</li>
</ul>
<p><strong>2.9.1.1.0 Production Release based on Lucene 2.9 (2.9.1) core base</strong></p>
<ul>
<li>New Lucene Core base libraries</li>
<li>Full Lucene Test Suites certified</li>
<li>Fixed a bug that enqueued more rowids than required when using OnLine mode with the ExtraTabs and WhereCondition parameters</li>
<li>Fixed operator precedence when WhereCondition contains an OR operator</li>
<li>DefaultUserDataStore now uses an array of cached fields to improve performance</li>
<li>Spanish Analyzer uses the latest ASCIIFoldingFilter</li>
<li>Added the high_freq_terms(idx_name,term,max_num_term) pipeline table function, which returns high-frequency terms and their associated docFreq values</li>
<li>Added the index_terms(idx_name,term) pipeline table function, which returns a list of terms and their associated frequencies</li>
<li>DefaultUserDataStore now supports the ANALYZED, ANALYZED_WITH_VECTORS, ANALYZED_WITH_OFFSETS, ANALYZED_WITH_POSITIONS and ANALYZED_WITH_POSITIONS_OFFSETS Lucene Field option values</li>
<li>OJVMLock was replaced by SingleInstanceLockFactory for per-instance locking; cross-session locking is implemented with select for update</li>
<li>An automatic upgrade from 2.9.0 is possible without deleting or rebuilding indexes; execute:<br />
<pre class="brush: plain;">
ant upgrade-domain-index
ant ncomp-lucene-ojvm (10g only)
ant jit-lucene-classes (11g only)
</pre></li>
</ul>
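<p>high_freq_terms returns the most frequent terms together with their docFreq values. Selecting the top max_num_term entries can be sketched as follows; the parallel term/docFreq arrays stand in for a Lucene term enumeration, and the TopTerms class is hypothetical rather than part of LDI.</p>
<pre class="brush: java;">
import java.util.Arrays;

/** Hypothetical sketch: pick the max entries with the highest docFreq. */
final class TopTerms {
    private TopTerms() {}

    /** Returns up to max terms ordered by descending docFreq (ties keep input order). */
    static String[] top(String[] terms, int[] docFreqs, int max) {
        Integer[] order = new Integer[terms.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Arrays.sort on an object array is stable, so equal frequencies keep their original order
        Arrays.sort(order, (a, b) -> Integer.compare(docFreqs[b], docFreqs[a]));
        int n = Math.min(max, terms.length);
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            out[i] = terms[order[i]];
        }
        return out;
    }
}
</pre>
<p>Because Arrays.sort is stable for object arrays, terms with equal docFreq are returned in their input order.</p>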
<p><strong>2.9.0.1.0 Production release based on Lucene 2.9.0 core base, 29/Sep/09</strong></p>
<ul>
<li>Tested with Oracle 11gR2, 11gR1 and 10.2 databases</li>
<li>DefaultUserDataStore does SAX parsing to get text nodes and attributes from an XMLType value.</li>
<li>A SimpleLRUCache is used to load rowids and their associated Lucene doc ids, which reduces memory consumption when querying very large tables. A new parameter, CachedRowIdSize (default 10000), controls the size of the LRU cache.</li>
<li>Lucene Domain Index core was updated to use TopFieldCollector and to avoid spending time computing scores when lscore() is not used.</li>
<li>Two new parameters have been added: NormalizeScore, which controls whether the maximum score is tracked, and PreserveDocIdOrder, used when querying. Both parameters follow from the new Lucene Collector API and improve query performance.</li>
<li>A table alias, L$MT, is defined for the master table associated with the index; use it in complex queries to relate columns of the master table to columns of dependent tables</li>
</ul>
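<p>NormalizeScore controls whether the maximum score is tracked. Assuming the usual meaning of normalization, dividing every raw score by the highest one so the best hit scores 1.0, the sketch below shows why this costs an extra pass over the hits; the ScoreNormalizer class is a hypothetical illustration, not LDI code.</p>
<pre class="brush: java;">
/** Hypothetical illustration of score normalization (score / maxScore). */
final class ScoreNormalizer {
    private ScoreNormalizer() {}

    /** Returns a new array where each score is divided by the largest one. */
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {      // first pass: track the maximum score
            max = Math.max(max, s);
        }
        float[] out = new float[rawScores.length];
        if (max == 0f) {
            return out;                  // no positive scores: leave everything at 0
        }
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] / max; // second pass: scale relative to the best hit
        }
        return out;
    }
}
</pre>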
<p><strong>2.4.1.1.0 (maintenance release based on Lucene 2.4.1, 27/Mar/09)</strong></p>
<ul>
<li>Internal parameters are not stored in the system&#8217;s views, and PopulateIndex is forced to false</li>
<li>Files marked as deleted are now purged after every sync to free BLOB storage</li>
<li>Added the lfacets aggregate function for faceting</li>
<li>CountHits function no longer requires sort argument</li>
<li>Filters are stored and retrieved using only the QueryParser.toString() key</li>
<li>The UN_TOKENIZED format string in the DefaultUserDataStore class was replaced by NOT_ANALYZED or NOT_ANALYZED_STORED, following the new Lucene definitions.</li>
<li>Fixed a bug when sync tried to process more than 32767 enqueued rowids.</li>
<li>Added the highlighting function parameters Formatter, MaxNumFragmentsRequired, FragmentSeparator and FragmentSize.</li>
<li>Added the PerFieldAnalyzer parameter to use an independent Analyzer for each column.</li>
<li>Added a sample custom Formatter, org.apache.lucene.search.highlight.MyHTMLFormatter</li>
</ul>
<p><strong>2.4.1.0.0 (first release based on Lucene 2.4.1, 9/Mar/09)</strong></p>
<ul>
<li>Fixed a compatibility problem between the 10g and 11g SQL Date representations in pipeline table functions.</li>
</ul>
<p><strong>2.4.0.1.0 (maintenance release based on Lucene 2.4.0, 10/Jan/09)</strong></p>
<ul>
<li>Added Rhighlight(index_name VARCHAR2, qry VARCHAR2, cols VARCHAR2, rType IN VARCHAR2, rws IN SYS_REFCURSOR) RETURN ANYDATASET pipeline table function</li>
<li>Added Phighlight(index_name VARCHAR2, qry VARCHAR2, cols VARCHAR2, stmt IN VARCHAR2) RETURN ANYDATASET pipeline table function</li>
<li>Added lhighlight(NUMBER):VARCHAR2 ancillary operator</li>
<li>Removed usage of Lucene deprecated API (Hits and IndexWriter for example)</li>
<li>The FIRST_ROWS optimizer hint is used to decide how many rows to load the first time</li>
<li>sync, optimize and rebuild interfaces now use index_name or [owner,index_name] arguments</li>
<li>A better build system to build Lucene Domain Index from sources</li>
<li>More tests</li>
<li>Tested against 11.1.0.7 and 10.2.0.3</li>
<li>See the online docs for usage of FIRST_ROWS and the lhighlight() operator</li>
</ul>
<p><strong>2.4.0.0.0 (production release based on Lucene 2.4.0, 10/10/08)</strong></p>
<ul>
<li>Added parameter for CLOB encoding</li>
<li>More Like This function</li>
<li>NGram analyzer</li>
<li>EnglishWikipediaAnalyzer</li>
<li>The DataStore interface includes an API for setting the current connection</li>
<li>The analyzers, queries, snowball and Wikipedia contrib packages are now required</li>
</ul>
<p><strong>2.3.2.0.0 (binary release based on Lucene 2.3.2, 1/Jun/08)</strong></p>
<ul>
<li>Compiled against Lucene 2.3.2 production release</li>
<li>Uses the latest API for merging based on RAM usage</li>
<li>Uses the Writer for deletions during Sync</li>
<li>Confirmed the 4x indexing improvement reported by the Lucene dev group</li>
<li>Fixed the workaround that changed the order of the rowids in ODCIRidList</li>
<li>Added a Spanish Wikipedia Analyzer for testing</li>
<li>Reports IOException instead of RuntimeException to signal EOF or File Not Found</li>
<li>Decouple Flush functionality from TableIndexer</li>
</ul>
<p><strong>2.2.0.2.2 (fixpack for 2.2.0.2.0 release, 5/Apr/08)</strong></p>
<ul>
<li>Added rowid to Lucene doc id caching.</li>
<li>LoadFirstFieldSelector is used during Document loading to load only the rowid field.</li>
<li>Added a test suite which indexes a Wikipedia dump inside the OJVM.</li>
</ul>
<p><strong>2.2.0.2.1 (fixpack for 2.2.0.2.0 release, 12/Dec/07)</strong></p>
<ul>
<li>DefaultUserDataStore requires the XPath text() expression to get only the textual value</li>
<li>Added logging of the SQL being executed by the table indexer</li>
<li>Changed document logging to FINER level</li>
<li>More predefined mappings in DefaultUserDataStore for NUMBER, BINARY_FLOAT, BINARY_DOUBLE, TIMESTAMP, TIMESTAMPTZ and TIMESTAMPLTZ Oracle types.</li>
<li>New parameter PopulateIndex:[true|false], to choose whether the Lucene index is populated at creation time.</li>
<li>New parameter IncludeMasterColumn:[true|false], to choose whether to index the master column; useful with virtual columns and XMLType.</li>
<li>New parameter BatchCount:integer, to choose how many rows are enqueued per batch for indexing when using create &#8230; index &#8230; parameters('SyncMode:OnLine');</li>
<li>Creating an index with SyncMode:OnLine causes Lucene Domain Index to enqueue batches of &#8220;BatchCount&#8221; rows, which are indexed in the background by an AQ PL/SQL callback. Lucene Domain Index is immediately ready for querying after creation.</li>
<li>Batch rowid indexing is done using a pipeline function.</li>
</ul>
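<p>In SyncMode:OnLine the rowids to index are enqueued in batches of BatchCount rows. The hypothetical Java helper below sketches only the splitting step (the AQ enqueue and the PL/SQL callback are not shown): it cuts a rowid list into consecutive chunks of at most batchCount entries.</p>
<pre class="brush: java;">
import java.util.Arrays;

/** Hypothetical sketch: split rowids into consecutive batches of at most batchCount entries. */
final class RowidBatcher {
    private RowidBatcher() {}

    static String[][] split(String[] rowids, int batchCount) {
        if (batchCount <= 0) {
            throw new IllegalArgumentException("batchCount must be positive");
        }
        int batches = (rowids.length + batchCount - 1) / batchCount; // ceiling division
        String[][] out = new String[batches][];
        for (int b = 0; b < batches; b++) {
            int from = b * batchCount;
            int to = Math.min(from + batchCount, rowids.length); // last batch may be shorter
            out[b] = Arrays.copyOfRange(rowids, from, to);
        }
        return out;
    }
}
</pre>
<p>With 5 rowids and BatchCount:2, three batches are produced, holding 2, 2 and 1 rowids.</p>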
<p><strong>2.2.0.2.0 (third major release synchronized with Lucene 2.2.0, 12/Dec/07)</strong></p>
<p><a href="http://sourceforge.net/project/showfiles.php?group_id=56183">http://sourceforge.net/project/showfiles.php?group_id=56183</a></p>
<p><pre class="brush: plain;">
# CVS access:
cvs -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism login
cvs -z3 -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism co -P ojvm
</pre></p>
<ul>
<li>Sort by the column passed in the lcontains(col,query_parser_str,sort_str,corr_id) syntax</li>
<li>Logging support using Java Util Logging package</li>
<li>JUnit test suites emulating middle tier environment</li>
<li>Support for online rebuild and optimize of SyncMode:OnLine indexes</li>
<li>XMLDB Export</li>
<li>AutoTuneMemory parameter, replacing the MaxBufferedDocs parameter</li>
<li>Functional column support</li>
</ul>
<p><strong>2.2.0.1.1 (second release, 27/Sep/07 05:39 AM)</strong></p>
<p><a href="https://issues.apache.org/jira/secure/attachment/12366661/ojvm-09-27-07.tar.gz">https://issues.apache.org/jira/secure/attachment/12366661/ojvm-09-27-07.tar.gz</a></p>
<p><pre class="brush: plain;">
# CVS access:
cvs -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism login
cvs -z3 -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism co -P ojvm
</pre></p>
<ul>
<li>LuceneDomainIndex.countHits() function, replacing the select count from .. where lcontains(..)&gt;0 syntax.</li>
<li>Support for inline pagination in the lcontains(col,'rownum:[n TO m] AND &#8230;') function</li>
<li>Rounding and padding support for date, timestamp, number, float, varchar2 and char columns</li>
<li>ODCI API array DML support</li>
<li>BLOB parameter support</li>
</ul>
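<p>Padding matters because Lucene compares indexed terms as strings: unpadded, "10" sorts before "9", so sorting and range queries on numeric columns would give wrong results. The sketch below zero-pads with java.text.DecimalFormat; reading a FormatCols pattern such as F1(0000) as a DecimalFormat pattern is an assumption, and the NumberPadding class is hypothetical.</p>
<pre class="brush: java;">
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Hypothetical sketch: zero-pad numbers so string order matches numeric order. */
final class NumberPadding {
    private NumberPadding() {}

    /** Formats value with a DecimalFormat pattern such as "0000" (7 becomes "0007"). */
    static String pad(long value, String pattern) {
        // Locale.ROOT symbols keep ASCII digits regardless of the JVM's default locale
        DecimalFormat format = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(value);
    }
}
</pre>
<p>This simple scheme only preserves numeric order for non-negative values that fit within the pattern width; negative numbers or longer values need a different encoding.</p>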
<p><strong>2.2.0.1.0 (first release synchronized with lucene 2.2.0, 14/Sep/07 06:44 AM)</strong></p>
<p><pre class="brush: plain;">
# CVS access:
cvs -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism login
cvs -z3 -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism co -P ojvm
</pre></p>
<ul>
<li>Synchronized with latest Lucene 2.2.0 production</li>
<li>Replaced the in-memory, Vector-based storage implementation with direct BLOB I/O, reducing memory usage for large indexes.</li>
<li>Support for user data stores: instead of indexing only one column at a time (a limitation of the Data Cartridge API on 10g), you can now index multiple columns of the base table together with columns of related tables joined to it.</li>
<li>User data stores can be customized: by writing a simple Java class, users can control which columns are indexed, the padding used, or any other processing that precedes the document-adding step.</li>
<li>A DefaultUserDataStore gets all columns of the query and builds a Lucene Document with one Field per database column; for example, NUMBER fields are automatically padded and DATE fields are rounded.</li>
<li>The lcontains() SQL operator supports the full Lucene QueryParser syntax, giving access to all indexed columns.</li>
<li>Support for the DOMAIN_INDEX_SORT and FIRST_ROWS hints: if you want rows ordered by the lscore() operator (ascending or descending), the optimizer hint assumes that Lucene Domain Index returns rowids in the proper order, avoiding an inline view to sort them.</li>
<li>Automatic index synchronization using AQ callbacks.</li>
<li>Lucene Domain Index creates an extra table named IndexName$T and an Oracle AQ named IndexName$Q, with its storage table IndexName$QT, in the user&#8217;s schema, so you can alter the storage preferences if you want.</li>
<li>The ojvm project is in the SourceForge.net CVS repository, so anybody can get it and collaborate.</li>
<li>Tested against 10gR2 and 11gR1 databases.</li>
</ul>
<p><strong>2.0.0.1.3 (third release, 09/Jan/07 11:40 AM)</strong></p>
<p><a href="https://issues.apache.org/jira/secure/attachment/12348574/ojvm-01-09-07.tar.gz">https://issues.apache.org/jira/secure/attachment/12348574/ojvm-01-09-07.tar.gz</a></p>
<ul>
<li>The Data Cartridge API is used without column data to reduce the data stored in the change queue and to speed up the synchronize method.</li>
<li>Query Hits are cached, associated with the index searcher and the string returned by the QueryParser.toString() method.</li>
<li>If no ancillary operator is used in the select, the score list is not stored.</li>
<li>The &#8220;Stemmer&#8221; parameter is recognized and its value is passed as the argument of the Snowball analyzer, for example:<br />
<pre class="brush: sql;">
create index it1 on t1(f2) indextype is lucene.LuceneIndex parameters('Stemmer:English');
</pre></li>
<li>Before installing the ojvm extension, you must run &#8220;ant jar-core&#8221; in the snowball directory.</li>
<li>IndexWriter.setUseCompoundFile(false) is called to use multi-file storage (faster than the compound file format), because there is no file descriptor limit inside the OJVM, where BLOBs are used instead of files.</li>
<li>Files are marked for deletion and purged when the Sync or Optimize methods are called.</li>
<li>BLOBs are created and populated in one call using the Oracle SQL RETURNING clause.</li>
<li>A test script using the OE sample schema, with query comparisons against an Oracle Text ctxsys.context index.</li>
</ul>
<p><strong>2.0.0.1.2 (second release, 20/Dec/06 02:03 PM)</strong></p>
<p><a href="https://issues.apache.org/jira/secure/attachment/12347614/ojvm-12-20-06.tar.gz">https://issues.apache.org/jira/secure/attachment/12347614/ojvm-12-20-06.tar.gz</a></p>
<ul>
<li>This new release of the OJVMDirectory Lucene Store includes a fully functional Oracle Domain Index with a queue for massive update/insert operations and many performance improvements.</li>
</ul>
<p><strong>2.0.0.1.1 (first release, 28/Nov/06 01:04 PM)</strong></p>
<p><a href="https://issues.apache.org/jira/secure/attachment/12345967/ojvm-11-28-06.tar.gz">https://issues.apache.org/jira/secure/attachment/12345967/ojvm-11-28-06.tar.gz</a></p>
<ul>
<li>The complete API for the Oracle Domain Index was implemented, but the solution for the contains operator outside the where clause is not satisfactory.</li>
<li>I will implement a singleton for the OJVMDirectory object when it is used in read-only mode, typically when a user performs select operations against tables with columns indexed by Lucene. This should greatly improve performance because the index reader will already be open for each select operation. I will also check whether another user or thread has written to the index, and reload the read-only singleton if so.</li>
<li>The queue for storing index changes is not implemented yet; I&#8217;ll add it shortly.</li>
</ul>
<p><strong>2.0.0.1.0 (initial implementation, 22/Nov/06 03:45 PM)</strong></p>
<p><a href="https://issues.apache.org/jira/secure/attachment/12345516/ojvm.tar.gz">https://issues.apache.org/jira/secure/attachment/12345516/ojvm.tar.gz</a></p>
<h2>Doc Links</h2>
<p><a href="http://ludoix.wordpress.com/2011/03/11/ldi-docs-%e2%80%93-appendix-d-functions-operators-and-utilities/">Previous / LDI Docs – Appendix D (Functions, operators and utilities)</a></p>
<br />Filed under: <a href='http://ludoix.wordpress.com/category/documentation/'>Documentation</a>]]></content:encoded>
			<wfw:commentRss>http://ludoix.wordpress.com/2011/03/11/ldi-docs-%e2%80%93-appendix-e-project-change-log/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/3690228eaee545ab583565df1e16b64a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ludoix</media:title>
		</media:content>
	</item>
		<item>
		<title>LDI Docs – Appendix D (Functions, operators and utilities)</title>
		<link>http://ludoix.wordpress.com/2011/03/11/ldi-docs-%e2%80%93-appendix-d-functions-operators-and-utilities/</link>
		<comments>http://ludoix.wordpress.com/2011/03/11/ldi-docs-%e2%80%93-appendix-d-functions-operators-and-utilities/#comments</comments>
		<pubDate>Fri, 11 Mar 2011 17:44:12 +0000</pubDate>
		<dc:creator>ludoix</dc:creator>
				<category><![CDATA[Documentation]]></category>

		<guid isPermaLink="false">http://ludoix.wordpress.com/?p=77</guid>
		<description><![CDATA[Doc Links Previous / LDI Docs – Appendix C (JUnit test suites explained) Next / LDI Docs – Appendix E (Project Change Log) Filed under: Documentation]]></description>
			<content:encoded><![CDATA[<h2>Doc Links</h2>
<p><a href="http://ludoix.wordpress.com/2011/03/11/ldi-docs-%e2%80%93-appendix-c-junit-test-suites-explained/">Previous / LDI Docs – Appendix C (JUnit test suites explained)</a><br />
<a href="http://ludoix.wordpress.com/2011/03/11/ldi-docs-%e2%80%93-appendix-e-project-change-log/">Next / LDI Docs – Appendix E (Project Change Log)</a></p>
<br />Filed under: <a href='http://ludoix.wordpress.com/category/documentation/'>Documentation</a>]]></content:encoded>
			<wfw:commentRss>http://ludoix.wordpress.com/2011/03/11/ldi-docs-%e2%80%93-appendix-d-functions-operators-and-utilities/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/3690228eaee545ab583565df1e16b64a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ludoix</media:title>
		</media:content>
	</item>
		<item>
		<title>LDI Docs – Appendix C (JUnit test suites explained)</title>
		<link>http://ludoix.wordpress.com/2011/03/11/ldi-docs-%e2%80%93-appendix-c-junit-test-suites-explained/</link>
		<comments>http://ludoix.wordpress.com/2011/03/11/ldi-docs-%e2%80%93-appendix-c-junit-test-suites-explained/#comments</comments>
		<pubDate>Fri, 11 Mar 2011 17:43:32 +0000</pubDate>
		<dc:creator>ludoix</dc:creator>
				<category><![CDATA[Documentation]]></category>

		<guid isPermaLink="false">http://ludoix.wordpress.com/?p=75</guid>
		<description><![CDATA[C JUnit test suites explained C.1 DBTestCase base class This is base class for most of the test suites includes. It provides a connection pool using OracleDataSource with a minimum of two ready to use connection and growing to 5, after this it will wait up to 20 seconds for free connection. This connection pool [...]]]></description>
			<content:encoded><![CDATA[<h2>C JUnit test suites explained</h2>
<h3>C.1 DBTestCase base class</h3>
<p>This is the base class for most of the included test suites. It provides a connection pool using OracleDataSource with a minimum of two ready-to-use connections, growing to five; beyond that, a request waits up to 20 seconds for a free connection. The connection pool is created in the class constructor. Each utility method provided by this class uses its own SQLConnection, so they run as autonomous transactions:</p>
<ul>
<li>createTable(): creates a test table as follows (T1 is the value of the constant TABLE):<br />
<pre class="brush: sql;">
create table T1 (
  f1 number primary key,
  f2 varchar2(200),
  f3 varchar2(200),
  f4 number);
</pre></li>
<li>dropTable(): drops the table created above.</li>
<li>createIndex(): adds a Lucene Domain Index to the table created above, as follows (LogLevel, Analyzer, MergeFactor, ExtraCols and FormatCols are customizable at class level; after index creation MergeFactor is reduced to 2):<br />
<pre class="brush: sql;">
create index IT1 on T1(f2)
  indextype is lucene.LuceneIndex
    parameters('LogLevel:WARNING;
      Analyzer:org.apache.lucene.analysis.StopAnalyzer;
      MergeFactor:500;
      ExtraCols:F1;
      FormatCols:F1(0000)');
</pre></li>
<li>dropIndex(): drops the index created above.</li>
<li>int insertRows(int startIndex, int endIndex): inserts a set of rows into the table above, with column F1 ranging from startIndex to endIndex. Column F2 is an English text representation of F1, F4 is F1*10, and F3 is an English text representation of F1*10. Returns the number of rows inserted. If there are problems, such as a primary key violation, it rolls back the transaction.</li>
<li>int deleteRows(int startIndex, int endIndex): deletes the rows where F1 is between startIndex and endIndex. Returns the number of rows deleted. If there are problems, it rolls back the transaction. Note that deleting rows automatically updates the Lucene index.</li>
<li>int updateRows(int startIndex, int endIndex): updates column F2 with its own value to fire the ODCI update method on each row between startIndex and endIndex. Returns the number of rows updated.</li>
<li>findRows(int n): finds rows whose F2 matches the text representation of n using the lcontains operator. It only tests that the result has 0 or more rows.</li>
<li>long syncIndex(): performs a sync operation on the Lucene Domain Index, applying pending changes (inserts, updates). If there are errors, usually caused by another transaction holding an exclusive lock on a row being indexed, it rolls back the operation; the next successful sync applies the pending changes of the failed operations. Returns the number of milliseconds spent during the sync.</li>
<li>long optimizeIndex(): performs an optimize operation on the Lucene Domain Index, merging its segments into a new one. If there are errors, usually caused by another transaction holding an exclusive lock on the index, it rolls back the operation. Returns the number of milliseconds spent during the optimize.</li>
</ul>
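<p>The pooling policy described for DBTestCase (a bounded number of connections, then a bounded wait for a free one) can be sketched without a database using java.util.concurrent.Semaphore. The BoundedLeasePool class below is hypothetical: it hands out permits instead of real connections, does not model the two pre-created connections, and does not show how OracleDataSource itself is configured.</p>
<pre class="brush: java;">
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the pool policy: at most maxSize leases, with a bounded wait for one. */
final class BoundedLeasePool {
    private final Semaphore permits;
    private final long waitMillis;

    BoundedLeasePool(int maxSize, long waitMillis) {
        this.permits = new Semaphore(maxSize, true); // fair: waiters are served in arrival order
        this.waitMillis = waitMillis;
    }

    /** Returns true if a lease was obtained within the wait limit, false on timeout or interrupt. */
    boolean acquire() {
        try {
            return permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
            return false;
        }
    }

    /** Gives a lease back so a waiting caller can proceed. */
    void release() {
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }
}
</pre>
<p>new BoundedLeasePool(5, 20000) would mirror the limits quoted above: a sixth concurrent request waits up to 20 seconds and then gets false.</p>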
<h3>C.2 TestDBIndex</h3>
<p>A simple test which creates a table and its index, performs insertions, sync, optimize and deletions, and finally drops the index and the table. Its output looks like this:</p>
<pre>
[junit] Testsuite: org.apache.lucene.index.TestDBIndex
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.836 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] Table created: T1
[junit] Index created: IT1
[junit] Index altered: IT1
[junit] Inserted rows: 40 total char inserted: 415 avg text length: 10
[junit] Index synced: IT1 elapsed time: 265 ms.
[junit] Avg Sync time: 6
[junit] Index optimized: IT1 elapsed time: 40 ms.
[junit] Avg Optimize time: 1
[junit] Row deleted 40, from: 10 to: 49 elapsed time: 1303 ms. Avg time: 32 ms.
[junit] Index droped: IT1
[junit] Table droped: T1
</pre>
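<p>The Avg values in these outputs are consistent with a truncating integer division of the total by the number of rows: 265 ms / 40 rows gives 6, 40 ms / 40 rows gives 1, 1303 ms / 40 rows gives 32, and 415 chars / 40 rows gives the avg text length of 10. A minimal sketch of that arithmetic, with a hypothetical helper name:</p>
<pre class="brush: java;">
/** Hypothetical helper reproducing the per-row averages printed by the test suites. */
final class PerRowAverage {
    private PerRowAverage() {}

    /** Integer division truncates, so 1303 ms over 40 rows gives 32 rather than 32.575. */
    static long avgMillis(long elapsedMillis, int rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be positive");
        }
        return elapsedMillis / rows;
    }
}
</pre>
<p>The same truncation is consistent with the TestDBIndexAddDoc and TestDBIndexDelDoc figures, e.g. 25480 ms / 3000 rows gives 8 and 11526 ms / 400 rows gives 28.</p>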
<h3>C.3 TestDBIndexAddDoc</h3>
<p>Performs several rounds of insertions and syncs, starting with 10 rows, then 90 and so on, ending with 3,000 insertions, using the insertRows method of the DBTestCase base class. After each batch of insertions it calls the syncIndex method and computes the average sync time per inserted row. Its output looks like this:</p>
<pre>
[junit] Testsuite: org.apache.lucene.index.TestDBIndexAddDoc
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 64.696 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] Table created: T1
[junit] Index created: IT1
[junit] Index altered: IT1
[junit] Index synced: IT1 elapsed time: 126 ms.
[junit] Inserted rows: 10 total char inserted: 49 avg text length: 4
[junit] Index synced: IT1 elapsed time: 142 ms.
[junit] Avg Sync time: 14
[junit] Inserted rows: 90 total char inserted: 988 avg text length: 10
[junit] Index synced: IT1 elapsed time: 374 ms.
[junit] Avg Sync time: 4
[junit] Inserted rows: 400 total char inserted: 9201 avg text length: 23
[junit] Index synced: IT1 elapsed time: 1276 ms.
[junit] Avg Sync time: 3
[junit] Inserted rows: 500 total char inserted: 11726 avg text length: 23
[junit] Index synced: IT1 elapsed time: 1601 ms.
[junit] Avg Sync time: 3
[junit] Inserted rows: 1000 total char inserted: 35950 avg text length: 35
[junit] Index synced: IT1 elapsed time: 4675 ms.
[junit] Avg Sync time: 4
[junit] Inserted rows: 3000 total char inserted: 110851 avg text length: 36
[junit] Index synced: IT1 elapsed time: 25480 ms.
[junit] Avg Sync time: 8
[junit] Index droped: IT1
[junit] Table droped: T1
</pre>
<h3>C.4 TestDBIndexDelDoc</h3>
<p>In its setup method, this test case creates a table and fills it with 500 rows. It then deletes batches of 10, 90 and 400 rows, computing the average time per deleted row. Its output looks like this:</p>
<pre>
[junit] Testsuite: org.apache.lucene.index.TestDBIndexDelDoc
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 20.543 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] Table created: T1
[junit] Index created: IT1
[junit] Index altered: IT1
[junit] Inserted rows: 500 total char inserted: 10238 avg text length: 20
[junit] Index synced: IT1 elapsed time: 1643 ms.
[junit] Row deleted 10, from: 1 to: 10 elapsed time: 356 ms. Avg time: 35 ms.
[junit] Row deleted 90, from: 11 to: 100 elapsed time: 2535 ms. Avg time: 28 ms.
[junit] Row deleted 400, from: 101 to: 500 elapsed time: 11526 ms. Avg time: 28 ms.
[junit] Index droped: IT1
[junit] Table droped: T1
</pre>
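<p>The from/to ranges follow from deleting consecutive row ids in batches of 10, 90 and 400, and the per-row average again looks like a truncating division (2535 ms / 90 rows = 28). A small Java sketch, with hypothetical names and assuming rows are numbered from 1, that derives the same ranges and averages:</p>

```java
// Hypothetical sketch: derives the "from/to" ranges and per-row averages printed by
// TestDBIndexDelDoc from the batch sizes, assuming rows are numbered from 1.
class DelDocBatches {
    // Returns {from, to} pairs (both inclusive) for consecutive batches of the given sizes.
    static int[][] ranges(int[] batchSizes) {
        int[][] out = new int[batchSizes.length][];
        int next = 1;  // first row id not yet deleted
        for (int i = 0; i != batchSizes.length; i++) {
            out[i] = new int[] {next, next + batchSizes[i] - 1};
            next += batchSizes[i];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] sizes = {10, 90, 400};
        long[] elapsedMs = {356, 2535, 11526};  // copied from the log above
        int[][] r = ranges(sizes);
        for (int i = 0; i != sizes.length; i++) {
            System.out.println("Row deleted " + sizes[i] + ", from: " + r[i][0] + " to: " + r[i][1]
                + " Avg time: " + elapsedMs[i] / sizes[i] + " ms.");
        }
    }
}
```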
<h3>C.5 TestDBIndexParallel</h3>
<p>This is a more complex test case that checks concurrent access to a Lucene Domain Index. It creates several threads: some simulate batch insertions of 10 rows, others batch deletions of 10 rows, others batch updates of 10 rows, and finally several threads search for rows every 0.5 seconds. By default it creates 3 threads for each kind of operation, and each thread performs:</p>
<ul>
<li>20 inserts</li>
<li>5 deletes</li>
<li>5 updates</li>
<li>100 searches</li>
</ul>
<p>Each thread takes its own connection from the connection pool and does its job. If the fastSync constant is true, the thread calls the syncIndex method after each successful insert and update to bring the Lucene index up to date; if fastSync is false, another thread is started that syncs the index every second. The test ends when all the insert, delete and update threads finish. Here is part of its output:</p>
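<p>The thread layout described above can be sketched in Java as follows. This is a simplified model under stated assumptions, not the test's code: the JDBC work is replaced by counters, each of the listed per-thread counts (20 inserts, 5 deletes, 5 updates) is counted as one operation, the searcher threads are left out, and the class and method names are hypothetical. It shows the two sync strategies: with fastSync each insert and update is followed by a sync, otherwise a scheduled thread syncs every second.</p>

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the TestDBIndexParallel thread layout: counters replace the
// JDBC work done by the real test, and the searcher threads are left out.
class ParallelSketch {
    static final AtomicInteger dmlOps = new AtomicInteger();
    static final AtomicInteger syncCalls = new AtomicInteger();

    // Stand-in for the test's syncIndex call; no database is involved.
    static void syncIndex() {
        syncCalls.incrementAndGet();
    }

    // Starts threadsPerKind threads per operation kind and returns {dmlOps, syncCalls}.
    static int[] run(int threadsPerKind, boolean fastSync) {
        dmlOps.set(0);
        syncCalls.set(0);
        String[] kinds = {"insert", "delete", "update"};
        int[] opsPerThread = {20, 5, 5};  // per-thread counts from the list above
        ScheduledExecutorService syncer = null;
        if (!fastSync) {
            // fastSync == false: a separate thread syncs the index every second
            syncer = Executors.newSingleThreadScheduledExecutor();
            syncer.scheduleAtFixedRate(ParallelSketch::syncIndex, 1, 1, TimeUnit.SECONDS);
        }
        ExecutorService writers = Executors.newFixedThreadPool(kinds.length * threadsPerKind);
        for (int k = 0; k != kinds.length; k++) {
            String kind = kinds[k];
            int ops = opsPerThread[k];
            for (int t = 0; t != threadsPerKind; t++) {
                writers.execute(() -> {
                    for (int i = 0; i != ops; i++) {
                        dmlOps.incrementAndGet();  // one DML operation
                        // fastSync == true: sync right after each insert and update
                        if (fastSync && !kind.equals("delete")) {
                            syncIndex();
                        }
                    }
                });
            }
        }
        writers.shutdown();
        try {
            // the test ends when all insert, delete and update threads finish
            writers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            if (syncer != null) {
                syncer.shutdownNow();
            }
        }
        return new int[] {dmlOps.get(), syncCalls.get()};
    }

    public static void main(String[] args) {
        int[] result = run(3, true);
        System.out.println("DML operations: " + result[0] + ", syncs: " + result[1]);
    }
}
```

<p>With three threads per kind and fastSync set to true, the sketch performs 90 DML operations and 75 syncs (one per insert and update), which illustrates how fastSync trades more frequent sync calls for changes becoming searchable sooner.</p>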
<pre>
[junit] Testsuite: org.apache.lucene.index.TestDBIndexParallel
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 97.7 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] Table created: T1
[junit] Index created: IT1
[junit] Index altered: IT1
[junit] FastSync: true
[junit] Deleter 1 deleting at block 70
[junit] Updater 1 updating at block 70
[junit] Inserter 2 inserting at block 90
[junit] No Row deleted at: 70 to: 79 elapsed time: 131 ms.
[junit] No Row updated at: 70 to: 79 elapsed time: 12 ms.
[junit] Searcher 2 searching row 30
[junit] Searcher 1 searching row 77
[junit] Not Found rows with: thirty  elapsed time: 211 ms.
[junit] Not Found rows with: seventy-seven  elapsed time: 170 ms.
[junit] Inserted rows: 10 total char inserted: 115 avg text length: 11
[junit] Searcher 2 searching row 62
[junit] Searcher 0 searching row 63
[junit] Searcher 1 searching row 49
[junit] Not Found rows with: sixty-two  elapsed time: 64 ms.
[junit] Index synced: IT1 elapsed time: 283 ms.
[junit] Not Found rows with: sixty-three  elapsed time: 215 ms.
[junit] Searcher 2 searching row 74
[junit] Not Found rows with: seventy-four  elapsed time: 39 ms.
[junit] Not Found rows with: forty-nine  elapsed time: 137 ms.
[junit] Searcher 1 searching row 95
[junit] Searcher 2 searching row 46
[junit] Found rows with: ninety-five  elapsed time: 103 ms.
....
[junit] Updater 2 updating at block 20
[junit] No Row updated at: 20 to: 29 elapsed time: 3 ms.
[junit] Inserted rows: 10 total char inserted: 80 avg text length: 8
[junit] Searcher 0 searching row 97
[junit] Found rows with: ninety-seven  elapsed time: 60 ms.
[junit] Index synced: IT1 elapsed time: 147 ms.
.....
[junit] Searcher 2 searching row 39
[junit] Searcher 1 searching row 84
[junit] Not Found rows with: thirty-nine  elapsed time: 33 ms.
[junit] Not Found rows with: eighty-four  elapsed time: 38 ms.
[junit] Updater 0 updating at block 90
[junit] Row updated 10, from: 90 to: 99 elapsed time: 16 ms. Avg time: 1 ms.
[junit] Index synced: IT1 elapsed time: 162 ms.
......
[junit] Inserted rows: 10 total char inserted: 125 avg text length: 12
[junit] Searcher 0 searching row 57
[junit] Searcher 1 searching row 28
[junit] Deleter 1 deleting at block 80
[junit] Searcher 2 searching row 64
[junit] No Row deleted at: 80 to: 89 elapsed time: 58 ms.
[junit] Not Found rows with: twenty-eight  elapsed time: 112 ms.
[junit] Not Found rows with: fifty-seven  elapsed time: 155 ms.
[junit] Index synced: IT1 elapsed time: 242 ms.
[junit] Searcher 0 searching row 98
[junit] Found rows with: ninety-eight  elapsed time: 72 ms.
[junit] Not Found rows with: sixty-four  elapsed time: 175 ms.
[junit] Searcher 0 searching row 27
[junit] Not Found rows with: twenty-seven  elapsed time: 75 ms.
[junit] Searcher 1 searching row 5
[junit] Deleter 2 deleting at block 50
[junit] Searcher 2 searching row 84
[junit] Not Found rows with: eighty-four  elapsed time: 20 ms.
[junit] Updater 2 updating at block 10
[junit] No Row deleted at: 50 to: 59 elapsed time: 28 ms.
[junit] Row updated 10, from: 10 to: 19 elapsed time: 36 ms. Avg time: 3 ms.
[junit] Found rows with: five  elapsed time: 216 ms.
.................
[junit] Inserter 1 inserting at block 50
[junit] Found rows at: 50 position, ignoring insertions
[junit] Index droped: IT1
[junit] Table droped: T1
</pre>
<h3>C.6 TestDBIndexSearchDoc</h3>
<p>This test checks some special features of the lcontains operator, such as in-line pagination and sort-by and filter-by expressions. It first creates a table with 200 rows and then queries it. Its output looks like this:</p>
<pre>
[junit] Testsuite: org.apache.lucene.index.TestDBIndexSearchDoc
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 14.001 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] Table created: T1
[junit] Index created: IT1
[junit] Index altered: IT1
[junit] Inserted rows: 200 total char inserted: 3262 avg text length: 16
[junit] Index synced: IT1 elapsed time: 746 ms.
[junit] testFilterAll()
[junit] Excecution time: 129 ms.
[junit] 120 Score: 0.9606395 str: one hundred twenty
[junit] 119 Score: 0.25453204 str: one hundred nineteen
[junit] 118 Score: 0.25453204 str: one hundred eighteen
[junit] 117 Score: 0.25453204 str: one hundred seventeen
[junit] 116 Score: 0.25453204 str: one hundred sixteen
[junit] 115 Score: 0.25453204 str: one hundred fifteen
[junit] 114 Score: 0.25453204 str: one hundred fourteen
[junit] 113 Score: 0.25453204 str: one hundred thirteen
[junit] 112 Score: 0.25453204 str: one hundred twelve
[junit] 111 Score: 0.25453204 str: one hundred eleven
[junit] Index droped: IT1
[junit] Table droped: T1
[junit] Table created: T1
[junit] Index created: IT1
[junit] Index altered: IT1
[junit] Inserted rows: 200 total char inserted: 3262 avg text length: 16
[junit] Index synced: IT1 elapsed time: 721 ms.
[junit] testFilterBy()
[junit] Excecution time: 162 ms.
[junit] 103 Score: 1.0 str: one hundred three
[junit] 120 Score: 0.9606395 str: one hundred twenty
[junit] 101 Score: 0.28600293 str: one hundred one
[junit] 100 Score: 0.27352643 str: one hundred
....
[junit] 115 Score: 0.25453204 str: one hundred fifteen
[junit] 116 Score: 0.25453204 str: one hundred sixteen
[junit] Index droped: IT1
[junit] Table droped: T1
[junit] Table created: T1
[junit] Index created: IT1
[junit] Index altered: IT1
[junit] Inserted rows: 200 total char inserted: 3262 avg text length: 16
[junit] Index synced: IT1 elapsed time: 751 ms.
[junit] testFilterByOrderBy()
[junit] Excecution time: 138 ms.
[junit] 120 Score: 0.9606395 str: one hundred twenty
[junit] 119 Score: 0.25453204 str: one hundred nineteen
....
[junit] 103 Score: 1.0 str: one hundred three
[junit] 102 Score: 0.25453204 str: one hundred two
[junit] 101 Score: 0.28600293 str: one hundred one
[junit] 100 Score: 0.27352643 str: one hundred
[junit] Index droped: IT1
[junit] Table droped: T1
[junit] Table created: T1
[junit] Index created: IT1
[junit] Index altered: IT1
[junit] Inserted rows: 200 total char inserted: 3262 avg text length: 16
[junit] Index synced: IT1 elapsed time: 761 ms.
[junit] testPagination()
[junit] Excecution time: 193 ms.
[junit] 117 Score: 0.03489425 str: one hundred seventeen
[junit] 118 Score: 0.03489425 str: one hundred eighteen
....
[junit] 132 Score: 0.03489425 str: one hundred thirty-two
[junit] 134 Score: 0.03489425 str: one hundred thirty-four
[junit] Index droped: IT1
[junit] Table droped: T1
[junit] Table created: T1
[junit] Index created: IT1
[junit] Index altered: IT1
[junit] Inserted rows: 200 total char inserted: 3262 avg text length: 16
[junit] Index synced: IT1 elapsed time: 743 ms.
[junit] testCountHits()
[junit] Excecution time: 53 ms.
[junit] Hits: 126
[junit] Index droped: IT1
[junit] Table droped: T1
</pre>
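<p>The two orderings visible in this log (hits ordered by descending score in testFilterBy, and by descending row number in testFilterByOrderBy) can be illustrated with a few hits copied from the output above. This Java sketch is only an illustration: the id range used as the filter is hypothetical, and it does not reproduce the lcontains syntax or the exact expressions used by TestDBIndexSearchDoc.</p>

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative sketch of "filter by" and "sort by" applied to a list of scored hits.
// Names and the id-range filter are hypothetical; this is not the lcontains implementation.
class SearchDocSketch {
    static final class Hit {
        final int id;
        final float score;

        Hit(int id, float score) {
            this.id = id;
            this.score = score;
        }

        @Override
        public String toString() {
            return id + " Score: " + score;
        }
    }

    // Keeps hits whose id lies in [from, to], ordered by descending score (the default order).
    static Hit[] filterByScore(Hit[] hits, int from, int to) {
        Hit[] kept = Arrays.stream(hits)
            .filter(h -> h.id >= from && to >= h.id)
            .toArray(Hit[]::new);
        Arrays.sort(kept, Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return kept;
    }

    // Same filter, but ordered by id descending, like the testFilterByOrderBy output.
    static Hit[] filterByOrderByIdDesc(Hit[] hits, int from, int to) {
        Hit[] kept = filterByScore(hits, from, to);
        Arrays.sort(kept, Comparator.comparingInt((Hit h) -> h.id).reversed());
        return kept;
    }

    public static void main(String[] args) {
        Hit[] hits = {  // scores copied from the log above
            new Hit(100, 0.27352643f), new Hit(101, 0.28600293f), new Hit(102, 0.25453204f),
            new Hit(103, 1.0f), new Hit(119, 0.25453204f), new Hit(120, 0.9606395f)
        };
        System.out.println(Arrays.toString(filterByScore(hits, 100, 120)));
        System.out.println(Arrays.toString(filterByOrderByIdDesc(hits, 100, 120)));
    }
}
```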
<h3>C.7 TestQueryHits</h3>
<p>This test is not self-contained because it requires an additional setup step. Before running it, create a table and its Lucene index with:</p>
<p><pre class="brush: sql;">
create table test_source_big as (select * from all_source);
create index source_big_lidx on test_source_big(text)
  indextype is lucene.LuceneIndex
    parameters('AutoTuneMemory:true;
      MergeFactor:500;
      FormatCols:line(0000);
      ExtraCols:line &quot;line&quot;');
</pre></p>
<p>On 11g databases you can create a better-optimized Lucene index using some of the new Secure LOB features:</p>
<p><pre class="brush: sql;">
create index source_big_lidx on test_source_big(text)
  indextype is lucene.LuceneIndex
    parameters('FormatCols:line(0000);
      ExtraCols:line &quot;line&quot;;
      Analyzer:org.apache.lucene.analysis.StopAnalyzer;
      MergeFactor:500;
      LobStorageParameters:PCTVERSION 0 ENABLE STORAGE IN ROW CHUNK 32768 CACHE READS FILESYSTEM_LIKE_LOGGING');
</pre></p>
<p>On 10g, running it as SCOTT on a typical installation based on the database templates, the TEST_SOURCE_BIG table will have 220,731 rows. Using this table, two tests check performance with a query that returns 18,387 hits: one calls the LuceneDomainIndex.countHits function, and the other iterates over the result in pages of ten rows, a typical web application scenario. Its output looks like this:</p>
<pre>
[junit] Testsuite: org.apache.lucene.indexer.TestQueryHits
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.656 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] iteration from: 13775 to: 13785
[junit] Step time: 791 ms.
[junit] iteration from: 13785 to: 13795
[junit] Step time: 49 ms.
[junit] iteration from: 13795 to: 13805
[junit] Step time: 40 ms.
[junit] iteration from: 13805 to: 13815
[junit] Step time: 44 ms.
[junit] iteration from: 13815 to: 13825
[junit] Step time: 40 ms.
[junit] iteration from: 13825 to: 13835
[junit] Step time: 42 ms.
[junit] iteration from: 13835 to: 13845
[junit] Step time: 41 ms.
[junit] iteration from: 13845 to: 13855
[junit] Step time: 50 ms.
[junit] iteration from: 13855 to: 13865
[junit] Step time: 41 ms.
[junit] iteration from: 13865 to: 13875
[junit] Step time: 41 ms.
[junit] Elapsed time: 1877
[junit] Hits: 18387
[junit] Elapsed time: 564
</pre>
<p>Note that the first iteration takes longer because it includes query parsing and caching. Also, to simulate a real-world web application, an SQL connection is taken from and returned to the pool on each iteration.</p>
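<p>The paging loop in this test can be summarised by the windows it prints. Consecutive windows share their boundary (13785 ends one page and starts the next), which suggests the upper bound is exclusive. That, and the class and method names, are assumptions of this Java sketch, which only computes the windows and leaves out the pooled connection and the query run for each page.</p>

```java
// Hypothetical sketch of the page windows printed by TestQueryHits, assuming each
// window is [from, to) with a fixed page size.
class PageWindows {
    static int[][] windows(int start, int pageSize, int pages) {
        int[][] out = new int[pages][];
        for (int p = 0; p != pages; p++) {
            int from = start + p * pageSize;
            out[p] = new int[] {from, from + pageSize};  // upper bound exclusive
        }
        return out;
    }

    public static void main(String[] args) {
        for (int[] w : windows(13775, 10, 10)) {
            // a real client would borrow a connection, fetch this page, and return it here
            System.out.println("iteration from: " + w[0] + " to: " + w[1]);
        }
    }
}
```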
<h2>Doc Links</h2>
<p><a href="http://ludoix.wordpress.com/2011/03/11/ldi-docs-appendix-b-lucene-domain-index-storage/">Previous / LDI Docs – Appendix B (Lucene Domain Index Storage)</a><br />
<a href="http://ludoix.wordpress.com/2011/03/11/ldi-docs-%e2%80%93-appendix-d-functions-operators-and-utilities/">Next / LDI Docs – Appendix D (Functions, operators and utilities)</a></p>
<br />Filed under: <a href='http://ludoix.wordpress.com/category/documentation/'>Documentation</a>]]></content:encoded>
			<wfw:commentRss>http://ludoix.wordpress.com/2011/03/11/ldi-docs-%e2%80%93-appendix-c-junit-test-suites-explained/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/3690228eaee545ab583565df1e16b64a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ludoix</media:title>
		</media:content>
	</item>
		<item>
		<title>LDI Docs &#8211; Appendix B (Lucene Domain Index Storage)</title>
		<link>http://ludoix.wordpress.com/2011/03/11/ldi-docs-appendix-b-lucene-domain-index-storage/</link>
		<comments>http://ludoix.wordpress.com/2011/03/11/ldi-docs-appendix-b-lucene-domain-index-storage/#comments</comments>
		<pubDate>Fri, 11 Mar 2011 17:39:25 +0000</pubDate>
		<dc:creator>ludoix</dc:creator>
				<category><![CDATA[Documentation]]></category>

		<guid isPermaLink="false">http://ludoix.wordpress.com/?p=71</guid>
		<description><![CDATA[B Lucene Domain Index Storage OJVMDirectory class creates a set of Oracle objects to represent Lucene Inverted Index and Domain Index functionality. First it creates a table named IDX_NAME$T (IDX_NAME is your Lucene Domain Index used at create index DDL statement) with this structure: Name Null? Type NAME NOT NULL VARCHAR2(30) LAST_MODIFIED TIMESTAMP(6) FILE_SIZE NUMBER(38) [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=ludoix.wordpress.com&amp;blog=20873609&amp;post=71&amp;subd=ludoix&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<h2>B Lucene Domain Index Storage</h2>
<p>The <code>OJVMDirectory</code> class creates a set of Oracle objects that represent the Lucene inverted index and the domain index functionality. First it creates a table named <code>IDX_NAME$T</code> (where IDX_NAME is the name of your Lucene Domain Index as given in the CREATE INDEX DDL statement) with this structure:</p>
<table>
<tr>
<th>Name</th>
<th>Null?</th>
<th>Type</th>
</tr>
<tr>
<td>NAME</td>
<td>NOT NULL</td>
<td>VARCHAR2(30)</td>
</tr>
<tr>
<td>LAST_MODIFIED</td>
<td></td>
<td>TIMESTAMP(6)</td>
</tr>
<tr>
<td>FILE_SIZE</td>
<td></td>
<td>NUMBER(38)</td>
</tr>
<tr>
<td>DATA</td>
<td></td>
<td>BLOB</td>
</tr>
<tr>
<td>DELETED</td>
<td></td>
<td>CHAR(1)</td>
</tr>
</table>
<p>It also creates an index on the <code>IDX_NAME$T.DELETED</code> column to speed up purge operations. To enqueue operations on the index it defines a DBMS_AQ queue <code>IDX_NAME$Q</code> with its storage table <code>IDX_NAME$QT</code>. The payload of the <code>IDX_NAME$Q</code> queue is a <code>LUCENE_MSG_TYP</code> object, defined as:</p>
<table>
<tr>
<th>Name</th>
<th>Null?</th>
<th>Type</th>
</tr>
<tr>
<td>RIDLIST</td>
<td></td>
<td>SYS.ODCIRIDLIST</td>
</tr>
<tr>
<td>OPERATION</td>
<td></td>
<td>VARCHAR2(32)</td>
</tr>
</table>
<p><code>SYS.ODCIRIDLIST</code> is a special structure defined by the ODCI API to hold the list of rowids changed by a DML operation. OPERATION is one of the reserved keywords insert, delete, update, rebuild, optimize, insert-ram or insert-disk. The rebuild and optimize operations are used with <code>SyncMode:OnLine</code> to perform these tasks automatically in a background process. insert-ram and insert-disk are messages enqueued internally by LDI when ParallelDegree is enabled.<br />
When ParallelDegree is greater than 1, the structure shown above is replicated for each of the N parallel storages used during indexing. For example, with ParallelDegree:2 an index named <code>SOURCE_BIG_LIDX</code> gets two extra structures named <code>SOURCE_BIG_LIDX$0{$Q|$QT|$T}</code> and <code>SOURCE_BIG_LIDX$1{$Q|$QT|$T}</code>.</p>
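<p>The naming scheme can be expressed as a short Java sketch. It assumes, as described above, that the base objects always exist and that extra per-storage objects, numbered from 0, are only created when ParallelDegree is greater than 1; the class and method names are hypothetical.</p>

```java
// Hypothetical helper listing the storage objects described above for one index.
class LdiStorageNames {
    // Table, queue and queue table suffixes, in that order.
    static final String[] SUFFIXES = {"$T", "$Q", "$QT"};

    static String[] names(String indexName, int parallelDegree) {
        // Extra per-storage copies only exist when ParallelDegree is greater than 1.
        int storages = parallelDegree > 1 ? parallelDegree : 0;
        String[] out = new String[SUFFIXES.length * (1 + storages)];
        int k = 0;
        for (String suffix : SUFFIXES) {
            out[k++] = indexName + suffix;
        }
        for (int i = 0; i != storages; i++) {
            for (String suffix : SUFFIXES) {
                out[k++] = indexName + "$" + i + suffix;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String name : names("SOURCE_BIG_LIDX", 2)) {
            System.out.println(name);
        }
    }
}
```

<p>For ParallelDegree:2 it prints nine names: the three base objects, then SOURCE_BIG_LIDX$0$T, SOURCE_BIG_LIDX$0$Q and SOURCE_BIG_LIDX$0$QT, then the matching $1 set.</p>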
<h2>Doc Links</h2>
<p><a href="http://ludoix.wordpress.com/2011/03/11/ldi-docs-appendix-a-parameter-reference-and-syntax/">Previous / LDI Docs – Appendix A (Parameter reference and syntax)</a><br />
<a href="http://ludoix.wordpress.com/2011/03/11/ldi-docs-%e2%80%93-appendix-c-junit-test-suites-explained/">Next / LDI Docs – Appendix C (JUnit test suites explained)</a></p>
<br />Filed under: <a href='http://ludoix.wordpress.com/category/documentation/'>Documentation</a>]]></content:encoded>
			<wfw:commentRss>http://ludoix.wordpress.com/2011/03/11/ldi-docs-appendix-b-lucene-domain-index-storage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/3690228eaee545ab583565df1e16b64a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ludoix</media:title>
		</media:content>
	</item>
		<item>
		<title>LDI Docs &#8211; Appendix A (Parameter reference and syntax)</title>
		<link>http://ludoix.wordpress.com/2011/03/11/ldi-docs-appendix-a-parameter-reference-and-syntax/</link>
		<comments>http://ludoix.wordpress.com/2011/03/11/ldi-docs-appendix-a-parameter-reference-and-syntax/#comments</comments>
		<pubDate>Fri, 11 Mar 2011 17:38:46 +0000</pubDate>
		<dc:creator>ludoix</dc:creator>
				<category><![CDATA[Documentation]]></category>

		<guid isPermaLink="false">http://ludoix.wordpress.com/?p=69</guid>
		<description><![CDATA[A Parameter reference and syntax Lucene Domain Index accept several parameters which can be passed using create index or alter index DDL commands. This parameters are divided into four categories, Index Writer, Analyzer, User Data Store and General parameters. A.1 Lucene Index Writer parameters This section covers Lucene Index Writer parameters for more information about [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=ludoix.wordpress.com&amp;blog=20873609&amp;post=69&amp;subd=ludoix&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<h1>A Parameter reference and syntax </h1>
<p>Lucene Domain Index accepts several parameters, which can be passed with the CREATE INDEX or ALTER INDEX DDL commands. These parameters are divided into four categories: Index Writer, Analyzer, User Data Store and General parameters.</p>
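<p>All the examples in this appendix pass parameters as a single string of Name:Value entries separated by semicolons. A minimal Java sketch of splitting such a string, assuming that the name ends at the first colon and that values never contain a semicolon (the real LDI parser may handle more cases); the class name is hypothetical:</p>

```java
import java.util.Properties;

// Hypothetical sketch: splits a parameters('...') string such as
// 'AutoTuneMemory:true; MergeFactor:500' into name/value pairs.
class LdiParams {
    static Properties parse(String parameters) {
        Properties props = new Properties();
        for (String entry : parameters.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;  // tolerate a trailing ';' or blank lines
            }
            int colon = trimmed.indexOf(':');  // the name ends at the first ':'
            if (colon == -1) {
                throw new IllegalArgumentException("missing ':' in " + trimmed);
            }
            props.setProperty(trimmed.substring(0, colon).trim(), trimmed.substring(colon + 1).trim());
        }
        return props;
    }

    public static void main(String[] args) {
        Properties p = parse("AutoTuneMemory:true;\n  MergeFactor:500;\n  FormatCols:line(0000);\n  ExtraCols:line \"line\"");
        System.out.println(p.getProperty("MergeFactor"));
        System.out.println(p.getProperty("ExtraCols"));
    }
}
```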
<h2>A.1 Lucene Index Writer parameters</h2>
<p>This section covers the Lucene Index Writer parameters; for more information about them, see the <a href="http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/index/IndexWriter.html">Lucene docs</a> and the <a href="http://wiki.apache.org/lucene-java/ImproveIndexingSpeed">Wiki</a>.</p>
<h3>A.1.1 MergeFactor</h3>
<p>Determines how often segment indices are merged by addDocument(). If you are creating a new index over a table with thousands of rows, a value between 100 and 500 is a good choice.</p>
<h3>A.1.2 MaxBufferedDocs</h3>
<p>Determines the minimum number of documents required before the buffered in-memory documents are merged and a new segment is created. This value can cause an out-of-memory exception if you provide a value larger than the available user space. A typical SGA configuration can accept values of 4000 or 5000, depending on how big the rows being indexed are. If you are not sure how many megabytes your rows can consume, use the AutoTuneMemory:true parameter, which is the default; when it is true, MaxBufferedDocs is ignored and Lucene Domain Index will try to use 90% of the Oracle Java Pool Size value.</p>
<h3>A.1.3 MaxMergeDocs</h3>
<p>Determines the largest number of documents ever merged by addDocument().</p>
<h3>A.1.4 MaxBufferedDeleteTerms</h3>
<p>Determines the minimal number of delete terms required before the buffered in-memory delete terms are applied and flushed.</p>
<h3>A.1.5 UseCompoundFile</h3>
<p>Turns on the use of the compound file format. When on, the multiple files for each segment are merged into a single file once segment creation is finished, regardless of which directory is in use. By default Lucene Domain Index does not use the compound file format, because its BLOB-based storage is not limited by the maximum number of open file descriptors.</p>
<h3>A.1.6 MaxFieldLength</h3>
<p>Determines the maximum number of terms indexed for any column of this index; the default value is 10000.</p>
<h3>A.1.7 AutoTuneMemory</h3>
<p>AutoTuneMemory:true (the default) overrides the MaxBufferedDocs parameter: MaxBufferedDocs is defined dynamically, based on how much memory is reported by the OracleRuntime.getJavaPoolSize() method.<br />
After each document is added to the index, it calls writer.ramSizeInBytes() and checks that the result is not over 50% of the free RAM.<br />
This parameter works in most common cases, but you can get a Java out-of-memory error in multi-user environments because the Java Pool Size is shared by all sessions. If you get an exception at index creation time, set AutoTuneMemory:false and adjust MaxBufferedDocs to a value that does not raise an out-of-memory exception.</p>
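<p>The 50% check described above can be simulated in a few lines of Java. This is a toy model with hypothetical names: the value of writer.ramSizeInBytes() is approximated by a running sum of document sizes, the free RAM is a fixed number, and a flush simply resets the buffer.</p>

```java
import java.util.Arrays;

// Toy simulation of the AutoTuneMemory check; in LDI the real values come from
// Lucene's IndexWriter.ramSizeInBytes() and the Oracle JVM.
class AutoTuneSketch {
    // True when the buffered documents use more than 50% of the free RAM.
    static boolean shouldFlush(long ramSizeInBytes, long freeRamBytes) {
        return ramSizeInBytes * 2 > freeRamBytes;
    }

    // Returns how many flushes happen while adding documents of the given sizes.
    static int countFlushes(long[] docSizes, long freeRamBytes) {
        long buffered = 0;
        int flushes = 0;
        for (long size : docSizes) {
            buffered += size;  // document added to the in-memory buffer
            if (shouldFlush(buffered, freeRamBytes)) {
                flushes++;     // stand-in for flushing a new segment
                buffered = 0;
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        long[] docs = new long[20];
        Arrays.fill(docs, 10L);
        System.out.println(countFlushes(docs, 100L));
    }
}
```

<p>With 20 documents of 10 bytes and 100 bytes of free RAM, the buffer is flushed each time it reaches 60 bytes, so the program prints 3.</p>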
<h2>A.2 Analyzer parameters</h2>
<p>An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text.</p>
<p>Typical implementations first build a Tokenizer, which breaks the stream of characters from the Reader into raw Tokens. One or more TokenFilters may then be applied to the output of the Tokenizer.<br />
The Analyzer, PerFieldAnalyzer and Stemmer parameters affect both indexing and query expressions, so if you change any of them on an existing index you must rebuild it. They are checked in this order of priority: first Stemmer; if it is not present, PerFieldAnalyzer; if that is not present, Analyzer; and finally, if none of them is defined, SimpleAnalyzer is used.</p>
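<p>The selection order can be sketched in Java as below. The sketch only returns a description of the chosen parameter as a string, reads the parameters from a java.util.Properties object as an assumption, and does not build real Lucene analyzers; the class and method names are hypothetical.</p>

```java
import java.util.Properties;

// Hypothetical sketch of the analyzer priority: Stemmer, then PerFieldAnalyzer,
// then Analyzer, then SimpleAnalyzer as the fallback.
class AnalyzerPriority {
    static String choose(Properties params) {
        String stemmer = params.getProperty("Stemmer");
        if (stemmer != null) {
            return "Stemmer:" + stemmer;
        }
        String perField = params.getProperty("PerFieldAnalyzer");
        if (perField != null) {
            return "PerFieldAnalyzer:" + perField;
        }
        String analyzer = params.getProperty("Analyzer");
        if (analyzer != null) {
            return "Analyzer:" + analyzer;
        }
        return "Analyzer:org.apache.lucene.analysis.SimpleAnalyzer";  // default when none is set
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("Analyzer", "org.apache.lucene.analysis.StopAnalyzer");
        p.setProperty("Stemmer", "English");
        System.out.println(choose(p));  // Stemmer wins over Analyzer
    }
}
```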
<h3>A.2.1 Analyzer</h3>
<p>This parameter is the fully qualified name of a Java class that extends org.apache.lucene.analysis.Analyzer, for example:</p>
<ul>
<li>BrazilianAnalyzer</li>
<li>ChineseAnalyzer</li>
<li>CJKAnalyzer</li>
<li>CzechAnalyzer</li>
<li>DutchAnalyzer</li>
<li>FrenchAnalyzer</li>
<li>GermanAnalyzer</li>
<li>GreekAnalyzer</li>
<li>KeywordAnalyzer</li>
<li>PatternAnalyzer</li>
<li>RussianAnalyzer</li>
<li>SimpleAnalyzer</li>
<li>StandardAnalyzer</li>
<li>StopAnalyzer</li>
<li>ThaiAnalyzer</li>
<li>WhitespaceAnalyzer</li>
</ul>
<p>See <a href="http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/Analyzer.html">Lucene Java Docs</a> for more details. The default analyzer is SimpleAnalyzer.</p>
<h3>A.2.2 Stemmer</h3>
<p>Stemmer is another kind of analyzer, which reduces words to their stems and handles stop words and other term-related processing according to the rules of a specific language. The Stemmer parameter uses the <a href="http://snowball.tartarus.org/">Snowball Analyzer</a>; the possible values for the Stemmer parameter with the Lucene 2.2.0 distribution are:</p>
<ul>
<li>Danish</li>
<li>Dutch</li>
<li>English</li>
<li>Finnish</li>
<li>French</li>
<li>German</li>
<li>German2</li>
<li>Italian</li>
<li>Kp</li>
<li>Lovins</li>
<li>Norwegian</li>
<li>Porter</li>
<li>Portuguese</li>
<li>Russian</li>
<li>Spanish</li>
<li>Swedish</li>
</ul>
<p>The Stemmer parameter overrides the Analyzer parameter.</p>
<h3>A.2.3 PerFieldAnalyzer</h3>
<p>PerFieldAnalyzer is a wrapper around other analyzers that provides an independent analyzer for each column being indexed; see the <a href="http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html">PerFieldAnalyzerWrapper</a> class in the Lucene documentation. Each column can have its own analyzer, which extends org.apache.lucene.analysis.Analyzer. If a column is not in the list, <a href="http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a> is used as the default. For example:</p>
<p><pre class="brush: sql;">
create table t1 (f1 VARCHAR2(10), f2 XMLType);
insert into t1 values ('1', XMLType('&lt;emp id=&quot;1&quot;&gt;&lt;name&gt;ravi&lt;/name&gt;&lt;/emp&gt;'));
insert into t1 values ('3', XMLType('&lt;emp id=&quot;3&quot;&gt;&lt;name&gt;murthy&lt;/name&gt;&lt;/emp&gt;'));

create index it1 on t1(f2) indextype is lucene.LuceneIndex
 parameters('IncludeMasterColumn:false;
 ExtraCols:F1,extractValue(F2,''/emp/name/text()'') &quot;name&quot;,extractValue(F2,''/emp/@id'') &quot;id&quot;;
 FormatCols:F1(000),id(00)');

alter index it1 rebuild
 parameters('PerFieldAnalyzer:F1(org.apache.lucene.analysis.KeywordAnalyzer),id(org.apache.lucene.analysis.KeywordAnalyzer)');
</pre></p>
<p>In the above example, Lucene Domain Index indexes four columns: rowid (added by default), using KeywordAnalyzer; F1 and id (added by the ExtraCols parameter), also using KeywordAnalyzer; and finally name, which is not included in the PerFieldAnalyzer parameter and therefore uses StandardAnalyzer.</p>
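<p>A minimal Java sketch of the per-column lookup, assuming the PerFieldAnalyzer value is a comma-separated list of column(class) entries as in the example and that any column not listed falls back to StandardAnalyzer. The special handling of rowid mentioned above is not modelled, and the names are hypothetical.</p>

```java
import java.util.Properties;

// Hypothetical sketch: parses "F1(a.b.C),id(a.b.D)" into column -> analyzer class
// and falls back to StandardAnalyzer for columns that are not listed.
class PerFieldSpec {
    static final String DEFAULT = "org.apache.lucene.analysis.standard.StandardAnalyzer";

    static Properties parse(String spec) {
        Properties map = new Properties();
        for (String item : spec.split(",")) {
            String s = item.trim();
            int open = s.indexOf('(');
            if (open == -1 || !s.endsWith(")")) {
                throw new IllegalArgumentException("expected column(class): " + s);
            }
            map.setProperty(s.substring(0, open).trim(), s.substring(open + 1, s.length() - 1).trim());
        }
        return map;
    }

    static String analyzerFor(Properties map, String column) {
        return map.getProperty(column, DEFAULT);
    }

    public static void main(String[] args) {
        Properties m = parse("F1(org.apache.lucene.analysis.KeywordAnalyzer),id(org.apache.lucene.analysis.KeywordAnalyzer)");
        for (String column : new String[] {"F1", "id", "name"}) {
            System.out.println(column + " -> " + analyzerFor(m, column));
        }
    }
}
```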
<h2>A.3 User Data Store parameters</h2>
<p>Lucene Domain Index implements User Data Store functionality, which provides several parameters to control which columns are included in the Lucene Document inserted into the index.<br />
The first three parameters (ExtraCols, ExtraTabs and WhereCondition) choose which columns are added to the index in addition to the master column. Oracle domain indexes are bound to a single column, a limitation of the Oracle 10g version; with these parameters you can easily build a set of new columns from the master table and other tables. Basically, a SELECT statement is built from these parameters, and Lucene Domain Index performs queries like:</p>
<p>-- Full table scan (CREATE INDEX statement):<br />
SELECT rowid, MasterTable.MasterColumn, ExtraCols<br />
FROM MasterTable, ExtraTabs<br />
  WHERE WhereCondition;</p>
<p>-- Find a particular rowid (insert and update operations):<br />
SELECT MasterTable.MasterColumn, ExtraCols<br />
FROM MasterTable, ExtraTabs<br />
  WHERE MasterTable.rowid=:rowid AND WhereCondition;</p>
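<p>The SQL skeleton, the rowid, MasterTable and MasterColumn are injected by Lucene Domain Index from the index definition, while ExtraCols, ExtraTabs and WhereCondition stand for the user-defined parameter values.</p>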
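<p>The way these parameters are combined into the two queries can be sketched in Java. This is an illustration with hypothetical names, not LDI's query builder: it qualifies rowid with the master table name so the column stays unambiguous when ExtraTabs adds tables, and it wraps WhereCondition in parentheses, the precaution discussed in A.3.3 below.</p>

```java
// Hypothetical sketch of how the two User Data Store queries could be assembled.
class UdsQueryBuilder {
    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Appends an optional comma-separated part (ExtraCols or ExtraTabs).
    private static String withOptional(String base, String extra) {
        return isBlank(extra) ? base : base + ", " + extra;
    }

    // WhereCondition is wrapped in () so an OR inside it cannot escape the
    // surrounding AND (the precedence problem described in A.3.3).
    private static String condition(String whereCondition) {
        return "(" + whereCondition + ")";
    }

    // Full table scan, used by CREATE INDEX.
    static String fullScan(String table, String column, String extraCols, String extraTabs, String whereCondition) {
        String sql = "SELECT " + withOptional(table + ".rowid, " + table + "." + column, extraCols)
            + " FROM " + withOptional(table, extraTabs);
        return isBlank(whereCondition) ? sql : sql + " WHERE " + condition(whereCondition);
    }

    // Single-row lookup, used for insert and update operations.
    static String byRowid(String table, String column, String extraCols, String extraTabs, String whereCondition) {
        String sql = "SELECT " + withOptional(table + "." + column, extraCols)
            + " FROM " + withOptional(table, extraTabs)
            + " WHERE " + table + ".rowid = :rowid";
        return isBlank(whereCondition) ? sql : sql + " AND " + condition(whereCondition);
    }

    public static void main(String[] args) {
        System.out.println(fullScan("T1", "F2", "T2.F3 \"f3\"", "T2", "T1.F1=T2.F1(+)"));
        System.out.println(byRowid("T1", "F2", "", "", "T1.F1='AA' OR T1.F1='BB'"));
    }
}
```

<p>The second query prints as <code>SELECT T1.F2 FROM T1 WHERE T1.rowid = :rowid AND (T1.F1='AA' OR T1.F1='BB')</code>; without the parentheses the OR would apply to the whole condition, not only to the user-supplied part.</p>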
<h3>A.3.1 ExtraCols</h3>
<p>A comma-separated list of columns from the master table (the table being indexed) or from the tables defined in the ExtraTabs parameter. Note that if you do not define column aliases, column names are capitalized by default on Oracle databases. For example: <code>ExtraCols:F2 "f2",T2.F3 "f3"</code>. You can omit the master table name if there are no name collisions.</p>
<h3>A.3.2 ExtraTabs</h3>
<p>A comma-separated list of table names and aliases for these tables, for example <code>ExtraTabs:T2 aliasT2,T3 aliasT3</code>. Note that the ODCI API only detects changes to the index master column; to propagate changes to the columns in the ExtraCols list you need to attach triggers. See the examples section above for more details.</p>
<h3>A.3.3 WhereCondition</h3>
<p>An SQL WHERE condition used to join the index's master table with the ExtraTabs tables, for example <code>WhereCondition:T1.f1=T2.f2(+) AND T1.F1=aliasT3.f3</code>. Be careful to write a correct join condition that guarantees a single-row result; results of multiple rows or zero rows for a master table row are not allowed.</p>
<p>Note: up to Lucene Domain Index 2.9.0, if your WhereCondition contains an OR operator, enclose it in parentheses. Because AND binds more tightly than OR in SQL, the condition injected by LDI (for example <code>MasterTable.rowid=:rowid AND T1.F1='AA' OR T1.F1='BB'</code>) is evaluated as <code>(MasterTable.rowid=:rowid AND T1.F1='AA') OR T1.F1='BB'</code>, so some queries return more rows than they should. For example, instead of:<br />
WhereCondition:T1.F1='AA' OR T1.F1='BB'<br />
use:<br />
WhereCondition:(T1.F1='AA' OR T1.F1='BB')<br />
This workaround fixes some problems when working in OnLine mode. Starting with version 2.9.1 these extra parentheses are not required.</p>
<h3>A.3.4 UserDataStore</h3>
<p>This is the fully qualified name of a Java class that implements the org.apache.lucene.indexer.UserDataStore interface; you can create your own data store class by implementing this interface. By default Lucene Domain Index provides an implementation that covers most typical scenarios, org.apache.lucene.indexer.DefaultUserDataStore, which uses the FormatCols parameter to create Lucene Fields.</p>
<h3>A.3.5 FormatCols</h3>
<p>A comma-separated list of column(format) strings interpreted by the User Data Store class to control how a specific database column is transformed into a Lucene Field. For example, you can choose padding, un-tokenized values and so on.<br />
The formats supported by the Default User Data Store class are:</p>
<ul>
<li>Number padding for numeric columns using java.text.DecimalFormat class syntax; the default is 0000000000.</li>
<li>Date rounding for timestamp and date columns using org.apache.lucene.document.DateTools; the default is day.</li>
<li>Character left padding for VARCHAR2 or CHAR columns using the org.apache.lucene.util.StringUtils class (leftPad method); the default is no left padding. Any character can be used for left padding.</li>
<li>XPath expressions for XMLType columns: the XPath string is passed to the XMLType.extract("format","") method, and getStringVal() is called on the XMLType object returned by the extraction. For more customized XMLType-to-Field extraction, extend the DefaultUserDataStore class or use virtual column indexing.</li>
<li>For VARCHAR2 or CHAR columns you can use the special strings NOT_ANALYZED or NOT_ANALYZED_STORED as the format, which tell the Default User Data Store class to index the column without tokenizing it; this is useful for columns used for sorting.</li>
</ul>
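<p>The number and character padding formats can be illustrated with the JDK alone. In this Java sketch, padNumber uses java.text.DecimalFormat as described above, while leftPad is a hand-written stand-in for a StringUtils-style leftPad method (not the LDI class itself); the names are hypothetical. Padding matters because Lucene compares string terms lexicographically, so only padded numbers sort in numeric order.</p>

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical sketch of the FormatCols number and character padding.
class FormatColsSketch {
    // Number padding with DecimalFormat syntax, e.g. the default pattern 0000000000.
    static String padNumber(long value, String pattern) {
        // Locale.ROOT keeps ASCII digits regardless of the JVM's default locale.
        DecimalFormat df = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.ROOT));
        return df.format(value);
    }

    // Left padding with any pad character; values already wide enough are unchanged.
    static String leftPad(String value, int width, char pad) {
        int missing = Math.max(0, width - value.length());
        return String.valueOf(pad).repeat(missing) + value;
    }

    public static void main(String[] args) {
        String a = padNumber(42, "0000000000");
        String b = padNumber(100, "0000000000");
        System.out.println(a + " " + b);
        System.out.println("padded sorts numerically: " + (b.compareTo(a) > 0));
        System.out.println("unpadded sorts numerically: " + ("100".compareTo("42") > 0));
        System.out.println(leftPad("7", 5, '0'));
    }
}
```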
<h3>A.3.6 LockMasterTable</h3>
<p>When the table indexer fetches the row to be indexed, it can either use the FOR UPDATE NOWAIT SQL construct or not; setting this parameter to true causes the row to be acquired with a lock.</p>
<h2>A.4 General parameters</h2>
<p>These parameters are specific to Lucene Domain Index.</p>
<h3>A.4.1 SyncMode</h3>
<p>SyncMode tells Lucene Domain Index which strategy to use to update the index. With SyncMode:Deferred (the default), the application decides when the index is synced, either by calling the LuceneDomainIndex.sync procedure after a set of pending changes or through a DBMS_SCHEDULER job at a specific time. In Deferred mode, update and insert operations are queued using the DBMS_AQ package. Delete operations are never enqueued, because they require an immediate update of the Lucene index so that it does not return ROWIDs of deleted rows.<br />
SyncMode:OnLine is implemented with a DBMS_AQ PL/SQL callback: immediately after a commit involving inserted or updated rows, the DBMS_AQ package automatically starts a parallel dbms_j* process to apply the pending changes. SyncMode:OnLine should be reserved for indexes whose update, insert or delete operations are much less frequent than selects, because AQ callbacks do not handle exceptions well during sync (for example, when a row being indexed is locked by another session), so some changes can be lost in that scenario.</p>
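<p>To make the Deferred/OnLine difference concrete, here is a toy in-memory Java model. It is not LDI code: every class and method name is hypothetical, a Deque stands in for the DBMS_AQ queue, and a Set of ROWIDs stands in for the Lucene index. Inserts and updates are only enqueued until sync() runs, deletes bypass the queue, and OnLine mode is modeled as a sync right after each commit.</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the two SyncMode strategies (hypothetical names, not the LDI API).
class SyncModeModel {
    enum SyncMode { DEFERRED, ONLINE }

    private final SyncMode mode;
    private final Deque<String> queue = new ArrayDeque<>();  // stands in for the DBMS_AQ queue
    private final Set<String> index = new TreeSet<>();       // stands in for the Lucene index

    SyncModeModel(SyncMode mode) {
        this.mode = mode;
    }

    // Inserts and updates are enqueued, never applied directly.
    void insertOrUpdate(String rowid) {
        queue.addLast(rowid);
    }

    // Deletes bypass the queue so a deleted ROWID is never returned; any pending
    // change for that row is discarded as well (a simplification).
    void delete(String rowid) {
        queue.removeIf(rowid::equals);
        index.remove(rowid);
    }

    // In OnLine mode the AQ callback applies pending changes right after commit.
    void commit() {
        if (mode == SyncMode.ONLINE) {
            sync();
        }
    }

    // Plays the role of LuceneDomainIndex.sync: drains the queue into the index.
    void sync() {
        while (!queue.isEmpty()) {
            index.add(queue.pollFirst());
        }
    }

    Set<String> searchable() {
        return new TreeSet<>(index);
    }

    int pending() {
        return queue.size();
    }
}
```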
<h3>A.4.2 Updater, Searcher</h3>
<p>Lucene Domain Index can be configured to start several parallel shared processes that perform read and write operations on LDI storage on behalf of the connected user session. You can configure multiple searcher processes, selected at random, using the syntax host1@port1,host2@port2, and one updater process using the same syntax. By default both parameters are set to local, which means that no parallel shared servers are used. Two parallel servers are configured and started during database startup: a searcher process listening at SYS_CONTEXT('USERENV','SERVER_HOST')@1099, which is usually localhost@1099, and an updater process at localhost@1098. You can register additional searcher/updater processes by editing the properties db.searcher.job/db.searcher.port and db.updater.job/db.updater.port in the build.xml file and calling the targets create-searcher-job and create-updater-job respectively.<br />
Updater and Searcher processes can be stopped and started using the Ant targets disable-jobs and enable-jobs.</p>
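<p>The host1@port1,host2@port2 syntax and the random choice among searchers can be sketched in Java as follows. This is an illustration under assumptions: the class and method names are hypothetical, and treating the literal value local as "no shared server" follows the description above rather than LDI source code.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical parser for Searcher/Updater values such as "localhost@1099,db2@1099".
class SharedServerConfig {
    static final class Endpoint {
        final String host;
        final int port;

        Endpoint(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + "@" + port;
        }
    }

    // Returns an empty list for "local", meaning no parallel shared server is used.
    static List<Endpoint> parse(String value) {
        List<Endpoint> endpoints = new ArrayList<>();
        if (value == null || value.trim().equals("local")) {
            return endpoints;
        }
        for (String item : value.split(",")) {
            String[] parts = item.trim().split("@");
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected host@port: " + item);
            }
            endpoints.add(new Endpoint(parts[0], Integer.parseInt(parts[1])));
        }
        return endpoints;
    }

    // Picks one searcher at random, as described for multiple searcher processes.
    static Endpoint pickSearcher(List<Endpoint> searchers, Random random) {
        return searchers.get(random.nextInt(searchers.size()));
    }

    public static void main(String[] args) {
        List<Endpoint> searchers = parse("localhost@1099,db2@1099");
        System.out.println(searchers);                    // [localhost@1099, db2@1099]
        System.out.println(pickSearcher(searchers, new Random()));
        System.out.println(parse("local").isEmpty());     // true
    }
}
```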
<h3>A.4.3 LobStorageParameters</h3>
<p>Lucene Domain Index uses a BLOB column named &#8220;data&#8221; to store the Lucene inverted index files. With this parameter you can control any LOB storage parameter at index creation time. Its default value is &#8216;LobStorageParameters:PCTVERSION 0 ENABLE STORAGE IN ROW CACHE READS NOLOGGING&#8217;. On 11g databases you can get better optimized storage with the newer SecureFiles LOB parameters, for example: &#8216;LobStorageParameters:PCTVERSION 0 ENABLE STORAGE IN ROW CHUNK 32768 CACHE READS FILESYSTEM_LIKE_LOGGING&#8217;</p>
<h3>A.4.4 LogLevel</h3>
<p>Lucene Domain Index uses the JDK java.util.logging package. The LogLevel parameter accepts any string understood by the Level.parse() method, for example: LogLevel:ALL. The default logging level is WARNING.<br />
Lucene Domain Index uses these levels:</p>
<ul>
<li>SEVERE for non-recoverable error conditions</li>
<li>FINER for debugging purposes, such as ODCI API arguments</li>
<li>INFO for checking index operations, such as the values being indexed</li>
<li>WARNING for error messages that are reported as ERROR through the ODCI API</li>
<li>CONFIG for parameters changed by users</li>
</ul>
<p>Logging information is sent by default to Oracle .trc files, but you can redirect this output, for example with the dbms_java.set_output procedure.<br />
If you are not sure which fields are added to the index, or how, change LogLevel to INFO and look for lines starting with: &#8220;INFO: Document&lt;&#8221;<br />
The exiting and throwing methods do not print messages even when the log level is set to ALL, because they log at FINER and their output is also filtered by the ConsoleHandler level, which is INFO in the default JDK configuration.<br />
To make these methods work, copy the logging.properties file from your JAVA_HOME/jre/lib directory to the ORACLE_HOME/javavm/lib directory and edit the line that sets the ConsoleHandler level:</p>
<p><pre class="brush: java;">
# Print all messages on the console (the JDK default limits the console to INFO and above).
java.util.logging.ConsoleHandler.level = ALL
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter
</pre></p>
<p>Then shut down and restart your Oracle database.</p>
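<p>The effect of the handler level can be reproduced with plain java.util.logging outside the database. In this sketch the logger name and the capturing handler are hypothetical stand-ins for the LDI logger and the ConsoleHandler. The logger level is ALL, yet the record produced by Logger.exiting(), which is logged at FINER with the message RETURN, reaches the output only when the handler level is lowered as well.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class HandlerLevelDemo {
    // Collects published records; plays the role of ConsoleHandler in this demo.
    static final class CapturingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {  // applies this handler's own level
                messages.add(record.getLevel() + ": " + record.getMessage());
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    // Logs one INFO message and one exiting() call, returning what the handler kept.
    static List<String> run(String handlerLevel) {
        Logger logger = Logger.getLogger("ldi.demo." + handlerLevel);  // hypothetical name
        logger.setUseParentHandlers(false);
        logger.setLevel(Level.parse("ALL"));  // like LogLevel:ALL
        CapturingHandler handler = new CapturingHandler();
        handler.setLevel(Level.parse(handlerLevel));
        logger.addHandler(handler);
        logger.info("indexing row");
        logger.exiting("Demo", "sync");  // logged at FINER with message "RETURN"
        logger.removeHandler(handler);
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println(run("INFO")); // [INFO: indexing row]
        System.out.println(run("ALL"));  // [INFO: indexing row, FINER: RETURN]
    }
}
```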
<h3>A.4.5 CachedRowIdSize</h3>
<p>CachedRowIdSize sets the size of the LRU cache that maintains the association between a Lucene doc ID and an Oracle ROWID. For very big tables, using an array to store this association can consume a lot of SGA RAM, so starting with Lucene Domain Index 2.9.0.1.0 only 10,000 ROWIDs are stored in this cache. Tables with a high frequency of updates can keep this LRU small, because every update causes the LRU to be completely flushed, while tables with a low frequency of updates/deletes can get a large performance improvement from a bigger LRU cache.</p>
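<p>A common way to build such a bounded LRU map in Java is LinkedHashMap in access order combined with removeEldestEntry. The sketch below is illustrative only: the class name is hypothetical, its capacity plays the role of CachedRowIdSize, the ROWID strings are made up, and onIndexUpdated() models the full flush described above.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded doc ID -> ROWID cache; capacity plays the role of CachedRowIdSize.
class RowIdLruCache extends LinkedHashMap<Integer, String> {
    private final int capacity;

    RowIdLruCache(int capacity) {
        super(16, 0.75f, true);  // accessOrder = true gives least-recently-used ordering
        this.capacity = capacity;
    }

    // Evicts the least recently used entry once the cache grows past its capacity.
    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
        return size() > capacity;
    }

    // As described above, an update flushes the whole cache.
    void onIndexUpdated() {
        clear();
    }

    public static void main(String[] args) {
        RowIdLruCache cache = new RowIdLruCache(2);
        cache.put(0, "AAAR1");
        cache.put(1, "AAAR2");
        cache.get(0);                        // doc 0 becomes most recently used
        cache.put(2, "AAAR3");               // evicts doc 1, the least recently used
        System.out.println(cache.keySet());  // [0, 2]
    }
}
```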
<h3>A.4.6 BatchCount, IndexOnRam and ParallelDegree</h3>
<p>These three parameters control parallel index operations (inserts) when OnLine mode is enabled. ParallelDegree defines how many slave index storages are created to hold temporary parallel index operations when new rows are inserted or the index is created or rebuilt. During index creation or rebuild, BatchCount defines how many rows are processed in a batch, in parallel with other sets of rows. IndexOnRam defines whether each new set of rows is indexed in a temporary index in RAM or on disk: prior to Lucene Domain Index 2.9.2.1.0, a batch of new rows was processed in a temporary index stored on disk, while IndexOnRam:true tells Lucene Domain Index to index the new rows in RAM and finally merge them into the main index stored on disk.</p>
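<p>The batch-then-merge idea can be sketched with a Java ExecutorService. This is a simplified model, not LDI's AQ-based implementation: batchCount and parallelDegree mirror BatchCount and ParallelDegree, each worker builds a temporary "slave" index (a list standing in for a RAM index, as with IndexOnRam:true), and the temporary results are merged into the master index at the end.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simplified batch-then-merge model; not LDI's AQ-based implementation.
class ParallelBatchIndexer {
    // Splits rows into batches of batchCount and indexes them on parallelDegree threads.
    static List<String> indexAll(List<String> rows, int batchCount, int parallelDegree) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelDegree);
        try {
            List<Future<List<String>>> slaves = new ArrayList<>();
            for (int start = 0; start < rows.size(); start += batchCount) {
                List<String> batch = rows.subList(start, Math.min(start + batchCount, rows.size()));
                // Each task builds a temporary slave index for its own batch.
                slaves.add(pool.submit(() -> {
                    List<String> tempIndex = new ArrayList<>();
                    for (String row : batch) {
                        tempIndex.add(row.toLowerCase(Locale.ROOT));  // stand-in for analysis
                    }
                    return tempIndex;
                }));
            }
            // Merge each temporary index into the master index, in batch order.
            List<String> master = new ArrayList<>();
            for (Future<List<String>> slave : slaves) {
                master.addAll(slave.get());
            }
            return master;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel indexing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("Alpha", "Beta", "Gamma", "Delta", "Epsilon");
        System.out.println(indexAll(rows, 2, 3)); // [alpha, beta, gamma, delta, epsilon]
    }
}
```

<p>Merging in batch order keeps the sketch deterministic; a real index merge does not need to preserve insertion order.</p>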
<h2>A.5 Query parameters</h2>
<p>These parameters affect QueryParser and search functionality.</p>
<h3>A.5.1 DefaultColumn</h3>
<p>DefaultColumn defines which column is used as the default field in <a href="http://lucene.apache.org/java/2_3_2/queryparsersyntax.html">QueryParser syntax</a>; this name is a Lucene Field name. If the parameter is not set, the master column of the index is used. Here is an example:</p>
<p><pre class="brush: sql;">
create index pages_lidx_all on pages p (value(p))
  indextype is Lucene.LuceneIndex
  parameters('PopulateIndex:false;
    DefaultColumn:text;
    SyncMode:Deferred;
    LogLevel:WARNING;
    Analyzer:org.apache.lucene.analysis.SpanishWikipediaAnalyzer;
    ExtraCols:extractValue(object_value,''/page/title'') &quot;title&quot;, extractValue(object_value,''/page/revision/comment'') &quot;comment&quot;, extract(object_value,''/page/revision/text/text()'') &quot;text&quot;, extractValue(object_value,''/page/revision/timestamp'') &quot;revisionDate&quot;;
    FormatCols:revisionDate(day);
    IncludeMasterColumn:false;
    LobStorageParameters:PCTVERSION 0 ENABLE STORAGE IN ROW CHUNK 32768 CACHE READS FILESYSTEM_LIKE_LOGGING');
</pre></p>
<p>Note the correlation between DefaultColumn and ExtraCols: ExtraCols defines a Lucene Field named &#8220;text&#8221; whose value is computed by the SQL expression extract(object_value,'/page/revision/text/text()'), so the Lucene Field text can be used as the default Field in QueryParser syntax.</p>
<h3>A.5.2 DefaultOperator</h3>
<p>DefaultOperator defines the Boolean operator that QueryParser places between terms when none is given explicitly; if this parameter is not set, the default is OR.</p>
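<p>To illustrate how DefaultColumn and DefaultOperator interact, this toy Java function rewrites a simple whitespace-separated query into fully qualified Lucene query syntax. It is a hypothetical helper, not the QueryParser itself, and it ignores phrases, parentheses and explicit operators. Unqualified terms receive the default field, and with AND as the default operator every clause becomes required (+).</p>

```java
import java.util.StringJoiner;

class DefaultFieldExpander {
    // Rewrites "title:foo bar" using a default field and a default operator (OR or AND).
    static String expand(String query, String defaultColumn, String defaultOperator) {
        StringJoiner out = new StringJoiner(" ");
        for (String term : query.trim().split("\\s+")) {
            // A term without "field:" goes to the default column.
            String qualified = term.contains(":") ? term : defaultColumn + ":" + term;
            // With AND every clause is required; with OR clauses stay optional.
            out.add("AND".equals(defaultOperator) ? "+" + qualified : qualified);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("title:lucene oracle", "text", "OR"));  // title:lucene text:oracle
        System.out.println(expand("title:lucene oracle", "text", "AND")); // +title:lucene +text:oracle
    }
}
```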
<h3>A.5.3 NormalizeScore</h3>
<p>NormalizeScore tells the Lucene index scan whether it needs to track the maximum score; the maximum score is then used to normalize the result of the lscore() operator so that it returns only values between 0 and 1. If you don&#8217;t need a normalized score range, you can avoid this computation and your query will be faster. Note that a non-normalized score does not mean that the documents are not in order of relevance.</p>
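<p>The normalization step amounts to tracking the maximum score while collecting hits and dividing every score by it. A minimal Java sketch (hypothetical names, not the LDI hit collector) shows that the division maps scores into the range 0 to 1 without changing their ranking, which is why skipping it still leaves results in relevance order.</p>

```java
import java.util.Arrays;

class ScoreNormalizer {
    // Divides each raw score by the maximum so the best hit scores exactly 1.0.
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);  // the extra bookkeeping NormalizeScore requires
        }
        float[] normalized = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            normalized[i] = max > 0f ? rawScores[i] / max : 0f;
        }
        return normalized;
    }

    public static void main(String[] args) {
        float[] raw = {2.0f, 0.5f, 1.0f};
        System.out.println(Arrays.toString(normalize(raw))); // [1.0, 0.25, 0.5]
    }
}
```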
<h3>A.5.4 PreserveDocIdOrder</h3>
<p>PreserveDocIdOrder is an internal parameter used by Lucene in some operators. If you don&#8217;t need results to preserve Lucene doc ID order rather than relevance order, leave this value at false (the default) and some operators will be faster.</p>
<h3>A.5.5 RewriteScore and SimilarityMethod</h3>
<p>RewriteScore (true or false) and SimilarityMethod (a fully qualified class name) are used when a query uses the wildcard operator (*); these parameters produce better recall values, for example:</p>
<p><pre class="brush: sql;">
create table t1 (f1 number primary key, f2 varchar2(2000), f3 number(5,3));
insert into t1 values (1, 'Cefaleias', 1);
insert into t1 values (2, 'Cefaleia', 1);
insert into t1 values (3, 'Cefaleia em salva', 0.625);
insert into t1 values (4, 'Cefaleias de tensão', 0.625);
insert into t1 values (5, 'Cefaleias / enxaquecas', 0.625);
insert into t1 values (6, 'Desproporção céfalo-pélvica', 0.5);
insert into t1 values (7, 'Deformidade por redução cefálica congénita', 15.87);
insert into t1 values (8, 'Intoxicação por antibióticos do grupo das cefalosporinas', 0.5);
commit;

create index it1 on t1(f2)
  indextype is lucene.luceneindex 
  parameters('LogLevel:ALL;
    Analyzer:org.apache.lucene.analysis.PortugueseAnalyzer;
    FormatCols:F3(00.000);
    ExtraCols:F3;
    RewriteScore:true;
    SimilarityMethod:org.apache.lucene.search.WildcardSimilarity');

select /*+ DOMAIN_INDEX_SORT */ lscore(1) f1, f2 from t1
  where lcontains(f2, 'cefa cefa*',1) &gt; 0;

 F1 F2 
 1 Cefaleias 
 1 Cefaleia 
 0.625 Cefaleia em salva 
 0.625 Cefaleias de tensão 
 0.625 Cefaleias / enxaquecas 
 0.5 Desproporção céfalo-pélvica 
 0.5 Deformidade por redução cefálica congénita 
 0.5 Intoxicação por antibióticos do grupo das cefalosporinas 
 8 rows selected
</pre></p>
<p><pre class="brush: sql;">
alter index it1 
  parameters('LogLevel:ALL;
    SimilarityMethod:org.apache.lucene.search.DefaultSimilarity');

select /*+ DOMAIN_INDEX_SORT */ lscore(1) f1,f2 from t1
  where lcontains(f2, 'cefa cefa*',1) &gt; 0;

 F1 F2 
 0.3539437353610992431640625 Intoxicação por antibióticos do grupo das cefalosporinas 
 0.12431289255619049072265625 Cefaleias 
 0.12431289255619049072265625 Cefaleia 
 0.077695555984973907470703125 Cefaleia em salva 
 0.077695555984973907470703125 Cefaleias de tensão 
 0.077695555984973907470703125 Cefaleias / enxaquecas 
 0.062156446278095245361328125 Desproporção céfalo-pélvica 
 0.062156446278095245361328125 Deformidade por redução cefálica congénita 
 8 rows selected
</pre></p>
<p><pre class="brush: sql;">
alter index it1
  parameters('LogLevel:ALL;
    RewriteScore:false');

select /*+ DOMAIN_INDEX_SORT */ lscore(1) f1, f2 from t1
  where lcontains(f2, 'cefa cefa*',1) &gt; 0;

 F1 F2 
 0.15442870557308197021484375 Cefaleias
 0.15442870557308197021484375 Cefaleia
 0.15442870557308197021484375 Cefaleia em salva 
 0.15442870557308197021484375 Cefaleias de tensão 
 0.15442870557308197021484375 Cefaleias / enxaquecas 
 0.15442870557308197021484375 Desproporção céfalo-pélvica 
 0.15442870557308197021484375 Deformidade por redução cefálica congénita 
 0.15442870557308197021484375 Intoxicação por antibióticos do grupo das cefalosporinas
 8 rows selected
</pre></p>
<h2>A.6 Highlight parameters</h2>
<p>These parameters affect the lhighlight, phighlight and rhighlight functions.</p>
<h3>A.6.1 Formatter</h3>
<p>Formatter defines the name of a class that implements the Lucene Formatter interface and has a no-argument constructor; the default value is org.apache.lucene.search.highlight.SimpleHTMLFormatter.</p>
<h3>A.6.2 MaxNumFragmentsRequired</h3>
<p>MaxNumFragmentsRequired defines the number of text fragments returned by the highlight functions; the default value is 4.</p>
<h3>A.6.3 FragmentSize</h3>
<p>FragmentSize defines the size, in characters, of each returned fragment; the default value is 100.</p>
<h3>A.6.4 FragmentSeparator</h3>
<p>FragmentSeparator defines the String used as the fragment separator; the default value is &#8220;&#8230;&#8221;. Note that you cannot use &#8220;;&#8221; or &#8220;:&#8221; as the fragment separator, because they are used as parameter and value delimiters in the alter index &#8230; parameters(..) DDL statement.</p>
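<p>The restriction follows from how a parameters('...') string is split. The hypothetical Java parser below assumes a simple scheme: it splits on ";" into entries, then on the first ":" into name and value. That is enough to show why a ";" used as FragmentSeparator breaks the parameter list. The real LDI parser may differ; in particular, the text above also rules out ":", which this sketch would tolerate inside a value.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IndexParameterParser {
    // Hypothetical parser: entries separated by ';', name and value by the first ':'.
    static Map<String, String> parse(String parameters) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : parameters.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("missing ':' in entry: " + trimmed);
            }
            result.put(trimmed.substring(0, colon), trimmed.substring(colon + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("FragmentSize:100;FragmentSeparator:..."));
        // {FragmentSize=100, FragmentSeparator=...}
        System.out.println(parse("FragmentSeparator:;FragmentSize:100"));
        // {FragmentSeparator=, FragmentSize=100}  (the intended ';' separator is lost)
    }
}
```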
<h2>Doc Links</h2>
<p><a href="http://ludoix.wordpress.com/2011/03/11/ldi-docs-4-locking-and-performance/">Previous / LDI Docs – 4 Locking and Performance</a><br />
<a href="http://ludoix.wordpress.com/2011/03/11/ldi-docs-appendix-b-lucene-domain-index-storage/">Next / LDI Docs – Appendix B (Lucene Domain Index Storage)</a></p>
<br />Filed under: <a href='http://ludoix.wordpress.com/category/documentation/'>Documentation</a>]]></content:encoded>
			<wfw:commentRss>http://ludoix.wordpress.com/2011/03/11/ldi-docs-appendix-a-parameter-reference-and-syntax/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/3690228eaee545ab583565df1e16b64a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ludoix</media:title>
		</media:content>
	</item>
		<item>
		<title>LDI Docs &#8211; 4 Locking and Performance</title>
		<link>http://ludoix.wordpress.com/2011/03/11/ldi-docs-4-locking-and-performance/</link>
		<comments>http://ludoix.wordpress.com/2011/03/11/ldi-docs-4-locking-and-performance/#comments</comments>
		<pubDate>Fri, 11 Mar 2011 17:37:32 +0000</pubDate>
		<dc:creator>ludoix</dc:creator>
				<category><![CDATA[Documentation]]></category>

		<guid isPermaLink="false">http://ludoix.wordpress.com/?p=67</guid>
		<description><![CDATA[4. Locking and Performance 4.1 Locking used by Lucene Domain Index Operation Base Table (row/table) Index Table (SCHEMA.IDX$T) Queue Table (SCHEMA.IDX$QT) Insert X/RX (1) NONE NONE Update X/RX NONE NONE Delete X/RX NONE NONE Manually Sync X/RS (2) X/T&#124;X/RX (3) DBMS_AQ.BLOCKED (4) Automatically Sync X/RS (2) X/T&#124;X/RX (3) DBMS_AQ.BLOCKED (4) Optimize NONE X/T&#124;X/RX (3) NONE [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=ludoix.wordpress.com&amp;blog=20873609&amp;post=67&amp;subd=ludoix&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<h1>4. Locking and Performance</h1>
<h2>4.1 Locking used by Lucene Domain Index</h2>
<table>
<tr>
<th> Operation </th>
<th> Base Table (row/table)  </th>
<th> Index Table (SCHEMA.IDX$T)  </th>
<th> Queue Table (SCHEMA.IDX$QT) </th>
</tr>
<tr>
<td> Insert </td>
<td> X/RX (1) </td>
<td> NONE </td>
<td> NONE </td>
</tr>
<tr>
<td> Update </td>
<td> X/RX </td>
<td> 	NONE </td>
<td> NONE </td>
</tr>
<tr>
<td> Delete </td>
<td> X/RX </td>
<td> NONE </td>
<td> NONE </td>
</tr>
<tr>
<td> Manually Sync </td>
<td> X/RS (2) </td>
<td> X/T|X/RX (3) </td>
<td> DBMS_AQ.BLOCKED (4) </td>
</tr>
<tr>
<td> Automatically Sync </td>
<td> X/RS (2) </td>
<td> X/T|X/RX (3) </td>
<td> DBMS_AQ.BLOCKED (4) </td>
</tr>
<tr>
<td> Optimize </td>
<td> NONE </td>
<td> X/T|X/RX (3) </td>
<td> NONE </td>
</tr>
</table>
<ol>
<li>X = row exclusive lock on the row being inserted, RX = table row exclusive lock (if the index parameter LockMasterTable:true is set). </li>
<li>X = row exclusive lock on the row being indexed, RS = table row share lock. A SELECT &#8230; FOR UPDATE NOWAIT is performed on all rows being added to the Lucene index. </li>
<li>X/T is the writer lock semaphore of the Lucene index, which serializes write operations; the write lock is acquired by locking the table SCHEMA.IDX$T. X/RX is taken on many rows of this table because Lucene creates and deletes many files. </li>
<li>To perform massive dequeue operations on the DBMS_AQ queue, Sync scans this queue with the DBMS_AQ.BLOCKED option. </li>
</ol>
<h2>4.2 Performance tips</h2>
<h3>4.2.1 Index Writer parameters</h3>
<p>The Lucene IndexWriter class uses several parameters to control its index structure, and Lucene Domain Index passes several of them to IndexWriter, such as MergeFactor and MaxBufferedDocs. As a best practice, if you want to index thousands of rows you can override the default Lucene parameters with values that speed up indexing. With create index or alter index rebuild you can set MergeFactor to 100 and MaxBufferedDocs to 4000. These values increase indexing performance, but DML operations on the base table then process small sets of rows, so after the DDL commands change MergeFactor to 2 and MaxBufferedDocs to 100. A good place to start learning about these parameters is the Wiki page <a href="http://wiki.apache.org/lucene-java/ImproveIndexingSpeed">Improving Indexing Speed</a>.</p>
<h3>4.2.2 Auto Tune Memory functionality</h3>
<p>Lucene Domain Index has a parameter called AutoTuneMemory. A true value means that, for Index Writer operations, it tries to use up to 50% of the Java Pool Size configured in the Oracle SGA to adjust how many documents are buffered (MaxBufferedDocs) before calling IndexWriter.flush().<br />
With AutoTuneMemory:true, MaxBufferedDocs and MaxMergeDocs are not required because they are calculated from the free RAM in the SGA, but you still have to set MergeFactor. Because the Java Pool Size is a global parameter, this rule does not hold if you create many indexes over parallel connections: each of two connections will try to use 50% of the Java pool, so one of them can run out of memory.</p>
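<p>The arithmetic behind this caveat can be sketched in Java. The formula is an assumption made for illustration, since the real AutoTuneMemory calculation is not documented here. Each session budgets 50% of the Java pool for buffered documents and divides that budget by an estimated average document size. Summing the budgets of concurrent sessions shows that two parallel index builds already claim the entire pool.</p>

```java
// Hypothetical model of the AutoTuneMemory budget; not LDI's actual formula.
class AutoTuneMemorySketch {
    static final double POOL_FRACTION = 0.5;  // "up to 50% of the Java Pool Size"

    // Estimated MaxBufferedDocs for one session, given an average document size.
    static long maxBufferedDocs(long javaPoolBytes, long avgDocBytes) {
        return (long) (javaPoolBytes * POOL_FRACTION) / avgDocBytes;
    }

    // Total fraction of the Java pool claimed when several sessions auto-tune at once.
    static double claimedFraction(int concurrentSessions) {
        return concurrentSessions * POOL_FRACTION;
    }

    public static void main(String[] args) {
        long javaPool = 128L * 1024 * 1024;                   // a 128 MB Java pool
        System.out.println(maxBufferedDocs(javaPool, 4096));  // 16384
        System.out.println(claimedFraction(2));               // 1.0, the whole pool
    }
}
```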
<h3>4.2.3 Keep Index on RAM</h3>
<p>OJVMDirectory replaces Lucene&#8217;s file system storage with table storage based on BLOBs. For every Lucene Domain Index created there is a new table that stores each Lucene file as a row with a BLOB column (see section 6 for more detail). Using a strategy similar to <a href="http://www.oracle.com/technology/products/text/htdocs/mem_load.html">Oracle Text</a>, you can keep this table in RAM. Unlike Oracle Text, which uses multiple tables to store the inverted index, Lucene Domain Index uses one table. Execute these DDL commands to keep the Lucene index in RAM:</p>
<p><pre class="brush: sql;">
create index source_small_lidx on test_source_small(text)
indextype is lucene.LuceneIndex parameters(
  'FormatCols:line(0000); ExtraCols:line &quot;line&quot;; Analyzer:org.apache.lucene.analysis.StopAnalyzer; MergeFactor:500');
alter index source_small_lidx parameters('MergeFactor:100');
alter table source_small_lidx$t storage (buffer_pool keep) modify lob (data) (storage (buffer_pool keep));
</pre></p>
<p>During index creation use AutoTuneMemory:true (the default value). Then change the OJVMDirectory storage table and LOB to keep them in RAM, and make sure your SGA has enough RAM to hold them. To find out how big your index is, query the table:</p>
<p><pre class="brush: sql;">
SQL&gt; select sum(file_size) from source_small_lidx$t where deleted='N';
SUM(FILE_SIZE)
--------------
        147444
</pre></p>
<p>Finally, as Tom Kyte says: tkprof, tkprof, &#8230; ;) You can see Lucene Domain Index I/O operations by enabling SQL trace with alter session set events '10046 trace name context forever, level 12'; then look for operations on the Lucene Domain Index table SCHEMA.IDX_NAME$T. Using the TKPROF output you can tune the table and LOB storage parameters manually.</p>
<h3>4.2.4 Compare your execution plan</h3>
<p>To make sure your Lucene Domain Index is used properly, compare your execution plans and try to avoid unnecessary filter by or order by steps by using in-line sort or multi-field Query Parser conditions. Here are sorting examples using the emails table created in section 3.1.4:</p>
<p><pre class="brush: sql;">
SQL&gt; explain plan for
  2  SELECT subject FROM emails where lcontains(bodytext,'security',1)&gt;0
  3  order by subject ASC;
</pre></p>
<pre>
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
Plan hash value: 1542204867
Id   Operation                    Name           Rows  Bytes  Cost (%CPU)  Time
0    SELECT STATEMENT                            1     4016   3 (34)       00:00:01
1    SORT ORDER BY                               1     4016   3 (34)       00:00:01
2    TABLE ACCESS BY INDEX ROWID  EMAILS         1     4016   2  (0)       00:00:01
* 3  DOMAIN INDEX                 EMAILBODYTEXT

Predicate Information (identified by operation id):
-----------------------------------------------------------------------------------
3 - access("LUCENE"."LCONTAINS"("BODYTEXT",'security',1)&gt;0)
</pre>
<p>The above execution plan shows that Lucene Domain Index is used, but you can get a better plan by sorting inside lcontains:</p>
<p><pre class="brush: sql;">
SQL&gt; explain plan for
  2  SELECT /*+ DOMAIN_INDEX_SORT */ subject FROM emails
  3  where lcontains(bodytext,'security','subject:ASC',1)&gt;0;
</pre></p>
<pre>
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
Plan hash value: 1450245214
Id   Operation                    Name           Rows  Bytes  Cost (%CPU)  Time
0    SELECT STATEMENT                            1     4016   2 (0)        00:00:01
1    TABLE ACCESS BY INDEX ROWID  EMAILS         1     4016   2 (0)        00:00:01
* 2  DOMAIN INDEX                 EMAILBODYTEXT

Predicate Information (identified by operation id):
-----------------------------------------------------------------------------------
2 - access("LUCENE"."LCONTAINS"("BODYTEXT",'security','subject:ASC',1)&gt;0)
</pre>
<p>Here the SORT ORDER BY step is gone and the cost drops from 3 to 2.</p>
<h3>4.2.5 Filtering and sorting at index level</h3>
<p>This functionality, available in Oracle Text only since <a href="http://download.oracle.com/docs/cd/E11882_01/text.112/e10944/csql.htm#CCREF0105">Oracle 11g</a>, works with Lucene Domain Index on both 10g and 11g databases, including Standard Edition. The performance improvement comes when most of the rows can be filtered and sorted at index level; to get it, you have to push the values of the column(s) involved in the filter by or order by into the index at creation time. The syntax differs from Oracle Text but the performance improvement is similar. Let&#8217;s see an example:</p>
<p><pre class="brush: sql;">
-- Oracle Text 11g syntax
create index source_big_idx on test_source_big(text) indextype is ctxsys.context
  filter by line
  order by line;
-- Lucene Domain Index syntax
create index source_big_lidx on test_source_big(text) indextype is lucene.luceneindex parameters(
  'PerFieldAnalyzer:line(org.apache.lucene.analysis.KeywordAnalyzer),TEXT(org.apache.lucene.analysis.SimpleAnalyzer);
  FormatCols:line(0000);
  ExtraCols:line &quot;line&quot;');
</pre></p>
<p>Note that in both cases we chose line as the filter/order by column. Now let&#8217;s look at the execution plan and autotrace output for an equivalent query using the 11g syntax and the Lucene Domain Index syntax.</p>
<p><a href="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_575ghw55ps9_b.png"><img src="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_575ghw55ps9_b.png?w=300&#038;h=120" alt="" title="ddgw7sjp_575ghw55ps9_b" width="300" height="120" class="aligncenter size-medium wp-image-132" /></a></p>
<p><a href="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_576cdhjshc7_b.png"><img src="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_576cdhjshc7_b.png?w=300&#038;h=150" alt="" title="ddgw7sjp_576cdhjshc7_b" width="300" height="150" class="aligncenter size-medium wp-image-130" /></a></p>
<p><a href="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_577g89chnfv_b.png"><img src="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_577g89chnfv_b.png?w=300&#038;h=72" alt="" title="ddgw7sjp_577g89chnfv_b" width="300" height="72" class="aligncenter size-medium wp-image-131" /></a></p>
<p><a href="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_578cbx9m5gq_b.png"><img src="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_578cbx9m5gq_b.png?w=300&#038;h=145" alt="" title="ddgw7sjp_578cbx9m5gq_b" width="300" height="145" class="aligncenter size-medium wp-image-129" /></a></p>
<p>To see the real performance impact of filtering at index level, compare the elapsed time of two equivalent queries:</p>
<p><pre class="brush: sql;">
select count(line) from test_source_big
  where lcontains(text,'varchar2 AND line:[2600 TO 9000]')&gt;0;

COUNT(LINE)
-----------
587
Elapsed: 00:00:00.03

select count(line) from test_source_big
  where lcontains(text,'varchar2')&gt;0 and line&gt;=2600;

COUNT(LINE)
-----------
587
Elapsed: 00:00:00.89
</pre></p>
<p>The point is that in the first example Lucene Domain Index performs both operations:</p>
<ol>
<li>find all rows that contain the word varchar2</li>
<li>keep only the rows whose line value is in the range 2600 to 9000</li>
</ol>
<p>and returns only the rows (587) that match both conditions. In the second example the RDBMS has to:</p>
<ol>
<li>look up the ROWIDs of the rows containing the word varchar2 (19963 rows),</li>
<li>visit each of those rows to read the value of the line column and keep those with line &gt;= 2600.</li>
</ol>
<p>The difference in the number of rows visited by the RDBMS explains the difference in performance.</p>
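<p>The two strategies can be compared with a small Java model that uses hypothetical data and names. Because FormatCols:line(0000) stores line as a zero-padded term, the index can answer line:[2600 TO 9000] with a lexicographic range check and hand back only the final ROWIDs. The post-filter strategy, by contrast, returns every text match and must visit each row to read line.</p>

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Toy comparison of filtering inside the index versus filtering in the RDBMS.
class IndexLevelFilterDemo {
    // Zero-padded term, as produced by FormatCols:line(0000).
    static String lineTerm(int line) {
        DecimalFormat format = new DecimalFormat("0000", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(line);
    }

    // Index filter: padded terms are range-checked inside the index, so only rows
    // within [lower, upper] are returned to the RDBMS as ROWIDs.
    static int rowidsReturnedByIndexFilter(int[] matchLines, int lower, int upper) {
        String lo = lineTerm(lower);
        String hi = lineTerm(upper);
        int returned = 0;
        for (int line : matchLines) {
            String term = lineTerm(line);
            if (term.compareTo(lo) >= 0 && term.compareTo(hi) <= 0) {
                returned++;
            }
        }
        return returned;
    }

    // Post filter: every text match is returned and visited to read line;
    // returns {rows kept, rows visited}.
    static int[] postFilter(int[] matchLines, int minLine) {
        int kept = 0;
        for (int line : matchLines) {
            if (line >= minLine) {
                kept++;
            }
        }
        return new int[] {kept, matchLines.length};
    }

    public static void main(String[] args) {
        int[] matchLines = {12, 480, 2600, 3100, 8999, 150};  // hypothetical lines containing varchar2
        System.out.println(rowidsReturnedByIndexFilter(matchLines, 2600, 9000)); // 3
        int[] post = postFilter(matchLines, 2600);
        System.out.println(post[0] + " kept, " + post[1] + " visited");         // 3 kept, 6 visited
    }
}
```

<p>The padding is what makes the string comparison agree with numeric order; with four digits it holds only for lines up to 9999.</p>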
<h3>4.2.6 OnLine mode, ParallelDegree and IndexOnRam</h3>
<p>Starting with versions 2.9.2.1.1 and 3.0.1.1.0, inserts are performed in parallel if ParallelDegree is greater than 1 and SyncMode:OnLine is set. In that case an AQ slave process creates a temporary Lucene index with the rows being indexed; this index is created in RAM if IndexOnRam:true, or on disk otherwise, and once it contains the batch of added rows it is merged into the master storage. This speeds up massive index additions such as index rebuild, index creation, or insert into .. select .. from DML operations. Parallel index operations matter on servers with multiple cores or in RAC installations, because Oracle AQ starts parallel processes to do the job. An <a href="https://docs.google.com/present/view?id=ddgw7sjp_156gf9hczxv">SQL trace of WikiPedia dump indexing</a> shows that most of the time is spent scanning the table to load the data being indexed, so parallel indexing increases the throughput of Lucene Domain Index on multi-core chips.<br />
The Oracle 11g AQ implementation checks how many milliseconds an AQ callback consumes, so a BatchCount that is too big causes the Oracle AQ engine to start no other slave process. In our experience, BatchCount values in the range 100 to 500 work well to guarantee correct parallel operation. The following screenshot shows multiple AQ processes indexing the WikiPedia dump:</p>
<p><a href="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_574gbjpjhcb_b.png"><img src="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_574gbjpjhcb_b.png?w=300&#038;h=72" alt="" title="ddgw7sjp_574gbjpjhcb_b" width="300" height="72" class="aligncenter size-medium wp-image-126" /></a></p>
<h3>4.2.7 Parallel Shared Slave Index Scan (available since 3.0.2.1.0)</h3>
<p>Starting with version 3.0.2.1.0, a parallel shared slave index scan process is started automatically when the database starts, and it accepts RMI connections from the other Oracle internal processes. When the RDBMS accepts a client connection, a dedicated or shared server process is started to perform the SQL operations, and this process has an internal OJVM associated with it to execute the LDI operations. Prior to 3.0.2.1.0, each OJVM had its own internal Lucene structures to query the inverted index; because each OJVM is isolated from the others, if two concurrent connections executed the same SQL operations on LDI, each process loaded the inverted index structures into RAM and performed the hit collector operation. The new process performs the same operations but is shared by all the OJVM processes, which connect to it using RMI. As a consequence, only the first query loads the inverted index structures into RAM; subsequent queries from the same or different OJVM processes reuse these structures. This architectural change greatly increases the cache hit rate, decreasing the time needed for lcontains operations and reducing the RAM used by LDI. The screenshot below shows this concept in action:</p>
<p><a href="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_581gtrxtcdb_b.png"><img src="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_581gtrxtcdb_b.png?w=300&#038;h=108" alt="" title="ddgw7sjp_581gtrxtcdb_b" width="300" height="108" class="aligncenter size-medium wp-image-127" /></a></p>
<p>There are several processes named oracletest (LOCAL=NO); these are the dedicated processes associated with each client connection. There is one process named ora_j009_test, which is the slave shared server performing the search operations on the Lucene index on behalf of the others. The parallel slave search process is started and stopped automatically by two instance triggers registered in the SYS schema, fired on the after startup and before shutdown events. If you disable these triggers, Lucene Domain Index falls back to the previous behavior, in which every OJVM process has its own memory structures and performs the index scan without dispatching RMI calls.</p>
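<p>The benefit of the shared searcher can be modeled in plain Java: if every session keeps a private cache, each one pays for loading the index structures, while a single shared cache loads them once. In this sketch the RMI layer is omitted, the class names and the load counter are hypothetical, and ConcurrentHashMap.computeIfAbsent stands in for the shared process&#8217;s cache.</p>

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class SharedSearcherModel {
    static final AtomicInteger LOADS = new AtomicInteger();  // counts expensive index loads

    // Stand-in for loading an inverted index structure into RAM.
    static Object loadIndex(String indexName) {
        LOADS.incrementAndGet();
        return new Object();
    }

    // One cache per session (pre-3.0.2.1.0 behavior) or one cache shared by all sessions.
    static int loadsFor(int sessions, String indexName, boolean shared) {
        LOADS.set(0);
        Map<String, Object> sharedCache = new ConcurrentHashMap<>();
        for (int s = 0; s < sessions; s++) {
            Map<String, Object> cache = shared ? sharedCache : new ConcurrentHashMap<String, Object>();
            cache.computeIfAbsent(indexName, SharedSearcherModel::loadIndex);
        }
        return LOADS.get();
    }

    public static void main(String[] args) {
        System.out.println(loadsFor(5, "SOURCE_BIG_LIDX", false)); // 5
        System.out.println(loadsFor(5, "SOURCE_BIG_LIDX", true));  // 1
    }
}
```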
<h2>4.3 Known caveats</h2>
<ol>
<li>Lucene Domain Index uses the Java Util Logging API, and RMI to connect to the search process, so a grant is required to create and operate on LDI indexes, for example:<br />
<pre class="brush: sql;">
grant LUCENEUSER to scott;
</pre></li>
<li>SyncMode:OnLine should be reserved for indexes where the number of update/insert/delete operations is very small compared to select operations, because processing each message requires a background process to open an IndexWriter/IndexReader on the associated Lucene index, except for bulk collect operations or &#8220;insert into &#8230; select &#8230; from&#8221; statements, which are processed in batches of 150 rows. Tables with many insert/update operations per second should use the LuceneDomainIndex.sync(idx) procedure, called periodically by DBMS_JOB or by the application.</li>
<li>Inline pagination syntax is only supported at the beginning of the query: to paginate using lcontains() query syntax, the query must start with &#8220;rownum:[n TO m] AND&#8221; (note that this syntax is case sensitive). Also, the extraction is performed by splitting the query by position, without taking grouping operators into account, so the query &#8220;rownum:[1 TO 10] AND word1 OR word2&#8221; is passed to Lucene&#8217;s Query Parser as &#8220;word1 OR word2&#8221;, which is not semantically equivalent to the original once operator precedence is considered. We may modify the Query Parser class in the future to solve these semantic issues.</li>
<li>Column names in the ExtraCols and FormatCols parameters are case sensitive, following traditional SQL behavior, which means that for this index creation DDL:<br />
<pre class="brush: sql;">
create index it1 on t1(f2)
  indextype is lucene.LuceneIndex
    parameters('Stemmer:English;FormatCols:F2(zzzzzzzzzzzzzzz),F3(00.00);ExtraCols:F3');
</pre><br />
You can use f3 or F3 in ExtraCols, but FormatCols must use F3, because the SQL select performed during the table full scan returns f3 as F3; the Lucene index will also contain documents with a field F3 instead of f3. If you want to use f3 as is, you can rewrite the index creation DDL as:<br />
<pre class="brush: sql;">
create index it1 on t1(f2)
  indextype is lucene.LuceneIndex
    parameters('Stemmer:English;FormatCols:F2(zzzzzzzzzzzzzzz),f3(00.00);ExtraCols:F3 &quot;f3&quot;');
</pre><br />
With this statement, Lucene creates documents with two fields, F2 and f3. F2 is uppercase because it is the master column of the index and is passed as &#8220;F2&#8221; by the ODCI API; since it is the default field of the query, you can omit its name in the lcontains syntax. F3 is now aliased in lowercase and will be indexed as a field named &#8220;f3&#8221;.</li>
<li>Index parameters are pre-cached in memory for faster response. Because Oracle JVM sessions are isolated from each other, if you alter an index or re-create it in another session, you need to close all SQL sessions that have already pre-loaded the index parameters.<br />
You can see the value of any parameter passed to the ODCI API, by either create index or alter index, by calling LuceneDomainIndex.getParameter('owner.index_name','parameter_name'). As an alternative to closing the sessions, you can call the LuceneDomainIndex.refreshParameterCache stored procedure.</li>
<li>If you re-install Lucene Domain Index without dropping the existing indexes first, you can manually drop the resources associated with an old index, for example:<br />
<pre class="brush: sql;">
drop index source_big_lidx force;
Index dropped.
select table_name from tabs;

TABLE_NAME
------------------------------
DEPT
EMP
BONUS
SALGRADE
SOURCE_BIG_LIDX$QT
DR$SOURCE_BIG_IDX$I
DR$SOURCE_BIG_IDX$R
SOURCE_BIG_LIDX$T
TEST_SOURCE_BIG
DR$SOURCE_BIG_IDX$N
DR$SOURCE_BIG_IDX$K

11 rows selected.

drop table SOURCE_BIG_LIDX$T;
Table dropped.

conn / as sysdba
connected.

exec DBMS_AQADM.STOP_QUEUE ('SCOTT.SOURCE_BIG_LIDX$Q');
PL/SQL procedure successfully completed.

exec DBMS_AQADM.DROP_QUEUE ('SCOTT.SOURCE_BIG_LIDX$Q');
PL/SQL procedure successfully completed.

exec DBMS_AQADM.DROP_QUEUE_TABLE(queue_table  =&gt; 'SCOTT.SOURCE_BIG_LIDX$QT', force=&gt;true);
PL/SQL procedure successfully completed.

exit
</pre><br />
Note that &#8220;drop index &#8230; force&#8221; de-registers the Lucene Domain Index from Oracle&#8217;s system views. The index storage table is then dropped manually. Finally, connected as SYS, you stop and drop the index&#8217;s AQ queue together with its queue table.</li>
<li>Oracle 11g has a known bug, &#8220;6445561 &#8211; ORA-00600 [26599] [62] DUE TO INCORRECT PERSISTENCE OF BY INVOKER PIN&#8221;, which affects select count(*) queries with large results; please apply patch p6445561_111060_LINUX.zip, available at Metalink.</li>
<li>Up to Lucene Domain Index 2.9.0 there is a known problem with the WhereCondition parameter when it uses the SQL OR operator; see section A.3.3 for the workaround.</li>
</ol>
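<p>The position-based extraction of the pagination prefix described in the list above can be made concrete with a short sketch. The Java class <code>InlinePagination</code> below is hypothetical and is not LDI&#8217;s actual implementation. It assumes that the prefix is matched literally, and case sensitively, at the very start of the query, and that everything after it is forwarded unchanged to <em>Lucene&#8217;s</em> Query Parser:</p>
<p><pre class="brush: java;">
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not LDI's implementation: strips a leading
// "rownum:[n TO m] AND" prefix by position and forwards the rest unchanged.
final class InlinePagination {
    // Case sensitive on purpose: "ROWNUM:[1 to 10] and ..." is not recognised.
    private static final Pattern PREFIX =
        Pattern.compile("rownum:\\[(\\d+) TO (\\d+)\\] AND\\s+");

    final int from;      // n, or -1 when the query has no pagination prefix
    final int to;        // m, or -1 when the query has no pagination prefix
    final String query;  // remainder handed over to Lucene's Query Parser

    private InlinePagination(int from, int to, String query) {
        this.from = from;
        this.to = to;
        this.query = query;
    }

    static InlinePagination parse(String lcontainsQuery) {
        Matcher m = PREFIX.matcher(lcontainsQuery);
        if (!m.lookingAt()) {             // the prefix must start at position 0
            return new InlinePagination(-1, -1, lcontainsQuery);
        }
        // Grouping and operator precedence are ignored: the text after the
        // prefix is cut out by position, exactly as the limitation describes.
        return new InlinePagination(Integer.parseInt(m.group(1)),
                                    Integer.parseInt(m.group(2)),
                                    lcontainsQuery.substring(m.end()));
    }

    public static void main(String[] args) {
        InlinePagination p = parse("rownum:[1 TO 10] AND word1 OR word2");
        System.out.println(p.from + ".." + p.to + " : " + p.query); // prints "1..10 : word1 OR word2"
    }
}
</pre></p>
<p>Because the remainder is forwarded verbatim, the sketch, like the behaviour described above, never considers how <code>AND</code> and <code>OR</code> would have grouped in the complete query.</p>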
<h2>Doc Links</h2>
<p><a href="http://ludoix.wordpress.com/2011/03/11/ldi-docs-3-procedures-functions-operators-and-examples/">Previous / LDI Docs – 3 Procedures, Functions, Operators and Examples</a><br />
<a href="http://ludoix.wordpress.com/2011/03/11/ldi-docs-appendix-a-parameter-reference-and-syntax/">Next / LDI Docs – Appendix A (Parameter reference and syntax)</a></p>
<br />Filed under: <a href='http://ludoix.wordpress.com/category/documentation/'>Documentation</a>]]></content:encoded>
			<wfw:commentRss>http://ludoix.wordpress.com/2011/03/11/ldi-docs-4-locking-and-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/3690228eaee545ab583565df1e16b64a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ludoix</media:title>
		</media:content>

		<media:content url="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_575ghw55ps9_b.png?w=300" medium="image">
			<media:title type="html">ddgw7sjp_575ghw55ps9_b</media:title>
		</media:content>

		<media:content url="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_576cdhjshc7_b.png?w=300" medium="image">
			<media:title type="html">ddgw7sjp_576cdhjshc7_b</media:title>
		</media:content>

		<media:content url="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_577g89chnfv_b.png?w=300" medium="image">
			<media:title type="html">ddgw7sjp_577g89chnfv_b</media:title>
		</media:content>

		<media:content url="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_578cbx9m5gq_b.png?w=300" medium="image">
			<media:title type="html">ddgw7sjp_578cbx9m5gq_b</media:title>
		</media:content>

		<media:content url="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_574gbjpjhcb_b.png?w=300" medium="image">
			<media:title type="html">ddgw7sjp_574gbjpjhcb_b</media:title>
		</media:content>

		<media:content url="http://ludoix.files.wordpress.com/2011/04/ddgw7sjp_581gtrxtcdb_b.png?w=300" medium="image">
			<media:title type="html">ddgw7sjp_581gtrxtcdb_b</media:title>
		</media:content>
	</item>
		<item>
		<title>LDI Docs &#8211; 3 Procedures, Functions, Operators and Examples</title>
		<link>http://ludoix.wordpress.com/2011/03/11/ldi-docs-3-procedures-functions-operators-and-examples/</link>
		<comments>http://ludoix.wordpress.com/2011/03/11/ldi-docs-3-procedures-functions-operators-and-examples/#comments</comments>
		<pubDate>Fri, 11 Mar 2011 17:34:49 +0000</pubDate>
		<dc:creator>ludoix</dc:creator>
				<category><![CDATA[Documentation]]></category>
		<category><![CDATA[example]]></category>
		<category><![CDATA[export]]></category>
		<category><![CDATA[function]]></category>
		<category><![CDATA[operator]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[procedure]]></category>

		<guid isPermaLink="false">http://ludoix.wordpress.com/?p=47</guid>
		<description><![CDATA[3. Procedures, Functions, Operators and Examples Before you start working through the examples below, grant the LUCENEUSER role to every Oracle user/schema that is meant to run the Lucene Domain Index (LDI). Remember that you must not create an index within the LUCENE user/schema. For example: 3.1 Create a Lucene Domain [...]]]></description>
			<content:encoded><![CDATA[<h1>3. Procedures, Functions, Operators and Examples</h1>
<p>Before you start working through the examples below, grant the <code>LUCENEUSER</code> role to every Oracle user/schema that is meant to run the <em>Lucene Domain Index</em> (<em>LDI</em>). Remember that you must not create an index within the <code>LUCENE</code> user/schema. For example:</p>
<p><pre class="brush: sql;">
  -- connected as sysdba
  grant LUCENEUSER to scott;
</pre></p>
<h2>3.1 Create a Lucene Domain Index</h2>
<h3>3.1.1 Single column index</h3>
<p>The first example creates a domain index on column <code>f2</code> of table <code>t1</code>, using <em>Lucene&#8217;s</em> <code>SimpleAnalyzer</code>, selected with the <code>Analyzer</code> parameter. After execution, a new index, <code>T1.IT1</code>, and two new <em>LDI</em> tables will have been added to the user&#8217;s schema: the index storage table <code>IT1$T</code> and the index queue table (AQ) <code>IT1$QT</code>. Because of these generated objects, <strong>no Lucene Domain Index name may be longer than 21 characters</strong>! This is due to the derived name of the <em>Oracle AQ</em> table.</p>
<p><pre class="brush: sql;">
  create table t1 (
    f1 number,
    f2 varchar2(200),
    f3 varchar2(200),
    f4 number unique);

  create index it1 on t1(f2) indextype is lucene.LuceneIndex
    parameters('Analyzer:org.apache.lucene.analysis.SimpleAnalyzer');
</pre></p>
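<p>The 21-character limit can be checked before running the DDL. The following Java sketch is hypothetical: <code>LdiObjectNames</code> is not part of LDI. It <strong>assumes</strong> that the limit stems from a 24-character cap on <em>Oracle AQ</em> queue table names, since 21 plus the three characters of the <code>$QT</code> suffix gives 24. The sketch derives the storage and queue table names listed above and rejects longer index names:</p>
<p><pre class="brush: java;">
import java.util.Locale;

// Hypothetical helper, not part of LDI. ASSUMPTION: the 21-character rule exists
// because Oracle AQ queue table names are limited to 24 characters and LDI
// appends "$QT" to the index name to form its queue table name.
final class LdiObjectNames {
    static final int MAX_QUEUE_TABLE_NAME = 24;                              // assumed AQ limit
    static final int MAX_INDEX_NAME = MAX_QUEUE_TABLE_NAME - "$QT".length(); // 21

    static String storageTable(String index) {   // index storage table, e.g. IT1$T
        return index.toUpperCase(Locale.ROOT) + "$T";
    }

    static String queueTable(String index) {     // index queue table (AQ), e.g. IT1$QT
        return index.toUpperCase(Locale.ROOT) + "$QT";
    }

    static void check(String index) {
        // Math.max differs from the limit only when the name is longer than it.
        if (Math.max(index.length(), MAX_INDEX_NAME) != MAX_INDEX_NAME) {
            throw new IllegalArgumentException(index + " has " + index.length()
                + " characters; at most " + MAX_INDEX_NAME + " are allowed");
        }
    }

    public static void main(String[] args) {
        check("it1");
        System.out.println(storageTable("it1") + " " + queueTable("it1")); // IT1$T IT1$QT
    }
}
</pre></p>
<p>With this helper, <code>LdiObjectNames.check("source_big_lidx")</code> passes (15 characters), while any 22-character name is rejected up front.</p>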
<p><span id="more-47"></span></p>
<p>Another simple example may employ a stemmer instead of an analyzer. A stemmer is a kind of language-specific analyzer and behaves comparably to <em>Lucene&#8217;s</em> <code>StandardAnalyzer</code>. The <em>Lucene</em> stemming approach used here, the <code>SnowballAnalyzer</code>, is based on the <em>Snowball</em> code stack (<a href="http://snowball.tartarus.org">snowball.tartarus.org</a>). Using a stemmer requires setting the <code>Stemmer</code> parameter. Since a stemmer is essentially an extended analyzer, a <code>Stemmer</code> parameter in a parameter list overrides any <code>Analyzer</code> parameter given as well. See <a href="http://ludoix.wordpress.com/2011/03/11/ldi-docs-appendix-a-parameter-reference-and-syntax">Appendix A</a> of the <em>LDI</em> docs for more information about analyzing and stemming.</p>
<p><pre class="brush: sql;">
  create index it1 on t1(f2) indextype is lucene.LuceneIndex
    parameters('Stemmer:English');
</pre></p>
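<p>The override rule can be illustrated with a small parser. LDI&#8217;s parameter lists are <code>name:value</code> pairs separated by semicolons, as in the examples of this section. The Java class <code>LdiParameters</code> below is a hypothetical sketch, not LDI&#8217;s own parser. It assumes each pair is split at its first colon only, so that values such as fully qualified analyzer class names survive intact:</p>
<p><pre class="brush: java;">
import java.util.Properties;

// Hypothetical sketch, not LDI's parser: reads "name:value;name:value" lists and
// applies the documented rule that a Stemmer parameter overrides an Analyzer one.
final class LdiParameters {
    static Properties parse(String parameters) {
        Properties result = new Properties();
        for (String pair : parameters.split(";")) {
            if (pair.isEmpty()) {
                continue;                                  // tolerate a trailing ';'
            }
            int colon = pair.indexOf(':');                 // split at the first colon only
            if (colon == -1) {
                throw new IllegalArgumentException("missing ':' in \"" + pair + "\"");
            }
            result.setProperty(pair.substring(0, colon), pair.substring(colon + 1));
        }
        return result;
    }

    // Which analysis component would take effect for the given parameters.
    static String effectiveAnalysis(Properties p) {
        String stemmer = p.getProperty("Stemmer");
        if (stemmer != null) {
            return "Stemmer:" + stemmer;                   // Stemmer wins over Analyzer
        }
        return "Analyzer:" + p.getProperty("Analyzer", "(not set)");
    }

    public static void main(String[] args) {
        Properties p = parse("Analyzer:org.apache.lucene.analysis.SimpleAnalyzer;Stemmer:English");
        System.out.println(effectiveAnalysis(p));          // prints "Stemmer:English"
    }
}
</pre></p>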
<h3>3.1.2 Multiple columns</h3>
<p>Extending the previous example, you can also index additional columns of the same base table, together called compound columns. This requires the <code>ExtraCols</code> parameter, which takes the column identifiers, each optionally followed by an alias name. Note that the parameter list provided for a <code>create</code> or <code>alter index</code> <strong>must be a string without any line breaks and without any dispensable whitespace characters</strong>. Here is the code:</p>
<p><pre class="brush: sql;">
  create index it1 on t1(f2) indextype is lucene.LuceneIndex
    parameters('Stemmer:English;ExtraCols:F1 &quot;f1&quot;');
</pre></p>
<p>Note that using an alias name also changes the syntax of field searches with <code>lcontains</code>. That is, the following search will not return any rows:</p>
<p><pre class="brush: sql;">
  select * from t1 where lcontains(f2, 'F1:xyz', 1) &gt; 0;
</pre></p>
<p>Instead, you have to use the exact alias field name, with matching character case:</p>
<p><pre class="brush: sql;">
  select * from t1 where lcontains(f2, 'f1:xyz', 1) &gt; 0;
</pre></p>
<p>However, the indexed master column can be searched using an all-uppercase field identifier or no field identifier at all:</p>
<p><pre class="brush: sql;">
  select * from t1 where lcontains(f2, 'xyz', 1) &gt; 0;
  select * from t1 where lcontains(f2, 'F2:xyz', 1) &gt; 0;
</pre></p>
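<p>The character-case rules behind these three queries can be summed up in a few lines of code. The helper below is hypothetical and only mirrors the behaviour described in this section. It assumes that a <em>Lucene</em> field is named after the column label Oracle returns for the generated select, so a quoted alias is kept verbatim while an unquoted identifier is folded to uppercase. It also assumes that a table qualifier such as <code>t2.</code> does not become part of the label:</p>
<p><pre class="brush: java;">
import java.util.Locale;

// Hypothetical sketch: maps one ExtraCols entry to the Lucene field name it would
// produce, assuming the field is named after Oracle's column label for the select.
// Aliases containing spaces are not handled.
final class FieldNames {
    static String fieldName(String extraColsEntry) {
        String entry = extraColsEntry.trim();
        int space = entry.lastIndexOf(' ');
        // With an alias (F1 "f1") the alias decides; otherwise the column itself does.
        String name = space == -1 ? entry : entry.substring(space + 1);
        if (name.startsWith("\"")) {
            if (name.length() == 1 || !name.endsWith("\"")) {
                throw new IllegalArgumentException("unbalanced quotes in " + entry);
            }
            return name.substring(1, name.length() - 1);        // quoted: case preserved
        }
        int dot = name.lastIndexOf('.');                         // drop a qualifier like t2.
        return name.substring(dot + 1).toUpperCase(Locale.ROOT); // unquoted: uppercased
    }

    public static void main(String[] args) {
        System.out.println(fieldName("F1 \"f1\"")); // f1, so search with f1:xyz
        System.out.println(fieldName("f2"));        // F2, so search with F2:xyz
    }
}
</pre></p>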
<p><strong>gennff: The following statement and example may be misleading, because Oracle is sometimes smart enough to detect blind updates and then does not fire any update trigger. This technique needs to be checked and proven in detail.</strong></p>
<p>Because the <em>Oracle ODCI API</em> does not detect changes to columns other than the indexed master column, you need to create a trigger that fires on any update of the columns in the <code>ExtraCols</code> list and rewrites the master column. This way, any change to <code>f1</code> also forces a change to <code>f2</code>: ODCI notifies <em>LDI</em> that a specific rowid was updated, and <em>LDI</em> updates the index tables according to the parameter definition of this index. Here is an example:</p>
<p><pre class="brush: sql;">
  CREATE OR REPLACE TRIGGER L$IT1 BEFORE UPDATE OF f1 ON t1 FOR EACH ROW
  BEGIN
    :new.f2 := :new.f2;
  END;
  /
</pre></p>
<h3>3.1.3 Multiple tables</h3>
<p><em>LDI</em> also supports indexing multiple columns across multiple tables that can be joined naturally. All you have to do is define a list of tables with the <code>ExtraTabs</code> parameter and specify a where condition with the <code>WhereCondition</code> parameter. The master table alias <code>L$MT</code> is introduced automatically by <em>LDI</em> and can be used without any preparation. This alias is also important for creating complex joins with XML tables that require the <code>existsNode</code> or <code>extractValue</code> operators. This functionality was added in release <code>2.9.0.1.0</code> of <em>LDI</em>. Here is an example:</p>
<p><pre class="brush: sql;">
  create table t2 (
    f4 number primary key,
    f5 VARCHAR2(200));

  create table t1 (
    f1 number,
    f2 VARCHAR2(4000),
    f3 number,
    CONSTRAINT t1_t2_fk FOREIGN KEY (f3)
      REFERENCES t2(f4) ON DELETE cascade);

  create index it1 on t1(f3) indextype is lucene.LuceneIndex
    parameters('ExtraCols:L$MT.f2 &quot;f2&quot;,t2.f5 &quot;f5&quot;;ExtraTabs:t2;WhereCondition:L$MT.f3=t2.f4');
</pre></p>
<p>Note that the tables <code>t1</code> and <code>t2</code> are joined directly by a foreign key, such that <code>t1</code> could be considered a child table of <code>t2</code>. Using this set of parameters, when the ODCI API detects a change on the index master column <code>t1.f3</code>, a select like this will be executed:</p>
<p><pre class="brush: sql;">
  select L$MT.f3, L$MT.f2 &quot;f2&quot;, t2.f5 &quot;f5&quot;
  from t1 L$MT, t2
    where L$MT.rowid=? and L$MT.f3=t2.f4;
</pre></p>
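<p>The statement above can be read as a simple assembly of the three parameters. The following Java sketch is hypothetical: the method <code>MasterSelect.build</code> is not LDI code. It assumes the parameter values are pasted into the select unchanged, which reproduces the statement shown apart from whitespace:</p>
<p><pre class="brush: java;">
// Hypothetical sketch: assembles the per-rowid select from LDI's ExtraCols,
// ExtraTabs and WhereCondition parameters. L$MT is the master table alias.
final class MasterSelect {
    static String build(String masterTable, String masterColumn, String extraCols,
                        String extraTabs, String whereCondition) {
        StringBuilder sql = new StringBuilder("select L$MT.").append(masterColumn);
        if (extraCols != null) {
            sql.append(", ").append(extraCols);            // ExtraCols, verbatim
        }
        sql.append(" from ").append(masterTable).append(" L$MT");
        if (extraTabs != null) {
            sql.append(", ").append(extraTabs);            // ExtraTabs, verbatim
        }
        sql.append(" where L$MT.rowid=?");                 // bound to the changed rowid
        if (whereCondition != null) {
            sql.append(" and ").append(whereCondition);    // WhereCondition, verbatim
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("t1", "f3", "L$MT.f2 \"f2\",t2.f5 \"f5\"", "t2", "L$MT.f3=t2.f4"));
    }
}
</pre></p>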
<p><strong>gennff: The following example may be wrong in terms of the t2 trigger. That is, the it1 index will never see any inserts and/or deletes on table t2 and will therefore hardly stay in sync with the actual table data. In general, I don&#8217;t think it is an easy-to-follow approach to use an index on a child table in this example (t1 is a child of t2 since it references it by fk).</strong></p>
<p>Keeping the <em>LDI</em> index in sync with changes to any columns (and tables) defined with the <code>ExtraCols</code> (and <code>ExtraTabs</code>) parameters is no more complex than in the single-table compound-column example above. It just requires two or more triggers, depending on the number of additional tables in the index definition. Each additional trigger determines all rowids of the master table that reference the changed row. It then calls the <code>LuceneDomainIndex.enqueueChange</code> procedure to notify <em>LDI</em> about the changes, where <code>sys.ODCIRidList</code> is a special ODCI collection type that holds a group of rowids. Here you go:</p>
<p><pre class="brush: sql;">
  CREATE OR REPLACE TRIGGER L$IT1 BEFORE UPDATE OF f2 ON t1 FOR EACH ROW
  BEGIN
    :new.f3 := :new.f3;
  END;
  /
  CREATE OR REPLACE TRIGGER LT$IT1 AFTER UPDATE OF f5 ON t2 FOR EACH ROW
    DECLARE
        ridlist sys.ODCIRidList;
    BEGIN
      SELECT ROWID BULK COLLECT INTO ridlist FROM T1 WHERE F3=:NEW.f4;
      LuceneDomainIndex.enqueueChange(USER || '.IT1', ridlist, 'update');
    END;
  /
</pre></p>
<h3>3.1.4 Padding and formatting</h3>
<p><em>Lucene Domain Index</em> can be customized with a parameter named <code>UserDataStore</code>, which defines the class responsible for creating <em>Lucene</em> documents. A <em>Lucene</em> document is a list of fields, one for each indexed column, plus an extra field named rowid, which is stored compressed and untokenized. By default, <code>UserDataStore</code> is implemented by <code>org.apache.lucene.indexer.DefaultUserDataStore</code>.</p>
<p>The default <code>UserDataStore</code> supports left padding for <code>NUMBER</code> or <code>FLOAT</code> columns as well as left character padding for <code>VARCHAR2</code> or <code>CHAR</code> columns. To define padding, use the <code>FormatCols</code> parameter within the <code>create</code> or <code>alter index</code> DDL command. Here is an example that automatically pads all <code>F2</code> column values to 15 characters and formats all <code>F3</code> column values according to the pattern 00.00:</p>
<p><pre class="brush: sql;">
  create table t1 (
    f1 number primary key,
    f2 varchar2(200),
    f3 number(4,2))
  ORGANIZATION INDEX;

  insert into t1 values (1, 'ravi', 3.46);
  insert into t1 values (3, 'murthy', 15.87);
  commit;

  create index it1 on t1(f2) indextype is lucene.LuceneIndex
    parameters('Stemmer:English;ExtraCols:F3;FormatCols:F2(zzzzzzzzzzzzzzz),F3(00.00)');
</pre></p>
<p>Then these rows will be indexed as the following Lucene documents:</p>
<p><pre class="brush: plain;">
FINE: Document&lt;stored,indexed&lt;rowid:*BAGAGfsCwQL+&gt; indexed,tokenized&lt;F2:zzzzzzzzzzzravi&gt; indexed&lt;F3:03,46&gt;&gt;
FINE: Document&lt;stored,indexed&lt;rowid:*BAGAGfsCwQT+&gt; indexed,tokenized&lt;F2:zzzzzzzzzmurthy&gt; indexed&lt;F3:15,87&gt;&gt;
</pre></p>
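<p>Both formatting rules can be reproduced with the JDK alone. The Java sketch below is hypothetical and is not the code of <code>DefaultUserDataStore</code>. It assumes that a character pattern such as <code>zzzzzzzzzzzzzzz</code> means &#8220;left-pad with its first character up to the pattern length&#8221;, and that a numeric pattern such as <code>00.00</code> behaves like a <code>java.text.DecimalFormat</code> pattern. The comma in <code>03,46</code> above is presumably the decimal separator of the locale in effect, so the locale is a parameter here:</p>
<p><pre class="brush: java;">
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical sketch of the FormatCols behaviour shown above, not LDI's code.
final class FormatColsSketch {
    // Left-pads value with the pattern's first character up to the pattern length;
    // values that are already long enough are returned unchanged (an assumption).
    static String leftPad(String value, String pattern) {
        int missing = Math.max(0, pattern.length() - value.length());
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i != missing; i++) {
            sb.append(pattern.charAt(0));
        }
        return sb.append(value).toString();
    }

    // Formats a number with a DecimalFormat pattern and the locale's separators.
    static String formatNumber(BigDecimal value, String pattern, Locale locale) {
        return new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(locale)).format(value);
    }

    public static void main(String[] args) {
        System.out.println(leftPad("ravi", "zzzzzzzzzzzzzzz"));                            // zzzzzzzzzzzravi
        System.out.println(formatNumber(new BigDecimal("3.46"), "00.00", Locale.GERMANY)); // 03,46
        System.out.println(formatNumber(new BigDecimal("15.87"), "00.00", Locale.US));     // 15.87
    }
}
</pre></p>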
<p>For columns based on Oracle XMLType, the <code>FormatCols</code> parameter can be used to define an XPath expression that selects the subset of XML nodes to be indexed.</p>
<p><pre class="brush: sql;">
  create table t1 (
    f1 VARCHAR2(10),
    f2 XMLType);
  insert into t1 values ('1', XMLType('&lt;emp id=&quot;1&quot;&gt;&lt;name&gt;ravi&lt;/name&gt;&lt;/emp&gt;'));
  insert into t1 values ('3', XMLType('&lt;emp id=&quot;3&quot;&gt;&lt;name&gt;murthy&lt;/name&gt;&lt;/emp&gt;'));
  commit;

  create index it1 on t1(f1) indextype is lucene.LuceneIndex
    parameters('Analyzer:org.apache.lucene.analysis.WhitespaceAnalyzer;ExtraCols:F2;FormatCols:F1(000),F2(/emp/name)');
</pre></p>
<p>The trace output of the <code>create index</code> statement shows that the above rows are indexed as:</p>
<p><pre class="brush: plain;">
FINE: Document&lt;stored,indexed&lt;rowid:AAAWqqAAGAAABodAAA&gt; indexed,tokenized&lt;F1:001&gt; indexed,tokenized&lt;F2:ravi &gt;&gt;
FINE: Document&lt;stored,indexed&lt;rowid:AAAWqqAAGAAABodAAB&gt; indexed,tokenized&lt;F1:003&gt; indexed,tokenized&lt;F2:murthy &gt;&gt;
</pre></p>
<p>Columns of type <code>VARCHAR2</code>, <code>CHAR</code> and <code>CLOB</code> allow for some special formatting options, namely <code>NOT_ANALYZED</code>, <code>NOT_ANALYZED_STORED</code>, <code>ANALYZED_WITH_OFFSETS</code>, <code>ANALYZED_WITH_POSITIONS</code> and <code>ANALYZED_WITH_POSITIONS_OFFSETS</code>. These options control how the <code>UserDataStore</code> analyzes/indexes, stores and vectorizes the field values. For example, a field that holds only one token per document may be indexed without tokenization and without being stored, so that it can be used as a sort field:</p>
<p><strong>gennff: The example should also include sorting for numbers, probably using the NumericField class as of Lucene &gt;= 2.9.</strong></p>
<p><pre class="brush: sql;">
  create table emails (
    emailFrom VARCHAR2(256),
    emailTo VARCHAR2(256),
    subject VARCHAR2(4000),
    emailDate DATE,
    bodyText CLOB);

  INSERT INTO EMAILS (EMAILFROM, EMAILTO, SUBJECT, EMAILDATE, BODYTEXT)
    VALUES ('EMAILFROM', 'EMAILTO', 'SUBJECT', sysdate, 'BODYTEXT');
  commit;

  create index emailbodyText on emails(bodyText) indextype is lucene.LuceneIndex
    parameters('Analyzer:org.apache.lucene.analysis.StopAnalyzer;ExtraCols:emailDate &quot;emailDate&quot;,subject &quot;subject&quot;,emailFrom &quot;emailFrom&quot;,emailTo &quot;emailTo&quot;;FormatCols:subject(NOT_ANALYZED),emailFrom(NOT_ANALYZED),emailTo(NOT_ANALYZED);LogLevel:ALL');
</pre></p>
<p>The trace output for the processed row then looks like this. The fields <code>subject</code>, <code>emailFrom</code> and <code>emailTo</code> are indexed but not tokenized by the analyzer. <code>emailDate</code> is also only indexed, which is deduced from the <code>DATE</code> datatype of the value:</p>
<p><pre class="brush: plain;">
FINE: Document&lt;stored,indexed&lt;rowid:AAAWvbAAGAAABrrAAA&gt; indexed,tokenized&lt;BODYTEXT:BODYTEXT&gt; indexed&lt;emailDate:20110617&gt; indexed&lt;subject:SUBJECT&gt; indexed&lt;emailFrom:EMAILFROM&gt; indexed&lt;emailTo:EMAILTO&gt;&gt;
</pre></p>
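<p>The value <code>20110617</code> of the <code>emailDate</code> field looks like a date rendered at day resolution in the compact <code>yyyyMMdd</code> layout used by <em>Lucene&#8217;s</em> <code>DateTools</code> class. Whether LDI relies on <code>DateTools</code> here is an assumption. The hypothetical Java sketch below only reproduces that layout for the resolutions from year down to second. It leaves out time zone handling, which <code>DateTools</code> performs in GMT:</p>
<p><pre class="brush: java;">
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical sketch: renders a date/time at a given resolution using the digit
// layout of Lucene's DateTools (yyyy up to yyyyMMddHHmmss). Time zones are ignored.
final class DateResolution {
    static String format(LocalDateTime value, String resolution) {
        String pattern;
        switch (resolution.toLowerCase(Locale.ROOT)) {
            case "year":   pattern = "yyyy";           break;
            case "month":  pattern = "yyyyMM";         break;
            case "day":    pattern = "yyyyMMdd";       break;
            case "hour":   pattern = "yyyyMMddHH";     break;
            case "minute": pattern = "yyyyMMddHHmm";   break;
            case "second": pattern = "yyyyMMddHHmmss"; break;
            default: throw new IllegalArgumentException("unknown resolution " + resolution);
        }
        return value.format(DateTimeFormatter.ofPattern(pattern, Locale.ROOT));
    }

    public static void main(String[] args) {
        LocalDateTime sent = LocalDateTime.of(2011, 6, 17, 9, 30);
        System.out.println(format(sent, "day"));    // 20110617
        System.out.println(format(sent, "second")); // 20110617093000
    }
}
</pre></p>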
<p>Another example, using the <code>ANALYZED_WITH_VECTORS</code> option of the <code>FormatCols</code> parameter, prints:</p>
<p><pre class="brush: plain;">
FINE: Document&lt;stored,indexed&lt;rowid:AAAW5IAAGAAABqOAAG&gt; stored,indexed,tokenized,termVector&lt;STR1:20.12.2010&gt; stored,indexed,tokenized,termVector&lt;STR2:20.12.2010&gt;&gt;
</pre></p>
<p>If you read the <em>Lucene</em> documentation, you won&#8217;t find the options introduced above: they were defined by <em>LDI</em> as a mapping onto <em>Lucene&#8217;s</em> field concepts, namely:</p>
<ul>
<li>analysis/indexing: how the value will be searchable via the inverted index,</li>
<li>storage: whether the original value needs to be preserved with the index as well, probably for display purposes, and</li>
<li>term vectorization: used for highlighting, categorization, etc.</li>
</ul>
<p>Here is the mapping table where the first row depicts the default value, i.e. when no <code>FormatCols</code> parameter is given:</p>
<table>
<tbody>
<tr>
<th>LDI format option</th>
<th>Lucene field analysis/indexing (Field.Index.*)</th>
<th>Lucene field storage (Field.Store.*)</th>
<th>Lucene field term vectorization (TermVector.*)</th>
</tr>
<tr>
<td><em>default</em></td>
<td>ANALYZED</td>
<td>NO</td>
<td>NO</td>
</tr>
<tr>
<td>ANALYZED_<br />STORED</td>
<td>ANALYZED</td>
<td>YES</td>
<td>NO</td>
</tr>
<tr>
<td>ANALYZED_<br />WITH_VECTORS</td>
<td>ANALYZED</td>
<td>YES</td>
<td>YES</td>
</tr>
<tr>
<td>ANALYZED_<br />WITH_OFFSETS</td>
<td>ANALYZED</td>
<td>YES</td>
<td>WITH_OFFSETS</td>
</tr>
<tr>
<td>ANALYZED_<br />WITH_POSITIONS</td>
<td>ANALYZED</td>
<td>YES</td>
<td>WITH_POSITIONS</td>
</tr>
<tr>
<td>ANALYZED_<br />WITH_POSITIONS_<br />OFFSETS</td>
<td>ANALYZED</td>
<td>YES</td>
<td>WITH_POSITIONS_<br />OFFSETS</td>
</tr>
<tr>
<td>NOT_ANALYZED</td>
<td>NOT_ANALYZED</td>
<td>NO</td>
<td>NO</td>
</tr>
<tr>
<td>NOT_ANALYZED_<br />STORED</td>
<td>NOT_ANALYZED</td>
<td>YES</td>
<td>NO</td>
</tr>
</tbody>
</table>
<p>The following table lists the mappings for the <code>*_NO_NORMS</code> options, which disable the storing of norms, i.e. of index-time boosts and field length normalization factors. It simply extends the preceding table, this time incorporating the <code>*_NO_NORMS</code> options.</p>
<table>
<tbody>
<tr>
<th>LDI format option</th>
<th>Lucene field analysis/indexing (Field.Index.*)</th>
<th>Lucene field storage (Field.Store.*)</th>
<th>Lucene field term vectorization (TermVector.*)</th>
</tr>
<tr>
<td>ANALYZED_<br />NO_NORMS</td>
<td>ANALYZED_<br />NO_NORMS</td>
<td>NO</td>
<td>NO</td>
</tr>
<tr>
<td>ANALYZED_<br />NO_NORMS_<br />STORED</td>
<td>ANALYZED_<br />NO_NORMS</td>
<td>YES</td>
<td>NO</td>
</tr>
<tr>
<td>ANALYZED_<br />NO_NORMS_<br />WITH_VECTORS</td>
<td>ANALYZED_<br />NO_NORMS</td>
<td>YES</td>
<td>YES</td>
</tr>
<tr>
<td>ANALYZED_<br />NO_NORMS_<br />WITH_OFFSETS</td>
<td>ANALYZED_<br />NO_NORMS</td>
<td>YES</td>
<td>WITH_OFFSETS</td>
</tr>
<tr>
<td>ANALYZED_<br />NO_NORMS_<br />WITH_POSITIONS</td>
<td>ANALYZED_<br />NO_NORMS</td>
<td>YES</td>
<td>WITH_POSITIONS</td>
</tr>
<tr>
<td>ANALYZED_<br />NO_NORMS_<br />WITH_POSITIONS_<br />OFFSETS</td>
<td>ANALYZED_<br />NO_NORMS</td>
<td>YES</td>
<td>WITH_POSITIONS_<br />OFFSETS</td>
</tr>
<tr>
<td>NOT_ANALYZED_<br />NO_NORMS</td>
<td>NOT_ANALYZED_<br />NO_NORMS</td>
<td>NO</td>
<td>NO</td>
</tr>
<tr>
<td>NOT_ANALYZED_<br />NO_NORMS_<br />STORED</td>
<td>NOT_ANALYZED_<br />NO_NORMS</td>
<td>YES</td>
<td>NO</td>
</tr>
</tbody>
</table>
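<p>The naming scheme of both tables is regular enough to be decoded mechanically. The hypothetical Java sketch below decodes an option name using the rules visible in the tables; the class <code>FieldOption</code> is not part of LDI. The prefix selects <code>Field.Index</code>, and <code>_NO_NORMS</code> is carried over into it. The suffix then selects <code>Field.Store</code> and <code>Field.TermVector</code>, and suffix combinations that the tables do not list are rejected:</p>
<p><pre class="brush: java;">
// Hypothetical sketch, not LDI code: decodes an LDI format option into the names
// of Lucene's Field.Index, Field.Store and Field.TermVector constants, following
// the rules visible in the two tables above. NUMERIC_* options are out of scope.
final class FieldOption {
    final String index;
    final String store;
    final String termVector;

    private FieldOption(String index, String store, String termVector) {
        this.index = index;
        this.store = store;
        this.termVector = termVector;
    }

    static FieldOption of(String option) {
        if (option == null || option.isEmpty()) {
            return new FieldOption("ANALYZED", "NO", "NO");   // default row
        }
        String index;
        String rest;
        if (option.startsWith("NOT_ANALYZED")) {
            index = "NOT_ANALYZED";
            rest = option.substring("NOT_ANALYZED".length());
        } else if (option.startsWith("ANALYZED")) {
            index = "ANALYZED";
            rest = option.substring("ANALYZED".length());
        } else {
            throw new IllegalArgumentException("unsupported option " + option);
        }
        if (rest.startsWith("_NO_NORMS")) {                    // second table
            index = index + "_NO_NORMS";
            rest = rest.substring("_NO_NORMS".length());
        }
        // Plain ANALYZED is treated like the default row (an assumption).
        if (rest.isEmpty()) {
            return new FieldOption(index, "NO", "NO");
        }
        if (rest.equals("_STORED")) {
            return new FieldOption(index, "YES", "NO");
        }
        if (index.startsWith("NOT_ANALYZED")) {                // no vector variants listed
            throw new IllegalArgumentException("unsupported option " + option);
        }
        switch (rest) {
            case "_WITH_VECTORS":           return new FieldOption(index, "YES", "YES");
            case "_WITH_OFFSETS":           return new FieldOption(index, "YES", "WITH_OFFSETS");
            case "_WITH_POSITIONS":         return new FieldOption(index, "YES", "WITH_POSITIONS");
            case "_WITH_POSITIONS_OFFSETS": return new FieldOption(index, "YES", "WITH_POSITIONS_OFFSETS");
            default: throw new IllegalArgumentException("unsupported option " + option);
        }
    }

    @Override
    public String toString() {
        return index + "/" + store + "/" + termVector;
    }

    public static void main(String[] args) {
        System.out.println(of("NOT_ANALYZED_STORED"));                      // NOT_ANALYZED/YES/NO
        System.out.println(of("ANALYZED_NO_NORMS_WITH_POSITIONS_OFFSETS")); // ANALYZED_NO_NORMS/YES/WITH_POSITIONS_OFFSETS
    }
}
</pre></p>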
<p>Finally, the following table introduces the possibility of directly indexing numeric values, dates and times as numbers, using the <code>NumericField</code> class available as of <em>Lucene</em> release <code>2.9</code>. Since <em>LDI</em> currently only supports sorting by int and float (see § 3.4.4), the basic types <code>long</code> and <code>double</code> are not yet accepted. Specifying <code>NUMERIC_DATETIME</code> should be accompanied by a corresponding <code>FormatCols</code> format that sets the desired datetime resolution, from year down to second, e.g. <code>FormatCols:revisionDate(day)</code>.</p>
<table>
<tbody>
<tr>
<th>LDI format option</th>
<th>Lucene field analysis/indexing (Field.Index.*)</th>
<th>Lucene field storage (Field.Store.*)</th>
<th>Lucene field term vectorization (TermVector.*)</th>
</tr>
<tr>
<td>NUMERIC_INT</td>
<td>n/a (NOT_ANALYZED)</td>
<td>NO</td>
<td>n/a (NO)</td>
</tr>
<tr>
<td>NUMERIC_FLOAT</td>
<td>n/a (NOT_ANALYZED)</td>
<td>NO</td>
<td>n/a (NO)</td>
</tr>
<tr>
<td>NUMERIC_DATETIME</td>
<td>n/a (NOT_ANALYZED)</td>
<td>NO</td>
<td>n/a (NO)</td>
</tr>
</tbody>
</table>
<h3>3.1.5 Functional columns</h3>
<p>Since the <code>ExtraCols</code> parameter list is simply added to the selected columns of the base table for indexing, you can also define additional or specialized fields for a <em>Lucene</em> index. In fact, any SQL expression that is valid in a select list is allowed here. For example, using the XMLType table definition above, you can introduce another <em>Lucene</em> field named <code>id</code>, which can even be formatted within the same statement. Again, be careful to use the same character case when writing database column and <em>Lucene</em> field identifiers:</p>
<p><pre class="brush: sql;">
  create index it1 on t1(f1) indextype is lucene.LuceneIndex
    parameters('Analyzer:org.apache.lucene.analysis.standard.StandardAnalyzer;ExtraCols:F2,extractValue(F2,''/emp/@id'') &quot;id&quot;;FormatCols:F1(000),F2(/emp/name),id(00)');
</pre></p>
<p>The <em>LDI</em> indexer will use a statement roughly comparable to the following:</p>
<p><pre class="brush: plain;">
  select f1, F2, extractValue(F2,'/emp/@id') &quot;id&quot; from t1;
</pre></p>
<p>Appropriate selects may look like this:</p>
<p><pre class="brush: sql;">
  select * from t1 where lcontains(f1, '001') &gt; 0;
  select * from t1 where lcontains(f1, 'id:01') &gt; 0;
  select * from t1 where lcontains(f1, 'F2:ravi') &gt; 0;
</pre></p>
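<p>For a single row, the functional column boils down to an XPath evaluation followed by padding. The Java sketch below is hypothetical and <code>FunctionalColumn</code> is not LDI code. It evaluates <code>/emp/name</code> and <code>/emp/@id</code> with the JDK&#8217;s <code>javax.xml.xpath</code> API instead of Oracle&#8217;s <code>extractValue</code>, and it assumes that the <code>000</code> and <code>00</code> patterns simply left-pad with zeros. The sketch builds the sample XML through DOM calls and returns the values of the <code>F1</code>, <code>F2</code> and <code>id</code> fields that the queries above search for:</p>
<p><pre class="brush: java;">
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch of one row of the functional-column example, not LDI code.
final class FunctionalColumn {
    // Builds the sample row's XML: an emp element with an id attribute and a
    // name child element.
    private static Document emp(String id, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element emp = doc.createElement("emp");
        emp.setAttribute("id", id);
        Element nameElement = doc.createElement("name");
        nameElement.setTextContent(name);
        emp.appendChild(nameElement);
        doc.appendChild(emp);
        return doc;
    }

    // Left-pads with zeros, as assumed for the FormatCols patterns 000 and 00.
    static String zeroPad(String value, int width) {
        int missing = Math.max(0, width - value.length());
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i != missing; i++) {
            sb.append('0');
        }
        return sb.append(value).toString();
    }

    // Returns the values of the F1, F2 and id fields for one row.
    static String[] indexRow(String f1, String id, String name) {
        try {
            Document doc = emp(id, name);
            XPath xpath = XPathFactory.newInstance().newXPath();
            String f2 = xpath.evaluate("/emp/name", doc);      // FormatCols F2(/emp/name)
            String idValue = xpath.evaluate("/emp/@id", doc);  // extractValue(F2,'/emp/@id')
            return new String[] { zeroPad(f1, 3), f2, zeroPad(idValue, 2) };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] row = indexRow("1", "1", "ravi");
        System.out.println("F1:" + row[0] + " F2:" + row[1] + " id:" + row[2]); // F1:001 F2:ravi id:01
    }
}
</pre></p>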
<h3>3.1.6 Online synchronization</h3>
<p>If you put <code>SyncMode:OnLine</code>, in contrast to <code>SyncMode:Deferred</code>, in the parameter list of your <code>create index</code> statement, <em>LDI</em> will set up a PL/SQL AQ callback on the index&#8217;s queue <code>it1$q</code> in <code>it1$qt</code> (see above). This callback immediately synchronizes all changes made by <code>insert</code> or <code>update</code> statements with the <em>Lucene</em> index structure in the background. <em>LDI</em> enqueues and processes the affected rowids of the master table in batches of <code>BatchCount</code> rows, where the default batch size is 115. For example:</p>
<p><pre class="brush: sql;">
  create index pages_lidx_all on pages p (value(p)) indextype is Lucene.LuceneIndex
    parameters('SyncMode:OnLine;LogLevel:WARNING;Stemmer:Spanish;ExtraCols:extractValue(object_value,''/page/title'') &quot;title&quot;,extractValue(object_value,''/page/revision/comment'') &quot;comment&quot;,extract(object_value,''/page/revision/text/text()'') &quot;text&quot;,extractValue(object_value,''/page/revision/timestamp'') &quot;revisionDate&quot;;IncludeMasterColumn:false;LobStorageParameters:PCTVERSION 0 ENABLE STORAGE IN ROW CHUNK 32768 CACHE READS FILESYSTEM_LIKE_LOGGING');
</pre></p>
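<p>The batching itself is simple arithmetic: <em>n</em> affected rowids yield ceil(<em>n</em> / <code>BatchCount</code>) batches. The Java sketch below is hypothetical: plain strings stand in for Oracle rowids, and the real enqueueing is done by LDI through Oracle AQ:</p>
<p><pre class="brush: java;">
import java.util.Arrays;

// Hypothetical sketch, not LDI code: splits affected rowids into batches of
// BatchCount entries, one AQ message per batch.
final class RowidBatches {
    static String[][] split(String[] rowids, int batchCount) {
        if (Integer.signum(batchCount) != 1) {
            throw new IllegalArgumentException("BatchCount must be positive");
        }
        int batches = (rowids.length + batchCount - 1) / batchCount; // ceiling division
        String[][] result = new String[batches][];
        for (int b = 0; b != batches; b++) {
            int from = b * batchCount;
            int to = Math.min(from + batchCount, rowids.length);     // last batch may be short
            result[b] = Arrays.copyOfRange(rowids, from, to);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] rowids = new String[1000];
        for (int i = 0; i != rowids.length; i++) {
            rowids[i] = "rowid-" + i;                // stand-ins for real Oracle rowids
        }
        String[][] batches = split(rowids, 115);     // 115 is the documented default
        System.out.println(batches.length + " batches, last one holds "
            + batches[batches.length - 1].length + " rowids"); // 9 batches, last one holds 80 rowids
    }
}
</pre></p>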
<h3>3.1.7 Populate Index</h3>
<p>Using <code>PopulateIndex:false</code> with a <code>create index</code> statement builds an empty <em>Lucene</em> index structure. The index is ready for use but will, of course, not return any results for a <code>select</code> statement. You may execute an <code>alter index rebuild</code> statement afterwards to actually populate the index. Here is an example:</p>
<p><pre class="brush: sql;">
  create index it1 on t1(f2) indextype is lucene.LuceneIndex
    parameters('PopulateIndex:false;LogLevel:ALL;IncludeMasterColumn:false;ExtraCols:F1,extractValue(F2,''/emp/name/text()'') &quot;name&quot;,extractValue(F2,''/emp/@id'') &quot;id&quot;;FormatCols:F1(000),id(00)');
  -- At this point the index is set up but not populated, so no rows will be returned
  select lscore(1),f2 from t1 where lcontains(f2, 'name:ravi',1) &gt; 0;
  -- Populate the index
  alter index it1 rebuild parameters('Analyzer:org.apache.lucene.analysis.WhitespaceAnalyzer');
  -- Now, upon a query, rows will be returned (if they match)
  select lscore(1), f2 from t1 where lcontains(f2, 'name:ravi',1) &gt; 0;
</pre></p>
<h3>3.1.8 Parallel Index Operations</h3>
<p>Starting with <em>Lucene Domain Index</em> 2.9.1.1.0, you can enable parallel operations with the <code>ParallelDegree</code> parameter. It may be explicitly set to 0, the default, or to a value from 2 to 9. Parallel operation is implemented using multiple <code>UserDataStore</code> segments and is most useful on multi-core machines or in a RAC environment. So far, only <code>insert</code> statements on the base table are parallelized, and the index must be configured with <code>SyncMode:OnLine</code>. This is an example of an index that enables parallel inserts:</p>
<p><pre class="brush: sql;">
  create index source_big_lidx on test_source_big(text)
    indextype is lucene.luceneindex
    parameters('BatchCount:1000;ParallelDegree:4;SyncMode:OnLine;LogLevel:INFO;AutoTuneMemory:true;PerFieldAnalyzer:line(org.apache.lucene.analysis.KeywordAnalyzer),TEXT(org.apache.lucene.analysis.SimpleAnalyzer);FormatCols:line(0000);ExtraCols:line &quot;line&quot;');
</pre></p>
<p>After execution of the statement, ten new tables will be visible in the user&#8217;s schema:</p>
<ul>
<li><code>SOURCE_BIG_LIDX$T</code> (master index storage),</li>
<li><code>SOURCE_BIG_LIDX$[0..3]$T</code> (slave index storages),</li>
<li><code>SOURCE_BIG_LIDX$QT</code> (master index queue) and</li>
<li><code>SOURCE_BIG_LIDX$[0..3]$QT</code> (slave index queues).</li>
</ul>
<p>A sequence <code>SOURCE_BIG_LIDX$S</code> is also created and is used to generate the numbers 0 to 3.</p>
<p>The parallel implementation enqueues batches of 1000 rows (<code>BatchCount:1000</code>) in the master queue <code>SOURCE_BIG_LIDX$Q</code> of the index. The PL/SQL AQ callback, which is enabled for this queue by <code>SyncMode:OnLine</code>, then dequeues each batch and re-enqueues it in the slave queues <code>SOURCE_BIG_LIDX$[0..3]$Q</code>. As a result, Oracle AQ runs multiple AQ server processes, which you can see as multiple <code>ora_j00x_sid</code> processes on a Unix/Linux box.</p>
<p>With Oracle 11g we observed that AQ may not start another slave process while one callback is consuming too much CPU. Experience shows that a <code>BatchCount</code> setting of around 250 leaves enough machine resources for the other slave processes to be started, so that the work is actually parallelized.</p>
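<p>The object names listed above follow a regular pattern. The hypothetical Java sketch below generates the ten table names for <code>ParallelDegree:4</code>; <code>ParallelLayout</code> is not part of LDI. It also assigns successive batches to the slave queues by cycling through 0 to 3. That the sequence <code>SOURCE_BIG_LIDX$S</code> is used this way is an <strong>assumption</strong>, based only on its description as generating the numbers 0 to 3:</p>
<p><pre class="brush: java;">
// Hypothetical sketch, not LDI code. ASSUMPTION: batches are spread over the
// slave queues by cycling 0, 1, ..., degree - 1, as the sequence NAME$S suggests.
final class ParallelLayout {
    // Master/slave storage tables and queue tables for a given ParallelDegree.
    static String[] tables(String index, int degree) {
        // Parallel mode accepts 2..9 (0, the default, disables it).
        if (Math.min(Math.max(degree, 2), 9) != degree) {
            throw new IllegalArgumentException("ParallelDegree must be between 2 and 9");
        }
        String[] names = new String[2 + 2 * degree];
        names[0] = index + "$T";                              // master index storage
        names[1] = index + "$QT";                             // master index queue table
        for (int i = 0; i != degree; i++) {
            names[2 + i] = index + "$" + i + "$T";            // slave index storages
            names[2 + degree + i] = index + "$" + i + "$QT";  // slave queue tables
        }
        return names;
    }

    // Slave queue receiving the given batch number.
    static String slaveQueueFor(String index, long batchNumber, int degree) {
        return index + "$" + (batchNumber % degree) + "$Q";
    }

    public static void main(String[] args) {
        System.out.println(tables("SOURCE_BIG_LIDX", 4).length + " tables"); // 10 tables
        for (long batch = 0; batch != 6; batch++) {
            System.out.println("batch " + batch + " to " + slaveQueueFor("SOURCE_BIG_LIDX", batch, 4));
        }
    }
}
</pre></p>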
<p><strong>gennff: the following section is quite hard to understand and should be rewritten, pointing out the several keywords in a more declarative way.</strong></p>
<p><em>Lucene Domain Index</em> 2.9.2.1.1+ and 3.0.1.1.0+ also include a new parameter, <code>IndexOnRam</code> (default true), that performs indexing in RAM using the <em>Lucene</em> <code>RAMDirectory</code> implementation. Indexing into a <code>RAMDirectory</code> is around 40% faster than indexing a similar batch directly into the <code>OJVMDirectory</code> slave storages of the parallel index operations introduced above. Obviously, each slave process consumes more RAM than it would when indexing the same batch of rows into disk storage based on <code>OJVMDirectory</code>. As a rule of thumb, enable <code>LogLevel:INFO</code> and check two timings for a given <code>BatchCount</code> value: how long it takes to index a new batch in RAM, and how long it takes to merge the slave directory into the main directory storage. Here is an example:</p>
<p><pre class="brush: plain;">
  INFO: .addDocToIdx - start indexing on SCOTT.SOURCE_BIG_LIDX numRows= 500
  INFO: .addDocToIdx - indexing done SCOTT.SOURCE_BIG_LIDX elapsedTime: 200 ms.
  INFO: .addDocToIdx - addIndexesNoOptimize merge done SCOTT.SOURCE_BIG_LIDX elapsedTime: 317 ms.
</pre></p>
<p>Choosing a larger <code>BatchCount</code> value here brings the indexing time close to or above the merge time and, overall, decreases the total processing time, for example:</p>
<p><pre class="brush: plain;">
  INFO: .addDocToIdx - start indexing on SCOTT.SOURCE_BIG_LIDX numRows= 700
  INFO: .addDocToIdx - indexing done SCOTT.SOURCE_BIG_LIDX elapsedTime: 312 ms.
  INFO: .addDocToIdx - addIndexesNoOptimize merge done SCOTT.SOURCE_BIG_LIDX elapsedTime: 74 ms.
</pre></p>
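<p>The rule of thumb can be automated by reading the two elapsed times from the <code>LogLevel:INFO</code> output. The following Java helper, <code>BatchCountAdvisor</code>, is hypothetical and relies only on the log line format shown above; it reports whether merging dominates indexing, which is the situation where a larger <code>BatchCount</code> is worth trying.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of LDI: applies the rule of thumb to the
// LogLevel:INFO lines of one batch (indexing time should not be below merge time).
final class BatchCountAdvisor {
    private static final Pattern INDEXING =
        Pattern.compile("indexing done \\S+ elapsedTime: (\\d+) ms");
    private static final Pattern MERGE =
        Pattern.compile("merge done \\S+ elapsedTime: (\\d+) ms");

    private static long elapsedMillis(Pattern pattern, String log) {
        Matcher m = pattern.matcher(log);
        if (!m.find()) {
            throw new IllegalArgumentException("log line not found: " + pattern.pattern());
        }
        return Long.parseLong(m.group(1));
    }

    /** True when merging takes longer than indexing, i.e. a larger BatchCount is worth trying. */
    static boolean suggestLargerBatch(String log) {
        return elapsedMillis(INDEXING, log) < elapsedMillis(MERGE, log);
    }
}
```

<p>Applied to the two logs above, it suggests a larger batch for the 500-row run (200 ms indexing vs. 317 ms merge) but not for the 700-row run (312 ms vs. 74 ms).</p>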
<p>Parallel indexing is also used for <code>insert into .. select .. from</code> DML operations. However, the <code>BatchCount</code> parameter is not used in this case because <em>Oracle</em> itself chooses the number of rows inserted per batch, usually around 115 rows.</p>
<p>A <em>Lucene Domain Index</em> working in <code>SyncMode:OnLine</code> with a <code>ParallelDegree</code> greater than 1 is populated in parallel, using <code>RAMDirectory</code> or <code>OJVMDirectory</code> slave storage depending on the <code>IndexOnRam</code> parameter. Note that this indexing always runs in parallel with other indexing operations because it does not require a write lock on the master index.</p>
<h2>3.2 Alter index</h2>
<p>The <em>Lucene Domain Index</em> <code>alter index</code> command can be used to change any parameter after the index has been created. <em>LDI</em> parameters are a simple list of <code>name:value</code> pairs stored in <em>LDI&#8217;s</em> <code>OJVMDirectory</code> storage. To remove a parameter from the index definition, prepend its name with <code>"~"</code>. Here are some examples of <code>alter index</code>.</p>
<p>Change the <em>Lucene</em> index writer parameter <code>MaxBufferedDocs</code> to 500 and disable auto tuning of indexing memory:</p>
<p><pre class="brush: sql;">
  alter index it1
    parameters('MaxBufferedDocs:500;AutoTuneMemory:false');
</pre></p>
<p>Similar to the previous example, but also enabling online synchronization:</p>
<p><pre class="brush: sql;">
  alter index it1
    parameters('MaxBufferedDocs:500;AutoTuneMemory:false;SyncMode:OnLine');
</pre></p>
<p>The following statement removes the online synchronization setting from the above example. You can get a similar effect by setting <code>SyncMode:Deferred</code>, the default value for <code>SyncMode</code>, which overwrites any previous <code>SyncMode</code> setting.</p>
<p><pre class="brush: sql;">
  alter index it1 parameters('~SyncMode:OnLine');
</pre></p>
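<p>The semantics of these <code>name:value</code> lists can be modeled as a map merge. The Java sketch below is a hypothetical illustration, not the <em>LDI</em> parser: it assumes entries are separated by <code>;</code>, that the first <code>:</code> separates name from value, and that a <code>~</code> prefix removes the named parameter whatever value follows it.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of how an alter index parameter list could be merged into the
// current index parameters; not LDI's actual implementation.
final class LdiParameters {
    static Map<String, String> merge(Map<String, String> current, String alterSpec) {
        Map<String, String> result = new LinkedHashMap<>(current);
        for (String raw : alterSpec.split(";")) {
            String entry = raw.trim();          // tolerate line breaks between entries
            if (entry.isEmpty()) {
                continue;
            }
            int colon = entry.indexOf(':');     // first ':' separates name and value
            String name = (colon < 0 ? entry : entry.substring(0, colon)).trim();
            String value = colon < 0 ? "" : entry.substring(colon + 1).trim();
            if (name.startsWith("~")) {
                result.remove(name.substring(1)); // "~Name:..." drops the parameter
            } else {
                result.put(name, value);          // otherwise add or overwrite
            }
        }
        return result;
    }
}
```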
<h2>3.2 Alter index rebuild</h2>
<p>The <code>alter index rebuild</code> statement rebuilds an index from scratch. This is useful when an <em>LDI</em> instance is damaged or corrupted, or when you need to change a parameter setting that affects field value preprocessing and indexing, such as the <em>Lucene</em> <code>Analyzer</code> parameter.</p>
<h3>3.2.1 Manual</h3>
<p>The following example shows how to change the <em>Lucene</em> index analyzer. If you change the analyzer, you must rebuild the complete index, because you should not query an index with an analyzer that differs from the one used at indexing time.</p>
<p><pre class="brush: sql;">
  alter index it1 rebuild
    parameters('Analyzer:org.apache.lucene.analysis.StopAnalyzer;MaxBufferedDocs:500;AutoTuneMemory:false');
</pre></p>
<h3>3.2.2 On Line</h3>
<p>A plain <code>alter index rebuild</code> does not return until the complete operation has finished. Online rebuild is a feature that <em>Oracle</em> offers for regular indexes in Enterprise Edition databases; with a little trick you can rebuild a <em>Lucene Domain Index</em> online too, by letting the AQ callback of <code>SyncMode:OnLine</code> do the indexing in the background. If you are working with <code>SyncMode:Deferred</code>, switch to <code>SyncMode:OnLine</code> and rebuild the index using:</p>
<p><pre class="brush: sql;">
  alter index it1 rebuild
    parameters('SyncMode:OnLine;MergeFactor:100;BatchCount:1000');
  commit; -- notify change to AQ Callback
</pre></p>
<p>The rebuild command enqueues batches of 1000 rowids of the indexed table for addition to the <em>Lucene</em> index structure of <code>it1</code>. The <em>Lucene Domain Index</em> AQ callback then processes these messages in a background database process and automatically commits the changes when it finishes.</p>
<h2>3.3 Drop</h2>
<p>Dropping an <em>LDI</em> instance is no different from dropping any other index in an Oracle database. Under the covers, this operation also drops the index&#8217;s storage table (<code>IT1$T</code> in the above example) and the index&#8217;s AQ <code>IT1$Q</code> with its storage <code>IT1$QT</code>. If the index is configured with <code>SyncMode:OnLine</code>, the PL/SQL AQ callback is disabled first.</p>
<p><pre class="brush: sql;">
  drop index it1;
</pre></p>
<p>If something goes wrong during the <code>drop</code> command, you can append <code>force</code> to the command. This clears any stale references to the index from the system views.</p>
<p><pre class="brush: sql;">
  drop index it1 force;
</pre></p>
<h2>3.4 Querying</h2>
<p><em>Lucene Domain Index</em> introduces a new <em>SQL</em> operator named <code>lcontains()</code> with its ancillary operators <code>lscore()</code> and <code>lhighlight()</code> (see below). The functionality of <code>lcontains()</code> and <code>lscore()</code> is comparable to the <em>Oracle Text</em> operators <code>contains()</code> and <code>score()</code>.</p>
<h3>3.4.1 Simple columns</h3>
<p>Based on the XML indexing example with <code>FormatCols</code> from § 3.1.4, a simple statement uses <code>lcontains()</code> in the where clause and <code>lscore()</code> in the select list. Respectively, these operators perform a parameterized text query and return a normalized hit score. The first parameter of <code>lcontains()</code> denotes the column on which the <em>Lucene Domain Index</em> resides; the second is the actual string (of tokens) to search for. Both parameters are mandatory. The third parameter of <code>lcontains()</code>, like the first parameter of <code>lscore()</code>, is a correlation id that connects a particular <code>lcontains()</code> with a particular <code>lscore()</code> in an SQL statement. That is, every call to <em>Lucene</em> through <code>lcontains()</code> delivers its own scoring! Note that <code>lcontains()</code> returns a value &gt; <code>0</code> for a match, so it must always be consumed by a comparison such as the boolean expression here.</p>
<p>Since the <em>Lucene</em> index resides on column <code>f1</code>, this column is called the master column of the index and is the default search field of a query expression (see the <a href="http://lucene.apache.org/java/3_1_0/queryparsersyntax.html"><em>Lucene Query Parser</em></a> syntax). That is, a query term without a field qualifier is searched in column <code>f1</code>.</p>
<p><pre class="brush: sql;">
  select lscore(1) as sc, f1 from t1 where lcontains(f1, '001', 1) &gt; 0;

    SC|F1
------|---
 1.000|1
</pre></p>
<p>Searching an <code>ExtraCols</code> column works differently: the search string has to be prefixed with &#8220;F2:&#8221;.</p>
<p><pre class="brush: sql;">
  select lscore(1) as sc, f2 from t1 where lcontains(f1, 'F2:ravi', 1) &gt; 0;

    SC|F2
------|----------------------------------------
 1.000|&lt;emp id=&quot;1&quot;&gt;&lt;name&gt;ravi&lt;/name&gt;&lt;/emp&gt;
</pre></p>
<h3>3.4.2 Multiple columns</h3>
<p>The <em>Lucene</em> query parser syntax provides a rich query language that comprises logical operators, term modifiers, grouping and more. You can apply any of these to each indexed column as long as the resulting expression conforms to the query language. Here is a more complex example, using the XML indexing example with <code>FormatCols</code> from § 3.1.5. Note that the first row matches against the extra column (<code>F2:ravi</code>) and the functional column (<code>id:01</code>), while the second row matches with <code>F1</code> equal to <code>003</code> (no qualifier is necessary for <code>F1</code> because it is the master column of the index).</p>
<p><pre class="brush: sql;">
  select f1, lscore(1) sc, extractValue(f2, '/emp/@id') id from t1
    where lcontains(f1, '003 OR (F2:(ravi OR ravie) AND id:01)', 1) &gt; 0;

  F1 |    SC|ID
  ---|------|---
  1  |  .577|1
  3  |  .206|3
</pre></p>
<h3>3.4.3 Pagination</h3>
<p>The <code>lcontains()</code> operator extends the <em>Lucene</em> query parser syntax with in-line pagination information for the <em>Lucene Domain Index</em> result set. You can select a specific window (page) of your sorted query results by injecting a range term, written like a query parser range, into the query expression; note the <code>DOMAIN_INDEX_SORT</code> optimizer hint, which is explained in the next section. For example (from § 3.1.5):</p>
<p><pre class="brush: sql;">
  select /*+ DOMAIN_INDEX_SORT */
      f1, lscore(1) sc, extractValue(f2, '/emp/@id') id from t1
    where lcontains(f1, 'rownum:[2 TO 2] AND (003 OR (F2:(ravi OR ravie) AND id:01))', 1) &gt; 0
  order by lscore(1) desc;

  F1 |    SC|ID
  ---|------|---
  3  |  .206|3
</pre></p>
<p>The <em>Lucene Domain Index</em> implementation automatically extracts the pagination prefix <code>rownum:[n TO m] AND</code> from the beginning of the query expression and returns only the rowids inside that window to <em>Oracle</em>. This extension yields a significant performance gain because it eliminates the outer statement of <em>Oracle&#8217;s</em> Top-N syntax, which in the worst case collects all matching rowids just to filter the result set window. Because inline pagination is a home-brew extension to the standard <em>Lucene</em> query parser syntax, it also comes with some home-brew rules:</p>
<ul>
<li>The <code>rownum:[n TO m] AND</code> term must start the query expression: <em>LDI</em> uses simple positional string matching of the <code>rownum</code> .. <code>AND</code> keywords to extract the start and stop index of the window, so it is your responsibility to provide a well-formatted term.</li>
<li>Pagination information is attached to the actual query expression with the <code>AND</code> boolean operator. However, this operator has no meaning for the grouping logic of the terms in the search expression; like <code>rownum</code>, it is a placeholder. For example, <code>rownum:[n TO m] AND xx OR bb</code> would be read by the standard parser as <code>((rownum:[n TO m] AND xx) OR bb)</code>, but is of course only searched as <code>(xx OR bb)</code>.</li>
</ul>
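<p>The positional matching described in the first rule can be sketched as follows. The Java class <code>InlinePagination</code> is hypothetical and only mirrors the documented behavior: it expects the expression to start with <code>rownum:[</code>, reads the numbers around <code>TO</code> and before <code>]</code>, and drops everything up to the first <code>AND</code> after the bracket. The second rule falls out of this approach: for <code>rownum:[1 TO 10] AND xx OR bb</code>, the remaining query is simply <code>xx OR bb</code>.</p>

```java
// Hypothetical sketch of positional extraction of "rownum:[n TO m] AND"; not LDI code.
final class InlinePagination {
    final int from;     // first row of the window (1-based)
    final int to;       // last row of the window
    final String query; // remaining Lucene query expression

    private InlinePagination(int from, int to, String query) {
        this.from = from;
        this.to = to;
        this.query = query;
    }

    /** Returns null when the expression does not start with the pagination prefix. */
    static InlinePagination parse(String expr) {
        String prefix = "rownum:[";
        if (!expr.startsWith(prefix)) {
            return null;
        }
        int toPos = expr.indexOf(" TO ", prefix.length());
        int close = expr.indexOf(']', prefix.length());
        int and = expr.indexOf(" AND ", Math.max(close, 0));
        if (toPos < 0 || close < toPos || and < 0) {
            throw new IllegalArgumentException("malformed pagination term: " + expr);
        }
        int from = Integer.parseInt(expr.substring(prefix.length(), toPos).trim());
        int to = Integer.parseInt(expr.substring(toPos + 4, close).trim());
        // Everything after the placeholder AND is the real query, grouping untouched.
        return new InlinePagination(from, to, expr.substring(and + 5).trim());
    }
}
```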
<p>Although the pagination functionality is easy to use and fast, always keep the following in mind: query result sets may change over time, in their values, their length and their order. For example, a select like the one above may return a completely different result set if a new row has been added to the base table; if the new row matches, it also shifts the page window. A common pitfall: you inspect search results on page 123, a new row arrives that matches right at the top, and when you switch to page 124, the last result from page 123 reappears as the first result on page 124.</p>
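<p>The page drift can be reproduced with a plain list. The Java sketch below is illustrative only: <code>PageDrift</code> is a hypothetical helper that builds the <code>rownum:[from TO to] AND</code> prefix for a page and slices a ranked list in the same way, so you can watch the last row of one page reappear at the top of the next after a new top-ranked row arrives.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: pages a ranked hit list with windows of rownum:[from TO to].
final class PageDrift {
    /** Pagination prefix for 1-based page p, e.g. "rownum:[4 TO 6] AND " for p=2, size=3. */
    static String windowTerm(int p, int pageSize) {
        int from = (p - 1) * pageSize + 1;
        int to = p * pageSize;
        return "rownum:[" + from + " TO " + to + "] AND ";
    }

    /** Copy of the rows that fall into page p of the given ranking. */
    static List<String> page(List<String> ranked, int p, int pageSize) {
        int from = Math.min((p - 1) * pageSize, ranked.size());
        int to = Math.min(p * pageSize, ranked.size());
        return new ArrayList<>(ranked.subList(from, to));
    }
}
```

<p>With pages of three rows, page 1 of <code>a..f</code> is <code>[a, b, c]</code>; once a new top-scoring row is added, page 2 becomes <code>[c, d, e]</code> and <code>c</code> is shown twice.</p>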
<h3>3.4.4 Sort</h3>
<p>Pure <em>Lucene</em> provides sorting of the results of a particular query; <em>Lucene Domain Index</em> goes further and makes sorting available through an extra argument to the <code>lcontains()</code> operator (see § 3.4.1). The sort parameter is a comma-separated string of <code>field[:ORDER[:TYPE]]</code> values, where the fields included in the sort specification should be <code>NOT_ANALYZED</code> or <code>NOT_ANALYZED_STORED</code> (see the <code>FormatCols</code> parameter in § 3.1.4). <code>ORDER</code> can be <code>ASC</code> or <code>DESC</code>; the default is <code>ASC</code>. <code>TYPE</code> can be <code>string</code>, <code>float</code> or <code>int</code>; starting with Lucene <code>2.9.0</code> the default is <code>string</code>.</p>
<p>Note that if you use <code>lcontains()</code> to sort by anything within the index, you have to add the <code>DOMAIN_INDEX_SORT</code> optimizer hint. This hint tells the <em>Oracle</em> optimizer that the order of the rows is dictated by the <em>Lucene Domain Index</em>. Also note that using <code>lscore()</code> together with a sort that is not by score makes no sense and, because the score is computed by the index engine, only adds overhead to the query execution time.</p>
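<p>To make the <code>field[:ORDER[:TYPE]]</code> grammar concrete, here is a hypothetical Java parser for it. It is not the <em>LDI</em> code; it assumes, in line with the examples below, that field names are taken verbatim (no trimming or case folding), that <code>ORDER</code> defaults to <code>ASC</code>, and that <code>TYPE</code> defaults to <code>string</code>.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for a "field[:ORDER[:TYPE]]" list such as "subject:DESC,emailFrom".
final class SortField {
    private static final List<String> TYPES = Arrays.asList("string", "float", "int");

    final String field;       // kept verbatim: no trimming, no case folding
    final boolean descending; // ORDER defaults to ASC
    final String type;        // TYPE defaults to string

    private SortField(String field, boolean descending, String type) {
        this.field = field;
        this.descending = descending;
        this.type = type;
    }

    static List<SortField> parse(String spec) {
        List<SortField> fields = new ArrayList<>();
        for (String item : spec.split(",")) {
            String[] parts = item.split(":");
            if (parts.length > 3 || parts[0].isEmpty()) {
                throw new IllegalArgumentException("bad sort item: " + item);
            }
            boolean desc = parts.length > 1 && parts[1].equals("DESC");
            if (parts.length > 1 && !desc && !parts[1].equals("ASC")) {
                throw new IllegalArgumentException("ORDER must be ASC or DESC: " + item);
            }
            String type = parts.length > 2 ? parts[2] : "string";
            if (!TYPES.contains(type)) {
                throw new IllegalArgumentException("TYPE must be string, float or int: " + item);
            }
            fields.add(new SortField(parts[0], desc, type));
        }
        return fields;
    }
}
```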
<p>Here are some examples of sorted queries against the <code>emails</code> table created in § 3.1.4, using the following data:</p>
<p><pre class="brush: sql;">
  INSERT INTO EMAILS (EMAILFROM, EMAILTO, SUBJECT, EMAILDATE, BODYTEXT)
    VALUES ('arthur@schop.de', 'friedrich@nietz.de', 'Pessimismus', sysdate-12, 'Denn alles Streben entspringt aus Mangel');
  INSERT INTO EMAILS (EMAILFROM, EMAILTO, SUBJECT, EMAILDATE, BODYTEXT)
    VALUES ('friedrich@nietz.de', 'arthur@schop.de', 'Deine Philosophie', sysdate-7, 'wie beim Eintritt in den Hochwald');
  INSERT INTO EMAILS (EMAILFROM, EMAILTO, SUBJECT, EMAILDATE, BODYTEXT)
    VALUES ('irgendwer@schop.de', 'arthur@schop.de', 'Metaphysik ist pessimistisch', sysdate-3, 'Die Welt als Wille und Vorstellung');
  INSERT INTO EMAILS (EMAILFROM, EMAILTO, SUBJECT, EMAILDATE, BODYTEXT)
    VALUES ('friedrich@nietz.de', 'arthur@schop.de', 'Der neue Pessimismus', sysdate, 'Vitalismus ist doch schicker!');
  commit;
</pre></p>
<p><pre class="brush: sql;">
SELECT /*+ DOMAIN_INDEX_SORT */ subject, emailfrom as src, emaildate ts
FROM emails
  where lcontains(bodytext, 'streben || wille || ruhe', 'subject', 1) &gt; 0;

SUBJECT                       |SRC                           |TS
------------------------------|------------------------------|--------
Metaphysik ist pessimistisch  |irgendwer@schop.de            |18.06.11
Metaphysik ist pessimistisch  |arthur@schop.de               |18.06.11
Metaphysik und Pessimismus    |arthur@schop.de               |09.06.11

SELECT /*+ DOMAIN_INDEX_SORT */ subject, emailfrom as src, emaildate ts
FROM emails
 where lcontains(bodytext, 'streben || wille || ruhe', 'subject:DESC', 1) &gt; 0;

SUBJECT                       |SRC                           |TS
------------------------------|------------------------------|--------
Metaphysik und Pessimismus    |arthur@schop.de               |09.06.11
Metaphysik ist pessimistisch  |irgendwer@schop.de            |18.06.11
Metaphysik ist pessimistisch  |arthur@schop.de               |18.06.11

SELECT /*+ DOMAIN_INDEX_SORT */ subject, emailfrom as src, emaildate ts
FROM emails
  where lcontains(bodytext, 'streben || wille || ruhe', 'subject:DESC,emailFrom', 1) &gt; 0;

SUBJECT                       |SRC                           |TS
------------------------------|------------------------------|--------
Metaphysik und Pessimismus    |arthur@schop.de               |09.06.11
Metaphysik ist pessimistisch  |arthur@schop.de               |18.06.11
Metaphysik ist pessimistisch  |irgendwer@schop.de            |18.06.11

SELECT /*+ DOMAIN_INDEX_SORT */ subject, emailfrom as src, emaildate ts
FROM emails
  where lcontains(bodytext, 'streben || wille || ruhe', 'subject:DESC,emailFrom:DESC', 1) &gt; 0;

SUBJECT                       |SRC                           |TS
------------------------------|------------------------------|--------
Metaphysik und Pessimismus    |arthur@schop.de               |09.06.11
Metaphysik ist pessimistisch  |irgendwer@schop.de            |18.06.11
Metaphysik ist pessimistisch  |arthur@schop.de               |18.06.11
</pre></p>
<p>Be very careful not to change the case of any identifier or add redundant whitespace in the sort specification: <em>Lucene Domain Index</em> parses it very loosely and reports no error. For example, writing <code>emailFrom</code> as <code>emailfrom</code> (lowercase f) silently changes the semantics of the result set:</p>
<p><pre class="brush: sql;">
SELECT /*+ DOMAIN_INDEX_SORT */ subject, emailfrom as src, emaildate ts
FROM emails
 where lcontains(bodytext, 'streben || wille || ruhe', 'emailFrom:DESC', 1) &gt; 0;

SUBJECT                       |SRC                           |TS
------------------------------|------------------------------|--------
Metaphysik ist pessimistisch  |irgendwer@schop.de            |18.06.11
Metaphysik und Pessimismus    |arthur@schop.de               |09.06.11
Metaphysik ist pessimistisch  |arthur@schop.de               |18.06.11

SELECT /*+ DOMAIN_INDEX_SORT */ subject, emailfrom as src, emaildate ts
FROM emails
 where lcontains(bodytext, 'streben || wille || ruhe', 'emailfrom:DESC', 1) &gt; 0;

SUBJECT                       |SRC                           |TS
------------------------------|------------------------------|--------
Metaphysik und Pessimismus    |arthur@schop.de               |09.06.11
Metaphysik ist pessimistisch  |irgendwer@schop.de            |18.06.11
Metaphysik ist pessimistisch  |arthur@schop.de               |18.06.11
</pre></p>
<p>The following query has no sort specification, only the <code>DOMAIN_INDEX_SORT</code> hint, so the result set is sorted by score descending by default. This is a shorthand for what is probably the most common sort used with text searches. You can, of course, replace it with the classic syntax <code>order by lscore(1) desc</code>. The two result sets below are not identical, but only because two rows have the same score (checked to 15 significant digits), so <em>Oracle</em> / <em>LDI</em> returns them in arbitrary order.</p>
<p><pre class="brush: sql;">
SELECT /*+ DOMAIN_INDEX_SORT */ subject, emailfrom as src, lscore(1) sc
FROM emails
  where lcontains(bodytext, 'streben || wille || ruhe', 1) &gt; 0;

SUBJECT                       |SRC                           |    SC
------------------------------|------------------------------|------
Metaphysik ist pessimistisch  |arthur@schop.de               |  .184
Metaphysik und Pessimismus    |arthur@schop.de               |  .138
Metaphysik ist pessimistisch  |irgendwer@schop.de            |  .138

SELECT subject, emailfrom as src, lscore(1) sc
FROM emails
  where lcontains(bodytext, 'streben || wille || ruhe', 1) &gt; 0 order by lscore(1) desc;

SUBJECT                       |SRC                           |    SC
------------------------------|------------------------------|------
Metaphysik ist pessimistisch  |arthur@schop.de               |  .184
Metaphysik ist pessimistisch  |irgendwer@schop.de            |  .138
Metaphysik und Pessimismus    |arthur@schop.de               |  .138
</pre></p>
<h3>3.4.5 Count Hits</h3>
<p>The <code>countHits()</code> function is a <em>Lucene Domain Index</em> optimization to replace the regular SQL <code>count(*)</code> functionality. By comparison, <code>countHits()</code> is extremely fast because there is no need to pass <code>rowid</code> information from the <em>Lucene Data Cartridge</em> to the <em>Oracle Engine</em> to count matching rows. Here is an example:</p>
<p><pre class="brush: sql;">
  select LuceneDomainIndex.countHits('EMAILBODYTEXT','security') hits from dual;

  HITS
  ----
  5
</pre></p>
<p>The first argument of <code>countHits()</code> is the <em>Lucene Domain Index</em> name, the second is the search expression. An optional three-argument version of <code>countHits()</code> lets you query an index in another schema. Besides counting, a call to <code>countHits()</code> also caches the <code>rowid</code> information together with the query expression. This is useful for <code>lcontains()</code> queries executed immediately afterwards, because <em>LDI</em> can then shortcut to the cached information. Under the covers, <em>LDI</em> keys the cache by <code>sort_string(QueryParser.toString())</code>, so the query expressions of <code>countHits()</code> and <code>lcontains()</code> must match exactly. Following is an example of <code>countHits()</code> in correlation with <code>lcontains()</code>, where <code>emailFrom:(security)</code> is the match key:</p>
<p><pre class="brush: sql;">
  select LuceneDomainIndex.countHits('EMAILBODYTEXT','security') from dual;

  LUCENEDOMAININDEX.COUNTHITS('EMAILBODYTEXT','SECURITY')
  -------------------------------------------------------
  5

  Elapsed: 00:00:00.02

  select emailFrom FROM emails
    where lcontains(bodytext,'security','emailFrom:ASC',1)&gt;0;

  EMAILFROM
  ---------
  codeshepherd@gmail.com
  codeshepherd@gmail.com
  erik@ehatchersolutions.com
  lucenelist2005@danielnaber.de
  lucenelist2005@danielnaber.de

  Elapsed: 00:00:00.04
</pre></p>
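<p>The shortcut can be modeled as a cache keyed by a string form of the query. The Java class <code>HitCache</code> below is hypothetical: its search function and the keys used in the usage example stand in for a real <em>Lucene</em> search and for the key <em>LDI</em> derives from the parsed query. It shows why the expressions passed to <code>countHits()</code> and <code>lcontains()</code> must match exactly.</p>

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model, not LDI code: rowids are cached under a string key derived from
// the query, and only an exactly equal key lets a later lookup reuse them.
final class HitCache {
    private final Map<String, List<String>> cache = new HashMap<>();
    private final Function<String, List<String>> search; // stands in for a Lucene search
    private int searches = 0;

    HitCache(Function<String, List<String>> search) {
        this.search = search;
    }

    int searchCount() {
        return searches;
    }

    /** Like countHits(): runs (or reuses) the search and returns the number of hits. */
    int countHits(String key) {
        return rowids(key).size();
    }

    /** Like lcontains(): returns cached rowids when the key matches exactly. */
    List<String> rowids(String key) {
        List<String> hits = cache.get(key);
        if (hits == null) {
            searches++;
            hits = search.apply(key);
            cache.put(key, hits);
        }
        return hits;
    }
}
```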
<h3>3.4.6 First Rows Hint</h3>
<p>Starting with the <code>2.4.0.1.0</code> release of <em>LDI</em> we have replaced the deprecated <em>Lucene</em> hit classes with the <code>TopDocs</code> class. If you use the <code>FIRST_ROWS</code> optimizer hint in conjunction with <code>lcontains()</code> inline pagination, <em>Lucene Domain Index</em> calls <code>TopDocs</code> to get the first N hits only. For example:</p>
<p><pre class="brush: sql;">
  select /*+ FIRST_ROWS DOMAIN_INDEX_SORT */ lhighlight(1), extractValue(object_value,'/page/title')
  from pages
    where lcontains(object_value, 'rownum:[1 TO 10] AND (musica tango rock)', 1)&gt;0;
</pre></p>
<p>Using <code>FIRST_ROWS</code> together with <code>rownum:[1 TO 10]</code> runs a <em>Lucene</em> query for the first <code>10</code> hits only. A following query with <code>rownum:[10 TO 20]</code> will find most of the <em>Lucene</em> structures, such as the <code>Searcher</code> instance and the <code>ROWID</code> to <em>Lucene</em> <code>DocID</code> association, already cached in memory; nevertheless, the <em>Lucene</em> index is re-queried to get the first <code>20</code> hits (1..20) again. If, on the other hand, you omit <code>FIRST_ROWS</code>, <em>Oracle</em> switches to its default <code>ALL_ROWS</code> mode: with a pagination of <code>rownum:[n TO m]</code>, <em>Lucene Domain Index</em> fetches the first m hits, but if m is lower than <code>2000</code>, it fetches <code>2000</code> hits instead. The magic number <code>2000</code> comes from the <em>Oracle ODCI API</em>, which calls the <code>ODCIFetch</code> routine in batches of <code>2000</code> rowids. If neither <code>FIRST_ROWS</code> nor in-line pagination is used, <em>Lucene Domain Index</em> always works in batches of <code>2000</code> hits, which causes several cache misses in full scan mode. For example, given the query:</p>
<p><pre class="brush: sql;">
  select count(*) from pages where lcontains(object_value, 'musica tango rock')&gt;0;
</pre></p>
<p><em>Lucene Domain Index</em> first fetches 2000 hits, learns that the total number of hits is <code>2736</code>, and then re-fetches all <code>2736</code> hits (a cache miss). If you only need the hit count, <code>LuceneDomainIndex.countHits()</code> obtains it in advance and faster than the previous query.</p>
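<p>The number of hits requested from <em>Lucene</em> in the paginated cases above can be summarized in a few lines of Java. <code>FetchPlan</code> is a hypothetical model of the documented behavior, not <em>LDI</em> code: with <code>FIRST_ROWS</code> the upper bound m of <code>rownum:[n TO m]</code> is used as is, otherwise at least one <code>ODCIFetch</code> batch of 2000 hits is requested.</p>

```java
// Hypothetical model of the documented fetch sizes; not LDI's actual code.
final class FetchPlan {
    /** Oracle's ODCIFetch hands rowids over in batches of this size. */
    static final int ODCI_BATCH = 2000;

    /**
     * Hits requested from Lucene for a query paginated with rownum:[n TO m].
     * Hits are always counted from 1, so a later page re-queries the earlier ones.
     */
    static int hitsRequested(int m, boolean firstRowsHint) {
        if (m <= 0) {
            throw new IllegalArgumentException("m must be positive");
        }
        return firstRowsHint ? m : Math.max(m, ODCI_BATCH);
    }
}
```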
<h3>3.4.7 Highlighting</h3>
<p>The <code>lhighlight()</code> ancillary operator works just like <code>lscore()</code> (remember the correlation id shared with <code>lcontains()</code>), but returns a <code>VARCHAR2</code> text with the hit words (tokens) highlighted. The <em>HTML</em> tag used to mark hit words is <code>&lt;B&gt;&lt;/B&gt;</code>; like the fragment separator (&#8230;) and the maximum number of fragments (4), it cannot be customized per call. However, starting with the <code>2.4.1.1.0</code> release of <em>LDI</em>, these parameters can be customized statically through <code>alter index ... parameters();</code>, see below. Here is a highlighting example:</p>
<p><pre class="brush: sql;">
  SELECT /*+ DOMAIN_INDEX_SORT */ subject, lscore(1) sc, lhighlight(1) txt
  FROM emails
    where lcontains(bodytext, 'security OR mysql', 'subject:ASC', 1)&gt;0;

  SUBJECT
  SC
  TXT                                                                              
  ---
  Re: lucene injection
  .27477634
  On Dec 21, 2006, at 4:56 AM, Deepan wrote: I am bothered about &lt;strong&gt;security&lt;/strong&gt; problems with lucene. Is it vulnerable to any kind of injection like &lt;strong&gt;mysql&lt;/strong&gt; injection? many times the query from user is passed to lucene for search without validating. Rest easy. There are no known &lt;strong&gt;security&lt;/strong&gt; issues with Lucene, and it has even undergone a recent static code analysis by Fortify (see the lucene-dev e-mail list).
</pre></p>
<p>Highlighting only works with columns of type <code>VARCHAR2</code>, <code>CLOB</code> and <code>XMLType</code>. You can highlight even if your master column is not indexed or stored. For example, the index creation DDL below specifies <code>IncludeMasterColumn:false</code>, which means the actual <code>XMLType</code> representation of the <a href="http://marceloochoa.blogspot.com/2007/12/uploading-wikipedia-dumps-to-oracle.html">Spanish Wikipedia page dump</a> is not indexed as raw text; only the virtual columns <code>title</code>, <code>comment</code>, <code>text</code> and <code>revisionDate</code> are processed by <em>Lucene</em>. However, the <em>LDI</em> <code>TextHighlight</code> <em>Java</em> method must receive the full <code>value(p) XMLType</code> from the RDBMS engine to feed the <em>Lucene</em> <code>Highlighter</code> class with all the information that is not necessarily contained in the <em>Lucene</em> index.</p>
<p><pre class="brush: sql;">
  create index pages_lidx_all on pages p (value(p))
    indextype is Lucene.LuceneIndex
    parameters('PopulateIndex:false;
      DefaultColumn:text;
      SyncMode:Deferred;
      LogLevel:INFO;
      Analyzer:org.apache.lucene.analysis.SpanishWikipediaAnalyzer;
      ExtraCols:extractValue(object_value,''/page/title'') &quot;title&quot;,
        extractValue(object_value,''/page/revision/comment'') &quot;comment&quot;,
        extract(object_value,''/page/revision/text/text()'') &quot;text&quot;,
        extractValue(object_value,''/page/revision/timestamp'') &quot;revisionDate&quot;;
      FormatCols:revisionDate(day);
      IncludeMasterColumn:false;
      LobStorageParameters:PCTVERSION 0 ENABLE STORAGE IN ROW CHUNK 32768 CACHE READS FILESYSTEM_LIKE_LOGGING');

  select /*+ DOMAIN_INDEX_SORT */ lhighlight(1), extractValue(object_value,'/page/title')
  from pages
    where lcontains(object_value, 'rownum:[1 TO 10] AND (musica tango rock)', 1)&gt;0;

  &lt;strong&gt;Música&lt;/strong&gt; de Argentina... [[Latinoamérica|latinoamericanos]] con más desarrollo en su [[&lt;strong&gt;música&lt;/strong&gt;]]. Se encuentra una gran... argentinos, un instrumento tradicional andino]] Aún se mantiene la &lt;strong&gt;música&lt;/strong&gt; de los [[Indígenas_en_Argentina... de grandes corrientes de [[inmigración|inmigrantes]] europeos, la &lt;strong&gt;música&lt;/strong&gt; argentina se enriqueció

Música de Argentina

musical emparentado con la [[habanera]] y el [[&lt;strong&gt;tango&lt;/strong&gt; (&lt;strong&gt;música&lt;/strong&gt;)|&lt;strong&gt;tango&lt;/strong&gt;]].
  ==Diferencias con el &lt;strong&gt;tango&lt;/strong&gt;==
  Aunque tanto la milonga como el &lt;strong&gt;tango&lt;/strong&gt; están en [[compás]] de 2/4, las 8 [[semicorchea]]s de la milonga están distribuidas en 3 + 3 +  2 en cambio el &lt;strong&gt;tango&lt;/strong&gt; posee un ritmo más «cuadrado». Las letras...]] criticó en algún momento el &lt;strong&gt;tango&lt;/strong&gt; y prefirió la  milonga, que no trasmite la melancolía

Milonga (género musical)
</pre></p>
<p>Parameters supported by highlighting functions are:</p>
<ul>
<li><code>Formatter</code>, a valid class name that implements <em>Lucene&#8217;s</em> <code>Formatter</code> interface and a constructor with no arguments. The default value is <code>org.apache.lucene.search.highlight.SimpleHTMLFormatter</code>.</li>
<li><code>MaxNumFragmentsRequired</code>, number of text fragments returned by the <code>lhighlight()</code> function, the default value is 4.</li>
<li><code>FragmentSize</code>, the size of each fragment returned, the default value is <code>100</code>.</li>
<li><code>FragmentSeparator</code>, the string used as fragment separator, the default is &#8220;&#8230;&#8221;. Note that you can not use &#8220;;&#8221; or &#8220;:&#8221; as fragment separator because these tokens are used as parameter and value delimiters with the <code>create</code> or <code>alter index ... parameters();</code> statements.</li>
</ul>
<p>So far there is no customization allowed by passing any constructor arguments to the <code>Formatter</code> class, but you can easily create your own formatter to call <code>SimpleHTMLFormatter</code> with arguments like this:</p>
<p><pre class="brush: sql;">
  create or replace and compile java source named &quot;org.apache.lucene.search.highlight.MyHTMLFormatter&quot;
  as
  package org.apache.lucene.search.highlight;

  public class MyHTMLFormatter extends SimpleHTMLFormatter {
    public MyHTMLFormatter() {
      super(&quot;&lt;span class=\&quot;myhighlightclass\&quot;&gt;&quot;,&quot;&lt;/span&gt;&quot;);
    }
  }
  /

  alter index emailbodyText
    parameters('Formatter:org.apache.lucene.search.highlight.MyHTMLFormatter;
      MaxNumFragmentsRequired:3;
      FragmentSeparator:...;
      FragmentSize:50');
</pre></p>
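<p>All of these values end up in a single <code>parameters()</code> string, which is why the separator must not contain <code>;</code> or <code>:</code>. The Java helper <code>HighlightParams</code> below is hypothetical; it builds the highlighting part of such a string for the four parameters listed above and, as a precaution, applies the delimiter restriction to every value, not only to <code>FragmentSeparator</code>.</p>

```java
// Hypothetical helper, not part of LDI: builds the highlighting part of an
// alter index ... parameters('...') string and rejects values containing ';' or ':'.
final class HighlightParams {
    private static String entry(String name, String value) {
        if (value.isEmpty() || value.contains(";") || value.contains(":")) {
            throw new IllegalArgumentException(name + " must be non-empty and free of ';' and ':'");
        }
        return name + ":" + value;
    }

    static String build(String formatter, int maxFragments, int fragmentSize, String separator) {
        if (maxFragments <= 0 || fragmentSize <= 0) {
            throw new IllegalArgumentException("fragment settings must be positive");
        }
        return String.join(";",
            entry("Formatter", formatter),
            entry("MaxNumFragmentsRequired", Integer.toString(maxFragments)),
            entry("FragmentSize", Integer.toString(fragmentSize)),
            entry("FragmentSeparator", separator));
    }
}
```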
<h3>3.4.8 Highlighting using pipeline table functions</h3>
<p><code>phighlight()</code> and <code>rhighlight()</code> provide a more general usage pattern for <em>Lucene&#8217;s</em> highlighting functionality. <code>phighlight()</code> receives an SQL query as a string and highlights a set of user-defined columns of the query result. <code>rhighlight()</code> receives a <code>SYS_REFCURSOR</code> argument and also highlights a set of user-defined columns but, unlike <code>phighlight()</code>, it requires the user to define the return type of the query, usually a <code>TABLE OF</code> collection, because the return type of a <code>SYS_REFCURSOR</code> cannot be known at compilation time. Both functions support the highlighting parameters introduced in § 3.4.7. Here are two examples of highlighting with pipeline table functions:</p>
<p><pre class="brush: sql;">
  SELECT * FROM
  TABLE(phighlight(
          'EMAILBODYTEXT',
          'security OR mysql',
          'SUBJECT,BODYTEXT',
          'select /*+ DOMAIN_INDEX_SORT FIRST_ROWS */ lscore(1) sc,e.*
           from eMails e where lcontains(bodytext,''security OR mysql'',''subject:ASC'',1)&gt;0'
      ));

  SELECT * FROM
  TABLE(rhighlight(
          'EMAILBODYTEXT',
          'lucene OR mysql',
          'SUBJECT,BODYTEXT',
          'EMAILRSET',
          CURSOR(select /*+ DOMAIN_INDEX_SORT FIRST_ROW */ lscore(1) sc,e.*
          from eMails e where lcontains(bodytext,'security OR mysql','subject:ASC',1)&gt;0)
      ));
</pre></p>
<p>The first three arguments of both pipeline functions are the same: the <em>LDI</em> index to use, the <em>Lucene</em> query expression (which should match the <code>lcontains()</code> argument in the query) and the list of columns to highlight. The last argument of <code>phighlight()</code> is a <code>VARCHAR2</code> holding the SQL query, which is executed with the DBMS_SQL package. Note the doubled single quotes used to escape quotes inside that string. <code>rhighlight()</code> takes two further arguments. The first is the type returned by the cursor, <code>EMAILRSET</code>, a collection of the <code>EMAILR</code> object type, which holds all columns of the table <code>EMAILS</code> plus the score returned by the <code>lscore()</code> function (see the example below). The last is a <code>CURSOR</code> argument, which may wrap any SQL query.</p>
<p><pre class="brush: sql;">
  CREATE OR REPLACE TYPE EMAILR AS OBJECT (
    sc NUMBER,
    emailFrom VARCHAR2(256),
    emailTo VARCHAR2(256),
    subject VARCHAR2(4000),
    emailDate DATE,
    bodyText CLOB
  );
  /

  CREATE OR REPLACE TYPE EMAILRSET AS TABLE OF EMAILR;
  /
</pre></p>
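<p>Because <code>phighlight()</code> receives the query as a <code>VARCHAR2</code> literal, every single quote inside it has to be doubled, as in <code>''security OR mysql''</code> above. When such a call is generated by an application, a helper like the hypothetical <code>SqlLiteral.toSqlLiteral</code> below (not part of LDI) can do the escaping. It is a minimal sketch of standard SQL string-literal quoting.</p>

```java
// Hypothetical helper, not part of LDI: wraps a SQL query in single quotes
// and doubles every embedded single quote, producing a valid SQL string literal.
class SqlLiteral {
    static String toSqlLiteral(String text) {
        return "'" + text.replace("'", "''") + "'";
    }
}
```

<p>When the call is issued through JDBC, passing the query text as a bind variable avoids the manual escaping altogether.</p>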
<h3>3.4.9 More like this functionality</h3>
<p>The <em>more like this</em> functionality of <em>Lucene</em> is provided by the <code>LDI</code> package <code>MoreLike</code>, function <code>this</code> (again overloaded to accept an additional owner name parameter), as follows.</p>
<p><pre class="brush: sql;">
  FUNCTION this(index_name IN VARCHAR2,
    x IN ROWID,
    f IN NUMBER DEFAULT 1,
    t IN NUMBER DEFAULT 10,
    minTermFreq IN NUMBER DEFAULT 2,
    minDocFreq IN NUMBER DEFAULT 5) RETURN sys.odciridlist;
</pre></p>
<p>A typical use case follows. The anonymous PL/SQL block takes the first <code>ROWID</code> returned by the first query as the pivot row and then expands the result set with other rows that share terms with &#8220;procedure (C, Java or PL/SQL), optionally qualified&#8221;. Note that the &#8220;C&#8221; token is not taken into account because it is treated as a stop word. Refer to Appendix D.6 for a full explanation of each parameter.</p>
<p><pre class="brush: sql;">
  select rowid,lscore(1),text from test_source_big
    where lcontains(text,'&quot;procedure java&quot;~10',1)&gt;0 order by lscore(1) desc;

  AAAOaPAAEAAAAnnABV 1.00000003 procedure (C, Java or PL/SQL), optionally qualified
  AAAOaPAAEAAAA0aAAV  .84852819  STATIC PROCEDURE refreshParameterCache as LANGUAGE JAVA NAME
  ...

  declare
    ridlist sys.odciridlist;
  begin
    ridlist := MoreLike.this(index_name=&gt;'SOURCE_BIG_LIDX',x=&gt;'AAAOaPAAEAAAAnnABV',minTermFreq=&gt;1);
    FOR i IN (
      select rowid,text from test_source_big
        where rowid in (select * from table(ridlist_table(ridlist)))
    ) LOOP
      dbms_output.put_line('rowid: '||i.rowid||' text: '||i.text);
    END LOOP;
  end;
  /

  rowid: AAAOaPAAEAAAAhLAAc text: after issuing insert, update, delete or anonymous PL/SQL calls
  rowid: AAAOaPAAEAAAAjrAAo  text: QUALIFIED_SQL_NAME
  rowid: AAAOaPAAEAAAAk5AAe text: ORA-06502: PL/SQL: numeric or value error: character string buffer
  ...
  rowid: AAAOaPAAEAAAAtXAAb text: The name of the Java class, PL/SQL package or object type implementing
</pre></p>
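<p>The <code>minTermFreq</code> and <code>minDocFreq</code> parameters decide which terms of the pivot row are used to look for similar rows. The Java sketch below illustrates that filtering, followed by a tf·idf ranking in the spirit of Lucene's MoreLikeThis. It is not LDI's code: the idf formula <code>1 + ln(numDocs / (docFreq + 1))</code>, the stop-word set and the <code>maxTerms</code> cut-off are assumptions for illustration, and <code>MoreLikeSketch</code> is a hypothetical name.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "more like this" term selection; not LDI's source code.
class MoreLikeSketch {
    static List<String> interestingTerms(List<String> pivotTokens, Map<String, Integer> docFreq,
                                         int numDocs, Set<String> stopWords,
                                         int minTermFreq, int minDocFreq, int maxTerms) {
        // Term frequencies in the pivot row, ignoring stop words such as "c".
        Map<String, Integer> tf = new HashMap<>();
        for (String token : pivotTokens) {
            if (!stopWords.contains(token)) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        // tf * idf score for terms passing both thresholds (idf formula is an assumption).
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            if (e.getValue() < minTermFreq || df == 0 || df < minDocFreq) {
                continue;
            }
            double idf = 1.0 + Math.log((double) numDocs / (df + 1));
            score.put(e.getKey(), e.getValue() * idf);
        }
        // Highest scores first, at most maxTerms terms.
        List<String> terms = new ArrayList<>(score.keySet());
        terms.sort((a, b) -> Double.compare(score.get(b), score.get(a)));
        return new ArrayList<>(terms.subList(0, Math.min(maxTerms, terms.size())));
    }
}
```

<p>Rare terms receive a higher idf, so they dominate the selection as long as they still reach <code>minDocFreq</code>.</p>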
<h3>3.4.10 Facets</h3>
<p>Starting with <em>Lucene Domain Index</em> release <code>2.4.1.1.0</code>, <em>Lucene&#8217;s</em> facets functionality is available through the SQL aggregate function <code>lfacets()</code>. Its <code>input</code> parameter is an encoded string containing the <em>LDI</em> (schema.)index name followed by a list of categories. Because an aggregate function only accepts a single scalar argument, the index name and the category list are encoded as one comma-separated value. Using the index created in § 2.5, the categories in the example below are written in <em>Lucene</em> query syntax and prefixed by <code>TEXT:</code>, the indexed column, with &#8220;procedure&#8221; as the main category and &#8220;java&#8221; as the sub-category.</p>
<p><pre class="brush: sql;">
  CREATE OR REPLACE function lfacets(input varchar2)
  return agg_tbl
  parallel_enable aggregate using facets_agg_type;
  /

  select lfacets('SOURCE_BIG_LIDX,TEXT:procedure,TEXT:java') from dual;
</pre></p>
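<p>Because the aggregate receives a single scalar, the index name and the categories travel as one comma-separated string. The hypothetical helpers below (not part of LDI) build and split such a string. They assume, as the encoding implies, that no category contains a comma.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helpers for the comma-separated lfacets() input string.
class FacetInput {
    // Builds "INDEX,cat1,cat2,..."; commas inside a category cannot be represented.
    static String encode(String indexName, String... categories) {
        List<String> parts = new ArrayList<>();
        parts.add(indexName);
        for (String c : categories) {
            if (c.contains(",")) {
                throw new IllegalArgumentException("comma in category: " + c);
            }
            parts.add(c);
        }
        return String.join(",", parts);
    }

    // Splits the string back into the index name (element 0) and the categories.
    static List<String> decode(String input) {
        return Arrays.asList(input.split(","));
    }
}
```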
<p>Creating a table of categories and linking its rows in a parent-child relationship is one way to generate facets automatically, for example:</p>
<p><pre class="brush: sql;">
  create table source_categories (
    cat_code    number(4),
    cat_name    varchar2(256),
    cat_parent  number(4),
    CONSTRAINT PK_SOURCE_CATEGORIES PRIMARY KEY (cat_code),
    CONSTRAINT FK_CAT_PARENT FOREIGN KEY (cat_parent)
      REFERENCES source_categories (cat_code)
  );

  insert into source_categories values (1,'TEXT:procedure',null);
  insert into source_categories values (2,'TEXT:function',null);
  ...
  insert into source_categories values (6,'TEXT:java',1);
  insert into source_categories values (7,'TEXT:(pl sql)',1);
  insert into source_categories values (8,'TEXT:wrapped',1);
  ...
  insert into source_categories values (21,'line:[1 TO 1000]',1);
  insert into source_categories values (22,'line:[1001 TO 2000]',1);
  insert into source_categories values (23,'line:[2001 TO 3000]',1);
</pre></p>
<p>Now we can query the above table by calling <code>lfacets()</code> with a category and, optionally, a sub-category. Note that we use the <code>ljoin()</code> function, which converts the <code>agg_tbl</code> result into a comma-separated string of the categories followed by the hit count. The first row returned has no sub-category because its parent column value is <code>null</code>. The trailing <code>5116</code> in parentheses is the number of rows that match the token &#8220;procedure&#8221;. The result <code>TEXT:procedure,line:[1001 TO 2000]</code> denotes the logical intersection of the set of rows containing the token &#8220;procedure&#8221; with the set of rows matching &#8220;line:[1001 TO 2000]&#8221;. The <code>group by cat_code</code> causes the <em>Oracle ODCI API</em> to first calculate the bit set for &#8220;procedure&#8221; and then iterate over all the sub-categories (&#8220;java&#8221;, &#8220;pl sql&#8221;, &#8220;wrapped&#8221;, ...), performing bit-set operations. This is fast, and once a facet has been computed it is stored as a filter in the <em>Lucene Domain Index</em> memory structures.</p>
<p><pre class="brush: sql;">
  select ljoin(lfacets('SOURCE_BIG_LIDX,' ||
     case level when (1) then cat_name
     else prior cat_name || ',' || cat_name end)) facet,
       cat_code, level
     from source_categories
     start with cat_parent is null
     connect by prior cat_code = cat_parent
     group by cat_code,level;

  FACET                                   CAT_CODE   LEVEL
  --------------------------------------------------------
  TEXT:procedure(5116)                    1          1
  TEXT:function(5574)                     2          1
  TEXT:trigger(96)                        3          1
  TEXT:package(860)                       4          1
  TEXT:(object type)(5140)                5          1
  TEXT:procedure,TEXT:java(9)             6          2
  .....
  TEXT:procedure,line:[1 TO 1000](3)      21         2
  TEXT:procedure,line:[1001 TO 2000](615) 22         2
  ...
</pre></p>
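<p>The bit-set intersection described above can be illustrated with <code>java.util.BitSet</code>, using one bit per document. The parent category's set is computed once and intersected with each sub-category's set, and the facet count is the cardinality of the result. This sketch uses made-up document numbers and hypothetical names. It shows the idea, not the LDI implementation.</p>

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: facet counts as bit-set intersections (made-up data).
class FacetCounts {
    // Counts, for each sub-category, the documents that also match the parent category.
    static Map<String, Integer> count(BitSet parent, Map<String, BitSet> children) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : children.entrySet()) {
            BitSet both = (BitSet) parent.clone(); // keep the parent set intact for reuse
            both.and(e.getValue());                 // logical intersection
            counts.put(e.getKey(), both.cardinality());
        }
        return counts;
    }

    // Builds a BitSet with the given document numbers set.
    static BitSet docs(int... ids) {
        BitSet b = new BitSet();
        for (int id : ids) {
            b.set(id);
        }
        return b;
    }
}
```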
<p>When the number of rows or categories is large, a materialized view can serve as a cache for the facet computation. Such a materialized view can be queried like any other table, access to it is fast, and it can be indexed as well.</p>
<p><pre class="brush: sql;">
  CREATE MATERIALIZED VIEW source_facets
  AS
  select ljoin(lfacets('SOURCE_BIG_LIDX,' ||
     case level when (1) then cat_name
     else prior cat_name || ',' || cat_name end)) facet,
       cat_code, level
     from source_categories
     start with cat_parent is null
     connect by prior cat_code = cat_parent
     group by cat_code,level;
</pre></p>
<h3>3.4.11 Terms pipeline table functions</h3>
<p>Starting with Lucene Domain Index 2.9.1.1.0, two pipeline table functions are available for iterating over the terms of the Lucene index structure. The first one, <code>high_freq_terms()</code>, has this signature:</p>
<p><pre class="brush: sql;">
  FUNCTION high_freq_terms(index_name VARCHAR2,
    term_name  VARCHAR2,
    num_terms  NUMBER) RETURN term_info_set;
</pre></p>
<p>It returns the Top-N (<code>num_terms</code>) most frequent terms of the whole index or of a particular field. <code>term_info_set</code> is defined as:</p>
<p><pre class="brush: sql;">
  TYPE term_info AS OBJECT (
    term     VARCHAR2(4000),
    docFreq  NUMBER(10)
  );
  TYPE term_info_set AS TABLE OF term_info;
</pre></p>
<p>You can query your index by using:</p>
<p><pre class="brush: sql;">
  select * from table(high_freq_terms('SOURCE_BIG_LIDX','TEXT',10));
  select * from table(high_freq_terms('SOURCE_BIG_LIDX',null,10));
  select * from table(high_freq_terms('SOURCE_BIG_LIDX','line',100));
</pre></p>
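<p>Selecting the Top-N terms does not require sorting the whole term dictionary. A min-heap that never holds more than N entries is enough, which is the same idea as the <code>TermInfoQueue</code> mentioned later in this section. The Java sketch below shows it with <code>java.util.PriorityQueue</code>. <code>TermInfo</code> mirrors the <code>term_info</code> type above, and the other names are hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Sketch: Top-N terms by docFreq using a min-heap that never holds more than N entries.
class TopTerms {
    // Mirrors the term_info SQL object type (term, docFreq).
    static final class TermInfo {
        final String term;
        final int docFreq;

        TermInfo(String term, int docFreq) {
            this.term = term;
            this.docFreq = docFreq;
        }
    }

    static List<TermInfo> highFreqTerms(Map<String, Integer> termDocFreqs, int numTerms) {
        Comparator<TermInfo> byDocFreq = Comparator.comparingInt(t -> t.docFreq);
        PriorityQueue<TermInfo> heap = new PriorityQueue<>(byDocFreq); // least frequent on top
        for (Map.Entry<String, Integer> e : termDocFreqs.entrySet()) {
            heap.add(new TermInfo(e.getKey(), e.getValue()));
            if (heap.size() > numTerms) {
                heap.poll(); // evict the least frequent term seen so far
            }
        }
        List<TermInfo> result = new ArrayList<>(heap);
        result.sort(byDocFreq.reversed()); // descending docFreq, like high_freq_terms()
        return result;
    }
}
```

<p>Each term costs O(log N) heap work, so a pass over T terms is O(T log N) instead of the O(T log T) needed to sort them all.</p>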
<p>The second function, <code>index_terms()</code>, has this signature:</p>
<p><pre class="brush: sql;">
  FUNCTION index_terms(index_name VARCHAR2,
    term_name  VARCHAR2) RETURN term_info_set;

  select * from table(index_terms('SOURCE_BIG_LIDX','TEXT')) order by docFreq desc;
  select * from table(index_terms('SOURCE_BIG_LIDX','TEXT'));
  select * from table(index_terms('SOURCE_BIG_LIDX',null)) where rownum
  select * from (select * from table(index_terms('SOURCE_BIG_LIDX','line')) order by docFreq desc) where rownum
</pre></p>
<p>For both functions, if the term argument is NULL, the function iterates over all index terms. The natural order of <code>high_freq_terms()</code> is descending by docFreq, whereas <code>index_terms()</code> is ordered by term_name:term_value ascending. Note that if you pass a non-NULL term, <code>index_terms()</code> starts with the first value of that term but does not stop when all values of that term have been returned. This is similar to the Lucene Java method <code>reader.terms(new Term(term))</code>. Here is an example that iterates over a specific term name only:</p>
<p><pre class="brush: sql;">
  BEGIN
     FOR term_rec IN (SELECT * FROM table(index_terms('SOURCE_BIG_LIDX','line'))) LOOP
        -- index_terms() does not stop at the end of the 'line' field, so leave
        -- the loop at the first term that does not start with 'line:'
        EXIT WHEN substr(term_rec.term,1,length('line:'))&lt;&gt;'line:';
        -- process the term
        dbms_output.put_line('Name = ' || term_rec.term || ' ' || term_rec.docFreq);
     END LOOP;
  END;
  /
</pre></p>
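<p>The behaviour described above, starting at the first term of a field and continuing past its last one, is that of a seek into a sorted term dictionary. The Java sketch below models the dictionary as a <code>TreeMap</code> keyed by <code>field:value</code> strings, which is an assumption based on the <code>TEXT:in</code> format shown in this section. Like the PL/SQL loop above, it stops at the first term that belongs to another field. <code>FieldTerms</code> is a hypothetical name.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: iterate over the terms of one field in a sorted "field:value" dictionary.
class FieldTerms {
    static List<String> termsOfField(TreeMap<String, Integer> dictionary, String field) {
        String prefix = field + ":";
        List<String> result = new ArrayList<>();
        // tailMap(prefix, true) starts at the first key >= "field:", like a seek;
        // without the prefix check the iteration would continue into later fields.
        for (Map.Entry<String, Integer> e : dictionary.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            result.add(e.getKey() + " " + e.getValue());
        }
        return result;
    }
}
```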
<p>You can also use <code>index_terms()</code> to get the Top-N terms ordered by docFreq, for example:</p>
<p><pre class="brush: sql;">
  select * from (select * from table(index_terms('SOURCE_BIG_LIDX',null))
  order by docFreq desc) where rownum &lt;= 10;

  select * from table(high_freq_terms('SOURCE_BIG_LIDX',null,10));

  TEXT:in             24952
  TEXT:varchar        16996
  ...
  TEXT:return          6241
  Elapsed: 00:00:00.02
</pre></p>
<p>Both queries are semantically equivalent, but <code>high_freq_terms()</code> is more efficient. It sorts with a TermInfoQueue structure, caches its result once computed, and does not create a large number of term_info objects that the RDBMS engine then has to sort. The <code>lfreqterms</code> ancillary operator is a complementary function. It requires a column indexed with the ANALYZED_WITH_VECTORS option of the FormatCols index parameter. Example usage:</p>
<p><pre class="brush: sql;">
  select TERMVECTOR TXT, LFREQTERMS(1) FQTERMS from t1 where lcontains(termvector,'two',1)&gt;0;

  TXT                                                           FQTERMS
  ---                                                           -------
  one two two three three three                       LUCENE.TERM_INFO_SET('LUCENE.TERM_INFO(one,1)','LUCENE.TERM_INFO(three,3)','LUCENE.TERM_INFO(two,2)')
  one two two three three three                       LUCENE.TERM_INFO_SET('LUCENE.TERM_INFO(one,1)','LUCENE.TERM_INFO(three,3)','LUCENE.TERM_INFO(two,2)')
  one two two three three three                       LUCENE.TERM_INFO_SET('LUCENE.TERM_INFO(one,1)','LUCENE.TERM_INFO(three,3)','LUCENE.TERM_INFO(two,2)')
  two two three three three four four four four    LUCENE.TERM_INFO_SET('LUCENE.TERM_INFO(four,4)','LUCENE.TERM_INFO(three,3)','LUCENE.TERM_INFO(two,2)')
</pre></p>
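<p>Each <code>FQTERMS</code> value lists the terms of the row with their frequency within that row, ordered by term. Assuming plain whitespace tokenization, which simplifies whatever analyzer the index actually uses, the Java sketch below reproduces the values shown. <code>TermFrequencies</code> is a hypothetical name.</p>

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: per-row term frequencies sorted by term (whitespace tokenization only).
class TermFrequencies {
    static Map<String, Integer> of(String text) {
        Map<String, Integer> freqs = new TreeMap<>(); // TreeMap keeps terms in ascending order
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }
}
```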
<h3>3.4.12 Did You Mean functionality</h3>
<p>Starting with Lucene Domain Index 2.9.2.1.0, Lucene&#8217;s &#8220;Did You Mean&#8221; functionality is available as an extended LDI feature. It uses the Lucene SpellChecker library to build a dictionary index from the main index. The dictionary index is then merged into the main index.</p>
<p><pre class="brush: sql;">
  PROCEDURE indexDictionary(
    index_name   IN VARCHAR2,
    spellColumns IN VARCHAR2 DEFAULT null,
    distancealg  IN VARCHAR2 DEFAULT 'Levenstein')
</pre></p>
<p>This procedure builds the dictionary index.</p>
<p>You can create the dictionary by using:</p>
<p><pre class="brush: sql;">
  call didyoumean.indexdictionary('SOURCE_BIG_LIDX');
  Call completed.
  Elapsed: 00:01:11.61

  exec didyoumean.indexdictionary('EMAILBODYTEXT','BODYTEXT,subject,emailFrom,emailTo','NGram');
  PL/SQL procedure successfully completed.
  Elapsed: 00:00:01.62
</pre></p>
<p>Only <code>index_name</code> is mandatory. If the <code>spellColumns</code> parameter is NULL, the master column of the main index is used. By default, the Levenshtein distance algorithm (a.k.a. edit distance, parameter value <code>'Levenstein'</code>) is applied. The other options are <code>'Jaro'</code> (the Jaro-Winkler metric) and <code>'NGram'</code> (n-gram distance).</p>
<p>Note: the dictionary structure creates the &#8220;word&#8221;, &#8220;gramN&#8221;, &#8220;startN&#8221; and &#8220;endN&#8221; Lucene fields, so be careful if your main index already has fields with these names. For 3- and 4-grams, the structure of this index is:</p>
<p><pre class="brush: plain;">
  Field     Example
  -----     -------
  word      kings
  gram3     kin, ing, ngs
  gram4     king, ings
  start3    kin
  start4    king
  end3      ngs
  end4      ings
</pre></p>
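<p>The fields listed above can be derived mechanically from the word. The Java sketch below builds them for a range of n-gram sizes and reproduces the <em>kings</em> example. The n-grams of each size are joined with commas only for display, as in the listing. The sketch illustrates the index structure and is not the SpellChecker source code.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: the dictionary fields for one word and n-gram sizes minN..maxN.
class NGramFields {
    static Map<String, String> fields(String word, int minN, int maxN) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("word", word);
        for (int n = minN; n <= maxN && n <= word.length(); n++) {
            List<String> grams = new ArrayList<>();
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n)); // every n-character window
            }
            doc.put("gram" + n, String.join(", ", grams));
            doc.put("start" + n, grams.get(0));              // leading n-gram
            doc.put("end" + n, grams.get(grams.size() - 1)); // trailing n-gram
        }
        return doc;
    }
}
```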
<p>The <code>suggest()</code> function queries the dictionary index:</p>
<p><pre class="brush: sql;">
  FUNCTION suggest (
    index_name  IN VARCHAR2,
    cmpval      IN VARCHAR2,
    highlight   IN VARCHAR2 DEFAULT null,
    distancealg IN VARCHAR2 DEFAULT 'Levenstein'
  ) RETURN VARCHAR2;
</pre></p>
<p>You can query the dictionary as follows:</p>
<p><pre class="brush: sql;">
  select didyoumean.suggest('SOURCE_BIG_LIDX','sorce') suggestion from dual;

  SUGGESTION
  ----------
  source
  Elapsed: 00:00:00.31

  select didyoumean.suggest('SOURCE_BIG_LIDX','sorce','b') suggestion from dual;
  SUGGESTION
  ----------
  &lt;strong&gt;source&lt;/strong&gt;
  Elapsed: 00:00:00.09

  select didyoumean.suggest('SOURCE_BIG_LIDX','sorce','b','Jaro') suggestion from dual;
  SUGGESTION
  ----------
  &lt;strong&gt;source&lt;/strong&gt;
  Elapsed: 00:00:00.07

  select didyoumean.suggest('EMAILBODYTEXT','lucene searhc','i') suggestion from dual;
  SUGGESTION
  ----------
  lucene &lt;em&gt;search&lt;/em&gt;
  Elapsed: 00:00:00.06

  select didyoumean.suggest('EMAILBODYTEXT','lucine injetion','b','Levenstein') suggestion from dual;
  SUGGESTION
  ----------
  &lt;strong&gt;lucene&lt;/strong&gt; &lt;strong&gt;injection&lt;/strong&gt;
  Elapsed: 00:00:00.06
</pre></p>
<p>The <code>index_name</code> and <code>cmpval</code> (the word or phrase to respell) parameters are mandatory. Optionally, you can define the highlight to use (e.g. <code>b</code> for bold, <code>i</code> for italic) and the distance algorithm to apply.</p>
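<p>In the outputs above, only the words that were actually corrected carry the highlight markup. In the <em>lucene searhc</em> example, <em>lucene</em> is returned unchanged. The Java sketch below reproduces that behaviour under two assumptions: the suggestion has the same number of words as the input, and the caller supplies the opening and closing markup. <code>SuggestHighlighter</code> is a hypothetical name and is not part of the <code>didyoumean</code> package.</p>

```java
// Sketch: wrap only the words of a suggestion that differ from the input.
class SuggestHighlighter {
    static String highlightChanges(String input, String suggestion, String open, String close) {
        String[] in = input.trim().split("\\s+");
        String[] out = suggestion.trim().split("\\s+");
        if (in.length != out.length) {
            throw new IllegalArgumentException("sketch assumes word-by-word suggestions");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < out.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            boolean changed = !out[i].equalsIgnoreCase(in[i]); // corrected word?
            sb.append(changed ? open + out[i] + close : out[i]);
        }
        return sb.toString();
    }
}
```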
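<p>The <code>lautocomplete()</code> pipeline table function returns up to <code>num_terms</code> terms of the field <code>term_name</code> that start with <code>term_value</code>, together with their <code>docFreq</code>:</p>
<p><pre class="brush: sql;">
  FUNCTION lautocomplete (
    index_name IN VARCHAR2,
    term_name  IN VARCHAR2 DEFAULT NULL,
    term_value IN VARCHAR2 DEFAULT '__ALL__',
    num_terms  IN NUMBER DEFAULT 10
  ) RETURN term_info_set;
</pre></p>
<p>For example:</p>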
<p><pre class="brush: sql;">
  select * from table(lautocomplete('DICC_LIDX','TERM','th',15)) t;

  TERM                            DOCFREQ
  -----------------------------   ------
  there                           3
  theory                          2
  thaw                            2
  then                            2
  therefore                       2
  thence                          2
  ....
  thank                           1
</pre></p>
<p>The <code>ldidyoumean()</code> pipeline table function returns up to <code>numSug</code> spelling suggestions for <code>cmpval</code> as rows, optionally highlighted:</p>
<p><pre class="brush: sql;">
  FUNCTION ldidyoumean (
    index_name  IN VARCHAR2,
    cmpval      IN VARCHAR2,
    numSug      IN NUMBER DEFAULT 10,
    highlight   IN VARCHAR2 DEFAULT null,
    distancealg IN VARCHAR2 DEFAULT 'Levenstein'
  ) RETURN term_info_set;
</pre></p>
<p>For example:</p>
<p><pre class="brush: sql;">
  select * from table(ldidyoumean('DICC_LIDX', 'atention', 5, 'b', 'Levenstein')) t;

  TERM                   DOCFREQ
  --------------------   -------
  &lt;strong&gt;attention&lt;/strong&gt;       2
  &lt;strong&gt;intention&lt;/strong&gt;       1
  &lt;strong&gt;detention&lt;/strong&gt;       1
  &lt;strong&gt;mention&lt;/strong&gt;         3
  &lt;strong&gt;attenuation&lt;/strong&gt;     1
</pre></p>
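<p>The <code>lautocomplete()</code> output shown earlier can be modelled as a prefix filter followed by a Top-N selection by <code>docFreq</code>. The Java sketch below does exactly that over an in-memory map of terms. It breaks ties alphabetically, whereas the LDI output above orders tied terms differently. <code>Autocomplete</code> is a hypothetical name.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: autocomplete as a prefix filter plus Top-N by docFreq (illustrative only).
class Autocomplete {
    static List<String> complete(Map<String, Integer> termDocFreqs, String prefix, int numTerms) {
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termDocFreqs.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                matches.add(e);
            }
        }
        // Most frequent first; ties broken alphabetically to keep the output stable.
        matches.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(numTerms, matches.size()); i++) {
            result.add(matches.get(i).getKey());
        }
        return result;
    }
}
```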
<p>The <code>lsimilarity</code> ancillary operator computes the similarity between the indexed column and the query passed to <code>lcontains()</code> using the Levenshtein distance algorithm. Example:</p>
<p><pre class="brush: sql;">
  select lsimilarity(1) sim1, term t from dicc where lcontains(term, 'there~0.7', 1) &gt; 0;

  SIM1                          T
  ----                          -
  1                             there
  1                             there
  1                             there
  0.800000011920928955078125    where
  0.800000011920928955078125    theme
  0.800000011920928955078125    here
  0.800000011920928955078125    here

  select lsimilarity(1) sim1, term t from dicc where lcontains(term, 'there*', 1) &gt; 0;

  SIM1                          T
  ----                          -
  1                             there
  0.5555555820465087890625      therefore
  1                             there
  1                             there
  0.5555555820465087890625      therefore
</pre></p>
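<p>The <code>lsimilarity</code> values above are consistent with the formula 1 - d / max(|a|, |b|), where d is the Levenshtein (edit) distance. For example, <em>there</em> and <em>where</em> differ by one substitution (1 - 1/5 = 0.8), and <em>therefore</em> needs four insertions (1 - 4/9 ≈ 0.5556). The Java sketch below implements that formula in double precision, although the printed values suggest LDI computes in single precision. The sketch is consistent with the published numbers but is not LDI's source code, and the class name is hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: similarity = 1 - editDistance / max(length), matching the lsimilarity values above.
class EditSimilarity {
    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just computed becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) distance(a, b) / max;
    }

    // Ranks candidates by descending similarity to the input; ties keep their input order.
    static List<String> rank(String input, List<String> candidates) {
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort((x, y) -> Double.compare(similarity(input, y), similarity(input, x)));
        return ranked;
    }
}
```

<p>As a cross-check, ranking the five <code>ldidyoumean()</code> candidates for <em>atention</em> with <code>rank</code> reproduces the order shown earlier: attention first, then intention and detention (tied), then mention and attenuation.</p>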
<h2>3.5 Synchronize</h2>
<p>When working with SyncMode:Deferred, you have to synchronize your index manually. That is, you apply pending changes such as inserts and updates to the Lucene Domain Index structure. Delete operations are always applied immediately, because the ODCI API does not accept the rowids of deleted rows afterwards.<br />
Here is an example:</p>
<p><pre class="brush: sql;">
  begin
    LuceneDomainIndex.sync('IT1');
    commit; -- release locks
  end;
  /
</pre></p>
<p>The <code>LuceneDomainIndex.sync</code> procedure requires a <code>VARCHAR2</code> argument holding the index object name. Index object names are usually capitalized and have the syntax SCHEMA_OWNER.IDX_NAME.</p>
<p>The synchronize operation can raise an exception if some of the rows being indexed are locked for update. In that case, release the locked rows first and re-sync the index.</p>
<p>An exclusive lock on the Lucene index storage is obtained during index synchronization, so you have to commit or roll back immediately after this operation to release it.</p>
<p>Since Lucene Domain Index 2.4.0.1.0, you can use <code>LuceneDomainIndex.sync('IT1')</code> or <code>LuceneDomainIndex.sync(USER,'IT1')</code>. Both calls are equivalent.</p>
<p>Note: Due to a limitation of the SYS.ODCIRidList array, you cannot enqueue more than 32767 additions or deletions. The Lucene implementation counts an update as one deletion plus one addition. This limitation will be removed in future releases of Lucene Domain Index.</p>
<h2>3.6 Optimize</h2>
<p>Optionally, you can optimize the Lucene index storage by executing:</p>
<p><pre class="brush: sql;">
  begin
    LuceneDomainIndex.optimize('IT1');
    commit; -- release locks
  end;
  /
</pre></p>
<p>Like the sync operation, this procedure obtains an exclusive lock on the Lucene index storage table and optimizes the Lucene index, for example by merging multiple segments into a new one. You can still run select (read-only) operations through the Lucene Domain Index while the optimization runs. Oracle&#8217;s multi-version read consistency (based on undo data) provides this, and once you commit, any other concurrent session automatically sees the index changes.</p>
<h2>3.7 XMLDB Export</h2>
<p><em>Lucene Domain Index</em> provides for a raw dump of the <em>Lucene</em> index directory files. You can perform this task by means of an <em>Oracle XMLDB</em> export operation that will create xdb resources, which in turn can be accessed by a variety of methods such as <em>FTP</em>, <em>HTTP</em> or <em>WebDAV</em>.</p>
<p>Before writing the raw dump to XMLDB, check that the access method in question is enabled. For FTP in <em>Oracle 11g</em>, for example, the protocol is switched off by default, and the FTP port is initially set to <code>0</code>. You can check the current FTP port using this statement:</p>
<p><pre class="brush: sql;">
  -- SYSDBA
  select extractValue(DBMS_XDB.cfg_get(),
    '/xdbconfig/sysconfig/protocolconfig/ftpconfig/ftp-port/text()')
  from dual;
</pre></p>
<p>Reset the port to an appropriate value as follows, and test FTP access immediately afterwards. You should at least be able to see the <code>ftp://thathost:1531/public</code> directory.</p>
<p><pre class="brush: sql;">
  -- SYSDBA
  DECLARE
    newconfig XMLType;
  BEGIN
    SELECT
      updateXML(DBMS_XDB.cfg_get(),
        '/xdbconfig/sysconfig/protocolconfig/ftpconfig/ftp-port/text()', 1531)
      INTO newconfig
      FROM DUAL;
    DBMS_XDB.cfg_update(newconfig);
    COMMIT;
  END;
  /

  -- or for short in 11g
  exec dbms_xdb.setFtpPort(1531);
</pre></p>
<p>The next step is to execute the utility method <code>LuceneDomainIndex.xdbExport()</code>, which may be called with the parameters <code>(schemaname, indexname)</code> or just <code>(indexname)</code>; the latter derives the schema name from the index object. It is essential to execute <code>commit;</code> afterwards, because the method itself does not commit and keeps holding all acquired locks until you commit (or roll back):</p>
<p><pre class="brush: sql;">
  begin
    LuceneDomainIndex.xdbExport('IT1');
    commit; -- makes change visible to Ftp or WebDAV
  end;
  /
</pre></p>
<p>For an index <code>IT1</code> of user <code>SCOTT</code>, the file resources will be available in this xdb directory: <code>ftp://thathost:1531/public/lucene/SCOTT.IT1</code>. Once you have copied the files to a regular file system, you can open the <em>Lucene</em> index with any <em>Lucene</em>-compatible application, for example <a href="http://code.google.com/p/luke/"><em>Luke</em></a>. Here are some screenshots of <em>Luke</em> analyzing several properties of a <em>Lucene</em> index.</p>
<p><a href="http://ludoix.files.wordpress.com/2011/03/ddgw7sjp_58g7qgh8fm.jpg"><img class="aligncenter size-medium wp-image-53" title="ddgw7sjp_58g7qgh8fm" src="http://ludoix.files.wordpress.com/2011/03/ddgw7sjp_58g7qgh8fm.jpg?w=300&#038;h=223" alt="" width="300" height="223" /></a></p>
<p><a href="http://ludoix.files.wordpress.com/2011/03/ddgw7sjp_57fq744zg8.jpg"><img class="aligncenter size-medium wp-image-52" title="ddgw7sjp_57fq744zg8" src="http://ludoix.files.wordpress.com/2011/03/ddgw7sjp_57fq744zg8.jpg?w=300&#038;h=223" alt="" width="300" height="223" /></a></p>
<p><a href="http://ludoix.files.wordpress.com/2011/03/ddgw7sjp_59fk7cwvgg.jpg"><img class="aligncenter size-medium wp-image-54" title="ddgw7sjp_59fk7cwvgg" src="http://ludoix.files.wordpress.com/2011/03/ddgw7sjp_59fk7cwvgg.jpg?w=300&#038;h=223" alt="" width="300" height="223" /></a></p>
<p><a href="http://ludoix.files.wordpress.com/2011/03/ddgw7sjp_60fgkrbgdd.jpg"><img class="aligncenter size-medium wp-image-55" title="ddgw7sjp_60fgkrbgdd" src="http://ludoix.files.wordpress.com/2011/03/ddgw7sjp_60fgkrbgdd.jpg?w=300&#038;h=223" alt="" width="300" height="223" /></a></p>
<h2>3.8 Exporting/Importing a functional index with the Oracle exp/imp tools</h2>
<p>You can include your Lucene Domain Index in an Oracle exp operation, because by default the Oracle exp tool exports the functional indexes of every table being exported. As mentioned earlier, Lucene Domain Index creates a table named IDX_NAME$T, which stores the Lucene files as BLOBs. A DBMS AQ queue is also created at index creation time and is associated with a table named IDX_NAME$QT. Both tables are flagged as SECONDARY, which means that you cannot export them on their own. However, they are automatically included when the Lucene Domain Index is included in the export.</p>
<p>During the import operation, Oracle re-creates the index using a create index &#8230; parameters(&#8216;your lucene parameters&#8217;) DDL statement. All Lucene Domain Index parameters are included except PopulateIndex, which is always stored as false in Oracle&#8217;s system views. Lucene Domain Index changes this parameter intentionally: if it were set to true, the import operation would re-create the Lucene index structure instead of using the information restored into the IDX_NAME$T table.</p>
<p>As an alternative to XMLDB Export or the Oracle exp tool, you can also export your Lucene Domain Index storage using a create table &#8230; as statement. For example:</p>
<p><pre class="brush: sql;">
  create table SOURCE_BIG_LIDX$T$BK as (select * from SOURCE_BIG_LIDX$T);
</pre></p>
<p>Because SOURCE_BIG_LIDX$T$BK is a regular table, you can now export it with the exp tool:</p>
<p><pre class="brush: plain;">
  -bash-3.2$  exp
  Export: Release 10.2.0.3.0 - Production on Fri Mar 27 02:46:18 2009
  Copyright (c) 1982, 2005, Oracle.  All rights reserved.

  Username: scott/tiger

  Connected to: Oracle Database 10g Release 10.2.0.3.0 - Production
  Enter array fetch buffer size: 4096 &gt;

  Export file: expdat.dmp &gt; SOURCE_BIG_LIDX_BK.dmp

  (2)U(sers), or (3)T(ables): (2)U &gt; 3

  Export table data (yes/no): yes &gt; yes

  Compress extents (yes/no): yes &gt;

  Export done in US7ASCII character set and AL16UTF16 NCHAR character set
  server uses AL32UTF8 character set (possible charset conversion)

  About to export specified tables via Conventional Path ...
  Table(T) or Partition(T:P) to be exported: (RETURN to quit) &gt; SOURCE_BIG_LIDX$T$BK

  . . exporting table           SOURCE_BIG_LIDX$T$BK         19 rows exported
  Table(T) or Partition(T:P) to be exported: (RETURN to quit) &gt;

  Export terminated successfully without warnings.
</pre></p>
<p>Now you can drop your index and re-create it without populating it:</p>
<p><pre class="brush: sql;">
  select count(*) from test_source_big where lcontains(text,'function')&gt;0;

  COUNT(*)
  --------
  6167

  drop index SOURCE_BIG_LIDX;

  Index dropped.

  create index source_big_lidx on test_source_big(text)
         indextype is lucene.LuceneIndex
         parameters('PopulateIndex:false;
           AutoTuneMemory:true;
           Analyzer:org.apache.lucene.analysis.SimpleAnalyzer;
           MergeFactor:500;
           FormatCols:line(0000);
           ExtraCols:line &quot;line&quot;');

  Index created.

  drop table SOURCE_BIG_LIDX$T$BK;

  Table dropped.
</pre></p>
<p>Now restore your .dmp file and check again whether your index returns the correct result:</p>
<p><pre class="brush: plain;">
  -bash-3.2$ imp scott/tiger

  Import: Release 10.2.0.3.0 - Production on Fri Mar 27 02:49:40 2009

  Copyright (c) 1982, 2005, Oracle.  All rights reserved.

  Connected to: Oracle Database 10g Release 10.2.0.3.0 - Production

  Import file: expdat.dmp &gt; SOURCE_BIG_LIDX_BK.dmp

  Enter insert buffer size (minimum is 8192) 30720&gt;

  Export file created by EXPORT:V10.02.01 via conventional path
  import done in US7ASCII character set and AL16UTF16 NCHAR character set
  import server uses AL32UTF8 character set (possible charset conversion)
  List contents of import file only (yes/no): no &gt;

  Ignore create error due to object existence (yes/no): no &gt;

  Import grants (yes/no): yes &gt;

  Import table data (yes/no): yes &gt;

  Import entire export file (yes/no): no &gt; yes

  . importing SCOTT's objects into SCOTT
  . importing SCOTT's objects into SCOTT
  . . importing table         &quot;SOURCE_BIG_LIDX$T$BK&quot;         19 rows imported
  Import terminated successfully without warnings.
</pre></p>
<p>First check that your index contains no information, then populate it with the saved Lucene index information:</p>
<p><pre class="brush: sql;">
  conn scott/tiger
  Connected.
  select count(*) from test_source_big where lcontains(text,'function')&gt;0;

  COUNT(*)
  --------
  0

  truncate table SOURCE_BIG_LIDX$T;

  Table truncated.

  insert into SOURCE_BIG_LIDX$T (select * from SOURCE_BIG_LIDX$T$BK);

  19 rows created.
  exit
  -- connect again to refresh the Lucene Domain Index in-memory structures
  conn scott/tiger
  Connected.
  select count(*) from test_source_big where lcontains(text,'function')&gt;0;

    COUNT(*)
  ----------
        6167
</pre></p>
<p>As you can see, the Lucene Domain Index structure can be exported alone, without exporting the master table. This is useful when you upgrade Lucene Domain Index, which requires all indexes to be dropped first, and you do not want to re-create a very big index.</p>
<h2>Doc Links</h2>
<p><a href="http://ludoix.wordpress.com/2011/03/11/lucene-domain-index-installing-and-testing">Previous / LDI Docs – 2 Installing and Testing</a><br />
<a href="http://ludoix.wordpress.com/2011/03/11/ldi-docs-4-locking-and-performance">Next / LDI Docs – 4 Locking and Performance</a></p>
<br />Filed under: <a href='http://ludoix.wordpress.com/category/documentation/'>Documentation</a> Tagged: <a href='http://ludoix.wordpress.com/tag/example/'>example</a>, <a href='http://ludoix.wordpress.com/tag/export/'>export</a>, <a href='http://ludoix.wordpress.com/tag/function/'>function</a>, <a href='http://ludoix.wordpress.com/tag/operator/'>operator</a>, <a href='http://ludoix.wordpress.com/tag/performance/'>performance</a>, <a href='http://ludoix.wordpress.com/tag/procedure/'>procedure</a>]]></content:encoded>
			<wfw:commentRss>http://ludoix.wordpress.com/2011/03/11/ldi-docs-3-procedures-functions-operators-and-examples/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/3690228eaee545ab583565df1e16b64a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ludoix</media:title>
		</media:content>

		<media:content url="http://ludoix.files.wordpress.com/2011/03/ddgw7sjp_58g7qgh8fm.jpg?w=300" medium="image">
			<media:title type="html">ddgw7sjp_58g7qgh8fm</media:title>
		</media:content>

		<media:content url="http://ludoix.files.wordpress.com/2011/03/ddgw7sjp_57fq744zg8.jpg?w=300" medium="image">
			<media:title type="html">ddgw7sjp_57fq744zg8</media:title>
		</media:content>

		<media:content url="http://ludoix.files.wordpress.com/2011/03/ddgw7sjp_59fk7cwvgg.jpg?w=300" medium="image">
			<media:title type="html">ddgw7sjp_59fk7cwvgg</media:title>
		</media:content>

		<media:content url="http://ludoix.files.wordpress.com/2011/03/ddgw7sjp_60fgkrbgdd.jpg?w=300" medium="image">
			<media:title type="html">ddgw7sjp_60fgkrbgdd</media:title>
		</media:content>
	</item>
		<item>
		<title>LDI Docs &#8211; 2 Installing and Testing</title>
		<link>http://ludoix.wordpress.com/2011/03/11/lucene-domain-index-installing-and-testing/</link>
		<comments>http://ludoix.wordpress.com/2011/03/11/lucene-domain-index-installing-and-testing/#comments</comments>
		<pubDate>Fri, 11 Mar 2011 15:09:14 +0000</pubDate>
		<dc:creator>ludoix</dc:creator>
				<category><![CDATA[Documentation]]></category>
		<category><![CDATA[docs]]></category>
		<category><![CDATA[install]]></category>
		<category><![CDATA[test]]></category>

		<guid isPermaLink="false">http://ludoix.wordpress.com/?p=26</guid>
		<description><![CDATA[2 Installing and Testing 2.1 Requirements JDeveloper 11g (optional) only if you want to edit the Java code Ant 1.7.0 Sun JDK 1.5.0_05/1.4.2 ($ORACLE_HOME/jdk directory works fine as Java Home for compiling on 10g and 11g) Linux/Windows Database Oracle 10g 10.2/11g production 2.2 Install binary distributions Binary distributions are available at SourceForge.net and provides a [...]]]></description>
			<content:encoded><![CDATA[<h1>2 Installing and Testing</h1>
<h2>2.1 Requirements</h2>
<ul>
<li>JDeveloper 11g (optional) only if you want to edit the Java code</li>
<li>Ant 1.7.0</li>
<li>Sun JDK 1.5.0_05/1.4.2 (<code>$ORACLE_HOME/jdk</code> directory works fine as Java Home for compiling on 10g and 11g)</li>
<li>Oracle Database 10g (10.2) or 11g production release, on Linux/Windows</li>
</ul>
<h2>2.2 Install binary distributions</h2>
<p>Binary distributions are available at <a title="SF.net Download are" href="http://sourceforge.net/projects/ldi/files/" target="_blank">SourceForge.net</a> and provide a very straightforward installation.</p>
<p><span id="more-26"></span></p>
<h3>2.2.1 11g Binary Distribution</h3>
<p>Edit your <code>~/build.properties</code> file with your database values (Windows users can find the build.properties file in the <code>C:\Documents and Settings\username</code> folder):</p>
<p><pre class="brush: bash;">
  db.str=test
  db.usr=LUCENE
  db.pwd=LUCENE
  dba.usr=sys
  dba.pwd=change_on_install
  javac.debug=true
  javac.source=1.5
  javac.target=1.5
</pre></p>
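<p>Before running any install target it can help to confirm that <code>build.properties</code> is complete and that <code>db.str</code> resolves. The following is a minimal sketch, not part of the ODI distribution: the function names <code>prop_value</code>, <code>check_build_properties</code> and <code>check_connect_string</code> are hypothetical, and the list of required keys simply mirrors the example above.</p>
<p><pre class="brush: bash;">
#!/usr/bin/env bash
# Hypothetical pre-install check (not shipped with ODI).

# Print the value of KEY from a key=value properties file, trimming blanks.
prop_value() {
  local file=$1 key=$2
  awk -v k="$key" '
    /=/ {
      name = $0
      sub(/=.*/, "", name)                 # text before the first "="
      gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim blanks around the key
      if (name == k) {
        sub(/^[^=]*=/, "")                 # keep only the value part
        gsub(/^[ \t]+|[ \t]+$/, "")        # trim blanks around the value
        print
        exit
      }
    }' "$file"
}

# Return non-zero if the file is unreadable or any required key is missing/empty.
check_build_properties() {
  local file="${1:-$HOME/build.properties}" key status=0
  if [ ! -r "$file" ]; then
    echo "cannot read $file"
    return 1
  fi
  for key in db.str db.usr db.pwd dba.usr dba.pwd; do
    if [ -z "$(prop_value "$file" "$key")" ]; then
      echo "missing or empty: $key"
      status=1
    fi
  done
  return $status
}

# Run tnsping (from the Oracle client) against the db.str connect string.
check_connect_string() {
  local file="${1:-$HOME/build.properties}"
  tnsping "$(prop_value "$file" db.str)"
}
</pre></p>
<p>For example, <code>check_build_properties ~/build.properties || exit 1</code> stops a setup script when a key such as <code>db.pwd</code> is missing.</p>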
<p><code>db.str</code> is the SQL*Net connect string for your target database; verify it first with tnsping. This is an example environment setting before installing on an 11g database:</p>
<p><pre class="brush: bash;">
  MAVEN_HOME=/usr/local/maven
  ORACLE_BASE=/u01/app/oracle
  ORACLE_HOME=$ORACLE_BASE/product/11.1.0.6.0/db_1
  ORACLE_SID=test
  JAVA_HOME=$ORACLE_HOME/jdk
  PATH=$MAVEN_HOME/bin:$HOME/bin:$ORACLE_HOME/bin:$JAVA_HOME/bin:/usr/local/bin:$PATH
  LD_LIBRARY_PATH=$ORACLE_HOME/lib:/usr/local/lib
  CVS_RSH=ssh
  umask 022
  export PATH LD_LIBRARY_PATH ORACLE_HOME ORACLE_BASE ORACLE_SID JAVA_HOME CVS_RSH NLS_LANG
</pre></p>
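<p>A hypothetical helper like the one below (not shipped with ODI) can confirm that the variables exported above point at real directories before Ant is invoked. It assumes bash, because it uses indirect expansion (<code>${!var}</code>), and it assumes a <code>javac</code> binary under <code>$JAVA_HOME/bin</code>, which is what the build needs when <code>$ORACLE_HOME/jdk</code> is used as Java Home for compiling.</p>
<p><pre class="brush: bash;">
#!/usr/bin/env bash
# Hypothetical helper, not part of ODI: verify the Oracle/Java environment.
check_oracle_env() {
  local var status=0
  for var in ORACLE_HOME JAVA_HOME; do
    if [ -z "${!var}" ]; then
      echo "$var is not set"
      status=1
    elif [ ! -d "${!var}" ]; then
      echo "$var=${!var} is not a directory"
      status=1
    fi
  done
  # The build compiles Java sources, so javac must exist under JAVA_HOME.
  if [ -n "$JAVA_HOME" ]; then
    if [ ! -x "$JAVA_HOME/bin/javac" ]; then
      echo "javac not found in $JAVA_HOME/bin"
      status=1
    fi
  fi
  return $status
}
</pre></p>
<p>Running <code>check_oracle_env || exit 1</code> right after sourcing the environment catches a mistyped <code>ORACLE_HOME</code> before any Ant target fails half way.</p>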
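<p>Upload and install your code into the database, then test it:</p>
<p><pre class="brush: bash;">
  ant install-odi
  # available test targets: test-odi-clob, test-odi-dicc, test-odi-emails, test-odi-iot,
  # test-odi-master-detail, test-odi-similarity, test-odi-source-small,
  # test-odi-term-vector, test-odi-xmltype
  ant test-odi-clob
</pre></p>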
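<p>To run every ODI test target in one go instead of one at a time, a small loop such as this sketch could be used. The function name <code>run_odi_tests</code> is hypothetical; it relies only on the test target names listed above and on Ant returning a non-zero exit status when a build fails.</p>
<p><pre class="brush: bash;">
#!/usr/bin/env bash
# Sketch: run several ODI test targets, stopping at the first failing build.
ODI_TESTS="clob dicc emails iot master-detail similarity source-small term-vector xmltype"

run_odi_tests() {
  local t
  # Word splitting of the unquoted list is intentional: one target per word.
  for t in ${1:-$ODI_TESTS}; do
    echo "== ant test-odi-$t"
    ant "test-odi-$t" || return 1   # Ant exits non-zero when the build fails
  done
}
</pre></p>
<p>Calling <code>run_odi_tests</code> with no argument runs all targets; <code>run_odi_tests "clob iot"</code> runs just those two.</p>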
<p>For Oracle 11g you can perform a post-installation step:</p>
<p><pre class="brush: bash;">
  ant jit-lucene-classes
</pre></p>
<p>This target forces the translation of all Lucene, Snowball and ODI classes (among others) to native code, instead of waiting for the database to compile them as it detects the most used classes and methods.</p>
<h3>2.2.2 10g Binary Distribution</h3>
<p>First edit your <code>~/build.properties</code> with something like this:</p>
<p><pre class="brush: bash;">
  db.str=orcl
  db.usr=LUCENE
  db.pwd=LUCENE
  dba.usr=sys
  dba.pwd=change_on_install
  javac.debug=true
  javac.source=1.4
  javac.target=1.4
</pre></p>
<p>The <code>db.str</code> property is the SQL*Net connect string for the target database. The <code>ORACLE_HOME</code> environment variable is required and must point to a properly configured Oracle 10g home; then execute ant without arguments. Here is an example environment setting for a 10g database:</p>
<p><pre class="brush: bash;">
  MAVEN_HOME=/usr/local/maven
  ORACLE_BASE=/u01/app/oracle
  ORACLE_HOME=$ORACLE_BASE/product/10.2.0/db_1
  ORACLE_SID=orcl
  JAVA_HOME=$ORACLE_HOME/jdk
  PATH=$MAVEN_HOME/bin:$HOME/bin:$ORACLE_HOME/bin:$JAVA_HOME/bin:/usr/local/bin:$PATH
  LD_LIBRARY_PATH=$ORACLE_HOME/lib:/usr/local/lib
  CVS_RSH=ssh
  umask 022
  export PATH LD_LIBRARY_PATH ORACLE_HOME ORACLE_BASE ORACLE_SID JAVA_HOME CVS_RSH NLS_LANG
</pre></p>
<p>If you are re-installing Oracle Lucene Domain Index (ODI), first drop any existing Lucene Domain Index. The default target drops the LUCENE schema first if it exists. Example install and test actions:</p>
<p><pre class="brush: bash;">
  ant install-odi-10g
  # available test targets: test-odi-clob, test-odi-dicc, test-odi-emails, test-odi-iot,
  # test-odi-master-detail, test-odi-similarity, test-odi-source-small,
  # test-odi-term-vector, test-odi-xmltype
  ant test-odi-clob
</pre></p>
<p>In addition to the installation and testing steps, you can perform an NCOMP translation of Lucene ODI and related classes from Java to native code using:</p>
<p><pre class="brush: bash;">
  ant ncomp-runtime-retrotranslator-sys-code
  ant ncomp-lucene-all
</pre></p>
<h2>2.3 Install Instructions to compile from sources</h2>
<ul>
<li>Unpack or check out the Lucene sources</li>
<li>Check out the ODI sources. For now only anonymous SVN access is provided; download them from the SourceForge servers with<br />
<pre class="brush: bash;">
  cd /tmp
  svn co http://ldi.svn.sourceforge.net/svnroot/ldi/odi/trunk odi
</pre></li>
<li>Copy the checkout to <code>$LUCENE_ROOT/contrib</code><br />
<pre class="brush: bash;">
  cd $LUCENE_ROOT/contrib
  cp -rp /tmp/odi .
</pre></li>
<li>Edit <code>$LUCENE_ROOT/common-build.xml</code>, setting the version properties and adding a target that creates a jar file with the compiled test classes<br />
<pre class="brush: xml;">
....
  &lt;property name=&quot;dev.version&quot; value=&quot;3.4.0&quot;/&gt;
  &lt;property name=&quot;tests.luceneMatchVersion&quot; value=&quot;3.4.0&quot;/&gt;
....
  &lt;target name=&quot;jar-test&quot; depends=&quot;compile-test&quot;&gt;
    &lt;jar destfile=&quot;${build.dir}/${final.name}-test.jar&quot; basedir=&quot;${build.dir}/classes/test&quot; excludes=&quot;**/*.java&quot;/&gt;
  &lt;/target&gt;
</pre></li>
</ul>
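<p>The checkout and copy steps above can be scripted roughly as follows. This is a sketch under assumptions: <code>LUCENE_ROOT</code> already points at the unpacked Lucene sources, the SVN URL is the one shown above, and a scratch directory from <code>mktemp -d</code> replaces <code>/tmp/odi</code>. The hypothetical function <code>fetch_odi</code> does not edit <code>common-build.xml</code>; that remains a manual step.</p>
<p><pre class="brush: bash;">
#!/usr/bin/env bash
# Sketch: check out the ODI sources and copy them into $LUCENE_ROOT/contrib.
ODI_SVN_URL=http://ldi.svn.sourceforge.net/svnroot/ldi/odi/trunk

fetch_odi() {
  local work
  if [ -z "$LUCENE_ROOT" ]; then
    echo "LUCENE_ROOT is not set"
    return 1
  fi
  if [ ! -d "$LUCENE_ROOT/contrib" ]; then
    echo "$LUCENE_ROOT/contrib does not exist"
    return 1
  fi
  work=$(mktemp -d) || return 1
  # Anonymous checkout into a scratch directory, then copy it under contrib.
  if ! svn co "$ODI_SVN_URL" "$work/odi"; then
    rm -rf "$work"
    return 1
  fi
  cp -rp "$work/odi" "$LUCENE_ROOT/contrib/" || return 1
  rm -rf "$work"
}
</pre></p>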
<p>(OPTIONAL) Update Lucene&#8217;s <code>BufferedIndexInput.BUFFER_SIZE</code> according to your <code>db_block_size</code> init.ora parameter.<br />
Before compiling and uploading the Lucene core library, you can change the <code>org.apache.lucene.store.BufferedIndexInput.BUFFER_SIZE</code> constant to the value of your <code>db_block_size</code> init parameter. Reading with the same block size as the physical block size used by your database should improve read performance.</p>
<p>Compile the ODI Directory sources and tests. These targets automatically copy all libraries required by Lucene Domain Index from your <code>$ORACLE_HOME</code> and from the Internet. Starting with ODI 2.4.0.1.x, the <code>build.xml</code> file automatically compiles all Lucene contrib module dependencies.</p>
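<p>The <code>BUFFER_SIZE</code> change can be applied with <code>sed</code>, as in this hypothetical helper. It assumes the constant is declared in the form <code>static final int BUFFER_SIZE = N;</code>. Locate the source file first, for example with <code>find $LUCENE_ROOT -name BufferedIndexInput.java</code>, and read the block size with <code>show parameter db_block_size</code> in SQL*Plus.</p>
<p><pre class="brush: bash;">
#!/usr/bin/env bash
# Hypothetical helper: set BufferedIndexInput.BUFFER_SIZE to the database block size.
# Assumes the constant is declared as "static final int BUFFER_SIZE = N;".
set_lucene_buffer_size() {
  local src=$1 size=$2
  case "$size" in
    ''|*[!0-9]*) echo "size must be a positive integer"; return 1 ;;
  esac
  if [ "$size" -eq 0 ]; then
    echo "size must be a positive integer"; return 1
  fi
  if ! grep -q 'static final int BUFFER_SIZE = [0-9]*;' "$src"; then
    echo "BUFFER_SIZE declaration not found in $src"; return 1
  fi
  # -i.bak keeps the original as "$src.bak" and works with GNU and BSD sed.
  sed -i.bak "s/\(static final int BUFFER_SIZE = \)[0-9]*;/\1$size;/" "$src"
}
</pre></p>
<p>For example, <code>set_lucene_buffer_size path/to/BufferedIndexInput.java 8192</code> for a database with an 8 KB block size; the original file is kept with a <code>.bak</code> suffix.</p>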
<p><pre class="brush: bash;">
  cd $LUCENE_ROOT/contrib/odi
  ant jar-core
  ant jar-test
  ant package-zip
</pre></p>
<p>Edit your <code>~/build.properties</code> file with your Database values:</p>
<p><pre class="brush: bash;">
  db.str=orcl
  db.usr=LUCENE
  db.pwd=LUCENE
  dba.usr=sys
  dba.pwd=change_on_install
  javac.debug=true
  javac.source=1.4
  javac.target=1.4
</pre></p>
<p><code>db.str</code> is the SQL*Net connect string for your target database; check it first with the tnsping utility. Also note that on 11g the database user and password are case sensitive, so leave LUCENE in uppercase. Upload your code to the database:</p>
<p><pre class="brush: bash;">
  ant install-odi
</pre></p>
<h3>2.3.1 Generating Maven Artifacts</h3>
<p>To generate the Lucene and ODI Maven artifacts, follow the previous steps and then execute:</p>
<p><pre class="brush: bash;">
   ant generate-maven-artifacts
</pre></p>
<h2>2.4 Optimizations</h2>
<p>The following subsections describe how to translate Lucene and ODI Java code to native code on 10g and 11g databases.</p>
<h3>2.4.1 Using NCOMP on 10g</h3>
<p>Before going into production on 10g databases, it is strongly recommended to install Oracle Lucene Domain Index NCOMPed. NCOMP automatically translates Lucene and ODI Java code to native code and installs it as a dynamic link library (.so/.dll) in your Oracle home. To do this, simply execute this Ant target instead of the install-odi target:</p>
<p><pre class="brush: bash;">
  ant ncomp-lucene-all
</pre></p>
<h3>2.4.2 Using JIT on 11g</h3>
<p>First verify that your database parameter <code>java_jit_enabled</code> is TRUE. Oracle 11g includes a JIT compiler that automatically translates the most used Java methods to native code. If you want to pre-compile all Lucene Java code instead of waiting for the database to detect the most used code, execute these targets:</p>
<p><pre class="brush: bash;">
  ant jit-lucene-classes
  ant jit-oracle-classes
</pre></p>
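<p>The <code>java_jit_enabled</code> check can be scripted before running these targets. In this sketch the function name is hypothetical, the default connect string <code>/ as sysdba</code> is an assumption (any account that can read <code>v$parameter</code> works), and SQL*Plus must be on the <code>PATH</code>.</p>
<p><pre class="brush: bash;">
#!/usr/bin/env bash
# Sketch: succeed only when java_jit_enabled reports TRUE.
java_jit_enabled() {
  local connect="${1:-/ as sysdba}" value
  # Feed a silent query to SQL*Plus and strip all whitespace from the answer.
  value=$(printf '%s\n' \
      'set heading off feedback off pagesize 0' \
      "select value from v\$parameter where name = 'java_jit_enabled';" \
      'exit' | sqlplus -s "$connect" | tr -d '[:space:]')
  [ "$value" = "TRUE" ]
}
</pre></p>
<p>For example: <code>if java_jit_enabled; then ant jit-lucene-classes; ant jit-oracle-classes; fi</code>.</p>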
<h2>2.5 Testing Lucene Domain Index</h2>
<p>Required grants for regular Oracle users. IMPORTANT: before using Lucene Domain Index, grant this role to every Oracle user other than LUCENE that will use it:</p>
<p><pre class="brush: sql;">
  -- connected as sysdba
  grant LUCENEUSER to scott;
</pre></p>
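<p>When several users need the role, the statements can be generated instead of typed. The <code>luceneuser_grants</code> function in this sketch is hypothetical: it skips <code>LUCENE</code> itself, accepts only simple names made of letters, digits and underscores, and prints nothing at all when any name is invalid, so its output can be piped straight into SQL*Plus connected as sysdba.</p>
<p><pre class="brush: bash;">
#!/usr/bin/env bash
# Sketch: print one "grant LUCENEUSER to USER;" statement per user name.
luceneuser_grants() {
  local u
  # Validate every name first so that nothing is printed on error.
  for u in "$@"; do
    case "$u" in
      ''|*[!A-Za-z0-9_]*) return 1 ;;
    esac
  done
  for u in "$@"; do
    case "$u" in
      [Ll][Uu][Cc][Ee][Nn][Ee]) continue ;;   # the LUCENE schema needs no grant
    esac
    echo "grant LUCENEUSER to $u;"
  done
}
</pre></p>
<p>For example: <code>luceneuser_grants scott hr | sqlplus -s "/ as sysdba"</code>.</p>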
<p>Lucene Domain Index has two kinds of test suites to check that everything is OK after installation. The first test suite is implemented with SQLUnit and launched with Ant; to run it, first check the SQL*Net connection options and then execute:</p>
<p><pre class="brush: bash;">
[mochoa@mochoa odi]$ cat db/sqlunit.properties
# SQLUnit test
sqlunit.driver = oracle.jdbc.driver.OracleDriver
sqlunit.url = jdbc:oracle:oci:@orcl
sqlunit.user = LUCENE
sqlunit.password = LUCENE
[mochoa@localhost lucene-odi]$ ant test-odi-clob
Buildfile: build.xml
test-odi-clob:
</pre></p>
<p>The second test suite is a set of JUnit tests that simulate middle-tier environments using a connection pool. To start it, run:</p>
<p><pre class="brush: bash;">
[mochoa@localhost lucene-odi]$ ant test-parallel
Buildfile: build.xml

test-parallel:
    [junit] Running org.apache.lucene.index.TestDBIndexParallel
    [junit] Table created: T1
    [junit] Index altered: LIT1
    [junit] OnLine: true
    [junit] No Row updated at: 4030 to: 4039 elapsed time: 22 ms.
    [junit] Inserted rows: 10 time: 193 avg time: 19
      ...
    [junit] Index droped: LIT1
    [junit] Table droped: T1
</pre></p>
<p>Another JUnit test suite, <code>TestQueryHits</code>, also simulates a middle-tier environment with a connection pool. To start it, run:</p>
<p><pre class="brush: bash;">
[mochoa@localhost lucene-odi]$ ant test-queryhits
Buildfile: build.xml

test-queryhits:
    [junit] Running org.apache.lucene.indexer.TestQueryHits
    [junit] Hits: 35602
      ...
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 63.306 sec
BUILD SUCCESSFUL
Total time: 1 minute 4 seconds
</pre></p>
<p>The class <code>org.apache.lucene.indexer.TestQueryHits</code> requires a table that is too big to create and drop in the <code>setup()</code> and <code>tearDown()</code> methods. Before running this test, create the table with:</p>
<p><pre class="brush: sql;">
    create table test_source_big as (select * from all_source);
</pre></p>
<p>Then create the index on 10g with:</p>
<p><pre class="brush: sql;">
create index source_big_lidx on test_source_big(text)
indextype is lucene.luceneindex
parameters('AutoTuneMemory:true;IndexOnRam:true;BatchCount:3000;ParallelDegree:2;SyncMode:OnLine;LogLevel:INFO;AutoTuneMemory:true;PerFieldAnalyzer:line(org.apache.lucene.analysis.KeywordAnalyzer),TEXT(org.apache.lucene.analysis.SimpleAnalyzer);FormatCols:line(00000);ExtraCols:line &quot;line&quot;;MergeFactor:500');
</pre></p>
<p>Or on 11g with:</p>
<p><pre class="brush: sql;">
create index source_big_lidx on test_source_big(text)
indextype is lucene.luceneindex
parameters('AutoTuneMemory:true;IndexOnRam:true;BatchCount:3000;ParallelDegree:2;SyncMode:OnLine;LogLevel:INFO;AutoTuneMemory:true;PerFieldAnalyzer:line(org.apache.lucene.analysis.KeywordAnalyzer),TEXT(org.apache.lucene.analysis.SimpleAnalyzer);FormatCols:line(00000);ExtraCols:line &quot;line&quot;;MergeFactor:500;LobStorageParameters:CACHE READS FILESYSTEM_LIKE_LOGGING');
</pre></p>
<h2>Doc Links</h2>
<p><a href="http://ludoix.wordpress.com/2011/03/06/lucene-domain-index-introduction/">Previous / LDI Docs – 1 Introduction</a><br />
<a href="http://ludoix.wordpress.com/2011/03/11/ldi-docs-3-procedures-functions-operators-and-examples">Next / LDI Docs – 3 Procedures, Functions, Operators and Examples</a></p>
<br />Filed under: <a href='http://ludoix.wordpress.com/category/documentation/'>Documentation</a> Tagged: <a href='http://ludoix.wordpress.com/tag/docs/'>docs</a>, <a href='http://ludoix.wordpress.com/tag/install/'>install</a>, <a href='http://ludoix.wordpress.com/tag/test/'>test</a>]]></content:encoded>
			<wfw:commentRss>http://ludoix.wordpress.com/2011/03/11/lucene-domain-index-installing-and-testing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/3690228eaee545ab583565df1e16b64a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ludoix</media:title>
		</media:content>
	</item>
		<item>
		<title>LDI Docs &#8211; 1 Introduction</title>
		<link>http://ludoix.wordpress.com/2011/03/06/lucene-domain-index-introduction/</link>
		<comments>http://ludoix.wordpress.com/2011/03/06/lucene-domain-index-introduction/#comments</comments>
		<pubDate>Sun, 06 Mar 2011 22:30:54 +0000</pubDate>
		<dc:creator>ludoix</dc:creator>
				<category><![CDATA[Documentation]]></category>
		<category><![CDATA[docs]]></category>
		<category><![CDATA[overwiew]]></category>

		<guid isPermaLink="false">http://ludoix.wordpress.com/?p=10</guid>
		<description><![CDATA[1 Introduction General introduction, features, benefits and comparison with Lucene standalone implementation and Oracle Text. 1.1 What is Lucene Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. Apache Lucene is an open source project [...]]]></description>
			<content:encoded><![CDATA[<h1>1 Introduction</h1>
<p>General introduction, features, benefits and comparison with <em><strong>Lucene</strong></em> standalone implementation and Oracle Text.</p>
<h2>1.1 What is Lucene</h2>
<p>Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.<br />
Apache Lucene is an open source project available for <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">free download</a>.<br />
Since Lucene is a pure Java library, why not use it inside the Oracle Database JVM?</p>
<h2>1.2 What is Lucene Domain Index</h2>
<p>Lucene Domain Index is a full integration of the Lucene project running inside the Oracle database using the Oracle JVM. Oracle provides a full-featured JVM inside the Oracle Database, compliant with JDK 1.4 in the 10g release and JDK 1.5 in 11g.<br />
<a href="http://dbprism.cvs.sourceforge.net/dbprism/ojvm/">OJVMDirectory</a> replaces Lucene&#8217;s file system storage with BLOB-based storage; its name comes from the class it overrides (Directory.java).</p>
<p><span id="more-10"></span></p>
<p>Here are some points to take into account when choosing this storage:</p>
<ul>
<li>Storing the inverted index in a traditional file system is not a good option for some users: there is no commit or rollback behavior, no integrated backup, etc.</li>
<li>Storing the inverted index in BLOBs while running Lucene outside the Oracle database performs poorly because of the many network round trips and the data marshaling.</li>
<li>Indexing relational data such as tables with VARCHAR2, CLOB or XMLType with Lucene running outside the database has the same problem as the previous point.</li>
<li>By using SecureFiles BLOBs on Oracle 11g you can transparently encrypt and compress the Lucene index storage, reducing disk usage and keeping your relational data inside the database instead of exposing it, which would increase risk or violate company regulations such as SOX.</li>
<li>The JVM included in the Oracle database can scale up to 10,000+ concurrent sessions without memory leaks or deadlocks, and all operations on tables happen in the same memory space.</li>
</ul>
<p>Moreover, Oracle provides a Data Cartridge API (ODCI), also called the Extensible Indexing mechanism, which lets you write your own domain index and integrate it with the Oracle engine and optimizer.</p>
<p>Integrating Lucene through ODCI brings some important benefits:</p>
<ul>
<li>Row changes are automatically notified to Lucene; these changes are enqueued using Oracle AQ. Users can control whether they are applied OnLine (immediately after commit) or Deferred (synchronized by the application).</li>
<li>The Oracle optimizer can choose a proper execution plan when a domain index exists.</li>
<li>You can mix <code>lcontains()</code>, <code>lhighlight()</code>, <code>lscore()</code> and many other operators, procedures or functions in your queries.</li>
</ul>
<h2>1.3 Why use Lucene Domain Index?</h2>
<p>Oracle includes a full-featured, enterprise-grade text search engine named <a href="http://www.oracle.com/technology/products/text/index.html">Oracle Text</a>, written in C and fully integrated into the Oracle kernel, but:</p>
<ul>
<li>on Oracle Text you cannot:
<ul>
<li>control which functionality will be included into next release</li>
<li>easily customize it for your needs</li>
<li>index Index Organized Tables (IOT)</li>
<li>index joined tables</li>
<li>index unlimited extra columns</li>
<li>easily highlight text</li>
<li>index NCLOB and NVARCHAR data types</li>
</ul>
</li>
<li>on Oracle 10g you cannot:
<ul>
<li>index multiple columns in the same index</li>
<li>sort and filter by using indexed columns at index level</li>
</ul>
</li>
<li>on Oracle 11g you cannot:
<ul>
<li>filter or sort by columns of type TIMESTAMP WITH TIME ZONE, commonly used in XMLDB because it is the official data type for xsd:date</li>
</ul>
</li>
<li>using Lucene Domain Index you can:
<ul>
<li>get usually smaller indexes, because Lucene Domain Index does not store any column except the rowid inside Lucene&#8217;s inverted index structure; using the rowid, Oracle can look up any column value faster than retrieving it from the Lucene inverted index</li>
<li>pad text columns</li>
<li>format (round/pad) Number and Date/Time columns</li>
<li>create indexes online even in Standard Edition databases (for Oracle Text this feature is available only in Enterprise Edition)</li>
<li>implement any data type mapping by extending the DefaultUserDataStore class, especially for BLOBs, which commonly use non-standard encodings</li>
<li>query the index through an experimental native REST web service</li>
<li>rely on a transactional Lucene inverted index: if a SQL operation is rolled back, the index stays consistent too, avoiding phantom reads or missing hits (rows that should be returned as hits but were not included in the Lucene index)</li>
<li>use a ready-to-use, up-to-date solution from any programming language, for example Ruby, .NET, Python or PHP</li>
<li>highlight text elegantly using pipelined table functions</li>
<li>work with a high-level abstraction layer over the Lucene IR library, so developers only deal with SQL</li>
<li>get transparent compression and encryption of Lucene storage by enabling Oracle Transparent Data Encryption and SecureFiles compression</li>
</ul>
</li>
</ul>
<h2>Doc Links</h2>
<p><a href="http://ludoix.wordpress.com/2011/03/11/lucene-domain-index-installing-and-testing/">Next / LDI Docs – 2 Installing and Testing</a></p>
<br />Filed under: <a href='http://ludoix.wordpress.com/category/documentation/'>Documentation</a> Tagged: <a href='http://ludoix.wordpress.com/tag/docs/'>docs</a>, <a href='http://ludoix.wordpress.com/tag/overwiew/'>overwiew</a>]]></content:encoded>
			<wfw:commentRss>http://ludoix.wordpress.com/2011/03/06/lucene-domain-index-introduction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/3690228eaee545ab583565df1e16b64a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ludoix</media:title>
		</media:content>
	</item>
	</channel>
</rss>
