LDI Docs – Appendix E (Project Change Log)
March 11, 2011
E Project Change Log
3.5.0.1.0 Second Release based on Lucene 3 (3.5.0, 23/Jan/12) core base
- Fully customizable Searcher and Updater processes to enable JMX monitoring and timeouts
- Updated to latest spellchecker interfaces
- Fixed missing changes to work with backported version on 10g
3.4.0.1.0 Second Release based on Lucene 3 (3.4.0, 25/Nov/11) core base
- Use latest merge policy implementation TieredMergePolicy
- Use total RAM reported by getJavaPoolSize() when setting MaxBufferedDocs
- Better error reporting when an Analyzer is not found
- Replaced execute immediate with open-fetch-close functionality to avoid a core dump on 10g when double-checking for deleted rowids
- Included a backported version of JUnit4 to jdk1.4 version for 10g releases
- Added a parallel updater process; when working in OnLine mode, this process performs write operations on the LDI structure on behalf of the AQ process
- Deletes no longer require an exclusive write lock on index storage; deletes are now enqueued like inserts and updates
- Updated source to Lucene 3.4.0 code, removed some deprecated API
3.0.2.1.0 Initial Release based on Lucene 3 (3.0.2, 14/Sep/10) core base
- Added a long-awaited feature: a parallel/shared/slave search process used during start-fetch-close and by the CountHits function
- Added lfreqterms ancillary operator returning the freq terms array of rows visited
- Added lsimilarity ancillary operator returning a computed Levenshtein distance of the row visited
- Added a ldidyoumean pipeline table function using DidYouMean.indexDictionary storage
- Added test using SQLUnit
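For readers unfamiliar with the metric behind the lsimilarity operator, here is a minimal Java sketch of the classic Levenshtein edit distance. It is an illustration only: the class and method names are hypothetical, and this changelog does not say how LDI itself computes or scales the value it returns.

```java
// Illustrative only: EditDistanceSketch is a hypothetical name, not an LDI class.
class EditDistanceSketch {

    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Swap rows so that prev always holds the most recently completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

A smaller distance means the two strings are closer; identical strings have distance 0.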
3.0.1.1.0 Initial Release based on Lucene 3 (3.0.1) core base
- Added IndexOnRam functionality by using RAMDirectory for building intermediate index after merging a new set of rows
- RewriteScore and SimilarityMethod can be used to get better results when using the wildcard operator
- Added auto-complete functionality using the lautocomplete pipeline table function
- Removed all deprecated method usage and implementations
- The new CVS repository is compatible only with 11g; the 10g version is produced by using a retro-translator
2.9.2.1.0 Production Release based on Lucene 2.9 (2.9.2) core base
- Added elapsed time information when log level is INFO
- Removed deprecated usage of LUCENE_CURRENT constant
- Fixed a facets inconsistency caused by ignoring the internal parameter ColName
- Initial implementation of DidYouMean functionality contributed by Pedro Pinheiro
- Temporary fix until Lucene defines clear semantics for Directory.fileLength (see Lucene issue 2316)
2.9.1.1.0 Production Release based on Lucene 2.9 (2.9.1) core base
- New Lucene Core base libraries
- Full Lucene Test Suites certified
- Fixed a bug that enqueued more rowids than required when using OnLine mode with the ExtraTabs and WhereCondition parameters
- Fixed operator priority when WhereCondition contains an OR operator
- DefaultUserDataStore now uses an array of cached fields to improve performance
- Spanish Analyzer now uses the latest ASCIIFoldingFilter
- high_freq_terms(idx_name,term,max_num_term) pipeline table function was added to return high-frequency terms and their associated docFreq values
- index_terms(idx_name,term) pipeline table function was added to return a list of terms and their associated frequency
- DefaultUserDataStore now supports the ANALYZED, ANALYZED_WITH_VECTORS, ANALYZED_WITH_OFFSETS, ANALYZED_WITH_POSITIONS and ANALYZED_WITH_POSITIONS_OFFSETS Lucene Field option values
- OJVMLock was replaced by SingleInstanceLockFactory for per-instance locking; cross-session locking is implemented with select for update functionality
- An automatic upgrade from 2.9.0 is possible without index deletion or rebuild; you have to execute:
ant upgrade-domain-index
ant ncomp-lucene-ojvm (10g only)
ant jit-lucene-classes (11g only)
2.9.0.1.0 Production release based on Lucene 2.9.0 core base, 29/Sep/09
- Tested with Oracle 11gR2, 11gR1 and 10.2 databases
- DefaultUserDataStore now uses SAX parsing to get text nodes and attributes from an XMLType value.
- A SimpleLRUCache is used to load rowids and their associated Lucene doc ids; this reduces memory consumption when querying very big tables. A new parameter, CachedRowIdSize (10000 by default), controls the size of the LRU cache.
- Lucene Domain Index core was updated to use TopFieldCollector and to avoid computation time when lscore() is not used.
- Two new parameters have been added: NormalizeScore, which controls when to track the max score, and PreserveDocIdOrder. Both are a consequence of the new Lucene Collector API and boost performance when querying.
- A table alias L$MT is defined for the master table associated to the index to be used in complex queries to associate columns from master tables and columns from dependent tables
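The bounded rowid cache described in this release can be pictured with a small Java sketch. This is not LDI's SimpleLRUCache: RowIdCacheSketch is a hypothetical stand-in built on java.util.LinkedHashMap, and using String keys for rowids is an assumption. The constructor argument plays the role of CachedRowIdSize.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a hypothetical stand-in for a rowid -> Lucene doc id LRU cache.
class RowIdCacheSketch extends LinkedHashMap<String, Integer> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries; // plays the role of CachedRowIdSize

    RowIdCacheSketch(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
        // Evict the least recently used mapping once the cache grows past its limit.
        return size() > maxEntries;
    }
}
```

Because memory is bounded by the entry count rather than by table size, a lookup that misses the cache has to fall back to the index itself, which is the trade-off the parameter lets you tune.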
2.4.1.1.0 (maintenance release based on Lucene 2.4.1, 27/Mar/09)
- Do not store internal parameters in the system views, and force PopulateIndex:false
- After every sync, now files marked as deleted are purged to free BLOB storage
- Added lfacets aggregated function for doing facets
- CountHits function no longer requires sort argument
- Filters are stored and retrieved using only the QueryParser.toString() key
- UN_TOKENIZED format string at DefaultUserDataStore class was replaced by NOT_ANALYZED or NOT_ANALYZED_STORED according to new Lucene definitions.
- Fixed a bug when sync tries to process more than 32767 enqueued rowids.
- Added parameters for highlighting functions Formatter, MaxNumFragmentsRequired, FragmentSeparator and FragmentSize.
- Added PerFieldAnalyzer parameter to use an independent Analyzer for each column.
- Added sample of a custom Formatter org.apache.lucene.search.highlight.MyHTMLFormatter
2.4.1.0.0 (first release based on Lucene 2.4.1, 9/Mar/09)
- Fix compatibility problem between 10g/11g SQL Date representation on pipeline table function.
2.4.0.1.0 (maintenance release based on Lucene 2.4.0, 10/Jan/09)
- Added Rhighlight(index_name VARCHAR2, qry VARCHAR2, cols VARCHAR2, rType IN VARCHAR2, rws IN SYS_REFCURSOR) RETURN ANYDATASET pipeline table function
- Added Phighlight(index_name VARCHAR2, qry VARCHAR2, cols VARCHAR2, stmt IN VARCHAR2) RETURN ANYDATASET pipeline table function
- Added lhighlight(NUMBER):VARCHAR2 ancillary operator
- Removed usage of Lucene deprecated API (Hits and IndexWriter for example)
- Usage of the FIRST_ROWS optimizer hint to decide how many rows to load the first time
- sync, optimize and rebuild interfaces now use index_name or [owner,index_name] arguments
- A better build system to build Lucene Domain Index from sources
- More tests
- Tested against 11.1.0.7 and 10.2.0.3
- See the online docs for usage of FIRST_ROWS and the lhighlight() operator
2.4.0.0.0 (production release based on Lucene 2.4.0, 10/10/08)
- Added parameter for CLOB encoding
- More Like this function
- NGram analyzer
- EnglishWikipediaAnalyzer
- DataStore interface includes an API for setting the current connection
- Now analyzers, queries, snowball and WikiPedia contrib packages are required
2.3.2.0.0 (binary release based on Lucene 2.3.2, 1/Jun/08)
- Compiled against Lucene 2.3.2 production release
- Used latest API for merging based on RAM usage
- Use Writer for deleting during Sync
- Confirm 4x improvement during indexing reported by Lucene dev group
- Fix workaround which changes the order of the rowids in ODCIRidList
- Added a Spanish WikiPedia Analyzer for testing
- Reports IOException instead of RuntimeException to signal EOF or File Not Found
- Decouple Flush functionality from TableIndexer
2.2.0.2.2 (fixpack for 2.2.0.2.0 release, 5/Apr/08)
- Added Rowid to lucene doc id caching.
- Usage of LoadFirstFieldSelector during Document loading to only load rowid field.
- Added a test suite which indexes a Wikipedia dump inside the OJVM.
2.2.0.2.1 (fixpack for 2.2.0.2.0 release, 12/Dec/07)
- DefaultUserDataStore requires usage of the XPath text() expression for getting only the textual value
- Added logging of the SQL being executed by the table indexer
- Change document logging to FINER level
- More pre-defined mapping at DefaultUserDataStore for NUMBER, BINARY_FLOAT, BINARY_DOUBLE, TIMESTAMP, TIMESTAMPTZ and TIMESTAMPLTZ Oracle types.
- New parameter PopulateIndex:[true|false] for populating or not Lucene Index at creation time.
- New parameter IncludeMasterColumn:[true|false], to choose whether or not to index the master column, useful with Virtual Columns and XMLType.
- New parameter BatchCount:integer, to choose how many rows are enqueued per batch for indexing when using create … index … parameters('SyncMode:OnLine');
- Creating an index with SyncMode:OnLine causes Lucene Domain Index to enqueue batches of “BatchCount” rows to be indexed in the background by the AQ PLSQL callback. Lucene Domain Index is immediately ready for querying after create.
- Batch rowid indexing is done using a pipeline function.
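The batching behaviour introduced with BatchCount can be sketched in Java as a simple partition of the pending rowids. This is a rough illustration under stated assumptions, not LDI code: RowIdBatcherSketch and its split method are hypothetical names, and the actual enqueue to Oracle AQ is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical helper showing how rowids could be grouped
// into batches of at most batchCount entries before being enqueued.
class RowIdBatcherSketch {

    static List<List<String>> split(List<String> rowIds, int batchCount) {
        if (batchCount <= 0) {
            throw new IllegalArgumentException("batchCount must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < rowIds.size(); start += batchCount) {
            int end = Math.min(start + batchCount, rowIds.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(rowIds.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch would then become one queue message, so a smaller batch size means more messages but less work per background callback.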
2.2.0.2.0 (third major release synchronized with Lucene 2.2.0, 12/Dec/07)
http://sourceforge.net/project/showfiles.php?group_id=56183
# CVS access:
cvs -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism login
cvs -z3 -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism co -P ojvm
- Sort by column, passed using the lcontains(col,query_parser_str,sort_str,corr_id) syntax
- Logging support using Java Util Logging package
- JUnit test suites emulating middle tier environment
- Support for rebuild and optimize online for SyncMode:OnLine index
- XMLDB Export
- AutoTuneMemory parameter for replacing MaxBufferedDocs parameter
- Functional column support
2.2.0.1.1 (second release, 27/Sep/07 05:39 AM)
https://issues.apache.org/jira/secure/attachment/12366661/ojvm-09-27-07.tar.gz
# CVS access:
cvs -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism login
cvs -z3 -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism co -P ojvm
- LuceneDomainIndex.countHits() function to replace select count from .. where lcontains(..)>0 syntax.
- support inline pagination at lcontains(col,'rownum:[n TO m] AND …') function
- rounding and padding support for columns date, timestamp, number, float, varchar2 and char
- ODCI API array DML support
- BLOB parameter support
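Padding and rounding matter because Lucene of this era compares indexed terms as strings, so "9" sorts after "10" unless numbers are padded to a fixed width. The Java sketch below shows the general idea only. FieldPaddingSketch, the widths, the yyyyMMdd format and the use of UTC are all assumptions for illustration, not what DefaultUserDataStore actually does.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Illustrative only: hypothetical helpers, not part of LDI.
class FieldPaddingSketch {

    // Left-pads a non-negative number with zeros so string order matches numeric order.
    // Values with more digits than width are returned unpadded.
    static String padNumber(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch handles non-negative values only");
        }
        String digits = Long.toString(value);
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    // Truncates a date to day resolution (one possible form of rounding), in UTC.
    static String roundToDay(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }
}
```

With values normalized this way, a range query over the padded or rounded terms selects the same rows a numeric or date comparison would.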
2.2.0.1.0 (first release synchronized with lucene 2.2.0, 14/Sep/07 06:44 AM)
# CVS access:
cvs -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism login
cvs -z3 -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism co -P ojvm
- Synchronized with latest Lucene 2.2.0 production
- Replaced the Vector-based in-memory storage with direct BLOB I/O, reducing memory usage for large indexes.
- Support for user data stores: you are no longer limited to indexing one column at a time (a limitation of the Data Cartridge API on 10g); you can now index multiple columns of the base table together with columns of related tables joined to it.
- User Data Stores can be customized by the user: by writing a simple Java class, users can control which columns are indexed, the padding used, or any other processing prior to the document-adding step.
- There is a DefaultUserDataStore which gets all columns of the query and builds a Lucene Document with a Field for each database column; these fields are automatically padded if they hold NUMBER data or rounded if they hold DATE data, for example.
- The lcontains() SQL operator supports the full Lucene QueryParser syntax to provide access to all indexed columns, see examples below.
- Support for the DOMAIN_INDEX_SORT and FIRST_ROWS hints: if you want rows ordered by the lscore() operator (ascending or descending), the optimizer assumes that Lucene Domain Index returns rowids in the proper order, avoiding an inline view to sort them.
- Automatic index synchronization by using AQ’s Call Back.
- Lucene Domain Index creates an extra table named IndexName$T and an Oracle AQ named IndexName$Q with its storage table IndexName$QT in the user’s schema, so you can alter the storage preferences if you want.
- ojvm project is at SourceForge.net CVS, so anybody can get it and collaborate
- Tested against 10gR2 and 11gR1 database.
2.0.0.1.3 (third release, 09/Jan/07 11:40 AM)
https://issues.apache.org/jira/secure/attachment/12348574/ojvm-01-09-07.tar.gz
- The Data Cartridge API is used without column data to reduce the data stored in the queue of changes and speed up the synchronize method.
- Query Hits are cached associated to the index search and the string returned by the QueryParser.toString() method.
- If no ancillary operator is used in the select, do not store the score list.
- The “Stemmer” parameter is recognized, and its value is passed as the argument to the SnowBall analyzer, for example:
create index it1 on t1(f2) indextype is lucene.LuceneIndex parameters('Stemmer:English');
- Before installing the ojvm extension, it is necessary to execute “ant jar-core” in the snowball directory.
- IndexWriter.setUseCompoundFile(false) is called to use multi-file storage (faster than the compound file), because there is no file descriptor limitation inside the OJVM: BLOBs are used instead of files.
- Files are marked for deletion and they are purged when calling to Sync or Optimize methods.
- BLOBs are created and populated in one call using Oracle SQL RETURNING information.
- A testing script for using OE sample schema, with query comparisons against Oracle Text ctxsys.context index.
2.0.0.1.2 (second release, 20/Dec/06 02:03 PM)
https://issues.apache.org/jira/secure/attachment/12347614/ojvm-12-20-06.tar.gz
- This new release of the OJVMDirectory Lucene Store includes a fully functional Oracle Domain Index with a queue for massive update/insert operations and many performance improvements.
2.0.0.1.1 (first release, 28/Nov/06 01:04 PM)
https://issues.apache.org/jira/secure/attachment/12345967/ojvm-11-28-06.tar.gz
- The API for the Oracle Domain Index is now complete, but the solution for using the contains operator outside the where clause is not good.
- I will implement a singleton solution for the OJVMDirectory object when it is used in read-only mode, typically when a user performs select operations against tables which have columns indexed with Lucene. This implementation will greatly improve final performance because the index reader will be ready for each select operation. Obviously I will check whether another user or thread has made a write operation on the index, in order to reload the read-only singleton.
- The queue for storing the changes on the index is not implemented yet, I’ll add it in a short time.
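The planned read-only singleton with reload-on-write can be sketched as a holder that reopens its reader only when a change counter moves. This is a modern-Java illustration under stated assumptions, not the project's code: ReaderHolderSketch, the version supplier and the opener function are all hypothetical, and how OJVMDirectory actually detects writes is not described here.

```java
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

// Illustrative only: a hypothetical holder for a shared read-only reader of type R.
class ReaderHolderSketch<R> {
    private final LongSupplier currentVersion; // reads the index's change counter
    private final LongFunction<R> opener;      // opens a reader for a given version
    private long loadedVersion;
    private R reader;
    private int loads;

    ReaderHolderSketch(LongSupplier currentVersion, LongFunction<R> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
    }

    // Returns the cached reader, reopening it only if a write changed the version.
    synchronized R get() {
        long version = currentVersion.getAsLong();
        if (reader == null || version != loadedVersion) {
            reader = opener.apply(version);
            loadedVersion = version;
            loads++;
        }
        return reader;
    }

    // Number of times the reader was (re)opened; useful for checking cache hits.
    synchronized int loadCount() {
        return loads;
    }
}
```

Repeated selects reuse the same reader at the cost of one version check per call, and a writer only needs to bump the counter to invalidate it.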
2.0.0.1.0 (initial implementation, 22/Nov/06 03:45 PM)
https://issues.apache.org/jira/secure/attachment/12345516/ojvm.tar.gz