Apex and Oracle Text - Sage Computing Services
SAGE Computing Services
Customised Oracle Training Workshops and Consulting

Oracle Text in Apex
Advanced Indexing Techniques Integrated with Application Express

Scott Wesley
Systems Consultant & Trainer

Agenda
• Introduction
• Architecture
• Fundamentals
• Considerations
• Setting Up
• Samples
• Index Maintenance
• Visualisation
• New Features

Larry Lessig? the law is strangling creativity
http://presentationzen.blogs.com/presentationzen/2005/10/the_lessig_meth.html
http://www.ted.com/talks/larry_lessig_says_the_law_is_strangling_creativity.html

Identity 2.0 – Dick Hardt
http://identity20.com/media/OSCON2005/
who’s the Dick on your site

Connor McDonald
http://www.oracledba.co.uk

so today’s going to be more like this and this after I show a few pictures

who_am_i;
http://strategy2c.wordpress.com/2009/01/10/strategy-for-goldfish-funny-illustration-by-frits/

balance

Why use Oracle Application Express?
Why use Oracle Text?

What is Oracle Text?
• Document Collection
• Catalogue Information
• Document Classification

Architecture

Class            Description
Datastore        How are your documents stored?
Filter           How can the documents be converted to plain text?
Lexer            What language is being indexed?
Wordlist         How should stem and fuzzy queries be expanded?
Storage          How should the index data be stored?
Stop List        What words or themes are not to be indexed?
Section Group    How are document sections defined?

1) Example

    CREATE INDEX ctx_name ON my_names(name)
      INDEXTYPE IS ctxsys.context
      PARAMETERS ('DATASTORE CTXSYS.DEFAULT_DATASTORE');

    SELECT SCORE(1), name
    FROM   my_names
    WHERE  CONTAINS(name, 'fuzzy(john,,,weight)', 1) > 0
    ORDER  BY SCORE(1) DESC;

      SCORE(1) NAME
    ---------- ----------------------------------------
           100 John
           100 John
            70 Jon
            70 Jon
            63 Joan
            63 Joan
            52 Jong
            48 Jona

    8 rows selected.

2) Datastore
• CTXSYS.DEFAULT_DATASTORE
• BLOB
• BFiles - pointers to objects on the file system
• URLs - pointers to objects on the intertube
• User Defined - why would you?
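
To make the datastore choice concrete, here is a minimal sketch of indexing a column of URLs with a non-default datastore. The table my_links, its url column, the preference name my_url_ds and the index name ctx_links are all hypothetical and do not come from the slides. The timeout value is likewise only an assumption. In 11g the schema would also need the file access role described under Setting up.

```sql
-- Sketch only: my_links(url), my_url_ds and ctx_links are hypothetical names.
-- Assumes the schema can execute CTXSYS.CTX_DDL.
BEGIN
  -- A preference is a named bundle of settings for one architecture class;
  -- here the class is the datastore, and the type tells Oracle Text to
  -- fetch each document from the URL held in the indexed column.
  ctx_ddl.create_preference('my_url_ds', 'URL_DATASTORE');

  -- Seconds to wait for each URL before giving up on that row (assumed value).
  ctx_ddl.set_attribute('my_url_ds', 'TIMEOUT', '30');
END;
/

-- Same CREATE INDEX shape as the example above; only the datastore differs.
CREATE INDEX ctx_links ON my_links(url)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('DATASTORE my_url_ds');

-- The query searches the fetched page content, not the URL string itself.
SELECT SCORE(1), url
FROM   my_links
WHERE  CONTAINS(url, 'oracle', 1) > 0
ORDER  BY SCORE(1) DESC;
```

The same create_preference / set_attribute pattern should apply to the other classes in the architecture table (lexer, wordlist, storage and so on), with the preference named in the PARAMETERS string of the index.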
3) Index Type

a) CONTEXT
• Document Collection
• large document size
• provides a score
• asynchronous index & table data
• CONTAINS

b) CTXCAT
• Catalogue Information
• smaller documents, text fragments
• multiple attributes, set lists
• similar to typical index paradigm
• transactional
• CATSEARCH

c) CTXRULE
• Document Classification
• routing information
• displace manual interaction
• not binary files
• MATCHES

4) Considerations
• location of text
• document format
• bypassing rows - images
• character set
• language
• fuzzy matching & stemming
• wildcard query performance
• stopwords & stopthemes
• query performance and storage of LOBs
• mixed queries

5) Setting up

    GRANT ctxapp TO ausoug;

• create & delete indexing preferences
• use Oracle Text PL/SQL supplied packages

    select grantee, owner, table_name, privilege
    from   dba_tab_privs
    where  table_name = 'CTX_DDL'
    /

    GRANTEE              OWNER        TABLE_NAME                     PRIVILEGE
    -------------------- ------------ ------------------------------ --------------------
    CTXAPP               CTXSYS       CTX_DDL                        EXECUTE
    APEX_040000          CTXSYS       CTX_DDL                        EXECUTE
    APEX_030200          CTXSYS       CTX_DDL                        EXECUTE
    AUSOUG               CTXSYS       CTX_DDL                        EXECUTE
    XDB                  CTXSYS       CTX_DDL                        EXECUTE

    5 rows selected.

PLS-00201: identifier "string" must be declared

CTX PL/SQL Packages

    GRANT EXECUTE ON CTXSYS.CTX_CLS    TO ausoug;
    GRANT EXECUTE ON CTXSYS.CTX_DDL    TO ausoug;
    GRANT EXECUTE ON CTXSYS.CTX_DOC    TO ausoug;
    GRANT EXECUTE ON CTXSYS.CTX_OUTPUT TO ausoug;
    GRANT EXECUTE ON CTXSYS.CTX_QUERY  TO ausoug;
    GRANT EXECUTE ON CTXSYS.CTX_REPORT TO ausoug;
    GRANT EXECUTE ON CTXSYS.CTX_THES   TO ausoug;
    GRANT EXECUTE ON CTXSYS.CTX_ULEXER TO ausoug;

Using URL Datastore in 11g

    CREATE ROLE apex_url_datastore_role;
    GRANT apex_url_datastore_role TO APEX_040000 WITH ADMIN OPTION;
    GRANT apex_url_datastore_role TO ausoug;
    EXEC ctxsys.ctx_adm.set_parameter('file_access_role', 'APEX_URL_DATASTORE_ROLE');

Demonstrations

Script                  Description
Ctx_blobs.sql           Import & index a range of documents
Ctx_bfiles.sql          Import & index BFILE pointers
Ctx_urls.sql            Index & search URL references
Ctx_dict.sql            Index & search English dictionary words
Ctx_views.sql           Index view SQL text for impact analysis
Ctx_apex_files.sql      Duplicate and search Apex file repository
Ctx_apex_backups.sql    Hunt through your (automated) Apex app backups
Ctx_names.sql           Basic name filter options
Ctx_products.sql        Multiple column searches
Ctx_category.sql        Attribute based searching
Ctx_classify.sql        Classify documents into categories

6) Index maintenance

indexing errors

resume failed index

    ALTER INDEX ctx_surname REBUILD PARAMETERS ('resume memory 10m');

recreate index online (11g)

    EXEC ctx_ddl.recreate_index_online('ctx_surname', 'replace lexer sw_lexer');

rebuilding an index

    ALTER INDEX ctx_surname REBUILD PARAMETERS ('replace lexer sw_lexer') ONLINE;

ctx_report.index_stats

    create table ausoug.my_stats (stats clob);

    declare
      x clob := null;
    begin
      for r_rec in (select *
                    from   ctxsys.ctx_indexes
                    where  idx_owner = 'AUSOUG'
                    and    idx_type  = 'CONTEXT')
      loop
        ctx_report.index_stats(r_rec.idx_name, x);
        insert into ausoug.my_stats values (x);
      end loop;
      commit;
      dbms_lob.freetemporary(x);
    end;
    /

7) Data Dictionary

    SQL> select count(*)
         from   all_views
         where  owner = 'CTXSYS';

      COUNT(*)
    ----------
            58

8) Common Questions

DML operations on a CONTEXT index
• ctxsys.ctx_user_pending

synchronise the index (synchronize)

    EXEC ctx_ddl.sync_index('ctx_surname');

• dbms_job
• dbms_scheduler
• how often?

optimise the index
• can get fragmented
• inverted index
• each entry contains list of documents

[slide illustration: DOG appears four times, against DOC1 DOC3 DOC5 DOC7 DOC9 DOC11]

    ctx_ddl.optimize_index

capacity planning?

Object of Interest    Num Rows    Table Size    Index Size
Names                 27k         1             6
Views                 2k          7             2

[rows for Dictionary, Documents, BFiles, Product and URL are garbled in the transcription; the remaining values read: 150k 7 27 28 34 1.5 4 1 1]

• more text
• cleaner data
• less overhead
• document format

next steps?
• read Application Developer’s Guide
• find examples
• experiment

Question time

Presentations are available from our website:
http://www.sagecomputing.com.au
[email protected] [email protected]
http://triangle-circle-square.blogspot.com
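
As a possible first experiment, the synchronise and optimise questions from section 8 could be tried along these lines. This is a sketch, not something taken from the demo scripts: the job name SYNC_CTX_SURNAME is hypothetical, the 15 minute interval is only a guess at "how often?", and the schema is assumed to hold the CREATE JOB privilege as well as execute on CTX_DDL.

```sql
-- Sketch only: job name and interval are assumptions, not from the slides.

-- How many rows are waiting to be synchronised into each CONTEXT index?
SELECT pnd_index_name, COUNT(*) AS pending_rows
FROM   ctxsys.ctx_user_pending
GROUP  BY pnd_index_name;

-- Synchronise on a schedule instead of by hand.
BEGIN
  dbms_scheduler.create_job(
    job_name        => 'SYNC_CTX_SURNAME',            -- hypothetical name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN ctx_ddl.sync_index(''ctx_surname''); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=MINUTELY;INTERVAL=15',   -- assumed; tune to the DML rate
    enabled         => TRUE);
END;
/

-- Frequent syncs fragment the inverted index, so defragment it occasionally.
-- 'FULL' compacts fragmented rows and removes entries for deleted documents.
EXEC ctx_ddl.optimize_index('ctx_surname', 'FULL');
```

Re-running the pending query before and after a sync is a quick way to see whether the chosen interval keeps up with changes to the table.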