Apex and Oracle Text - Sage Computing Services

SAGE Computing Services
Customised Oracle Training Workshops and Consulting
Oracle Text in Apex
Advanced Indexing Techniques Integrated with Application Express
Scott Wesley
Systems Consultant & Trainer
Agenda
• Introduction
• Architecture
• Fundamentals
• Considerations
• Setting Up
• Samples
• Index Maintenance
• Visualisation
• New Features
Larry Lessig?
the law is strangling creativity
http://presentationzen.blogs.com/presentationzen/2005/10/the_lessig_meth.html
http://www.ted.com/talks/larry_lessig_says_the_law_is_strangling_creativity.html
Identity 2.0 – Dick Hardt
http://identity20.com/media/OSCON2005/
who’s the Dick on your site
Connor McDonald
http://www.oracledba.co.uk
so today’s going to be more like this
and this
after I show a few pictures
who_am_i;
http://strategy2c.wordpress.com/2009/01/10/strategy-for-goldfish-funny-illustration-by-frits/
balance
Why use Oracle Application Express?
Why use Oracle Text?
What is Oracle Text?
Document Collection
Catalogue Information
Document Classification
Architecture
Class           Description
Datastore       How are your documents stored?
Filter          How can the documents be converted to plain text?
Lexer           What language is being indexed?
Wordlist        How should stem and fuzzy queries be expanded?
Storage         How should the index data be stored?
Stop List       What words or themes are not to be indexed?
Section Group   How are document sections defined?
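Each class above maps to a preference named in the PARAMETERS clause of CREATE INDEX. A minimal sketch, assuming a hypothetical table my_docs with a document column doc; the preference names my_lexer and my_wordlist are made up for illustration, while the CTXSYS objects are system-defined defaults:

```sql
-- Custom preferences (names are illustrative, not from the presentation)
BEGIN
  ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_lexer', 'base_letter', 'YES');      -- treat accented letters as their base letter

  ctx_ddl.create_preference('my_wordlist', 'BASIC_WORDLIST');
  ctx_ddl.set_attribute('my_wordlist', 'stemmer', 'ENGLISH');     -- expansion rules for stem queries
  ctx_ddl.set_attribute('my_wordlist', 'fuzzy_match', 'ENGLISH'); -- expansion rules for fuzzy queries
END;
/

-- One preference per class; hypothetical table my_docs(doc)
CREATE INDEX ctx_docs ON my_docs(doc)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('DATASTORE     ctxsys.default_datastore
               FILTER        ctxsys.auto_filter
               LEXER         my_lexer
               WORDLIST      my_wordlist
               STOPLIST      ctxsys.default_stoplist
               SECTION GROUP ctxsys.null_section_group');
```

Any class left out of PARAMETERS falls back to the system default for that class.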
1) Example
CREATE INDEX ctx_name ON my_names(name)
INDEXTYPE IS ctxsys.context
PARAMETERS ('DATASTORE CTXSYS.DEFAULT_DATASTORE');

SELECT SCORE(1), name
FROM my_names
WHERE CONTAINS(name, 'fuzzy(john,,,weight)', 1) > 0
ORDER BY SCORE(1) DESC;

  SCORE(1) NAME
---------- ----------------------------------------
       100 John
       100 John
        70 Jon
        70 Jon
        63 Joan
        63 Joan
        52 Jong
        48 Jona

8 rows selected.
2) Datastore
CTXSYS.DEFAULT_DATASTORE
BLOB
BFiles
Pointers to objects on file system
URLs
Pointers to objects on the intertube
User Defined
Why would you?
3) Index Type
a) CONTEXT
Document Collection
large document size
provides a score
asynchronous index & table data
CONTAINS
b) CTXCAT
Catalogue Information
smaller documents
text fragments
multiple attributes
set lists
similar to typical index paradigm
transactional
CATSEARCH
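A minimal CTXCAT sketch, assuming a hypothetical products table with description and price columns (the table, column and index set names are illustrative). The index set holds the structured columns that CATSEARCH can filter and order by:

```sql
-- Index set listing the structured column used alongside the text query
BEGIN
  ctx_ddl.create_index_set('products_iset');
  ctx_ddl.add_index('products_iset', 'price');
END;
/

CREATE INDEX ctx_products ON products(description)
  INDEXTYPE IS ctxsys.ctxcat
  PARAMETERS ('index set products_iset');

-- Text criteria in the second argument, structured criteria in the third
SELECT description, price
FROM   products
WHERE  CATSEARCH(description, 'camera', 'price < 500 ORDER BY price') > 0;
```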
c) CTXRULE
Document Classification
routing information
displace manual interaction
not binary files
MATCHES
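A minimal CTXRULE sketch, assuming a hypothetical doc_rules table in which each row stores a query that defines a category. The index is built on the queries, and MATCHES takes the incoming document text:

```sql
-- Hypothetical rule table: one stored query per category
CREATE TABLE doc_rules (
  rule_id  NUMBER PRIMARY KEY,
  category VARCHAR2(30),
  query    VARCHAR2(2000)
);

INSERT INTO doc_rules VALUES (1, 'BILLING', 'invoice OR payment OR overdue');
INSERT INTO doc_rules VALUES (2, 'SUPPORT', 'error OR crash OR outage');
COMMIT;

CREATE INDEX ctx_doc_rules ON doc_rules(query)
  INDEXTYPE IS ctxsys.ctxrule;

-- Route an incoming piece of plain text to the categories whose rule it satisfies
SELECT category
FROM   doc_rules
WHERE  MATCHES(query, 'Your invoice for March is now overdue') > 0;
```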
4) Considerations
location of text
document format
bypassing rows - images
character set
language
fuzzy matching & stemming
wildcard query performance
stopwords & stopthemes
query performance and storage of LOBs
mixed queries
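Several of these considerations show up directly in query syntax. A short sketch, assuming a hypothetical my_docs table with a CONTEXT index on doc plus id and created_on columns; the preference name is also illustrative:

```sql
-- Stemming: $ expands the term to words sharing the same stem
SELECT id FROM my_docs WHERE CONTAINS(doc, '$index') > 0;

-- Wildcard: right-truncated term; can be slow on large indexes
SELECT id FROM my_docs WHERE CONTAINS(doc, 'appl%') > 0;

-- Mixed query: text predicate combined with an ordinary relational predicate
SELECT id
FROM   my_docs
WHERE  CONTAINS(doc, 'oracle AND apex') > 0
AND    created_on > SYSDATE - 30;

-- A prefix index in the wordlist trades index size for faster
-- right-truncated wildcard queries
BEGIN
  ctx_ddl.create_preference('my_prefix_wordlist', 'BASIC_WORDLIST');
  ctx_ddl.set_attribute('my_prefix_wordlist', 'prefix_index', 'YES');
  ctx_ddl.set_attribute('my_prefix_wordlist', 'prefix_min_length', '3');
  ctx_ddl.set_attribute('my_prefix_wordlist', 'prefix_max_length', '5');
END;
/
```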
5) Setting up
GRANT ctxapp TO ausoug;
create & delete indexing preferences
use Oracle Text PL/SQL supplied packages
select grantee, owner, table_name, privilege
from dba_tab_privs
where table_name = 'CTX_DDL';

GRANTEE       OWNER    TABLE_NAME   PRIVILEGE
------------- -------- ------------ ----------
CTXAPP        CTXSYS   CTX_DDL      EXECUTE
APEX_040000   CTXSYS   CTX_DDL      EXECUTE
APEX_030200   CTXSYS   CTX_DDL      EXECUTE
AUSOUG        CTXSYS   CTX_DDL      EXECUTE
XDB           CTXSYS   CTX_DDL      EXECUTE

5 rows selected.
PLS-00201: identifier "string" must be declared
CTX PL/SQL Packages
GRANT EXECUTE ON CTXSYS.CTX_CLS TO ausoug;
GRANT EXECUTE ON CTXSYS.CTX_DDL TO ausoug;
GRANT EXECUTE ON CTXSYS.CTX_DOC TO ausoug;
GRANT EXECUTE ON CTXSYS.CTX_OUTPUT TO ausoug;
GRANT EXECUTE ON CTXSYS.CTX_QUERY TO ausoug;
GRANT EXECUTE ON CTXSYS.CTX_REPORT TO ausoug;
GRANT EXECUTE ON CTXSYS.CTX_THES TO ausoug;
GRANT EXECUTE ON CTXSYS.CTX_ULEXER TO ausoug;
Using URL Datastore in 11g
CREATE ROLE apex_url_datastore_role;
GRANT apex_url_datastore_role TO APEX_040000 WITH ADMIN OPTION;
GRANT apex_url_datastore_role TO ausoug;
EXEC ctxsys.ctx_adm.set_parameter('file_access_role', 'APEX_URL_DATASTORE_ROLE');
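With the role in place, the index owner can create a URL datastore preference and index a column of URLs. A minimal sketch, assuming a hypothetical my_urls table whose url column holds one address per row; the preference name and timeout value are illustrative:

```sql
BEGIN
  ctx_ddl.create_preference('my_url_ds', 'URL_DATASTORE');
  ctx_ddl.set_attribute('my_url_ds', 'timeout', '30');  -- seconds to wait for each fetch
END;
/

-- The column stores the URL; the fetched content is what gets indexed
CREATE INDEX ctx_urls ON my_urls(url)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('DATASTORE my_url_ds FILTER ctxsys.auto_filter');
```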
Demonstrations
Script                 Description
Ctx_blobs.sql          Import & index a range of documents
Ctx_bfiles.sql         Import & index BFILE pointers
Ctx_urls.sql           Index & search URL references
Ctx_dict.sql           Index & search English dictionary words
Ctx_views.sql          Index view SQL text for impact analysis
Ctx_apex_files.sql     Duplicate and search Apex file repository
Ctx_apex_backups.sql   Hunt through your (automated) Apex app backups
Ctx_names.sql          Basic name filter options
Ctx_products.sql       Multiple column searches
Ctx_category.sql       Attribute based searching
Ctx_classify.sql       Classify documents into categories
6) Index maintenance
indexing errors
resume failed index
ALTER INDEX ctx_surname
REBUILD PARAMETERS ('resume memory 10m');
recreate index online (11g)
EXEC ctx_ddl.recreate_index_online('ctx_surname', 'replace lexer sw_lexer');
rebuilding an index
ALTER INDEX ctx_surname
REBUILD PARAMETERS('replace lexer sw_lexer')
ONLINE;
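Before resuming or rebuilding, it helps to see which rows failed and why. A minimal sketch that queries the CTX_USER_INDEX_ERRORS view as the index owner:

```sql
-- err_textkey is the ROWID of the base table row that failed to index
SELECT err_index_name, err_timestamp, err_textkey, err_text
FROM   ctx_user_index_errors
ORDER  BY err_timestamp DESC;
```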
ctx_report.index_stats
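create table ausoug.my_stats (stats clob);

declare
  x clob := null;  -- index_stats allocates a temporary CLOB when passed null
begin
  for r_rec in (select *
                from   ctxsys.ctx_indexes
                where  idx_owner = 'AUSOUG'
                and    idx_type  = 'CONTEXT') loop
    ctx_report.index_stats(r_rec.idx_name, x);
    insert into ausoug.my_stats values (x);
  end loop;
  commit;
  -- free the temporary CLOB only if at least one report was generated
  if x is not null then
    dbms_lob.freetemporary(x);
  end if;
end;
/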
7) Data Dictionary
select count(*)
from all_views
where owner = 'CTXSYS';

  COUNT(*)
----------
        58
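A couple of starting points for exploring those views; a minimal sketch, with the second query run as the index owner:

```sql
-- List the Oracle Text dictionary views
SELECT view_name
FROM   all_views
WHERE  owner = 'CTXSYS'
ORDER  BY view_name;

-- Summary of your own Text indexes
SELECT idx_name, idx_type, idx_table, idx_status
FROM   ctx_user_indexes;
```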
8) Common Questions
DML operations on a CONTEXT index
ctxsys.ctx_user_pending
synchronise the index
synchronize
EXEC ctx_ddl.sync_index('ctx_surname');
dbms_job
dbms_scheduler
how often?
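One way to approach synchronisation, sketched under assumptions: check the pending queue, then schedule the sync. The job name and the 15 minute interval are illustrative; how often depends on how stale the index is allowed to be.

```sql
-- Rows changed by DML but not yet reflected in the index
SELECT pnd_index_name, COUNT(*) AS pending_rows
FROM   ctx_user_pending
GROUP  BY pnd_index_name;

-- Periodic sync via dbms_scheduler
BEGIN
  dbms_scheduler.create_job(
    job_name        => 'SYNC_CTX_SURNAME',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN ctx_ddl.sync_index(''ctx_surname''); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=MINUTELY; INTERVAL=15',
    enabled         => TRUE);
END;
/
```

An alternative is to specify SYNC (ON COMMIT) in the index PARAMETERS, which synchronises after each commit at the cost of more index fragmentation.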
optimise the index
can get fragmented
inverted index
each entry contains list of documents
DOG - DOC1 DOC3 DOC5
DOG - DOC7
DOG - DOC9
DOG - DOC11
ctx_ddl.optimize_index
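A minimal sketch of the call, using the ctx_surname index from the earlier slides; the 10 minute limit is an illustrative value:

```sql
-- FULL optimisation merges fragmented token rows and removes deleted documents;
-- maxtime caps the run in minutes, and a later call continues where this one stopped
BEGIN
  ctx_ddl.optimize_index(
    idx_name => 'ctx_surname',
    optlevel => 'FULL',
    maxtime  => 10);
END;
/
```

FAST only compacts fragmented rows without removing old data, while REBUILD builds a fresh copy of the token table.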
capacity planning?
Object of Interest
Num
Rows
Table
Size
Index
size
150k
7
27
28
34
1.5
Names
27k
1
6
Views
2k
7
2
Dictionary
Documents
BFiles
4
Product
1
URL
1
more text
cleaner data
less overhead
document format
next steps?
read Application Developer’s Guide
find examples
experiment
Question time
Presentations are available from our website:
http://www.sagecomputing.com.au
http://triangle-circle-square.blogspot.com