EMC Documentum Search Development Guide

EMC® Documentum®
Version 7.2
Search Development Guide
EMC Corporation
Corporate Headquarters:
Hopkinton, MA 01748–9103
1–508–435–1000
www.EMC.com
Copyright ©1999-2015 EMC Corporation. All rights reserved.
EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without
notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES NO
REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION,
AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. Adobe and Adobe PDF
Library are trademarks or registered trademarks of Adobe Systems Inc. in the U.S. and other countries. All other trademarks used
herein are the property of their respective owners.
Documentation Feedback
Your opinion matters. We want to hear from you regarding our product documentation. If you have feedback about how we can
make our documentation better or easier to use, please send us your feedback directly at [email protected].
Table of Contents
Chapter 1
Indexing and Querying Full-text Indexes
Introduction to Indexing
Controlling what is indexed
How queries are processed
DQL hints
Extended object search
Chapter 2
Configuring and Customizing DFC Search
Configuring DFC search
DFC query builder
Transforming a query with a filter
DFC database queries
Hello World DFC search
DFC customization examples
Chapter 3
Customizing Search with DFS
DFS Search Services
Full-text and database searches
Constructing a search
Search service objects
Search service operations
Chapter 4
Configuring and Customizing Webtop Search
About WDK search
Wildcards, lemmatization, and word fragments
Configuring search controls
Configuring the basic search component
Configuring the advanced search component
Configuring search results
Configuring Webtop Federated Search clustering
Modifying search component JSP pages
Modifying a search component query
Chapter 5
Configuring CenterStage Search
Set Federated Search Services options
Improving search performance
Chapter 6
Troubleshooting
Troubleshooting Search
Problem queries
Debugging
Appendix A
DFC schemas
Preface
This document summarizes information for developers who customize search in their Content
Server client applications. When you customize search, you may need information about several
different products: Content Server, xPlore index server, DQL, DFC, DFS, and WDK. The
information in this document is drawn from the following sources:
• EMC Documentum Content Server Administration and Configuration Guide
• EMC Documentum Content Server DQL Reference
• DFC Javadocs
• EMC Documentum Foundation Services Development Guide
• EMC Documentum WDK Development Guide
• EMC Documentum WDK Reference Guide
Some information appears in this guide that is not available in other product guides.
When you are familiar with the Content Server data model and indexing, you can design queries and
search customizations and troubleshoot query performance. Web Development Kit (WDK) provides
you with tools to display query-generating pages and results pages in web-accessible applications.
DFC and DFS allow you to build a query within a client application.
This document does not describe how to set up and configure an xPlore server or a Federated
Search Services (FS2) server. (FS2 server and FS2 adapters are required for federated search, that
is, searches against external sources, not Documentum repositories.) For information on installing
and configuring an xPlore index server and index agent, see EMC Documentum xPlore Installation
Guide and EMC Documentum xPlore Administration and Development Guide. For information on
developing an FS2 adapter, see EMC Documentum Federated Search Services Development Guide.
If you need assistance in implementing your customizations, contact EMC Professional Services
or EMC Developer support.
Intended Audience
This guide is directed to administrators and Java developers who are developing customized DFC,
DFS, or WDK-based clients of the Content Server. The customization tasks described in this guide
use Java, JSP, XML, XQuery and XPath, JavaScript, and DQL.
Conventions
This manual uses the following conventions in the syntax descriptions and examples.
Table 1. Syntax conventions
Convention             Identifies
italics                A variable for which you must provide a value
[ ] square brackets    An optional argument that is included only once
xplore_home            Installation directory for xPlore
DM_HOME                Installation directory for Content Server
Revision history
The following changes have been made to this document.
Table 2. Revision history
Revision Date     Description
February 2015     Initial publication.
Chapter 1
Indexing and Querying Full-text Indexes
This chapter contains the following topics:
• Introduction to Indexing
• Controlling what is indexed
• How queries are processed
• DQL hints
• Extended object search
Introduction to Indexing
This chapter provides a brief overview of the indexing process, the indexes, and the software
components that perform indexing and searching. For information on Documentum xPlore (xPlore)
installation, administration, configuration and customization, refer to EMC Documentum xPlore
Administration and Development Guide.
The Content Server Installation Guide contains information on installing Content Server. The EMC
Documentum xPlore Installation Guide contains information on installing the index agent and index
server. See the EMC Community Network Documentum search and analytics forum to post your
questions and see solutions offered by other customers and EMC employees.
Content Server
Full-text indexing is enabled in the repository by default when the repository is created or upgraded
to the latest Content Server version. However, Content Server itself does not create or maintain the
full-text index. Install xPlore to create and maintain the index.
The Content Server manages documents in a repository, generates full-text indexing events, queries the
index, and returns query results to client applications.
Index agent
The xPlore index agent is a multithreaded Java application running in the Content Server application
server. Run the xPlore installer to install an index agent on a Content Server host or a separate host.
The index agent processes index queue items generated by Content Server and prepares objects for
indexing. The index agent creates a representation of the indexable SysObjects using the DFTXML
schema. xPlore processes the DFTXML for indexing in the internal xDB database.
xPlore
The xPlore indexing server creates full-text indexes and responds to full-text queries from Content
Server client applications. The index itself is a Lucene index managed by an XML database (xDB).
xPlore can be installed on the Content Server host if that host meets the xPlore environment requirements.
For better performance, install xPlore on a separate host. For complete information on installing and
running xPlore, refer to EMC Documentum xPlore Installation Guide.
Controlling what is indexed
A full-text index is an index on the properties and on the content files of SysObjects and objects of
SysObject subtypes. When you search for values in a full-text index, you can retrieve objects with
properties or content associated with your search terms. All characters are stored as lowercase in
the index. Case sensitivity is not configurable.
Content files and properties in all supported languages are indexed by default. All standard Unicode
character sets are supported. No special configuration is necessary. For tested languages in xPlore,
refer to EMC Documentum xPlore Administration and Development Guide.
To control what is indexed, set the properties on individual objects, object types, or formats in
Documentum Administrator. Configure stop words or special characters in xPlore. You can also
limit indexing by file size or text content size. For complete information on these controls, see EMC
Documentum xPlore Administration and Development Guide.
Lemmatization is applied to indexed documents and to queries. Lemmatization analyzes a word for
its context (part of speech), and the canonical form of a word (lemma) is indexed. The extracted
lemmas are actual words. Lemmatization saves both the indexed term and its canonical form in
the index, effectively doubling the size of the index. You can turn off lemmatization in xPlore or
configure lemmatization for specific elements. Refer to EMC Documentum xPlore Administration and
Development Guide.
How queries are processed
FTDQL is a subset of Document Query Language (DQL) and is used for querying full-text indexes.
DQL and FTDQL are fully documented in Content Server DQL Reference. DFC- and DFS-based
client applications like Webtop or TaskSpace translate queries into an XQuery statement.
Your application can also issue DQL queries, which can be configured to run against the database or
against the full-text index. The Content Server query plugin for xPlore translates a DQL query into an
XQuery expression unless XQuery generation is turned off. (For instructions on turning off XQuery
generation, see EMC Documentum xPlore Administration and Development Guide.)
Note: It is not recommended to turn off XQuery processing. If you do, you cannot use facets, native
xPlore security, and other performance enhancements.
For detailed information on query processing, including wildcards and fuzzy search, see EMC
Security of query results
Content Server user, group, and object permissions are applied to query results either in the xPlore
server (default) or in Content Server. Performance is faster with native xPlore security, because
results are not sent back to the Content Server and discarded for users who do not have appropriate
permissions. Security is configurable in xPlore. Refer to EMC Documentum xPlore Administration
and Development Guide.
Clients like WDK, DFC, and DFS do not apply permissions to search results. Changes to permissions
are replicated to xPlore as they happen, with some small latency. You can decrease the latency by
setting up a separate index agent dedicated to ACLs and groups.
Faceted results
Faceted search, also called guided navigation, allows users to explore large data sets to locate items
of interest. You can define facets for the attributes that are used most commonly for search. After
facets are computed and the results of the initial query are presented in facets, the user can drill
down to areas of interest.
Multiple attributes can be used to compute a facet, for example, r_modifier or keywords. Faceted
navigation has several advantages over a keyword search or explicit query:
• The user can explore an unknown data set by restricting values suggested by the search service.
• The data set is presented in a visual interface, so that the user can drill down rather than constructing
a query in a complicated UI.
• Faceted navigation prevents dead-end queries by limiting the restriction values to non-empty
results. The query is reissued for the selected facets.
Facets are computed on discrete values, for example, authors, categories, tags, and date or numeric
ranges. Facets are not computed on text fields such as content or object name. Facet results are not
localized; the client application must provide localization. For information on creating facets, refer to
EMC Documentum xPlore Administration and Development Guide.
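As a rough illustration of what computing a facet on discrete values means, the following sketch counts how many results carry each value of a single attribute. It is plain Java and does not call xPlore or DFC, where facets are actually computed. The class name FacetSketch, the method countFacet, and the sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: xPlore computes facets inside the index server.
class FacetSketch {

    // Counts how many results carry each discrete value of one attribute,
    // for example r_modifier. Keys are sorted for a stable display order.
    static Map<String, Integer> countFacet(List<String> attributeValues) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : attributeValues) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample r_modifier values from an imagined result set.
        List<String> modifiers = Arrays.asList("alice", "bob", "alice", "carol", "alice");
        System.out.println(countFacet(modifiers));
    }
}
```

Only values that occur in the result set appear as keys, which mirrors how faceted navigation limits restriction values to non-empty results.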
When to use a database query
Full-text queries have more capability for natural language and free-text searching than database
queries. These queries generally perform better than database queries because the index is optimized
and security is performed in the xPlore server. If security is performed in the Content Server,
non-permitted results are returned to the Content Server and then discarded.
In DFC clients, all search component queries are full-text queries unless a DQL hints file is in place
and you have turned off automatic XQuery generation in dfc.properties. The hints file allows you to
specify conditions under which a database query is run in place of a full-text query. For information
on the hints file, see DQL hints, page 10.
A selection in the Webtop UI labeled Include recently modified properties searches for attribute
values in the database instead of the full-text index (a NOFTDQL search on attributes). This option is
not enabled out of the box and requires configuration.
Note: For attributes that are queried against the database, create an index in the database.
DQL hints
DQL hints can be added to a query to change query behavior. For information on all DQL hints, refer
to EMC Documentum Content Server DQL Reference. For tips on migrating DQL hints to xPlore, see
EMC Documentum xPlore Administration and Development Guide.
The ENABLE(FTDQL) hint causes the Content Server to attempt to execute the query as an FTDQL
query. If the remaining syntax in the query conforms to the required syntax for an FTDQL query, the
query is executed as an FTDQL query. If the syntax does not conform to FTDQL query rules, an
error is returned.
The TRY_FTDQL_FIRST hint is added to all queries that are built with the DFC query builder
package. This hint handles timeouts and resource exceptions returned from xPlore by querying the
attributes portion of a query against the repository database.
You can turn off FTDQL for the attribute portion of a query with the hint ENABLE(NOFTDQL), like
the following query:
SELECT r_object_id FROM dm_document SEARCH DOCUMENT
CONTAINS 'foo' WHERE object_name = 'bar' ENABLE(NOFTDQL)
You cannot use a DQL hints file with xPlore unless you turn off automatic XQuery generation. The
portion of the query covered by hints file criteria is run against the database, and the remainder of the
query is run against the full-text index. However, when XQuery generation is turned off, search
performance is worse, and some search features, such as facets, paging, and parallel queries, do
not work.
Using a DQL hints file
If a DQL hints file is present on the application server, and XQuery generation is turned off, DFC reads
it. DFC applies the hints to queries based on conditions defined in the file. The remainder of the
query is run against the full-text index. You can define conditions under which the hints are applied,
for example, for certain object types, attributes, or repositories. DQL hints, page 10 describes the
behavior governed by the hints file.
The DQL hints file location is specified in the DFC configuration file dfc.properties on the application
server host. The file must be named dfc.dqlhints.xml. If the file has been modified, it is reloaded every
two minutes. The following line could be added to dfc.properties to specify a Windows location
for the hints file:
dfc.dqlhints.file=C:/Documentum/config/dfc-dqlhints.xml
Alternatively, you can place the DQL hints file in the classpath of the application server host, or
specify its location as a Java system property when you start the application server, for example:
-Ddfc.dqlhints.file=path_to_hints_file
Use forward slashes for paths in a Java properties file, because the backslash is the escape character.
The file can also be loaded from the DFC data home directory on the application server host.
See DQL hints file DTD, page 93 for the hints file DTD.
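The advice about forward slashes can be checked with a short sketch. It assumes that dfc.properties is parsed with the standard java.util.Properties rules, which this guide does not state explicitly. The class name HintsPathSketch and the method hintsFile are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: shows how java.util.Properties treats slashes in a path value.
class HintsPathSketch {

    // Parses properties text and returns the value of dfc.dqlhints.file.
    static String hintsFile(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("dfc.dqlhints.file");
    }

    public static void main(String[] args) {
        // Forward slashes are read unchanged.
        System.out.println(hintsFile("dfc.dqlhints.file=C:/Documentum/config/dfc-dqlhints.xml"));
        // Single backslashes are treated as escape characters and silently dropped.
        System.out.println(hintsFile("dfc.dqlhints.file=C:\\Documentum\\config\\dfc-dqlhints.xml"));
    }
}
```

Under that assumption, the second call returns a path with its separators removed, so the hints file would not be found.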
Hints file elements
The following elements are contained within a root <RuleSet> element to define the hints passed
to IDfQueryManager.
Table 3. DQL hints file elements
Element
    Description
<Rule>
    Can have zero to many <Condition> elements.
<DisableFullText/>
    Disables full-text search on basic search or attributes for the conditions in the rule.
<DisableFTDQL/>
    Disables search for metadata in the full-text index.
<Condition>
    Child elements are ANDed.
<Select>, <Where>
    Child <Attribute> elements can be ANDed (condition="all") or ORed (condition="any").
<SelectOption>
    Adds a permission, for example, FOR READ or FOR BROWSE. For example, FOR DELETE would
    limit the results of a query that meets the condition to those documents on which the user
    has delete permission. The following example applies to all Webtop queries:
    <RuleSet>
      <Rule>
        <Condition>
          <Where>
            <Attribute operator="like">object_name</Attribute>
          </Where>
        </Condition>
        <SelectOption>FOR DELETE</SelectOption>
        <DisableFTDQL/>
      </Rule>
    </RuleSet>
<From>
    Child <Type> elements can be ANDed (condition="all") or ORed (condition="any").
<Docbase>
    The value of this element corresponds to a repository to which the hint applies. The descend
    attribute is optional (default: false). To apply the DQL hint to a folder and all its
    subfolders, set descend=true.
<Attribute>, <Type>, <Docbase>
    Support Java regular expressions (java.util.regex.Pattern). For example,
    <Type>custom.*</Type> matches all type names beginning with "custom".
<Attribute>
    Operator "like" represents DQL predicates CONTAINS and LIKE. The value "is_null" represents
    DQL predicates NULL, NULLINT, NULLSTRING, and NULLDATE.
<FulltextExpression>
    Child of <Condition>. Set the mandatory exists attribute to false to add ENABLE(NOFTDQL) to
    the query when there is no full-text expression in the search.
<DQLHint>
    Contains any valid DQL hint. For the full list of DQL hints, refer to Content Server DQL
    Reference.
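The regular expression support for <Attribute>, <Type>, and <Docbase> values can be tried out with java.util.regex.Pattern directly. The following sketch assumes that the whole name must match the pattern. This guide does not say whether DFC uses whole-string or partial matching, so treat that as an assumption. The class name HintPatternSketch and the method matchesHintValue are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: tests a hints file pattern against a type, attribute, or repository name.
class HintPatternSketch {

    // Returns true when the entire name matches the regular expression.
    static boolean matchesHintValue(String regex, String name) {
        return Pattern.matches(regex, name);
    }

    public static void main(String[] args) {
        // A type name that begins with "custom" matches the pattern from the table.
        System.out.println(matchesHintValue("custom.*", "custom_invoice"));
        // A standard type name does not match.
        System.out.println(matchesHintValue("custom.*", "dm_document"));
    }
}
```

With whole-string matching, a name that merely contains "custom" somewhere after its first character does not match custom.*; a pattern such as .*custom.* would be needed for that.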
Hints file examples
To send all queries on attributes to the database, define the following hint. The query must not contain
a full-text search expression.
<RuleSet>
  <Rule>
    <Condition>
      <FulltextExpression exists="false"/>
    </Condition>
  </Rule>
</RuleSet>
If you disable FTDQL for specific conditions defined within the <Rule> element, the attributes portion
of the query that meets those conditions is issued against the database.
A temp table is populated with the full-text result. If the full-text query is unselective, then the temp
table is large, negatively impacting response time.
In the following example, FTDQL is turned off for queries on the object_name attribute that use the
"like" operator. (In the Webtop UI, the like operator is "contains", "begins with", or "ends with".)
Multiple attributes can be added to the rule.
<RuleSet>
  <Rule>
    <DQLHint>ENABLE(FT_CONTAIN_FRAGMENT)</DQLHint>
  </Rule>
</RuleSet>
In the following example, attributes for the specified object type are queried in the database, not the
full-text index:
<RuleSet>
<Rule>
<Condition>
<From condition="any">
<Type>km_message</Type>
</From>
</Condition>
<DisableFTDQL/>
</Rule>
</RuleSet>
The following example adds two hints to wildcard queries on either of two attributes:
<RuleSet>
<Rule>
<Condition>
<Where condition="any">
<Attribute operator="like">subject</Attribute>
<Attribute operator="like">object_name</Attribute>
</Where>
</Condition>
<DQLHint>ENABLE(SQL_DEF_RESULT_SET 100, NOFTDQL)</DQLHint>
<DisableFTDQL/>
</Rule>
</RuleSet>
In the following hints file, one rule applies to queries for one attribute, the second rule applies to a
different attribute:
<RuleSet>
  <Rule>
    <Condition>
      <Where>
        <Attribute operator="like">subject</Attribute>
      </Where>
    </Condition>
    <DQLHint>ENABLE(SQL_DEF_RESULT_SET 100, NOFTDQL)</DQLHint>
    <DisableFTDQL/>
  </Rule>
  <Rule>
    <Condition>
      <Where>
        <Attribute operator="like">object_name</Attribute>
      </Where>
    </Condition>
    <DQLHint>ENABLE(SQL_DEF_RESULT_SET 10)</DQLHint>
    <DisableFTDQL/>
  </Rule>
</RuleSet>
Make sure that your multiple rules are mutually exclusive when applied to a single query. If not, the
query generates a DQL syntax error. If the Webtop user adds both attributes to the query (subject and
object_name), this hints file example throws an error.
You can turn off FTDQL for attribute queries in a repository, adding conditions as needed, as shown in
the following example:
<Rule>
<Condition>
<Docbase>
<Name>support</Name>
</Docbase>
</Condition>
<DisableFTDQL/>
</Rule>
You can turn off FTDQL for FOLDER(DESCEND) queries. In Webtop, this hint turns off FTDQL
for searches from current location or some other specific location instead of from the repository
root. If there are many subfolders, FOLDER(DESCEND) queries can time out. The following
example sends the attribute portion of the query to the database instead of the full-text index for
the specific repository. The descend attribute specifies whether to apply the condition and hint to
FOLDER(DESCEND) queries:
<Rule>
<Condition>
<Docbase>
<Name descend="true">dm_notes</Name>
</Docbase>
</Condition>
<DisableFTDQL/>
</Rule>
DQL hints and Webtop search components
The Webtop search components use the DFC query builder package to construct a query. If XQuery
generation is turned off, the DFC query builder adds the DQL hint TRY_FTDQL_FIRST. This hint
prevents timeouts and resource exceptions by querying the attributes portion of a query against the
repository database. The query builder also bypasses lemmatization by using a DQL hint for wildcard
and phrase searches.
If wildcard attribute searches ("contains", "begins with", "ends with") have many results, they can
time out. These searches have been optimized in xPlore, but the optimization is not applied when
XQuery generation is turned off. You can configure xPlore to support wildcard searches without using
DQL and without turning off XQuery generation.
Extended object search
Extended object search (EOS) lets you search the content or attributes of more than one
object when the objects are related in some way. For example, you can search both an email and its
attachments for content. EOS also lets you search augmented content. For example, you can
inject data from external repositories to enrich the content indexed by xPlore.
To support an extended object, you define a mapping that is independent from the storage format. For
example, an extended object definition represents emails. The definition combines attributes for
more than one object type.
You create a mapping file for the main interface. Your search application uses the DFC query builder
API to query the join of objects or tables as though it were a single object. In the addResultAttribute()
and addSimpleAttrExpression() methods, you add aliases that are defined in your mapping file. These
procedures are described in detail in the following topics. You can also use the aliases in facets.
Note: Starting in version 7.0, the DQL mapping and the mapping deployment mechanism using an
SBO are deprecated. They are only supported for backward compatibility.
The following diagram illustrates the steps necessary to implement EOS:
This section focuses on the last two steps: defining (and deploying) the EOS mapping and defining
a custom query.
Creating a mapping file
A mapping applies to all types. Multiple mappings can apply at the same time. The mapping loader
merges all the mappings.
If several mappings apply to the same attribute, they are incompatible and the system throws an
error at query time.
For the mapping files schema, see Extended object search schema, page 93.
In the mapping file, you define interfaces that the DFC query builder can instantiate. The following
example defines the main interface of the mapping as IDmDoc:
<interface name='IDmDoc'>
You add aliases to the interface that can be used in your queries. The alias can map to other interfaces
or to qualified Documentum attributes. Use the map-to attribute of the alias element for this mapping.
The map-to value is a path within the DFTXML representation of the input document, for example,
map-to="dmftcustom>mediaAnnotations>annotation>author". The DFTXML schema is documented
in the appendix of EMC Documentum xPlore Administration and Development Guide.
Add interface elements that map to attributes. Add subinterfaces and reference them recursively from
an alias in the main interface. The following example shows the main interface, IDmDoc, and an alias
to the subinterface IMgAnn. The aliases in the subinterface map to a path in the dmftcustom element of
the DFTXML representation of the main document. (A TBO injected this data.)


<interface name='IDmDoc'>
  <alias name='annotation' map-to='IMgAnn' cardinality="MANY"/>
</interface>

<interface name='IMgAnn'>
  <alias name="author" map-to="dmftcustom>mediaAnnotations>annotation>author" cardinality="MANY"/>
  <alias name="content" map-to="dmftcustom>mediaAnnotations>annotation" cardinality="MANY"/>
</interface>
Sample extended object mapping file
xPlore mapping (xploreMapping.xml)
<?xml version="1.0"?>
<doc:mapping xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:doc="http://www.documentum.com"
    xsi:schemaLocation="http://www.documentum.com ../../ressources/complex_objects_mapping.xsd">

  <interface name='IDmDoc'>
    <alias name='annotation' map-to='IMgAnn' cardinality="MANY"/>
  </interface>

  <interface name='IMgAnn'>
    <alias name="author" map-to="dmftcustom>mediaAnnotations>annotation>author" cardinality="MANY"/>
    <alias name="content" map-to="dmftcustom>mediaAnnotations>annotation" cardinality="MANY"/>
  </interface>
</doc:mapping>
Note: The map-to value is a path within the DFTXML representation of the input document.
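To see how a map-to value relates to the DFTXML structure, the following sketch converts the ">" separated value into a slash-separated path and evaluates it against a small DFTXML-like fragment using the JDK XPath API. The fragment is invented for illustration and shows only the dmftcustom branch. The class name MapToPathSketch and its methods are hypothetical and are not part of DFC or xPlore.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: resolves a map-to value against a sample DFTXML fragment.
class MapToPathSketch {

    // Converts "dmftcustom>mediaAnnotations>annotation>author" into
    // "dmftcustom/mediaAnnotations/annotation/author".
    static String toRelativePath(String mapTo) {
        return mapTo.trim().replace('>', '/');
    }

    // Counts the nodes under the dmftdoc root element that the map-to value selects.
    static int countMatches(String dftxml, String mapTo) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/dmftdoc/" + toRelativePath(mapTo),
                    new InputSource(new StringReader(dftxml)),
                    XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate map-to path", e);
        }
    }

    public static void main(String[] args) {
        // Invented sample: two annotations injected into dmftcustom.
        String dftxml = "<dmftdoc><dmftcustom><mediaAnnotations>"
                + "<annotation><author>alice</author></annotation>"
                + "<annotation><author>bob</author></annotation>"
                + "</mediaAnnotations></dmftcustom></dmftdoc>";
        String mapTo = "dmftcustom>mediaAnnotations>annotation>author";
        System.out.println(toRelativePath(mapTo));
        System.out.println(countMatches(dftxml, mapTo));
    }
}
```

Because one document can hold several annotation elements, a single alias can select several nodes, which is consistent with cardinality="MANY" in the mapping.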
Deploying EOS mappings in the repository
Deploy mappings in the repository to the folder: /System/Search/EOS/. The DFC Search Service scans
the folder and loads all the files in this folder as xPlore mappings.
If you modify a mapping file, the DFC Search Service dynamically reloads it. By default, the system
scans the mapping folder every minute and when a query is run. To modify this interval, set the
property dfc.search.eos.mappingcache.refresh_interval in the dfc.properties file.
1. Create an XML file to define the mapping.
2. Name the file. Although the DFC Search Service ignores the filename, we recommend prefixing
it with the namespace of the application that deploys it.
3. Import the file as a dm_document to /System/Search/EOS/. Files in subfolders are ignored.
4. Make sure that the ACL for this file allows read access to anyone. We recommend making the file
read-only.
Deploying EOS mappings in classpath
Use classpath deployment if you have different mappings on each Content Server repository. Instead
of deploying the mapping files in the repository, a registration file defines an alternate location in the
classpath. The following procedure does not describe the creation of the XML mapping file.
1. Create a property file named sco.properties.
2. Add it to your DFC classpath, for example, in the folder that contains dfc.properties.
3. Edit the file sco.properties to add properties such as:
complextype.xploremapping[0]=<filename>
complextype.xploremapping[1]=<filename2>
where <filename> can be an absolute filename, a relative filename (relative to the
application current folder), or a file in the classpath.
The DFC Search Service first looks in the file system. If it finds no matching file there, it looks in
the classpath. For example, with the following property:
complextype.xploremapping[0]=com/documentum/test/fc/client/search/TFileMappingLoader_sco.mapping.properties
the DFC Search Service looks in the classpath for a file named
TFileMappingLoader_sco.mapping.properties in the package com.documentum.test.fc.client.search.
Mappings deployed in the classpath are not reloaded dynamically. You must restart the application
to refresh the cache.
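The lookup order for classpath deployment, file system first and then classpath, can be sketched as follows. This is not the DFC implementation. The class name MappingLocatorSketch, the method locate, and the "file:" and "classpath:" prefixes in the return value are hypothetical and only show the order of the two checks.

```java
import java.io.File;
import java.net.URL;

// Hypothetical helper: mimics the documented lookup order for a mapping file name.
class MappingLocatorSketch {

    // Returns where the named mapping file was found, or null if it was found nowhere.
    static String locate(String name) {
        // Step 1: treat the name as an absolute or relative file system path.
        File file = new File(name);
        if (file.isFile()) {
            return "file:" + file.getAbsolutePath();
        }
        // Step 2: fall back to a classpath resource with the same name.
        URL resource = MappingLocatorSketch.class.getClassLoader().getResource(name);
        return resource != null ? "classpath:" + resource : null;
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "com/documentum/test/fc/client/search/TFileMappingLoader_sco.mapping.properties";
        System.out.println(locate(name));
    }
}
```

A name that exists in both places resolves to the file system copy, so a stray local file can shadow the mapping packaged in the classpath.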
Adding metadata from other tables or objects to the main document
The metadata that is referenced in an alias must be denormalized into the index for the main document
by a TBO or aspect. In this context, denormalization is the process of rendering normalized relational
data into a single XML structure within the DFTXML representation of the main document.
For the customization of injected metadata or joins, refer to EMC Documentum xPlore Administration
and Development Guide. The developer must subclass DfPersistentObject and override
customExportForIndexing to add custom nodes in the DFTXML.
Using extended object aliases in a DFC query
The aliases that you define in a mapping file can be used like any regular attribute in the DFC search
service. They can be used in constraints or as results attributes. In a DFS query, aliases for attributes
can be used in a PropertyExpression.
In the following example, the alias annotation/author is added as a result attribute and as a simple
attribute expression. The aliases are shown in bold in the mapping example.
IDfClient client = DfClient.getLocalClient();
m_searchService = client.newSearchService(m_sessionManager, docbase);
IDfQueryManager queryManager = m_searchService.newQueryMgr();
m_queryBuilder = queryManager.newQueryBuilder("dm_document");
m_queryBuilder.addSelectedSource(docbase);
// annotation/author is our alias, returned here as a result attribute
m_queryBuilder.addResultAttribute("annotation/author");
m_queryBuilder.addResultAttribute("r_object_id");
m_queryBuilder.addResultAttribute("object_name");
// exprSet is assumed to be the root expression set of the query builder
IDfExpressionSet exprSet = m_queryBuilder.getRootExpressionSet();
// annotation/author alias is used again, this time as a constraint
exprSet.addSimpleAttrExpression("annotation/author", IDfAttr.DM_STRING,
    IDfSimpleAttrExpression.SEARCH_OP_CONTAINS, false, true, "value1");
m_processor = m_searchService.newQueryProcessor(m_queryBuilder, false);
// run the search synchronously with the given timeout
m_processor.blockingSearch(600000);
The XQuery rendering of this query is the following:
let $libs := ('/MSSQL66ECI1/dsearch/Data')
let $results := for $dm_doc score $s in collection($libs)/dmftdoc[
  (dmftmetadata//a_is_hidden = "false") and (dmftversions/iscurrent = "true")
  and (dmftinternal/i_all_types = "03110a1b80000129")
  and (dmftcustom/mediaAnnotations/annotation/author ftcontains "value1"
  with stemming)] order by $s descending return $dm_doc
return (for $dm_doc in subsequence($results,1,351) return <r>
  {for $attr in $dm_doc/dmftcustom/mediaAnnotations/annotation/author
    return <alias name='f0_f1' type='dmstring'>{string($attr)}</alias>}
  {for $attr in $dm_doc/dmftmetadata//*[local-name()=('r_object_id')]
    return <attr name='{local-name($attr)}' type='{$attr/@dmfttype}'>
    {string($attr)}</attr>}{xhive:highlight((
    $dm_doc/dmftcontents/dmftcontent/dmftcontentref,$dm_doc/dmftcustom))}
  <attr name='score' type='dmdouble'>{string(dsearch:get-score($dm_doc))}
  </attr></r>)
Chapter 2
Configuring and Customizing DFC Search
• Configuring DFC search
• DFC query builder
• Transforming a query with a filter
• DFC database queries
• Hello World DFC search
• DFC customization examples
Configuring DFC search
The following options in dfc.properties configure search behavior in DFC and DFC clients such as
WDK and Webtop. This file is located in the Documentum home config directory as specified by the
dfc_data environment variable, for example, C:\Documentum\config or /tmp/Documentum/config.
This file includes settings to enable and configure FS2 for searching external (non-Documentum)
sources.
Optimizing query batch size
You can optimize query performance by setting a smaller batch size. The batch size is the number of
results returned at a time by xPlore. Set the batch size for an individual query, if you are constructing
the query in DFC. Set it for multiple queries in dfc.properties as the value of dfc.search.batch_hint_size.
dfc.search.batch_hint_size accepts any value, but larger values are unlikely to improve performance.
Configuring search in dfc.properties
Table 4. Search options in dfc.properties
Parameter | Default value | Description
dfc.search.docbase.broker_count | 20 | Number of broker threads supporting execution of the Documentum repository part of a query. One broker supports execution of the query for each repository selected for this query. Min value: 0, max value: 1000.
dfc.search.external_sources.broker_count | 30 | Number of broker threads supporting execution of the FS2 part of a query. One broker supports the execution of the query for all external sources selected for this query. Min value: 0, max value: 1000.
dfc.search.external_sources.enable | false | When set to true, DFC uses FS2 in addition to Content Server’s basic search facilities. For CenterStage Pro deployments: true.
dfc.search.external_sources.host | localhost | RMI registry host to connect to FS2 Server. For information on the RMI registry, refer to the EMC Documentum Federated Search Services Development Guide chapter on the application SDK.
dfc.search.external_sources.port | 3005 | RMI registry port to connect to FS2 Server. For information on the RMI registry, refer to the EMC Documentum Federated Search Services Development Guide chapter on the application SDK. Min value: 0, max value: 65535.
dfc.search.external_sources.username | guest | Default credentials to connect to FS2 server as guest.
dfc.search.external_sources.password | askonce | Default credentials to connect to FS2 server as guest.
dfc.search.external_sources.backup.host | localhost | RMI registry host to connect to the backup FS2 Server. The EMC Documentum Federated Search Services Development Guide chapter on the application SDK explains the RMI registry.
dfc.search.external_sources.backup.port | 3005 | RMI registry port to connect to the backup FS2 Server. The EMC Documentum Federated Search Services Development Guide chapter on the application SDK explains the RMI registry. Min value: 0, max value: 65535.
dfc.search.external_sources.retry.period | 300000 | Time in milliseconds before retrying to connect to the main FS2 server (after having switched to the backup FS2 server). Min value: 0, max value: 2147483647.
dfc.search.external_sources.adapter.domain | JSP | Subdomain containing the source available to DFC. By default, DFC uses the default domain of the standalone FS2 WEB client. For CenterStage Pro deployments: CenterStage.
dfc.search.external_sources.request_timeout | 180000 | Time in milliseconds to wait for an answer from the FS2 server. Min value: 0, max value: 10000000.
dfc.search.external_sources.rmi_name | xtrim.RmiApi | RMI registry symbolic name associated with FS2 API.
dfc.search.external_sources.ssl.enable | false | Enable encryption of results and content sent from the FS2 server to the DFC client.
dfc.search.external_sources.ssl.keystore | (none) | Keystore that holds the DFC client certificate and keys and the FS2 Server trusted certificate. This keystore is a file available locally on the machine where DFC resides.
dfc.search.external_sources.ssl.keystore_password | (none) | Password for the keystore file used for communication with the FS2 server.
dfc.search.fulltext.enable | true | Use the Content Server full-text engine (for example, xPlore). If you set this to false, DFC replaces DQL full-text clauses with LIKE clauses on the following attributes: object_name, title, subject.
dfc.search.matching_terms_computing.enable | true | If this property is enabled, the matching terms are not computed by the indexer but are computed locally by the DFC search service. This setting can enhance performance, but variants are not included. If the source is not indexed, this property is ignored because the matching terms are already computed by DFC.
dfc.search.max_results | 1000 | Maximum number of results to retrieve by a query search. Min value: 1, max value: 10000000.
dfc.search.max_results_per_source | 350 | Maximum number of results to retrieve per source by a query search.
dfc.search.sourcecache.refresh_interval | 1200000 | Time in milliseconds between refreshes of the search source map cache.
dfc.search.typecache.refresh_interval | 1200000 | Time in milliseconds between refreshes of the cache of type information.
dfc.search.formatcache.refresh_interval | 1200000 | Time in milliseconds between refreshes of the cache of formats.
dfc.search.eos.mappingcache.refresh_interval | 60000 | Time in milliseconds between refreshes of the cache of Extended Object Search (EOS) mapping information.
dfc.search.batch_hint_size | 0 | Controls both the client-to-server and server-to-database batching of query data for the search services only. If set, this property overrides the DFC_BATCH_HINT_SIZE property value for all queries generated by Search services. It can be used to tune performance according to the performance of the network links. It is a hint in the sense that there is no guarantee that the value will be honored; for example, if the number is too large it is rounded down. For client-to-server traffic, it controls the number of rows transported each time a new batch of rows is needed while processing a query collection. For server-to-database traffic, it affects the number of rows returned each time a database table is accessed. The default value is usually adequate. Sometimes a larger value can improve performance in a high-latency environment.
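The ranges in Table 4 can be checked before an application starts. The following is a minimal sketch that uses only java.util.Properties, not a DFC API; the class name SearchSettingsCheck and its methods are hypothetical. It reads dfc.properties-style text, falls back to the documented default when a key is absent, and rejects values outside the documented minimum and maximum.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper: checks search settings against the ranges in Table 4. */
final class SearchSettingsCheck {

    /** Parses dfc.properties-style text into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the value of key (or the default when absent) if it lies in [min, max]. */
    static int intSetting(Properties props, String key, int defaultValue, int min, int max) {
        String raw = props.getProperty(key);
        int value = (raw == null) ? defaultValue : Integer.parseInt(raw.trim());
        if (value < min || value > max) {
            throw new IllegalArgumentException(
                key + "=" + value + " is outside [" + min + ", " + max + "]");
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = parse(
            "dfc.search.external_sources.enable = true\n"
          + "dfc.search.external_sources.port = 3005\n"
          + "dfc.search.docbase.broker_count = 20\n");
        // Defaults and ranges below are the ones listed in Table 4.
        System.out.println(intSetting(props, "dfc.search.docbase.broker_count", 20, 0, 1000));
        System.out.println(intSetting(props, "dfc.search.external_sources.port", 3005, 0, 65535));
        System.out.println(intSetting(props, "dfc.search.max_results", 1000, 1, 10000000));
    }
}
```

A check of this kind only catches out-of-range values early; DFC itself remains the authority on how each setting is interpreted.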
Configuring federated search ranking
xPlore returns a ranking of search results. xPlore uses the relevancy scoring of the underlying Lucene
index. If DFC relevancy configuration has been customized, it can combine with or override the
xPlore score. If you search over more than one source, ranking is recalculated based on the custom
ranking algorithm. If you search only one source, like xPlore or an external source, the score returned
by the source is used.
You can configure the weighting of criteria used for ranking the relevancy of search results from
xPlore and other sources. (For xPlore, source=<repository_name>.) A weight is a numerical value that
increases or decreases the importance of a search source or set of sources. DFC combines scores for
sources to produce a relevancy ranking that displays the most relevant results first.
Weights for relevancy ranking are configured in a file named dfc-searchranking.xml, located in
the Documentum home /config directory, for example, C:\Documentum\config. In WDK-based
applications, the Documentum home directory is under the application server executable directory.
Add this file to the Documentum/config subdirectory of the binary directory, for example,
CATALINA_HOME/bin/Documentum/config. You can specify an alternate location as the value of a
Java system property named dfc.searchranking.file.
The following table describes the elements that configure relevancy ranking. All elements are
contained within the root element <SearchRanking>.
Table 5. Relevancy ranking configuration elements
Element | Description
<SourceBonus> | Specifies a specific source or set of sources for which to provide bonus ranking. Contains <AttributeQuery>, <FullTextQuery>, or both. The source attribute value is a source name or a regular expression that defines the source. The type attribute value can be used to restrict the source type to either repository or external.
<AttributeQuery> | Specifies a separate bonus for attributes. The source bonus is within [-1,1].
<FullTextQuery> | Specifies a separate bonus for full-text. The source bonus is within [-1,1].
<RankConfidence> | Decreases confidence ranking for a specific source or set of sources. The value is within [0,1]. The source attribute value is a source name or a regular expression that defines the source. The type attribute value can be used to restrict the source type to either repository or external.
<FullText> | Specifies a set of attributes to be added to the computation of the full-text factor. By default, as a partial representation of the full-text score for a specific document, the computation uses the concatenation of Dublin Core Metadata Elements. You can set one or more attributes to be used for the computation. Contains one or more <Attribute> elements.
<Attribute> | Specifies an attribute to be weighted with the full-text score. The value is an attribute or a regular expression that resolves one or more attributes.
<AttributeWeight> | Specifies the weight for a specific attribute value or values that match a regular expression. The weight of an attribute is a positive number, relative to the weights of the other attributes. By default, the title attribute weight is 2; all other attributes have a neutral weight of 1. A weight of 0 negates the effect of the attribute. The attribute attribute specifies the attribute or a regular expression that resolves one or more attributes. The value attribute is optional and specifies a value or a regular expression that resolves one or more values. The weight is 0 or greater.
<RatingWeight> | Specifies the relative weight of the score from specific source types compared to the relevancy ranking score (which is assigned a neutral weight of 1). With a weight of 0, the score from the specific source is not taken into account; with a weight of 100 or greater, the relevancy ranking score is ignored (not computed). The source attribute value is a source name or a regular expression that defines the source. The type attribute value can be used to restrict the source type to either repository or external. The rating weight is 0 or greater. The following example removes xPlore ranking: <RatingWeight source="my_repository">0</RatingWeight>
Note: Regular expression substitution is supported. For example, attribute=".*format.*"
resolves any attribute with the substring format in the name. The declaration
<Attribute>abstract.*|summary</Attribute> resolves any attribute starting with abstract, or the
summary attribute.
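The two expressions in the note can be tried with java.util.regex. The following sketch assumes that an expression must match the whole attribute name, which is consistent with the note but is not confirmed DFC behavior. The class name AttributePatternDemo and most of the attribute names in the list are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical demo: which attribute names does a ranking expression resolve? */
final class AttributePatternDemo {

    /** Returns the names that the expression matches in full (assumed semantics). */
    static List<String> resolve(String expression, List<String> attributeNames) {
        Pattern pattern = Pattern.compile(expression);
        List<String> matched = new ArrayList<>();
        for (String name : attributeNames) {
            if (pattern.matcher(name).matches()) {
                matched.add(name);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Illustrative attribute names, not a real type definition.
        List<String> names = Arrays.asList(
            "title", "dos_format_name", "format",
            "abstract", "abstract_fr", "summary", "summary_long");

        // Any name containing the substring "format".
        System.out.println(resolve(".*format.*", names));
        // Any name starting with "abstract", or exactly "summary".
        System.out.println(resolve("abstract.*|summary", names));
    }
}
```

Under the whole-name assumption, summary_long is not resolved by the second expression because only the abstract branch carries a trailing wildcard.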
The DTD for this file is in DFC, so you do not need to provide it in your environment:
<!ELEMENT SearchRanking (SourceBonus*, RankConfidence*, FullText?,
AttributeWeight*, RatingWeight*)>
<!ELEMENT SourceBonus (AttributeQuery?, FullTextQuery?)>
<!ATTLIST SourceBonus source CDATA #IMPLIED>
<!ATTLIST SourceBonus type (any | docbase | external) "any">
<!ELEMENT AttributeQuery (#PCDATA)>
<!ELEMENT FullTextQuery (#PCDATA)>
<!ELEMENT RankConfidence (#PCDATA)>
<!ATTLIST RankConfidence source CDATA #IMPLIED>
<!ATTLIST RankConfidence type (any | docbase | external) "any">
<!ELEMENT FullText (Attribute*)>
<!ELEMENT Attribute (#PCDATA)>
<!ELEMENT AttributeWeight (#PCDATA)>
<!ATTLIST AttributeWeight attribute CDATA #REQUIRED>
<!ATTLIST AttributeWeight value CDATA #IMPLIED>
<!ELEMENT RatingWeight (#PCDATA)>
<!ATTLIST RatingWeight source CDATA #IMPLIED>
<!ATTLIST RatingWeight type (any | docbase | external) "any">
Adding a bonus for a specific source
The unified ranking score takes into account only the result metadata. You can give a bonus for a
specific source when you know that the source returns relevant results. In the following sample, a 0.3
bonus is added to the score of all results returned by the source named "good_source".
<SourceBonus source="good_source">
<AttributeQuery>0.3</AttributeQuery>
<FullTextQuery>0.3</FullTextQuery>
</SourceBonus>
Emphasizing a specific attribute
You can modify the relative weight of an attribute in the score. By default, the title attribute weight is
2, while other attributes have a weight of 1, which is a neutral value. If the title attribute is not very
relevant, you can assign other attributes a higher weight in the global score. You can also decrease the
weight of the title attribute. The following example demonstrates how to accentuate the effect of the
subject attribute in the global score.
<SearchRanking>
<AttributeWeight attribute="subject">4</AttributeWeight>
</SearchRanking>
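Before deploying a ranking file, it can be useful to confirm that it is well formed and that the weights are usable. The following sketch parses a ranking document with the JDK XML parser and collects the AttributeWeight entries, rejecting negative weights as Table 5 requires. It is an illustration only: the class RankingConfigCheck is hypothetical, no DTD validation is performed, and DFC reads the file with its own loader.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical pre-deployment check for dfc-searchranking.xml content. */
final class RankingConfigCheck {

    /** Returns attribute expression -> weight for every AttributeWeight element. */
    static Map<String, Double> attributeWeights(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Double> weights = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("AttributeWeight");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                double weight = Double.parseDouble(element.getTextContent().trim());
                if (weight < 0) {
                    throw new IllegalArgumentException(
                        "AttributeWeight must be 0 or greater: " + weight);
                }
                weights.put(element.getAttribute("attribute"), weight);
            }
            return weights;
        } catch (IllegalArgumentException e) {
            throw e; // invalid weight or unparsable number
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse ranking configuration", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<SearchRanking>"
          + "<SourceBonus source=\"good_source\">"
          + "<AttributeQuery>0.3</AttributeQuery>"
          + "<FullTextQuery>0.3</FullTextQuery>"
          + "</SourceBonus>"
          + "<AttributeWeight attribute=\"subject\">4</AttributeWeight>"
          + "</SearchRanking>";
        System.out.println(attributeWeights(xml));
    }
}
```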
DFC query builder
For information on DFC interfaces for use with the xPlore server, refer to EMC Documentum xPlore
There are two ways to execute a query in DFC:
• Simple query using IDfQuery. See DFC database queries.
• Complex query using the DFC search service (query builder)
With IDfQueryBuilder, you can use DQL syntax to query one or more indexed or non-indexed Content
Servers. With Federated Search Services (FS2) product, you can query external sources and the client
desktop as well. IDfQueryBuilder provides a programmatic interface to change the query structure,
support external sources, support asynchronous operations, change display attributes, and perform
concurrent query execution in a federation.
IDfQueryBuilder allows you to build queries with the following information:
• Data to build the query
• Source list (required)
• Set max result count
• Get hit count (setHitCountRetrieved)
• Set the locale of the query (setLocale)
• Container of source names
• Transient search metadata manager bound to the query
• Transient query validation flag
• Attributes to order the results by: addOrderByAttribute()
• Add a facet definition
IDfQueryManager is an object-oriented interface to build a query. This interface does not manipulate a
String representation. It is internally responsible for translating the query to different languages and
language levels: DQL, FTDQL, FS2 Query Language. In DFC or WDK-based search components,
use IDfQueryBuilder to access and manipulate queries.
Pools of query brokers queue and execute synchronous and asynchronous queries. There is one queue
for repositories and one queue for external sources. Each broker is a thread running in DFC that
executes a query on a single source. For example, a broker can execute an IDfQuery on a repository.
Brokers for external sources connect to FS2 brokers for repositories. In the following example, 30
brokers are configured for external sources and 20 for repositories in dfc.properties:
dfc.search.external_sources.broker_count=30
dfc.search.docbase.broker_count=20
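The broker model resembles a fixed-size thread pool whose results are consumed in completion order. The following sketch is an analogy built on java.util.concurrent, not DFC code: the class BrokerPoolSketch is hypothetical, and each "query" is a stand-in task that returns a string. Note that the Java pool requires at least one thread, whereas Table 4 permits a broker count of 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical analogy for a pool of query brokers. */
final class BrokerPoolSketch {

    /** Runs one task per source on a fixed pool and collects results as they complete. */
    static List<String> searchAll(List<String> sources, int brokerCount) {
        ExecutorService brokers = Executors.newFixedThreadPool(brokerCount);
        try {
            CompletionService<String> completed = new ExecutorCompletionService<>(brokers);
            for (String source : sources) {
                // Stand-in for executing a query against a single source.
                completed.submit(() -> "results from " + source);
            }
            List<String> results = new ArrayList<>();
            for (int i = 0; i < sources.size(); i++) {
                // take() blocks until the next source finishes, whichever it is.
                results.add(completed.take().get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A source query failed", e);
        } finally {
            brokers.shutdown();
        }
    }

    public static void main(String[] args) {
        // The order of the printed results depends on which task finishes first.
        System.out.println(searchAll(Arrays.asList("repo1", "ext1", "ext2"), 2));
    }
}
```

With more sources than brokers, the extra tasks wait in the pool queue, which mirrors the queueing behavior described above.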
Results and events such as progress or errors are returned as soon as they arrive. The following
illustration diagrams this asynchronous process:
IDfSourceMap maps the available repositories and external sources and their capabilities. Before
sending a query to a source, you can check the source capabilities. For example, you can verify
whether facets are supported, if FTDQL is supported, or if wildcards are supported. Refer to the
javadocs of the interface IDfSearchSource for DFC or RepositoryProperty for DFS for more details
about source capabilities. Querying external sources requires Federated Search Services.
IDfSearchMetadataMgr determines for the query builder what metadata is available from the selected
sources, such as available object types and data dictionary information about the types. The FS2 server
store and administration tools manage external sources. The Search Metadata Manager communicates
with the FS2 server to assemble a list of available sources. The search metadata manager has methods
to get types and their attributes from each source.
With FS2, if the FS2 configuration file defines external custom types, they can be searched. An
external type is defined as a value of client.dfc.types. Additionally, dm_sysobject and dm_document
types are queried in external sources, but not all attributes of these types are available in the external
sources. For multi-repository searches, the first repository in a client search list is used as the metadata
model server. This model server is used to retrieve all data dictionary information.
Transforming a query with a filter
A search filter is a Java class or SBO that transforms a query before it is submitted or transforms the
results. For example, you can:
• Transform a query before it is sent for processing (DQL, XQuery, or FS2).
– Add new attributes that can be transformed to internal attributes.
– Direct which xPlore collection to query, for more efficient queries.
– Remove attributes that the target does not support.
– Add logging information for each query.
• Transform the query results before they are returned to the user.
– Add computed attributes to the results.
– Filter out results.
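The two stages listed above (transform the query, then transform the results) can be pictured as a small pipeline. The following sketch uses only java.util.function types and plain strings; none of the names are DFC interfaces, and the query syntax is invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for a search pipeline with query and result filters. */
final class FilterPipelineSketch {

    /** Applies each query filter in turn, runs the query, then keeps accepted results. */
    static List<String> search(String query,
                               List<UnaryOperator<String>> queryFilters,
                               Function<String, List<String>> engine,
                               Predicate<String> resultFilter) {
        String transformed = query;
        for (UnaryOperator<String> filter : queryFilters) {
            transformed = filter.apply(transformed);
        }
        List<String> kept = new ArrayList<>();
        for (String result : engine.apply(transformed)) {
            if (resultFilter.test(result)) {
                kept.add(result);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Query filter: remove an attribute that the imaginary target does not support.
        UnaryOperator<String> dropUnsupported = q -> q.replace(" AND rating>3", "");
        // Fake engine: echoes the query it received and adds one hidden entry.
        Function<String, List<String>> engine =
            q -> Arrays.asList("hit for [" + q + "]", "hidden hit");
        // Result filter: drop hidden entries before they reach the user.
        Predicate<String> visibleOnly = r -> !r.startsWith("hidden");

        System.out.println(search("title=report AND rating>3",
            Arrays.asList(dropUnsupported), engine, visibleOnly));
    }
}
```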
Implementing a filter
A search service filter implements one or more of the following interfaces in the
com.documentum.fc.client.search.filter package:
• IDfQueryFilter
• IDfFacetFilter
• IDfResultFilter
• IDfCompletionFilter
A filter can modify the data structure (query, results, or facets) and context parameters. It can send an
event that is retrievable by an IDfQueryStatus object.
The filter accesses the execution context through the IContext interface. This interface contains
runtime information: Session, application-specific properties, and backend information such as
whether the target is a repository or which index server is supported.
Deploying a filter
Choose one of the following to deploy a search service filter:
• Create a searchfilter.properties file in the application classpath. The class must also be in the
classpath. The file has the following form:
filterclass[0]=com.emc.documentum.filters.MyFilter
filterclass[1]=com.emc.documentum.filters.MyOtherFilter
• Package the filter class as an SBO. At runtime, DFC loads the filter class. This method is
recommended for a multi-repository environment.
Multiple filters are supported, but the order in which they are loaded is not configurable. You have
some control over filter order by implementing the interface IFilterOrderDependency.
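To see which filter classes a searchfilter.properties file declares, the entries can be read with java.util.Properties. The following sketch is a diagnostic aid, not the DFC loader; the class FilterListReader is hypothetical. It sorts entries by index for display only, because the order in which DFC loads filters is not configurable.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical reader for filterclass[n] entries in searchfilter.properties. */
final class FilterListReader {
    private static final Pattern KEY = Pattern.compile("filterclass\\[(\\d+)\\]");

    /** Returns index -> class name for every filterclass[n] entry in the text. */
    static Map<Integer, String> filterClasses(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<Integer, String> byIndex = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            Matcher matcher = KEY.matcher(key);
            if (matcher.matches()) {
                byIndex.put(Integer.valueOf(matcher.group(1)), props.getProperty(key).trim());
            }
        }
        return byIndex;
    }

    public static void main(String[] args) {
        String text =
            "filterclass[0]=com.emc.documentum.filters.MyFilter\n"
          + "filterclass[1]=com.emc.documentum.filters.MyOtherFilter\n";
        filterClasses(text).forEach((index, name) -> System.out.println(index + " -> " + name));
    }
}
```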
Sample filter class
This example shows how to set the collection based on the object type set in the query. The filter
caches the type-to-collection mapping in a static field, which is lazily populated the first time a
query is executed.
package com.documentum.test.fc.client.search.utils;

import com.documentum.fc.client.search.filter.IDfQueryFilter;
import com.documentum.fc.client.search.filter.IDfContext;
import com.documentum.fc.client.search.IDfQueryDefinition;
import com.documentum.fc.client.search.IDfQueryBuilder;
import com.documentum.fc.client.search.IDfSearchSourceMap;
import com.documentum.fc.client.search.IDfSearchSource;
import com.documentum.fc.client.*;
import com.documentum.fc.common.DfException;
import java.util.Map;
import java.util.Collections;
import java.util.HashMap;

public class CollectionFilter implements IDfQueryFilter
{
    public IDfQueryDefinition filterQuery (
        IDfContext context, IDfQueryDefinition query) throws DfException
    {
        if (query.isQueryBuilder())
        {
            IDfQueryBuilder builder = (IDfQueryBuilder) query;
            IDfSearchSourceMap sourcesMap = query.getMetadataMgr().getSourceMap();
            Iterable<String> sources = context.getSources();
            for (String source : sources)
            {
                IDfSearchSource sourceDef = sourcesMap.getSource(source);
                // Only repositories have xPlore collections to scope to
                if (sourceDef.getType() == IDfSearchSource.SRC_TYPE_DOCBASE)
                {
                    String collection = getCollection(context, source, builder);
                    if ((collection != null) && (collection.length()>0))
                    {
                        builder.addPartitionScope(source, collection);
                    }
                }
            }
        }
        return query;
    }

    private String getCollection (
        IDfContext context, String source, IDfQueryBuilder builder)
        throws DfException
    {
        // Remember the requested type: typeName is reassigned while walking
        // up the type hierarchy.
        String requestedType = builder.getObjectType();
        String typeName = requestedType;
        Map<String, String> collectionMapping = getCollectionMapping(context);
        String collection = collectionMapping.get(typeName);
        if (collection == null)
        {
            IDfSessionManager sessionManager = context.getSessionManager();
            IDfSession session = sessionManager.getSession(source);
            try
            {
                while ((collection == null) && ((typeName != null) && (
                    typeName.length()>0)))
                {
                    IDfType dfType = session.getType(typeName);
                    typeName = dfType.getSuperName();
                    if ((typeName != null) && (typeName.length()>0))
                    {
                        collection = collectionMapping.get(typeName);
                    }
                }
            }
            finally
            {
                sessionManager.release(session);
            }
            // Cache the outcome under the requested type. An empty string
            // records that no collection is mapped for it.
            collectionMapping.put(
                requestedType, (collection != null) ? collection : "");
        }
        return collection;
    }

    private static synchronized Map<String, String> getCollectionMapping (
        IDfContext context)
    {
        if (m_collectionToTypeMapping == null)
        {
            m_collectionToTypeMapping = Collections.synchronizedMap(
                new HashMap<String, String>());
            // TODO: load the collection mapping from the classpath or a
            // file in the repository. Here we hardcode the mapping
            m_collectionToTypeMapping.put("dm_folder", "collection1");
            m_collectionToTypeMapping.put("dm_document", "collection2");
        }
        return m_collectionToTypeMapping;
    }

    private static Map<String, String> m_collectionToTypeMapping = null;
}
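The lookup that the sample filter performs (find the collection mapped to a type or to its nearest mapped supertype, then cache the answer) can be exercised without a repository. In the following sketch, a plain map stands in for the type hierarchy that the filter obtains from the session; the class CollectionLookupSketch and the type my_report are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, repository-free version of the type-to-collection lookup. */
final class CollectionLookupSketch {
    private final Map<String, String> superTypes;              // type -> supertype
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CollectionLookupSketch(Map<String, String> superTypes, Map<String, String> configured) {
        this.superTypes = new HashMap<>(superTypes);
        this.cache.putAll(configured);
    }

    /** Returns the collection for the type or its nearest mapped supertype, or "" if none. */
    String collectionFor(String typeName) {
        if (typeName == null || typeName.isEmpty()) {
            return "";
        }
        String current = typeName;
        String found = cache.get(current);
        while (found == null) {
            current = superTypes.get(current);
            if (current == null || current.isEmpty()) {
                break; // reached the top of the hierarchy without a mapping
            }
            found = cache.get(current);
        }
        String result = (found == null) ? "" : found;
        cache.put(typeName, result); // cache under the type that was asked for
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> superTypes = new HashMap<>();
        superTypes.put("my_report", "dm_document");   // my_report is a made-up subtype
        superTypes.put("dm_document", "dm_sysobject");

        Map<String, String> configured = new HashMap<>();
        configured.put("dm_folder", "collection1");
        configured.put("dm_document", "collection2");

        CollectionLookupSketch lookup = new CollectionLookupSketch(superTypes, configured);
        System.out.println("[" + lookup.collectionFor("my_report") + "]");
        System.out.println("[" + lookup.collectionFor("dm_sysobject") + "]");
    }
}
```

Caching under the requested type means a second lookup for the same subtype returns immediately, without walking the hierarchy again.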
DFC database queries
You can use the IDfQuery interface, which is not part of the DFC search service, for database queries.
Refer to the Javadocs for the com.documentum.fc.client package for a description of how
to use this capability.
The following example from the WDK GroupAttributes class executes a simple query and gets the
results as an IDfCollection:
StringBuffer query = new StringBuffer(512);
query.append(
    "SELECT group_name FROM dm_group where ANY i_all_users_names = '");
query.append(loginUserName);
query.append("'");
IDfQuery queryObject = DfcUtils.getClientX().getQuery();
queryObject.setDQL(query.toString());
IDfCollection collection = queryObject.execute(
    getDfSession(), IDfQuery.DF_READ_QUERY);
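Because the example concatenates a user name into the DQL string, a name that contains a single quote would break the statement. The following sketch shows one way to quote a value first. It assumes that DQL string literals escape an embedded single quote by doubling it; verify this against the DQL reference for your Content Server version. The class DqlLiteral is hypothetical.

```java
/** Hypothetical helper for building DQL string literals. */
final class DqlLiteral {

    /** Wraps the value in single quotes and doubles any embedded single quote. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        String dql = "SELECT group_name FROM dm_group WHERE ANY i_all_users_names = "
            + quote("O'Brien");
        System.out.println(dql);
    }
}
```

The resulting string can be passed to setDQL in place of the hand-built query shown above.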
Hello World DFC search
You can create DFC search applications based on servlets and JSP pages and the DFC Search Service.
For information on the DFC query builder service, see DFC query builder and the Javadocs
for the package com.documentum.fc.client.search.
The following example takes a search input string and searches all available sources known to the
search service:
/**
 * Searches external sources for the search string and stores the results
 * in a HashMap.
 */
private void saveECISearchResults()
{
    System.out.println("ECISearch Method :SaveECISearchResults: Start");
    IDfQueryManager queryMgr = null;
    IDfQueryProcessor idfQueryProcessor = null;
    IDfResultObjectManager idfResultObjMgr = null;
    ArrayList arrExternalSources = new ArrayList(20);
    mMap = new HashMap();
    try
    {
        IDfClient client = m_clientX.getLocalClient();
        /*
         * sessionManager - A session manager to be used for authentication
         *   against search sources
         * defaultMetadataDocbase - The default repository from which to pick
         *   type metadata. Can be safely set to null if the search service is
         *   configured to search only repositories and not on external sources.
         *   Must not be null if external sources are configured in the search
         *   service. The session manager must have login info for the repository
         */
        IDfSearchService searchService = client.newSearchService(
            m_sessionManager, m_docbaseName);
        queryMgr = searchService.newQueryMgr();
        IDfQueryBuilder queryBuilder = queryMgr.newQueryBuilder("dm_sysobject");
        IDfSearchMetadataManager IDfSearchMetadataManager =
            queryBuilder.getMetadataMgr();
        //Getting the source map
        IDfSearchSourceMap searchSourceMap = searchService.getSourceMap();
        //Getting list of available external sources
        IDfEnumeration enumSearchSource = searchSourceMap.getAvailableSources(
            IDfSearchSource.SRC_TYPE_EXTERNAL);
        while (enumSearchSource.hasMoreElements())
        {
            IDfSearchSource idfsource =
                (IDfSearchSource) enumSearchSource.nextElement();
            String[] strExternalSource = new String[2];
            strExternalSource[0] = idfsource.getName();
            System.out.println("External Sources(0):" + strExternalSource[0]);
            arrExternalSources.add(strExternalSource);
            //add source to SearchMetadatamanager
            IDfSearchMetadataManager.addSelectedSource(strExternalSource[0]);
            //add the source to the query builder
            queryBuilder.addSelectedSource(strExternalSource[0]);
        }
        IDfExpressionSet rootExp = queryBuilder.getRootExpressionSet();
        //Creating the search query
        rootExp.addSimpleAttrExpression("object_name", IDfValue.DF_STRING,
            IDfSimpleAttrExpression.SEARCH_OP_CONTAINS, false, false,
            m_searchString);
        queryBuilder.addResultAttribute("object_name");
        idfQueryProcessor = searchService.newQueryProcessor(queryBuilder, true);
        idfResultObjMgr = searchService.newResultObjectManager(queryBuilder);
        idfQueryProcessor.addListener(this);
        idfQueryProcessor.search();
        System.out.println("ECISearch Method: Query Failed : "
            + idfQueryProcessor.getQueryStatus().getNbrFailed());
        Thread.sleep(m_sleepTime);
        System.out.println("ECISearch Method: Query Status : "
            + idfQueryProcessor.getQueryStatus().getStatus());
        IDfResultsSet rs = idfQueryProcessor.getResults();
        System.out.println(rs.size() + " result(s)\n");
        while (rs.next())
        {
            IDfResultEntry result = rs.getResult();
            // Filter the results based on the score attribute.
            // Compare string content with equals(), never with ==.
            if ("1.0".equals(result.getString("score")))
            {
                String objectName = result.getString("object_name");
                mMap.put(objectName, result);
                System.out.println(result);
            }
        }
        addExternalFilesToFolder(mMap, idfResultObjMgr);
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
}
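The example filters on the score by comparing strings, so a value such as "1" or "1.00" would not be recognized as a top score. A numeric comparison is more tolerant. The following helper is a sketch only: the class ScoreCheck is hypothetical, and it assumes the score arrives as decimal text, as in the example above.

```java
/** Hypothetical helper: decides whether a textual score represents 1.0. */
final class ScoreCheck {

    /** Returns true when the text parses to a number equal to 1.0 within a small tolerance. */
    static boolean isTopScore(String rawScore) {
        if (rawScore == null) {
            return false;
        }
        try {
            return Math.abs(Double.parseDouble(rawScore.trim()) - 1.0) < 1e-9;
        } catch (NumberFormatException e) {
            return false; // not a number, so not a top score
        }
    }

    public static void main(String[] args) {
        System.out.println(isTopScore("1.0"));
        System.out.println(isTopScore("1"));
        System.out.println(isTopScore("0.87"));
    }
}
```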
Displaying the FS2 targets at design time:
{
    IDfSearchSource idfsource = (IDfSearchSource) enumSearchSource.nextElement();
}
Setting the FS2 target at query execution time:
{
    IDfSearchSource idfsource = (IDfSearchSource) enumSearchSource.nextElement();
    String sourceName = idfsource.getName();
    // Custom test: skip sources that the user did not select at design time.
    // designTimeSelection is a hypothetical collection of selected source names.
    if (!designTimeSelection.contains(sourceName)) continue;
    //add source to SearchMetadatamanager
    IDfSearchMetadataManager.addSelectedSource(sourceName);
    //add the source to the query builder
    queryBuilder.addSelectedSource(sourceName);
}
DFC customization examples
The following examples illustrate the most common scenarios when using the DFC search service.
The first scenario is a simple search on one repository. The next example searches an external source
(relying on the Federated Search Server) that requires authentication. The third example creates an
asynchronous search.
The source files for these examples can be found on the EMC Developer Network web site. Go to
Content Management > Sample code > DFC > DFC Search API Samples and download the
corresponding file: DFCSearchAPISamples.zip.
Simple search of one repository
In the following example, a login servlet (LoginServlet class) and login.jsp page handle user login.
(The login class servlet is not shown in the following code.) The SearchServlet class handles query
building and execution. The JSP pages search.jsp and results.jsp display a search form and results. The
following illustration shows the UI that is displayed in search.jsp.
The following illustration shows the directory structure for this simple application.
The SearchServlet class gets the query builder instance to create a search. The variables from the
search JSP page are saved for the QueryBuilder ("ft" for full-text, and "object_name"):
String fulltextValue = httpServletRequest.getParameter("ft");
String objectNameValue = httpServletRequest.getParameter("object_name");
String docbase= httpServletRequest.getParameter("docbase");
IDfSearchService searchService = client.newSearchService(sMgr, docbase);
IDfQueryManager queryManager = searchService.newQueryMgr();
IDfQueryBuilder queryBuilder = queryManager.newQueryBuilder("dm_document");
IDfSearchService (com.documentum.fc.client.search) is the entry point to search-related services:
query building, query execution, results manipulation, available sources, and query metadata.
The following lines in the search servlet set the result attributes to be displayed. The servlet then adds
the source repository, which can either be added to the UI or set in the servlet class. Next, the servlet
builds an expression set. The method addFullTextExpression adds the string from the search form.
The method addSimpleAttrExpression adds the object name and operator from the form:
queryBuilder.addResultAttribute("object_name");
queryBuilder.addResultAttribute("summary");
queryBuilder.addResultAttribute("score");
queryBuilder.addSelectedSource(docbase);
IDfExpressionSet rootExpressionSet = queryBuilder.getRootExpressionSet();
if (fulltextValue != null)
    rootExpressionSet.addFullTextExpression(fulltextValue);
if (objectNameValue != null)
    rootExpressionSet.addSimpleAttrExpression(
        "object_name", IDfValue.DF_STRING,
        IDfSimpleAttrExpression.SEARCH_OP_CONTAINS, false, false,
        objectNameValue);
The following lines execute the query synchronously by using the synchronous call blockingSearch
with a timeout of 60 seconds. The query processor handles the query execution. When the query has
finished, the control is forwarded to the JSP page to build the results page.
IDfQueryProcessor queryProcessor = searchService.newQueryProcessor(
queryBuilder, true);
queryProcessor.blockingSearch(60000);
The following code generates the results JSP page. The interface IDfResultEntry is like
IDfTypedObject but is not modifiable.
<%IDfResultsSet results = queryProcessor.getResults();
for (int index = 0; index < results.size(); index++)
{
IDfResultEntry result = results.getResultAt(index);
%>
<table border="0" cellpadding="0" cellspacing="0" width="100%"
style="margin-bottom: 8px">
<tr><td width="5"/><td width="1"> </td>
<td>
<div class="result-title"><%=result.getString("object_name")%>
</div>
<div class="result-score">
<%= result.getSource() %> - <i>
<%=(int)(result.getScore() * 100)%>%</i>
</div>
<br/><font size="-1"><%=result.getString("summary")%></font>
</td>
</tr><tr height="1"></tr>
</table>
<% } %>
Search an external source that requires authentication
This example extends the first one and illustrates how to create a search on an external source. The
Federated Search Server handles communication with the external source. The following configuration
in the dfc.properties file is required:
dfc.search.external_sources.enable = true
dfc.search.external_sources.host = <host_name>
The external source can be another repository, an eRoom, or a web site. Refer to Federated Search
Services (FS2) documentation for details about out-of-the-box adapters and adapter development.
The query building and query execution are similar for one or for several sources. When you query
external sources, you must do three tasks:
• Get the list of available sources.
• Add the sources to the query.
• Register the authentication information (the credentials) with the SessionManager.
The following example illustrates these tasks.
IDfSearchSourceMap sourceMap = searchService.getSourceMap();
// Get the list of available sources
IDfEnumeration sources = sourceMap.getAvailableSources();
while (sources.hasMoreElements()) {
    IDfSearchSource source = (IDfSearchSource) sources.nextElement();
    String sourceName = source.getName();
    // Add source in query builder
    queryBuilder.addSelectedSource(sourceName);
    // That would come from the custom application
    String loginName = getLoginName(sourceName);
    String loginPassword = getLoginPassword(sourceName);
    // If need be, check login capability
    // source.hasCapability(IDfSearchSource.CAP_LOGIN)
    // Set the credentials for the user
    IDfLoginInfo loginInfoObj = clientx.getLoginInfo();
    loginInfoObj.setUser(loginName);
    loginInfoObj.setPassword(loginPassword);
    // Add credentials for the source in Session manager
    sessionManager.setIdentity(sourceName, loginInfoObj);
}
The instance of IDfSearchSourceMap is a map of all available search sources, including external
sources from FS2. It is like IDfDocbaseMap which provides information about the repositories known
to a connection broker.
The same interface, IDfSessionManager, holds the credentials for the current repository, for any
other Documentum repository, and for external sources.
Asynchronous search example
Asynchronous search is also called a "non-blocking" search because it allows you to display results as
they come in. You do not have to wait for the complete result set. You can also display and update the
status of the query in real time (such as "done", "in progress", or "failed"). Several calls are made to
populate the results, each time retrieving the next results. This approach is useful when retrieving
large result sets or when querying sources with different response times.
This example differs from the first example in how the query is executed. Instead of calling
blockingSearch() with a timeout, it calls the search() method and provides a listener that extends
DfGenericQueryListener. The query runs in the background, and the query listener is notified of new
results and execution events. The notification methods are the following:
• onQueryCompleted(): Query execution finished (successfully or with errors).
• onResultChange(): New results have been received from the sources.
• onStatusChange(): An event has occurred. It can be related to the query execution status or to
possible errors.
IDfQueryProcessor queryProcessor = searchService.newQueryProcessor(
queryBuilder, true);
// Add the notification interface
QueryListener queryListener = new QueryListener(queryProcessor);
queryProcessor.addListener(queryListener);
// Call the asynchronous search method
queryProcessor.search();
After you launch the search, use IDfQueryStatus to obtain information about the status of the query
and the sources. Use IDfSourceStatus to obtain status information for a specific source.
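The QueryListener class itself is not listed in this guide. The following sketch illustrates only the notification pattern, in plain Java rather than DFC. The interface SearchListenerSketch and the classes CollectingListener and BackgroundSearchSketch are hypothetical names invented for this illustration, and the method signatures are simplified assumptions that borrow only the names of the three notification methods listed above. A real listener would extend DfGenericQueryListener instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the three notification methods (not the DFC API). */
interface SearchListenerSketch {
    void onResultChange(List<String> newResults);
    void onStatusChange(String status);
    void onQueryCompleted();
}

/** Accumulates notifications so that a results page could render partial results. */
class CollectingListener implements SearchListenerSketch {
    final List<String> results = new ArrayList<>();
    final List<String> statuses = new ArrayList<>();
    volatile boolean completed = false;

    public synchronized void onResultChange(List<String> newResults) {
        results.addAll(newResults);
    }

    public synchronized void onStatusChange(String status) {
        statuses.add(status);
    }

    public void onQueryCompleted() {
        completed = true;
    }
}

/** Simulates a search that delivers result batches from a background thread. */
class BackgroundSearchSketch {
    static Thread search(List<List<String>> batches, SearchListenerSketch listener) {
        Thread worker = new Thread(() -> {
            listener.onStatusChange("in progress");
            for (List<String> batch : batches) {
                listener.onResultChange(batch);   // partial results arrive here
            }
            listener.onStatusChange("done");
            listener.onQueryCompleted();
        });
        worker.start();   // returns immediately; the caller is not blocked
        return worker;
    }

    /** Waits for the background thread; used here only to make the demo deterministic. */
    static void await(Thread worker) {
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In an application, the code that renders the page would read the listener's accumulated results whenever it refreshes, instead of waiting for the worker to finish.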
Using the visitor API
You can use the visitor API in DFC to visit nodes in the expression tree. The following example
creates a QueryDumper class that visits the expressions in the query.
import com.documentum.fc.client.search.DfExpressionVisitor;

class QueryDumper extends DfExpressionVisitor
{
    private StringBuffer m_expressionDump = new StringBuffer();

    public String dump()
    {
        return m_expressionDump.toString();
    }

    public final void visit(IDfExpressionSet expr) throws DfException
    {
        switch (expr.getLogicalOperator())
        {
            case IDfExpressionSet.LOGICAL_OP_AND:
                m_expressionDump.append("(and ");
                break;
            case IDfExpressionSet.LOGICAL_OP_OR:
                m_expressionDump.append("(or ");
                break;
        }
        super.visit(expr);
        m_expressionDump.append(")");
    }
    public void visit(IDfValueListAttrExpression expr) throws DfException
    {
        super.visit(expr);
        dumpAttrAndOperator(expr);
        IDfEnumeration values = expr.getValues();
        while (values.hasMoreElements())
        {
            String value = (String) values.nextElement();
            m_expressionDump.append(" ").append(value);
        }
        m_expressionDump.append("]");
    }

    public void visit(IDfFullTextExpression expr) throws DfException
    {
        super.visit(expr);
        m_expressionDump.append("[ft ").append(expr.getValue()).append("]");
    }

    public void visit(IDfSimpleAttrExpression expr) throws DfException
    {
        super.visit(expr);
        // Write "[attribute operator(type)" so that the closing "]" below is balanced
        dumpAttrAndOperator(expr);
        if ((expr.getSearchOperationCode() != IDfSimpleAttrExpression.SEARCH_OP_IS_NULL)
            && (expr.getSearchOperationCode() != IDfSimpleAttrExpression.SEARCH_OP_IS_NOT_NULL))
        {
            // IS NULL and IS NOT NULL carry no value to print
            m_expressionDump.append(" ").append(expr.getValue());
        }
        m_expressionDump.append("]");
    }

    public void visit(IDfRelativeDateExpression expr) throws DfException
    {
        super.visit(expr);
        dumpAttrAndOperator(expr);
        String timeUnitAsAString = ReflectionUtil.getConstantName(
            Calendar.class, expr.getTimeUnit());
        m_expressionDump.append(" ").append(expr.getRelativeTime()).append(
            " ").append(timeUnitAsAString).append("]");
    }

    public void visit(IDfValueRangeAttrExpression expr) throws DfException
    {
        super.visit(expr);
        dumpAttrAndOperator(expr);
        m_expressionDump.append(" ").append(expr.getFromValue()).append(" ").append(
            expr.getToValue()).append("]");
    }

    // s_operationMap (operator code to display name) and ReflectionUtil are
    // helpers that this excerpt does not define.
    private void dumpAttrAndOperator(IDfAttrExpression expr)
    {
        m_expressionDump.append("[").append(expr.getAttrName()).append(" ");
        String searchOpAsAString = s_operationMap.get(expr.getSearchOperationCode());
        String valueDataTypeAsAString = ReflectionUtil.getConstantName(
            IDfValue.class, expr.getValueDataType());
        m_expressionDump.append(searchOpAsAString).append("(").append(
            valueDataTypeAsAString).append(")");
    }
}
You can use the expression visitor in a class that accesses the query builder, such as a customized
Webtop search class. The following example gets the query expression set:
QueryDumper queryDumper = new QueryDumper();
IDfExpressionSet rootExpr = queryBuilder.getRootExpressionSet();
rootExpr.acceptVisitor(queryDumper);
System.out.println("query =" + queryDumper.dump());
Chapter 3
Customizing Search with DFS
• DFS Search Services
• Full-text and database searches
• Constructing a search
• Search service objects
• Search service operations
DFS Search Services
Search Services provides search capabilities against EMC Documentum repositories, as well as
against external sources, using Documentum Federated Search Services (FS2) server. The Search
service provides full-text and structured search capabilities against multiple EMC Documentum
repositories (termed managed repositories in DFS). You must install and configure full-text indexing
on Documentum repositories.
All DFC customizations can be used in DFS client applications. For DFC filters, see Transforming a
query with a filter, page 25. See the EMC Community Network Documentum search and analytics
forum to post your questions and see solutions offered by other customers and EMC employees.
External sources (termed external repositories) can also be searched. You must install FS2 adapters
on external repositories (registered with an FS2 server) and deploy the Clustering SBO if the Content
Server version is earlier than 6.7.
To use the Search service it is also helpful to understand FTDQL queries, dfc.properties settings,
and DQL hint file settings.
Full-text and database searches
Search service queries can be run as full-text queries, database queries against a managed or external
repository, or mixed queries (both full-text index and database).
The search query is a full-text or database search depending on the following factors:
• The availability to the service of indexed repositories.
• Settings in the DQL hints file, if present.
• The presence or absence of full-text expressions (a SEARCH DOCUMENT CONTAINS clause)
in a DQL query.
• Explicit setting of setDatabaseSearch in a StructuredQuery.
Searches against a full-text index are case insensitive. Database searches are by default case sensitive.
If a database query includes a SEARCH DOCUMENT CONTAINS clause in PassthroughQuery or a
FullTextExpression object in a StructuredQuery, the full-text expression is evaluated against the
title, subject, and object_name of dm_sysobjects. If the repository does not support full-text queries,
the query is not processed.
Constructing a search
Non-blocking (asynchronous) searches
Searches can either be blocking or non-blocking, depending on the Search Profile setting. By default,
searches are blocking. Non-blocking searches display results dynamically. The client application
does not have to wait for all results before displaying the first results. The Search service supports
non-blocking searches because:
• DFS relies on DFC, which supports asynchronous search execution;
• Query calls are non-blocking: multiple successive calls can be made to get new results and the
query status. The query status contains the status for each source repository: Successful, more
results expected, or failed with errors.
Caching mechanism
The Search service relies on a caching mechanism. The cache contains the search results, which are
populated in the background for every search. The cache key is built from the queryId, the query
definition, and the number of results requested. Together these elements are called the search context.
To leverage the cache, subsequent calls must use the same search context. If any element of the search
context is different, the search is re-executed.
The cache is used to make successive calls. This way, the first results can be displayed while
subsequent calls retrieve more results. If one source fails or takes too long to return results, the search
is not blocked and the first available results are returned.
When a query is not found in the cache (cache miss), the operation, which contains the query execution
parameters, re-executes the query.
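The effect of the search context on cache hits can be sketched in plain Java. This is a toy model under stated assumptions, not the DFS implementation: the classes SearchContextKey and SearchCacheSketch are hypothetical, and the cached "results" are placeholder strings. The sketch shows only that a lookup with an identical search context is served from the cache, while changing one element causes the query to run again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical model of the search context that forms the cache key. */
final class SearchContextKey {
    private final String queryId;
    private final String queryDefinition;
    private final int resultsRequested;

    SearchContextKey(String queryId, String queryDefinition, int resultsRequested) {
        this.queryId = queryId;
        this.queryDefinition = queryDefinition;
        this.resultsRequested = resultsRequested;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof SearchContextKey)) {
            return false;
        }
        SearchContextKey that = (SearchContextKey) other;
        return queryId.equals(that.queryId)
            && queryDefinition.equals(that.queryDefinition)
            && resultsRequested == that.resultsRequested;
    }

    @Override
    public int hashCode() {
        return Objects.hash(queryId, queryDefinition, resultsRequested);
    }
}

/** Toy cache: the query is "executed" only on a cache miss. */
class SearchCacheSketch {
    private final Map<SearchContextKey, String> cache = new HashMap<>();
    int executions = 0;   // counts how often the query really ran

    String lookup(SearchContextKey key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;                        // cache hit: same search context
        }
        executions++;                             // cache miss: re-execute
        String results = "results#" + executions; // placeholder for real results
        cache.put(key, results);
        return results;
    }
}
```

For example, two lookups with the key ("q1", query text, 20) run the query once, and a third lookup that requests 40 results runs it a second time.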
The cache clean-up mechanism is both time-based and size-based. You can modify the cache clean-up
properties by editing the dfs-runtime.properties file.
To modify the cache period, set the dfs.search_query_cache_house_keeper.period parameter. The
default value is 10 (minutes), which leaves enough time to compute clustering operations for the
result set. If you have a large number of search operations, reduce the cache period to avoid excessive
memory usage.
To modify the cache size, set the dfs.search_query_cache_house_keeper.max_queries parameter.
The default value is set to 100 (queries). As a guideline, one cache entry for a simple query on
dm_document with 350 results uses around 1 MB of memory. For such queries, with the default cache
size value of 100, the cache does not use more than 100 MB of memory.
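The guideline above amounts to a simple multiplication. The helper below is a back-of-the-envelope sketch that assumes memory use grows linearly with the number of cached queries and that every entry costs about the same. Both are simplifying assumptions, and the class name CacheSizingSketch is hypothetical.

```java
/** Rough upper-bound estimate for the search cache; linear scaling is an assumption. */
class CacheSizingSketch {
    static long estimateMegabytes(int maxQueries, double megabytesPerEntry) {
        return Math.round(maxQueries * megabytesPerEntry);
    }
}
```

With the default of 100 queries at about 1 MB each, estimateMegabytes(100, 1.0) gives 100. Queries that return more results or more attributes per result can be expected to cost more per entry.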
Computing clusters
The search results can be displayed in clusters. Clusters group results dynamically into categories
based on the values of the results attributes. The clustering information is returned as soon as enough
results are gathered to compute clusters. Clusters can then be used to navigate into the search results.
For each level of clusters, a strategy defines which attributes are used to compute the
clusters. For example, you can define a first strategy to compute the first level of clusters on the values
for Author, Source and Owner. Define a second strategy to display clusters on a subset of the results
using the values for Author, Format and Modified Date.
Clusters can be computed on search results, but they can also be computed on a subset of the results.
Query results are not cached. If they are no longer available in the search context, execute the query
again. The search context is the context in which the query was executed.
The clustering operations of the Search service (getClusters and getSubclusters) depend on the
Clustering SBO. This SBO must be installed on a global registry. Starting with Content Server 6.7, the
Clustering SBO is installed with Content Server.
Computing facets
A facet definition is like a cluster strategy. The definition indicates on which attribute the facet is
computed. However, there are some fundamental differences:
• xPlore computes facets on the entire result set. Clusters are computed on a subset of results
retrieved by the application.
• Facets are more exhaustive and use a group-by technique. The clustering algorithm uses tokenizers
(often with text analytics), relative grouping sizes, and thresholds. Consequently, clusters provide a
global idea of the result set, while facets are more accurate and can be used, for example, for
navigation.
Facets are like clusters. They group results into categories based on common attribute values. A facet d
Other differences:
• The tokenizers define the cluster order. Facets are sorted using the facetSort parameter.
• Clusters usually have a threshold, that is, a minimum number of documents, to optimize the
number of groupings.
• It is possible to set a maximum number of facets to retrieve. In contrast, the number of clusters
depends on the number of results in the result set.
• Facets must be defined before the query execution; clusters are computed after the query execution.
For full information on facets, see EMC Documentum xPlore Administration and Development Guide.
Searching external repositories
To run searches against external repositories:
• Install the FS2 server. The EMC Documentum Federated Search Services Installation Guide
provides information about how to install the FS2 server.
• Install and configure FS2 adapters as described in EMC Documentum Federated Search Services
Adapter Installation Guide.
• Set the following properties in the file dfc.properties:
– dfc.search.external_sources.enable=true
– dfc.search.external_sources.host=<fs2_host>
– dfc.search.external_sources.port=<fs2_port> (default is 3005)
Search service objects
This section briefly describes objects used by this service. For field-level information, please refer to
the Javadoc or Windows help.
PassthroughQuery
The PassthroughQuery object is a container for a DQL or FTDQL query string. It can be executed as
either a full-text or database query.
A PassthroughQuery can search multiple managed repositories, but does not run against external
repositories. To search an external repository a client must use a StructuredQuery.
StructuredQuery
A structured query defines a query using an object-oriented model. An ExpressionSet object defines
a set of criteria that constrain the query. An ordered list of RepositoryScope objects defines the
scope of the query (sources).
The structured query can also contain a list of FacetDefinition objects that are used to retrieve the
facets with the results and a list of PartitionScope objects to limit the search to specific partitions. If
you specify several partition scopes, all the specified partitions are searched.
The ExpressionScope object allows you to add an ExpressionSet to the query for a given repository.
The expression set is added to the root expression set of the query. This mechanism can be useful when
executing the same query against several sources.
The following table summarizes the StructuredQuery fields.
Field | Data Type | Description
scopes | List<RepositoryScope> | Specifies the list of RepositoryScope objects that define the repositories against which the query is executed.
partitionScopes | List<PartitionScope> | (Since 6.7) Specifies the list of PartitionScope objects that define the partitions against which the query is executed for a specific source. A partition is an xPlore collection. This parameter is ignored if xPlore is not the indexing engine.
expressionScopes | List<ExpressionScope> | (Since 6.7) Specifies the list of ExpressionScope objects. An ExpressionScope object is used to specify expressions that are only added for a specific source.
isDatabaseSearch | boolean | Specifies if the query must be executed against the database and not against the indexer. Default is false.
isIncludeAllVersions | boolean | Specifies if the query must return all matching versions (true) or only the current version (false) of the objects. Default is false.
isIncludeHidden | boolean | Specifies if the hidden objects must be filtered from the result set (false) or kept (true). Default is false.
rootExpressionSet | ExpressionSet | Specifies the query constraints in an ExpressionSet.
orderByClauses | List<OrderByClause> | Specifies the list of OrderByClause objects.
facetDefinitions | List<FacetDefinition> | (Since 6.6) Specifies the list of FacetDefinition objects for the query.
maxResultsForFacets | int | (Since 6.6) Specifies the total number of unique results available from the source, after deduplication (if deduplication is available), that are used to compute facets. Default value is -1, which means that the configuration of the indexer is used.
isHitcountRetrieved | boolean | Specifies if the hit count must be computed and retrieved even if no facets are requested. Default is false, which means that the hit count is only computed when facets are requested in the query.
maxHitcount | int | Specifies the maximum number of results to be returned as the hit count. A smaller number lowers the performance impact of the hit count computation. Default value is -1, which means that the DFC property dfc.search.max_results_per_source is used (10000).
Scope objects
PartitionScope allows you to specify a partition (xPlore collection) when querying a repository. It is
used only with the xPlore indexer and is ignored in all other cases. An xPlore partition is a storage
area (or "file store") in the Content Server mapped to an xPlore collection.
RepositoryScope enables a search to be constrained to a specific folder of a repository. It can also
exclude folders.
An expression set and repository name define an ExpressionScope. The expression scope allows you
to add an expression set only for the specified repository. This mechanism is useful when you execute
the same query against several sources.
Expression objects
An ExpressionSet is a collection of Expression objects, each of which defines either a full-text
expression, or a search constraint on a single property. The Expression instances comprising the
ExpressionSet are related to one another by a single logical operator (either AND or OR). The
ExpressionSet as a whole defines the complete set of search criteria that is applied during a search.
The top-level Expression contained in a StructuredQuery is referred to as the root expression
of the expression tree.
Three concrete classes extend the Expression class: FullTextExpression, PropertyExpression, and
ExpressionSet.
• FullTextExpression
FullTextExpression encapsulates a search string accessed using the getValue and setValue methods.
This string supports the operators "AND", "OR", and "NOT", as well as parentheses.
• PropertyExpression
PropertyExpression provides a search constraint based on a single property.
• ExpressionSet
Extends Expression and contains a set of Expression instances. An ExpressionSet can nest
ExpressionSet instances. Nesting allows construction of arbitrarily complex expression trees.
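The nesting described above can be pictured with a small stand-alone model. The following sketch does not use the DFS classes: ExprSketch, FullTextSketch, PropertySketch, and ExprSetSketch are hypothetical names that mirror Expression, FullTextExpression, PropertyExpression, and ExpressionSet only in shape. It shows how a set joined by a single logical operator can contain another set joined by a different operator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical mini-model of an expression tree node (not the DFS Expression class). */
interface ExprSketch {
    String render();
}

/** Leaf: a full-text search string. */
class FullTextSketch implements ExprSketch {
    private final String value;

    FullTextSketch(String value) {
        this.value = value;
    }

    public String render() {
        return "fulltext(" + value + ")";
    }
}

/** Leaf: a constraint on a single property. */
class PropertySketch implements ExprSketch {
    private final String property;
    private final String condition;
    private final String value;

    PropertySketch(String property, String condition, String value) {
        this.property = property;
        this.condition = condition;
        this.value = value;
    }

    public String render() {
        return property + " " + condition + " '" + value + "'";
    }
}

/** Inner node: children related by one logical operator ("AND" or "OR"). */
class ExprSetSketch implements ExprSketch {
    private final String operator;
    private final List<ExprSketch> children;

    ExprSetSketch(String operator, ExprSketch... children) {
        this.operator = operator;
        this.children = Arrays.asList(children);
    }

    public String render() {
        // Render each child, then join them with the set's single operator
        return children.stream()
            .map(ExprSketch::render)
            .collect(Collectors.joining(" " + operator + " ", "(", ")"));
    }
}
```

A root AND set that holds one full-text leaf and one nested OR set of two property constraints renders as a parenthesized expression with the OR group nested inside the AND group.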
The following table describes the concrete subtypes of the ExpressionValue class.
Table 7
ExpressionValue subtypes
Subtype | Description
SimpleValue | Contains a single String value.
RangeValue | Contains two String values representing the start and end of a range. The values can represent dates (using the DateFormat specified in the StructuredQuery) or integers.
ValueList | Contains an ordered List of String values.
RelativeDateValue | Contains a TimeUnit setting and an integer value representing the number of time units. TimeUnit values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, ERA, WEEK, MONTH, YEAR. The integer value can be negative or positive to represent a past or future time.
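The sign convention for RelativeDateValue can be illustrated with the standard java.time API. This is a minimal sketch under stated assumptions: ChronoUnit stands in for the DFS TimeUnit enumeration, the class RelativeDateSketch is hypothetical, and the reference instant is passed in explicitly because this guide does not state which instant the service uses.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

/** Resolves a relative date; negative amounts point to the past, positive to the future. */
class RelativeDateSketch {
    static LocalDateTime resolve(LocalDateTime reference, int amount, ChronoUnit unit) {
        return reference.plus(amount, unit);
    }
}
```

For example, resolving -7 with ChronoUnit.DAYS against 10 March 2015 12:00 yields 3 March 2015 12:00, and resolving 1 with ChronoUnit.MONTHS yields 10 April 2015 12:00.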
Condition is an enumerated type that expresses the logical condition to use when comparing a
repository value to a value in an Expression. A specific Condition is included in a PropertyExpression
to determine precisely how to constrain the search on the property value.
QueryResult
Both the Search and Query services use the QueryResult class as a container for the set of results
returned by the execute operation. The QueryResult class also contains the queryId generated for
this query. Use the queryId to uniquely identify the query. The queryId is a key in the cache that
identifies the query for a given user.
Status objects
QueryStatus contains status information returned by a search operation. Status information is
available for each source repository that was searched.
Table 8
QueryStatus fields
Field | Data Type | Description
repositoryStatusInfos | List<RepositoryStatusInfo> | Specifies the list of RepositoryStatusInfo where the query has been executed.
hasMoreResults | boolean | Specifies if the repository can return more results.
isCompleted | boolean | Specifies if the query execution is completed.
globalResultsCount | int | Specifies the total number of unique results available from the source, after deduplication (if deduplication is available).
RepositoryStatusInfo contains data related to a query or search result regarding the status of the search
in a specific repository. RepositoryStatusInfo instances are returned in a List<RepositoryStatusInfo>
within a QueryResult, which is returned by a search or query operation.
Starting with DFS version 6.7, RepositoryStatusInfo also contains a list of repositoryEvent objects.
Use these objects to access information available at the DFC level in the IDfQueryEvent objects, such
as the native query or the type of error.
RepositoryStatus provides detailed information about the status of a query that was executed, as it
pertains to a specific repository.
Cluster objects
The QueryCluster object is a container for ClusterTree objects for a given query. Another parameter
is the queryId, which is used to uniquely identify the query. The queryId can be used to access any
part of the result set. For example, you can retrieve the next set of results or clusters on all or some of
the results.
A ClusterTree is a container for Cluster objects that are calculated according to a ClusteringStrategy.
The field isRefreshable indicates that all clusters have been computed and the search is complete or
that more results can be returned by the source.
The Cluster class represents a cluster or group of objects that have something in common. These
objects are grouped into categories comparing the values of their attributes.
Table 9
Cluster fields
Field | Data Type | Description
clusterValues | List<String> | Specifies the list of values that are used to generate the cluster name.
clusterSize | int | Specifies the number of objects in the cluster.
clusterObjectsIdentities | ObjectIdentitySet | Specifies a list of ObjectIdentity instances for the objects belonging to this cluster.
A ClusterTree object uses the ClusteringStrategy class to set the strategy for calculating clusters. The
clustering strategy can use tokenizers to group the clusters (for example, dates can be grouped into
quarters). In this case, you define which tokenizer to apply for a given attribute.
The ClusteringStrategy class also controls the amount of data returned by the operation.
Table 10
ClusteringStrategy fields
Field | Data Type | Description
strategyName | String | Specifies the strategy name.
attributes | List<String> | Specifies the list of attributes used in this strategy.
clusteringRange | ClusteringRange | Specifies the number of clusters computed by the clustering service. Possible values are: LOW, MEDIUM, HIGH.
clusteringThreshold | int | Specifies the minimum number of results required to create a cluster.
returnIdentitySet | boolean | Specifies whether the object identities are returned.
tokenizers | PropertySet | Specifies the tokenizer to apply. The PropertySet is a set of StringProperty where the name is the attribute name and the value is the tokenizer name to apply to this attribute. Available tokenizers are listed in ClusteringStrategy.
Table 11
List of Tokenizers available for the clustering
Tokenizer name | Description
dm_object_name | Tokenizes an object name attribute. Strings are cleaned before being used: underscore characters are replaced by spaces and the extensions are removed.
dm_percentage | Tokenizes a score attribute or a numeric value between 0 and 1. The suffix "%" is added to the percentage.
dm_date_by_quarter | Tokenizes a date attribute to create cluster by Quarter (2006 Q1, 2006 Q2, 2006 Q3 ...)
dm_dynamic_size | Tokenizes a string size attribute and groups dynamically the input sizes.
dm_size_by_range | Tokenizes a string size attribute and creates predefined ranges. The ranges are 0KB-100KB, 100KB-1MB, 1MB-10MB, 10MB-100MB, >100MB
dm_date_by_day | Tokenizes a string date attribute according to the "dd/MM/yyyy" pattern.
dm_exact_match | Tokenizes any string and groups the ones that are exactly the same.
dm_text | Parses, lemmatizes and dynamically groups any string attribute.
dm_number | Tokenizes strings to obtain numbers and groups dynamically the input numbers.
dm_author | Tokenizes strings to obtain lists of authors. Groups dynamically the authors. By default, the author names are expected to start with the first name.
dm_collection | Tokenizes strings of the form "category1:category2:category3" and groups dynamically according to the most significant categories or sub-categories.
dm_source | Tokenizes a r_source attribute, it generates a suitable source name for the external source.
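To make the tokenizer descriptions more concrete, the following plain-Java sketch imitates three of them. It is an approximation under stated assumptions and not the Clustering SBO code: the class TokenizerSketch and its method names are hypothetical, range boundaries are treated as exclusive upper bounds, and 1 KB is taken as 1024 bytes. The guide does not specify either of those details.

```java
import java.time.LocalDate;

/** Hypothetical imitations of three clustering tokenizers; not the product code. */
class TokenizerSketch {

    /** In the style of dm_date_by_quarter: produces labels such as "2006 Q1". */
    static String dateByQuarter(LocalDate date) {
        int quarter = (date.getMonthValue() - 1) / 3 + 1;
        return date.getYear() + " Q" + quarter;
    }

    /** In the style of dm_size_by_range; boundary handling is an assumption. */
    static String sizeByRange(long bytes) {
        final long kb = 1024L;
        final long mb = 1024L * kb;
        if (bytes < 100 * kb) {
            return "0KB-100KB";
        }
        if (bytes < mb) {
            return "100KB-1MB";
        }
        if (bytes < 10 * mb) {
            return "1MB-10MB";
        }
        if (bytes < 100 * mb) {
            return "10MB-100MB";
        }
        return ">100MB";
    }

    /** In the style of dm_object_name: drop the extension, underscores become spaces. */
    static String cleanObjectName(String name) {
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return base.replace('_', ' ');
    }
}
```

For example, a document named annual_report_2006.pdf that was modified on 14 February 2006 and is 5 MB in size would fall into the groups "annual report 2006", "2006 Q1", and "1MB-10MB" respectively.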
Facet objects
The QueryFacet object is a container for Facet objects for a given query. It is computed on query
results. The queryId field identifies the query. The QueryFacet also contains the QueryStatus. It is
like the QueryResult object.
A Facet is a container for FacetValue objects and a FacetDefinition object. xPlore computes the
facet values according to the facet definition.
The FacetValue class represents a group of results having attribute values in common. A FacetValue
has a value and a count indicating the number of results contained in this group. It can also have a
list of subfacet values and a set of properties.
Table 12
FacetValue fields
Field | Data Type | Description
value | string | The display value or label for this FacetValue.
count | int | Specifies the number of results for the facet value.
properties | PropertySet | Specifies a list of Property instances used to define custom properties. For example, facets grouped by day are defined by a starting and an ending date and time.
subFacetValues | List<FacetValue> | Specifies the list of FacetValue objects.
A Facet object uses the FacetDefinition class to define how to build a Facet.
Table 13
FacetDefinition fields
Field | Data Type | Description
name | String | Specifies the definition name.
attributes | List<String> | Specifies the list of attributes used in this definition. If not specified, the definition name is used as an attribute.
groupBy | String | Specifies the "group by" strategy. Possible values are: string (default value), range (for numeric values), location (for CIS entities). The range grouping requires a range property that defines the subvalues to use. For dates, the possible values are: day, week, month, year, and relativeDate. The relativeDate subvalues are: today, yesterday, this week, this month, this year, last year, and older. An optional property timezone allows you to specify the client timezone, such as GMT+1.
maxFacetValues | int | Specifies the maximum number of FacetValue objects to build a Facet. If not set, it returns ten values. If set to -1, it returns all values.
facetSort | FacetSort | Specifies the sort order to apply. Possible values are: FREQUENCY (descending order based on count values), VALUE_ASCENDING (ascending order based on alphanumeric values), VALUE_DESCENDING (descending order based on alphanumeric values), NONE.
properties | PropertySet | Specifies a list of Property instances used to define custom properties.
subFacetDefinition | FacetDefinition | Specifies a FacetDefinition for subfacet values, if any.
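The interplay of the group-by technique, FREQUENCY sorting, and maxFacetValues can be sketched in a few lines of plain Java. The sketch is illustrative only: xPlore computes the real facets, the class FacetSketch is hypothetical, and the "value (count)" label format is an assumption made for display. Ties keep the order in which values were first seen, which is a choice of this sketch and not documented behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a string group-by facet sorted by frequency. */
class FacetSketch {

    static List<String> facetByFrequency(List<String> attributeValues, int maxFacetValues) {
        // Group by: count how many results share each attribute value
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : attributeValues) {
            counts.merge(value, 1, Integer::sum);
        }

        // FREQUENCY-style order: descending by count (the sort is stable)
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

        // A limit of -1 returns all values, as described for maxFacetValues
        List<String> labels = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            if (maxFacetValues >= 0 && labels.size() == maxFacetValues) {
                break;
            }
            labels.add(entry.getKey() + " (" + entry.getValue() + ")");
        }
        return labels;
    }
}
```

Given the format values pdf, doc, pdf, xls, pdf, doc and a limit of 2, the sketch returns "pdf (3)" followed by "doc (2)".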
Search service operations
The following operations are available in the search service.
getRepositoryList operation
The getRepositoryList operation provides a list of managed and external repositories that are
available to the service for searching.
Java syntax
List<Repository> getRepositoryList(OperationOptions options)
throws SearchServiceException
C# syntax
List<Repository> GetRepositoryList(OperationOptions options)
Parameter | Data type | Description
options | OperationOptions | Contains profiles and properties that specify operation behaviors. Not used.
Returns a List of Repository instances.
The following example demonstrates the getRepositoryList operation.
Java: Getting a repository list
public List<Repository> repositoryList()
{
    try
    {
        ServiceFactory serviceFactory = ServiceFactory.getInstance();
        ISearchService searchService
            = serviceFactory.getService(ISearchService.class, serviceContext);
        List<Repository> repositoryList = searchService.getRepositoryList
            (new OperationOptions());
        // Print the name of each repository available for searching
        for (Repository r : repositoryList)
        {
            System.out.println(r.getName());
        }
        return repositoryList;
    }
    catch (Exception e)
    {
        e.printStackTrace();
        throw new RuntimeException(e);
    }
}
C#: Getting a repository list
public List<Repository> RepositoryList()
{
    try
    {
        List<Repository> repositoryList = searchService.GetRepositoryList
            (new OperationOptions());
        foreach (Repository r in repositoryList)
        {
            Console.WriteLine(r.Name);
        }
        return repositoryList;
    }
    catch (Exception e)
    {
        Console.WriteLine(e.StackTrace);
        throw new Exception(e.Message);
    }
}
execute operation
The execute operation searches a repository or set of repositories and returns search results.
Java syntax
QueryResult execute(Query query,
QueryExecution execution,
OperationOptions options)
C# syntax
QueryResult Execute(Query query,
QueryExecution execution,
OperationOptions options)
Parameter | Data type | Description
query | Query | Either a PassthroughQuery or a StructuredQuery.
execution | QueryExecution | Object describing execution parameters. Query execution parameters are described in .
options | OperationOptions | Contains profiles and properties that specify operation behaviors. For the execute operation, the profiles primarily provide filters that modify the contents of the DataPackage returned in QueryResult. An applicable profile is the SearchProfile. In a PropertyProfile only the property filter mode SPECIFIED_BY_INCLUDE is supported for this operation. Other property filter modes are not supported.
The SearchProfile sets the parameters for the search execution. Set the isAsyncCall parameter to
indicate whether the search is blocking.
Returns a QueryResult instance.
Java: Simple PassthroughQuery
public QueryResult simplePassthroughQuery()
{
    QueryResult queryResult;
    try
    {
        String queryString
            = "select distinct r_object_id from dm_document order by r_object_id ";
        int startingIndex = 0;
        int maxResults = 20;
        int maxResultsPerSource = 60;
        PassthroughQuery q = new PassthroughQuery();
        q.setQueryString(queryString);
        q.addRepository(defaultRepositoryName);
        QueryExecution queryExec = new QueryExecution(startingIndex,
                                                      maxResults,
                                                      maxResultsPerSource);
        queryExec.setCacheStrategyType(CacheStrategyType.NO_CACHE_STRATEGY);
        queryResult = searchService.execute(q, queryExec, null);
        QueryStatus queryStatus = queryResult.getQueryStatus();
        RepositoryStatusInfo repStatusInfo = queryStatus.
            getRepositoryStatusInfos().get(0);
        if (repStatusInfo.getStatus() == Status.FAILURE)
        {
            System.out.println(repStatusInfo.getErrorTrace());
            throw new RuntimeException("Query failed to return result.");
        }
        System.out.println("Query returned result successfully.");
        DataPackage dp = queryResult.getDataPackage();
        System.out.println("DataPackage contains " +
                           dp.getDataObjects().size()
                           + " objects.");
        for (DataObject dataObject : dp.getDataObjects())
        {
            System.out.println(dataObject.getIdentity());
        }
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
    return queryResult;
}
C#: Simple PassthroughQuery
public QueryResult SimplePassthroughQuery()
{
    QueryResult queryResult;
    try
    {
        string queryString = "select distinct r_object_id from dm_document "
            + "order by r_object_id ";
        // Paging values mirror the Java version of this example
        int startingIndex = 0;
        int maxResults = 20;
        int maxResultsPerSource = 60;
        PassthroughQuery q = new PassthroughQuery();
        q.QueryString = queryString;
        q.AddRepository(DefaultRepository);
        QueryExecution queryExec = new QueryExecution(startingIndex,
                                                      maxResults,
                                                      maxResultsPerSource);
        queryExec.CacheStrategyType = CacheStrategyType.NO_CACHE_STRATEGY;
        queryResult = searchService.Execute(q, queryExec, null);
        QueryStatus queryStatus = queryResult.QueryStatus;
        RepositoryStatusInfo repStatusInfo = queryStatus.
            RepositoryStatusInfos[0];
        if (repStatusInfo.Status == Status.FAILURE)
        {
            Console.WriteLine(repStatusInfo.ErrorTrace);
            throw new Exception("Query failed to return result.");
        }
        Console.WriteLine("Query returned result successfully.");
        DataPackage dp = queryResult.DataPackage;
        Console.WriteLine("DataPackage contains " + dp.DataObjects.Count
                          + " objects.");
        foreach (DataObject dataObject in dp.DataObjects)
        {
            Console.WriteLine(dataObject.Identity);
        }
    }
    catch (Exception e)
    {
        throw new Exception(e.Message);
    }
    return queryResult;
}
Java: Structured query
public void simpleStructuredQuery()
{
try
{
String repoName = defaultRepositoryName;
// Create query
StructuredQuery q = new StructuredQuery();
q.addRepository(repoName);
q.setObjectType("dm_document");
q.setIncludeHidden(true);
q.setDatabaseSearch(true);
ExpressionSet expressionSet = new ExpressionSet();
expressionSet.addExpression(new PropertyExpression("owner_name",
Condition.CONTAINS,
"admin"));
q.setRootExpressionSet(expressionSet);
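// Execute Query (paging values assumed, as in the PassthroughQuery example)
int startingIndex = 0;
int maxResults = 20;
int maxResultsPerSource = 60;
QueryExecution queryExec = new QueryExecution(startingIndex,
maxResults,
maxResultsPerSource);
QueryResult queryResult = searchService.execute(q, queryExec, null);
QueryStatus queryStatus = queryResult.getQueryStatus();
RepositoryStatusInfo repStatusInfo = queryStatus.
getRepositoryStatusInfos().get(0);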
if (repStatusInfo.getStatus() == Status.FAILURE)
{
System.out.println(repStatusInfo.getErrorTrace());
throw new RuntimeException("Query failed to return result.");
}
// print results
for (DataObject dataObject : queryResult.getDataObjects())
{System.out.println(dataObject.getIdentity());}}
catch (Exception e)
{
throw new RuntimeException(e);
}
System.out.println("test completed - OK");}
C#: Structured query
public void SimpleStructuredQuery()
{
try
{
String repoName = DefaultRepository;
// Create query
StructuredQuery q = new StructuredQuery();
q.AddRepository(repoName);
q.ObjectType = "dm_document";
q.IsIncludeHidden = true;
q.IsDatabaseSearch = true;
ExpressionSet expressionSet = new ExpressionSet();
expressionSet.AddExpression(new PropertyExpression("owner_name",
Condition.CONTAINS,
"admin"));
q.RootExpressionSet = expressionSet;
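// Execute Query (paging values assumed, as in the PassthroughQuery example)
int startingIndex = 0;
int maxResults = 20;
int maxResultsPerSource = 60;
QueryExecution queryExec = new QueryExecution(startingIndex,
maxResults,
maxResultsPerSource);
QueryResult queryResult = searchService.Execute(q, queryExec, null);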
QueryStatus queryStatus = queryResult.QueryStatus;
RepositoryStatusInfo repStatusInfo = queryStatus.RepositoryStatusInfos[0];
if (repStatusInfo.Status == Status.FAILURE)
{
Console.WriteLine(repStatusInfo.ErrorTrace);
throw new Exception("Query failed to return result.");
}
// print results
foreach (DataObject dataObject in queryResult.DataObjects)
{Console.WriteLine(dataObject.Identity);}}
catch (Exception e)
{
Console.WriteLine(e.Message);
throw new Exception(e.Message);
}
}
stopSearch operation
The stopSearch operation stops the execution of the query passed in as parameter. The execute
operation must be called first to launch the query. Once the query is stopped, results retrieved so far are
available. It is then possible to call the operations getClusters, getSubclusters and getResultsProperties
passing in the Query and QueryExecution parameters of the stopped query. Restart the stopped search
by calling the execute operation with the same query and query execution objects, without the queryId.
Java syntax
QueryStatus stopSearch(Query query,
QueryExecution execution)
C# syntax
QueryStatus StopSearch(Query query,
QueryExecution execution)
Parameter | Data type | Description
query | Query | Either a PassthroughQuery or a StructuredQuery
execution | QueryExecution | parameters are described in Documentum Enterprise Content Services Reference.
Returns a QueryStatus instance of the stopped query.
Java: stopping a search
public QueryStatus stopSearch () throws ServiceException
{
// Specify query: can be either a PassthroughQuery or a StructuredQuery
PassthroughQuery query = new PassthroughQuery();
query.setQueryString("select * from dm_document");
query.addRepository(getEnv().getDefaultDocbaseName());
// Specify query execution
QueryExecution queryExecution = new QueryExecution();
queryExecution.setMaxResultCount(100);
queryExecution.setMaxResultPerSource(350);
// Set operations options
OperationOptions operationOptions = new OperationOptions();
SearchProfile searchProfile = new SearchProfile();
searchProfile.setAsyncCall(true);
operationOptions.setSearchProfile(searchProfile);
PropertyProfile propertyProfile = new PropertyProfile();
propertyProfile.setFilterMode(PropertyFilterMode.SPECIFIED_BY_INCLUDE);
operationOptions.setPropertyProfile(propertyProfile);
// Start the search
QueryResult results =
m_searchService.execute(query, queryExecution, operationOptions);
// Set query id
queryExecution.setQueryId(results.getQueryId());
// Optional: check the status is RUNNING before stopping the search
// Stop the search
QueryStatus status = m_searchService.stopSearch(query, queryExecution);
// Optional: check the status is STOPPED
return status;}
getClusters operation
The getClusters operation computes clusters on query results. To run the query and get results, call
the execute operation first. The getClusters operation uses the same Query and QueryExecution
parameters.
If the query has not run or if results are no longer available in the search context, you must supply
these parameters to reexecute the query.
Set blocking in the Search profile to compute clusters on the first available results. Set non-blocking
to compute clusters only when all results are returned. By default, the execution is synchronous and
clusters are computed when all results are returned.
Java syntax
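QueryCluster getClusters (Query query,
QueryExecution execution,
OperationOptions options)
throws SearchServiceException;
C# syntax
QueryCluster GetClusters (Query query,
QueryExecution execution,
OperationOptions options)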
Parameter | Data type | Description
query | Query | Contains the query definition and the repositories against which the query is run.
execution | QueryExecution | Services Reference.
options | OperationOptions | behaviors. Only the ClusteringProfile and the SearchProfile are applicable. If this object is null or if there is no ClusteringStrategy, no clusters are returned.
The ClusteringProfile contains a list of ClusteringStrategy instances. The ClusteringStrategy is used to
compute the ClusterTrees and controls the amount of data returned by the operation.
Returns a QueryCluster object containing a list of ClusterTree objects and the id of the query.
The SearchServiceException exception is thrown in particular when the Clustering SBO is not installed.
The following example demonstrates the getClusters operation.
public QueryCluster getClusters () throws ServiceException
{
OperationOptions options = new OperationOptions();
// Can be either a PassthroughQuery or StructuredQuery
query.addRepository(YOUR_REPOSITORY);
// Get 50 results
QueryExecution queryExec = new QueryExecution(0, 50, 50);
QueryResult results = searchService.execute(query,
queryExec, options);
// Get generated queryId and set it for subsequent calls
String queryId = results.getQueryId();
queryExec.setQueryId(queryId);
// Get query clusters
// Set ClusteringStrategy
ClusteringStrategy strategy = new ClusteringStrategy();
strategy.setStrategyName("Name");
List<String> attrs = new ArrayList<String>(2);
attrs.add("object_name");
strategy.setAttributes(attrs);
strategy.setReturnIdentitySet(true);
strategy.setClusteringRange(ClusteringRange.HIGH);
// Set ClusteringProfile
ClusteringProfile profile = new ClusteringProfile(strategy);
options.setClusteringProfile(profile);
QueryCluster queryCluster = searchService.getClusters(query,
queryExec, options);
return queryCluster;}
getSubclusters operation
The getSubclusters operation computes clusters on a subset of the result set. The subset
is specified in the ObjectIdentitySet.
To run the query and get results, call the execute operation first. The getSubclusters operation uses the
same Query and QueryExecution parameters.
If the query has not run, or if results are no longer available in the search context, the query is executed
according to the Query, QueryExecution and OperationOptions parameters.
Set blocking in the Search profile to compute clusters on the first available results. Set non-blocking
to compute clusters only when all results are returned. By default, the execution is synchronous and
clusters are computed when all results are returned.
Java syntax
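QueryCluster getSubclusters (ObjectIdentitySet objectsToClusterize,
Query query,
QueryExecution execution,
OperationOptions options)
C# syntax
QueryCluster GetSubclusters (ObjectIdentitySet objectsToClusterize,
Query query,
QueryExecution execution,
OperationOptions options)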
Parameter | Data type | Description
objectsToClusterize | ObjectIdentitySet | Contains a list of ObjectIdentity instances specifying the objects on which the clusters are computed.
query | Query |
execution | QueryExecution | Services Reference.
options | OperationOptions | behaviors. Only the ClusteringProfile and the SearchProfile are applicable. If this object is null or if there is no ClusteringStrategy, no clusters are returned.
The ClusteringProfile contains a list of ClusteringStrategy instances. The ClusteringStrategy is used to
compute the ClusterTrees and controls the amount of data returned by the operation.
Returns a QueryCluster object containing a list of ClusterTree objects and the id of the query.
The SearchServiceException exception is thrown in particular when the Clustering SBO is not installed.
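The following example retrieves the objects of the first cluster by calling getResultsProperties. For a
call to getSubclusters, see the example in the getResultsProperties operation section.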
public DataPackage getClusterObjects () throws ServiceException
{
// Get 50 results
// Get query clusters
List<String> attrs = new ArrayList<String>(2);
// Set ClusteringProfile
// Get objects belonging to the first cluster
DataPackage clusterObjects = new DataPackage();
if (null != queryCluster.getClusterTrees() && !queryCluster.
getClusterTrees().isEmpty())
{
ClusterTree finalTree = queryCluster.getClusterTrees().get(0);
if (null != finalTree.getClusters() && !finalTree.
getClusters().isEmpty())
{
Cluster cluster = finalTree.getClusters().get(0);
clusterObjects = searchService.
getResultsProperties(cluster.getClusterObjectsIdentities(),
query, queryExec, options);}}
return clusterObjects;}
getResultsProperties operation
To display results, use the getResultsProperties operation. Call this operation after a call to the
getClusters or getSubclusters operations. It can also be called after a search.
If the search context is no longer available, the query is executed according to the Query,
QueryExecution and OperationOptions parameters. The search context is necessary to retrieve the
results for the selected cluster.
Java syntax
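DataPackage getResultsProperties (ObjectIdentitySet forClustersObjects,
Query query,
QueryExecution execution,
OperationOptions options)
C# syntax
DataPackage GetResultsProperties (ObjectIdentitySet forClustersObjects,
Query query,
QueryExecution execution,
OperationOptions options)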
Parameter | Data type | Description
forClustersObjects | ObjectIdentitySet | Contains a list of ObjectIdentity instances specifying the results to retrieve.
query | Query |
execution | QueryExecution | Services Reference.
options | OperationOptions | behaviors. If this object is null, default operation behaviors apply.
Returns a DataPackage containing the query results, that is, the objects specified in the
ObjectIdentitySet.
The SearchServiceException exception is thrown in particular when the Clustering docapp is not
installed.
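The following example calls getSubclusters to compute new clusters on the objects of the first cluster.
For a call to getResultsProperties, see the example in the getSubclusters operation section.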
public QueryCluster getSubClusters () throws ServiceException
{
// Ask for 100 results
// Now get query clusters
List<String> attrs = new ArrayList<String>();
// Set ClusteringProfile with strategy
// Get clusters on results retrieved so far
// Get the objects belonging to the first cluster
// and calculate new clusters on this subset
List<ClusterTree> clusterTrees = queryCluster.getClusterTrees();
QueryCluster subClusters = new QueryCluster();
if (null != clusterTrees && !clusterTrees.isEmpty())
{
// Get first ClusterTree
ClusterTree firstTree = clusterTrees.get(0);
List<Cluster> clusters = firstTree.getClusters();
if (null != clusters && !clusters.isEmpty())
{
// Get first cluster
Cluster cluster = clusters.get(0);
// Get identities of objects belonging to this cluster
ObjectIdentitySet ids = cluster.getClusterObjectsIdentities();
// Create a new strategy to get clusters based on format
ClusteringStrategy authorStrategy = new ClusteringStrategy();
authorStrategy.setStrategyName("Format");
List<String> authorAttrs = new ArrayList<String>();
authorAttrs.add("a_content_type");
authorStrategy.setAttributes(authorAttrs);
authorStrategy.setReturnIdentitySet(true);
authorStrategy.setClusteringRange(ClusteringRange.HIGH);
// Create new profile to take into account the new strategy
ClusteringProfile newProfile = new ClusteringProfile(authorStrategy);
options.setClusteringProfile(newProfile);
// Get new clusters calculated on the given subset of results
subClusters = searchService.getSubclusters(ids,
query,
queryExec,
options);}}
return subClusters;}
getFacets operation
The getFacets operation computes facets on query results. To run the query and benefit from the
search cache, call the execute operation first.
If the search context is no longer available, or if the query has not already been executed, the query is
executed according to the Query and OperationOptions parameters.
By default, the execution is synchronous and facets are computed when all results are returned. To
retrieve the facets asynchronously, for example, if the query is run against several repositories, specify
a SearchProfile.
Java syntax
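QueryFacet getFacets (Query query,
QueryExecution execution,
OperationOptions options)
C# syntax
QueryFacet GetFacets (Query query,
QueryExecution execution,
OperationOptions options)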
Parameter | Data type | Description
query | Query | Contains the query definition, the repositories against which the query is run, and the facet definitions.
execution | QueryExecution | Services Reference. Only the QueryId is used to identify the query.
options | OperationOptions | behaviors. Only the SearchProfile is applicable.
Returns a QueryFacet containing the facets, the query id, and query status.
The following example demonstrates the getFacets operation.
// Create the query
StructuredQuery query = new StructuredQuery();
query.addRepository("your_docbase");
query.setObjectType("dm_sysobject");
ExpressionSet set = new ExpressionSet();
set.addExpression(new FullTextExpression("your_query_term"));
query.setRootExpressionSet(set);
// Add a facet definition to the query: we want a facet on r_modify_date
// attribute.
FacetDefinition facetDefinition = new FacetDefinition("date");
facetDefinition.addAttribute("r_modify_date");
// Request all facets
facetDefinition.setMaxFacetValues(-1);
// Set sort order
facetDefinition.setFacetSort(FacetSort.VALUE_ASCENDING);
query.addFacetDefinition(facetDefinition);
// Execution options: we don’t want to retrieve results, we just want
// facets.
QueryExecution queryExecution = new QueryExecution(0, 0);
// Call getFacets method.
QueryFacet queryFacet = service.getFacets(query, queryExecution,
new OperationOptions());
// Check the query status: it should be SUCCESS
QueryStatus status = queryFacet.getQueryStatus();
System.out.println(status.getRepositoryStatusInfos().get(0).
getStatus());
// Display facet values
List<Facet> facets = queryFacet.getFacets();
for (Facet facet : facets)
{
for (FacetValue facetValue : facet.getValues())
{
System.out.println(facetValue.getValue() + "/" +
facetValue.getCount());}}
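To make the printed value/count pairs easier to picture, the following self-contained sketch counts attribute values in memory. It is a conceptual illustration only and does not use the Search service: the class FacetCountSketch and its facetValues method are hypothetical names, and it assumes that a facet is a per-value result count, that a negative limit means "all values" (as with -1 in the example above), and that the limit is applied after sorting by value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only: not part of the DFS API.
class FacetCountSketch {

    /**
     * Counts how many results carry each attribute value and returns
     * "value/count" strings sorted by value in ascending order.
     * A negative maxValues is treated as "no limit" (assumption).
     */
    static List<String> facetValues(List<String> attributeValues, int maxValues) {
        // TreeMap keeps its keys in ascending natural order.
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : attributeValues) {
            counts.merge(value, 1, Integer::sum);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (maxValues >= 0 && result.size() >= maxValues) {
                break;
            }
            result.add(entry.getKey() + "/" + entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical modification years taken from four results.
        List<String> modifyYears = Arrays.asList("2015", "2014", "2015", "2013");
        for (String line : facetValues(modifyYears, -1)) {
            System.out.println(line);
        }
    }
}
```

The real service computes facets on the server and can group values differently (for example, dates into ranges), so treat the sketch as a mental model of the output shape rather than a description of the implementation.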
Chapter 4
Configuring and Customizing Webtop Search
• About WDK search
• Wildcards, lemmatization, and word fragments
• Configuring search controls
• Configuring the basic search component
• Configuring the advanced search component
• Configuring search results
• Configuring Webtop Federated Search clustering
• Modifying search component JSP pages
• Modifying a search component query
About WDK search
Following is a brief general description of the WDK customization model. Information on individual
search controls and components is contained in the comprehensive reference guide, EMC Documentum
Web Development Kit and Webtop Reference Guide. General information on configuring and
customizing features in WDK applications is described in EMC Documentum xPlore Administration
and Development Guide.
The following illustration shows points at which you can configure or customize search component
presentation and behavior in Webtop applications.
Key:
1. See Configuring search controls, page 67, Configuring the advanced search component, page 69, and Configuring search results, page 72.
2. See Modifying a search component query, page 79.
3. See Constructing a search, page 40.
4. See DQL hints, page 10.
5. See Debugging, page 91.
Search sources
Multiple repositories can be added to the user search preferences. With Federated Search Services, the
user can select external sources for search and import results into the current repository. Included files
within HTML or XML documents are not imported.
Simple and advanced search
Simple and advanced searches query the full-text index by default. You can run a full-text query in
advanced search using the Contains field. The Contains field or the simple search text box can
contain a string within quotation marks to search for the string, for example, "this string". The box
also supports the AND and OR operators. The following rules apply:
• Either operator can be appended with NOT.
• The operators are not case sensitive.
• Punctuation, accents, and other special characters are ignored (replaced with a space).
• The AND operator has priority over the OR operator. For example, you type knowledge AND
management OR discovery. The results contain both knowledge and management, or the results
contain discovery.
• Parentheses override the priority of operators. For example, if you type knowledge AND
(management OR discovery), the results must contain knowledge and must also contain either
management or discovery. The NOT operator cannot be used to qualify an expression within
parentheses, for example, NOT (a and b). It can be used within parentheses, for example a OR
(b and NOT c).
• If no operators are used between words, multiple words are treated with the AND operator.
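The grouping rules above can be checked with a small evaluator. The following sketch is not Webtop or xPlore code; OperatorPrecedenceSketch and matches are hypothetical names, and the parser only assumes the rules listed above (AND before OR, implicit AND between words, case-insensitive operators, NOT allowed before a word but not before a parenthesis). It tests a query against the set of terms found in one document.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: a tiny evaluator for the operator rules listed above.
class OperatorPrecedenceSketch {
    private final List<String> tokens;
    private final Set<String> terms;
    private int pos = 0;

    private OperatorPrecedenceSketch(List<String> tokens, Set<String> terms) {
        this.tokens = tokens;
        this.terms = terms;
    }

    /** Returns true if a document containing documentTerms satisfies the query. */
    static boolean matches(String query, String... documentTerms) {
        Set<String> terms = new HashSet<>();
        for (String term : documentTerms) {
            terms.add(term.toLowerCase(Locale.ROOT));
        }
        List<String> tokens = new ArrayList<>();
        String spaced = query.replace("(", " ( ").replace(")", " ) ").trim();
        for (String token : spaced.split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        OperatorPrecedenceSketch parser = new OperatorPrecedenceSketch(tokens, terms);
        boolean result = parser.parseOr();
        if (parser.pos != tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + tokens.get(parser.pos));
        }
        return result;
    }

    // Operators are matched without regard to case.
    private boolean at(String token) {
        return pos < tokens.size() && tokens.get(pos).equalsIgnoreCase(token);
    }

    // OR has the lowest priority, so it is parsed at the outermost level.
    private boolean parseOr() {
        boolean value = parseAnd();
        while (at("OR")) {
            pos++;
            boolean right = parseAnd(); // always parse, so the position advances
            value = value || right;
        }
        return value;
    }

    // AND binds tighter than OR; adjacent operands without an operator are ANDed.
    private boolean parseAnd() {
        boolean value = parseUnary();
        while (pos < tokens.size() && !at("OR") && !at(")")) {
            if (at("AND")) {
                pos++;
            }
            boolean right = parseUnary();
            value = value && right;
        }
        return value;
    }

    private boolean parseUnary() {
        if (at("NOT")) {
            pos++;
            if (at("(")) {
                throw new IllegalArgumentException("NOT cannot qualify a parenthesized expression");
            }
            return !parseWord();
        }
        if (at("(")) {
            pos++;
            boolean value = parseOr();
            if (!at(")")) {
                throw new IllegalArgumentException("Missing closing parenthesis");
            }
            pos++;
            return value;
        }
        return parseWord();
    }

    private boolean parseWord() {
        if (pos >= tokens.size() || at("AND") || at("OR") || at("NOT") || at("(") || at(")")) {
            throw new IllegalArgumentException("Expected a search term");
        }
        return terms.contains(tokens.get(pos++).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(matches("knowledge AND management OR discovery", "discovery"));
        System.out.println(matches("knowledge AND (management OR discovery)", "discovery"));
    }
}
```

With a document that contains only the term discovery, the first query in main matches because the OR branch is satisfied on its own, while the second does not because the parentheses make knowledge mandatory.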
Searching attribute values
All attributes are indexed, so a query for attribute criteria is run against the full-text index by default.
The attributes for search criteria are supplied by the data dictionary of the selected repository. If value
assistance is defined in the data dictionary, the values are supplied for "is" and "is not" search criteria.
Verity operators such as "not" or "between" are not supported.
The default search is for a string query type in a full-text search. If the Content Server is indexed, the
query is performed against the full-text index including all searchable properties.
For attributes-only search, or mixed DQL and full-text, disable XQuery generation. Turn off XQuery
generation by adding the following setting to dfc.properties on the DFC client application:
dfc.search.xquery.generation.enable=false
The following procedures support attributes-only search:
• (Advanced search only) Add a checkbox for Include recently modified properties on the
advanced search page. Attributes are queried against the database and not the index. To add the
checkbox, uncomment the following lines in your custom advanced search JSP page (a copy of the
webcomponent advanced search JSP page):
<!--<tr class="leftAlignment" valign=top>
<td class="leftAlignment" valign=top nowrap>
<dmfxs:searchscopecheckbox
name='<%=AdvSearchEx.DATABASE_SEARCH_SCOPECHECKBOX_CONTROL%>'
scopename='<%=RepositorySearch.DATABASE_SEARCH_PROPERTY%>'
checkedvalue='true'
uncheckedvalue='false'
nlsid='MSG_DATABASE_SEARCH'
tooltipnlsid="MSG_DATABASE_SEARCH_TIP"/>
</td>
</tr>-->
• Use the DQL query type for a custom search component and pass the query string in the query
parameter. (See Modifying search component JSP pages, page 75.)
• Turn off FTDQL (queries against index) using a DQL hints file. You can disable index queries
for attributes without affecting the full-text string portion of a query. For more information, see
DQL hints, page 10.
• Set dfc.search.fulltext.enable to false in dfc.properties, which is located in WEB-INF/classes.
Value assistance and presets
If value assistance is defined in the data dictionary, the values are supplied for "is" and "is not"
search criteria.
Value assistance as defined within a DAR is supported. The assistance within the DAR provides a
union of values for a type across lifecycles. For information on supporting conditional value assistance
in JSP pages, see Configuring the advanced search component, page 69.
Limitations:
• Not all values in value assistance are available across repositories in a logical OR operation. (This
limitation does not apply to the AND operation.)
• Locale-based assistance must be present in the data dictionary for each locale.
In the Webtop presets editor, you can create a preset that limits the searchable object types. This preset
overrides the <includetypes> setting in the advanced search component definition.
Clustering, templates, and monitoring
Content Server provides search results clustering, search templates, and search monitoring. Before
version 6.7 of Content Server, clustering and search monitoring required a DAR file deployed to
a global registry repository. The search templates DAR file must be deployed to each repository in
which you wish to store search templates. Use Documentum Composer to deploy these DAR files to
the repositories. Instructions for deploying the Webtop Federated Search DocApps are in the EMC
Documentum Web Development Kit and Webtop Deployment Guide.
Wildcards, lemmatization, and word fragments
When the user enters an explicit wildcard (asterisk in one-box search, for example, Docum*), the
wildcard is not applied in the full-text index. It is applied only to find metadata in the index. Most
queries that users make are for whole words, not parts of words. This behavior can be changed (see
“Enabling the wildcard CONTAINS operator” below).
Lemmatization finds terms that are based on the root or lemma. For example, if no wildcard is present,
a search for car finds auto. Lemmatization is not performed on terms that contain wildcards: a search
for car* finds cars but not autos.
Enabling the wildcard CONTAINS operator for string property searches
To enable the checkbox, remove the JSP comment tags around the following tag in the
advancedsearchex.jsp page:
<!--<tr class="leftAlignment" valign=top>
<td class="leftAlignment" valign=top nowrap>
<dmfxs:searchscopecheckbox
name='<%=AdvSearchEx.DATABASE_SEARCH_SCOPECHECKBOX_CONTROL%>'
scopename='<%=RepositorySearch.DATABASE_SEARCH_PROPERTY%>'
checkedvalue='true'
uncheckedvalue='false'
nlsid='MSG_DATABASE_SEARCH'
tooltipnlsid="MSG_DATABASE_SEARCH_TIP"/>
</td>
</tr>-->
Enabling fragment or database search
You can change the behavior of the CONTAINS operator by enabling the searchscope
checkbox in the advanced search JSP page. This checkbox serves the following purposes:
• Retrieve objects with recently modified properties that have not yet been indexed.
• Perform case-sensitive queries against the database:
– DFC (and WDK/Webtop) queries
Set dfc.search.fulltext.enable to false.
– DQL queries
Add the DQL hint ft_contain_fragment. Lemmatization is not applied when this hint is used.
Configuring search controls
See EMC Documentum Web Development Kit and Webtop Reference Guide for details on the
configuration of each control.
You can globally configure all instances of certain advanced search controls by modifying the control
configuration definitions in wdk/config/advsearchex.xml. The following controls can be configured:
• searchattribute controls, match case attribute (does not apply to searches of the index)
• searchsizeattribute control
• searchdateattribute control
• search clusters
The following example changes the size range dropdown selections. It modifies advsearchex.xml in a
modification file located in custom/config with the following content:
<config version='1.0'>
<scope type='dm_sysobject'>
<searchsizeattributerange modifies="searchsizeattributerange:wdk/config/advsearchex.xml">
<insert>
<option>
<label>Any old size</label>
<operator>LT</operator>
<value>-1</value>
<unit>KB</unit>
</option>
</insert>
</searchsizeattributerange>
</scope>
</config>
The resulting UI (search size custom dropdown list) shows the new values for size attribute range:
Searches on full-text strings or attributes against an indexed repository are not case sensitive. If the repository is not
indexed, queries are case sensitive by default. Case sensitivity for non-indexed repositories can be
turned on or off in wdk/config/advsearchex.xml, as the value of the <defaultmatchcase> element. If
you turn off case sensitivity, create functional indexes on the attributes that are queried.
You can set NOFTDQL queries to be case sensitive. Set the value of <defaultmatchcase> to true. For
better performance, set case sensitivity to true, or set it to false and create a functional index on
the queried attribute columns.
Configuring the basic search component
Basic search searches all sysobjects in the current repository for the user-supplied string in the full-text
index of content and attributes. The default base type for the search can be configured in the search
component definition. The default preferred sources can also be specified in the component definition.
If Federated Search Services is installed, its sources can include external sources.
• The list of object types and their attributes comes from the reference repository. The reference
repository is the first repository selected by the user. If external sources only are selected, then the
list of object types in the current repository is used.
• The search components are versioned. If a request is made for a search component, the new
component is returned by default. If you customized a supported previous version of a Webtop search
component and extended it, your customization is used in place of the new search components.
• To configure basic search to perform a DQL query, create a modified JSP page. For information on
this configuration, see Modifying search component JSP pages, page 75.
Configuring the advanced search component
The data dictionary provides the following data to the search UI:
• The default and other searchable attributes for a given object type.
• The list of searchable types. The presets or configuration file filters the list.
• The default and other search operators for a given type and attribute.
• Value assistance values for "=" and "< >" search operations, if defined in the data dictionary.
The WDK search UI contains search controls. To control attribute values, extend a search component
and modify your custom search JSP page.
Setting the search type drop-down list
The includetypes element in the advsearch component definition configures the available search types
list. The includetypes list is comma-delimited. The descend attribute specifies whether subtypes are
included. Create your modification definition in custom/config. The following example displays
dm_folder and all of its subtypes including custom types that subtype dm_folder:
<component modifies="advsearch:webcomponent/config/library/search/searchex/advsearch_component.xml">
<replace path="includetypes">
<includetypes descend="true">dm_folder</includetypes>...
</replace>
</component>
The following illustration shows the type selection list set by includetypes with descend set to true.
The following example displays only two selections, because the descend parameter is set to false:
<includetypes descend="false">
dm_folder, my_type
</includetypes>
The following illustration shows the type selection list set by includetypes with descend set to false.
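The effect of the descend attribute can be sketched as a walk over the type hierarchy. The code below is an illustration, not WDK code: IncludeTypesSketch, resolve, and the in-memory hierarchy map are hypothetical, and the custom types my_folder, my_subfolder, and my_type are made-up names. In a real deployment the type hierarchy comes from the repository.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: expands a comma-delimited includetypes value.
class IncludeTypesSketch {

    /**
     * Returns the listed types, followed by all of their subtypes when
     * descend is true. The subtypes map links a type to its direct subtypes.
     */
    static List<String> resolve(String includeTypes, boolean descend,
                                Map<String, List<String>> subtypes) {
        Set<String> result = new LinkedHashSet<>();
        for (String raw : includeTypes.split(",")) {
            String type = raw.trim();
            if (type.isEmpty()) {
                continue;
            }
            result.add(type);
            if (descend) {
                addSubtypes(type, subtypes, result);
            }
        }
        return new ArrayList<>(result);
    }

    // Depth-first walk; the set prevents a type from being listed twice.
    private static void addSubtypes(String type, Map<String, List<String>> subtypes,
                                    Set<String> result) {
        for (String child : subtypes.getOrDefault(type, Collections.<String>emptyList())) {
            if (result.add(child)) {
                addSubtypes(child, subtypes, result);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical hierarchy with two custom folder types.
        Map<String, List<String>> hierarchy = new HashMap<>();
        hierarchy.put("dm_folder", Arrays.asList("dm_cabinet", "my_folder"));
        hierarchy.put("my_folder", Arrays.asList("my_subfolder"));

        System.out.println(resolve("dm_folder", true, hierarchy));
        System.out.println(resolve("dm_folder, my_type", false, hierarchy));
    }
}
```

With descend set to false, only the types named in the list are offered, which corresponds to the two-entry selection list described above.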
Providing conditional value assistance
Use individual searchattribute control tags to provide conditional value assistance. The default value
assistance must have no dependency on another attribute. Conditional value assistance depends on the
display order of the constraints in the JSP page, so you must display the controls in the dependency
order. The searchattributegroup tag provides only simple attribute assistance unless the constraints are
entered in the correct order.
The lists of conditional values are set in Documentum Composer. Query value assistance can use a
reference ($value(attribute)), for example:
SELECT "MyDocbase"."MyTable"."MyColumn1" FROM "MyDocbase"."MyTable"
WHERE "MyDocbase"."MyTable"."MyColumn2" = '$value(MyAttribute)'
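To show what a $value(attribute) reference amounts to, the following sketch substitutes the value currently selected for the referenced attribute into a query template. It is a plain string-processing illustration under stated assumptions, not the mechanism WDK or Content Server uses: ValueReferenceSketch and resolve are hypothetical names, and doubling single quotes is an assumed way to keep the value inside the string literal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: resolves $value(attribute) references in a query template.
class ValueReferenceSketch {
    private static final Pattern REFERENCE = Pattern.compile("\\$value\\((\\w+)\\)");

    /** Replaces each $value(name) with the value currently chosen for that attribute. */
    static String resolve(String template, Map<String, String> currentValues) {
        Matcher matcher = REFERENCE.matcher(template);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            String attribute = matcher.group(1);
            String value = currentValues.get(attribute);
            if (value == null) {
                throw new IllegalArgumentException("No value selected for " + attribute);
            }
            // Double any single quote so the value stays inside the literal.
            String escaped = value.replace("'", "''");
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(escaped));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }

    public static void main(String[] args) {
        Map<String, String> selected = new HashMap<>();
        selected.put("MyAttribute", "Sedan"); // hypothetical user selection
        String template = "SELECT \"MyDocbase\".\"MyTable\".\"MyColumn1\" "
                + "FROM \"MyDocbase\".\"MyTable\" "
                + "WHERE \"MyDocbase\".\"MyTable\".\"MyColumn2\" = '$value(MyAttribute)'";
        System.out.println(resolve(template, selected));
    }
}
```

Because the dependent query needs the referenced attribute to have a value first, the sketch fails when no value is selected, which mirrors why the controls must appear in dependency order.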
The following example lists four attributes, three of which have conditional value assistance lists that
were set up in Documentum Composer. The drop-down list for Make determines the list available for
Model. The drop-down lists Fuel and Year both depend on Model.
This UI was generated from the following set of controls in the JSP page:
<tr>
<td>Make:</td>
<td><dmfxs:searchattribute name='make' attribute="make"/></td>
</tr>
<tr>
<td>Model:</td>
<td><dmfxs:searchattribute name='model' attribute="model"/></td>
</tr>
<tr>
<td>Year:</td>
<td><dmfxs:searchattribute name='year' attribute="year"/></td>
</tr>
<tr>
<td>Fuel:</td>
<td><dmfxs:searchattribute name='fuel' attribute="fuel"/></td>
</tr>
Configuring the savesearch component
Searches are saved as smartlist objects. Saved searches save the display configuration as well as the
query, and the user has the option of saving query results with the query. Users can revise a saved
search using the advanced search component.
Smartlists created with Documentum Desktop can be executed or edited in the advanced search UI.
After editing, they can no longer be used in Desktop. Smartlists that are created in WDK applications
cannot be used or edited in Desktop.
The savesearch component displays checkboxes that allow the user to save search results with a search
and to make the saved search public. These two features can be removed by setting the value of the
configuration element enablesavingsearchresults to false. The following example in a modification
file removes these two checkboxes:
<component modifies="savesearch:webcomponent/config/library/savesearch/savesearchex/savesearch_component.xml">
<replace path="enablesavingsearchresults">
<enablesavingsearchresults>false</enablesavingsearchresults>
</replace></component>...
The configuration element <includeresults> specifies whether to save results with a search.
Configuring search results
You can configure the maximum number of search results and turn off term hit highlighting. After
you have made custom types and their attributes available for search, you can configure the display of
custom attributes in the search results. You can configure the display_preferences component to allow
users to configure their preferences for displaying custom attributes.
The maximum number of search results, globally and per source, is configured in dfc.properties.
The maximum number of search results is specified as the value of dfc.search.max_results (was
maxresults_per_source in 5.3.x). The maximum number of results per source is specified as the
value of dfc.search.max_results_per_source. For example, you have specified a maximum of 1000
results and a maximum per source of 500. Results are accumulated from each source until the source
maximum of 500 is reached or until the global maximum of 1000 is reached.
Note: These settings can affect performance. Setting the value too high can overload xPlore, and
setting it too low can frustrate users. Evaluate the best settings for your environment.
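The interaction of the two limits can be illustrated with a short calculation. The sketch below is not DFC code: ResultCapSketch and accumulate are hypothetical names, the source names and result counts are made up, and it assumes for simplicity that sources are drained one after another, whereas a federated search may interleave them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: applies a global cap and a per-source cap to result counts.
class ResultCapSketch {

    /**
     * Returns how many results are kept from each source, given the number
     * available per source, the global maximum, and the per-source maximum.
     */
    static Map<String, Integer> accumulate(Map<String, Integer> available,
                                           int maxResults, int maxResultsPerSource) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        int total = 0;
        for (Map.Entry<String, Integer> source : available.entrySet()) {
            int remainingGlobal = Math.max(0, maxResults - total);
            int fromSource = Math.min(source.getValue(), maxResultsPerSource);
            int taken = Math.min(fromSource, remainingGlobal);
            kept.put(source.getKey(), taken);
            total += taken;
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical sources and the number of hits each could return.
        Map<String, Integer> available = new LinkedHashMap<>();
        available.put("repositoryA", 800);
        available.put("repositoryB", 300);
        available.put("externalSource", 700);

        // Limits from the example above: 1000 overall, 500 per source.
        System.out.println(accumulate(available, 1000, 500));
    }
}
```

In this made-up case the first source is cut at the per-source limit, the second contributes everything it has, and the third is cut by what remains of the global limit.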
Term hit highlighting (highlighting of the search term in the results) can be set as a user preference.
The default value is set as the value of the element highlight_matching_terms in the search component
definition, which is located in webcomponent/config/library/search/searchex. If you are customizing
Webtop or an application that extends Webtop, add a highlight_matching_terms element to the
top-level search component definition.
Configuring the display of attributes in search results
Default search result columns are configured as column elements in the basic search configuration file
search60_component.xml in webcomponent/config/library/search/searchex. Only attributes marked as
searchable in the data dictionary can be specified as columns. Users can set a preference for search
results columns in the display_preferences component, which then overrides the default settings in
the configuration file.
To define default visible columns for custom attributes, your custom search component definition must
specify a scope for the custom type. For example, the user selects a custom type for the advanced
search. The columns specified in your scoped basic search component are displayed in the results.
Details of the columns configuration can be found in the EMC Documentum Web Development Kit
Reference Guide.
In the following simple configuration, the definition extends the WDK search component definition
and adds some custom attribute columns:
<scope type='technical_publications_web'>
  <component modifies="search:webcomponent/config/library/search/searchex/search60_component.xml">
    <insert path='columns_list'>
      <column>
        <attribute>tp_edition</attribute>
        <label>Edition</label>
        <visible>true</visible>
      </column>
      <column>
        <attribute>tp_web_viewable</attribute>
        <label>OK to display</label>
        <visible>true</visible>
      </column>
    </insert>
  </component>
</scope>
</config>
The user can select attributes for display in search results, which overrides the default display. The
preferences UI allows users to specify the attributes that are displayed for specific object types. If
the user configures different display columns, the query is not reissued. The new column data is not
displayed until the search is performed again. For example, calculated columns such as score or
summary do not display any values unless they are selected before the query is run.
Modify the definition for the display_preferences component to make columns of your custom type
available to users for display. To make a custom type available in preferences:
1. Modify the display_preferences component in your custom/config directory:
<component modifies="display_preferences:webtop/config/display_preferences_ex_component.xml">
2. Add your custom type to the <display_docbase_types> element. For example:
<insert path='preferences.display_docbase_types'>
<docbase_type>
<value>my_custom_type</value>
<label>My type</label>
</docbase_type>
</insert>
3. Save this file and refresh the configuration files on the application server by navigating to
wdk/refresh.jsp.
To make a calculated attribute available in search results:
1. Extend the Search60 class in the package com.documentum.webtop.webcomponent.search.
2. Override the initAttributes method and add your computed attribute. The following example
adds the "myComputed" attribute:
protected void initAttributes()
{
    // Register the computed attribute as mandatory before the parent initializes columns
    List<String> mandatoryAttrs = getAttributesManager().getMandatory();
    mandatoryAttrs.add("myComputed");
    getAttributesManager().setMandatory(mandatoryAttrs);
    super.initAttributes();
}
3. Extend the search component definition to use your custom class, and scope it to your custom type.
Set the class to use the custom class.
Tuning results performance
To enhance query performance, turn off the display of the results folder path. The value of
displayresultspath in webcomponent/config/library/search/searchex/search60_component.xml is set to
false.
The summary column is calculated, which can add to query overhead. Turn off the summary
column by extending the Webtop search component searchex_component.xml, which is
located in webtop/config. Copy the columns_XXX elements (columns_drilldown, columns_list,
and columns_saved_search) from the parent configuration file search60_component.xml
in webcomponent/config/library/search/searchex. In each of the columns elements, set
the value of column.attribute.visible for the summary attribute to false. Set the value of
columns_XXX.loadinvisibleattribute to false to ensure that the column is not calculated.
Configuring Webtop Federated Search clustering
Install the Webtop Federated search clustering DAR file in the global registry to support clustering of
search results in groups based on their attribute values. Define the strategies including default strategies
in clusterstrategies_config.xml, which is located in the wdk/config of the WDK-based application.
The clusterStrategy element defines each cluster strategy. This element contains one or more attributes
specified as the value of the criterion child element. The clusterTree element governs the display. Its
child elements primary and secondary have values that correspond to the IDs of strategies.
Tokenizers split attribute strings into chunks that are then used as clusters. Only one tokenizer is
associated with an attribute. The default tokenizer is text, and other tokenizers are defined to tokenize
on number, author and date. Tokenizers are part of the clustering SBO.
You can add, remove, or change a strategy definition or add, remove, or change the strategies that are
displayed in the default cluster tree. Users can change these defaults in their search preferences.
To add a strategy definition:
1. Create a file clusterstrategies_modifications.xml in custom/config.
2. Add the opening and closing declarations:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<config>
<scope>
</scope>
</config>
3. Within the scope element, add the following element that specifies the primary element you are
modifying and the file in which it exists:
<clusterStrategies modifies="clusterStrategies:wdk/config/clusterstrategies_config.xml">
</clusterStrategies>
4. Within the clusterStrategies element, insert the new strategy that will cluster results for a certain
attribute.
This example creates a cluster for the keywords attribute:
<insert>
<clusterStrategy id="keywords" nlsid="MSG_KEYWORDS"
icon="cluster/ranking.gif" threshold="5">
<criterion>keywords</criterion>
</clusterStrategy>
</insert>
Note: If you provide an nlsid value, you must have a corresponding string in
clusterstrategiesNlsProp.properties. The icon path is relative to the theme folder icons/browsertree
directory in the application. The threshold specifies the minimum number of documents for which
to display the cluster.
5. Refresh the configurations in memory by navigating to wdk/refresh.jsp or restart the application
server.
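The roles of the tokenizer and the threshold attribute can be pictured with a small stand-alone sketch. It is not the clustering SBO: the class name ClusterThresholdSketch is hypothetical, and splitting on whitespace is only an assumed stand-in for the default text tokenizer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of threshold filtering; not the clustering SBO.
class ClusterThresholdSketch {

    // Groups document names by each whitespace-separated token of an attribute
    // value, then drops clusters that hold fewer than `threshold` documents.
    static Map<String, List<String>> cluster(Map<String, String> keywordsByDoc, int threshold) {
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : keywordsByDoc.entrySet()) {
            for (String token : entry.getValue().trim().toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // an empty attribute value yields no cluster
                }
                List<String> docs = clusters.computeIfAbsent(token, k -> new ArrayList<>());
                if (!docs.contains(entry.getKey())) {
                    docs.add(entry.getKey());
                }
            }
        }
        clusters.values().removeIf(docs -> docs.size() < threshold);
        return clusters;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "search index");
        docs.put("doc2", "search");
        docs.put("doc3", "index tuning");
        System.out.println(cluster(docs, 2));
    }
}
```

With a threshold of 2, a token that occurs in only one document does not produce a cluster, which mirrors the described effect of the threshold attribute on the cluster tree.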
To display a new strategy in the default cluster tree:
1. In the modifications file you created that contains the new strategy, add the following child element
to scope (sibling to clusterStrategies):
<clusterTreeGroup modifies="clusterTreeGroup:wdk/config/clusterstrategies_config.xml">
<insert>
<clusterTree>
<primary>keywords</primary>
<secondary>topic</secondary>
</clusterTree>
</insert>
</clusterTreeGroup>
Modifying search component JSP pages
Changes to JSP pages are considered to be customizations. The following examples extend Webtop
search component definitions and specify a custom JSP page in which to make customizations.
Performing a DQL query
The basic search component can perform a DQL query. Basic search is launched from the titlebar
component. This example replaces basic search. You can add a button in the titlebar that launches
a DQL query, leaving basic search intact. If you add a new button, as shown in the example, add a
JavaScript event handler to launch your DQL query.
1. Create an XML modification file in /custom/config with the following contents:
<scope>
<component modifies="titlebar:webtop/config/titlebar_component.xml">
<replace path="pages.start">
<start>custom/titlebar/titlebar.jsp</start>
</replace>
</component></scope></config>
2. Copy titlebar.jsp from webtop/titlebar to custom/titlebar. (Create this target directory if it does not
yet exist.)
3. Open titlebar.jsp in custom/titlebar and find the JavaScript function onClickSearch. Within the
function, find the following line:
postComponentJumpEvent(null, "search", "content", "query", strValue);
In this call to the basic search component, you change the query type to "dql" and the value to
the DQL string.
4. Add a query and change the query type in the onClickSearch JavaScript function, like the
following. (This example does a wildcard search with the input string.)
function onClickSearch()
{
    var contentPage = eval(getAbsoluteFramePath("content"));
    if (contentPage != null)
    {
        var text = document.getElementById("txtSearch");
        callBlur(text);
        var strValue = text.value;
        if (strValue != "" && strValue != "<%=strSearch %>")
        {
            // build a wildcard DQL query from the input string
            var strDQL = "select * from dm_document where upper(object_name) "
                + "like '%" + strValue.toUpperCase() + "%'";
            postComponentJumpEvent(null, "search", "content", "queryType",
                "dql", "query", strDQL);
            if (typeof text.autoComplete != "undefined" &&
                text.autoComplete != null)
            {
                // add the search string to the client-side auto-complete suggestions
                text.autoComplete.addEntry(strValue);
                var prefs = InlineRequestEngine.getPreferences(
                    InlineRequestType.JSON);
                prefs.setCallback("onUpdateACCallBack");
                postInlineServerEvent(null, prefs, null, null,
                    "onUpdateAutoCompleteData", null, null);
            }
        }
    }
}
Setting the default search type
To set the default search type, supply your preferred type in the JavaScript function that calls
the advanced search container. In Webtop, titlebar.jsp calls advanced search. Extend the titlebar
component and provide the following postComponentNestEvent calls in the onClickAdvancedSearch
JavaScript function. Substitute your custom type (in quotation marks) for custom_type:
postComponentNestEvent(null, "advsearchcontainer", "content", "component",
"advsearch", "type", custom_type, "usepreviousinput", "false", "query", strValue);
...
postComponentNestEvent(null, "advsearchcontainer", "content", "component",
"advsearch", "type", custom_type, "usepreviousinput", "true");
This example uses simple DQL. You can take content from the user for a DQL search and construct the
DQL on the fly as shown in the following example.
Displaying specific attributes for search
You can specify attributes for your search rather than rely on passive generation by the
searchattributegroup control. In the following example of a custom advsearch component, specific
attribute controls have replaced the searchattributegroup control in the JSP page:
...
<dmfxs:searchobjecttypedropdownlist name='objecttypectrl'.../></td></tr>
<tr><td colspan='2' class='spacer' height='10'> </td></tr>
<tr>
  <td align=right valign=top nowrap><dmf:label label='Name' cssclass="fieldlabel"/></td>
  <td align=left valign=top nowrap>
    <dmfxs:searchattribute name='searchname' attribute='object_name'
      andorvisible="false" removable="false">
    </dmfxs:searchattribute>
  </td>
</tr>
<tr>
  <td align=right valign=top nowrap><dmf:label label='Type' cssclass="fieldlabel"/></td>
  <td align=left valign=top nowrap>
    <dmfxs:searchattribute name='searchtype' attribute='r_object_type'
      andorvisible="false" removable="false">
    </dmfxs:searchattribute>
  </td>
</tr>...
Note: Set the andorvisible and removable attributes to false on the searchattribute control.
Before this customization, the user must select properties from a dropdown:
To display specific custom attributes as individual search criteria, extend the advanced search
component. Scope the definition to your custom type and provide a custom JSP page. In that page,
add attribute controls for your attributes. When the user selects the custom type, the configuration
service reads the scoped definition. The custom JSP page with custom attributes is displayed, like the
following:
After customization, the UI shows the individual attributes "Name" and "Type" as search criteria:
Specific attributes as search criteria
Enabling fragment search (wildcard support)
Starting with DFC 7.0 and xPlore 1.3, the support of fragment search using wildcards has changed.
The default behavior in xPlore matches that of commonly used search engines. Wildcard (fragment)
search is not performed in a full-text search unless the user adds an explicit wildcard. This provides
faster, more precise search results than a fragment search. The EMC Documentum xPlore Administration
and Development Guide provides information on the default support and wildcard configuration.
Modifying a search component query
You can access a query before it is submitted and modify it in various ways. The query is accessible by
overriding the initSearch() method of the Search60 class. Your custom class must extend the Webtop
version of either the Search60 or AdvSearchEx component class.
The following methods in the basic search component class Search60 provide customization points:
• initSearch(arg): Override to modify queries before execution
• initControls(arg): Override to update custom controls
• initAttributes(): Override to perform specific treatment for columns. Use getAttributesManager() to
manipulate columns and query attributes
• initResultsSet(): Override to manipulate the results that are fed to the datagrid
• initSearchExecution(): Start the actual query execution
Adding a WHERE clause to simple search
To add a WHERE clause to the query in simple search, extend Search60 in the package
com.documentum.webtop.webcomponent.search. You can add criteria other than keywords to the
initSearch method. Overriding buildQuery instead can break smartlist usage. The following example
adds an AND clause to a query. The query requires a specific string in the r_modifier attribute of the
object, in addition to the criteria in the simple search text box.
First, create your search component definition in custom/config as follows:
<scope>
<component modifies="search:webtop/config/search60_component.xml">
<replace path=’class’>
<class>com.mycompany.SearchEx</class>
</replace>
</component>
</scope>
</config>
Next, create your custom class that extends Search60 and overrides initSearch():
package com.mycompany;
import com.documentum.fc.client.search.IDfExpressionSet;
import com.documentum.fc.client.search.IDfQueryBuilder;
import com.documentum.fc.client.search.IDfSimpleAttrExpression;
import com.documentum.fc.common.IDfValue;
import com.documentum.web.common.ArgumentList;
import com.documentum.webcomponent.library.search.SearchInfo;
public class SearchEx extends com.documentum.webtop.webcomponent.search.Search60
{
    protected void initSearch (ArgumentList args)
    {
        super.initSearch(args);
        String queryType = args.get(ARG_QUERY_TYPE);
        // Only modify simple (string) searches
        if ((queryType == null) || (queryType.length() == 0) ||
            (queryType.equals("string")))
        {
            SearchInfo info = getSearchInfo();
            IDfQueryBuilder qb = info.getQueryBuilder();
            IDfExpressionSet rootSet = qb.getRootExpressionSet();
            IDfExpressionSet setAnd = rootSet.addExpressionSet
                (IDfExpressionSet.LOGICAL_OP_AND);
            setAnd.addSimpleAttrExpression("r_modifier", IDfValue.DF_STRING,
                IDfSimpleAttrExpression.SEARCH_OP_CONTAINS, true, false, "tuser");
        }
    }
}
This example adds an AND criterion in which the modifier attribute must contain the user name
"tuser". Before the customization, a search on the string "Target" in the simple search box returns
three results as shown here:
After customization, only a single result, in which the object name contains "Target" and the user name
contains "tuser", is returned. (User name is displayed in the second column, as "Modifier.")
With IDfExpressionSet, you can add the following operators: LOGICAL_OP_AND,
LOGICAL_OP_DEFAULT (default operator in data dictionary), and LOGICAL_OP_OR. The
following expressions, also called predicates, are available for IDfSimpleAttrExpression (names
are self-explanatory):
SEARCH_OP_BEGINS_WITH
SEARCH_OP_CONTAINS
SEARCH_OP_DOES_NOT_CONTAIN
SEARCH_OP_ENDS_WITH
SEARCH_OP_EQUAL
SEARCH_OP_GREATER_EQUAL
SEARCH_OP_GREATER_THAN
SEARCH_OP_IS_NOT_NULL
SEARCH_OP_IS_NULL
SEARCH_OP_LESS_EQUAL
SEARCH_OP_LESS_THAN
SEARCH_OP_NOT_EQUAL
The following expression is available for IDfValueRangeAttrExpression:
SEARCH_OP_BETWEEN
The following expressions can be used with IDfValueListAttrExpression:
SEARCH_OP_IN
SEARCH_OP_NOT_IN
Setting exact match
When you use IDfQueryBuilder to build the query, you can call the IDfSimpleAttrExpression method
setExactMatchEnabled(boolean) to turn off lemmatization, stop words, thesaurus, fuzzy search, and
wildcards.
Adding a WHERE clause to advanced search
In advanced search, you override buildQuery to access the user query. The search class is as follows:
package com.mycompany;
import com.documentum.fc.common.IDfValue;
import com.documentum.fc.client.search.IDfSimpleAttrExpression;
import com.documentum.fc.client.search.IDfExpressionSet;
import com.documentum.fc.client.search.IDfQueryBuilder;
public class AdvSearchEx extends
    com.documentum.webtop.webcomponent.advsearch.AdvSearchEx
{
    protected IDfQueryBuilder buildQuery() throws Exception
    {
        IDfQueryBuilder qb = super.buildQuery();
        IDfExpressionSet rootSet = qb.getRootExpressionSet();
        IDfExpressionSet setAnd = rootSet.addExpressionSet
            (IDfExpressionSet.LOGICAL_OP_AND);
        setAnd.addSimpleAttrExpression("object_name", IDfValue.DF_STRING,
            IDfSimpleAttrExpression.SEARCH_OP_CONTAINS, true, false, "xpath");
        return qb;
    }
}
Changing the query source
You can change the query location, including the source and the folder path in the repository, with the
query builder APIs. The following example adds a source repository to an IDfQueryBuilder instance and
sets a path within the repository for the query. The examples for basic and advanced search show you
how to get the query builder instance (variable qb in this example):
qb.clearSelectedSources();
qb.addSelectedSource("dm_notes");
// set source, path, descend flag
qb.addLocationScope("dm_notes", "/Temp", false);
The resulting query is like the following:
SELECT r_object_id,text,object_name FROM dm_document
SEARCH DOCUMENT CONTAINS 'testing' WHERE (object_name
LIKE '%testing%' ESCAPE '\') AND FOLDER('/Temp') AND (a_is_hidden = FALSE)
Hiding the customization from query editing
If you have intercepted and modified a query after form submit, the hidden query processing will
be displayed when the user tries to modify the query. To hide the custom modification, add the
usepreviousinput parameter in the call to the advanced search component. Modify the titlebar
component definition to use your own titlebar.jsp page as follows:
<component modifies="titlebar:webtop/config/titlebar_component.xml">
<replace path="pages.start">
<start>/custom/titlebar/titlebar.jsp</start>
</replace></component>
In your custom titlebar JSP page, change the call to the advanced search component to set
usepreviousinput to false:
postComponentNestEvent(null, "advsearchcontainer", "content", "component", "advsearch",
"type", "dm_sysobject", "usepreviousinput", "false");
Programmatic search value assistance
Data dictionary value assistance is available in advanced search. If you have not defined
value assistance for an attribute in the repository data dictionary, you can add value assistance
programmatically. Define a custom tag handler to render the value assistance values. The tag handler
is specified in the search configuration file advsearchex.xml as follows:
<searchvalueassistance>
<attribute_type_name>
fully_qualified_class_name
</attribute_type_name>
</searchvalueassistance>
When the user selects an attribute for search, the values in the criteria dropdownlist control are filled
by the custom tag class. To add your own custom tag class, copy the file wdk/advsearchex.xml to
custom/config and add your handlers to the <searchvalueassistance> element. Your tag handler must
implement ISearchAttributeValueTag.
Note: Do not delete the Documentum value assistance handlers. The entire contents of the
<searchvalueassistance> element override the contents of that element in the WDK version of this file.
The following tag handlers render values for certain attributes. The handler classes are in
com.documentum.web.formext.control.docbase.search.
• BooleanVATag
Provides values for any Boolean attribute
• ContentTypeVATag
Provides valid a_content_type (dm_format) names and descriptions
• ExistingValueVATag
Uncomment this tag and specify an attribute for which to populate the drop-down list with all
existing values for the selected object type
• ObjectTypeVATag
Populates the search object type drop-down list with available object types
• PermissionVATag
Provides possible permission values (none, browse, read, relate, version, write, delete) for setting
world_permit, group_permit, and owner_permit attributes
• SearchMetaDataVATag
Gets attribute names, default value, and description for each attribute. This handler is for internal
use only.
Your tag class must extend the abstract class SearchVADropDownListTag and
implement ISearchAttributeValueTag. For example, the BooleanVATag class implements
populateValueDropDownList to provide the two Boolean values:
protected void populateValueDropDownList(SearchDropDownList ddList)
{
    Option optionTrue = new Option();
    optionTrue.setValue("1");
    optionTrue.setLabel(SearchControl.getString("MSG_TRUE", ddList));
    ddList.addOption(optionTrue);
    Option optionFalse = new Option();
    optionFalse.setValue("0");
    optionFalse.setLabel(SearchControl.getString("MSG_FALSE", ddList));
    ddList.addOption(optionFalse);
}
Chapter 5
Configuring CenterStage Search
• Set Federated Search Services options
• Improving search performance
Set Federated Search Services options
Federated search is available if your organization has enabled the connection with the Federated
Search Services (FS2) server. Federated search allows users to search external and internal sources at
the same time and display all results consistently. This section briefly describes the main steps to add
and configure external sources. For more information on FS2, see the EMC Documentum Federated
Search Services 6.6 Administration Guide, available within the CenterStage product on EMC Online
Support (https://support.emc.com).
You manage external sources using the Admin Center FS2 administration tool. Each external source
in CenterStage is an information source in Admin Center. An information source relies on an
adapter bundle (available as a *.jar file) and a specific configuration. Some information sources can
be available with a default configuration because they correspond to public information sources.
For example, the information sources Google, Wikipedia, OpenDirectory, and YahooDirectory are
already configured and available in CenterStage. Other information sources require configuration
before being available to users.
The following adapter bundles are available out-of-the-box with FS2:
• EMC Documentum ECM (Enterprise Content Management)
• EMC Documentum eRoom
• EMC Documentum ApplicationXtender
• EMC Documentum EmailXtender
• EMC SourceOne
• JDBC/ODBC
• Google Desktop Enterprise
• Windows Search
• OpenSearch
• FS2 Indexing for shared drives
The configuration of each adapter is described in the EMC Documentum Federated Search Services
Adapter Installation Guide.
FS2 Admin Center can be accessed using a URL such as:
https://<FS2_server_host>:<Admin_Center_port_number>/AdminCenter
where <FS2_server_host> is the name or the IP address of the FS2 server, and
<Admin_Center_port_number> is set to 3003 by default.
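As a minimal sketch, the Admin Center address can be assembled from the host and port as follows. The class name AdminCenterUrlSketch and the host fs2.example.com are hypothetical placeholders; only the https scheme, the default port 3003, and the /AdminCenter path come from the text above.

```java
import java.net.URI;

// Hypothetical helper that assembles the FS2 Admin Center address.
class AdminCenterUrlSketch {

    // Default Admin Center port stated in this guide.
    static final int DEFAULT_ADMIN_CENTER_PORT = 3003;

    static URI adminCenterUri(String host, int port) {
        // URI.create throws IllegalArgumentException if the result is malformed.
        return URI.create("https://" + host + ":" + port + "/AdminCenter");
    }

    public static void main(String[] args) {
        // fs2.example.com is a placeholder host name.
        System.out.println(adminCenterUri("fs2.example.com", DEFAULT_ADMIN_CENTER_PORT));
    }
}
```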
Use FS2 Admin Center to perform the following administration tasks:
• Add information sources
• Upload new bundles
• Configure and test the adapters
• Set the authentication mode for the information sources: public access, corporate account (same
account shared by all users), and user account
Improving search performance
Due to the high number of available formats in the repository, searches perform poorly when the user
selects formats in the format filter. To improve search performance, configure the format filter to
ignore the formats that are not used. You can restore the filters at any time. You ignore a format by
setting the format_class attribute to kw_ignore in the formats table.
Ignoring some formats also reduces the list of possible formats in the Others format filter, which
can be a long list.
To ignore a format:
1. In DA, open the DQL editor.
2. Run the following DQL query to get the list of available formats in the repository:
SELECT name, mime_type, description FROM dm_format WHERE NOT ANY
format_class='kw_ignore' ORDER BY name
3. Run the following DQL query, where xyz is the format to ignore:
UPDATE dm_format OBJECTS APPEND format_class='kw_ignore' WHERE
"name" = 'xyz'
4. Restart the application server to clean the cache of the formats table.
To restore a format:
1. In DA, open the DQL editor.
2. Run the following DQL query, where xyz is the format to restore:
UPDATE dm_format OBJECTS REMOVE format_class[0] WHERE "name" = 'xyz'
Index [0] is used if there was no value already set for the repeating attribute format_class.
Otherwise, check for the right index.
3. Restart the application server to clean the cache of the formats table.
Chapter 6
Troubleshooting
• Troubleshooting Search
• Problem queries
• Debugging
Troubleshooting Search
Set the xPlore search service log level to WARN to log queries. If query auditing is enabled (the
default), you can view or edit reports on queries. Refer to EMC Documentum xPlore Administration
and Development Guide for more information.
For performance-related configuration, refer to EMC Documentum xPlore Administration and
Development Guide.
Inconsistent results between database and full-text queries
Some queries generate different results when they are executed as a full-text query than when they are
executed as a database query. Possible reasons for this problem are discussed in the following topics.
Document too large to be indexed
You can set a maximum size for content that is indexed by CPS. You set the actual document size,
not the size of the text within the content. To set the maximum content size, edit the index agent
configuration file. For more information, refer to EMC Documentum xPlore Administration and
Development Guide.
You can configure xPlore CPS to change the maximum text size within a document, or change the
thread pool size. You can also add a separate CPS instance that is dedicated to processing. This
processor does not interfere with query processing. For more information, refer to EMC Documentum
xPlore Administration and Development Guide.
Verifying the query plugin
Check the Content Server log after you start the Content Server. The file repository_name.log is
located in $DOCUMENTUM/dba/log. Look for a line like the following, which references a plugin
with DSEARCH in the name:
Mon Jun 14 21:53:50 2010 031000 [DM_FULLTEXT_T_QUERY_PLUGIN_VERSION]info:
"Loaded FT Query Plugin: ...C:\Documentum\product\6.5/bin/DSEARCHQueryPlugin.dll...
The Content Server query plugin properties of the dm_ftengine_config object are set during xPlore
configuration. If you have changed one of the properties, like the primary xPlore host, the plugin can
fail. Verify the plugin properties, especially the qrserverhost, with the following DQL:
1> select param_name, param_value from dm_ftengine_config
2> go
You see specific properties like the following:
param_name                   param_value
dsearch_qrygen_mode          both
fast_wildcard_compatible     true
query_plugin_mapping_file    C:\Documentum\fulltext\dsearch\dm_AttributeMapping.xml
dsearch_domain               DSS_LH1
dsearch_qrserver_host        Config8518VM0
dsearch_qrserver_port        9300
dsearch_qrserver_target      /dsearch/IndexServerServlet
Indexing latency
Latency is the time interval between two events. In the context of searching, the delay between a
change in the repository and the corresponding update to the index can cause inconsistent results.
For example, the following situations generate latency periods that result in inconsistent results:
• An object was deleted in the repository but that deletion is not yet reflected in the index
In this case, a query against the index returns a result, whereas the same query against the repository
does not.
• An object was added to the repository but is not yet added to the index
In this case, a query against the repository returns the result, whereas the same query against the
index does not.
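The two situations amount to set differences between the object IDs known to the repository and those known to the index. The following sketch is only an illustration with plain Java sets. The class name IndexLatencySketch and the sample IDs are hypothetical, and no Content Server or xPlore API is involved.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of the two latency situations as set differences.
class IndexLatencySketch {

    // IDs the index still returns although the repository no longer holds them
    // (deletion not yet reflected in the index).
    static Set<String> staleInIndex(Set<String> repositoryIds, Set<String> indexIds) {
        Set<String> stale = new TreeSet<>(indexIds);
        stale.removeAll(repositoryIds);
        return stale;
    }

    // IDs present in the repository that the index has not picked up yet
    // (addition not yet reflected in the index).
    static Set<String> notYetIndexed(Set<String> repositoryIds, Set<String> indexIds) {
        Set<String> pending = new TreeSet<>(repositoryIds);
        pending.removeAll(indexIds);
        return pending;
    }

    public static void main(String[] args) {
        Set<String> repository = Set.of("0901", "0902", "0904"); // sample IDs
        Set<String> index = Set.of("0901", "0902", "0903");
        System.out.println("Stale in index:  " + staleInIndex(repository, index));
        System.out.println("Not yet indexed: " + notYetIndexed(repository, index));
    }
}
```

Once indexing catches up, both differences become empty and the two kinds of query agree again, at least with respect to latency.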
Lemmatization differences
The full-text engine uses lemmatization (grammatical normalization) when conducting a search.
Database searches do not support lemmatization. Content Server only returns exact matches. This
means that the same query, run against the index and run again against the database can return
different numbers of results.
Case sensitivity differences
Searches on the full-text index are not case sensitive. Searches in the database are case sensitive by
default. This difference can cause queries to return different numbers of results. For example, suppose
you issue the following query:
SELECT object_name,object_owner,title FROM dm_document
WHERE subject = 'bread' ENABLE(FTDQL)
The example query runs as a full-text query. This query returns all objects whose subject is 'bread',
'Bread', 'bRead', or any other combination of uppercase and lowercase letters that spell bread. If the
query is run with the ENABLE(NOFTDQL) hint, it runs against the database. In that case, the query
returns only those objects whose subject is 'bread', all lowercase.
If you want to run that query against the database and in a case-insensitive manner, you could use
the upper (or lower) function:
SELECT object_name,object_owner,title FROM dm_document
WHERE UPPER(subject) = UPPER('bread')
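The difference between the exact database comparison and the UPPER() workaround can be mimicked with ordinary string comparison. This sketch is an analogy only: the class name CaseMatchSketch and the sample subject values are hypothetical, and the code runs no DQL.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical analogy for case-sensitive versus case-insensitive matching.
class CaseMatchSketch {

    // Mirrors subject = 'bread' run against the database: exact, case-sensitive.
    static List<String> exactMatches(List<String> subjects, String term) {
        return subjects.stream()
                .filter(s -> s.equals(term))
                .collect(Collectors.toList());
    }

    // Mirrors UPPER(subject) = UPPER('bread'): both sides are uppercased first.
    static List<String> upperMatches(List<String> subjects, String term) {
        String wanted = term.toUpperCase(Locale.ROOT);
        return subjects.stream()
                .filter(s -> s.toUpperCase(Locale.ROOT).equals(wanted))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> subjects = List.of("bread", "Bread", "bRead", "butter");
        System.out.println("Exact: " + exactMatches(subjects, "bread"));
        System.out.println("Upper: " + upperMatches(subjects, "bread"));
    }
}
```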
Problem queries
A query can have the following problems:
• Foreign language not identified
The first language that is identified is associated with the document for indexing. Other language
content might not be properly indexed. Queries issued from Documentum clients are searched in the
language of the session_locale. The search client can set session locale through DFC or iAPI.
• Query is unselective
A query is unselective when it searches for a property value that is common among the objects in
the repository. For example, the following query is unselective if the specified property value is
common:
SELECT object_name, object_owner FROM dm_sysobject
WHERE a_storage_type = "engrfilestore" ENABLE(FTDQL)
If engrfilestore is the default file store for sysobjects, this query finds many objects but not the
object the user is searching for.
• Search contains a wildcard
Wildcards match separate terms, not fragments of a term. Fragment search support can be turned
on in xPlore, but it causes slower performance. For details, refer to EMC Documentum xPlore
Administration and Development Guide.
Wildcards are supported in attribute searches. The operator * matches 0 or more characters.
• Query for a specific folder
Folder descend query performance can depend on folder hierarchy and data distribution across
folders. The following conditions can degrade query performance:
– Many folders, and a large portion of them are empty
Increase folder_cache_limit in the dm_ftengine_config object.
– The search predicate is unselective but the folder constraint is selective
Decrease folder_cache_limit in the dm_ftengine_config object.
The folder_cache_limit setting in the dm_ftengine_config object specifies the maximum number
of folder IDs probed. Default is 2000. If the folder descend condition evaluates to less than the
folder_cache_limit value, then folder IDs are pushed into the index probe. If the condition exceeds
the folder_cache_limit value, the folder constraint is evaluated separately for each result.
• Search for XML elements
By default, XML content of an input document is not indexed. You can change XML indexing in the
xml-content element of the xPlore configuration file, indexserverconfig.xml. For more information,
refer to EMC Documentum xPlore Administration and Development Guide.
• Document indexed but term not found
Because lemmatization is context-based, a word is tokenized differently depending on its context in a
sentence, yielding variable results. For example, saw is lemmatized to the verb to see or to the noun
saw depending on the context. A query sometimes does not have enough context to determine which
of these bases is required. In another example, the noun swimming is not lemmatized to the related
verb to swim. A search for swimming does not return documents containing swim. (Alternative
lemmas solve this issue: both lemmas are saved for ambiguous contexts.) Lemmatization of queries
is more prone to error because less context is available in comparison to indexing. See EMC
Documentum xPlore Administration and Development Guide.
• Query contains special characters
A search for a string containing special characters is treated as a phrase search. For example, when
home_base is indexed, home and base are stored next to each other. A search for home_base finds
the containing document but does not find other documents containing home or base but not both.
Another example is a list of names containing White,Jim. This list is tokenized as "White,Jim"
because the comma is treated as a context character. A search for "White" does not return
this document. You can configure the special characters list to remove the comma. See EMC
Documentum xPlore Administration and Development Guide.
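The two cases above can be illustrated with a toy tokenizer. This is a simplified sketch, not the xPlore tokenizer: it assumes that a configurable set of separator characters splits a string into adjacent tokens, that every other character (such as a comma treated as a context character) stays inside its token, and that a query is matched as a phrase. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SpecialCharTokenizerSketch {

    /**
     * Splits text into lowercase tokens. Whitespace and any character in
     * 'separators' end the current token; all other characters stay in it.
     */
    static List<String> tokenize(String text, String separators) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isWhitespace(c) || separators.indexOf(c) >= 0) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(Character.toLowerCase(c));
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Phrase match: the query tokens must appear adjacent and in order. */
    static boolean matches(String document, String query, String separators) {
        List<String> docTokens = tokenize(document, separators);
        List<String> queryTokens = tokenize(query, separators);
        return !queryTokens.isEmpty()
                && Collections.indexOfSubList(docTokens, queryTokens) >= 0;
    }

    public static void main(String[] args) {
        // The underscore splits home_base into the adjacent tokens home, base.
        System.out.println(matches("the home_base field", "home_base", "_"));
        // home and base both occur, but not next to each other.
        System.out.println(matches("home office data base", "home_base", "_"));
        // The comma stays inside the token, so White alone does not match.
        System.out.println(matches("Smith,Ann White,Jim", "White", "_"));
        // Once the comma also splits tokens, White matches.
        System.out.println(matches("Smith,Ann White,Jim", "White", "_,"));
    }
}
```

In this sketch the fourth call stands in for reconfiguring how the comma is handled; the actual configuration lives in xPlore, not in client code.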
• xQuery with DfXQuery.java is not thread-safe.
To execute the xQuery and other queries in one session, the xQuery must be synchronized until the
result stream is closed as shown in the following example:
synchronized(session.getDocbaseConnection()) {
    try {
        xq.execute(session, target);
        InputStream in = xq.getInputStream(session);
        // Copy the result stream into a ByteArrayInputStream so that xq can be closed
        byte[] buff = new byte[10000];
        int bytesRead = 0;
        ByteArrayOutputStream bao = new ByteArrayOutputStream();
        while ((bytesRead = in.read(buff)) != -1) {
            bao.write(buff, 0, bytesRead);
        }
        // is: an InputStream variable declared outside this block
        is = new ByteArrayInputStream(bao.toByteArray());
    }
    finally {
        // Close the xQuery before releasing the lock
        xq.close();
    }
}
Debugging
You can test queries in xPlore administrator. Reports on slow queries allow you to see the actual
query and how it was executed.
Using Documentum Administrator, you can trace full-text querying operations. Go to Job
Management > Administration Methods > MODIFY_TRACE. Two tracing levels are available:
• None: Tracing is turned off.
• All: Content Server and full-text messages resulting from queries are logged.
You can trace index agent operations. See EMC Documentum xPlore Administration and Development
Guide.
If the query fails to return expected results in Webtop, perform a Ctrl-click on the Edit button in the
results page. The query is displayed in the events history as a select statement like the following:
IDfQueryEvent(INTERNAL, DEFAULT): [dm_notes] returned
[Start processing] at
[2010-06-30 02:31:00:176 -0700]
IDfQueryEvent(INTERNAL, NATIVEQUERY): [dm_notes] returned
[SELECT text,object_name,score,summary,r_modify_date,...
SEARCH DOCUMENT CONTAINS 'ctrl-click' WHERE (...]
This action also displays the list of events that occurred during the search: the DQL sent, the FS2
query sent, and the errors from search sources.
If there is a processing error, the stack trace is shown.
Appendix A
DFC schemas
This appendix covers the following topics:
• DQL hints file DTD
• Extended object search schema
DQL hints file DTD
Following is the hints file DTD, parsed and enforced in DFC. It does not need a doctype declaration.
<!ELEMENT RuleSet (Rule*)>
<!ELEMENT Rule (Condition?, DQLHint?, SelectOption?, DisableFullText?, DisableFTDQL?)>
<!ELEMENT Condition (Select?, From?, Where?, Docbase?, FulltextExpression?)>
<!ELEMENT DQLHint (#PCDATA)>
<!ELEMENT SelectOption (#PCDATA)>
<!ELEMENT DisableFullText EMPTY>
<!ELEMENT DisableFTDQL EMPTY>
<!ELEMENT Select (Attribute+)>
<!ATTLIST Select condition (all | any) "all">
<!ELEMENT From (Type+)>
<!ATTLIST From condition (all | any) "all">
<!ELEMENT Where (Attribute+)>
<!ATTLIST Where condition (all | any) "all">
<!ELEMENT Docbase (Name+)>
<!ELEMENT FulltextExpression EMPTY>
<!ATTLIST FulltextExpression exists (true | false) #REQUIRED>
<!ELEMENT Attribute (#PCDATA)>
<!ATTLIST Attribute operator
(equal|not_equal|greater_than|greater_equal|less_than|less_equal|like|
not_like|is_null|is_not_null|in|not_in|between) #IMPLIED>
<!ELEMENT Type (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ATTLIST Name descend (true | false) #IMPLIED>
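To see how a rule set lines up with this DTD, the following sketch validates two small documents with the standard Java XML parser. It is an illustration only and not how DFC loads the hints file: the DTD is embedded as an internal subset purely for the check, the exists attribute of FulltextExpression is declared with ATTLIST, attribute defaults use single quotes for readability in Java strings, and the class name, sample rule, and hint text are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class HintsFileDtdCheck {

    // The hints file DTD, embedded as an internal subset for validation only.
    static final String DTD =
          "<!ELEMENT RuleSet (Rule*)>\n"
        + "<!ELEMENT Rule (Condition?, DQLHint?, SelectOption?, DisableFullText?, DisableFTDQL?)>\n"
        + "<!ELEMENT Condition (Select?, From?, Where?, Docbase?, FulltextExpression?)>\n"
        + "<!ELEMENT DQLHint (#PCDATA)>\n"
        + "<!ELEMENT SelectOption (#PCDATA)>\n"
        + "<!ELEMENT DisableFullText EMPTY>\n"
        + "<!ELEMENT DisableFTDQL EMPTY>\n"
        + "<!ELEMENT Select (Attribute+)>\n"
        + "<!ATTLIST Select condition (all | any) 'all'>\n"
        + "<!ELEMENT From (Type+)>\n"
        + "<!ATTLIST From condition (all | any) 'all'>\n"
        + "<!ELEMENT Where (Attribute+)>\n"
        + "<!ATTLIST Where condition (all | any) 'all'>\n"
        + "<!ELEMENT Docbase (Name+)>\n"
        + "<!ELEMENT FulltextExpression EMPTY>\n"
        + "<!ATTLIST FulltextExpression exists (true | false) #REQUIRED>\n"
        + "<!ELEMENT Attribute (#PCDATA)>\n"
        + "<!ATTLIST Attribute operator (equal|not_equal|greater_than|greater_equal|"
        + "less_than|less_equal|like|not_like|is_null|is_not_null|in|not_in|between) #IMPLIED>\n"
        + "<!ELEMENT Type (#PCDATA)>\n"
        + "<!ELEMENT Name (#PCDATA)>\n"
        + "<!ATTLIST Name descend (true | false) #IMPLIED>\n";

    /** Returns the validation errors for a RuleSet document (empty if valid). */
    static List<String> validate(String ruleSetXml) {
        final List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            String doc = "<?xml version='1.0'?>\n<!DOCTYPE RuleSet [\n" + DTD + "]>\n" + ruleSetXml;
            builder.parse(new InputSource(new StringReader(doc)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the rule set", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical rule: for dm_document queries with a full-text expression, add a hint.
        String valid = "<RuleSet><Rule><Condition>"
            + "<From condition='any'><Type>dm_document</Type></From>"
            + "<FulltextExpression exists='true'/>"
            + "</Condition><DQLHint>ENABLE(RETURN_TOP 50)</DQLHint></Rule></RuleSet>";
        // Invalid: Condition must precede DQLHint, and exists is required.
        String invalid = "<RuleSet><Rule><DQLHint>ENABLE(RETURN_TOP 50)</DQLHint>"
            + "<Condition><FulltextExpression/></Condition></Rule></RuleSet>";
        System.out.println("valid rule set errors:   " + validate(valid).size());
        System.out.println("invalid rule set errors: " + validate(invalid).size());
    }
}
```

The text content used for DQLHint here is a placeholder; the chapter on DQL hints governs what DFC actually accepts in that element.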
Extended object search schema
<?xml version="1.0"?>
<xsd:schema targetNamespace="http://www.documentum.com"
xmlns:doc="http://www.documentum.com"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.documentum.com">
<xsd:element name="mapping" type="doc:JAXBMappingXplore"/>



<xsd:complexType name="JAXBMappingXplore">
<xsd:sequence>
<xsd:element name="interface" type="doc:JAXBSearchInterfaceXplore"
minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="JAXBSearchInterfaceXplore">
<xsd:sequence>
<xsd:element name="alias" type="doc:JAXBAliasXplore" minOccurs="0"
maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="name" type="doc:Name" use="required"/>
<xsd:attribute name="map-to" type="doc:Identifier" use="optional"/>
<xsd:attribute name="primary" type="xsd:boolean" use="optional"
default="false"/>
</xsd:complexType>
<xsd:complexType name="JAXBAliasXplore">
<xsd:attribute name="name" type="doc:Name" use="required"/>
<xsd:attribute name="map-to" type="doc:MixIdentifier" use="required"/>
<xsd:attribute name="cardinality" default="ONE" type="doc:Cardinality"/>
</xsd:complexType>



<xsd:simpleType name="Name">
<xsd:restriction base="xsd:string">
<xsd:pattern value="[a-zA-Z][a-z_A-Z0-9]*"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="Identifier">
<xsd:restriction base="xsd:string">
<xsd:pattern value="[a-zA-Z][a-z_>A-Z0-9\.]*"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="MixIdentifier">
<xsd:restriction base="xsd:string">
<xsd:pattern value="[a-zA-Z][a-z_>A-Z]*(\.[a-zA-Z][a-z_>A-Z]*){0,2}"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="Cardinality">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="ONE"/>
<xsd:enumeration value="MANY"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
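The pattern facets of the Name, Identifier, and MixIdentifier types can be tried out with ordinary Java regular expressions, because both an XSD pattern and String.matches must match the whole value. This is a small sketch for experimenting with candidate values; the class name and the sample values are hypothetical and it is not part of DFC.

```java
import java.util.regex.Pattern;

class SearchSchemaPatterns {

    // Patterns copied from the simple types in the schema above.
    static final Pattern NAME = Pattern.compile("[a-zA-Z][a-z_A-Z0-9]*");
    static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z][a-z_>A-Z0-9\\.]*");
    static final Pattern MIX_IDENTIFIER =
        Pattern.compile("[a-zA-Z][a-z_>A-Z]*(\\.[a-zA-Z][a-z_>A-Z]*){0,2}");

    /** True if the whole value matches the given pattern, as an XSD pattern facet requires. */
    static boolean conforms(Pattern pattern, String value) {
        return pattern.matcher(value).matches();
    }

    public static void main(String[] args) {
        // Name: starts with a letter, then letters, digits, or underscores.
        System.out.println(conforms(NAME, "object_name"));
        System.out.println(conforms(NAME, "1st_name"));
        // Identifier additionally allows '.' and '>' after the first letter.
        System.out.println(conforms(IDENTIFIER, "dm_document.title"));
        // MixIdentifier: up to three dot-separated parts, no digits.
        System.out.println(conforms(MIX_IDENTIFIER, "a.b.c"));
        System.out.println(conforms(MIX_IDENTIFIER, "a.b.c.d"));
    }
}
```

One detail this makes visible is that MixIdentifier, unlike Name and Identifier, does not allow digits in any part.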
Index
C
case sensitivity
  in WDK basic search, 68
  of queries, 88
Content Server, 7
F
Federated Search Services
  Admin Center, setting external sources, 85
I
IDfQueryBuilder, 24
IDfQueryManager, 24
IDfQueryProcessor, 24
IDfSearchMetadataMgr, 25
index agent
  described, 7
index server
  xPlore, 8
L
languages
  indexing, 8
latency
  inconsistent query results, and, 88
lemmatization
  inconsistent query results, and, 88
M
multirepository search data model, 25
P
performance
  suppress folder path display, 74
  suppress summary calculation, 74
Q
queries
  case sensitivity, 88
  inconsistent results, causes, 87
  lemmatization of, 88
query
  definition, in DFC, 24
S
search
  number of results, 72
  results display, 72
  term hit highlighting, 72
slow query
  unselective, 89
special characters
  troubleshooting, 90
T
term hit highlighting
  in WDK search, 72
W
wildcard
  contains fragment, 79
X
xPlore
  index server, 8
