Yandex.XML Developer's guide 8.10.2014

Yandex.XML. Developer's guide. Version 1.2
Document build date: 8.10.2014.
This volume is a part of Yandex technical documentation.
Yandex helpdesk site: http://help.yandex.ru
© 2008—2014 Yandex LLC. All rights reserved.
Copyright Disclaimer
Yandex (and its applicable licensor) has exclusive rights for all results of intellectual activity and equated to them means of individualization, used for development, support,
and usage of the service Yandex.XML. It may include, but not limited to, computer programs (software), databases, images, texts, other works and inventions, utility models,
trademarks, service marks, and commercial denominations. The copyright is protected under provision of Part 4 of the Russian Civil Code and international laws.
You may use Yandex.XML or its components only within credentials granted by the Terms of Use of Yandex.XML or within an appropriate Agreement.
Any infringements of exclusive rights of the copyright owner are punishable under civil, administrative or criminal Russian laws.
Contact information
Yandex LLC
http://www.yandex.com
Phone: +7 495 739 7000
Email: [email protected]
Headquarters: 16 L'va Tolstogo St., Moscow, Russia 119021
Contents
Overview
Restrictions and requirements
Getting started
Registration
Request for search results
  GET requests
  POST requests
  Response format
    request
    response
Request for limits for the next day
  Response format
Formatting results
Protection from robots
Questions and answers
  What is XSLT?
  Notifications
  IP address
  Additional search features
  Encoding
Appendices
  Validating XML files
  Error codes
  Search regions
Overview
Yandex.XML is a service that lets you send queries to the Yandex search engine and get responses in XML
format.
This document covers restrictions and requirements for using the service, basic steps for getting started
and registering, formats of search queries and responses, and answers to common questions.
The document is intended for developers who need to set up a search across a web site, group of sites, or the
Internet.
Restrictions and requirements
Yandex.XML provides access to Russian, Turkish, and Worldwide types of search. The desired search type
is selected during registration.
The search type determines the ranking formula, the set of documents that are searched (the search base), and the
restrictions that are applied to usage of Yandex.XML.
The following types of restrictions are applied:
•
Limit to the number of IP addresses that are associated with the account (by default, one).
•
Daily limits on the number of search queries sent. If the IP address changes, the limit applies to the total
number of queries sent from all network addresses.
The following table provides information about how restrictions depend on the search type and other conditions.
Condition: Telephone number is not confirmed.
  “Russian” search type: Restrictions cannot be changed.
  “Turkish” search type: 10 search queries per day.
  “Worldwide” search type: 10 search queries per day.
Condition: Telephone number confirmed. Restriction: One telephone number may be confirmed no more than once, and only for a single account.
  “Russian” search type: Restrictions cannot be changed.
  “Turkish” search type: 10,000 search queries per day.
  “Worldwide” search type: 10,000 search queries per day.
Condition: The web site is registered in Yandex.Webmaster.
  “Russian” search type: The number of search queries allowed is determined individually for each user. Restrictions depend on the sites registered in Yandex.Webmaster. Hourly restrictions are also applied.
  “Turkish” search type: Restrictions cannot be changed.
  “Worldwide” search type: Restrictions cannot be changed.
License agreement
  “Russian” search type: http://legal.yandex.ru/xml
  “Turkish” search type: http://legal.yandex.com.tr/xml
  “Worldwide” search type: http://legal.yandex.com/xml
Changing restrictions (increasing the maximum number of queries per day and IP addresses allowed)
  “Russian” search type: Become a partner of the Yandex Advertising Network.
  “Turkish” search type: Contact a Yandex representative and discuss how to expand your use of Yandex.XML features.
  “Worldwide” search type: Contact a Yandex representative and discuss how to expand your use of Yandex.XML features.
To get information about additional features of Yandex.XML and how to get access to them, contact a Yandex
representative.
For each search query, no more than 1000 results are returned.
When using the service, follow the requirements for formatting results and the recommendations for protection
from robots.
Hourly limits for the “Russian” search type
For the “Russian” search type, additional hourly limits may be imposed that are calculated as percentages of the
daily query limit.
Information about hourly limits is available on the page Information on restrictions after registration.
For example, suppose the daily limit on the number of queries for a site is 1000. During each hour in the period
from 7:00 to 19:00, no more than 5% of this limit (50 queries) can be sent.
Even if no search queries were sent from the account in the period from 0:00 to 7:00, no more than 50 queries can be
sent during each hour from 7:00 to 19:00. In total, no more than 600 queries can be sent over this 12-hour period.
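The arithmetic in this example can be reproduced with a short Python sketch. The function names are illustrative and are not part of the service; the 5% share and the 7:00 to 19:00 window are taken from the example above and may differ for a real account.

```python
def hourly_cap(daily_limit: int, percent_per_hour: int) -> int:
    """Whole number of queries allowed in one restricted hour."""
    return daily_limit * percent_per_hour // 100


def window_cap(daily_limit: int, percent_per_hour: int,
               start_hour: int, end_hour: int) -> int:
    """Total queries allowed across all restricted hours of the window."""
    hours = end_hour - start_hour
    return hourly_cap(daily_limit, percent_per_hour) * hours


# Values from the example: daily limit 1000, 5% per hour, 7:00 to 19:00.
print(hourly_cap(1000, 5))
print(window_cap(1000, 5, 7, 19))
```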
Getting started
To set up and start using Yandex.XML, follow these steps:
1. Register the IP address that you plan to send search requests from.
2. Send a test request.
Make sure that requests are sent successfully from the specified IP address:
•
Send a request from the service's interface. The interface should be accessed from the computer that
is assigned the IP address specified during registration.
•
Form a GET request and send it from the computer that is assigned the IP address that was specified
during registration. For example, if during registration the field URL for queries displayed the string
“http://xmlsearch.yandex.ru/xmlsearch?user=test-yandex&key=09.31114:e650g7j”, you would use the
following GET request:
http://xmlsearch.yandex.ru/xmlsearch?user=test-yandex&key=09.31114:e650g7j&query=yandex
3. Check the received XML document.
The response should correspond to the specified format and should not contain errors.
Note:
If there are no results for the search string, an error with the code “15” is acceptable.
4. Only for the “Russian” search type. Register your web sites in the Yandex.Webmaster service. After
registration, individual restrictions are determined for the current user.
5. Only for the “Russian” search type. Review the daily and hourly restrictions on the Information about
restrictions page.
Yandex.XML can only be used on sites for which the current user is the main owner in the
Yandex.Webmaster service. If necessary, ask the site owner to assign you the appropriate role.
6. Configure request parameters.
The GET and POST methods are supported.
7. Review the response format.
8. Set up response handling.
For formatting search results, you must comply with the design requirements.
9. If necessary. Request information about hourly restrictions for the next 24 hours.
10. Optional. Set up protection from robots.
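The test request from step 2 can be assembled programmatically. The following Python sketch appends a percent-encoded query parameter to the value of the URL for queries field. The helper name build_test_url is hypothetical, and the user name and key are the placeholder values from step 2, not working credentials.

```python
from urllib.parse import quote


def build_test_url(url_for_queries: str, query: str) -> str:
    """Append the query text to the base address shown during registration."""
    # safe="" makes quote() escape every reserved character, including "=" and "&".
    return f"{url_for_queries}&query={quote(query, safe='')}"


base = "http://xmlsearch.yandex.ru/xmlsearch?user=test-yandex&key=09.31114:e650g7j"
print(build_test_url(base, "yandex"))
```

The resulting address must still be requested from the computer that has the registered IP address.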
Registration
To register on the Yandex.XML service, follow these steps:
1. Open the registration page (http://xml.yandex.ru/settings.xml).
This requires authentication in Yandex.Passport. If necessary, first register.
2. Look at the value of the URL for queries field:
•
For GET requests, this is the base part of the address that request parameters are appended to.
•
For POST requests, this is the URL to send the request body to.
3. Fill in the fields on the form:
Main IP-address
The unique network address of the computer that will be sending search queries.
To set the IP address of the computer you are using to register, use the value of the Your
current IP-address is field.
Search type
The selected value determines the set of documents that are searched (the search base),
the ranking formula, and usage restrictions.
Email notifications
The email address to send notifications to.
List of events
Choose events that notifications should be sent for.
Notification language
The language to use for delivering messages about selected events.
4. Review the terms of the license agreement. The terms depend on the search type you selected.
5. Confirm your agreement (select the box I accept the terms of License Agreement).
6. Save the information you have entered (the Save button is enabled when the Email notifications and I
accept the terms of License Agreement boxes have been filled in).
If necessary, registration data can be edited on the Settings page.
Request for search results
Yandex.XML supports two ways of sending a search request: GET and POST.
The response format is the same for both supported methods.
Attention!
To use algorithms for protection from robots, the request must pass information about the IP address and the
"spravka" cookie for the query author.
GET requests
Attention!
Special characters that are passed as parameter values must be replaced with the appropriate escape sequences
for percent-encoding. For example, instead of the equal sign (“=”), the escape sequence “%3D” must be used.
Request format
http://xmlsearch.yandex.<domain>/xmlsearch ?
user=<user name>
& key=<API key>
& query=<search query text>
& [lr=<ID of the search country/region>]
& [l10n=<notification language>]
& [sortby=<type of sorting>]
& [filter=<filter type>]
& [maxpassages=<number of passages>]
& [groupby=<parameters for grouping results>]
& [page=<page number>]
& [showmecaptcha=<yes>]
user
User name. Must match the login for Yandex.Passport that was set during registration.
key
Value of the API key that was issued during registration.
query
Text of the search query. Instead of special symbols, the corresponding escape sequences must be used.
The query has the following restrictions: maximum query length — 400 characters; maximum number
of words — 40.
lr
Supported only for “Russian” and “Turkish” search types.
ID of the country or region to search. Determines the rules for ranking documents. For example, if we
pass the value “11316” in this parameter (Novosibirsk region), when generating search results, a formula
is used that is defined for the Novosibirsk region.
A list of IDs of common countries and regions is provided in the appendix.
l10n
The notification language for the search response. It affects the text that is passed in the found-docs-human tag, as well as in error messages.
Acceptable values depend on the type of search used:
•
“Russian (yandex.ru)” — “ru” (Russian), “uk” (Ukrainian), “be” (Belarusian), “kk” (Kazakh).
If omitted, notifications are sent in Russian.
•
“Turkish (yandex.com.tr)” — Supports only the value “tr” (Turkish).
•
“Worldwide (yandex.com)” — Supports only the value “en” (English).
sortby
Rules for sorting search results. Possible values:
•
“rlv” — By relevancy.
•
“tm” — By time when the document was changed.
If omitted, results are sorted by relevancy.
When sorting by change time, the parameter may contain the order attribute, which is the order
for sorting documents. Possible values:
•
“descending” — Forward (from most recent to oldest). Used by default.
•
“ascending” — Reverse (from oldest to most recent).
Format: sortby=<sorting type>.order%3D<sorting order>. For example, for reverse
sorting by date, you must use the following construction: sortby=tm.order%3Dascending.
filter
Rules for filtering search results (excluding documents from search results based on one of the rules).
Possible values:
•
“none” — Filtering is disabled. The output includes any documents, regardless of their content.
•
“moderate” — Moderate filtering. The output excludes documents that fall into the “adults only”
category, if the search is not explicitly directed at finding these types of resources.
•
“strict” — Family filter. Regardless of the search query, the output excludes documents that fall
into the “adults only” category, as well as those that contain foul language.
If the parameter is omitted, moderate filtering is used.
maxpassages
The maximum number of passages that can be used when creating a snippet for the document. A passage
is an excerpt from a found document that contains the query words. Passages are used for creating
snippets, which are textual annotations to found documents.
Acceptable values — from 1 to 5. The search result may contain fewer passages than the value set for
this parameter.
If the parameter is omitted, no more than four passages with the query text are returned for each
document.
groupby
Set of parameters that define the rules for grouping results. Grouping is used to put documents from
the same domain in a container. Within the container, documents are ranked using the sorting rules
defined in the sortby parameter. Results passed to the container can be used for including several
documents from the same domain in search output.
Parameters are separated by periods and set in the format:
attr%3D<utility attribute>.mode%3D<grouping type>.groups-on-page%3D<number of groups per page>.docs-in-group%3D<number of documents per group>
You can find a description of the parameters mode, attr, groups-on-page and docs-in-group in the section
POST requests.
page
Number of the requested page in the search output. This determines the range of document positions
returned for the request. Numbering starts from zero (the first page corresponds to the value “0”).
For example, if the number of documents returned on a page is equal to “n”, and the value “p” is passed
in the parameter, the search results will include documents that fall within the range of output positions
from p*n+1 to p*n+n inclusively.
If the parameter is omitted, the first page of search output is returned.
showmecaptcha
Initiates user verification for possible protection from robots.
The only value used is “yes”.
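The relationship between the page parameter and output positions can be checked with a small Python sketch. It assumes the zero-based numbering described for the page parameter, so that page 0 with 10 documents per page covers positions 1 to 10. The function name page_range is illustrative.

```python
def page_range(p: int, n: int) -> tuple[int, int]:
    """Return the first and last output positions on zero-based page p,
    given n documents per page."""
    if p < 0 or n < 1:
        raise ValueError("p must be >= 0 and n must be >= 1")
    first = p * n + 1
    last = p * n + n
    return first, last


print(page_range(0, 10))  # first page
print(page_range(1, 5))   # second page with five documents per page
```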
Sample GET request
The following request returns the second page of search results for the query “<table>” for the user “xml-search-user”.
Search type: Russian (yandex.ru). Results are grouped by domain. Each group contains three documents, and five groups
can be returned per page.
http://xmlsearch.yandex.ru/xmlsearch?user=xml-search-user&key=03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8&query=%3Ctable%3E&groupby=attr%3Dd.mode%3Ddeep.groups-on-page%3D5.docs-in-group%3D3&maxpassages=3&page=1
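A request like this one can be generated rather than typed by hand. The sketch below uses urllib.parse from the Python standard library to percent-encode parameter values. The helper names build_groupby and build_get_url are hypothetical, the key is the placeholder from the sample, and keeping the colon in the key unescaped is an assumption made only to match the sample address.

```python
from urllib.parse import quote, urlencode


def build_groupby(attr: str, mode: str, groups_on_page: int, docs_in_group: int) -> str:
    """Join grouping parameters with periods; '=' is escaped later by urlencode."""
    return (f"attr={attr}.mode={mode}"
            f".groups-on-page={groups_on_page}.docs-in-group={docs_in_group}")


def build_get_url(domain: str, user: str, key: str, query: str, **optional) -> str:
    """Build a GET request URL; optional parameters set to None are skipped."""
    params = {"user": user, "key": key, "query": query}
    params.update({k: v for k, v in optional.items() if v is not None})
    # safe=":" leaves the colon in the key as-is; everything else reserved is escaped.
    return (f"http://xmlsearch.yandex.{domain}/xmlsearch?"
            + urlencode(params, safe=":", quote_via=quote))


url = build_get_url(
    "ru", "xml-search-user", "03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8", "<table>",
    groupby=build_groupby("d", "deep", 5, 3), maxpassages=3, page=1,
)
print(url)
```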
POST requests
Attention!
Special characters that are passed as parameter values in the request body must be replaced with the appropriate
escape sequences for XML-encoding. For example, instead of the ampersand sign (“&”), the escape sequence
“&amp;” must be used.
Request URL
http://xmlsearch.yandex.<domain>/xmlsearch ?
user=<user name>
& key=<API key>
& [filter=<filter type>]
& [lr=<search region ID>]
& [l10n=<notification language>]
& [showmecaptcha=<yes>]
user
User name. Must match the login for Yandex.Passport that was set during registration.
key
Value of the API key that was issued during registration.
filter
Rules for filtering search results (excluding documents from search results based on one of the rules).
Possible values:
•
“none” — Filtering is disabled. The output includes any documents, regardless of their content.
•
“moderate” — Moderate filtering. The output excludes documents that fall into the “adults only”
category, if the search is not explicitly directed at finding these types of resources.
•
“strict” — Family filter. Regardless of the search query, the output excludes documents that fall
into the “adults only” category, as well as those that contain foul language.
If the parameter is omitted, moderate filtering is used.
lr
Supported only for “Russian” and “Turkish” search types.
ID of the country or region to search. Determines the rules for ranking documents. For example, if we
pass the value “11316” in this parameter (Novosibirsk region), when generating search results, a formula
is used that is defined for the Novosibirsk region.
A list of IDs of common countries and regions is provided in the appendix.
l10n
The notification language for the search response. It affects the text that is passed in the found-docs-human tag, as well as in error messages.
Acceptable values depend on the type of search used:
•
“Russian (yandex.ru)” — “ru” (Russian), “uk” (Ukrainian), “be” (Belarusian), “kk” (Kazakh).
If omitted, notifications are sent in Russian.
•
“Turkish (yandex.com.tr)” — Supports only the value “tr” (Turkish).
•
“Worldwide (yandex.com)” — Supports only the value “en” (English).
showmecaptcha
Initiates user verification for possible protection from robots.
The only value used is “yes”.
Request body format
<?xml version="1.0" encoding="XML file encoding"?>
<request>
<!--Grouping tag-->
<query>
<!--Search query text-->
</query>
<sortby>
<!--Type of sorting for search results-->
</sortby>
<groupings>
<!--Grouping parameters in child tags-->
<groupby attr="d" mode="deep" groups-on-page="10" docs-in-group="1" />
</groupings>
<page>
<!--Number of the requested page in search results-->
</page>
</request>
request
Grouping tag. Child tags contain parameters of the search query.
query
Text of the search query. Instead of special symbols, the corresponding escape sequences
must be used.
The query has the following restrictions: maximum query length — 400 characters;
maximum number of words — 40.
sortby
Rules for sorting search results. Possible values:
• “rlv” — By relevancy.
• “tm” — By time when the document was changed.
If omitted, results are sorted by relevancy.
When sorting by change time, the parameter may contain the order attribute, which is the
order for sorting documents. Possible values:
• “descending” — Forward (from most recent to oldest). Used by default.
• “ascending” — Reverse (from oldest to most recent).
maxpassages
The maximum number of passages that can be used when creating a snippet for the
document. A passage is an excerpt from a found document that contains the query words.
Passages are used for creating snippets, which are textual annotations to found documents.
Acceptable values — from 1 to 5. The search result may contain fewer passages than
the value set for this parameter.
If the parameter is omitted, no more than four passages with the query text are returned
for each document.
page
Number of the requested page in the search output. This determines the range of document
positions returned for the request. Numbering starts from zero (the first page corresponds
to the value “0”).
For example, if the number of documents returned on a page is equal to “n”, and the value
“p” is passed in the parameter, the search results will include documents that fall within
the range of output positions from p*n+1 to p*n+n inclusively.
If the parameter is omitted, the first page of search output is returned.
groupings
Grouping tag. The child tag contains parameters for grouping results.
groupby
Set of parameters that define the rules for grouping results. Grouping is used to put
documents from the same domain in a container. Within the container, documents are ranked
using the sorting rules defined in the sortby parameter. Results passed to the container
can be used for including several documents from the same domain in search output.
Contains the following attributes:
• mode — Grouping method. Possible values:
  • “flat” — Flat grouping. Each group contains a single document. Passed with
    an empty value for the attr parameter (" ").
  • “deep” — Grouping by domain. Each group contains documents from a single
    domain. Passed with the value “d” for the attr parameter.
  If the parameter is not defined, flat grouping is used.
• attr — Utility attribute. Depends on the value of the mode attribute.
• groups-on-page — Maximum number of groups that can be returned per page
  of search results. Acceptable values — from 1 to 100.
• docs-in-group — Maximum number of documents that can be returned per group.
  Acceptable values — from 1 to 3.
Tip:
If necessary, use the XML feed validator in the Yandex.Webmaster service. Detailed information about
validation is provided in the appendix.
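The query restrictions listed above (at most 400 characters and at most 40 words) can be checked before a request is sent. The Python sketch below is one possible pre-flight check. The guide does not define how words are counted, so splitting on whitespace is an assumption, and the function name check_query is illustrative.

```python
MAX_QUERY_CHARS = 400
MAX_QUERY_WORDS = 40


def check_query(text: str) -> list[str]:
    """Return a list of violated restrictions; an empty list means the query fits."""
    problems = []
    if len(text) > MAX_QUERY_CHARS:
        problems.append(f"query has {len(text)} characters, limit is {MAX_QUERY_CHARS}")
    # Assumption: words are separated by whitespace.
    word_count = len(text.split())
    if word_count > MAX_QUERY_WORDS:
        problems.append(f"query has {word_count} words, limit is {MAX_QUERY_WORDS}")
    return problems


print(check_query("yandex"))
print(check_query("word " * 41))
```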
Sample POST request
The request and request URL shown below return the third page of search results for the query “<table>” for the user “xml-search-user”. Results are sorted by time when the document was changed. Search type: Russian (yandex.ru). Results
are grouped by domain. Each group contains three documents, and ten groups can be returned per page. The maximum
number of passages per document is two. The service returns an XML file in UTF-8 encoding.
Request URL:
http://xmlsearch.yandex.ru/xmlsearch?user=xml-search-user&key=03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8
Request body:
<?xml version="1.0" encoding="UTF-8"?>
<request>
<query>&lt;table&gt;</query>
<sortby>tm</sortby>
<maxpassages>2</maxpassages>
<page>2</page>
<groupings>
<groupby attr="d" mode="deep" groups-on-page="10" docs-in-group="3" />
</groupings>
</request>
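A request body of this shape can be produced with xml.etree.ElementTree from the Python standard library, which applies XML escaping to text values automatically, in line with the Attention note at the start of this section. This is a sketch under stated assumptions: the function name build_request_body and its defaults are illustrative, and the order of child tags follows the sample above.

```python
import xml.etree.ElementTree as ET


def build_request_body(query: str, sortby: str = "rlv", maxpassages=None, page: int = 0,
                       attr: str = "d", mode: str = "deep",
                       groups_on_page: int = 10, docs_in_group: int = 1) -> bytes:
    """Serialize search parameters into a UTF-8 encoded <request> document."""
    request = ET.Element("request")
    ET.SubElement(request, "query").text = query  # "<" and "&" are escaped on output
    ET.SubElement(request, "sortby").text = sortby
    if maxpassages is not None:
        ET.SubElement(request, "maxpassages").text = str(maxpassages)
    ET.SubElement(request, "page").text = str(page)
    groupings = ET.SubElement(request, "groupings")
    ET.SubElement(groupings, "groupby", {
        "attr": attr,
        "mode": mode,
        "groups-on-page": str(groups_on_page),
        "docs-in-group": str(docs_in_group),
    })
    return ET.tostring(request, encoding="UTF-8", xml_declaration=True)


body = build_request_body("<table>", sortby="tm", maxpassages=2, page=2, docs_in_group=3)
print(body.decode("utf-8"))
```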
Response format
In response to the search request, Yandex.XML returns an XML file in UTF-8 encoding that contains the search
results.
Restriction:
No more than 1000 results are returned for each search query. Depending on the value of the docs-in-group
attribute, each result may contain from one to three documents. The maximum number of pages with search
results is determined by the number of document groups returned on each page (the value of the groups-on-page attribute). For example, if the groups-on-page attribute is passed with the value “10”, no more than
100 pages containing search results can be made.
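The page limit in this restriction follows from dividing the 1000-result cap by the number of groups per page. A minimal Python sketch is shown below. Rounding up when the division is not exact is an assumption, and the function name max_pages is illustrative.

```python
MAX_RESULTS = 1000  # cap on results per search query, from the restriction above


def max_pages(groups_on_page: int) -> int:
    """Upper bound on the number of result pages for a given groups-on-page value."""
    if not 1 <= groups_on_page <= 100:
        raise ValueError("groups-on-page must be between 1 and 100")
    # Ceiling division: a final partial page still counts as a page (assumption).
    return -(-MAX_RESULTS // groups_on_page)


print(max_pages(10))
```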
Files consist of the grouping tags request (general information about query parameters) and response (results
of processing the search query).
Below you will find the general structure of a resulting XML document with sample values.
Attention!
This structure is for illustrative purposes. It contains mutually exclusive elements.
<?xml version="1.0" encoding="utf-8"?>
<yandexsearch version="1.0">
  <request>
    <query>yandex</query>
    <page>0</page>
    <sortby order="descending" priority="no">rlv</sortby>
    <maxpassages>2</maxpassages>
    <groupings>
      <groupby attr="d" mode="deep" groups-on-page="10" docs-in-group="3" curcateg="-1" />
    </groupings>
  </request>
  <response date="20120928T103130">
    <error code="15">Sorry, there are no results for this search</error>
    <reqid>1348828873568466-1289158387737177180255457-3-011-XML</reqid>
    <found priority="phrase">206775197</found>
    <found priority="strict">206775197</found>
    <found priority="all">206775197</found>
    <found-human>207 million pages found</found-human>
    <misspell>
      <rule>Misspell</rule>
      <source-text>yande<hlword>xx</hlword></source-text>
      <text>yandex</text>
    </misspell>
    <reask>
      <rule>Misspell</rule>
      <source-text><hlword>yn</hlword>dex</source-text>
      <text-to-show>yandex</text-to-show>
      <text>yandex</text>
    </reask>
    <results>
      <grouping attr="d" mode="deep" groups-on-page="10" docs-in-group="3" curcateg="-1">
        <found priority="phrase">45094</found>
        <found priority="strict">45094</found>
        <found priority="all">45094</found>
        <found-docs priority="phrase">192685602</found-docs>
        <found-docs priority="strict">192685602</found-docs>
        <found-docs priority="all">192685602</found-docs>
        <found-docs-human>193 million pages found</found-docs-human>
        <page first="1" last="10">0</page>
        <group>
          <categ attr="d" name="UngroupVital223.ru" />
          <doccount>34</doccount>
          <relevance priority="all" />
          <doc id="ZD831E1113BCFDD95">
            <relevance priority="phrase" />
            <url>http://www.yandex.ru/</url>
            <domain>www.yandex.ru</domain>
            <title>&quot;<hlword>Yandex</hlword>&quot; is a global search engine and internet portal</title>
            <headline>Search the entire internet based on the user's region.</headline>
            <modtime>20060814T040000</modtime>
            <size>26938</size>
            <charset>utf-8</charset>
            <passages>
              <passage><hlword>Yandex</hlword> — a search engine...</passage>
            </passages>
            <properties>
              <_PassagesType>0</_PassagesType>
              <lang>ru</lang>
            </properties>
            <mime-type>text/html</mime-type>
            <saved-copy-url>http://hghltd.yandex.net/yandbtm?text=yandex&amp;url=http%3A%2F%2Fwww.yandex.ru%2F&amp;fmode=inject&amp;mime=html&amp;l10n=ru&amp;sign=e3737561fc3d1105967d1ce619dbd3c7&amp;keyno=0</saved-copy-url>
          </doc>
        </group>
      </grouping>
    </results>
  </response>
</yandexsearch>
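A response with this structure can be read with xml.etree.ElementTree from the Python standard library. The sketch below checks for the error tag first and otherwise collects the URL, domain, and title of each found document. The function name parse_response and the shape of the returned dictionary are illustrative choices, and the embedded XML is a shortened copy of the structure above, not real service output.

```python
import xml.etree.ElementTree as ET

SAMPLE_OK = b"""<?xml version="1.0" encoding="utf-8"?>
<yandexsearch version="1.0">
  <response date="20120928T103130">
    <results>
      <grouping attr="d" mode="deep" groups-on-page="10" docs-in-group="3" curcateg="-1">
        <group>
          <doc id="ZD831E1113BCFDD95">
            <url>http://www.yandex.ru/</url>
            <domain>www.yandex.ru</domain>
            <title>&quot;<hlword>Yandex</hlword>&quot; is a global search engine</title>
          </doc>
        </group>
      </grouping>
    </results>
  </response>
</yandexsearch>"""

SAMPLE_ERROR = b"""<yandexsearch version="1.0">
  <response date="20120928T103130">
    <error code="15">Sorry, there are no results for this search</error>
  </response>
</yandexsearch>"""


def parse_response(xml_bytes: bytes) -> dict:
    """Extract the error (if any) and the list of found documents."""
    response = ET.fromstring(xml_bytes).find("response")
    error = response.find("error")
    if error is not None:
        return {"error_code": int(error.get("code")), "error_text": error.text, "docs": []}
    docs = []
    for doc in response.iter("doc"):
        title = doc.find("title")
        docs.append({
            "url": doc.findtext("url"),
            "domain": doc.findtext("domain"),
            # itertext() flattens the text together with nested hlword highlights.
            "title": "".join(title.itertext()) if title is not None else "",
        })
    return {"error_code": None, "error_text": None, "docs": docs}


print(parse_response(SAMPLE_OK))
print(parse_response(SAMPLE_ERROR))
```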
request
Generalized information about request parameters. May be omitted if the response contains errors.
The request tags are described in the table below.
query
Description: Text of the search query that was passed.
Attributes: None.
page
Description: Number of the search results page returned. Numbering starts from zero (the first page corresponds to the value “0”).
Attributes: None.
sortby
Description: Parameters for sorting results. Possible values:
• “rlv” — By relevancy.
• “tm” — By time when the document was changed.
Attributes:
• order — Sorting order. The “descending” value (forward) is used by default. When sorting by change time, it can take the value “ascending” (reverse).
• priority — For service use. Takes the value "no".
maxpassages
Description: Maximum number of passages that can be passed in a single search result.
Attributes: None.
groupings
Description: Grouping. Contains grouping parameters in the groupby tag.
Attributes: None.
groupby
Description: Grouping parameters for found search results.
Attributes:
• mode — Grouping method.
• attr — For service use.
• groups-on-page — Maximum number of groups that can be returned per page of search results.
• docs-in-group — Maximum number of documents that can be returned per group. Any group may contain fewer documents than the value set in this parameter.
• curcateg — For service use. Takes the value “-1”.
The following example shows the contents of the request grouping tag that are returned for the request
http://xmlsearch.yandex.com.tr/xmlsearch?lr=983&l10n=tr&user=xml-search-user&key=03.79031114:b631r9j587dkl4jko987hgg7bn2kl8a2&query=%22has%20sample%20applications%20for%20the%20most%20popular%20programming%22&sortby=tm&maxpassages=2&groupby=attr%3Dd.mode%3Ddeep.groups-on-page%3D5.docs-in-group%3D3&maxpassages=3&page=1
<request>
  <query>&quot;has sample applications for the most popular programming&quot;</query>
  <page>1</page>
  <sortby order="descending" priority="no">tm</sortby>
  <maxpassages>2</maxpassages>
  <groupings>
    <groupby attr="d" mode="deep" groups-on-page="5" docs-in-group="3" curcateg="-1" />
  </groupings>
</request>
response
Results of processing the search query for which information is provided in the request child tags.
Contains the date attribute: the request date and time (UTC) in the format
<year><month><day>T<hour><minute><second>.
Consists of the following sections:
•
General information about search results.
•
The misspell / reask block.
•
The results block.
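The value of the date attribute described above can be converted to a timezone-aware object. The Python sketch below assumes the separator is the Latin letter T, as in the sample value "20120928T103130", and that the time is in UTC. The function name parse_response_date is illustrative.

```python
from datetime import datetime, timezone


def parse_response_date(value: str) -> datetime:
    """Parse <year><month><day>T<hour><minute><second> as a UTC datetime."""
    return datetime.strptime(value, "%Y%m%dT%H%M%S").replace(tzinfo=timezone.utc)


print(parse_response_date("20120928T103130").isoformat())
```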
General information about search results
The tags for the block with general information about search results are shown in the table below.
error
Description: Error description. Present only when the search request is processed incorrectly (for example, for an empty request, incorrect parameters, etc.). In certain cases, it is mutually exclusive of other tags in the response grouping tag.
Attributes: code — Error code.
reqid
Description: Unique request ID.
Attributes: None.
found
Description: Approximation of the number of documents found for the query.
Attributes: priority — For service use. Possible values:
• “phrase”
• “strict”
• “all”
found-human
Description: A string in the language corresponding to the search type selected. Contains information about the number of documents found and accompanying information.
Attributes: None.
The misspell / reask block
Optional. Present if a typo was found (misspell) or corrected (reask) in the query.
The block tags are listed below.
misspell
Description: Grouping. Contains information about a possible typo in the search query.
Attributes: None.
reask
Description: Grouping. Contains information about corrections made to the source query before searching for documents.
Attributes: None.
rule
Description: The type of error found in the query. Possible values:
• “Misspell”: Typo.
• “KeyboardLayout”: Wrong keyboard layout.
• “Volapyuk”: Query made in Russian using English transliteration. Used if the search type is set to “Russian (yandex.ru)”.
Attributes: None.
source-text
Description: Source text of the query. The fragment of the search query that presumably contains an error is highlighted by the hlword tag.
Attributes: None.
text-to-show
Description: Optional (only for the reask grouping tag). Contains the corrected text of the search query. In most cases it matches the value passed in the text tag.
Attributes: None.
text
Description: Corrected text of the search query.
Attributes: None.
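The sketch below shows one way the misspell / reask block might be read. It assumes that rule, source-text, and text are direct children of the misspell or reask grouping tag, which the tag list implies but does not state outright; the XML sample and the helper name read_correction are invented for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<yandexsearch version="1.0">
  <response date="20140722T203000">
    <reask>
      <rule>Misspell</rule>
      <source-text>yandex <hlword>serch</hlword></source-text>
      <text-to-show>yandex search</text-to-show>
      <text>yandex search</text>
    </reask>
  </response>
</yandexsearch>"""


def read_correction(xml_text):
    """Return details of the first misspell or reask block, or None."""
    response = ET.fromstring(xml_text).find("response")
    for kind in ("misspell", "reask"):
        block = response.find(kind)
        if block is None:
            continue
        source = block.find("source-text")
        return {
            "kind": kind,
            "rule": block.findtext("rule"),
            # itertext() flattens the hlword markup into plain text.
            "source": "".join(source.itertext()) if source is not None else None,
            "suspect": [h.text for h in source.iter("hlword")] if source is not None else [],
            "text": block.findtext("text"),
        }
    return None


correction = read_correction(SAMPLE)
```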
The results block
Optional. Present if results were found for the query.
The block tags are listed below.
results
Description: Grouping. Child tags contain information about search parameters and found documents.
Attributes: None.
grouping
Description: Grouping. Child tags contain information about search parameters and found documents.
Attributes: Reflect the grouping rules for found documents.
• mode: Grouping method.
• attr: For service use. Depends on the value of the mode attribute.
• groups-on-page: Number of groups that can be returned per page of search results.
• docs-in-group: Number of documents that can be returned per group.
• curcateg: For service use. Takes the value “-1”.
found
Description: Estimated number of groups formed.
Attributes: priority (for service use). Possible values:
• “phrase”
• “strict”
• “all”
found-docs
Description: Approximation of the number of documents found for the query. A more precise estimate compared to the value passed in the found tag for the block with general information about search results.
Attributes: priority (for service use). Possible values:
• “phrase”
• “strict”
• “all”
found-docs-human
Description: A string in the language corresponding to the search type selected. Contains information about the number of documents found and accompanying information. The value that is passed should be used when formatting search results.
Attributes: None.
page
Description: Number of the search results page returned. Numbering starts from zero (the first page corresponds to the value “0”).
Attributes:
• first: Ordinal number of the first group with search results that is displayed on the page.
• last: Ordinal number of the last group with search results that is displayed on the page.
group
Description: Grouping. Each group tag contains information about a found group of documents.
Attributes: None.
categ
Description: Identifying data about a group of found documents.
Attributes:
• attr: For service use. Must match the value passed in the request.
• name: Unique group ID.
doccount
Description: Approximation of the number of documents that are used for forming the group. Documents that potentially may be included in the group are ranked according to the request conditions (the sortby parameter). Depending on the value of the docs-in-group parameter, from one to three of the first documents are included in the group.
Attributes: None.
relevance
Description: For service use.
Attributes: priority (for service use).
doc
Description: Grouping. Each doc tag contains information about a found document. Depending on the value of the docs-in-group parameter, each group can contain from one to three of the doc grouping tags.
Attributes: name (unique ID of a found document).
url
Description: Address of a found document.
Attributes: None.
domain
Description: The domain that the found document is in.
Attributes: None.
title
Description: Title of the found document. Words that are in the search query are highlighted with the hlword tag.
Attributes: None.
headline
Description: Optional. Document summary. It is created using the HTML meta tag containing the name attribute with the “description” value.
Attributes: None.
modtime
Description: Date and time the document was changed, in the format <year><month><day>T<hour><minute><second>.
Attributes: None.
size
Description: Size of the found document, in bytes.
Attributes: None.
charset
Description: Encoding of the found document.
Attributes: None.
passages
Description: Grouping tag that contains a list of document passages.
Attributes: None.
passage
Description: Passage with the document summary. Words that are in the search query are highlighted with the hlword tag. The maximum number of passages to be passed in a single passages tag is defined by the value of the maxpassages parameter for the search request.
Attributes: None.
mime-type
Description: The document type in accordance with RFC2046.
Attributes: None.
properties
Description: Grouping tag that contains document properties.
Attributes: None.
_PassagesType
Description: Passage type. Possible values:
• “0”: Standard passage (created from the document text).
• “1”: Passage based on the link text. It is used if the document was found via a link.
Attributes: None.
lang
Description: Optional. Document language.
Attributes: None.
saved-copy-url
Description: Address of a saved copy of the document.
Attributes: None.
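To show how the tags of the results block fit together, the following sketch walks the group and doc tags of a small hand-written response. The nesting is assumed from the tag descriptions, the XML sample and all of its values are invented, and read_documents and plain_text are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<yandexsearch version="1.0">
  <response date="20140722T203000">
    <results>
      <grouping attr="d" mode="deep" groups-on-page="5" docs-in-group="3" curcateg="-1">
        <found-docs-human>1 page found</found-docs-human>
        <group>
          <categ attr="d" name="example.com" />
          <doccount>1</doccount>
          <doc name="sample-doc-1">
            <url>http://example.com/guide</url>
            <domain>example.com</domain>
            <title>Sample <hlword>applications</hlword></title>
            <passages>
              <passage>Has sample <hlword>applications</hlword> for popular languages.</passage>
            </passages>
          </doc>
        </group>
      </grouping>
    </results>
  </response>
</yandexsearch>"""


def plain_text(element):
    """Return the element text with hlword markup flattened."""
    return "".join(element.itertext()).strip()


def read_documents(xml_text):
    """Collect one dictionary per doc tag, remembering its group ID."""
    root = ET.fromstring(xml_text)
    documents = []
    for group in root.iter("group"):
        group_id = group.find("categ").get("name")
        for doc in group.findall("doc"):
            documents.append({
                "group": group_id,
                "url": doc.findtext("url"),
                "title": plain_text(doc.find("title")),
                "passages": [plain_text(p) for p in doc.iter("passage")],
            })
    return documents


documents = read_documents(SAMPLE)
```

A real page would keep the hlword markup to highlight the matched words; it is flattened here only to keep the example short.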
Request for limits for the next day
Returns information about restrictions on the number of queries that can be sent each hour.
The response contains information for each hour in the next 24 hours.
Note:
Hourly limits are only applied to the “russian” type of search.
Request format
http://xmlsearch.yandex.<domain>/xmlsearch ? action=limits-info
& user=<user name>
& key=<key>
user
User name. Must match the login for Yandex.Passport that was set during registration.
key
Value of the API key that was issued during registration.
Sample request
This request returns information about hourly limits restricting the number of search queries that can be sent by the “xmlsearch-user” user during the next 24 hours:
http://xmlsearch.yandex.ru/xmlsearch?action=limits-info&user=xml-searchuser&key=03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8
Response format
In response to a request for hourly limits, Yandex.XML returns an XML file in UTF-8 encoding.
Note:
• If the allowed number of queries in one of the hours is exceeded, the excess queries are subtracted from the same hour the next day. These excesses are calculated when generating the response.
• Hourly limits are only applied to the “russian” type of search. For the other types of search, the service returns information for each hour about the daily limits on the allowed number of queries.
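The subtraction rule from the note can be illustrated with a short calculation. This is only a sketch of the arithmetic: the function name is hypothetical, and flooring the result at zero is an assumption that the guide does not confirm.

```python
def next_day_allowance(hourly_limit, queries_sent):
    """Allowance for the same hour on the next day after subtracting any excess."""
    excess = max(0, queries_sent - hourly_limit)
    # Assumption: the allowance cannot become negative.
    return max(0, hourly_limit - excess)


over_limit = next_day_allowance(500, 530)
within_limit = next_day_allowance(500, 400)
```

With a limit of 500 queries, sending 530 in one hour leaves 470 for the same hour on the next day, while staying within the limit leaves the allowance unchanged.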
Below you will find the general structure of a resulting XML document with sample values.
<yandexsearch version="1.0">
<response>
<limits>
<time-interval from="2014-07-22 20:00:00 +0000" to="2014-07-22 21:00:00 +0000">500</time-interval>
<time-interval from="2014-07-22 21:00:00 +0000" to="2014-07-22 22:00:00 +0000">450</time-interval>
<time-interval from="2014-07-22 22:00:00 +0000" to="2014-07-22 23:00:00 +0000">590</time-interval>
<time-interval from="2014-07-22 23:00:00 +0000" to="2014-07-23 00:00:00 +0000">600</time-interval>
<time-interval from="2014-07-23 00:00:00 +0000" to="2014-07-23 01:00:00 +0000">300</time-interval>
<time-interval from="2014-07-23 01:00:00 +0000" to="2014-07-23 02:00:00 +0000">200</time-interval>
<time-interval from="2014-07-23 02:00:00 +0000" to="2014-07-23 03:00:00 +0000">500</time-interval>
<time-interval from="2014-07-23 03:00:00 +0000" to="2014-07-23 04:00:00 +0000">500</time-interval>
<time-interval from="2014-07-23 04:00:00 +0000" to="2014-07-23 05:00:00 +0000">500</time-interval>
<time-interval from="2014-07-23 05:00:00 +0000" to="2014-07-23 06:00:00 +0000">100</time-interval>
<time-interval from="2014-07-23 06:00:00 +0000" to="2014-07-23 07:00:00 +0000">100</time-interval>
<time-interval from="2014-07-23 07:00:00 +0000" to="2014-07-23 08:00:00 +0000">100</time-interval>
<time-interval from="2014-07-23 08:00:00 +0000" to="2014-07-23 09:00:00 +0000">100</time-interval>
<time-interval from="2014-07-23 09:00:00 +0000" to="2014-07-23 10:00:00 +0000">200</time-interval>
<time-interval from="2014-07-23 10:00:00 +0000" to="2014-07-23 11:00:00 +0000">300</time-interval>
<time-interval from="2014-07-23 11:00:00 +0000" to="2014-07-23 12:00:00 +0000">300</time-interval>
<time-interval from="2014-07-23 12:00:00 +0000" to="2014-07-23 13:00:00 +0000">300</time-interval>
<time-interval from="2014-07-23 13:00:00 +0000" to="2014-07-23 14:00:00 +0000">300</time-interval>
<time-interval from="2014-07-23 14:00:00 +0000" to="2014-07-23 15:00:00 +0000">300</time-interval>
<time-interval from="2014-07-23 15:00:00 +0000" to="2014-07-23 16:00:00 +0000">300</time-interval>
<time-interval from="2014-07-23 16:00:00 +0000" to="2014-07-23 17:00:00 +0000">400</time-interval>
<time-interval from="2014-07-23 17:00:00 +0000" to="2014-07-23 18:00:00 +0000">500</time-interval>
<time-interval from="2014-07-23 18:00:00 +0000" to="2014-07-23 19:00:00 +0000">500</time-interval>
<time-interval from="2014-07-23 19:00:00 +0000" to="2014-07-23 20:00:00 +0000">600</time-interval>
</limits>
</response>
</yandexsearch>
response
Description: Grouping.
Attributes: None.
limits
Description: Grouping. Contains entries about hourly limits on the allowed number of search queries.
Attributes: None.
time-interval
Description: The number of search queries that can be sent during the specified time interval. The borders of the time interval are defined by attributes.
Attributes:
• from: date and time (inclusive) of the start of the time interval the limit applies to.
• to: date and time (exclusive) of the end of the time interval the limit applies to.
Data format in attributes: YYYY-MM-DD HH:MM:SS +HHMM
“HHMM” specifies the offset relative to UTC0.
Attention!
At this time, information about hourly limits is output for UTC0.
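A possible way to read the limits response and look up the limit that applies at a given moment is sketched below. The two time-interval entries are copied from the sample response; read_limits and limit_at are hypothetical helper names. The lookup treats from as inclusive and to as exclusive, as described for the attributes.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%d %H:%M:%S %z"

SAMPLE = """<yandexsearch version="1.0">
<response>
<limits>
<time-interval from="2014-07-22 20:00:00 +0000" to="2014-07-22 21:00:00 +0000">500</time-interval>
<time-interval from="2014-07-22 21:00:00 +0000" to="2014-07-22 22:00:00 +0000">450</time-interval>
</limits>
</response>
</yandexsearch>"""


def read_limits(xml_text):
    """Return a list of (start, end, allowed_queries) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (datetime.strptime(item.get("from"), TIME_FORMAT),
         datetime.strptime(item.get("to"), TIME_FORMAT),
         int(item.text))
        for item in root.iter("time-interval")
    ]


def limit_at(limits, moment):
    """Find the limit whose interval contains the moment, or None."""
    for start, end, allowed in limits:
        if start <= moment < end:  # start inclusive, end exclusive
            return allowed
    return None


limits = read_limits(SAMPLE)
on_the_hour = limit_at(limits, datetime(2014, 7, 22, 21, 0, 0, tzinfo=timezone.utc))
just_before = limit_at(limits, datetime(2014, 7, 22, 20, 59, 59, tzinfo=timezone.utc))
outside = limit_at(limits, datetime(2014, 7, 22, 22, 0, 0, tzinfo=timezone.utc))
```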
Formatting results
When formatting search results, you must adhere to the rules described in the License for use of the Yandex.XML
service. The license differs for Russian, Turkish, and Worldwide search types.
Every page generated using Yandex.XML must contain:
• A link to the Yandex home page, formatted as a logo.
• Text with information about the number of documents found (“NNN pages found”). The information about the number of documents found is passed in the found-docs-human tag in the XML file with search results.
The links to logos that must be used depending on the background color, along with formatting examples, are listed below.
Black/dark
Logo: Download.
Formatting example: White font and a red letter “Y”. Transparent background.
Red
Logo: Download.
Formatting example: White font. Transparent background.
White/light
Logo: Download.
Formatting example: Black font and a red letter “Y”. Transparent background.
Protection from robots
Search queries can be submitted not only by users but also by robots. A flood of queries from robots may cause you
to exceed the limits that apply to your use of Yandex.XML.
To prevent unauthorized access to the search by robots, a security algorithm is used. If it is suspected that a query
was submitted by a robot, a CAPTCHA is returned instead of search results (see this Wikipedia article about
CAPTCHA).
To use the algorithm for protection from robots, the partner must pass information about the IP address and the
"spravka" cookie for the request's author. The "spravka" cookie is generated on the Yandex.XML side and is
returned the first time the user accesses search results. In the value that is received, the partner must replace
the domain with their own, and then add the following string to the search response:
Set-Cookie: spravka=...
The IP address and the "spravka" cookie are passed in the request headers in the following format:
X-Real-Ip: 99.999.999.99
Cookie: spravka=<value passed from Yandex>
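The sketch below shows how a partner might attach these two headers to an outgoing search request using Python's standard urllib.request module. The function name, the IP address, and the cookie value are placeholders chosen for illustration; the request object is only constructed here, not sent.

```python
from urllib.request import Request


def robot_protection_headers(user_ip, spravka=None):
    """Build the headers that carry the end user's IP address and cookie."""
    headers = {"X-Real-Ip": user_ip}
    if spravka:
        # The cookie is passed only if Yandex.XML issued one earlier.
        headers["Cookie"] = f"spravka={spravka}"
    return headers


first_visit = robot_protection_headers("192.0.2.10")
returning = robot_protection_headers("192.0.2.10", "sample-cookie-value")

# Attach the headers to a request object (placeholder URL, nothing is sent).
request = Request(
    "http://xmlsearch.yandex.ru/xmlsearch?user=xml-searchuser&query=test",
    headers=returning,
)
```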
The diagram below illustrates the steps performed for protection from robots.
1. The user sends a query to the Yandex.XML partner.
2. The search query is sent to the Yandex.XML service. The request must match the specified format.
3. Yandex.XML initiates the algorithm for protection from robots. The values of the IP address and "spravka"
cookie (if previously issued) are used for verification.
Possible results of verification:
• The request was probably not sent by a robot. The process continues to step 13.
• The request was probably sent by a robot. The decision is made to display a CAPTCHA.
4. Yandex.XML returns the partner an XML file in the following format:
<?xml version="1.0" encoding="utf-8"?>
<yandexsearch version="1.0">
<response>
<error code="100">Robot request</error>
</response>
<captcha-img-url>http://captcha.image.gif</captcha-img-url>
<captcha-key>CAPTCHA ID number</captcha-key>
<captcha-status>Status</captcha-status>
</yandexsearch>
5. The user is returned a page containing a CAPTCHA.
6. The user sends the CAPTCHA value to the partner.
7. The partner sends the CAPTCHA value obtained from the user via a GET request in the following format:
http://xmlsearch.yandex.ru/xcheckcaptcha?key=<CAPTCHA ID number>&rep=<CAPTCHA value entered by user>
8. The value received is checked by the Yandex.XML service. If the CAPTCHA value was entered incorrectly,
the process continues to step 4. In addition, the captcha-status parameter is passed with the value
“failed”.
9. If the CAPTCHA value was entered correctly, Yandex.XML issues the user a "spravka" cookie and passes
it to the partner in the header with the following format:
HTTP/1.1 200 OK
Set-Cookie: spravka=<cookie value>
If the request passed to Yandex.XML in step 1 was saved successfully, the process continues to step 12.
10. The partner lets the user enter a query.
11. The user sends a query to the Yandex.XML partner.
12. The search query is sent to the Yandex.XML service. Along with the request, the user's IP address
and "spravka" cookie are passed.
13. Yandex.XML processes the search query and generates results.
14. An XML file with search results is returned to the partner.
15. The partner returns the processed response to the user. If Yandex.XML issued a "spravka" cookie in step 9,
it is saved on the user's computer.
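Steps 4 and 7 of this flow can be sketched on the partner side as follows. The XML sample repeats the response from step 4 without the XML declaration; read_captcha and build_check_url are hypothetical helper names, and the answer passed in the usage lines is invented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

CAPTCHA_RESPONSE = """<yandexsearch version="1.0">
<response>
<error code="100">Robot request</error>
</response>
<captcha-img-url>http://captcha.image.gif</captcha-img-url>
<captcha-key>CAPTCHA ID number</captcha-key>
<captcha-status>Status</captcha-status>
</yandexsearch>"""


def read_captcha(xml_text):
    """Return CAPTCHA details if the response reports error 100, else None."""
    root = ET.fromstring(xml_text)
    error = root.find("response/error")
    if error is None or error.get("code") != "100":
        return None
    return {
        "image_url": root.findtext("captcha-img-url"),
        "key": root.findtext("captcha-key"),
        "status": root.findtext("captcha-status"),
    }


def build_check_url(captcha_key, answer):
    """Build the GET request that submits the user's CAPTCHA answer."""
    query = urlencode({"key": captcha_key, "rep": answer})
    return "http://xmlsearch.yandex.ru/xcheckcaptcha?" + query


captcha = read_captcha(CAPTCHA_RESPONSE)
check_url = build_check_url("abc123", "42 17")
```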
Tip:
To try out how this flow works, use this script.
Verifying correct CAPTCHA display
To get familiar with the response format returned by Yandex.XML when a CAPTCHA is displayed, send
a request (the value of the query parameter of the search request) with the following string:
“e48a2b93de1740f48f6de0d45dc4192a”.
The following GET request can be used by the user “xml-search-user” for reviewing the response format returned when
a CAPTCHA is displayed:
wget -q --header="X-Real-Ip: 127.0.0.1" -SO- 'http://xmlsearch.yandex.ru/xmlsearch?user=xml-searchuser&key=03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8&query=e48a2b93de1740f48f6de0d45dc4192a&showmecaptcha=yes'
Questions and answers
This section answers some common questions that Yandex.XML users ask. For convenience, questions
are grouped in categories:
• What is XSLT?
• Notifications.
• IP address.
• Additional search features.
• Encoding.
What is XSLT?
XSLT is a language for converting and rendering XML documents and is a part of the set of
XSL recommendations.
Detailed information about the XSLT language is provided in the following documents:
• Extensible Stylesheet Language (XSL).
• XSL Transformations (XSLT).
Notifications
What are notifications?
Notifications are a service for automatically sending email when problems arise during use of Yandex.XML.
The email address, settings for sending notifications, and thresholds are all set during registration.
What should I do if I get a notification that the number of requests has sharply decreased?
The possible reasons for a decreased number of requests, how to diagnose them, and recommended solutions
are listed below.
Reason: Decreased number of searches performed. For example, this may occur due to natural variation in the number of visitors depending on the day of the week or the time of day.
Diagnostic methods: Review the site usage statistics for days of the week and time of day.
Recommended solutions: Increase the notification threshold on the Settings page.
Reason: Unavailability or partial availability of Yandex.XML on the web site.
Diagnostic methods: Try submitting several search queries yourself. Check the accuracy of the results that are returned.
Recommended solutions: Check whether the request format is correct.
What should I do if I get a notification that the number of requests has sharply increased?
The possible reasons for an increased number of queries, how to diagnose them, and recommended solutions
are listed below.
Reason: Increased number of searches performed. For example, this may occur due to natural variation in the number of visitors depending on the day of the week or the time of day.
Diagnostic methods: Review the site usage statistics for days of the week and time of day.
Recommended solutions: Increase the notification threshold on the Settings page.
Reason: DoS attack.
Diagnostic methods: Check the server log files for data suggesting a DoS attack.
What should I do if I get a notification that there were no requests for a 24-hour period?
Check how the search is working on the site.
If statistics show a sharp decrease in the number of queries made from the site, it is possible that this is due to the
search not working correctly.
What should I do if I get a notification that the number of requests is approaching the daily limit?
Review the restrictions applied to the service and ways to get around them. Contact a Yandex representative
to discuss details of expanding search features.
IP address
Why is an IP address required for registration?
The IP address, in combination with a Yandex.Passport account, is used for identifying a Yandex.XML user.
The results of user identification determine the restrictions applied to service usage.
How do I find out what my IP address is?
The way to determine the IP address depends on the type of computer being used to access the Yandex.XML
service.
Device type: Server
Possible methods for detecting the IP address:
• Ask your provider for the IP address.
• Set up a remote connection to the server and run the ipconfig command (Windows) or ifconfig (Unix).
• Run the ping <server name> command from the command line of a personal computer.
Device type: Personal computer
Possible methods for detecting the IP address:
• Use the Yandex.Internetometer service.
• If static addresses are used, ask your service provider.
Note:
If a modem is used, the IP address can change each time a connection is made.
The IP address being registered is in use
The table below shows possible reasons and solutions.
Reason: An open proxy server is being used to access Yandex.XML.
Possible solution: Use your Internet provider's proxy server.
Reason: A modem is being used to access the Internet.
Possible solution: Your provider assigns a dynamic IP address, which can change each time you connect. Try disconnecting and reconnecting to the Internet.
Reason: The service is being accessed from a server.
Possible solution: Obtain a dedicated IP address.
Additional search features
Setting up site search
To restrict the search to the web site only, use the host operator.
Syntax:
<query text> host:<URL of the site to search on>
The following request is used for searching for the phrase “search settings” on the web site http://help.yandex.ru/:
search settings host:help.yandex.ru
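A small sketch of composing such a query text in Python is shown below. The helper name site_query is hypothetical; it accepts either a bare host name or a full URL and keeps only the host part, which is an illustrative convenience rather than documented behavior.

```python
from urllib.parse import urlparse


def site_query(text, site):
    """Append the host operator so that the search is limited to one site."""
    # urlparse() finds a host name only when the value includes a scheme,
    # so a bare host name is used as given.
    host = urlparse(site).hostname or site
    return f"{text} host:{host}"


from_url = site_query("search settings", "http://help.yandex.ru/")
from_host = site_query("search settings", "help.yandex.ru")
```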
Restricting the search to a region or category
To restrict the search to documents that are relevant to a particular region or category, use the cat operator.
Syntax:
<query text> cat:<adjusted ID of the region or category>
For the value of the cat operator, pass the adjusted value of the region ID (added to “11000000”) or category
ID (added to “9000000”).
The request may specify multiple regions and categories. To do this, use the logical operators “AND”
(“&amp;&amp;”) and “OR” (“|”).
The following request is used for searching for the word “meat” in documents that are relevant to the category “bodybuilding
nutrition” (ID “3783”) in the city of “Samara” (ID “51”):
meat cat:11000051 &amp;&amp; cat:9003783
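The adjusted identifiers in this example can be reproduced with simple addition. In the sketch below the two offsets come from the text above, while the function names region_cat and category_cat are hypothetical.

```python
REGION_OFFSET = 11000000
CATEGORY_OFFSET = 9000000


def region_cat(region_id):
    """Return the cat operator for a region ID."""
    return f"cat:{REGION_OFFSET + region_id}"


def category_cat(category_id):
    """Return the cat operator for a category ID."""
    return f"cat:{CATEGORY_OFFSET + category_id}"


samara = region_cat(51)          # the city of Samara, ID 51
nutrition = category_cat(3783)   # bodybuilding nutrition, ID 3783
```

These two values are the ones that appear in the sample query above: cat:11000051 for the region and cat:9003783 for the category.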
Search in results
To set up a search in results, use the &amp;&amp; operator.
Syntax:
(<original query text>) &amp;&amp; (<query text to search for in results>)
The following request is used to search for documents with the phrase “manual transmission” in results for the query
“autos”:
(autos) &amp;&amp; (manual transmission)
Encoding
How to correctly set encoding for a request being sent?
The request encoding is set in the header of the XML file:
<?xml version="1.0" encoding="<encoding>"?>
Which encoding is used for sending the search response?
The XML file with search results is sent in UTF-8 encoding. To convert it to a different encoding, you can use a
library such as the libiconv library or the Convert::Cyrillic module.
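As an alternative to the libraries named above, the conversion can also be done with the codecs built into Python. The sketch below is an assumption-laden illustration: the function name reencode and the choice of windows-1251 as the target are made up, and the encoding named in the XML declaration is not rewritten.

```python
def reencode(xml_bytes, target="windows-1251"):
    """Convert a UTF-8 response body to another encoding."""
    text = xml_bytes.decode("utf-8")
    # Characters missing from the target encoding become numeric
    # character references instead of raising an error.
    return text.encode(target, errors="xmlcharrefreplace")


converted = reencode("<title>Яндекс</title>".encode("utf-8"))
fallback = reencode("<title>日</title>".encode("utf-8"))
```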
Incorrect characters in the response
In most cases, incorrect characters in the response are the result of sending the request at the socket level.
Possible solutions:
• Use HTTP version 1.0 instead of 1.1.
• Use a higher-level type of interface.
• Configure handling for chunked responses.
Appendices
Validating XML files
To prevent incorrect processing of search queries, at the testing stage we strongly recommend validating
XML files that are generated for requests using the POST method.
You can validate XML files in the Yandex.Webmaster service. For validation, the XML request schema is used.
To validate a file, follow these steps:
1. Open the XML-feed validator page.
2. Select other schemas → link in the Standard validation schema group.
3. Set the value “http://api.yandex.com/xml/doc/dg/res/request.en.xs” in the Specify the link to your
XSD schema box.
4. Set one of the ways to pass the contents of the XML document in the Feed for validation box.
5. Click the Check button.
If the XML file complies with the schema, the message “XML complies with the XSD schema” is returned.
If inconsistencies are discovered, the validator returns information about the line where you should look for an error.
Error codes
When search requests are processed incorrectly, the server response contains the error tag.
Format:
<error code="error code">Error description text</error>
The table below lists codes and descriptions for common errors that occur when processing search requests.
Error code
Description
1
The query text (the value passed in the query element) contains a syntactical error.
For example, a query was sent that contained only two slash symbols in a row (“//”).
2
An empty search query was defined (an empty value was passed in the query element).
15
There are no search results for the specified search query.
18
The XML file cannot be validated, or invalid request parameters are set. Possible reasons:
• Incorrect tags or tag values were passed.
• The request body contains non-escaped special characters, for example the ampersand symbol (“&”).
• The requested page lies beyond the first 1000 search results. For example, if each page contains 10 results, this error is returned when attempting to request page 101 or any later page.
19
The search query contains incompatible parameters (for example, incompatible values for the
groupings element).
20
The reason for the error is unknown. If the error persists, contact the support service.
31
The user is not registered on the service.
32
Limit exceeded for the number of queries allowed per day. Review the information about
restrictions and choose a suitable method for increasing your daily quota.
33
The IP address that the search request was sent from does not match the one(s) set during
registration.
34
The user is not registered in Yandex.Passport.
37
Error in request parameters. Mandatory parameters may have been omitted, or mutually exclusive
parameters may have been defined.
42
The key that was issued during registration contains an error. Check whether the correct address
is used for sending requests.
43
The version of the key that was issued during registration contains an error. Check whether
the correct address is used for sending requests.
44
The address that requests are sent to is no longer supported. Correct the value to match
the address that was given during registration.
48
The search type that was specified during registration does not match the search type that is being
used for requesting data. Reset the domain that is being used to the correct domain. For corrections,
use the URL for sending requests.
100
The request was most likely sent by a robot. When this error appears, a CAPTCHA must be returned
to the user.
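One way a partner application might act on these codes is sketched below. The grouping of codes into categories is an illustrative assumption based on the descriptions in the table, not something the service defines, and classify_error is a hypothetical name.

```python
# Assumed grouping of the documented codes by the kind of reaction needed.
QUERY_PROBLEMS = {1, 2, 18, 19, 37}
ACCOUNT_PROBLEMS = {31, 33, 34, 42, 43, 44, 48}


def classify_error(code):
    """Map an error code to a coarse category for the calling application."""
    if code == 15:
        return "no_results"        # not a failure: show an empty result page
    if code == 100:
        return "captcha"           # a CAPTCHA must be returned to the user
    if code == 32:
        return "quota"             # daily limit exceeded
    if code in QUERY_PROBLEMS:
        return "bad_request"
    if code in ACCOUNT_PROBLEMS:
        return "configuration"
    return "unknown"               # includes code 20 and undocumented codes


categories = {code: classify_error(code) for code in (2, 15, 20, 32, 33, 100)}
```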
Search regions
The region to give preference to when generating search results is defined by the value of the lr parameter
of the search query. Countries, federal subjects, and cities can be specified as the region.
A list of IDs for commonly used countries is provided in the table below.
ID     Country
225    Russia
187    Ukraine
149    Belarus
159    Kazakhstan
See also
Other popular regions