Yandex.XML Developer's guide 8.10.2014
Yandex.XML. Developer's guide. Version 1.2
Document build date: 8.10.2014.

This volume is a part of Yandex technical documentation.
Yandex helpdesk site: http://help.yandex.ru
© 2008-2014 Yandex LLC. All rights reserved.

Copyright Disclaimer

Yandex (and its applicable licensor) has exclusive rights to all results of intellectual activity, and the equivalent means of individualization, used for the development, support, and usage of the Yandex.XML service. These may include, but are not limited to, computer programs (software), databases, images, texts, other works and inventions, utility models, trademarks, service marks, and commercial denominations. The copyright is protected under the provisions of Part 4 of the Russian Civil Code and international laws. You may use Yandex.XML or its components only within the limits granted by the Terms of Use of Yandex.XML or by an appropriate Agreement. Any infringement of the exclusive rights of the copyright owner is punishable under civil, administrative, or criminal Russian laws.

Contact information

Yandex LLC
http://www.yandex.com
Phone: +7 495 739 7000
Email: [email protected]
Headquarters: 16 L'va Tolstogo St., Moscow, Russia 119021

Contents

Overview
Restrictions and requirements
Getting started
Registration
Request for search results
    GET requests
    POST requests
    Response format
        request
        response
Request for limits for the next day
    Response format
Formatting results
Protection from robots
Questions and answers
    What is XSLT?
    Notifications
    IP address
    Additional search features
    Encoding
Appendices
    Validating XML files
    Error codes
    Search regions

Overview

Yandex.XML is a service that lets you send queries to the Yandex search engine and get responses in XML format.

This document covers restrictions and requirements for using the service, basic steps for getting started and registering, formats of search queries and responses, and answers to common questions.

The document is intended for developers who need to set up a search across a web site, a group of sites, or the Internet.

Restrictions and requirements

Yandex.XML provides access to the Russian, Turkish, and Worldwide types of search. The desired search type is selected during registration. The search type determines the ranking formula, the set of documents that are searched (the search base), and the restrictions that are applied to usage of Yandex.XML.

The following types of restrictions are applied:
• A limit on the number of IP addresses that are associated with the account (by default, one).
• Daily limits on the number of search queries sent. If the IP address changes, the limit applies to the total number of queries sent from all network addresses.

The following table shows how restrictions depend on the search type and other conditions.

Condition: Telephone number is not confirmed.
    • “Russian” search type: Restrictions cannot be changed.
    • “Turkish” search type: 10 search queries per day.
    • “Worldwide” search type: 10 search queries per day.

Condition: Telephone number confirmed.
    • “Russian” search type: Restrictions cannot be changed.
    • “Turkish” search type: 10,000 search queries per day.
    • “Worldwide” search type: 10,000 search queries per day.
    Restriction: One telephone number may be confirmed no more than once, and only for a single account.

Condition: The web site is registered in Yandex.Webmaster.
    • “Russian” search type: The number of search queries allowed is determined individually for each user. Restrictions depend on the sites registered in Yandex.Webmaster. Hourly restrictions are also applied.
    • “Turkish” search type: Restrictions cannot be changed.
    • “Worldwide” search type: Restrictions cannot be changed.

License agreement
    • “Russian” search type: http://legal.yandex.ru/xml
    • “Turkish” search type: http://legal.yandex.com.tr/xml
    • “Worldwide” search type: http://legal.yandex.com/xml

Changing restrictions (increasing the maximum number of queries per day and IP addresses allowed)
    • “Russian” search type: Become a partner of the Yandex Advertising Network.
    • “Turkish” search type: Contact a Yandex representative and discuss how to expand your use of Yandex.XML features.
    • “Worldwide” search type: Contact a Yandex representative and discuss how to expand your use of Yandex.XML features.

To get information about additional features of Yandex.XML and how to get access to them, contact a Yandex representative.

For each search query, no more than 1000 results are returned.

When using the service, follow the requirements for formatting results and the recommendations for protection from robots.

Hourly limits for the “Russian” search type

For the “Russian” search type, additional hourly limits may be imposed that are calculated as percentages of the daily query limit. After registration, information about hourly limits is available on the page Information on restrictions.

For example, suppose the daily limit on the number of queries for a site is 1000. During each hour in the period from 7:00 to 19:00, no more than 5% of this limit (50 queries) can be sent. Even if no search queries were sent from the account in the period from 0:00 to 7:00, no more than 50 queries can be sent during each hour from 7:00 to 19:00. In total, no more than 600 queries (12 hours × 50 queries) can be sent over this period.

Getting started

To set up and start using Yandex.XML, follow these steps:
1. Register the IP address that you plan to send search requests from.
2. Send a test request. Make sure that requests are sent successfully from the specified IP address:
   • Send a request from the service's interface.
     The interface should be accessed from the computer that is assigned the IP address specified during registration.
   • Form a GET request and send it from the computer that is assigned the IP address specified during registration. For example, if the URL for queries field displayed the string “http://xmlsearch.yandex.ru/xmlsearch?user=test-yandex&key=09.31114:e650g7j” during registration, you would use the following GET request:
     http://xmlsearch.yandex.ru/xmlsearch?user=test-yandex&key=09.31114:e650g7j&query=yandex
3. Check the received XML document. The response should correspond to the specified format and should not contain errors.
   Note: If there are no results for the search string, an error with the code “15” is acceptable.
4. Only for the “Russian” search type: register your web sites in the Yandex.Webmaster service. After registration, individual restrictions are determined for the current user.
5. Only for the “Russian” search type: review the daily and hourly restrictions on the Information about restrictions page.
   Yandex.XML can only be used on sites for which the current user is the main owner in the Yandex.Webmaster service. If necessary, ask the site owner to assign you the appropriate role.
6. Configure request parameters. The GET and POST methods are supported.
7. Review the response format.
8. Set up response handling.
   When formatting search results, you must comply with the design requirements.
9. If necessary, request information about hourly restrictions for the next 24 hours.
10. Optionally, set up protection from robots.

Registration

To register on the Yandex.XML service, follow these steps:
1. Open the registration page (http://xml.yandex.ru/settings.xml). This requires authentication in Yandex.Passport. If necessary, register there first.
2. Look at the value of the URL for queries field:
   • For GET requests, this is the base part of the address that request parameters are appended to.
   • For POST requests, this is the URL to send the request body to.
3. Fill in the fields on the form:

   Main IP-address
       The unique network address of the computer that will be sending search queries. To set the IP address of the computer you are using to register, use the value of the Your current IP-address is field.
   Search type
       The selected value determines the set of documents that are searched (the search base), the ranking formula, and usage restrictions.
   Email notifications
       The email address to send notifications to.
   List of events
       Choose events that notifications should be sent for.
   Notification language
       The language to use for delivering messages about selected events.

4. Review the terms of the license agreement. The terms depend on the search type you selected.
5. Confirm your agreement (select the box I accept the terms of License Agreement).
6. Save the information you have entered (the Save button is enabled when the Email notifications field has been filled in and the I accept the terms of License Agreement box has been selected).

If necessary, registration data can be edited on the Settings page.

Request for search results

Yandex.XML supports two ways of sending a search request: GET and POST. The response format is the same for both methods.

Attention! To use algorithms for protection from robots, the request must pass information about the IP address and the "spravka" cookie for the query author.

GET requests

Attention! Special characters that are passed as parameter values must be replaced with the appropriate escape sequences for percent-encoding. For example, instead of the equal sign (“=”), the escape sequence “%3D” must be used.

Request format

    http://xmlsearch.yandex.<domain>/xmlsearch
      ? user=<user name>
      & key=<API key>
      & query=<search query text>
      & [lr=<ID of the search country/region>]
      & [l10n=<notification language>]
      & [sortby=<type of sorting>]
      & [filter=<filter type>]
      & [maxpassages=<number of passages>]
      & [groupby=<parameters for grouping results>]
      & [page=<page number>]
      & [showmecaptcha=<yes>]

user
    User name. Must match the login for Yandex.Passport that was set during registration.

key
    Value of the API key that was issued during registration.

query
    Text of the search query. Instead of special symbols, the corresponding escape sequences must be used.
    The query has the following restrictions: maximum query length is 400 characters; maximum number of words is 40.

lr
    Supported only for the “Russian” and “Turkish” search types.
    ID of the country or region to search. Determines the rules for ranking documents. For example, if the value “11316” (Novosibirsk region) is passed in this parameter, the formula defined for the Novosibirsk region is used when generating search results.
    A list of IDs of common countries and regions is provided in the appendix.

l10n
    The notification language for the search response. It affects the text that is passed in the found-docs-human tag, as well as in error messages.
    Acceptable values depend on the type of search used:
    • “Russian (yandex.ru)”: “ru” (Russian), “uk” (Ukrainian), “be” (Belarusian), “kk” (Kazakh). If omitted, notifications are sent in Russian.
    • “Turkish (yandex.com.tr)”: supports only the value “tr” (Turkish).
    • “Worldwide (yandex.com)”: supports only the value “en” (English).

sortby
    Rules for sorting search results. Possible values:
    • “rlv”: by relevancy.
    • “tm”: by time when the document was changed.
    If omitted, results are sorted by relevancy.
    When sorting by change time, the parameter may contain the order attribute, which is the order for sorting documents. Possible values:
    • “descending”: forward (from most recent to oldest). Used by default.
    • “ascending”: reverse (from oldest to most recent).
    Format: sortby=<sorting type>.order%3D<sorting order>. For example, for reverse sorting by date, use the following construction: sortby=tm.order%3Dascending.

filter
    Rules for filtering search results (excluding documents from search results based on one of the rules). Possible values:
    • “none”: filtering is disabled. The output includes any documents, regardless of their content.
    • “moderate”: moderate filtering. The output excludes documents that fall into the “adults only” category, if the search is not explicitly directed at finding these types of resources.
    • “strict”: family filter. Regardless of the search query, the output excludes documents that fall into the “adults only” category, as well as those that contain foul language.
    If the parameter is omitted, moderate filtering is used.

maxpassages
    The maximum number of passages that can be used when creating a snippet for the document. A passage is an excerpt from a found document that contains the query words. Passages are used for creating snippets, which are textual annotations to found documents.
    Acceptable values: from 1 to 5. The search result may contain fewer passages than the value set for this parameter.
    If the parameter is omitted, no more than four passages with the query text are returned for each document.

groupby
    Set of parameters that define the rules for grouping results. Grouping is used to put documents from the same domain in a container. Within the container, documents are ranked using the sorting rules defined in the sortby parameter. Results passed to the container can be used for including several documents from the same domain in search output.
    Parameters are separated by periods and set in the format:
    attr%3D<utility attribute>.mode%3D<grouping type>.groups-on-page%3D<number of groups per page>.docs-in-group%3D<number of documents per group>
    You can find a description of the parameters mode, attr, groups-on-page and docs-in-group in the section POST requests.

page
    Number of the requested page in the search output. This determines the range of document positions returned for the request. Numbering starts from zero (the first page corresponds to the value “0”).
    For example, if the number of documents returned on a page is equal to “n”, and the value “p” is passed in the parameter, the search results will include documents that fall within the range of output positions from p*n+1 to p*n+n inclusively. With n = 10 and p = 1, these are positions 11 through 20.
    If the parameter is omitted, the first page of search output is returned.

showmecaptcha
    Initiates user verification for possible protection from robots. The only value used is “yes”.

Sample GET request

The following request returns the second page of search results for the query “<table>” for the user “xml-search-user”. Search type: Russian (yandex.ru). Results are grouped by domain. Each group contains three documents, and five groups can be returned per page.

    http://xmlsearch.yandex.ru/xmlsearch?user=xml-search-user&key=03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8&query=%3Ctable%3E&groupby=attr%3Dd.mode%3Ddeep.groups-on-page%3D5.docs-in-group%3D3&maxpassages=3&page=1

POST requests

Attention! Special characters that are passed as parameter values in the request body must be replaced with the appropriate escape sequences for XML-encoding. For example, instead of the ampersand sign (“&”), the escape sequence “&amp;” must be used.

Request URL

    http://xmlsearch.yandex.<domain>/xmlsearch
      ? user=<user name>
      & key=<API key>
      & filter=<filter type>
      & [lr=<search region ID>]
      & [l10n=<notification language>]
      & [showmecaptcha=<yes>]

user
    User name.
    Must match the login for Yandex.Passport that was set during registration.

key
    Value of the API key that was issued during registration.

filter
    Rules for filtering search results (excluding documents from search results based on one of the rules). Possible values:
    • “none”: filtering is disabled. The output includes any documents, regardless of their content.
    • “moderate”: moderate filtering. The output excludes documents that fall into the “adults only” category, if the search is not explicitly directed at finding these types of resources.
    • “strict”: family filter. Regardless of the search query, the output excludes documents that fall into the “adults only” category, as well as those that contain foul language.
    If the parameter is omitted, moderate filtering is used.

lr
    Supported only for the “Russian” and “Turkish” search types.
    ID of the country or region to search. Determines the rules for ranking documents. For example, if the value “11316” (Novosibirsk region) is passed in this parameter, the formula defined for the Novosibirsk region is used when generating search results.
    A list of IDs of common countries and regions is provided in the appendix.

l10n
    The notification language for the search response. It affects the text that is passed in the found-docs-human tag, as well as in error messages.
    Acceptable values depend on the type of search used:
    • “Russian (yandex.ru)”: “ru” (Russian), “uk” (Ukrainian), “be” (Belarusian), “kk” (Kazakh). If omitted, notifications are sent in Russian.
    • “Turkish (yandex.com.tr)”: supports only the value “tr” (Turkish).
    • “Worldwide (yandex.com)”: supports only the value “en” (English).

showmecaptcha
    Initiates user verification for possible protection from robots. The only value used is “yes”.
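The percent-encoding rule for GET requests can be sketched in Python. This is a minimal illustration rather than an official client. The helper names build_search_url and groupby_value are hypothetical, and leaving the colon in the API key unescaped is an assumption based on how the key appears in the sample GET request.

```python
from urllib.parse import quote, urlencode


def build_search_url(domain, user, key, query, **optional):
    """Build a Yandex.XML GET request URL with percent-encoded values.

    `optional` holds the bracketed parameters from the request format
    (lr, l10n, sortby, filter, maxpassages, groupby, page, showmecaptcha).
    """
    params = {"user": user, "key": key, "query": query}
    params.update(optional)
    # quote_via=quote encodes spaces as %20 rather than "+".
    # safe=":" is an assumption: it keeps the colon in the API key
    # unescaped, as it is written in the guide's sample request.
    encoded = urlencode(params, safe=":", quote_via=quote)
    return f"http://xmlsearch.yandex.{domain}/xmlsearch?{encoded}"


def groupby_value(attr, mode, groups_on_page, docs_in_group):
    """Join grouping settings into the groupby value (before encoding)."""
    return (f"attr={attr}.mode={mode}"
            f".groups-on-page={groups_on_page}.docs-in-group={docs_in_group}")


# Rebuild the sample GET request: second page for the query "<table>".
url = build_search_url(
    "ru",
    "xml-search-user",
    "03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8",
    "<table>",
    groupby=groupby_value("d", "deep", 5, 3),
    maxpassages=3,
    page=1,
)
print(url)
```

Encoding the whole groupby value in one pass turns each “=” into “%3D”, which is the form the request format asks for, while periods and hyphens pass through unchanged.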

Request body format

    <?xml version="1.0" encoding="XML file encoding"?>
    <request> <!--Grouping tag-->
        <query> <!--Search query text--> </query>
        <sortby> <!--Type of sorting for search results--> </sortby>
        <groupings> <!--Grouping parameters in child tags-->
            <groupby attr="d" mode="deep" groups-on-page="10" docs-in-group="1" />
        </groupings>
        <page> <!--Number of the requested page in search results--> </page>
    </request>

request
    Grouping tag. Child tags contain parameters of the search query.

query
    Text of the search query. Instead of special symbols, the corresponding escape sequences must be used.
    The query has the following restrictions: maximum query length is 400 characters; maximum number of words is 40.

sortby
    Rules for sorting search results. Possible values:
    • “rlv”: by relevancy.
    • “tm”: by time when the document was changed.
    If omitted, results are sorted by relevancy.
    When sorting by change time, the parameter may contain the order attribute, which is the order for sorting documents. Possible values:
    • “descending”: forward (from most recent to oldest). Used by default.
    • “ascending”: reverse (from oldest to most recent).

maxpassages
    The maximum number of passages that can be used when creating a snippet for the document. A passage is an excerpt from a found document that contains the query words. Passages are used for creating snippets, which are textual annotations to found documents.
    Acceptable values: from 1 to 5. The search result may contain fewer passages than the value set for this parameter.
    If the parameter is omitted, no more than four passages with the query text are returned for each document.

page
    Number of the requested page in the search output. This determines the range of document positions returned for the request. Numbering starts from zero (the first page corresponds to the value “0”).
    For example, if the number of documents returned on a page is equal to “n”, and the value “p” is passed in the parameter, the search results will include documents that fall within the range of output positions from p*n+1 to p*n+n inclusively. With n = 10 and p = 1, these are positions 11 through 20.
    If the parameter is omitted, the first page of search output is returned.

groupings
    Grouping tag. The child tag contains parameters for grouping results.

groupby
    Set of parameters that define the rules for grouping results. Grouping is used to put documents from the same domain in a container. Within the container, documents are ranked using the sorting rules defined in the sortby parameter. Results passed to the container can be used for including several documents from the same domain in search output.
    Contains the following attributes:
    • mode: grouping method. Possible values:
        • “flat”: flat grouping. Each group contains a single document. Passed with an empty value for the attr parameter (“" "”).
        • “deep”: grouping by domain. Each group contains documents from a single domain. Passed with the value “d” for the attr parameter.
      If the parameter is not defined, flat grouping is used.
    • attr: utility attribute. Depends on the value of the mode attribute.
    • groups-on-page: maximum number of groups that can be returned per page of search results. Acceptable values: from 1 to 100.
    • docs-in-group: maximum number of documents that can be returned per group. Acceptable values: from 1 to 3.

Tip: If necessary, use the XML feed validator in the Yandex.Webmaster service. Detailed information about validation is provided in the appendix.

Sample POST request

The request and request URL shown below return the third page of search results for the query “<table>” for the user “xml-search-user”. Results are sorted by time when the document was changed. Search type: Russian (yandex.ru). Results are grouped by domain.
Each group contains three documents, and ten groups can be returned per page. The maximum number of passages per document is two. The service returns an XML file in UTF-8 encoding.

Request URL:

    http://xmlsearch.yandex.ru/xmlsearch?user=xml-search-user&key=03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8

Request body:

    <?xml version="1.0" encoding="UTF-8"?>
    <request>
        <query>%3Ctable%3E</query>
        <sortby>tm</sortby>
        <maxpassages>2</maxpassages>
        <page>2</page>
        <groupings>
            <groupby attr="d" mode="deep" groups-on-page="10" docs-in-group="3" />
        </groupings>
    </request>

Response format

In response to the search request, Yandex.XML returns an XML file in UTF-8 encoding that contains the search results.

Restriction: No more than 1000 results are returned for each search query. Depending on the value of the docs-in-group attribute, each result may contain from one to three documents.

The maximum number of pages with search results is determined by the number of document groups returned on each page (the value of the groups-on-page attribute). For example, if the groups-on-page attribute is passed with the value “10”, no more than 100 pages containing search results can be made.

Files consist of the grouping tags request (general information about query parameters) and response (results of processing the search query).

Below you will find the general structure of a resulting XML document with sample values.

Attention! This structure is for illustrative purposes. It contains mutually exclusive elements.

    <?xml version="1.0" encoding="utf-8"?>
    <yandexsearch version="1.0">
      <request>
        <query>yandex</query>
        <page>0</page>
        <sortby order="descending" priority="no">rlv</sortby>
        <maxpassages>2</maxpassages>
        <groupings>
          <groupby attr="d" mode="deep" groups-on-page="10" docs-in-group="3" curcateg="-1" />
        </groupings>
      </request>
      <response date="20120928T103130">
        <error code="15">Sorry, there are no results for this search</error>
        <reqid>1348828873568466-1289158387737177180255457-3-011-XML</reqid>
        <found priority="phrase">206775197</found>
        <found priority="strict">206775197</found>
        <found priority="all">206775197</found>
        <found-human>207 million pages found</found-human>
        <misspell>
          <rule>Misspell</rule>
          <source-text>yande<hlword>xx</hlword></source-text>
          <text>yandex</text>
        </misspell>
        <reask>
          <rule>Misspell</rule>
          <source-text><hlword>yn</hlword>dex</source-text>
          <text-to-show>yandex</text-to-show>
          <text>yandex</text>
        </reask>
        <results>
          <grouping attr="d" mode="deep" groups-on-page="10" docs-in-group="3" curcateg="-1">
            <found priority="phrase">45094</found>
            <found priority="strict">45094</found>
            <found priority="all">45094</found>
            <found-docs priority="phrase">192685602</found-docs>
            <found-docs priority="strict">192685602</found-docs>
            <found-docs priority="all">192685602</found-docs>
            <found-docs-human>193 million pages found</found-docs-human>
            <page first="1" last="10">0</page>
            <group>
              <categ attr="d" name="UngroupVital223.ru" />
              <doccount>34</doccount>
              <relevance priority="all" />
              <doc id="ZD831E1113BCFDD95">
                <relevance priority="phrase" />
                <url>http://www.yandex.ru/</url>
                <domain>www.yandex.ru</domain>
                <title>"<hlword>Yandex</hlword>" is a global search engine and internet portal</title>
                <headline>Search the entire internet based on the user's region.</headline>
                <modtime>20060814T040000</modtime>
                <size>26938</size>
                <charset>utf-8</charset>
                <passages>
                  <passage><hlword>Yandex</hlword> - a search engine...</passage>
                </passages>
                <properties>
                  <_PassagesType>0</_PassagesType>
                  <lang>ru</lang>
                </properties>
                <mime-type>text/html</mime-type>
                <saved-copy-url>http://hghltd.yandex.net/yandbtm?text=yandex&amp;url=http%3A%2F%2Fwww.yandex.ru%2F&amp;fmode=inject&amp;mime=html&amp;l10n=ru&amp;sign=e3737561fc3d1105967d1ce619dbd3c7&amp;keyno=0</saved-copy-url>
              </doc>
            </group>
          </grouping>
        </results>
      </response>
    </yandexsearch>

request

Generalized information about request parameters. May be omitted if the response contains errors.

The request tags are described in the table below.

query
    Description: Text of the search query that was passed.
    Attributes: None.

page
    Description: Number of the search results page returned. Numbering starts from zero (the first page corresponds to the value “0”).
    Attributes: None.

sortby
    Description: Parameters for sorting results. Possible values:
    • “rlv”: by relevancy.
    • “tm”: by time when the document was changed.
    Attributes:
    • order: sorting order. The “descending” value (forward) is used by default. When sorting by change time, it can take the value “ascending” (reverse).
    • priority: for service use. Takes the value "no".

maxpassages
    Description: Maximum number of passages that can be passed in a single search result.
    Attributes: None.

groupings
    Description: Grouping. Contains grouping parameters in the groupby tag.
    Attributes: None.

groupby
    Description: Grouping parameters for found search results.
    Attributes:
    • mode: grouping method.
    • attr: for service use.
    • groups-on-page: maximum number of groups that can be returned per page of search results.
    • docs-in-group: maximum number of documents that can be returned per group. Any group may contain fewer documents than the value set in this parameter.
    • curcateg: for service use. Takes the value “-1”.

The following example shows the contents of the request grouping tag that are returned for the request:

    http://xmlsearch.yandex.com.tr/xmlsearch?lr=983&l10n=tr&user=xml-search-user&key=03.79031114:b631r9j587dkl4jko987hgg7bn2kl8a2&query=%22has%20sample%20applications%20for%20the%20most%20popular%20programming%22&sortby=tm&maxpassages=2&groupby=attr%3Dd.mode%3Ddeep.groups-on-page%3D5.docs-in-group%3D3&page=1

    <request>
        <query>"has sample applications for the most popular programming"</query>
        <page>1</page>
        <sortby order="descending" priority="no">tm</sortby>
        <maxpassages>2</maxpassages>
        <groupings>
            <groupby attr="d" mode="deep" groups-on-page="5" docs-in-group="3" curcateg="-1" />
        </groupings>
    </request>

response

Results of processing the search query for which information is provided in the request child tags.

Contains the date attribute: the request date and time in UTC, in the format <year><month><day>T<hour><minute><second>.

Consists of the following sections:
• General information about search results.
• The misspell / reask block.
• The results block.

General information about search results

The tags for the block with general information about search results are shown in the table below.

error
    Description: Error description. Present only when the search request is processed incorrectly (for example, for an empty request, incorrect parameters, etc.). In certain cases, it is mutually exclusive of other tags in the response grouping tag.
    Attributes: code: error code.

reqid
    Description: Unique request ID.
    Attributes: None.

found
    Description: Approximation of the number of documents found for the query.
    Attributes: priority: for service use. Possible values:
    • “phrase”
    • “strict”
    • “all”

found-human
    Description: A string in the language corresponding to the search type selected. Contains information about the number of documents found and accompanying information.
    Attributes: None.
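
As an illustration of how the general-information tags might be read, here is a minimal Python sketch based on the standard xml.etree.ElementTree module. The function name summarize_response and the shape of the returned dictionary are hypothetical choices made for this example. The sample values are taken from the illustrative structure shown earlier.

```python
import xml.etree.ElementTree as ET


def summarize_response(xml_data):
    """Extract the general-information tags from a Yandex.XML response."""
    root = ET.fromstring(xml_data)
    response = root.find("response")
    if response is None:
        raise ValueError("no <response> element in the document")
    error = response.find("error")
    return {
        "date": response.get("date"),
        # <error> is present only when the request could not be processed.
        "error_code": error.get("code") if error is not None else None,
        "error_text": error.text if error is not None else None,
        "reqid": response.findtext("reqid"),
        # findall() looks at direct children only, so the <found> tags
        # nested inside <grouping> are not mixed in here.
        "found": {f.get("priority"): int(f.text)
                  for f in response.findall("found")},
        "found_human": response.findtext("found-human"),
    }


sample = """<yandexsearch version="1.0">
  <response date="20120928T103130">
    <reqid>1348828873568466-1289158387737177180255457-3-011-XML</reqid>
    <found priority="phrase">206775197</found>
    <found priority="strict">206775197</found>
    <found priority="all">206775197</found>
    <found-human>207 million pages found</found-human>
  </response>
</yandexsearch>"""

summary = summarize_response(sample)
print(summary["found"]["all"], summary["found_human"])
```

Checking error_code first is a reasonable habit, since an error can be mutually exclusive with the other tags. The code is returned as a string, so the “no results” case compares against "15".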
The misspell / reask block

Optional. Present if a typo was found (misspell) or corrected (reask) in the query.

The block tags are presented in the table below.

Tags for the misspell / reask blocks

misspell
    Description: Grouping. Contains information about a possible typo in the search query.
    Attributes: None.
reask
    Description: Grouping. Contains information about corrections made to the source query before searching for documents.
    Attributes: None.
rule
    Description: The type of error found in the query. Possible values:
      • “Misspell”: Typo.
      • “KeyboardLayout”: Wrong keyboard layout.
      • “Volapyuk”: Query made in Russian using English transliteration.
    Used if the search type is set to “Russian (yandex.ru)”.
    Attributes: None.
source-text
    Description: Source text of the query. The fragment of the search query that presumably contains an error is highlighted by the hlword tag.
    Attributes: None.
text-to-show
    Description: Optional (only for the reask grouping tag). Contains the corrected text of the search query. In most cases it matches the value passed in the text tag.
    Attributes: None.
text
    Description: Corrected text of the search query.
    Attributes: None.

The results block

Optional. Present if results were found for the query.

The block tags are presented in the table below.

Tags for the results block

results
    Description: Grouping. Child tags contain information about search parameters and found documents.
    Attributes: None.
grouping
    Description: Grouping. Child tags contain information about search parameters and found documents.
    Attributes: Reflect the grouping rules for found documents.
      • mode: Grouping method.
      • attr: For service use. Depends on the value of the mode attribute.
      • groups-on-page: Number of groups that can be returned per page of search results.
      • docs-in-group: Number of documents that can be returned per group.
      • curcateg: For service use. Takes the value “-1”.
found
    Description: Estimated number of groups formed.
    Attributes: priority: For service use. Possible values: “phrase”, “strict”, “all”.
found-docs
    Description: Approximation of the number of documents found for the query. A more precise estimate compared to the value passed in the found tag for the block with general information about search results.
    Attributes: priority: For service use. Possible values: “phrase”, “strict”, “all”.
found-docs-human
    Description: A string in the language corresponding to the search type selected. Contains information about the number of documents found and accompanying information. The value that is passed should be used when formatting search results.
    Attributes: None.
page
    Description: Number of the search results page returned. Numbering starts from zero (the first page corresponds to the value “0”).
    Attributes:
      • first: Ordinal number of the first group with search results that is displayed on the page.
      • last: Ordinal number of the last group with search results that is displayed on the page.
group
    Description: Grouping. Each group tag contains information about a found group of documents.
    Attributes: None.
categ
    Description: Identifying data about a group of found documents.
    Attributes:
      • attr: For service use. Must match the value passed in the request.
      • name: Unique group ID.
doccount
    Description: Approximation of the number of documents that are used for forming the group. Documents that potentially may be included in the group are ranked according to the request conditions (the sortby parameter). Depending on the value of the docs-in-group parameter, from one to three of the first documents are included in the group.
    Attributes: None.
relevance
    Description: For service use.
    Attributes: priority: For service use.
doc
    Description: Grouping. Each doc tag contains information about a found document. Depending on the value of the docs-in-group parameter, each group can contain from one to three of the doc grouping tags.
    Attributes: name: Unique ID of a found document.
url
    Description: Address of a found document.
    Attributes: None.
domain
    Description: The domain that the found document is in.
    Attributes: None.
title
    Description: Title of the found document. Words that are in the search query are highlighted with the hlword tag.
    Attributes: None.
headline
    Description: Optional. Document summary. It is created using the HTML meta tag containing the name attribute with the “description” value.
    Attributes: None.
modtime
    Description: Date and time the document was changed, in the format <year><month><day>T<hour><minute><second>.
    Attributes: None.
size
    Description: Size of the found document, in bytes.
    Attributes: None.
charset
    Description: Encoding of the found document.
    Attributes: None.
passages
    Description: Grouping tag that contains a list of document passages.
    Attributes: None.
passage
    Description: Passage with the document summary. Words that are in the search query are highlighted with the hlword tag. The maximum number of passages to be passed in a single passages tag is defined by the value of the maxpassages parameter for the search request.
    Attributes: None.
mime-type
    Description: The document type in accordance with RFC2046.
    Attributes: None.
properties
    Description: Grouping tag that contains document properties.
    Attributes: None.
_PassagesType
    Description: Passage type. Possible values:
      • “0”: Standard passage (created from the document text).
      • “1”: Passage based on the link text. It is used if the document was found via a link.
    Attributes: None.
lang
    Description: Optional. Document language.
    Attributes: None.
saved-copy-url
    Description: Address of a saved copy of the document.
    Attributes: None.

Request for limits for the next day

Returns information about restrictions on the number of queries that can be sent each hour. The response contains information for each hour in the next 24 hours.

Note: Hourly limits are only applied to the “russian” type of search.

Request format

http://xmlsearch.yandex.<domain>/xmlsearch
  ? action=limits-info
  & user=<user name>
  & key=<key>

user
    User name. Must match the login for Yandex.Passport that was set during registration.
key
    Value of the API key that was issued during registration.
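The request format above can be assembled programmatically. The following Python sketch is one possible way to do it; the function name limits_info_url and the default domain value are assumptions made for this example, and the user name and key are sample values, not working credentials.

```python
from urllib.parse import urlencode


def limits_info_url(user, key, domain="ru"):
    """Build a limits-info request URL in the format described above."""
    params = {"action": "limits-info", "user": user, "key": key}
    # safe=":" keeps the colon inside the key unescaped, as in the guide's URLs.
    return f"http://xmlsearch.yandex.{domain}/xmlsearch?{urlencode(params, safe=':')}"


url = limits_info_url("xml-search-user", "03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8")
print(url)
```

Building the query string with urlencode rather than by hand means that any characters in the user name or key that need escaping are escaped consistently.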
Sample request

This request returns information about hourly limits restricting the number of search queries that can be sent by the “xml-search-user” user during the next 24 hours:

http://xmlsearch.yandex.ru/xmlsearch?action=limits-info&user=xml-search-user&key=03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8

Response format

In response to a request for hourly limits, Yandex.XML returns an XML file in UTF-8 encoding.

Note:
• If the allowed number of queries in one of the hours is exceeded, the excess queries are subtracted from the same hour the next day. These excesses are calculated when generating the response.
• Hourly limits are only applied to the “russian” type of search. For the other types of search, the service returns information for each hour about the daily limits on the allowed number of queries.

Below you will find the general structure of a resulting XML document with sample values.

<yandexsearch version="1.0">
  <response>
    <limits>
      <time-interval from="2014-07-22 20:00:00 +0000" to="2014-07-22 21:00:00 +0000">500</time-interval>
      <time-interval from="2014-07-22 21:00:00 +0000" to="2014-07-22 22:00:00 +0000">450</time-interval>
      <time-interval from="2014-07-22 22:00:00 +0000" to="2014-07-22 23:00:00 +0000">590</time-interval>
      <time-interval from="2014-07-22 23:00:00 +0000" to="2014-07-23 00:00:00 +0000">600</time-interval>
      <time-interval from="2014-07-23 00:00:00 +0000" to="2014-07-23 01:00:00 +0000">300</time-interval>
      <time-interval from="2014-07-23 01:00:00 +0000" to="2014-07-23 02:00:00 +0000">200</time-interval>
      <time-interval from="2014-07-23 02:00:00 +0000" to="2014-07-23 03:00:00 +0000">500</time-interval>
      <time-interval from="2014-07-23 03:00:00 +0000" to="2014-07-23 04:00:00 +0000">500</time-interval>
      <time-interval from="2014-07-23 04:00:00 +0000" to="2014-07-23 05:00:00 +0000">500</time-interval>
      <time-interval from="2014-07-23 05:00:00 +0000" to="2014-07-23 06:00:00 +0000">100</time-interval>
      <time-interval from="2014-07-23 06:00:00 +0000" to="2014-07-23 07:00:00 +0000">100</time-interval>
      <time-interval from="2014-07-23 07:00:00 +0000" to="2014-07-23 08:00:00 +0000">100</time-interval>
      <time-interval from="2014-07-23 08:00:00 +0000" to="2014-07-23 09:00:00 +0000">100</time-interval>
      <time-interval from="2014-07-23 09:00:00 +0000" to="2014-07-23 10:00:00 +0000">200</time-interval>
      <time-interval from="2014-07-23 10:00:00 +0000" to="2014-07-23 11:00:00 +0000">300</time-interval>
      <time-interval from="2014-07-23 11:00:00 +0000" to="2014-07-23 12:00:00 +0000">300</time-interval>
      <time-interval from="2014-07-23 12:00:00 +0000" to="2014-07-23 13:00:00 +0000">300</time-interval>
      <time-interval from="2014-07-23 13:00:00 +0000" to="2014-07-23 14:00:00 +0000">300</time-interval>
      <time-interval from="2014-07-23 14:00:00 +0000" to="2014-07-23 15:00:00 +0000">300</time-interval>
      <time-interval from="2014-07-23 15:00:00 +0000" to="2014-07-23 16:00:00 +0000">300</time-interval>
      <time-interval from="2014-07-23 16:00:00 +0000" to="2014-07-23 17:00:00 +0000">400</time-interval>
      <time-interval from="2014-07-23 17:00:00 +0000" to="2014-07-23 18:00:00 +0000">500</time-interval>
      <time-interval from="2014-07-23 18:00:00 +0000" to="2014-07-23 19:00:00 +0000">500</time-interval>
      <time-interval from="2014-07-23 19:00:00 +0000" to="2014-07-23 20:00:00 +0000">600</time-interval>
    </limits>
  </response>
</yandexsearch>

response
    Description: Grouping.
    Attributes: None.
limits
    Description: Grouping. Contains entries about hourly limits on the allowed number of search queries.
    Attributes: None.
time-interval
    Description: The number of search queries that can be sent during the specified time interval. The borders of the time interval are defined by attributes.
    Attributes:
      • from: Date and time (inclusive) of the start of the time interval the limit applies to.
      • to: Date and time (exclusive) of the end of the time interval the limit applies to.
    Data format in attributes: YYYY-MM-DD HH:MM:SS +HHMM
    “HHMM” specifies the event offset relative to UTC0.
    Attention! At this time, information about hourly limits is output for UTC0.

Formatting results

When formatting search results, you must adhere to the rules described in the License for use of the Yandex.XML service. The license differs for Russian, Turkish, and Worldwide search types.

Every page generated using Yandex.XML must contain:
• A link to the Yandex home page, formatted as a logo.
• Text with information about the number of documents found (“NNN pages found”). The information about the number of documents found is passed in the found-docs-human tag in the XML file with search results.

The links to logos that must be used depending on the background color, along with formatting examples, are provided in the table below.

Background color: Black/dark
    Logo: Download.
    Formatting example: White font and a red letter “Y”. Transparent background.
Background color: Red
    Logo: Download.
    Formatting example: White font. Transparent background.
Background color: White/light
    Logo: Download.
    Formatting example: Black font and a red letter “Y”. Transparent background.

Protection from robots

Search queries can be submitted not only by users, but by robots as well. A flood of queries from robots may cause you to exceed the limits applied to usage of Yandex.XML.

To prevent unauthorized access to the search by robots, a security algorithm is used. If it is suspected that a query was submitted by a robot, a CAPTCHA is returned instead of search results (see the Wikipedia article about CAPTCHA).

To use the algorithm for protection from robots, the partner must pass information about the IP address and the "spravka" cookie for the request's author. The "spravka" cookie is generated on the Yandex.XML side and is returned the first time the user accesses search results.
In the value that is received, the partner must replace the domain with the partner's own domain, and then add the following string to the search response:

Set-Cookie: spravka=...

Information about the IP address and the "spravka" cookie is passed in the request header in the format:

X-Real-Ip: 99.999.999.99
Cookie: spravka=<value passed from Yandex>

The steps performed for protection from robots are listed below.

1. The user sends a query to the Yandex.XML partner.
2. The search query is sent to the Yandex.XML service. The request must match the specified format.
3. Yandex.XML initiates the algorithm for protection from robots. The values of the IP address and "spravka" cookie (if previously issued) are used for verification. Possible results of verification:
   • The request was probably not sent by a robot. The process continues to step 13.
   • The request was probably sent by a robot. The decision is made to display a CAPTCHA.
4. Yandex.XML returns the partner an XML file in the following format:

<?xml version="1.0" encoding="utf-8"?>
<yandexsearch version="1.0">
  <response>
    <error code="100">Robot request</error>
  </response>
  <captcha-img-url>http://captcha.image.gif</captcha-img-url>
  <captcha-key>CAPTCHA ID number</captcha-key>
  <captcha-status>Status</captcha-status>
</yandexsearch>

5. The user is returned a page containing a CAPTCHA.
6. The user sends the CAPTCHA value to the partner.
7. The partner sends the CAPTCHA value obtained from the user via a GET request in the following format:

http://xmlsearch.yandex.ru/xcheckcaptcha?key=<CAPTCHA ID number>&rep=<CAPTCHA value entered by user>

8. The value received is checked by the Yandex.XML service. If the CAPTCHA value was entered incorrectly, the process continues to step 4. In addition, the captcha-status parameter is passed with the value “failed”.
9. If the CAPTCHA value was entered correctly, Yandex.XML issues the user a "spravka" cookie and passes it to the partner in the header with the following format:

HTTP/1.1 200 OK
Set-Cookie: spravka=<cookie value>

   If the request passed to Yandex.XML in step 1 was saved successfully, the process continues to step 12.
10. The partner lets the user enter a query.
11. The user sends a query to the Yandex.XML partner.
12. The search query is sent to the Yandex.XML service. Along with the request, the user's IP address and "spravka" cookie are passed.
13. Yandex.XML processes the search query and generates results.
14. An XML file with search results is returned to the partner.
15. The partner returns the processed response to the user. If Yandex.XML issued a "spravka" cookie in step 9, it is saved on the user's computer.

Tip: To try out how this flow works, use this script.

Verifying correct CAPTCHA display

To get familiar with the response format returned by Yandex.XML when a CAPTCHA is displayed, send a request in which the query parameter of the search request has the following value: “e48a2b93de1740f48f6de0d45dc4192a”.

The following GET request can be used by the user “xml-search-user” for reviewing the response format returned when a CAPTCHA is displayed:

wget -q --header="X-Real-Ip: 127.0.0.1" -SO- 'http://xmlsearch.yandex.ru/xmlsearch?user=xml-search-user&key=03.44583456:c876e1b098gh65khg834ggg1jk4ll9j8&query=e48a2b93de1740f48f6de0d45dc4192a&showmecaptcha=yes'

Questions and answers

This section answers some common questions that Yandex.XML users ask. For convenience, questions are grouped in categories:
• What is XSLT?
• Notifications.
• IP address.
• Additional search features.
• Encoding.

What is XSLT?

XSLT is a language for converting and rendering XML documents and is a part of the set of XSL recommendations.
Detailed information about the XSLT language is provided in the following documents:
• Extensible Stylesheet Language (XSL).
• XSL Transformations (XSLT).

Notifications

What are notifications?

Notifications are a service for automatically sending email when problems arise during use of Yandex.XML. The email address, settings for sending notifications, and thresholds are all set during registration.

What should I do if I get a notification that the number of requests has sharply decreased?

The table below shows possible reasons for a decreased number of requests, how to diagnose it, and recommended solutions.

Reason: Decreased number of searches performed. For example, this may occur due to natural variation in the number of visitors depending on the day of the week or the time of day.
    Diagnostic methods: Review the site usage statistics for days of the week and time of day.
    Recommended solutions: Increase the notification threshold on the Settings page.
Reason: Unavailability or partial availability of Yandex.XML on the web site.
    Diagnostic methods: Try submitting several search queries yourself. Check the accuracy of the results that are returned.
    Recommended solutions: Check whether the request format is correct.

What should I do if I get a notification that the number of requests has sharply increased?

The table below shows possible reasons for an increased number of queries, how to diagnose it, and recommended solutions.

Reason: Increased number of searches performed. For example, this may occur due to natural variation in the number of visitors depending on the day of the week or the time of day.
    Diagnostic methods: Review the site usage statistics for days of the week and time of day.
    Recommended solutions: Increase the notification threshold on the Settings page.
Reason: DoS attack.
    Diagnostic methods: Check the server log files for data suggesting a DoS attack.

What should I do if I get a notification that there were no requests for a 24-hour period?

Check how the search is working on the site. If statistics show a sharp decrease in the number of queries made from the site, it is possible that this is due to the search not working correctly.

What should I do if I get a notification that the number of requests is approaching the daily limit?

Review the restrictions applied to the service and ways to get around them. Contact a Yandex representative to discuss details of expanding search features.

IP address

Why is an IP address required for registration?

The IP address, in combination with a Yandex.Passport account, is used for identifying a Yandex.XML user. The results of user identification determine the restrictions applied to service usage.

How do I find out what my IP address is?

The way to determine the IP address depends on the type of computer being used to access the Yandex.XML service.

Device type: Server
    Possible methods for detecting the IP address:
      • Ask your provider for the IP address.
      • Set up a remote connection to the server and run the ipconfig command (Windows) or ifconfig (Unix).
      • Run the ping <server name> command from the command line of a personal computer.
Device type: Personal computer
    Possible methods for detecting the IP address:
      • Use the Yandex.Internetometer service.
      • If static addresses are used, ask your service provider.

Note: If a modem is used, the IP address can change each time a connection is made.

The IP address being registered is in use

The table below shows possible reasons and solutions.

Reason: An open proxy server is being used to access Yandex.XML.
    Possible solution: Use your Internet provider's proxy server.
Reason: A modem is being used to access the Internet. Your provider assigns a dynamic IP address, which can change each time you connect.
    Possible solution: Try disconnecting and reconnecting to the Internet.
Reason: The service is being accessed from a server.
    Possible solution: Obtain a dedicated IP address.

Additional search features

Setting up site search

To restrict the search to the web site only, use the host operator.

Syntax:

<query text> host:<URL of the site to search on>

The following request is used for searching for the phrase “search settings” on the web site http://help.yandex.ru/:

search settings host:help.yandex.ru

Restricting the search to a region or category

To restrict the search to documents that are relevant to a particular region or category, use the cat operator.

Syntax:

<query text> cat:<adjusted ID of the region or category>

For the value of the cat operator, pass the adjusted value of the region ID (added to “11000000”) or category ID (added to “9000000”).

The request may specify multiple regions and categories. To do this, use the logical operators “AND” (“&&”) and “OR” (“|”).

The following request is used for searching for the word “meat” in documents that are relevant to the category “bodybuilding nutrition” (ID “3783”) in the city of “Samara” (ID “51”):

meat cat:11000051 && cat:9003783

Search in results

To set up a search in results, use the && operator.

Syntax:

(<original query text>) && (<query text to search for in results>)

The following request is used to search for documents with the phrase “manual transmission” in results for the query “autos”:

(autos) && (manual transmission)

Encoding

How to correctly set encoding for a request being sent?

The request encoding is set in the header of the XML file:

<?xml version="1.0" encoding="<encoding>"?>

Which encoding is used for sending the search response?

The XML file with search results is sent in UTF-8 encoding. To convert it to a different encoding, you can use a library such as the libiconv library or the Convert::Cyrillic module.

Incorrect characters in the response

In most cases, incorrect characters in the response are the result of sending the request at the socket level.

Possible solutions:
• Use HTTP version 1.0 instead of 1.1.
• Use a higher-level type of interface.
• Configure handling for chunked responses.
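Returning to the query operators described under “Additional search features”: the query text for the host, cat, and && operators can be composed with a few small helpers. The Python sketch below is only an illustration; the function and constant names are invented for this example, and only the offsets “11000000” and “9000000” and the operator syntax come from this guide.

```python
# Offsets added to region and category IDs, as described for the cat operator.
REGION_OFFSET = 11000000
CATEGORY_OFFSET = 9000000


def site_query(text, host):
    """Restrict a query to one site with the host operator."""
    return f"{text} host:{host}"


def region_cat(region_id):
    """cat operator for a region: the ID is added to 11000000."""
    return f"cat:{REGION_OFFSET + region_id}"


def category_cat(category_id):
    """cat operator for a category: the ID is added to 9000000."""
    return f"cat:{CATEGORY_OFFSET + category_id}"


def search_in_results(original, refinement):
    """Search for the refinement text within results of the original query."""
    return f"({original}) && ({refinement})"


# Reproduces the examples given in the text.
print(site_query("search settings", "help.yandex.ru"))
print(f"meat {region_cat(51)} && {category_cat(3783)}")
print(search_in_results("autos", "manual transmission"))
```

The resulting strings are query text only; they still need to be URL-encoded (GET) or escaped for XML (POST) before being sent.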
Appendices

Validating XML files

To prevent incorrect processing of search queries, at the testing stage we strongly recommend validating XML files that are generated for requests using the POST method.

You can validate XML files in the Yandex.Webmaster service. For validation, the XML request schema is used.

To validate a file, follow these steps:
1. Open the XML-feed validator page.
2. Select other schemas → link in the Standard validation schema group.
3. Set the value “http://api.yandex.com/xml/doc/dg/res/request.en.xs” in the Specify the link to your XSD schema box.
4. Set one of the ways to pass the contents of the XML document in the Feed for validation box.
5. Click the Check button.

If the XML file complies with the schema, the message “XML complies with the XSD schema” is returned. If inconsistencies are discovered, the validator returns information about the line where you should look for an error.

Error codes

When search requests are processed incorrectly, the server response contains the error tag.

Format:

<error code="error code">Error description text</error>

The table below lists codes and descriptions for common errors that occur when processing search requests.

Error code  Description
1    The query text (the value passed in the query element) contains a syntactical error. For example, a query was sent that contained only two slash symbols in a row (“//”).
2    An empty search query was defined (an empty value was passed in the query element).
15   There are no search results for the specified search query.
18   The XML file cannot be validated, or invalid request parameters are set. Possible reasons:
       • Incorrect tags or tag values were passed.
       • The request body contains non-escaped special characters. For example, the ampersand symbol (“&”), and so on.
       • The requested page contains search results beyond the first 1000 entries. For example, if each page contains 10 results, this error will be returned when attempting to request page 101 and further in results.
19   The search query contains incompatible parameters (for example, incompatible values for the groupings element).
20   The reason for the error is unknown. If the error persists, contact the support service.
31   The user is not registered on the service.
32   Limit exceeded for the number of queries allowed per day. Review the information about restrictions and choose a suitable method for increasing your daily quota.
33   The IP address that the search request was sent from does not match the one(s) set during registration.
34   The user is not registered in Yandex.Passport.
37   Error in request parameters. Mandatory parameters may have been omitted, or mutually exclusive parameters may have been defined.
42   The key that was issued during registration contains an error. Check whether the correct address is used for sending requests.
43   The version of the key that was issued during registration contains an error. Check whether the correct address is used for sending requests.
44   The address that requests are sent to is no longer supported. Correct the value to match the address that was given during registration.
48   The search type that was specified during registration does not match the search type that is being used for requesting data. Reset the domain that is being used to the correct domain. For corrections, use the URL for sending requests.
100  The request was most likely sent by a robot. When this error appears, a CAPTCHA must be returned to the user.

Search regions

The region to give preference to when generating search results is defined by the value of the lr parameter of the search query. Countries, federal subjects, and cities can be specified as the region.

A list of IDs for commonly used countries is provided in the table below.

ID   Country
225  Russia
187  Ukraine
149  Belarus
159  Kazakhstan

See also: Other popular regions
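The error codes listed in this appendix arrive in the error tag of the response, and code 100 is accompanied by the CAPTCHA tags shown in the section on protection from robots. The following Python sketch shows one way a partner might detect an error and prepare the CAPTCHA check request. The function names are hypothetical; the sample XML and the xcheckcaptcha address are taken from this guide.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# The CAPTCHA response format shown in the protection-from-robots steps.
CAPTCHA_RESPONSE = b"""<?xml version="1.0" encoding="utf-8"?>
<yandexsearch version="1.0">
  <response>
    <error code="100">Robot request</error>
  </response>
  <captcha-img-url>http://captcha.image.gif</captcha-img-url>
  <captcha-key>CAPTCHA ID number</captcha-key>
  <captcha-status>Status</captcha-status>
</yandexsearch>"""


def parse_error(xml_bytes):
    """Return None if the response has no error tag, else a dict describing it."""
    root = ET.fromstring(xml_bytes)
    error = root.find("response/error")
    if error is None:
        return None
    info = {"code": int(error.get("code")), "message": (error.text or "").strip()}
    if info["code"] == 100:
        # In the documented sample the CAPTCHA tags are children of the root tag.
        info["captcha_img_url"] = root.findtext("captcha-img-url")
        info["captcha_key"] = root.findtext("captcha-key")
    return info


def captcha_check_url(captcha_key, user_answer):
    """Build the GET request that submits the value entered by the user."""
    query = urlencode({"key": captcha_key, "rep": user_answer})
    return f"http://xmlsearch.yandex.ru/xcheckcaptcha?{query}"


error = parse_error(CAPTCHA_RESPONSE)
if error is not None and error["code"] == 100:
    print("Show CAPTCHA image:", error["captcha_img_url"])
```

For the other codes, a partner would typically log the code and message and show the user a generic failure page; how each code is handled is left to the application.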