Add translation to your software
Wordbee Beebox White Paper
Add Translation to Your Content, Web Application or Software
Written for Technical People
June 22, 2015
http://www.wordbee.com
http://www.beeboxlinks.com
Copyright Wordbee SA © 2015 Wordbee

1 Welcome
This white paper explains how to add translation capabilities to content management systems, software applications, SaaS solutions, or any other type of product you might develop.

Maybe you are developing a travel web site, selling books online, or working with cars. Whatever the precise nature of your business, you most likely manage a web site or application containing a large number of articles, products, or content pages. In this scenario, translation quickly becomes a challenge. Whether or not multilingual content already exists, you may have the impression that adding translation is expensive, inefficient, or difficult to organize.

Figure: your web site or software in English only; with Beebox, the same web site or software in English, French, Chinese, Russian, Spanish and more.

Beebox provides a simple, effective way to make your web site, shop, mobile app or software multilingual. Built by developers for developers, Beebox delivers an exceptional toolbox to fulfill your multilingual needs.

Table of contents
1 Welcome
2 About Beebox
3 The Inner Workings
4 Developers Guide
5 Machine and Human Translation
6 Licensing

2 About Beebox
Beebox is a Microsoft Windows application that can easily be installed on a development PC, a test server, or a production server. It not only takes care of translating your content but also supports machine translation and human translation workflows.

To use Beebox, you first create a Beebox project for your data. The project exposes two file directories: IN and OUT. The "IN" directory is where you copy texts or files for translation, and the "OUT" directory is where translated files are stored. This is depicted in the following diagram:

Figure: (1) your data or system sends original files or texts in English to the IN directory of a Beebox project, by file copy, API, FTP, Dropbox, etc. Inside the project your content gets translated (machine translation: yes/no; human translation: yes/no; count words and cost: yes/no), and all translations are remembered and reused. (2) You receive translated files or texts in Chinese, French, Spanish and other languages from the OUT directory. Supported content includes XML, HTML, JSON, flat files, PHP, .NET, CSV, strings files, resource files, database dumps, Word, Excel, PowerPoint, InDesign and many more.

Almost any type of content or file may be sent to Beebox, and translated files have exactly the same format as the original. For example, if you send an English PHP file to "IN", a translated PHP file will be placed in the "OUT" directory. Beebox identifies which content needs to be translated. It also protects HTML, JavaScript, server-side code, and other format-specific data, and ensures that translators cannot alter this information.
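The IN/OUT exchange described above needs nothing more than a file copy. The following Python sketch copies a file into IN and then polls OUT for the result. The directory paths are hypothetical, and so is the OUT layout it waits for (one subfolder per target locale, OUT/<locale>/<file>): check what your own project actually produces before relying on it.

```python
import shutil
import time
from pathlib import Path
from typing import Optional

# Hypothetical locations: use the IN and OUT directories of your own Beebox project.
IN_DIR = Path("beebox/demo/in")
OUT_DIR = Path("beebox/demo/out")


def submit(source: Path, relative_name: str, in_dir: Path = IN_DIR) -> Path:
    """Copy a source file into IN, keeping its subfolder (e.g. products/product1.xml)."""
    target = in_dir / relative_name
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return target


def wait_for_translation(relative_name: str, locale: str, out_dir: Path = OUT_DIR,
                         timeout: float = 600.0, interval: float = 5.0) -> Optional[Path]:
    """Poll OUT until the translated file exists; return its path, or None on timeout.

    ASSUMPTION: translations land in OUT/<locale>/<relative_name>. This layout is
    hypothetical; check the one your project really produces.
    """
    expected = out_dir / locale / relative_name
    deadline = time.monotonic() + timeout
    while True:
        if expected.is_file():
            return expected
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

For example, call submit(Path("export/product1.xml"), "products/product1.xml") and then wait_for_translation("products/product1.xml", "fr"). Seeing a file appear in OUT does not by itself prove that Beebox has finished writing it; the callbacks described in chapter 4 notify you once a file has been translated and saved to OUT, which avoids this guesswork.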
Beebox supports almost any file format out of the box, including:
- XML, JSON, HTML, PHP, ASPX, JavaScript files, Razor, code files (C, Java…)
- Flat files, CSV, strings files, iOS files, Java properties, INI files
- Word, Excel, PowerPoint, InDesign, OpenOffice, RTF, DITA, .NET resources, XLIFF…

Each format can be customized to fine-tune exactly what Beebox translates. For example, XML formats let you specify XPath expressions, HTML-based formats let you specify the HTML tags or attributes to translate, and other formats use regular expressions to include or exclude certain strings.

Instead of copying files, you may use the Beebox Web API (recommended). Please see Chapter 4 for more details.

2.1 Examples
Translating texts contained in a database
Let us imagine that you have a database storing product descriptions in HTML format. Your database is already designed to store texts in each of your required languages; however, the translations are missing!

To solve this problem, you can write a tool or script that dumps the database to an XML file (or CSV, Excel, or any other format). The XML might contain one node per product, with an attribute containing the product id. First, copy the file to Beebox. As soon as the content has been translated, you can download the translated files, one per target language. The final step is to store the translations back into your database; the product id mentioned earlier tells you where to insert each translation.

Since Beebox remembers every completed machine or human translation, you can send a full dump as many times as needed without paying for translation again. Beebox automatically locates new or updated texts, and only those texts go into a translation workflow. Of course, it is not necessary to dump the entire database.
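Whether you export the whole table or only the rows that changed, the export and re-import steps can be sketched in a few lines of Python. The <products>/<product id="..."> layout and both function names are hypothetical; in Beebox you would configure the XML format (for example with an XPath expression) so that only the product text is translated. ElementTree escapes HTML markup inside the text, so how Beebox should treat such embedded HTML is something to check in your format settings.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Tuple


def export_products(rows: Iterable[Tuple[int, str]]) -> str:
    """Serialize (product_id, description) rows as one XML document for Beebox.

    The <products>/<product id="..."> layout is a hypothetical example. ElementTree
    escapes any HTML markup in the description, so the XML stays well-formed.
    """
    root = ET.Element("products")
    for product_id, description in rows:
        node = ET.SubElement(root, "product", id=str(product_id))
        node.text = description
    return ET.tostring(root, encoding="unicode")


def import_translations(translated_xml: str) -> Dict[str, str]:
    """Map each product id to its translated description, ready to write back to the DB."""
    root = ET.fromstring(translated_xml)
    return {node.get("id"): node.text or "" for node in root.iter("product")}
```

Run import_translations on each translated file you download from OUT (one per target language) and write every description back to the row whose id it carries.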
If you track which products change, it may be faster to extract only changed or added products, either to a single file or to one file per product.

Translating software strings or HTML
Copy your HTML, PHP, ASPX, or other files to Beebox; again, it reuses previously completed translations so that only the latest changes are translated. The translated HTML files faithfully reconstitute markup, JavaScript code, and server code. You can also customize Beebox to extract translatable JavaScript strings or server code strings alongside the basic HTML content.

2.2 Why Beebox?
Beebox is the easiest way to add translation capabilities to your software or solutions, because it:
- Supports many file formats, so there is no need to convert your content.
- Generates translated files in the same format as the original.
- Remembers all translations, so you never pay twice for translating the same piece of text.
- Finds unique texts: a string like "Click here" may show up 500 times in your software, but Beebox sends it for translation just once.
- Supports a wide range of machine translation systems.
- Supports human translation workflows via XLIFF or a direct web link to translation vendors.
- Includes smart alignment technology, so translated content can be aligned on the fly.
- Includes a web user interface to monitor operations and to locate, filter, and change text content.
- Is easy to integrate, with multiple options: file copy, Web API, HTTP callbacks, and PowerShell scripts.

3 The Inner Workings
The system can be configured to use any combination of machine translation [1], pseudo-translation [2] and human translation, adapting to your exact requirements. During development you will likely want a workflow that uses pseudo- or machine translation and skips any costly human work. In production you will want to add human translation.
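The Python sketch below is a toy model of one pass through such a workflow, written for illustration only: it is not Beebox's implementation, and every name in it is hypothetical. Segments are deduplicated, looked up in a dictionary that stands in for the translation memory, and each remaining unique segment is sent exactly once to a pluggable translation step. The default step is a letter-shift pseudo-translation; replacing it with a call to a machine translation service is the kind of switch described above when moving from development to production.

```python
from typing import Callable, Dict, List


def pseudo_translate(text: str) -> str:
    """Zero-cost fake translation: shift each ASCII letter by one (a->b, z->a)."""
    shifted = []
    for ch in text:
        if "a" <= ch <= "z":
            shifted.append(chr((ord(ch) - ord("a") + 1) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            shifted.append(chr((ord(ch) - ord("A") + 1) % 26 + ord("A")))
        else:
            shifted.append(ch)  # digits, punctuation and non-Latin text are kept
    return "".join(shifted)


def translate_segments(segments: List[str], memory: Dict[str, str],
                       translate: Callable[[str], str] = pseudo_translate) -> List[str]:
    """Return one translation per segment, calling `translate` once per new unique text.

    `memory` stands in for the translation memory: hits are reused, and new
    results are stored so that later runs do not translate them again.
    """
    for text in dict.fromkeys(segments):  # unique texts, in first-seen order
        if text not in memory:
            memory[text] = translate(text)
    return [memory[text] for text in segments]
```

For example, translate_segments(["Click here", "Click here"], {}) returns ["Dmjdl ifsf", "Dmjdl ifsf"] while calling the pseudo-translation only once. Because results are written back into the memory dictionary, sending the same segments again triggers no new translation, which mirrors the "never pay twice" behaviour described in section 2.2.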
Here is what happens to files placed in the "IN" directory or sent via the API:

1. Text extraction: all translatable text is extracted from the original file. The result is a list of segments, typically paragraphs or sentences, which are independent of the file format (for example "Product Alpha", "We are happy...", "Click here", "Unmatched...").
2. Pretranslation: the translation memory (TM) remembers every translation done in the past. Each segment is looked up to see whether a translation already exists (for example "Produit Alpha" in French or "Alpha Produkt" in German).
3. Machine translation: untranslated segments can optionally be machine translated using Microsoft, Google or other systems. Preliminary translated files can be retrieved at this stage.
4. Human translation: missing or machine-only translations can optionally be sent to translators or language service providers (LSPs). The result is human-quality translation.
5. Delivery: the translated files are created from the translations. Beebox can optionally notify you by email, callback URLs or custom scripts.

Once preliminary translations exist, in-house staff can use the web-based Beebox UI to run quality checks, searches, and fixes. This step is optional.

[1] Beebox interfaces with major online machine translation systems such as Google and Microsoft.
[2] Pseudo-translation simulates translation workflows: it "translates" by converting text to lower or upper case, shifting letters, etc.

3.1 What is the purpose of the translation memory?
As you can see, every translation ever completed in a Beebox project is remembered. The system also keeps track of quality levels: it knows which translations have been approved by a human translator or proofreader and which were produced by machine without approval.

We have also seen that Beebox splits your text content into smaller segments, typically the size of a sentence or paragraph. If a specific sentence was translated into Chinese earlier and the exact same sentence shows up in another file, Beebox can pretranslate this second occurrence. Think of a text such as "Click here": it may show up hundreds of times in a web site, but Beebox will send it to machine translation or to a human translator just once!

Figure: a translation memory holding English source segments (sentences, phrases, links...) such as "Product Alpha", "Click here", "Hello world" and "Welcome!" with their French, German and Chinese translations, each marked as human-translated or approved, machine-translated and unapproved, or not translated. Besides the text, each TM entry records its language, source file (for example products\welcome.htm), origin (for example machine translation), human approval status, context (the text just before and after) and flags (lock translation, bookmarks, comments).

When a new or updated file is added to Beebox, it looks up each segment in the translation memory. Even if the file is completely new, it is very likely that some translations from other files can be reused, which reduces overall translation costs.

A Beebox project can be configured for different leveraging modes:
- Leverage translations of identical texts wherever they come from, as in the "Click here" example above. This is the default mode and suits most content.
- Leverage translations only if both the source text and its context (the segments above and below it) are identical. Translations sometimes differ depending on context; this mode potentially produces higher quality, but also leaves more segments to translate.

As a developer, you need not worry much about these details, since it is typically the language service provider who makes this choice.

3.2 What happens if I send a new version of an already translated file?
Beebox starts by extracting the text content and splitting it into segments. It then attempts to pretranslate all segments from the translation memory. If the file content has not changed, Beebox finds all translations in the memory, so it can fully translate the file and send the translated file back to you. Otherwise, it pretranslates all unchanged segments and sends the added or changed segments to machine or human translation.

For example, if a single word is replaced inside a huge iOS strings file, only the string containing the changed word is sent for machine or human translation. All other strings are reused from the translation memory. In this way, Beebox safeguards you against spending money on translation where it is not necessary.

When Beebox pretranslates a segment, there may be multiple candidate translations. To maximize pretranslation quality, the system uses heuristics to choose the best one:
- It first attempts to pretranslate a file from its previous version (if one exists).
- It gives more weight to approved translations.
- It gives more weight to pretranslations whose context matches (text before and after).
- It gives more weight to translations done or post-edited by human translators.
- It gives less weight to translations with QA (quality assurance) problems.

3.3 Does Beebox run automatically and unattended?
Yes, Beebox can run automatically with little to no human intervention, like a true black box. However, you can also configure Beebox to wait for confirmation at certain steps in the workflow.
Examples are:
1. Confirming work before it is sent to translators (e.g. to check cost).
2. Validating work received from human translators.

In the second example, you would connect to the web interface of your Beebox to filter translations, run checks, manually correct problems, or send the translation back to the translator. In general, it is recommended to automate all steps.

3.4 Alignment - Can I send already translated files?
Yes. Beebox incorporates alignment technology, so translated versions may be sent along with your source files. Beebox splits the source and the translations into segments and aligns them. In other words, you can send existing translations to Beebox without first having to create translation memories with third-party alignment software. It is as simple as that!

The ability to include translated files also serves another purpose. Imagine that a file was translated in the past. Now someone edits the translated content (outside Beebox) and changes one or two sentences. The next time you send the source and translated files to Beebox, it aligns the files and identifies the changes in the translation. The changes go to the human translation team for approval. This use case is particularly interesting with CMS connections, where CMS proofreaders may want to edit the translated content and then send it to Beebox so that its memories are updated.

4 Developers Guide
Beebox is translation software built by developers for developers. We know how hard it can be to integrate third-party products or libraries, so our design goal was to make your life as easy as possible. In this chapter we outline the various ways to interface with Beebox. We encourage you to choose the methods and options you feel most comfortable with. See the online documentation for details.

4.1 File copy
Integration truly is simple with the file copy option.
You copy files requiring translation directly to the "IN" directory, wait, and then fetch translations from the "OUT" directory. If Beebox runs on a server, you can do this transfer using a Windows file share, FTP, Dropbox, or another file sharing system. For evaluating Beebox, we suggest installing it on your development PC.

4.2 Web API
If copying files sounds too "primitive" to you, check out the web-based API. In most scenarios, you need only 6 to 10 methods to complete your translation objectives. By default, the API listens on port 8089.

The first method to call establishes a connection and obtains a connection token. Here are a few examples:
(GET) http://localhost:8089/api/connect?project=...&login=myname&password=whatever

To send a file to the "IN" directory, use:
(PUT) /api/files/file?token={token}&locale=en-US&folder=&filename=products\product1.xml

To poll which files have been fully translated and are ready in "OUT", use:
(GET) /api/workprogress/translatedfiles?token={token}&filter=&skip=&count=

To download a translated file, use:
(GET) /api/files/file?token={token}&locale=fr&folder=&filename=products\product1.xml

There are also calls to obtain cost quotations when the human translation workflow is linked to a Wordbee Translator platform, which is typically owned by a freelance translator or a language service provider.

A word on security: you assign dedicated API credentials to each Beebox project. This is useful if you need to hand out credentials to your customers, for instance when you give them software that calls your Beebox API. A caller that connects with a given set of credentials can access only the data of the associated project.

4.3 Instructions files
With each source file you save to the "IN" directory, you can include a so-called "instructions file".
This optional JSON-formatted file lets you:
- Specify the target languages for translation. By default, a file is translated into all project target languages; this option helps when certain content must be translated only into specific languages.
- Specify a deadline for human translation steps.
- Request alignment of the source file and its translated file(s).
- Exclude the file from any machine translation or pseudo-translation (a kind of dummy automatic translation).
- Specify that the file content must never be sent for human translation. This is useful if machine translation is sufficient for some content but not for other content.

Suppose the source file is "products\product1.xml"; then the instructions file "products\product1.xml.beebox" might contain:

    {
      "locales": [ "fr" ],
      "disableMT": true
    }

Suppose translated files are saved to Beebox for alignment purposes. The instructions then include an "align" node listing the target languages of the saved translations:

    {
      "align": {
        "locales": [ "fr" ]
      }
    }

4.4 Callbacks
You can configure Beebox to call a web URL when important events occur. For example, Beebox can notify your systems when a file was translated and saved to the "OUT" directory.

Figure: (1) new translated files arrive in the OUT directory of the Beebox project; (2) Beebox calls your URL, e.g. http://myserver.com/notify?event=translation&date=...; (3) your system gets the translation from "OUT".

The callback mechanism makes it unnecessary to poll Beebox at regular intervals to find out whether translations have arrived.

4.5 PowerShell scripts
PowerShell scripts are another mechanism that is useful in certain integration scenarios. You can develop a script that Beebox executes automatically whenever new files to translate are received and whenever files have been translated.

Figure: (1) new translated files arrive in the Beebox project; (2) Beebox calls your PowerShell script (Myscript.ps1), which can pick up the files from OUT.

The script can be used to transfer translated files to another server or location, to initiate an FTP transfer, or to do whatever else is needed to integrate translations into your systems. Learn more in the online documentation.

4.6 Microsoft .NET extensions
Extend Beebox functionality using C# and Visual Studio. You can code translation algorithms, custom filters and more, and upload your code to a Beebox. From a developer's point of view, things are quite straightforward: start by creating a class library project in Visual Studio, add one class per extension, then compile your project and upload the DLL to the Beebox server. Learn more in the online documentation.

4.7 Email notifications
Finally, Beebox can send email notifications, such as "File translated", "Texts sent to translator" or "Texts received from translator", so that you can easily follow the current status of a translation.

4.8 A word on projects and languages
One Beebox installation can manage any number of Beebox projects in any languages. Each project has one specific source language, which is the language in which the content to translate is written. A project can have one or more target languages, into which the content is translated. If your original content is written in multiple source languages, simply create one project per source language.

Languages are identified by their ISO two-letter (sometimes three-letter) codes. You can use the Beebox API to obtain a complete list of language codes together with their English names. Languages are expressed either in their neutral form ("en" for English) or with a region indicator ("en-GB" for UK English, "en-US" for US English). Both forms are commonly used; see http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes for a list of codes.

4.9 Testing and production
It is fairly simple to work with Beebox during development, testing, and production.
Your first tests
To begin trying out Beebox, download and install it on your development PC. Create a project and configure an automated "pseudo-translation" workflow. This simulates a zero-cost translation workflow: all source texts are simply converted to uppercase or lowercase, or have their letters shifted. Whatever you put into the "IN" directory is then "translated" within seconds, and the "translated" files immediately show up in the "OUT" directory.

Machine translation is not very expensive, so you might also consider enabling one of the web-based machine translation systems such as Google Translate or Microsoft Translator.

Development in a team
If you have multiple developers who all need to use Beebox, we recommend installing it on a development server. Note that a Beebox license includes one production install and one test install, so you are covered from the start.

Production environments
Install Beebox in your production environment. This step is of course not required if your customers or a language service provider host Beebox. In production, create the Beebox projects you need and configure the "true" workflows. For example, you may want to enable a human translation workflow or a combination of human and machine translation.

Staging environments
For staging environments, you can reuse the Beebox you installed for development or for production. Simply create dedicated Beebox projects for staging use.

5 Machine and Human Translation
Each Beebox project can be configured with its own translation workflows and level of automation. Please also refer to the user documentation for additional details and helpful screenshots. Basically, a Beebox project can provide the following workflow steps:

5.1 Pseudo translation
If enabled, new or changed text segments are first pseudo-translated. Pseudo-translation means converting text to uppercase or lowercase, shifting letters, or changing its length.
This is useful for zero-cost and rapid test workflows.

5.2 Machine translation
If enabled, new or changed text segments are first machine translated. Machine translation refers to online systems such as Google Translate or Microsoft Translator. You can then decide whether machine translations are to be "considered" approved or still require subsequent human approval or correction. In the former case, the system is told that the machine-translated files can be created immediately and placed into the "OUT" directory. You would enable this option if your workflow does not require any subsequent human validation.

5.3 Human translation
Unapproved content is sent to a human translator, team or language service provider. Please note that the system does not send your original files! Instead, it sends the "unapproved" or "untranslated" text segments previously extracted from your files in the "IN" directory. Beebox creates translation jobs from those segments and offers two workflow options:

a) XLIFF: use XLIFF to exchange translation work with human translators. In this case, Beebox automatically creates one or more translation jobs containing all new or changed content. In the Beebox user interface, you can then download the XLIFF files and send them to your translators. Once the translation is finished, the translated XLIFF files are uploaded back to Beebox and translated files are created in the "OUT" directory. The exchange of XLIFF jobs with a translation management or other system can be fully automated by means of hot folders: Beebox extracts new XLIFF jobs to one directory and scans another directory for translated XLIFF files, all unattended.

b) Wordbee Translator: use a direct link to a translation management system (TMS). Currently Beebox can link to the Wordbee Translator management tool, which is a web-based SaaS solution (www.wordbee.com).
With this option, all exchanges between your Beebox and a translator or language service provider can run automatically (if desired), over the web and with no intervention on your side.

Figure: the Wordbee Translator translation environment (see www.wordbee.com).

5.4 Validation
In a fully automated workflow, you would not manually verify machine or human translations, and the system would immediately create the final translated files. However, you can put a manual verification step in place if you wish. The Beebox user interface lets you find and filter approved translations, unapproved translations, or translations with QA check issues. If problems are found, you can either fix them yourself or flag them for resending to a translator (which creates a new translation job).

The user interface also lets you preview translated files.

5.5 Delivery
Once all translations are approved (automatically or semi-automatically), Beebox creates the translated files from the translated content. The files are saved to the "OUT" directory and can be retrieved directly from this location or through the web-based API.

5.6 Incorporating legacy translations
What if you already have some translated content? You do not want to translate it again with Beebox, right? The following options are available in this case:

Existing translation memories
First of all, Beebox lets you upload legacy translations as an Excel file with one column per language. If a freelance translator or language service provider completed your translations, ask them to send you the translations as translation memories. This approach typically yields imperfect results, and not all content may get translated from the memories. This is due to potential differences in how text is segmented and marked up in Beebox versus the translation tools used for the legacy translation.
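If your legacy translations already live in your own database, as in the product example of section 2.1, the one-column-per-language sheet mentioned above can be generated rather than assembled by hand. The Python sketch below writes CSV with the standard library to stay dependency-free. Two points are assumptions, not documented Beebox behaviour: that the first row holds language codes, and that you open the CSV in Excel and save it as a workbook before uploading. Check the expected layout in the Beebox documentation.

```python
import csv
import io
from typing import Dict, Iterable, List


def legacy_sheet(rows: Iterable[Dict[str, str]], languages: List[str]) -> str:
    """Return CSV text with one column per language and one row per translation unit.

    Each row maps a language code (e.g. "en", "fr") to the text in that language;
    a missing language leaves its cell empty. The header row of language codes is
    an assumed layout, not a documented Beebox format.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(languages)
    for row in rows:
        writer.writerow([row.get(lang, "") for lang in languages])
    return buffer.getvalue()
```

Listing the project's source language first in languages keeps each row easy to read as "source, then translations".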
Use Beebox alignment features to build memories on the fly
This is both the recommended and the easiest approach. You save the existing source files and any existing translated files to Beebox. Beebox then aligns each source-target pair and builds memories on the fly, with no manual intervention needed.

The process is also designed to leverage legacy memories. Simply upload any memories you have from your translators, and Beebox will use them to optimize the alignment process. If you do not have any memories, you can also improve the alignment by uploading dictionaries.

6 Licensing
One Beebox license grants you the right to install Beebox on one Windows instance for production use. In addition, you have the right to install a second Beebox on another Windows instance for staging or testing purposes. If you require a volume license plan, please contact Wordbee for more details.

After you download the Beebox software, a 30-day trial is activated. The trial can be extended in specific cases; contact Wordbee if you need an extension.

Each Beebox installation can manage an unlimited number of Beebox projects, limited only by the hardware on which Beebox is installed. You can also assign dedicated API credentials to each project. This means that if your single Beebox serves 10 customers, you can create 10 different sets of credentials and give each customer their own.

These are the most typical licensing scenarios:

You want to add translation capabilities to your own web app or software: Beebox is installed on your servers. You do not need to distribute Beebox with your software to your customers, and a single Beebox license is sufficient.
You want to add translation capabilities to your software and distribute it: in this case you have two options: (a) distribute Beebox with your software, or (b) have your software communicate with your own Beebox installation through the Beebox Web API. In the first scenario, you need one Beebox license per product installation.

You want to develop an off-the-shelf connector to localize a specific CMS: for example, you want to distribute a plugin for a CMS (such as Drupal, Sitefinity, Adobe CQ5…). In that case, the plugin communicates with Beebox via the Web API. You would likely not install Beebox at each customer's premises (although you could). Instead, you either install Beebox on one of your own servers to translate content for all your customers, or have each customer's own language service provider install a Beebox.