Add translation to your software

Wordbee Beebox
White paper
Add Translation to Your Content, Web Application or Software
Written for Technical People
June 22, 2015
http://www.wordbee.com
http://www.beeboxlinks.com
Copyright Wordbee SA
© 2015 Wordbee
1 Welcome
This white paper provides insightful information about adding translation capabilities to content
management systems, software applications, SaaS solutions, or any other type of product you might
develop.
Maybe you are developing a travel web site, selling books online, or working with cars... Whatever
the precise nature of your business, you most likely manage a web site or application containing a
sufficiently large number of articles, products, or content pages. In this scenario, translation quickly
becomes a challenge. Whether multilingual content already exists or not, you may have the
impression that adding translation is expensive, inefficient, or difficult to organize.
[Figure: Beebox turns your English-only web site or software into one available in English, French, Chinese, Russian, Spanish and more.]
Beebox provides a simple, effective way to make your web site, shop, mobile app or software multilingual. Built by developers for developers, Beebox delivers an exceptional toolbox to fulfill
your multilingual needs.
Table of contents
1 Welcome
2 About Beebox
3 The Inner Workings
4 Developers Guide
5 Machine and Human Translation
6 Licensing
2 About Beebox
Beebox is a Microsoft Windows application that can easily be installed on a development PC, a test
server, or a production server. It manages the translation of your content and supports both
machine translation and human translation workflows.
To use Beebox, you will first need to create a Beebox project for your data. The project exposes two
file directories: IN and OUT. The “IN” directory is where you copy texts or files to be translated, and the
“OUT” directory is where translated files are stored. This is depicted in the following diagram:
[Figure: Your data or system exchanges content with a Beebox project by file copy, API, FTP, Dropbox, etc. (1) You send original files or texts (e.g. English) to the “in” directory. The project translates your content, optionally with machine translation, human translation, and word counts and cost, and remembers and reuses translations. (2) You receive translated files or texts (e.g. Chinese, French, Spanish...) from the “out” directory. Supported content includes XML, HTML, JSON, flat files, PHP, .NET, CSV, strings, resource files, DB dumps, Word, Excel, PowerPoint, InDesign and many more.]
Any type of content or files may be sent to Beebox and translated files will have the exact same
format as the original. For example, if you send an English PHP file to “IN”, a translated PHP file will
be placed in the “OUT” directory. Beebox is capable of identifying what content needs to be
translated. It further protects HTML, JavaScript, server, or other format-specific data and ensures
that translators cannot alter this information. Beebox supports almost any file format out of the box
including:
- XML, JSON, HTML, PHP, ASPX, JavaScript files, Razor, code files (C, Java…)
- Flat files, CSV, string files, iOS files, Java properties, INI files
- Word, Excel, PowerPoint, InDesign, OpenOffice, RTF, DITA
- .NET resources, XLIFF…
Each format may be customized to precisely fine-tune what needs to be translated by Beebox. For
example, XML formats let you specify XPath expressions, HTML-based formats let you specify the
HTML tags or attributes to translate, and still other formats use regular expressions to include or
exclude certain strings. Instead of copying files, you may use the Beebox Web API (recommended).
See Chapter 4 for more details.
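To picture what an XPath-based format setting selects, here is a minimal sketch using Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. It is not Beebox's configuration syntax; the XML layout and the `./description` path are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical product dump: only <description> should be translated,
# while the id attribute and <sku> must stay untouched.
SAMPLE = """<products>
  <product id="101"><sku>A-1</sku><description>Product Alpha</description></product>
  <product id="102"><sku>B-2</sku><description>Click here</description></product>
</products>"""


def extract_translatable(xml_text: str, xpath: str) -> list[tuple[str, str]]:
    """Return (product id, text) pairs for elements matching xpath inside each product."""
    root = ET.fromstring(xml_text)
    pairs = []
    for product in root.findall("./product"):
        for node in product.findall(xpath):
            if node.text and node.text.strip():
                pairs.append((product.get("id"), node.text.strip()))
    return pairs


print(extract_translatable(SAMPLE, "./description"))
```

Everything the expression does not select (here the `<sku>` values and the `id` attributes) is left as-is in the file.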
2.1 Examples
Translating texts contained in a database
Let us imagine that you have a database storing product descriptions in HTML format. Your database
is already designed to store texts in each of your required languages; however, the translations are
missing!
To solve this problem, you can write a tool or script that dumps the database to an XML file (CSV,
Excel, or any other format). The XML might contain one node per product and an attribute containing
the product id. First, copy the file to Beebox. As soon as the content has been translated, you can
download the translated files, one per target language. The final step is to store the translations
back into your database; the product id mentioned earlier tells you where to store each translated
description.
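A minimal sketch of this dump-and-reimport round trip, assuming a SQLite database with a hypothetical `products` table (`id`, `lang`, `description`) and a hypothetical XML layout with one `<product id="...">` node per product. Sending the file to Beebox is not shown; the French XML below is written by hand to stand in for the file you would fetch from “OUT”.

```python
import sqlite3
import xml.etree.ElementTree as ET


def dump_source(conn: sqlite3.Connection, lang: str) -> str:
    """Export all descriptions in the source language as one XML document."""
    root = ET.Element("products")
    rows = conn.execute(
        "SELECT id, description FROM products WHERE lang = ? ORDER BY id", (lang,)
    )
    for product_id, description in rows:
        node = ET.SubElement(root, "product", id=str(product_id))
        node.text = description  # HTML is escaped automatically
    return ET.tostring(root, encoding="unicode")


def import_translation(conn: sqlite3.Connection, xml_text: str, lang: str) -> None:
    """Store each translated description back, located by its product id."""
    for node in ET.fromstring(xml_text).findall("product"):
        conn.execute(
            "INSERT OR REPLACE INTO products (id, lang, description) VALUES (?, ?, ?)",
            (int(node.get("id")), lang, node.text),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER, lang TEXT, description TEXT, "
    "PRIMARY KEY (id, lang))"
)
conn.execute("INSERT INTO products VALUES (1, 'en', '<p>Product Alpha</p>')")
conn.commit()

source_xml = dump_source(conn, "en")  # this is the file you would copy to "IN"
# Stand-in for the translated file fetched from "OUT":
french_xml = '<products><product id="1">&lt;p&gt;Produit Alpha&lt;/p&gt;</product></products>'
import_translation(conn, french_xml, "fr")
```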
Since Beebox remembers every completed machine or human translation, you are able to send a full
dump as many times as needed without paying for the same translation twice. Beebox automatically
identifies new or updated texts, and only those texts go into a
translation workflow. Of course, it is not necessary to dump the entire database. If you track which
products are changed, it may be faster to extract only changed or added products to a single file, or
one file per product.
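One simple way to track changed products on your side (an application-level assumption, not a Beebox feature) is to keep a hash of each product's last exported text and export only products whose hash differs:

```python
import hashlib


def changed_products(current: dict[int, str], last_hashes: dict[int, str]) -> list[int]:
    """Return ids of products that are new or whose text changed since the last export."""
    changed = []
    for product_id, text in sorted(current.items()):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if last_hashes.get(product_id) != digest:
            changed.append(product_id)
            last_hashes[product_id] = digest  # remember what was exported
    return changed


hashes: dict[int, str] = {}
first = changed_products({1: "Product Alpha", 2: "Click here"}, hashes)
second = changed_products({1: "Product Alpha", 2: "Click now", 3: "New product"}, hashes)
```

Here `first` contains both products and `second` only products 2 and 3. In practice you would update the stored hashes only after the export has actually succeeded.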
Translating software strings or HTML
Copy your HTML, PHP, ASPX, or other files to Beebox and it will again reuse previously completed
translations so that only the latest changes are translated. The resulting translated HTML files
faithfully reconstitute the markup, JavaScript code, and server code.
You can customize Beebox to extract translatable JavaScript strings or server code strings in addition
to the basic HTML content.
2.2 Why Beebox?
Beebox is the easiest means you will find to add translation capabilities to your
software or solutions. This is because Beebox does the following:
- Supports many file formats, so there is no need to convert your content.
- Generates your translated files in the same format as the originals.
- Remembers all translations, so you never pay twice for translating the same piece of text.
- Finds unique texts: a string like “Click here” may show up 500 times in your software, but Beebox
sends it just once for translation.
- Supports a wide range of machine translation systems.
- Supports human translation workflows via XLIFF or a direct web link to translation vendors.
- Includes smart alignment technology, so translated content can be aligned on the fly.
- Includes a web user interface to monitor operations and to locate, filter, and change text content.
- Is easy to integrate, with multiple options: file copy, Web API, HTTP callbacks, and
PowerShell scripts.
3 The Inner Workings
The system may be configured to use any combination of machine translation[1], pseudo-translation[2]
and human translation, so it adapts to your exact requirements. During development you will likely
want a workflow that uses pseudo- or machine translation and skips any cost-incurring
human work. In production you will want to add human translation.
Here is a visual explanation of what happens to files placed in the “IN” directory or sent via the API:
[Figure: the five processing steps, illustrated with English segments such as “Product Alpha”, “We are happy...” and “Click here” and their French, German and Chinese translations.]
1. Text extraction: All (translatable) text is extracted from the original file. The result is a list of
segments, typically paragraphs or sentences. These segments are file format independent.
2. Pretranslation: The translation memory (TM) remembers every single translation done in the past.
Each segment is looked up to see if a translation already exists.
3. Machine translation: Untranslated segments can optionally be machine translated using Microsoft,
Google or other systems. Preliminary translated files can be retrieved at this stage.
4. Human translation: Missing or machine-only translations can optionally be sent to translators or
LSPs. The result is human-quality translations.
Quality check (optional): In-house staff can use the web-based Beebox UI to do quality checks,
searches, fixes, etc.
5. Delivery: The final translated files are created from the translations. Notifications can optionally
be sent by email, callback URLs or custom scripts.
[1] Beebox interfaces with major online machine translation systems such as Google and Microsoft.
[2] Used to simulate translation workflows: text is “translated” by converting it to lowercase or uppercase, shifting letters, etc.
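The flow above can be modelled in a few lines of Python. This is a toy model under simplifying assumptions, not Beebox's implementation: segments are split on blank lines, the translation memory is a plain dictionary, and uppercasing (a pseudo-translation) stands in for machine translation. Human translation and quality checks are omitted.

```python
def extract_segments(text: str) -> list[str]:
    """Step 1: split file content into format-independent segments (here: paragraphs)."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def translate_file(text: str, tm: dict[str, str]) -> tuple[str, list[str]]:
    """Run steps 2, 3 and 5 on one file.

    Returns the translated file and the segments that needed machine translation.
    """
    translated, sent_to_mt = [], []
    for segment in extract_segments(text):
        if segment in tm:                 # Step 2: pretranslate from the TM
            translated.append(tm[segment])
        else:                             # Step 3: stand-in for machine translation
            target = segment.upper()
            tm[segment] = target          # remember it for next time
            sent_to_mt.append(segment)
            translated.append(target)
    return "\n\n".join(translated), sent_to_mt  # Step 5: rebuild the file


french_tm = {"Product Alpha": "Produit Alpha", "Click here": "Cliquer ici"}
output, new_segments = translate_file(
    "Product Alpha\n\nUnmatched quality\n\nClick here", french_tm
)
```

Only “Unmatched quality” reaches the machine translation stand-in; the other two segments come straight from the memory, and running the same file again would send nothing at all.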
3.1 What is the purpose of the translation memory?
As you can see, all translations ever completed in a Beebox project are remembered. The system also
keeps track of quality levels. It knows which translations have been approved by a human translator
or proofreader and which ones have been completed by machine without approval.
We have further seen that Beebox splits your text content into smaller segments, typically the size of
a sentence or paragraph. If a specific sentence was translated into Chinese earlier and the exact
same sentence shows up in another file, Beebox can pre-translate this second occurrence.
Think of a text such as “Click here”. It may show up hundreds of times in a web site, but Beebox will
send it to an MT system or a human translator just once!
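Sending each text only once amounts to de-duplicating segments before they leave the project. A minimal sketch, assuming segments have already been extracted as plain strings:

```python
def unique_segments(segments: list[str]) -> tuple[list[str], int]:
    """Return distinct segments in first-seen order and the number of repeats skipped."""
    distinct = list(dict.fromkeys(segments))  # dicts keep insertion order
    return distinct, len(segments) - len(distinct)


segments = ["Click here"] * 500 + ["Product Alpha", "Click here"]
distinct, skipped = unique_segments(segments)
```

Only two texts are sent out and 500 repeats are skipped; once translated, every occurrence is filled from the same memory entry.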
[Figure: a translation memory (TM) storing English source segments (sentences, phrases, links...) such as “Product Alpha”, “We are happy...”, “Click here”, “Hello world” and “Welcome!” with their French, German and Chinese translations, e.g. “Cliquer ici”, “Hier klicken”, “点击这里”. Each translation is marked as human or approved, machine-translated and unapproved, or not translated. Every TM entry also records the language, the text, the file it came from (e.g. products\welcome.htm), its origin (e.g. machine translation), its human approval status, its context (the text before and right after) and flags (lock translation, bookmarks, comments).]
When a new or updated file is added to Beebox, it looks up each “segment” in the translation
memory. Even if the file is completely new, it is very likely that some translations from
other files can be reused. This helps to reduce overall translation costs.
A Beebox project can be configured for different leveraging modes:
- Leverage translations of identical texts regardless of where they come from, as in the “Click
here” example above. This is the default mode and suits most content.
- Leverage translations only if both the source text and its context (the segments immediately
before and after it) are identical. Translations sometimes differ depending on the context, so
this mode can produce higher quality, but it also leaves more segments to translate.
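The two modes can be pictured as two different lookup keys. In this sketch (an illustration of the idea, not Beebox's data model) the default mode matches on the source text alone, while the context mode also requires the previous and next segments to match:

```python
from typing import Optional

Context = tuple[str, str, str]  # (previous segment, text, next segment)
TM = tuple[dict[str, str], dict[Context, str]]


def build_tm(entries: list[tuple[Context, str]]) -> TM:
    """Index stored translations by text alone and by text plus context."""
    by_text: dict[str, str] = {}
    by_context: dict[Context, str] = {}
    for (prev, text, nxt), translation in entries:
        by_text[text] = translation
        by_context[(prev, text, nxt)] = translation
    return by_text, by_context


def lookup(segments: list[str], i: int, tm: TM, context_mode: bool) -> Optional[str]:
    """Pre-translate segments[i]; in context mode the neighbours must match too."""
    by_text, by_context = tm
    if not context_mode:
        return by_text.get(segments[i])
    prev = segments[i - 1] if i > 0 else ""
    nxt = segments[i + 1] if i + 1 < len(segments) else ""
    return by_context.get((prev, segments[i], nxt))


tm = build_tm([(("Welcome!", "Click here", ""), "Cliquer ici")])
page = ["Product Alpha", "Click here"]
default_hit = lookup(page, 1, tm, context_mode=False)  # "Cliquer ici"
context_hit = lookup(page, 1, tm, context_mode=True)   # None: previous segment differs
```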
As a developer, you need not worry much about these details, since it is typically the language
service provider who makes this choice.
3.2 What happens if I send a new version of an already translated file?
Beebox starts by extracting the text content and splitting it into segments. It then attempts to pre-translate all segments from the translation memory. If the file has not changed,
Beebox finds every translation in the memory, so it can fully translate the file and send it back
to you. Otherwise, it pre-translates all unchanged segments and sends the added or changed
segments to machine or human translation.
For example, if a single word is replaced inside a huge iOS strings file, only the one string containing
the changed word is sent for machine or human translation. All the other strings are reused from the
translation memory. In this way, Beebox safeguards you against spending money on translation where
it is not necessary.
When Beebox pre-translates a piece of text (segment), there may be multiple candidate translations. To
maximize pre-translation quality, the system uses heuristics to choose the best one:
- It first attempts to pre-translate a file from the previous file version (if one exists).
- It gives more weight to approved translations.
- It gives more weight to pre-translations whose context matches (text before and after).
- It gives more weight to pre-translations done or post-edited by human translators.
- It gives less weight to translations with QA (quality assurance) problems.
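These heuristics amount to ranking candidate translations. The sketch below turns them into a score; the weights are invented for illustration and do not reflect Beebox's actual values.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    translation: str
    from_previous_version: bool = False
    approved: bool = False
    context_match: bool = False
    human: bool = False
    qa_issues: int = 0


def score(c: Candidate) -> int:
    """Hypothetical weights: each heuristic from the list adds or removes points."""
    return (
        8 * c.from_previous_version
        + 4 * c.approved
        + 2 * c.context_match
        + 2 * c.human
        - 3 * c.qa_issues
    )


def best(candidates: list[Candidate]) -> Candidate:
    return max(candidates, key=score)


choices = [
    Candidate("Cliquez ici", human=True, approved=True),     # score 6
    Candidate("Cliquer ici", context_match=True, qa_issues=1),  # score -1
    Candidate("Appuyez ici"),                                 # score 0
]
```

With these weights, the approved human translation wins even though another candidate matches the context.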
3.3 Does Beebox run automatically and unattended?
Yes, Beebox is capable of running automatically with little to no human intervention – like a true
black box. However, you can also configure Beebox to wait for confirmation at certain steps in the
workflow. Examples are:
1. Confirming work before it is sent to translators (e.g. to check cost).
2. Validating work received from human translators.
In the second example, you would connect to the web interface of your Beebox to filter translations,
run checks, manually correct problems, or send the translation back to the translator. In general, it is
recommended to automate all steps.
3.4 Alignment - Can I send already translated files?
Yes. Beebox incorporates alignment technology, so you may send translated versions together with your source
files. Beebox splits source and translations into segments and aligns them. In other words, you can
send existing translations to Beebox without first having to create translation memories (using
3rd party alignment software). It is as simple as that!
The ability to include translated files also serves another purpose. Imagine that a file had been
translated in the past. Now, someone edits the translated content (outside Beebox) and changes one
or two sentences. The next time you send the source and translated files to Beebox, it will align the
files and identify the changes in the translation. The changes go to the human translation team for
approval. This use case is particularly useful with CMS connections, where CMS proofreaders may
want to edit the translated content and then send the content to Beebox so that its memories are
updated.
4 Developers Guide
Beebox is translation software that was built by developers for developers. We know how hard it can
be to integrate third party products or libraries. Therefore, our design goal was to make your life as
easy as possible. In this chapter we outline the various ways to interface with Beebox. We
encourage you to choose the methods and options that you feel most comfortable with. Refer to the
online documentation for full details.
4.1 File copy
Integration truly is simple with the File Copy option. With this feature, you can copy files requiring
translation directly to the “IN” directory, wait, and then fetch translations from the “OUT” directory.
If Beebox is located on a server, you may do this transfer using a Windows file-share, FTP, Dropbox,
or other file sharing system. For evaluating Beebox, we suggest you install it on your development
PC.
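A minimal file-copy integration in Python might look like this. The project paths are hypothetical, and the sketch assumes that translated files appear under “OUT” in one subfolder per target language with the same relative file name; check your project's actual “OUT” layout before relying on it.

```python
import shutil
import tempfile
import time
from pathlib import Path
from typing import Optional


def send_for_translation(source: Path, in_dir: Path, relative_name: str) -> Path:
    """Copy a source file into the project's IN directory, keeping subfolders."""
    target = in_dir / relative_name
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    return target


def wait_for_translation(out_dir: Path, locale: str, relative_name: str,
                         timeout_s: float = 600.0, poll_s: float = 5.0) -> Optional[Path]:
    """Poll OUT until the translated file appears, or give up after timeout_s.

    ASSUMPTION: translations land in OUT/<locale>/<relative_name>.
    """
    expected = out_dir / locale / relative_name
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if expected.exists():
            return expected
        time.sleep(poll_s)
    return None


# Demo against a throwaway folder standing in for a Beebox project.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    source = project / "product1.xml"
    source.write_text("<products/>", encoding="utf-8")
    copied = send_for_translation(source, project / "IN", "products/product1.xml")
    copied_ok = copied.read_text(encoding="utf-8") == "<products/>"
    # Simulate Beebox dropping the French translation into OUT.
    fake = project / "OUT" / "fr" / "products" / "product1.xml"
    fake.parent.mkdir(parents=True)
    fake.write_text("<produits/>", encoding="utf-8")
    found = wait_for_translation(project / "OUT", "fr", "products/product1.xml",
                                 timeout_s=1.0, poll_s=0.1)
```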
4.2 Web API
If copying files sounds too primitive to you, then check out the web-based API. In most
scenarios, you only need 6 to 10 methods to complete translation objectives. By default, the
API listens on port 8089. The first method you will call establishes a connection and obtains a
connection token. Here are a few examples:
(GET) http://localhost:8089/api/connect?project=...&login=myname&password=whatever
To send a file to the “IN” directory, use:
(PUT) /api/files/file?token={token}&locale=en-US&folder=&filename=products\product1.xml
To poll which files have been fully translated and ready in “OUT”, use:
(GET) /api/workprogress/translatedfiles?token={token}&filter=&skip=&count=
To download a translated file, use:
(GET) /api/files/file?token={token}&locale=fr&folder=&filename=products\product1.xml
There are also calls to obtain cost quotations in case the human translation workflow is linked to a
Wordbee Translator platform, which is typically owned by a freelance translator or a language
service provider.
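The calls above are plain HTTP requests, so any HTTP client will do. Below is a minimal standard-library sketch that reproduces the endpoints and query parameters listed above. The response formats are not described here, so `connect` returns the raw body and extracting the token from it is left to you; check the Beebox API documentation for the exact format.

```python
import urllib.parse
import urllib.request

BASE = "http://localhost:8089"  # default API port mentioned above


def api_url(path: str, **params: str) -> str:
    """Build a Beebox API URL with URL-encoded query parameters."""
    return f"{BASE}{path}?{urllib.parse.urlencode(params)}"


def connect_url(project: str, login: str, password: str) -> str:
    return api_url("/api/connect", project=project, login=login, password=password)


def file_url(token: str, locale: str, filename: str, folder: str = "") -> str:
    # Same endpoint for upload (PUT, source locale) and download (GET, target locale).
    return api_url("/api/files/file", token=token, locale=locale,
                   folder=folder, filename=filename)


def translated_files_url(token: str, filter_: str = "", skip: str = "", count: str = "") -> str:
    return api_url("/api/workprogress/translatedfiles", token=token,
                   filter=filter_, skip=skip, count=count)


def connect(project: str, login: str, password: str) -> str:
    """GET /api/connect and return the raw response body (token format not assumed)."""
    with urllib.request.urlopen(connect_url(project, login, password)) as resp:
        return resp.read().decode("utf-8")


def upload(token: str, locale: str, filename: str, data: bytes) -> int:
    """PUT a source file into IN; returns the HTTP status code."""
    req = urllib.request.Request(file_url(token, locale, filename), data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def download(token: str, locale: str, filename: str) -> bytes:
    """GET a translated file from OUT."""
    with urllib.request.urlopen(file_url(token, locale, filename)) as resp:
        return resp.read()
```

Note that in Python source the backslash in `products\product1.xml` must be written as `"products\\product1.xml"`; `urlencode` then sends it as `%5C`.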
A word on security:
You can assign dedicated API credentials to each Beebox project. This is useful if you need to hand out
credentials to your customers, for example when you give them software that calls your Beebox API.
A caller that connects to the API with a given set of credentials can access only the data of the
associated project.
4.3 Instructions files
With each source file you save to the “IN” directory, you can include a so-called “instructions file”.
This optional JSON file lets you:
- Specify the target languages for translation. By default, a file is translated into all project
target languages. This is helpful when certain content must be translated only into specific
languages.
- Specify a deadline for human translation steps.
- Request alignment of the source file and translated file(s).
- Exclude the file from any machine translation or pseudo-translation (a kind of
dummy automatic translation).
- Specify that the file content shall never be sent for human translation. This is useful if machine
translation is sufficient for some content but not for other content.
If the source file is “products\product1.xml”, the instructions file might be:
products\product1.xml.beebox
{ "locales": [ "fr" ], "disableMT": true }
If you also save translated files to Beebox for alignment, the instructions include an
“align” node listing the target languages of the saved translations:
products\product1.xml.beebox
{ "align": { "locales": [ "fr" ] } }
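Because instructions files are plain JSON, they are easy to generate. A minimal sketch that uses only the keys shown above (`locales`, `disableMT`, `align`); any other option names would have to come from the Beebox documentation.

```python
import json
from pathlib import Path
from typing import Optional


def build_instructions(locales: Optional[list[str]] = None,
                       disable_mt: bool = False,
                       align_locales: Optional[list[str]] = None) -> dict:
    """Collect only the options that are actually set."""
    instructions: dict = {}
    if locales:
        instructions["locales"] = locales
    if disable_mt:
        instructions["disableMT"] = True
    if align_locales:
        instructions["align"] = {"locales": align_locales}
    return instructions


def instructions_path(source: Path) -> Path:
    """products/product1.xml -> products/product1.xml.beebox"""
    return source.with_name(source.name + ".beebox")


def write_instructions(source: Path, **options) -> Path:
    """Write the instructions file next to the source file and return its path."""
    path = instructions_path(source)
    path.write_text(json.dumps(build_instructions(**options)), encoding="utf-8")
    return path
```

For example, `build_instructions(locales=["fr"], disable_mt=True)` serializes to the first example above.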
4.4 Callbacks
You can configure Beebox to call a web URL whenever important events occur. For example, Beebox can
notify your systems when a file was translated and saved to the “OUT” directory.
[Figure: (1) New translated files arrive in the Beebox project's “out” directory. (2) Beebox calls your URL, e.g. http://myserver.com/notify?event=translation&date=... (3) You get the translation from “OUT”.]
The callback mechanism makes it unnecessary to poll Beebox at regular intervals in order to know
whether translations have arrived.
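On your side, the callback target is just an HTTP endpoint. A minimal sketch with Python's standard `http.server`, assuming the callback arrives as an HTTP GET; the `/notify` path and the `event` and `date` parameters come from the example URL above, but what Beebox actually sends depends on how you configure the callback.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def parse_callback(path: str) -> dict[str, str]:
    """Turn '/notify?event=translation&date=...' into {'event': ..., 'date': ...}."""
    query = parse_qs(urlparse(path).query)
    return {key: values[0] for key, values in query.items()}


class NotifyHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if urlparse(self.path).path != "/notify":
            self.send_error(404)
            return
        params = parse_callback(self.path)
        if params.get("event") == "translation":
            # Fetch the new files from "OUT" here (file share or web API).
            print("Translations ready:", params)
        self.send_response(204)  # acknowledge without a body
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the callback receiver until interrupted."""
    HTTPServer(("", port), NotifyHandler).serve_forever()
```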
4.5 PowerShell scripts
PowerShell scripts are another mechanism that can be useful in certain integration
scenarios. You can develop a script that is automatically executed by Beebox whenever new files to
translate are received and whenever files have been translated.
[Figure: (1) New translated files arrive in the Beebox project's “out” directory. (2) Beebox calls your PowerShell script, e.g. Myscript.ps1.]
The script can be used to transfer translated files to another server or location, to start an FTP
transfer, or to do whatever is needed to integrate translations into your systems.
4.6 Microsoft .NET extensions
Extend Beebox functionality using C# and Visual Studio. You can code translation algorithms,
custom filters and more, and upload your code to a Beebox.
From a developer point of view, things are quite straightforward: start by creating a class library
project in Visual Studio and add one class per extension. Finally, compile your project and
upload the DLL to the Beebox server.
4.7 Email notifications
Finally, Beebox can send out email notifications, such as “File translated”, “Texts sent to translator”,
“Texts received from translator”, etc., so you can easily track the current status of a translation.
4.8 A word on projects and languages
One Beebox installation can manage any number of Beebox projects in any languages. Each project is
for one specific source language, which is the language in which content to translate is written. A
project can have one or more target languages, which are the languages into which content shall be
translated. If your original content is written in multiple source languages, simply create one
project per source language.
Languages are identified by their ISO two-letter (sometimes three-letter) codes. You can use the
Beebox API to obtain a complete list of language codes together with their English names. Languages
are expressed in their neutral form (“en” for English) or with a region indicator (“en-GB” for UK English,
“en-US” for US English). Both forms are commonly used; for the ISO language codes, see
http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes
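If your application uses regional codes but a project's target languages are set up in neutral form (or the reverse), you need a mapping between the two. A small sketch of one possible fallback rule, assuming codes in the “language-REGION” form shown above; this is application-side logic, not a Beebox feature.

```python
from typing import Optional


def neutral(code: str) -> str:
    """Strip the region indicator: 'en-GB' -> 'en'; neutral codes are unchanged."""
    return code.split("-", 1)[0].lower()


def pick_locale(requested: str, available: set[str]) -> Optional[str]:
    """Use the exact code if the project has it, else fall back to the neutral form."""
    if requested in available:
        return requested
    fallback = neutral(requested)
    return fallback if fallback in available else None
```

For instance, a request for “en-GB” against a project offering “en” and “fr” resolves to “en”, while “de-AT” against “en” only resolves to nothing.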
4.9 Testing and production
It is fairly simple to work with Beebox during development, testing, and production.
Your first tests
To begin trying out Beebox, download and install it on your development
PC. Create a project and configure an automated workflow for “pseudo-translation”. This simulates a
zero-cost translation workflow: all source texts are simply converted to uppercase or lowercase, or
their letters are shifted. Now, whatever you put into the “IN” directory will be “translated” within
seconds, and the “translated” files immediately show up in the “OUT” directory. Machine translation
is not very expensive, so you might also consider enabling a web-based machine translation system
such as Google Translate or Microsoft Translator.
Development in a team
If you have multiple developers who all need to use Beebox, we recommend installing it on a
development server. Note that a Beebox license includes one production install and one test
install, so this setup is covered from the start.
Production environments
Install Beebox in your production environment. This step is of course not required if your customers
or a language service provider host Beebox.
In production, create the Beebox projects you need and configure the actual workflows. For
example, you may want to enable a human translation workflow or a combination of human and
machine translation workflows.
Staging environments
For staging environments, you can reuse the same Beebox you installed for development or for
production. Then simply create dedicated Beebox projects for staging use.
5 Machine and Human Translation
Each Beebox project can be configured with its own translation workflows and level of automation.
Please also refer to the user documentation for additional details and helpful screenshots. Basically,
a Beebox project can provide these workflow steps:
5.1 Pseudo translation
If enabled, new or changed text segments are first pseudo-translated. Pseudo-translation refers to
converting text to uppercase or lowercase, shifting letters, or changing text length. This is useful for zero-cost and rapid test workflows.
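Pseudo-translation is easy to reproduce if you want to test your own pipeline before connecting Beebox. A minimal sketch of the three kinds of transformation named above (case change, letter shift, length change); the one-letter shift and the roughly 30% padding are illustrative choices, not Beebox's rules.

```python
def shift_letters(text: str, shift: int = 1) -> str:
    """Rotate ASCII letters by `shift` positions; all other characters are kept."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def pseudo_translate(text: str, mode: str = "upper") -> str:
    if mode == "upper":
        return text.upper()
    if mode == "lower":
        return text.lower()
    if mode == "shift":
        return shift_letters(text)
    if mode == "expand":
        # Pad by about 30% to reveal layouts that break with longer translations.
        return "[" + text + "~" * max(1, len(text) * 3 // 10) + "]"
    raise ValueError(f"unknown mode: {mode}")
```

The bracketed “expand” form also makes truncated strings easy to spot, because the closing bracket goes missing.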
5.2 Machine translation
If enabled, new or changed text segments are first machine translated. Machine translation refers to
online systems such as Google Translate or Microsoft Translator.
You can then decide whether machine translations are considered approved or still require
subsequent human approval or correction. In the former case, the system is told that the machine-translated files can be created immediately and placed into the “OUT” directory. You would enable
this option if your workflow does not require any subsequent human validation.
5.3 Human translation
Unapproved content is sent to a human translator, team or language service provider. Please note
that the system does not send your original files! Instead, it sends all the “unapproved” or
“untranslated” text segments previously extracted from the files in your “IN” directory. Beebox creates
translation jobs from those segments. It then provides two workflow options to choose from:
a) XLIFF - Use XLIFF to exchange translation work with human translators. In this case, Beebox
will automatically create a translation job (or jobs) with all new or changed content. Within
the Beebox user interface, you are then able to download the XLIFF files and send them to
your translators. Once the translation is finished, the translated XLIFF files are uploaded back
to Beebox and translated files are created in the “OUT” directory.
The exchange of the XLIFF jobs with a translation management or other system can be fully
automated by means of hot folders. In this scenario, Beebox extracts new XLIFF jobs to
one directory and scans another directory for translated XLIFF files, all unattended.
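To see what travels through these hot folders, here is a minimal sketch that packs untranslated segments into an XLIFF 1.2 document with Python's `xml.etree.ElementTree` and reads translations back. It follows the general XLIFF 1.2 structure (`file`, `body`, `trans-unit`, `source`, `target`); the files Beebox actually generates may carry more attributes and metadata, and the `original` value used here is hypothetical.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize XLIFF elements without a prefix


def _q(tag: str) -> str:
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def build_xliff(segments: list[str], source_lang: str, target_lang: str,
                original: str = "beebox-job") -> str:
    """Create one trans-unit per segment, with an empty target for the translator."""
    root = ET.Element(_q("xliff"), version="1.2")
    file_el = ET.SubElement(root, _q("file"), {
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
        "original": original,
    })
    body = ET.SubElement(file_el, _q("body"))
    for i, text in enumerate(segments, start=1):
        unit = ET.SubElement(body, _q("trans-unit"), id=str(i))
        ET.SubElement(unit, _q("source")).text = text
        ET.SubElement(unit, _q("target"))
    return ET.tostring(root, encoding="unicode")


def read_targets(xliff_text: str) -> dict[str, str]:
    """Map trans-unit ids to the translated text found in a returned XLIFF file."""
    ns = {"x": XLIFF_NS}
    root = ET.fromstring(xliff_text)
    return {
        unit.get("id"): unit.findtext("x:target", default="", namespaces=ns)
        for unit in root.iterfind(".//x:trans-unit", ns)
    }


job = build_xliff(["Click here", "Product Alpha"], "en", "fr")
```

In a hot-folder setup, `job` would be written to the outgoing directory and `read_targets` applied to files that appear in the incoming one.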
b) Wordbee Translator - Use a direct link to a translation management system (TMS). Currently
Beebox can link to the Wordbee Translator management tool, which is a web based SaaS
solution (www.wordbee.com). With this option, all exchanges between your Beebox and a
translator or language service provider can run automatically (if desired). This occurs via the
web and with absolutely no intervention from your side.
[Figure: The Wordbee Translator translation environment. See www.wordbee.com]
5.4 Validation
In a fully automated workflow, you would not manually verify machine or human translations and
the system would immediately create the final translated files. However, if you want to put in place a
manual verification step, you can do so.
Beebox has a user interface that can find and filter approved and unapproved translations, as well as
translations flagged by QA checks. If problems are found, you can either fix them yourself or flag them
for resending to a translator (which creates a new translation job).
The user interface also lets you preview translated files.
5.5 Delivery
Once all translations are approved (automatically or semi-automatically), Beebox creates the
required translated files from the translated content. The files are saved to the “OUT” directory and
can be retrieved directly from this location or by using the web based API.
5.6 Incorporating legacy translations
What if you already have some translated content? You do not want to translate it again
with Beebox, right? The following options are available:
Existing translation memories
First of all, Beebox lets you upload legacy translations as an Excel file with one column per language.
If a freelance translator or language service provider completed your translations, please ask them to
send you the translations as translation memories. This approach typically yields imperfect results,
and not all content may get translated from the memories. This is due to potential differences in how
text is segmented and marked up in Beebox vs. the translation tools used for the legacy translation.
Use Beebox alignment features to build memories on the fly
This is both the recommended and the easiest approach. Basically, you would save the existing
source files and any existing translated files to Beebox. Beebox then aligns each source-target
pair and builds memories on the fly. With this approach, no manual intervention is needed.
The process is also designed to leverage legacy memories. Simply upload any memories you have
from your translators and Beebox will use them to optimize the alignment process. If you do not
have any memories you can also improve the alignment by uploading dictionaries.
6 Licensing
One Beebox license grants you the right to install Beebox on one Windows instance for production
use. In addition, you have the right to install a second Beebox on another Windows instance for
staging or testing purposes. If you require a volume license plan, please contact Wordbee for more
details.
After you download the Beebox software, a 30-day trial is activated. The trial may be extended in
specific instances; contact Wordbee if you need an extension.
Each Beebox installation can manage an unlimited number of Beebox projects, limited only by the
hardware on which Beebox is installed. You can further assign dedicated API credentials to each
project. This means that if your single Beebox serves 10 customers, you can create 10 different sets
of credentials and give each customer its own.
These are the most typical licensing scenarios:
You want to add translation capabilities to your own web app or software: Beebox would need to be
installed on your servers. You do not need to distribute Beebox with your software to your
customers. A single Beebox license is sufficient.
Add translation capabilities to your software and distribute the software: In this case you have two
options: (a) Distribute Beebox with your software, or, (b) Have your software communicate with your
own Beebox installation through the Beebox web API. In the first scenario, you need to obtain one
Beebox license per product installation.
Develop an off-the-shelf connector to localize a specific CMS: For example, you want to distribute a
plugin for a CMS (like Drupal, Sitefinity, Adobe CQ5…). In that case, the plugin communicates with
Beebox via the web API. You would likely not install Beebox at each customer's premises (although you
could). Instead, you would either install Beebox on one of your servers to translate content for all
of your customers, or have the customers’ own language service provider install a Beebox.