The Versatility of XML in XML Translation

by Justin Carrell

Since its creation in the late 1990s, Extensible Markup Language (XML) has been a much-hyped technology.  Today, XML is more powerful and widespread than ever.  While most web professionals already understand the basics of XML, taking a deeper dive into the history and features of XML and XML translation is both fascinating and worthwhile.

The main features of XML include: 

  • Ideal for handling data with a complex structure or atypical data
  • Allows for the description of data using a markup language
  • Allows for the interchange of data between systems by using a text-based format
  • Human-friendly and computer-friendly format
  • Handles data in a tree structure having one, and only one, root element
  • Excellent for long-term data storage and data reusability
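
The tree structure and the single-root rule listed above can be illustrated with a minimal Python sketch that uses the standard library's xml.etree.ElementTree module. The sample document and its element names (catalog, book, title, price) are invented for illustration only, as are the helper functions describe and is_well_formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the element names are invented for illustration.
WELL_FORMED = """<catalog>
  <book id="b1">
    <title>XML Basics</title>
    <price currency="USD">29.95</price>
  </book>
</catalog>"""

# Two top-level elements, so there is no single root element.
TWO_ROOTS = "<book/><book/>"


def describe(element, depth=0):
    """Return one indented line per element, walking the tree depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


def is_well_formed(text):
    """Return True if the parser accepts the text as a single-rooted document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(WELL_FORMED)
outline = describe(root)
print("\n".join(outline))          # one line per element, indented by depth
print(is_well_formed(TWO_ROOTS))   # the parser rejects a second top-level element
```

The parser rejects the second sample because a well-formed XML document may contain only one top-level element; everything else must be nested inside it.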

XML's roots go all the way back to the 1970s, because it was developed as a lighter version of the Standard Generalized Markup Language (SGML), whose own origins lie in that decade.  Now, many years later, XML is accompanied by a family of associated technologies such as Extensible Stylesheet Language (XSL), Document Type Definition (DTD), XML Schema Definition (XSD), XPath, and XQuery, to name a few.  Today, XML is commonly used to exchange data over the Internet.

Because both are markup languages, novices may at first confuse XML with HTML, but the resemblance ends at the markup syntax.  HTML has a fixed set of predefined tags, whereas XML is extensible, and its tags are user-defined.  This extensibility is the source of XML's power: the familiar syntax allows the files to be read by virtually any computer system, while the definition of the tags conveys the needed data to the program that requires it.  Since it is plain text, XML can be read by humans, although it is typically not intended to be presented to readers without a stylesheet.

XML files are encoded as UTF-8 by default and are built on Unicode, so they can represent virtually any character that can be entered electronically. This makes the XML format ideal for natural language translation, as non-Latin characters are not supported in traditional ASCII encoding. In fact, XML is often used in software localization by placing User Interface (UI) strings in a .resx file (the .NET convention) or in a "strings.xml" file (the Android convention).  These techniques allow a program to support a multilingual UI without having to recompile the source code, and additional languages can be added at any time by simply adding a new string file in another language.
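
As a rough sketch of the string-file technique described above, the following Python example looks up UI strings from per-language XML string tables and falls back to English when a language is missing. The layout follows the common "strings.xml" pattern of a resources element containing named string elements. The string names, the sample translations, and the helper functions load_strings and lookup are assumptions made for illustration. In a real project each table would live in its own file rather than being inlined.

```python
import xml.etree.ElementTree as ET

# Hypothetical string tables, one per language, inlined so the sketch is
# self-contained. Keys ("greeting", "exit") and translations are illustrative.
STRING_FILES = {
    "en": '<resources><string name="greeting">Welcome</string>'
          '<string name="exit">Exit</string></resources>',
    "de": '<resources><string name="greeting">Willkommen</string>'
          '<string name="exit">Beenden</string></resources>',
    "ru": '<resources><string name="greeting">Добро пожаловать</string>'
          '<string name="exit">Выход</string></resources>',
    "zh-CN": '<resources><string name="greeting">欢迎</string>'
             '<string name="exit">退出</string></resources>',
}


def load_strings(xml_text):
    """Parse one string table into a {name: text} dictionary."""
    # Encoding to UTF-8 bytes mimics reading the file from disk; with no
    # encoding declaration present, the parser assumes UTF-8.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {s.get("name"): s.text for s in root.iter("string")}


# Build every table once at startup; adding a language means adding an entry.
TABLES = {lang: load_strings(text) for lang, text in STRING_FILES.items()}


def lookup(lang, key, default_lang="en"):
    """Return the UI string for a language, falling back to the default."""
    table = TABLES.get(lang, TABLES[default_lang])
    return table.get(key, TABLES[default_lang][key])


print(lookup("ru", "greeting"))
print(lookup("fr", "exit"))  # no French table here, so English is used
```

Supporting another language in this sketch only requires one more entry in STRING_FILES; the lookup code itself does not change.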

Although XML is an ideal file format for translation, not all translators can work with XML files.  Structural data inside the tags must remain untouched, as must some elements and attributes.  When translating XML files, it is imperative that translators process only translatable content. Some translation tools come with ready-made filters that define what needs to be translated and what needs to remain intact, and many tools also allow these filters to be customized as needed.
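
The idea of a translation filter can be sketched in a few lines of Python. This is a simplified illustration, not how any particular translation tool works: the "filter" here is just a set of element names whose text is treated as translatable, while every other element and all attributes are left alone. The sample document, the tag names, and the German strings are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical source file: "sku" and <code> hold structural data that must
# not change, while <name> and <description> hold translatable text.
SOURCE = """<product sku="A-100">
  <name>Garden hose</name>
  <description>Flexible and durable.</description>
  <code>GH-25</code>
</product>"""

# The filter: only text inside these elements is sent for translation.
TRANSLATABLE_TAGS = {"name", "description"}

# Illustrative translations, keyed by source segment.
GERMAN = {
    "Garden hose": "Gartenschlauch",
    "Flexible and durable.": "Flexibel und langlebig.",
}


def extract_segments(root, tags):
    """Collect the text of translatable elements in document order."""
    return [el.text for el in root.iter()
            if el.tag in tags and el.text and el.text.strip()]


def apply_translations(root, tags, translations):
    """Replace translatable text in place; tags and attributes stay intact."""
    for el in root.iter():
        if el.tag in tags and el.text in translations:
            el.text = translations[el.text]
    return root


root = ET.fromstring(SOURCE)
segments = extract_segments(root, TRANSLATABLE_TAGS)
translated = apply_translations(root, TRANSLATABLE_TAGS, GERMAN)
print(segments)
print(ET.tostring(translated, encoding="unicode"))
```

Because the translator only ever sees the extracted segments, the sku attribute and the code element cannot be altered by accident, and the translated file keeps the same structure as the source.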

[Figure: a brief example of English (source) content extracted from an XML file, followed by the same content translated into Russian, German, and Simplified Chinese]

Using XML to exchange information has become universally popular, and XML translation is the process by which the information contained within XML files is converted into any world language, regardless of character set. Interpro is experienced with and knowledgeable about XML formats, and provides turnkey XML translation services that ensure XML content will be correctly localized and fully functional.

Justin Carrell

Localization Engineer

"I enjoy working at Interpro because it is refreshing to work every day with a group of people who are friendly and dedicated to their work."