The PDF format is intended primarily for publication and distribution, not for editing. It was originally developed as a way to exchange and present documents reliably, independent of the software, hardware, or operating system used to view them. Because PDFs are much harder to edit than other file types, translating a PDF is almost always more time-consuming than working with the native file (the original document, such as a Word or InDesign file, from which the PDF was created).
For the purposes of translation, there are two types of PDF documents: those with live text and those that are scanned documents. It's easy to tell them apart: open the PDF and try to select some text. You will be able to select, copy, and paste a word or paragraph only if the PDF has live text. If the PDF was created by scanning a printed copy, its content is a series of images, and you won't be able to select any of the text.
Translating Scanned PDFs
For scanned PDFs, the first step in the translation process is to convert the text into an editable form. The content has to be processed with Optical Character Recognition (OCR) software and converted to editable text before it can be translated, which makes this a more time-consuming, and therefore more expensive, PDF translation workflow. OCR conversions are often inaccurate, so the output must be carefully proofread and edited. How much editing is needed depends on the quality of the scanned content, the text size, the layout, and the source language. In the worst case, we may have to re-type some or all of the content manually. The extra time it takes to extract and prepare the text for translation increases the cost.
Translating PDFs with Live Text
A PDF with live text is much easier to prepare for translation. We can skip the OCR conversion, editing, and proofreading steps and proceed directly to text extraction, copying the editable text into Microsoft Word for translation. This still involves a round of clean-up to make sure that all the text has been extracted, that unnecessary hard returns and hyphens have been removed, and that the text appears in the correct reading order.
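Part of the hard-return and hyphen clean-up described above can be automated. The following is a minimal Python sketch under stated assumptions, not a description of any specific tool we use. It assumes that paragraphs in the extracted text are separated by blank lines. It treats a hyphen at the end of a line as a line-break hyphen only when the next line starts with a lowercase ASCII letter. The function name `clean_extracted_text` is hypothetical. The sketch does not fix reading order, which still needs a human check.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Heuristic clean-up of text copied out of a live-text PDF (hypothetical helper).

    Assumption: paragraphs are separated by blank lines, and every other
    line break inside a paragraph is a hard return added by the PDF layout.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Split into paragraphs on blank lines (lines containing only whitespace count as blank).
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for para in paragraphs:
        # Rejoin words split across lines with a hyphen, e.g. "proof-\nreading" -> "proofreading".
        # Heuristic: only when the next line starts with a lowercase ASCII letter.
        para = re.sub(r"(\w)-\n(?=[a-z])", r"\1", para)
        # Replace the remaining hard returns inside the paragraph with a single space.
        para = re.sub(r"\s*\n\s*", " ", para)
        # Collapse runs of spaces and tabs, then trim the ends.
        para = re.sub(r"[ \t]+", " ", para).strip()
        if para:  # drop empty paragraphs left by leading or trailing blank lines
            cleaned.append(para)
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    raw = "OCR output often needs careful proof-\nreading and\nediting.\n\nA second\nparagraph."
    print(clean_extracted_text(raw))
```

The hyphen rule is only a heuristic. A genuine compound split at a line end, such as `time-` followed by `consuming`, would be wrongly joined, so the output still needs a review pass.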
Maintaining Formatting in the Translated PDF
If you want the translated PDF to match the source formatting, be prepared for the extra time, effort, and expertise required. This type of work is best done by multilingual desktop publishing specialists, not translators. It adds a step: before the PDF content is translated, the original design is re-created in a translation-friendly authoring application to produce a source-language layout file. This is commonly a Microsoft Word document, but it could also be an Adobe InDesign or Adobe Illustrator file, or one of several other formats. The client is always told which authoring application will be used to re-create the layout. Typically, we try to use the same application that generated the PDF being translated. Our clients always have the option to specify the application, version, and platform for the work, and we do our best to accommodate them whenever possible. An added benefit is that you will have an editable source-language layout in case you need to make updates in the future.
It is possible to save a PDF as a Word document directly from Adobe Acrobat, but this usually produces less-than-ideal results. The export adds unnecessary page, section, and column breaks within the main text flow and usually does not convert headers and footers properly. This makes the document much harder to format once it is translated, since the text expands or contracts depending on the target language. It is almost always less expensive to spend the time up front making sure the layout is translation-ready, especially if the PDF is going to be translated into multiple languages.
One of the biggest limitations to maintaining the source formatting in translated PDF documents is dealing with embedded images. Generally, the only option is to work with the images available in the PDF provided, and images in PDFs are often downsampled to a low resolution. Such images look fine on screen but not when printed. Any embedded images are extracted from the PDF and placed in the re-created layout, but they will usually still be low resolution. If the translated PDF is destined for online distribution, this is usually not a problem. However, this approach is not recommended for translated PDFs that will go to print, because the image resolution won't be high enough to produce quality printed materials.
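One common way to judge whether an extracted image is usable in print is to check its effective resolution. That figure is the image's pixel width divided by the width, in inches, at which it is placed on the page. The minimal Python sketch below assumes a 300 ppi print threshold. This is a commonly cited rule of thumb rather than a fixed standard, and a printer's actual requirements take precedence. The function names are hypothetical.

```python
def effective_ppi(pixel_width: int, placed_width_in: float) -> float:
    """Return pixels per inch for an image placed at the given width in inches."""
    if placed_width_in <= 0:
        raise ValueError("placed width must be positive")
    return pixel_width / placed_width_in


def is_print_ready(pixel_width: int, placed_width_in: float,
                   min_ppi: float = 300.0) -> bool:
    """Check an image against an assumed minimum print resolution.

    The 300 ppi default is a rule-of-thumb assumption, not a standard.
    """
    return effective_ppi(pixel_width, placed_width_in) >= min_ppi


if __name__ == "__main__":
    # A downsampled image 600 px wide, placed 4 inches wide on the page.
    print(effective_ppi(600, 4.0))    # 150.0
    print(is_print_ready(600, 4.0))   # False
    # A 1200 px original at the same placement meets the assumed threshold.
    print(is_print_ready(1200, 4.0))  # True
```

Because placing the image larger on the page divides the same pixels over more inches, a downsampled image cannot be made print-ready by resizing it in the re-created layout.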
We can handle as much of the PDF translation process as you require, and our turnkey PDF translation services ensure that you receive exactly what you need. Deliverables can range from a translation into a single target language, to a two-column format with the source and target languages side by side, to a fully formatted PDF ready for distribution that leaves no additional work for you or your team.
If you have any questions about the PDF translation process or would like more information, please contact us. We will be happy to discuss your project and recommend a solution.