Translation Industry

Why Are There Translation Issues with the PDF Format?

PDF is a popular file format that makes sharing information easy. Because PDF files are shared and read so widely, they often need to be translated, and translating them is a problem for professional translators and machine translators alike.

PDF stands for Portable Document Format, a format used to present documents in a manner independent of software, hardware, and operating systems. A PDF file encapsulates a description of a fixed-layout document, including fonts, text, graphics, and other information needed to display it.

If every PDF document fulfilled the Wikipedia definition stated above, translating PDF files would always be easy. Unfortunately, this is not the case. Despite sharing the .pdf extension, documents can differ considerably from one another. PDF is commonly used for certificates, so certificate translations are affected as well.

Professional translators, whether working individually or for a translation agency, prefer common formats such as Word or PowerPoint because they are easy to process. With PDF files, translation is a slow process. Struggling with difficult file formats is no translator's dream, so many translators charge higher rates for PDF translations. The aim is not extra profit but compensation for the headache and the extra time that PDF translation requires.

Many professional translators use computer-assisted translation (CAT) tools and translation memories to work more efficiently. Unfortunately, these tools are useless if the computer cannot read the content of the PDF file.

This is why machines often cannot process PDF files: text that the human eye sees and understands is not necessarily treated as text from a technical point of view. For example, scanned documents are images, not text. Without OCR (optical character recognition) software, a machine cannot read or understand scanned documents.
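To make the difference between visible text and machine-readable text concrete, here is a minimal Python sketch. It assumes that the text of each page has already been extracted by some PDF library (not shown here) and only flags pages whose extracted text is nearly empty, which usually means the page is a scanned image that needs OCR. The function name pages_needing_ocr and the threshold of 20 characters are illustrative assumptions, not part of any real tool.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text looks empty.

    page_texts: one string per page, as produced by a PDF text extractor
    (assumed to exist elsewhere). min_chars is an arbitrary cut-off.
    """
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        # Count only visible characters; whitespace alone proves nothing.
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            flagged.append(number)
    return flagged


# Hypothetical extraction results for a three-page document.
extracted = [
    "This certificate confirms that the holder completed the course.",
    "",        # a scanned page: the extractor found no text at all
    " \n 3 ",  # only a stray page number was found
]
print(pages_needing_ocr(extracted))
```

A page flagged this way is not necessarily a scan (it could simply be blank), so the result is best treated as a hint about where OCR or manual work is likely to be needed.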

There are many tools for editing PDF files, but they are expensive. Moreover, these tools run into the same difficulties that translators face. Some PDF files can be modified, but not all.

Fortunately, some documents can be read. But the tools are still not perfect, and the results do not match the original. Alignment and the original layout should be preserved in translation. If the layout changes or any element of the original text is missing, some data might be lost. PDF files can also be password protected, in which case tools and machines cannot read the text until the file is unlocked, and passwords can get lost. Most often, the only option left is to do the work manually from start to finish.

The manual process includes reading, rewriting, translating, editing, design, and proofreading. This takes time and increases the translation cost. Machine translation is cheap and fast, but the layout and other information might be lost.

Even if the translation is done by machine, the original layout may not suit the translated content. A PDF is a visual document, and its layout is designed for the original language. The new language may require more or less space, and the direction of the text may also change. All this may make PDF sound like the worst format, but it is not. Many PDF files are clean and easy to handle from a technical perspective.
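The layout problem described above can be illustrated with a small Python sketch. It compares a source string with its translation and warns when the translation is much longer or when the writing direction changes. Character count is only a rough stand-in for rendered width, and the function names and the 1.2 ratio limit are assumptions chosen for illustration.

```python
import unicodedata


def is_right_to_left(text):
    """True if the first strongly directional character is right-to-left."""
    for ch in text:
        direction = unicodedata.bidirectional(ch)
        if direction in ("R", "AL"):  # Hebrew-type or Arabic-type letter
            return True
        if direction == "L":          # left-to-right letter
            return False
    return False


def layout_warnings(source, translated, max_ratio=1.2):
    """List likely layout problems for one translated text box."""
    warnings = []
    # Rough proxy: more characters usually means more space is needed.
    if source and len(translated) / len(source) > max_ratio:
        warnings.append("translated text may overflow its box")
    if is_right_to_left(source) != is_right_to_left(translated):
        warnings.append("text direction changes")
    return warnings


print(layout_warnings("Save", "Speichern"))  # English to German
print(layout_warnings("Name", "اسم"))        # English to Arabic
```

A real layout check would measure the rendered width with the actual font, but even this crude comparison shows why a box sized for one language may not fit another.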