The Scale of the Problem: Why Data Security in Translation Is Not Paranoia
According to the Slator 2024 report, 67% of translation agencies use at least one cloud platform for project management. Meanwhile, 43% of corporate clients cite data security as the top criterion when choosing a translation vendor — higher than price (38%) and turnaround time (31%).
The problem is not theoretical. In spring 2023, Samsung reported three incidents within about 20 days in which employees entered confidential information into ChatGPT, and in May 2023 the company banned generative AI tools on company devices. The submitted data ended up on OpenAI's servers, where it could be retained and used for further model training. In the translation industry, similar risks exist at every stage: from machine translation to Translation Memory storage.
For companies working with patent documentation, M&A deals, medical data, or personal information, a leak through the translation channel can cost millions in fines and reputational damage.
Free Machine Translation: What Happens to Your Texts
Free online translators are the most obvious leak channel. Here is what happens to your text when you paste it into Google Translate, DeepL Free, Yandex Translate, or a chatbot:
Google Translate (free version). According to the terms of service, Google obtains the right to use submitted content to improve its services. Text passes through Google servers, is logged, and may be used for model training. Corporate clients have the Cloud Translation API with different terms — but most employees use the free web interface.
DeepL Free. According to DeepL's privacy policy, texts submitted through the free version may be stored for a limited period and used to train and improve its translation models. The paid DeepL Pro version guarantees text deletion immediately after translation and GDPR compliance. The difference is fundamental: a confidential contract does not belong in the free version.
ChatGPT and other LLMs. OpenAI states that conversations from consumer accounts may be used for model training unless the user opts out. Even a paid ChatGPT Plus subscription doesn't provide full guarantees: for those you need API access with a separate Data Processing Agreement (DPA). A single legal document uploaded to a chatbot by a manager "for a quick translation" can end up in a training dataset, with no way to recall it.
Practical conclusion: free MT services are categorically unsuitable for confidential texts. Even where the terms of service formally permit such use, you lose control over your data the moment it is submitted.
Cloud CAT Systems: Who Owns Your Translation Memory?
Cloud CAT platforms — Phrase (formerly Memsource), Smartcat, Crowdin, XTM Cloud — store Translation Memory, terminology databases, and source texts on their servers. This creates several risk levels:
Data ownership. Who legally owns the TM accumulated over 5 years of work? Under most cloud CAT terms, data belongs to the user. But in practice, exporting TM from one system to another involves losing metadata, segmentation, and project links. Vendor lock-in is a real problem.
Third-party access. Cloud platform administrators technically have access to your data. Phrase (Memsource) and Smartcat encrypt data at rest and in transit, but the platform decrypts it for processing — otherwise TM and search functions wouldn't work.
Storage jurisdiction. Phrase servers are located in AWS (Ireland and Frankfurt for EU clients, Virginia for others). Smartcat stores data in AWS US-East. For Russian companies required to comply with Federal Law 152-FZ on personal data localization, this is a potential violation.
MT integration. Many cloud CAT systems offer built-in machine translation: Phrase integrates Google, DeepL, Amazon Translate. When MT suggestions are enabled, source text is automatically sent to the MT provider's servers. Some clients don't realize that enabling this feature means transferring data to a third party.
Before choosing a cloud CAT system, verify: where data is physically stored, whether end-to-end encryption is supported, whether there's a DPA (Data Processing Agreement), and whether MT integrations can be disabled for specific projects.
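The four checks above can be written down as a simple pre-flight test to run before a project is opened on a cloud platform. The sketch below is illustrative only: the class, its field names, and the region codes are hypothetical and do not correspond to any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class CloudCatProfile:
    """Answers collected from a vendor; all field names are hypothetical."""
    storage_region: str                    # e.g. "eu-central-1"
    end_to_end_encryption: bool
    dpa_available: bool
    mt_can_be_disabled_per_project: bool


def open_issues(profile: CloudCatProfile, allowed_regions: set[str]) -> list[str]:
    """Return the checks that the platform fails, in checklist order."""
    issues = []
    if profile.storage_region not in allowed_regions:
        issues.append(
            f"data stored in {profile.storage_region}, outside allowed regions"
        )
    if not profile.end_to_end_encryption:
        issues.append("no end-to-end encryption")
    if not profile.dpa_available:
        issues.append("no DPA")
    if not profile.mt_can_be_disabled_per_project:
        issues.append("MT integrations cannot be disabled per project")
    return issues
```

An empty list means the platform passes all four checks. Anything else is a list of questions to take back to the vendor.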
On-Premise and Self-Hosted Solutions: Full Control
For projects with maximum security requirements, cloud solutions are not suitable. Alternatives:
Trados Studio (formerly SDL Trados Studio; desktop). Installed locally on the translator's computer. TM and terminology databases are stored on a local drive or corporate server. No data is transmitted to external servers, provided that cloud features (Language Cloud) are disabled. For technical translation of confidential documentation, this remains the gold standard.
memoQ Server (on-premise). The server version of memoQ is installed on the client's or agency's infrastructure. Teamwork, TM management, and quality control all stay within the corporate network. Licenses start at 22,000 EUR per server plus client licenses, a cost that large projects can justify.
Self-hosted MT: MarianNMT and OpenNMT. Instead of sending texts to Google or DeepL, you can deploy your own machine translation engine. MarianNMT (developed mainly by the Microsoft Translator team) and OpenNMT are open-source frameworks that run on a local server. Quality is typically 5–15% below commercial engines by BLEU score, depending on language pair and training data, but data doesn't leave the perimeter.
Phrase (Memsource) with SSO and dedicated environment. For corporate clients, Phrase offers an Enterprise plan with Single Sign-On (SSO), IP whitelisting, dedicated instances, and an extended DPA. This is a compromise between cloud convenience and security requirements.
The choice depends on the balance between budget, convenience, and confidentiality level. For standard commercial translations, a cloud platform with a DPA is a reasonable choice. For patent documentation, medical data, or M&A projects, on-premise is the only option.
GDPR, Russian Federal Law 152-FZ, and International Requirements
If you work with European counterparties or translate documents containing personal data of individuals in the EU, GDPR compliance is mandatory. Key requirements for the translation process:
- Data Processing Agreement (DPA). The translation agency acts as a data processor. The DPA specifies: purpose of processing, data categories, retention period, technical and organizational measures (TOMs). Without a signed DPA, transferring personal data to an agency is a GDPR violation.
- Data minimization. The translator should receive only the amount of personal data necessary to complete the translation. Where possible, anonymize or pseudonymize the data before transfer.
- Right to erasure. The client has the right to request deletion of all data after project completion. The agency must delete source files, translations, and TM entries containing personal data.
- Incident notification. In case of a personal data breach, the supervisory authority must be notified within 72 hours of the breach being discovered, and affected data subjects must be notified without undue delay.
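Pseudonymization before transfer, mentioned under data minimization above, can be sketched in a few lines. This is a minimal illustration, assuming that only e-mail addresses and phone numbers need masking. The regular expressions and function names are hypothetical, and the phone pattern is deliberately loose (it can also catch other long digit sequences such as dates), so a real project would need rules tuned to its own documents.

```python
import re

# Illustrative patterns only; names, addresses, and ID numbers are not covered.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s()-]{7,}\d")


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Mask e-mails and phone numbers with placeholders.

    Returns the masked text and a placeholder-to-original mapping.
    The mapping stays with the client and is not sent to the vendor.
    """
    mapping: dict[str, str] = {}   # placeholder -> original value
    seen: dict[str, str] = {}      # original value -> placeholder

    def substitute(label: str, pattern: re.Pattern, source: str) -> str:
        def replace(match: re.Match) -> str:
            value = match.group(0)
            if value not in seen:
                # Number placeholders per label: [EMAIL_1], [EMAIL_2], ...
                index = sum(1 for p in mapping if p.startswith(f"[{label}_")) + 1
                placeholder = f"[{label}_{index}]"
                seen[value] = placeholder
                mapping[placeholder] = value
            return seen[value]
        return pattern.sub(replace, source)

    masked = substitute("EMAIL", EMAIL_RE, text)
    masked = substitute("PHONE", PHONE_RE, masked)
    return masked, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the translated text."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text
```

The translator works on the masked text and leaves the placeholders untouched. The client then runs `restore` on the delivered translation, so the real values never leave the client's side.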
Russian legislation (152-FZ). The law requires personal data of Russian citizens to be stored on Russian territory. If an agency uses a cloud CAT system with foreign servers to translate documents containing such data, this is a potential violation. Fines reach up to 18 million RUB for legal entities (following the 2024 tightening).
For projects involving medical translations with diagnoses, medical histories, and test results, requirements are even stricter: Federal Law 323-FZ on medical confidentiality additionally restricts who can access medical information.
How Translation Agency "Universal" Protects Client Data
We have built a data protection system on four levels:
1. Legal level. An NDA is signed with every client and every translator before any materials are transferred. The standard NDA remains in force for 5 years after project completion. We accept the client's NDA or provide our own legally vetted form. We work under contract with a full set of closing documents (the agency uses a simplified tax system, no VAT).
2. Technical level. Files are transmitted via a secure portal (TLS 1.3). If needed, files are additionally encrypted with AES-256, with the password sent via a separate channel. For on-premise projects, we use SDL Trados Studio without cloud features, with TM stored on our secure server in Moscow.
3. Organizational level. Principle of least privilege: the translator gets access only to their segment of the project. A ban on using free MT services and chatbots for confidential texts is written into our regulations and monitored. All 50+ translators undergo information security briefings.
4. Audit and control. Logging of all file operations. Quarterly audits of procedure compliance. In 2023, we terminated cooperation with two specialists for security protocol violations.
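Logging of file operations, as in level 4, can be as simple as one structured record per operation. The following is a minimal sketch, assuming a JSON-lines log. The function name and record fields are illustrative and do not describe the agency's actual tooling.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(user: str, action: str, filename: str, content: bytes) -> str:
    """Build one JSON line describing a single file operation.

    The SHA-256 digest identifies the exact file version that was touched
    without copying its contents into the log.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,          # e.g. "upload", "download", "delete"
        "file": filename,
        "sha256": hashlib.sha256(content).hexdigest(),
    }
    return json.dumps(record, sort_keys=True)
```

Appending each returned line to a write-protected file gives auditors a chronological trail to check during quarterly reviews.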
For top-confidentiality projects — notarized translations of legal documents, patent applications, M&A documentation — we offer an extended protocol: dedicated translators, an isolated workspace, and deletion of all data within 24 hours of acceptance.
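A deletion window like the 24 hours mentioned above is easy to enforce mechanically. Here is a small sketch, assuming each project has a recorded acceptance time. The function names and project IDs are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=24)  # deletion window named in the text


def deletion_deadline(accepted_at: datetime) -> datetime:
    """Latest moment by which all project data must be gone."""
    return accepted_at + RETENTION


def overdue_projects(projects: dict[str, datetime], now: datetime) -> list[str]:
    """Return IDs of accepted projects whose deletion deadline has passed."""
    return sorted(
        project_id
        for project_id, accepted_at in projects.items()
        if now >= deletion_deadline(accepted_at)
    )


accepted = datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)
projects = {"P-1": accepted, "P-2": accepted + timedelta(hours=12)}
print(overdue_projects(projects, now=accepted + timedelta(hours=30)))
```

Run on a schedule, a check like this turns the deadline from a promise in a contract into something that can be verified.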
Checklist: What to Look for When Choosing a Vendor
Before transferring confidential documents for translation, verify:
- NDA. Is the vendor willing to sign your NDA or provide their own? Do subcontractors (freelancers) sign NDAs?
- CAT system. Which platform is used? Cloud or desktop? Where are TM and source files stored?
- MT policy. Does the agency use machine translation? Which engine? Is data sent to external servers?
- Data storage. How long are files kept after delivery? Where are servers physically located?
- Translator access. How is access organized? Do translators download files to personal devices?
- Regulatory compliance. GDPR, 152-FZ — is there proof of compliance?
- Audits and incidents. Are security audits conducted? Is there an incident response plan?
If a vendor cannot answer these questions, that's a serious red flag. Data security doesn't happen "by default": it requires specific measures, procedures, and investment.
Find out the cost of translation with guaranteed data protection or ask questions — contact us.