Quality estimation

With the great progress in AI for generation, there is growing interest in AI for verification (✓ or ✗) to trigger human intervention where needed.

Verification for the original generative task, translation, is based on technology originally known in the research world as quality estimation.

Quality estimation is a key internal component of AI that is now successfully scaling translation and keeping human quality in the real world.

— Adam Bittlingmayer, technical co-founder and CEO, ModelFront

Machine translation quality estimation (MTQE or QE) is AI to score AI translations.


The input is the source and the machine translation — no human translation.

So quality estimation can be used for new content, in production. That makes quality estimation models fundamentally different from quality evaluation metrics like BLEU, which require a human reference translation. Quality estimation is much more valuable, but also much harder.

The output is a score from 0 to 100, for each new machine translation.

Example

  Input                                    Output
  Source and translation                   Score (0-100)

  English   Hello world!
  Spanish   Mundo de hola!                 0/100

  English   Hello world!
  Spanish   ¡Hola mundo!                   90/100
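
As a concrete sketch of the input and output, here is one way to score these two segments with the open-source unbabel-comet package and its reference-free CometKiwi checkpoint. The function names and the roughly 0-to-1 score range reflect recent versions of that package, and the checkpoint may require accepting a license on Hugging Face, so treat the details as an assumption rather than a recommendation of any particular model.

    # pip install unbabel-comet
    from comet import download_model, load_from_checkpoint

    # A reference-free (quality estimation) checkpoint: it needs only the
    # source and the machine translation, no human reference translation.
    model_path = download_model("Unbabel/wmt22-cometkiwi-da")
    model = load_from_checkpoint(model_path)

    segments = [
        {"src": "Hello world!", "mt": "Mundo de hola!"},  # bad translation
        {"src": "Hello world!", "mt": "¡Hola mundo!"},    # good translation
    ]

    # Segment scores come back roughly in the 0-1 range; rescale to 0-100.
    prediction = model.predict(segments, batch_size=8, gpus=0)
    for segment, score in zip(segments, prediction.scores):
        print(f"{segment['mt']}: {round(score * 100)}/100")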

Scores themselves do not create value for translation buyers, but they are key to the AI systems that do.

tl;dr

Quality estimation is AI to score AI translations.

From research task to production system

In the real world, quality estimation failed as a standalone technology, despite dozens of attempts by epic AI companies like Google over almost a decade since Transformer-based models were invented and rolled out for translation.

Evolution of quality estimation

  1. AI to score AI translations (quality estimation): 0-100
  2. AI to check AI translations (quality prediction): ✓/✗
  3. AI to check and fix AI translations and trigger human intervention as needed: xx million automated words
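
In code, the main difference between the three stages is the output type. Below is a minimal, hypothetical sketch; the function names, the placeholder scoring logic and the 90 threshold are illustrative assumptions, not any vendor’s actual interface.

    # Stage 1 -- quality estimation: AI to score AI translations (0-100).
    def estimate_quality(source: str, translation: str) -> float:
        return 90.0 if translation else 0.0  # placeholder standing in for a real QE model

    # Stage 2 -- quality prediction: AI to check AI translations (pass or fail).
    def check(source: str, translation: str, threshold: float = 90.0) -> bool:
        return estimate_quality(source, translation) >= threshold

    # Helper: placeholder standing in for a real automatic post-editing model.
    def post_edit(source: str, translation: str) -> str:
        return translation

    # Stage 3 -- check and fix AI translations, and trigger human intervention as needed.
    def check_and_fix(source: str, translation: str) -> tuple[str, bool]:
        """Return the (possibly fixed) translation and whether it was fully automated."""
        if check(source, translation):
            return translation, True    # approved as is: fully automated
        fixed = post_edit(source, translation)
        if check(source, fixed):
            return fixed, True          # auto-fixed: fully automated
        return translation, False       # flagged: route to a human translator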

The companies buying translation (and human translators) failed to get concrete value out of millions of abstract raw scores like “89” or “42”. Imagine if a self-driving car app like Waymo forced you to decide on thresholds, just to get a ride safely.

In the end, the bigger, older businesses, from those building AI for generation like Google to translation agencies reselling manual human translation, provide only a raw score, not a decision. They do not want to take responsibility for keeping human quality.

Inside ModelFront, the only company fully dedicated to AI to scale translation while keeping human quality, we had no choice but to make it a true success for our customers.

And they didn’t need scores; they needed to automate millions of words.

So quality estimation models became a key internal component of ModelFront, the independent production AI system to check and fix AI translations and trigger human intervention where needed.

Large translation buyers are successfully using these AI systems to scale translation while keeping human quality.

For example, a Fortune 500 translation team that needs to buy 100 million words of human-quality translation might get 80 million words fully automated with AI and send the remaining 20 million words to a manual human translation agency.

Example

  Input                                    Output
  Source and translation                   Translation and status

  English   Hello world!
  Spanish   Mundo de hola!                 ¡Hola mundo!

  English   Another great example
  Spanish   Otro gran ejemplo              Otro ejemplo perfecto

  English   Open new tab
  Spanish   Abrir cuenta nueva             Abrir una nueva pestaña

  English   2025
  Spanish   2025                           2025

  English   2026
  Spanish   2.026                          2026

  …                                        ✔
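
To connect the per-segment decisions in the table to the 80/20 split above, here is a small sketch of how an automation rate might be computed from a decision log; the word counts and decisions are made up for illustration.

    # Hypothetical log of per-segment decisions from the AI system:
    # (word count, decision), where the decision is "approved", "fixed" or "human".
    decisions = [
        (2, "fixed"),
        (1, "approved"),
        (4, "human"),
        (3, "approved"),
        # ... millions more entries in a real program
    ]

    automated_words = sum(words for words, decision in decisions if decision in ("approved", "fixed"))
    human_words = sum(words for words, decision in decisions if decision == "human")
    total_words = automated_words + human_words

    print(f"Fully automated: {automated_words:,} words ({automated_words / total_words:.0%})")
    print(f"Human translation: {human_words:,} words ({human_words / total_words:.0%})")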

So a successful system is not just a model outputting raw scores, but something more like a self-driving car app such as Waymo. It creates concrete value, safely and simply, despite lots of complexity under the hood.

ModelFront actually takes responsibility for keeping human quality. That includes calibrating thresholds across tens of thousands of combinations of language and content type, plus evaluation, guardrails, automatic post-editing, transparent monitoring, and managing the whole lifecycle of data and models over years.
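
The threshold calibration part, at least, can be illustrated with a deliberately naive sketch: for one combination of language pair and content type, pick the lowest score threshold whose auto-approved segments still meet a target precision on a human-labeled sample. The sample data, the targets and the brute-force search are assumptions for illustration only.

    def calibrate_threshold(labeled, target_precision=0.99):
        """labeled: (qe_score, judged_good_by_human) pairs for one combination
        of language pair and content type.
        Return the lowest 0-100 threshold whose auto-approved segments still
        meet the target precision, or None if no threshold does."""
        best = None
        for threshold in range(100, -1, -1):  # brute force over integer thresholds
            approved = [good for score, good in labeled if score >= threshold]
            if approved and sum(approved) / len(approved) >= target_precision:
                best = threshold  # a lower threshold means more automation
        return best

    # Toy human-labeled sample for one language and content type combination.
    sample = [(95, True), (92, True), (88, True), (85, False), (80, True), (70, False)]
    print(calibrate_threshold(sample, target_precision=0.95))  # -> 86 for this toy sample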

Quality estimation inside a production AI system

In a production system for AI to check and fix AI translations and trigger human intervention where needed, quality estimation is inside the quality prediction model.
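
Continuing the earlier sketches, the nesting might look like this in code. The guardrail and the way the signals are combined are illustrative assumptions, not ModelFront’s actual architecture.

    import re

    def estimate_quality(source: str, translation: str) -> float:
        return 90.0 if translation else 0.0  # placeholder standing in for a real QE model

    def passes_guardrails(source: str, translation: str) -> bool:
        """Example guardrail: the numbers in the source must survive translation."""
        return sorted(re.findall(r"\d+", source)) == sorted(re.findall(r"\d+", translation))

    def predict_quality(source: str, translation: str, threshold: float = 90.0) -> bool:
        """Quality prediction: the quality estimation score is one signal,
        combined with guardrails and other checks, behind a calibrated threshold."""
        score = estimate_quality(source, translation)
        return score >= threshold and passes_guardrails(source, translation)

    print(predict_quality("Hello world!", "¡Hola mundo!"))  # True
    print(predict_quality("2026", "2.026"))                 # False: the guardrail flags it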

FAQ

Answers to frequently asked questions about machine translation quality estimation



© 2026 qualityestimation.org

Supported by the team at ModelFront — AI to check and fix AI translations, and trigger human intervention where needed