The Bitext Approach

Vision, Problem and Mission

AI needs to be able to understand text if it is to be successful in areas like human-machine interaction. Most of the AI industry is focused on statistical approaches to text analysis. Results keep improving but are still limited, despite huge investments by giants like Google, Apple, Amazon, Microsoft and Facebook. Current statistical techniques increase their accuracy when enriched with knowledge about language. Linguistics is one of the best sources to enrich text, since it is the science of language. At Bitext, we are making linguistic knowledge easy to leverage and integrate for the AI industry. Our benchmarks show that we can currently improve machine learning precision by at least 15%. This is the breakthrough that will create a new industry around text understanding.

Differential Value

We are experts in computational and formal linguistics. We have built a Deep Linguistic Analysis Platform that handles any language phenomenon in a scientific way. This scientific approach allows for industrial engineering of language tasks. That's why products built on this platform are accurate, have predictable behavior, can be systematically improved and can be customized for different languages, use cases and domains. Additionally, we know how to make this knowledge easy to access for current statistical value chains, such as machine learning. Finally, we know how to do this in many different languages (up to 50 so far). We have created the platform to understand text.


The Bitext Deep Linguistic Analysis Platform covers every aspect of language analysis, from the lexical to the semantic level. Our products are based on this platform. The platform is structured in layers, and the different products stem from each layer. It is a virtuous cycle. First, for the platform to work efficiently, it needs to be structured in layers, so it can be customized and fine-tuned. Second, this layering eases the productization of platform components. The platform is the perfect resource to fill different market gaps as they arise. Products developed for our clients stem from the lexical layer, while customer analytics products stem from the semantic layer. The Bitext Deep Linguistic Analysis Platform offers a wide range of rich linguistic services, from lemmatization to full syntactic analysis, in over 20 languages. It uses consistent linguistic structures and labelling across all languages.



Lemmatization

Provides all potential roots (lemmas) of words. For example, for the word "spoke", the lemmas "speak" (verb) and "spoke" (noun).
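
As a rough illustration of what "all potential lemmas" means, the sketch below looks a word up in a tiny hand-written lexicon. The lexicon contents and the function name `candidate_lemmas` are assumptions made for this example; they are not Bitext data or the Bitext API.

```python
# Toy lexicon (hypothetical): surface form -> every lemma it could belong to.
CANDIDATES = {
    "spoke": ["speak", "spoke"],  # past tense of "speak", or the noun "spoke"
    "wheels": ["wheel"],
}


def candidate_lemmas(word: str) -> list[str]:
    """Return all lemmas listed for the word, without using any context.

    Unknown words fall back to their lowercased form.
    """
    key = word.lower()
    return CANDIDATES.get(key, [key])


print(candidate_lemmas("spoke"))  # ['speak', 'spoke']
```

Without context both lemmas are returned; choosing between them requires knowing the part of speech.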

POS Tagging

Provides the POS (part of speech) or grammatical category of words. For example, for the word "spoke", in the sentence "The front wheel has a broken spoke.", the POS "noun".

POS Tagging and Lemmatization

Provides both the POS and lemma of words. For example, for the word "spoke", in the sentence "The front wheel has a broken spoke.", the lemma "spoke" and the POS "noun".
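
The sketch below shows how a POS tag can disambiguate the lemma in the "spoke" example. It assumes the POS is already known and passed in, whereas a real tagger infers it from the sentence. The lexicon, the `TaggedToken` class and the function name are hypothetical and are not taken from the Bitext API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedToken:
    text: str   # word as it appears in the sentence
    pos: str    # part of speech, e.g. "noun" or "verb"
    lemma: str  # lemma selected for that part of speech


# Toy lexicon (hypothetical): (surface form, POS) -> lemma.
LEMMA_BY_POS = {
    ("spoke", "noun"): "spoke",
    ("spoke", "verb"): "speak",
}


def tag_and_lemmatize(word: str, pos: str) -> TaggedToken:
    """Pick the lemma that matches the given POS; fall back to the word itself."""
    key = word.lower()
    lemma = LEMMA_BY_POS.get((key, pos), key)
    return TaggedToken(text=word, pos=pos, lemma=lemma)


# "The front wheel has a broken spoke." -> "spoke" is a noun here.
print(tag_and_lemmatize("spoke", "noun"))
```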

Phrase Extraction

Provides the phrases (nominal, verbal, adjectival or adverbial phrases) in the sentences. For example, for the sentence "You should also fix the problems with the handlebar.", the phrases "You" as "NP", "should also fix" as "VG", "the problems" as "NP" and "with the handlebar" as "PP".
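
To make the example concrete, here is a minimal rule-based chunker that reproduces those four phrases from hand-tagged tokens. The tag names (PRON, AUX, ADV, VERB, DET, NOUN, ADP) and the grouping rules are simplifying assumptions for illustration only; they are not how the Bitext service works internally.

```python
# Hand-tagged tokens for the example sentence (tags are an assumption).
TAGGED = [
    ("You", "PRON"), ("should", "AUX"), ("also", "ADV"), ("fix", "VERB"),
    ("the", "DET"), ("problems", "NOUN"),
    ("with", "ADP"), ("the", "DET"), ("handlebar", "NOUN"), (".", "PUNCT"),
]

NOMINAL = {"DET", "NOUN"}
VERBAL = {"AUX", "ADV", "VERB"}


def chunk(tagged):
    """Greedily group tagged tokens into (text, label) phrases."""
    phrases, i, n = [], 0, len(tagged)
    while i < n:
        tag = tagged[i][1]
        if tag == "PRON":            # a pronoun is a one-word noun phrase
            j, label, group = i + 1, "NP", set()
        elif tag == "ADP":           # preposition plus the nominal run after it
            j, label, group = i + 1, "PP", NOMINAL
        elif tag in NOMINAL:         # determiner/noun run
            j, label, group = i, "NP", NOMINAL
        elif tag in VERBAL:          # auxiliary/adverb/verb run
            j, label, group = i, "VG", VERBAL
        else:                        # punctuation and anything else is skipped
            i += 1
            continue
        while j < n and tagged[j][1] in group:
            j += 1
        phrases.append((" ".join(word for word, _ in tagged[i:j]), label))
        i = j
    return phrases


for text, label in chunk(TAGGED):
    print(f"{label}: {text}")
```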

Parsing (Coming Soon)

Our parser engine produces different types of output. These types are classified into two main groups:

  • Shallow parsing: this type of output provides a parse tree describing the structure of the constituents of the sentence
  • Deep parsing: this type of output provides a parse tree describing the structure of the constituents of the sentence and their syntactic functions (subject, direct object…)
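
One way to picture the difference between the two outputs is a tree whose nodes carry an optional syntactic function: a deep parse fills it in, a shallow parse leaves it empty. The `Node` class, the labels and the example tree below are assumptions made for this sketch and do not describe the actual output format of the parser.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str                      # constituent label, e.g. "S", "NP", "VG"
    text: str = ""                  # surface text of the constituent
    function: Optional[str] = None  # syntactic function; set only in a deep parse
    children: list["Node"] = field(default_factory=list)


def strip_functions(node: Node) -> Node:
    """Derive a shallow view of a deep tree by dropping syntactic functions."""
    return Node(node.label, node.text, None,
                [strip_functions(child) for child in node.children])


def functions(node: Node) -> list[tuple[str, str]]:
    """Collect (function, text) pairs in left-to-right order."""
    found = [(node.function, node.text)] if node.function else []
    for child in node.children:
        found.extend(functions(child))
    return found


# Hypothetical deep parse of "You should also fix the problems".
deep = Node("S", "You should also fix the problems", children=[
    Node("NP", "You", function="subject"),
    Node("VG", "should also fix"),
    Node("NP", "the problems", function="direct object"),
])

print(functions(deep))                   # functions available in the deep tree
print(functions(strip_functions(deep)))  # none left in the shallow view
```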

Each is used for different business purposes. Shallow parsing is a very effective way to tackle informal language, like social media or user reviews, where grammatical structure is weak. Deep parsing is the perfect way to approach formal language, like news; the quality of the text allows for fine-grained information extraction. Some challenges, like chatbots, do not fit either method. For chatbots we have developed a hybrid model, combining shallow and deep parsing.

If you are interested in parsing, do let us know. We will be very happy to have a chat.

As for deployment, all our services can be used either as a service, via API, or on-premise. Also, our platform can be run on edge devices, like mobile devices or home speakers.


Bitext DLA Platform Overview (a high level explanation of the Bitext services)

Bitext API technical documentation (Coming soon)

Bitext Lexical Attributes in 50+ languages (a comprehensive list of lexical attributes codified per language)

White Papers

Lemmatization vs stemming (why lemmatization is a better solution than stemming)

Lemmatization for topic modeling (how lemmatization improves topic modeling results)

Lemmatization for search (how lemmatization improves search user interfaces)


Prediction of User Opinion for Products - A Bag-of-Words and Collaborative Filtering based Approach. Using Bitext services

Tourism as a Life Experience: A Service Science Approach. Using Bitext services

Framing Meaningful Experiences Toward a Service Science-Based Tourism Experience Design. Using Bitext services



José Echegaray 8, building 3, office 4
Parque Empresarial Las Rozas
28232 Las Rozas



541 Jefferson Ave., Ste. 100 Redwood City
CA 94063