The Bitext Approach
Vision, Problem and Mission
AI needs to be able to understand text if it is to be successful in areas like human-machine interaction. Most of the AI industry is focused on statistical approaches to text analysis. Results keep improving but are still limited, despite huge investments by giants like Google, Apple, Amazon, Microsoft and Facebook. Current statistical techniques increase their accuracy when enriched with knowledge about language. Linguistics is one of the best sources to enrich text, since it is the science of language. At Bitext, we are making linguistic knowledge easy to leverage and integrate for the AI industry. Our benchmarks show that we can currently improve machine learning precision by at least 15%. This is the breakthrough that will create a new industry around text understanding.
Differential Value
We are experts in computational and formal linguistics. We have built a Deep Linguistic Analysis Platform that handles any language phenomenon in a scientific way. This scientific approach allows for industrial engineering of language tasks. That's why products built on this platform are accurate, behave predictably, can be systematically improved and can be customized for different languages, use cases and domains. Additionally, we know how to make this knowledge easy to access for current statistical value chains, such as machine learning. Finally, we know how to do this in many different languages (up to 50 so far). We have created the platform to understand text.
Product
The Bitext Deep Linguistic Analysis Platform covers every aspect of language analysis, from the lexical to the semantic level. Our products are based on this platform. The platform is structured in layers and the different products stem from each layer. It is a virtuous cycle. First, for the platform to work efficiently, it needs to be structured in layers, so it can be customized and fine-tuned. Second, this layering makes it easier to turn platform components into products. The platform is the perfect resource to fill different market gaps as they arise. Products developed for our clients stem from the lexical layer, while customer analytics products stem from the semantic layer. The Bitext Deep Linguistic Analysis Platform offers a wide range of rich linguistic services, from lemmatization to full syntactic analysis, in over 20 languages. It uses consistent linguistic structures and labelling across all languages.
Services
Lemmatization
Provides all potential roots (lemmas) of words. For example, for the word "spoke", the lemmas "speak" and "spoke".
POS Tagging
Provides the POS (part of speech) or grammatical category of words. For example, for the word "spoke", in the sentence "The front wheel has a broken spoke.", the POS "noun".
POS Tagging and Lemmatization
Provides both the POS and lemma of words. For example, for the word "spoke", in the sentence "The front wheel has a broken spoke.", the lemma "spoke" and the POS "noun".
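As a rough illustration of the difference between these three services, the sketch below uses a tiny hand-written lexicon. It is not the Bitext API: the names Analysis, TOY_LEXICON, lemmas and analyses_with_pos are hypothetical, and the POS is passed in by hand rather than inferred from the sentence.

```python
from typing import List, NamedTuple


class Analysis(NamedTuple):
    lemma: str
    pos: str


# Hypothetical toy lexicon: each surface form maps to all its candidate analyses.
TOY_LEXICON = {
    "spoke": [Analysis("speak", "verb"), Analysis("spoke", "noun")],
}


def lemmas(word: str) -> List[str]:
    """Lemmatization: every candidate lemma of a word, ignoring context."""
    return sorted({a.lemma for a in TOY_LEXICON.get(word.lower(), [])})


def analyses_with_pos(word: str, pos: str) -> List[Analysis]:
    """POS tagging and lemmatization: keep only the analyses that match
    the POS chosen for the word in its sentence (supplied by the caller here)."""
    return [a for a in TOY_LEXICON.get(word.lower(), []) if a.pos == pos]


if __name__ == "__main__":
    print(lemmas("spoke"))                     # both candidate lemmas
    print(analyses_with_pos("spoke", "noun"))  # the reading used in "a broken spoke"
```

Out of context, "spoke" keeps both lemmas. Once the POS "noun" is fixed by the sentence, only the lemma "spoke" remains.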
Phrase Extraction
Provides the phrases (nominal, verbal, adjectival or adverbial phrases) in the sentences. For example, for the sentence "You should also fix the problems with the handlebar.", the phrases "You" as "NP", "should also fix" as "VG", "the problems" as "NP" and "with the handlebar" as "PP".
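To make the shape of this output concrete, here is a minimal rule-based sketch that groups POS-tagged tokens into phrases. It is an assumption-laden toy, not Bitext's phrase extractor: the tag names (PRON, AUX, ADV, VERB, DET, ADJ, NOUN, ADP), the grouping rules and the function chunk are all hypothetical, and the input tags are supplied by hand.

```python
from typing import List, Tuple

# Hypothetical tag groups for this toy chunker.
VG_TAGS = {"AUX", "ADV", "VERB"}   # members of a verb group
NP_TAGS = {"DET", "ADJ", "NOUN"}   # members of a noun phrase


def chunk(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group (word, tag) pairs into (phrase text, phrase label) pairs."""
    phrases = []
    i, n = 0, len(tagged)
    while i < n:
        tag = tagged[i][1]
        if tag == "PRON":
            j, label = i + 1, "NP"          # a pronoun is a one-word NP
        elif tag in VG_TAGS:
            j, label = i, "VG"
            while j < n and tagged[j][1] in VG_TAGS:
                j += 1
        elif tag in NP_TAGS:
            j, label = i, "NP"
            while j < n and tagged[j][1] in NP_TAGS:
                j += 1
        elif tag == "ADP":
            j, label = i + 1, "PP"          # preposition plus the NP after it
            while j < n and tagged[j][1] in NP_TAGS:
                j += 1
        else:
            i += 1                          # skip anything the rules do not cover
            continue
        phrases.append((" ".join(w for w, _ in tagged[i:j]), label))
        i = j
    return phrases


SENTENCE = [
    ("You", "PRON"), ("should", "AUX"), ("also", "ADV"), ("fix", "VERB"),
    ("the", "DET"), ("problems", "NOUN"),
    ("with", "ADP"), ("the", "DET"), ("handlebar", "NOUN"),
]

if __name__ == "__main__":
    for text, label in chunk(SENTENCE):
        print(label, text)
```

On the example sentence, these rules reproduce the four phrases listed above. Real text needs far richer rules than this.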
Parsing (Coming Soon)
Our parser engine produces different types of output. These types are classified into two main groups:
- Shallow parsing: this type of output provides a parse tree describing the structure of the constituents of the sentence
- Deep parsing: this type of output provides a parse tree describing the structure of the constituents of the sentence and their syntactic functions (subject, direct object…)
Each is used for different business purposes. Shallow parsing is a very effective way to tackle informal language, like social media or user reviews, where grammatical structure is weak. Deep parsing is the perfect way to approach formal language, like news; the quality of the text allows for fine-grained information extraction. Some challenges, like chatbots, do not fit either method. For chatbots we have developed a hybrid model, combining shallow and deep parsing.
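One way to picture the difference between the two output types is a constituent tree whose nodes may or may not carry a syntactic function. The sketch below is only an illustration under that assumption. The Node class, the helper functions and the hand-built tree are hypothetical and do not reflect the parser's actual output format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str                       # constituent label, e.g. "NP"
    text: str = ""                   # words covered by the constituent
    function: Optional[str] = None   # only filled in by deep parsing
    children: List["Node"] = field(default_factory=list)


def to_shallow(node: Node) -> Node:
    """Copy a tree, keeping the constituents but dropping syntactic functions."""
    return Node(node.label, node.text, None,
                [to_shallow(child) for child in node.children])


def functions(node: Node) -> List[Tuple[str, str]]:
    """Collect (function, text) pairs from a tree, in left-to-right order."""
    found = [(node.function, node.text)] if node.function else []
    for child in node.children:
        found.extend(functions(child))
    return found


# Hand-built deep parse of "The front wheel has a broken spoke."
DEEP = Node("S", children=[
    Node("NP", "The front wheel", "subject"),
    Node("VG", "has"),
    Node("NP", "a broken spoke", "direct object"),
])

if __name__ == "__main__":
    print(functions(DEEP))              # subject and direct object
    print(functions(to_shallow(DEEP)))  # no functions in the shallow view
```

The shallow view still tells you which phrases the sentence contains. Only the deep view tells you which phrase is the subject and which is the direct object, which is what fine-grained information extraction relies on.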
If you are interested in parsing, do let us know. We will be very happy to have a chat.
As for deployment, all our services can be used either as a service, via API, or on-premise. Our platform can also be run on edge devices, like mobile devices or home speakers.