Technology offer: A set of software tools that generates, accurately and almost exhaustively, all the grammatical forms that common nouns, adjectives and verbs can take in Arabic.

Technology description

In most languages, common nouns, adjectives and verbs can take many different forms in sentences, depending on the grammatical rules of the language. This is especially true of Arabic, where a single root of three consonants can generate hundreds of different forms.
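
As an illustration of how one three-consonant root can yield many forms, here is a minimal sketch of root-and-pattern derivation. It is not the method used by the tools described here: the function name apply_pattern, the digit-based pattern notation and the simplified Latin transliteration are all assumptions made for this example, and only five well-known derivations of the root k-t-b are shown.

```python
# Illustrative sketch only: simplified Latin transliteration and a
# hypothetical pattern notation in which the digits 1, 2 and 3 stand
# for the first, second and third consonant of the root.

def apply_pattern(root, pattern):
    """Insert the three root consonants into a vowel pattern."""
    c1, c2, c3 = root
    return pattern.replace("1", c1).replace("2", c2).replace("3", c3)

# A handful of patterns, chosen for illustration, with rough glosses.
PATTERNS = {
    "1a2a3a": "he wrote",
    "1aa2i3": "writer",
    "ma12uu3": "written",
    "1i2aa3": "book",
    "ma12a3": "office, desk",
}

def derive(root):
    """Return a mapping from each derived form to its gloss."""
    return {apply_pattern(root, p): gloss for p, gloss in PATTERNS.items()}

if __name__ == "__main__":
    for form, gloss in derive(("k", "t", "b")).items():
        print(form, "-", gloss)
```

Each derived stem then takes further prefixes and suffixes for person, number, gender, case and attached pronouns, which is how the count of forms per root climbs into the hundreds.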

While traditional dictionaries cover only a small fraction of the forms found in texts, our technology has been used to generate a database of 65,000 entries with their 6 million forms, covering more than 98% of the forms found in any sort of text (literature, newspaper articles, etc.). The remaining 2% includes proper names.

Arabic Lexicon interfaces with Unitex, an open-source corpus processing system for language processing, developed by the Gaspard Monge Laboratory (LIGM UPEM).

Unitex Arabic has been presented to prestigious organizations, such as the Al-Ghazali Institute of La Grande Mosquée de Paris and L'Institut du Monde Arabe.

It can now be used in a wide range of domains, such as text editors, digitization of printed documents, data mining of Arabic web content and e-learning of Arabic.



Advantages

  • Accuracy
  • Exhaustiveness
  • Responsiveness
Ref : OT-00086

Area of activity : HUMAN SCIENCES

Industrial applications

  • Spelling correction
  • Automatic word completion while typing
  • E-reputation analysis on websites
  • E-learning of the Arabic language
  • Digitization of documents

Intellectual property

  • Copyright

Technology transfer

  • Know-how

CONTACT


Serge DUC
Business Development Manager

serge.duc@cvt-sud.fr
P: +33 (0) 491 999 456