Cratilo: A software package for the lexicographical analysis of texts.

Grupo de Lingüística computacional
Instituto de filosofía. Universidad de Antioquia.
P.O. Box 1226. Medellín. Colombia.

Jorge Antonio Mejía
P.O. Box 1226. Medellín. Colombia.
jamejia@catios.udea.edu.co


Software demonstration

Cratilo is a software package for building concordances of written texts. Its method is a variant of the one designed by Roberto Busa (author of the Index Thomisticus), adapted for users of personal computers. Its purpose is to make computers readily available to individual researchers in order to speed up the task of text interpretation. It supports that task in two ways, heuristically and contrastively: heuristically, because a statistical approach can suggest preliminary hypotheses; contrastively, because the precise address recorded for every word makes it possible to check generalizations against the data.

A concordance here means the segmentation of the full text, word by word, in which each word keeps its address (i.e. page and line). Each word is stored as a database record with additional fields, so that it can receive a lemma (that is, the entry under which the word would be listed in a dictionary) and a designation of its grammatical function in the text.
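The record structure described above can be illustrated with a minimal Python sketch. It is not Cratilo's actual implementation: the names `Entry` and `build_concordance`, the representation of the text as a list of pages (each a list of line strings), and the rule that a word is any run of letters are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Assumption: a "word" is any run of Unicode letters.
WORD = re.compile(r"[^\W\d_]+")

@dataclass
class Entry:
    """One concordance record (hypothetical layout)."""
    form: str           # the word as it appears in the text
    page: int           # address: page number, starting at 1
    line: int           # address: line number within the page, starting at 1
    lemma: str = ""     # filled in later by the human interpreter
    function: str = ""  # grammatical function, also filled in later

def build_concordance(pages):
    """Segment the text word by word, keeping each word's address.

    `pages` is a list of pages; each page is a list of line strings.
    """
    entries = []
    for page_no, page in enumerate(pages, start=1):
        for line_no, text in enumerate(page, start=1):
            for match in WORD.finditer(text):
                entries.append(Entry(match.group(), page_no, line_no))
    return entries
```

For instance, calling `build_concordance([["In the beginning", "was the word"]])` yields six records, the fourth of which is the form "was" at page 1, line 2, with empty lemma and function fields awaiting the interpreter.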

Words are sorted alphabetically, first by lemma and then, within each lemma, by the word as it appears in the text (called its graphic form).
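This two-level ordering amounts to sorting on a compound key. The sketch below assumes, purely for illustration, that each record is a dictionary with `lemma` and `form` keys and that a case-insensitive comparison is adequate; a real tool for Spanish or Latin texts would probably need language-specific collation for accented letters, which this example does not attempt.

```python
def sort_entries(entries):
    """Order records by lemma, then by graphic form within each lemma.

    Assumption: each record is a dict with "lemma" and "form" keys.
    casefold() gives a simple case-insensitive ordering only.
    """
    return sorted(
        entries,
        key=lambda e: (e["lemma"].casefold(), e["form"].casefold()),
    )
```

Given records for "was" and "is" (both with lemma "be") and "an" (lemma "a"), the sorted order of forms is "an", "is", "was".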

In addition to building the concordance, Cratilo counts the frequencies of characters, words and lemmas, and produces a statistical report that can be filtered according to the researcher's criteria (alphabetical order, Boolean connectives, ranges, etc.). A short table shows the words of the text classified by length.
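Counts of this kind can be sketched with Python's standard `collections.Counter`. The function names `frequency_report` and `filter_range` are hypothetical, and the range filter stands in for only one of the filtering criteria mentioned above; how Cratilo itself computes and filters its report is not described here.

```python
from collections import Counter

def frequency_report(forms, lemmas):
    """Count characters, words, lemmas and word lengths.

    `forms` and `lemmas` are parallel lists: one graphic form and one
    lemma per word of the text (an assumption of this sketch).
    """
    chars = Counter(ch for word in forms for ch in word)
    words = Counter(forms)
    lemma_counts = Counter(lemmas)
    lengths = Counter(len(word) for word in forms)  # words classified by length
    return chars, words, lemma_counts, lengths

def filter_range(counts, low, high):
    """Keep items whose frequency lies in [low, high], in alphabetical order."""
    return {key: n for key, n in sorted(counts.items()) if low <= n <= high}
```

With the forms "the", "word", "the", "was", the word count for "the" is 2, three words have length 3, and `filter_range(words, 2, 5)` keeps only "the".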

A reverse index is also provided, so that the user can examine words by their endings (desinences), which carry important information about the musical features of the text and about families of words.
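A common way to build a reverse index is to sort the distinct words by their spelling read from right to left, so that words sharing an ending fall next to each other. The sketch below assumes that approach; the name `reverse_index` is hypothetical and the source does not say how Cratilo builds its own index.

```python
def reverse_index(forms):
    """Sort distinct words by their reversed spelling.

    Words with the same ending become neighbours in the result.
    """
    return sorted(set(forms), key=lambda word: word[::-1])
```

For the forms "walking", "talked", "singing" and "walked", the result is "talked", "walked", "singing", "walking": the two "-ed" forms and the two "-ing" forms are grouped together.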

Cratilo offers two output modes for displaying results: printer and screen.

Lemmas and grammatical classifications must be entered by a human interpreter. Cratilo includes a lexicon to facilitate this task, but it is ultimately the human interpreter who chooses the appropriate option. No routines for fully automatic or semi-automatic lemmatization have been developed so far.
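The division of labour between lexicon and interpreter might be pictured as follows. This is only a sketch under stated assumptions: the lexicon is taken to be a dictionary mapping a lowercase form to a list of candidate lemmas, and the interpreter's decision is represented by a `choose` callback. Neither reflects Cratilo's actual lexicon format or interface.

```python
def suggest_lemmas(form, lexicon):
    """Return the candidate lemmas the lexicon offers for a form.

    Assumption: `lexicon` maps a lowercase form to a list of lemmas.
    """
    return lexicon.get(form.casefold(), [])

def lemmatize(form, lexicon, choose):
    """Let the interpreter pick a lemma; the lexicon only proposes options.

    `choose(form, candidates)` stands in for the human decision and
    must return the lemma to record.
    """
    return choose(form, suggest_lemmas(form, lexicon))
```

In an interactive session `choose` would prompt the researcher; nothing in the sketch selects a lemma automatically.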

The demonstration will walk through the full process using a text of about 100,000 words, in order to show how the program works and what outputs Cratilo can supply to the researcher.


Hardware and software requirements: Pentium PC with at least 24 MB of RAM, Windows 95 and MS Office.