The many linguistic techniques that have been proposed for reducing the amount of dictionary information all organize the dictionary's contents around prefixes, stems, suffixes, and similar elements. This yields a significant reduction in the volume of stored information, especially for a highly inflected language such as Russian; for English the reduction is less striking. The approach, however, requires that (1) each text word be separated into smaller elements so that the occurrence can be matched to dictionary entries, and (2) the information retrieved from several dictionary entries be synthesized into a description of the particular word. The logical scheme used to accomplish the former influences how information is placed in the dictionary file, and implementing the latter requires storing information that is needed only for synthesis.
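The two requirements of the stem-and-suffix approach can be illustrated with a minimal sketch. The tables `STEMS` and `SUFFIXES` and the functions `segment` and `synthesize` are hypothetical and not taken from any particular system: `segment` performs requirement (1), splitting a text word into dictionary elements, and `synthesize` performs requirement (2), merging the entries into one description. Note that the suffix entries carry information whose only purpose is this synthesis step.

```python
# Hypothetical stem and suffix tables; entries are illustrative only.
STEMS = {"walk": {"pos": "verb"}, "talk": {"pos": "verb"}}
SUFFIXES = {"": {}, "ed": {"tense": "past"}, "ing": {"aspect": "progressive"}}


def segment(word: str) -> list[tuple[str, str]]:
    """Requirement (1): every (stem, suffix) split whose parts are both dictionary entries."""
    splits = []
    for i in range(len(word), 0, -1):  # try longer stems first
        stem, suffix = word[:i], word[i:]
        if stem in STEMS and suffix in SUFFIXES:
            splits.append((stem, suffix))
    return splits


def synthesize(stem: str, suffix: str) -> dict:
    """Requirement (2): merge the stem and suffix entries into one word description."""
    description = dict(STEMS[stem])
    description.update(SUFFIXES[suffix])
    description["stem"] = stem
    return description


if __name__ == "__main__":
    for stem, suffix in segment("walked"):
        print(synthesize(stem, suffix))
```

Each text word thus costs at least one segmentation search and one merge, and the dictionary must hold the suffix information used by the merge.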
We suggest applying certain data-processing techniques as a solution to this problem. First, however, we define two terms so that their meaning is clear:
form -- any unique sequence of alphabetic characters that can appear in a language preceded and followed by a space.
occurrence -- an instance of a form in text.
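The distinction between forms and occurrences can be made concrete with a short sketch. It assumes, for illustration only, that the text has already been reduced to lowercase alphabetic words separated by spaces; `occurrences` and `forms` are hypothetical helper names.

```python
def occurrences(text: str) -> list[str]:
    """Each space-delimited alphabetic token in the text is one occurrence."""
    return [token for token in text.split() if token.isalpha()]


def forms(text: str) -> list[str]:
    """The distinct forms of the text, kept in order of first occurrence."""
    return list(dict.fromkeys(occurrences(text)))


if __name__ == "__main__":
    sample = "the cat saw the dog"
    print(len(occurrences(sample)), "occurrences of", len(forms(sample)), "forms")
```

In this sample the form "the" has two occurrences, so five occurrences reduce to four forms; in long texts the gap between the two counts is much larger, which is what makes a per-form list worth compiling.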
We propose a method for selecting only the dictionary information required by the text being translated and a means for passing that information directly to the occurrences in text. We accomplish this by compiling a list of text forms as the text is read by the computer. A random storage scheme, based on the spelling of forms, provides an economical way to compile this text form list. Dictionary forms found to match forms in the text list are marked. A location in the computer store is also assigned to each marked form; the dictionary information about that form is stored at this location, where occurrences of the form in text can retrieve it directly. Finally, information is retrieved from the dictionary as required by the successive stages of the translation process: the grammatical description for sentence-structure determination, equivalent-choice information for semantic analysis, and target-language equivalents for output construction.
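The method can be sketched in modern terms as follows. We read the "random storage scheme, based on the spelling of forms" as hash addressing, and use linear probing to resolve collisions. This is an assumption, and the original scheme may differ in its details. `TABLE_SIZE`, `spelling_hash`, `TextFormList`, `read_text`, `match_dictionary`, `retrieve`, the sample `DICTIONARY`, and the stage names `grammar`, `choice`, and `equivalent` are all hypothetical. The dictionary is scanned once in sequence, as a file on external storage would be.

```python
TABLE_SIZE = 64  # hypothetical; must exceed the number of distinct text forms


def spelling_hash(form: str) -> int:
    """Hypothetical 'random' address computed from the spelling of a form."""
    h = 0
    for ch in form:
        h = (h * 31 + ord(ch)) % TABLE_SIZE
    return h


class TextFormList:
    """Text form list in hash-addressed slots; each slot is [form, location] or None."""

    def __init__(self) -> None:
        self.slots: list = [None] * TABLE_SIZE

    def _probe(self, form: str) -> int:
        """Return the slot holding form, or the first empty slot on its probe path."""
        i = spelling_hash(form)
        for _ in range(TABLE_SIZE):
            slot = self.slots[i]
            if slot is None or slot[0] == form:
                return i
            i = (i + 1) % TABLE_SIZE  # linear probing on collision
        raise OverflowError("text form list is full")

    def insert(self, form: str) -> int:
        """Add form if it is new; return its slot either way."""
        i = self._probe(form)
        if self.slots[i] is None:
            self.slots[i] = [form, None]  # no store location assigned yet
        return i

    def find(self, form: str):
        """Return the slot of form, or None if the form is not in the text."""
        i = self._probe(form)
        return i if self.slots[i] is not None else None


def read_text(text: str, form_list: TextFormList) -> list[int]:
    """Compile the text form list while reading; each occurrence keeps its form's slot."""
    return [form_list.insert(occ) for occ in text.split()]


def match_dictionary(dictionary: dict, form_list: TextFormList, store: list) -> None:
    """Scan the dictionary once; mark each matching form and assign it a store location."""
    for form, info in dictionary.items():
        i = form_list.find(form)
        if i is not None:  # this dictionary form occurs in the text
            store.append(info)  # only information the text requires is kept
            form_list.slots[i][1] = len(store) - 1


def retrieve(occ_slots: list[int], form_list: TextFormList, store: list, stage: str) -> list:
    """Give each occurrence one stage's information via its form's store location."""
    result = []
    for i in occ_slots:
        location = form_list.slots[i][1]
        result.append(None if location is None else store[location][stage])
    return result


# Hypothetical dictionary; entries and stage names are illustrative only.
DICTIONARY = {
    "bird": {"grammar": "noun", "choice": None, "equivalent": "oiseau"},
    "cat": {"grammar": "noun", "choice": None, "equivalent": "chat"},
    "dog": {"grammar": "noun", "choice": None, "equivalent": "chien"},
    "saw": {"grammar": "verb", "choice": "past of 'see' or noun 'saw'", "equivalent": "a vu"},
    "the": {"grammar": "article", "choice": None, "equivalent": "le"},
}

if __name__ == "__main__":
    form_list = TextFormList()
    occ = read_text("the cat saw the dog", form_list)
    store: list = []
    match_dictionary(DICTIONARY, form_list, store)
    print(retrieve(occ, form_list, store, "grammar"))     # for sentence structure
    print(retrieve(occ, form_list, store, "equivalent"))  # for output construction
```

The entry for "bird" is never copied into the store because no occurrence needs it. Both occurrences of "the" share one slot and therefore one store location, so each stage reaches the dictionary information through a single indirection and never repeats a dictionary search.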