Dictionary Coding User Guide
Copyright © 2002 by SYSTRAN
1.1. Types of entries
To adapt translations to specific terminological needs, it is possible to:
- Reserve words that should not be translated: DNT entries (Do Not Translate)
- Create multilingual entries to modify current translations by giving equivalencies or adding new words.
1.1.1. DNT entries (Do Not Translate)
Do Not Translate (DNT) entries prevent the translation of company names, proper names, locations, trademarks, and any titles or expressions that should be left as they are.
Enter the DNT entry in the dictionary as is, without its corresponding meaning. Case-sensitive rules apply to all DNT entries. Enter the word, paying attention to capital letters, accents, etc. Each entry must be on a separate line.
When a compound DNT entry has more than three words, put it between quotation marks and, where possible, add an external clue such as (proper noun) or (location).
See the section: General coding rules.
Example 1. DNT entries
“Virgin Mega Store” (proper noun)
“Los Angeles” (city)
Apple (company name)
Lu (company name)
Alcampo (company name)
Telefónica (company name)
Hoechst (company name)
Example 2. English to French
« Virgin Mega Store is not a virgin mega store »
Dictionary | Translated text |
---|---|
“Virgin Mega Store” (proper noun) | Virgin Mega Store n’est pas une mémoire mega vierge. |
1.1.2. Multilingual entries
You may also want to change the default translation to:
- Give a technical equivalent for a general word,
- Define the specific meaning of a word with multiple possible translations,
- Add words that are not part of SYSTRAN’s standard dictionaries (Not Found Word).
For each entry, enter source=target for bilingual format (multi-target and Microsoft Excel formats are also available; refer to the Dictionary Manager documentation for details). Each entry must be on a separate line.
Example 3. Multilingual entries
to play = jouer
a product = un produit
Both simple and compound words can be entered as long as the whole entry can be treated as a single unit. This is akin to a traditional paper dictionary wherein the translation of compound words is given alongside the main entry.
Example 4. Simple and compound multilingual entries
a drive shaft = un arbre d’entraînement
a watering can = un arrosoir
“all rights reserved” (sentence) = “tous droits réservés” (sentence)
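The line-based format described above (one entry per line, source=target for bilingual entries, a bare term for DNT entries) can be illustrated with a short Python sketch. This is not SYSTRAN code: the names Entry, parse_line and parse_dictionary are hypothetical, and the sketch assumes that the first "=" on a line separates source from target.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    source: str
    target: Optional[str]  # None marks a DNT entry (no target side)


def parse_line(line: str) -> Entry:
    """Split one dictionary line into its source and target sides.

    Assumption: the first "=" is the separator, and a line without
    "=" is a DNT entry.
    """
    source, sep, target = line.partition("=")
    if not sep:
        return Entry(line.strip(), None)
    return Entry(source.strip(), target.strip())


def parse_dictionary(text: str) -> List[Entry]:
    # One entry per line; blank lines are skipped.
    return [parse_line(line) for line in text.splitlines() if line.strip()]
```

Under these assumptions, parse_line("to play = jouer") yields a bilingual entry, while parse_line("Apple (company name)") yields an entry whose target is None.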
The quality of the translation results depends greatly on the grammatical accuracy of the original document and on the proper use of the basic punctuation and typographic conventions.
1.2. Coding principles
1.2.1. Common sequences
As a general principle, enter the canonical form of your entry (the “simplest” form that is found in a paper dictionary, either single or compound words). This way all entries will be recognized whatever their form, inflected or not. Using a large linguistic thesaurus, the system will be able to recognize the linguistic behaviour of your entry and add implicit information to generate all inflected and conjugated forms.
- For nouns, enter the singular form (and the nominative, in languages that have case), not the plural form. When an entry is coded as plural, the system only considers the plural inflection.
- For adjectives, always enter the primary form (singular and masculine). If an adjective is coded as feminine or plural, its other basic forms will not be recognized.
- For verbs, enter the infinitive (in English, with the word to), not a conjugated form. If a verb is not coded in its infinitive form (its most basic form), it will not be recognized.
When an inflected form is entered in the dictionary instead of the canonical form, the system will only translate this inflected form, not the other inflected forms.
1.2.2. Protected sequences
Protected sequences are those words and phrases (fixed expressions) that do not undergo linguistic analysis, but that are accepted “as-is” for the final translation. As a consequence, none of the individual items will be inflected and the sequence will be translated exactly as entered in the dictionary. This is why it is important to keep in your dictionary the original formatting of the entry. If it appears in capital letters in your document, enter it in capital letters in your dictionary.
Protected sequences must be entered between quotation marks and have their grammatical category specified in parentheses.
See the section: Advanced coding: Forcing the grammatical category.
Example 5. Protected sequences
“all rights reserved” (sentence) = “tous droits réservés” (sentence)
“OTAN” (noun) = “NATO” (noun)
“bi-parting” (adjective) = “à deux battants” (adjective)
The entries will remain invariable unless specified otherwise via additional clues or linguistic information.
See the section: Advanced coding: Forcing the number.
Quotation marks can be used for all or part of a fixed expression. They make it possible to use special characters that would not otherwise be recognized.
Example 6. Protected sequences with special characters
Any sequence of fewer than two letters or more than five words must be written between quotation marks.
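The length rule above can be checked mechanically before an entry is added to a dictionary. The following Python sketch is only an illustration: needs_quotes is a hypothetical helper, and it assumes that words are separated by whitespace and that "letters" means alphabetic characters.

```python
def needs_quotes(entry: str) -> bool:
    """Return True when the length rule requires quotation marks.

    Assumptions: words are whitespace-separated, and only alphabetic
    characters count as letters.
    """
    words = entry.split()
    letters = sum(ch.isalpha() for ch in entry)
    return letters < 2 or len(words) > 5
```

With these assumptions, a one-letter entry or a six-word expression is flagged, while a three-word expression is not.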
1.2.3. Upper-case
The use of capital letters in the dictionary adheres to the same guideline as for canonical form. This means that the entry must be in its native case (in French and German, the first letter of proper nouns is in uppercase). Otherwise the system will interpret the uppercase letters as an additional linguistic clue.
For example, if a word is written in capital letters in the original document to be translated, there is no need to enter it in capital letters in the dictionary (except with regard to protected sequences) since the original format is automatically detected and respected.
Example 7. English to French
« We offer Machine Translation. »
Coding level | Dictionary | Translated text |
---|---|---|
Intuitive | machine translation = traduction automatique | Nous proposons de la Traduction Automatique. |
In most languages, upper-case letters are a clue for proper nouns and acronyms. It is therefore recommended that their use in the dictionary be restricted to these cases.
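The behaviour shown in Example 7, where a lower-case dictionary entry is rendered with the capitalization of the source text, can be pictured with a small Python sketch. It is not SYSTRAN's actual algorithm: transfer_case is a hypothetical function, and it makes the simplifying assumption that source and target words correspond position by position.

```python
from typing import List


def transfer_case(source_words: List[str], target_words: List[str]) -> List[str]:
    """Copy an initial capital from each source word to the target
    word at the same position.

    Simplification: real word alignment is not purely positional.
    """
    result = []
    for i, word in enumerate(target_words):
        if i < len(source_words) and source_words[i][:1].isupper():
            word = word[:1].upper() + word[1:]
        result.append(word)
    return result
```

Applied to the source words "Machine Translation" and the dictionary target "traduction automatique", the sketch produces "Traduction Automatique", as in Example 7.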
1.2.4. Accentuation
The use of accented characters in the dictionary adheres to the same guidelines as for canonical form. This means that the entry must be correctly accented to be properly recognized and interpreted by the system.
1.3. Coding enhancement: linguistic clues
The SYSTRAN Dictionary Manager offers the possibility to add specific linguistic information (“linguistic clues”) to dictionary entries. Using linguistic clues will greatly improve the linguistic analysis and subsequently the translation. There are two main levels of coding:
- Intuitive coding: adding of user-friendly linguistic clues such as determiners or particles. This intuitive coding does not require specific linguistic knowledge.
- Advanced coding: adding external information such as the grammatical category, the gender or the context of an entry. This level requires basic to advanced linguistic knowledge.
Note that the two levels are compatible and that they can be used in the same dictionary.
2. Intuitive coding
Intuitive coding is the practice of adding intuitive grammatical clues to an entry in order to provide more information on its nature.
Adding these simple intuitive clues (determiners, particles) will give the system valuable information about the kind of entry you are entering: whether it is a noun, a verb or an expression (sentence), masculine or feminine, singular or plural.
Example 8. Spanish to English dictionary with intuitive coding
ejecutar = to run
unos tipógrafos = a typeface
2.1. Forcing the grammatical category
When an entry is ambiguous, it is possible to force its grammatical category by adding a determiner (definite or indefinite article) next to it.
Example 9. English to French
« They run the run every week. »
Coding level | Dictionary | Translated text |
---|---|---|
Intuitive | to run = faire partie de<br>a run = une course | Ils font partie de la course chaque semaine. |
2.2. Forcing the gender
When the SYSTRAN dictionaries contain only the masculine or the feminine form of an entry (or if it is assumed as such), or when an entry is ambiguous, it is possible to force its gender by adding a determiner next to it.
Example 10. English to French
« He left a check mark in the book. »
Coding level | Dictionary | Translated text |
---|---|---|
General | check mark = coche | Il a laissé un coche dans le livre. |
Intuitive | check mark = une coche | Il a laissé une coche dans le livre. |
2.3. Forcing the number
When a singular entry needs to be translated by a plural form, it is possible to force its number by adding the plural form in the dictionary.
Example 11. English to Spanish
« His business is prosperous. »
Coding level | Dictionary | Translated text |
---|---|---|
General | business = negocio | Su negocio es próspero. |
Intuitive | business = negocios | Sus negocios son prósperos. |
Here, the subject business will be translated into the Spanish plural form negocios and any dependent items will bear the plural inflection (verb, adjectives, determiners).
3. Advanced coding
Advanced coding features allow a higher level of customization than intuitive coding, but the user needs a good general understanding of linguistic phenomena in order to act on the inflection of an entry.
Advanced coding is the practice of adding advanced linguistic information (semantic, syntactic, morphological, contextual) on the nature of an entry. These linguistic clues are always enclosed in parentheses.
Each language has its own set of linguistic clues.
Example 12. English to French dictionary
John (proper noun) (masculine)
Portugal = Portugal (country)
red = rouge (adjective)
check box = coche (feminine)
business = affaire (plural)
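Because advanced clues are always written in parentheses, they are easy to separate from the entry text. The Python sketch below shows one way to do so. It is an illustration rather than the Dictionary Manager's own parser: split_clues is a hypothetical function, and it assumes that clues are never nested and that a colon inside a clue separates a key from its value, as in the (plural: XXX) and (inflects like: XXX) clues described later in this guide.

```python
import re

# A clue is any parenthesized group without nested parentheses (assumption).
CLUE = re.compile(r"\(([^()]*)\)")


def split_clues(side: str):
    """Return (text, clues) for one side of a dictionary entry.

    Clues of the form (key: value) map key -> value; bare clues such
    as (feminine) map to True.
    """
    clues = {}
    for body in CLUE.findall(side):
        key, sep, value = body.partition(":")
        clues[key.strip()] = value.strip() if sep else True
    text = CLUE.sub("", side).strip()
    return text, clues
```

For instance, split_clues("coche (feminine)") separates the word coche from the clue feminine.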
3.1. Morphology
3.1.1. Forcing the grammatical category
When the grammatical category of an entry is ambiguous, it is possible to specify it. The category must be added in parentheses next to the entry.
- For a verbal entry: (verb)
- For an adjectival entry: (adjective)
- For an adverbial entry: (sentence)
- For a nominal entry: (noun)
- For proper nouns: (proper noun)
- For an acronym: (acronym)
3.1.2. Forcing the gender
If the SYSTRAN dictionaries contain the masculine form of an entry (or if it is assumed as such), or in cases in which an entry is ambiguous, it is possible to force the feminine form by adding the gender of the entry in parentheses. Entries in the feminine form can likewise be forced to the masculine form.
This applies to nouns and proper nouns only, by adding (masculine) or (feminine).
Example 13. English to French
« He left a check mark in the book. »
Coding level | Dictionary | Translated text |
---|---|---|
General | check mark = coche | Il a laissé un coche dans le livre. |
Advanced | check mark = coche (feminine) | Il a laissé une coche dans le livre. |
Here, the French word coche appears in the SYSTRAN French monolingual dictionary, but only as a masculine noun (and therefore does not correspond to the English noun check mark).
By adding gender information to the user dictionary, using the grammatical clue (feminine), the entry is indicated as feminine regardless of the content of the monolingual dictionary, and it must be inflected as such.
3.1.3. Forcing the number
When a singular entry needs to be translated by a plural form, it is possible to force its number by adding the corresponding grammatical clue in parentheses.
This applies to nouns only, by adding (singular) or (plural).
Example 14. English to Spanish
« His business is prosperous. »
Coding level | Dictionary | Translated text |
---|---|---|
General | business = negocio | Su negocio es próspero. |
Advanced | business = negocio (plural) | Sus negocios son prósperos. |
Here, the subject business will be translated into the Spanish plural form negocios and any dependent items will bear the plural inflection (verb, adjectives, determiners).
3.1.4. Inflects like
It is possible to inform the system, using this advanced coding feature, of the correct inflection paradigm of an unknown entry by providing another entry that belongs to the same grammatical category and that inflects in the same manner. Thus, the feature helps the system to recognize the inflection pattern and to apply it to the entry. This is done for all grammatical categories by adding the clue (inflects like: XXX).
Example 15. French to English
« Il formate son fichier. »
Coding level | Dictionary | Translated text |
---|---|---|
General | formater = to format | He is formating his file. |
Advanced | formater = to format (inflects like: to quit) | He is formatting his file. |
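The effect of the (inflects like: XXX) clue in Example 15 can be mimicked with a toy Python sketch. It is only an assumption-laden illustration, not SYSTRAN's inflection engine: present_participle, strip_to and DOUBLING_MODELS are hypothetical names, and the only paradigm modelled is the final-consonant doubling of "to quit".

```python
# Hypothetical set of model verbs whose final consonant doubles before "-ing".
DOUBLING_MODELS = {"quit"}


def strip_to(verb: str) -> str:
    # Drop the English infinitive marker "to " if present.
    return verb[3:] if verb.startswith("to ") else verb


def present_participle(verb: str, inflects_like: str = "") -> str:
    """Form an "-ing" participle, borrowing the pattern of a model verb."""
    base = strip_to(verb)
    if strip_to(inflects_like) in DOUBLING_MODELS:
        return base + base[-1] + "ing"  # format -> formatting
    return base + "ing"  # naive default: format -> formating
```

Without the model verb the sketch gives the incorrect "formating"; with "to quit" as the model it gives "formatting", which mirrors the two rows of Example 15.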
3.1.5. Plural form
Relevant only in the coding of nouns, this advanced coding feature allows users to force a particular plural form using (plural: XXX).
Also, not only does this feature provide a means for translating a singular source entry into a plural one, it also enables users to indicate the inflection pattern desired for the particular entry. As such, the system will no longer choose the form found in SYSTRAN’s linguistic resources (inflection tables and monolingual dictionary), but the form given by the user.
The plural advanced coding feature is very useful for encoding words of Latin or Greek origin, whose plural form the system does not always guess correctly.
Example 16. English to French
« He has written many interesting film scripts. »
Coding level | Dictionary | Translated text |
---|---|---|
General | film script = scénario | Il a écrit beaucoup de scénarios intéressants. |
Advanced | film script = scénario (plural: scénari) | Il a écrit beaucoup de scénari intéressants. |
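The precedence described in this section, where a user-supplied plural replaces the form the system would otherwise choose, can be sketched in Python as follows. The function pluralize is hypothetical, and the default rule (appending "s") is a placeholder assumption standing in for SYSTRAN's inflection tables and monolingual dictionary.

```python
def pluralize(noun: str, clues: dict) -> str:
    """Return the plural of a noun, preferring the user's (plural: XXX) clue."""
    if "plural" in clues:
        return clues["plural"]  # the form given by the user wins
    return noun + "s"  # placeholder default, not a real inflection table
```

With the entry of Example 16, pluralize("scénario", {"plural": "scénari"}) returns the user's form, whereas an empty clue set falls back to the default.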
3.2. Syntax
3.2.1. Prepositions
A preposition can be linked to nouns, verbs or adjectives. The preposition must be specified for both the source and target entries using (prep: XXX).
If an entry does not require a preposition, it is necessary to add (no preposition).
Example 17. English to French
« He protects his car from the rain. »
Coding level | Dictionary | Translated text |
---|---|---|
General | to protect = protéger | Il protège sa voiture contre la pluie. |
Advanced | to protect (prep:from) = protéger (prep:de) | Il protège sa voiture de la pluie. |
Example 18. French to English
« Le maire fait don de son terrain. »
Coding level | Dictionary | Translated text |
---|---|---|
General | faire don = to offer | The mayor offered of his ground. |
Advanced | faire don (prep:de) = to offer (no preposition) | The mayor offers his field. |
3.2.2. Bracketing
The square-bracket metacharacters ([...]) isolate a compound entry within a larger one. This makes the relation between the elements of a compound clearer, thereby improving the translation. They are especially useful when translating from English.
Example 19. English to Spanish
« The technical support hours are available on the web site. »
Coding level | Dictionary | Translated text |
---|---|---|
General | technical support hour = horario del servicio técnico | Las horas técnicas de la ayuda están disponibles en el website. |
Advanced | technical support hour = horario del [servicio técnico] | Los horarios del servicio técnico están disponibles en el sitio web. |
3.3. Semantic
3.3.1. Adding a semantic category
These categories will modify the preposition or determiner that introduces the entry in the translation.
The following semantic categories apply to proper nouns only.
- (location)
- (city)
- (country)
- (first name)
- (last name)
- (product name)
- (company name)
The following semantic categories apply to both nouns and proper nouns.
- (human)
- (non human)
Example 20. English to French
« Portugal is a beautiful country. »
Coding level | Dictionary | Translated text |
---|---|---|
General | Portugal = Portugal | Portugal est un beau pays. |
Advanced | Portugal = Portugal (country) | Le Portugal est un beau pays. |
3.4. Context
The translation of polysemic entries can be controlled by defining their semantic and/or syntactic context.
3.4.1. Semantic category
Each noun can be linked to one or more semantic categories. This is accomplished by adding (semcat: XXXX) where XXXX is the name of the category defined by the user (alphabetical uppercase name).
To use such semantic categories, the dictionary must also contain syntactic context entries that refer to these categories.
3.4.2. Syntactic context
Each verb can be linked to specific syntactic contexts. This is accomplished by adding (context: XXXX). XXXX can either be a noun, or a semantic category that must have been previously defined in the dictionary.
Example 21. Semantic category creation and use. English to Spanish
« He saved three files. He saved the records. He saved many repertories. He saved money. He saved energy. »
Dictionary | Translated text |
---|---|
file (semcat: FILE) = fichero<br>record (semcat: FILE) = archivo<br>repertory (semcat: FILE) = repertorios<br>money (semcat: RESOURCE) = dinero<br>energy (semcat: RESOURCE) = energía<br>to save (context: FILE) = guardar<br>to save (context: RESOURCE) = ahorrar | Él guardó tres ficheros. Él guardó los archivos. Él guardó muchos repertorios. Él ahorró el dinero. Él ahorró la energía. |
Example 22. Simple syntactic context. English to Italian
« My soul was saved. The files were saved. »
Dictionary | Translated text |
---|---|
to save (context: a soul) = liberare<br>to save (context: a file) = conservare | La mia anima è stata liberata. Le lime sono state conservate. |
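The interplay between (semcat: XXXX) and (context: XXXX) in Example 21 amounts to a two-step lookup: find the category of the noun, then pick the verb translation registered for that category. The Python sketch below illustrates the idea using the data from Example 21. It is not how SYSTRAN resolves contexts internally; SEMCAT, VERB_CONTEXTS and translate_verb are hypothetical names, and the verb's object is assumed to be already identified.

```python
# Noun -> user-defined semantic category, from the (semcat: XXXX) clues.
SEMCAT = {
    "file": "FILE", "record": "FILE", "repertory": "FILE",
    "money": "RESOURCE", "energy": "RESOURCE",
}

# Verb -> {context: translation}, from the (context: XXXX) clues.
VERB_CONTEXTS = {
    "to save": {"FILE": "guardar", "RESOURCE": "ahorrar"},
}


def translate_verb(verb, obj, default=None):
    """Pick the verb translation whose context matches the object.

    A context may name either a semantic category or the noun itself.
    """
    contexts = VERB_CONTEXTS.get(verb, {})
    category = SEMCAT.get(obj)
    if category in contexts:
        return contexts[category]
    return contexts.get(obj, default)
```

With this data, "to save" followed by "file" resolves to guardar and "to save" followed by "energy" resolves to ahorrar; an object with no matching context returns the default.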
3.5. Expert features
3.5.1. Noun form
By indicating the derived nominal form of a verbal entry with (noun form: XXX), you give the system the option of translating the verb as a noun.
Example 23. English to French
« Using this tool is simple. »
Coding level | Dictionary | Translated text |
---|---|---|
Advanced | to use = utiliser (noun form: utilisation) | L’utilisation de cet outil est simple. |