Thesauruses

Introduction

A thesaurus is a collection of phrase sets. A phrase set is a set of words or phrases that are interchangeable synonyms of each other. Phrase sets are the basis of spintax, which whirlDOC uses to vary documents at the word and phrase level.

whirlDOC uses three thesauruses: a "master" thesaurus, a "user" thesaurus, and a "document" thesaurus. The "master" thesaurus comes with the application and contains synonyms of common words and phrases. The "user" thesaurus is empty when whirlDOC is first run and grows as documents are edited, resulting in a thesaurus that is customized to a user’s particular needs. Each document contains an embedded "document" thesaurus that provides the phrase sets for the document’s spintax.

Spintax and Phrase Sets

Spintax is a standard technique for varying words in text. Words are designated as choices between synonyms. One synonym is chosen at random when making a document variation. The word spintax is a portmanteau of spin, for spinning a document variation, and syntax, which is demonstrated below.

Spintax in whirlDOC is made from a phrase set, which is a set of words or phrases that can be substituted for each other. For example, consider the sentence, “He saw the attractive painting.” The word attractive could be replaced with several alternatives while maintaining the approximate meaning of the sentence. Possibilities are beautiful, decorative, and ornamental. The four possible sentences are shown below.

He saw the attractive painting.
He saw the beautiful painting.
He saw the decorative painting.
He saw the ornamental painting.

The four adjectives can be turned into spintax, which results in the sentence below. The spintax element is enclosed in braces and the choices are separated by a "pipe" character (|).

He saw the {attractive|beautiful|decorative|ornamental} painting.

When editing a document, whirlDOC hides the spintax. One of the four adjectives is shown in blue and underlined. To spin a document variation, one of the four words in the spintax is chosen at random, so this single spintax element can produce four sentence variations. Replacing other words with spintax multiplies that number. For example, adding a second spintax element with three choices would give 4 × 3 = 12 variations.
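The spinning step described above can be sketched in a few lines of Python. This is only an illustration of the general spintax technique, not whirlDOC's own code: the function names `spin` and `count_variations` are made up for this example, and the sketch handles only flat (non-nested) spintax elements.

```python
import random
import re

# Matches one flat spintax element such as {a|b|c}; nesting is not handled.
SPINTAX = re.compile(r"\{([^{}]*)\}")

def spin(text, rng=random):
    """Replace each spintax element with one randomly chosen option."""
    return SPINTAX.sub(lambda m: rng.choice(m.group(1).split("|")), text)

def count_variations(text):
    """Multiply the number of choices across all spintax elements."""
    total = 1
    for m in SPINTAX.finditer(text):
        total *= len(m.group(1).split("|"))
    return total

sentence = "He saw the {attractive|beautiful|decorative|ornamental} painting."
variation = spin(sentence)            # one of the four sentences listed earlier
total = count_variations(sentence)    # 4
```

Passing a seeded `random.Random` instance as `rng` makes the choice repeatable, which can be handy when testing.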

This example illustrates what spintax is and how phrase sets are used to create it. The four adjectives form a phrase set, which is part of the "master" or "user" thesaurus. If a document uses the phrase set for spintax then the phrase set is put into the document’s "document" thesaurus.

Phrases

Phrases do not have to be single words. As the name implies, they can also be multi-word terms. For example, “based on”, “built upon”, “made from”, and “fashioned from” could be used for a phrase set.

Phrases should usually be all lowercase. whirlDOC automatically capitalizes the first letter of a phrase when the spintax is at the beginning of a sentence. In a few cases phrases must be explicitly capitalized, for example when a phrase contains a word that should always be capitalized, such as the name of a company.
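The capitalization rule can be illustrated with a small Python sketch. The helper names `capitalize_first` and `place` are hypothetical, and the test for "beginning of a sentence" (start of text, or after ".", "!" or "?") is an assumption made for this example; whirlDOC's actual rule may differ.

```python
def capitalize_first(phrase):
    """Uppercase only the first letter, leaving the rest of the phrase untouched."""
    return phrase[:1].upper() + phrase[1:]

def place(before, phrase, after):
    """Insert phrase between before and after, capitalizing it at a sentence start.

    Assumption: a sentence starts at the beginning of the text or after
    '.', '!' or '?' followed by optional whitespace.
    """
    stripped = before.rstrip()
    if stripped == "" or stripped[-1] in ".!?":
        phrase = capitalize_first(phrase)
    return before + phrase + after

start = place("", "based on", " earlier work, it runs well.")
middle = place("The design is ", "based on", " earlier work.")
```

The slice-based `capitalize_first` is used instead of `str.capitalize()` because the latter lowercases the rest of the string, which would damage phrases that contain a company name.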

The Empty Phrase

whirlDOC’s phrase sets support an empty phrase. Phrase sets that are marked as containing an empty phrase can resolve to nothing when a document is spun. Using the earlier example, if the “attractive” phrase set contained the empty phrase, then the example sentence could resolve to “He saw the painting.” The sentence with spintax follows. Note the trailing “|” character after the word ornamental.

He saw the {attractive|beautiful|decorative|ornamental|} painting.

The "master" thesaurus does not contain any phrase sets with the empty phrase. It is up to the user to decide whether to use empty phrases. This may be important if whirlDOC is used to generate documents with unresolved spintax that will be fed into tools that may not support an empty phrase.
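To show why the trailing “|” matters to other tools, the following Python sketch splits the example element into its options. The helper names are hypothetical, and collapsing the leftover double space is an assumption about how a tool might tidy the result, not a description of whirlDOC's behavior.

```python
import re

def options(element):
    """Split the inside of a spintax element; a trailing '|' yields an empty phrase."""
    return element[1:-1].split("|")

def substitute(text, element, choice):
    """Replace element with choice, then collapse the doubled space
    that an empty phrase leaves behind (an assumed cleanup step)."""
    replaced = text.replace(element, choice, 1)
    return re.sub(r" {2,}", " ", replaced)

element = "{attractive|beautiful|decorative|ornamental|}"
sentence = "He saw the " + element + " painting."

choices = options(element)                      # last entry is the empty string
without_adjective = substitute(sentence, element, "")
```

A tool that does not support the empty phrase might reject the element or ignore the final option, which is why the choice is left to the user.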

Phrase Set Identifiers

Phrase sets have an identifier used to refer to them. These are assigned automatically, and the typical user should not have to worry about them. The thesaurus editor displays these IDs.

The only thing a typical user needs to know about phrase set IDs is that spintax segments in a document refer to their phrase set by ID. The phrases used by a spintax element are not read from the phrase set until a document variation is being spun. This means that changing a phrase set changes every spintax element that uses it, which makes document-wide changes easy: modify one phrase set and all the spintax elements that refer to it follow.
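The effect of referring to phrase sets by ID can be sketched as follows. The data layout (a dictionary keyed by ID and a document stored as a list of strings and IDs) and the ID value 17 are invented for this illustration and are not whirlDOC's internal format.

```python
import random

# Hypothetical thesaurus: phrase set ID -> list of phrases.
thesaurus = {17: ["attractive", "beautiful", "decorative", "ornamental"]}

# Hypothetical document: plain text segments mixed with phrase set IDs.
document = ["He saw the ", 17, " painting. She bought the ", 17, " frame."]

def spin(document, thesaurus, rng=random):
    """Look up each referenced phrase set only at spin time."""
    parts = []
    for segment in document:
        if isinstance(segment, int):
            parts.append(rng.choice(thesaurus[segment]))
        else:
            parts.append(segment)
    return "".join(parts)

before = spin(document, thesaurus)

# Changing the phrase set once affects both spintax elements that refer to it.
thesaurus[17] = ["striking"]
after = spin(document, thesaurus)
```

Because the document stores only the ID, no text in the document has to be touched when the phrase set changes.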

The "Master" Thesaurus

The "master" thesaurus is built into whirlDOC. It contains common phrase sets useful for general writing. The "master" thesaurus provides an initial foundation for adding spintax to documents so a user does not have to start from scratch with no predefined phrase sets.

Updating the "Master" Thesaurus

Periodically, Creative Adept, the maker of whirlDOC, may release an updated version of the "master" thesaurus. The application offers an easy way to install the update. Place the provided thesaurus file in a known directory or on the desktop. Select the "Thesaurus" drop down menu at the top of the screen and click the menu item labeled "Update Master…". A small dialog box will pop up. Select the location of the thesaurus file and click the "Update" button.

The "User" Thesaurus

The "user" thesaurus is built as documents are edited. It is empty when whirlDOC is first run. As new phrase sets are created, the phrase sets may be added to the "user" thesaurus so technical terms, specialized language, and slang terms can be reused. This produces a customized thesaurus that is adapted to a user’s specific needs and writing style.

Editing the "User" Thesaurus

Phrase sets created while editing documents are automatically added to the "user" thesaurus, so the thesaurus never has to be edited directly. For cases where direct changes are wanted, whirlDOC contains a thesaurus editor where the "user" thesaurus can be viewed and modified. This editor allows phrase sets to be created, deleted, and merged. Phrase sets can also be cloned and the clones pluralized. The thesaurus editing page explains how to use the editor.

Exporting the "User" Thesaurus

The "user" thesaurus can be exported to a file that can be exchanged with other people, who can import the "user" thesaurus file into their own copy of whirlDOC. To export the thesaurus, use the "Export User…" item on the "Thesaurus" drop down menu at the top of whirlDOC’s window. A small dialog will pop up that looks like the one below.

Figure: Thesaurus Export Dialog

The export dialog is easy to use. Select the export format then select the name and location of the file that will be written.

There are two possible formats for the thesaurus file that will be written. Both can be exchanged with others. A full explanation of the file formats is given in the thesaurus file formats appendix.

  • XML: The XML format is the native thesaurus format for whirlDOC. By convention, the format usually has an extension of ".xthe".
  • TEXT: The text format is a simple plain-text format that other software packages may be able to use. It has simple and enhanced forms. The files typically have an extension of ".txt".

Importing the "User" Thesaurus

An exported "user" thesaurus can be imported using the "Import User…" menu item on the "Thesaurus" drop down menu. The import dialog looks similar to the picture below.

Figure: Thesaurus Import Dialog

The import dialog is simple. Select the import operation then select the name and location of the file to be imported. The import file can be in either XML format or text format.

There are four ways to import a "user" thesaurus.

  • Replace Thesaurus: Completely replace the current "user" thesaurus with the imported file.
  • Smart Merge: Phrase sets in the existing "user" thesaurus that share two or more phrases with an imported phrase set will be merged with each other. Phrase sets that share only one phrase will both be put into the new "user" thesaurus. The thesaurus can be manually cleaned up with the thesaurus editor.
  • Merge With Replace: Phrase sets in the existing "user" thesaurus that share one or more phrases with an imported phrase set will be replaced.
  • Merge Without Replace: Phrase sets in the existing "user" thesaurus that share one or more phrases with an imported phrase set will not be replaced.
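The three merge operations can be compared with a short Python sketch that models each phrase set as a set of strings. This is a reading of the descriptions above, not whirlDOC's implementation. Two details are assumptions: in "Merge With Replace" the imported phrase set takes the place of the overlapping one, and in "Merge Without Replace" an overlapping imported phrase set is dropped while non-overlapping ones are added.

```python
def smart_merge(existing, imported):
    """Merge sets sharing two or more phrases; keep both when they share fewer."""
    result = [set(ps) for ps in existing]
    for new in imported:
        new = set(new)
        for ps in result:
            if len(ps & new) >= 2:
                ps |= new          # merge into the first sufficiently similar set
                break
        else:
            result.append(new)     # no set shared two or more phrases
    return result

def merge_with_replace(existing, imported):
    """Assumed: imported sets replace existing sets that share any phrase."""
    imported = [set(ps) for ps in imported]
    kept = [set(ps) for ps in existing
            if not any(set(ps) & new for new in imported)]
    return kept + imported

def merge_without_replace(existing, imported):
    """Assumed: existing sets win; only non-overlapping imported sets are added."""
    existing = [set(ps) for ps in existing]
    added = [set(new) for new in imported
             if not any(set(new) & ps for ps in existing)]
    return existing + added

existing = [{"attractive", "beautiful", "decorative"}, {"big", "large"}]
imported = [{"attractive", "beautiful", "ornamental"}, {"big", "huge"}, {"fast", "quick"}]

smart = smart_merge(existing, imported)
replaced = merge_with_replace(existing, imported)
preserved = merge_without_replace(existing, imported)
```

In this example Smart Merge combines the two "attractive" sets, keeps both "big" sets because they share only one phrase, and adds the "fast" set, leaving the two "big" sets as candidates for manual cleanup in the thesaurus editor.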

Resetting the "User" Thesaurus

The "user" thesaurus can be emptied with the "Reset User…" item on the "Thesaurus" drop down menu. This returns the "user" thesaurus to the state it was in when the application was first run. All phrase sets that the user created while editing documents will be lost, and with them the program’s adaptation to the user’s language patterns.

The "Document" Thesaurus

A whirlDOC document has an embedded thesaurus called the "document" thesaurus. Its purpose is to store the phrase sets used by spintax in the document. This makes each document completely self-contained. Documents are not tied to the particular installation of whirlDOC that was used to create them. They can be moved between different installations of whirlDOC, so users can exchange them with other people.

Phrase sets are put into a document’s thesaurus automatically. Those phrase sets may come from the "master" thesaurus, the "user" thesaurus, or be created for the document being edited. whirlDOC ensures that every phrase set used by a document’s spintax segments is added to the "document" thesaurus, so the user does not have to worry about managing the thesaurus.
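The idea of a self-contained document can be sketched in Python as collecting every referenced phrase set into the document's own thesaurus. The function name, the string IDs, and the rule that the "user" thesaurus is consulted before the "master" thesaurus are all assumptions made for this illustration.

```python
def build_document_thesaurus(referenced_ids, master, user):
    """Copy each referenced phrase set into a thesaurus owned by the document.

    Assumption: an ID found in the user thesaurus takes precedence over
    the same ID in the master thesaurus.
    """
    document_thesaurus = {}
    for ps_id in referenced_ids:
        source = user if ps_id in user else master
        document_thesaurus[ps_id] = list(source[ps_id])  # copy, not a shared list
    return document_thesaurus

# Hypothetical thesauruses keyed by made-up phrase set IDs.
master = {"m1": ["attractive", "beautiful", "decorative", "ornamental"],
          "m2": ["big", "large"]}
user = {"u1": ["based on", "built upon", "made from", "fashioned from"]}

doc_thesaurus = build_document_thesaurus(["m1", "u1"], master, user)
```

Since the phrase sets are copied, the document can be spun on another installation that has a different "user" thesaurus, or none at all.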

The current version of whirlDOC does not contain an editor to display and change a document’s thesaurus.