whirlDOC Documents

Document Overview

whirlDOC documents are designed to produce variations that differ at two levels: the word/phrase level and the structural level. Word and phrase level variations are done with spintax, a standard method that randomly chooses one phrase from a set of alternative phrases. Structural variations are done with sentence and paragraph variants. When a document variation is spun, either one variant is randomly chosen from its group of variants or the group’s variants are rearranged. The end result is a variation of the source document that is readable yet uses different words and structure.

For example, consider a document with an introductory paragraph followed by three paragraphs that each explain a different point. A whirlDOC document can be constructed so that spinning it generates variations with the introductory paragraph followed by the three paragraphs arranged in one of six possible orders. If each paragraph has three sentences and each sentence has three variants, then each paragraph has twenty-seven possible sentence combinations. The four paragraphs in sequence (the introduction plus the three that follow) have more than half a million combinations of sentences. Taking the six paragraph orders into account, there are more than three million possible sentence arrangements. With sentences containing spintax to vary words and phrases, the total number of different document variations that can be spun is immense.
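The counts in this example can be checked with a few lines of arithmetic. The sketch below is only a worked calculation; the variable names are illustrative and are not part of whirlDOC.

```python
from math import factorial

sentences_per_paragraph = 3
variants_per_sentence = 3
paragraphs = 4               # the introduction plus three body paragraphs
reorderable_paragraphs = 3   # only the body paragraphs are rearranged

# Each sentence independently picks one of its variants.
per_paragraph = variants_per_sentence ** sentences_per_paragraph

# Each paragraph independently picks one of its sentence combinations.
sentence_combinations = per_paragraph ** paragraphs

# The three body paragraphs can appear in 3! different orders.
orderings = factorial(reorderable_paragraphs)

total = sentence_combinations * orderings

print(per_paragraph)          # 27
print(sentence_combinations)  # 531441
print(orderings)              # 6
print(total)                  # 3188646
```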

The Document Thesaurus and Spintax

All words and phrases used in a document’s spintax are contained in a thesaurus that is embedded in the document in addition to the document’s text and structural information. This "document" thesaurus makes documents self-contained and portable between instances of the whirlDOC application. For those interested in developing applications that use whirlDOC documents, the file format is explained in the document file format appendix.

A complete explanation of thesauruses is given in the thesaurus chapter but, in short, a thesaurus contains sets of phrases that are synonyms of each other. These sets are called phrase sets. A phrase may be a single word or multiple words. For example, the words "attractive", "beautiful", and "stunning" could make up a phrase set used as an adjective. Each spintax segment in a document uses a phrase set from the document’s thesaurus.

Spintax Segments Are Linked Together

A feature of whirlDOC documents is that multiple spintax segments can use the same phrase set. This makes it easy to change spintax throughout a document by changing a single phrase set that many spintax segments use. For example, six spintax segments in a document could use the above example phrase set that contains three synonyms for "attractive". If the word "pretty" were added to the phrase set, then all six spintax segments would gain it as a possible choice.
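One way to picture this linking is to have each segment store a reference to a phrase set rather than its own copy of the phrases. The following is a minimal sketch under that assumption; the dictionary layout and the names `thesaurus`, `segments`, and `phrases_for` are hypothetical and do not describe the actual whirlDOC file format.

```python
# Hypothetical in-memory model: the document thesaurus maps a phrase set
# name to its list of synonymous phrases.
thesaurus = {"attractive": ["attractive", "beautiful", "stunning"]}

# Six spintax segments, each holding only the name of the phrase set it uses.
segments = ["attractive"] * 6

def phrases_for(segment):
    """Look up the phrases a segment can currently produce."""
    return thesaurus[segment]

# Adding one phrase to the shared phrase set changes every linked segment.
thesaurus["attractive"].append("pretty")

updated = [phrases_for(segment) for segment in segments]
print(all("pretty" in phrases for phrases in updated))  # True
```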

Document Structure

Documents are structured as a hierarchy of elements. At the top is the document itself, referred to as the root element. It contains paragraph groups that contain paragraphs that contain line groups that contain lines. The figure below illustrates the structure.

[Figure: whirlDOC Document Structure]

The diagram shows a paragraph group that contains two paragraphs. The first paragraph has two line groups, and the second paragraph has one. Each line group has two lines. The image below shows how a document with the same structure would be shown in whirlDOC’s document editor.

[Figure: Example of whirlDOC Document Structure]
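For readers who think in code, the hierarchy described above can be sketched as nested containers. The class names and the placeholder line text below are assumptions made for illustration; they mirror the element names in this chapter but are not whirlDOC's internal types.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LineGroup:
    lines: List[str] = field(default_factory=list)

@dataclass
class Paragraph:
    line_groups: List[LineGroup] = field(default_factory=list)

@dataclass
class ParagraphGroup:
    paragraphs: List[Paragraph] = field(default_factory=list)

@dataclass
class Document:
    """The root element: a container for paragraph groups."""
    paragraph_groups: List[ParagraphGroup] = field(default_factory=list)

def two_lines(label):
    # Placeholder text; every line group in the described structure has two lines.
    return LineGroup([f"{label} variant A.", f"{label} variant B."])

# One paragraph group with two paragraphs: the first paragraph has two
# line groups and the second paragraph has one.
document = Document([
    ParagraphGroup([
        Paragraph([two_lines("First"), two_lines("Second")]),
        Paragraph([two_lines("Third")]),
    ])
])

line_count = sum(
    len(group.lines)
    for paragraph_group in document.paragraph_groups
    for paragraph in paragraph_group.paragraphs
    for group in paragraph.line_groups
)
print(line_count)  # 6
```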

Lines and Spintax

The lowest-level building block of document structure is a line, which is usually a sentence. Lines consist of plain text and spintax segments. The following is an example line that shows expanded spintax.

The dog is {large|big|huge}.

In whirlDOC's editor, a spintax segment is displayed as one of its phrases, shown in blue and underlined. Here the spintax is shown expanded so that all three phrases are visible. This short line can produce three different sentences, one for each of the three adjectives in its spintax segment. The three sentences are:

The dog is large.
The dog is big.
The dog is huge.
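The expansion above can be reproduced with a short script. This is a minimal sketch that assumes flat (non-nested) spintax in the brace-and-bar notation shown in this chapter; the function name `expand_all` is illustrative and is not part of whirlDOC.

```python
import itertools
import re

# Matches one flat spintax segment such as {large|big|huge}.
SPINTAX = re.compile(r"\{([^{}]*)\}")

def expand_all(line):
    """Return every sentence that a line with flat spintax can produce."""
    # With one capture group, re.split alternates plain text (even indexes)
    # and the captured segment bodies (odd indexes).
    parts = SPINTAX.split(line)
    options = [
        part.split("|") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combination) for combination in itertools.product(*options)]

sentences = expand_all("The dog is {large|big|huge}.")
for sentence in sentences:
    print(sentence)
```

Spinning a single variation, as opposed to listing all of them, would pick one phrase per segment at random instead of taking the full product.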

whirlDOC automatically handles capitalization of a spintax segment when it is at the beginning of a sentence. Typically, phrases in a thesaurus are in all lower case. Phrases that should always be capitalized, such as proper nouns, should be entered capitalized when a new phrase set is added.
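The capitalization rule can be illustrated with a small helper. How whirlDOC detects the start of a sentence is not described here, so the check below (empty preceding text, or preceding text ending in ".", "!", or "?") is an assumption, and `place_phrase` is a hypothetical name.

```python
def place_phrase(text_before, phrase):
    """Capitalize a chosen phrase only when it starts a sentence.

    Assumption: a sentence starts at the beginning of the line or right
    after text that ends with '.', '!' or '?'.
    """
    preceding = text_before.rstrip()
    starts_sentence = preceding == "" or preceding.endswith((".", "!", "?"))
    if starts_sentence:
        # Upper-case only the first character; the rest is left unchanged.
        return phrase[:1].upper() + phrase[1:]
    return phrase

print(place_phrase("", "large"))             # Large
print(place_phrase("The dog is ", "large"))  # large
print(place_phrase("We visited ", "Paris"))  # Paris
```

Because the helper never lower-cases anything, a phrase stored with a capital letter, such as a proper noun, keeps that capital wherever it appears.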

Line Groups and Sentence Variants

A line group is a set of variant lines, one of which is usually randomly selected when spinning a document variation. The selection policy can be changed by changing element attributes, which are explained later. When one line is chosen at random, the lines in a line group typically have the same meaning expressed in different ways. The example below shows this.

On Monday he goes to the store.
He goes to the store on Monday.

The two lines are structured differently but are interchangeable. Either can be substituted for the other without changing the meaning. If most sentences in a document have variant forms like this, then spun variations will have a large number of possibilities at the sentence level and will remain perfectly readable.

Paragraphs

Paragraphs are sequences of line groups. The lines chosen for each of a paragraph’s line groups are usually combined to form a paragraph by placing a space between adjacent lines, but this behavior can be changed by altering attributes.
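The default behavior described above, choosing one line from each line group and joining the results with spaces, can be sketched as follows. The function name, the `separator` parameter, and the second line group's sentences are assumptions added for illustration.

```python
import random

def spin_paragraph(line_groups, rng, separator=" "):
    """Pick one line from each line group and join them in order."""
    chosen = [rng.choice(lines) for lines in line_groups]
    return separator.join(chosen)

# A paragraph is modeled here as a list of line groups,
# and each line group as a list of variant lines.
paragraph = [
    ["On Monday he goes to the store.", "He goes to the store on Monday."],
    ["He buys bread.", "Bread is what he buys."],  # illustrative lines
]

text = spin_paragraph(paragraph, random.Random(1))
print(text)
```

Passing a seeded `random.Random` instance makes a spun variation reproducible, which is convenient when testing.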

Paragraph Groups and Paragraph Variants

Paragraph groups can contain multiple paragraphs but often contain only one. If variations at the paragraph level are desired then multiple paragraphs can be put into a paragraph group. There are two typical types of paragraph variation. First, multiple paragraphs can be put into random order. Second, one of several paragraphs can be chosen. The first case is used when several points need to be made but their order is unimportant. The second case is used to choose between versions of a paragraph that are each written differently but explain the same thing. Attributes switch between the two cases.
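The two kinds of paragraph variation can be sketched side by side. The `mode` argument below is a hypothetical stand-in for whirlDOC's element attributes, whose real names are not given in this section.

```python
import random

def spin_paragraph_group(paragraphs, mode, rng):
    """Return the paragraphs a group contributes to one spun variation.

    mode is a hypothetical switch:
      "shuffle"    - keep every paragraph, in random order
      "choose_one" - keep a single randomly chosen paragraph
    """
    if mode == "shuffle":
        return rng.sample(paragraphs, k=len(paragraphs))
    if mode == "choose_one":
        return [rng.choice(paragraphs)]
    raise ValueError(f"unknown mode: {mode}")

points = ["Point A.", "Point B.", "Point C."]
rng = random.Random(42)

shuffled = spin_paragraph_group(points, "shuffle", rng)
single = spin_paragraph_group(points, "choose_one", rng)

print(shuffled)  # all three points, in one of six possible orders
print(single)    # exactly one of the three points
```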

The Document Root

The top of the hierarchy is the document root, which can be viewed as a container for the paragraph groups. Normally its paragraph groups are simply placed one after another.