Document Tools

The tools are accessed from the application’s "Tools" drop-down menu.

Keyword Density

The keyword density tool displays what percentage of words in spun documents are keywords, which are set in the document’s settings. The tool calculates percentages by spinning hundreds of document variations and counting the number of keywords versus the total number of words. So the percentages displayed are typical of an average document variation.

Usually the combined density of a keyword and its variants matters more than the density of a single keyword. For this reason, whirlDOC supports variants of keywords, and the density tool calculates the density of all variants. This makes it easy to cover keywords that appear in spintax, singular and plural forms, and forms that cannot go in a spintax phrase set because they are not grammatically interchangeable with the other phrases in that set.

The keyword density dialog is displayed below. It shows two keywords that each have variants. The variants are listed below the primary keyword. The first keyword is "zombies." It has the variants "walking dead" and "undead." The second keyword is "lunch break." It has the variants "noon" and "midday"; the two-word spelling "mid day" could be added as another variant so that it is counted as well.

Keyword density

The "lunch break" keyword is shown with a percentage of 0.28%. This means that 0.28% of the words in an average document variation are "lunch break." 0.57% is shown for all variants of "lunch break". This 0.57% includes "lunch break."
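The calculation described above can be sketched in a few lines of Python. This is only an illustration, not whirlDOC's implementation: the `spin`, `count_phrase`, and `keyword_density` functions, the `{a|b}` spintax syntax, and the sample template are all assumptions. The sketch also assumes that each occurrence of a keyword counts once, however many words it contains.

```python
import random
import re

SEGMENT = re.compile(r"\{([^{}]*)\}")  # assumed spintax form: {option|option|...}

def spin(template, rng):
    """Replace each spintax segment with one randomly chosen option."""
    text = template
    while True:
        match = SEGMENT.search(text)
        if match is None:
            return text
        choice = rng.choice(match.group(1).split("|"))
        text = text[:match.start()] + choice + text[match.end():]

def count_phrase(words, phrase):
    """Count occurrences of a phrase (one or more words) in a list of words."""
    target = phrase.lower().split()
    n = len(target)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)

def keyword_density(template, keyword, variants, samples=500, seed=0):
    """Return (primary %, primary plus variants %) averaged over spun variations."""
    rng = random.Random(seed)
    total_words = primary_hits = all_hits = 0
    for _ in range(samples):
        words = re.findall(r"[a-z0-9']+", spin(template, rng).lower())
        total_words += len(words)
        primary = count_phrase(words, keyword)
        primary_hits += primary
        all_hits += primary + sum(count_phrase(words, v) for v in variants)
    return 100.0 * primary_hits / total_words, 100.0 * all_hits / total_words

# Hypothetical document text, used only to exercise the functions.
template = "The {zombies|undead|walking dead} left for lunch at {noon|midday}."
primary, combined = keyword_density(template, "zombies", ["undead", "walking dead"])
print(f"zombies: {primary:.2f}%  all variants: {combined:.2f}%")
```

As in the dialog, the second figure includes the primary keyword, so it can never be smaller than the first.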

Variation Uniqueness

The variation uniqueness tool displays scores that rate the uniqueness of document variations. These scores can be considered a rating of whether variations will be seen as containing content copied from other variations. The scores are computed by spinning several hundred document variations, then comparing them against each other by finding sections of shared text. The more text that is shared, the less unique a variation is and the lower the uniqueness scores will be. So higher scores indicate that variations will have a lower chance of being flagged by search engines as duplicate content. The algorithm used is similar to those used by plagiarism detection services.

Variation uniqueness dialog

The uniqueness dialog looks similar to the picture above. It has three scores: a phrase score, a structure score, and a composite score. Each ranges from zero to one hundred percent. A score of zero indicates that variations of the current document will all be the same, and one hundred indicates that variations will be completely different from one another.

The phrase score measures the variety of phrases and includes the effect of spintax. To increase it, add more spintax to the document or add more phrases to existing spintax segments. Converting more words to spintax is generally more effective than adding more phrases. Note that the uniqueness algorithm works similarly to common search engine algorithms in that words that add little meaning are ignored. These words, usually called "stop words," number about two hundred and include words like "a", "the", and "which". The stop words are removed before variations are compared to each other, so when adding spintax concentrate on the significant words, which are usually called "content words."
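To make the comparison concrete, here is a minimal Python sketch of one common way to measure shared text: remove stop words, break each variation into overlapping runs of words ("shingles"), and measure how many runs two variations have in common. This is an assumed stand-in, not whirlDOC's actual algorithm. The `shingles` and `uniqueness` functions, the run length of three words, and the short stop-word list are all hypothetical.

```python
import re
from itertools import combinations

# Tiny illustrative subset; a real stop-word list has about two hundred entries.
STOP_WORDS = {"a", "an", "the", "which", "of", "to", "and", "is", "in"}

def shingles(text, size=3):
    """Return the set of runs of `size` consecutive words, ignoring stop words."""
    words = [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOP_WORDS]
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def uniqueness(variations, size=3):
    """Return a 0 to 100 score: 100 minus the mean pairwise shingle overlap in percent."""
    if len(variations) < 2:
        raise ValueError("need at least two variations to compare")
    overlaps = []
    for a, b in combinations(variations, 2):
        sa, sb = shingles(a, size), shingles(b, size)
        union = sa | sb
        # Two texts with no shingles at all are treated as identical.
        overlaps.append(len(sa & sb) / len(union) if union else 1.0)
    return 100.0 * (1.0 - sum(overlaps) / len(overlaps))

# These two differ only in stop words, so they score as identical.
print(uniqueness(["the zombies took a lunch break",
                  "zombies took which lunch break"]))
```

Because stop words are dropped before the comparison, swapping "a" for "the" changes nothing in this sketch, which is why spintax on significant words is what moves the score.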

The structure score measures the effect of structural changes from sentence and paragraph variants. The measurement is done without the effect of spintax. To increase this score, add more variants. Adding paragraph variants is usually more beneficial than adding sentence variants. Writing variants that dramatically differ from existing ones is more effective than making variants by grammatically rearranging text.

The composite score takes into account both phrase uniqueness and structural uniqueness. This score can be viewed as a single number rating how well a document’s variations will avoid being detected as duplicate content.

Understanding Variation Uniqueness

To understand variation uniqueness, consider a document with no spintax, no sentence variants, and no paragraph variants. Every variation spun will be exactly the same as other variations. Variations will have zero uniqueness.

If words are replaced with spintax then variations will have words that differ from other variations. The more words and phrases that are replaced with spintax, the more variations will differ from one another; but there is a limit to how unique variations can be when using only spintax. For example, if three consecutive words are converted to three spintax segments, each with two phrases, then on average one in eight (2x2x2) variations will share the same three words. So even with spintax a fraction of variations will share short strings of words. Using lots of spintax just reduces the number of strings shared between variations and the percentage of variations that share the same string.
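The one-in-eight figure can be checked by listing every combination. The Python sketch below assumes that each phrase in a segment is equally likely and that segments are chosen independently. The `share_probability` function and the example phrases are hypothetical.

```python
from fractions import Fraction
from itertools import product

def share_probability(segments):
    """Chance that two independently spun variations pick the same phrase in every segment."""
    combos = list(product(*segments))
    matching_pairs = sum(a == b for a in combos for b in combos)
    return Fraction(matching_pairs, len(combos) ** 2)

# Three consecutive spintax segments with two phrases each.
segments = [["big", "large"], ["angry", "furious"], ["zombie", "ghoul"]]
print(len(list(product(*segments))), share_probability(segments))  # prints: 8 1/8
```

Adding a third phrase to each segment lowers the chance to one in twenty-seven, which reduces sharing but never removes it.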

Sentence and paragraph variants allow variations to be more unique than spintax alone does. Variants of different lengths shift the location of the text that follows them, which makes it harder to relate a section of text in one variation to the same section in another. This effect is stronger when spintax is also used to change words. The real power of variants lies in the ability to rewrite sections of a document so its variations bear little resemblance to variations using other variants. For example, a paragraph that makes a point using a two or three sentence metaphor can be rewritten to use another metaphor. Variations would then appear much more unique than they would after a simple grammatical rearrangement.

Document Statistics

The document statistics dialog displays information about the current document. It looks similar to the following picture.

Document statistics

The statistics for minimum, maximum, and mean number of words are determined by Monte Carlo simulation. Many document variations are spun and the statistics are calculated from those variations. This means the figures are not exact. They tend to underestimate the maximum words and overestimate the minimum words, so during spinning there is a small chance that spun variations could have more words than the maximum shown or fewer than the minimum shown.
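The following Python sketch shows why sampled figures can fall short of the true extremes. It is an illustration under stated assumptions, not whirlDOC's code: the `{a|b}` spintax syntax, the function names, and the sample template are hypothetical, segments are assumed to be flat (not nested), and each segment is assumed to be separated from neighboring words by spaces or punctuation.

```python
import random
import re

SEGMENT = re.compile(r"\{([^{}]*)\}")  # assumed spintax form: {option|option|...}

def count_words(text):
    return len(re.findall(r"[A-Za-z0-9']+", text))

def spin(template, rng):
    """Replace each spintax segment with one randomly chosen option."""
    return SEGMENT.sub(lambda m: rng.choice(m.group(1).split("|")), template)

def sampled_word_stats(template, samples=200, seed=0):
    """Estimate (min, max, mean) word counts from randomly spun variations."""
    rng = random.Random(seed)
    counts = [count_words(spin(template, rng)) for _ in range(samples)]
    return min(counts), max(counts), sum(counts) / len(counts)

def exact_word_bounds(template):
    """True (min, max) word counts, found by taking the shortest and longest option per segment."""
    low = high = count_words(SEGMENT.sub(" ", template))
    for match in SEGMENT.finditer(template):
        lengths = [count_words(option) for option in match.group(1).split("|")]
        low += min(lengths)
        high += max(lengths)
    return low, high

template = "The {zombies|walking dead} took a {short|very long and lazy} break."
print(exact_word_bounds(template))         # true bounds
print(sampled_word_stats(template, 200))   # sampled min, max, and mean
```

The sampled minimum can never be below the true minimum and the sampled maximum can never exceed the true maximum. With many segments, the variation that picks the longest option everywhere becomes rare, so a finite sample may miss it.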

The minimum and maximum number of characters shown are exact. The mean is calculated by simulation. Note that the lower right of the status bar at the bottom of the editor shows the maximum number of characters for the entire document and for the paragraph the cursor is on, so those values can be seen without opening the statistics dialog. This part of the status bar is shown below.

Maximum characters display

Copy Log

The "Copy Log" tool is used for sending bug reports to Creative Adept, the maker of whirlDOC. It copies a log file kept by the application to a file where it can be emailed to Creative Adept. The dialog is shown below. It has a section to select which log to copy and a control to set where to write the log file.

Copy log dialog

The normal log is kept for every run of whirlDOC. It consists of the application’s configuration, information about the system the application is run on, and an audit trail of editing operations that were performed. Every run of the program overwrites the prior log.

An error log is created when whirlDOC detects an internal error. It consists of the normal log for that run plus information about the error. This log can be used by Creative Adept for postmortem analysis of the error. When an error log is available, a small graphic "E" symbol will be displayed at the right end of the application’s status bar. This is illustrated below. The "Error Log" option is not enabled in the dialog if an error log is not available.

Error indicator on status bar

If you encounter an error or see the "E" symbol at the bottom of the editor, Creative Adept would be grateful to be notified of the error. Use the "Contact Us" page at Creative Adept’s website (creativeadept.com). An email address to send a log file will be provided.

To send an error report, select the log file type on the dialog. The error log, if it is available, is the most useful. Select a location where the log will be copied. Use a location that can be easily found, like the desktop. Attach the log file to an email sent to the address provided by Creative Adept.