GoFiler Legato Script Reference
Legato v 1.5d Application v 5.25a
Chapter Five — General Functions (continued)
Overview
The GetWordType function analyzes the content of a provided word and returns the type and attributes.
Syntax/Parameters
Syntax
dword = GetWordType ( string data );
Parameters
data
A string containing a word with no spaces.
Return Value
Returns a dword containing the attributes from the word scan.
Remarks
The GetWordType function scans the content of data and returns a composite bitwise value with the type and attributes of the word:
Definition | Bitwise | Description
Item Types
WT_TYPE_ITEM_MASK | 0x000F0000 | Item Type Mask
WT_TYPE_UNKNOWN | 0x00000000 | Unknown Value
WT_TYPE_WORD | 0x00010000 | Word (dog, cat, monkey)
WT_TYPE_NUMBER | 0x00020000 | Number
WT_TYPE_NUMBER_SERIAL | 0x00030000 | Serial Number (12, 63)
WT_TYPE_LEADER | 0x00040000 | Leader Line
WT_TYPE_RULER | 0x00050000 | Ruler (possible or dash, nil)
WT_TYPE_CURRENCY_LEADER | 0x00060000 | Opening Currency “$ 1,121”
WT_TYPE_NIL | 0x00070000 | Nil or Compound Nil “--(a)” or “—” or “$-”
WT_TYPE_DATE | 0x00080000 | Date “12/12/12”, “12.12.12”, “23:22” or ISO
Word Variations
WT_WORD_MASK | 0x00700000 | Word Type Mask
Types
WT_WORD_UNKNOWN | 0x00000000 | Unknown or General Word Type
WT_WORD_LOWER | 0x00100000 | Lower Case Word
WT_WORD_UPPER | 0x00200000 | Upper Case Word
WT_WORD_INITIAL | 0x00300000 | Initial Capital
Word Flags
WT_WORD_TRAIL_MASK | 0x000000FF | Punctuation (low in char)
WT_WORD_TRAIL_PUNCTUATION | 0x00800000 | Trails Punctuation (in low char)
WT_WORD_QUOTED | 0x01000000 | Word Quoted (can be partial)
WT_WORD_IN_HOLE | 0x02000000 | Word has Parenthesis or Brackets
WT_WORD_LEADER_TRAIL | 0x04000000 | Word has a Trailing Leader Line
Lexicon
WT_WORD_LEXICON_MASK | 0x70000000 | Lexicon Mask
WT_WORD_DATE_MONTH | 0x10000000 | Word is in Month Lexicon
WT_WORD_DATE_DAY | 0x20000000 | Word is in Day Lexicon
WT_WORD_HONORIFIC | 0x30000000 | Word is in Honorific Lexicon
Number Variations
WT_NUMBER_ALIGN_MASK | 0x000000FF | Alignment Position at Size
Types
WT_NUMBER_MASK | 0x00700000 | Number Type Mask
WT_NUMBER_UNKNOWN | 0x00000000 | Unknown Type
WT_NUMBER_YEAR | 0x00100000 | Number is Year (1900-2099)
WT_NUMBER_DAY | 0x00200000 | Number is Day (1-31)
WT_NUMBER_FORMATTED | 0x00300000 | Number is Formatted
WT_NUMBER_LIST | 0x00400000 | Part of a List (1-99 with trail)
Number Flags
WT_NUMBER_NEGATIVE | 0x01000000 | Negative Number (000) or -000
WT_NUMBER_IN_HOLE | 0x02000000 | Negative Number (000)
WT_NUMBER_FOOTNOTE | 0x04000000 | Has Footnote
WT_NUMBER_CURRENCY | 0x08000000 | Has Currency
WT_NUMBER_PERCENT | 0x10000000 | Has Percent
WT_NUMBER_IN_HOLE_ERROR | 0x20000000 | Error in Parenthetical
WT_NUMBER_BAD_FORMAT | 0x40000000 | Bad Format (characters, not structure)
Leader Variation
WT_LEADER_SIZE_MASK | 0x00000FFF | Word Type Mask (character in bottom)
Ruler Variations
WT_RULER_MASK | 0x00700000 | Drawing Character in the Lower 8-bits
WT_RULER_CHARACTER | 0x000000FF | Mask for Ruler Character
Ruler Types
WT_RULER_MIXED | 0x00000000 | Of Indeterminate Type
WT_RULER_SUBTOTAL | 0x00100000 | Subtotal Type
WT_RULER_TOTAL | 0x00200000 | Total Type
Ruler Flags
WT_RULER_DASH | 0x01000000 | Possible Connecting Dash
Date Variations
WT_DATE_MASK | 0x0F000000 | Date Code Mask
WT_DATE_AS_GENERAL | 0x00000000 | Date as Any Type (short mm/yy not supported)
WT_DATE_ISO_8601 | 0x01000000 | Date as ISO (in part, w w/o time)
WT_DATE_TIME_ONLY | 0x02000000 | A Time with Optional AM/PM
Unknown Word Data
WT_UNKNOWN_ALPHA | 0x0000000F | Alpha Count
WT_UNKNOWN_NUMERIC | 0x000000F0 | Numeric Count
WT_UNKNOWN_CURRENCY | 0x00000300 | Currency Count (4)
WT_UNKNOWN_PUNCTUATION | 0x00000C00 | Sentence Punctuation Count (4)
WT_UNKNOWN_COMMA_PERIOD | 0x00003000 | Comma Period Count
WT_UNKNOWN_GROUP | 0x0000C000 | Parenthesis/Brace Group
WT_UNKNOWN_QUOTE | 0x00300000 | Quote Character Count
WT_UNKNOWN_FOOTNOTE | 0x00C00000 | Footnote Type Characters
WT_UNKNOWN_RULE | 0x03000000 | Rule Character Count
WT_UNKNOWN_ELLIPSE | 0x0C000000 | Ellipse Count
WT_UNKNOWN_OTHER | 0x30000000 | Other Count
The item type can be filtered by ANDing the result with the WT_TYPE_ITEM_MASK value:
    code = GetWordType(word);
    switch (code & WT_TYPE_ITEM_MASK) {
      case WT_TYPE_UNKNOWN:
        break;
      case WT_TYPE_WORD:
        break;
      case WT_TYPE_NUMBER:
        break;
      case WT_TYPE_NUMBER_SERIAL:
        break;
      case WT_TYPE_LEADER:
        break;
      case WT_TYPE_RULER:
        break;
      case WT_TYPE_CURRENCY_LEADER:
        break;
      case WT_TYPE_NIL:
        break;
      case WT_TYPE_DATE:
        break;
      }
Each case section can then count or act upon the details of the item.
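Once the item type is known, the remaining bits can be tested with the matching masks from the table. The following is a minimal sketch rather than SDK code: the helper names is_capitalized_word and trailing_punctuation are hypothetical, and reading the low byte under WT_WORD_TRAIL_MASK as the punctuation character is an assumption drawn from the table description.

```legato
// Hypothetical helper (not part of the SDK): returns 1 when the word is a
// WT_TYPE_WORD item whose word type is Initial Capital, otherwise 0.
int is_capitalized_word(string word) {
    dword code;

    code = GetWordType(word);
    if ((code & WT_TYPE_ITEM_MASK) != WT_TYPE_WORD) {
        return 0;                           // not a word item
    }
    if ((code & WT_WORD_MASK) == WT_WORD_INITIAL) {
        return 1;                           // initial capital
    }
    return 0;                               // lower, upper, or general
}

// Hypothetical helper: returns the low byte selected by WT_WORD_TRAIL_MASK
// when the trailing punctuation flag is set, otherwise 0.
// Assumption: that byte holds the trailing punctuation character.
dword trailing_punctuation(string word) {
    dword code;

    code = GetWordType(word);
    if ((code & WT_TYPE_ITEM_MASK) != WT_TYPE_WORD) {
        return 0;                           // word flags apply to words only
    }
    if ((code & WT_WORD_TRAIL_PUNCTUATION) == 0) {
        return 0;                           // flag not set
    }
    return code & WT_WORD_TRAIL_MASK;
}
```

Both helpers check the item type first because several masks share bit positions across item types (for example, WT_WORD_MASK and WT_NUMBER_MASK are both 0x00700000), so a variation bit is only meaningful once the item type is established.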
The GetWordType function is useful for aggregating information from a text stream to perform high-level analysis. For example, a line of text can be parsed, information accumulated, and the first and last word data examined to determine the probability of the line being a heading, part of a paragraph, or perhaps a row of a table.
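As a rough illustration of accumulating information across a line, the sketch below scores one word at a time. The helper name table_vote and the scoring rule are hypothetical and deliberately simplistic; they are not how GoFiler itself classifies lines.

```legato
// Hypothetical helper (not part of the SDK): votes on whether a word looks
// like table cell content.
// Returns  1 for a number, opening currency, or nil item,
//         -1 for a plain word,
//          0 for anything else.
int table_vote(string word) {
    dword item;

    item = GetWordType(word) & WT_TYPE_ITEM_MASK;
    if (item == WT_TYPE_NUMBER) {
        return 1;
    }
    if (item == WT_TYPE_CURRENCY_LEADER) {
        return 1;
    }
    if (item == WT_TYPE_NIL) {
        return 1;
    }
    if (item == WT_TYPE_WORD) {
        return -1;
    }
    return 0;
}
```

A caller could sum table_vote over each word of a line and treat a positive total as a hint that the line is a table row. A fuller analysis would also weigh the first and last words separately, as described above.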
Analysis is performed at a coarse level: the types of characters are counted and the counts are then run through logic to classify the word. For example, if one or two dashes are present without text, the content is considered a “nil” value, as would be seen in a table. Three dashes, on the other hand, are considered a possible rule or visual aid.
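The distinction between a nil value and a rule can be checked from the returned code. The sketch below uses a hypothetical helper, dash_kind, and assumes the ruler type bits are read through WT_RULER_MASK, since 0x00700000 covers the ruler type values listed in the table. What the scanner actually reports for a given string should be confirmed by running it.

```legato
// Hypothetical helper (not part of the SDK): classifies a dash-like word.
// Returns 0 = neither nil nor ruler
//         1 = nil value
//         2 = ruler of indeterminate (mixed) type
//         3 = subtotal ruler
//         4 = total ruler
int dash_kind(string word) {
    dword code;
    dword item;

    code = GetWordType(word);
    item = code & WT_TYPE_ITEM_MASK;
    if (item == WT_TYPE_NIL) {
        return 1;
    }
    if (item != WT_TYPE_RULER) {
        return 0;
    }
    // Assumption: WT_RULER_MASK isolates the ruler type bits.
    if ((code & WT_RULER_MASK) == WT_RULER_SUBTOTAL) {
        return 3;
    }
    if ((code & WT_RULER_MASK) == WT_RULER_TOTAL) {
        return 4;
    }
    return 2;
}
```

For a ruler, the drawing character itself should be available as code & WT_RULER_CHARACTER, per the table.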
Other functions, such as GetNumericType and GetListItemType, can return more detail about a number.
Pass the word to the GetWordType function without spaces. If the Word Parse Object is employed in WP_GENERAL mode, the words it returns are suitable for this analysis. See the WordParseCreate function.
Related Functions
Platform Support
Go13, Go16, GoFiler Complete, GoFiler Corporate, GoFiler, GoFiler Lite, GoXBRL
Legato IDE, Legato Basic
© 2012-2024 Novaworks, LLC. All rights reserved worldwide. Unauthorized use, duplication or transmission prohibited by law. Portions of the software are protected by US Patents 10,095,672, 10,706,221 and 11,210,456. GoFiler™ and Legato™ are trademarks of Novaworks, LLC. EDGAR® is a federally registered trademark of the U.S. Securities and Exchange Commission. Novaworks is not affiliated with or approved by the U.S. Securities and Exchange Commission. All other trademarks are property of their respective owners. Use of the features specified in this language are subject to terms, conditions and limitations of the Software License Agreement.