GoFiler Legato Script Reference
Legato v 1.5d Application v 5.25a
Chapter Five — General Functions (continued)
The Word Parse Object supports a series of functions that are specifically tailored to “parse” or process textual data. The general-purpose parse runs in three modes: text, tags and program. The text mode provides word parsing for reading general text, the tags mode is tailored to work on XML, HTML or SGML tags and character entities, and the program mode is tailored for working with typical program or script text.
The text mode stops on word spaces, returns (line endings), and punctuation in the text. It is up to the script to perform any additional analysis.
The tag mode is a basic SGML parser. It does not contain robust checking and error recovery, nor are there any provisions for DTD support. For a more expansive SGML parser, use the SGML Object, which provides more complex and robust parsing. Tag mode does not handle tags that span multiple lines unless a complete buffer with line endings is handed to the parse object.
Tag mode can be used with SGML and tag functions such as GetTagElement or GetTagAttributes to retrieve the element and attributes of a parsed tag.
Two programming modes are provided that stop on basic program syntax such as “==”, “+”, “=”, etc. In addition, there is a program group mode that allows items such as data in parentheses, brackets and quotes to be grouped together.
Finally, an object notation parse mode allows for the parsing and separation of object names. This mode employs a much smaller set of stop delimiters. Note that in order to support loose JSON object names, this mode allows a number of programming delimiters to appear inside names. For example, “object.name” is broken into “object”, “.” and “name”, while “object.my-cat” is broken into “object”, “.” and “my-cat”, not stopping on the ‘-’ character.
The Word Parsing function is meant to be lightweight and fast. It can be used to quickly drive through large amounts of data.
The general steps are as follows:
– Create (get handle)
– Load/Set Data
– Iterate and Get Words/Item until End of Data (EOD).
New data can be repeatedly loaded to the same object to process multiple buffers or lines. After completion, the Word Parse Object handle should be closed.
As each item is parsed, the leading spaces and statistics are stored. For example, the caller can check to see if there are leading spaces and even return the raw space string.
The other routines pass additional information regarding word parse results such as the starting position of the item and the current parse position. The word/item buffer is limited to a maximum of 4,096 bytes.
Once the source data is set, the source variable can be changed or released. The Word Parse Object makes an internal copy of the data.
5.8.3 Setting Up a Parse Operation
The first action is to create a word parse object and retrieve a handle. That handle is then used in subsequent operations to move through the text and examine each parsed item.
For example:
    handle hWP;
    string s1, s2;
    int rc, flags, spaces, count, pos;

    s1 = "My favorite pastime is waiting for my browser to load a page.";

    hWP = WordParseCreate();
    if (hWP == NULL_HANDLE) {
      MessageBox('x', "Error on handle");
      exit;
      }

    WordParseSetData(hWP, s1);

    s2 = WordParseGetWord(hWP);
    while (s2 != "") {
      count++;
      pos = WordParseGetPosition(hWP);
      flags = WordParseGetResult(hWP);
      spaces = WordParseGetSpaceSize(hWP);
      AddMessage(" %3d %3d %08X %3d :%s:", count, pos, flags, spaces, s2);
      s2 = WordParseGetWord(hWP);
      }

    CloseHandle(hWP);
In this case, the parse object is created with the default mode (text). A string is added to the parse object and then each successive word is retrieved along with certain attributes.
Functions are provided to retrieve and change the parsing position. In addition, a parse object can be used over and over again, provided the parse mode remains the same.
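As a minimal sketch of reusing one parse object, the following loads two strings in turn into the same handle and lists the items from each. It uses only the calls shown in the example above. The sample strings and variable names are illustrative, and the sketch assumes that setting new data restarts parsing at the beginning of that data.

```legato
handle hWP;
string s1, s2, s3;
int count;

s1 = "First buffer of text.";
s2 = "Second buffer, same parse object.";

hWP = WordParseCreate();                  // default mode (text)
if (hWP == NULL_HANDLE) {
  MessageBox('x', "Error on handle");
  exit;
  }

WordParseSetData(hWP, s1);                // load the first buffer
s3 = WordParseGetWord(hWP);
while (s3 != "") {
  count++;
  AddMessage(" %3d :%s:", count, s3);
  s3 = WordParseGetWord(hWP);
  }

count = 0;                                // restart the item count (a choice for this sketch)
WordParseSetData(hWP, s2);                // assumption: new data replaces the old buffer
s3 = WordParseGetWord(hWP);
while (s3 != "") {
  count++;
  AddMessage(" %3d :%s:", count, s3);
  s3 = WordParseGetWord(hWP);
  }

CloseHandle(hWP);                         // close the handle once all buffers are processed
```

The handle is created and closed once, and the parse mode is never changed between buffers, which matches the condition stated above for reusing the object.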
Object Control:
Item Parse:
Item Statistics:
Related Functions:
© 2012-2024 Novaworks, LLC. All rights reserved worldwide. Unauthorized use, duplication or transmission prohibited by law. Portions of the software are protected by US Patents 10,095,672, 10,706,221 and 11,210,456. GoFiler™ and Legato™ are trademarks of Novaworks, LLC. EDGAR® is a federally registered trademark of the U.S. Securities and Exchange Commission. Novaworks is not affiliated with or approved by the U.S. Securities and Exchange Commission. All other trademarks are property of their respective owners. Use of the features specified in this language are subject to terms, conditions and limitations of the Software License Agreement.