Legato

GoFiler Legato Script Reference

 

Legato v 1.5d

Application v 5.25a

  

 

Chapter Three: Data Types and Operators (continued)

3.6 Dimensional Data

3.6.1 What is Dimensional Data

Data can often be organized along multiple dimensions. This is a useful construct for information that lends itself to an ordered layout, such as a list, a table, or a cube. In Legato, an array of a particular data type can be declared as follows:

int data[10];

where the “[10]” notation signifies an array of ten integers. Specifying a size with an integer literal creates a fixed array. Declaring an array without indicating a size in one or more dimensions creates an auto-allocatable array.

A particular item in the array can be referenced by its zero-based index or its key name:

data[1] = 5;

data["item1"] = 5;

A key name can be any string value that is used to reference a specific position in an array. The key is resolved by the script engine to an integer index. For example, consider an integer array of ten items:

data[0] = 1;

data[1] = 2;

data["item3"] = 3;

data[9] = 10;

The script engine fills the next empty space in the array with the value 3 and sets the key item3 equivalent to that space’s numeric index. Once set, the key name cannot be altered without using special array management functions.
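The sequence above can be placed in a small script to show that a key name and its resolved index refer to the same element. This is a minimal sketch under stated assumptions: the main entry point and the AddMessage logging call are not described in this section, and the expectation that item3 resolves to index 2 follows only from the "next empty space" rule stated above.

```legato
int main() {
    int data[10];

    data[0] = 1;
    data[1] = 2;
    data["item3"] = 3;                 // key is resolved to the next empty position
    data[9] = 10;

    // Read the element back by its key name.
    AddMessage("By key:   %d", data["item3"]);

    // Assumption: the next empty position was index 2, so this reads the same element.
    AddMessage("By index: %d", data[2]);

    return 0;
}
```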

Arrays can also be multi-dimensional:

int data[10][5];

This defines an array that is 10 items by 5 items. A maximum of three dimensions (x, y, and z) is supported. Multi-dimensional arrays can be useful for organizing information in a format that is similar to a spreadsheet (row by column). A single-dimension array is considered a list, a two-dimension array a table, and a three-dimension array (3D matrix) a cube. For the most part, these conventions are not important except that certain functions are named for the dimensionality they operate on. For example, to sort an array, the SortList function sorts a single-dimension variable while SortTable sorts a two-dimension variable.

Strings are automatically arrays of characters. The following two declarations:

string s1;

char s2[];

are functionally very similar. See Section 3.5 Strings Versus Characters for more information on the differences between the string data type and character arrays. When using a multi-dimensional array of strings, the last (right-most) index references a particular character within a string. This means that, for an array of strings, the construct:

a[x][y][z][i]

references character i of the string at position (x, y, z).
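The same rule applies to arrays with fewer dimensions: the indices that follow the declared dimensions select a character. The fragment below is a minimal sketch only; the main entry point and the AddMessage call are assumptions not described in this section, and printing the character as a numeric code with %d is likewise assumed to work.

```legato
int main() {
    string names[2][3];                // table of strings: 2 rows by 3 columns
    char   c;

    names[1][2] = "Legato";

    // The first two indices select the string; the last selects a character in it.
    c = names[1][2][0];                // character 0 of the string stored above

    AddMessage("Character code: %d", c);
    return 0;
}
```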

A number of terms are defined in the SDK for axis specification: AXIS_X (0), AXIS_Y (1), and AXIS_Z (2) for accessing data in multiple-dimension format, and AXIS_ROW (0) and AXIS_COL (1) for using table conventions. The two sets mean essentially the same thing; it is simply easier to think of a table in terms of rows and columns. AXIS_ALL (-1) is also defined to allow referencing of all available axes.

In some cases, y can be thought of as the up/down axis. For tables, however, rows are considered the x position. Leaving the row dimension unsized is also the most efficient method of auto-allocation. For example:

string list[][5];

list = CSVReadTable("my file.csv");

The CSVReadTable function will read CSV data into a table of at least 5 columns and effectively unlimited depth. If the incoming data is wider than 5 columns, the table will be expanded.

3.6.2 Allocation

Fixed arrays are stored contiguously in memory and therefore cannot be resized once they are declared. Conversely, auto-allocatable arrays are not necessarily stored contiguously in memory but can grow dynamically. Although powerful and convenient, auto-allocatable arrays should be handled with care, particularly when passing information to external programs or files. Because the data contained in the array is not necessarily stored in adjacent memory, a binary “dump” of an auto-allocatable array may produce errors and undesirable behavior. The string data type as an array, by its nature, is not stored in a contiguous stream in memory. Rather, the array is made up of a series of string control elements with the actual string data being stored in a separate pool.

Fixed arrays are also stored in a manner that reduces size, while auto-allocated arrays start at 4 KB and grow in increments of at least 32 KB. When repeatedly calling a routine that has one or more local arrays, a fixed array is preferred. For example:

string aa[32];

aa = GetTagAttributes(tag);

is much faster than:

string aa[];

aa = GetTagAttributes(tag);

If the variable aa is local, it will be instantiated each time the subroutine or function is called. For the fixed array version of the variable, the overhead is considerably less since Legato will not be creating separate memory space and then discarding it at the exit of the routine. Of course, the GetTagAttributes function could return an unknown number of items. If this is the case, defining aa as a global variable will reduce overhead. Alternatively, declare the array with a large number of elements, for example 200. For casual operations (e.g., fewer than a million iterations) this is not generally a concern, but it can start making a large difference in performance as the iteration count and the number of auto-allocated arrays increase.
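The two alternatives just described can be sketched side by side. This is an illustration, not a measured benchmark: the helper names count_attributes_fixed and count_attributes_global are hypothetical, tag is assumed to be a string, and ArrayGetAxisDepth is assumed to accept a list with AXIS_X in the same way it accepts a table.

```legato
string attrs[];                        // global: allocated once for the life of the script

// Hypothetical helper, called many times: a fixed local array avoids
// creating and discarding auto-allocated storage on every call.
int count_attributes_fixed(string tag) {
    string aa[200];                    // generous fixed size, as suggested above
    aa = GetTagAttributes(tag);
    return ArrayGetAxisDepth(aa, AXIS_X);
}

// Alternative when the item count is truly unknown: reuse the global array.
int count_attributes_global(string tag) {
    attrs = GetTagAttributes(tag);
    return ArrayGetAxisDepth(attrs, AXIS_X);
}
```

The fixed version trades a known upper bound for lower per-call overhead; the global version keeps the array auto-allocatable but pays the allocation cost only once.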

Note that including one dimension of auto-allocatable data or defining one dimension as auto-allocatable will force the entire array to be auto-allocatable.

The maximum size of any dimension is limited to 32 bits (0xFFFFFFFF or 4,294,967,295 elements). This well exceeds the maximum addressable memory for a 32-bit application.

As data is added to an array, the last position of each dimension is tracked. This is referred to as the array axis depth. This differs from the allocated array size. As data is added, the depth is adjusted. The ArrayGetAxisDepth function will return the number of elements. Referring to the above CSV table example:

elements = ArrayGetAxisDepth(list, AXIS_ROW);

will return the number of rows loaded. The number of elements allocated can be obtained by using the ArrayGetAxisSize function. When the array is auto-allocated, the allocation includes additional padding, so the size will generally be the depth value plus some amount. Auto-allocation always favors the x or row axis in terms of extra space.
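The difference between depth and size can be observed by reporting both values for the CSV table. The following is a minimal sketch: the main entry point and the AddMessage call are assumptions not covered in this section, and ArrayGetAxisSize is assumed to take the same parameters as ArrayGetAxisDepth.

```legato
int main() {
    string list[][5];
    int    rows, cols, allocated;

    list = CSVReadTable("my file.csv");

    rows      = ArrayGetAxisDepth(list, AXIS_ROW);   // rows actually loaded
    cols      = ArrayGetAxisDepth(list, AXIS_COL);   // columns actually loaded
    allocated = ArrayGetAxisSize(list, AXIS_ROW);    // rows allocated, including padding

    AddMessage("Loaded %d rows by %d columns; %d rows allocated", rows, cols, allocated);
    return 0;
}
```

Loops over the table should be bounded by the depth values, not the allocated size, since positions beyond the depth hold no loaded data.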

A final note on string allocation. A standalone string will always be over-allocated by a limited amount. Depending on the number of times the string is expanded by the script, the over-allocation will be increased to improve program efficiency and to avoid thrashing memory. If the string is part of an array of strings, the initial string is not over-allocated since: (a) arrays tend to store and relay information such that strings do not generally expand once loaded; and, (b) over-allocating dimensional strings can significantly waste memory. However, once an array string entry is expanded, the same over-allocation rules will be applied.

3.6.4 Garbage Collection

Legato does not generally need to perform garbage collection. However, global variables and arrays do not return their memory to the system (application) until the script exits. For large global arrays and global strings, there can be a slow drain on system resources. Long-running scripts (e.g., running for hours or days) should take this into account.

Strings reuse areas of the internal data pool and will give up areas during reallocation. If a string contracts, the space is not returned to the script engine or to the operating system, but rather is reserved for future expansion of that specific variable. For locally declared variables, space is always returned to the internal pool on function exit. For global strings, the space is only returned to the application or system on script termination.

3.6.3 Key Names

Key names are a convenient method of organizing data within a dimensional value. Many API functions return complex data using arrays with key names.

The names themselves have some limitations:

–  Names can be up to 64 characters in length.

–  They cannot have leading or trailing spaces.

–  They cannot have control characters.

Key names can also be used for CSV headers and other information. As such, programmers should exercise caution when interacting with external data.