LDC #170: Adding Dot Fill Leaders With Embedded Images
Tuesday, June 16, 2020
One common request I get from clients is to add dot leaders to certain kinds of tables. Dot leaders are common and look great in documents, but they are deceptively difficult to do in HTML. With regular typesetting software, you can fill white space with periods or other characters using a variety of methods that are relatively easy to employ, because typeset documents have set widths and fonts that don't vary with external factors. HTML, on the other hand, is highly dependent on the user's viewport settings.
Each user can size his or her screen and browser window to nearly any width. Fonts depend on what is installed on each computer. And there are quirks in the different browsers, including mobile browsers, that further complicate spacing and padding. As a result, typing a set number of period characters as a leader fill won't produce a leader that reliably reaches the edge of the cell: on one screen the dots fall short, and on another they wrap onto a new line. Setting fixed widths and specifying padding can help, but it is time-consuming, and we would need to verify that the leader looks the same on every viewing platform, which is no simple task.
Another solution is to give the table cell a background image: a single small dot, repeated across the entire width of the cell. This makes the cell look full of dots, even though the dots are only the cell's background. Our text will lie on top of that background, so the text needs an opaque white background of its own to hide the dot pattern behind it. The result is what looks like a dot leader following the text in our cell. By repeating the image along the x-axis, we create the illusion of a leader with one simple background image.
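As a minimal sketch of that layering, the Python function below builds the markup for one cell. It has two layers:
- an outer DIV whose background is the dot image, tiled along its bottom edge with `repeat-x bottom`;
- an inner FONT element with an opaque background that covers the dots directly behind the text.

The function name `leader_cell` and its parameters are hypothetical, chosen for illustration. The tag and style choices only loosely mirror the script's defines.

```python
def leader_cell(text: str, dot_uri: str, bg_color: str = "white") -> str:
    """Build <td> markup that makes `text` appear to be followed by dots.

    The outer DIV spans the cell's width and tiles the dot image along its
    bottom edge; the inner FONT is only as wide as the text and is painted
    with an opaque colour, hiding the dots directly behind the words.
    """
    outer_open = f'<div style="background: url({dot_uri}) repeat-x bottom;">'
    inner = f'<font style="background: {bg_color};">{text}</font>'
    return f"<td>{outer_open}{inner}</div></td>"


# Usage with a placeholder URI; a real one would carry the encoded dot image.
print(leader_cell("Total current assets", "data:image/gif;base64,AAAA"))
```

The dots only show where the FONT box ends, which is why the leader appears to start right after the text and run to the cell's edge at any window width.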
EDGAR, however, has additional requirements before this type of solution will work in EDGAR filings. The EDGAR system will suspend a filing that contains unreferenced images, and using an image only as a background is not enough: EDGAR requires the image to actually be used in an <IMG> tag to avoid an unreferenced image error. To avoid this problem without including our leader as an actual image file in the filing, we can use base64 encoding. This encodes our image as a string of text, which we can then embed directly in the background style as a data URI.
Base64 encoding is a method of converting any type of file into a block of text. It increases the size of whatever we're encoding by about a third (every three bytes become four characters), so we want to use it on a very small image to keep the string short. You can see in the example below that our image is a fairly short string of what looks like random characters; a larger image would produce a longer string. It's important to use as small an image as possible, since this string will be repeated on every row that has the dot leader effect.
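To see the encoding at work, the sketch below decodes the script's DOT_IMG string with Python's standard `base64` module. It shows the size trade-off: 69 bytes of image data become 92 characters of text. Checking the file signature also shows that the bytes are a 6x6 GIF89a image, so `image/gif` is the MIME type that matches this data in a data URI. The helpers `sniff_mime` and `to_data_uri` and the signature table are illustrative and are not part of the script.

```python
import base64

# The DOT_IMG value from the script's defines (split only for line length).
DOT_IMG = (
    "R0lGODlhBgAGAMIFAEdHR2dnZ+zs7Pb29v///wAAAAAAAAAAACH5BAEK"
    "AAcALAAAAAAGAAYAAAMKSLq8E0CoF5tVCQA7"
)

# Leading "magic" bytes of a few common image formats.
SIGNATURES = {
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def sniff_mime(data: bytes) -> str:
    """Return the MIME type implied by the file signature."""
    for signature, mime in SIGNATURES.items():
        if data.startswith(signature):
            return mime
    raise ValueError("unrecognized image format")


def to_data_uri(data: bytes) -> str:
    """Base64-encode raw image bytes as a data: URI with a matching MIME type."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{sniff_mime(data)};base64,{encoded}"


raw = base64.b64decode(DOT_IMG)
width = int.from_bytes(raw[6:8], "little")    # GIF logical screen width
height = int.from_bytes(raw[8:10], "little")  # GIF logical screen height
print(len(raw), "bytes ->", len(DOT_IMG), "characters")  # 69 bytes -> 92 characters
print(sniff_mime(raw), width, height)                    # image/gif 6 6
```

The same `to_data_uri` step is all that would be needed to turn any other small dot image into a replacement string.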
The script I wrote to demonstrate this solution requires a table cell or table column to be selected. If there is no such selection, it displays an error and exits. When run on a valid selection, it iterates from the bottom up through all table cells in the selection, wrapping the text inside each cell with the DIV and FONT tags that create the style we're going for. This is a very simple implementation, but it works well as a demo. Let's start by looking at our defined values.
#define INVALID_SELECT_MSG  "This function is meant to be used on table cell or table column selections. Make an appropriate selection and try again."
#define DOT_IMG             "R0lGODlhBgAGAMIFAEdHR2dnZ+zs7Pb29v///wAAAAAAAAAAACH5BAEKAAcALAAAAAAGAAYAAAMKSLq8E0CoF5tVCQA7"
#define OPEN_DIV            "<div style=\"margin-bottom: 1pt; background: url(data:image/gif;base64,"+DOT_IMG+") repeat-x bottom;\">"
#define CLOSE_DIV           "</div>"
#define BG_COLOR            "white"
#define BG_COLOR_PH         "$$$BGCOLOR$$$"
#define OPEN_FONT           "<font style=\"background:$$$BGCOLOR$$$\">"
#define CLOSE_FONT          "</font>"
The DOT_IMG define is probably the most unusual of the defined values we've used so far. It looks like a random string of characters, but it's the base64 encoding of our dot image. We use it inside our OPEN_DIV define to create an HTML <DIV> tag with a background image, without referencing an image file outside of our HTML file. The BG_COLOR define holds the background color for our text and is kept separate from the other defines to make it easy to modify. If the script is later changed to pick the color based on the cell's background instead of defaulting to white, we know exactly what to change. The BG_COLOR_PH define is a placeholder: a string we can find and replace with the color we actually want to use in our OPEN_FONT define.
One word of caution here: we're using defines for our tags because it's easy, but it's probably not the best approach. Using the SGML writer in GoFiler would be a much better (but more complex) option, since it could read the DTD of the document we're editing and make the tags match. Documents written by GoFiler for EDGAR, for example, generally use all upper-case tags, while our defines use lower-case tags so they're compliant with XHTML. This is a minor difference that generally won't cause a problem, but it is definitely an area that could be improved to make the script more universal.
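Until an SGML-writer version exists, one crude string-based stopgap for matching GoFiler's upper-case style is to upper-case just the element names in each define. The Python sketch below does this with a regular expression. It has clear limits:
- the function name `upper_tag_names` is hypothetical;
- it knows nothing about the DTD;
- it assumes well-formed markup with no literal "<" inside attribute values.

```python
import re

# "<" or "</" followed by an element name; "<!--" and "<!DOCTYPE" don't match.
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")


def upper_tag_names(markup: str) -> str:
    """Upper-case element names only; attribute names and values are left alone."""
    return TAG_NAME.sub(lambda m: m.group(1) + m.group(2).upper(), markup)


print(upper_tag_names('<div style="margin-bottom: 1pt">'))  # <DIV style="margin-bottom: 1pt">
```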
Now that the defines are out of the way, let's take a look at the run function, which is where all the actual processing is done.
/****************************************/
int run(int f_id, string mode, handle edit_window) {            /* Call from Hook Processor */
/****************************************/
    ... variable declarations omitted ...

    if (mode != "preprocess") {                                 /* if mode is not preprocess */
        return ERROR_NONE;                                      /* return no error */
    }
    if (edit_window == NULL_HANDLE) {                           /* if not passed a handle, get one */
        edit_window = GetActiveEditWindow();                    /* get handle to edit window */
    }
    if (IsError(edit_window)) {                                 /* if no valid edit window */
        MessageBox('x', "Cannot get edit window.");             /* display error */
        return ERROR_EXIT;                                      /* return */
    }
    type = GetEditWindowType(edit_window) & EDX_TYPE_ID_MASK;   /* get the type of the window */
    if (type != EDX_TYPE_PSG_PAGE_VIEW && type != EDX_TYPE_PSG_TEXT_VIEW) { /* make sure type is HTML or Code */
        MessageBox('x', "This is not an HTML edit window.");    /* display error */
        return ERROR_EXIT;                                      /* return error */
    }
    edit_object = GetEditObject(edit_window);                   /* create edit object */
The run function starts off simply, checking whether the mode is preprocess so that it executes only once instead of in both the preprocess and postprocess modes. Then it checks whether it was passed an edit window handle, as it is when called from our main function (which is designed to run on an open edit window from the IDE, for debugging purposes). If not, it gets the currently active edit window. If we can't get a valid edit window, we display an error message and exit. If we have a valid window, we get its type using the GetEditWindowType function and make sure it's either an HTML or Code View window. If not, we display an error and exit. Assuming we're still running, we then get the edit object with GetEditObject.
    smode = GetSelectMode(edit_object);                         /* get selection mode */
    if (smode == EDO_ARRAY_SELECT) {                            /* if we have an array selection */
        selections = GetSelectCount(edit_object);               /* get the number of selections */
    }
    else {                                                      /* if the user is not using array select */
        MessageBox('x', INVALID_SELECT_MSG);                    /* display error */
        CloseHandle(edit_object);                               /* release the edit object */
        return ERROR_NONE;                                      /* return with no error */
    }
    for (ix = selections - 1; ix >= 0; ix--) {                  /* for each selection, last to first */
        sx = GetSelectStartXPosition(edit_object, ix);          /* get selection start x */
        sy = GetSelectStartYPosition(edit_object, ix);          /* get selection start y */
        ex = GetSelectEndXPosition(edit_object, ix);            /* get selection end x */
        ey = GetSelectEndYPosition(edit_object, ix);            /* get selection end y */
        element = ReadSegment(edit_object, sx, sy, ex, ey);     /* read the cell's text */
        rc = GetLastError();                                    /* get last error */
        if (IsError(rc)) {                                      /* if it couldn't read the element */
            MessageBox('x', "Could not read HTML element, aborting."); /* print error */
            CloseHandle(edit_object);                           /* release the edit object */
            return ERROR_EXIT;                                  /* return error */
        }
        wx = 1;                                                 /* start parsing after the open cell tag */
        words = WordsToArray(element, WP_SGML_TAG);             /* split cell into words and tags */
        size = ArrayGetAxisDepth(words);                        /* get number of words */
        content = words[0];                                     /* first word is the open cell tag */
        openfont = ReplaceInString(OPEN_FONT, BG_COLOR_PH, BG_COLOR); /* set up open font */
        output = content + OPEN_DIV + openfont;                 /* set beginning of output */
        while (wx < size) {                                     /* while we have more content */
            content = words[wx];                                /* get the next word */
            if (wx == size - 1) {                               /* if this is the close cell tag */
                output += CLOSE_FONT + CLOSE_DIV;               /* close our font and div tags first */
            }
            if (IsSGMLCharacterEntity(content) == false) {      /* if not a character entity */
                output = output + " ";                          /* add spacer to output */
            }
            output = output + content;                          /* append content to output */
            wx++;                                               /* increment parse counter */
        }
        ArrayClear(words);                                      /* clear words array */
        WriteSegment(edit_object, output, sx, sy, ex, ey);      /* write output to file */
    }
    CloseHandle(edit_object);                                   /* close edit object */
    return ERROR_NONE;                                          /* Exit Done */
}                                                               /* end run */
Once we have the edit object, we use the GetSelectMode function to get the selection mode. If it's an array selection, meaning table cells are selected in an HTML edit window, we get the number of selections with the GetSelectCount function. If not, we display an error message and exit. Once we have the number of selections, we iterate backwards over them with a for loop, from the last selection to the first. We need to go backwards because writing the new tags into a cell lengthens the text, which shifts the coordinates of everything after it. Working from the end means the selections we have yet to process all come before the change and aren't affected by it.
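The reason for iterating backwards can be shown with a simplified model that uses flat string offsets in place of GoFiler's x/y selection positions. In this hypothetical sketch, each edit replaces a (start, end) span of the original text. Applying the edits last-to-first keeps every remaining offset valid, while applying them first-to-last writes the second edit in the wrong place.

```python
Edit = tuple[int, int, str]  # (start, end, replacement) measured on the original text


def apply_backwards(text: str, edits: list[Edit]) -> str:
    # Highest start offset first: text before each span is never shifted.
    for start, end, new in sorted(edits, reverse=True):
        text = text[:start] + new + text[end:]
    return text


def apply_forwards(text: str, edits: list[Edit]) -> str:
    # Lowest start offset first: each longer write shifts every later span.
    for start, end, new in sorted(edits):
        text = text[:start] + new + text[end:]
    return text


row = "<td>A</td><td>B</td>"
edits = [(4, 5, "[A]"), (14, 15, "[B]")]  # spans of "A" and "B" in `row`
print(apply_backwards(row, edits))  # <td>[A]</td><td>[B]</td>
print(apply_forwards(row, edits))   # <td>[A]</td><t[B]>B</td>
```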
For each selection, we get a pair of start and end coordinates and then use the ReadSegment function to read in the text of the selection as a string. With the WordsToArray function, we split that string into an array of words we can iterate over. Before entering the while loop, we start the output string that will replace the cell in the file: we take the first word in the array (which should be the open table cell tag) and add the DIV and FONT tags from our defined values. Inside the loop, we append the remaining words one at a time. Unless the word we're adding is a character entity, such as a non-breaking space or a quote, we add a space before it. The closing FONT and DIV tags from our defines go in just before the last word, which is assumed to be the close table cell tag, to wrap everything up. After the loop finishes, we clear our array, write our output to the file with WriteSegment, and move on to the next selection.
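The word loop can be modeled in Python as a rough sketch. Its assumptions:
- the word list is taken as given, standing in for what WordsToArray returns;
- the `ENTITY` regular expression is only an approximation of IsSGMLCharacterEntity;
- `DOT_IMG` is a placeholder rather than the real encoded image;
- `wrap_cell` is a hypothetical name.

The constants otherwise follow the script's defines.

```python
import re

DOT_IMG = "AAAA"  # placeholder; substitute the real base64 dot image
OPEN_DIV = ('<div style="margin-bottom: 1pt; background: url(data:image/gif;base64,'
            + DOT_IMG + ') repeat-x bottom;">')
CLOSE_DIV = "</div>"
OPEN_FONT = '<font style="background:$$$BGCOLOR$$$">'
CLOSE_FONT = "</font>"
BG_COLOR_PH = "$$$BGCOLOR$$$"

# Approximates IsSGMLCharacterEntity: &name;  &#123;  &#x1F;
ENTITY = re.compile(r"&(#\d+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def wrap_cell(words: list[str], bg_color: str = "white") -> str:
    """words[0] is the open cell tag and words[-1] the close cell tag."""
    open_font = OPEN_FONT.replace(BG_COLOR_PH, bg_color)
    output = words[0] + OPEN_DIV + open_font
    for wx in range(1, len(words)):
        if wx == len(words) - 1:        # about to append the close cell tag
            output += CLOSE_FONT + CLOSE_DIV
        if not ENTITY.fullmatch(words[wx]):
            output += " "               # space before every non-entity word
        output += words[wx]
    return output


print(wrap_cell(["<td>", "Total", "assets", "</td>"]))
```

Testing for the last word before appending it, rather than for the second-to-last word after appending it, puts the closing tags in the same place. It also closes the tags for an empty cell whose only words are the open and close cell tags.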
This script does have limitations. To summarize, these are the limitations to be aware of with the script so far:
1) It can only handle tables (and documents) with white backgrounds. Anything else will cause white boxes to appear around the text. This could be resolved by adding more intelligence to the loop that builds the output, so that it checks the cell's background color and sets the background of the FONT tag to match.
2) The user cannot change the dot leader style. This could be improved by adding a toolbar option that lets the user select an image file, which would be base64 encoded and stored in an INI file for future use.
3) The SGML tags written out are currently stored as defines and not dynamically generated based on the DTD.
4) Any HTML errors in your table (such as a missing close cell tag) could cause the HTML written out to behave unpredictably; well-formed, error-free HTML is expected.
5) An array selection is required, but some users might just want to run it on a normal linear selection of a table, or perhaps with no selection at all.
6) Because the dots are a background, the leader can appear to start on a "partial" dot, cut in half by the box behind the text in front of it. The only way to avoid this would be to use actual dot characters, but that would mean a fixed number of dots, which doesn't solve the problem of dot leaders in a table with a dynamic width.
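For the first limitation, one hypothetical direction is to read the color from the cell's opening tag and substitute it for the placeholder instead of always using white. The Python sketch below approximates that with regular expressions. Its limits:
- `cell_background` and both patterns are illustrative, not part of the script;
- it recognizes only hex and named colors;
- it ignores colors inherited from the row, the table, or a stylesheet.

```python
import re

# Inline CSS overrides the presentational attribute, so it is checked first.
# Only hex (#rgb, #rrggbb) and named colours are recognized; anything else,
# such as url(...) or rgb(...), falls through to the default.
STYLE_BG = re.compile(
    r"background(?:-color)?\s*:\s*(#[0-9A-Fa-f]{6}|#[0-9A-Fa-f]{3}|[A-Za-z]+)\b(?!\()",
    re.IGNORECASE,
)
ATTR_BG = re.compile(r"\bbgcolor\s*=\s*[\"']?(#?\w+)", re.IGNORECASE)


def cell_background(open_tag: str, default: str = "white") -> str:
    """Return the background colour declared on an opening cell tag."""
    for pattern in (STYLE_BG, ATTR_BG):
        match = pattern.search(open_tag)
        if match:
            return match.group(1)
    return default


print(cell_background('<td style="background-color: #CCEEFF">'))  # #CCEEFF
```

Its result would take the place of BG_COLOR when the open FONT tag's placeholder is filled in.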
All of the above issues can be resolved, but doing so would significantly increase the script's complexity. Generally, we try to follow the KISS principle and keep it simple... but there's always room for improvement. Below is the full script in its entirety:
//
//
//      GoFiler Legato Script - Add Dot Leaders
//      ------------------------------------------
//
//      Rev     06/09/2020
//
//
//      (c) 2020 Novaworks, LLC -- All rights reserved.
//
//      Adds dot fill leaders to an HTML file in a selected area of a table.
//

/********************************************************/
/* Global Items                                         */
/* ------------                                         */
/********************************************************/

#define INVALID_SELECT_MSG  "This function is meant to be used on table cell or table column selections. Make an appropriate selection and try again."
#define DOT_IMG             "R0lGODlhBgAGAMIFAEdHR2dnZ+zs7Pb29v///wAAAAAAAAAAACH5BAEKAAcALAAAAAAGAAYAAAMKSLq8E0CoF5tVCQA7"
#define OPEN_DIV            "<div style=\"margin-bottom: 1pt; background: url(data:image/gif;base64,"+DOT_IMG+") repeat-x bottom;\">"
#define CLOSE_DIV           "</div>"
#define BG_COLOR            "white"
#define BG_COLOR_PH         "$$$BGCOLOR$$$"
#define OPEN_FONT           "<font style=\"background:$$$BGCOLOR$$$\">"
#define CLOSE_FONT          "</font>"

int run             (int f_id, string mode, handle edit_window);
int setup           ();

/****************************************/
int setup() {                                                   /* Called from Application Startup */
/****************************************/
    string      fnScript;                                       /* Us */
    string      item[10];                                       /* Menu Item */
    int         rc;                                             /* Return Code */

                                                                /* ** Add Menu Item */
                                                                /* * Define Function */
    item["Code"] = "EXTENSION_ADD_DOT_LEADERS";                 /* Function Code */
    item["MenuText"] = "&Add Dot Fill Leaders";                 /* Menu Text */
    item["Description"] = "<B>Add Dot Fill Leaders</B>";        /* Description (long) */
    item["Description"] += "\r\rAdds Dot Leaders To Table Cells."; /* * description */
                                                                /* * Check for Existing */
    rc = MenuFindFunctionID(item["Code"]);                      /* Look for existing */
    if (IsNotError(rc)) {                                       /* Already added */
        return ERROR_NONE;                                      /* Exit */
    }                                                           /* end error */
                                                                /* * Registration */
    rc = MenuAddFunction(item);                                 /* Add the item */
    if (IsError(rc)) {                                          /* Could not be added */
        return ERROR_NONE;                                      /* Exit */
    }                                                           /* end error */
    fnScript = GetScriptFilename();                             /* Get the script filename */
    MenuSetHook(item["Code"], fnScript, "run");                 /* Set the Test Hook */
    return ERROR_NONE;                                          /* Return value (does not matter) */
}                                                               /* end setup */

/****************************************/
int main() {                                                    /* Run directly from the IDE (debugging) */
/****************************************/
    handle      window;                                         /* window handle */
    string      windows[][];                                    /* list of all windows */
    int         size;                                           /* size of edit window list */
    int         ix;                                             /* counter */

    if (GetScriptParent() != "LegatoIDE") {                     /* if not running in IDE */
        return ERROR_NONE;                                      /* return */
    }
    setup();                                                    /* Add to the menu */
    windows = EnumerateEditWindows();                           /* get all edit windows */
    size = ArrayGetAxisDepth(windows);                          /* get size of windows */
    for (ix = 0; ix < size; ix++) {                             /* for each edit window */
        if (GetExtension(windows[ix]["Filename"]) == ".htm") {  /* if it's an HTML file */
            MessageBox("editing window: %s", windows[ix]["Filename"]); /* display message */
            window = MakeHandle(windows[ix]["ClientHandle"]);   /* get handle to html file */
            run(0, "preprocess", window);                       /* run the function */
            return ERROR_NONE;                                  /* return */
        }
    }
    return ERROR_NONE;                                          /* return */
}                                                               /* end main */

/****************************************/
int run(int f_id, string mode, handle edit_window) {            /* Call from Hook Processor */
/****************************************/
    dword       type;                                           /* type of window */
    string      words[];                                        /* words from wordparser */
    string      openfont;                                       /* the open font tag */
    string      output;                                         /* output tag to write */
    string      element;                                        /* next SGML element */
    int         sx, sy, ex, ey;                                 /* coordinates of block */
    int         rc;                                             /* return code */
    int         ix, wx;                                         /* counters */
    int         size;                                           /* size of array */
    int         smode, selections;                              /* selection mode and count */
    string      content;                                        /* content word */
    handle      edit_object;                                    /* edit object */

    if (mode != "preprocess") {                                 /* if mode is not preprocess */
        return ERROR_NONE;                                      /* return no error */
    }
    if (edit_window == NULL_HANDLE) {                           /* if not passed a handle, get one */
        edit_window = GetActiveEditWindow();                    /* get handle to edit window */
    }
    if (IsError(edit_window)) {                                 /* if no valid edit window */
        MessageBox('x', "Cannot get edit window.");             /* display error */
        return ERROR_EXIT;                                      /* return */
    }
    type = GetEditWindowType(edit_window) & EDX_TYPE_ID_MASK;   /* get the type of the window */
    if (type != EDX_TYPE_PSG_PAGE_VIEW && type != EDX_TYPE_PSG_TEXT_VIEW) { /* make sure type is HTML or Code */
        MessageBox('x', "This is not an HTML edit window.");    /* display error */
        return ERROR_EXIT;                                      /* return error */
    }
    edit_object = GetEditObject(edit_window);                   /* create edit object */

    smode = GetSelectMode(edit_object);                         /* get selection mode */
    if (smode == EDO_ARRAY_SELECT) {                            /* if we have an array selection */
        selections = GetSelectCount(edit_object);               /* get the number of selections */
    }
    else {                                                      /* if the user is not using array select */
        MessageBox('x', INVALID_SELECT_MSG);                    /* display error */
        CloseHandle(edit_object);                               /* release the edit object */
        return ERROR_NONE;                                      /* return with no error */
    }
    for (ix = selections - 1; ix >= 0; ix--) {                  /* for each selection, last to first */
        sx = GetSelectStartXPosition(edit_object, ix);          /* get selection start x */
        sy = GetSelectStartYPosition(edit_object, ix);          /* get selection start y */
        ex = GetSelectEndXPosition(edit_object, ix);            /* get selection end x */
        ey = GetSelectEndYPosition(edit_object, ix);            /* get selection end y */
        element = ReadSegment(edit_object, sx, sy, ex, ey);     /* read the cell's text */
        rc = GetLastError();                                    /* get last error */
        if (IsError(rc)) {                                      /* if it couldn't read the element */
            MessageBox('x', "Could not read HTML element, aborting."); /* print error */
            CloseHandle(edit_object);                           /* release the edit object */
            return ERROR_EXIT;                                  /* return error */
        }
        wx = 1;                                                 /* start parsing after the open cell tag */
        words = WordsToArray(element, WP_SGML_TAG);             /* split cell into words and tags */
        size = ArrayGetAxisDepth(words);                        /* get number of words */
        content = words[0];                                     /* first word is the open cell tag */
        openfont = ReplaceInString(OPEN_FONT, BG_COLOR_PH, BG_COLOR); /* set up open font */
        output = content + OPEN_DIV + openfont;                 /* set beginning of output */
        while (wx < size) {                                     /* while we have more content */
            content = words[wx];                                /* get the next word */
            if (wx == size - 1) {                               /* if this is the close cell tag */
                output += CLOSE_FONT + CLOSE_DIV;               /* close our font and div tags first */
            }
            if (IsSGMLCharacterEntity(content) == false) {      /* if not a character entity */
                output = output + " ";                          /* add spacer to output */
            }
            output = output + content;                          /* append content to output */
            wx++;                                               /* increment parse counter */
        }
        ArrayClear(words);                                      /* clear words array */
        WriteSegment(edit_object, output, sx, sy, ex, ey);      /* write output to file */
    }
    CloseHandle(edit_object);                                   /* close edit object */
    return ERROR_NONE;                                          /* Exit Done */
}                                                               /* end run */
Steven Horowitz has been working for Novaworks for over five years as a technical expert with a focus on EDGAR HTML and XBRL. Since the creation of the Legato language in 2015, Steven has been developing scripts to improve the GoFiler user experience. He is currently working toward a Bachelor of Science in Software Engineering at RIT and MCC.