Relative "slowness" of XML Read


Bob Schor

I recently generated an array of data about a directory tree, and wanted to save it for later analysis. I decided to try XML because (a) it was "standard", (b) it had a LabVIEW (and JKI) implementation, and (c) it potentially embedded the data description in the data, making for "portability".

 

My data was an array of clusters. Two elements of the cluster were "string-like", a Path and a string, while the rest were numerics.

 

I found that the time to write the data was linear in the size of the data (1000 records took 10 times longer to write than 100 records), but when the data were read (by NI's Read XML Data), the time went as the square of the size of the data (1000 records took 100 times longer to read than 100 records). I was already at >5 minutes for 5000 records, and the "real project" I hoped to tackle had 100-fold more (incidentally, the write time for 5000 records was < 3 seconds).

 

Dismayed by this discovery, I repeated the test with EasyXML, and got basically the same results -- writing time grows linearly with the size of the array being written, while reading grows as the square of the array size.

 

Why should this be? It (naively) seems to me that once you "know" you are dealing with an array (even if it contains variable-size elements such as strings), you should be able to get near-linear results processing the data to re-create the array. I haven't yet applied NI's "performance" metrics to my code to see where the time goes. Another thing I'll try is to time how long it takes to produce the initial array of clusters (which I do from within a For loop, using the indexing output tunnel). If the problem involves having variable-length elements within the array (and thus potentially having to reshuffle and resize the array as elements are added, something that could potentially "square" the time), I should see this when creating the original array.
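(Since I can't paste a block diagram as text, here is a minimal Python sketch, purely illustrative, of the two growth patterns I have in mind: appending in place, which is what I'd expect the auto-indexing tunnel to do, versus rebuilding the array on every iteration, which squares the time. All the names are made up.)

    import time

    def build_by_append(n):
        # Amortized O(1) per element -- the behavior I'd expect from
        # the For loop's auto-indexing output tunnel.
        out = []
        for i in range(n):
            out.append(("folder%d" % i, "name%d" % i, i))
        return out

    def build_by_copy(n):
        # The whole array is copied on every iteration: O(n) per
        # element, O(n^2) total -- the "squared" time described above.
        out = []
        for i in range(n):
            out = out + [("folder%d" % i, "name%d" % i, i)]
        return out

    for n in (1000, 2000, 4000):
        t0 = time.perf_counter(); build_by_append(n)
        t1 = time.perf_counter(); build_by_copy(n)
        t2 = time.perf_counter()
        # append time roughly doubles as n doubles; copy time roughly quadruples
        print(n, t1 - t0, t2 - t1)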

 

Be back in a few minutes with the answer to this point ...

 

Bob Schor

OK, here's the answer. I'd previously done these calculations, but hadn't included the Creation Time.

 

I used as my "base dataset" my LabVIEW "programming" folder where most of my LabVIEW code resides. It has just under 5000 sub-folders, which will be traversed and indexed by my routine. I used three nested sub-sets of this folder -- one holding about half my code, a second (within this folder) holding my own "personal" projects, and a third holding just one of these projects.

 

As the attached data and analysis show, Creation and XML Write times (here I was using NI's XML routines, but JKI's EasyXML gives similar numbers) had log-log slopes near 1, meaning they were nearly linear in the problem size (note that the Write time had a slope slightly less than one, meaning the per-record time dropped as the data set got bigger -- "economies of scale"?). However, the XML Read time had a slope near 2, meaning it grew as the square of the problem size.
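(For anyone not used to reading log-log fits: if the time follows a power law, t = c * N^k, then log t = log c + k * log N, so the slope of the log-log plot is the exponent k. A slope near 1 means linear scaling; a slope near 2 means quadratic.)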

 

Bob Schor

XML Read vs Write Speeds.pdf

UPDATED: Added the attachment.

 

Hi,

 

Thanks for sharing your observations.

 

I tried to perform similar tests and the results appear to be linear. Attached is my test VI with the observations on the Front Panel.

 

Am I doing anything different than you?

 

Absolutely! You are doing things completely differently than I am. I'm attaching my reworking of your code that definitively demonstrates the quadratic behavior of the XML reading routine, contrasting it with the linear behavior of the writing routine. The issue is the number of XML "records" in the file, not how many times you write a new file! Consider a text file: if I write a file of a million characters, I'd expect it to take 1000 times longer than writing a file of 1000 characters. Similarly, if I read a file of a million characters, I expect it to take 1000 times as long as reading a file of 1000 characters -- I expect the time to be linearly proportional to the size of the file.

 

In the case of writing XML records, this proportionality is followed. It took me 3.9 seconds to write an array of 1000 "records", and 39 seconds to do 10,000 (see attached). However, when I read these two files, the 1000-record file took 21 seconds, but the 10,000-record file took >2400 seconds, or more than 100 times longer for a ten-times-larger file.
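(Checking the implied exponents from those numbers: for the write, log(39/3.9)/log(10) = 1.0, exactly linear; for the read, log(2400/21)/log(10) ≈ 2.06, squarely quadratic.)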

 

I know what's wrong, and also know "how to fix it". I'd be happy to "tweak" the Easy Read XML File, but you'll need to "unlock" it for me (I have a purchased license for it).

 

Bob Schor

 

P.S. -- While running the attached Test routine, I noticed a strange behavior. As you'll see, I'm using the same file name for all the files. However, when I call Easy Write XML File and the file is already present, I get a LabVIEW error. Shouldn't there be an "open" option to allow you to open a file as read/write? Does the user need to precede the call to Easy Write XML File with a VI of her own to make sure that the file does not already exist, and to delete it if it does? What is your idea about how to handle the various possibilities when getting ready to write an XML file?

Test EasyXML Write and Read.vi

I just noticed something that makes a startling difference, and points out my less-than-complete understanding of XML and of EasyXML. If you look at the example code I sent you that "proves" my point, you'll notice it is completely "wrong" -- I have the "Read XML" function inside the For loop, so I'm opening the file over and over again, extracting a single element each time instead of the entire array. [Incidentally, if I remove the For loop, read once, and pass an Array of Clusters as the Type, the read takes less time than the write!]
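In text form (a Python sketch rather than LabVIEW, with made-up names, and assuming the records are the children of the root element), my mistake amounts to the difference between these two:

    import xml.etree.ElementTree as ET

    def read_records_quadratic(path, n):
        # WRONG (what my test did): re-open and re-parse the entire
        # file once per record. Each parse costs O(n), so n records
        # cost O(n^2) -- the "square law" I was measuring.
        records = []
        for i in range(n):
            root = ET.parse(path).getroot()   # full parse, every iteration
            records.append(root[i])           # ...to keep just one element
        return records

    def read_records_linear(path):
        # RIGHT: parse once and take the whole array -- O(n), and indeed
        # faster than the write when I re-tested.
        return list(ET.parse(path).getroot())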

 

What led me astray is an earlier use I made of EasyXML, where I was using it to write out Header information for experiment files. What I wanted in the Header file was a complete description of the experiment, including the "global" parameters (such as the subject's name, birthday, settings of the instrument, date and time, tester's name, etc.), trial-by-trial parameters (trial number, specifics of the stimulus, subject's response), and a summary (total number of trials, ending time, etc.). I decided to use XML for this purpose, as it was "human-readable" (hence easy to check during development), portable, "standard", and (thanks to EasyXML) fairly simple to take into and out of LabVIEW, handling clusters and arrays nicely.

 

Because the data were being generated throughout the experiment, I wanted to write the header "as I went". Not knowing enough about XML, the approach I adopted was to use your Flatten and Unflatten routines and to write an ordinary text file. Although the resulting file was not, I now understand, a "legal" XML file (it lacked the <?xml> tag, since I didn't "know any better"), it was something that I could parse easily and "fool myself" that I had an XML file.

 

So when, in a completely different context, I decided to write out an array of clusters (which actually represented the walking of a directory tree) so I could analyze the data later, I thought of XML, and just decided to try NI's representation. I simply grabbed "Write XML to Data" and "Read XML from Data", not realizing the subtle difference between what I was doing now and what I did previously.

 

I now (better) realize that your code may be perfectly OK, and the problem is my lack of understanding about "How to" do XML.

 

So now I want to ask "How to do it". Here's what I want to do:

 

1) Generate a legitimate XML file, but one that does not need to be written "all at once" (which, it seems to me, is not a "requirement" of XML). I think this would require, in addition to "Easy Write XML File", an "Easy Open XML File" plus an "Easy Close XML File". Once the Open is accomplished, the user could simply call Easy Generate XML for each piece of data to be written, and write it as though it were a text file (a rough sketch of what I mean follows this list).

 

2) Read back and parse the resulting (legal) XML file. I'll note that it was easy to parse my "faux" XML file that I wrote using EasyXML, since I wrote each element out to a text file. To parse it, I simply read in a line, which had the XML Start Tag, treated it as a variable name (which it was), found the End Tag, and then, knowing (from the Start Tag) the name (and therefore the type) of the variable, passed it to Easy Parse XML to do the work for me. For this to work with a "real" XML file, I would similarly need an "Easy Open XML File" (which, I suppose, could simply open the file as a text file for reading and skip over the initial XML header tags until getting to the LabVIEW data section). It might also be useful to have an "Extract Element from XML File" that duplicates what I'm currently doing: read a line, see if it has a Start Tag, if so read more lines until getting the End Tag, and return the resulting element to the user, with a signal (a boolean, possibly) when the </LVData> end-of-XML tag is encountered.
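To make the request concrete, here is a rough Python sketch (not LabVIEW, and all of the function names are made up -- this is just the pattern I'm asking for, not anyone's actual API) of the open/write/close scheme in (1) and the element-at-a-time reader in (2), assuming for simplicity that each element sits on one line and that values are already XML-escaped:

    def open_xml_file(path, root_tag="LVData"):
        # Item (1): start a legal XML file that can grow "as I go".
        f = open(path, "w")
        f.write('<?xml version="1.0"?>\n<%s>\n' % root_tag)
        return f

    def write_xml_element(f, tag, value):
        # Append one element; real code would XML-escape &, <, > in value.
        f.write('<%s>%s</%s>\n' % (tag, str(value), tag))

    def close_xml_file(f, root_tag="LVData"):
        f.write('</%s>\n' % root_tag)
        f.close()

    def extract_elements(path, root_tag="LVData"):
        # Item (2): yield (tag, text) pairs one at a time, and stop
        # when the closing root tag -- the end-of-data signal -- appears.
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line.startswith("<?") or line == "<%s>" % root_tag:
                    continue                  # skip header / root start tag
                if line == "</%s>" % root_tag:
                    break                     # end-of-XML signal
                tag = line[1:line.index(">")]
                text = line[len(tag) + 2 : line.index("</%s>" % tag)]
                yield tag, text

    # Example of the "write as I go" usage I describe above:
    f = open_xml_file("header.xml")
    write_xml_element(f, "Subject", "S001")
    write_xml_element(f, "Trial", 1)
    close_xml_file(f)
    print(list(extract_elements("header.xml")))   # [('Subject', 'S001'), ('Trial', '1')]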

 

I look forward to discussing this with you all at NI week.

 

Bob Schor
