First posted Fri Jan 29 04:51:04 PST 2010
Last updated Wed Feb 24 09:00:37 PST (12:00:37 ET) 2010
- Benchmark Test Hardware, Software, Parameter Settings and Execution
- Unless otherwise stated, all the benchmark tests were run on the same HP xw8600 with 16GB of real memory, two hard disk drives and an NVIDIA Quadro FX 3700 graphics card.
- All the software, NX5032, NX6034 and XP, was 64-bit.
- Settings for the graphics card were biased toward quality, wait for vertical sync was forced on, and the profile was NX.
- In both versions of NX, runs were made with SMP ON (also referred to as multi-threading turned on and SMP_1) and SMP OFF (also referred to as multi-threading turned off and SMP_0).
- Both the NX5 and NX6 macros were newly recorded, and the mace part was created and saved on NX5 so that NX6 could just open it. Thus, except for the setup, the frames in each version contain functionally the same UG NX operations. All UG test frame operations are bracketed by the macro TIMER START and TIMER STOP directives. The test frame sub-operations are:
- rotate_in_x_and_y_360_1deg_aat_85
The above operation is a "graphics card rotation" that rotates everything on the screen 1 degree at a time for 360 degrees.
By a very wide margin, the above operation accounts for the majority of the TIME, both CPU and Elapse, to process a test frame, in both NX5 and NX6.
One difference in making the NX6 macro recording is that Edit-Transform ... no longer exists in NX6 as it did in NX5. By using control-T to access the transform functions, it was possible to find the same transform options in NX6 as exist in NX5.
- create_png_image_85
The above operation is essentially a PNG image export of the display area that also captures the Microsoft Task Manager, where the VM size along with other runtime data is available. The images from each of the 4 runs of NX are referenced in this document.
- The Assembly Load Options and the Visualization Performance Preferences, VPP, were set the same for NX5 and NX6.
- Instructions for running the NX5 and NX6 monster macros are here.
- Summary of Results
In the process of recasting the NX5 standard GM macros to NX6 and comparing to NX5 results, very significant performance improvements were found, not with the 5 standard benchmarks in particular, but with additional macros used to discover a knee in elapse time performance curves on NX5. These monster macros, MM, use what is called, in this document, a "test frame." A test frame is a sequence of UG operations that is run many times in the MM benchmarks. The key operations in each of the test frame macros are the import of an additional part file and a transformation of all the part files that have been accumulating in the UG model with the execution of each test frame. The model size grows until the elapse time for processing each additional test frame increases to where the graph starts to approach an infinite elapse time. The knee of the curve is where the graph sharply changes direction and approaches vertical.
In comparing the results of MM benchmarks from NX5 to NX6, NX6:
- Was, for the complete MM run, 83% faster than the NX5 run with SMP_ON
and 80% faster with SMP_OFF. See the grand totals for the NX5 and NX6 runs.
Scroll left to the orange column.
- Used 84% less virtual memory space than NX5 with SMP_ON
and 85% less virtual memory space than NX5 with SMP_OFF.
For NX5 VM size at end SMP_ON click this sentence.
For NX6 VM size at end SMP_ON click this sentence.
In both cases, see the "ugraf.exe" line and the "VM size"
column in the Task manager image at the bottom.
- Exhibited a CPU utilization of 2.15 with SMP_ON, compared with 1.2 for NX5 with SMP_ON.
- Shows no sign that it is approaching a knee with the number of part copies imported so far.
These improvements are showing up in long elapse times and large virtual memory sizes. For example, the percentage improvement in elapse time, with all times in seconds, is calculated as
(NX5 elapse time - NX6 elapse time) / NX5 elapse time
which, in this case with SMP_ON, is specifically
(425,692.58 - 72,503.25) / 425,692.58 = 353,189.33 / 425,692.58 = 0.8297, which rounds to 83%.
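The calculation above can be sketched in a few lines of Python. The function name `fractional_improvement` is only illustrative and is not part of the benchmark tooling; the two elapse times are the SMP_ON grand totals quoted in the text.

```python
def fractional_improvement(nx5_seconds: float, nx6_seconds: float) -> float:
    """Return (NX5 - NX6) / NX5; a positive result means NX6 was faster."""
    if nx5_seconds <= 0:
        raise ValueError("NX5 elapse time must be positive")
    return (nx5_seconds - nx6_seconds) / nx5_seconds


# SMP_ON grand totals, in seconds, as quoted in the text.
NX5_SMP_ON = 425_692.58
NX6_SMP_ON = 72_503.25

improvement = fractional_improvement(NX5_SMP_ON, NX6_SMP_ON)
print(f"NX6 improvement over NX5: {improvement:.1%}")
```

A negative result from the same function would mean NX5 was the faster of the two.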
For the NX5 run, 425,692.58 seconds is 118.32 hours or 4.9 days. With NX6 the elapse time for the run went down to 20.19 hours or 0.84 of a day. That is a BIG improvement, but with either of these runs the elapse time is so large that it is impractical for general use. No user today will put up with 20.19-hour runs, except perhaps for some CAE application. These improvements need to show up at a smaller scale to be really useful, and as of Wed Feb 24 16:06:38 ET 2010, no such improvements have been found.
The much smaller standard benchmark results are not showing a significant improvement for NX6 over NX5. In fact, in many cases NX5 is the faster of the two. This fact can be seen in the following listing by looking to the far right for all lines with a cell that has a black background and white letters. Click this line to see the detail listing by UG sub-operation.
Or click this line to see the summary total rows for each of the 15 small standard runs. In these total rows, the majority of cells in the Elapse sec. Fractional Difference column on the far right have a black background with white text, signifying that NX5 is faster than NX6.
As of Thu 02/04/2010, work is ongoing to determine more about these results, such as: are the improvements real, and if they are real, what is their scope, and do they foretell future improvements that will impact the average CAD user? In terms of scope, one question is whether it is simply the type of part used in these benchmarks that is causing the spectacular improvement in NX6, or whether it is an improvement applicable to any type of part. There are benchmarks with other parts in process that may produce results that shed some light on this question. Given the CPU utilization of 2.15, are there significant general multi-threading improvements in NX6?
- Monster Macro Benchmark
The Monster Macro works like the new_Bennetton. Instead of importing copies of the Bennetton race car assembly, the Monster Macro (MM) imports multiple copies of a mace part. One sub-operation of each MM test frame is an N degree edit-transform-rotate-about-a-line of the entire collection of mace parts just before the next mace part is imported. The edit-transform-rotation changes the location of all the imported mace parts, and causes enough chaos in NX5 that a knee in the elapse time performance curve surfaces when the imported collection reaches, depending on the platform, about 60 mace objects. The last operation of a MM test frame is to export a *.PNG image of the UG display area, with the Microsoft Task Manager overlapped along the bottom showing key data about the ugraf.exe process, which is the main UG process.
This form of a benchmark has successfully exposed the knee in an elapse time performance curve from NX5 running on several different platforms, but in the comparisons of NX5 and NX6, NX6 has, so far, not produced an elapse time performance curve with a knee, and has shown some very striking performance improvements in both speed and memory utilization over NX5. It is this fact that this document with the NX5 and NX6 MM data shows and analyzes. See the graphs coming up next.
- NX5034 and NX6043 Elapse Time Performance Graphs
Next are three PNG images of both the NX5 and NX6 elapse time performance curves from one MM benchmark run that goes up to 80 mace objects on screen and in the UG model space.
- Elapse time seconds for all 80 test frames
- Elapse time seconds for the left side of the curves up to 61 test frames
- Elapse time seconds for the left side of the curves up to 26 test frames
These graphs show that up to about test frame 15, the two benchmarks are running neck and neck, but by test frame 50 the performance difference is about 560 seconds in favor of NX6. After that the gap widens very rapidly. The NX5 curve heads for performance oblivion while the NX6 curve is essentially linear, showing no sign of a knee. This is a very significant improvement for NX6 over NX5 running this benchmark.
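One rough way to locate a knee in a curve like the NX5 one is to look for the first test frame whose elapse time jumps well beyond the typical frame-to-frame increase. The sketch below only illustrates that idea: the function `find_knee`, the `factor` threshold and the sample numbers are all assumptions made up for the example. They are not benchmark data, and this is not the method used to produce the graphs.

```python
from statistics import median
from typing import Optional, Sequence


def find_knee(elapse: Sequence[float], factor: float = 5.0) -> Optional[int]:
    """Return the 1-based frame number where the curve first turns sharply upward.

    A frame counts as the knee when its increase over the previous frame is
    more than `factor` times the median of all earlier increases.
    Returns None if no such frame exists.
    """
    increases = [b - a for a, b in zip(elapse, elapse[1:])]
    for i in range(2, len(increases)):
        typical = median(increases[:i])
        if typical > 0 and increases[i] > factor * typical:
            return i + 2  # increases[i] is the step into frame i + 2
    return None


# Invented per-frame elapse times in seconds (NOT measured data).
linear_run = [10.0 * n for n in range(1, 21)]
knee_run = [10.0 * n for n in range(1, 11)] + [200.0, 400.0, 800.0]

print("linear run knee:", find_knee(linear_run))
print("knee run knee:", find_knee(knee_run))
```

The median is used as the baseline so that a single noisy frame early in the run does not shift the threshold much; a different baseline or threshold may suit real data better.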
- NX5034 and NX6043 Virtual Memory Size Graphs
This is a new presentation of data that has only been available in an image format. The URL points to the image data for the fifty-first test frame, which was highlighted in red on the image. The graphs show that running NX6 can result in a much more effective use of memory than NX5.
- Graph showing for NX5, red, and NX6, blue, the size of virtual memory at the end of each test frame. This graph clearly shows the dramatic difference between NX5 and NX6 when running this benchmark. Without more testing or input from UG, one cannot say how broad the improvement is.
- Graph with degree 2 polynomials fit to the NX5 and NX6 VM usage curves, with the derivatives of the polynomials below the other curves. At this scale, the graphs of the derivatives are compressed to almost no visible difference.
- Graphs of the derivatives copied into their own context. In this space, the steep climb of the NX5 derivative is clear.
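For readers who want to reproduce this kind of fit, here is a minimal sketch of a least-squares degree 2 polynomial fit and its derivative in plain Python. The helper names `fit_quadratic` and `derivative_at` and the sample VM sizes are assumptions for illustration only; the tool actually used to make the graphs is not stated in this document.

```python
from typing import List, Sequence, Tuple


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c).

    Needs at least three distinct x values.
    """
    s = [sum(x ** k for x in xs) for k in range(5)]                  # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # sums of y*x^0 .. y*x^2
    # Normal equations as an augmented 3x4 matrix, unknowns ordered (a, b, c).
    m: List[List[float]] = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                ratio = m[r][col] / m[col][col]
                m[r] = [v - ratio * p for v, p in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def derivative_at(a: float, b: float, x: float) -> float:
    """Derivative of a*x**2 + b*x + c at x, which is 2*a*x + b."""
    return 2 * a * x + b


# Invented VM sizes per test frame (NOT measured data), in arbitrary units.
frames = list(range(1, 11))
vm_sizes = [2.0 * f * f + 3.0 * f + 5.0 for f in frames]

a, b, c = fit_quadratic(frames, vm_sizes)
print("fit coefficients:", a, b, c)
print("growth rate at frame 4:", derivative_at(a, b, 4))
```

Because the derivative of a degree 2 polynomial is a straight line, comparing the slopes of the two derivative lines is a compact way to compare how fast VM usage grows in each version.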
The images produced at the end of each frame, with the Microsoft Task Manager overlapping the bottom, are listed in frame number order at the next two URLs. They are also referenced in the comparison list coming up after the next section.
- End of frame images for the NX5 SMP_0 run
- End of frame images for the NX5 SMP_1 run
In the NX5 run, the correct Task Manager starts with image 8. In image 7, the "ugraf.exe" line is not visible, and in the images preceding image 7 there is no Task Manager.
- End of frame images for the NX6 SMP_0 run
- End of frame images for the NX6 SMP_1 run
In this NX6 run "ugraf.exe" is not visible until image 3. It also ran up to 85 as opposed to stopping at 80. At 85 NX6 uses less VM space than NX5 at 80.
- A summary of summaries
Using the image from the last test frame of both runs and the summary from each of the raw extract files, this large image shows how the memory utilization and the performance/speed are exhibited in the data. To read the text in the image you will need to enlarge it in your browser. The memory/VM size is in bytes and the speed is in elapse time seconds.
- List of the MM results and list of comparisons of the results
The first set of lists is a set of HTML lists at different levels of detail. All the previous graphs are based on this data.
- Grand totals compared
There are only two benchmarks, hence there is only one line of grand total comparisons.
- Test frame sub-operations compared
The NX6 data is subtracted from the NX5 data, thus any negative differences or fractions signify that NX5 is faster than NX6 for that sub-operation of that test frame. There are subtotals for each test frame. The row with the subtotals has, in the two left columns, the string TEST-FRAME no. SUB-TOTAL, followed by the URL string pic that points to the image of the UG display area for that version of UG, NX5034 or NX6043. In this example, the image is for FRAME 19 of the NX5034 run, where the Task Manager first appears overlapping the display image.
Look at the Task Manager for some of the higher numbered test frames, and note the difference in the VM size.
- Problem lines grouped into type of benchmark/sub-operation collections
See the explanation preceding this table for detailed descriptions of the contents of the columns.
Once again, the superior performance of NX6 is evident.
The next two lists are the HTML version of the raw extract file for both runs.
- Raw extract for NX5034 MM runs wrapped in HTML
- Raw extract for the NX6043 MM runs wrapped in HTML
For those who have been looking at these benchmark results for the last several years, this is the same raw format you are familiar with. This is not the straight *.txt version, but the *.txt version with a little HTML wrapped around it.
- Indications of possible problems with the NX6 runs
This Update Failure Report is waiting at the end of the NX6 run. It points out 407 times that "Thru face does not intersect path of the tool." However, the operations in each test frame have nothing obviously to do with a tool path. No similar report has ever come out of the NX5 run.
With any UG run the log file usually has some error messages that do not cause problems. The listings show files in a directory containing the benchmark data divided into different groups, and each log file is scanned for TIMER directives and error messages. It is easy to see that the NX6 run has produced far more errors than the NX5 run. Line numbers from the log file are shown on the left.
- A segregated index of NX5 benchmark data with error messages and TIMER directives
- A segregated index of NX6 benchmark data with error messages and TIMER directives
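The kind of scan described above, which picks out TIMER directives and error messages and shows the log line number on the left, might look roughly like the sketch below. The exact wording of NX log lines is not given in this document, so the two regular expressions, the function name `scan_log` and the sample lines are all assumptions for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Assumed patterns: the real NX log wording may differ.
TIMER_RE = re.compile(r"\bTIMER\s+(START|STOP)\b")
ERROR_RE = re.compile(r"error", re.IGNORECASE)


def scan_log(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line_number, kind, text) for each TIMER directive or error line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if TIMER_RE.search(text):
            hits.append((number, "TIMER", text))
        elif ERROR_RE.search(text):
            hits.append((number, "ERROR", text))
    return hits


# Invented log lines (NOT taken from a real NX log).
sample = [
    "TIMER START rotate_example\n",
    "informational line with nothing of interest\n",
    "*** ERROR: example message\n",
    "TIMER STOP rotate_example\n",
]

for number, kind, text in scan_log(sample):
    print(f"{number:>6}  {kind:<5}  {text}")
```

To scan a real file, an open file object can be passed directly, for example `scan_log(open(path))`, since the function only needs an iterable of lines.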
- Other types of benchmark runs results comparing NX5 and NX6
- Runs with very small part followed by runs with a larger part
In all these runs a part is loaded and then transformed by rotating it around a line 110 degrees three times. The small part is just that: small. In that case, NX5 was faster in the second of the three platforms. The large part is two copies of the block_pan part; in this case the third and last transform was faster with NX5. This image of two copies of the block_pan part with the Task Manager, taken after the 3 transforms were completed, shows that the run is not putting a lot of pressure on memory given the size of the VM space.
- Standard 5 benchmarks results compared on NX5 and NX6
These runs produced a compressed problem list with 20 lines of benchmark sub-operation types. The most serious are the rotate and translate transforms in the eleu and the section cut in the pow. Is section cutting in general slower with NX6? In the new_ben runs there were 83 graphics rotations, one degree at a time, where NX5 was faster. These cases are graphics sub-system problems and not transform performance problems.
- The next steps
- Run the NX6 version of the MM with SMP turned off and see how performance is affected. In the current runs, the utilization of 2.15 over the entire MM NX6 run is better than we have ever seen with SMP on in any previous release. Has UG made significant progress implementing more threading?
- Process the data from the transform tests using 14 copies of the block_pan part.
- Check with UG to verify the reasonableness of these NX6 results.
- Finalize the standard benchmarks, with a written justification of each decision about a problem type.
- Make graphs of all the standard benchmark results.
- Establish a new NX6 standard 5 benchmark baseline.
- Type in the VM size data from the UG display images partially overlapped with the Task Manager and display it in graphs. This was accomplished in item 3.
- Instructions for running the NX5032 and NX6034 MMs
- Save the tar file NX5_NX6_monster_macro_to_drive_C.tar to the C drive on your workstation and unpack it, with WinZip for example.
- Start NX5034 or NX6043, and play back one of these macros: ./NX5_looking_for_knee_5032.dir/monster01macro_NX5032_82_mace_20100125_1615.macro or ./monster01macro_NX6043_85_mace_20100125_1615.macro. If you want the Task Manager data, bring up the Microsoft Task Manager, select the columns, make sure ugraf.exe is in view and near the top of the Task Manager process list, and position the window so that it overlaps the bottom of the UG display area, making sure the ugraf.exe line is in the display area. The NX5 version saves a UG display area snapshot at the end of each frame in ./NX5_looking_for_knee_5032.dir/piics.dir and the NX6 version saves its snapshot data in ./NX6036_benchmark_monster_mac_images.dir. In both cases, the images will be similar to this image.
- At the end of the run save the UG log file that reflects what version of NX the log data is from. This file contains all the TIMER data.
- To see all the uploaded data necessary to run the monster macros, click this line.