Distributed Reasoner, by James Edmondson<br />
<br />
<hr />
<b>ACE recursive mutexes vs. STL recursive mutexes</b> (2016-02-21)<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
In the <a href="http://madara.sourceforge.net/" target="_blank">MADARA</a> engine, we must use recursive mutexes to protect the underlying dictionary-based shared knowledge. Since 2008 or so, we have been using <a href="http://www.dre.vanderbilt.edu/Doxygen/5.7.9/html/ace/a00533.html" target="_blank">ACE_Recursive_Thread_Mutex</a> to protect our critical sections, but I have been following the C++11 spec with special interest. The goal of MADARA is portability and speed across platforms like Windows, Linux, ARM, Intel, Mac, Android, etc., and ACE was a natural choice because of its platform support, its well-tested code base, and its community development. Over the past five years or so, the community that supports and uses ACE has dwindled, and there has been a push within the C++ community toward libraries like Boost and the STL mutexes, which are essentially Boost libraries that have been standardized.<br />
<br />
But for a middleware like MADARA that is especially concerned with performance on low-powered processors for robotics systems, it's not just about how excited the C++ community is about a particular library; it's also about speed and efficiency. So, to make our own decision on whether the C++11 spec was ready for prime time in portable middleware, I extended the MADARA build process to allow, in an extensible way, for null mutexes (essentially no-ops that do not actually protect multi-threaded access), STL recursive mutexes, or our current ACE recursive mutexes. After seeing the results, I retrofitted <a href="https://sourceforge.net/p/madara/code/ci/master/tree/tests/test_reasoning_throughput.cpp" target="_blank">test_reasoning_throughput</a> (one of our standard tests for performance measurements on a target platform) to include breakdowns of the C++ STL mutex and recursive mutex against the ACE implementations of ACE_Thread_Mutex and ACE_Recursive_Thread_Mutex.<br />
<br />
First, the results of the direct comparisons of ACE mutexes and STL mutexes for g++ and Visual Studio 2015.<br />
<br />
<b>Settings</b> <br />
CPU: Intel® Core™ i7-4810MQ CPU @ 2.80GHz × 4<br />
Linux: Ubuntu 14.04<br />
g++ -v: <a href="http://pastebin.com/DNABMerr" target="_blank">Version Info</a><br />
Windows: 7, SP1<br />
Visual Studio: 2015<br />
Results are reported as average nanoseconds per operation over 100k operations. <br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/vM2SXST.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/vM2SXST.png" height="230" width="320" /></a></div>
<br />
As you can see from the direct comparison above, the g++ STL C++11 mutexes perform roughly the same as the ACE recursive mutexes. The Visual Studio 2015 performance is supposedly much better than Visual Studio 2013's, but I could not get my installation of Visual Studio 2013 to handle the STL mutex library correctly at runtime (it compiled fine but simply stalled for no apparent reason). For completeness, I've included a number of C++ operations with no mutex usage in the breakdown as well. This information is also printed by our test_reasoning_throughput test.<br />
<br />
Now, MADARA itself performs knowledge and reasoning operations for shared information in a distributed system. The test_reasoning_throughput tests many simple operations on the MADARA knowledge bases, using these recursive mutexes often in nested ways. It also enforces quality-of-service policies and various checks about knowledge consistency, time, and various other attributes. In short, it does useful things within the critical section.<br />
<br />
The following table uses the same hardware, operating systems, and compilers to check performance of basic operations in MADARA.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/BuBDXYK.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/BuBDXYK.png" height="290" width="400" /></a></div>
<br />
Most of our current generation of MADARA software uses KaRL containers (the last row in the table above). Another thing we optimize for is large Knowledge and Reasoning Language (KaRL) programs, which fall into the 2nd and 4th rows of the table. From these metrics, the ACE Recursive Thread Mutex is still the way for us to go. However, the performance of std::recursive_mutex in g++ is promising. Hopefully, the performance of the Visual Studio STL mutex library will catch up. After all, they have 30 years of open source code to look to for inspiration... if they <a href="http://www.dre.vanderbilt.edu/Doxygen/5.7.9/html/ace/index.html" target="_blank">care to open a web browser</a>.</div>
<hr />
<b>On MADARA Documentation</b> (2013-11-03)<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
For those who have followed my blog, you probably know that my dissertation research project MADARA (Multi-Agent Distributed Adaptive Resource Allocation) is a pet project that I still put a lot of my weekends and nights into. The feature set for MADARA has grown radically since I finished my dissertation, and the KaRL engine has become much more convenient, faster, and feature-rich. The result so far is a middleware that provides the timing, speed, and quality-of-service of a heavyweight middleware with the ease-of-use of a scripting language (at least I hope so).<br />
<br />
In truth, I feel like MADARA has been ready for prime time usage for several months now, but I'm a perfectionist, and I have been working diligently on the middleware layers to provide easier-to-use features, deeper functionality, and faster execution.<br />
<br />
But as ridiculous as it may sound, features are not the core of prime time readiness for a middleware. Documentation is the key to wider usage. I can't tell you how many middlewares I've downloaded without documentation, and it's extremely difficult to use any tool to maximum effect without good guidance. As much as I've tried to focus on feature building in the past several months, I've spent an equal amount of time on commenting, tutorials, tests, and the new Wiki pages. If MADARA ever does impress enough people to find its way into mainstream usage, I think the documentation will be the key.<br />
<br />
Just how much documentation have I done for the MADARA middleware? Let's start with the source commenting, from which the doxygen documentation for the library is generated:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://sites.google.com/site/distributedreasoner/MADARA_cloc_v1_1_13.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="175" src="http://sites.google.com/site/distributedreasoner/MADARA_cloc_v1_1_13.png" width="400" /></a></div>
<br />
MADARA contains over 66,000 lines of code right now (v1.1.13) and over 22,700 lines of comments (for every 3 lines of code, there is one line of documentation). There are also over 18,000 blank lines to aid code legibility, which I feel is just as important as documentation. To be clear, I try not to document the unnecessary--nothing as inane as "int i // an integer". The majority of the commenting is done in function headers so that doxygen and IDEs like Visual Studio can provide helpful tips on usage, such as precondition/postcondition information, parameter listings and definitions, error information, and return value descriptions. These lines are just as important to the middleware as code lines. To me, MADARA sits at 107,000 lines, with a healthy proportion of nearly 2/5 dedicated to documentation and readability.<br />
<br />
But code documentation is only 2/5 the battle when it comes to user presentation. Another important aspect is code examples, which I've worked diligently to provide in the tests and tutorials directories of the code base. Here are the cloc results from these two directories.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://sites.google.com/site/distributedreasoner/MADARA_cloc_test_tutorials_v1_1_13.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="262" src="http://sites.google.com/site/distributedreasoner/MADARA_cloc_test_tutorials_v1_1_13.png" width="400" /></a></div>
The tests directory, shown in the first table, is meant to test every feature added to MADARA. It's also documented and uses descriptive variable names so people can use these tests as guides for usage. The commenting and blank line usage is similar to the main repo, though slightly lower because there isn't much to comment for doxygen. The resulting ratio is 1 line of readability for roughly every 3 lines of code.<br />
<br />
The tutorials, however, are meant as guides for developers, and they are thoroughly documented to discuss the intent of features and their proper usage. One user recently commented that reading them is like reading technical papers because of how rich they are. This bears out in the cloc results. For every line of code in the tutorials, there is at least one line of documentation or a blank line for readability. In fact, there are more documentation/readability lines than code lines in this part of the repository.<br />
<br />
As weird as it may sound, I also know that the readability and documentation of tests and tutorials may not constitute even 1/5 of the battle for prime time readiness of a middleware. After all, only someone who has downloaded the repository (i.e., someone already convinced of the power and features of MADARA) would ever see the tests and tutorials directories. No, I feel the main focus on documentation has to go into external guides that help developers understand just what they're getting into with a new middleware or library. So, presentations and external guides featuring code examples and descriptions have received a great deal of focus as well--though not necessarily enough.<br />
<br />
The main point of entry for external guides of MADARA is now the <a href="https://code.google.com/p/madara/wiki/MadaraArchitecture">Wiki section of the MADARA project site</a>. The MADARA Wiki now defaults to a set of guides that discuss high level overviews of the MADARA architecture, interactions with the Knowledge Base, interactions with the Transport layer, and who the target audience for the middleware is. There are half a dozen images outlining interactions and overviews to aid developers in visualizing the system, YouTube code tutorials, and a video of example usage in a swarm of commercial UAVs--the latter two of which are available at the bottom of each Wiki page in the More Information section. These Wiki guides add another 1700 lines of effort to making MADARA more user-friendly and accessible.<br />
<br />
So, what do you think? When you get a chance, check out the <a href="https://code.google.com/p/madara/wiki/MadaraArchitecture">Wiki section of the MADARA project site</a> and maybe the code tutorials on the left hand pane. Feel free to tell me what you think. Documentation is an ongoing process, and I welcome the feedback!</div>
<hr />
<b>What is MADARA KATS?</b> (2012-02-10)<br />
MADARA is a focal point of my job talks, and KATS is one of the most exciting tools available in MADARA. In a nutshell, the KaRL Automated Testing Suite (KATS) is a portable deployment and testing system for automated sequencing and testing of distributed, real-time and embedded systems.<br />
<br />
What separates MADARA KATS from the rest of the pack is that the entire system is decentralized. This may not immediately seem important or interesting, but it opens up fine-grained control, support for fault tolerance, and responsiveness that can't be found in centralized solutions--i.e., solutions that rely on a centralized controller.<br />
<br />
The features of KATS are itemized below:<br />
<ul><li>Fully decentralized system that targets large scale testing across multiple machines in a local area network</li>
<li>Portable to most operating systems (Windows, Linux, Apple, etc.)</li>
<li>Control over launched application<br />
<ol><li>Executable, command line, environment variables and many other application inputs</li>
<li>Kill time and signal (on Windows, only terminate is available)</li>
<li>Real-time class for elevating process priority</li>
</ol></li>
<li>Batch processing with parallel or sequential execution</li>
<li>XML configurable</li>
<li>Domain-specific modeling language available for modeling in GME</li>
<li>Ability to instrument Android smartphones via both Monkeyrunner (MADARA MAML library) and ADB (MADARA MAAL library)</li>
<li>8-phase process lifecycle (See figure below for visual)<br />
<ol><li>Barrier (optional)--require that a group of processes come to a barrier before application launch</li>
<li>Precondition (optional)--require that a condition is met before application launch (e.g., if another process succeeds or fails in one of its lifecycle phases)</li>
<li>Temporal delay (optional)--operating system portable sleep time</li>
<li>Post delay (optional)--set a global condition or perform logic that indicates you are past temporal delay phase</li>
<li>Application launch--launch an application</li>
<li>Post launch (optional)--set a global condition or perform logic that indicates your application has been launched.</li>
<li>Post condition (optional)--set a global condition or perform logic based on the return value/exit code of your application.</li>
<li>Exit</li>
</ol></li>
<li>Built-in network transports for RTI DDS and Open Splice DDS. Other transports can be added via expansions to 2 functions in the <a href="http://code.google.com/p/madara/source/browse/trunk/include/madara/transport/Transport.h">Transport.h</a> file</li>
<li>Host agnostic--i.e., you can deploy whatever you want wherever you want due to the usage of an anonymous publish/subscribe network transport layer</li>
<li>Fault tolerant--i.e., you can deploy multiple failover entities in case of faulty hardware or whatever else might cause a process to fail. Additionally, you can create tests that detect and respond to faults/failures</li>
<li>Nested tests and application launches</li>
<li>Microsecond precision between process lifecycle phases that are not dictated by blocking communication of a centralized controller.<br />
</ul><img src="http://sites.google.com/site/distributedreasoner/8StepLifeCycle.gif" alt="MADARA KATS Process Lifecycle" title="MADARA KATS Process Lifecycle" /> Now, the microsecond precision is important for DRE systems, especially in reproducing race conditions. With KATS, the postcondition of a failed application launch can trigger the relevant precondition for another application launch within fractions of a second. With this open-source, freely available framework, you can perform black-box sequencing at scale.<br />
<br />
Additionally, there are whitebox tools available to allow for distributed breakpoints within an application and powerful, thread-safe logging APIs in case those are needed. However, most people seem more interested in the blackbox testing tools.<br />
<br />
If you have questions about the MADARA KATS system, feel free to contact me at jedmondson (at) gmail.com.
<hr />
<b>Performance Increase in MADARA KaRL</b> (2012-01-17)<br />
The new built-in features in KaRL have resulted in very noticeable performance increases. Below are the changes in performance. These are timing metrics reported by the <a href="http://code.google.com/p/madara/source/browse/trunk/tests/test_reasoning_throughput.cpp">test_reasoning_throughput</a> test, available in the source code repo, run on an Intel Core Duo with 4 GB RAM (the test itself uses only ~330 KB of memory).<br />
<br />
BEFORE indicates timing values before two changes: compiled expressions that circumvent the std::map lookups for KaRL logics, and built-in variable indexing that removes the same type of std::map overhead from each variable lookup. AFTER indicates timing values once the constant-time built-in variable lookups were implemented. AFTER WITH COMPILED indicates the timing with both changes in place. <br />
<br />
<b>Execution times</b><br />
<table><tr> <td valign="top" align="left"><b>BEFORE</b></td><td valign="top" align="left"><pre>for(1->10,000) ++var 997 ns
++var; x 10,000 502 ns
for(1->10,000) true => ++var 1000 ns
true => ++var; x 10,000 597 ns</pre></td> </tr>
<tr> <td valign="top" align="left"><b>AFTER</b></td><td valign="top" align="left"><pre>for(1->10,000) ++var 642 ns
++var; x 10,000 248 ns
for(1->10,000) true => ++var 637 ns
true => ++var; x 10,000 357 ns</pre></td> </tr>
<tr> <td valign="top" align="left"><b>AFTER WITH COMPILED</b></td><td valign="top" align="left"><pre>for(1->10,000) ++var 266 ns
++var; x 10,000 169 ns
for(1->10,000) true => ++var 269 ns
true => ++var; x 10,000 196 ns</pre></td> </tr>
</table><br />
<b>What does this mean to you as a developer?</b><br />
It means you can develop C++ applications that link to our library and evaluate knowledge operations at around 6 MHz (millions of operations per second) before disseminating your knowledge updates across the network in microseconds using DDS or whatever transport you want. It means that knowledge and reasoning can be included in online, mission-critical real-time systems, and you no longer have to use reasoning engines that take milliseconds to evaluate rules, limiting you to Hz rather than kHz or MHz as in our case.<br />
<br />
This engine was already advancing the state-of-the-art speeds for knowledge evaluation in real-time systems before these changes, but we also have plans for hopefully blowing this out of the water by using templates instead of virtual functions in our current expression tree formation. This will require either rolling over to the boost::spirit template metaprogramming lexical parser approach or rolling our own. I'll keep you posted. Right now, updates to CID and KATS for automated, adaptive deployments are taking priority.
<hr />
<b>New Features in MADARA KaRL</b> (2012-01-10)<br />
<div dir="ltr" style="text-align: left;" trbidi="on">The MADARA Knowledge and Reasoning Language (KaRL) has undergone some major changes recently that should provide developers with a faster, more flexible reasoning engine. In this post, we'll outline features like explicit compilation of KaRL logics, implicit compilation of variable references, and the timed wait operation. Along the way, we'll show how to use the atomic pre- and post-prints for evaluations or wait statements.<br />
<br />
Originally, the KaRL engine created an expression tree and then cached the expression tree in an STL string to expression tree map. This feature still exists, but we noticed that the string lookups were taking quite a bit of time. In the worst case, such string lookups can take O(m log n), where m is the length of the string and n is the number of compiled logics. This is quite a long time to grab a cached tree.<br />
<br />
The same search complexity was limiting the execution of our KaRL interpreter logic as well. With each variable lookup, we perform a lookup in an STL string to long long tree map. Depending on the length of the variable and the number of variables, this could again take a while.<br />
<br />
Not anymore.<br />
<br />
Developers may now compile KaRL logics directly with a call to the compile function, the result of which can be used to directly reference the expression tree. Additionally, under the hood, we have rewritten the variable node in the expression tree so that it directly manipulates the underlying Knowledge Record in the Thread Safe Context (and does so without repeatedly entering and leaving the mutex). This increased the speed of the engine by a factor of 3-4x, depending on how the logics were being processed. Keep in mind that this speedup was achieved on an already state-of-the-art reasoner that was capable of 2 million knowledge operations per second (~500 ns per operation).<br />
<br />
When using a C++ for loop to call the reasoning engine, these changes improved our performance from ~1us per operation to ~250ns. Larger logics, where internal optimizations are possible, have improved from ~500ns per operation to ~190ns. This means that the KaRL engine can now process knowledge operations at over 5 MHz, or 5 million operations per second.<br />
<br />
The implicit compilation is included in all knowledge calls, but the explicit compilation can be done via the following:<br />
<hr /><code><br />
<span style="color: #38761d;">// Initiate knowledge base with no transport</span><br />
Madara::Knowledge_Engine::Knowledge_Base knowledge;<br />
<br />
<span style="color: #38761d;">// new classes for evaluation settings and compiled expressions</span><br />
Madara::Knowledge_Engine::Eval_Settings settings;<br />
Madara::Knowledge_Engine::Compiled_Expression compiled;<br />
<br style="color: #38761d;" /><span style="color: #38761d;"> // compile the expression and save it into compiled</span><br />
compiled = knowledge.compile ("invariant => (++.count ; someother.condition => status = 5)");<br />
<br />
<span style="color: #38761d;">// evaluate the expression with the default settings</span><br />
knowledge.evaluate (compiled, settings);<br />
</code><br />
<hr /><br />
You can see other examples of using these new features in the <a href="http://code.google.com/p/madara/source/browse/trunk/tests/test_reasoning_throughput.cpp">test for reasoning throughput</a>.<br />
<br />
We’ve also added the ability to do timed waits instead of indefinite blocking waits on knowledge expressions. This allows a calling C++ program to wait a specific time interval for the knowledge expression or KaRL logic to become non-zero; if the interval passes, control returns to the caller. The underlying mechanisms are similar: the KaRL engine aggregates any changes to variables within the logic evaluation and sends updates to other interested network entities over the DDS transport.<br />
<br />
You can find examples of how to use this in the <a href="http://code.google.com/p/madara/source/browse/trunk/tests/test_timed_wait.cpp">timed wait tests</a>. I include an example below:<br />
<hr /><code><br />
<span style="color: #38761d;">// Initiate knowledge base with no transport</span><br />
Madara::Knowledge_Engine::Knowledge_Base knowledge;<br />
<br />
<span style="color: #38761d;">// new classes for wait settings and compiled expressions</span><br />
Madara::Knowledge_Engine::Compiled_Expression compiled;<br />
Madara::Knowledge_Engine::Wait_Settings wait_settings;<br />
<br />
<span style="color: #38761d;">// simple expression that will always evaluate to zero</span><br />
std::string logic = "++.count && 0";<br />
<br />
<span style="color: #38761d;">// set the wait settings to a polling frequency of once</span><br style="color: #38761d;" /><span style="color: #38761d;"> // a millisecond and a maximum wait time of 10 seconds</span><br />
wait_settings.poll_frequency = .001;<br />
wait_settings.max_wait_time = 10.0;<br />
<br />
<span style="color: #38761d;">// create atomic pre and post print statement</span><br />
wait_settings.pre_print_statement =<br />
"WAIT STARTED: Waiting for 10 seconds.\n";<br />
wait_settings.post_print_statement =<br />
"WAIT ENDED: Number of executed waits was {.count}.\n";<br />
<br />
<span style="color: #38761d;">// compile the simple zero logic</span><br />
compiled = knowledge.compile (logic);<br />
<br style="color: #38761d;" /><span style="color: #38761d;"> // wait on the expression with the timed wait semantics</span><br />
knowledge.wait (compiled, wait_settings);<br />
</code><br />
<hr />The implications of the time-based waiting mechanism are pretty big, and these changes will eventually make their way into the KATS framework to allow for even more flexibility with automated tests and deployments, in the form of fail- and success-condition executions of deployment elements. Combined with the new redeployment framework changes, the MADARA suite of tools should help a lot of distributed, real-time and embedded developers better reach their project goals. If you have any questions or comments about the implementations of these features or how you can use them in your projects, please let me know. MADARA is completely open source under a BSD license.</div>
<hr />
<b>Android Performance Testing</b> (2011-08-12)<br />
<b>1. Intro</b><br />
<br />
I'm building a distributed testing infrastructure on top of <a href="http://madara.googlecode.com/files/MADARA_KATS_Automated_Testing_With_Distributed_Reasoning.pdf">KATS</a> for a DARPA project, and we had a need for performance monitoring of CPU, memory, and process profiling information (including context switches) throughout a test run. MADARA already has a library called MAML which allows for quick Python script development to instrument a phone via the Android Monkeyrunner tool, but Monkeyrunner doesn't really provide for performance profiling. So, how do you quickly and easily retrieve a summary of CPU and memory utilization on your Android phone?<br />
<br />
<b>2. Solution</b><br />
<br />
The information is available through several utilities in the Android Debug Bridge, including top, and through the varied information stored inside the /proc directory. What I've done is make these more accessible through additions to the open-source maml.py library and the new maal.py (Madara Android ADB Library), which does not require Monkeyrunner at all.<br />
<br />
The MAAL provides much of the same functionality that MAML does, but is much slower with keyevents (I will fix this by reusing the same shell session, but it isn't a priority right now). MAAL and MAML also have a new library function called print_device_stats which allows for printing both a long form and a one line summary for CPU and memory usage.<br />
<br />
Three scripts have also been added to utilize these libraries and provide general-purpose testing information for Android smartphone programmers. For example, the following is a detailed view of the current memory and CPU usage on a Motorola Droid in our lab:<br />
<br />
<hr /><b>2.1. maal_monitor.py</b><br />
<br />
The command line arguments for maal_monitor.py are available by passing -h or --help to the script. The following script execution monitors performance for 1 iteration (-n 1) and prints the top 10 cpu-intensive processes running on the phone.<br />
<br />
<font size="-1"><br />
<pre>./maal_monitor.py -p 10 -n 1
Memory: 5892 kB free of 230852 kB
User 4%, System 6%, IOW 0%, IRQ 0%
User 15 + Nice 0 + Sys 21 + Idle 273 + IOW 0 + IRQ 0 + SIRQ 0 = 309
PID CPU% S #THR VSS RSS PCY UID Name
1021 4% S 57 215056K 56096K fg system system_server
30994 3% S 20 140436K 24940K bg app_24 edu.vu.isis.ammo.spotreport
23827 2% R 1 876K 392K fg shell top
177 0% S 1 0K 0K fg root omap2_mcspi
5 0% S 1 0K 0K fg root events/0
995 0% S 2 1272K 128K fg compass /system/bin/akmd2
1053 0% S 1 0K 0K fg root tiwlan_wq
160 0% S 1 0K 0K fg root cqueue
180 0% S 1 0K 0K fg root cpcap_irq/0
238 0% S 1 0K 0K fg root ksuspend_usbd
</pre></font><br />
<br />
The summarized view looks like this for maal_monitor.py:<br />
<br />
<font size="-1"><br />
<pre>./maal_monitor.py -p 10 -n 1 -s
Memory: 5952 kB free of 230852 kB. CPU: Total 6% (User: 2% Sys: 4%)
</pre></font><br />
<br />
maal_monitor.py is great for taking periodic measurements, but these may be too coarse-grained and too inaccurate with regard to CPU utilization for your testing needs (with maal_monitor.py, we're essentially polling every five seconds for current utilization, which is approximated). If you need hard numbers for CPU usage, context switches, number of processes launched, etc., I provide a separate set of scripts.<br />
<br />
<br />
<hr /><b>2.2. maal_proc_stats.py and maal_stats_cmp.py</b><br />
<br />
These scripts are generally used in the following way: 1) call maal_proc_stats.py with an outfile location to your storage drive, 2) run your test, 3) call maal_proc_stats.py with an outfile location that is different from #1, and 4) call maal_stats_cmp.py on the two files created in #1 and #3 to get the performance difference for your test.<br />
<br />
The output for maal_proc_stats.py will look like the following:<br />
<br />
<font size="-1"><br />
<pre>./maal_proc_stats.py -s
Clockticks: User 5451112. System 5357590. IO 566.
Processes: Num 56061. Switches 812051296.
</pre></font><br />
<br />
The first line shows the clock ticks spent in user processes, system processes and dispatching IO (it does not count idle ticks). The second line shows the number of processes that have been launched since phone boot, and the number of context switches since boot.<br />
<br />
Now, after creating two files with this script according to the process above noted in #1 and #3, you can process the performance information that changed during the test by running the following:<br />
<br />
<font size="-1"><br />
<pre>./maal_stats_cmp.py --infile1 first.stats --infile2 second.stats
Clockticks: User 629. System 393. IO 0.
Processes: Num 23. Switches 35566.
</pre></font><br />
<br />
Not only can you use maal_stats_cmp.py on the output from maal_proc_stats.py, but you can also process the difference between copies of the /proc/stats file (you just need to adb pull these files to your computer first).<br />
<br />
With these numbers in place, you should be able to configure systems like BuildBot or your favorite scoreboarding system to display test results via threshold values based on resource usage from known-good test runs. If changes have caused huge CPU or memory spikes, they should show up in one or both of these MAAL performance logging methodologies.<br />
<br />
<b>3. Downloads</b><br />
<a href="http://code.google.com/p/madara/source/browse/#svn%2Ftrunk%2Fmadara%2Fmaal"><br />
MAAL and associated scripts</a><br />
<a href="http://code.google.com/p/madara/source/browse/#svn%2Ftrunk%2Fmadara%2Fmaml">MAML and any open-source scripts</a><br />
<a href="http://madara.googlecode.com">MADARA KATS for synchronizing and coordinating testing processes</a><br />
<hr />
<b>Android Monkeyrunner and the Google ADB: a lament</b> (2011-06-01)<br />
<b>Intro</b><br />
<hr>So, for the past couple of months, I've been trying to get Android <a href="http://developer.android.com/guide/developing/tools/monkeyrunner_concepts.html">Monkeyrunner</a> to cooperate for distributed automated testing, but it has been an uphill battle... against an entrenched army of monkeys armed with bazookas. I wanted the Monkeyrunner library to work well, but I get the feeling that Monkeyrunner has not been tested or used much.<br />
<br />
<b>The honeymoon</b><br />
<hr>My experience with Monkeyrunner a month or two ago didn't start out all bad. The Monkeyrunner <a href="http://developer.android.com/guide/developing/tools/MonkeyDevice.html#press">press</a> function works much faster than doing "adb shell input keyevent" calls (likely due to a new shell being launched with every invocation and no option to chain together a long string of keyevents in the same session), and I got a glimpse of how easy smart phone automation could be without writing customized Java Unit tests or installing Robotium on the phone. I could just send KeyEvents to the phone, type a string, and even connect 2 to 8 phones to our servers and launch MonkeyRunner tests in parallel (more on problems with this later). With Monkeyrunner, I could instruct a non-technical person on how to write a test based simply on how they would use a directional pad and keyboard to navigate around the activity. <br />
<br />
<b>Aaaaaand we don't even cuddle anymore</b><br />
<hr>The first problem with MonkeyRunner for me came in the form of the <a href="http://developer.android.com/guide/developing/tools/MonkeyDevice.html#type">type</a> function being broken when the space key is used. This is not unique to Monkeyrunner. It appears that <i>adb shell input text</i> suffers from a similar problem. There may be several other KeyEvents (other than spaces) that fall into this particular hazard, but I was able to get around the issue for now by removing spaces from the text to be sent and inserting KEYCODE_SPACE where appropriate.<br />
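The workaround can be sketched like this. Note that plan_type and replay are illustrative helper names of mine, not part of MAML; the device.type and device.press calls they would ultimately drive are the documented MonkeyDevice API:

```python
# Sketch of the KEYCODE_SPACE workaround for MonkeyDevice.type().
# plan_type() is pure Python; replay() would be handed the device's
# type/press methods when run under monkeyrunner.

def plan_type(text):
    """Split text on spaces and interleave KEYCODE_SPACE presses."""
    actions = []
    chunks = text.split(' ')
    for i, chunk in enumerate(chunks):
        if chunk:
            actions.append(('type', chunk))
        if i < len(chunks) - 1:
            actions.append(('press', 'KEYCODE_SPACE'))
    return actions

def replay(actions, type_fn, press_fn):
    """Replay a plan via callables, e.g. device.type and a lambda
    wrapping device.press(key, MonkeyDevice.DOWN_AND_UP)."""
    for action, arg in actions:
        (type_fn if action == 'type' else press_fn)(arg)
```

Keeping the planning step separate from the device calls also makes the splitting logic testable without a phone attached.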
<br />
There were a couple of other problems with MonkeyRunner that kept cropping up. First, there is very little support for debugging the state of the activity you are trying to instrument. <br />
<br />
You can't even get information on whether or not the activity has crashed without going back to adb and logcat. You can't form KeyEvent pairings that select an entire EditText without long clicks, but long clicks are hard to emulate when the EditText could be in a different location on the screen due to portrait or landscape modes, or even because the screen resolution is different between two phones. <br />
<br />
You can't press two buttons at once because the DOWN type in the press method is apparently mapped directly to <a href="http://developer.android.com/guide/developing/tools/MonkeyDevice.html#ACTION_DOWN_AND_UP">DOWN_AND_UP</a>. Basically, the shift is unpressed immediately after you get out of the press function, regardless of what you pass it. This caused some headaches when trying to select all text, but it was manageable. No automation killer problem found yet... until Tuesday...<br />
<br />
<b>Monkeyrunner is a racist... that's a software library that causes race conditions, right?</b><br />
<hr>On Tuesday came the worst problem, which drove me to try to rewrite the Monkeyrunner library without modifying the Android Debug Bridge. There is a race condition in the MonkeyRunner <a href="http://developer.android.com/guide/developing/tools/MonkeyRunner.html#waitForConnection">WaitForConnection</a> method that occurs when you try to wait for multiple phones at once (even from separate heavy weight processes). The only way to really witness this issue is when you have an automated system trying to launch activities on 2 to 8 phones at once (humans take milliseconds or seconds to launch each by hand, so the race condition is hard for a manual tester to catch). The WaitForConnection method will cause random behavior on one of the phones while opening the other one without a problem for a moment. Then the automation on all phones halts. The issue is very weird.<br />
<br />
We got around this for a short term fix by ensuring that we always waited 1 second after the previous phone launched before starting its automation (via the KATS process life cycle). While this works, it is not ideal. We wanted to launch 2-8 phones at once per server (as many USB connections as we can do right now) and see if there were any race conditions involving the phones connecting or disseminating to the server. With this race condition in Monkeyrunner and our subsequent fix of sleeping in between each phone launch, the phones end up at least 1 second apart when sending, which means we can't test everything that we want to test.<br />
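The short term fix amounts to something like the following sketch (staggered_launch and launch_fn are illustrative names, not the actual KATS code):

```python
import time

def staggered_launch(serials, launch_fn, delay=1.0):
    """Workaround for the waitForConnection race: start automation on
    each phone at least `delay` seconds after the previous one, so the
    (apparently non-thread-safe) session creation never overlaps."""
    started = []
    for i, serial in enumerate(serials):
        if i > 0:
            time.sleep(delay)
        launch_fn(serial)                      # whatever starts one phone's automation
        started.append((serial, time.time()))  # record when each launch happened
    return started
```

The cost, as noted above, is that the phones can never send within `delay` seconds of each other, which rules out the tightest concurrency tests.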
<br />
What's most frustrating about this is that the problem is not on my end, and I can't seem to find any fix to this without modifying the Android code base.<br />
<br />
<b>Monkeyrunner withdrawal</b><br />
<hr>To try to address the issue, I rewrote my entire Python scripting library which wrapped Monkeyrunner to instead use nothing but ADB under the hood. It started out promising. First, the adb equivalent of <a href="http://developer.android.com/guide/developing/tools/MonkeyRunner.html#waitForConnection">WaitForConnection</a> was much, much faster (basically, I just used <i>adb get-state</i>). The WaitForConnection method must be establishing an actual session with the phone, and this is probably where the race condition is occurring (during the session creation, which is almost certainly not thread safe). So far, so good. Actually, the entire library was a breeze to write.<br />
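A get-state based wait can be sketched as below. The wait_for_device name is mine, not the MAML function; the probe is injectable so the polling logic can be exercised without a phone attached, while the default probe shells out to the standard <i>adb -s &lt;serial&gt; get-state</i>:

```python
import subprocess
import time

def adb_get_state(serial):
    """Return the adb-reported state ('device', 'offline', ...) or '' on error."""
    try:
        out = subprocess.check_output(['adb', '-s', serial, 'get-state'])
        if isinstance(out, bytes):
            out = out.decode()
        return out.strip()
    except (OSError, subprocess.CalledProcessError):
        return ''

def wait_for_device(serial, probe=adb_get_state, timeout=30.0, interval=0.5):
    """Poll until the probe reports 'device' or the timeout expires.
    No session is established, so there is nothing to race on."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if probe(serial) == 'device':
            return True
        time.sleep(interval)
    return False
```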
<br />
Then I run it... and the <i>adb shell input keyevent</i> command inserts those 1 second delays in between every <a href="http://developer.android.com/reference/android/view/KeyEvent.html#KEYCODE_DPAD_LEFT">KEYCODE_DPAD_LEFT</a>, backspace, menu, etc. A 15 second MonkeyRunner test is extended to hundreds of seconds when using <i>adb shell input keyevent</i>. The culprit with the adb shell is probably that a separate shell session is started with each invocation - rather than queuing the events to the target phone and returning immediately. I can understand this not being the default behavior, but I can't really understand why an asynchronous or a queuing version isn't available.<br />
<br />
<b>A lamentable conclusion</b><br />
<hr>Being able to send KeyEvents to an Android phone is pretty awesome. I hope that the Google folks either fix the race conditions in MonkeyRunner or they fix the delays in adb shell so we can send KeyEvents at decent speeds. For the moment though, this is my Monkeyrunner sad face :(<br />
<br />
<b>Library files</b><br />
<hr>The wrappers around the Monkeyrunner and ADB interfaces are linked below. The library is called the MADARA Android Monkeyrunner Library (MAML).<br />
<br />
<a href="http://code.google.com/p/madara/source/browse/trunk/lib/maml_adb_only.py">MAML sans Monkeyrunner</a><br />
<a href="http://code.google.com/p/madara/source/browse/trunk/lib/maml.py">MAML original</a>James Edmondsonhttp://www.blogger.com/profile/06768510571708099548noreply@blogger.com4tag:blogger.com,1999:blog-8892858146876920427.post-29270716864980315682011-05-31T22:38:00.000-07:002011-05-31T22:53:02.567-07:00The KaRL Automated Testing SuiteSo, we've submitted our first paper highlighting the KaRL Automated Testing Suite (KATS) to GPCE 2011, and the features of the toolset have really blossomed in the past month. KATS is a suite of tools that automate distributed deployment and testing in a cross platform way. This means that you can use KATS on a hybrid test bed with Windows and POSIX machines, and each of the machines will work together to accomplish distributed, automated testing.<br />
<br />
The core of the KATS system is the <a href="http://code.google.com/p/madara/wiki/KnowledgeEngineMechanisms">KaRL reasoning engine</a>, which provides the testing suite with a distributed knowledge and reasoning engine based on the anonymous publish/subscribe paradigm. The infrastructure is consequently host-agnostic, resulting in the ability to move tests between hosts without much difficulty. Tests can be started via cron jobs, and they will barrier and synchronize if needed.<br />
<br />
One of the more interesting parts to the KATS system is the Generic Modeling Environment (GME) paradigm for visually modeling tests. You can read more about how to obtain and use KATS and its GME paradigm at the following links.<br />
<br />
Links:<br />
<ul><li><a href="http://code.google.com/p/madara/wiki/Installation">Installation Guide for Linux and Windows</a></li>
<li><a href="http://code.google.com/p/madara/wiki/KatsGmeTutorial">Tutorial for modeling KATS tests in the KATS GME paradigm</a></li>
<li><a href="http://code.google.com/p/madara/wiki/AdvancedKatsGme">Advanced KaRL Automated Testing Suite (KATS) features</a></li>
</ul><br />
We're currently using KATS to model and execute distributed tests for smart phones and C++ services connected to and running on various host platforms. You can find out more at the links above.James Edmondsonhttp://www.blogger.com/profile/06768510571708099548noreply@blogger.com0tag:blogger.com,1999:blog-8892858146876920427.post-46091688103278263862011-04-21T11:10:00.000-07:002011-04-21T11:11:07.107-07:00Android MonkeyrunnerI'm currently working on a project that requires automated testing of Android applications. Fortunately, Google has released a Python API for manipulating Android devices, applications, intents, etc. called Monkeyrunner, but the help has been especially lacking. One reason for this is that the command the Monkeyrunner project page says will generate the API documentation doesn't work: the help.py script does not appear to be provided in the Android SDK. This blog post will remedy that situation.<br />
<br />
According to the <a href="http://developer.android.com/guide/developing/tools/monkeyrunner_concepts.html">Google project site</a>, you should be able to run the following command to generate the API docs for Monkeyrunner:<br />
<br />
<code><br />
monkeyrunner &lt;format&gt; help.py &lt;outfile&gt;<br />
</code><br />
<br />
Unfortunately, the help.py file does not appear to exist. So, I've created a help.py file that will give you all the capabilities of the old help.py, if it ever existed. Copy and paste the following into a new file called help.py on your computer (or download the file from <a href="http://sites.google.com/site/distributedreasoner/help.py">here</a>):<br />
<br />
<hr /><b>help.py file to create on your computer</b><br />
<pre>#!/usr/bin/env python

# Imports the monkeyrunner modules used by this program
from com.android.monkeyrunner import MonkeyRunner, MonkeyDevice
from optparse import OptionParser
import sys

# print usage information and stop before any help is generated
def help_callback (option, opt, value, parser):
  parser.print_help ()
  sys.exit (0)

parser = OptionParser (add_help_option=False)
parser.add_option ("-o", "-f", "--outfile", dest="outfile", default="help.html",
                   help="file to output monkeyrunner help to",
                   metavar="OUTFILE")
parser.add_option ("-t", "--type", dest="type", default="html",
                   help="type of output to generate (html or text)",
                   metavar="TYPE")
parser.add_option ("-h", "--help", action="callback", callback=help_callback,
                   help="show usage information for this script")

(options, args) = parser.parse_args ()

text = MonkeyRunner.help (options.type)
f = open (options.outfile, 'w')
f.write (text)
f.close ()

print "\nMonkeyrunner help written to " + options.outfile + " (type: " \
      + options.type + ")\n"</pre><br />
<hr /><br />
The file comes with its own help and usage information, which you can access by providing a -h or --help option like so:<br />
<br />
<code><br />
monkeyrunner help.py -h<br />
</code><br />
<br />
By default, the help.py sets the output file to help.html and sets the type of output to html. Feel free to use this to generate the Monkeyrunner built-in help for reference on your system.James Edmondsonhttp://www.blogger.com/profile/06768510571708099548noreply@blogger.com0tag:blogger.com,1999:blog-8892858146876920427.post-67563722240079776682011-03-14T13:45:00.000-07:002011-03-14T13:53:31.186-07:00LaTeX in a Nutshell (#1)<div style="font-family: inherit;"><b>Introduction</b></div><div style="font-family: inherit;"><br />
</div><div style="font-family: inherit;">Writing research papers in industry tends to involve one of two text formats: Word and LaTeX. From my experience, Word is still dominating the creation of technical reports, papers, etc., but LaTeX was essentially written for programmers and technical researchers who want an extensible, programmable paper format. This blog entry is intended to group together examples and descriptions of features. Hopefully, the blog series will be appealing to beginner to intermediate LaTeX users.</div><hr style="font-family: inherit;" /><div style="font-family: inherit;"><b>Starting Out</b></div><div style="font-family: inherit;"><br />
</div><div style="font-family: inherit;">Like C++ or Java, you are probably going to break your main project into pieces. If you are collaborating with other authors or researchers, this will be especially essential, as it will allow you to each work on and revise different sections of the paper at the same time with no conflicts! Importing other LaTeX files is easy. Let's start with a simple example of a paper with three sections: abstract, solution, and experiments. We use the <a href="http://www.acm.org/sigs/publications/proceedings-templates">ACM Conference Proceeding</a> document class to format the document for submission to an ACM conference. Be sure to download the .cls files into the same directory as your document!</div><hr /><div style="font-family: "Courier New",Courier,monospace;">\documentclass{acm_proc_article-sp} </div><div style="font-family: "Courier New",Courier,monospace;"><br />
</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">% package includes. </div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">% For now, including graphicx for images is enough</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">\usepackage{graphicx}</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;"><br />
</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">% begin document signals the beginning of rendering</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">% anything before this point is just metadata or package includes</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">\begin{document}</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;"><br />
</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">% title and author information. Note how we specify</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">% two different authors (James Edmondson and John Smith)</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">\title{Research Paper} </div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">\numberofauthors{2}</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">\author{</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;"> \alignauthor James Edmondson\\</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;"> \affaddr{Vanderbilt University}\\</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;"> \email{james.r.edmondson@vanderbilt.edu}</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;"> \alignauthor John Smith\\</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;"> \affaddr{Vanderbilt University}\\</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;"> \email{john.q.smith@vanderbilt.edu}</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">}</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;"><br />
</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">% Render the author and title information first</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">\maketitle</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;"><br />
</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">% These are macros that we can use in tables to provide</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">% extra space at the top (\T) when under a horizontal line</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">% and at the bottom (\B) when on top of a horizontal line</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">\newcommand\T{\rule{0pt}{2.6ex}}</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">\newcommand\B{\rule[-1.2ex]{0pt}{0pt}}</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;"> </div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">% include the three sections</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">\input{abstract}</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">\input{solution}</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;">\input{experiments}</div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;"><br />
</div><div style="font-family: &quot;Courier New&quot;,Courier,monospace; margin: 0px; text-indent: 0px;">% include the bibliography</div><div style="font-family: &quot;Courier New&quot;,Courier,monospace; margin: 0px; text-indent: 0px;">\bibliographystyle{abbrv}</div><div style="font-family: &quot;Courier New&quot;,Courier,monospace; margin: 0px; text-indent: 0px;">\bibliography{master}</div><div style="font-family: &quot;Courier New&quot;,Courier,monospace; margin: 0px; text-indent: 0px;">\end{document}</div><div style="font-family: &quot;Courier New&quot;,Courier,monospace; margin: 0px; text-indent: 0px;"><hr></div><div style="font-family: inherit; margin: 0px; text-indent: 0px;">This main file, which is frequently named after the targeted conference, might be called technicalreport.tex. The three included files would need to be called abstract.tex, solution.tex, and experiments.tex. At the end of the file, we include a bibliography called master.bib.</div><div style="font-family: inherit; margin: 0px; text-indent: 0px;"><br />
</div><div style="font-family: inherit; margin: 0px; text-indent: 0px;">The three input files can just be a paragraph, if you like, but the master.bib file must follow the BibTeX format. The best part about using LaTeX is that most research portals like ACM, IEEE, and CiteSeer provide BibTeX entries for you to copy directly into your master.bib file. For instance, here is a complete listing of papers by <a href="http://www.cs.utexas.edu/users/EWD/indexBibTeX.html">E.W. Dijkstra</a>. What makes BibTeX especially convenient is that when you change the documentclass and the bibliographystyle, the bibliography is automatically reformatted to the specification of your target conference or journal. Anyone who has had to do this in Word knows how difficult that can be with most word processors.</div><div style="font-family: inherit; margin: 0px; text-indent: 0px;"><br />
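To make this concrete, here is what one entry in master.bib looks like (the citation key is arbitrary; the entry itself is Dijkstra's well-known CACM letter, of the kind found in the listing linked above):

```latex
% one entry in master.bib
@article{dijkstra1968goto,
  author  = {Edsger W. Dijkstra},
  title   = {Go To Statement Considered Harmful},
  journal = {Communications of the ACM},
  volume  = {11},
  number  = {3},
  pages   = {147--148},
  year    = {1968}
}
```

In any of the .tex sections, \cite{dijkstra1968goto} produces the reference, and the entry is formatted according to whatever bibliographystyle the main file selects.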
</div><div style="font-family: inherit; margin: 0px; text-indent: 0px;">Another nice feature of BibTeX is that only the entries actually cited in the paper are printed in the bibliography. In this way, you can build a master bibliography file that has all of the papers you have ever read and share it between all of your papers without any problems.</div><div style="font-family: inherit; margin: 0px; text-indent: 0px;"><hr></div><div style="font-family: inherit; margin: 0px; text-indent: 0px;"><b>Notes</b></div><div style="font-family: inherit; margin: 0px; text-indent: 0px;"><br />
</div><div style="font-family: inherit; margin: 0px; text-indent: 0px;">Try to avoid using underscores (_) or percent (%) in your bibliography. If you do use these, make sure you escape the sequence with a backslash (\).</div><div style="font-family: inherit; margin: 0px; text-indent: 0px;"><hr></div><div style="font-family: inherit; margin: 0px; text-indent: 0px;"><b>Links for Further Reading</b></div><div style="font-family: inherit; margin: 0px; text-indent: 0px;"><br />
</div><div style="font-family: inherit; margin: 0px; text-indent: 0px;">Check out these links if you would want to find out some specifics that may not be covered in this blog series.</div><div style="font-family: inherit; margin: 0px; text-indent: 0px;"><br />
</div><div style="font-family: inherit; margin: 0px; text-indent: 0px;"><a href="http://heather.cs.ucdavis.edu/%7Ematloff/LaTeX/LookHereFirst.html">LaTeX Basics </a></div><div style="font-family: inherit; margin: 0px; text-indent: 0px;"><a href="http://www.personal.ceu.hu/tex/cookbook.html">LaTeX Cookbook</a></div><div style="font-family: "Courier New",Courier,monospace; margin: 0px; text-indent: 0px;"><span style="font-family: inherit;"><br />
</span></div>James Edmondsonhttp://www.blogger.com/profile/06768510571708099548noreply@blogger.com0tag:blogger.com,1999:blog-8892858146876920427.post-7128195808869960522011-03-06T15:18:00.000-08:002011-03-06T15:21:39.317-08:00Undocumented ACE_OS::sleep caveats<p>For those in need of sleep in microseconds, understand that Windows provides no such mechanism.</p><hr><p><b>Intro</b></p><p>Recently, I needed a methodology for setting the hertz publication rate on a publisher that would work in both Linux and Windows. The publication rate should be able to go up to MHz at least, which requires a sleep mechanism capable of 1,000,000,000 ns / 1,000,000 = 1,000 ns of precision. Consequently, the sleep would be required to function on a microsecond level.</p><hr><p><b>Tools and methodologies</b></p><p>I decided to stick with the ACE library and specifically use the ACE_OS::sleep(const ACE_Time_Value &amp;) call. On the surface, this should allow us to sleep for microseconds, and it does - with one small caveat: the operating system needs to have a sleep mechanism that is capable of actual us (microsecond) precision.</p><hr><p><b>Problems</b></p><p>In WIN32 mode, the ACE_OS::sleep call uses the ::Sleep method provided by the Windows operating system. Unfortunately, ::Sleep works on millisecond precision. This means that you either blast (i.e., no sleep statement at all), or you can specify a hertz rate of &lt;= 1 kHz (1 ms of sleep).</p><br />
<hr><p><b>Solutions</b></p><p>One potential solution is bursting events and then sleeping for 1 ms. The trick is to work out a bursting pattern whose single 1 ms sleep accounts for all of the microsecond sleeps that should have occurred over that period. This isn't modeling exactly what you want, but the alternative is to allow only bursting or rates of &lt;= 1 kHz. In other words, there is no beautiful, portable solution to this that isn't going to cause stress on whatever you are trying to test (bursting is always a worst case for any software library).</p><br />
<hr><p><b>Downloads</b></p><p><a href="http://code.google.com/p/madara/source/browse/trunk/tests/test_dissemination.cpp">KaRL Dissemination Test</a> - Tuned to burst mode on Windows and simply sleep for microseconds on POSIX.</p>James Edmondsonhttp://www.blogger.com/profile/06768510571708099548noreply@blogger.com0tag:blogger.com,1999:blog-8892858146876920427.post-3705393101724825952011-03-05T21:12:00.000-08:002011-06-01T22:25:58.017-07:00For loops just aren't what they used to beSometimes, compilers are too damned good at optimization.<br />
<hr><p><b>Intro</b></p><p>My PhD dissertation currently centers around a knowledge and reasoning engine and middleware called <a href="http://madara.googlecode.com">KaRL</a>, part of my Madara toolsuite. In a recent paper, I wanted to do some performance testing of the KaRL distributed reasoner, and so I attacked the testing from three vectors: reasoning throughput (the number of rules per second the engine could perform without distributed knowledge), dissemination throughput (the number of rules per second sent over the wire in a LAN), and dissemination latency.</p><p>To make things more interesting, I decided to form a baseline for reasoning throughput. How about optimal C++ performance with a for loop of reinforcements (e.g. ++var)? Oh, and it needs to be portable across Windows and Linux. Easy enough, right?</p><hr><p><b>Problems, Solutions, and More Problems</b></p><p>The first problem on the docket was one of timer precision. I decided to go with <a href="http://www.riverace.com/ACE/ace55/html/ace/classACE__High__Res__Timer.html">ACE_High_Res_Timer</a>, after some unsuccessful and highly error prone usage of the underlying gethrtime. The High_Res_Timer class also corrects for global scale factor issues in the return values of QueryPerformanceCounter() on Windows. So far, so good.</p><p>The results on my Linux and Windows machines were right in line with what I expected. Through function inlining, expression tree caching, and various other mechanisms, we are able to efficiently parse KaRL logics at greater than 1 MHz. However, when I started comparing to my supposed baseline, I discovered that the ACE_High_Res_Timer was reporting that the optimized C++ for loop of ++var was performing at an amazing 60 GHz to over 1 THz... on a 2.5 GHz processor.</p><p>What the heck was going on here?</p><p>It turns out that modern C++ compilers will completely optimize out for loops if they can.
My specific issue, which remains unsolved in a portable manner, was in regard to a for loop with a simple accumulator (var) which is incremented a certain number of times. I had started a timer before the for loop and stopped it after the loop was over, but the assembly language generated from the C++ programs had 0 for loops in the function. In fact, the compilers simply moved the final value that the loop would have had into var. The timer was effectively reporting the time it took to query the system for the nanosecond precision timers, since the couple of assembly instructions included were not enough to amount to any nanoseconds at all.</p><hr><p><b>Remarks on Known Solutions</b></p><p>In Visual Studio, I was able to circumvent the issue in two ways: first, by using __asm { nop }, which effectively inserts a no-op (an exchange of eax with itself), and second, by using volatile, which means the compiler is not able to optimize at all and can't fully take advantage of registers.</p><p>In g++, unfortunately, I was only able to use volatile, which means that if I wanted to test the actual loop, I had to take away every other optimization that the compiler might be able to do.</p><p>Using volatile turns out to be the only portable thing I could think of. Internet searching seemed to confirm these suspicions. I would think there would be some way to specifically tell each compiler to simply not optimize out for loops in a particular function or file though.</p><hr><p><b>Downloads</b></p><p><a href="http://code.google.com/p/madara/source/browse/trunk/tests/test_reasoning_throughput.cpp">Solution</a>, which unfortunately can't get around -O3 optimization in g++ and Release mode in Visual Studio.</p>James Edmondsonhttp://www.blogger.com/profile/06768510571708099548noreply@blogger.com0