Monday, March 14, 2011

LaTeX in a Nutshell (#1)

Introduction

Writing research papers in industry tends to involve one of two text formats: Word and LaTeX. From my experience, Word still dominates the creation of technical reports, papers, etc., but LaTeX was essentially written for programmers and technical researchers who want an extensible, programmable paper format. This blog entry groups together examples and descriptions of LaTeX features, and hopefully the series will appeal to beginner and intermediate LaTeX users.

Starting Out

Like C++ or Java, you are probably going to break your main project into pieces. If you are collaborating with other authors or researchers, this is especially important, as it allows each of you to work on and revise different sections of the paper at the same time without conflicts! Importing other LaTeX files is easy. Let's start with a simple example of a paper with three sections: abstract, solution, and experiments. We use the ACM Conference Proceeding document class to format the document for submission to an ACM conference. Be sure to download the .cls files into the same directory as your document!

\documentclass{acm_proc_article-sp} 

% package includes.
% For now, including graphicx for images is enough
\usepackage{graphicx}

% begin document signals the beginning of rendering
% anything before this point is just metadata or package includes
\begin{document}

% title and author information. Note how we specify
% two different authors (James Edmondson and John Smith)
\title{Research Paper}
\numberofauthors{2}
\author{
  \alignauthor James Edmondson\\
    \affaddr{Vanderbilt University}\\
    \email{james.r.edmondson@vanderbilt.edu}
  \alignauthor John Smith\\
    \affaddr{Vanderbilt University}\\
    \email{john.q.smith@vanderbilt.edu}
}

% Render the author and title information first
\maketitle

% These are macros that we can use in tables to provide
% extra space at the top (\T) when under a horizontal line
% and at the bottom (\B) when on top of a horizontal line
\newcommand\T{\rule{0pt}{2.6ex}}
\newcommand\B{\rule[-1.2ex]{0pt}{0pt}}
 
% include the three sections
\input{abstract}
\input{solution}
\input{experiments}

% include the bibliography
\bibliographystyle{abbrv}
\bibliography{master}
\end{document}

This main file, which is frequently named after the targeted conference, might be called technicalreport.tex. The three included files would need to be called abstract.tex, solution.tex, and experiments.tex. At the end of the file, we include a bibliography called master.bib.

The three input files can each be as little as a paragraph, if you like, but the master.bib file must follow the BibTeX format. One of the best parts about using LaTeX is that most research portals like ACM, IEEE, and CiteSeer provide BibTeX entries for you to copy directly into your master.bib file. For instance, here is a complete listing of papers by E.W. Dijkstra. Another great thing about bibliographies in LaTeX (handled by BibTeX) is that when you change the documentclass and the bibliographystyle, the bibliography is automatically reformatted to the specification of your target conference or journal. Anyone who has had to do this by hand in a word processor knows how difficult it can be.
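
As a quick, made-up illustration (the key and fields below are placeholders, not a real portal export), an entry in master.bib looks like this:

@inproceedings{edmondson11sample,
  author    = {James Edmondson and John Smith},
  title     = {A Placeholder Paper Title},
  booktitle = {Proceedings of Some ACM Conference},
  year      = {2011},
  pages     = {1--10}
}

Citing it from abstract.tex or any other included file is then just \cite{edmondson11sample}, and BibTeX handles the numbering and formatting for the chosen style.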

Another nice feature of BibTeX is that it only prints entries for papers or journals that are actually cited in the paper. In this way, you can build a master bibliography file containing every paper you have ever read and reuse it across all of your papers without any problems.

Notes

Try to avoid using underscores (_) or percent signs (%) in your bibliography. If you do use them, make sure you escape each character with a backslash (\).
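
For example, a title field containing both characters would need escaping along these lines (a made-up field, just to show the backslashes):

title = {Throughput of foo\_bar under 50\% load}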

Links for Further Reading

Check out these links if you want to find out specifics that may not be covered in this blog series.


Sunday, March 6, 2011

Undocumented ACE_OS::sleep caveats

For those in need of sleep in microseconds, understand that Windows provides no such mechanism.


Intro

Recently, I needed a method for setting a publication rate in hertz on a publisher that would work in both Linux and Windows. The publication rate should be able to go up to at least 1 MHz, which requires a sleep mechanism capable of 1,000,000,000 ns / 1,000,000 == 1,000 ns of precision. Consequently, the sleep would be required to function at the microsecond level.


Tools and methodologies

I decided to stick with the ACE library and specifically use the ACE_OS::sleep(const ACE_Time_Value &) call. On the surface, this should allow us to sleep for microseconds, and it does - with one small caveat: the operating system needs to have a sleep mechanism that is capable of actual us (microsecond) precision.
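
A minimal sketch of that usage, assuming a 10 kHz publication rate just for the sake of the arithmetic:

#include "ace/OS_NS_unistd.h"   // ACE_OS::sleep
#include "ace/Time_Value.h"     // ACE_Time_Value

int main (void)
{
  // one period of a 10 kHz publication rate:
  // 1,000,000 us per second / 10,000 Hz == 100 us per period
  ACE_Time_Value period (0, 1000000 / 10000);
  ACE_OS::sleep (period);
  return 0;
}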


Problems

In WIN32 mode, the ACE_OS::sleep call uses the ::Sleep method provided by the Windows operating system. Unfortunately, ::Sleep only works at millisecond precision. This means that you either blast (e.g. no sleep statement at all), or you can specify a hertz rate of <= 1 kHz (1 ms of sleep).



Solutions

One potential solution is bursting events and then sleeping for 1 ms. The trick is to work out a bursting pattern that uses the 1 ms sleep to account for all of the microseconds of sleep that should have occurred over that period. This isn't modeling exactly what you want, but the alternative is to allow only bursting or rates <= 1 kHz. In other words, there is no beautiful, portable solution to this that isn't going to cause stress on whatever you are trying to test (bursting is always a worst case for any software library).
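
Roughly, the workaround looks like the following sketch, where publish () and running are hypothetical stand-ins for your own send call and loop condition:

#include "ace/OS_NS_unistd.h"   // ACE_OS::sleep
#include "ace/Time_Value.h"     // ACE_Time_Value

// hypothetical stand-ins for the real send call and loop condition
void publish (void);
extern bool running;

void burst_at (unsigned long hertz)
{
  // send one millisecond's worth of events back to back, then sleep 1 ms,
  // so the average rate matches the target even though Windows cannot
  // sleep for less than a millisecond
  const unsigned long events_per_ms = hertz / 1000;

  while (running)
  {
    for (unsigned long i = 0; i < events_per_ms; ++i)
      publish ();
    ACE_OS::sleep (ACE_Time_Value (0, 1000));   // 1 ms, the Windows floor
  }
}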



Downloads

KaRL Dissemination Test - Tuned to burst mode on Windows and simply sleep for microseconds on POSIX.

Saturday, March 5, 2011

For loops just aren't what they used to be

Sometimes, compilers are too damned good at optimization.

Intro

My PhD dissertation currently centers around a knowledge and reasoning engine and middleware called KaRL, part of my Madara toolsuite. In a recent paper, I wanted to do some performance testing of the KaRL distributed reasoner, and so I attacked the testing from three vectors: reasoning throughput (the number of rules per second the engine could perform without distributed knowledge), dissemination throughput (the number of rules per second sent over the wire in a LAN), and dissemination latency.

To make things more interesting, I decided to form a baseline for reasoning throughput. How about optimal C++ performance with a for loop and pre-increments (e.g. ++var)? Oh, and it needs to be portable across Windows and Linux. Easy enough, right?


Problems, Solutions, and More Problems

The first problem on the docket was one of timer precision. I decided to go with ACE_High_Res_Timer after some unsuccessful and highly error-prone usage of the underlying gethrtime. The High_Res_Timer class also corrects for global scale factor issues in the values returned by QueryPerformanceCounter(). So far, so good.
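
A rough sketch of that kind of baseline measurement (var, iterations, and the count are arbitrary placeholders, not the exact test code):

#include "ace/High_Res_Timer.h"

int main (void)
{
  ACE_High_Res_Timer timer;
  ACE_hrtime_t elapsed_ns = 0;
  unsigned long var = 0;
  const unsigned long iterations = 100000000UL;  // arbitrary count

  timer.start ();
  for (unsigned long i = 0; i < iterations; ++i)
    ++var;                                       // the loop in question
  timer.stop ();
  timer.elapsed_time (elapsed_ns);               // result in nanoseconds

  return 0;
}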

The results on my Linux and Windows machines were right in line with what I expected. Through function inlining, expression tree caching, and various other mechanisms, we are able to efficiently parse KaRL logics at greater than 1 MHz. However, when I started comparing against my supposed baseline, I discovered that the ACE_High_Res_Timer was reporting that the optimized C++ for loop of ++var was performing at an amazing 60 GHz to over 1 THz... on a 2.5 GHz processor.

What the heck was going on here?

It turns out that modern C++ compilers will completely optimize out for loops if they can. My specific issue, which remains unsolved in a portable manner, was a for loop with a simple accumulator (var) that is incremented a certain number of times. I had started a timer before the for loop and stopped it after the loop was over, but the assembly generated from the C++ programs contained no for loop in the function at all. In fact, the compilers simply moved the final value that the loop would have produced into var. The timer was effectively reporting the time it took to query the system for the nanosecond-precision timers, since the couple of assembly instructions that remained were not enough to amount to any nanoseconds at all.


Remarks on Known Solutions

In Visual Studio, I was able to circumvent the issue in two ways: first, by using __asm { nop }, which effectively inserts a no-op (an exchange of eax with itself), and second, by using volatile, which means the compiler is not able to optimize accesses to the variable at all and can't fully take advantage of registers.

In g++, unfortunately, I was only able to use volatile, which means that if I want to test the actual loop, I have to take away every other optimization that the compiler might be able to do on that variable.

Using volatile turns out to be the only portable thing I could think of. Internet searching seemed to confirm these suspicions. I would think there would be some way to specifically tell each compiler to simply not optimize out for loops in a particular function or file though.
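
For reference, a sketch of the volatile workaround, reusing the timer and iterations names from the earlier baseline sketch:

// same timing loop, but with the accumulator declared volatile so the
// compiler must keep the loop and perform a real increment each iteration
volatile unsigned long var = 0;

timer.start ();
for (unsigned long i = 0; i < iterations; ++i)
  ++var;
timer.stop ();

// Visual Studio-only alternative: keep var non-volatile and place
// __asm { nop } inside the loop body so it cannot be folded away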


Downloads

Solution, which unfortunately can't get around -O3 optimization in g++ and Release mode in Visual Studio.