Friday, September 21, 2012

Chi-squared distribution table with sigma values

The table below gives values of the χ²-distribution at key confidence levels (equivalently, Bayesian credible intervals) and sigma values (σ, standard deviations). I could only find tables and calculators indexed by p-value online. Shown are the commonly used 1σ, 2σ, and 3σ values along with typical confidence levels such as 90%, 99% and 99.9%. The parameter k is the number of degrees of freedom.

Sigma        1σ     1.28   1.64   1.96   2σ     2.58   3σ     3.29   4σ
CI %         68.3%  80%    90%    95%    95.45% 99%    99.73% 99.9%  99.99%
P-value      0.317  0.20   0.10   0.05   0.0455 0.01   0.0027 0.001  0.00006
chi2(k=1)    1.00   1.64   2.71   3.84   4.00   6.63   9.00   10.83  16.00
chi2(k=2)    2.30   3.22   4.61   5.99   6.18   9.21   11.83  13.82  19.33
chi2(k=3)    3.53   4.64   6.25   7.81   8.02   11.34  14.16  16.27  22.06
chi2(k=4)    4.72   5.99   7.78   9.49   9.72   13.28  16.25  18.47  24.50
chi2(k=5)    5.89   7.29   9.24   11.07  11.31  15.09  18.21  20.52  26.77
chi2(k=6)    7.04   8.56   10.64  12.59  12.85  16.81  20.06  22.46  28.91
chi2(k=7)    8.18   9.80   12.02  14.07  14.34  18.48  21.85  24.32  30.96
chi2(k=8)    9.30   11.03  13.36  15.51  15.79  20.09  23.57  26.12  32.93
chi2(k=9)    10.42  12.24  14.68  16.92  17.21  21.67  25.26  27.88  34.85
chi2(k=10)   11.54  13.44  15.99  18.31  18.61  23.21  26.90  29.59  36.72

The Python script used to generate this table is below. It uses the percent point function (ppf) and cumulative distribution function (cdf) of scipy.stats.chi2 from the SciPy library:

#!/usr/bin/env python3
import math
import scipy.stats

# standard deviations to calculate
sigma = [1.0,
         math.sqrt(scipy.stats.chi2.ppf(0.8, 1)),
         math.sqrt(scipy.stats.chi2.ppf(0.9, 1)),
         math.sqrt(scipy.stats.chi2.ppf(0.95, 1)),
         2.0,
         math.sqrt(scipy.stats.chi2.ppf(0.99, 1)),
         3.0,
         math.sqrt(scipy.stats.chi2.ppf(0.999, 1)),
         4.0]

# confidence intervals these sigmas represent
conf_int = [scipy.stats.chi2.cdf(s**2, 1) for s in sigma]

# degrees of freedom to calculate
dof = range(1, 11)

print("sigma     \t" + "\t".join("%1.2f" % s for s in sigma))
print("conf_int  \t" + "\t".join("%1.2f%%" % (100 * ci) for ci in conf_int))
print("p-value   \t" + "\t".join("%1.5f" % (1 - ci) for ci in conf_int))

for d in dof:
    chi_squared = [scipy.stats.chi2.ppf(ci, d) for ci in conf_int]
    print("chi2(k=%d)\t" % d + "\t".join("%1.2f" % c for c in chi_squared))
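As a quick sanity check that needs nothing beyond the standard library, the k=1 row follows directly from the normal distribution: the two-sided coverage of ±s standard deviations is erf(s/√2), which equals chi2.cdf(s², 1). A minimal sketch:

```python
import math

def ci_from_sigma(s):
    # Two-sided coverage of +/- s standard deviations of a normal
    # distribution; identical to scipy.stats.chi2.cdf(s**2, df=1).
    return math.erf(s / math.sqrt(2.0))

for s in (1.0, 2.0, 3.0):
    print("%.0f sigma -> %.4f%%" % (s, 100.0 * ci_from_sigma(s)))
```

This prints 68.2689%, 95.4500% and 99.7300%, matching the 1σ, 2σ and 3σ columns in the table above.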

Monday, March 5, 2012

Debugging memory explosions with GDB

When writing large, complex, multithreaded software it's inevitable that bugs will happen. Valgrind is a great tool for finding memory leaks, but it's not much use when a bug causes a malloc "explosion" and the OS responds by killing the process. Such a "malloc bomb" is especially difficult to find when it happens randomly after running the code for many hours (which would be days under Valgrind's overheads).

Synopsis of a malloc() bomb:
  • run the compiled program inside a debugger, e.g. `gdb yourprogramname`.
  • after the program has run normally for many hours the bug occurs and memory usage spikes: the buggy program tries to allocate more heap memory than is available.
  • the OS starts paging memory to swap, the hard disk thrashes, and the system freezes and becomes unresponsive.
  • you can't get keyboard focus on the console to hit Ctrl+C, so you can't break into your code inside the debugger.
  • eventually the OS kills the program and, after more disk thrashing, keyboard control returns to the debugger, which reports "Abnormal program exit".
  • the program is killed with no opportunity to produce a core dump or break into the debugger.

Even if you were sitting at the gdb console waiting for this to occur, you likely wouldn't catch it before the system stopped responding to Ctrl+C. Unfortunately gdb and other debuggers don't allow breakpoints on total memory usage.

Solution: One solution that works very nicely on Ubuntu/Linux is a simple bash script that loops checking memory usage. First run your program at a console inside the debugger `gdb yourprogramname`, and then run/paste this script into another console:
#!/bin/bash
# set the process name here (make sure there is only one instance running)
NAME=yourprogramname
# set the process memory threshold limit in KiB
MEMLIMIT=2000000
PID=`pgrep $NAME`
while true
do
    MEM=`awk '/^Rss:/ {sum += $2} END {print sum}' /proc/$PID/smaps`
    echo "Memory usage: $MEM"
    if [ $MEM -gt $MEMLIMIT ]; then echo "***sending sigint***"; kill -INT $PID; fi
    sleep 1
done

Make sure only one instance of your program is running, and that it is inside the debugger. The script first finds the PID of your running program and then loops forever, checking how much memory it is using. If total memory usage passes a predefined threshold (2GiB in the script above) the script sends SIGINT to your process. This has the same effect as hitting Ctrl+C in the debugger, but the script is far more likely to deliver the SIGINT before the disk starts thrashing and the system becomes unresponsive. The repeated SIGINTs are ignored.
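If you prefer Python, the same watchdog can be sketched there too; the smaps parsing is split into its own function, and the process PID and threshold are placeholders just like in the bash version:

```python
import os
import signal
import time

MEMLIMIT_KIB = 2000000  # ~2 GiB, same default as the bash script

def parse_rss_kib(smaps_text):
    # Sum the Rss fields (reported in KiB) from /proc/<pid>/smaps
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Rss:"):
            total += int(line.split()[1])
    return total

def watch(pid, limit_kib=MEMLIMIT_KIB):
    while True:
        with open("/proc/%d/smaps" % pid) as f:
            mem = parse_rss_kib(f.read())
        print("Memory usage:", mem)
        if mem > limit_kib:
            print("***sending sigint***")
            os.kill(pid, signal.SIGINT)  # same effect as Ctrl+C in gdb
        time.sleep(1)
```

Call `watch(pid)` with the PID reported by `pgrep yourprogramname`.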

Example: (returning to the computer after several hours of execution)
Memory usage: 1047948
Memory usage: 1049996
Memory usage: 1049996
Memory usage: 1049812
Memory usage: 1893104
Memory usage: 2692788
***sending sigint***
Memory usage: 2695596
***sending sigint***
Memory usage: 2695596
***sending sigint***
Memory usage: 2695596
***sending sigint***
Memory usage: 3116488
***sending sigint***

The program is now frozen with all 3.1GB of memory still allocated and the debugger waiting in the other console window. This particular OpenGL/Qt-based app had 4 worker threads and finding the cause of this particular bug took some time. I've highlighted the GDB commands that narrowed the problem down:

Normal program output...
...
Program received signal SIGINT, Interrupt.
[Switching to Thread 0x7ffff7f9a9e0 (LWP 8264)]
0x00007ffff3f11773 in __GI___poll (fds=, nfds=, timeout=) at ../sysdeps/unix/sysv/linux/poll.c:87
87 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
in ../sysdeps/unix/sysv/linux/poll.c
(gdb) info threads
Id Target Id Frame
4 Thread 0x7fffdd557700 (LWP 8269) "mapbuilder" __int64_t_decode_array (_buf=0x2ca8e130, offset=469, maxlen=82, p=0x7fff4cdfbc38, elements=1) at /usr/local/include/lcm/lcm_coretypes.h:278
3 Thread 0x7fffddd58700 (LWP 8268) "mapbuilder" 0x00007ffff645aa1b in ?? () from /usr/lib/nvidia-current/libGL.so.1
2 Thread 0x7fffde559700 (LWP 8267) "mapbuilder" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
* 1 Thread 0x7ffff7f9a9e0 (LWP 8264) "mapbuilder" 0x00007ffff3f11773 in __GI___poll (fds=, nfds=, timeout=) at ../sysdeps/unix/sysv/linux/poll.c:87
(gdb) thread 4
[Switching to thread 4 (Thread 0x7fffdd557700 (LWP 8269))]
#0 __int64_t_decode_array (_buf=0x2ca8e130, offset=469, maxlen=82, p=0x7fff4cdfbc38, elements=1) at /usr/local/include/lcm/lcm_coretypes.h:278
278 if (maxlen < total_size)
(gdb) bt
#0 __int64_t_decode_array (_buf=0x2ca8e130, offset=469, maxlen=82, p=0x7fff4cdfbc38, elements=1) at /usr/local/include/lcm/lcm_coretypes.h:278
#1 0x0000000000444db6 in __double_decode_array (elements=1, p=0x7fff4cdfbc38, maxlen=82, offset=469, _buf=0x2ca8e130) at /usr/local/include/lcm/lcm_coretypes.h:345
#2 gps_t::_decodeNoHash (this=0x7fff4cdfbc08, buf=0x2ca8e130, offset=421, maxlen=130) at /home/rob/mapbuilder/include/gps_t.hpp:160
#3 0x0000000000442805 in _decodeNoHash (maxlen=543, offset=8, buf=0x2ca8e130, this=0x7fff4cdfbbb0) at /home/rob/mapbuilder/include/robot_map_data_t.hpp:116
#4 decode (maxlen=543, offset=0, buf=0x2ca8e130, this=0x7fff4cdfbbb0) at /home/rob/mapbuilder/include/robot_map_data_t.hpp:62
#5 LogPlayThread::run (this=0x97ed60) at /home/rob/mapbuilder/src/logthread.cpp:419
#6 0x00007ffff69f0d05 in ?? () from /usr/lib/x86_64-linux-gnu/libQtCore.so.4
#7 0x00007ffff2f2cefc in start_thread (arg=0x7fffdd557700) at pthread_create.c:304
#8 0x00007ffff3f1d89d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#9 0x0000000000000000 in ?? ()

I stepped the execution out of the LCM library and back to my code (#5 in the backtrace). By setting a breakpoint there I was able to find the bug after a few loops. Let me know if this helps you out, it certainly saved me hours staring at the console waiting!

Friday, November 25, 2011

Capturing USB2 video at 35MB/sec on Linux

Problem: many USB cameras require the full bandwidth of a USB 2.0 port without any interruptions. Unfortunately, the kernel power-saving features in modern Linux distributions periodically switch the processor into sleep states that disrupt USB data transfer and cause frames to be dropped. As an example, our uEye 25fps 1.3Mpix machine-vision cameras will often drop 50% of frames, which is not ideal for an expensive camera. The problem lies with the operating system, not the hardware, and has been documented for the uEye cameras by IDS. The same problem happens with the Kinect RGBD sensor, with 10-50% of frames frequently being lost.

We have observed this problem with Intel processors including Core 2, Core i5 and i7, and running on Ubuntu 9.10, 10.04, 10.10, 11.04. It would likely occur on any Linux distribution with kernel 2.6+.  The different sleep "C-states" on Intel's processors are discussed here. It might seem counter-intuitive, however a faster PC is likely to drop more video frames.

Verify: One way to verify if the C-states are causing the issue is to heavily load the CPU so it doesn't enter any low power modes. Start capturing video data and then run something CPU intensive, eg. `cat /dev/urandom | gzip > /dev/null` will load one or two CPU cores. If you have a 4 or 8 core processor you'll want to run this multiple times. If the video capture performance improves, this is your problem.
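As an alternative to running the gzip one-liner several times, a throwaway Python load generator can pin every core at once; the 60-second default duration here is arbitrary:

```python
import multiprocessing
import time

def burn(seconds):
    # Spin one core flat out so it never drops into a deep C-state
    end = time.time() + seconds
    count = 0
    while time.time() < end:
        count += 1
    return count

def load_all_cores(seconds=60):
    # Start one busy-loop process per core and wait for them to finish
    procs = [multiprocessing.Process(target=burn, args=(seconds,))
             for _ in range(multiprocessing.cpu_count())]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Run `load_all_cores()` while capturing video; if the drop rate falls while it runs, the C-states are your problem.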

You can check what C-states your processor is entering using Intel's powertop command line tool (`sudo apt-get install powertop` and run `powertop`).

Some BIOSes allow you to set the maximum C-state, but the operating system (Windows and Linux alike) appears to ignore this setting. On Ubuntu the intel_idle kernel module must be told not to enter any C-states deeper than 0 (or 1). However, the module is loaded at boot and, despite exposing a /sys interface, the setting cannot be changed afterwards:
echo 0 > /sys/module/intel_idle/parameters/max_cstate
bash: /sys/module/intel_idle/parameters/max_cstate: Permission denied


Solution: on Ubuntu you need to pass the max_cstate level as a parameter during boot. This is easily done by editing the Grub config file /etc/default/grub. Add or edit the GRUB_CMDLINE_LINUX line to say:

GRUB_CMDLINE_LINUX="processor.max_cstate=0 intel_idle.max_cstate=0"
or if you'd like to try and retain some power savings:

GRUB_CMDLINE_LINUX="processor.max_cstate=1 intel_idle.max_cstate=1"
Update your Grub menu with `sudo update-grub` and reboot.

After booting confirm the C-states have been disabled by running `powertop` again. It will either indicate the processor isn't entering the states greater than zero, or not show any C-state information. Unfortunately this will increase your PC's power usage, however that's not likely a problem if you're processing live video at 35MB/sec!
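Besides powertop, you can confirm the flags actually reached the kernel by reading /proc/cmdline after booting. A small sketch (the parsing helper is my own, not part of any tool):

```python
import os

def max_cstates(cmdline):
    # Pull the *.max_cstate=N settings out of a kernel command line string
    limits = {}
    for token in cmdline.split():
        if ".max_cstate=" in token:
            key, _, value = token.partition("=")
            limits[key] = int(value)
    return limits

if os.path.exists("/proc/cmdline"):
    with open("/proc/cmdline") as f:
        print(max_cstates(f.read()))
```

With the Grub line above in effect this should report both `processor.max_cstate` and `intel_idle.max_cstate` set to the level you chose.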

Friday, April 15, 2011

Large-scale Multi-robot Mapping in MAGIC 2010

5th IEEE Conference on Robotics, Automation and Mechatronics (RAM 2011)

Authors: Robert Reid, Thomas Bräunl

Abstract: We describe a large-scale decentralised multi-robot mapping system that outputs globally optimised metric maps in real-time. The mapping system was used by team WAMbot in the finals of the Multi-Autonomous Ground-robotics International Challenge (MAGIC 2010). Research contributions include a novel large-scale multi-robot graph-based non-linear map optimisation approach, a hybrid decentralised and distributed mapping system and novel graphics processing unit (GPU) based approaches for accelerating intensive map matching and fusion operations. Our mapping system scales linearly with map size and on commodity hardware can easily map a 500m×500m urban area. We demonstrate robust, highly efficient and accurate mapping results from two different fleets of mobile robots. Videos, maps and timing results from the MAGIC 2010 challenge are presented.


Index Terms: Large-scale multi-robot mapping, simultaneous localisation & mapping (SLAM), graph-based SLAM, distributed, decentralised, GPU map fusion, GPU map correlation.
 BibTeX

Videos: These videos present the output of the Mapbuilder system and play at 15× real-time speed. They are best viewed full-screen and at 1920×1080 (click the full-screen icon in the lower right of the video and also select 1080p). The global occupancy maps are a top-down view showing obstacles, free-space and unknown areas in the environment. The submap spatial constraints are shown as blue lines. We note that tunable parameters in the mapping system were not changed when moving between environments.


MAGIC 2010 Challenge Phase 2: This video shows 7 UGVs exploring an agricultural show-ground, passing in and out of horse stables. Figure 5 in the conference paper shows the final map overlaid on aerial imagery. The 200m × 160m map is metrically accurate and correctly georeferenced by the intermittent and noisy GPS data. To quantify the RMS error in the final map we selected a set of evenly distributed features in the map and measured their linear error to the aerial image. The estimated mean linear error was 0.57m. The final map has 446 submaps, 419 ground-truth constraints and 1284 submap constraints. For the final map, each execution cycle took 21ms to incrementally optimise the entire pose-graph, and 87ms to build and output the global occupancy map. The initial submaps take some time to become connected due to occlusions and ambiguous geometry in the matching process; however, after driving several meters each UGV correctly joins its current submap to the global graph. Slow drifts in submaps are observed as transient GPS noise is filtered into ground-truth data. If additional ground-truth constraints had been supplied by the operator, spatial errors could have been reduced by an order of magnitude.


MAGIC 2010 Old Ram Shed Challenge: This video shows 5 UGVs exploring a 70m × 40m agricultural shed. Again the submaps take some time to become connected; however, once the UGVs begin moving the map is rapidly constrained and is metrically accurate. Note that barriers inside the shed were temporary and very few walls and angles were regular. No ground-truth was available to evaluate the accuracy of this map. The final map has 260 submaps and 2170 submap constraints. For the complete map, each execution cycle took 23ms to incrementally optimise the pose graph and 85ms to build and output the global occupancy map.