Concurrency defects in multi-core or multi-threaded applications are probably the source of more troublesome problems than other defects in recent history, Coverity’s Mark Donsky told me recently. These defects are tricky, because they are “virtually impossible to reduce reliably,” he said, and “can take months of painstaking effort to reproduce and fix using traditional testing methodologies.”
Donsky filled me in on some of the most common concurrency defects and spotlighted the three that are currently causing the most problems: race conditions, deadlocks and thread blocks.
- Race conditions describe what happens when multiple threads access shared data without appropriate locks. “When a race condition occurs, one thread may inadvertently overwrite data used by another thread,” he said. “This results in data loss and corruption.”
- A deadlock can occur when two or more two or more threads are each waiting for each other to release a resource. “Some of the most frustrating deadlocks involve three, four or even more threads in a circular dependency,” said Donsky. “As with race conditions, deadlocks are virtually impossible to reproduce and can delay product releases by months as engineering teams scramble to isolate the cause of a deadlock.”
- A thread block happens when a thread invokes a long-running operation while holding a lock that other threads are waiting for. Although the first thread will continue executing, all other threads are blocked until the long-running operation completes. Consequently, Donsky explained, “a thread block can bring many aspects of a multi-threaded application down to a grinding halt.”
Concurrency defects showed up very often in last year’s Coverity Open Source Report, an analysis of over 50 million lines of open source code, and continue to pose a big problem, said Donsky.
Other common, crash-causing defects include null pointer dereferencing, resource leaks, buffer overruns and unsafe use of returned null values, according to Donsky, director of project management for San Francisco-based Coverity, maker of software integrity products.
Coverity’s open source security Scan site has been catching some headline-making defects recently, inclulding the 0day Local Linux root exploit. “As part of the Scan program, we reported this issue back to key Linux developers so that they could respond to this vulnerability,” said Donsky.
Catching software defects before they go into production, said Donsky, is the best way for your software not to make the wrong kind of front-page news.