10-14-2010, 10:36 AM
Computational science: ...Error

…why scientific programming does not compute.

Zeeya Merali

When hackers leaked thousands of e-mails from the Climatic Research Unit (CRU) at the University of East Anglia in Norwich, UK, last year, global-warming sceptics pored over the documents for signs that researchers had manipulated data. No such evidence emerged, but the e-mails did reveal another problem — one described by a CRU employee named "Harry", who often wrote of his wrestling matches with wonky computer software.

"Yup, my awful programming strikes again," Harry lamented in one of his notes, as he attempted to correct a code analysing weather-station data from Mexico.

Although Harry's frustrations did not ultimately compromise CRU's work, his difficulties will strike a chord with scientists in a wide range of disciplines who do a large amount of coding. Researchers are spending more and more time writing computer software to model biological structures, simulate the early evolution of the Universe and analyse past climate data, among other topics. But programming experts have little faith that most scientists are up to the task.

A quarter of a century ago, most of the computing work done by scientists was relatively straightforward. But as computers and programming tools have grown more complex, scientists have hit a "steep learning curve", says James Hack, director of the US National Center for Computational Sciences at Oak Ridge National Laboratory in Tennessee. "The level of effort and skills needed to keep up aren't in the wheelhouse of the average scientist."

As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software, say computer scientists. At best, poorly written programs cause researchers such as Harry to waste valuable time and energy. But the coding problems can sometimes cause substantial harm, and have forced some scientists to retract papers.

As recognition of these issues has grown, software experts and scientists have started exploring ways to improve the codes used in science. Some efforts teach researchers important programming skills, whereas others encourage collaboration between scientists and software engineers, and teach researchers to be more open about their code.

A proper education

Greg Wilson, a computer scientist in Toronto, Canada, who heads Software Carpentry — an online course aimed at improving the computing skills of scientists — says that he woke up to the problem in the 1980s, when he was working at a physics supercomputing facility at the University of Edinburgh, UK. After a series of small mishaps, he realized that, without formal training in programming, it was easy for scientists trying to address some of the Universe's biggest questions to inadvertently introduce errors into their codes, potentially "doing more harm than good".

After decades griping about the poor coding skills of scientists he knew, Wilson decided to see how widespread the problem was. In 2008, he and his colleagues conducted an online survey of almost 2,000 researchers, from students to senior academics, who were working with computers in a range of sciences. What he found was worse than he had anticipated [1] (see 'Scientists and their software'). "There are terrifying statistics showing that almost all of what scientists know about coding is self-taught," says Wilson. "They just don't know how bad they are."

As a result, codes may be riddled with tiny errors that do not cause the program to break down, but may drastically change the scientific results that it spits out. One such error tripped up a structural-biology group led by Geoffrey Chang of the Scripps Research Institute in La Jolla, California. In 2006, the team realized that a computer program supplied by another lab had flipped a minus sign, which in turn reversed two columns of input data, causing protein crystal structures that the group had derived to be inverted. Chang says that the other lab provided the code with the best intentions, and "you just trust the code to do the right job". His group was forced to retract five papers published in Science, the Journal of Molecular Biology and Proceedings of the National Academy of Sciences, and now triple checks everything, he says.
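
To make that kind of silent error concrete, here is a minimal Python sketch. It is not a reconstruction of the program Chang's group used; the function names and the four-point "structure" are made up for illustration. It assumes the bug amounts to a stray minus sign on one coordinate, which mirrors the structure without crashing anything, and shows how a small handedness check against a known reference could catch it.

```python
# Hypothetical illustration only; not the code from any lab named in the article.

def signed_volume(a, b, c, d):
    """Six times the signed volume of tetrahedron a-b-c-d.

    The sign encodes handedness: mirroring the points flips it.
    """
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    # Scalar triple product u . (v x w), expanded along u.
    return (
        u[0] * (v[1] * w[2] - v[2] * w[1])
        - u[1] * (v[0] * w[2] - v[2] * w[0])
        + u[2] * (v[0] * w[1] - v[1] * w[0])
    )


def flip_x(point):
    """Simulate the assumed bug: one coordinate picks up a stray minus sign."""
    x, y, z = point
    return (-x, y, z)


def same_handedness(points_a, points_b):
    """True if both four-point sets have the same (non-zero) handedness."""
    return signed_volume(*points_a) * signed_volume(*points_b) > 0


# A made-up reference "structure" with known handedness.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# The buggy pipeline runs to completion and returns plausible-looking numbers.
mirrored = [flip_x(p) for p in reference]

print(same_handedness(reference, reference))  # correct pipeline passes the check
print(same_handedness(reference, mirrored))   # sign-flipped pipeline fails it
```

The mirrored points keep every pairwise distance of the original, which is why a bug like this would not show up in most sanity checks. Only a test that compares against a case with a known answer notices the inversion.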

"How many fields have been held back, and how many people have had their careers disrupted, because of a buggy program?" asks Wilson.

Much, much more at the link.

Nature (http://www.nature.com/news/2010/101013/full/467775a.html)

10-14-2010, 12:00 PM
I didn't read the whole article, but the problems it's outlining in the pasted bits are exactly why we're seeing cross-disciplinary majors pop up, like bioinformatics, that train students heavily in both computer science and a natural science. There are also computer-science-heavy earth science and climatology majors cropping up all over the place.

I came really close to going to grad school for bioinformatics, which is computer science as applied to molecular biology... and may still one day.