Learning Good Programming Practices for Scientific Code

Learning good programming practices is both important and difficult. Without formal training in computer science, it is also easy to get lost along the way.

Researchers spend an increasing amount of time writing, testing, debugging, installing, and maintaining software. These software-related tasks are so time-consuming that they can easily become a dominant factor in how long it takes to produce a new result.

Interestingly, there is a whole body of knowledge called software engineering, which, in a few words, deals with how to organize code and software projects in order to maximize the productivity of software developers.

Increasing our productivity in software development is quite desirable, but it also requires an initial time investment.

Publication-Driven-Development: Writing Horrible Code

When doing research, the typical programming workflow is to iteratively design, implement, and refine a sophisticated set of algorithms, which, most of the time, evolves into a horrible pile of undocumented hacks.

After some iterations of that process, we get the results we need for a publication. In the end, the software does what it was meant to do (produce a publication) and nothing more.

This process comes at a practical cost: adding new features or sharing some code with others becomes impossible. Therefore, when beginning a new project, we almost have to start from scratch. Each time.

At a certain point, when we search our old code for something we can rely on and find nothing worth using, we wish we had written reusable code instead of disposable code. We wish we had written reliable software for future use, instead of a big pile of rubbish that can only serve the purpose of producing that one publication.

I know that from first-hand experience. As an applied mathematician, writing sophisticated algorithms has been my job for more than a decade, but I didn’t have any formal education in software engineering. Much of what I know today about software development stems from my own experiences and mistakes, as well as from reading several great books.

A balance must be found, of course, between writing good code and writing code as fast as possible.

To be honest, in some cases writing disposable code as fast as possible, just to get a paper published, is a perfectly fine plan. However, there’s no reason not to follow at least some of the best practices, even in this case. This would be the approach of developing “barely sufficient” good practices.

Still, I believe one concept is important here: knowledge can sometimes work like compound interest. A small effort today can lead to a big gain tomorrow, while cutting corners to save some time today will probably come back to bite us at a greater cost later.

How Can a Researcher Learn Basic Software Engineering?

OK, I believe we have made the case for learning good programming practices. But is it possible to learn software engineering as a researcher, or are we condemned to be bad programmers?

I think that software engineering is just like any new knowledge we want to incorporate. And as researchers, we have this ability to incorporate new knowledge, right? We can learn software engineering over time, from talking to colleagues, from reading books and papers, and most importantly by learning from our own experiences.

Yet there is a major barrier to entry into software engineering for a non-CS researcher.

For a researcher, the newest tools and coolest CS trends come bundled with a few annoying characteristics:

Reading books about software engineering is, to some extent, an attempt to leverage the experience and insight of other developers. The problem is that they might have been working in a completely different environment.

Given this situation, learning best programming practices is hard. It takes years of practice, study, and learning from past errors.

Additionally, you just can’t learn best programming practices the same way you learn a cooking recipe. Blindly following a certain mantra “because it’s what the experts do” will result in writing even more horrible code than before the “knowledge” was acquired.

When trying to learn best programming practices, the know-why is just as important as the know-how, or maybe even more important, since good practices are not set in stone but depend on the situation.

Basic Guideline for Writing Scientific Code

What can we learn, then?

Ignoring all the client/specification references in the software engineering literature, I think the typical researcher should focus on the following basic points:

Unlearning bad programming practices

Certain bad programming practices might develop naturally as a consequence of using the following programming languages.

The Julia programming language

Julia, by solving the so-called two-language problem, also solves some of the most important issues mentioned above.

In Julia, you can try new ideas very quickly, and the code you write is not just a disposable script, since with little extra effort it can become “the real thing”.

Also, a nice feature of Julia is that it encourages a programming style based on composing many small functions, as opposed to writing highly abstract and massive code.
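
As a minimal sketch of that style, two tiny functions can be combined with Julia's function composition operator ∘. The names center, scale, and standardize are made up for this illustration and do not come from any package; only mean and std are real, from the Statistics standard library.

```julia
using Statistics  # standard library: provides mean and std

# Each function does one small thing and is easy to test on its own.
center(x) = x .- mean(x)   # subtract the mean from every element
scale(x)  = x ./ std(x)    # divide every element by the sample standard deviation

# Compose them: (scale ∘ center)(x) is the same as scale(center(x)).
standardize = scale ∘ center

data = [2.0, 4.0, 6.0, 8.0]
z = standardize(data)      # zero mean, unit sample standard deviation
```

Because center and scale are separate, either one can be reused or replaced (for example, by a more robust scaling) without touching the other.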

Conclusion

Learning good programming practices without thorough training in computer science can be challenging, especially for researchers who constantly face publication or other deadlines by which the code must simply run.

I will continue covering these topics on this website, so stay tuned!

References

Here are some good papers about this, by the way: