Don’t rewrite from scratch: aim for gradual progress
People faced with legacy Fortran code are often tempted to attempt a complete rewrite in their language of choice.
Whatever you do, don’t attempt to do that. It would be a serious mistake. Often the Fortran code contains years, if not decades, of specific domain knowledge, which would be nearly impossible to recreate in a short amount of time.
This temptation is not specific to Fortran. Software engineers in other fields are periodically tempted to rewrite everything as well. For a related discussion of why this is generally a bad idea, a good read is the following blog post by Joel Spolsky: Things You Should Never Do, Part I.
Fortran code written decades ago is probably messy by today’s standards, and not up to date with modern software engineering best practices. The codebase is likely full of implicit domain knowledge. Precisely for this reason, a careless developer could spend a lifetime attempting to modernize a scientific codebase and add nothing but bugs. To make things worse, they wouldn’t even notice, because they lack the domain expertise.
Another thing to keep in mind is that Fortran is designed for extreme backwards compatibility. Code that has worked fine for the last 30 years will very likely continue to work for another 30 years.
When modifying Fortran code, one should be extra careful not to introduce new bugs. The goal should be to gradually improve the code as you work on it, rather than to refactor for its own sake.
At this stage, unit testing is usually not going to be possible, so system-level tests will be the only alternative. Careful characterization testing and inquiry about edge cases should be on the agenda as well.
Make system-level tests, and refactor the I/O into an API
The first step should probably be to make the functionality provided by the Fortran codebase callable from another language, either one that offers additional functionality or simply one in which more present-day developers are well versed.
Typically, Fortran code written many decades ago by scientists uses text files for its I/O. This creates a challenge for wrapping Fortran code into Python (either when using f2py or a more direct approach based on ctypes, as discussed in my post about calling Fortran from Python), or C++, languages which are much more popular among developers.
So in order to create an API which is more easily accessible from the outside world, we either need to parse text files (which could result in a substantial performance penalty) or modify the Fortran code slightly to obtain a suitable API.
To do this, write some setter and getter routines in Fortran. Add a new API.f90 file that implements the set_input and get_output methods, and make sure these are interoperable with C.
Your external call will look something like this:
set_input(a,b,c); fortran_main(); get_output(x,y,z);
This will work even in the presence of COMMON blocks, a technique for storing global variables which was popular in the 80’s.
Make sure to test your system-level API with the data you produced in the previous step, and include relevant edge cases, to be on the safe side.
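As a rough illustration of the setter/getter approach, here is a minimal sketch. It assumes a hypothetical legacy entry point called legacy_main that keeps its inputs and outputs in a COMMON block named state; the block layout, the arithmetic, and the module name api are all made up for the example. Only set_input, fortran_main and get_output come from the call sequence shown above.

```fortran
! Stand-in for the legacy entry point (hypothetical): it reads its inputs
! from, and writes its results to, a COMMON block.
subroutine legacy_main()
  implicit none
  double precision :: a, b, c, x, y, z
  common /state/ a, b, c, x, y, z
  x = a + b
  y = b * c
  z = a - c
end subroutine legacy_main

! API.f90: thin C-interoperable setters and getters around the COMMON block.
module api
  use iso_c_binding, only: c_double
  implicit none
  private
  public :: set_input, fortran_main, get_output
contains

  subroutine set_input(a_in, b_in, c_in) bind(C, name="set_input")
    real(c_double), value :: a_in, b_in, c_in   ! passed by value from C
    double precision :: a, b, c, x, y, z
    common /state/ a, b, c, x, y, z
    a = a_in
    b = b_in
    c = c_in
  end subroutine set_input

  ! Wrapper that exposes the untouched legacy routine under a C name.
  subroutine fortran_main() bind(C, name="fortran_main")
    external :: legacy_main
    call legacy_main()
  end subroutine fortran_main

  subroutine get_output(x_out, y_out, z_out) bind(C, name="get_output")
    real(c_double), intent(out) :: x_out, y_out, z_out   ! pointers on the C side
    double precision :: a, b, c, x, y, z
    common /state/ a, b, c, x, y, z
    x_out = x
    y_out = y
    z_out = z
  end subroutine get_output

end module api

! Small driver that exercises the API from Fortran itself.
program demo_api
  use iso_c_binding, only: c_double
  use api
  implicit none
  real(c_double) :: x, y, z
  call set_input(1.0_c_double, 2.0_c_double, 3.0_c_double)
  call fortran_main()
  call get_output(x, y, z)
  print *, x, y, z
end program demo_api
```

With these declarations, a C caller would see set_input as taking three doubles by value and get_output as taking three pointers to double. The legacy routine itself is left unmodified, which keeps the risk of introducing bugs low.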
Modularize the code and remove COMMON blocks
The second thing to attempt is to structure the code properly by today’s standards. Once the codebase is properly modularized, it will be easier to extend or modify.
However, a recurring characteristic of old Fortran code is the abundant use of global variables, sometimes in the form of COMMON blocks.
If there are COMMON blocks all across the code, start by moving them into a single include file, state.f. Now all modules can include this file to get access to the global state.
Before removing the re-definitions of a common block from individual files, make sure that its variables have the same names and appear in the same order in every file, since a differing re-definition affects the behavior of the code.
After this stage is completed, proper modularization of the code should become easier. Now it’s a matter of turning variables in COMMON blocks into module variables. Make module variables private when possible.
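The move from a COMMON block to module variables might look like the following sketch. The block name STATE, the variables temp, pres and nstep, and the routines advance and call_count are hypothetical and only serve to show the pattern.

```fortran
! Legacy declaration (fixed form), as it would appear in the include file:
!       DOUBLE PRECISION TEMP, PRES
!       INTEGER NSTEP
!       COMMON /STATE/ TEMP, PRES, NSTEP

! Modern equivalent: the same state held as module variables.
module state_mod
  implicit none
  private                                   ! hide everything by default
  double precision, public :: temp = 0.0d0  ! still needed by other modules
  double precision, public :: pres = 0.0d0
  integer, public :: nstep = 0
  integer :: ncalls = 0                     ! private: internal bookkeeping only
  public :: advance, call_count
contains

  ! Advance the (made-up) state by one step of size dt.
  subroutine advance(dt)
    double precision, intent(in) :: dt
    temp = temp + dt
    nstep = nstep + 1
    ncalls = ncalls + 1
  end subroutine advance

  ! Read-only access to the private counter.
  integer function call_count()
    call_count = ncalls
  end function call_count

end module state_mod

program demo_state
  use state_mod
  implicit none
  call advance(0.5d0)
  call advance(0.5d0)
  print *, temp, nstep, call_count()
end program demo_state
```

Unlike a COMMON block, a module gives every user the same names and types by construction, so the ordering problem described above cannot reappear.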
Remove global variables and specify argument intent
After modularization, you might want to remove all global variables, to enable proper testing and extensibility.
Start by wrapping the remaining global variables into a few custom types, and pass them to any function that needs access to them. Useful names for these data types could be something like ProblemData, and so on. Instantiate a param and a data variable and pass them to every function that needs them.
Add an intent attribute (intent(in), intent(inout), or intent(out)) to each argument of a subroutine you intend to test and refactor. Declare all local variables, and add the implicit none statement characteristic of modern Fortran.
Once the code is structured this way, it can be more easily tested. Interfacing with new sections of the code that implement new features will also be easier, as the subroutine is now more transparent to the caller. Of course, this step is not required for every single module, but only for the modules that need to be actively developed and/or extended.
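A minimal sketch of this structure is shown below. ProblemData is the name suggested above; ProblemParams, the component names, and the integrate routine are hypothetical. The data variable is called pdata here simply to avoid any visual clash with the Fortran DATA statement.

```fortran
module problem_types
  implicit none
  private
  public :: ProblemParams, ProblemData, integrate

  ! Read-only settings of a run (hypothetical type and components).
  type :: ProblemParams
    double precision :: dt = 0.1d0
    integer :: nsteps = 10
  end type ProblemParams

  ! State that the computation updates (hypothetical components).
  type :: ProblemData
    double precision :: position = 0.0d0
    double precision :: velocity = 1.0d0
  end type ProblemData

contains

  ! Every argument declares its intent, so the caller can see what is
  ! read, what is updated, and what is produced.
  subroutine integrate(param, pdata, elapsed)
    type(ProblemParams), intent(in)    :: param
    type(ProblemData),   intent(inout) :: pdata
    double precision,    intent(out)   :: elapsed
    integer :: i
    do i = 1, param%nsteps
      pdata%position = pdata%position + param%dt * pdata%velocity
    end do
    elapsed = param%nsteps * param%dt
  end subroutine integrate

end module problem_types

program demo_types
  use problem_types
  implicit none
  type(ProblemParams) :: param
  type(ProblemData)   :: pdata
  double precision    :: elapsed
  call integrate(param, pdata, elapsed)
  print *, pdata%position, elapsed
end program demo_types
```

Because integrate touches nothing except its arguments, it can be called from a test with any combination of parameters and data.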
If desired, the arguments can be made interoperable with C via bind(C), which grants external programs access to the subroutine, so that the required extensions or new features can be implemented there. If the goal is to move from Fortran to another programming language, this makes it possible to port the main function to that other language and keep the Fortran modules as a library that implements the core functionality.
Gradually add more tests while improving code quality
Now that you have well-structured code in some of your modules, the goal of this next step is to improve the code readability, so more programmers can understand it, modify it, and work with it.
Importantly, to be able to do this we need to be able to test individual routines. Fortunately, as each subroutine’s argument has a clear intent and there are no subroutines with side-effects (affecting any global state), we can now define tests with more confidence.
Again, the tests probably have to be produced by characterization: call the routine with a few different sets of inputs, and store the outputs. Make sure not to break any test after modernizing your routine.
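A characterization test of this kind can be sketched as follows. The kernel routine is a hypothetical stand-in for a legacy subroutine, and the expected values are the outputs that would have been recorded from it before any refactoring; the tolerance is an assumption to adjust per routine.

```fortran
module kernel_mod
  implicit none
contains
  ! Hypothetical stand-in for the routine being modernized.
  subroutine kernel(x, y)
    double precision, intent(in)  :: x
    double precision, intent(out) :: y
    y = x * x + 1.0d0
  end subroutine kernel
end module kernel_mod

program characterization_test
  use kernel_mod
  implicit none
  ! Inputs and the outputs recorded from the original code.
  double precision, parameter :: inputs(3)   = [0.0d0, 1.0d0, 2.0d0]
  double precision, parameter :: expected(3) = [1.0d0, 2.0d0, 5.0d0]
  double precision, parameter :: tol = 1.0d-12
  double precision :: y
  integer :: i, nfail

  nfail = 0
  do i = 1, size(inputs)
    call kernel(inputs(i), y)
    if (abs(y - expected(i)) > tol) then
      print *, 'FAIL: input', inputs(i), 'got', y, 'expected', expected(i)
      nfail = nfail + 1
    end if
  end do

  ! A non-zero exit status lets a build script or CI job detect the failure.
  if (nfail > 0) error stop 'characterization test failed'
  print *, 'all characterization tests passed'
end program characterization_test
```

In practice the recorded inputs and outputs would more likely live in a data file than in the source, but the comparison logic stays the same.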
Now it’s finally time to remove the arithmetic if, the goto statements, and so on. The Modernizing Old Fortran Guide describes some of the modifications you can make.
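As one small example of such a modification, the sketch below replaces an arithmetic if and its goto statements with a block if construct. The routine sign_of is hypothetical; the legacy version is kept in comments for comparison.

```fortran
module sign_mod
  implicit none
contains

  ! Legacy version (fixed form). The arithmetic IF jumps to label 10, 20
  ! or 30 when X is negative, zero or positive, respectively:
  !       IF (X) 10, 20, 30
  !    10 S = -1
  !       GOTO 40
  !    20 S = 0
  !       GOTO 40
  !    30 S = 1
  !    40 CONTINUE

  ! Modern version: same three branches, no labels and no jumps.
  integer function sign_of(x) result(s)
    double precision, intent(in) :: x
    if (x < 0.0d0) then
      s = -1
    else if (x == 0.0d0) then
      s = 0
    else
      s = 1
    end if
  end function sign_of

end module sign_mod

program demo_sign
  use sign_mod
  implicit none
  ! One check per branch of the original arithmetic IF.
  if (sign_of(-2.5d0) /= -1) error stop 'negative branch changed'
  if (sign_of(0.0d0)  /=  0) error stop 'zero branch changed'
  if (sign_of(4.0d0)  /=  1) error stop 'positive branch changed'
  print *, 'behavior preserved'
end program demo_sign
```

Covering each branch of the original construct with a check is what makes this kind of rewrite safe to do.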
Once you understand what a routine is doing, you might even want to port it to a different language like C or Python, and keep the Fortran codebase as small and focused on the number-crunching as possible.
Working effectively with old Fortran codebases can be challenging. A deep knowledge of several programming languages (including Fortran) and the ability to interact with the domain experts are required.
The process of giving new life to old code can be extremely rewarding, though, as usually those old Fortran codebases are still in use because they contain substantial scientific knowledge.