Refactoring real, large, and messy codebases - Digital Defiant Studios Seattle Web development & Design by Chris Tabor

How to refactor real software codebases

Refactoring examples, tips, and tricks found online typically focus on very small, manageable codebases or modules. I have yet to see an article that brings a genuine, real-life “worst-case” scenario to the table and illustrates how to increase the quality and predictability of that code. My goal here is not to prescribe a strict, concrete set of ideals, but rather to offer some examples and rules of thumb that I’ve found handy in settings that are anything but ideal.

Getting your mind right

The first thing that has to happen is a shift in mindset (assuming your mind is not already set this way!). Naturally, we approach a given problem with our own experience of how that problem should be solved. When refactoring a project, this can cloud our judgment and give us a false sense of competence as we jump in. In software, that can make things worse, quickly. And the more complicated the codebase, the more likely failures and regressions become.

Starting small

The biggest rule of thumb I have found refuge in is the notion of “small changes”. There is no limit to how small they can be, but I think of the size of a change as inversely proportional to the scale and severity of the problem: the more complicated, poorly architected, and hard to reason about the codebase is, the smaller your changes need to be.

The reasoning is that small, incremental changes build a stronger codebase, which can then be improved upon in broader strokes. It has to be incremental; otherwise you risk regressions, premature optimization, or worse, a refactor that never gets finished.

So, starting small might consist of, for example:

making variable names more obvious
removing duplicate variables
isolating code into functions (basically, wrap what exists in a function, and grab the data that way)
adding comments!
adding docstrings
adding documentation
caching data when possible

These don’t alter the overall structure, but allow for consistent improvement. Another nice thing here is that you don’t spend a lot of bullshit time on the process of “what, where, and how” – you’re not classifying the refactoring, you’re just doing it.
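
As a rough sketch of what a few of these small changes might look like together, here is a hypothetical before-and-after in Python. Every name in it (`calc`, `order_total`, `load_tax_rate`) and the tax-rate lookup itself are invented for illustration, not taken from a real codebase. It only assumes that line items are dicts with a price `"p"` and a quantity `"q"`.

```python
from functools import lru_cache


# Before: cryptic names, a duplicated variable, and a hard-coded tax rate.
def calc(d):
    t = 0
    tot = 0  # duplicate of t; only one of them is needed
    for x in d:
        t = t + x["p"] * x["q"]
    tot = t
    return tot + tot * 0.1


# After: same behavior, reached through several small steps.
@lru_cache(maxsize=None)
def load_tax_rate(region):
    """Return the tax rate for a region.

    Hypothetical stand-in for a slow lookup (file, database, network).
    The cache means the lookup body runs only once per region.
    """
    rates = {"default": 0.1}
    return rates.get(region, rates["default"])


def order_total(line_items, region="default"):
    """Return the total price of an order, tax included.

    Each line item is a dict with "p" (unit price) and "q" (quantity),
    the same shape the original code already used.
    """
    subtotal = sum(item["p"] * item["q"] for item in line_items)
    return subtotal + subtotal * load_tax_rate(region)


# Quick check that the refactor did not change the result.
items = [{"p": 10, "q": 2}, {"p": 5, "q": 1}]
assert calc(items) == order_total(items)
```

Nothing structural moved here: the data shape is untouched, and the old function can stay in place until every caller has switched over.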

A stepping stone to broader strokes

When you’ve completed some of the above examples, the codebase starts to feel a bit nicer, and in some cases there might even be tangible improvements in, say, performance or productivity. At this point, you can start in on the broader, more architectural issues, but keep in mind that you still aren’t re-architecting the entire application or codebase, just breaking things apart a little bit more.

Some more examples here:

reusing functions
introducing namespaces (either global, or more likely, sub-namespaces)
pulling data/configuration out of implementation code, and storing it in the right files (to introduce separation of concerns).
either pulling methods out of classes, or adding them in, depending on the system. Oftentimes, methods are too generic and should be pulled off of object classes in favor of purpose-driven modules (think GRASP, the General Responsibility Assignment Software Patterns). But other times, they are scattered about, and could benefit from the conceptual “tidying” that object-oriented programming provides. This aspect, by its nature, has to be approached on a case-by-case basis. OOP is often abused where it is irrelevant, but other times it can greatly improve the system’s structure and adaptability.
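
To make the configuration item in the list above a little more concrete, here is a minimal sketch in Python. It assumes a hypothetical JSON settings file with made-up keys (`api_url`, `timeout_seconds`), and the `Settings`, `load_settings`, and `build_request` names are mine. The only point is that the values live in data and the implementation is handed them.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Values that used to be hard-coded inside implementation code."""
    api_url: str
    timeout_seconds: float


def load_settings(raw_json):
    """Build Settings from JSON text (for example, the contents of a config file)."""
    data = json.loads(raw_json)
    return Settings(
        api_url=data["api_url"],
        timeout_seconds=float(data["timeout_seconds"]),
    )


def build_request(settings, resource):
    """Implementation code receives its configuration instead of owning it."""
    return {
        "url": f"{settings.api_url.rstrip('/')}/{resource}",
        "timeout": settings.timeout_seconds,
    }


# Hypothetical file contents; in practice this text would be read from disk.
EXAMPLE_CONFIG = '{"api_url": "https://example.invalid/api/", "timeout_seconds": 5}'
settings = load_settings(EXAMPLE_CONFIG)
request = build_request(settings, "users")
```

Changing the URL or the timeout is now a data change, which is much easier to review than a change buried in the middle of a function.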

Getting to a point of testability

There’s no hard science as to when you should introduce test cases and more formal verification to your codebase, but if you don’t already have them, I would wait until you knock out the basics described in the first section. The reason is that you’ll otherwise often find yourself duplicating effort: developing test cases and formal verification, only to change them soon after, once your refactor provides a clearer look into the structure.

Once you’re here though, the least you can do is stub out your test cases. This means coming up with test matrices for each function, or each class and its accompanying methods, covering the various permutations of arguments that each might take. This falls squarely in the unit-testing world. I would certainly hesitate to write integration tests at this phase, because you’re probably going to break them all once your refactor is complete.
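
One way a stubbed-out test matrix might look, using Python’s built-in `unittest` module. The `clamp` function is a hypothetical stand-in for whatever you are testing, and the rows in the matrix are just the permutations I would start with. Treat it as a sketch, not a prescription.

```python
import unittest


def clamp(value, low, high):
    """Hypothetical function under test: restrict value to the range [low, high]."""
    return max(low, min(value, high))


class ClampTests(unittest.TestCase):
    # The test matrix: one row per permutation of arguments worth checking.
    # Columns are (value, low, high, expected).
    MATRIX = [
        (5, 0, 10, 5),    # inside the range
        (-3, 0, 10, 0),   # below the range
        (42, 0, 10, 10),  # above the range
        (0, 0, 10, 0),    # exactly on the lower bound
        (10, 0, 10, 10),  # exactly on the upper bound
    ]

    def test_matrix(self):
        # subTest reports each failing row separately instead of stopping at the first.
        for value, low, high, expected in self.MATRIX:
            with self.subTest(value=value, low=low, high=high):
                self.assertEqual(clamp(value, low, high), expected)

    def test_low_greater_than_high(self):
        # Stub: the behavior is undecided, so record the case and skip it for now.
        self.skipTest("decide what clamp should do when low > high")
```

Saved in a test file, this can be run with `python -m unittest`. The skipped stub shows up in the report as a reminder, without failing the run.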

Beginning to structure the big picture

Now you can rethink the system in the broadest (or nearly so) terms. From moving code into components and system modules, to defining clearer interfaces between classes, to investing in microservice architectures, you can start to break the code into something that might resemble a bona fide diagram. This is when you want to develop your broader testing and verification analysis, because you’re finally at a point where it makes sense to test things as-is.
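
As one possible illustration of a “clearer interface between classes”, here is a small Python sketch using `typing.Protocol`. The `UserStore`, `InMemoryUserStore`, and `WelcomeMailer` names are hypothetical; the idea is only that one class depends on a described interface rather than on a specific implementation.

```python
from typing import Dict, Protocol


class UserStore(Protocol):
    """The interface: anything with this method can act as a user store."""

    def get_email(self, user_id: int) -> str: ...


class InMemoryUserStore:
    """One implementation; a database-backed one could replace it later."""

    def __init__(self, emails: Dict[int, str]) -> None:
        self._emails = emails

    def get_email(self, user_id: int) -> str:
        return self._emails[user_id]


class WelcomeMailer:
    """Depends only on the UserStore interface, not on a concrete class."""

    def __init__(self, store: UserStore) -> None:
        self._store = store

    def compose(self, user_id: int) -> str:
        return f"To: {self._store.get_email(user_id)}\nSubject: Welcome!"


mailer = WelcomeMailer(InMemoryUserStore({1: "ada@example.invalid"}))
message = mailer.compose(1)
```

A boundary like this is also where the broader tests tend to attach: the in-memory store can stand in for the real one while the pieces are wired together.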

Unit tests should be actively defined, if not already, and integration testing can begin as well.

I’ve found this kind of approach immensely helpful in real world cases, and hopefully, you can too!