
Help! I'm working in a large code base!

Published: at 09:36 AM

“Working in large code bases always sucks” is a common sentiment. I have been thinking about this sentence for a few days. I don’t think it has to suck, but it often does. In this post I’m going to explore why large code bases might suck and how to alleviate some of the pain you might experience.

Should you just run?

Why not just run away screaming from a large code base? We all love a green field project, with new technology and a blank canvas. In a large code base you might not get to create your own architecture, but you can learn a lot about architecture: what works in the current one, and what keeps biting you. Are there abstractions or names that everyone misunderstands? If you can figure out why, you will learn a lot. It is also worth remembering that large code bases are large for a reason, often a very good one. The organization has invested in the code base because it has a lot of users, is business critical, or both.

Who has learned more about software development: a developer who spent two years getting their hands dirty in a large legacy code base, or one who spent two years hopping from one green field project to the next? Working in a large code base can teach you a great deal about software development.

Photo by Geranimo on Unsplash

Getting to know your large code base

So you have just started on a team that maintains a large code base. What do you do in your first few weeks to get your bearings? There may be some documentation, but it is probably wildly out of date and not very useful for a beginner anyway. Here are some tips on where to start.

1. Outside view

Get to know the system from a user’s perspective. What domain problems does the code you’re going to work on solve? Start with the broad strokes. Getting familiar with the problem domain is an important step towards working on the code base effectively and efficiently.

2. Technology and conventions

What languages and frameworks does the code base use, and what are the important conventions in them? As an example, I mostly work with dotnet, and some might say that if you’ve seen one dotnet code base you have seen them all. That is not entirely true, but getting up to speed on how dotnet projects are usually structured will help as you dive into a large dotnet code base. The same goes for many other stacks, such as Java Spring Boot or Ruby on Rails.

3. Inputs and outputs

Reading the entire code base is probably not feasible. Instead I would recommend focusing on the edges of the application: its inputs and outputs. Start by looking at what the application outputs, the most important output of course being the database. What are the important tables or collections, and what data is stored in them? Next, get an overview of the important integrations the application has with external services. What critical dependencies are there on other systems? Are those systems maintained by a team in your organization or by someone completely external?
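One quick way to get a first feel for the data is to list the tables and how many rows each one holds. The sketch below assumes a SQLite database purely so that it runs anywhere; table_overview is a hypothetical helper, and on SQL Server, PostgreSQL or similar you would query that database’s own catalogue instead of sqlite_master.

```python
import sqlite3


def table_overview(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (table name, row count) pairs, largest tables first."""
    # sqlite_master is SQLite's catalogue; skip SQLite's internal tables.
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    overview = []
    for name in names:
        # Quoted so that names with spaces or reserved words still work.
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()
        overview.append((name, count))
    # Biggest tables first; ties broken by name for stable output.
    return sorted(overview, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Tiny in-memory demo database with made-up tables.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
    demo.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
    demo.executemany("INSERT INTO orders (customer_id) VALUES (?)", [(1,), (1,), (2,)])
    demo.execute("INSERT INTO customers DEFAULT VALUES")
    for name, count in table_overview(demo):
        print(f"{name}: {count}")
```

Row counts are a crude signal, but the biggest tables are often a good hint about where the core of the domain lives.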

4. Implement something

If you are working in a large code base, there is probably a correspondingly large backlog. Now is the time to dive in. Ask the veterans on your team whether there is a task you can take on. It doesn’t have to be important, or even part of the current iteration or priorities; it just needs to get your feet wet and your first code merged. Getting your first pull request over with as quickly and painlessly as possible gets rid of a lot of the anxiety that comes with it.

5. Mental model

By now you’ve spent some time with the code base. A good way to formalize and cement your knowledge is to explain it to others or write about it. Try using a simple diagramming tool like Excalidraw to map out your current understanding of the system. Think of this as an exercise to improve your own and your team’s understanding of the system, not as a documentation job. You can throw the diagrams away afterwards, and I would probably recommend doing so, as these kinds of diagrams are not particularly useful as documentation.

Pains in a large code base and how to deal with them

So now you are working in a large code base. Large code bases being what they are, you are going to run into quite a few pain points. In this section I list some you can expect, along with tips on how to deal with them. My focus is on what you, as an individual contributor, can do. There are no easy fixes or silver bullets, but by putting in the effort over time you can make working in your large code base more pleasant.

Features take a long time to complete

A large code base probably has a lot of layers, so completing a feature takes quite a bit of work. You need to change the database schema, add an API endpoint and implement the front end, and maybe more, like an email notification or some other external integration. My number one tip here is to submit a separate pull request for each part of the feature where possible. Feature toggles make this more feasible, because unfinished parts can be merged while staying switched off. In most cases it’s not a problem to extend the database schema with the columns your feature needs in a pull request of its own. Smaller pull requests are easier to review, and you get better feedback earlier.
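A feature toggle can be as simple as a setting checked at the point where new behaviour branches off. This is a minimal sketch, assuming flags are read from environment variables named FEATURE_<NAME>; FeatureFlags, complete_order and that naming scheme are hypothetical, and a real dotnet code base would more likely use its configuration system or a dedicated library.

```python
import os
from typing import Mapping


class FeatureFlags:
    """Hypothetical flag store that reads FEATURE_<NAME> settings,
    e.g. FEATURE_EMAIL_NOTIFICATIONS=on."""

    def __init__(self, settings: Mapping[str, str]):
        self._settings = settings

    def is_enabled(self, name: str) -> bool:
        # Flags default to off, so merged but unfinished work stays hidden.
        value = self._settings.get(f"FEATURE_{name.upper()}", "off")
        return value.strip().lower() in {"1", "true", "on"}


def complete_order(order_id: int, flags: FeatureFlags) -> list[str]:
    """Hypothetical use case: existing behaviour always runs, while the
    half-built email notification only runs when its toggle is on."""
    actions = [f"order {order_id} saved"]
    if flags.is_enabled("email_notifications"):
        actions.append(f"email sent for order {order_id}")
    return actions


if __name__ == "__main__":
    # Reads real environment variables; with nothing set, the flag is off.
    print(complete_order(42, FeatureFlags(os.environ)))
```

Passing the flags in, rather than reading the environment deep inside complete_order, makes it easy to test both the on and off paths while the feature is being built.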

Hard to reproduce bugs

Sometimes a user reports a bug that you can’t reproduce locally. My first tip is to call the user, preferably on video, and have them walk you through how they experienced it. Take notes or screenshots, or record the call, so you don’t have to go back and ask them to do it again.

Bugs are often hard to reproduce because the application state has to be just so for the bug to happen. Get familiar with how to debug your tech stack. VS Code, Neovim and JetBrains IDEs now have very good debuggers for most tech stacks (Node, dotnet, Rust, Go, Java, etc.). Learn how to use them. Use conditional breakpoints, edit the values of variables, and, where your debugger supports it, step back and re-run code without relaunching the program. The debugger lets you quickly experiment with which state makes the bug happen. Ideally you can then write a test that fails in the same way your user experienced. If it is easy to add to or change your test suite, reproducing and fixing bugs becomes a lot more pleasant and less time-consuming. Being able to reproduce a bug, write a test, and then fix the implementation until the test passes feels very good.
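As an illustration of that workflow, suppose the debugger showed that a crash only happens for customers with an empty order history. The function average_order_value and the scenario are hypothetical; the point is that the test pins down the exact state that triggered the bug, fails against the buggy version, and passes once the fix is in.

```python
import unittest


def average_order_value(order_totals: list[float]) -> float:
    """Hypothetical function from the code base.

    The buggy version divided by len(order_totals) unconditionally and
    raised ZeroDivisionError for customers with no orders. The guard
    below is the fix that makes the regression test pass.
    """
    if not order_totals:
        return 0.0
    return sum(order_totals) / len(order_totals)


class AverageOrderValueTests(unittest.TestCase):
    def test_customer_without_orders(self):
        # Reproduces the report: the bug only showed up for a customer
        # whose order history was empty.
        self.assertEqual(average_order_value([]), 0.0)

    def test_customer_with_orders(self):
        self.assertAlmostEqual(average_order_value([10.0, 20.0]), 15.0)
```

Saved as a module, the tests run with python -m unittest followed by the module name. Keeping the regression test afterwards guards against the same bug coming back.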

Can you reproduce a bug but not figure out what causes it? Learn how to use git bisect. Bisect lets you binary search through the git history: you give it a known good commit and a known bad commit, and it checks out a commit roughly halfway between them. You tell git whether that commit is good or bad, and it keeps halving the range until it has found the first bad commit. An automated test that reproduces the bug can save a lot of time here, since git bisect run can use it to mark each commit for you.
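To show why bisecting is so fast, here is a small Python simulation of the search it performs. It is a sketch that assumes a flat, linear list of commits and a hypothetical is_bad check; real git bisect works on the commit graph, but the halving idea is the same.

```python
from typing import Callable, Sequence


def bisect_first_bad(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> tuple[str, int]:
    """Return the first bad commit and how many commits were checked.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the culprit is also bad, which is the same
    assumption git bisect makes.
    """
    good, bad = 0, len(commits) - 1
    checks = 0
    # Invariant: commits[good] is good and commits[bad] is bad.
    while bad - good > 1:
        mid = (good + bad) // 2
        checks += 1
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], checks
```

With 1,000 commits this needs at most 10 checks, because each check halves the remaining range. In a real repository the equivalent is git bisect start, git bisect bad, git bisect good followed by a known good commit id, and finally git bisect reset when you are done. With git bisect run, the command’s exit code does the marking: 0 means good, 125 means skip, and any other code from 1 to 127 means bad.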

A side note about bugs: try to think of a workaround for your user. Is there a way they can keep using your product until the fix is out?

There are three different ways of doing things

Often a code base starts out with a clear architecture, and everyone is happy with it for a while. Then, as time goes on, you want to do something in a different way. So you add the new way, planning to migrate everything over to it eventually, but that never happens because we live in the real world. Now you have two ways of doing that thing, and over time the same happens to other things. This gets very painful, especially when you onboard new team members and have to explain all the migrations you are supposedly in the middle of.

Here I would first recommend exercising some restraint. I have been that guy who enters a code base thinking “ah, time to fix every little architectural niggle here”. Realistically, newcomers with this mindset are more likely to make things worse than better. If you really want to fix a code base, you need to deeply understand the trade-offs. A new way of doing something doesn’t just have to be better than the old way; it has to be so much better that it’s worth maintaining both implementations for a long while.

In my experience, sticking to the known way of doing something until you are ready to move on is more often than not the right choice. Learn how to work in the architecture you have, instead of trying to change it and making a mess in the process. Libraries such as ArchUnit for Java and ArchUnitNET for dotnet let you write automated tests that check that the rules of your architecture are respected, for example that the web project does not reference the infrastructure layer. Consider using them. If nothing else, they are a good way of documenting how your architecture is supposed to look.
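The idea behind these tools can be sketched in a few lines. The following is a hand-rolled Python analogue, not ArchUnit’s API: it parses each source file with the standard ast module and reports imports from a forbidden package. The helper names and the myapp.web / myapp.infrastructure layout are hypothetical, and relative imports are deliberately ignored to keep the sketch short.

```python
import ast
from pathlib import Path


def forbidden_imports(source: str, forbidden: str) -> list[str]:
    """Return modules imported by `source` that live in the `forbidden`
    package (the package itself or any submodule of it)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            names = [node.module]
        else:
            continue
        for name in names:
            if name == forbidden or name.startswith(forbidden + "."):
                hits.append(name)
    return hits


def check_layer_rule(layer_dir: Path, forbidden: str) -> dict[str, list[str]]:
    """Map each .py file under layer_dir to its forbidden imports."""
    violations = {}
    for path in sorted(layer_dir.rglob("*.py")):
        hits = forbidden_imports(path.read_text(encoding="utf-8"), forbidden)
        if hits:
            violations[str(path)] = hits
    return violations
```

In a test suite, a single assertion such as check_layer_rule(Path("myapp/web"), "myapp.infrastructure") == {} then documents the rule and fails the build as soon as someone breaks it.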

I always feel lost in the code

There is obviously a lot of code in a large code base. It can be hard to find your way around it, and to keep everything you need to know in your head at any given moment. I recommend having a good system for taking notes, and keeping an engineering log or journal. I keep a notebook where I log what I have been doing each work day, and I use Obsidian for technical notes about the code base. This clears my brain of information I don’t need to keep track of, making room for the information I need for the task I am working on.

Learn how to work with text files in the terminal. Get good at using tools like grep, fzf, cat and sed, and maybe some awk and jq. Then you will be able to find and replace things in any code base and language. You don’t have to be a wizard; if you know the basics and what each tool can do, it’s easy to look up how to achieve what you need.

We are X versions behind in the framework we use

When working with a framework, it is very important to look at how its authors version it. Do they use semantic versioning? When, and how often, do they release new major versions? Do they have a Long Term Support (LTS) version that gets security patches for longer? Find out what upgrade cadence your code base needs. Maybe you will be fine sticking to the LTS releases plus minor patches for security fixes. Upgrading can be a pretty tedious process, so write some scripts to automate it, for example using bash, sed and jq to update version references across your code base.
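If you would rather write such a script in a general-purpose language, here is a sketch of the same idea in Python for dotnet project files. It assumes package references are written as PackageReference elements with Include before Version on the same element, which is a common layout but not the only one; other layouts (including central package management) are left untouched, and package names are matched exactly and case-sensitively. The helper names are hypothetical.

```python
import re
from pathlib import Path


def bump_package_version(text: str, package: str, new_version: str) -> tuple[str, int]:
    """Set Version="..." on every PackageReference to `package` in `text`.

    Only matches elements where Include comes directly before Version.
    Returns the new text and the number of references changed.
    """
    pattern = re.compile(
        r'(<PackageReference\s+Include="'
        + re.escape(package)
        + r'"\s+Version=")[^"]*(")'
    )
    # A function replacement avoids surprises if new_version contains
    # characters that re would treat as group references.
    return pattern.subn(lambda m: m.group(1) + new_version + m.group(2), text)


def bump_everywhere(root: Path, package: str, new_version: str) -> int:
    """Apply bump_package_version to every .csproj file under root."""
    total = 0
    for path in sorted(root.rglob("*.csproj")):
        text = path.read_text(encoding="utf-8")
        new_text, count = bump_package_version(text, package, new_version)
        if count:
            path.write_text(new_text, encoding="utf-8")
            total += count
    return total
```

Returning a count makes it easy to sanity-check a run: if you expected a package to be referenced in ten projects and the script reports three, some files probably use a layout the pattern doesn’t cover.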

Conclusion

Thanks for reading! I hope something you read here can help you in your journey to becoming comfortable in large code bases.