How to Measure Code Quality, According to VP of Engineering Sin-Mei Tsai
Code quality measurement is an important aspect of engineering. In fact, when we interview engineering managers for Shippo, one of our standard questions is “How do you think about code quality?” or “What does code quality mean to you?” Recently, our VP of Engineering Sin-mei Tsai was pleasantly surprised to be asked this question by a candidate.
Here is Sin-mei’s response for how we measure code quality at Shippo.
How Do You Define and Measure Code Quality?
At Shippo, we think of code quality from three different points of view. One of them is the orthodox definition of code quality, but we have expanded on code quality with a couple of points of views of our own.
Is the Code Functional?
The first and foremost category of quality for engineers is the mainstream definition of quality: “Did the code work as designed?” Most quality assurance practices address functional quality by identifying defects in code.
Defects are cheapest to address when found early. For that reason, unit testing is very important. At Shippo, we focus our unit testing on functional testing. Instead of testing whether the code behaves as coded, we focus on whether the code behaves as the user would expect. We try to use mock objects only when we must, such as in stubbing out an external dependency.
In our legacy codebase in Python, an interpreted language, the engineers rightly wanted to focus on code coverage. For an interpreted language, code coverage can help ensure that each line of code has been exercised and does not contain syntax errors. With a compiled language like Go, which is the language we are using in our new platform, we have the advantage that syntax errors are caught by the compiler. This allows us to focus on functional testing, which is a hybrid of unit testing in that the test focuses on a functional unit, a function, and black box testing where we exercise the function as the caller without mocking any internals. To help us perform functional testing without mock objects, we maintain a seed database that reflects the database schema and contains the data definitions that we would expect in production. This seed database provides the initial data set for any local and integration testing environments.
Besides unit testing, engineers also perform manual developer end-to-end testing. This end-to-end testing is performed to ensure that when we integrate different pieces of code, the code behaves as expected. It is in manual integration testing that we test some tricky combinations. Sometimes we find that the new code does not play well with old state, or with old code, which is a transient problem that happens during deployment.
To make doubly-sure that code behaves as expected, we also employ independent testing during the quality assurance phase. Even though developers are required to thoroughly test their code and regression test any impacted functionality, developers tend to be myopic in their testing. Developers will test from their own point of view, from having written the code. An independent tester will test cases that are unexpected. One such example that we ran into in the past was a new function to add addresses to an address book. An independent tester thought of the case to remove the last remaining address, whereas the developer assumed that it couldn’t happen because the requirements stated that it shouldn’t happen. At Shippo, the engineers take on the role of testers and test the code developed by their fellow engineers. So long as the engineer didn’t contribute to the code being tested, they are eligible to be the independent tester.
For API changes, we perform parity testing where we record API requests and responses from production, and then play back the recorded API requests against the new code and compare the responses from the system under test against actual responses recorded from production. Because the API is stateful (a POST or PUT changes the state), we have developed heuristics for performing the comparisons in a parity test. This is especially important because we support many versions of the API, including some with unexpected side effects or object representations, such as in the case of null vs. empty object.
What is the Technical Design Quality?
At Shippo, we also consider tech debt in our definition of how we measure code quality. Architectural and software design integrity is very important to us. The question that we ask is: “In adding or changing a piece of code, did we create any intentional tech debt?”
Not all code needs to be scalable. Sometimes, we intend to create a rapid prototype to gauge user interest in a new feature. Once the prototype has served its purpose, it can be thrown away. Code needs to be written considering its longevity in mind. Code that is intended to last longer than a prototype should be written in such a way that future developers can easily change or extend it. Therefore, readability and maintainability are important.
When the code we write is meant to last more than the next year or two, we rely on technical designs to ensure the code ages well. Technical design is one of the most rewarding parts of code development. Nothing feels as satisfying as an elegant design. While elegance may be subjective, a formal peer design review process can help steer a technical design in the right direction.
In a design review, we evaluate the proposed solution against the problem that it is meant to solve. In reviewing a design, we start with the end first. What will it be like to maintain the solution? If the existing infrastructure can already address the new problem, we want to understand whether the existing infrastructure or patterns should be reused. Creating unnecessary new infrastructure or patterns will create additional maintenance overhead for the team and reduce our ability to create value. If a new solution provides enough additional benefit over the old solution, then we might evaluate moving the old code to the new infrastructure.
In a peer design review, the designer shares with the team the benefits and drawbacks of the proposed solution, as well as any other solutions that were evaluated. Typically, the solution will be presented in graphical form, such as a system architecture with flow diagrams. And the reviewers will provide input with regards to scalability, fault tolerance, and maintainability.
Components that need to handle large volume need to be designed for scale. Post development, we make use of performance (stress) testing to ensure that the new component can withstand multiples of the expected volume. The engineer is responsible for understanding the load, for example, the mixture of API calls that the new system will receive once in production. The performance test entails comparing the performance characteristics of the old system and the new system. Sometimes, we will evaluate two designs using performance testing to verify that one design scales better than the other.
Does it Provide Value to our Customers?
Above all, the purpose of product development is to create value for customers. Therefore one of the most important ways to evaluate code is whether it provides value to customers. The way we ensure that code will provide value for customers is by relying on customer-centric practices in creating our product development plans. Before even a line of code is written, our product team conducts customer research to get and vet product ideas.
The engineer’s job is first to understand the customer problem that we are trying to solve, and then to construct the solution for the problem. When building solutions, we are most interested in knowing whether the solution works for the customer. Whenever possible, we favor shipping an early iteration of the solution to get customer feedback.
Once the first iteration is in production, we observe customers using our product and collect customer feedback to gauge whether the solution is useful or desirable, whether it is easy to discover and use, and whether the user wants more of it. If the actual usage of the functionality meets the expected usage and engagement, then we know the code has met the most important quality metric of usefulness.
Our automated emails are a good example of a useful feature—we are constantly releasing variations and improvements to these emails.
As we confirm whether the solution works, we have more confidence in shipping follow-on iterations of the solution. For us, shipping follow-on iterations is a sign of a successful feature.
Conversely, sometimes a new solution can create unexpected outcomes. New functionality can get in the way of something users need to do. Even worse, sometimes new functionality can introduce unintended side effects and detract value or erode trust. We consider these to be cases of bad code quality.
To limit negative effects that arise from releasing new functionality, we roll out code in small increments. For critical functionality, we may start with 5 percent of the user population and slowly step up to 10 percent. We rely on instrumentation to provide us visibility as to how new functionality works for users, and as soon as we detect issues, we either fix the issue quickly or dial back the percentage to zero if a fix cannot be released quickly.
Does it Put People First?
At Shippo, we are known to be people-first. Our priorities are no different when it comes to code quality. In considering value to customers and technical design quality, we are keeping the customer and the future developer in mind when it comes to how we measure code quality.
In a recent conversation with our VP of Product, Yotam Troim, we talked about conducting post mortems on projects that run later than expected. It occurred to me that late code is a form of bad code. Perhaps this can be a topic for future conversation on how we measure code quality.
This article was first published in the Version One blog. You can read the original piece here.