Benchmarks - Code Review Size

See how your team compares to similar companies

We are excited to announce a new feature we’ve been rolling out to select PullRequest customers: Benchmarks. For several months we have been developing these key metrics, based on anonymized datasets from thousands of development teams, and using them internally to help our network of expert engineers give PullRequest customers the most useful feedback possible. We’re now offering this information to our customers so teams can see how they stack up against companies of similar size and composition, identify issues, and set high-level goals.


Teams using PullRequest can track their scores over time and see where they stand compared to similar companies. Benchmark reporting also includes 30-day comparisons, making it easy to track code review process improvements over time.

In our previous article, we discussed the first Benchmark in detail, Issue Catch Rate. In this article, we detail the importance of another key metric: Code Review Size.

Code Review Size

One of the key Benchmark metrics that PullRequest helps you track is the size of your code reviews. The size of a code review is the number of lines changed within a pull or merge request. Many studies, including a highly cited paper by Peter C. Rigby and Christian Bird, have indicated that teams with effective code review tend to keep change sizes small. Code Review Size is an important metric to track because it correlates strongly and consistently with two major problems: reviewers missing issues introduced by the proposed changes, and slow response times from reviewers.
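As a rough illustration of that line-count definition, the sketch below sums added and deleted lines from the output of `git diff --numstat`. The function names are hypothetical, and this is not PullRequest's actual implementation; it only shows one way the raw number could be computed.

```python
def parse_numstat(numstat_output: str) -> dict[str, int]:
    """Map each changed file to its added + deleted line count.

    Expects the tab-separated lines printed by `git diff --numstat`:
    <added> TAB <deleted> TAB <path>. Binary files show "-" for both
    counts; this sketch records them as 0 lines (an assumption).
    """
    changes: dict[str, int] = {}
    for line in numstat_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        added, deleted, path = line.split("\t", 2)
        if added == "-" or deleted == "-":
            changes[path] = 0  # binary file: no line counts available
            continue
        changes[path] = int(added) + int(deleted)
    return changes


def review_size(numstat_output: str) -> int:
    """Total lines changed across all files in the diff."""
    return sum(parse_numstat(numstat_output).values())
```

The text could come from a command such as `git diff --numstat main...feature`, where the branch names are placeholders.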

The Issue Catch Rate (which we introduced in the previous announcement) tends to be inversely correlated with Code Review Size. This happens because larger changes are usually more complex, and that complexity increases the “cognitive load” required to review them effectively.

Cognitive load

In the context of code review, “cognitive load” describes how hard it is for a reviewer to understand and keep in mind all of the code they are reading. This is easiest to picture in a simplified case, such as one line of code being added to an existing project. If only one line changes, an experienced reviewer can fairly easily confirm that the line is implemented correctly. The reviewer can also carefully assess any surrounding code that might be affected by that one line. If the change alters the behavior of a method or function that is exposed outside the file being reviewed, the reviewer may also have to dig into every usage of that function to make sure it is not affected.

That is the cognitive load for just one or two changed lines.

The number of places to check can grow very quickly once the impact of a change reaches beyond the local scope of the modified lines themselves. When hundreds or thousands of lines change within a single code review, the complexity grows further still.
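To make that growth concrete, here is a minimal sketch that walks a caller map and collects every function that directly or indirectly calls a changed one. The `callers` map and the function names in the usage note are invented for illustration; in a real project they would come from static analysis or an IDE's "find usages" feature.

```python
from collections import deque


def places_to_check(callers: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return every function that directly or indirectly calls a changed one.

    `callers` maps a function name to the set of functions that call it.
    The changed functions themselves are not included in the result.
    """
    to_visit = deque(changed)
    seen: set[str] = set()
    while to_visit:
        current = to_visit.popleft()
        for caller in callers.get(current, set()):
            # Each caller is visited once, so cycles in the map terminate.
            if caller not in seen and caller not in changed:
                seen.add(caller)
                to_visit.append(caller)
    return seen
```

With a hypothetical map such as `{"parse": {"load", "validate"}, "load": {"main"}, "validate": {"main", "cli"}}`, a change to `parse` alone leaves four other functions to consider, even though only one function was edited.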

Once the complexity is beyond a threshold that a reviewer can reasonably assess, there are only two options for completing the review. One option is that the reviewer can begin taking multiple passes over the code, just looking for individual issues related to changes from a few lines at a time. This results in the code needing to be read many times with different goals in mind each time, which can significantly slow down the review process.

The other option, which happens more frequently, is that the reviewer begins to take shortcuts and make assumptions in order to limit the problem space to a smaller scope. For example, they may not check every external usage of a function modified in the code. Or they may not check interactions between every combination of functions being modified. These shortcuts and assumptions will often end up resulting in bugs or other issues being introduced into the code.

Including tests in reviews

Of course, not all lines of change in a code review are equal in terms of impacting cognitive load. Some don’t increase it at all, and some actually help reduce cognitive load.

Unit tests are one of the main types of code for which this is consistently the case. For this reason, PullRequest excludes lines of code added for tests from our tracked Code Review Size metric. We do this so that the metric isn’t skewed by behavior that provides value to the development team and project, and so that teams and code authors are encouraged to write good sets of tests to validate their implementations.
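One way to approximate this exclusion on your own data is to filter out files that look like tests before summing. The path conventions below (`tests/` directories, `test_` prefixes, `.test.` infixes, and so on) are assumptions made for this sketch; how PullRequest actually identifies test code is not described here.

```python
from pathlib import PurePosixPath

# Directory names assumed to hold tests (a hypothetical convention).
TEST_DIR_NAMES = {"test", "tests", "__tests__", "spec"}


def is_test_path(path: str) -> bool:
    """Guess whether a repository path points at test code."""
    parts = PurePosixPath(path).parts
    name = parts[-1] if parts else ""
    stem = name.split(".", 1)[0]
    in_test_dir = any(part in TEST_DIR_NAMES for part in parts[:-1])
    test_named = (
        stem.startswith("test_")
        or stem.endswith("_test")
        or ".test." in name
        or ".spec." in name
    )
    return in_test_dir or test_named


def size_excluding_tests(lines_changed: dict[str, int]) -> int:
    """Sum changed lines per file, skipping files that look like tests."""
    return sum(
        count for path, count in lines_changed.items() if not is_test_path(path)
    )
```

Here `lines_changed` maps each file path to its changed-line count. A heuristic like this will misclassify some files, so teams with unusual layouts would want to adjust the patterns.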

The more isolated the lines of code are from other systems, the less they will affect those systems. In particular, when adding unit tests or other tests alongside implementation code, tests are usually not running in production. They’re disconnected from the rest of the system. This alone will greatly reduce the complexity of reviewing the test lines of code as they don’t have the same potential for introducing an issue.

In addition to this, adding tests alongside implementation can help reviewers make safe assumptions about the impact of changes. A good set of tests, assuming the tests are passing, will help to ensure that certain code paths are functioning properly in all of the specified cases. Part of a reviewer’s job is to think of edge cases and ways to “break” the implementation being introduced by the code author. Including unit tests is usually a good sign that the author has already thought about edge cases and written tests to exercise them and prove that they work as expected.

Improving your score

Since more lines of code that add to cognitive load have a negative impact on the overall development lifecycle, focus on reducing the number of lines changed per pull request. This should improve code review in two ways: reviews can be performed faster, and they can be performed with greater attention to detail. Improving the Code Review Size score will also likely improve the Issue Catch Rate and Code Review Lifecycle Duration scores.

Pull requests should contain discrete, focused changes. Aim to create pull requests that center on one thing at a time. This may seem counterintuitive, since it may take the author slightly longer to create more than one pull request. However, the shorter and more thorough code reviews will save your team time and improve code quality in the long run.

Patterns that we’ve seen at PullRequest that should generally be avoided:

  • Creating pull requests that contain all of the required changes for a relatively large or complex feature.
  • Pull requests that are created based on work completed in a duration of time. For example, opening scheduled pull requests every Friday containing changes that were made throughout the week.
  • Similarly, creating single, large pull requests containing all of the changes involved in a development sprint.

Creating appropriately sized pull requests is often an art that needs to be perfected over time. We will have to save more detailed suggestions for splitting up changes into multiple pull requests for future articles as there are many ways to do this. However, making sure that you are tracking the metric and focused on improving it will help to ensure your developers are making good decisions.

See your team’s score in minutes

Connect your team’s GitHub, Bitbucket, GitLab or Azure DevOps repositories to PullRequest to see how you compare to other similar companies. We’ll notify you when your scores are ready.

Click here to get set up on PullRequest 📊

Access to PullRequest’s Code Review Benchmarks is free along with other great tools like PullRequest’s Repository Insights Dashboard.

Learn more about Benchmarks

Learn about the other code review metrics PullRequest tracks, our methods for deriving and comparing data, and how to improve your score in our Benchmarks Documentation.

Have questions?

We’d be happy to answer them, and we’d love to know what you think. Email

Or, schedule a 15-minute meeting with a member of our team.

About PullRequest

HackerOne PullRequest is a platform for code review, built for teams of all sizes. We have a network of expert engineers enhanced by AI, to help you ship secure code, faster.

Learn more about PullRequest

by Tyler Mann

November 17, 2020