Photo by Helloquence on Unsplash
This is the third in a series of guest posts by Andrew Tate, a writer turned software developer at Animalz. He has been using using PullRequest code review to help improve their first product, Revive, and learn more about development.
- 3 Lessons Learned from My First Week of Code Review
- Learning Security One Code Review at a Time
- Solving Problems Collaboratively Through Code Review
My first code reviews all followed a similar pattern. I’d create the PR, receive the review, and then think, “Great, that’s really good advice. Can’t wait to figure out how to do it.”
There are 3 reasons for this mindset, but also 3 problems with this mindset.
For one, ‘figuring stuff out’ appeals to the problem-solver in every developer. We want to figure out the answer to our own problem. We also know that the answer is probably out there on Stack Overflow or in docs somewhere. But as we’ve seen, solving problems using SO without a good understanding of what you are doing only introduces a new set of problems.
Secondly, you don’t want to look like an idiot. Saying, “I don’t know how to do that” is difficult. But as I’ve gone through the review process with the reviewers at PullRequest, I’ve found two things:
- They are prepared for and mostly expect junior-level questions/issues. They’ve seen my
first_time = 'yes'
and didn’t even flinch. - They want to help. They have the skillset and experience to not just give you the answer, but give you the intuition for the answer so you can solve the problem yourself next time. If you don’t ask, you don’t get access to this knowledge.
A third reason for not following up with reviewers for clarity or guidance is that being new to this, I’m still unsure of the etiquette of the system. Am I hassling the reviewers if I ask them more questions? Of course not, they are professional reviewers getting paid to do this. But it’s understandable to be hesitant when you don’t know the conventions of the interaction.
Receiving a full project audit allowed me an opportunity to acknowledging the problems with this ‘on-my-own’ approach. Here is my experience with asking for help that you can learn from so that, with the help of experienced developers, you can build better and learn quicker like I have.
Working together to find the best way to store a client secret
This data is sensitive – consider moving this file out of the code repository and scrubbing it from your git history (see for example https://help.github.com/en/github/authenticating-to-github/removing-sensitive-data-from-a-repository). Ideally this file would be installed as part of your deployment process instead of being bundled with the code itself.
–Code reviewer Dani I.
If you are being told to scrub something entirely from your git history, you’ve probably f**ked up.
In this case, I had. A JSON client secret was sitting in my repo. When I originally committed this, I did wonder whether it might be insecure but reasoned that “Well, the repo is private, so it is only me and others on the team who can access it.” I was going for the tried-and-tested option of security through obscurity. Of course, I didn’t think about the dozen apps that have access to my GitHub, or the external reviewers I was explicitly asking to comb through it.
To fix, I needed to find out more. I had two options:
- The Google/Stack Overflow option I had followed before. I had done this before the initial commit. I found nothing helpful. A quick skim this time showed nothing had changed.
- Ask.
This isn’t about passing responsibility; it’s about using the support you have at your disposal. This is exactly the type of issue in which having an experienced developer review your code is invaluable. I had made the initial error because I was on my own; I wasn’t about to pass up the opportunity to learn and do it right this time.
So I asked and got a quick reply.
I spent a bit of time poking at this particular API. It seems like the API does require the credentials to be in a file, so you can’t just stick them in an env variable (which normally would be a great alternative). How exactly you would deploy this credentials file alongside your production code depends on deployment process.
You have some other secret keys that are already in env variables – do you know how these env variables are being set? For example, if you deploy via Docker containers, the env variables might be set by a Dockerfile that sets up your server. This Dockerfile could also pull in this credential file and stick it in a known location that your code can read from.
–Code reviewer Dani I.
I’m not using Docker, just a vanilla Heroku deployment. I did some research and found a possible solution from a Stack Overflow answer. Having this ongoing conversation allowed me to run this option past the reviewer. But in the meantime, the reviewer had gone further in their research:
Ah, nice link, yes I think some of the approaches here could be adapted to your situation. However, looking at the API again, it does look like there is a way to create Flows that doesn’t involve reading a file, by using
Flow.from_client_config
(see docs).
So here you would just paste the entire contents of yourclient_id.json
file into an env variable in Heroku. Then in your code, you can load the contents of the env variable (note that you will have to deserialize it from JSON) and pass the result tofrom_client_config
. For example:OAUTH_CLIENT_SECRETS = json.loads(os.environ('OAUTH_CLIENT_SECRETS'))
flow = google_auth_oauthlib.flow.Flow.from_client_config(OAUTH_CLIENT_SECRETS, scopes=luscaSettings.SCOPES)
–Code reviewer Dani I.
Between the two of us, we had come up with two possible alternatives to deal with this issue. Reviewers are problem-solvers as well—they want to help you get to the core issue and a resolution that leads to better code.
I also leave with more confidence in its eventual resolution because it has been checked by a senior dev. It isn’t just me finding an answer on Stack Overflow. It is two developers working together to find the best answer to the problem at hand.
Correcting Google on why pickles are bad
Pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue.
One of pickle’s caveats is that if an attacker has control over the string you’re unpickling, they can cause you to run arbitrary code. In this case, it appears you are unpickling a string from your own database – it’s worth considering that this is a way for an attacker who has compromised your database to upgrade to running arbitrary code in this environment.
JSON is a great alternative.
–Code reviewer Dani I.
Whereas with the client secret, I knew it was an issue; I was completely unaware of the perils of pickling, which is odd given this sitting at the top of the Python 3 docs for pickle: [Image: Screen Shot 2020-02-27 at 3.31.52 PM.png]This comes back to my security naivety. I would have seen that when looking up pickle, but I would have read it as “OK, ‘only unpickle data you trust.’ Well, I trust my own data, so there’s no problem here.” I hadn’t thought of how I’d be open to this kind of attack should my database become compromised.
As you can see, here’s another great example of where individual-led learning can lead you awry. The code for the client-secret problem above and the code for pickling data came directly from Google’s own docs for this API. You might expect Stack Overflow answers to be off but not the documentation from Google for their own API.
I wanted a little more insight into why I should be using JSON and in what situation pickle
was applicable.
The Python docs actually have a comparison between pickle and JSON. The key to pickle is that it can be used to serialize arbitrary Python objects, which can include code. JSON is a data-only serialization format. This makes deserializing JSON safe, because you’re just loading data, while unpickling an object can involve running the code inside it. You should be able to just drop in
json.dumps/load
here.
As a general rule I would suggest avoiding pickle entirely. I don’t think it matters in this case, but JSON’s major drawback is that because it is a human readable format, it can be pretty verbose, so large objects can take up lots of memory. When or if that is a problem, I would suggest either compressing the result, or looking into binary data formats like Protocol Buffers.
–Code reviewer Dani I.
Now it’s clear that I have to replace credential = pickle.dumps(credentials)
with credential = json.dumps(credentials)
, but I also have an idea of where to go next—protocol buffers.
In a larger engineering org, these might be the insights you gain from coding next to someone. But when you are a team of one, this is the type of knowledge missed. It is not the type of knowledge that is going to change your life. But it is the kind of tidbit that you can store in your toolbox and use when you need it.
Every conversation is an opportunity to learn more. In this case, the discussion with the reviewer allowed me to correct misinformation from elsewhere. It allowed me to increase my general knowledge. And it did so through an intuitive explanation that makes it easier to reason about and understand the work I am doing.
Deciding my philosophy on exceptions
First batch of comments. One thing I recommend thinking about a bit is your philosophy about exceptions – when do you want to ignore exceptions? When (and how) do you want to surface them to the user? There are a few comments in this first batch around that topic.
–Code reviewer Dani I.
Every article in this series so far has included some feedback on (my lack of) exceptions. In previous examples, I had taken the advice tactically, fixing the specific issue at hand. Because this was an audit of an entire codebase, the reviewers were able to not just comment on individual issues but also offer meta-advice about how I was approaching code.
I hadn’t really thought about my “philosophy about exceptions.” I knew I was wrong, but not really how I wanted to be right.
Yep, I am definitely having trouble with exceptions. Initially I was using them in the obviously wrong way to allow a code block to fail silently (i.e. I know this part of the code might fail, it normally doesn’t, but if it does ¯_(ツ)_/¯ ).
Recently, I’ve tried to get better at this (see the 2nd code block in https://www.pullrequest.com/blog/learning-security-code-review), but I am still struggling to get to grips with what exactly I should be excepting each time—what my ‘philosophy’ should be? (Which I guess is effectively Nietzschean at the moment ;) )
–Developer Andrew T.
Should I look out for each exception that happens and explicitly try and handle each of those? Should I initially try and be as extensive as possible (e.g., look to handle each error in the hierarchy you shared differently)? If I do raise an exception, is ‘pass’ really a legitimate way to handle it, or should I always allow the code to fail at this point? I just really didn’t have a framework for thinking about this problem.
Dani helped me out.
The general philosophy I would recommend for exception handling is to think of it not as a way to handle unexpected surprises, but instead as a way to handle specific failure scenarios that you are aware of. To take an example from another comment on this review:
new_traffic.append(int(part.strip('[],')))
Lots of exceptions could be raised while processing this line. If the preceding code is wrong,new_traffic
might not actually be a list, and this code could raise aSyntaxError
exception. Or maybe you pressed Ctrl-C on the server at the exact instant that it was running this line of code, raising aKeyboardInterrupt
exception. These are surprises, so it’s reasonable to ignore these exceptions and let some higher-level code handle them (possibly by terminating the execution of the server, or in this case maybe Django’s built-in view exception handler will just fail this specific request).
However, you know thatint()
will raise aValueError
when the input is invalid, so you need some logic to deal with that.
`try: new_traffic.append(int(part.strip('[],'))) except ValueError:
…` \
Now, should you pass, or do something else? That depends on what behavior you want when the input is invalid. If the field is optional, you could just ignore it and pass. If it is required, you could serve an error page that says the input is invalid. It’s completely situational.
A lot of the time, when you are struggling with how to respond to a specific exception, that’s a signal that you’re trying to handle it in the wrong place – in which case you may not want to catch the exception at all, instead relying on some higher layer try/except block to catch the error. As for which exceptions to catch, usually the documentation for a function will say what exceptions it can throw and why.
–Code reviewer Dani I.
Now, I have a starting point for thinking about my philosophy/framework for exceptions, a specific example of how this relates to a problem in my code, and a meta-point about thinking beyond just the exception and instead about higher-level thinking in the code.
Conclusion
Without following up with reviewers and getting involved in discussions, I would have still gotten a lot out of this audit. But with the audit, I am also able to better understand the problems my code has and have ideas and frameworks for thinking about its improvement in the future.
A meta-comment about my code was the lack of testing.
…it looks straightforward to extend when the time is right. A key step in this direction will be building out an automated test suite – currently, the code appears to be untested, which will make it more and more difficult to build on top of as time goes on (and particularly as the size of your engineering team grows).
–Code reviewer Chris E.
I got back to Chris about this. I don’t know anything about testing, so I wanted to ask for advice. I had asked others about it in the past but always in the abstract. These people had seen my code and could give me an idea of where to start for my specific problems, and that is what Chris did, pushing me in the right direction for my python/heroku/django stack.
Code reviews aren’t only about the specific comment. They are about learning and knowledge transfer. To get that properly requires discussion. With external code reviewers, that discussion doesn’t necessarily come as easy as with a code reviewer you share lunch with. But the outcome is better. They are willing to help and want to problem-solve, to help you make your code better, and to help you learn.