How to do a Java source code review

So, a while ago I developed the methodology on how to carry out a Java secure source code review at my previous employment. Now, I’ve been asked to do a few reviews at my current employment, and I got chuffed at myself for not being able to remember a lot of the stuff that I figured out at my old place, so I’ve decided to put some of them down right here. Now naturally I’m not allowed to just publish the same things that I’ve done for either of the jobs, they’re considered trade secrets, so these are more of general guidelines I use whenever I do source code reviews.

Scoping

The first thing I wanna do is to figure out how long the assignment will take. This greatly depends on how fast you are at reading code, but in my experience I can do about 1000 lines of code pr. hour. So here’s what I do (on linux):
First, CD into the directory that holds all the code.
Then run find $(pwd) -iname *.java > files.txt
This will find all the files that you will need to review.
Next run while read line ; do wc -l $line ; done < files.txt > linecounts.txt
This will run through all the files and count the number of lines in them.
To get rid of all the excess output, run cat linecounts.txt | cut -d” ” -f1 > numbers.txt
Now finally run this to get the total number of lines of code: sum=0 ; while read num ; do sum=$(($sum + $num)) ; done < numbers.txt ; echo $sum

So this could all be stuck together like this:
find $(pwd) -iname *.java > files.txt && while read line ; do wc -l $line ; done < files.txt > counts.txt && rm files.txt && cat counts.txt | cut -d” ” -f1 > numbers.txt && rm counts.txt && sum=0 ; while read num ; do sum=$(($sum + $num)) ; done < numbers.txt ; echo $sum && rm numbers.txt

This will give you the number of lines of code, divide it by 1000 and you have how many hours it would take me to review it.

Ressources

Aside from the source code (obviously), there are a number of things I usually like to ask the client for, such as:

  • A copy of any frameworks that have been used
  • Design Class Diagrams of the application
  • System Sequence Diagrams
  • Any other documentation

In the perfect world, the client would simply run the VMware Converter and hand you a full working copy of their development environment, but the world isn’t perfect.

Questions

As part of the review, I usually ask myself (and the client) questions like:

  • How critical is the application?
  • What’s it for?
  • What matters most, speed, availability, etc?
  • What sorta data will the application be processing?
  • Is logging being used? How and where?
  • How are errors handled?
  • Who will be using it? Is it for public usage, internal, external (consultant-style)?
  • What’s the architecture like? (P2P, MVC, client/server etc)
  • Where’s inputvalidation handled? And what about user authentication? (get method names so you won’t have to waste time looking for them)
  • How are sessions handled?
  • How’s data stored? Cleartext?
  • How’s data transfered? Cleartext?
  • What development methodology was used? (UP, Scrum, XP etc)

Reviewing

Now for the actual review. If possible, import the source into your favorite IDE (I’m a NetBeans man) and start looking through it. If you can’t get it into your IDE, I usually just read through it with any color-coded editor, like Notepad++

Look through the packages and classes, and see if you can get an overview of what kind of architecture is being used. Is this a good choice? Why?/Not?

Run it through a refactoring tool, like RefactorIT or whatever floats your boat. This will pretty much just check it for best practices, but sometimes you might get lucky.

Look to see if public/private methods and variables are being used properly (scope-wise) and beware of static methods. Also, a lot of interesting things can usually be seen in debug-messages that have been left in the code.

And now we read!
Look through all classes, line by line. Look for loops and see if they (logically) run and exit as they should (off by one?), are they using collections (for each) or regular for/while loops. Is that the best choice or would a different loop make it faster?
Look for methods that could use up a lot of memory, run the program and see if you can make it run out of memory.
Look for input validation. Do they blacklist or whitelist? Is that a good choice? (Blacklisting is never! the right choice)
Look for hardcoded passwords, addresses and config-locations. Is this a good idea/necessary?
Look for methods that execute OS-specific commands (ping, shutdown, nslookup etc)

If you’re doing this as part your own security assessment, then you probably have some tools as well, (I know we do) and now would be a good time to run them towards the code, as what’s been mentioned above is just a quick overall process that I go through.

Reccurance

If you do reviews as part of process, you may get to review the same code multiple times. In that case, use these commands to pinpoint the alterations since your last review:
diff <folder1> <folder2> -E -b -B -r -q This will list all the files that differ
diff <folder1> <folder2> -E -b -B -r This will show the actual differences
You might wanna play around with some of diff’s switches to get a display that makes it easier to compare these things.