Julia finds all OWASP Benchmark vulnerabilities

The Open Web Application Security Project (OWASP) is an independent organization focused on improving the security of software. They provide a Benchmark test suite designed to measure the quality of code analyzers, making it possible to compare the tools to each other.

The chart above presents the overall results for the tools tested against version 1.2 of the Benchmark. The score for each tool is the overall true positive rate (TPR) across all the test categories, minus the overall false positive rate (FPR).


Tool                          Benchmark Version   TPR*      FPR*      Score*
FBwFindSecBugs v1.4.0         1.2                 47.64%    35.99%    11.65%
FBwFindSecBugs v1.4.3         1.2                 77.60%    45.21%    32.39%
FBwFindSecBugs v1.4.4         1.2                 78.77%    44.64%    34.13%
FBwFindSecBugs v1.4.5         1.2                 95.20%    57.74%    37.46%
FBwFindSecBugs v1.4.6         1.2                 96.84%    57.74%    39.10%
FindBugs v3.0.1               1.2                  5.12%     5.19%    -0.07%
OWASP ZAP vD-2015-08-24       1.2                 18.03%     0.04%    17.99%
OWASP ZAP vD-2016-09-05       1.2                 19.95%     0.12%    19.84%
PMD v5.2.3                    1.2                  0.00%     0.00%     0.00%
SAST-01                       1.1                 28.96%    12.22%    16.74%
SAST-02                       1.1                 56.13%    25.53%    30.60%
SAST-03                       1.1                 46.33%    21.44%    24.89%
SAST-04                       1.1                 61.45%    28.81%    32.64%
SAST-05                       1.1                 47.74%    29.03%    18.71%
SAST-06                       1.1                 85.02%    52.09%    32.93%
SonarQube Java Plugin v3.14   1.2                 50.36%    17.02%    33.34%
Julia v2.3.2                  1.2                100.00%    10.21%    89.79%

Please refer to each tool's scorecard on the OWASP website for the data used to calculate these values.


True Positive Rate (TPR) = TP / (TP + FN): the rate at which the tool correctly reports real vulnerabilities.
False Positive Rate (FPR) = FP / (FP + TN): the rate at which the tool incorrectly reports safe code as vulnerable.
Score = TPR - FPR: the normalized distance from the random-guess line, i.e. the overall effectiveness of the tool.
The ideal tool would sit in the top left-hand corner of the chart (it finds everything and issues no false alarms).
The diagonal line marks the point where a random guess would be as effective as using the tool.
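The three formulas above can be sketched in a few lines of Java. This is a minimal illustration: the class and method names are our own, and the sample counts are taken from the Command Injection row of the detailed results table further down (TP=126, FN=0, TN=105, FP=20).

```java
// Sketch of the Benchmark scoring formulas (assumed helper names,
// not Benchmark code). All values are percentages.
public class Metrics {

    // TPR = TP / (TP + FN)
    static double tpr(int tp, int fn) {
        return 100.0 * tp / (tp + fn);
    }

    // FPR = FP / (FP + TN)
    static double fpr(int fp, int tn) {
        return 100.0 * fp / (fp + tn);
    }

    // Score = TPR - FPR
    static double score(int tp, int fn, int tn, int fp) {
        return tpr(tp, fn) - fpr(fp, tn);
    }

    public static void main(String[] args) {
        // Command Injection row: TP=126, FN=0, TN=105, FP=20
        System.out.printf("TPR=%.2f%% FPR=%.2f%% Score=%.2f%%%n",
                tpr(126, 0), fpr(20, 105), score(126, 0, 105, 20));
    }
}
```

Running this reproduces the Command Injection row: TPR 100.00%, FPR 16.00%, Score 84.00%.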


Julia found 100% of the vulnerabilities in all 11 categories included in the test suite. In some categories a small number of false alarms was issued, an inevitable consequence of a sound tool.
The table below shows the detailed analysis results for each category.


Category                   CWE #   TP    FN   TN    FP   Total   TPR       FPR      Score
Command Injection          78      126   0    105   20   251     100.00%   16.00%    84.00%
Cross-Site Scripting       79      246   0    190   19   455     100.00%    9.09%    90.91%
Insecure Cookie            614     36    0    31    0    67      100.00%    0.00%   100.00%
LDAP Injection             90      27    0    28    4    59      100.00%   12.50%    87.50%
Path Traversal             22      133   0    113   22   268     100.00%   16.30%    83.70%
SQL Injection              89      272   0    196   36   504     100.00%   15.52%    84.48%
Trust Boundary Violation   501     83    0    31    12   126     100.00%   27.91%    72.09%
Weak Encryption Algorithm  327     130   0    116   0    246     100.00%    0.00%   100.00%
Weak Hash Algorithm        328     129   0    107   0    236     100.00%    0.00%   100.00%
Weak Random Number         330     218   0    275   0    493     100.00%    0.00%   100.00%
XPath Injection            643     15    0    17    3    35      100.00%   15.00%    85.00%


For more information on the Benchmark project and a list of tested tools, see the OWASP Benchmark page.

Please note: this page is a preview of the Julia Benchmark results; OWASP is currently in the process of updating the comparative results chart on their official website.

The test was executed with the version of Julia available in September 2016. If you wish to repeat the Benchmark to verify the results, you can do so using our online analysis service or by contacting us. For details, see the instructions below.

How is it possible to obtain such accuracy?

Julia is a next-generation static analyzer, with a radically different approach from the traditional techniques used by most competitors.
The result of more than ten years of academic research, the technology relies on a mathematical technique called Abstract Interpretation, which makes it possible to achieve both precision and completeness in the results.
Unlike competing technologies based on syntactic pattern matching, which can easily be bypassed, Julia interprets the code semantically and identifies all the errors in the categories covered by the analysis.
You can find further detail on the technology and the underlying scientific principles in the section “Scientific”.
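As a hypothetical illustration of the difference (none of this code is from the Benchmark), consider a tainted value that reaches a SQL query only after passing through a helper method. A purely syntactic matcher that looks for user input concatenated directly into a query string can miss this flow; a semantic analysis tracks the value through normalize() and still flags the sink:

```java
// Hypothetical taint-flow example, self-contained for demonstration.
public class TaintFlowDemo {

    // Helper that transforms the input but does not sanitize it:
    // the taint propagates through the return value.
    static String normalize(String s) {
        return s.trim();
    }

    // Sink: untrusted data is concatenated into a SQL query string.
    static String buildQuery(String userInput) {
        String cleaned = normalize(userInput);
        return "SELECT * FROM users WHERE name = '" + cleaned + "'";
    }

    public static void main(String[] args) {
        // A classic injection payload survives normalize() and
        // changes the meaning of the query.
        System.out.println(buildQuery("alice' OR '1'='1 "));
    }
}
```

In real code, the fix is a parameterized query (java.sql.PreparedStatement) rather than string concatenation; a semantic analyzer recognizes the parameterized form as safe, while still reporting the concatenated one regardless of how many helpers the tainted value flows through.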



If you wish to reproduce our Benchmark results, please proceed as follows.
We assume that you have an Eclipse project for the OWASP Benchmark 1.2 that compiles with no errors, and that you have run mvn eclipse:eclipse inside that project (this requires Maven 3). Then follow these steps:

  1. Register for our online analysis service at https://portal.juliasoft.com/
  2. Install our Eclipse plugin and configure it with your credentials (instructions and credentials are available at login time and under your user profile). Set the maximum number of warnings shown to 5000
  3. Make sure you have enough credit to run the analysis (150,000 credits); otherwise please contact us
  4. Select the OWASP Benchmark project in Eclipse and analyze it with our Eclipse plugin: in the first screen, check both options "Only main" and "Include .properties files". The latter is needed because the Benchmark assumes that the analyzer has access to property files, which is normally turned off for privacy
  5. In the next screen of our Eclipse plugin, select only the Basic checkers Cryptography, Cookie and Random, and the Advanced checker Injection
  6. Click Finish and wait until the analysis terminates. This should take around 17 minutes, unless there are other analyses in the queue. You can check the progress of the analysis in the Eclipse console view
  7. Once the analysis has terminated, you can see the warnings in Julia's Eclipse view and export them as an XML file using the view's downward-arrow icon