Why SCA Tools Can't Agree if Something is a CVE

One scanner says this is a CVE, and the other says it's not. Which is right?

Written by
Henrik Plate, Security Research at Endor Labs
Published on
October 20, 2023

When working on solutions that detect known vulnerabilities in upstream OSS dependencies, there’s one question that comes up all the time: Why does tool X show a vulnerability, why does tool Y not, and who’s right?

And the pulse starts racing every time … because if we’re behind X and Y is correct, we’d be reporting a false-positive, which results in wasted effort for our customers. Even worse, if our solution is Y and X is right, we’d have a false-negative that could get our customers into trouble: they would keep living with a vulnerable and potentially exploitable dependency in their applications, putting themselves and their end-users at risk.

With this blog post, I try to provide some general explanations of why different tools can come to different conclusions - backed by real-world examples.

Tl;dr

A fundamental problem of most ecosystems is that it is difficult, sometimes impossible, to link a binary artifact to the source code it was produced from - in an automated, reliable and verifiable way. The (ugly) work-around that the security and software engineering communities came up with consists of enumerating the names and versions of all the artifacts that we think contain the vulnerable code. This approach is more coarse-grained, since it looks at entire projects or artifacts instead of single source code files or functions. However, as shown by this blog post, coarse-grained software naming schemes struggle to capture the myriad ways in which code gets modified, copied, forked and distributed in the different ecosystems, which leads to false-positives (FP = wrongly thinking an artifact contains vulnerable code) and false-negatives (FN = failing to identify an artifact that contains it).

A bit of history and background

NVD is by far the most well-known vulnerability database. It covers commercial and open source software of various kinds, from operating systems and full-blown ERP applications to browsers and tiny open source libraries. Vulnerable components are identified through Common Platform Enumeration (CPE) identifiers, which comprise the vendor name, the product name, version information and a couple of other fields.

CPE identifiers, however, do not match the component names used in development projects. More specifically, they do not correspond to the names used by developers in dependency declarations, which identify artifacts hosted on Maven Central, PyPI or npm.

The infamous Log4Shell vulnerability, for example, is reported to affect the CPE cpe:2.3:a:apache:log4j (cf. CVE-2021-44228). However, the Maven artifact identifier used by developers when declaring a dependency is org.apache.logging.log4j:log4j-core (in the Maven format groupId:artifactId). Understanding whether one corresponds to the other is tedious and error-prone, and can lead to both false-positives (if a CPE and its CVE are wrongly mapped to a dependency) and false-negatives (if an existing correspondence cannot be established).
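
To make the mismatch concrete, here is the full CPE 2.3 notation next to the Maven coordinate a developer would actually write (the version 2.14.1 is only an illustrative value - NVD expresses the affected versions as ranges):

    CPE (NVD):        cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*
                      (scheme : part : vendor : product : version : further fields such as update and edition)

    Maven coordinate: org.apache.logging.log4j:log4j-core:2.14.1
                      (groupId : artifactId : version)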

Another problem of NVD is the fact that vulnerabilities are reported at a very coarse granularity, e.g. entire open source projects, which does not take into account that some projects distribute different artifacts that can be consumed independently (btw - the same can happen with SBOMs, depending on the names provided by SBOM generators). CVE-2022-26336, for example, concerns the CPE cpe:2.3:a:apache:poi, an Apache project for reading and writing various Office formats in Java. This vulnerability, however, affects only one of the many artifacts distributed by the project, which leads to false-positive findings from NVD-based tooling if a developer uses a non-affected POI artifact.

Those NVD naming problems have been known for quite some time, and the Open Source Vulnerability (OSV) database addresses them - at least for open source components used by application developers. It aggregates multiple data sources like GitHub Security Advisories (GHSA), and its naming scheme is aligned with the package identifiers used by ecosystems like PyPI and Maven. The GHSA entry for CVE-2022-26336, for example, only reports the Maven identifier org.apache.poi:poi-scratchpad as affected.
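
To illustrate this alignment, OSV can be queried directly with the ecosystem-native package name - no CPE mapping involved. The following is a minimal sketch against OSV’s public query API (the class name and the version 5.2.0 are merely illustrative; I picked a pre-fix release of poi-scratchpad):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class OsvQueryExample {
        public static void main(String[] args) throws Exception {
            // The package name is the plain groupId:artifactId a developer would also
            // write in a dependency declaration.
            String query = """
                {"package": {"ecosystem": "Maven", "name": "org.apache.poi:poi-scratchpad"},
                 "version": "5.2.0"}""";
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.osv.dev/v1/query"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(query))
                .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            // The JSON response lists matching advisories, e.g. the GHSA entry for CVE-2022-26336.
            System.out.println(response.body());
        }
    }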

Problem solved?

OSV is a great step forward by avoiding the error-prone mappings between CPEs and ecosystem-specific identifiers. Its security advisories use the same language as the developers depending on open source components.

So can we consider the problem solved, or does OSV-based tooling also come with false-positives and false-negatives? You have already guessed the answer to this rhetorical question, and the main reason is that code is shared, copied and distributed in so many different ways that it is very hard to enumerate all affected artifact names (no matter whether you use CPEs or not). In other words: OSV solves the mapping problem, but it does not solve the problem of identifying all artifacts that contain vulnerable code.

Finding all of them, or at least as many as possible, requires first knowing the vulnerable piece(s) of code, say function foo(), and second excavating their history. Have they been developed in the context of the project, forked from another upstream project (that may also be vulnerable), and are there any downstream forks (again, potentially vulnerable)? And how is the code of all those projects distributed - do consumers need to build from sources, or can they download prebuilt artifacts from registries like PyPI or Maven Central? And if the project produces multiple artifacts, which ones contain the vulnerable function and which do not? To make matters worse, the techniques and tooling required to answer these questions often depend on the specific ecosystem in question - e.g. whether we talk about compiled, interpreted or hybrid languages, the package managers used, etc.

Looking at this wall of issues, you can already see that identifying all the affected artifacts is a huge, manual and error-prone effort, which is why the majority of tool providers in the SCA and supply chain security space maintain proprietary vulnerability databases - even though the affected projects are open source.

In the following, we will explain seven problem areas that can lead to false-positives and false-negatives. Those categories are exemplified using vulnerabilities for three different Java projects: the well-known OSS projects Spring Framework and Apache Tomcat, as well as a smaller project called EBICS Java Client.

  1. Mapping across naming schemes (e.g. from CPEs to package names)
  2. Project forks 
  3. Different distribution channels (that alter package names)
  4. Projects that release multiple artifacts (with code intersections)
  5. Rebundling/repackaging of project code in other project artifacts
  6. Changing package names
  7. Unmaintained versions 

Yet other problems result from the way dependencies are established, which does not always happen through dependency declarations in manifest files. The Python phenomenon of phantom dependencies falls into this category (covered already in a previous blog post).

Problem areas

Forks + Distribution channels

The component affected by CVE-2022-1279 is called EBICS Java Client, and supports file exchanges between banks and their customers via the EBICS protocol. The project was originally developed with SVN and published on SourceForge in 2012, but copyright notices suggest that development began as early as the 1990s.

The sources are still on SourceForge, but also in the GitHub repository uwemaurer/ebics-java-client, with the complete SVN commit history preserved. The first Git commits appeared in 2014 and overlap with ongoing development on SourceForge. At some point, the GitHub repo was renamed to ebics-java/ebics-java-client. The latest commits on this GitHub repo date back to April 2022 and relate to the vulnerability fix implemented in release 1.2, which is the only release tag ever created on GitHub.

But how do users consume this project? Not through Maven Central, because the author of the GitHub repo never published any Java archive himself. Instead, it is possible to consume it through JitPack, a service that builds Java artifacts from GitHub sources and is free for open source projects.

Interestingly though, the Maven coordinates used by JitPack do not correspond to the ones in the project’s POM file (org.kopi:ebics). Instead, developers wanting to declare a dependency on the EBICS Java Client hosted on JitPack need to use a coordinate composed of the GitHub user and repository name, in this case com.github.ebics-java:ebics-java-client.
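
For illustration, a consumer’s pom.xml would then contain roughly the following - a sketch assuming JitPack’s standard repository URL and the release tag 1.2 mentioned above:

    <repositories>
        <repository>
            <id>jitpack.io</id>
            <url>https://jitpack.io</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>com.github.ebics-java</groupId>
            <artifactId>ebics-java-client</artifactId>
            <version>1.2</version>
        </dependency>
    </dependencies>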

Adding to those complications, the GitHub project also got forked sometime prior to the vulnerability disclosure. The artifacts of this fork were published on Maven Central (with coordinates io.github.element36-io:ebics-cli and versions 1.1 to 1.5). Its last commit took place in 2021, i.e. the Java classes affected by the vulnerability (KeyUtil and Utils) do not contain the fix implemented in the upstream repository; hence, all of its releases are affected by the vulnerability.

The NVD highlights the CPE cpe:2.3:a:ebics_java_project:ebics_java as vulnerable, i.e. neither the CPE vendor (ebics_java_project) nor the product (ebics_java) corresponds to a Maven groupId or artifactId that any user would actually use in a dependency declaration. And OSV only mentions the GitHub repository github.com/ebics-java/ebics-java-client as affected, but no Maven coordinates. This means that solutions relying solely on NVD or OSV for mapping vulnerabilities to dependency declarations can hardly succeed, and hence suffer from false-negatives.

Instead, when it comes to enumerating affected Maven coordinates, the list should include at least

  • org.kopi:ebics, to support the detection for people who built the EBICS Java Client themselves from the GitHub sources (and abstained from changing it any further),
  • com.github.ebics-java:ebics-java-client, to support people consuming artifacts from JitPack, and 
  • io.github.element36-io:ebics-cli for those who consume artifacts of the fork from Central.

To summarize, this example illustrates how difficult it is to track the flow of vulnerable code in different forks and across different distribution channels, in this case GitHub, JitPack and Maven Central.

Multiple Artifacts + Rebundling

CVE-2018-1270 describes a possible remote code execution vulnerability in the Spring component “spring-messaging”. The Spring maintainers fixed that vulnerability through commit e0de91, which changed several methods of class DefaultSubscriptionRegistry and its nested class DefaultSubscriptionRegistry$SimpMessageHeaderPropertyAccessor.

Knowing the affected code makes it possible - with certain limitations - to search for artifacts that contain that class. Maven Central offers a corresponding search option for Java artifacts published in this registry, which shows that the class is only contained in 1 of the 58 artifacts produced by the Spring Framework: the search for group “org.springframework” and class “DefaultSubscriptionRegistry” shows 174 different versions of the artifact org.springframework:spring-messaging.
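
For reference, such searches can be run through the web UI at search.maven.org or its REST endpoint; the following query URLs are an illustrative sketch of the two searches just described (the c:/fc: parameters search by simple and fully-qualified class name, respectively - exact syntax may vary):

    https://search.maven.org/solrsearch/select?q=g:org.springframework+AND+c:DefaultSubscriptionRegistry&wt=json

    https://search.maven.org/solrsearch/select?q=fc:org.springframework.messaging.simp.broker.DefaultSubscriptionRegistry&wt=json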

OSV, however, denotes the artifact org.springframework:spring-core as affected. Since it does not contain the respective class, this leads to both false-positives (when developers depend on spring-core) and false-negatives (for developers depending on spring-messaging).

Beyond illustrating the importance of picking the correct project artifact(s), the vulnerability also demonstrates the problem of rebundling, which is when compiled Java classes from one project get included in the artifact of another project. When searching for the vulnerable class alone, we find that it is also contained in the artifact org.apache.servicemix.bundles:org.apache.servicemix.bundles.spring-messaging, which comes from a completely different OSS project.

As shown in this study, compiled Java classes are often rebundled and repackaged (another technique to include another project’s code, one which involves changing the class name). A further problem arises when rebundled code is included in nested archives, e.g. JAR files inside other JAR files, which complicates the search.
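
To give an idea of what such a search involves, here is a minimal, hypothetical sketch that checks whether a Java archive - or any JAR nested inside it - contains a class file with a given name. Repackaged classes (i.e. classes whose names were changed) would of course escape such a purely name-based scan:

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.jar.JarEntry;
    import java.util.jar.JarInputStream;

    public class ClassFinder {

        // Returns true if the archive read from 'in' (or any JAR nested inside it)
        // contains an entry ending with the given class file name.
        // Deliberately does not close 'in': on recursive calls, 'in' is the enclosing
        // JarInputStream, which must stay open for the outer loop to continue.
        static boolean containsClass(InputStream in, String classFileName) throws IOException {
            JarInputStream jar = new JarInputStream(in);
            JarEntry entry;
            while ((entry = jar.getNextJarEntry()) != null) {
                String name = entry.getName();
                if (name.equals(classFileName) || name.endsWith("/" + classFileName)) {
                    return true;
                }
                // Descend into nested archives, e.g. JARs rebundled inside fat JARs or WARs.
                if (name.endsWith(".jar") && containsClass(jar, classFileName)) {
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) throws IOException {
            // Example (file name illustrative): java ClassFinder spring-messaging-5.0.4.RELEASE.jar DefaultSubscriptionRegistry.class
            try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
                System.out.println(containsClass(in, args[1]));
            }
        }
    }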

To conclude, this example demonstrates how laborious it is to track the distribution of vulnerable classes in binary artifacts - within and across different projects, even if they are all published on just one registry.

Renamed + Unmaintained Projects 

Along the same lines, CVE-2023-41080 affects Apache Tomcat with CPE cpe:2.3:a:apache:tomcat. The particularity of this project is that Tomcat has existed for many years, that its Maven coordinates have changed multiple times, and that the same classes are contained in multiple artifacts, e.g. the stand-alone version of the Tomcat Web server and its embedded version.

In this particular case, the vulnerable class org.apache.catalina.authenticator.FormAuthenticator is contained in the following project artifacts, which exist for the major Tomcat releases 7 to 11: org.apache.tomcat:tomcat, org.apache.tomcat:tomcat-catalina and org.apache.tomcat.embed:tomcat-embed-core. 

Previous Tomcat releases, however, were distributed under different Maven coordinates, namely tomcat:catalina (for major versions up to 5, with the latest release from 2007) and org.apache.tomcat:catalina (for major version 6, with the latest release from 2017), both of which also contain the affected class. For users of those old versions, which are explicitly marked as vulnerable neither by NVD nor on the Apache mailing list, the big question is whether they are safe…

Good question - too bad that nobody consistently checks whether old, unmaintained releases are subject to vulnerabilities that have been reported for recent releases.

In many cases, vulnerabilities are simply said to affect all previous releases, because checking when a vulnerability was introduced can also require significant effort (which is prohibitively expensive for public vulnerability databases, considering the number of vulnerabilities and software versions). As the Apache maintainers write themselves: “Please note that Tomcat 8.0.x has reached end of life and is no longer supported. Vulnerabilities reported after June 2018 were not checked against the 8.0.x branch and will not be fixed.”

In the case of CVE-2023-41080, however, the CVE and OSV advisories do not mention any of the previous versions. Still, we suspect that the previous versions are also affected, because the vulnerable class and method not only exist in previous versions, but also contain the same method body. The four fix commits for all major release branches 8 to 11, e.g. 4998ad for 8.5.x, added the very same lines to the method FormAuthenticator.savedRequestURL in order to prevent the open redirect vulnerability:

    // Strip duplicate leading '/' characters so that the saved request URL
    // cannot turn into a protocol-relative redirect to another host.
    while (sb.length() > 1 && sb.charAt(1) == '/') {
        sb.deleteCharAt(0);
    }

And since the source code of this method in Tomcat 5.5.23 is identical to the one in the 8.5.x branch, we strongly suspect that those previous versions are also affected, i.e. the Maven coordinates tomcat:catalina and org.apache.tomcat:catalina should also be covered by the advisory.

In other words, for this example, all 5 coordinates should be listed to avoid any false-negatives. OSV, however, only lists org.apache.tomcat.embed:tomcat-embed-core and org.apache.tomcat:tomcat as affected.

Side note: Developers on such old releases are often told that they should have migrated a long time ago, and that they have many more problems anyhow than just this vulnerability. All this may be true; however, it does not help them much when it comes to assessing a critical vulnerability in a timely manner.

Got it, it’s Complicated, What Now?

For the time being, the quality of vulnerability databases is a significant asset and differentiator for SCA and supply chain tools. And this will probably remain as-is for the upcoming years, just because those ecosystems are well-established and new technologies on the horizon will only be adopted gradually.

As such, it is important for users of such tools to perform a thorough evaluation of their databases and capabilities prior to licensing any of them. Ideally, this is done on benchmark applications that resemble the organization’s applications as much as possible, e.g. with regard to the frameworks and package managers used. To put it differently, it does not make sense to benchmark solutions with an application written in Scala and built with Gradle if you predominantly use plain Java and Maven.

It is also advisable to start with smaller applications, such that all individual findings can be manually reviewed to determine whether they are false-negatives or false-positives of the respective tools.

Outlook

In the long run, however, I hope that it will be possible to reliably establish the link between source code and binary artifacts. Build attestations such as those from SLSA aim at closing this gap and are starting to be adopted by some ecosystems, e.g. npm. Projects supporting reproducible builds let users verify whether a given artifact has indeed been built from a given commit, because every build should result in bit-identical copies (provided the builds are done on the same platform and with the same tools). And early research ideas such as this paper investigate whether it is possible to embed information about the used source code in the binary itself (rather than delivering it as separate metadata) such that it can be queried.

Another piece of the puzzle is that it is very hard for project outsiders to identify the exact location of the vulnerable code, i.e. the source code file and function. Fix commits that solve a given vulnerability reveal this information, but they are not systematically linked in advisories. Where such links are missing, identifying the fix commits can be very laborious and error-prone. One relatively easy improvement could be provided by SCM platforms like GitLab or GitHub, e.g. by allowing maintainers to flag one or more commits as “fix commits” for a given vulnerability.

But until those puzzle pieces are mature and broadly deployed to all the different ecosystems, we will need to stick to the ugly work-around. To put it in the (slightly modified) words of Phil Karlton: “There are only two hard things in Computer Science: cache invalidation and naming [vulnerable] things.”
