
Detect Malicious Packages Among Your Open Source Dependencies

Written by

Henrik Plate

Published on

February 28, 2024

Topics

Open Source

SCA

Security

Open source software (OSS) has become a prevalent part of many applications, and while many OSS components are battle-tested, high-quality components, bad actors continue to use OSS as an attack vector. Known vulnerabilities in OSS packages are well-understood and addressed by tools such as SCA, but the use of malicious packages is an emerging and often overlooked threat that’s been growing since 2018-2019.

What are Malicious Packages?

Malicious software, or malware, refers to any software causing harm to computer systems, networks, and data. The common belief may still be that malware primarily spreads through email attachments (evoking all the annoyances of anti-virus software), but malicious actors are very creative in finding and exploiting new distribution channels for their software. One such distribution channel is OSS projects, which produce reusable software components for application developers and software users in general. The number of malicious OSS packages published on registries like PyPI and npm, for example, has kept increasing over the last couple of years.

There are numerous tactics and techniques at the disposal of attackers to distribute malicious code to consumers of OSS packages — from typosquatting and dependency confusion attacks to the creation of malicious merge requests for legitimate Git repositories. We’ll focus on two categories which present the greatest risk:

Name and dependency confusion attacks
Attacks on legitimate packages

Name and Dependency Confusion Attacks

The most prominent techniques used by attackers today are dependency confusion and name confusion:

Dependency confusion attacks: Exploiting flaws in package managers’ dependency resolution process that cause packages from public registries to be preferred over same-named packages from private registries.
Name confusion attacks: Deploying packages whose names resemble those of legitimate packages (e.g. “mumpy” instead of the popular numerical computing package “numpy”). This includes typosquatting, brand-jacking, and combo-squatting.
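To make the name confusion idea concrete, here is a minimal sketch of how a dependency name could be compared against a list of well-known packages using string similarity. The package list, the function name, and the 0.75 threshold are illustrative assumptions, not how any particular scanner works.

```python
from difflib import SequenceMatcher

# Hypothetical list of popular package names; a real check would use a far larger list.
POPULAR_PACKAGES = ["numpy", "requests", "pandas", "urllib3"]


def similar_names(candidate, known=POPULAR_PACKAGES, threshold=0.75):
    """Return known names that `candidate` closely resembles without matching exactly."""
    candidate = candidate.lower()
    if candidate in known:
        return []  # exact match: this is the legitimate package itself
    hits = []
    for name in known:
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
        if SequenceMatcher(None, candidate, name).ratio() >= threshold:
            hits.append(name)
    return hits


print(similar_names("mumpy"))
```

A name that returns a non-empty list is only a candidate for review; many legitimate packages have names close to those of popular ones.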

Corresponding malware campaigns comprise dozens, sometimes hundreds, of malicious packages, and this problem won’t disappear any time soon. Thanks to a high degree of automation, the marginal cost of creating and deploying a malicious package is very small. Even if only a few people fall victim to such an attack, the proceeds may already cover that cost, be it through harvesting secrets or running a cryptominer on the compromised system. Just as we got used to spam and spam filters, developers should get used to these malware attacks.

The good news is that the open source community has made huge progress in detecting these kinds of malicious packages. The OpenSSF Package Analysis project goes to great lengths to install newly published packages in a sandbox environment. It triggers as much code as possible, in the hope of also executing the malicious code and observing suspicious behaviors like network calls or file system access. This dynamic approach works well in cases where the malicious payload is triggered upon installation, which is the dominant technique used by dependency and name confusion attacks. All detected packages are reported to the Open Source Vulnerability Database (OSV) and can be queried using its web interface and API.
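As a rough illustration of querying OSV programmatically, the sketch below posts a package name and version to the public `https://api.osv.dev/v1/query` endpoint and filters the response. It assumes that malicious-package reports carry IDs starting with `MAL-`; the helper names and the package used in the usage note are hypothetical.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_query(name, version, ecosystem="PyPI"):
    """Build the JSON body for an OSV query on one package version."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}


def malicious_ids(response):
    """Keep only advisory IDs with the MAL- prefix (assumed to mark malicious-package reports)."""
    return [v["id"] for v in response.get("vulns", []) if v.get("id", "").startswith("MAL-")]


def query_osv(name, version, ecosystem="PyPI"):
    """Send the query to OSV and return the parsed JSON response (requires network access)."""
    body = json.dumps(build_query(name, version, ecosystem)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)
```

Usage would look like `malicious_ids(query_osv("some-package", "1.0.0"))`, where the package name and version are placeholders. An empty list means no malicious-package report was found for that exact version, not that the package is safe.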

Attacks on Legitimate Packages

Another class of malware attacks targets the resources of an existing OSS project, such as its source code repository and build system, or the user accounts of its project maintainers. Those attacks are less frequent, thanks in part to the increasing (sometimes mandatory) use of two-factor authentication for user accounts on package registries and source code repositories. However, they also have a greater impact, because a successful infection with malicious code reaches a larger user base than confusion attacks do, especially when dependency declarations are unpinned. In comparison, dependency and name confusion packages typically have low download numbers.
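Since unpinned dependency declarations widen the blast radius of a compromised release, a quick check for them can be useful. The following is a simplified sketch for `requirements.txt`-style input that treats only exact `==` versions without wildcards as pinned; it ignores pip options, hashes, and line continuations, and the function name is an assumption for illustration.

```python
import re

# Pinned here means: name, optional extras, an exact "==" version without "*",
# and optionally an environment marker after ";".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^*\s;]+(\s*;.*)?$")


def unpinned_requirements(text):
    """Return requirement lines that do not pin an exact version."""
    result = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or --index-url
        if not PINNED.match(line):
            result.append(line)
    return result


sample = "requests==2.31.0\nnumpy>=1.20\nflask\n# a comment\npandas==2.*\n"
print(unpinned_requirements(sample))
```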

Since name and dependency confusion packages are taken down swiftly, we expect an increase in attacks on legitimate packages. One example is a July 2023 attack in which malicious merge requests impersonated GitHub Dependabot to suggest that they only updated the version numbers of project dependencies (while in fact they leaked secrets and altered project resources).

Two Case Studies of Malicious Python Code

Malware developers can be very clever, and without a sophisticated malware detection tool, their code can be challenging to detect. Here are two examples we detected using Endor Labs Open Source.

ttlo and gisi on PyPI: In July 2023, we detected malicious code (ttlo and gisi) that had been divided into smaller chunks distributed across several files and even different packages. This tactic was very likely intended to hinder malware detection, and it likely accounts for the higher dwell times and download numbers compared to packages from other malware campaigns. According to PyPI Stats, the packages gisi and ttlo were downloaded 1291 times and 667 times respectively during that time frame. For more details, read Divide and Hide: How Malicious Code Lived on PyPI for 3 months.

Whatfuscator: In January 2023, we discovered a less sophisticated example: version 70 of the Python package Whatfuscator, which uses a variation of the pattern used in thousands of previous attacks. Whatfuscator/__init__.py downloads a Windows executable from a hard-coded URL, saves it to a local file, and starts it right after. Version 69 was published just a few minutes earlier, and apparently, between versions 69 and 70, the obfuscated Python code contained in __init__.py was moved to the Windows executable (a file created with PyInstaller). The code in version 69 uses obfuscation and compression to hide a low-level Python code object. For more details, read Whatfuscator, Malicious Open Source Packages, and Other Beasts.

Using Endor Labs to Detect Malware

Leveraging recommendations from the OWASP Open Source Top 10, Endor Labs Open Source helps you significantly reduce the chance of malicious packages going unnoticed by enabling you to monitor OSS dependencies for:

Known malicious packages
Suspicious code and behaviors

This short video demonstrates how to use Endor Labs to detect malicious code in your open source dependencies. Read on to learn more about the details.

Monitor for Known Malicious Packages

Endor Labs compares your dependencies against confirmed malicious packages as reported in OSV. The presence of any such package is brought to a developer's attention through a critical finding and alert. You can apply this policy in pre-commit checks to prevent malware from being pushed to production, or scan throughout the SDLC to detect malware that has already infiltrated your applications.

This screenshot shows a finding created for a project that depends on the confirmed malicious package “bobikssf”, which was detected in August 2023 by the OpenSSF package analysis project and reported to OSV.

Detect Suspicious Code Snippets and Behaviors

Endor Labs developed a malware scanner that uses a comprehensive set of rules reflecting patterns and behaviors used in previous supply chain attacks. The rules look for suspicious code snippets and behaviors that we know attackers have used, in packages that have yet to be confirmed as malware. This capability is available for JavaScript (npm) and Python (PyPI).

However, some of the behaviors and techniques used by malicious actors are also used for benign purposes, which makes automatic classification very challenging. Because this type of finding is not guaranteed malware (but rather signals that malware may be present), the corresponding rule labels the respective package with a low-severity “warning”, which does not break any build under the default policies. This gives security-sensitive organizations full visibility without slowing down their developers. As with all Endor Labs findings, we provide a description of what triggered the policy and the Endor Score security evaluation. Our security researchers investigate the suspicious code, and depending on the result of the review, the warning is either dismissed or, in case of confirmed malicious behavior, the respective package registry is notified of the malicious package and a critical finding is created in your project.

This screenshot shows a finding for potential malware. It’s low severity because it is under review by Endor Labs.

In this screenshot, the Endor Score informs you that the code was flagged for suspicious behavior because it reads environment variables and includes them in an HTTP request, something that both legitimate and malicious packages do.
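To show why this kind of signal is ambiguous, here is a deliberately naive sketch of such a heuristic. It only checks whether a Python file both reads environment variables and imports an HTTP client, without tracking data flow. The module list and function name are assumptions for illustration, and this is not the rule Endor Labs uses.

```python
import ast

# Hypothetical set of top-level modules treated as HTTP clients.
HTTP_MODULES = {"requests", "urllib", "http", "httpx", "aiohttp"}


def reads_env_and_uses_http(source):
    """Return True if the source reads environment variables and imports an HTTP client."""
    reads_env = False
    imports_http = False
    for node in ast.walk(ast.parse(source)):  # parsing only, nothing is executed
        if isinstance(node, ast.Attribute) and node.attr in {"environ", "getenv"}:
            reads_env = True
        elif isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] in HTTP_MODULES for alias in node.names):
                imports_http = True
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in HTTP_MODULES:
                imports_http = True
    return reads_env and imports_http


suspicious = "import os, requests\nrequests.post('https://example.invalid', data=dict(os.environ))\n"
print(reads_env_and_uses_http(suspicious))
```

A client library that reads an API token from the environment and sends it to its own service would trip the same check, which is why such findings call for human review rather than an automatic block.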


Sample Suspicious Code Detection: versioneer@0.29

A great example to illustrate flagging suspicious code is a scan result brought up for the PyPI package versioneer@0.29, which supports “managing a recorded version number in setuptools-based python projects”.

The behavior that triggered a scan rule is contained only in the Python wheel (not in the tarball): the code decodes a Base64-encoded string containing Python code and passes it to Python’s exec function. This obfuscation technique is frequently used by malicious actors to conceal malicious Python code. When the security research team reviewed this finding, we determined that the Base64-encoded code is not malicious; rather, it is created automatically during the packaging of the wheel.
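The pattern described here, executing the result of a decoding call, can be spotted statically. The sketch below walks a file's syntax tree and reports lines where `exec` or `eval` receives an argument containing a call to `b64decode` or `decompress`. It is a simplified illustration under those assumptions, not the actual scan rule, and it would miss payloads stored in a variable first.

```python
import ast

# Names of decoding calls considered suspicious when their result is executed.
DECODERS = {"b64decode", "decompress"}


def _call_name(node):
    """Return the bare function or method name of a call node, if it has one."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def find_exec_of_decoded(source):
    """Return line numbers where exec/eval is called directly on a decoded value."""
    findings = []
    for node in ast.walk(ast.parse(source)):  # the source is parsed, never run
        if isinstance(node, ast.Call) and _call_name(node) in {"exec", "eval"}:
            for arg in node.args:
                inner_calls = [n for n in ast.walk(arg) if isinstance(n, ast.Call)]
                if any(_call_name(n) in DECODERS for n in inner_calls):
                    findings.append(node.lineno)
                    break
    return sorted(findings)


sample = 'import base64\nexec(base64.b64decode("cHJpbnQoMSk="))\nprint("hi")\n'
print(find_exec_of_decoded(sample))
```

As the versioneer case shows, a hit from a check like this is a reason to look at the decoded content, not proof of malicious intent.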

Sample Suspicious Code Detection: azure-functions@1.18.0

Another interesting example is the PyPI package azure-functions@1.18.0, which provides Python support for Azure Functions. In this case, Base64-encoded information is decompressed with Python’s zlib package, which is yet another common obfuscation technique. The result is included in a dynamically created HTML page. Again, manual review showed that the code is not malicious but that azure-functions rebundles code from the well-known werkzeug project. In former versions, the Python file from werkzeug contained an Easter egg, which included Base64-encoded ASCII art in an HTML page.
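During a manual review like this one, the encoded data can be inspected without running it. A minimal sketch, assuming the payload was Base64-encoded after zlib compression (the helper name and the sample data are made up for illustration):

```python
import base64
import zlib


def reveal_payload(blob):
    """Base64-decode and zlib-decompress a blob, returning the raw bytes without executing them."""
    return zlib.decompress(base64.b64decode(blob))


# Build a harmless stand-in payload the same way, then reverse it for inspection.
encoded = base64.b64encode(zlib.compress(b"<pre>ascii art</pre>"))
print(reveal_payload(encoded).decode("utf-8", errors="replace"))
```

Printing the decoded bytes as text is usually enough to tell markup or ASCII art apart from executable code.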

Start Detecting Malware with Endor Labs

The detection of malicious packages is another important step to securing your development projects and effectively preventing supply chain attacks. To get started with Endor Labs, start a free 30-day trial or contact us to discuss your use cases.
