Start Clean With AI: Select Safer LLM Models with Endor Labs

You can now use Endor Labs to evaluate AI models on HuggingFace for security, popularity, quality, and activity.

Written by
George Apostolopoulos
Ron Harnik
Published on
October 16, 2024

TL;DR - You can now use Endor Labs to evaluate open source LLM models on HuggingFace for security, popularity, quality, and activity. Let your developers innovate with AI while making sure you’re starting clean with safe, reliable models.

Everybody is experimenting with large language models (LLMs) right now (if you’re new to the concept, check out our blog on the basics of LLMs to get up to speed). Some teams are building brand new AI-based businesses while others are looking for ways to slap a “powered by AI” sticker on their product. One thing is for sure: your developers are playing with LLMs. While training your own model is resource-intensive, platforms like HuggingFace offer a vast repository of ready-made models. Just as you can grab open source software (OSS) from GitHub, you can grab a model from HuggingFace. Unsurprisingly, just like OSS, that convenience doesn’t come without risk. The good news is that you can mitigate that risk by evaluating and scoring the LLMs you use.

The adoption of LLMs is following a similar trend to the early days of readily available OSS. It’s the wild west: people are grabbing models that fit their needs, and those models could have exploitable attack vectors. Also similar to OSS, some models can rely on other models you are not aware of, which can expose you to even greater risk.

LLMs are Dependencies

Our mission at Endor Labs is to “secure everything your code depends on”. This way of thinking emerged from the understanding that dependencies are not just the software packages your first-party code depends on, but also your container images, GitHub Actions, even the repositories themselves. These are all parts of the software supply chain that need to be safe and reliable in order to ship a secure application. Open source LLMs are no different: they are another type of dependency. So while LLMs work differently than, for example, open source packages, you can follow a similar methodology of evaluating them and measuring risk as you allow developers to bring them into your codebase.

Above we mentioned that, just like OSS, open source LLMs can rely on other models; it just works a little differently. Unlike OSS packages, which often directly depend on each other by importing functionality, LLMs are typically derived from other models. Many developers start with an existing model and fine-tune it for a specific use case, or train it further to enhance its capabilities for their specific needs.

For example, models available on HuggingFace, such as those based on the open source LLaMA models from Meta, serve as foundation models. Developers can then create new models by refining these base models to suit their specific needs, creating a model lineage. This means that while there is a concept of dependency, it is more about building upon a pre-existing model than importing components from multiple models. Yet if the original model carries a risk, models derived from it can inherit that risk.
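
To make lineage concrete: many HuggingFace model cards declare their parent in a base_model field in the card’s YAML front matter. The sketch below (our own illustration, not Endor Labs tooling) reads that field using the huggingface_hub package; the repo id is a placeholder, and not every card declares a base model.

```python
# Illustrative sketch: read a model card's YAML front matter to see whether it
# declares a base model (i.e., its lineage). Assumes the `huggingface_hub`
# package is installed; the repo id below is a placeholder.
from huggingface_hub import hf_hub_download

def declared_base_model(repo_id: str):
    """Return the `base_model` value from the model card front matter, if present."""
    readme_path = hf_hub_download(repo_id=repo_id, filename="README.md")
    with open(readme_path, encoding="utf-8") as f:
        text = f.read()
    if not text.startswith("---"):
        return None  # no YAML front matter on this card
    front_matter = text.split("---", 2)[1]
    for line in front_matter.splitlines():
        if line.strip().startswith("base_model:"):
            return line.split(":", 1)[1].strip()
    return None

print(declared_base_model("some-org/fine-tuned-model"))  # placeholder repo id
```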

Understanding LLM Risks

LLM files primarily consist of binary files that contain the model's weights. These weights are numerical values resulting from the training process, essentially forming the core of the model. When you use an LLM, you load these weights into an environment to run the model. These binary files can be quite large, often amounting to gigabytes of data.

In addition to the weights, some LLM files may include source code, typically as examples of how to use the model or other secondary functions. This source code, while minimal, can also pose security risks.
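
Here is a minimal sketch of what loading those weights looks like in practice, assuming the transformers library and a placeholder repo id. The trust_remote_code and use_safetensors options shown control whether repository-supplied Python code runs and which weight format is accepted, which matters for the risks described next.

```python
# Illustrative sketch (placeholder repo id): loading an open source model's
# weights with the `transformers` library.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "some-org/some-model"  # placeholder, not a real repository

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    use_safetensors=True,      # only accept safetensors weight files
    trust_remote_code=False,   # do not run custom code shipped with the model
)
print(f"Loaded roughly {model.num_parameters() / 1e9:.1f}B parameters")
```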

  1. Security Vulnerabilities: Pre-trained models from HuggingFace can harbor security risks. Generally, these risks fall into two categories: instances of malicious code in files shipped with the model, and vulnerabilities hidden within model weights. When such models are integrated into an organization’s environment, they can lead to security breaches.
  2. Legal and Licensing Issues: Using models without adhering to the proper licensing terms can result in legal troubles. Organizations must ensure that they comply with the licensing terms of the models they use and they should be aware that certain licenses have significant intellectual property and copyright implications. This can be more complex with open source LLM models than typical OSS code because of licensing requirements across the model’s lineage and the training sets used to train the model. 
  3. Operational Risks: The dependency on pre-trained models introduces operational risks. These models can be modified or fine-tuned based on other existing models, creating a complex dependency graph that can be challenging to manage and secure.

But more specifically, how can risks make their way into the model we’re downloading?

  • Vulnerabilities in weight encoding. The pickle format, which used to be very common for storing model weights, has long been known to allow arbitrary code execution on load. As a result, models created with some versions of PyTorch, TensorFlow, and Keras are vulnerable. Newer weight formats like safetensors do not have this class of vulnerability, but lots of models still use the older, vulnerable formats (a minimal illustration follows this list).
  • Deploying a model may require downloading other code that can be malicious or vulnerable. Models typically describe snippets of code needed to run them in their README; this code could be malicious or vulnerable, and it may also import malicious or vulnerable dependencies. In some cases, model repositories contain installation scripts, and these scripts could themselves be malicious or bring in suspect dependencies.
  • Models contain links to repositories. These repositories could be malicious or vulnerable. Users may never download and use them, but if they do, they can get infected.
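
To see why pickle-based weight files are dangerous, here is a toy illustration of the deserialization risk (our own example, using only the Python standard library): unpickling an object can run arbitrary code via __reduce__ before any model weight is ever used.

```python
# Toy illustration of the pickle risk: unpickling can run arbitrary code.
# Here it only echoes a string, but an attacker could execute anything the
# moment the "weights" are loaded.
import os
import pickle

class NotReallyWeights:
    def __reduce__(self):
        return (os.system, ("echo pwned: code ran during unpickling",))

payload = pickle.dumps(NotReallyWeights())
pickle.loads(payload)  # the shell command runs here, before any model is used

# By contrast, safetensors is a flat tensor format with no code execution on load:
#   from safetensors.torch import load_file
#   tensors = load_file("model.safetensors")
```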

In an example that is detailed here, the model is embedded inside a binary. Users download the binary, and when they run it, it creates a server that can be used to interact with the model. While this is done for speed and convenience, it is problematic in terms of security, since attackers can now infect users directly through malicious binaries.

Use Endor Labs to Evaluate Open Source LLMs

As we discussed in Evaluating and Scoring OSS Packages, the Endor Score gives users all the data they need to decide whether an OSS library or package is safe to use. Does this project have multiple maintainers? Is it corporately sponsored? How many releases did it have in the last 60 days? Does it use CI? Does it have known vulnerabilities? The answers to these questions (along with 150 other checks) contribute to category scores for security, activity, popularity, and quality.

Endor Scores for open source LLMs work the same way! We evaluate Open Source LLM models on HuggingFace based on 50 out-of-the-box checks and identify both the good and bad qualities. With a quick search on the platform, you can see LLM scoring and highlights from our analysis.

Evaluate AI models based on security, activity, popularity, and quality

Factors that positively impact the scores are marked with a +, and factors that negatively impact them are marked with a -. For the full list, check out the docs, but here are some highlights (see the sketch after the list for the kind of raw HuggingFace metadata a few of these checks build on):

  • - Model weights are private 
  • + Has License information, license is open source friendly
  • - Has incomplete model scorecard/Readme.md 
  • - Lack of performance data 
  • - Lack of dataset information 
  • - Incomplete information about the model’s training steps
  • - Incomplete information about the model’s provenance and lineage 
  • - Lack of prompt format information
  • + Uses a safe weight format (e.g. safetensors)
  • - Uses unsafe weight format 
  • - Has example code in its Readme 
  • - Model has linked repos
    • May be vulnerable/malicious
  • - Model files contain binaries
    • Could be hiding malware 
    • HuggingFace has flagged it as problematic
  • + Number of downloads (more is better) 
  • + Number of likes (more is better)
  • + Age (older is better)
  • - Requires authentication to access the data 
  • + Has linked paper 
  • + Has linked GitHub 
  • + Used in multiple spaces 
  • + Engagement (discussions or PRs)
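
For a sense of where some of these raw signals come from, here is an illustrative sketch that pulls a few of them straight from the HuggingFace Hub API with the huggingface_hub package. This is not the Endor Score, only the kind of metadata such checks can build on; the repo id is a placeholder.

```python
# Illustrative sketch: fetch a few raw signals (downloads, likes, license tag,
# weight formats, gating) from the HuggingFace Hub API. Not the Endor Score.
from huggingface_hub import HfApi

def quick_signals(repo_id: str) -> dict:
    info = HfApi().model_info(repo_id)
    filenames = [s.rfilename for s in (info.siblings or [])]
    return {
        "downloads": info.downloads,
        "likes": info.likes,
        "has_license_tag": any(t.startswith("license:") for t in (info.tags or [])),
        "uses_safetensors": any(f.endswith(".safetensors") for f in filenames),
        "ships_pickle_based_weights": any(f.endswith((".bin", ".pt", ".pkl")) for f in filenames),
        "gated": bool(getattr(info, "gated", False)),  # i.e. requires authentication
    }

print(quick_signals("some-org/some-model"))  # placeholder repo id
```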

Your teams are being asked about AI every single day, and they’ll look for the models they can use to accelerate innovation. Evaluating Open Source LLM models with Endor Labs helps you make sure the models you’re using do what you expect them to do, and are safe to use. In a way, security teams are entering uncharted territory with LLMs. Starting clean is the best way to mitigate issues that could come up further down the line.

What’s Next?

LLM scores are available now in the Endor Labs free trial, which lets you take the entire platform for a spin for 30 days! In our next release, we’ll introduce LLM selection policies, and the ability to discover, scan, and secure the models that are already in your system. AI model discovery is available now under DroidGPT in the free trial. Sign up now and give it a spin!
