Content Mismanaged: Leaking API Keys on GitHub

Software has become an increasingly important part of every organization’s business model. “There’s an app” for just about everything, and many organizations also develop software-based solutions in-house. As these solutions become more common and more complex, the need for development teams, and for larger ones, continues to grow.

One of the main challenges when working with a team of developers (or even a single developer working on many different products) is ensuring that each individual part of the product stays compatible with the rest. Since many developers work in parallel, a component built against the state of the software at the last release may be incompatible with the current version.

Continuous integration and deployment tools and version control services like GitHub are designed to help with this problem. However, the ability to perform bulk uploads to an untrusted server (like those provided by GitHub) can create significant security issues. One of the most common is weakened API security caused by mismanaged API keys that end up in GitHub repositories.

GitHub: Making Content Management Easier

GitHub is a hosting service built on the Git version control system, designed to enable large groups of developers to collaborate and work in parallel efficiently. In practice it is used in a client-server fashion: the main server holds the official version of the code repository (or “repo”), and each developer creates a local copy of the repo and works from that copy. After making their modifications to the codebase, developers can either push the modified code to the server or submit a pull request asking someone else to add it to the main repo (used if the developer does not have sufficient privileges or if the team has a code review policy).

GitHub has many useful features for development teams including the ability to see the complete history of the project (including tagging major releases), compare any two versions of the code (and see who made the edits), and create branches to allow code to follow different development paths (useful for creating free/paid versions or multi-platform projects with some shared code). As a result, it is much easier for large development teams to work in parallel, and GitHub can even have other products integrated with it to continuously test the correctness, compatibility, etc. of new code before accepting it into the main repo.

The Downside of Convenience

The issue with GitHub (and similar change management systems) is that it makes it too easy to upload files to a (potentially) insecure repository. If an organization operates its own internal repository, this may not be much of a problem. However, some organizations and individuals make use of GitHub’s public server.

This can be an asset in many cases. Many open-source software projects are hosted on GitHub since it allows people to easily review the code and suggest and develop modifications or additional features. Organizations are increasingly asking for a developer’s GitHub account information as part of the interview process since the ability to see code developed by the individual often gives a better perspective on their abilities than a traditional interview process.

However, this convenience can also be a liability, especially in terms of security. One example is code that includes intellectual property or trade secrets. If an organization’s “secret sauce” is included in code uploaded to a public repository, the organization’s competitive advantage can suffer significantly.

Even worse, GitHub users have been found to be leaking sensitive API and cryptographic keys within their uploaded projects. North Carolina State University performed a six-month study in which they scanned over 2.3 billion publicly accessible files in about 4 million repos (about 13% of all repos). They were looking for data formatted like API keys from 11 different companies and 4 different types of cryptographic keys.
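To make “data formatted like API keys” concrete, here is a minimal sketch of pattern-based detection. The formats below (an AWS access key ID, a Google API key and a PEM private key header) are well-known public conventions chosen purely for illustration. They are assumptions, not the patterns the researchers used, and the function name find_candidate_keys is hypothetical.

```python
import re

# Illustrative patterns only (assumed; not taken from the study).
KEY_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_candidate_keys(text):
    """Return (pattern_name, matched_text) pairs for strings that look like keys."""
    hits = []
    for name, pattern in KEY_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


if __name__ == "__main__":
    # Placeholder key published in AWS documentation, not a real credential.
    sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"'
    for name, value in find_candidate_keys(sample):
        print(name, value)
```

A match only means the string has the right shape. A scanner built this way will produce false positives, so any hit still needs to be checked by a person or validated some other way.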

And they found them. 201,642 unique API and cryptographic keys were found spread over more than 100,000 GitHub projects. Any one of these keys could give an attacker access to the owner’s account on a major service or, in the case of cryptographic keys, to a computer. The potential impact is huge.

Worse, the problem seems to be a continuing one. The researchers tracked the discovered keys over the six-month study and found that over 80% of them had not been removed from their repos during that time. This likely indicates that the owners were unaware of the leakage and the potential compromise of their accounts.

Protecting Your Services

GitHub is an extremely convenient and valuable service for developers, but it’s important to use it in the right way. Scanners exist that check a potential upload for sensitive information (similar to the tools used in this study), and Git can be configured, through a .gitignore file, to leave certain sensitive files out of what gets committed and uploaded. All developers need to be aware of these features and use them properly in order to protect their sensitive data while taking advantage of this service.
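One common way to keep a key out of the repository altogether is to read it from the environment at run time, so the source files that get uploaded never contain the secret. The sketch below assumes a hypothetical variable name SERVICE_API_KEY and a hypothetical helper load_api_key. Neither comes from any particular service.

```python
import os


class MissingKeyError(RuntimeError):
    """Raised when the expected environment variable is not set."""


def load_api_key(var_name="SERVICE_API_KEY"):
    """Read an API key from the environment instead of hardcoding it in source.

    The default variable name is a made-up placeholder for illustration.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise MissingKeyError(
            f"Set {var_name} before running; do not commit the key itself."
        )
    return key
```

If the variable is set from a local file for convenience, that file is exactly the kind of sensitive file that should be listed among the ignored ones.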

At a larger scope, organizations should take action to help protect API security. Solutions exist that help identify anomalous behavior that could indicate compromised credentials. Other technologies, such as data masking, can also be deployed to ensure that any credentials accidentally included in a bulk data upload are obfuscated before being placed in an untrusted environment.
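As a rough illustration of masking, the sketch below replaces values assigned to credential-like names with a fixed placeholder before the text leaves a trusted environment. The list of names it looks for (api_key, secret, token) and the function name mask_credentials are assumptions made for this example. A real deployment would need to cover far more formats.

```python
import re

# Matches assignments such as: api_key = "value", token: value
# The set of names is an illustrative assumption, not an exhaustive list.
CREDENTIAL_ASSIGNMENT = re.compile(
    r"\b(api[_-]?key|secret|token)(\s*[=:]\s*)([\"']?)([^\s\"']+)\3",
    re.IGNORECASE,
)


def mask_credentials(text, placeholder="[REDACTED]"):
    """Replace credential-like values with a placeholder, keeping the name."""

    def _redact(match):
        name, separator, quote = match.group(1), match.group(2), match.group(3)
        return f"{name}{separator}{quote}{placeholder}{quote}"

    return CREDENTIAL_ASSIGNMENT.sub(_redact, text)
```

Using a fixed placeholder rather than a run of asterisks avoids revealing the length of the original value.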
