How to Scan GitHub Repositories for Secrets & Credentials with Open Source

Some GitHub repositories don’t just contain code – they contain passwords, API tokens, and credentials.

The worst part? These are publicly accessible! Anyone who knows their way around GitHub can use its search tool to pull up thousands of public repositories that contain secrets.

These repositories are owned not just by beginners or solo developers but sometimes even by large ‘cybersecurity’ companies. In mid-2019, Comodo, a cybersecurity company specializing in antivirus software and the like for both consumers and enterprises, made a small but significant error.

A hacker obtained a set of credentials (email and password) from a public GitHub repository owned by a Comodo employee. With them, the hacker was able to log in to Microsoft’s cloud suite and gain access to Comodo’s internal sales documents, spreadsheets, customer documents, team contact information (phone numbers and addresses), calendars and org charts! The account, conveniently, was not protected by two-factor authentication.

Researchers at North Carolina State University found that over 100,000 public repositories contained credentials, with the number growing daily. Surprisingly, this was after scanning only 13% of GitHub’s total public repositories.

The credentials found were categorized into:

SSH keys: These allow privileged access to servers
API keys: These allow access to third-party services (like Twitter, Facebook, etc.) and their API endpoints
User Credentials: An email/password pair like in the case of Comodo
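To make the three categories concrete, here is a minimal Python sketch that tags a piece of text with the categories it appears to contain. The regular expressions and the names PATTERNS and classify are illustrative assumptions, not the rules used by the researchers, and a real scanner would need a far larger rule set.

```python
import re

# Illustrative patterns only (assumptions, not the study's actual rules).
PATTERNS = {
    # PEM-style private key header, as used for SSH keys
    "ssh_key": re.compile(r"-----BEGIN (?:RSA |DSA |EC |OPENSSH )?PRIVATE KEY-----"),
    # A key-like name assigned a long token
    "api_key": re.compile(
        r"(?i)(?:api[_-]?key|access[_-]?token|secret(?:[_-]?key)?)"
        r"\s*[:=]\s*['\"]?[A-Za-z0-9_\-]{16,}"
    ),
    # A password assigned any non-empty value
    "user_credentials": re.compile(r"(?i)(?:password|passwd|pwd)\s*[:=]\s*\S+"),
}


def classify(text: str) -> set:
    """Return the credential categories whose pattern matches the text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}
```

Calling classify on the contents of a file gives a rough idea of which kind of secret, if any, it is leaking.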

How do Hackers Get Access?

GitHub has a very powerful search feature – accessible even via APIs.

Let’s use the search to look for files called “env”. These are environment files that developers create to hold the credentials their applications use.

Well, that was easy. Let’s click on a random file.

I hid a lot of the information, but there you go: an exposed API key! Surprisingly, this was the first search result I clicked on. I was expecting nothing to turn up, but you can see how easy it is to accidentally push credentials online!

Another search, for “filename:.env MAIL_HOST=smtp.gmail.com”, yields a list of email addresses and their passwords! Maybe this is what the Comodo hacker did!
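You can turn the same search on yourself to audit your own organization. The sketch below only builds the request URL for GitHub’s REST code search endpoint and scopes it with the org: qualifier; it does not send anything. The function name and the organization “my-org” are hypothetical, and the endpoint requires an authenticated request, which is left out here.

```python
from urllib.parse import urlencode

# GitHub's REST endpoint for code search.
SEARCH_ENDPOINT = "https://api.github.com/search/code"


def build_code_search_url(terms: str, org: str) -> str:
    """Build a code search URL restricted to a single organization."""
    # The org: qualifier limits results to repositories owned by that org.
    query = f"{terms} org:{org}"
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query})}"


# "my-org" is a placeholder for your own organization name.
url = build_code_search_url("filename:.env MAIL_HOST=smtp.gmail.com", "my-org")
print(url)
```

If a query like this returns anything for your own organization, treat those credentials as compromised and rotate them.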

Hackers search for multiple search terms like these. They even collate them and publish them online – on GitHub! These collections of ingenious search terms are known as “dorks”.

Here’s an example dork: https://github.com/misterch0c/GithubLeakAlert/blob/master/github-dorks.txt

There was also a service called “gitleaks”, which is, unfortunately, down at the time of writing this article. Gitleaks used to publish any publicly available credentials from GitHub, and even indexed them!

At this point you must be wondering: why doesn’t GitHub just kill its search feature? Well, GitHub’s search feature is useful for a lot of people who are trying to learn how to code or are looking up resources. At the end of the day, it is the developer’s responsibility, not GitHub’s, to safeguard code and keep credentials out of public repositories.

How Can I Protect Myself?

Easy – don’t hard-code credentials in code!

Step one in this approach is to move credentials away from your codebase. You could start by storing them in environment variables on your servers.
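As a minimal sketch of what that looks like in application code, the Python helper below reads a credential from the environment and fails loudly if it is missing. The function name get_credential and the variable name DB_PASSWORD are hypothetical.

```python
import os


def get_credential(name: str) -> str:
    """Read a credential from the environment instead of the codebase."""
    value = os.environ.get(name)
    if not value:
        # Fail early so a missing secret is noticed at startup, not mid-request.
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

An application would then call get_credential("DB_PASSWORD") at startup, and the password itself never appears in the repository.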

However, when a developer needs to update a credential or add a new one, this approach becomes a hassle, and the credential will probably end up being sent through some internet service anyway.

CyberArk Conjur

Conjur manages the secrets required by applications to gain access to critical infrastructure, data, and other resources. You can also set up role-based access so that certain developers or teams access only what they need to.

As a developer, you will need to set up Conjur on your development machine. You can then log in and add credentials.

docker-compose exec client conjur variable values add BotApp/secretVar ${secretVal}

As an application, you can pull credentials from Conjur in several ways. You can make an API call with a cURL command, or use one of the client libraries (available for Ruby, Python, Java, and Node.js)!

Let’s look at how to retrieve a secret from Conjur.

You can make a simple cURL command to retrieve “prod db password”:

curl -H "$(conjur authn authenticate -H)" \
https://eval.conjur.org/secrets/myorg/variable/prod/db/password

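Or use the Ruby client:

require 'conjur-api'

# `host` is assumed to be a host resource created earlier; it exposes `id` and `api_key`.
puts "Logging in as #{host.id}"
# Authenticate as the host using its API key.
host_api = Conjur::API.new_from_key "host/host-01", host.api_key

puts "Fetching db-password as #{host.id}"
# Look up the variable by its full ID (account:kind:identifier) and read its value.
value = host_api.resource("cucumber:variable:db-password").value

puts value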
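If your application is written in Python, you can call the same REST endpoint as the cURL command above. The sketch below only builds the request; it assumes you have already obtained a short-lived Conjur access token, and the function name build_secret_request is hypothetical. The header follows the same Token token="..." shape that the cURL example gets from the Conjur CLI.

```python
import base64
import urllib.request


def build_secret_request(base_url: str, account: str, variable_id: str,
                         access_token: bytes) -> urllib.request.Request:
    """Build a GET request for a Conjur variable (the request is not sent)."""
    # Conjur expects the access token base64-encoded inside the header.
    encoded = base64.b64encode(access_token).decode("ascii")
    url = f"{base_url}/secrets/{account}/variable/{variable_id}"
    headers = {"Authorization": f'Token token="{encoded}"'}
    return urllib.request.Request(url, headers=headers)


# Placeholder token for illustration; a real one comes from Conjur authentication.
request = build_secret_request(
    "https://eval.conjur.org", "myorg", "prod/db/password", b"{}"
)
print(request.full_url)
```

Passing the request to urllib.request.urlopen and reading the response body would return the secret value.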

Conjur also comes with a GitHub action to push credentials from dev to prod environments.

Scanning Your Repositories

It’s also a good idea to detect whenever credentials are pushed to repositories. Doing a quick search on GitHub Marketplace shows a list of bots that can be added to your personal or organization account.

A great open-source tool is Truffle Hog.

It monitors repository activity for hardcoded credentials and warns you about them. You can set up further actions, such as opening an issue to notify you of every occurrence.
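To give a feel for how this kind of detection can work, here is a minimal Python sketch of one common technique: flagging long strings whose characters look random, measured by Shannon entropy. This is not Truffle Hog’s actual implementation. The names, the 20-character minimum, and the 4.0 threshold are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Candidate tokens: runs of 20+ characters drawn from a base64-like alphabet.
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(s: str) -> float:
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    total = len(s)
    counts = Counter(s)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def find_suspicious_strings(text: str, threshold: float = 4.0) -> list:
    """Return long tokens whose entropy meets the (assumed) threshold."""
    return [t for t in TOKEN.findall(text) if shannon_entropy(t) >= threshold]
```

Running find_suspicious_strings over each added line of a commit gives a crude pre-push check. Expect false positives such as hashes and minified assets, so treat hits as prompts for a manual look rather than as verdicts.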

Conclusion

Don’t store credentials where your code lives! Simple as that.

Unfortunately, humans aren’t that simple. Make your developers’ lives easier by providing them with the tools needed to prevent this from ever happening. Conjur makes it easy to store, update and retrieve credentials in a secure manner.

Join the Conversation on the CyberArk Commons

If you’re interested in this and other open-source content, join the conversation on the CyberArk Commons Community. Secretless Broker, Conjur and other open-source projects are a part of the CyberArk Commons Community, an open community dedicated to developers, engineers, cybersecurity researchers, and other technically-minded people. To discuss Kubernetes, Secretless Broker, Conjur, or CyberArk Threat Research, join me on the CyberArk Commons discussion forum.

Swaathi Kakarla

Swaathi Kakarla is the co-founder and CTO at Skcript. She enjoys talking and writing about code efficiency, performance, and startups. In her free time, she finds solace in yoga, bicycling and contributing to open source.