Commit 3d3c0fc

Merge pull request #744 from secureCodeBox/feature/add-scanner-semgrep
Add Semgrep scanner
2 parents 7740c2e + fd16358 commit 3d3c0fc

32 files changed

Lines changed: 2520 additions & 1 deletion

.github/workflows/ci.yaml

Lines changed: 1 addition & 0 deletions
@@ -290,6 +290,7 @@ jobs:
 - nmap
 - nuclei
 - screenshooter
+- semgrep
 - ssh-scan
 - sslyze
 - trivy

.github/workflows/scb-bot.yaml

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ jobs:
 # - wpscan
 # - zap
 # - zap-advanced
+# - semgrep
 # These are commented out for the moment to avoid accidental multiple erroneous PRs
 # missing scanners are : nmap, nikto, typo3scan
 steps:

hooks/persistence-defectdojo/.helm-docs.gotmpl

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ These are:
 - SSLyze
 - Trivy
 - Gitleaks
+- Semgrep

 After uploading the results to DefectDojo, it will use the findings parsed by DefectDojo to overwrite the
 original secureCodeBox findings identified by the parser. This lets you access the finding metadata like the false

hooks/persistence-defectdojo/README.md

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ These are:
 - SSLyze
 - Trivy
 - Gitleaks
+- Semgrep

 After uploading the results to DefectDojo, it will use the findings parsed by DefectDojo to overwrite the
 original secureCodeBox findings identified by the parser. This lets you access the finding metadata like the false

hooks/persistence-defectdojo/hook/src/main/java/io/securecodebox/persistence/util/ScanNameMapping.java

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ public enum ScanNameMapping {
   NIKTO("nikto", ScanType.NIKTO_SCAN),
   NUCLEI("nuclei", ScanType.NUCLEI_SCAN),
   WPSCAN("wpscan", ScanType.WPSCAN),
+  SEMGREP("semgrep", ScanType.SEMGREP_JSON_REPORT),
   GENERIC(null, ScanType.GENERIC_FINDINGS_IMPORT)
   ;

scanners/semgrep/.helm-docs.gotmpl

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
+{{- /*
+SPDX-FileCopyrightText: 2021 iteratec GmbH
+
+SPDX-License-Identifier: Apache-2.0
+*/ -}}
+
+{{- define "extra.docsSection" -}}
+---
+title: "Semgrep"
+category: "scanner"
+type: "Repository"
+state: "released"
+appVersion: "{{ template "chart.appVersion" . }}"
+usecase: "Static Code Analysis"
+---
+
+![Semgrep logo](https://raw.githubusercontent.com/returntocorp/semgrep-docs/main/static/img/semgrep-icon-text-horizontal.svg)
+
+{{- end }}
+
+{{- define "extra.dockerDeploymentSection" -}}
+## Supported Tags
+- `latest` (represents the latest stable release build)
+- tagged releases, e.g. `{{ template "chart.appVersion" . }}`
+{{- end }}
+
+{{- define "extra.chartAboutSection" -}}
+## What is Semgrep?
+Semgrep ("semantic grep") is a static source code analyzer that can be used to search for specific patterns in code.
+It allows you to either [write your own rules](https://semgrep.dev/learn), or use one of the [many pre-defined rulesets](https://semgrep.dev/r) curated by the semgrep team.
+
+To learn more about semgrep, visit [semgrep.dev](https://semgrep.dev).
+
+{{- end }}
+
+{{- define "extra.scannerConfigurationSection" -}}
+## Scanner Configuration
+
+Semgrep requires one or more ruleset(s) to run its scans.
+Refer to the [semgrep rule database](https://semgrep.dev/r) for more details.
+A good starting point would be [p/ci](https://semgrep.dev/p/ci) (for security checks with a low false-positive rate) or [p/security-audit](https://semgrep.dev/p/security-audit) (for a more comprehensive security audit, which may include more false-positive results).
+
+
+Semgrep needs access to the source code to run its analysis.
+To use it with secureCodeBox, you thus need a way to provision the data into the scan container.
+The recommended method is to use `initContainers` to clone a VCS repository.
+The simplest example, using a public Git repository from GitHub, looks like this:
+
+```yaml
+apiVersion: "execution.securecodebox.io/v1"
+kind: Scan
+metadata:
+  name: "semgrep-vulnerable-flask-app"
+spec:
+  # Specify a Kubernetes volume that will be shared between the scanner and the initContainer
+  volumes:
+    - name: repository
+      emptyDir: {}
+  # Mount the volume in the scan container
+  volumeMounts:
+    - mountPath: "/repo/"
+      name: repository
+  # Specify an init container to clone the repository
+  initContainers:
+    - name: "provision-git"
+      # Use an image that includes git
+      image: bitnami/git
+      # Mount the same volume we also use in the main container
+      volumeMounts:
+        - mountPath: "/repo/"
+          name: repository
+      # Specify the clone command and clone into the volume, mounted at /repo/
+      command:
+        - git
+        - clone
+        - "https://github.com/we45/Vulnerable-Flask-App"
+        - /repo/flask-app
+  # Parameterize the semgrep scan itself
+  scanType: "semgrep"
+  parameters:
+    - "-c"
+    - "p/ci"
+    - "/repo/flask-app"
+```
+
+If your repository requires authentication to clone, you will have to give the initContainer access to some method of authentication.
+This could be a personal access token ([GitHub](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token), [GitLab](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html)), project access token ([GitLab](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html)), deploy key ([GitHub](https://docs.github.com/en/developers/overview/managing-deploy-keys#deploy-keys) / [GitLab](https://docs.gitlab.com/ee/user/project/deploy_keys/)), deploy token ([GitLab](https://docs.gitlab.com/ee/user/project/deploy_tokens/)), or a server-to-server token ([GitHub](https://docs.github.com/en/developers/overview/managing-deploy-keys#server-to-server-tokens)).
+Due to the large variety of options, we do not provide documentation for all of them here.
+Refer to the linked documentation for details on the different methods, and remember to use [Kubernetes secrets](https://kubernetes.io/docs/concepts/configuration/secret/) to manage keys and tokens.
+
+## Cascading Rules
+By default, the semgrep scanner does not install any [cascading rules](docs/hooks/cascading-scans), as some aspects of the semgrep scan (like the used ruleset) should be customized.
+However, you can easily create your own cascading rule, for example to run semgrep on the output of [git-repo-scanner](docs/scanners/git-repo-scanner).
+As a starting point, consider the following cascading rule to scan all public GitHub repositories found by git-repo-scanner using the p/ci ruleset of semgrep:
+
+```yaml
+apiVersion: "cascading.securecodebox.io/v1"
+kind: CascadingRule
+metadata:
+  name: "semgrep-public-github-repos"
+  labels:
+    securecodebox.io/invasive: non-invasive
+    securecodebox.io/intensive: medium
+spec:
+  matches:
+    anyOf:
+      # We want to scan public GitHub repositories. Change "public" to "private" to scan private repos instead
+      - name: "GitHub Repo"
+        attributes:
+          visibility: public
+  scanSpec:
+    # Configure the scanSpec for semgrep
+    scanType: "semgrep"
+    parameters:
+      - "-c"
+      - "p/ci" # Change this to use a different rule set
+      - "/repo/"
+    volumes:
+      - name: repo
+        emptyDir: {}
+    volumeMounts:
+      - name: repo
+        mountPath: "/repo/"
+    initContainers:
+      - name: "git-clone"
+        image: bitnami/git
+        # The command assumes that GITHUB_TOKEN contains a GitHub access token with access to the repository.
+        # GITHUB_TOKEN is set below in the "env" section.
+        # If you do not want to use an access token, remove it from the URL below.
+        command:
+          - git
+          - clone
+          - "https://$(GITHUB_TOKEN)@github.com/{{ "{{{" }}attributes.full_name{{ "}}}" }}"
+          - /repo/
+        volumeMounts:
+          - mountPath: "/repo/"
+            name: repo
+        # Load the GITHUB_TOKEN from the kubernetes secret with the name "github-access-token"
+        # Create this secret using, for example:
+        # echo -n 'YOUR TOKEN GOES HERE' > github-token.txt && kubectl create secret generic github-access-token --from-file=token=github-token.txt
+        # IMPORTANT: Ensure that github-token.txt does not have a new line at the end of the file. This is automatically done by using "echo -n" to create it.
+        # However, if you create it with an editor, some editors (most notably, vim) will create hidden newlines at the end of files, which will cause issues.
+        env:
+          - name: GITHUB_TOKEN
+            valueFrom:
+              secretKeyRef:
+                name: github-access-token
+                key: token
+```
+
+Use this configuration as a baseline for your own rules.
+{{- end }}
+
+{{- define "extra.chartConfigurationSection" -}}
+{{- end }}
+
+{{- define "extra.scannerLinksSection" -}}
+{{- end }}
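
The scanner documentation above leaves authenticated clones to the reader. As a minimal sketch of one option, the Scan below clones a private GitHub repository with a personal access token. It assumes the token is stored in a Kubernetes secret named `github-access-token` under the key `token` (the same names the cascading rule in that file uses). The scan name and the repository path `your-org/your-private-repo` are hypothetical placeholders, not part of this commit.

```yaml
apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  # Hypothetical scan name
  name: "semgrep-private-repo"
spec:
  scanType: "semgrep"
  parameters:
    - "-c"
    - "p/ci"
    - "/repo/"
  # Volume shared between the init container and the scanner
  volumes:
    - name: repository
      emptyDir: {}
  volumeMounts:
    - mountPath: "/repo/"
      name: repository
  initContainers:
    - name: "provision-git"
      image: bitnami/git
      command:
        - git
        - clone
        # Kubernetes expands $(GITHUB_TOKEN) from the container's env before running the command.
        # "your-org/your-private-repo" is a placeholder, not a real repository.
        - "https://$(GITHUB_TOKEN)@github.com/your-org/your-private-repo"
        - /repo/
      volumeMounts:
        - mountPath: "/repo/"
          name: repository
      # Read the token from the secret instead of hard-coding it in the Scan
      env:
        - name: GITHUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: github-access-token
              key: token
```

Other authentication methods (deploy keys, deploy tokens) would need a different clone command and secret layout.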

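For the `semgrep-public-github-repos` cascading rule documented above to fire, the initial scan has to opt in to cascading scans through a label selector. The following is a minimal sketch, assuming the cascading scans hook is installed in the namespace. The scan name is hypothetical, and the git-repo-scanner parameters are deliberately left empty because they depend on your setup.

```yaml
apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  # Hypothetical scan name
  name: "discover-github-repos"
spec:
  scanType: "git-repo-scanner"
  # Add the git-repo-scanner parameters for your organization here (see its documentation)
  parameters: []
  # Only cascading rules whose labels match this selector are applied to the findings
  cascades:
    matchLabels:
      securecodebox.io/invasive: non-invasive
      securecodebox.io/intensive: medium
```

The selector matches both labels set on the example rule; dropping one of them from `matchLabels` would widen the set of rules that may be triggered.
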
scanners/semgrep/.helmignore

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+# SPDX-FileCopyrightText: 2021 iteratec GmbH
+#
+# SPDX-License-Identifier: Apache-2.0
+# Patterns to ignore when building packages.
+# This supports shell glob matching, relative path matching, and
+# negation (prefixed with !). Only one pattern per line.
+.DS_Store
+# Common VCS dirs
+.git/
+.gitignore
+.bzr/
+.bzrignore
+.hg/
+.hgignore
+.svn/
+# Common backup files
+*.swp
+*.bak
+*.tmp
+*~
+# Various IDEs
+.project
+.idea/
+*.tmproj
+.vscode/
+# Node.js files
+node_modules/*
+package.json
+package-lock.json
+src/*
+config/*
+Dockerfile
+.dockerignore
+*.tar
+parser/*
+scanner/*
+integration-tests/*
+examples/*
+docs/*
+Makefile

scanners/semgrep/Chart.yaml

Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,45 @@
1+
apiVersion: v2
2+
name: semgrep
3+
description: A Helm chart for the semgrep semantic code analyzer that integrates with the secureCodeBox
4+
5+
# A chart can be either an 'application' or a 'library' chart.
6+
#
7+
# Application charts are a collection of templates that can be packaged into versioned archives
8+
# to be deployed.
9+
#
10+
# Library charts provide useful utilities or functions for the chart developer. They're included as
11+
# a dependency of application charts to inject those utilities and functions into the rendering
12+
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
13+
type: application
14+
15+
# This is the chart version. This version number should be incremented each time you make changes
16+
# to the chart and its templates, including the app version.
17+
# Versions are expected to follow Semantic Versioning (https://semver.org/)
18+
version: "v3.1.0-alpha1"
19+
20+
# This is the version number of the application being deployed. This version number should be
21+
# incremented each time you make changes to the application. Versions are not expected to
22+
# follow Semantic Versioning. They should reflect the version the application is using.
23+
# It is recommended to use it with quotes.
24+
appVersion: "0.70.0"
25+
26+
versionApi: https://api.github.com/repos/returntocorp/semgrep/releases/latest
27+
28+
kubeVersion: ">=v1.11.0-0"
29+
30+
home: https://docs.securecodebox.io/docs/scanners/semgrep
31+
icon: https://docs.securecodebox.io/img/integrationIcons/semgrep.svg # TODO: Add this
32+
33+
sources:
34+
- https://github.com/secureCodeBox/secureCodeBox
35+
36+
maintainers:
37+
- name: iteratec GmbH
38+
- email: secureCodeBox@iteratec.com
39+
40+
keywords:
41+
- security
42+
- semgrep
43+
- SAST
44+
- staticanalysis
45+
- secureCodeBox

scanners/semgrep/Makefile

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
#!/usr/bin/make -f
2+
3+
include_guard = set # Always include this line (checked in the makefile framework)
4+
scanner = semgrep
5+
6+
include ../../scanners.mk # Ensures that all the default makefile targets are included
7+
8+
integration-tests:
9+
@echo ".: 🩺 Starting integration test in kind namespace 'integration-tests'."
10+
kubectl -n integration-tests delete scans --all
11+
cd ../../tests/integration/ && npm ci
12+
cd ../../scanners/${scanner}
13+
kubectl -n integration-tests create configmap semgrep-test-file --from-file=integration-tests/testfile.py
14+
npx --yes --package jest@$(JEST_VERSION) jest --verbose --ci --colors --coverage --passWithNoTests ${scanner}/integration-tests
15+
kubectl -n integration-tests delete configmap semgrep-test-file
