add push secrets detector #34226

TheFox0x7 · 2025-04-16T17:09:11Z

Adds a step to the pre-receive hook which scans commit diffs for secrets.

Very WIP/PoC; mostly looking for feedback on the approach.
TODO:

Settings flag, as the feature has to be opt-in until deemed stable enough
Per-repo config flag to enable the feature (again opt-in by default even if the instance enables it, with a later switch to opt-out)
Bypass flag for repo admins (possibly some other role too? How to handle it in the UI?)
Repo config taken into account. Probably taken from .gitleaks.toml at the repo root, but I think it might be interesting to look at the way Gerrit's repo config is handled as well. Though this is a topic for a separate PR and proposal.
UI adjustments, as the rejection message currently looks bad
Logs on web push repeat the rejection message a few times, and with how large those messages might end up, that's an issue
Adjust/disable gitleaks logging so it does not emit its own output
Binary size? It's +1.7MB with some maybe-reusable dependencies. Maybe it can be trimmed down (possibly in a follow-up)
Upstream changes in a fork? (conflicts with trimming it down)
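
For illustration only, the opt-in and bypass items in the list above could combine roughly as in the sketch below. Every name here (PushContext, shouldScan, the field names) is hypothetical and not taken from the PR or from Gitea; it only assumes that both the instance and the repo must opt in, and that a bypass is honored only for repo admins.

```go
package main

import "fmt"

// PushContext is a hypothetical bundle of the flags discussed in the TODO
// list; none of these names come from the PR or from Gitea.
type PushContext struct {
	InstanceEnabled bool // instance-wide settings flag (opt-in)
	RepoEnabled     bool // per-repo flag (opt-in even if the instance enables it)
	IsRepoAdmin     bool // pusher has the repo admin role
	BypassRequested bool // pusher asked to skip the scan for this push
}

// shouldScan reports whether a push should be scanned under the assumptions
// above: both flags must be on, and only a repo admin may bypass.
func shouldScan(c PushContext) bool {
	if !c.InstanceEnabled || !c.RepoEnabled {
		return false
	}
	if c.BypassRequested && c.IsRepoAdmin {
		return false
	}
	return true
}

func main() {
	fmt.Println(shouldScan(PushContext{InstanceEnabled: true}))                    // false
	fmt.Println(shouldScan(PushContext{InstanceEnabled: true, RepoEnabled: true})) // true
}
```

Whether a role other than repo admin may bypass is left open in the list, so the sketch keeps that check in a single place.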

use git diff based one

fix linting

lunny · 2025-04-16T22:39:23Z

I think this is the right place to detect. But the detection may take a long time. It should have a queue to handle the scans and not return the detection result immediately.

TheFox0x7 · 2025-04-17T07:11:24Z

The entire point of the feature is rejecting the push if the change contains secrets. If the scan gets pushed to a queue to run later, it nullifies the point of it being a pre-receive hook.
Unless I'm missing something and you have an idea on how to queue it at the start and wait for completion before ACKing/NACKing the push?

TheFox0x7 · 2025-04-17T07:51:59Z

@wxiaoguang Mind also taking a look when you have a moment? You usually have good ideas and insights about things that would be better to do before a change/feature and a more native way to do it.

btw. I'm aware of the fault in the diffing when refs are added or removed. The end behavior should be:

1. Scan the diff (or individual commits, tbd) if the old and new IDs aren't zero.
2. Scan the change if the old ID is zero: it's still an added change that needs to be processed.
3. Skip scanning if the new ID is zero.

I'm in the process of figuring out how to do 2 without scanning the entire history when not required.
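
The three cases above can be sketched as a small decision function. This is a minimal sketch, not the PR's code: ScanAction, decideScan and revListArgs are hypothetical names, and using `git rev-list <new> --not --all` for a newly created ref is a common pre-receive idiom assumed here for case 2, not necessarily what this PR ends up doing.

```go
package main

import (
	"fmt"
	"strings"
)

// ScanAction describes what the hook would do for one ref update.
type ScanAction int

const (
	ScanRange ScanAction = iota // both IDs set: scan old..new
	ScanNew                     // old ID is zero: the ref was created
	ScanSkip                    // new ID is zero: the ref was deleted
)

// zeroID is git's all-zero object ID for SHA-1 repositories.
const zeroID = "0000000000000000000000000000000000000000"

// isZeroID reports whether id consists only of zeros, which covers both
// SHA-1 (40 characters) and SHA-256 (64 characters) repositories.
func isZeroID(id string) bool {
	return id != "" && strings.Trim(id, "0") == ""
}

// decideScan maps an (old, new) pair from the hook input to an action.
func decideScan(oldID, newID string) ScanAction {
	switch {
	case isZeroID(newID):
		return ScanSkip
	case isZeroID(oldID):
		return ScanNew
	default:
		return ScanRange
	}
}

// revListArgs builds git arguments that would list the commits to scan.
// For a new ref it assumes "everything reachable from new but from no
// existing ref", which avoids walking the entire history.
func revListArgs(oldID, newID string) []string {
	switch decideScan(oldID, newID) {
	case ScanSkip:
		return nil
	case ScanNew:
		return []string{"rev-list", newID, "--not", "--all"}
	default:
		return []string{"rev-list", oldID + ".." + newID}
	}
}

func main() {
	fmt.Println(revListArgs("aaaa", "bbbb"))
	fmt.Println(revListArgs(zeroID, "bbbb"))
	fmt.Println(revListArgs("aaaa", zeroID))
}
```

The sketch only builds the argument list; running git and handling quarantined objects during pre-receive is left out.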

wxiaoguang · 2025-04-17T07:57:45Z

What if I'd like to store secret-like data in a git repo? For example: I have a repo that contains encrypted passwords (of course, the master key is managed separately, not in the repo).

TheFox0x7 · 2025-04-17T08:04:37Z

That's why it would be off by default, which is in the TODOs. If you'd like to keep it on and store secrets, you'd have to add them to the allowlist in the config. I don't think gitleaks will classify encrypted passwords as keys, but that's a good case to test for.

wxiaoguang · 2025-04-17T10:04:06Z

Oh I see the "repo config" TODO now. Maybe 2 approaches:

Use database repo config: then the git hook command needs to request the repo config via the internal API
Use git repo config (something like .gitleaks.toml)

I don't know how other forges do it, so I can't tell which one is better.

> btw. I'm aware of the fault in the diffing when refs are added or removed - the end behavior should be:

TBH I am not quite familiar with the "git hook" related code, so I can't really tell the differences between these cases (how the result would be affected). Maybe make it configurable and let the repo admin decide? Or the same question as above: how do other forges do it?

TheFox0x7 · 2025-04-17T22:42:58Z

One example of similar hook I found is this: https://github.com/github/platform-samples/blob/master/pre-receive-hooks/block_confidentials.sh

As to how others do it:
GitLab's approach is to have a pre-receive hook which runs on diffs and blocks the push unless the skip option is set:
https://docs.gitlab.com/user/application_security/secret_detection/secret_push_protection/#secret-push-protection-workflow
It also logs it in the audit log (which would be a TODO once there's an audit system).

GitHub doesn't specify much, but notes that it logs bypasses in the security tab: https://docs.github.com/en/code-security/secret-scanning/introduction/about-push-protection
It also scans more than commit pushes (issues, bodies, etc.), but that's out of scope for this PR.

As for more details on how they allow configs:
https://docs.gitlab.com/user/application_security/secret_detection/exclusions/
https://docs.github.com/en/code-security/secret-scanning/using-advanced-secret-scanning-and-push-protection-features/custom-patterns/defining-custom-patterns-for-secret-scanning#about-custom-patterns-for-secret-scanning

Mostly via web UI updates. I'd argue that making the ruleset part of the repository is better, as it brings history tracking, commit messages explaining why a rule was added, and the usual git benefits.

wxiaoguang · 2025-04-18T00:36:21Z

> Mostly via web UI updates. I'd argue that making the ruleset part of the repository is better as it has benefits of history tracking, messages why was the rule added and usual git benefits.

I am neutral about these, though I think I can understand why it is "mostly via web UI updates": it is friendlier to end users. Not everyone is a git expert or could easily commit the ruleset file into a git repo and debug it.

(Hmm, TBH, I have a little more preference for "web UI update")

TheFox0x7 · 2025-04-19T11:14:43Z

Honestly, one doesn't exclude the other. This is a git server after all, so we can have the config displayed in the UI but stored in the repo (Gerrit-like).

More on the current state: I'll probably end up needing to write a custom parser, unless someone has a better idea?
I don't think gitdiff allows me to parse format-patch output, which to my knowledge is the simplest way to get the commit-by-commit diff needed to scan each commit individually. That's needed so that a push where commit 1 adds a secret and commit 2 removes it will still fail.
I was looking at the Gitaly API for clues, but so far I don't think they have something like this either.

Never mind - I was printing newCommitId instead of the commit from the finding and assumed it was an issue. It's not.
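
To make the "commit 1 adds a secret, commit 2 removes it" case concrete, here is a toy sketch of why scanning only the net diff is not enough. The Commit type, the toyRule regex and the sample data are all made up for illustration; the real detection in the PR is done by gitleaks, not by this regex.

```go
package main

import (
	"fmt"
	"regexp"
)

// Commit is a hypothetical, simplified commit: an ID plus the lines it adds.
type Commit struct {
	ID    string
	Added []string
}

// toyRule is a stand-in detector for this example, not a gitleaks rule.
var toyRule = regexp.MustCompile(`SECRET_[A-Z0-9]{8}`)

// scanLines reports whether any line matches the toy rule.
func scanLines(lines []string) bool {
	for _, l := range lines {
		if toyRule.MatchString(l) {
			return true
		}
	}
	return false
}

// scanPerCommit returns the IDs of commits whose added lines match.
func scanPerCommit(commits []Commit) []string {
	var hits []string
	for _, c := range commits {
		if scanLines(c.Added) {
			hits = append(hits, c.ID)
		}
	}
	return hits
}

func main() {
	commits := []Commit{
		{ID: "c1", Added: []string{`token = "SECRET_DEADBEEF"`}},
		{ID: "c2", Added: []string{`token = os.Getenv("TOKEN")`}},
	}
	// The net diff between the old and new tips only shows the final state.
	netDiff := []string{`token = os.Getenv("TOKEN")`}

	fmt.Println("net diff flagged:", scanLines(netDiff))     // net diff flagged: false
	fmt.Println("per-commit hits:", scanPerCommit(commits)) // per-commit hits: [c1]
}
```

The secret from c1 stays reachable in history after the push, which is why the per-commit scan is the one that has to reject it.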

add finding base for unknown starting point add bypass for pushing

TheFox0x7 · 2025-04-20T11:36:33Z

Okay, I think the PR is by now out of its PoC phase and into "good enough for experimental releases", so if someone wants to try and mess with the feature, please do.

I think the approach of blocking until the scan completes is correct. I've pushed linux 0af2f6be1b42..6fea5fabd332 (827 commits, 860 files changed, 10736 insertions(+), 5738 deletions(-)) to my local test repo and it took 11 seconds in total to process, which I think is acceptable for a push with a scan.

Apart from UI issues, error handling in the function and the repo-level feature flag, is there anything I'm missing which should be added/addressed in v1 of this feature?

There are also tests to write, but currently I don't have a clue how to approach that.

I was thinking about redoing hooks more in the Gitaly direction (gRPC streaming stdout, stderr), but that's an idea for later.

lunny · 2025-04-20T19:02:28Z

> Okay I think the PR by now it's out of it's PoC phase and into "good enough for experimental releases" so if someone wants to try and mess with the feature please do.
>
> I think the approach of blocking until the scan completes is correct. I've pushed linux 0af2f6be1b42..6fea5fabd332 827 commits, 860 files changed, 10736 insertions(+), 5738 deletions(-) to my local test repo and it took 11 seconds in total to process, which I think is acceptable for push with scan.
>
> Apart of UI issues, error handling in the function and repo level feature flag, is there anything I'm missing which should be added/addressed in v1 of this feature?
>
> There are also tests to make but currently I don't have a clue how to approach that.
>
> I was thinking about redoing hooks more in gitaly direction (grpc streaming stdout,stderr) but that's an idea for later.

How much overhead does token scanning add during a git push?

base the attempt on a change between default branch and new reference

TheFox0x7 · 2025-04-21T22:07:27Z

I've tested pushing linux master to a new repo (my worst-case scenario: a lot of commits which are not in the default branch) and found two bugs, one in the Gitea UI and the other in my implementation.
In the end, the push reached 1.1GB of memory use (+ 1.1GB in a subprocess, but I'm not sure how that's counted). It also timed out with the feature on, so that's not ideal.

One thing that could be optimized is to avoid storing the output and instead parse it on the fly, though I've had no luck getting that working so far. Another option is storing it in a temp file and letting git write to it while the parser reads it in the background (how well that would work I have no idea).

The diff parser runs in a goroutine and writes to a channel, and gitleaks consumes that channel, so ideally the two would be about even. The issue is that git show just takes time, and that's not something I can mitigate in any way short of adding a bypass for large pushes. GitLab does have a 1MB limit, but it's on the file level (or file diff, if that's enabled), not the commit level.
Alternatively, we could have a limit on scannable revs and require the user to submit in batches, but that's less user friendly.

The Linux repo also might not be the best benchmark for this due to its size, but I think it's worth trying in the unlikely event this somehow gets used during migrations.

Ideas welcome because I'm currently out of them - on io.Pipe and nio.Pipe it hangs.
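
As one possible starting point for the "parse it on the fly" idea, the sketch below streams a producer's output through io.Pipe and handles it line by line. It is not a diagnosis of the hang described above. It only illustrates two properties of io.Pipe that commonly cause hangs: writes block until the other side reads, and the reader only sees EOF once the write end is closed. streamLines and demo are hypothetical names. In a real hook the producer would presumably run git via os/exec with cmd.Stdout set to the writer, which is an assumption and is replaced here by a fixed string.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// streamLines runs produce in a goroutine writing into an io.Pipe and calls
// handle for every line as it arrives, so the full output is never buffered.
func streamLines(produce func(w io.Writer) error, handle func(line string)) error {
	pr, pw := io.Pipe()

	go func() {
		// Closing the write end lets the reader see EOF (or the producer's
		// error). Without it, the read loop below blocks forever.
		pw.CloseWithError(produce(pw))
	}()

	// Note: bufio.Scanner rejects lines longer than its buffer (64KB by
	// default), which can matter for diffs of minified files.
	sc := bufio.NewScanner(pr)
	for sc.Scan() {
		handle(sc.Text())
	}

	// If scanning stopped early, closing the read end unblocks a producer
	// that is still stuck in Write.
	pr.CloseWithError(sc.Err())
	return sc.Err()
}

// demo feeds two fake diff lines through the pipe and counts them.
func demo() (int, error) {
	n := 0
	err := streamLines(func(w io.Writer) error {
		_, err := io.Copy(w, strings.NewReader("+line one\n+line two\n"))
		return err
	}, func(string) { n++ })
	return n, err
}

func main() {
	n, err := demo()
	fmt.Println(n, err) // 2 <nil>
}
```

The same shape would work with a channel between the line handler and the scanner, as long as something keeps draining the pipe until the producer has finished.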

add push secrets detector

22739f1

use git diff based one

GiteaBot added the lgtm/need 2 label (This PR needs two approvals by maintainers to be considered for merging) Apr 16, 2025

github-actions bot added the modifies/go (Pull requests that update Go code) and modifies/dependencies labels Apr 16, 2025

use tag version

bd6ae40

fix linting

TheFox0x7 added 6 commits April 20, 2025 00:38

rework commit change list

e38b675

add finding base for unknown starting point add bypass for pushing

add global toggle

b01faea

update licenses

313c139

disable gitleaks logger

fc9213f

add sourcing config from default branch

9f8d729

add feature flag to app.ini example

5a568ff

github-actions bot added the docs-update-needed label (The document needs to be updated synchronously) Apr 20, 2025

TheFox0x7 added 2 commits April 21, 2025 19:20

fix new reference scanning

39adb06

base the attempt on a change between default branch and new reference

fix scanning first commit in default branch

97e74df
