Skip to content

add push secrets detector #34226

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

TheFox0x7
Copy link
Contributor

@TheFox0x7 TheFox0x7 commented Apr 16, 2025

adds a step to pre-commit hook which scans commit diffs for secrets


Very WIP/PoC, more looking for a feedback on the approach.
TODO:

  • Settings flag as the feature has to be opt in until deemed stable enough
  • Per repo config flag enable the feature (again opt-in by default even if the instance enables it with later switch to opt out)
  • Bypass flag for repo admins (possibly some other role too? How to handle it in UI?)
  • Repo config taken into account. Probably taken from .gitleaks.toml at repo root but I think it might be interesting to look at the way gerrits repo config is handled as well. Though this is a topic for separate PR and proposal.
  • UI adjustments as rejection message looks bad currently
  • Logs on web push repeat the rejection message few times and with how large those might end up here it's an issue
  • Adjust/Disable gitleaks logging to not emit it's own things
  • Binary size? it's +1.7MB with some maybe reusable dependencies. Maybe it can be trimmed down (possibly in a follow up)
  • Upstream changes in fork? (conflicts with trimming it down)

use git diff based one
@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Apr 16, 2025
@github-actions github-actions bot added modifies/go Pull requests that update Go code modifies/dependencies labels Apr 16, 2025
fix linting
@lunny
Copy link
Member

lunny commented Apr 16, 2025

I think this is the right place to detect. But the detection may take much time. It should have a queue to handle them and not return the detection result immediately.

@TheFox0x7
Copy link
Contributor Author

The entire point of the feature is push rejection if the change contains secrets and if that gets pushed to queue to do later it nullfies the point of it being a pre-recieve hook.
Unless I'm missing something and you have an idea on how to queue it at the start and wait for completion before ACK/NACking push?

@TheFox0x7
Copy link
Contributor Author

@wxiaoguang Mind also taking a look when you have a moment? You usually have good ideas and insights about things that would be better to do before a change/feature and a more native way to do it.

btw. I'm aware of the fault in the diffing when refs are added removed - the end behavior should be:

  1. Scan diff (or individual commits - tbd) if old and new Ids aren't zero
  2. Scan change if old ID is zero - it's still an added change that needs to be processed
  3. Skip scanning if new ID is zero.

I'm in progress of figuring out how to do 2, without scanning the entire history if not required.

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Apr 17, 2025

What if I'd like to store secret-like data in git repo? For example: I have a repo that contains encrypted passwords (of course, the master key is managed separately, not in the repo).

@TheFox0x7
Copy link
Contributor Author

That's why it would be off by default which is in TODOs. If you'd like to keep it on and store secrets, you'd have to add them to allowlist in config. I don't think gitleaks will classify encrypted passwords as keys but that's a good case to test for.

@wxiaoguang
Copy link
Contributor

Oh I see the "repo config" TODO now. Maybe 2 approaches:

  1. Use database repo config: then the git hook command need to request the repo config by internal API
  2. Use git repo config (something like .gitleaks.toml)

I didn't know how other forges do so I can't tell which one is better .....


btw. I'm aware of the fault in the diffing when refs are added removed - the end behavior should be:

TBH I am not quite familiar with the "git hook" related code, can't really tell the differences between these cases (how the result would be affected) .... or maybe make it configurable to let repo admin decide? Or the same question as above: how do other forges do?

@TheFox0x7
Copy link
Contributor Author

One example of similar hook I found is this: https://github.com/github/platform-samples/blob/master/pre-receive-hooks/block_confidentials.sh

As to how others do it:
Gitlab approach is to have a pre-recieve hook which runs on diffs and unless skip option is set, blocks the push
https://docs.gitlab.com/user/application_security/secret_detection/secret_push_protection/#secret-push-protection-workflow
It also logs it in audit (which would be TODO once there's an audit system)

Github doesn't specify much but notes that it logs bypasses in security tab https://docs.github.com/en/code-security/secret-scanning/introduction/about-push-protection
Also scans more than commit pushes (issues, bodies, etc) but that's out of scope for this PR.

As for more details on how they allow configs:
https://docs.gitlab.com/user/application_security/secret_detection/exclusions/
https://docs.github.com/en/code-security/secret-scanning/using-advanced-secret-scanning-and-push-protection-features/custom-patterns/defining-custom-patterns-for-secret-scanning#about-custom-patterns-for-secret-scanning

Mostly via web UI updates. I'd argue that making the ruleset part of the repository is better as it has benefits of history tracking, messages why was the rule added and usual git benefits.

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Apr 18, 2025

Mostly via web UI updates. I'd argue that making the ruleset part of the repository is better as it has benefits of history tracking, messages why was the rule added and usual git benefits.

I am neutral about these, while I think I could understand why "mostly via web UI updates": it is more friendly to end users. Not everyone is git expert or could commit the ruleset file into a git repo easily and debug it.

(Hmm, TBH, I have a little more preference for "web UI update")

@TheFox0x7
Copy link
Contributor Author

TheFox0x7 commented Apr 19, 2025

Honestly one doesn't exclude the other. This is a git server after all so we can have the config displayed in UI but stored in repo (gerrit like).

More on the current state - I'll probably end up needing to write a custom parser for the unless someone has a better idea?
I don't think gitdiff allows me to parse format-patch output which to my knowledge is the simplest way to get a commit by commit diff needed to scan each commit individually - that's needed so things like: commit 1 adds a secret, commit 2 removes it, will still fail the push.
I was looking at gitaly api for clues but so far I don't think they have something like this either.

Never mind - I was printing newCommitId instead of the commit from finding and assumed it was an issue. It's not.

@github-actions github-actions bot added the docs-update-needed The document needs to be updated synchronously label Apr 20, 2025
@TheFox0x7
Copy link
Contributor Author

Okay I think the PR by now it's out of it's PoC phase and into "good enough for experimental releases" so if someone wants to try and mess with the feature please do.

I think the approach of blocking until the scan completes is correct. I've pushed linux 0af2f6be1b42..6fea5fabd332 827 commits, 860 files changed, 10736 insertions(+), 5738 deletions(-) to my local test repo and it took 11 seconds in total to process, which I think is acceptable for push with scan.

Apart of UI issues, error handling in the function and repo level feature flag, is there anything I'm missing which should be added/addressed in v1 of this feature?

There are also tests to make but currently I don't have a clue how to approach that.


I was thinking about redoing hooks more in gitaly direction (grpc streaming stdout,stderr) but that's an idea for later.

@lunny
Copy link
Member

lunny commented Apr 20, 2025

Okay I think the PR by now it's out of it's PoC phase and into "good enough for experimental releases" so if someone wants to try and mess with the feature please do.

I think the approach of blocking until the scan completes is correct. I've pushed linux 0af2f6be1b42..6fea5fabd332 827 commits, 860 files changed, 10736 insertions(+), 5738 deletions(-) to my local test repo and it took 11 seconds in total to process, which I think is acceptable for push with scan.

Apart of UI issues, error handling in the function and repo level feature flag, is there anything I'm missing which should be added/addressed in v1 of this feature?

There are also tests to make but currently I don't have a clue how to approach that.

I was thinking about redoing hooks more in gitaly direction (grpc streaming stdout,stderr) but that's an idea for later.

How much overhead does token scanning add during a git push?

base the attempt on a change between default branch and new reference
@TheFox0x7
Copy link
Contributor Author

I've tested pushing linux master to new repo (so my worst case scenario - a lot of commits which are not in default branch) - found two bugs, one with gitea UI other in my implementation.
In the end - pushing reached 1.1GB of memory use (+ 1.1GB in subprocess but I'm not sure how it's counted). It also timed out with the feature on so that's not ideal.

One thing that could be optimized is to avoid storing the output and parsing it on the fly though I had no luck getting that working so far. Or storing it in a temp file and letting git write to it while parser reads it in the background (how well that would work I have no idea).

The diff parser runs in a go routine and writes to a channel and gitleaks consumes that channel so ideally it would be about even but the issue is git show just takes time and that's not something I can mitigate in any way unless making a bypass on large amounts. Gitlab does have a 1MB limit, but it's on file level (or file diff if that's enabled), not a commit level.
Alternatively we could have a limit of scannable revs and require user to submit it batches but that's less user friendly.

Linux repo also might not be the best benchmark for this due to the size, but I think it's worth trying in the unlikely event this will be somehow used during migrations.

Ideas welcome because I'm currently out - on io.Pipe and nio.Pipe it hangs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-update-needed The document needs to be updated synchronously lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. modifies/dependencies modifies/go Pull requests that update Go code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants