Explore AST based diff tools, diff minimisation, etc #3
Comments
I would recommend difftastic for this! Actually, rspack has already used it to check the diff between its output and webpack's.
|
@HerringtonDarkholme Interesting.. do you know if they did so while suppressing the 'noise' of changed variables? Or was it more just generally to ensure they were doing compatible things? I had a quick google, but it didn't seem to turn up anything specific beyond the repo/etc:
Edit: Opened the following issue on
Edit 2: And this one on
I was also just re-reading through the
|
Though playing with
eg. On a very minimal example,
⇒ git difftool --tool diffsitter HEAD~1 HEAD -- unpacked/_next/static/\[buildHash\]/_buildManifest.js
/var/folders/j4/kxtq1cjs1l98xfqncjbsbx1c0000gn/T//git-blob-AOnHKy/_buildManifest.js
/var/folders/j4/kxtq1cjs1l98xfqncjbsbx1c0000gn/T//git-blob-ahyGIo/_buildManifest.js
===================================================================================
80:
---
- "/search": ["static/chunks/pages/search-8da35bbb0f092dc3.js"],
80:
---
+ "/search": ["static/chunks/pages/search-d835393483b5432a.js"],
138:
----
+ "static/chunks/5054-e2060ddbea2abdb7.js"
138:
----
- "static/chunks/5054-8ad3d13d663a6185.js"
Vs
⇒ git diff HEAD~1 HEAD -- unpacked/_next/static/\[buildHash\]/_buildManifest.js
diff --git a/unpacked/_next/static/[buildHash]/_buildManifest.js b/unpacked/_next/static/[buildHash]/_buildManifest.js
index 851a8f0..5004cc7 100644
--- a/unpacked/_next/static/[buildHash]/_buildManifest.js
+++ b/unpacked/_next/static/[buildHash]/_buildManifest.js
@@ -78,7 +78,7 @@
"/payments/success-trial": [
"static/chunks/pages/payments/success-trial-84597e34390c1506.js",
],
- "/search": ["static/chunks/pages/search-8da35bbb0f092dc3.js"],
+ "/search": ["static/chunks/pages/search-d835393483b5432a.js"],
"/share/e/[[...shareParams]]": [
"static/chunks/pages/share/e/[[...shareParams]]-899e50f90dac9ff5.js",
],
@@ -136,6 +136,6 @@
"static/chunks/5017-f7c5e142fc7f0516.js",
"static/chunks/3975-37a9301353b29c5d.js",
"static/chunks/3754-ae5dc2fb759ecfc1.js",
- "static/chunks/5054-8ad3d13d663a6185.js"
+ "static/chunks/5054-e2060ddbea2abdb7.js"
)),
self.__BUILD_MANIFEST_CB && self.__BUILD_MANIFEST_CB();
This is the
// Default: `diffsitter dump-default-config`
// See also: https://github.com/afnanenayet/diffsitter/blob/v0.8.1/assets/sample_config.json5
// Colours: `color256`, `black`, `red`, `green`, `yellow`, `blue`, `magenta`, `cyan`, `white`
{
"formatting": {
"default": "unified",
"unified": {
"addition": {
"highlight": "green",
"regular-foreground": "green",
"emphasized-foreground": "white",
"bold": true,
"underline": false,
"prefix": "+ "
},
"deletion": {
"highlight": "red",
"regular-foreground": "red",
"emphasized-foreground": "white",
"bold": true,
"underline": false,
"prefix": "- "
}
},
"json": {
"pretty_print": false
},
"custom": {}
},
"grammar": {
"dylib-overrides": null,
"file-associations": {
"js": "typescript",
"jsx": "tsx"
},
},
"input-processing": {
"split-graphemes": true,
// You can exclude different tree-sitter node types. This rule takes precedence over `include-kinds`.
"exclude-kinds": null,
// "exclude-kinds": ["string"],
// You can specifically allow only certain tree sitter node types
"include-kinds": null
// "include-kinds": ["method_definition"],
},
// Specify a fallback command if diffsitter can't parse the given input
// files. This is invoked by diffsitter as:
//
// ```sh
// ${fallback_cmd} ${old} ${new}
// ```
"fallback-cmd": null,
// "fallback-cmd": "diff",
}
And this is the
# https://github.com/afnanenayet/diffsitter
[difftool "diffsitter"]
cmd = diffsitter "$LOCAL" "$REMOTE"
# https://github.com/afnanenayet/diffsitter
[difftool "diffsitter-debug"]
cmd = diffsitter --debug "$LOCAL" "$REMOTE"
Running it with
|
After the above explorations, I ended up taking a different tack and started exploring the 'post processing
Initial PoC tests seemed to show some merit, so I hacked them together into a script that can filter an existing
Edit: Newer updates to this Twitter thread continue in this comment below: #3 (comment) |
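To make the post-processing idea concrete, here is a minimal Python sketch of one way such a filter could work. It is not the actual PoC script: the function names, the keyword list, and the renaming heuristic are all assumptions made for illustration. It splits a unified diff into hunks, renames identifiers to positional placeholders on each side, and drops hunks whose removed and added lines become identical (ie. the only change was variable naming). String literals and property names after a `.` are left untouched so that real changes such as chunk hashes are not hidden.

```python
import re

# Hypothetical, deliberately small keyword list; a real tool would use a JS parser.
JS_KEYWORDS = {"var", "let", "const", "function", "return", "if", "else",
               "for", "while", "new", "this", "null", "true", "false", "typeof"}

# Matches string literals first (kept verbatim), then bare identifiers that are
# not property accesses (not preceded by '.', a word character, or '$').
TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'
    r"|'(?:\\.|[^'\\])*'"
    r"|(?<![\w$.])[A-Za-z_$][\w$]*"
)


def normalise(lines):
    """Rename identifiers to v0, v1, ... in order of first appearance."""
    names = {}

    def rename(match):
        token = match.group(0)
        if token[0] in "\"'" or token in JS_KEYWORDS:
            return token
        return names.setdefault(token, f"v{len(names)}")

    return [TOKEN.sub(rename, line.strip()) for line in lines]


def split_hunks(diff_text):
    """Yield (header, body_lines) for each @@ hunk, skipping file headers."""
    header, body = None, []
    for line in diff_text.splitlines():
        if line.startswith("@@") or line.startswith("diff --git"):
            if header is not None:
                yield header, body
            header = line if line.startswith("@@") else None
            body = []
        elif header is not None:
            body.append(line)
    if header is not None:
        yield header, body


def is_rename_only(body):
    """True when removed and added lines match after identifier renaming."""
    removed = [line[1:] for line in body if line.startswith("-")]
    added = [line[1:] for line in body if line.startswith("+")]
    return bool(removed) and normalise(removed) == normalise(added)


def minimise(diff_text):
    """Return only the hunks that still differ after normalisation."""
    return [hunk for hunk in split_hunks(diff_text) if not is_rename_only(hunk[1])]
```

The regex-based renaming is only a rough stand-in for proper parsing, so it can both miss noise and (less likely) hide a real change; treat the output as a reading aid rather than a faithful diff.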
With the current PoC implementations in the diff minimiser, we're grouping by the diff chunks, and then by added/removed lines within those chunks. Sometimes a section of code will churn and move a large number of lines from one chunk to another, which currently won't get noticed by the minimiser. It would be good if we could process these in a similar way to
Edit: See also:
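As a rough illustration of the cross-chunk case described above, the following Python sketch pairs lines removed in one chunk with identical lines added in a different chunk. It is only an assumption about how this could be approached, not the minimiser's code: `find_moved_lines`, its input shape (a list of `(removed_lines, added_lines)` tuples, one per chunk), and the `min_length` threshold are all hypothetical.

```python
from collections import defaultdict


def find_moved_lines(hunks, min_length=8):
    """Pair lines removed in one hunk with identical lines added in another.

    hunks: list of (removed_lines, added_lines), one tuple per diff chunk.
    Returns a list of (text, from_hunk_index, to_hunk_index).
    Lines shorter than min_length (eg. a lone '}') are ignored, since they
    match almost anywhere and would produce meaningless pairs.
    """
    # Index every sufficiently long removed line by its stripped text.
    removed_at = defaultdict(list)
    for index, (removed, _added) in enumerate(hunks):
        for line in removed:
            key = line.strip()
            if len(key) >= min_length:
                removed_at[key].append(index)

    moves = []
    for index, (_removed, added) in enumerate(hunks):
        for line in added:
            key = line.strip()
            sources = removed_at.get(key)
            if not sources:
                continue
            # Consume the first removal that came from a different hunk, so
            # each removed line is matched at most once.
            for position, source in enumerate(sources):
                if source != index:
                    sources.pop(position)
                    moves.append((key, source, index))
                    break
    return moves
```

This matches on exact text only; comparing lines after some form of identifier normalisation would be needed to catch moves where the minified names also changed.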
|
|
|
From some Twitter chat with @michaelskyba, talking about
Edit (2025-04-10): Decided to see how Claude 3.7 does with looking at
Some more Twitter chat with @michaelskyba, talking about diff minimisation:
|
Not sure if I realised this earlier and forgot, but it seems semanticdiff has an online version that allows us to see GitHub commits/PRs/etc (and from a quick peek in DevTools/
Here's an example of the last build I committed to
It seemed to work ok on some small files (that were already not too bad with GitHub), but when I tried to get it to parse some of the bigger chunk files, it generally just gave an error:
{
"type": "error",
"error": {
"type": "timeout_error",
"message": "The provided timeout was reached"
}
}
|
From Twitter / X:
|
RE: My 'post processing
Since it's been a while since I looked at this, and the state of LLMs being able to handle large/complex code has come a long way since the start of 2024 (When OpenAI's GPT-4 was... not that great for it), I figured I'd ask a few different LLMs to evaluate the diff minimiser code with this prompt, and attaching the following two files:
Here are the results for that single starter prompt, with no deeper follow up questions:
|
Some more updates, from a private chat the other day:
And then some further updates from today:
|
Adding on to the old (Feb 4, 2024) Twitter thread summarised above in #3 (comment)
Also this standalone RT just in case the old thread wasn't super visible:
As well as a crosspost to BlueSky: |
See the following blog posts for a deeper dive into the Myers and Patience diff algorithms:
|
There can be a lot of 'noise' when diffing minified, bundled code, as the bundler will often change the minified variable names it uses between builds (even if the rest of the code hasn't changed).
We can attempt to reduce this by using non-default git diff modes such as patience / histogram / minimal:
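A small Python sketch of comparing these modes from a script, assuming `git` is on the PATH and the script runs inside the repository. The helper names are hypothetical; the only git features relied on are `git diff --diff-algorithm=<name>` and `--numstat`.

```python
import subprocess

# Algorithm names accepted by `git diff --diff-algorithm=...`.
ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def diff_command(algorithm, old_rev, new_rev, path):
    """Build the git command line for one algorithm (nothing is executed)."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown diff algorithm: {algorithm}")
    return ["git", "diff", f"--diff-algorithm={algorithm}", "--numstat",
            old_rev, new_rev, "--", path]


def changed_line_counts(old_rev, new_rev, path):
    """Run every algorithm and return {algorithm: (added, removed)}."""
    counts = {}
    for algorithm in ALGORITHMS:
        output = subprocess.run(
            diff_command(algorithm, old_rev, new_rev, path),
            capture_output=True, text=True, check=True,
        ).stdout
        added = removed = 0
        for row in output.splitlines():
            # --numstat rows look like "<added>\t<removed>\t<path>";
            # binary files report "-" for both counts and are skipped.
            a, r, _path = row.split("\t", 2)
            if a != "-":
                added += int(a)
                removed += int(r)
        counts[algorithm] = (added, removed)
    return counts
```

For example, `changed_line_counts("HEAD~1", "HEAD", "some/bundle.js")` (the path here is a placeholder) gives a quick view of whether one algorithm produces a smaller diff for a given file than the others.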
Musings
See Also
Additional links to review