Description
Is your feature request related to a problem? Please describe.
Currently, when diffing minified, bundled JavaScript code, there's a significant amount of 'noise' because the bundler often changes the minified variable names between builds. This obscures the real changes and makes the diff output less useful for understanding what actually changed.
Describe the solution you'd like
I propose adding a feature to diffsitter that ignores changes in variable/function names within minified JavaScript code. This would drastically reduce the noise in diffs of minified builds, allowing a clearer focus on the actual code changes rather than the fluctuation of variable names.
Describe alternatives you've considered
As workarounds, I've experimented with various git diff modes like patience, histogram, and minimal to somewhat reduce the diff size. For instance, changing the diff algorithm can alter the number of lines in the diff output significantly:
⇒ git diff --diff-algorithm=default -- unpacked/_next/static/chunks/pages/_app.js | wc -l
116000
⇒ git diff --diff-algorithm=patience -- unpacked/_next/static/chunks/pages/_app.js | wc -l
35826
Nonetheless, these approaches still capture variable name changes, which can introduce a substantial amount of 'noise', especially in larger files.
Other potential solutions include pre-processing the files to normalize variable/function names or post-processing the diff output to filter out sections where the only changes involve variable/function names.
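To illustrate the pre-processing idea, here is a rough sketch that renames identifiers to placeholders in order of first appearance, so that two builds differing only in minified names normalize to the same text. It is a crude regex-based approximation, not a real parser: `normalizeIdentifiers` and the `_vN` placeholder scheme are hypothetical names, comments and regex literals are not handled, object literal keys and globals get renamed too, and an identifier introduced early in one build shifts all later placeholder numbers.

```javascript
// Words that must never be renamed (not an exhaustive list).
const KEYWORDS = new Set([
  "async", "await", "break", "case", "catch", "class", "const", "continue",
  "debugger", "default", "delete", "do", "else", "export", "extends", "false",
  "finally", "for", "function", "if", "import", "in", "instanceof", "let",
  "new", "null", "of", "return", "static", "super", "switch", "this", "throw",
  "true", "try", "typeof", "undefined", "var", "void", "while", "with", "yield",
]);

// Capture groups, in order: string literal, numeric literal,
// property access (.name), bare identifier.
const TOKEN =
  /("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|`(?:\\.|[^`\\])*`)|(\d[\w.]*)|(\.[A-Za-z_$][\w$]*)|([A-Za-z_$][\w$]*)/g;

// Hypothetical helper: replace each distinct identifier with _v0, _v1, ...
// Strings, numbers, property accesses, and keywords are left untouched.
function normalizeIdentifiers(source) {
  const names = new Map(); // original name -> placeholder
  return source.replace(TOKEN, (match, str, num, prop, ident) => {
    if (ident === undefined || KEYWORDS.has(ident)) return match;
    if (!names.has(ident)) names.set(ident, `_v${names.size}`);
    return names.get(ident);
  });
}

// Two "builds" of the same function with different minified names.
const buildA = "function a(b){return b.map(c=>c+1)}";
const buildB = "function x(y){return y.map(z=>z+1)}";

console.log(normalizeIdentifiers(buildA)); // function _v0(_v1){return _v1.map(_v2=>_v2+1)}
console.log(normalizeIdentifiers(buildA) === normalizeIdentifiers(buildB)); // true
```

The normalized output of each build could then be written to a file and compared with a regular text diff. A more robust version would use a real JavaScript parser and number identifiers per scope rather than per file.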
Additional context
The ideal solution would provide diff output in text format, but the actual diffing would occur at the AST level, ignoring variable/function name changes.
I suspect this might already be possible (at least to some degree) with the following, though I haven't yet found good examples or docs explaining how to use it:
- https://github.com/afnanenayet/diffsitter
  - A tree-sitter based AST difftool to get meaningful semantic diffs
  - You can also filter which tree sitter nodes are considered in the diff through the config file.
- https://github.com/afnanenayet/diffsitter#node-filtering
  - You can filter the nodes that are considered in the diff by setting include_nodes or exclude_nodes in the config file. exclude_nodes always takes precedence over include_nodes, and the type of a node is the kind of a tree-sitter node. This feature currently only applies to leaf nodes, but we could exclude nodes recursively if there's demand for it.
I'm going to hopefully play around with it a bit more now, but wanted to capture this while it was fresh in my mind.
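To make the "ignore changes that only consist of variable/function names" idea more concrete, here is a minimal sketch of the comparison itself: two snippets are treated as equivalent when their token streams match exactly apart from a consistent one-to-one renaming of identifiers. This is a token-level stand-in for a real AST comparison, not how diffsitter works; `tokenize`, `isRenamable`, and `equalModuloRenaming` are hypothetical names, and comments and regex literals are not handled.

```javascript
// Words that are never treated as renameable identifiers (not exhaustive).
const RESERVED = new Set([
  "async", "await", "break", "case", "catch", "class", "const", "continue",
  "default", "delete", "do", "else", "export", "extends", "false", "finally",
  "for", "function", "if", "import", "in", "instanceof", "let", "new", "null",
  "of", "return", "super", "switch", "this", "throw", "true", "try", "typeof",
  "undefined", "var", "void", "while", "yield",
]);

// Whitespace, string literals, numbers, identifiers (a leading dot marks a
// property access), then any other single character as punctuation.
const LEXEME =
  /\s+|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|`(?:\\.|[^`\\])*`|\d[\w.]*|\.?[A-Za-z_$][\w$]*|[^\s]/g;

function tokenize(source) {
  // Drop pure-whitespace tokens; keep everything else in order.
  return (source.match(LEXEME) || []).filter((t) => t.trim() !== "");
}

function isRenamable(token) {
  // Property accesses start with "." and so are never renameable here.
  return /^[A-Za-z_$]/.test(token) && !RESERVED.has(token);
}

// True when the two snippets differ at most by a consistent, one-to-one
// renaming of identifiers.
function equalModuloRenaming(left, right) {
  const a = tokenize(left);
  const b = tokenize(right);
  if (a.length !== b.length) return false;
  const forward = new Map(); // name in left -> name in right
  const backward = new Map(); // name in right -> name in left
  for (let i = 0; i < a.length; i++) {
    const renamable = isRenamable(a[i]);
    if (renamable !== isRenamable(b[i])) return false;
    if (!renamable) {
      if (a[i] !== b[i]) return false;
    } else if (!forward.has(a[i]) && !backward.has(b[i])) {
      forward.set(a[i], b[i]);
      backward.set(b[i], a[i]);
    } else if (forward.get(a[i]) !== b[i] || backward.get(b[i]) !== a[i]) {
      return false;
    }
  }
  return true;
}

console.log(equalModuloRenaming("function a(b){return b+1}", "function x(y){return y+1}")); // true
console.log(equalModuloRenaming("function a(b){return b+1}", "function x(y){return y+2}")); // false
```

Applied to the old and new sides of each diff hunk, a check like this could be used to drop hunks where nothing but names changed, while still reporting the remaining hunks as ordinary text.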
See Also
- Explore AST based diff tools, diff minimisation, etc 0xdevalias/chatgpt-source-watch#3
- [feature] AST diff, with option to ignore variable/function name changes ast-grep/ast-grep#901
- Support ignoring differences that only consist of variable/function name changes (eg. within minified JavaScript) Wilfred/difftastic#631