Improve speed of `cargo dev fmt` #14862

Jarcho · 2025-05-21T14:23:41Z

This stops using `cargo fmt` and instead calls `rustfmt` directly with the list of all files.

All `cargo fmt` does is find the crate roots and pass the edition from `Cargo.toml`. Since the edition is set in `rustfmt.toml` for the test files, and we're already iterating through all the files, this isn't needed.

`--skip-children` is used since we already pass every file, so the automatic module detection buys us nothing except a slower run.
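A minimal sketch of the invocation described above. It assumes the file list has already been collected and that the `rustfmt` being run accepts `--skip-children` (the flag named in this PR; it may be treated as unstable depending on the rustfmt version). The `check` parameter is a hypothetical stand-in for the check/update mode; `--check` is rustfmt's flag for reporting differences instead of rewriting files. No `--edition` is passed because, per the description, `rustfmt.toml` already sets it.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Builds a single `rustfmt` invocation over an explicit file list.
/// `check` is a hypothetical parameter standing in for the check/update mode.
fn rustfmt_command(files: &[PathBuf], check: bool) -> Command {
    let mut cmd = Command::new("rustfmt");
    // Every file is passed explicitly, so rustfmt does not need to
    // discover and format child modules on its own.
    cmd.arg("--skip-children");
    if check {
        // Report formatting differences instead of rewriting files.
        cmd.arg("--check");
    }
    // No `--edition` flag: the edition is assumed to come from rustfmt.toml.
    cmd.args(files);
    cmd
}
```

The returned `Command` is only built here, not spawned, so the argument list can be inspected (for example with `Command::get_args`) before running it.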

~~Second commit~~ (now part of the first commit) changes things to use only the `ignore` option in `rustfmt.toml`, rather than also having a way to ignore files in `cargo dev fmt`.

r? @samueltardieu

changelog: none

samueltardieu

I like the speed improvement.

I am a bit concerned that some of the Clippy maintainers or some of the users might use stray Rust files while working on Clippy, or test directories under the Clippy root. I wonder if we should be more conservative and start from known roots.

samueltardieu · 2025-05-21T16:07:48Z

clippy_dev/src/fmt.rs

+                .as_os_str()
+                .as_encoded_bytes()
+                .get(2..)
+                .is_none_or(|x| x != "target".as_bytes() && x != ".git".as_bytes())


At least one Clippy maintainer uses `.jj` (Jujutsu), which will have the same issue as `.git`:

Suggested change

.is_none_or(|x| x != "target".as_bytes() && x != ".git".as_bytes())

.is_none_or(|x| !matches!(x, b"target" | b".git" | b".jj"))

Not really a problem, but I agree it should be skipped as well.

If this is not a colocated git repo, all the git files will be under .jj/repo/store/git, which would not be filtered away, hence the importance of including it.

Ah. That is indeed a problem.
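The suggested filter can be exercised on its own. This sketch assumes, as the `.get(2..)` in the diff implies, that entry paths start with a two-byte `./` prefix (as with a walker rooted at `.`); `keep_entry` is a hypothetical name. Byte-string patterns in `matches!` compare directly against the `&[u8]` slice, and `Option::is_none_or` (Rust 1.82+) keeps paths too short to have anything after the prefix, such as `.` itself.

```rust
use std::path::Path;

/// Hypothetical helper mirroring the suggested change: returns `false`
/// for the top-level `./target`, `./.git`, and `./.jj` entries.
/// Assumes `path` begins with the two bytes "./".
fn keep_entry(path: &Path) -> bool {
    path.as_os_str()
        .as_encoded_bytes()
        // Skip the leading "./"; `None` means the path is shorter than that.
        .get(2..)
        .is_none_or(|x| !matches!(x, b"target" | b".git" | b".jj"))
}
```

Used as a walker's entry filter, returning `false` for a directory prunes its whole subtree, which is why a non-colocated `.jj/repo/store/git` is only excluded when `.jj` itself is in the list.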

samueltardieu · 2025-05-21T16:08:08Z

clippy_dev/src/fmt.rs

+    )
+    .map(|mut cmd| match cmd.spawn() {
+        Ok(x) => x,
+        Err(ref e) => panic_action(&e, ErrAction::Run, "rustfmt".as_ref()),


It would be more ergonomic to have `panic_action()` take an `impl AsRef<Path>` (and call `.as_ref()` inside it) and to pass a `&str` directly:

Suggested change

Err(ref e) => panic_action(&e, ErrAction::Run, "rustfmt".as_ref()),

Err(e) => panic_action(&e, ErrAction::Run, "rustfmt"),
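A sketch of what the suggested signature could look like. `ErrAction` (with only the `Run` variant seen in the thread), the message format, and the hypothetical `action_message` helper are assumptions, not clippy_dev's actual code. Taking `impl AsRef<Path>` moves the `.as_ref()` call into the function, so callers can pass `"rustfmt"`, a `&Path`, or a `PathBuf` unchanged.

```rust
use std::fmt::Display;
use std::path::Path;

/// Hypothetical stand-in for clippy_dev's `ErrAction`; only `Run` appears in the thread.
#[derive(Clone, Copy, Debug)]
enum ErrAction {
    Run,
}

/// Builds the panic message. Generic over `AsRef<Path>`, so the conversion
/// happens here instead of at every call site.
fn action_message(err: &impl Display, action: ErrAction, path: impl AsRef<Path>) -> String {
    let verb = match action {
        ErrAction::Run => "running",
    };
    format!("error {verb} `{}`: {err}", path.as_ref().display())
}

/// Hypothetical version of the suggested `panic_action` signature.
fn panic_action(err: &impl Display, action: ErrAction, path: impl AsRef<Path>) -> ! {
    panic!("{}", action_message(err, action, path))
}
```

With this shape, the call site in the suggestion becomes `panic_action(&e, ErrAction::Run, "rustfmt")` with no `.as_ref()`.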

samueltardieu · 2025-05-21T16:09:43Z

clippy_dev/src/fmt.rs

@@ -353,25 +330,17 @@ fn run_rustfmt(clippy: &ClippyInfo, update_mode: UpdateMode) {
                    {
                        eprintln!("{s}");
                    }
-                    panic_action(&e, ErrAction::Run, name.as_ref());
+                    panic_action(&e, ErrAction::Run, "rustfmt".as_ref());


Suggested change

panic_action(&e, ErrAction::Run, "rustfmt".as_ref());

panic_action(&e, ErrAction::Run, "rustfmt");

samueltardieu · 2025-05-21T16:09:59Z

clippy_dev/src/fmt.rs

                },
            },
-            Err(ref e) => panic_action(e, ErrAction::Run, name.as_ref()),
+            Err(ref e) => panic_action(e, ErrAction::Run, "rustfmt".as_ref()),


Suggested change

Err(ref e) => panic_action(e, ErrAction::Run, "rustfmt".as_ref()),

Err(ref e) => panic_action(e, ErrAction::Run, "rustfmt"),

samueltardieu · 2025-05-21T16:11:01Z

clippy_dev/src/utils.rs

+            for arg in self.args.by_ref().take(self.batch_size) {
+                cmd.arg(arg.as_ref());
+                cmd_len += arg.as_ref().len();
+                cmd_len += 8;


Why 8? Maybe add a small comment ("arbitrary safety gap"?)

Eight is for Unix-based systems, since the pointers in `argv` count toward the limit, IIRC. It doesn't really matter, since Windows has a much lower limit.

Windows is unfortunately not consistent here since it's one space plus whatever is needed to escape spaces within arguments. I can add a comment about all this mess.
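A rough sketch of the per-argument accounting discussed here. Both functions are hypothetical and deliberately conservative. The Unix estimate charges each argument its bytes, a NUL terminator, and one `argv` pointer (the reasoning given in the thread, not a precise model of any kernel's limit). The Windows estimate charges one separating space plus, when the argument is empty or contains whitespace or quotes, two surrounding quotes and one escape byte per `"` or `\` as an upper bound.

```rust
use std::mem::size_of;

/// Hypothetical conservative cost of one argument on Unix-like systems:
/// the string bytes, its NUL terminator, and one pointer in `argv`.
fn unix_arg_cost(arg: &str) -> usize {
    arg.len() + 1 + size_of::<*const u8>()
}

/// Hypothetical conservative cost of one argument on a Windows command line:
/// one separating space, plus, if the argument is empty or contains
/// whitespace or quotes, two quotes and one extra byte per `"` or `\`.
fn windows_arg_cost(arg: &str) -> usize {
    let needs_quoting =
        arg.is_empty() || arg.contains(|c: char| c == ' ' || c == '\t' || c == '"');
    let mut cost = 1 + arg.len();
    if needs_quoting {
        cost += 2 + arg.chars().filter(|&c| c == '"' || c == '\\').count();
    }
    cost
}
```

On a 64-bit Unix target the pointer alone is 8 bytes, which lines up with the `+ 8` in the diff; a plain file path without spaces costs only its length plus one on Windows.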

samueltardieu · 2025-05-21T16:11:26Z

clippy_dev/src/utils.rs

+                cmd_len += arg.as_ref().len();
+                cmd_len += 8;
+
+                // Windows has a command length limit of 32767; stop before we hit that.


Suggested change

// Windows has a command length limit of 32767; stop before we hit that.

// Windows has a command length limit of 32767; stop before we hit that.

// Unix-like systems typically have at least 256k bytes (`getconf ARG_MAX`)

Unix is actually more complicated than that, but its limit will almost always be larger than Windows' (except on Cygwin, because Windows).

samueltardieu · 2025-05-21T16:12:55Z

clippy_dev/src/utils.rs

+
+                // Windows has a command length limit of 32767; stop before we hit that.
+                if cmd_len > 30000 {
+                    self.batch_size = (self.args.len() / self.thread_count).max(32);


You should probably use `.div_ceil()` here (see below), and store `min_batch_size` in the struct instead of using the hardcoded 32.

div_ceil is correct.

samueltardieu · 2025-05-21T16:14:10Z

clippy_dev/src/utils.rs

-    if len != 0 {
-        run_cmd(&mut cmd);
+    let thread_count = thread::available_parallelism().map_or(1, NonZero::get);
+    let batch_size = (args.len() / thread_count).max(min_batch_size);


I think you want `div_ceil()` for this kind of operation: if you have 2 threads and 65 files with a minimum batch size of 32, you want batches of 33+32, not 32+32+1.

Suggested change

let batch_size = (args.len() / thread_count).max(min_batch_size);

let batch_size = args.len().div_ceil(thread_count).max(min_batch_size);
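Putting the two batching suggestions together, here is a hypothetical sketch (not the code in `clippy_dev/src/utils.rs`) of an iterator that stores `min_batch_size` instead of hardcoding 32 and uses `div_ceil` both for the initial batch size and when a batch is cut short by the length limit. The 30000 threshold and the per-argument `+ 8` come from the diff; the struct, field, and constant names are assumptions.

```rust
use std::process::Command;

/// Stop adding arguments once the estimate passes this; the gap below the
/// Windows limit of 32767 is meant to absorb the argument that crosses it.
const MAX_CMD_LEN: usize = 30_000;
/// Per-argument allowance, matching the `+ 8` used in the diff.
const PER_ARG_OVERHEAD: usize = 8;

/// Hypothetical batcher: yields one `Command` per batch of file arguments.
struct BatchedCommands<'a> {
    program: &'a str,
    args: std::slice::Iter<'a, String>,
    batch_size: usize,
    min_batch_size: usize, // stored instead of hardcoding 32
    thread_count: usize,
}

impl<'a> BatchedCommands<'a> {
    fn new(program: &'a str, args: &'a [String], thread_count: usize, min_batch_size: usize) -> Self {
        let thread_count = thread_count.max(1); // avoid dividing by zero
        Self {
            program,
            args: args.iter(),
            // Round up so 65 files on 2 threads become 33 + 32, not 32 + 32 + 1.
            batch_size: args.len().div_ceil(thread_count).max(min_batch_size),
            min_batch_size,
            thread_count,
        }
    }
}

impl Iterator for BatchedCommands<'_> {
    type Item = Command;

    fn next(&mut self) -> Option<Command> {
        if self.args.len() == 0 {
            return None;
        }
        let mut cmd = Command::new(self.program);
        let mut cmd_len = self.program.len();
        let mut taken = 0;
        while taken < self.batch_size {
            let Some(arg) = self.args.next() else { break };
            cmd.arg(arg);
            taken += 1;
            cmd_len += arg.len() + PER_ARG_OVERHEAD;
            if cmd_len > MAX_CMD_LEN {
                // Too long: end this batch and spread the remaining
                // arguments over the threads in smaller batches.
                self.batch_size = self.args.len().div_ceil(self.thread_count).max(self.min_batch_size);
                break;
            }
        }
        Some(cmd)
    }
}
```

As in the diff, the length check runs after the argument is added, so a batch can exceed `MAX_CMD_LEN` by at most one argument.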

Jarcho · 2025-05-21T16:42:37Z

> I am a bit concerned that some of the Clippy maintainers or some of the users might use stray Rust files while working on Clippy, or test directories under the Clippy root. I wonder if we should be more conservative and start from known roots.

This is only a problem if people keep extra `.rs` files inside the Clippy directory (not via symlinks), run `cargo dev fmt`, and having them formatted is actually a problem. I don't think that's something to worry about.

Jarcho · 2025-05-21T19:11:16Z

All dot files and target directories are skipped now. Should be good enough to avoid any issues.
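A std-only sketch of a walk that skips every dot-entry and every `target` directory, at any depth, and collects `.rs` files. This is an illustration, not the clippy_dev implementation (which the thread suggests uses walkdir); `collect_rs_files` is a hypothetical name. Because `DirEntry::file_type` does not follow symlinks, symlinked files and directories are neither descended into nor collected.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects `.rs` files under `dir`, skipping any entry whose
/// name starts with `.` and any entry named `target`.
fn collect_rs_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name();
        let name = name.as_encoded_bytes();
        if name.starts_with(b".") || name == b"target" {
            continue;
        }
        let path = entry.path();
        // `file_type` reports symlinks as symlinks, so they fall through both branches.
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            collect_rs_files(&path, out)?;
        } else if file_type.is_file() && path.extension().is_some_and(|e| e == "rs") {
            out.push(path);
        }
    }
    Ok(())
}
```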

samueltardieu · 2025-05-21T20:24:17Z

`cargo dev fmt` is part of `cargo test`. If people routinely run `cargo test -F internal` before submitting a PR, they will get failures if they keep stray files in their git repository.

I seem to remember recently seeing people in a Zulip thread say that they keep a `t.rs` or similar in their working directory when they work on Clippy.

I wonder whether matching the beginning of the walkdir path against a list of expected prefixes would have any real impact on performance (using, e.g., a radix tree, or even an anchored Aho-Corasick automaton, though this might be overkill).
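For a handful of roots, a plain linear scan with `Path::starts_with` (which compares whole path components) is likely cheap enough that a radix tree or Aho-Corasick automaton would not pay off. This is a swapped-in, simpler technique, and the root list below is illustrative only, not an agreed set of Clippy directories.

```rust
use std::path::Path;

/// Illustrative allow-list of top-level directories to format (an assumption).
const KNOWN_ROOTS: &[&str] = &["clippy_dev", "clippy_lints", "src", "tests"];

/// Returns true if `relative` (a path relative to the repository root) lies
/// under one of the known roots. `Path::starts_with` matches whole components,
/// so `srcx/a.rs` does not count as being under `src`.
fn under_known_root(relative: &Path) -> bool {
    KNOWN_ROOTS.iter().any(|root| relative.starts_with(root))
}
```

With this approach, a stray `t.rs` at the repository root would simply not be formatted.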

samueltardieu · 2025-05-21T20:30:40Z

After thinking more about this, the worst that can happen is that people will get their temporary test code reformatted. That may be for the best.

rustbot assigned samueltardieu May 21, 2025

rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label May 21, 2025

Jarcho force-pushed the fmt_speed branch from 1353517 to 984af5b May 21, 2025 14:44

samueltardieu reviewed May 21, 2025


Jarcho force-pushed the fmt_speed branch from 984af5b to 6622d0f May 21, 2025 18:58

Jarcho added 2 commits May 21, 2025 15:09

Improve speed of cargo dev fmt.

544c300

Add expect_action helper to clippy_dev

106ac79

Jarcho force-pushed the fmt_speed branch from 6622d0f to 106ac79 May 21, 2025 19:09

samueltardieu added this pull request to the merge queue May 21, 2025

Merged via the queue into rust-lang:master with commit 3da4c10 May 21, 2025
13 checks passed
