Skip to content

Improve speed of cargo dev fmt #14862

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 21, 2025
Merged

Conversation

Jarcho
Copy link
Contributor

@Jarcho Jarcho commented May 21, 2025

This stops using cargo fmt and instead calls rustfmt directly with the list of all files.

All cargo fmt does is find the crate roots and passes the edition from cargo.toml. Since the edition is set in rustfmt.toml for the test files and we're already iterating through all the files this is not needed.

--skip-children is used since we already pass all the files, so the automatic detection isn't buying us anything other than running slower.

Second commit (part of the first commit now) is a change to only use the ignore option in rustfmt.toml rather than having a way in cargo dev fmt to ignore files.

r? @samueltardieu

changelog: none

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label May 21, 2025
Copy link
Contributor

@samueltardieu samueltardieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the speed improvement.

I am a bit concerned that some of the Clippy maintainers or some of the users might use stray Rust files while working on Clippy, or test directories under the Clippy root. I wonder if it we should be more conservative and start from known roots.

.as_os_str()
.as_encoded_bytes()
.get(2..)
.is_none_or(|x| x != "target".as_bytes() && x != ".git".as_bytes())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least one Clippy maintainer uses .jj which will have the same issue as .git:

Suggested change
.is_none_or(|x| x != "target".as_bytes() && x != ".git".as_bytes())
.is_none_or(|x| !matches!(x, b"target" | b".git" | b".jj"))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a problem, but I agree it should be skipped as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not a colocated git repo, all the git files will be under .jj/repo/store/git, which would not be filtered away, hence the importance of including it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. That is indeed a problem.

)
.map(|mut cmd| match cmd.spawn() {
Ok(x) => x,
Err(ref e) => panic_action(&e, ErrAction::Run, "rustfmt".as_ref()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more ergonomic to have panic_action() take a impl AsRef<Path> (and call .as_ref() in it) and to use &str directly:

Suggested change
Err(ref e) => panic_action(&e, ErrAction::Run, "rustfmt".as_ref()),
Err(e) => panic_action(&e, ErrAction::Run, "rustfmt"),

@@ -353,25 +330,17 @@ fn run_rustfmt(clippy: &ClippyInfo, update_mode: UpdateMode) {
{
eprintln!("{s}");
}
panic_action(&e, ErrAction::Run, name.as_ref());
panic_action(&e, ErrAction::Run, "rustfmt".as_ref());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
panic_action(&e, ErrAction::Run, "rustfmt".as_ref());
panic_action(&e, ErrAction::Run, "rustfmt");

},
},
Err(ref e) => panic_action(e, ErrAction::Run, name.as_ref()),
Err(ref e) => panic_action(e, ErrAction::Run, "rustfmt".as_ref()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Err(ref e) => panic_action(e, ErrAction::Run, "rustfmt".as_ref()),
Err(ref e) => panic_action(e, ErrAction::Run, "rustfmt"),

for arg in self.args.by_ref().take(self.batch_size) {
cmd.arg(arg.as_ref());
cmd_len += arg.as_ref().len();
cmd_len += 8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 8? Maybe add a small comment ("arbitrary safety gap"?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eight is for unix based things since the pointers in argv are part of the limit IIRC. Doesn't really matter since Windows has a way lower limit.

Windows is unfortunately not consistent here since it's one space plus whatever is needed to escape spaces within arguments. I can add a comment about all this mess.

cmd_len += arg.as_ref().len();
cmd_len += 8;

// Windows has a command length limit of 32767; stop before we hit that.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Windows has a command length limit of 32767; stop before we hit that.
// Windows has a command length limit of 32767; stop before we hit that.
// Unix-like systems typically have at least 256k bytes (`getconf ARG_MAX`)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unix is actually more complicated than that, but it will almost always be larger than Windows (except cygwin because Windows).


// Windows has a command length limit of 32767; stop before we hit that.
if cmd_len > 30000 {
self.batch_size = (self.args.len() / self.thread_count).max(32);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably use .div_ceil() here (see below), and import min_batch_size into the struct instead of using the hardcoded 32.

Copy link
Contributor Author

@Jarcho Jarcho May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

div_ceil is correct.

if len != 0 {
run_cmd(&mut cmd);
let thread_count = thread::available_parallelism().map_or(1, NonZero::get);
let batch_size = (args.len() / thread_count).max(min_batch_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want to use div_ceil() for this kind of operations, you want you have 2 threads and 65 files for a min batch size of 32, you would want batches of 33+32, not 32+32+1.

Suggested change
let batch_size = (args.len() / thread_count).max(min_batch_size);
let batch_size = args.len().div_ceil(thread_count).max(min_batch_size);

@Jarcho
Copy link
Contributor Author

Jarcho commented May 21, 2025

I am a bit concerned that some of the Clippy maintainers or some of the users might use stray Rust files while working on Clippy, or test directories under the Clippy root. I wonder if it we should be more conservative and start from known roots.

This is only a problem if people keep extra .rs files within the clippy directory not via symlinks, run dev fmt, and formatting them is a problem. I don't think that's actually a thing to worry about.

@Jarcho
Copy link
Contributor Author

Jarcho commented May 21, 2025

All dot files and target directories are skipped now. Should be good enough to avoid any issues.

@samueltardieu
Copy link
Contributor

cargo dev fmt is part of cargo test. If people routinely run cargo test -F internal before submitting a PR they will get failures if they keep things in their git repository.

I seem to remember having seen recently in a Zulip thread people saying that they have some t.rs or similar in their working directory when they work on Clippy.

I wonder if matching the beginning of the walkdir path against a list of expected prefixes would have any real impact performance (using, e.g., a radix-tree, or even an anchored Aho-Corasick but this might be overkill).

@samueltardieu
Copy link
Contributor

After thinking more about this, the worst that can happen is that people will get their temporary test code reformatted. That may be for the best.

@samueltardieu samueltardieu added this pull request to the merge queue May 21, 2025
Merged via the queue into rust-lang:master with commit 3da4c10 May 21, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants