fix(fr32): correctly calculate the todo size under low core counts #12884


Open
wants to merge 3 commits into
base: fix/unpadreader-panics

Conversation

tediou5
Contributor

@tediou5 tediou5 commented Feb 10, 2025

Related Issues

Fixes #9324
Based on #12491.

Proposed Changes

Currently, the unpadReader's work buffer is sized as MTTresh * mtChunkCount(sz). When the core count is low, the resulting buffer can be small enough that the todo length exceeds len(work).
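
To make the failure mode concrete, here is a minimal standalone sketch with made-up numbers (the MTTresh and mtChunkCount stand-ins are assumptions for illustration only; the real values live in readers.go):

// illustrative sketch, not part of the PR
package main

import "fmt"

func main() {
  const mtThresh = 4 // stand-in for MTTresh, illustrative only
  chunkCount := 1    // what mtChunkCount might yield on a low-core machine

  work := make([]byte, mtThresh*chunkCount)
  outTwoPow := 16 // output window, already rounded down to a power of two

  todo := outTwoPow             // without a clamp against len(work)...
  fmt.Println(todo > len(work)) // ...this prints true, and work[:todo] would
                                // fail with "slice bounds out of range"

  todo = min(todo, len(work)) // the clamp this PR adds (Go 1.21+ builtin min)
  fmt.Println(work[:todo])    // now safe: [0 0 0 0]
}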

In the current implementation, the length of out is expanded automatically by the caller (for example, io.ReadAll grows the buffer it passes to Read via append), which makes adjusting it quite troublesome:

// io.go
func ReadAll(r Reader) ([]byte, error) {
  b := make([]byte, 0, 512)
  for {
    n, err := r.Read(b[len(b):cap(b)])
    b = b[:len(b)+n]
    if err != nil { /* ... */ }

    if len(b) == cap(b) {
      // Add more capacity (let append pick how much).
      b = append(b, 0)[:len(b)] // <-- here
    }
  }
}
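
For context, a small standalone demonstration (not from the PR) of what the marked line does: append allocates a larger backing array, and the trailing re-slice keeps the length unchanged, so only the capacity handed to the next Read grows.

// illustration only
package main

import "fmt"

func main() {
  b := make([]byte, 0, 512)
  b = b[:cap(b)] // pretend Read has filled the whole buffer

  // append must allocate a bigger backing array; slicing back to len(b)
  // drops the extra element, leaving len unchanged but cap enlarged.
  b = append(b, 0)[:len(b)]

  fmt.Println(len(b), cap(b)) // 512 and a larger capacity; the exact growth is runtime-dependent
}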

There are two feasible solutions:

  1. Increase the minimum mtChunkCount to 16.
  2. Ensure that todo does not exceed the size of work.

I'm not entirely sure if other conditions could also lead to similar issues, so for now, I've chosen the second approach. I added a check when calculating todo to prevent overflow:

todo := min(abi.PaddedPieceSize(outTwoPow), abi.PaddedPieceSize(len(r.work)))

Additionally, I've fixed the CI.

Additional Info


@rjan90 rjan90 added the skip/changelog This change does not require CHANGELOG.md update label Feb 10, 2025
@rjan90 rjan90 requested a review from magik6k February 10, 2025 07:46
@BigLep BigLep requested a review from Copilot May 27, 2025 06:25
@BigLep
Member

BigLep commented May 27, 2025

I'm not up on the specifics of this PR, but I know it's been open for a long time. What is the worst case that could happen if this code is "wrong"? I'm trying to gauge whether we'd be better off just merging it than keeping it open.

Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR ensures the unpadReader never schedules more work than its buffer can hold under low core counts by clamping the todo size, and it bolsters the test suite with reliability checks and a new edge-case test.

  • Clamp the todo chunk size in readInner to the work buffer length
  • Add TestUnpadReaderBufWithSmallWorkBuf to verify behavior with very small buffers (a sketch of what such a test could look like follows this list)
  • Strengthen existing tests by validating the number of bytes read from rand.Read
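
For readers without the diff open, here is a hedged sketch of what such an edge-case test could look like. NewUnpadReaderBuf's exact signature and any minimum work-buffer size it enforces are assumptions inferred from the test name; the real test in readers_test.go may differ.

// Sketch only; not the actual test from this PR.
package fr32_test

import (
  "bytes"
  "crypto/rand"
  "io"
  "testing"

  "github.com/filecoin-project/go-state-types/abi"
  "github.com/stretchr/testify/require"

  "github.com/filecoin-project/lotus/storage/sealer/fr32"
)

func TestUnpadReaderBufWithSmallWorkBuf(t *testing.T) {
  ps := abi.PaddedPieceSize(1 << 20) // 1 MiB padded piece

  raw := make([]byte, ps.Unpadded())
  n, err := rand.Read(raw)
  require.NoError(t, err)
  require.Equal(t, len(raw), n) // the "validate rand.Read" tightening mentioned above

  padded := make([]byte, ps)
  fr32.Pad(raw, padded)

  // A deliberately tiny work buffer, so todo must be clamped to len(work)
  // (assuming NewUnpadReaderBuf does not enforce a larger minimum).
  work := make([]byte, 16<<10)
  r, err := fr32.NewUnpadReaderBuf(bytes.NewReader(padded), ps, work)
  require.NoError(t, err)

  got, err := io.ReadAll(io.LimitReader(r, int64(ps.Unpadded())))
  require.NoError(t, err)
  require.Equal(t, raw, got)
}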

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

storage/sealer/fr32/readers.go: clamp todo to len(r.work) in readInner to prevent overflow
storage/sealer/fr32/readers_test.go: add new buffer-size edge-case test and tighten randomness reads in tests
Comments suppressed due to low confidence (1)

storage/sealer/fr32/readers_test.go:40

  • [nitpick] The test name is a bit verbose; renaming it to TestUnpadReaderWithSmallWorkBuf would align better with existing test naming patterns and improve readability.
func TestUnpadReaderBufWithSmallWorkBuf(t *testing.T) {

Co-authored-by: Copilot <[email protected]>
@tediou5
Contributor Author

tediou5 commented May 28, 2025

I'm not up on the specifics of this PR, but I know it's been open for a long time. What is the worst case that could happen if this code is "wrong"? I'm trying to gauge whether we'd be better off just merging it than keeping it open.

There's less data to read each time, which affects performance, I guess.

@rvagg
Member

rvagg commented May 28, 2025

I've found it hard to context switch deep enough to grok either this or the previous attempt to edit this code, so it's hard for me to weigh in.

There's less data to read each time, which affects performance, I guess.

This seems right but I don't have time to validate it. I think if we just go ahead and roll this out, we'd find out about problems with it if we started hearing about CommP mismatches. We might just have to 🤞 and go ahead unless we have someone else who has the brainspace to drop in here and grok what this code is doing and give a proper 👍 . Maybe Kuba could do that when he's back from his break?

@BigLep BigLep requested a review from Kubuxu May 28, 2025 04:46
Labels
skip/changelog This change does not require CHANGELOG.md update
Projects
Status: 🔎 Awaiting Review
4 participants