
[wip] adding a part for pandas.df to hash_value #575


Closed
wants to merge 1 commit into main

Conversation

@djarecka (Collaborator) commented Sep 2, 2022

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Summary

Adding a section to hash_value for pandas.DataFrame: converting the DataFrame to a dict before calculating the hash value.

@yibeichan - could you please see if this small change fixes some of your issues? It did fix the issue I had with running my workflow within a Jupyter notebook...
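
A minimal sketch of the idea described above (the actual change lives in pydra/engine/helpers.py; names and the hashing scheme are simplified here):

```python
import hashlib

import pandas as pd


def hash_value(value):
    """Sketch of the DataFrame branch described in this PR.

    The real hash_value in pydra/engine/helpers.py handles many more
    types; this shows only the new idea: convert a DataFrame to a
    plain dict so the hash reflects its contents rather than the
    object's identity.
    """
    if isinstance(value, pd.DataFrame):
        # to_dict() yields a nested {column: {index: cell}} mapping;
        # equal DataFrames give equal dicts, so they hash identically.
        value = value.to_dict()
    return hashlib.sha256(repr(value).encode()).hexdigest()
```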

Checklist

  • I have added tests to cover my changes (if necessary)
  • I have updated documentation (if necessary)

@codecov (bot) commented Sep 2, 2022

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 77.04%. Comparing base (219c721) to head (875fd17).
Report is 1306 commits behind head on main.

Files with missing lines   Patch %   Lines
pydra/engine/helpers.py    0.00%     1 Missing and 1 partial ⚠️

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #575      +/-   ##
==========================================
- Coverage   77.06%   77.04%   -0.02%     
==========================================
  Files          20       20              
  Lines        4316     4322       +6     
  Branches     1213     1217       +4     
==========================================
+ Hits         3326     3330       +4     
- Misses        802      803       +1     
- Partials      188      189       +1     
Flag        Coverage Δ
unittests   76.95% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


@yibeichan (Collaborator) commented

@djarecka thank you! I'll try it today and let you know.

@yibeichan (Collaborator) commented

@djarecka this change is specific to a list of pandas.DataFrame inputs, correct?
My notebook works fine with or without this change, perhaps because I'm using Python 3.7. If it solves your problem, it can probably solve similar problems on later Python versions as well.
However, when I exported my notebook as a .py script I got errors (not related to your change here):

File "/Users/yibeichen/miniconda3/envs/pydra/lib/python3.7/asyncio/futures.py", line 181, in result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

This error happens after secondlevel_estimation, when the statistical testing starts.
Only the .py script hits this error; the notebook works fine.
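
A common cause of BrokenProcessPool when a notebook is exported to a plain .py script is pool-spawning code running at module top level without a main guard. A minimal sketch of the usual fix (a general pattern, not a confirmed diagnosis of the error above):

```python
from concurrent.futures import ProcessPoolExecutor


def work(x):
    return x * x


if __name__ == "__main__":
    # Without this guard, the "spawn" start method re-imports the
    # module in each worker, re-executing the pool-creating code,
    # which can abort the pool with BrokenProcessPool.
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(work, range(4))))
```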

@tclose (Contributor) commented Apr 2, 2025

I think this should be covered by the sequence hashing functionality, shouldn't it @effigies?

@effigies (Contributor) commented Apr 2, 2025

Seems like it's worth just testing.
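
A quick way to test it, sketched against the hypothetical hash_value above (Pydra's real hashing entry point may differ):

```python
import pandas as pd


def test_dataframe_hashes():
    df_a = pd.DataFrame({"x": [1, 2]})
    df_b = pd.DataFrame({"x": [3, 4]})
    # Different contents must not collide...
    assert hash_value(df_a) != hash_value(df_b)
    # ...while equal DataFrames must hash identically.
    assert hash_value(df_a) == hash_value(df_a.copy())
```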

@tclose (Contributor) commented Apr 3, 2025

Doesn't work: different data frames evaluate to the same hash... Looks like we need to do some more work on the backstop object hash.

At the moment we are a bit stuck no matter which way we err. If the same value produces a different hash, you run the risk of workflows getting stuck halfway through because the downstream node can't identify the upstream node's result (bad). However, if different values map onto the same hash, you run the risk of producing the wrong results (worse). If #784 is implemented we could avoid workflows getting stuck, and then we could just default to cloudpickle to guarantee at least that different values map onto different hashes, and just throw a warning that the cache is likely to be missed in subsequent runs.
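
A hedged sketch of the cloudpickle backstop idea (illustrative names; not Pydra's actual API):

```python
import hashlib

import cloudpickle  # third-party: pip install cloudpickle
import pandas as pd


def backstop_hash(obj) -> str:
    """Fallback hash for types with no dedicated hashing rule.

    Hashing the cloudpickle byte stream makes it overwhelmingly
    likely that different values get different hashes, at the cost
    that equal values may hash differently across runs, so a warning
    about likely cache misses would be appropriate.
    """
    return hashlib.blake2b(cloudpickle.dumps(obj), digest_size=16).hexdigest()


df_a = pd.DataFrame({"x": [1, 2]})
df_b = pd.DataFrame({"x": [3, 4]})
# The byte-level backstop distinguishes DataFrames that a naive
# object hash could conflate.
assert backstop_hash(df_a) != backstop_hash(df_b)
```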

@github-project-automation github-project-automation bot moved this to In progress in Pydra Roadmap Apr 29, 2025
@tclose tclose moved this from In progress to Triage in Pydra Roadmap Apr 29, 2025
@tclose tclose moved this from Triage to Incomplete PRs in Pydra Roadmap Apr 29, 2025
@tclose tclose moved this from Incomplete PRs to v1.0 in Pydra Roadmap Apr 29, 2025
@tclose tclose moved this from v1.0 to Incomplete PRs in Pydra Roadmap Apr 29, 2025
@tclose (Contributor) commented May 8, 2025

closing in favour of #762

@tclose tclose closed this May 8, 2025
@github-project-automation github-project-automation bot moved this from Bugs & Incomplete PRs to Done in Pydra Roadmap May 8, 2025
@tclose tclose removed this from Pydra Roadmap Jun 2, 2025