
[wip] adding a part for pandas.df to hash_value #575


Closed
wants to merge 1 commit into main

Conversation

@djarecka (Collaborator) commented Sep 2, 2022

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Summary

Adding a section to hash_value for pandas.DataFrame: converting the DataFrame to a dict before calculating the hash value.

@yibeichan - could you please see if this small change fixes some of your issues? It did fix the issue I had with running my workflow within a Jupyter notebook...
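
A minimal sketch of the idea described above (the actual change lives in pydra/engine/helpers.py; names and the hashing scheme are simplified here):

```python
import hashlib

import pandas as pd


def hash_value(value):
    """Sketch of the DataFrame branch described in this PR.

    The real hash_value in pydra/engine/helpers.py handles many more
    types; this shows only the new idea: convert a DataFrame to a
    plain dict so the hash reflects its contents rather than the
    object's identity.
    """
    if isinstance(value, pd.DataFrame):
        # to_dict() yields a nested {column: {index: cell}} mapping;
        # equal DataFrames give equal dicts, so they hash identically.
        value = value.to_dict()
    return hashlib.sha256(repr(value).encode()).hexdigest()
```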

Checklist

  • I have added tests to cover my changes (if necessary)
  • I have updated documentation (if necessary)

@codecov (bot) commented Sep 2, 2022

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 77.04%. Comparing base (219c721) to head (875fd17).
Report is 1306 commits behind head on main.

Files with missing lines   Patch %   Lines
pydra/engine/helpers.py    0.00%     1 Missing and 1 partial ⚠️

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #575      +/-   ##
==========================================
- Coverage   77.06%   77.04%   -0.02%     
==========================================
  Files          20       20              
  Lines        4316     4322       +6     
  Branches     1213     1217       +4     
==========================================
+ Hits         3326     3330       +4     
- Misses        802      803       +1     
- Partials      188      189       +1     
Flag        Coverage Δ
unittests   76.95% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


@yibeichan (Collaborator) commented

@djarecka thank you! I'll try it today and let you know.

@yibeichan (Collaborator) commented

@djarecka this change is specific to a list of pandas.DataFrame inputs, correct?
My notebook works fine with or without this change, perhaps because I'm using Python 3.7. If it solves your problem, it can probably solve similar problems on later Python versions as well.
However, when I exported my notebook as a .py script I got errors (not related to your change here):

File "/Users/yibeichen/miniconda3/envs/pydra/lib/python3.7/asyncio/futures.py", line 181, in result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

This error happens after secondlevel_estimation, when the statistical testing starts.
Only the .py script hits this error; the notebook works fine.
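
A common cause of BrokenProcessPool when a notebook is exported to a plain .py script is pool-spawning code running at module top level without a main guard. A minimal sketch of the usual fix (a general pattern, not a confirmed diagnosis of the error above):

```python
from concurrent.futures import ProcessPoolExecutor


def work(x):
    return x * x


if __name__ == "__main__":
    # Without this guard, the "spawn" start method re-imports the
    # module in each worker, re-executing the pool-creating code,
    # which can abort the pool with BrokenProcessPool.
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(work, range(4))))
```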

@tclose (Contributor) commented Apr 2, 2025

I think this should be covered by the sequence hashing functionality, shouldn't it @effigies?

@effigies (Contributor) commented Apr 2, 2025

Seems like it's worth just testing.
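
A quick way to test it, sketched against the hypothetical hash_value above (Pydra's real hashing entry point may differ):

```python
import pandas as pd


def test_dataframe_hashes():
    df_a = pd.DataFrame({"x": [1, 2]})
    df_b = pd.DataFrame({"x": [3, 4]})
    # Different contents must not collide...
    assert hash_value(df_a) != hash_value(df_b)
    # ...while equal DataFrames must hash identically.
    assert hash_value(df_a) == hash_value(df_a.copy())
```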

@tclose (Contributor) commented Apr 3, 2025

Doesn't work: different data frames evaluate to the same hash... Looks like we need to do some more work on the backstop object hash.

At the moment we are a bit stuck no matter which way we err. If the same value produces a different hash, you run the risk of workflows getting stuck halfway through because the downstream node can't identify the upstream node's result (bad). However, if different values map onto the same hash, you run the risk of producing the wrong results (worse). If #784 is implemented we could avoid workflows getting stuck, and then we could just default to cloudpickle to guarantee at least that different values map onto different hashes, and just throw a warning that the cache is likely to be missed in subsequent runs.
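
A hedged sketch of the cloudpickle backstop idea (illustrative names; not Pydra's actual API):

```python
import hashlib

import cloudpickle  # third-party: pip install cloudpickle
import pandas as pd


def backstop_hash(obj) -> str:
    """Fallback hash for types with no dedicated hashing rule.

    Hashing the cloudpickle byte stream makes it overwhelmingly
    likely that different values get different hashes, at the cost
    that equal values may hash differently across runs, so a warning
    about likely cache misses would be appropriate.
    """
    return hashlib.blake2b(cloudpickle.dumps(obj), digest_size=16).hexdigest()


df_a = pd.DataFrame({"x": [1, 2]})
df_b = pd.DataFrame({"x": [3, 4]})
# The byte-level backstop distinguishes DataFrames that a naive
# object hash could conflate.
assert backstop_hash(df_a) != backstop_hash(df_b)
```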

@github-project-automation github-project-automation bot moved this to In progress in Pydra Roadmap Apr 29, 2025
@tclose tclose moved this from In progress to Triage in Pydra Roadmap Apr 29, 2025
@tclose tclose moved this from Triage to Incomplete PRs in Pydra Roadmap Apr 29, 2025
@tclose tclose moved this from Incomplete PRs to v1.0 in Pydra Roadmap Apr 29, 2025
@tclose tclose moved this from v1.0 to Incomplete PRs in Pydra Roadmap Apr 29, 2025
@tclose (Contributor) commented May 8, 2025

closing in favour of #762

@tclose tclose closed this May 8, 2025
@github-project-automation github-project-automation bot moved this from Bugs & Incomplete PRs to Done in Pydra Roadmap May 8, 2025
@tclose tclose removed this from Pydra Roadmap Jun 2, 2025