Summary:
Previously, we only caught CUDA OOM errors, but benchmarks can fail in other
ways too. Let's make the benchmarking code more robust by catching and logging
all exceptions.
While there is already code to log exception messages, it often produced
malformed CSVs because the messages were not quoted. We now use Python's csv
module to avoid this issue.
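To illustrate the quoting problem, here is a small sketch using a made-up error message that contains commas, double quotes, and a newline. Joining fields by hand splits the message across several columns and rows, while `csv.writer` quotes the field so it reads back as a single cell.

```python
import csv
import io

# Made-up error message containing commas, quotes, and a newline.
error = 'RuntimeError: shape mismatch, expected (2, 3), got "4"\nsee log'

# Naive approach: join fields with commas and no quoting.
naive = ",".join(["bench_a", error]) + "\n"
naive_rows = list(csv.reader(io.StringIO(naive)))
print(naive_rows)  # the message is split into extra columns and an extra row

# csv.writer quotes fields that contain delimiters, quotes, or newlines.
buf = io.StringIO()
csv.writer(buf).writerow(["bench_a", error])
row = next(csv.reader(io.StringIO(buf.getvalue())))
print(row == ["bench_a", error])  # the message round-trips as one field
```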
Additionally, the previous logic recorded the error message in every metric
column of the failed benchmark. This was redundant, so I've changed it to emit
the message only once.
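One possible row layout that records the message once, sketched under the assumption of a dedicated `error` column after the metric columns. The column names and the `make_row` helper are hypothetical; the commit does not specify where the single message goes.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical metric columns; the real harness defines its own.
METRICS = ["latency_ms", "memory_mb"]
HEADER = ["name"] + METRICS + ["error"]


def make_row(name: str, metrics: Optional[Dict[str, float]], error: Optional[str]) -> List[str]:
    """Build one CSV row; a failed benchmark gets empty metric cells and a single error cell."""
    if error is not None:
        return [name] + [""] * len(METRICS) + [error]
    if metrics is None:
        raise ValueError("successful rows need metrics")
    return [name] + [str(metrics[m]) for m in METRICS] + [""]


buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(HEADER)
writer.writerow(make_row("bench_a", {"latency_ms": 1.5, "memory_mb": 512.0}, None))
writer.writerow(make_row("bench_b", None, "CUDA out of memory, tried to allocate 2 GiB"))
print(buf.getvalue())
```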
Finally, because Python's csv writer writes directly to a file instead of
building a string first, the previous CSV naming convention, which used a hash
of the file's contents, no longer works. Instead, I've used NamedTemporaryFile
to get a unique file name.
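A minimal sketch of writing results to a uniquely named file with `tempfile.NamedTemporaryFile`. The `write_results` helper and the `benchmark_` prefix are hypothetical; `delete=False` is what keeps the file on disk after it is closed.

```python
import csv
import tempfile
from typing import Iterable, List


def write_results(rows: Iterable[List[str]], out_dir: str) -> str:
    """Write rows to a uniquely named CSV file in out_dir and return its path (hypothetical helper)."""
    # delete=False keeps the file after it is closed; newline="" is what the
    # csv module documentation recommends for files passed to csv.writer.
    with tempfile.NamedTemporaryFile(
        mode="w",
        newline="",
        suffix=".csv",
        prefix="benchmark_",  # hypothetical prefix
        dir=out_dir,
        delete=False,
    ) as f:
        csv.writer(f).writerows(rows)
    return f.name


out_dir = tempfile.mkdtemp()
rows = [["name", "error"], ["bench_b", "boom, with a comma"]]
first = write_results(rows, out_dir)
second = write_results(rows, out_dir)
print(first != second)  # each call gets a distinct file name
```

Unlike a content hash, the generated name does not depend on the data, so the file can be streamed row by row without buffering it in memory first.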
Reviewed By: chenyang78
Differential Revision: D57785120
fbshipit-source-id: 73c76bba7661b60a7357aaba3d5b9659b533479e