Skip to content

RHOAIENG-16568: [Bug] Unable to download notebook as a PDF from JupyterLab Workbenches #1030

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Jun 4, 2025

Conversation

jesuino
Copy link
Contributor

@jesuino jesuino commented May 2, 2025

This is a fix for issue #210

Description

It install TexLive following instructions from TexLive Quick Installation guide. It uses the minimal scheme and other important dependencies are added.

The full installation increased the image size by a few GB. Additionally, the full installation adds more vulnerabilities to the published image. So starting from a minimal installation is a must. To compare images with different installation methods, check the ones I have published in my quay.io, the oldest one is the full TexLive installation

Notice also that the packages required for adding the required dependencies are not available in UBI9 base image, hence why we had to manually install this.

Finally we also disable other option that does not work.

How Has This Been Tested?

Manually tested by creating and importing some notebooks and then exporting it to PDF.

To test:

  • Build locally or pull my image with this PR changes:
podman pull quay.io/wsiqueir/workbench-images:jupyter-minimal-ubi9-python-3.11-pdf_and_disabled_formats_20250502
  • Run the image:
podman run -it -p 8888:8888  quay.io/wsiqueir/workbench-images:jupyter-minimal-ubi9-python-3.11-pdf_and_disabled_formats_20250502
  • Go to the JupyterLab URL, create a notebook and try to export it using the option Save and Export Notebook As.
  • You should be able to export the notebook as PDF with no issue.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@openshift-ci openshift-ci bot added the size/s label May 2, 2025
@openshift-ci openshift-ci bot requested review from caponetto and dibryant May 2, 2025 17:59
@atheo89
Copy link
Member

atheo89 commented May 5, 2025

Thanks, William, for the fix! The implementation looks good to me. I ran the minimal image generated from this PR locally, and I was able to successfully generate the PDF 🥳.Aside from Jiri’s comments, I noticed a few extra blank white lines that I’d like to remove. Also, it seems the TexLive library increases the size of the minimal by approximately 700 MB.

/lgtm

@@ -8,6 +8,9 @@ if [ -f "${SCRIPT_DIR}/utils/setup-elyra.sh" ]; then
source ${SCRIPT_DIR}/utils/setup-elyra.sh
fi

export PATH=/usr/local/texlive/2025/bin/x86_64-linux:$PATH
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestion on where to set the path? I am currently doing it in start-notebook.sh

Could be set in Dockerfile itself, but it appears we don't do it anywhere else this way.

Alternatively, leave this and let the test set its PATH anew, like in

f"bash -c 'RSTUDIO_PANDOC=/usr/lib/rstudio-server/bin/quarto/bin/tools/x86_64 Rscript {self.APP_ROOT_HOME}/script.R'",

@openshift-ci openshift-ci bot added size/m and removed size/m labels May 6, 2025
@jesuino
Copy link
Contributor Author

jesuino commented May 6, 2025

Hello @jiridanek I added a test and changed the path configuration to the Dockerfile - please let me know if I am missing something. Thanks!

@openshift-ci openshift-ci bot added size/m and removed size/m labels May 6, 2025
@openshift-ci openshift-ci bot added size/l and removed size/m labels May 6, 2025
@jiridanek jiridanek requested review from jiridanek and harshad16 May 30, 2025 15:51
Copy link
Member

@harshad16 harshad16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Excellent work
tested with following:

separate question:
@jesuino , can you share for knowledge share, if version of pandoc 3.6.4 was consider for some special reason instead of the 3.7 ? so that , these reason can be consider for next upgrades.

@openshift-ci openshift-ci bot added the lgtm label Jun 2, 2025
@jesuino
Copy link
Contributor Author

jesuino commented Jun 2, 2025

@harshad16 thanks for testing it!

can you share for knowledge share, if version of pandoc 3.6.4 was consider for some special reason instead of the 3.7 ? so that , these reason can be consider for next upgrades.

No special reason. I tested with latest pandoc and it works, so I updated the version

@jesuino
Copy link
Contributor Author

jesuino commented Jun 2, 2025

@jiridanek thanks for letting me know about this config. I did add an entry for pandoc, but I don't know if I had to add some specific name, so I used rpm.pandoc.ignore

@openshift-ci openshift-ci bot added size/l and removed size/l labels Jun 2, 2025
@jiridanek
Copy link
Member

jiridanek commented Jun 2, 2025

I don't know if I had to add some specific name, so I used rpm.pandoc.ignore

🤞🏽

I never know this either. I normally just try this again and again until I figure out something that works. If you look what the gha runs, it should not be hard to reproduce on your own laptop, to iterate faster. (We do have open ticket to document, #1086)

@jiridanek jiridanek requested a review from harshad16 June 2, 2025 19:23
@jiridanek
Copy link
Member

jiridanek commented Jun 2, 2025

Did not work. I can check tomorrow morning and try figuring something out, so that you don't waste time on this. Imo what you'd have to do add it to the plain [[ignore]] block, because the reported error does not mention any rpm or nothing

[[ignore]]
error = "ErrNotDynLinked"
files = [
    # executable is not dynamically linked
    "/opt/app-root/bin/py-spy",
    "/usr/local/pandoc/bin/pandoc",
]

edit: that ^^^ worked for me in https://github.com/jiridanek/notebooks/actions/runs/15404297288, trying to push the commit here

@openshift-ci openshift-ci bot added size/l and removed size/l labels Jun 3, 2025
@jesuino
Copy link
Contributor Author

jesuino commented Jun 3, 2025

Hello @jiridanek thanks again for ignoring that warning. I see that only image cuda-jupyter-tensorflow-ubi9-python-3.11 is failing the test. I am checking what could be wrong and I will update this PR soon.

@jesuino
Copy link
Contributor Author

jesuino commented Jun 3, 2025

I could not reproduce this failing test locally. Could this be a flaky test?

To run locally first I build the image:

make cuda-jupyter-tensorflow-ubi9-python-3.11 -e  IMAGE_REGISTRY=quay.io/wsiqueir/workbench-images  -e  RELEASE=test_installer_path_with_pandoc

Then I run the test:

uv run pytest tests/containers/workbenches/jupyterlab/jupyterlab_test.py -k test_pdf_export -m 'not openshift' --image quay.io/wsiqueir/workbench-images:jupyter-datascience-ubi9-python-3.11-test_installer_path_with_pandoc_20250528

@jiridanek
Copy link
Member

jiridanek commented Jun 3, 2025

@jesuino no, it's a flaky image build! If you check build logs, you see

[4/6] STEP 8/12: RUN ./utils/install_pdf_deps.sh
Installing TexLive to allow PDf export from Notebooks
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   284  100   284    0     0    393      0 --:--:-- --:--:-- --:--:--   393

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0

[ ... snip ... ]

  0     0    0     0    0     0      0      0 --:--:--  0:02:16 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:02:16 --:--:--     0
curl: (28) Failed to connect to ctan.math.illinois.edu port [443](https://github.com/opendatahub-io/notebooks/actions/runs/15411712114/job/43365000822?pr=1030#step:24:444): Connection timed out
./utils/install_pdf_deps.sh: line 8: install-tl-unx.tar.gz: No such file or directory
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
./utils/install_pdf_deps.sh: line 9: cd: install-tl-2*: No such file or directory
Can't open perl script "./install-tl": No such file or directory
./utils/install_pdf_deps.sh: line 11: cd: /usr/local/texlive/bin/x86_64-linux: No such file or directory
./utils/install_pdf_deps.sh: line 12: ./tlmgr: No such file or directory
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

100 32.1M  100 32.1M    0     0   130M      0 --:--:-- --:--:-- --:--:--  130M
pandoc-3.7.0.2/share/
pandoc-3.7.0.2/share/man/
pandoc-3.7.0.2/share/man/man1/
pandoc-3.7.0.2/share/man/man1/pandoc.1.gz
pandoc-3.7.0.2/share/man/man1/pandoc-server.1.gz
pandoc-3.7.0.2/share/man/man1/pandoc-lua.1.gz
pandoc-3.7.0.2/bin/
pandoc-3.7.0.2/bin/pandoc
pandoc-3.7.0.2/bin/pandoc-server
pandoc-3.7.0.2/bin/pandoc-lua
--> dadc6631aca6

And this RUN ./utils/install_pdf_deps.sh step exited with ecode 0 without actually installing xelatex.

@jesuino
Copy link
Contributor Author

jesuino commented Jun 3, 2025

thank you @jiridanek I expected the build to fail in such cases (fail fast), hence why I didn't check the logs, saw the failed test and thought it was the culprit!

Copy link
Member

@harshad16 harshad16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the upgrade to latest version
work as expected

Tested on:
podman run -p 8888:8888 quay.io/opendatahub/workbench-images:jupyter-datascience-ubi9-python-3.11-pr-1030

The export of PDF works as expected.

(app-root) pandoc --version
pandoc 3.7.0.2
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: /opt/app-root/src/.local/share/pandoc
Copyright (C) 2006-2024 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm label Jun 4, 2025
Copy link
Contributor

openshift-ci bot commented Jun 4, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: harshad16

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Jun 4, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 858b1b8 into opendatahub-io:main Jun 4, 2025
52 of 53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm size/l tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants