-
Notifications
You must be signed in to change notification settings - Fork 5
Add File System Support via pyodide.FS #37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…lzer4/add_fyle_system
Fullzer4/add fyle system
fix: conventions
fix: _build_command place convention
This comment was marked as outdated.
This comment was marked as outdated.
Update: Feature Complete & Ready for ReviewQuick Summary
StructuredTool MigrationThe switch to StructuredTool enables dynamic descriptions that automatically list available files, improving developer experience by providing clear visibility into attached resources. Technical DetailsThis implementation focuses only on MEMFS support as documented in the Pyodide File System documentation. Ready for review! 🚀 Open to feedback and suggestions for improvements. @eyurtsev @vbarda Could you please review when you have a chance? Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fullzer4 this PR is really amazing. Thank you for putting all of this together! I left a few questions about the extra entrypoints in the code and would like to figure out how to avoid them if possible.
Thanks also for fixing the various issues in the existing code! You caught more than one bug!
I think we have to pass the file bytes via stdin rather than as arg as most OSs impose a max limit on the size of the arg value that can be passed |
Fix/conventions and patterns
@eyurtsev I've completed the filesystem implementation with stdin. It's working well and solves the CLI argument size limitations we discussed. I chose a binary protocol over JSON stdin for file transfer, implemented with ReadableStream for efficient binary data handling, because it preserves binary data integrity and avoids "Maximum call stack size exceeded" errors that occur when sending large files as JSON through stdin. Would appreciate your thoughts on this approach. I'm already working on the output functionality with a get_file method to extract files from the sandbox after execution. Will submit this in a separate PR once it's ready and tested. Regarding the permissions flags, I had to make changes to fix critical errors like: Error: Requires write access to "/home/fullzer4/path/node_modules/.deno/[email protected]/node_modules/pyodide/micropip-0.8.0-py3-none-any.whl" and Code NotCapable: Requires read access to "/path/node_modules/.deno/[email protected]/node_modules/pyodide/pyodide.asm.wasm" The approach follows the principle of least privilege while ensuring everything works properly. For the description part, I implemented a template-based system with _description_template and _build_description() that dynamically shows available files to the LLM: A secure Python code sandbox with filesystem support... ATTACHED FILES AVAILABLE: • data.csv • config.json These files are already loaded and ready to use... The implementation now guarantees that when users provide custom descriptions, they are respected exactly as specified, while still supporting dynamic file information through an optional placeholder system. I kept the filesystem methods (attach_file, create_directory, etc.) as they provide a clean, intuitive API. These methods make working with sandbox files much simpler than reimplementing these operations in user code. I've also implemented the constructor-based file attachment you suggested, so users can now provide files directly when creating the sandbox tool: sandbox_tool = PyodideSandboxTool(
allow_net=True,
files={
"sales.csv": sales_data
}
) I decided to maintain both approaches - constructor-based initialization and method-based file operations - because they serve different use cases. The constructor approach is perfect for setting up files at creation time in a clean, declarative style, while the method-based approach allows for more dynamic file manipulation during the tool's lifecycle. This flexibility gives users the best of both worlds without forcing them into a single pattern. Let me know if you'd like me to adjust anything before merging. |
Hi @eyurtsev, I hope you're doing well! I just wanted to gently check in on the filesystem support PR when you have a moment. I'm eager to hear your thoughts. Thank you so much! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few questions and I'd like to remove the mutability and and various help methods on the Python side.
I can take care of the changes on the python side if you'd like me to!
If you want to get the changes in more quickly, we can also break this PR into smaller ones (e.g., just tackling the typescript side first.)
): | ||
self.description = self._build_description() | ||
|
||
def attach_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we remove all the extra helper methods to avoid mutability and multiple ways of achieving the same result? Passing files
through the init is enough for both the tool and the underlying Sandbox.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included the helper methods (attach_file, create_directory, etc.) primarily for dynamic use cases where files need to be added after tool initialization - particularly useful in LangGraph nodes where the tool might need to attach files based on previous execution results.
Without these methods, users would need to recreate the entire tool instance or implement custom file management logic. The methods provide a clean API for: "tool running → needs new file → attach → continue" etc...
However, I understand the preference for immutability. If you'd like to remove them for simplicity, I can refactor to constructor-only approach.
_sync_sandbox: SyncPyodideSandbox | None = PrivateAttr(default=None) | ||
_custom_description: bool = PrivateAttr(default=False) | ||
|
||
def model_post_init(self, /, __context) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use pydantic's model_post_init instead of placing the code after the init (super().init()) call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used model_post_init
specifically to handle Pydantic's field initialization lifecycle. When trying to update the description
field after super().__init__()
, Pydantic had already frozen the field state, causing silent failures when attach_file()
tried to update the description dynamically.
model_post_init
ensures the description is properly built after all Pydantic initialization is complete, allowing proper field mutation for the dynamic file information feature.
Thanks for offering to help! I'm happy to handle the remaining changes on the Python side myself - there's no rush on my end to get these changes in immediately. I've already addressed some of your review points with recent commits (specifically the unnecessary filesystem operations and permission model reversion). If you prefer breaking this into smaller PRs as you suggested, I'm open to that approach too, but I don't want you to worry about making the changes yourself unless you specifically want to. I'm comfortable implementing the remaining feedback items based on your guidance. |
Add File System Support via pyodide.FS
This PR implements comprehensive file system operations in the PyodideSandbox, addressing issue #34 (Allow attaching files) and issue #26 (Preserve newlines when printing).
Key Features
File System Operations
attach_file()
andattach_files()
open()
,Path()
, and standard file operationsBug Fixes
print(5); print(5)
now correctly outputs5\n5
instead of55
Implementation Details
FileSystemOperation
interface for defining file operationsmain.ts
using pyodide.FSUsage Example
References