[content-publishing] Deal with brittle IPFS service #834

JoeCap08055 opened this issue May 20, 2025 · 0 comments

Description

Real-world data from the "Freesky" project has shown us that, at least with a "locally"-deployed Kubo IPFS node, the IPFS service can be slow & brittle. Performance and other issues with the IPFS node can cause problems with content uploading.

Specifically, the recent addition of the /v2/assets/upload endpoint that streams directly to IPFS does not isolate the client from problems with the IPFS service. This endpoint was developed to address 2 issues:

  1. In the /v1/assets/upload endpoint, files were not only initially buffered in application heap memory, but were also cached in Redis (putting extra load on Redis, and also subject to Redis' 512MB limit on the size of any individual value). Streaming directly eliminates both the buffering and the caching.
  2. In the /v1/assets/upload workflow, the client gets an immediate response without confirmation that the asset has reached its final disposition in IPFS, leading to race conditions where the client immediately makes a request to announce previously-uploaded content, but the content cannot yet be verified on IPFS.

Unfortunately, it may be that the direct-stream approach is not robust enough. One advantage of the previous approach was that it utilized task queues with automatic retry to marshal IPFS uploads, and isolated the client from direct IPFS errors.
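For reference, here is a minimal sketch of the kind of retry behavior a BullMQ-backed queue gives us out of the box (the queue name, connection settings, and the pinToIpfs helper are illustrative assumptions, not the actual names used in content-publishing-service):

```typescript
import { Queue, Worker } from 'bullmq';

const connection = { host: 'localhost', port: 6379 };

// Placeholder for the real IPFS client call (e.g. via Kubo's HTTP API).
async function pinToIpfs(filePath: string): Promise<void> {
  // ...stream the file to the IPFS node here...
}

// Out-of-band upload queue (hypothetical queue name).
const assetUploadQueue = new Queue('assetUpload', { connection });

// Enqueue an upload with automatic retries and exponential backoff, so a
// flaky IPFS node produces retries instead of a client-facing error.
await assetUploadQueue.add(
  'uploadAsset',
  { filePath: '/tmp/uploads/abc123' },
  { attempts: 5, backoff: { type: 'exponential', delay: 2000 } },
);

// Worker that actually pushes the file to IPFS; throwing triggers a retry.
new Worker('assetUpload', async (job) => pinToIpfs(job.data.filePath), { connection });
```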

Here are some architectural points to consider in devising a more robust solution to this workflow issue:

1. Better Local Caching of Content

We could require content-publishing-service (both -api and -worker) to be provisioned with enough attached storage to temporarily cache uploaded content. To eliminate application heap buffering, we could use the Multer middleware's DiskStorage engine to stream uploaded files directly to disk. Then the worker task queue could handle the IPFS upload out-of-band.
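A minimal sketch of what that might look like in a NestJS controller, assuming Multer's diskStorage engine (the route, field name, and destination path are illustrative):

```typescript
import { Controller, Post, UploadedFile, UseInterceptors } from '@nestjs/common';
import { FileInterceptor } from '@nestjs/platform-express';
import { diskStorage } from 'multer';
import { randomUUID } from 'crypto';

@Controller('v2/assets')
export class AssetsController {
  @Post('upload')
  // diskStorage streams the incoming file straight to disk, so it never
  // sits in application heap memory or in Redis.
  @UseInterceptors(
    FileInterceptor('file', {
      storage: diskStorage({
        destination: '/var/cache/content-publishing/uploads', // assumed mount point for the attached storage
        filename: (_req, file, cb) => cb(null, `${randomUUID()}-${file.originalname}`),
      }),
    }),
  )
  uploadAsset(@UploadedFile() file: Express.Multer.File) {
    // file.path points at the on-disk copy; enqueue the out-of-band IPFS
    // upload job here and return to the client immediately.
    return { assetPath: file.path };
  }
}
```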

This approach would benefit from both of the following ideas:

2. Task Dependencies

When uploading assets, if we make the task ID of an uploaded asset deterministically generated from either its DSNP hash or its CID, then we could:

  • Check, before queuing a content announcement task, whether all referenced assets are already present in IPFS; for any that are not, calculate their task IDs and devise a dependency mechanism that waits for those tasks to complete successfully
    • BullMQ has parent/child job dependency capabilities; we might be able to devise an appropriate mechanism there (see the sketch below)
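A minimal sketch of that idea using BullMQ's FlowProducer, assuming job IDs derived from the asset CID (queue names, data shapes, and the CID values are illustrative):

```typescript
import { FlowProducer } from 'bullmq';

const connection = { host: 'localhost', port: 6379 };
const flowProducer = new FlowProducer({ connection });

// Hypothetical helper: derive a deterministic job ID from the asset's CID
// (a DSNP content hash would work equally well).
const assetJobId = (cid: string) => `asset-upload:${cid}`;

// The announcement job is the parent; BullMQ will not process it until all
// of its child asset-upload jobs have completed successfully. Deterministic
// child job IDs also let BullMQ skip re-adding an upload that is already queued.
await flowProducer.add({
  name: 'announceContent',
  queueName: 'contentAnnouncement',
  data: { announcementType: 'broadcast' },
  children: [
    {
      name: 'uploadAsset',
      queueName: 'assetUpload',
      data: { cid: 'bafy...example1' },
      opts: { jobId: assetJobId('bafy...example1') },
    },
    {
      name: 'uploadAsset',
      queueName: 'assetUpload',
      data: { cid: 'bafy...example2' },
      opts: { jobId: assetJobId('bafy...example2') },
    },
  ],
});
```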

3. Batch-upload Specific Ideas

The original approach in content-publishing-service is a multi-step process, modeled after the way many social media apps' posting workflows seem to work, i.e.:

  1. Upload media assets
  2. Compose a post about them
  3. Submit

However, that workflow may not be best for some applications, which might be better served by an all-in-one approach, i.e.:

  1. Compose a post containing media file references
  2. Submit in one call

For instance, the Freesky project, which creates its own content batch files and then immediately uploads them and requests that they be announced on-chain, might be better served by an endpoint that combines the file upload & announcement request in one payload. This would allow Gateway to construct a better task pipeline with dependencies right from the start.
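A minimal sketch of what such a combined endpoint might look like (the route, DTO shape, field names, and storage path are all hypothetical):

```typescript
import { Body, Controller, Post, UploadedFiles, UseInterceptors } from '@nestjs/common';
import { FilesInterceptor } from '@nestjs/platform-express';
import { diskStorage } from 'multer';

// Hypothetical DTO: announcement metadata travels in the same multipart
// request as the asset files.
class CombinedPublishDto {
  announcementType: string;
  content: string; // e.g. a serialized batch file or post body
}

@Controller('v2/content')
export class CombinedPublishController {
  @Post('publish-with-assets')
  @UseInterceptors(
    FilesInterceptor('assets', 20, {
      storage: diskStorage({ destination: '/var/cache/content-publishing/uploads' }),
    }),
  )
  publish(
    @UploadedFiles() assets: Array<Express.Multer.File>,
    @Body() dto: CombinedPublishDto,
  ) {
    // With the assets and the announcement in hand, the service can build
    // the whole task pipeline (asset uploads as children, the announcement
    // as the parent) in one step and return a single tracking reference.
    return {
      referenceId: 'tracking-reference-goes-here',
      announcementType: dto.announcementType,
      assetCount: assets.length,
    };
  }
}
```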

To be clear, both models seem to have a place in the Gateway ecosystem, but we only support the first model right now.
