Real-world data from the "Freesky" project has shown that, at least with a "locally"-deployed Kubo IPFS node, the IPFS service can be slow & brittle. Performance and other issues with the IPFS node can cause problems with content uploading.
Specifically, the recently-added /v2/assets/upload endpoint, which streams directly to IPFS, does not isolate the client from problems with the IPFS service. This endpoint was developed to address 2 issues:
In the /v1/assets/upload endpoint, files were not only initially buffered in application heap memory, but were also cached in Redis (putting extra load on Redis & also subject to Redis' 512MB limit on the size of any individual value). Streaming directly eliminates both buffering and caching.
In the /v1/assets/upload workflow, the client gets an immediate response without confirmation that the asset has achieved its final disposition in IPFS, leading to race conditions where the client immediately makes a request to announce previously-uploaded content, but the content cannot yet be verified on IPFS.
Unfortunately, it may be that the direct-stream approach is not robust enough. One advantage of the previous approach was that it utilized task queues with automatic retry to marshal IPFS uploads, and isolated the client from direct IPFS errors.
Here are some architectural points to consider in devising a more robust solution to this workflow issue:
1. Better Local Caching of Content
We could require content-publishing-service (both -api and -worker) to be provisioned with enough attached storage to temporarily cache uploaded content. To eliminate application heap buffering, we could use the Multer middleware's DiskStorage storage engine to stream uploaded files directly to disk. Then the worker task queue could handle the IPFS upload out-of-band.
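A minimal sketch of what that could look like, assuming a NestJS-style controller with Multer's diskStorage engine (the /tmp/asset-cache path, route, and field names here are placeholders, not the service's actual configuration):

```typescript
import { Controller, Post, UploadedFile, UseInterceptors } from '@nestjs/common';
import { FileInterceptor } from '@nestjs/platform-express';
import { diskStorage } from 'multer';
import { randomUUID } from 'crypto';

@Controller('v2/assets')
export class AssetsControllerSketch {
  @Post('upload')
  // Multer's diskStorage engine streams the incoming file straight to disk,
  // so the payload is never buffered in application heap memory or Redis.
  @UseInterceptors(
    FileInterceptor('file', {
      storage: diskStorage({
        destination: '/tmp/asset-cache', // hypothetical attached-storage mount
        filename: (_req, file, cb) => cb(null, `${randomUUID()}-${file.originalname}`),
      }),
    }),
  )
  async upload(@UploadedFile() file: Express.Multer.File) {
    // file.path now points at the on-disk copy; an out-of-band worker task
    // would push it to IPFS and clean up the local cache afterwards.
    return { status: 'queued', localPath: file.path };
  }
}
```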
This approach would benefit from both:
A query endpoint to get the completion status of an asset upload, or its disposition in IPFS
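For example, that status query might return something like the following (the route and field names are only a guess at a possible shape, not an existing Gateway API):

```typescript
// Hypothetical response shape for something like GET /v2/assets/:assetId/status
interface AssetUploadStatus {
  assetId: string; // DSNP hash or CID used to identify the upload
  state: 'queued' | 'uploading' | 'pinned' | 'failed';
  cid?: string;    // populated once the asset is resolvable on IPFS
  error?: string;  // last failure reason, if retries are exhausted
}
```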
2. Task Dependencies
When uploading assets, if we make the task-id of an uploaded asset deterministically generated from either its DSNP hash or its CID, then we could:
Check before queuing a content announcement task: if any assets are not already present in IPFS, calculate their task IDs and devise a dependency mechanism to wait for those tasks to complete successfully
BullMQ has parent/child job dependency capabilities; we might be able to devise an appropriate mechanism there
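A rough sketch of how BullMQ's parent/child flows could express that dependency; the queue names, job payloads, and CID-derived job IDs below are assumptions for illustration, not the service's actual identifiers:

```typescript
import { FlowProducer } from 'bullmq';

const flow = new FlowProducer({ connection: { host: 'localhost', port: 6379 } });

// The parent 'announce-content' job is only processed after all of its child
// 'upload-asset' jobs complete, which avoids announcing content whose assets
// cannot yet be verified on IPFS.
async function announceWithAssetDependencies(batchFileCid: string, assetCids: string[]) {
  return flow.add({
    name: 'announce-content',
    queueName: 'content-announce', // hypothetical queue name
    data: { batchFileCid },
    children: assetCids.map((cid) => ({
      name: 'upload-asset',
      queueName: 'asset-upload', // hypothetical queue name
      data: { cid },
      // Deterministic job ID derived from the CID: re-submitting the same
      // asset becomes idempotent, and its status is discoverable later.
      opts: { jobId: `asset-upload-${cid}` },
    })),
  });
}
```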
3. Batch-upload Specific Ideas
The original approach in content-publishing-service is a multi-step process, modeled after the way many social media apps' posting workflows seem to work, i.e.:
Upload media assets
Compose a post about them
Submit
However, that workflow may not be best for some applications, which might be better served by an all-in-one approach, i.e.:
Compose a post containing media file references
Submit in one call
For instance, the Freesky project, which creates its own content batch files and then immediately uploads them & requests that they be announced on-chain, might be better served by an endpoint that combines the file upload & announcement request in one payload. This would allow Gateway to construct a better task pipeline, with dependencies, right from the start.
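One possible shape for such a combined call, sketched as a single multipart request (the route, DTO, and field names are hypothetical, not an existing Gateway endpoint):

```typescript
import { Body, Controller, Post, UploadedFiles, UseInterceptors } from '@nestjs/common';
import { FilesInterceptor } from '@nestjs/platform-express';
import { diskStorage } from 'multer';

// Hypothetical DTO: the announcement metadata travels in the same multipart
// request as the asset files it references.
class UploadAndAnnounceDto {
  announcementType: string; // e.g. broadcast, reply, update
  content: string;          // serialized post / batch file referencing the assets
}

@Controller('v2/content')
export class CombinedUploadControllerSketch {
  @Post('upload-and-announce')
  @UseInterceptors(
    FilesInterceptor('assets', 20, { storage: diskStorage({ destination: '/tmp/asset-cache' }) }),
  )
  async uploadAndAnnounce(
    @UploadedFiles() assets: Express.Multer.File[],
    @Body() dto: UploadAndAnnounceDto,
  ) {
    // With everything in one payload, the service can build the full
    // upload -> announce task pipeline (e.g. a BullMQ flow) up front.
    return { status: 'queued', assetCount: assets.length, type: dto.announcementType };
  }
}
```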
To be clear, both models seem to have a place in the Gateway ecosystem, but we only support the first model right now.