Multimodality: Upload PDFs and Images #135

Merged
merged 35 commits into from
May 20, 2025

Conversation

starmorph
Contributor

PR Summary: Multimodal File Uploads (Images & PDFs) Integration

Overview

This PR adds support for multimodal file uploads (images and PDFs) across the chat UI and supporting utility functions. Users can attach images and PDF files to their chat messages; attachments are base64-encoded, previewed in the input area, and formatted for downstream processing (e.g., OpenAI-compatible formats).


Key Changes

1. src/lib/multimodal-utils.ts

  • New Utility Functions:
    • fileToImageBlock(file: File): Converts an image file to a typed Base64ContentBlock for image uploads.
    • fileToPDFBlock(file: File): Converts a PDF file to a typed Base64ContentBlock for PDF uploads.
    • fileToBase64(file: File): Helper to encode any file as a base64 string (strips data URI prefix).
    • toOpenAIPDFBlock(block: Base64ContentBlock): Converts a PDF block to OpenAI-compatible file format.
    • toOpenAIImageBlock(block: Base64ContentBlock): Converts a base64 image block to OpenAI-compatible image format.
  • Type Safety: All functions use proper TypeScript types for strong type safety and clarity.
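
A minimal sketch of how the converters listed above might fit together. The `Base64ContentBlock` shape, the `stripDataUriPrefix` helper, and the `"document.pdf"` fallback filename are assumptions made for illustration, since the PR description does not show them. The output shapes follow the OpenAI Chat Completions content-part format as commonly documented, and the merged code may differ.

```typescript
// Assumed shape of a base64 content block; the real type in the PR may differ.
interface Base64ContentBlock {
  type: "image" | "file";
  source_type: "base64";
  mime_type: string;
  data: string; // raw base64, without a data-URI prefix
  metadata?: { filename?: string };
}

// Hypothetical helper: strip a "data:<mime>;base64," prefix,
// which is what fileToBase64 is described as doing.
function stripDataUriPrefix(dataUrl: string): string {
  const comma = dataUrl.indexOf(",");
  return dataUrl.startsWith("data:") && comma !== -1
    ? dataUrl.slice(comma + 1)
    : dataUrl;
}

// Convert an image block to an OpenAI-style image_url content part.
function toOpenAIImageBlock(block: Base64ContentBlock) {
  return {
    type: "image_url" as const,
    image_url: { url: `data:${block.mime_type};base64,${block.data}` },
  };
}

// Convert a PDF block to an OpenAI-style file content part.
function toOpenAIPDFBlock(block: Base64ContentBlock) {
  return {
    type: "file" as const,
    file: {
      // Fallback filename is an assumption for this sketch.
      filename: block.metadata?.filename ?? "document.pdf",
      file_data: `data:${block.mime_type};base64,${block.data}`,
    },
  };
}
```

Keeping the stored block free of the data-URI prefix means the same block can be re-wrapped for whichever provider format is needed at send time.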

2. src/components/thread/index.tsx

  • Image & PDF Upload UI:
    • Added file input controls for both images and PDFs in the chat input area.
    • Drag-and-drop support for both file types, with error handling for unsupported formats.
    • Uploaded images are previewed as thumbnails; PDFs are listed by filename. Both can be removed before sending.
  • Message Construction:
    • On message send, attached images and PDFs are converted to OpenAI-compatible blocks using the new utility functions.
    • Ensures all file uploads are properly encoded and included in the message payload.
  • UX Improvements:
    • Users can remove individual images or PDFs before sending.
    • Error toasts for invalid file types.
    • Clean, minimal UI integration following project conventions.
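
The unsupported-format handling described above could be sketched roughly as follows. The MIME type list matches the constants quoted later in the review; `FileLike` and `partitionFiles` are hypothetical names used only for this example, and the actual component presumably works on browser `File` objects and shows a toast for the rejected ones.

```typescript
// Hypothetical minimal file shape so the sketch runs outside the browser;
// a browser File object satisfies it structurally.
interface FileLike {
  name: string;
  type: string; // MIME type, e.g. "image/png"
}

const SUPPORTED_FILE_TYPES: string[] = [
  "image/jpeg",
  "image/png",
  "image/gif",
  "image/webp",
  "application/pdf",
];

// Split dropped or selected files into those that can be attached
// and those that should trigger an error message.
function partitionFiles<T extends FileLike>(
  files: T[],
): { accepted: T[]; rejected: T[] } {
  const accepted: T[] = [];
  const rejected: T[] = [];
  for (const file of files) {
    if (SUPPORTED_FILE_TYPES.includes(file.type)) {
      accepted.push(file);
    } else {
      rejected.push(file);
    }
  }
  return { accepted, rejected };
}
```

The same function can back both the file input and the drag-and-drop handler, so the two paths cannot drift apart on what counts as a valid upload.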

Implementation Notes

  • All file handling logic is modularized in src/lib/multimodal-utils.ts for reusability and testability.
  • TypeScript types are strictly enforced for all multimodal content blocks.
  • The UI leverages Shadcn, Lucide, and Tailwind for a consistent, accessible experience.
  • The code is DRY, minimal, and follows Next.js and project best practices.

Closes: #multimodal-upload, #image-upload, #pdf-upload

vercel bot commented May 17, 2025

Deployment status (Vercel for Git):

Name: langgraph-chat | Status: ✅ Ready | Updated (UTC): May 20, 2025 5:20pm

Member

@bracesproul left a comment

few comments, lmk when resolved

}) => {
if (!blocks.length) return null;
return (
<div className={`flex flex-wrap gap-2 p-3.5 pb-0 ${className}`}>
Member

need to wrap with cn
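
For context, a sketch of what wrapping with `cn` buys here. The `cn` below is a simplified stand-in written for this example: it only joins truthy class names, whereas the usual shadcn/ui helper also resolves conflicting Tailwind classes via tailwind-merge. `wrapperClass` is a hypothetical name.

```typescript
type ClassValue = string | false | null | undefined;

// Simplified stand-in for a `cn` helper: keeps truthy class names
// and joins them with spaces. It does not merge conflicting classes.
function cn(...inputs: ClassValue[]): string {
  return inputs.filter(Boolean).join(" ");
}

// Hypothetical wrapper mirroring the className in the reviewed line.
function wrapperClass(className?: string): string {
  return cn("flex flex-wrap gap-2 p-3.5 pb-0", className);
}
```

With the template-literal version, an undefined `className` is interpolated as the literal text `undefined`; passing it through `cn` drops it instead.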

Comment on lines 141 to 143
{contentString &&
contentString !== "Other" &&
contentString !== "Multimodal message" ? (
Member

I think we can drop in favor of just

Suggested change
- {contentString &&
- contentString !== "Other" &&
- contentString !== "Multimodal message" ? (
+ {contentString ? (

now that getContentString won't return those strings
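
To illustrate why the simpler check suffices, here is a hypothetical version of `getContentString` that returns only text and never a placeholder label. The `ContentPart` and `MessageContent` types are assumptions for this sketch, and the function in the repository may be implemented differently.

```typescript
// Assumed, simplified content types for illustration only.
interface ContentPart {
  type: string;
  text?: string;
}
type MessageContent = string | ContentPart[];

// Return the text portions of a message, or "" when there are none.
// Non-text parts (images, files) contribute nothing to the string.
function getContentString(content: MessageContent): string {
  if (typeof content === "string") return content;
  return content
    .filter((part) => part.type === "text" && typeof part.text === "string")
    .map((part) => part.text as string)
    .join(" ");
}
```

Because an attachment-only message yields an empty string, a plain truthiness check on `contentString` is enough to skip rendering the text bubble.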

if (size === "lg") imgClass = "rounded-md object-cover h-24 w-24 text-xl";
return (
<div
className={`relative inline-block${className ? ` ${className}` : ""}`}
Member

need to wrap with cn

Member

make sure this is fixed everywhere

<div
className={`relative inline-block${className ? ` ${className}` : ""}`}
>
<img
Member

Should we be using NextImage from next/image here instead?

Comment on lines 6 to 15
export const SUPPORTED_IMAGE_TYPES = [
"image/jpeg",
"image/png",
"image/gif",
"image/webp",
];
export const SUPPORTED_FILE_TYPES = [
...SUPPORTED_IMAGE_TYPES,
"application/pdf",
];
Member

why two arrays and not one?
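
One possible single-array layout, sketched under the assumption that the image/PDF distinction is only needed occasionally and can be derived from the MIME type. `isSupportedFileType` and `isImageType` are hypothetical helper names, not taken from the PR.

```typescript
// Single source of truth for accepted MIME types.
const SUPPORTED_FILE_TYPES = [
  "image/jpeg",
  "image/png",
  "image/gif",
  "image/webp",
  "application/pdf",
] as const;

type SupportedFileType = (typeof SUPPORTED_FILE_TYPES)[number];

// Type guard: narrows an arbitrary string to a supported MIME type.
function isSupportedFileType(mime: string): mime is SupportedFileType {
  return (SUPPORTED_FILE_TYPES as readonly string[]).includes(mime);
}

// Derive "is this an image?" from the MIME type instead of a second array.
function isImageType(mime: string): boolean {
  return isSupportedFileType(mime) && mime.startsWith("image/");
}
```

The trade-off is that a dedicated image array is slightly more explicit, while a single array avoids keeping two lists in sync.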

Comment on lines 134 to 140
const setThreadId = (id: string | null) => {
_setThreadId(id);

// close artifact and reset artifact context
closeArtifact();
setArtifactContext({});
};
Member

I don't think this should be dropped

Contributor Author

It's called again later. That's my fault for the bad diff; going to do as suggested below.

};

const toolMessages = ensureToolCallsHaveResponses(stream.messages);

const context =
Member

I don't think this should be dropped

};

// Restore handleRegenerate
Member

vibe coding comments shouldn't be committed

closeArtifact();
setArtifactContext({});
};
const threadId = _threadId;
Member

just set it to threadId originally where you define useQueryState instead of re-assigning here

Comment on lines 277 to 280
<motion.div
className={cn(
"grid w-full grid-cols-[1fr_0fr] transition-all duration-500",
artifactOpen && "grid-cols-[3fr_2fr]",
"relative flex min-w-0 flex-1 flex-col overflow-hidden",
!chatStarted && "grid-rows-[1fr]",
Member

I'm still confused why there are so many JSX changes here. It might be easier to just copy/paste whatever we have in main for the JSX, then add your code for the files, instead of trying to fix this merge conflict.

@bracesproul merged commit 06e0de6 into main on May 20, 2025
6 checks passed
@bracesproul deleted the upload-images-and-pdfs branch on May 20, 2025 17:23
@bracesproul linked an issue on May 20, 2025 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

[Feature request]: Support file uploads
4 participants