Multimodality: Upload PDFs and Images #135

Merged on May 20, 2025
35 commits
ee1c084 feat : Support file uploads #56 (Neulhan, Apr 28, 2025)
aa59470 feat: support drag and drop file upload (wuhaolei455, Apr 29, 2025)
635d635 Merge pull request #2 from wuhaolei455/main (Neulhan, Apr 30, 2025)
d8c8a85 init pdf images branch (starmorph, May 15, 2025)
5d86187 implementing pdf upload, text blob pdf-parse (starmorph, May 15, 2025)
aa32e58 converting image + file upload to Base64ContentBlock Mime_type standa… (starmorph, May 15, 2025)
9ac2228 lint and format (starmorph, May 15, 2025)
1cac35f major refactor (starmorph, May 17, 2025)
fcaa963 format && remove graph (starmorph, May 17, 2025)
79a28d1 fix file upload schema to match python langchain multimodality docs (starmorph, May 19, 2025)
855d8c5 Anthropic pdf uploads working w metadata filename (starmorph, May 19, 2025)
5e7a8ee handleFileUpload, images working (starmorph, May 19, 2025)
217fd43 support different image filetypes (starmorph, May 19, 2025)
f1b6aab show image thumbnails and pdf filenames in chat, allow for fileupload… (starmorph, May 19, 2025)
198d13d format (starmorph, May 19, 2025)
2197ce3 update pnpm lock (starmorph, May 19, 2025)
24c5e2d bump langgraph-checkpoint and langgraph-sdk to same versions as main (starmorph, May 19, 2025)
f618f0a Merge branch 'main' into upload-images-and-pdfs (starmorph, May 19, 2025)
28cee32 fix thread index (starmorph, May 19, 2025)
1002273 lint format (starmorph, May 19, 2025)
2df2067 remove un-needed util (starmorph, May 19, 2025)
1e51243 CR: drop un-needed pkgs (starmorph, May 19, 2025)
224d0ba minor syntax hotfix (starmorph, May 19, 2025)
f3b6165 CR: multimodal preview component, drop pdf-parse dep (starmorph, May 19, 2025)
087587d CR: check for supported image types, duplicate filenames (starmorph, May 19, 2025)
52379f5 CR fixes (starmorph, May 19, 2025)
2884683 CR: use-file-upload-hook (starmorph, May 19, 2025)
d358222 format (starmorph, May 19, 2025)
8534f4a CR: ContentBlock abstraction (starmorph, May 19, 2025)
1fbef48 fix artifact code (starmorph, May 19, 2025)
25b2473 CR: cn utility, refactor accepted files, nextImage (starmorph, May 20, 2025)
f29b505 format (starmorph, May 20, 2025)
d3e5534 add file upload code from main thread/index (starmorph, May 20, 2025)
752fd11 format (starmorph, May 20, 2025)
cb2d216 fix pdf duplicate file upload handler (starmorph, May 20, 2025)
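Commits 087587d and cb2d216 mention checks for supported image types and duplicate filenames, but the hook that performs them (`use-file-upload`) is not part of the diff shown here. The following is a minimal sketch under stated assumptions, not the PR's code: `partitionUploads` and its types are hypothetical, the supported set is copied from the `accept` attribute on the file input in `thread/index.tsx`, and duplicates are detected by filename alone.

```typescript
// Hypothetical upload validation; not code from this PR.
// Supported types copied from the file input's `accept` attribute.
const SUPPORTED_MIME_TYPES: ReadonlySet<string> = new Set([
  "image/jpeg",
  "image/png",
  "image/gif",
  "image/webp",
  "application/pdf",
]);

// Only the fields the checks need; a browser `File` satisfies this shape.
interface CandidateFile {
  name: string;
  type: string;
}

interface UploadPartition<T> {
  accepted: T[];
  unsupported: T[];
  duplicates: T[];
}

// Sorts incoming files into accepted, unsupported-type, and duplicate-name
// buckets. Assumption: a duplicate is any file whose name matches an already
// attached file or an earlier file in the same batch.
function partitionUploads<T extends CandidateFile>(
  incoming: readonly T[],
  existingNames: readonly string[],
): UploadPartition<T> {
  const seen = new Set(existingNames);
  const result: UploadPartition<T> = {
    accepted: [],
    unsupported: [],
    duplicates: [],
  };
  for (const file of incoming) {
    if (!SUPPORTED_MIME_TYPES.has(file.type)) {
      result.unsupported.push(file);
    } else if (seen.has(file.name)) {
      result.duplicates.push(file);
    } else {
      seen.add(file.name);
      result.accepted.push(file);
    }
  }
  return result;
}
```

Checking the type before the name means an unsupported file is never reported as a duplicate; reversing the order would be an equally defensible choice.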
8 changes: 7 additions & 1 deletion next.config.mjs
@@ -1,4 +1,10 @@
 /** @type {import('next').NextConfig} */
-const nextConfig = {};
+const nextConfig = {
+  experimental: {
+    serverActions: {
+      bodySizeLimit: "10mb",
+    },
+  },
+};

 export default nextConfig;
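Base64 encoding inflates every upload by roughly a third, which is presumably why the body limit is raised here. A minimal sketch for estimating whether a batch of files fits, assuming the encoded blocks count toward this 10 MB cap and that `"10mb"` means 10 × 1024 × 1024 bytes; `base64EncodedLength` and `fitsUnderBodyLimit` are hypothetical helpers, not part of the PR:

```typescript
// Hypothetical helpers; assumes "10mb" = 10 * 1024 * 1024 bytes.
const BODY_LIMIT_BYTES = 10 * 1024 * 1024;

// Base64 writes 4 characters per 3 input bytes and pads the last group,
// so n bytes encode to 4 * ceil(n / 3) characters (1 byte each in UTF-8).
function base64EncodedLength(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Compares the total encoded size with the limit. Ignores JSON framing
// (keys, quotes, message text), so treat a result near the limit as a no.
function fitsUnderBodyLimit(
  byteLengths: readonly number[],
  limit: number = BODY_LIMIT_BYTES,
): boolean {
  const encoded = byteLengths.reduce(
    (total, n) => total + base64EncodedLength(n),
    0,
  );
  return encoded <= limit;
}
```

For example, a 7.5 MiB PDF (7,864,320 bytes) encodes to exactly 10,485,760 characters, so it sits right at the limit before any JSON overhead is added.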
6 changes: 3 additions & 3 deletions package.json
@@ -53,7 +53,7 @@
     "tailwind-merge": "^3.0.2",
     "tailwindcss-animate": "^1.0.7",
     "use-stick-to-bottom": "^1.0.46",
-    "uuid": "^11.0.5",
+    "uuid": "^11.1.0",
     "zod": "^3.24.2"
   },
   "devDependencies": {
@@ -64,6 +64,7 @@
     "@types/react": "^19.0.8",
     "@types/react-dom": "^19.0.3",
     "@types/react-syntax-highlighter": "^15.5.13",
+    "@types/uuid": "^10.0.0",
     "autoprefixer": "^10.4.20",
     "dotenv": "^16.4.7",
     "eslint": "^9.19.0",
@@ -81,8 +82,7 @@
     "typescript-eslint": "^8.22.0"
   },
   "overrides": {
-    "react-is": "^19.0.0-rc-69d4b800-20241021",
-    "@langchain/langgraph-checkpoint": "^0.0.16"
+    "react-is": "^19.0.0-rc-69d4b800-20241021"
   },
   "packageManager": "pnpm@…"
 }
2,500 changes: 1,366 additions & 1,134 deletions pnpm-lock.yaml

Large diffs are not rendered by default.

37 changes: 37 additions & 0 deletions src/components/thread/ContentBlocksPreview.tsx
@@ -0,0 +1,37 @@
+import React from "react";
+import type { Base64ContentBlock } from "@langchain/core/messages";
+import { MultimodalPreview } from "../ui/MultimodalPreview";
+import { cn } from "@/lib/utils";
+
+interface ContentBlocksPreviewProps {
+  blocks: Base64ContentBlock[];
+  onRemove: (idx: number) => void;
+  size?: "sm" | "md" | "lg";
+  className?: string;
+}
+
+/**
+ * Renders a preview of content blocks with optional remove functionality.
+ * Uses cn utility for robust class merging.
+ */
+export const ContentBlocksPreview: React.FC<ContentBlocksPreviewProps> = ({
+  blocks,
+  onRemove,
+  size = "md",
+  className,
+}) => {
+  if (!blocks.length) return null;
+  return (
+    <div className={cn("flex flex-wrap gap-2 p-3.5 pb-0", className)}>
+      {blocks.map((block, idx) => (
+        <MultimodalPreview
+          key={idx}
+          block={block}
+          removable
+          onRemove={() => onRemove(idx)}
+          size={size}
+        />
+      ))}
+    </div>
+  );
+};
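`ContentBlocksPreview` only renders blocks; the code that creates them lives in the `useFileUpload` hook, which is not in the visible diff. A hedged sketch of that conversion, assuming the block shape matched by `isBase64ContentBlock` in `human.tsx` (`type`, `source_type: "base64"`, `mime_type`) plus a `data` field for the base64 payload and a `metadata.filename` entry (suggested by commit 855d8c5). `MediaBlock`, `toBase64`, and `toContentBlock` are hypothetical names; the encoder is written out by hand only to keep the example free of browser or Node globals.

```typescript
// Hypothetical block shape; `data` and `metadata.filename` are assumptions.
interface MediaBlock {
  type: "image" | "file";
  source_type: "base64";
  mime_type: string;
  data: string;
  metadata?: { filename?: string };
}

const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard (RFC 4648) base64: pack 3 bytes into 24 bits, emit four 6-bit
// digits, and pad with "=" when the final group is short.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const triple = (bytes[i] << 16) | (b1 << 8) | b2;
    out += ALPHABET[(triple >> 18) & 63];
    out += ALPHABET[(triple >> 12) & 63];
    out += i + 1 < bytes.length ? ALPHABET[(triple >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? ALPHABET[triple & 63] : "=";
  }
  return out;
}

// Images become "image" blocks; everything else (here, PDFs) becomes "file".
function toContentBlock(
  bytes: Uint8Array,
  mimeType: string,
  filename: string,
): MediaBlock {
  return {
    type: mimeType.startsWith("image/") ? "image" : "file",
    source_type: "base64",
    mime_type: mimeType,
    data: toBase64(bytes),
    metadata: { filename },
  };
}
```

In a browser, the bytes would typically come from `new Uint8Array(await file.arrayBuffer())`.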
57 changes: 51 additions & 6 deletions src/components/thread/index.tsx
Review comment (Member):

Looks like you deleted all of the stuff around artifacts. Can you please redo that merge commit to properly merge this with what already exists? We shouldn't have to make David go do this merge, and we definitely don't want to delete his artifacts stuff.

Review comment (Member):

Also, the diff is way too large in the JSX portion. I assume this is because you didn't merge with main properly, so we're overwriting a lot of David's stuff, but we should aim to have as few changes as possible.

@@ -21,6 +21,8 @@ import {
   PanelRightClose,
   SquarePen,
   XIcon,
+  Plus,
+  CircleX,
 } from "lucide-react";
 import { useQueryState, parseAsBoolean } from "nuqs";
 import { StickToBottom, useStickToBottomContext } from "use-stick-to-bottom";
@@ -36,6 +38,8 @@ import {
   TooltipProvider,
   TooltipTrigger,
 } from "../ui/tooltip";
+import { useFileUpload } from "@/hooks/use-file-upload";
+import { ContentBlocksPreview } from "./ContentBlocksPreview";
 import {
   useArtifactOpen,
   ArtifactContent,
@@ -122,6 +126,14 @@ export function Thread() {
     parseAsBoolean.withDefault(false),
   );
   const [input, setInput] = useState("");
+  const {
+    contentBlocks,
+    setContentBlocks,
+    handleFileUpload,
+    dropRef,
+    removeBlock,
+    resetBlocks,
+  } = useFileUpload();
   const [firstTokenReceived, setFirstTokenReceived] = useState(false);
   const isLargeScreen = useMediaQuery("(min-width: 1024px)");

@@ -183,13 +195,17 @@ export function Thread() {

   const handleSubmit = (e: FormEvent) => {
     e.preventDefault();
-    if (!input.trim() || isLoading) return;
+    if ((input.trim().length === 0 && contentBlocks.length === 0) || isLoading)
+      return;
     setFirstTokenReceived(false);

     const newHumanMessage: Message = {
       id: uuidv4(),
       type: "human",
-      content: input,
+      content: [
+        ...(input.trim().length > 0 ? [{ type: "text", text: input }] : []),
+        ...contentBlocks,
+      ] as Message["content"],
     };

     const toolMessages = ensureToolCallsHaveResponses(stream.messages);
@@ -214,6 +230,7 @@ export function Thread() {
     );

     setInput("");
+    setContentBlocks([]);
   };

   const handleRegenerate = (
@@ -423,11 +440,18 @@ export function Thread() {

 <ScrollToBottom className="animate-in fade-in-0 zoom-in-95 absolute bottom-full left-1/2 mb-4 -translate-x-1/2" />

-<div className="bg-muted relative z-10 mx-auto mb-8 w-full max-w-3xl rounded-2xl border shadow-xs">
+<div
+  ref={dropRef}
+  className="bg-muted relative z-10 mx-auto mb-8 w-full max-w-3xl rounded-2xl border shadow-xs"
+>
   <form
     onSubmit={handleSubmit}
     className="mx-auto grid max-w-3xl grid-rows-[1fr_auto] gap-2"
   >
+    <ContentBlocksPreview
+      blocks={contentBlocks}
+      onRemove={removeBlock}
+    />
     <textarea
       value={input}
       onChange={(e) => setInput(e.target.value)}
@@ -448,7 +472,7 @@ export function Thread() {
       className="field-sizing-content resize-none border-none bg-transparent p-3.5 pb-0 shadow-none ring-0 outline-none focus:ring-0 focus:outline-none"
     />

-    <div className="flex items-center justify-between p-2 pt-4">
+    <div className="flex items-center gap-6 p-2 pt-4">
       <div>
         <div className="flex items-center space-x-2">
           <Switch
@@ -464,19 +488,40 @@ export function Thread() {
           </Label>
         </div>
       </div>
+      <Label
+        htmlFor="file-input"
+        className="flex cursor-pointer items-center gap-2"
+      >
+        <Plus className="size-5 text-gray-600" />
+        <span className="text-sm text-gray-600">
+          Upload PDF or Image
+        </span>
+      </Label>
+      <input
+        id="file-input"
+        type="file"
+        onChange={handleFileUpload}
+        multiple
+        accept="image/jpeg,image/png,image/gif,image/webp,application/pdf"
+        className="hidden"
+      />
       {stream.isLoading ? (
         <Button
           key="stop"
           onClick={() => stream.stop()}
+          className="ml-auto"
         >
           <LoaderCircle className="h-4 w-4 animate-spin" />
           Cancel
         </Button>
       ) : (
         <Button
           type="submit"
-          className="shadow-md transition-all"
-          disabled={isLoading || !input.trim()}
+          className="ml-auto shadow-md transition-all"
+          disabled={
+            isLoading ||
+            (!input.trim() && contentBlocks.length === 0)
+          }
         >
           Send
         </Button>
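The new `handleSubmit` guard and the Send button's `disabled` expression encode the same rule: submission is allowed when the input has non-blank text or at least one attachment, and nothing is loading. A framework-free sketch of that rule and of how the message content is assembled (text block first, only when non-blank, then the attachments in order); `canSubmit`, `buildHumanContent`, and the block types are hypothetical names, not the PR's code. As in the diff, the text block carries the untrimmed input.

```typescript
// Hypothetical types standing in for LangChain's content-block types.
interface TextBlock {
  type: "text";
  text: string;
}

interface MediaBlock {
  type: "image" | "file";
  source_type: "base64";
  mime_type: string;
  data: string;
}

type ContentBlock = TextBlock | MediaBlock;

// Negation of `(input.trim().length === 0 && contentBlocks.length === 0) || isLoading`.
function canSubmit(
  input: string,
  blocks: readonly MediaBlock[],
  isLoading: boolean,
): boolean {
  return !isLoading && (input.trim().length > 0 || blocks.length > 0);
}

// Text first (only if non-blank), then attachments in upload order.
function buildHumanContent(
  input: string,
  blocks: readonly MediaBlock[],
): ContentBlock[] {
  const text: TextBlock[] =
    input.trim().length > 0 ? [{ type: "text", text: input }] : [];
  return [...text, ...blocks];
}
```

Routing both the guard and the button through one helper like `canSubmit` would keep the two conditions from drifting apart.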
63 changes: 60 additions & 3 deletions src/components/thread/messages/human.tsx
@@ -5,6 +5,8 @@ import { getContentString } from "../utils";
 import { cn } from "@/lib/utils";
 import { Textarea } from "@/components/ui/textarea";
 import { BranchSwitcher, CommandBar } from "./shared";
+import { MultimodalPreview } from "@/components/ui/MultimodalPreview";
+import type { Base64ContentBlock } from "@langchain/core/messages";

 function EditableContent({
   value,
@@ -32,6 +34,36 @@ function EditableContent({
   );
 }

+// Type guard for Base64ContentBlock
+function isBase64ContentBlock(block: unknown): block is Base64ContentBlock {
+  if (typeof block !== "object" || block === null || !("type" in block))
+    return false;
+  // file type (legacy)
+  if (
+    (block as { type: unknown }).type === "file" &&
+    "source_type" in block &&
+    (block as { source_type: unknown }).source_type === "base64" &&
+    "mime_type" in block &&
+    typeof (block as { mime_type?: unknown }).mime_type === "string" &&
+    ((block as { mime_type: string }).mime_type.startsWith("image/") ||
+      (block as { mime_type: string }).mime_type === "application/pdf")
+  ) {
+    return true;
+  }
+  // image type (new)
+  if (
+    (block as { type: unknown }).type === "image" &&
+    "source_type" in block &&
+    (block as { source_type: unknown }).source_type === "base64" &&
+    "mime_type" in block &&
+    typeof (block as { mime_type?: unknown }).mime_type === "string" &&
+    (block as { mime_type: string }).mime_type.startsWith("image/")
+  ) {
+    return true;
+  }
+  return false;
+}
+
 export function HumanMessage({
   message,
   isLoading,
@@ -84,9 +116,34 @@ export function HumanMessage({
     onSubmit={handleSubmitEdit}
   />
 ) : (
-  <p className="bg-muted ml-auto w-fit rounded-3xl px-4 py-2 whitespace-pre-wrap">
-    {contentString}
-  </p>
+  <div className="flex flex-col gap-2">
+    {/* Render images and files if no text */}
+    {Array.isArray(message.content) && message.content.length > 0 && (
+      <div className="flex flex-col items-end gap-2">
+        {message.content.reduce<React.ReactNode[]>(
+          (acc, block, idx) => {
+            if (isBase64ContentBlock(block)) {
+              acc.push(
+                <MultimodalPreview
+                  key={idx}
+                  block={block}
+                  size="md"
+                />,
+              );
+            }
+            return acc;
+          },
+          [],
+        )}
+      </div>
+    )}
+    {/* Render text if present, otherwise fallback to file/image name */}
+    {contentString ? (
+      <p className="bg-muted ml-auto w-fit rounded-3xl px-4 py-2 text-right whitespace-pre-wrap">
+        {contentString}
+      </p>
+    ) : null}
+  </div>
 )}

 <div
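`isBase64ContentBlock` re-casts `block` for every property it reads. A sketch that behaves the same on plain objects (hypothetical name `isBase64MediaBlock`, with a local stand-in for `Base64ContentBlock`) reads the candidate once as a `Record<string, unknown>`. Like the original, it accepts `file` blocks holding images or PDFs and `image` blocks holding images, and it does not inspect the payload field.

```typescript
// Local stand-in for the subset of Base64ContentBlock that the guard checks.
interface Base64Block {
  type: "image" | "file";
  source_type: "base64";
  mime_type: string;
}

function isBase64MediaBlock(block: unknown): block is Base64Block {
  if (typeof block !== "object" || block === null) return false;
  const b = block as Record<string, unknown>;
  const mime = b.mime_type;
  // Missing keys read as undefined, so they fail here just as the `in` checks did.
  if (b.source_type !== "base64" || typeof mime !== "string") return false;
  if (b.type === "image") return mime.startsWith("image/");
  if (b.type === "file") {
    return mime.startsWith("image/") || mime === "application/pdf";
  }
  return false;
}
```

Because it is a type predicate, the guard can also drive `Array.prototype.filter` to produce a typed array of media blocks.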
6 changes: 6 additions & 0 deletions src/components/thread/utils.ts
@@ -1,5 +1,11 @@
 import type { Message } from "@langchain/langgraph-sdk";

+/**
+ * Extracts a string summary from a message's content, supporting multimodal (text, image, file, etc.).
+ * - If text is present, returns the joined text.
+ * - If not, returns a label for the first non-text modality (e.g., 'Image', 'Other').
+ * - If unknown, returns 'Multimodal message'.
+ */
 export function getContentString(content: Message["content"]): string {
   if (typeof content === "string") return content;
   const texts = content
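The body of `getContentString` is truncated above, but its docstring states the contract. One way to satisfy it, as a hypothetical sketch (`summarizeContent` and `Part` are illustrative names, not the PR's code) under assumptions the docstring leaves open: text parts are joined with a single space, an `image` block is labeled `"Image"`, any other non-text block is labeled `"Other"`, and content with neither usable text nor a non-text block falls back to `"Multimodal message"`.

```typescript
// Minimal stand-in for a content part: every part has a type; text parts carry text.
interface Part {
  type: string;
  text?: string;
}

function summarizeContent(content: string | readonly Part[]): string {
  if (typeof content === "string") return content;
  // Assumption: text parts are joined with a single space.
  const texts = content
    .filter((p) => p.type === "text" && typeof p.text === "string")
    .map((p) => p.text as string);
  if (texts.length > 0) return texts.join(" ");
  // Assumption: label the first non-text part by its type.
  const firstOther = content.find((p) => p.type !== "text");
  if (firstOther === undefined) return "Multimodal message";
  return firstOther.type === "image" ? "Image" : "Other";
}
```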