
docs: multimodal #777


Merged: 16 commits merged into main on May 7, 2025

Conversation

madams0013 (Contributor)

No description provided.

vercel bot commented May 6, 2025

langsmith-docs: ✅ Ready (preview updated May 7, 2025 4:45pm UTC)

@tanushree-sharma (Contributor)

You'll need to add

---
hide_table_of_contents: true
---

to the "Run an evaluation with multimodal content" page because the tabs mess up the table of contents :(


![](./static/attachment_editing.gif)

## Define custom evaluators
Contributor

I'd move this section up above the update section and follow the same section structure as the SDK docs: use "Run evaluations" as the header, with "Create a multimodal prompt" and "Define custom evaluators" underneath.

It would also help to give examples of what IS possible, e.g.:

Even without multimodal support in your evaluators, you can still run text-only evaluations. For example (a rough sketch of the first case follows below):

  • OCR → text correction: Use a vision model to extract text from a document, then evaluate the accuracy of the extracted output.
  • Speech-to-text → transcription quality: Use a voice model to transcribe audio to text, then evaluate the transcription against your reference.
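To make the OCR bullet concrete, here is a minimal text-only evaluator sketch. The function name, signature, and similarity metric are illustrative assumptions, not the documented custom-evaluator interface (which varies by SDK version); adapt it to whatever evaluator signature you actually register.

```python
from difflib import SequenceMatcher

# Hypothetical text-only evaluator for the OCR -> text correction case.
# It compares the text a vision model extracted from a document against a
# reference transcription and returns a similarity score in [0, 1].
def ocr_accuracy(extracted_text: str, reference_text: str) -> dict:
    # Normalize whitespace so layout differences don't dominate the score.
    extracted = " ".join(extracted_text.split())
    reference = " ".join(reference_text.split())
    score = SequenceMatcher(None, extracted, reference).ratio()
    return {"key": "ocr_accuracy", "score": score}

# Toy usage with hard-coded strings standing in for model output and reference.
print(ocr_accuracy("Invoice total: $1,234.56", "Invoice total: $1,234.56"))
```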

Contributor Author

nice, done

Some applications are based around multimodal content, like a chatbot that can answer questions about a PDF or image.
In these cases, you'll want to include multimodal content in your prompt and test the model's ability to answer questions about the content.
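As a rough illustration (not taken from the page itself), one common pattern is to embed the file as a base64 data URL inside an OpenAI-style chat message; the file path and question below are placeholders.

```python
import base64

# Sketch: build a chat message that pairs a question with an embedded image,
# using the widely supported image_url / base64 data-URL content-block format.
def build_multimodal_message(image_path: str, question: str) -> dict:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }
```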

There are two ways to include multimodal content in a prompt:
Contributor

any diff between the two that we should call out? how does a user know which approach to pick?

Contributor Author

yup, added

@tanushree-sharma (Contributor)

This is a really cool capability!! I would also make a tutorial about this (a docs version of Catherine's video)

madams0013 merged commit 28062f4 into main on May 7, 2025
6 checks passed
madams0013 deleted the maddy/multimodal branch on May 7, 2025 19:07