web-next: Add image attachment to note composer #287
Merged
Commits (24 total; this view shows changes from 19 commits)
20224a4 Add AI alt text generation for media (Phase 1–2) (dahlia)
4f06eec web-next: Add note image attachment UI (Phase 3–4) (dahlia)
dc42fb4 Fix issues found in branch code review (dahlia)
0f7c00c Add CORS support to /medium-uploads/* proxy endpoint (dahlia)
027d884 Fix drag-and-drop in Firefox (dahlia)
88a076c Fix Firefox drag-and-drop with capture-phase listeners (dahlia)
25a0f7c Gate generatedAltText behind KV ownership (dahlia)
5ae3054 Use per-account KV keys for medium ownership (dahlia)
9f5704e Add ownership tests for generatedAltText; commit schema (dahlia)
3bf4d98 Cache alt text prompts after first read (dahlia)
6645311 Fix dragover default; toast for skipped files; translations (dahlia)
2d37f73 Make setMediumOwner fail closed (dahlia)
9a183d0 Add CORS headers to 405/404 proxy responses; extend tests (dahlia)
16b3ddf Improve upload error messages; add model TODO (dahlia)
2022a29 Treat kv.set false-return as failure in setMediumOwner (dahlia)
8592060 Batch addFiles; track alt subscription; improve XHR errors (dahlia)
6fa5a09 Test CORS headers on all error paths of the upload proxy (dahlia)
bddd57b Guard against malformed locale tags in getAltTextPrompt (dahlia)
5f18b5a Surface error details in toasts; mark alt textarea required (dahlia)
93f3d4a Sync translator model in web/ai.ts to claude-sonnet-4-6 (dahlia)
07acd56 Remove XHR upload timeout (dahlia)
18a3105 Use debounced dragleave instead of relatedTarget check (dahlia)
a445e11 Switch mediaItems from createSignal to createStore (dahlia)
8a0c491 Guard against NaN in XHR upload progress (dahlia)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,238 @@ | ||
| import assert from "node:assert/strict"; | ||
| import test from "node:test"; | ||
| import { MockLanguageModelV3 } from "ai/test"; | ||
| import { generateAltText } from "./alttext.ts"; | ||
|
|
||
| // A 1×1 transparent GIF as a data URL — avoids network downloads in tests. | ||
| const DATA_URL = | ||
| "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"; | ||
|
|
||
| test("generateAltText() returns trimmed text from the model response", async () => { | ||
| const model = new MockLanguageModelV3({ | ||
| doGenerate: async () => ({ | ||
| content: [{ type: "text", text: " A cat sitting on a keyboard. \n" }], | ||
| finishReason: { unified: "stop", raw: undefined }, | ||
| usage: { | ||
| inputTokens: { | ||
| total: 10, | ||
| noCache: 10, | ||
| cacheRead: undefined, | ||
| cacheWrite: undefined, | ||
| }, | ||
| outputTokens: { total: 5, text: 5, reasoning: undefined }, | ||
| }, | ||
| warnings: [], | ||
| }), | ||
| }); | ||
|
|
||
| const result = await generateAltText({ | ||
| model, | ||
| imageUrl: DATA_URL, | ||
| language: "en", | ||
| }); | ||
|
|
||
| assert.equal(result, "A cat sitting on a keyboard."); | ||
| }); | ||
|
|
||
| test("generateAltText() sends an image file part to the model", async () => { | ||
| let hasImageFilePart = false; | ||
| const model = new MockLanguageModelV3({ | ||
| doGenerate: async (options) => { | ||
| for (const message of options.prompt) { | ||
| if (message.role !== "user") continue; | ||
| for (const part of message.content) { | ||
| if ( | ||
| part.type === "file" && | ||
| typeof part.mediaType === "string" && | ||
| part.mediaType.startsWith("image/") | ||
| ) { | ||
| hasImageFilePart = true; | ||
| } | ||
| } | ||
| } | ||
| return { | ||
| content: [{ type: "text", text: "A description." }], | ||
| finishReason: { unified: "stop", raw: undefined }, | ||
| usage: { | ||
| inputTokens: { | ||
| total: 10, | ||
| noCache: 10, | ||
| cacheRead: undefined, | ||
| cacheWrite: undefined, | ||
| }, | ||
| outputTokens: { total: 5, text: 5, reasoning: undefined }, | ||
| }, | ||
| warnings: [], | ||
| }; | ||
| }, | ||
| }); | ||
|
|
||
| await generateAltText({ model, imageUrl: DATA_URL, language: "en" }); | ||
|
|
||
| assert.ok(hasImageFilePart, "model should receive an image file part"); | ||
| }); | ||
|
|
||
| test("generateAltText() sends a system prompt to the model", async () => { | ||
| let capturedSystem: string | undefined; | ||
| const model = new MockLanguageModelV3({ | ||
| doGenerate: async (options) => { | ||
| const sysMsg = options.prompt.find((m) => m.role === "system"); | ||
| if (sysMsg?.role === "system") capturedSystem = sysMsg.content; | ||
| return { | ||
| content: [{ type: "text", text: "A description." }], | ||
| finishReason: { unified: "stop", raw: undefined }, | ||
| usage: { | ||
| inputTokens: { | ||
| total: 10, | ||
| noCache: 10, | ||
| cacheRead: undefined, | ||
| cacheWrite: undefined, | ||
| }, | ||
| outputTokens: { total: 5, text: 5, reasoning: undefined }, | ||
| }, | ||
| warnings: [], | ||
| }; | ||
| }, | ||
| }); | ||
|
|
||
| await generateAltText({ model, imageUrl: DATA_URL, language: "en" }); | ||
|
|
||
| assert.ok(capturedSystem != null, "model should receive a system prompt"); | ||
| assert.ok(capturedSystem.length > 0, "system prompt should not be empty"); | ||
| }); | ||
|
|
||
| test("generateAltText() uses a Korean system prompt for Korean language", async () => { | ||
| let capturedSystem: string | undefined; | ||
| const model = new MockLanguageModelV3({ | ||
| doGenerate: async (options) => { | ||
| const sysMsg = options.prompt.find((m) => m.role === "system"); | ||
| if (sysMsg?.role === "system") capturedSystem = sysMsg.content; | ||
| return { | ||
| content: [{ type: "text", text: "설명입니다." }], | ||
| finishReason: { unified: "stop", raw: undefined }, | ||
| usage: { | ||
| inputTokens: { | ||
| total: 10, | ||
| noCache: 10, | ||
| cacheRead: undefined, | ||
| cacheWrite: undefined, | ||
| }, | ||
| outputTokens: { total: 5, text: 5, reasoning: undefined }, | ||
| }, | ||
| warnings: [], | ||
| }; | ||
| }, | ||
| }); | ||
|
|
||
| await generateAltText({ model, imageUrl: DATA_URL, language: "ko" }); | ||
|
|
||
| assert.ok(capturedSystem != null, "system prompt should be set"); | ||
| assert.ok( | ||
| capturedSystem.includes("한국어") || capturedSystem.includes("접근성"), | ||
| "Korean prompt should contain Korean-specific text", | ||
| ); | ||
| }); | ||
|
|
||
| test("generateAltText() falls back to English prompt for unsupported locales", async () => { | ||
| let capturedSystem: string | undefined; | ||
| const model = new MockLanguageModelV3({ | ||
| doGenerate: async (options) => { | ||
| const sysMsg = options.prompt.find((m) => m.role === "system"); | ||
| if (sysMsg?.role === "system") capturedSystem = sysMsg.content; | ||
| return { | ||
| content: [{ type: "text", text: "A description." }], | ||
| finishReason: { unified: "stop", raw: undefined }, | ||
| usage: { | ||
| inputTokens: { | ||
| total: 10, | ||
| noCache: 10, | ||
| cacheRead: undefined, | ||
| cacheWrite: undefined, | ||
| }, | ||
| outputTokens: { total: 5, text: 5, reasoning: undefined }, | ||
| }, | ||
| warnings: [], | ||
| }; | ||
| }, | ||
| }); | ||
|
|
||
| await generateAltText({ model, imageUrl: DATA_URL, language: "ar" }); | ||
|
|
||
| assert.ok(capturedSystem != null, "system prompt should be set"); | ||
| assert.ok( | ||
| capturedSystem.includes("accessibility") || | ||
| capturedSystem.includes("English"), | ||
| "should fall back to English prompt for unsupported locales", | ||
| ); | ||
| }); | ||
|
|
||
| test("generateAltText() includes note context in the user text when provided", async () => { | ||
| let capturedTextPart: string | undefined; | ||
| const model = new MockLanguageModelV3({ | ||
| doGenerate: async (options) => { | ||
| const userMsg = options.prompt.find((m) => m.role === "user"); | ||
| if (userMsg?.role === "user") { | ||
| const textPart = userMsg.content.find((p) => p.type === "text"); | ||
| if (textPart?.type === "text") capturedTextPart = textPart.text; | ||
| } | ||
| return { | ||
| content: [{ type: "text", text: "A cat." }], | ||
| finishReason: { unified: "stop", raw: undefined }, | ||
| usage: { | ||
| inputTokens: { | ||
| total: 10, | ||
| noCache: 10, | ||
| cacheRead: undefined, | ||
| cacheWrite: undefined, | ||
| }, | ||
| outputTokens: { total: 5, text: 5, reasoning: undefined }, | ||
| }, | ||
| warnings: [], | ||
| }; | ||
| }, | ||
| }); | ||
|
|
||
| await generateAltText({ | ||
| model, | ||
| imageUrl: DATA_URL, | ||
| language: "en", | ||
| context: "My home office setup", | ||
| }); | ||
|
|
||
| assert.ok(capturedTextPart?.includes("My home office setup")); | ||
| }); | ||
|
|
||
| test("generateAltText() does not add context hint when context is absent", async () => { | ||
| let capturedTextPart: string | undefined; | ||
| const model = new MockLanguageModelV3({ | ||
| doGenerate: async (options) => { | ||
| const userMsg = options.prompt.find((m) => m.role === "user"); | ||
| if (userMsg?.role === "user") { | ||
| const textPart = userMsg.content.find((p) => p.type === "text"); | ||
| if (textPart?.type === "text") capturedTextPart = textPart.text; | ||
| } | ||
| return { | ||
| content: [{ type: "text", text: "A photo." }], | ||
| finishReason: { unified: "stop", raw: undefined }, | ||
| usage: { | ||
| inputTokens: { | ||
| total: 10, | ||
| noCache: 10, | ||
| cacheRead: undefined, | ||
| cacheWrite: undefined, | ||
| }, | ||
| outputTokens: { total: 5, text: 5, reasoning: undefined }, | ||
| }, | ||
| warnings: [], | ||
| }; | ||
| }, | ||
| }); | ||
|
|
||
| await generateAltText({ model, imageUrl: DATA_URL, language: "en" }); | ||
|
|
||
| assert.ok(capturedTextPart != null); | ||
| assert.ok( | ||
| !capturedTextPart.toLowerCase().includes("context:"), | ||
| "no context hint should appear when context is absent", | ||
| ); | ||
| }); |
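Every test above returns the same mocked `doGenerate` result and varies only the response text. A minimal sketch of a helper that could factor this out is shown below; `mockTextResult` is a hypothetical name and is not part of this PR, and the field values are simply copied from the tests.

```typescript
// Hypothetical helper (not in the PR): builds the object that each mocked
// doGenerate in the tests returns, so that only the text varies.
function mockTextResult(text: string) {
  return {
    content: [{ type: "text" as const, text }],
    finishReason: { unified: "stop" as const, raw: undefined },
    usage: {
      inputTokens: {
        total: 10,
        noCache: 10,
        cacheRead: undefined,
        cacheWrite: undefined,
      },
      outputTokens: { total: 5, text: 5, reasoning: undefined },
    },
    warnings: [],
  };
}
```

Under that assumption, a mock body would shrink to `doGenerate: async () => mockTextResult("A description.")`, while tests that inspect `options.prompt` would keep their capture logic and return the helper's result at the end.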
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| import { readdir, readFile } from "node:fs/promises"; | ||
| import { join } from "node:path"; | ||
| import { | ||
| isLocale, | ||
| type Locale, | ||
| negotiateLocale, | ||
| } from "@hackerspub/models/i18n"; | ||
| import { generateText, type LanguageModel } from "ai"; | ||
|
|
||
| const MAX_CONTEXT_LENGTH = 1000; | ||
| const MAX_ALT_TEXT_TOKENS = 200; | ||
|
|
||
| const PROMPT_LANGUAGES: Locale[] = ( | ||
| await readdir( | ||
| join(import.meta.dirname!, "prompts", "alttext"), | ||
| { withFileTypes: true }, | ||
| ) | ||
| ).map((f) => f.name.replace(/\.md$/, "")).filter(isLocale); | ||
|
|
||
|
|
||
| const promptCache = new Map<string, string>(); | ||
|
|
||
| async function getAltTextPrompt(language: string): Promise<string> { | ||
| let locale: Intl.Locale; | ||
| try { | ||
| locale = new Intl.Locale(language); | ||
| } catch { | ||
| locale = new Intl.Locale("en"); | ||
| } | ||
| const promptLocale = negotiateLocale(locale, PROMPT_LANGUAGES) ?? | ||
| new Intl.Locale("en"); | ||
|
|
||
| const cacheKey = promptLocale.baseName; | ||
| const cached = promptCache.get(cacheKey); | ||
| if (cached != null) return cached; | ||
| const promptPath = join( | ||
| import.meta.dirname!, | ||
| "prompts", | ||
| "alttext", | ||
| `${cacheKey}.md`, | ||
| ); | ||
| const content = await readFile(promptPath, "utf8"); | ||
| promptCache.set(cacheKey, content); | ||
| return content; | ||
| } | ||
|
|
||
|
|
||
| export interface AltTextOptions { | ||
| model: LanguageModel; | ||
| imageUrl: string; | ||
| language: string; | ||
| context?: string; | ||
| } | ||
|
|
||
| export async function generateAltText( | ||
| options: AltTextOptions, | ||
| ): Promise<string> { | ||
| const { model, imageUrl, language } = options; | ||
| const context = options.context?.slice(0, MAX_CONTEXT_LENGTH); | ||
| const systemPrompt = await getAltTextPrompt(language); | ||
|
|
||
| const textContent = context | ||
| ? `Generate alt text for this image. Context from the accompanying note: ${context}` | ||
| : "Generate alt text for this image."; | ||
|
|
||
| const result = await generateText({ | ||
| model, | ||
| system: systemPrompt, | ||
| maxOutputTokens: MAX_ALT_TEXT_TOKENS, | ||
| messages: [{ | ||
| role: "user", | ||
| content: [ | ||
| { type: "image", image: new URL(imageUrl) }, | ||
| { type: "text", text: textContent }, | ||
| ], | ||
| }], | ||
| }); | ||
|
|
||
| return result.text.trim(); | ||
| } | ||
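The fallback behaviour of `getAltTextPrompt` (malformed tag, then unsupported locale, then English) can be illustrated in isolation. The sketch below is an approximation under stated assumptions: `pickPromptLocale` is a hypothetical stand-in for `negotiateLocale`, whose actual matching rules are not shown in this diff, and the `AVAILABLE` list is assumed from the prompt files added in this PR rather than read from disk.

```typescript
// Assumed list of prompt locales; the real code derives it with readdir().
const AVAILABLE: readonly string[] = ["en", "ja", "ko", "zh-CN", "zh-TW"];

// Parse a language tag, falling back to English when the tag is malformed.
// new Intl.Locale() throws a RangeError for tags such as "not a locale".
function parseLocale(tag: string): Intl.Locale {
  try {
    return new Intl.Locale(tag);
  } catch {
    return new Intl.Locale("en");
  }
}

// Hypothetical simplification of negotiateLocale(): prefer an exact match on
// the base name, then any prompt with the same language, then English.
function pickPromptLocale(
  tag: string,
  available: readonly string[] = AVAILABLE,
): string {
  const wanted = parseLocale(tag);
  const exact = available.find((a) => a === wanted.baseName);
  if (exact != null) return exact;
  const sameLanguage = available.find(
    (a) => new Intl.Locale(a).language === wanted.language,
  );
  return sameLanguage ?? "en";
}
```

With these assumptions, `pickPromptLocale("ko-KR")` selects the Korean prompt, and both `pickPromptLocale("ar")` and `pickPromptLocale("not a locale")` select the English one, which mirrors the cases the tests exercise.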
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,3 @@ | ||
| export { generateAltText } from "./alttext.ts"; | ||
| export { summarize } from "./summary.ts"; | ||
| export { translate } from "./translate.ts"; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| You are an accessibility assistant. Generate concise, descriptive alt text for the provided image so that visually impaired users understand what the image shows. | ||
|
|
||
| Rules: | ||
| - Write 1–3 short sentences describing the image objectively. | ||
| - Focus on the main subject, action, setting, and any relevant text visible in the image. | ||
| - Do not begin with "Image of", "Photo of", or similar redundant phrases. | ||
| - Do not include personal opinions or interpretations. | ||
| - Write in English. | ||
| - Keep it under 150 characters when possible. |
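Two of the rules in this prompt (no redundant opening phrase, and the soft 150-character limit) are mechanical enough to check after generation. The following is a minimal sketch of such a check, not something this PR implements; `checkAltText` and its report shape are hypothetical, and "picture of" is added here as one guess at a "similar redundant phrase".

```typescript
// Hypothetical post-generation check (not in the PR) for two prompt rules.
const REDUNDANT_PREFIXES = ["image of", "photo of", "picture of"];
const SOFT_MAX_LENGTH = 150;

interface AltTextReport {
  redundantPrefix: boolean; // starts with "Image of", "Photo of", or similar
  overSoftLimit: boolean; // longer than the soft 150-character target
}

function checkAltText(altText: string): AltTextReport {
  const trimmed = altText.trim();
  const normalized = trimmed.toLowerCase();
  return {
    redundantPrefix: REDUNDANT_PREFIXES.some((p) => normalized.startsWith(p)),
    // .length counts UTF-16 code units, which is only an approximation of
    // "characters" for text outside the Basic Multilingual Plane.
    overSoftLimit: trimmed.length > SOFT_MAX_LENGTH,
  };
}
```

Because the prompt says "when possible", a caller would more plausibly log or surface these flags than reject the generated text outright.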
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| あなたはアクセシビリティアシスタントです。視覚障害のあるユーザーが画像の内容を理解できるよう、提供された画像に対して簡潔で説明的な代替テキストを生成してください。 | ||
|
|
||
| ルール: | ||
| - 画像を客観的に説明する1〜3つの短い文を書きます。 | ||
| - 主な被写体、動作、背景、画像に見えるテキストに焦点を当てます。 | ||
| - 「画像:」「写真:」などの不要な接頭辞で始めないでください。 | ||
| - 個人的な意見や解釈は含めないでください。 | ||
| - 日本語で書きます。 | ||
| - できるだけ150文字以内に収めます。 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| 당신은 접근성 보조 도구입니다. 시각 장애인 사용자가 이미지의 내용을 이해할 수 있도록 제공된 이미지에 대한 간결하고 서술적인 대체 텍스트를 생성하세요. | ||
|
|
||
| 규칙: | ||
| - 이미지를 객관적으로 설명하는 1–3개의 짧은 문장을 작성합니다. | ||
| - 주요 피사체, 행동, 배경, 이미지에 보이는 관련 텍스트에 초점을 맞춥니다. | ||
| - "이미지:", "사진:", "그림:" 등 불필요한 접두사로 시작하지 않습니다. | ||
| - 개인적인 의견이나 해석은 포함하지 않습니다. | ||
| - 한국어로 작성합니다. | ||
| - 가능한 한 150자 이내로 유지합니다. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| 您是一个无障碍辅助工具。请为提供的图像生成简洁、描述性的替代文本,以便视觉障碍用户了解图像内容。 | ||
|
|
||
| 规则: | ||
| - 用1–3个简短句子客观描述图像。 | ||
| - 重点关注图像中的主要主体、动作、背景及可见文本。 | ||
| - 不要以"图像:"、"照片:"等冗余短语开头。 | ||
| - 不包含个人意见或解读。 | ||
| - 用简体中文书写。 | ||
| - 尽量控制在150个字符以内。 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| 您是一個無障礙輔助工具。請為提供的圖像生成簡潔、描述性的替代文字,以便視覺障礙使用者了解圖像內容。 | ||
|
|
||
| 規則: | ||
| - 用1–3個簡短句子客觀描述圖像。 | ||
| - 重點關注圖像中的主要主體、動作、背景及可見文字。 | ||
| - 不要以「圖像:」、「照片:」等冗贅短語開頭。 | ||
| - 不包含個人意見或解讀。 | ||
| - 用繁體中文書寫。 | ||
| - 盡量控制在150個字元以內。 |