Breathing Life into Pixels: Building the Living Portfolio with Veo 3.1
How I engineered a hallucination-free cinemagraph pipeline using Gemini Vision, the Anchor Rule, and Cloudinary transformations.
Static photography captures a moment, but memory is fluid. My goal for the Living Portfolio was to bridge this gap, transforming my best shots into 8-second looping cinemagraphs that feel alive without losing their soul.
The challenge with generative video is control. Most "image-to-video" models treat the source image as a suggestion: they morph the subject beyond recognition, hallucinate bizarre new elements, or shift the camera angle so drastically that the composition is ruined. I didn't want a movie based on my photo; I wanted my photo in motion.
To achieve this, I built a strict pipeline using Google Veo 3.1, Gemini Vision, and Cloudinary. This post breaks down the three architectural pillars that make it work: the Anchor Rule for prompt engineering, the Square Fix for aspect ratios, and Phantom Loading for a seamless user experience.
The Living Portfolio Pipeline
The first hurdle was preventing Veo from "dreaming" too much. Without careful constraints, the model might hallucinate unpredictable new elements (a static cityscape sprouting buildings, for example), or shift the camera angle so far that the original composition becomes unrecognizable.
I solved this with what I call the Anchor Rule. Instead of prompting Veo directly, I use Gemini 3 Flash (a multimodal vision model) to analyze the image first. I feed Gemini a strict system prompt that forces it to act as a "conservative cinematographer." It must describe only what is visible and constrain motion to subtle, logical evolutions (breathing, wind, flowing water).
Here is the system prompt that enforces these constraints:
export const VEO_PROMPT_SYSTEM = `You are an expert Cinematographer and Prompt Engineer for Google Veo 3.1. Your task is to analyze provided images and generate a specific, high-fidelity video generation prompt for each one.

## CONTEXT
These prompts will be used in a Veo 3.1 "Image-to-Video" workflow where the provided image is strictly the **First Frame**. The video will start matching the image pixels exactly and evolve naturally for 8 seconds at 1080p resolution.

## CRITICAL CONSTRAINTS
1. **SILENT VISUALS (Priority):** The video output will be muted. **DO NOT** include audio cues (e.g., "sirens wailing"), dialogue quotes, or speaking actions. Visuals must be self-explanatory.
2. **THE ANCHOR RULE:**
   - **No Morphing:** Do not transform the subject into something else. The "First Frame" is the ground truth.
   - **Subject Consistency:** Do not hallucinate major subjects (crowds, cars) that are not visible in the image unless a specific camera movement would logically reveal them.
   - **Lighting Match:** If the image is dark, use "dim," "noir," or "shadowy."
3. **DURATION AWARENESS (8 Seconds):** Describe an action that takes 8 seconds to unfold. Avoid actions that are too brief without secondary motion.

## PROMPT FORMULA
Construct the prompt using this specific order to match Veo's attention mechanism:
**[Composition & Lens]** + **[Camera Movement]** + **[Subject Action/Evolution]** + **[Atmosphere & Lighting]** + **[Style Keywords]**

## ANALYSIS GUIDELINES
- **Composition & Lens:** "Wide shot," "Close-up," "Macro lens," "Shallow depth of field," "Low angle," "Eye-level."
- **Camera Movement:** "Slow dolly forward," "Tracking drone view," "Truck left," "Rack focus," "Slow pan," "Static shot with internal motion."
- **Action (Silent):**
  - _Portraits:_ "Subtle breathing," "eyes shifting focus," "hair floating in wind," "gentle smile forming."
  - _Landscapes:_ "Clouds drifting," "water flowing," "fog rolling," "branches swaying," "sunlight dappling."
  - _Objects:_ "Reflections shimmering," "steam rising," "dust motes dancing."
- **Atmosphere:** "Warm tones," "Cool blue tones," "Golden hour," "Cinematic lighting," "Volumetric fog."
- **Style:** "Photorealistic," "Cinematic," "High fidelity," "Film grain."`;
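For context, here is a minimal sketch of what the analysis call could look like with the @google/genai SDK. The function name, model id, and plumbing are illustrative, not the production code; swap in whichever Gemini Flash model you have access to:

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Sketch: send the photo plus the system prompt, get back a Veo-ready prompt.
export async function getVeoPrompt(imageBase64: string): Promise<string> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash", // placeholder; substitute the Flash model used in production
    config: { systemInstruction: VEO_PROMPT_SYSTEM },
    contents: [
      {
        role: "user",
        parts: [
          { inlineData: { mimeType: "image/jpeg", data: imageBase64 } },
          { text: "Generate the Veo 3.1 prompt for this image." },
        ],
      },
    ],
  });
  return response.text ?? "";
}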
Video models are trained on television standards (16:9) or social media verticals (9:16). Veo 3.1 only supports these two aspect ratios when using image-to-video generation. For photography with different aspect ratios, like 4:5 or the classic 1:1 square, this creates a challenge.
To solve this, I implemented the Square Fix. The strategy is straightforward: Always request a standard video format, then crop it later.
When providing an image as input to Veo, I force it to generate either 16:9 (for landscape or square photos with aspect ratio ≄ 1.0) or 9:16 (for portrait photos with aspect ratio < 1.0). Veo adds black bars (letterboxing or pillarboxing) around my original photo composition to fill the requested format. A 4:5 portrait at 1080Ɨ1350, for instance, sits inside a 1080Ɨ1920 frame with 285px bars above and below. This ensures Veo generates high-fidelity pixels in the center where my photo lives, even though the edges contain wasted space.
The logic lives in analyzeAspectRatio:
export function analyzeAspectRatio(
  width: number,
  height: number,
): AspectRatioInfo {
  const ratio = width / height;

  let orientation: AspectRatioInfo["orientation"];
  if (ratio > 1.01) {
    orientation = "landscape";
  } else if (ratio < 0.99) {
    orientation = "portrait";
  } else {
    orientation = "square";
  }

  // The Square Fix: AR >= 1 uses 16:9, AR < 1 uses 9:16
  const veoFormat: VeoAspectRatio = ratio >= 1 ? "16:9" : "9:16";

  return {
    width,
    height,
    ratio,
    orientation,
    veoFormat,
  };
}
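A quick worked example of how the helper behaves on the two tricky cases:

// 1080 Ɨ 1350 is a classic 4:5 portrait
const portrait = analyzeAspectRatio(1080, 1350);
// portrait.ratio ≈ 0.8  ->  orientation: "portrait", veoFormat: "9:16"

// A 1:1 square falls inside the 0.99–1.01 band
const square = analyzeAspectRatio(2048, 2048);
// square.ratio = 1  ->  orientation: "square", veoFormat: "16:9"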
Now I had a problem: I had raw 16:9 or 9:16 videos with ugly black bars, and I needed to restore them to the original aspect ratio of the photo. Running FFmpeg on a serverless function to crop video is slow, expensive, and memory-intensive.
The elegant solution was Cloudinary Incoming Transformations. Instead of processing the video myself, I configure the upload parameters to tell Cloudinary to crop the video as it arrives. I also strip the audio track (audio_codec: "none") because cinemagraphs are silent, saving roughly 30% on file size.
This happens in getVideoUploadConfig. Notice how the transformation array is applied during the upload, so I never even store the wasted pixels:
export function getVideoUploadConfig(options: VideoUploadConfigOptions) {
  const {
    publicId,
    overwrite = true,
    preserveAudio = false,
    useOriginalAspectRatio = false,
    originalWidth,
    originalHeight,
  } = options;

  const transformation: Record<string, any> = {
    quality: "auto:best",
    // Only strip audio if preserveAudio is false (default behavior)
    ...(preserveAudio ? {} : { audio_codec: "none" }),
  };

  // Add cropping/aspect ratio transformations if requested
  if (!useOriginalAspectRatio && originalWidth && originalHeight) {
    const aspectRatio = calculateAspectRatio(originalWidth, originalHeight);
    transformation.crop = "fill";
    transformation.gravity = "center";
    transformation.aspect_ratio = aspectRatio;
  }

  return {
    public_id: publicId,
    overwrite,
    resource_type: "video",
    transformation: [transformation],
  };
}
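For completeness, here is a hedged sketch of how this config might be consumed by the uploadVideoWithTransformations helper referenced in the workflow below, using the Cloudinary Node SDK. The calculateAspectRatio helper above presumably returns a "w:h" string, and the public-ID scheme here is illustrative:

import { v2 as cloudinary } from "cloudinary";

// Credentials come from the CLOUDINARY_URL environment variable.
export async function uploadVideoWithTransformations(
  videoUri: string,
  opts: { originalWidth: number; originalHeight: number; photoId: string },
) {
  const config = getVideoUploadConfig({
    publicId: `cinemagraphs/${opts.photoId}`, // naming scheme is illustrative
    originalWidth: opts.originalWidth,
    originalHeight: opts.originalHeight,
  });

  // Cloudinary fetches the remote Veo URI and applies the incoming
  // transformation (center-crop + audio strip) before storing the asset.
  return cloudinary.uploader.upload(videoUri, config);
}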
The entire process is asynchronous. When a photo is uploaded, an Inngest function triggers automatically. It handles the retry logic (Veo can be rate-limited), checks for existing prompts to save money, and manages the state transitions in the database.
The Inngest workflow shows how the system chains these steps together, ensuring the database is only updated when the final, clean video is ready:
export const generateVideoFunction = inngest.createFunction(
  {
    id: "generate-video",
    name: "Generate Cinemagraph Video",
    retries: 5,
    concurrency: { limit: 2 },
  },
  { event: "photo.uploaded" },
  async ({ event, step, logger }) => {
    // ... validation steps ...

    // Step 4: Generate new prompt with Gemini
    const veoPrompt = await step.run("get-veo-prompt", async () => {
      // Logic to call Gemini 3 Flash here...
      return prompt;
    });

    // Step 5: Generate video with Veo
    const veoResult = await step.run("generate-veo-video", async () => {
      const aspectRatio = getVeoAspectRatio(width, height);
      return await generateVideo({
        prompt: veoPrompt,
        imageUrl: uploadthingUrl,
        aspectRatio,
        durationSeconds: 8,
      });
    });

    // Step 6: Upload to Cloudinary with auto-crop
    const cloudinaryResult = await step.run("upload-to-cloudinary", async () => {
      return await uploadVideoWithTransformations(veoResult.videoUri, {
        originalWidth: width,
        originalHeight: height,
        photoId,
      });
    });

    // ... database updates ...
  },
);
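Kicking off the pipeline is then a single event sent from the upload handler. A minimal sketch, assuming the same inngest client and an event payload shaped like the fields the workflow reads:

// Fired after the photo lands in storage; field names are illustrative.
await inngest.send({
  name: "photo.uploaded",
  data: {
    photoId,
    uploadthingUrl,
    width,
    height,
  },
});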
On the frontend, the goal was "zero layout shift." I didn't want a loading spinner to replace the beautiful high-res photo while the video buffered.
I created a Phantom Loading pattern in the VideoPlayer component. The high-res static image stays visible, absolutely positioned over the video element, while the video loads in the background (muted, with playsInline for iOS compatibility). Only when the onCanPlay event fires does the video fade in, replacing the static image seamlessly.
The full implementation lives in the VideoPlayer component.
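As a minimal sketch of the pattern (prop names and Tailwind-style classes here are illustrative, not the production code):

import { useState } from "react";

export function VideoPlayer({
  videoUrl,
  posterUrl,
  alt,
}: {
  videoUrl: string;
  posterUrl: string;
  alt: string;
}) {
  const [videoReady, setVideoReady] = useState(false);

  return (
    <div className="relative">
      {/* Phantom layer: the high-res still sits on top until the video can play */}
      <img
        src={posterUrl}
        alt={alt}
        className={`absolute inset-0 h-full w-full object-cover transition-opacity duration-700 ${
          videoReady ? "opacity-0" : "opacity-100"
        }`}
      />
      <video
        src={videoUrl}
        muted
        loop
        autoPlay
        playsInline // required for inline autoplay on iOS
        onCanPlay={() => setVideoReady(true)}
        className="h-full w-full object-cover"
      />
    </div>
  );
}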
The Living Portfolio turns a static archive into a dynamic experience. By chaining Gemini's understanding of the image, Veo's generation capabilities, and Cloudinary's on-the-fly processing, I built a system that respects the photographer's original composition while adding a new dimension of motion.
Check a photo for yourself: Whispers on the Water (animation demo).
The key takeaway is that Generative AI works best when constrained. By strictly defining the prompt with the Anchor Rule and mechanically enforcing composition with the Square Fix, we can tame the hallucination problem and ship reliable, production-grade AI features.