Video to Caption
Introduction
Installation
# Linux/macOS (bash): you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-multimodal gllm-inference
# Windows PowerShell: you can use a Conda environment
$token = (gcloud auth print-access-token)
pip install --extra-index-url "https://oauth2accesstoken:$token@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-multimodal gllm-inference
REM Windows Command Prompt: you can use a Conda environment
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-multimodal gllm-inference
Quickstart
import asyncio
from gllm_inference.schema import Attachment
from gllm_multimodal.modality_converter.video_to_text.video_to_caption import LMBasedVideoToCaption
video = Attachment.from_path("./sample_video.mp4")
converter = LMBasedVideoToCaption.from_preset("default")
# The converter expects raw bytes for the video input
result = asyncio.run(converter.convert(video.data))
# The result is a TextResult object
print(f"Video Summary: {result.result}")
# Access detailed segments from metadata
for segment in result.metadata["segments"]:
    print(f"Segment ({segment['start_time']}s - {segment['end_time']}s):")
    for caption in segment.get("segment_caption", []):
        print(f" - {caption}")
Expected output format
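The exact output depends on the video and the model, so it is not reproduced here. As a minimal sketch, assuming the metadata has the shape the Quickstart loop reads (a "segments" list whose items carry "start_time", "end_time", and an optional "segment_caption" list), the helper below renders it as plain-text lines. The function name format_segments and the sample data are hypothetical and used only for illustration; they are not part of gllm-multimodal.

```python
from typing import Any


def format_segments(metadata: dict[str, Any]) -> list[str]:
    """Render caption segments as plain-text lines.

    Assumes the shape read by the Quickstart loop: metadata["segments"] is a
    list of dicts with "start_time", "end_time", and an optional
    "segment_caption" list of strings.
    """
    lines: list[str] = []
    for segment in metadata.get("segments", []):
        lines.append(f"Segment ({segment['start_time']}s - {segment['end_time']}s):")
        # A segment without captions contributes only its header line.
        for caption in segment.get("segment_caption", []):
            lines.append(f" - {caption}")
    return lines


# Hypothetical sample data, for illustration only (not real converter output).
sample_metadata = {
    "segments": [
        {"start_time": 0, "end_time": 5, "segment_caption": ["A person opens a laptop."]},
        {"start_time": 5, "end_time": 12},  # no captions for this segment
    ]
}

print("\n".join(format_segments(sample_metadata)))
```

With a real result, the same helper could be called as format_segments(result.metadata), provided the metadata follows the assumed shape.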
Contextual video captioning
Using image one liner and description
Adding domain knowledge and metadata
Using attachment context
Customize model and prompt