Segmentation and Semantic Classification with Pretrained Models

An advanced computer vision system combining Meta's SAM (Segment Anything Model) and OpenAI's CLIP for intelligent object detection, segmentation, and natural language-based identification in images.

This project presents an intelligent animal detection and classification system that combines state-of-the-art computer vision models to identify and locate animals in images using natural language prompts. The system leverages Meta's Segment Anything Model (SAM) for precise object segmentation and OpenAI's CLIP for semantic understanding, creating a powerful tool for wildlife monitoring and research applications.

Video 1. Real-time animal detection and classification demonstration.

Processing Workflow

```
# Simplified workflow representation
1. Load input image              → Image preprocessing
2. Generate object masks         → SAM segmentation
3. Filter high-quality masks     → Quality assessment
4. Extract object crops          → Bounding box extraction
5. Classify each crop            → CLIP classification
6. Match user prompt             → Semantic matching
7. Generate output visualization → Result overlay
```
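The filtering, cropping, and matching steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: the SAM and CLIP model calls are replaced by toy stand-ins, the thresholds are placeholders, and only the `predicted_iou` and `[x, y, w, h]` bbox fields follow SAM's documented mask-output format.

```python
from typing import Dict, List

def filter_masks(masks: List[Dict], min_iou: float = 0.9) -> List[Dict]:
    """Step 3: keep only masks whose predicted IoU clears the quality bar."""
    return [m for m in masks if m["predicted_iou"] >= min_iou]

def crop_from_bbox(image: List[List], bbox: List[int]) -> List[List]:
    """Step 4: cut a crop out of an H x W image using an [x, y, w, h] box."""
    x, y, w, h = bbox
    return [row[x:x + w] for row in image[y:y + h]]

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = (sum(p * p for p in a) ** 0.5) * (sum(q * q for q in b) ** 0.5)
    return dot / norm

def match_prompt(crop_embeddings: List[List[float]],
                 prompt_embedding: List[float],
                 threshold: float = 0.5) -> List[int]:
    """Steps 5-6: return indices of crops whose (CLIP-style) embedding
    is cosine-similar to the user's prompt embedding."""
    return [i for i, emb in enumerate(crop_embeddings)
            if cosine(emb, prompt_embedding) >= threshold]
```

In the real pipeline the embeddings would come from CLIP's image and text encoders and the masks from SAM's automatic mask generator; the control flow, however, mirrors the numbered steps.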