Project

Vaisight
Computer Vision Extraction

A small service pattern for turning image inputs into structured outputs you can use in real workflows.

Overview

Vaisight is a lightweight computer-vision project that demonstrates a simple pattern: accept an image, run it through a vision model or API, extract a set of signals, and return structured output that other systems can consume. Keith Azodeh uses projects like this to explore the boundary between raw media inputs and operationally useful data.

The problem

Many workflows contain information that is visually present but not digitally structured. Receipts, screenshots, labels, price tags, and documents often need to be converted into text or fields before they can be stored, searched, or acted on. The engineering challenge is less about whether the text can be read and more about building a small, reliable service that turns vision outputs into clean data with sensible error handling.

Solution pattern

1) A minimal API surface

The service is designed around a single job: receive an image payload and respond with extracted information. Keeping the surface minimal makes it easier to secure, test, and evolve.
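As a minimal sketch of what a single-job surface could look like, the function below takes image bytes and returns a status code plus a JSON-ready body. The name handle_extract, the size limit, and the status codes are illustrative assumptions, not details taken from Vaisight itself; the extract argument stands in for whatever vision step sits behind the endpoint.

```python
from typing import Any, Callable, Dict, Tuple

# Assumed upper bound on payload size; not a value from the project.
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def handle_extract(
    image_bytes: bytes,
    extract: Callable[[bytes], Dict[str, Any]],
) -> Tuple[int, Dict[str, Any]]:
    """Single entry point: image bytes in, (status, body) out."""
    # Reject bad input before spending time on the vision step.
    if not image_bytes:
        return 400, {"error": "empty image payload"}
    if len(image_bytes) > MAX_IMAGE_BYTES:
        return 413, {"error": "image too large"}
    try:
        return 200, extract(image_bytes)
    except Exception as exc:
        # Any failure in the vision step becomes one predictable error shape.
        return 502, {"error": f"extraction failed: {exc}"}
```

Because the handler is a plain function, it can be wrapped by any web framework and tested without starting a server.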

2) Model/API integration

The vision step can be powered by a managed API or a local model. The key design choice is to isolate the integration behind a small adapter so the rest of the system remains stable if the provider changes.
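One way to sketch the adapter idea is a small interface with one method, implemented once per provider. Everything named here (VisionAdapter, ManagedApiAdapter, LocalModelAdapter, the "fullText" response key) is hypothetical and chosen for illustration; a real provider's client and response shape would differ.

```python
from typing import Any, Callable, Dict, Protocol


class VisionAdapter(Protocol):
    """The only contract the rest of the service depends on."""

    def read_text(self, image_bytes: bytes) -> str: ...


class ManagedApiAdapter:
    """Stand-in for a hosted vision API (hypothetical response shape)."""

    def __init__(self, call_api: Callable[[bytes], Dict[str, Any]]) -> None:
        # call_api is injected; a real one would make a network request.
        self._call_api = call_api

    def read_text(self, image_bytes: bytes) -> str:
        response = self._call_api(image_bytes)
        return response["fullText"]  # assumed key, not a real provider field


class LocalModelAdapter:
    """Stand-in for a local model exposed as a callable returning text."""

    def __init__(self, model: Callable[[bytes], str]) -> None:
        self._model = model

    def read_text(self, image_bytes: bytes) -> str:
        return self._model(image_bytes)


def extract_text(adapter: VisionAdapter, image_bytes: bytes) -> str:
    # Callers never learn which provider produced the text.
    return adapter.read_text(image_bytes).strip()
```

Swapping providers then means writing one new adapter class; extract_text and everything downstream stay unchanged.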

3) Post-processing into a schema

Raw OCR output is rarely enough on its own. It needs to be normalized into a simple schema, for example: { text, candidates, confidence, extractedFields }. Post-processing might include whitespace cleanup, regex parsing, and basic validation before the result is returned.
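The post-processing step might look something like the sketch below, which maps raw text onto the example schema. The price pattern, the "total" field, and the function name to_schema are assumptions made for a receipt-style input; the project may parse different fields entirely.

```python
import re
from typing import Any, Dict

# Assumed pattern: amounts written with exactly two decimal places.
PRICE_RE = re.compile(r"\d+\.\d{2}")
# Assumed pattern: the word "total" followed by the nearest amount.
TOTAL_RE = re.compile(r"\btotal\b\D*(\d+\.\d{2})", re.IGNORECASE)


def to_schema(raw_text: str, confidence: float) -> Dict[str, Any]:
    """Normalize raw OCR text into { text, candidates, confidence, extractedFields }."""
    # Cleanup: collapse runs of whitespace and newlines into single spaces.
    text = " ".join(raw_text.split())

    # Regex parsing: every price-like token is kept as a candidate.
    candidates = PRICE_RE.findall(text)

    extracted_fields: Dict[str, Any] = {}
    match = TOTAL_RE.search(text)
    if match:
        extracted_fields["total"] = float(match.group(1))

    # Basic validation: keep confidence inside [0, 1].
    confidence = max(0.0, min(1.0, confidence))

    return {
        "text": text,
        "candidates": candidates,
        "confidence": confidence,
        "extractedFields": extracted_fields,
    }
```

Keeping candidates alongside extractedFields lets a downstream consumer fall back to the raw matches when a specific field could not be identified.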

Why it matters

Vaisight is useful as a reference because it shows how to make an AI integration practical. A common failure mode with AI integrations is building a demo that cannot be operated in production. A small extraction service is an example of an integration that can be made testable (each step can be exercised in isolation), measurable (confidence and error rates can be tracked), and replaceable (the vision provider sits behind an adapter).
