Introduction
LLM Vision is a Home Assistant integration that leverages multimodal LLMs to analyze images, videos, and live camera feeds, answer questions about them, and update sensors based on extracted data. The integration can optionally store analyzed images along with their AI summaries as events. These events can be viewed on the dashboard using the LLM Vision Timeline Card.
The integration also has a built-in memory that stores reference images along with a description, so the model can easily recognize known people, objects, or pets in future analyses.
The diagram below is a high-level overview of the different components.
LLM Vision consists of multiple key parts, which need to be installed separately:
Integration (LLM Vision)
Blueprint (event_summary)
Timeline Card
At its core, the integration processes camera streams and sends them to the AI provider. It consists of providers such as "Timeline" and "Memory", as well as AI providers such as OpenAI. Providers are configuration entries that add functionality to the integration.
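For illustration, the sketch below calls one of the integration's analyzer actions directly, with the AI provider selected through its configuration entry. The action name (llmvision.image_analyzer) and parameters shown reflect the integration's documented schema, but the camera entity and the provider entry ID are placeholders, and exact parameter names may vary between versions.

```yaml
# Hedged sketch: ask a configured AI provider to describe a camera snapshot.
action: llmvision.image_analyzer
data:
  provider: 01JABCDEF1234567890   # hypothetical provider config entry ID
  message: Describe what is happening in this image.
  image_entity:
    - camera.front_door           # hypothetical camera
  max_tokens: 100
response_variable: analysis       # analysis.response_text holds the summary
```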
The blueprint is a ready-made automation that is easy to use but customizable. It is triggered by a camera or binary sensor state change. When triggered, stream_analyzer is called, which returns a summary of what is happening in the camera feed. This summary is then sent to your phone as a Home Assistant app notification. Optionally, the analyzed "event" can also be saved to the timeline.
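A minimal hand-written automation approximating this flow might look like the sketch below. The motion sensor, camera, and notify target are hypothetical placeholders, and the assumption that stream_analyzer returns its summary as response_text should be checked against your installed version of the integration.

```yaml
# Hedged sketch of the flow the event_summary blueprint automates:
# motion triggers an analysis of the camera stream, and the summary
# is sent as a phone notification.
alias: Front door event summary (sketch)
triggers:
  - trigger: state
    entity_id: binary_sensor.front_door_motion  # hypothetical sensor
    to: "on"
actions:
  - action: llmvision.stream_analyzer
    data:
      provider: 01JABCDEF1234567890  # hypothetical provider entry ID
      image_entity:
        - camera.front_door          # hypothetical camera
      duration: 5                    # seconds of stream to sample
      max_frames: 3                  # frames sent to the model
      message: Summarize what is happening in one sentence.
    response_variable: event
  - action: notify.mobile_app_my_phone  # hypothetical notify target
    data:
      title: Front door
      message: "{{ event.response_text }}"
```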
Timeline is a provider that adds the ability to store events so they can be viewed on the dashboard through the Timeline Card.
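Assuming the analyzer actions accept a remember flag, as the integration documents, saving an analyzed event to the Timeline is one extra parameter in the call:

```yaml
# Hedged sketch: `remember` stores the result as a Timeline event,
# assuming the Timeline provider has been set up.
action: llmvision.stream_analyzer
data:
  provider: 01JABCDEF1234567890   # hypothetical provider entry ID
  image_entity:
    - camera.front_door           # hypothetical camera
  duration: 5
  message: Summarize what is happening.
  remember: true                  # save image + summary to the Timeline
```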
Continue to learn how to install and set up these components.