10/31/2025
Product Development launching
ai-agents
- https://github.com/topics/ai-agents
- https://github.com/ashishpatel26/500-AI-Agents-Projects

References:
- https://www.youtube.com/watch?v=mNcXue7X8H0
- https://github.com/coleam00/ottomator-agents/tree/main/python-local-ai-agent
  - `git clone https://github.com/coleam00/ottomator-agents.git`
  - `cd ottomator-agents/python-local-ai-agent`
- https://www.youtube.com/watch?v=c5jHhMXmXyo
- https://www.youtube.com/watch?v=E4l91XKQSgw&t=3s

Summary of the last video:
This video provides a detailed, step-by-step tutorial for building a fully local AI agent using the Retrieval-Augmented Generation (RAG) pattern in a Python environment, specifically demonstrating how to ground a Large Language Model (LLM) in a private dataset.
The finished system takes a user's question, searches a local database for relevant context, and then generates an answer grounded in that context using an LLM running on the developer's computer.
Detailed Summary of the Local AI Agent Tutorial
1. Core Technology Stack
The project relies on a completely free, local, and open-source stack to ensure no external API keys or cloud services are needed.
- Local LLM Runtime: Ollama is used to download and run the LLM models locally on the host machine.
- Orchestration Framework: LangChain is used to connect all the components: the data loader, the vector database, the embedding model, and the LLM.
- Vector Database: ChromaDB is used as the local, on-disk database to store the vectorized representations of the private data.
2. Local Model Setup and Installation
The setup takes place in a Python environment within a terminal (similar to a WSL2 setup).
| Component | Model/Tool Used | Purpose |
|---|---|---|
| Core LLM | llama3.2 (via `ollama pull llama3.2`) | Used for the final question answering, text synthesis, and conversation. |
| Embedding Model | mxbai-embed-large (via `ollama pull mxbai-embed-large`) | Used to convert the source text (and later the user's questions) into numerical vectors for searching the database. |
| Python Dependencies | langchain, langchain-ollama, langchain-chroma, pandas | Libraries to manage the RAG flow, interact with Ollama, run the vector database, and read the CSV data. |
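
Before running the pipeline, it can help to confirm that the Python dependencies from the table are installed (for example with `pip install langchain langchain-ollama langchain-chroma pandas`). The following is a minimal sketch of such a check using only the standard library. The import names are assumptions derived from the package names (hyphens replaced by underscores), and `missing_modules` is a hypothetical helper, not part of the tutorial.

```python
import importlib.util

# Assumed import names for the packages listed in the table above
# (e.g. the pip package "langchain-ollama" is imported as "langchain_ollama").
REQUIRED_MODULES = ["langchain", "langchain_ollama", "langchain_chroma", "pandas"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All Python dependencies are importable.")
```

This only checks the Python side. The two Ollama models still have to be pulled separately with the `ollama pull` commands shown in the table.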
3. Data Preparation and Local RAG Pipeline
The tutorial uses a sample CSV file (realistic_restaurant_reviews.csv) containing reviews (Title, Date, Rating, Review) as the local knowledge source.
A. Data Loading and Document Creation (Vectorization)
- Load Data: The CSV file is loaded into a pandas DataFrame.
- Define Persistence: A folder (`./chrome_langchain_db`) is defined to save the vector database persistently, avoiding the need to re-embed data on every run.
- Document Structuring: The code iterates through the DataFrame, creating a LangChain `Document` object for each review.
  - The crucial `page_content` (the text used for vectorized search) is created by combining the Title and the full Review text.
  - Metadata (Rating, Date) is also attached but is not used for the vector search itself.
- Vector Store Creation: A Chroma vector store is created using the downloaded `OllamaEmbeddings` model and the defined persistence directory.
- Data Embedding: The documents are added to the vector store (`vector_store.add_documents(...)`), which automatically uses the local embedding model to convert the text into numerical vectors and stores them in ChromaDB.
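
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tutorial's exact code: the helper names `review_to_fields` and `build_vector_store` and the collection name `restaurant_reviews` are hypothetical, and the CSV is assumed to have the columns Title, Date, Rating, and Review described earlier. The heavy imports sit inside the function so that the row-to-document logic can be tried without Ollama or Chroma installed.

```python
def review_to_fields(index, row):
    """Split one review row into the pieces a LangChain Document needs."""
    return {
        # Title + Review is the text that gets embedded and searched.
        "page_content": f"{row['Title']} {row['Review']}",
        # Metadata travels with the document but is not embedded.
        "metadata": {"rating": row["Rating"], "date": row["Date"]},
        "id": str(index),
    }


def build_vector_store(csv_path="realistic_restaurant_reviews.csv",
                       db_location="./chrome_langchain_db"):
    """Create (or reopen) the persistent Chroma store for the reviews."""
    import os

    import pandas as pd
    from langchain_chroma import Chroma
    from langchain_core.documents import Document
    from langchain_ollama import OllamaEmbeddings

    # Embed only when the persistence folder does not exist yet.
    needs_embedding = not os.path.exists(db_location)

    embeddings = OllamaEmbeddings(model="mxbai-embed-large")
    vector_store = Chroma(
        collection_name="restaurant_reviews",  # hypothetical collection name
        persist_directory=db_location,
        embedding_function=embeddings,
    )

    if needs_embedding:
        df = pd.read_csv(csv_path)
        # to_dict("records") yields plain Python values for the metadata.
        fields = [review_to_fields(i, row)
                  for i, row in enumerate(df.to_dict("records"))]
        documents = [Document(page_content=f["page_content"], metadata=f["metadata"])
                     for f in fields]
        vector_store.add_documents(documents=documents, ids=[f["id"] for f in fields])

    return vector_store
```

Checking for the folder before constructing the store matters, because opening a persistent Chroma store may create the directory as a side effect.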
B. Creating the Retriever
- The Chroma vector store is converted into a `retriever` object.
- The retriever is configured to look up the top 5 most relevant documents (`search_kwargs={"k": 5}`) when queried.
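
To make the "top 5 most relevant" idea concrete, here is a small sketch. `make_retriever` wraps the `as_retriever` call described above. The rest is a toy illustration of top-k ranking with hand-written two-dimensional vectors and cosine similarity. The similarity metric, the vectors, and the helper names are all assumptions for illustration. Chroma's actual index and distance metric may differ.

```python
import math


def make_retriever(vector_store, k=5):
    """Turn a LangChain vector store into a retriever returning the top k hits."""
    return vector_store.as_retriever(search_kwargs={"k": k})


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=5):
    """Return the texts of the k stored (text, vector) pairs most similar to the query."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy "embeddings": the first axis loosely means "vegan food", the second "service".
toy_store = [
    ("great vegan pizza", [0.9, 0.1]),
    ("slow service", [0.1, 0.9]),
    ("vegan options were limited", [0.8, 0.3]),
]
print(top_k([1.0, 0.0], toy_store, k=2))
```

In the real pipeline the vectors come from the embedding model rather than being written by hand, and the vector store performs the ranking.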
4. Final Agent Orchestration
The logic connects the local data source (RAG) and the local LLM.
- LLM Initialization: An `OllamaLLM` object is instantiated using the `llama3.2` model.
- Prompt Template: A prompt template is defined to instruct the LLM, with two dynamic placeholders:
  - `{reviews}`: For the context retrieved from the database.
  - `{question}`: For the user's original question.
- Chain Creation: A single, simple chain is created: `chain = prompt | model`.
- The Conversational Loop (RAG in Action):
  - The script enters a continuous loop asking the user for a question.
  - When a question is entered:
    - Retrieval: `reviews = retriever.invoke(question)` sends the question to the vector database, which returns the top 5 most relevant documents (e.g., reviews about "vegan options").
    - Generation: `result = chain.invoke({"reviews": reviews, "question": question})` feeds the retrieved context and the question to the LLM.
    - Output: The LLM synthesizes a contextual answer based only on the retrieved reviews and prints the final result.
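
The orchestration described above might look roughly like the sketch below. The prompt wording, the `q`-to-quit convention, and the helper names (`build_chain`, `answer`, `chat_loop`) are assumptions for illustration. Only the `{reviews}` and `{question}` placeholders, the `prompt | model` chain, and the two `invoke` calls come from the summary. The LangChain imports are kept inside `build_chain` so the loop logic can be exercised with stand-in objects.

```python
# Assumed prompt wording; only the two placeholders are taken from the summary.
PROMPT_TEMPLATE = """You are an expert in answering questions about a restaurant.

Here are some relevant reviews: {reviews}

Here is the question to answer: {question}
"""


def build_chain(model_name="llama3.2"):
    """Create the prompt | model chain backed by a local Ollama model."""
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_ollama.llms import OllamaLLM

    model = OllamaLLM(model=model_name)
    prompt = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    return prompt | model


def answer(question, retriever, chain):
    """Run one RAG step: retrieve context, then generate an answer from it."""
    reviews = retriever.invoke(question)
    return chain.invoke({"reviews": reviews, "question": question})


def chat_loop(retriever, chain, read=input, write=print):
    """Keep answering questions until the user types 'q'."""
    while True:
        question = read("Ask your question (q to quit): ").strip()
        if question == "q":
            break
        write(answer(question, retriever, chain))
```

With Ollama running and a retriever already built from the vector store, the loop would be started with something like `chat_loop(retriever, build_chain())`. Passing `read` and `write` as parameters keeps the loop testable without a terminal.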
This pipeline demonstrates a fully local AI agent capable of grounded, contextual question answering, and it can serve as the foundation for your museum app concept.