
10/31/2025

Product Development launching

ai-agents

https://github.com/topics/ai-agents https://github.com/ashishpatel26/500-AI-Agents-Projects

Refer:
https://www.youtube.com/watch?v=mNcXue7X8H0
https://github.com/coleam00/ottomator-agents/tree/main/python-local-ai-agent
git clone https://github.com/coleam00/ottomator-agents.git
cd ottomator-agents/python-local-ai-agent

https://www.youtube.com/watch?v=c5jHhMXmXyo
https://www.youtube.com/watch?v=E4l91XKQSgw&t=3s

Summary of the last video:


This video provides a detailed, step-by-step tutorial for building a fully local AI agent using the Retrieval-Augmented Generation (RAG) pattern in a Python environment, specifically demonstrating how to ground a Large Language Model (LLM) in a private dataset.

The finished system takes a user's question, searches a local database for relevant context, and then generates an answer grounded in that context using an LLM running on the developer's own computer.


Detailed Summary of the Local AI Agent Tutorial

1. Core Technology Stack

The project relies on a free, local, open-source stack, so no external API keys or cloud services are needed.

  • Local LLM Runtime: Ollama is used to download and run the LLM models locally on the host machine.
  • Orchestration Framework: LangChain is used to connect all the components: the data loader, the vector database, the embedding model, and the LLM.
  • Vector Database: ChromaDB is used as the local, on-disk database to store the vectorized representations of the private data.

2. Local Model Setup and Installation

The setup takes place in a Python environment within a terminal (similar to a WSL2 setup).

| Component | Model/Tool Used | Purpose |
| --- | --- | --- |
| Core LLM | llama3.2 (via ollama pull llama3.2) | Used for the final question answering, text synthesis, and conversation. |
| Embedding Model | mxbai-embed-large (via ollama pull mxbai-embed-large) | Used to convert the source text (and later the user's questions) into numerical vectors for searching the database. |
| Python Dependencies | langchain, langchain-ollama, langchain-chroma, pandas | Libraries to manage the RAG flow, interact with Ollama, run the vector database, and read the CSV data. |
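
Before running anything, it can help to confirm that the pieces listed above are actually present. The following is a minimal sketch, not something shown in the video: the helper names are hypothetical, and it assumes each pip package is imported under the same name with hyphens replaced by underscores.

```python
import importlib.util
import shutil

# Import names assumed to correspond to the pip packages listed above.
REQUIRED_MODULES = ["langchain", "langchain_ollama", "langchain_chroma", "pandas"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ollama_on_path():
    """Return True if an `ollama` executable is on PATH."""
    return shutil.which("ollama") is not None


def environment_report():
    """Collect both checks into one dictionary."""
    return {
        "ollama_cli_found": ollama_on_path(),
        "missing_modules": missing_modules(REQUIRED_MODULES),
    }
```

Calling `environment_report()` returns a dictionary; an empty `missing_modules` list and `ollama_cli_found` set to `True` suggest the setup is complete. It does not check whether the two models have already been pulled.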

3. Data Preparation and Local RAG Pipeline

The tutorial uses a sample CSV file (realistic_restaurant_reviews.csv) containing reviews (Title, Date, Rating, Review) as the local knowledge source.

A. Data Loading and Document Creation (Vectorization)

  1. Load Data: The CSV file is loaded into a pandas DataFrame.
  2. Define Persistence: A folder (./chrome_langchain_db) is defined to save the vector database persistently, avoiding the need to re-embed data on every run.
  3. Document Structuring: The code iterates through the DataFrame, creating a LangChain Document object for each review.
    • The crucial page_content (the text used for vectorized search) is created by combining the Title and the full Review text.
    • Metadata (Rating, Date) is also attached but is not used for the vector search itself.
  4. Vector Store Creation: A Chroma vector store is created using the downloaded OllamaEmbeddings model and the defined persistence directory.
  5. Data Embedding: The documents are added to the vector store (vector_store.add_documents(...)), which automatically uses the local embedding model to convert the text into numerical vectors and stores them in ChromaDB.
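
The five steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the tutorial's exact code: the function names and the collection name `restaurant_reviews` are made up here, the CSV is read with the standard-library `csv` module instead of pandas, and the LangChain imports sit inside `build_vector_store` so the pure helpers can be tried without the packages installed.

```python
import csv
import os

DB_LOCATION = "./chrome_langchain_db"          # persistence folder named in the notes
CSV_PATH = "realistic_restaurant_reviews.csv"  # sample data file named in the notes


def review_to_fields(row):
    """Split one CSV row into (page_content, metadata)."""
    # Title + Review is the text that gets embedded and searched.
    page_content = f"{row['Title']} {row['Review']}"
    # Rating and Date travel with the document but are not embedded.
    metadata = {"rating": row["Rating"], "date": row["Date"]}
    return page_content, metadata


def load_rows(path=CSV_PATH):
    """Read the CSV into a list of dicts (the tutorial uses pandas instead)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def build_vector_store(rows, db_location=DB_LOCATION):
    """Create or reopen the Chroma store, embedding documents only on the first run."""
    # Imported lazily so the helpers above work without these packages.
    from langchain_chroma import Chroma
    from langchain_core.documents import Document
    from langchain_ollama import OllamaEmbeddings

    first_run = not os.path.exists(db_location)
    embeddings = OllamaEmbeddings(model="mxbai-embed-large")
    vector_store = Chroma(
        collection_name="restaurant_reviews",  # assumed name, not from the notes
        persist_directory=db_location,
        embedding_function=embeddings,
    )
    if first_run:
        documents, ids = [], []
        for i, row in enumerate(rows):
            page_content, metadata = review_to_fields(row)
            documents.append(Document(page_content=page_content, metadata=metadata))
            ids.append(str(i))
        # Embedding happens here, through the local Ollama embedding model.
        vector_store.add_documents(documents=documents, ids=ids)
    return vector_store
```

With Ollama running and both packages installed, `build_vector_store(load_rows())` would return the store. Checking whether the folder already exists is one simple way to skip re-embedding on later runs.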

B. Creating the Retriever

  • The Chroma vector store is converted into a retriever object.
  • The retriever is configured to look up the top 5 most relevant documents (search_kwargs={"k": 5}) when queried.
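
To make "top k most relevant" concrete, here is a toy version of what a retriever does conceptually: score every stored vector against the query vector and keep the best k. The 2-D vectors are invented for illustration, and cosine similarity is only one common scoring choice; Chroma's actual distance metric and index are configurable and are not modeled here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=5):
    """Return the k texts whose vectors score highest against the query.

    `stored` is a list of (text, vector) pairs.
    """
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy 2-D "embeddings", invented purely for illustration.
stored = [
    ("vegan pizza was great", [0.9, 0.1]),
    ("parking was difficult", [0.1, 0.9]),
    ("good vegan options", [0.8, 0.2]),
]
print(top_k([1.0, 0.0], stored, k=2))
# → ['vegan pizza was great', 'good vegan options']
```

In the real pipeline the vectors come from mxbai-embed-large, and `vector_store.as_retriever(search_kwargs={"k": 5})` hides this ranking behind `retriever.invoke(question)`.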

4. Final Agent Orchestration

The final script connects the retriever over the local data (the RAG part) with the local LLM.

  1. LLM Initialization: An OllamaLLM object is instantiated using the llama3.2 model.
  2. Prompt Template: A prompt template is defined to instruct the LLM, with two dynamic placeholders:
    • {reviews}: For the context retrieved from the database.
    • {question}: For the user's original question.
  3. Chain Creation: A single, simple chain is created: chain = prompt | model.
  4. The Conversational Loop (RAG in Action):
    • The script enters a continuous loop asking the user for a question.
    • When a question is entered:
      • Retrieval: reviews = retriever.invoke(question) sends the question to the vector database, which returns the top 5 most relevant documents (e.g., reviews about "vegan options").
      • Generation: result = chain.invoke({"reviews": reviews, "question": question}) feeds the retrieved context and the question to the LLM.
      • Output: The LLM synthesizes an answer from the retrieved reviews, as the prompt instructs it to, and the script prints the result.
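
A rough sketch of this orchestration step is shown below. The prompt wording, the function names, and the "q to quit" option are assumptions added for illustration; only the two placeholders, the `prompt | model` chain, and the `invoke` calls come from the notes. The sketch also joins the retrieved documents' text into one string before filling `{reviews}`, whereas the notes pass the retriever output directly.

```python
# Assumed prompt wording; the notes only specify the two placeholders.
TEMPLATE = """Answer the question using only the restaurant reviews below.

Relevant reviews:
{reviews}

Question: {question}
"""


def format_reviews(docs):
    """Join the text of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_chain():
    """Build the prompt | model chain (requires langchain-ollama and a running Ollama)."""
    # Imported lazily so the helpers above work without these packages.
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_ollama.llms import OllamaLLM

    model = OllamaLLM(model="llama3.2")
    prompt = ChatPromptTemplate.from_template(TEMPLATE)
    return prompt | model


def run_chat(retriever):
    """Ask for questions in a loop; retrieve context, then generate and print an answer."""
    chain = build_chain()
    while True:
        question = input("Ask your question (q to quit): ")
        if question.strip().lower() == "q":
            break
        docs = retriever.invoke(question)  # retrieval step
        result = chain.invoke(             # generation step
            {"reviews": format_reviews(docs), "question": question}
        )
        print(result)
```

Given a retriever built from the Chroma store, `run_chat(retriever)` would start the loop. Keeping retrieval outside the chain, as the notes describe, makes it easy to inspect which reviews were fetched for a given question.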

This pipeline demonstrates a fully local AI agent capable of grounded, contextual question answering, and it can serve as the foundation for the museum app concept.