Google Drive File Ingestion to Supabase Interactive and Knowledge Base Chat with RAG using AI
This n8n Automation automates the process of ingesting files from Google Drive into a Supabase database, preparing them for a knowledge base system. It supports text-based files (PDF, DOCX, TXT, etc.) and tabular data (XLSX, CSV, Google Sheets), extracting content, generating embeddings, and storing data in structured tables. This is a foundational workflow for building a company knowledge base that can be queried via a chat interface (e.g., using a RAG workflow).
Problem Solved
Manually managing a knowledge base with files from Google Drive is time-consuming and error-prone. This workflow solves that by:
- Automatically ingesting files from Google Drive as they are created or updated.
- Extracting content from various file types (text and tabular).
- Generating embeddings for text-based files to enable vector search.
- Storing data in Supabase for efficient retrieval.
- Handling duplicates and errors to ensure data consistency.
Target Audience:
- Knowledge Managers: Build a centralized knowledge base from company files.
- Data Teams: Automate the ingestion of spreadsheets and documents.
-
Developers: Integrate with other workflows (e.g., RAG for querying the knowledge base).
Workflow Description
- File Detection: Triggers when a file is created or updated in Google Drive.
- File Processing: Loops through each file, extracts metadata, and validates the file type.
- Duplicate Check: Ensures the file hasn’t been processed before.
-
Content Extraction:
- Text-based Files: Downloads the file, extracts text, splits it into chunks, generates embeddings, and stores the chunks in Supabase.
- Tabular Files: Extracts data from spreadsheets and stores it as rows in Supabase.
- Metadata Storage: Stores file metadata and basic info in Supabase tables.
-
Error Handling: Logs errors for unsupported formats or duplicates.
The Retrieval-Augmented Generation (RAG). It retrieves relevant information from text documents and tabular data stored in Supabase, then generates natural language responses using OpenAI’s GPT-4o-mini model or any model of your choice. Designed for teams managing internal knowledge, this workflow enables users to ask questions like “What’s the remote work policy?” or “Show me the latest budget data” and receive accurate, context-aware responses in a conversational format.
The second workflow consists of a chat interface powered by n8n’s Chat Trigger node, an AI Agent node for RAG, and several tools to retrieve data from Supabase. Here’s how it works step-by-step:- User Initiates a Chat: The user interacts with a chat interface, sending queries like “Summarize our remote work policy” or “Show budget data for Q1 2025.”
- Query Processing with RAG: The AI Agent processes the query using RAG, retrieving relevant data from Supabase tables and generating a response with OpenAI’s GPT-4o-mini model.
-
Data Retrieval and Response Generation: The workflow uses multiple tools to fetch data:
- Retrieves text chunks from the
documents
table using vector search. - Fetches tabular data from the
document_rows
table based on file IDs. - Extracts full document text or lists available files as needed.
- Generates a natural language response combining the retrieved data.
- Retrieves text chunks from the
- Conversation History Management: Stores the conversation history in Supabase to maintain context for follow-up questions.
-
Response Delivery: Formats and sends the response back to the chat interface for the user to view.
Testing Steps-
Upload Sample Files
- Upload sample documents (e.g., PDFs, DOCX) to your Supabase bucket to simulate real-world data.
-
Verify Processing and Storage
- Check if files are successfully processed and stored in the documents table with embeddings in the vector store.
-
Chatbot Interaction
- Ask simple conversational questions via the chatbot, e.g., "What does Chapter 1 say about the Roman Empire?"
-
Accuracy and Relevance Check
- Assess the accuracy and contextual relevance of the retrieved results against the uploaded documents.
Notes
- Ensure Supabase credentials and the match_documents function are correctly configured.
- Test with varied queries to validate robustness.
-
Upload Sample Files
A powerful utomated knowledge base solution that seamlessly ingests files from Google Drive into Supabase and provides an interactive chat interface powered by RAG (Retrieval-Augmented Generation) with GPT-4o. This workflow transforms your documents and spreadsheets into a searchable, conversational assistant, saving time and improving access to critical company information. 🚀