Building a RAG app with Node and React Native: Part 1

The code for this project can be found on GitHub

https://github.com/tanner-west/wikichat

If you keep up with the world of AI-powered apps, then you've likely heard of RAG (Retrieval Augmented Generation). It's a way to generate answers with an LLM like GPT by providing context retrieved from some specific source, like an article or blog post. In this series of posts, I'll document my journey of building a RAG app using React Native, Langchain, and Chroma that lets users chat with Wikipedia articles.

Architectural Overview

Here's a visual overview of how the first iteration of this app is architected.

Architecture diagram of the RAG app

If you're new to RAG apps, the Langchain docs contain a great overview of the concepts and technologies involved, but I'll define some key terms here.

In this first iteration, our React Native app is truly just a client for collecting user queries and displaying the LLM's answers. But thanks to projects like llama.rn, it's entirely feasible to do much, if not all, of the AI work on a user's device. I may explore that avenue in a future post.

The Server

I'll explain the server first, since that's where most of the interesting work is done in this app. Here are the main components:

An HTTP server that exposes a single endpoint, POST /api, which expects a request payload with two properties: the Wikipedia article the user wants to chat with and the question they're asking about it. A rough sketch of the handler follows below.
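
Here's a minimal sketch of what that endpoint could look like with Express. The property names (article, question) and the queryService helper are assumptions for illustration, not necessarily what the repo uses.

```typescript
import express from "express";
// queryService is a hypothetical helper; it performs the retrieval and
// generation steps described in the next section.
import { queryService } from "./queryService";

const app = express();
app.use(express.json());

// POST /api expects { article: string, question: string }
// (property names are illustrative; check the repo for the exact shape)
app.post("/api", async (req, res) => {
  const { article, question } = req.body;
  if (!article || !question) {
    return res.status(400).json({ error: "article and question are required" });
  }
  try {
    const answer = await queryService(article, question);
    res.json({ answer });
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: "Something went wrong" });
  }
});

app.listen(3000, () => console.log("Server listening on port 3000"));
```
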
A query service function that takes the request data and invokes Langchain to perform steps 2, 4, and 5 from the diagram above, namely: retrieving the article chunks most relevant to the question from the Chroma vector store, combining those chunks with the question in a prompt, and generating an answer with the LLM. A rough sketch of this function follows below.
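
This is only a sketch of what such a function might look like using Langchain's JS packages (@langchain/openai, @langchain/community, @langchain/core). It assumes the article chunks were already embedded into a local Chroma collection named "wikipedia", and the metadata filter and model name are illustrative rather than the repo's actual values.

```typescript
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { Chroma } from "@langchain/community/vectorstores/chroma";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

// Assumes article chunks were embedded ahead of time into a Chroma
// collection named "wikipedia" running on the default local port.
export async function queryService(article: string, question: string): Promise<string> {
  const vectorStore = await Chroma.fromExistingCollection(new OpenAIEmbeddings(), {
    collectionName: "wikipedia",
    url: "http://localhost:8000",
  });

  // Retrieve the chunks most relevant to the question.
  // The metadata filter on the article field is illustrative.
  const docs = await vectorStore.similaritySearch(question, 4, { article });
  const context = docs.map((doc) => doc.pageContent).join("\n\n");

  // Combine the retrieved context and the question into a prompt.
  const prompt = ChatPromptTemplate.fromTemplate(
    `Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}`
  );

  // Generate the answer with the LLM.
  const llm = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });
  const chain = prompt.pipe(llm).pipe(new StringOutputParser());
  return chain.invoke({ context, question });
}
```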

The Client

Our app client is currently a bare-bones Expo app that lets the user select a Wikipedia article from a hard-coded list and submit a question about it. It calls our API endpoint with that data and presents the user with the result, roughly as sketched below.
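
Here's a hypothetical helper showing how the client could make that request; the URL and payload property names mirror the assumptions in the server sketch above, not necessarily what the repo uses.

```typescript
// A small helper the client could use to call the server.
// API_URL and the payload property names are assumptions carried over
// from the server sketch above.
const API_URL = "http://localhost:3000/api";

export async function askQuestion(article: string, question: string): Promise<string> {
  const response = await fetch(API_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ article, question }),
  });

  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const { answer } = await response.json();
  return answer;
}
```

Here's a short demo of the app in action.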


Clearly this is only a minimal example of a RAG app built with React Native. In future posts, I'd like to show how we can add the following features.