
Mocking OpenAI Streaming APIs (SSE) with Nock

I built an open source library to mock OpenAI streaming APIs (SSE), and I'd like to share how it works under the hood.

Building reliable, testable applications that leverage OpenAI APIs can be a challenge.

The streaming APIs, which use Server-Sent Events (SSE), are incredibly powerful but come with their own complexities when it comes to mocking and testing.

As someone who loves building tools and simplifying workflows, I decided to take this problem head-on and built an open-source library that can mock OpenAI responses, including streaming SSE responses and function calling.

Let me walk you through why and how this library works.

Why Is Mocking OpenAI APIs a Challenge?

The OpenAI APIs use the SSE protocol for their streaming responses.

This means instead of getting a single, neatly packaged JSON response, you’re receiving data in chunks over a persistent connection.

While this makes for smoother and faster UX in real-world apps, it’s a headache to replicate in tests.
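For context, here's roughly what that raw traffic looks like on the wire: each event is a data: line carrying a chat.completion.chunk JSON payload, separated by blank lines, and the stream ends with a literal [DONE] sentinel (the id and timestamp below are placeholders):

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: [DONE]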

My Open-Source Mocking Tool

I built a tool that simplifies mocking OpenAI's APIs using nock, a library that lets you intercept HTTP requests to a specified endpoint.

It’s called openai-api-mock, and it’s available on npm.
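To see the core mechanism in isolation, here's a bare nock interceptor returning a hard-coded, non-streaming body (a made-up minimal payload for illustration, not what openai-api-mock actually returns):

const nock = require('nock');

// Intercept the next POST to the chat completions endpoint
// and answer with a canned JSON body instead of hitting OpenAI.
nock('https://api.openai.com')
    .post('/v1/chat/completions')
    .reply(200, {
        id: 'chatcmpl-abc123',
        object: 'chat.completion',
        choices: [
            {
                index: 0,
                message: { role: 'assistant', content: 'Hello!' },
                finish_reason: 'stop'
            }
        ]
    });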

How Does It Work?

It's as simple as calling this function:

const OpenAI = require('openai');
const { mockOpenAIResponse } = require('openai-api-mock');

// Any API key works here, since the request never leaves the process
const openai = new OpenAI({ apiKey: 'not-a-real-key' });

// Call the mockOpenAIResponse function once to set up the mock
mockOpenAIResponse();

// Now, when you call the OpenAI API, it will return a mock response
// (run this inside an async function, or use ESM for top-level await)
const response = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    stream: true,
    messages: [
        { role: 'system', content: "You're an expert chef" },
        { role: 'user', content: 'Suggest a recipe for chicken parmesan' }
    ]
});

// Then read the streamed chunks as they arrive
for await (const part of response) {
    console.log(part.choices[0]?.delta?.content || '');
}

Nock will intercept the API call to this endpoint:

https://api.openai.com/v1/chat/completions

and return a mock response that simulates the streaming SSE response, sending the data in chunks just as OpenAI does.
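Under the hood, the streaming case boils down to replying with a Node.js Readable stream instead of a JSON body. Here's a simplified sketch of that wiring (the library's actual setup may differ), using the createChatStream function that's broken down below:

const nock = require('nock');

// Intercept a POST to the chat completions endpoint and reply with
// a Readable stream plus an SSE content type, so the OpenAI client
// treats the response as a live event stream.
nock('https://api.openai.com')
    .post('/v1/chat/completions')
    .reply(200, createChatStream(), {
        'Content-Type': 'text/event-stream'
    });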

Code Breakdown

Step 1: Creating a Readable Stream

A Readable stream is a way to send data over time.

It's different from sending a single block of data.

In this case, it’s used to simulate a chat system where messages are sent one at a time, rather than all at once.

The Readable stream allows us to push data (like chat messages) into it, which can then be sent to the client in chunks.

In the code, this is achieved by:

const stream = new Readable({
    read() { }
});

This creates a stream without an actual reading mechanism because data will be pushed manually into the stream.
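As a quick standalone illustration (separate from the library code), you can push a few chunks into such a stream and watch them arrive on the other side:

const { Readable } = require('stream');

const demo = new Readable({
    read() { } // no-op: data is pushed in manually below
});

// Consume chunks as they become available
demo.on('data', (chunk) => console.log(chunk.toString()));

demo.push('first chunk\n');
demo.push('second chunk\n');
demo.push(null); // signals that no more data will follow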

Step 2: Simulating the Data Sending Process (sendData function)

The sendData function is where the actual data transmission happens.

It's designed to send chunks of data to the stream at intervals (with a 200ms delay).

Here’s how it works:

  • The function sends a chunk of data to the stream using stream.push(). The data being sent is generated by getSteamChatObject(), a helper that returns a fake chat message (a sketch of such a helper follows this list).
  • It then waits for 200ms before sending the next chunk of data. This simulates a "real-time" chat where new messages are sent over time rather than instantly.
  • After sending a chunk, the function checks if more data needs to be sent. If the count is less than maxCount - 1, it sends another message and calls sendData() again recursively.
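The article's code calls getSteamChatObject() without showing it, so here's a minimal sketch of what such a helper could look like (my own illustration using faker; the library's real implementation may differ). It serializes one fake chat.completion.chunk to a JSON string:

const { faker } = require('@faker-js/faker');

// Hypothetical sketch of getSteamChatObject: builds one fake
// chat.completion.chunk and returns it as a JSON string.
function getSteamChatObject() {
    return JSON.stringify({
        id: 'chatcmpl-mock',
        object: 'chat.completion.chunk',
        created: Math.floor(Date.now() / 1000),
        model: 'gpt-3.5-turbo',
        choices: [{
            index: 0,
            delta: { content: faker.lorem.words(3) + ' ' },
            finish_reason: null
        }]
    });
}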

Step 3: Ending the Stream

Once the count reaches maxCount - 1 (meaning we've sent the last message):

  • The function sends a special message data: [DONE], indicating that the server is done sending data.
  • The stream is then ended by pushing null, which is a signal that no more data will follow.

else if (count === maxCount - 1) {
    stream.push(`data: [DONE]\n\n`);
    stream.push(null); // End the stream after sending the data
}

Step 4: Starting the Data Sending Process

Once the createChatStream function is called, it immediately starts the process of sending data by calling sendData():

sendData(); // Start sending data

This triggers the entire flow of sending messages one after another.

Step 5: Returning the Stream

Finally, the function returns the stream, so it can be used by other parts of the code (e.g., sent to a client via HTTP or used in further processing).

return stream;

Recap of the Flow:

  • A readable stream is created to simulate real-time data sending.
  • The stream will push fake chat messages at regular intervals (200ms apart).
  • Messages are sent until a set limit is reached (maxCount = 5: four content chunks, then the [DONE] terminator).
  • Once the messages are sent, a [DONE] message signals the end of the stream.
  • The stream is returned, allowing it to be consumed or sent to a client.

Putting it all together, here's the full createChatStream function:
const { faker } = require("@faker-js/faker");
const { Readable } = require('stream');

function createChatStream() {
    const stream = new Readable({
        read() { }
    });

    let count = 0;
    const maxCount = 5;

    function sendData() {
        setTimeout(() => {
            if (count < maxCount - 1) {
                stream.push(`data: ${getSteamChatObject()}\n\n`);
                count++;
                sendData(); // Call the function recursively until the last iteration
            } else if (count === maxCount - 1) {
                stream.push(`data: [DONE]\n\n`);
                stream.push(null); // End the stream after sending the data
            }
        }, 200);
    }

    sendData(); // Start sending data

    return stream;
}
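To sanity-check the simulator on its own, you can consume the returned stream directly; a Node.js Readable is async-iterable, so the raw SSE lines print out one by one:

(async () => {
    for await (const chunk of createChatStream()) {
        process.stdout.write(chunk.toString());
    }
})();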

By using a readable stream, createChatStream sends small chunks of data at intervals, ending with a [DONE] signal.

This approach is ideal for testing or replicating Server-Sent Events (SSE) behavior in real-world applications like chat systems or live data feeds.

