Connecting your application
In this guide, you will learn how to connect your application to your AI Gateway. You will need to have an AI Gateway created to continue with this guide.
Once you have configured a Gateway in the AI Gateway dashboard, click on “API Endpoints” to find your AI Gateway endpoint. AI Gateway offers multiple endpoints for each Gateway you create - one endpoint per provider, and one Universal Endpoint.
Universal
https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY
The Universal Endpoint requires some adjustments to your request schema, but supports additional features, such as retrying a request if it fails the first time, or configuring a fallback model or provider when a request fails.
You can use the Universal endpoint to contact every provider. The payload expects an array of messages, and each message is an object with the following parameters:
provider
: the name of the provider you would like to direct this message to. Can be openai, workers-ai, or any of our supported providers.

endpoint
: the pathname of the provider API you're trying to reach. For example, on OpenAI it can be chat/completions, and for Workers AI this might be @cf/meta/llama-2-7b-chat-int8. See more in the sections that are specific to each provider.

headers
: the HTTP headers that should be sent when contacting this provider, including the Authorization header. Its value usually starts with "Token" or "Bearer".

query
: the payload as the provider expects it in their official API.
Request

```sh
curl https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY_NAME -X POST \
  --header 'Content-Type: application/json' \
  --data '[
    {
      "provider": "workers-ai",
      "endpoint": "@cf/meta/llama-2-7b-chat-int8",
      "headers": {
        "Authorization": "Bearer XXXX",
        "Content-Type": "application/json"
      },
      "query": {
        "messages": [
          { "role": "system", "content": "You are a friendly assistant" },
          { "role": "user", "content": "Why is pizza so good" }
        ]
      }
    },
    {
      "provider": "openai",
      "endpoint": "chat/completions",
      "headers": {
        "Authorization": "Bearer XXXX",
        "Content-Type": "application/json"
      },
      "query": {
        "model": "gpt-3.5-turbo",
        "stream": true,
        "messages": [
          { "role": "user", "content": "What is Cloudflare?" }
        ]
      }
    }
  ]'
```
The above sends a request to the Workers AI Inference API; if it fails, the request falls back to OpenAI. You can add as many fallbacks as you need by appending more objects to the array.
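You can send the same payload from JavaScript with `fetch`. This is a minimal sketch: the account tag, gateway name, and `XXXX` tokens are placeholders you must replace with your own values.

```javascript
// Placeholders - replace with your own account tag, gateway name, and provider tokens.
const url = "https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY_NAME";

// Each entry is one provider attempt; later entries act as fallbacks.
const steps = [
  {
    provider: "workers-ai",
    endpoint: "@cf/meta/llama-2-7b-chat-int8",
    headers: { Authorization: "Bearer XXXX", "Content-Type": "application/json" },
    query: {
      messages: [
        { role: "system", content: "You are a friendly assistant" },
        { role: "user", content: "Why is pizza so good" },
      ],
    },
  },
  {
    provider: "openai",
    endpoint: "chat/completions",
    headers: { Authorization: "Bearer XXXX", "Content-Type": "application/json" },
    query: {
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: "What is Cloudflare?" }],
    },
  },
];

// Sends the array of provider steps to the Universal endpoint.
async function run() {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(steps),
  });
  return response.json();
}
```

Call `run()` once real credentials are in place; the gateway tries each step in order until one succeeds.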
Workers AI
https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/workers-ai/
When making requests to Workers AI, replace https://api.cloudflare.com/client/v4/accounts/ACCOUNT_TAG/ai/run
in the URL you’re currently using with https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/workers-ai
.
Then add the model you want to run at the end of the URL. You can see the list of Workers AI models and pick the ID.
You’ll need to generate an API token with Workers AI read access and use it in your request.
Request to Workers AI llama model

```sh
curl https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/workers-ai/@cf/meta/llama-2-7b-chat-int8 -X POST \
  --header "Authorization: Bearer $TOKEN" \
  --header 'Content-Type: application/json' \
  --data '{ "prompt": "Where did the phrase Hello World come from" }'
```
Request to Workers AI text classification model

```sh
curl https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/workers-ai/@cf/huggingface/distilbert-sst-2-int8 -X POST \
  --header "Authorization: Bearer $TOKEN" \
  --header 'Content-Type: application/json' \
  --data '{ "text": "This pizza is amazing!" }'
```
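The same Workers AI call can be made from JavaScript with `fetch`. A sketch, assuming you pass in an API token with Workers AI read access; the account tag and gateway name in the URL are placeholders.

```javascript
// Placeholders - replace ACCOUNT_TAG and GATEWAY with your own values.
const model = "@cf/meta/llama-2-7b-chat-int8";
const url = `https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/workers-ai/${model}`;

// Sends a prompt to the chosen Workers AI model through the gateway.
async function run(token) {
  const response = await fetch(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ prompt: "Where did the phrase Hello World come from" }),
  });
  return response.json();
}
```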
Anthropic
https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/anthropic
Example fetch request

```sh
curl -X POST https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/anthropic/v1/messages \
  -H 'x-api-key: XXX' \
  -H 'anthropic-version: 2023-06-01' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-3-opus-20240229",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Hello, world" }
    ]
  }'
```
If you are using the @anthropic-ai/sdk
, you can set your endpoint like this:
index.js

```js
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: env.ANTHROPIC_API_KEY,
  baseURL: "https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/anthropic",
});

const message = await anthropic.messages.create({
  model: 'claude-3-opus-20240229',
  messages: [{ role: "user", content: "When is halloween?" }],
  max_tokens: 1024,
});
```
Amazon Bedrock
https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/aws-bedrock
When making requests to Amazon Bedrock, replace https://bedrock-runtime.us-east-1.amazonaws.com/
in the URL you’re currently using with https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/aws-bedrock/bedrock-runtime/us-east-1/
, then add the model you want to run at the end of the URL.
With Bedrock, you will need to sign the URL before you make requests to AI Gateway. You can try using the aws4fetch
SDK. For example:
```ts
import { AwsClient } from 'aws4fetch';

interface Env {
  accessKey: string;
  secretAccessKey: string;
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // replace with your configuration
    const cfAccountId = "ACCOUNT_ID";
    const gatewayName = "GATEWAY_NAME";
    const region = "us-east-1";

    // added as secrets (https://developers.cloudflare.com/workers/configuration/secrets/)
    const accessKey = env.accessKey;
    const secretKey = env.secretAccessKey;

    const requestData = { inputText: "What does ethereal mean?" };
    const headers = { "Content-Type": "application/json" };

    // sign the original request
    const stockUrl = new URL("https://bedrock-runtime.us-east-1.amazonaws.com/model/amazon.titan-embed-text-v1/invoke");
    const awsClient = new AwsClient({
      accessKeyId: accessKey,
      secretAccessKey: secretKey,
      region: region,
      service: "bedrock",
    });
    const presignedRequest = await awsClient.sign(stockUrl.toString(), {
      method: "POST",
      headers: headers,
    });

    // change the signed request's host to AI Gateway
    const stockUrlSigned = new URL(presignedRequest.url);
    stockUrlSigned.host = "gateway.ai.cloudflare.com";
    stockUrlSigned.pathname = `/v1/${cfAccountId}/${gatewayName}/aws-bedrock/bedrock-runtime/${region}/model/amazon.titan-embed-text-v1/invoke`;

    // make request
    const response = await fetch(stockUrlSigned, {
      method: "POST",
      headers: presignedRequest.headers,
      body: JSON.stringify(requestData),
    });

    if (response.ok && response.headers.get("content-type")?.includes("application/json")) {
      const data = await response.json();
      return new Response(JSON.stringify(data));
    } else {
      return new Response("Invalid response", { status: 500 });
    }
  },
};
```
Azure OpenAI
https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/azure-openai/RESOURCE_NAME/DEPLOYMENT_NAME
When making requests to Azure OpenAI, you will need:
- AI Gateway account tag
- AI Gateway gateway name
- Azure OpenAI API key
- Azure OpenAI resource name
- Azure OpenAI deployment name (aka model name)
Your new base URL will use the data above in this structure: https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/azure-openai/RESOURCE_NAME/DEPLOYMENT_NAME
. Then, you can append your endpoint and api-version at the end of the base URL, like .../chat/completions?api-version=2023-05-15
.
Example fetch request

```sh
curl --request POST \
  --url 'https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/azure-openai/RESOURCE_NAME/DEPLOYMENT_NAME/chat/completions?api-version=2023-05-15' \
  --header 'Content-Type: application/json' \
  --header 'api-key: KEY' \
  --data '{
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "What is Cloudflare?" }
    ]
  }'
```
If you are using the openai-node
library, you can set your endpoint like this:
index.js

```js
import OpenAI from "openai";

const resource = 'xxx'; // without the .openai.azure.com
const model = 'xxx';
const apiVersion = 'xxx';
const apiKey = env.AZURE_OPENAI_API_KEY;

const azure_openai = new OpenAI({
  apiKey,
  baseURL: `https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/azure-openai/${resource}/${model}`,
  defaultQuery: { 'api-version': apiVersion },
  defaultHeaders: { 'api-key': apiKey },
});
```
Google Vertex AI
https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/google-vertex-ai
When making requests to Google Vertex, you will need:
- AI Gateway account tag
- AI Gateway gateway name
- Google Vertex API key
- Google Vertex Project Name
- Google Vertex Region (e.g., us-east4)
- Google Vertex model
Your new base URL will use the data above in this structure: https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/google-vertex-ai/v1/projects/PROJECT_NAME/locations/REGION
.
Then you can append the endpoint you want to hit, for example: /publishers/google/models/gemini-1.0-pro-001:streamGenerateContent
So your final URL will come together as: https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/google-vertex-ai/v1/projects/PROJECT_NAME/locations/REGION/publishers/google/models/gemini-1.0-pro-001:streamGenerateContent
Example fetch request

```sh
curl -X POST "https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/google-vertex-ai/v1/projects/PROJECT_NAME/locations/REGION/publishers/google/models/gemini-1.0-pro-001:streamGenerateContent" \
  -H "Authorization: Bearer XXX" \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          { "text": "Tell me a joke" }
        ]
      }
    ]
  }'
```
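If you prefer to assemble the Vertex URL in JavaScript, a sketch is below. The project name and OAuth token are placeholders, and `us-east4` is just the example region from the list above.

```javascript
// Placeholders - replace ACCOUNT_TAG, GATEWAY, project, and token with your own values.
const project = "PROJECT_NAME";
const region = "us-east4";
const model = "gemini-1.0-pro-001";
const url =
  "https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/google-vertex-ai" +
  `/v1/projects/${project}/locations/${region}` +
  `/publishers/google/models/${model}:streamGenerateContent`;

// Sends a Gemini generateContent request through the gateway.
async function run(token) {
  const response = await fetch(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      contents: [{ role: "user", parts: [{ text: "Tell me a joke" }] }],
    }),
  });
  return response.json();
}
```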
HuggingFace
https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/huggingface
When making requests to HuggingFace Inference API, replace https://api-inference.huggingface.co/models/
in the URL you’re currently using with https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/huggingface
. Note that the model you’re trying to access should come right after, for example https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/huggingface/bigcode/starcoder
.
Request

```sh
curl https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/huggingface/bigcode/starcoder -X POST \
  --header "Authorization: Bearer $TOKEN" \
  --header 'Content-Type: application/json' \
  --data '{ "inputs": "console.log" }'
```
If you are using the HuggingFace.js library, you can set your inference endpoint like this:
index.js

```js
import { HfInferenceEndpoint } from '@huggingface/inference';

const hf = new HfInferenceEndpoint(
  "https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway}/huggingface/gpt2",
  env.HF_API_TOKEN
);
```
OpenAI
https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/openai
When making requests to OpenAI, replace https://api.openai.com/v1
in the URL you’re currently using with https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/openai
.
Request

```sh
curl https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/openai/chat/completions -X POST \
  --header "Authorization: Bearer $TOKEN" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
      { "role": "user", "content": "how to build a wooden spoon in 3 short steps? give as short as answer as possible" }
    ]
  }'
```
If you’re using a library like openai-node, set the baseURL
to your OpenAI endpoint like this:
index.js

```js
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'my api key', // defaults to process.env["OPENAI_API_KEY"]
  baseURL: "https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/openai",
});
```
Perplexity
https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/perplexity-ai
Example fetch request

```sh
curl --request POST \
  --url https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/perplexity-ai/chat/completions \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --header 'Authorization: Bearer pplx-XXXXXXXXXXXXXXXXX' \
  --data '{
    "model": "mistral-7b-instruct",
    "messages": [
      { "role": "system", "content": "Be precise and concise." },
      { "role": "user", "content": "How many stars are there in our galaxy?" }
    ]
  }'
```
Perplexity doesn't offer its own SDK, but it is compatible with the OpenAI SDK. You can use the OpenAI SDK to make a Perplexity call through AI Gateway as follows:
index.js

```js
import OpenAI from "openai";

const perplexity = new OpenAI({
  apiKey: env.PERPLEXITY_API_KEY,
  baseURL: "https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/perplexity-ai",
});

const chatCompletion = await perplexity.chat.completions.create({
  model: "mistral-7b-instruct",
  messages: [{ role: "user", content: "What is petrichor?" }],
  max_tokens: 20,
});
```
Replicate
https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/replicate
When making requests to Replicate, replace https://api.replicate.com/v1
in the URL you’re currently using with https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/replicate
.
Request

```sh
curl https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/replicate/predictions -X POST \
  --header "Authorization: Token $TOKEN" \
  --header 'Content-Type: application/json' \
  --data '{
    "version": "2796ee9483c3fd7aa2e171d38f4ca12251a30609463dcfd4cd76703f22e96cdf",
    "input": { "prompt": "What is Cloudflare?" }
  }'
```
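The same prediction request can be made from JavaScript with `fetch`. A minimal sketch; the account tag, gateway name, and token are placeholders, and note that Replicate uses the `Token` authorization scheme rather than `Bearer`.

```javascript
// Placeholders - replace ACCOUNT_TAG and GATEWAY with your own values.
const url = "https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/replicate/predictions";

// Creates a Replicate prediction through the gateway.
async function run(token) {
  const response = await fetch(url, {
    method: "POST",
    headers: {
      // Replicate expects "Token", not "Bearer".
      Authorization: `Token ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      version: "2796ee9483c3fd7aa2e171d38f4ca12251a30609463dcfd4cd76703f22e96cdf",
      input: { prompt: "What is Cloudflare?" },
    }),
  });
  return response.json();
}
```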