The fastest way to build AI applications that never go down.
Bifrost is a high-performance AI gateway that connects you to 8+ providers (OpenAI, Anthropic, Bedrock, and more) through a single API. Get automatic failover, load balancing, and zero-downtime deployments in under 30 seconds.
🚀 Just launched: Native MCP (Model Context Protocol) support for seamless tool integration
⚡ Performance: Adds only 11µs latency while handling 5,000+ RPS
🛡️ Reliability: 100% request success rate at 5,000 RPS in our benchmark, with automatic provider failover
⚡ Quickstart (30 seconds)
Go from zero to a production-ready AI gateway in about 30 seconds. Here's how:
What You Need
- Any AI provider API key (OpenAI, Anthropic, Bedrock, etc.)
- Docker OR Go 1.23+ installed
- 30 seconds of your time ⏰
Using Bifrost HTTP Transport
📖 For detailed setup guides with multiple providers, advanced configuration, and language examples, see Quick Start Documentation
Step 1: Create your config (copy & paste this)
{
  "providers": {
    "openai": {
      "keys": [
        {
          "value": "env.OPENAI_API_KEY",
          "models": ["gpt-4o-mini"],
          "weight": 1.0
        }
      ]
    }
  }
}
Step 2: Add your API key
export OPENAI_API_KEY=your_openai_api_key
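Before starting the gateway, it can help to confirm that every key referenced in the config is actually set in your shell. The sketch below is not part of Bifrost: `missing_env_vars` is a hypothetical helper, and it assumes (based on Steps 1 and 2) that a value of the form `env.NAME` is resolved from the environment variable `NAME`.

```python
import json
import os


def missing_env_vars(config, environ=None):
    """Return names of environment variables the config references but that are unset.

    Assumption: a key value such as "env.OPENAI_API_KEY" refers to the
    environment variable OPENAI_API_KEY.
    """
    environ = os.environ if environ is None else environ
    missing = []
    for provider in config.get("providers", {}).values():
        for key in provider.get("keys", []):
            value = key.get("value", "")
            if value.startswith("env."):
                name = value[len("env."):]
                if not environ.get(name):
                    missing.append(name)
    return missing


# Same structure as the config from Step 1.
config = json.loads("""
{
  "providers": {
    "openai": {
      "keys": [
        {"value": "env.OPENAI_API_KEY", "models": ["gpt-4o-mini"], "weight": 1.0}
      ]
    }
  }
}
""")

unset = missing_env_vars(config)
if unset:
    print("Set these variables before starting Bifrost:", ", ".join(unset))
```

An empty result means every `env.`-prefixed reference has a non-empty value in the current environment.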
Step 3: Start Bifrost (choose one)
# 🐳 Docker
docker pull maximhq/bifrost
docker run -p 8080:8080 \
  -v "$(pwd)/config.json:/app/config/config.json" \
  -e OPENAI_API_KEY \
  maximhq/bifrost
# 🔧 Or install the Go binary (make sure Go's bin directory, usually "$(go env GOPATH)/bin", is in your PATH)
go install github.com/maximhq/bifrost/transports/bifrost-http@latest
bifrost-http -config config.json -port 8080
Step 4: Test it works
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Hello from Bifrost! 🌈"}
    ]
  }'
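If you would rather test from code than from curl, the same request can be sketched with Python's standard library. The URL, headers, and body mirror the curl command above; the helper names `build_chat_request` and `send` are illustrative only, and the response is returned as parsed JSON without assuming anything about its shape.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8080"):
    """Build the same POST request as the curl command above."""
    payload = {
        "provider": "openai",
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request("Hello from Bifrost!")
# With the gateway from Step 3 running, call: print(send(request))
```

Building the request separately from sending it lets you inspect the payload without a running gateway.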
🎉 Boom! You're done!
Your AI gateway is now running and ready for production. You can:
- Add more providers for automatic failover
- Scale to thousands of requests per second
- Drop this into existing OpenAI/Anthropic code by changing only the base URL
Want more? See our Complete Setup Guide for multi-provider configuration, failover strategies, and production deployment.
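To make the failover idea concrete, here is a minimal sketch of ordered fallback: try each provider in turn and return the first success. This illustrates the general pattern only, not Bifrost's internal implementation. `call_with_fallback`, `fake_send`, and the model name `example-model` are hypothetical placeholders.

```python
def call_with_fallback(targets, send):
    """Try each (provider, model) pair in order and return the first success.

    `send` is any callable taking (provider, model); it is expected to raise
    on failure. Returns (provider_used, result).
    """
    errors = []
    for provider, model in targets:
        try:
            return provider, send(provider, model)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append((provider, repr(exc)))
    raise RuntimeError(f"all providers failed: {errors}")


def fake_send(provider, model):
    """Stand-in for a real provider call; simulates an OpenAI outage."""
    if provider == "openai":
        raise ConnectionError("simulated outage")
    return f"reply from {provider}/{model}"


used, reply = call_with_fallback(
    [("openai", "gpt-4o-mini"), ("anthropic", "example-model")],
    fake_send,
)
print(used, reply)
```

With a gateway in front of your providers, this retry loop lives in one place rather than in every application that calls a model.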
- Multi-Provider Support: Integrate with OpenAI, Anthropic, Amazon Bedrock, Mistral, Ollama, and more through a single API
- Fallback Mechanisms: Automatically retry failed requests with alternative models or providers
- Dynamic Key Management: Rotate and manage API keys efficiently with weighted distribution
- Connection Pooling: Optimize network resources for better performance
- Concurrency Control: Manage rate limits and parallel requests effectively
- Flexible Transports: Multiple transports for easy integration into your infra
- Plugin First Architecture: No callback hell, simple addition/creation of custom plugins
- MCP Integration: Built-in Model Context Protocol (MCP) support for external tool integration and execution
- Custom Configuration: Offers granular control over pool sizes, network retry settings, fallback providers, and network proxy configurations
- Built-in Observability: Native Prometheus metrics out of the box, no wrappers, no sidecars, just drop it in and scrape
- SDK Support: Bifrost is available as a Go package, so you can use it directly in your own applications.
- Seamless Integration with Generative AI SDKs: Switch to Bifrost by updating the base_url in your existing SDKs (OpenAI, Anthropic, GenAI, and more). One line of code is all it takes
Bifrost is built with a modular architecture:
bifrost/
├── core/                 # Core functionality and shared components
│   ├── providers/        # Provider-specific implementations
│   ├── schemas/          # Interfaces and structs used in Bifrost
│   ├── bifrost.go        # Main Bifrost implementation
│
├── docs/                 # Documentation for Bifrost's configuration and contribution guides
│   └── ...
│
├── tests/                # All test setups related to /core and /transports
│   └── ...
│
├── transports/           # Interface layers (HTTP, gRPC, etc.)
│   ├── bifrost-http/     # HTTP transport implementation
│   └── ...
│
└── plugins/              # Plugin implementations
    ├── maxim/
    └── ...
The system uses a provider-agnostic approach with well-defined interfaces to easily extend to new AI providers. All interfaces are defined in core/schemas/ and can be used as a reference for contributions.
There are three ways to use Bifrost - choose the one that fits your needs:
1. As a Go Package (Core Integration)
For direct integration into your Go applications. Provides maximum performance and control.
📖 2-Minute Go Package Setup
Quick example:
go get github.com/maximhq/bifrost/core
2. As an HTTP API (Transport Layer)
For language-agnostic integration and microservices architecture.
📖 30-Second HTTP Transport Setup
Quick example:
docker pull maximhq/bifrost
docker run -p 8080:8080 \
  -v "$(pwd)/config.json:/app/config/config.json" \
  -e OPENAI_API_KEY \
  maximhq/bifrost
3. As a Drop-in Replacement (Zero Code Changes)
Replace existing OpenAI/Anthropic APIs by changing only the base URL; the rest of your application code stays the same.
📖 1-Minute Drop-in Integration
Quick example:
- base_url = "https://api.openai.com"
+ base_url = "http://localhost:8080/openai"
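The diff above amounts to swapping the upstream prefix for the gateway prefix and leaving the rest of the path alone. A small sketch of that rewrite follows; `to_bifrost` is a hypothetical helper, and the exact paths your SDK appends to `base_url` depend on the SDK, so treat the example URL as illustrative.

```python
UPSTREAM = "https://api.openai.com"       # original base_url from the diff
GATEWAY = "http://localhost:8080/openai"  # Bifrost base_url from the diff


def to_bifrost(url, upstream=UPSTREAM, gateway=GATEWAY):
    """Rewrite an upstream URL so it goes through the gateway instead."""
    if url != upstream and not url.startswith(upstream + "/"):
        raise ValueError(f"not an upstream URL: {url}")
    # Keep everything after the upstream prefix unchanged.
    return gateway + url[len(upstream):]


print(to_bifrost("https://api.openai.com/v1/chat/completions"))
```

In practice you would not rewrite URLs yourself; setting `base_url` once in the SDK client has the same effect.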
Bifrost adds virtually zero overhead to your AI requests. In our sustained 5,000 RPS benchmark (see full methodology in docs/benchmarks.md), the gateway added only 11 µs of overhead per request on a t3.xlarge instance – that's less than 0.001% of a typical GPT-4o response time.
Translation: Your users won't notice Bifrost is there, but you'll sleep better knowing your AI never goes down.
| Metric | t3.medium | t3.xlarge | Δ |
| --- | --- | --- | --- |
| Added latency (Bifrost overhead) | 59 µs | 11 µs | -81 % |
| Success rate @ 5 k RPS | 100 % | 100 % | No failed requests |
| Avg. queue wait time | 47 µs | 1.67 µs | -96 % |
| Avg. request latency (incl. provider) | 2.12 s | 1.61 s | -24 % |
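The Δ column and the "less than 0.001%" figure can be reproduced from the numbers in the table. The snippet below is just that arithmetic; it assumes the 1.61 s average request latency from the t3.xlarge run as a stand-in for a typical response time.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# (t3.medium, t3.xlarge) pairs taken from the table above.
rows = {
    "added latency (µs)": (59, 11),
    "avg queue wait (µs)": (47, 1.67),
    "avg request latency (s)": (2.12, 1.61),
}
deltas = {name: round(pct_change(a, b)) for name, (a, b) in rows.items()}

# Gateway overhead as a share of end-to-end latency on t3.xlarge:
# 11 µs out of 1.61 s, expressed in percent.
overhead_share = 11e-6 / 1.61 * 100

for name, delta in deltas.items():
    print(f"{name}: {delta} %")
print(f"overhead share: {overhead_share:.5f} %")
```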
🔑 Key Performance Highlights
- Perfect Success Rate – 100 % request success rate on both instance types even at 5 k RPS.
- Tiny Total Overhead – < 15 µs additional latency per request on the t3.xlarge test (59 µs on t3.medium).
- Efficient Queue Management – just 1.67 µs average wait time on the t3.xlarge test.
- Fast Key Selection – ~10 ns to pick the right weighted API key.
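For readers curious how weighted key selection can work in general, here is a minimal sketch using precomputed cumulative weights and a binary search. It is an illustration of the technique, not Bifrost's actual implementation; `WeightedKeyPicker` and the key names are hypothetical.

```python
import bisect
import itertools
import random


class WeightedKeyPicker:
    """Pick a key at random with probability proportional to its weight."""

    def __init__(self, keys):
        # keys: list of (name, weight) pairs; zero-weight keys are never picked.
        usable = [(name, weight) for name, weight in keys if weight > 0]
        if not usable:
            raise ValueError("at least one key needs a positive weight")
        self._names = [name for name, _ in usable]
        # Cumulative sums are computed once so each pick is a binary search.
        self._cumulative = list(itertools.accumulate(w for _, w in usable))

    def pick(self, rng=random):
        point = rng.random() * self._cumulative[-1]
        index = bisect.bisect_right(self._cumulative, point)
        # Guard against floating-point rounding at the upper edge.
        return self._names[min(index, len(self._names) - 1)]


picker = WeightedKeyPicker([("key-a", 0.7), ("key-b", 0.3)])
rng = random.Random(0)  # seeded so runs are repeatable
counts = {"key-a": 0, "key-b": 0}
for _ in range(10_000):
    counts[picker.pick(rng)] += 1
print(counts)
```

Over many picks, `key-a` should receive roughly 70% of the traffic and `key-b` roughly 30%.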
Bifrost is deliberately configurable so you can dial the speed ↔ memory trade-off:
| Config Knob | Effect |
| --- | --- |
| initial_pool_size | How many objects are pre-allocated. Higher = faster, more memory |
| buffer_size & concurrency | Queue depth and max parallel workers (can be set per provider) |
| Retry / Timeout | Tune aggressiveness for each provider to meet your SLOs |
Choose higher settings (like the t3.xlarge profile above) for raw speed, or lower ones (t3.medium) for reduced memory footprint – or find the sweet spot for your workload.
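To build intuition for what a queue depth and a worker count control, here is a generic bounded-queue worker pool. It is a conceptual sketch only and says nothing about how Bifrost implements these settings. `run_pool` is a hypothetical function whose `buffer_size` and `concurrency` parameters are named after the knobs above.

```python
import queue
import threading


def run_pool(jobs, handler, buffer_size, concurrency):
    """Process jobs with `concurrency` workers fed by a queue of depth `buffer_size`."""
    pending = queue.Queue(maxsize=buffer_size)
    results = [None] * len(jobs)

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more work
                return
            index, job = item
            try:
                results[index] = handler(job)
            except Exception as exc:  # keep the worker alive; record the failure
                results[index] = exc

    workers = [threading.Thread(target=worker) for _ in range(concurrency)]
    for thread in workers:
        thread.start()
    for item in enumerate(jobs):
        pending.put(item)  # blocks while the buffer is full (backpressure)
    for _ in workers:
        pending.put(None)  # one sentinel per worker
    for thread in workers:
        thread.join()
    return results


squares = run_pool([1, 2, 3, 4, 5], lambda n: n * n, buffer_size=2, concurrency=3)
print(squares)
```

A deeper buffer absorbs bursts at the cost of memory, while more workers raise throughput until the upstream provider's rate limits become the bottleneck.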
Need more numbers? Dive into the full benchmark report for breakdowns of every internal stage (JSON marshalling, HTTP call, parsing, etc.), hardware sizing guides and tuning tips.
Everything you need to master Bifrost, from 30-second setup to production-scale deployments.
🚀 I want to get started (2 minutes)
🎯 I want to understand what Bifrost can do
⚙️ I want to deploy this to production
📱 I'm migrating from another tool
🔗 Join our Discord for:
- ❓ Quick setup assistance and troubleshooting
- 💡 Best practices and configuration tips
- 🤝 Community discussions and support
- 🚀 Real-time help with integrations
See our Contributing Guide for detailed information on how to contribute to Bifrost. We welcome contributions of all kinds—whether it's bug fixes, features, documentation improvements, or new ideas. Feel free to open an issue, and once it's assigned, submit a Pull Request.
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
Built with ❤️ by Maxim