
How Hicap Works

Same models you already use, routed through reserved GPU capacity so you pay less. Here's how it works.

STEP 01

Swap Your Base URL

Hicap is a drop-in replacement for the OpenAI API. Point your existing SDK, CLI tool, or extension to our endpoint and you're done.

No new SDK — works with the OpenAI client you already use
Access OpenAI, Anthropic, and Google models through one endpoint
Integrate in under five minutes with minimal code changes

Code Example

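import OpenAI from "openai";

// Point the standard OpenAI client at the Hicap endpoint
const openai = new OpenAI({
  baseURL: "https://api.hicap.ai/v1",
  // The OpenAI client throws if no apiKey is set; reusing the Hicap key here is an assumption
  apiKey: process.env.HICAP_API_KEY,
  defaultHeaders: {
    "api-key": process.env.HICAP_API_KEY
  }
});

// Same call shape as the OpenAI API; only the client configuration changed
const response = await openai.chat.completions.create({
  model: "gpt-5.4",
  messages: [{ role: "user", content: "Hello!" }]
});

// Print the assistant's reply
console.log(response.choices[0].message.content);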
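
For tools that do not use the OpenAI SDK, the same swap can be sketched with plain HTTP. This is a minimal illustration, not Hicap's documented API: it assumes the endpoint follows the OpenAI-compatible `/chat/completions` path under the base URL and accepts the key in the `api-key` header shown above. `buildChatRequest` and `sendChat` are hypothetical helper names.

```typescript
// Hypothetical helpers; the path and header handling are assumptions, not confirmed Hicap behavior.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

const HICAP_BASE_URL = "https://api.hicap.ai/v1";

// Build the request an OpenAI-compatible client would send, without sending it
function buildChatRequest(apiKey: string, model: string, messages: ChatMessage[]): ChatRequest {
  return {
    url: `${HICAP_BASE_URL}/chat/completions`, // ASSUMPTION: OpenAI-style path
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "api-key": apiKey },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Send the request with the built-in fetch (Node 18+ or a browser); not invoked here
async function sendChat(apiKey: string, model: string, prompt: string): Promise<unknown> {
  const { url, init } = buildChatRequest(apiKey, model, [{ role: "user", content: prompt }]);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  return res.json();
}
```

Keeping request construction separate from sending makes the base URL swap easy to inspect: only the URL and the auth header differ from a direct OpenAI call.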

STEP 02

We Route to Reserved GPUs

Hicap routes each request to reserved GPU capacity spread across multiple providers, which keeps performance consistent and costs low.

Dedicated capacity means consistent, predictable performance
Automatic load balancing across providers
When reserved capacity overflows, requests fall back to on-demand

Request Flow

Your App → Hicap Gateway → Reserved / On-Demand

Reserved (dedicated to your app)

GPT-5.5: 3,400 TPM, OVERFLOW (112% utilized)
Claude Opus 4.7: 2,100 TPM, OVERFLOW (108% utilized)
Gemini 3.1 Pro: 5,000 TPM, OVERFLOW (115% utilized)

On-Demand (overflow & additional models)

GPT-5.5: ACTIVE
Claude Opus 4.7: ACTIVE
Gemini 3.1 Pro: ACTIVE
GPT-5.4: STANDBY
Claude Sonnet 4.6: STANDBY
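
The reserved-first, overflow-to-on-demand behavior can be sketched as a per-minute token budget. This is an illustrative model only, not Hicap's routing implementation. The `ReservedPool` shape, the `routeRequest` and `utilizationPercent` functions, and the reading of "utilized" as demand divided by the reserved limit are all assumptions. The 3,400 TPM limit comes from the flow above, and the request sizes are made up.

```typescript
type Route = "reserved" | "on-demand";

// Hypothetical bookkeeping for one model's reserved capacity
interface ReservedPool {
  model: string;
  tpmLimit: number;       // reserved tokens per minute
  usedThisMinute: number; // tokens already routed to reserved capacity this minute
}

// Use reserved capacity while the request fits under the limit; otherwise overflow to on-demand
function routeRequest(pool: ReservedPool, requestTokens: number): Route {
  if (pool.usedThisMinute + requestTokens <= pool.tpmLimit) {
    pool.usedThisMinute += requestTokens;
    return "reserved";
  }
  return "on-demand";
}

// Demand relative to the reserved limit; values above 100 mean some traffic overflowed
function utilizationPercent(demandTokens: number, tpmLimit: number): number {
  return Math.round((demandTokens / tpmLimit) * 100);
}

const pool: ReservedPool = { model: "GPT-5.5", tpmLimit: 3400, usedThisMinute: 0 };

// Three illustrative requests within the same minute
const routes = [2000, 1000, 808].map((tokens) => routeRequest(pool, tokens));
console.log(routes); // ["reserved", "reserved", "on-demand"]
console.log(utilizationPercent(2000 + 1000 + 808, pool.tpmLimit)); // 112
```

The third request would push usage to 3,808 tokens against a 3,400 TPM reservation, so it falls back to on-demand, which corresponds to the 112% figure under this reading.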

STEP 03

Spot Market for Excess Capacity

When your reserved capacity sits idle, it enters the spot market. Other customers buy that excess at dynamic prices — you earn revenue, they save money, and the platform takes a small fee.

Sellers monetize unused reserved capacity instead of letting it sit idle
Buyers access capacity below on-demand rates with dynamic spot pricing
Platform earns a transparent fee on every spot trade
Three-way value creation: seller, buyer, and platform all benefit

Spot Market Flow

Seller

Reserved Capacity: 100k tokens
62% used internally; 38k excess → spot market

$0.62 / token

Buyer

Demand: 50k tokens (38% savings)
76% spot, 24% on-demand

Platform fee (5%): $1.18
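
The arithmetic behind the Spot Market Flow figures can be checked with a short sketch. One caveat: the $1.18 fee matches the other numbers only if the $0.62 price is read as dollars per 1,000 tokens, so the code makes that assumption explicitly. The variable names are illustrative, and how Hicap actually computes fees is not stated on this page.

```typescript
// Figures taken from the Spot Market Flow illustration
const reservedTokens = 100_000;
const usedInternallyShare = 0.62;
const buyerDemandTokens = 50_000;
const platformFeeRate = 0.05;

// ASSUMPTION: the $0.62 price is per 1,000 tokens (needed to arrive at the $1.18 fee)
const spotPricePer1kTokens = 0.62;

// Seller side: whatever is not used internally becomes spot supply
const excessTokens = Math.round(reservedTokens * (1 - usedInternallyShare));

// Buyer side: fill as much demand as possible from spot, the rest from on-demand
const spotTokens = Math.min(excessTokens, buyerDemandTokens);
const onDemandTokens = buyerDemandTokens - spotTokens;
const spotShare = spotTokens / buyerDemandTokens;

// Platform side: a percentage of the spot trade value
const tradeValueUsd = (spotTokens / 1000) * spotPricePer1kTokens;
const platformFeeUsd = tradeValueUsd * platformFeeRate;

console.log(excessTokens);              // 38000
console.log(onDemandTokens);            // 12000
console.log(spotShare);                 // 0.76
console.log(tradeValueUsd.toFixed(2));  // "23.56"
console.log(platformFeeUsd.toFixed(2)); // "1.18"
```

The 38k excess covers 76% of the buyer's 50k demand, leaving 24% for on-demand, which matches the split shown in the illustration.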

STEP 04

See Where Every Dollar Goes

Get full visibility into token usage, costs, and model performance across dev tools and production apps — all in one dashboard.

Track usage by model, application, and team
Compare dev tooling vs production spend at a glance
Spot your most expensive models and optimize

Usage Insights

Coding Tools

Dev tooling with BYOK configuration.

Total: $246.45 (28.6M tokens)

App | Model | Tier | Usage | Cost
Codex | codex-mini-latest | On-Demand | 18.2M | $118.30
Cline | claude-opus-4.5 | On-Demand | 10.4M | $128.15

Your App

Production workload with reserved capacity.

Total: $1,704.00 (10,500 TPM + 66.4M tokens)

App | Model | Tier | Usage | Cost
Planner | gpt-5.4 | Reserved | 3,400 TPM | $720.00
Reasoning | claude-opus-4.6 | Reserved | 2,100 TPM | $480.00
Summarizer | gemini-3.0-flash | Reserved | 5,000 TPM | $240.00
Tool Use | gpt-4.1 | On-Demand | 52M | $156.00
Classifier | claude-sonnet-4.5 | On-Demand | 14.4M | $108.00

Last used Jun 19, 8:13 PM
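
The kind of roll-up the dashboard shows can be sketched from the sample rows above. This is an illustration of the aggregation only. The `UsageRow` shape and function names are hypothetical, and nothing here reflects a Hicap export format or API.

```typescript
// Hypothetical row shape mirroring the Usage Insights sample
interface UsageRow {
  group: "Coding Tools" | "Your App";
  app: string;
  model: string;
  tier: "Reserved" | "On-Demand";
  costUsd: number;
}

const usageRows: UsageRow[] = [
  { group: "Coding Tools", app: "Codex", model: "codex-mini-latest", tier: "On-Demand", costUsd: 118.3 },
  { group: "Coding Tools", app: "Cline", model: "claude-opus-4.5", tier: "On-Demand", costUsd: 128.15 },
  { group: "Your App", app: "Planner", model: "gpt-5.4", tier: "Reserved", costUsd: 720 },
  { group: "Your App", app: "Reasoning", model: "claude-opus-4.6", tier: "Reserved", costUsd: 480 },
  { group: "Your App", app: "Summarizer", model: "gemini-3.0-flash", tier: "Reserved", costUsd: 240 },
  { group: "Your App", app: "Tool Use", model: "gpt-4.1", tier: "On-Demand", costUsd: 156 },
  { group: "Your App", app: "Classifier", model: "claude-sonnet-4.5", tier: "On-Demand", costUsd: 108 },
];

// Sum spend per group, rounding to cents to avoid floating-point drift
function totalByGroup(rows: UsageRow[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    totals[row.group] = (totals[row.group] ?? 0) + row.costUsd;
  }
  for (const group of Object.keys(totals)) {
    totals[group] = Math.round(totals[group] * 100) / 100;
  }
  return totals;
}

// Find the single most expensive line item (expects a non-empty array)
function mostExpensive(rows: UsageRow[]): UsageRow {
  return rows.reduce((max, row) => (row.costUsd > max.costUsd ? row : max));
}

const groupTotals = totalByGroup(usageRows);
const topRow = mostExpensive(usageRows);
console.log(groupTotals); // { "Coding Tools": 246.45, "Your App": 1704 }
console.log(`${topRow.app} (${topRow.model}): $${topRow.costUsd.toFixed(2)}`); // Planner (gpt-5.4): $720.00
```

The computed group totals match the $246.45 and $1,704.00 figures shown for the two groups.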

Why Choose Hicap?

Production-ready AI infrastructure with enterprise features built in.

Enterprise Security

We never store your prompts or completions. Your data passes through our gateway and is never retained.

Multi-Provider Flexibility

Access OpenAI, Anthropic, Google, and more through a single API endpoint. Switch models without changing providers.

Spot Market Economics

Sell idle reserved capacity or buy excess from others at dynamic spot prices. Three-way value creation for sellers, buyers, and platform.

Real-Time Analytics

Monitor usage, costs, and performance metrics in real time with detailed dashboards.

Flexible Pricing

Pay as you go with no commitments, or lock in reserved throughput for even deeper savings.

Ready to start saving?

Create an account, swap your base URL, and start paying less for the same models. Setup takes under five minutes.
