Using the OpenAI SDK:
import OpenAI from 'openai'

const HICAP_API_KEY = process.env.HICAP_API_KEY

const client = new OpenAI({
  baseURL: 'https://api.hicap.ai/v2',
  apiKey: HICAP_API_KEY,
  defaultHeaders: { "api-key": HICAP_API_KEY },
})

const response = await client.chat.completions.create({
  model: 'gpt-5.2',
  messages: [
    { role: 'user', content: 'Hello, GPT!' }
  ],
  // Same API. 25% less cost.
})

Trusted by teams at

Cline · GPT · Hiveflow · Mintachu · Pacer

Real-World Savings Examples

Cline Power User (save $456/yr)
Daily coding with AI assistance, refactoring, and debugging
Usage: 5M tokens/mo
Model: Claude 4 Sonnet
Direct: $150/mo
HiCap: $112/mo

Claude Code Developer (save $1,344/yr)
Full-time AI pair programming across multiple projects
Usage: 15M tokens/mo
Model: Claude 4 Sonnet
Direct: $450/mo
HiCap: $338/mo

Ad Agency Dev Team (save $4,500/yr)
Team of 5 developers using mixed models for client work
Usage: 50M tokens/mo
Model: Mixed
Direct: $1,500/mo
HiCap: $1,125/mo
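The annual figures in the cards above follow directly from the monthly price gap. A quick sketch of the arithmetic, with the table values hardcoded from the examples:

```typescript
// Annual savings = (direct monthly cost - HiCap monthly cost) * 12.
// Figures are taken from the example cards above.
function annualSavings(directPerMonth: number, hicapPerMonth: number): number {
  return (directPerMonth - hicapPerMonth) * 12;
}

const examples = [
  { name: "Cline Power User", direct: 150, hicap: 112 },      // $456/yr
  { name: "Claude Code Developer", direct: 450, hicap: 338 }, // $1,344/yr
  { name: "Ad Agency Dev Team", direct: 1500, hicap: 1125 },  // $4,500/yr
];

for (const e of examples) {
  console.log(`${e.name}: $${annualSavings(e.direct, e.hicap)}/yr`);
}
```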
99.9% Uptime SLA
Guaranteed availability for production

OpenAI-Compatible API
Drop-in replacement, zero code changes

Getting started

Choose a model and run requests

Call the standard Chat Completions endpoint and set the model field to the model you want to run.

curl https://api.hicap.ai/v2/openai/chat/completions \
  -H "Content-Type: application/json" \
  -H "api-key: $HICAP_API_KEY" \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"Hello"}]}'
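The same request can be issued from any HTTP client. A minimal sketch that builds the payload the curl example sends, with the endpoint and api-key header copied from it (the actual fetch call is left commented out so you can plug in your own key):

```typescript
// Build the same request the curl example above sends.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }

function buildChatRequest(model: string, messages: ChatMessage[]) {
  return {
    url: "https://api.hicap.ai/v2/openai/chat/completions",
    headers: {
      "Content-Type": "application/json",
      "api-key": process.env.HICAP_API_KEY ?? "",
    },
    body: JSON.stringify({ model, messages }),
  };
}

const req = buildChatRequest("gpt-5", [{ role: "user", content: "Hello" }]);
// const res = await fetch(req.url, { method: "POST", headers: req.headers, body: req.body });
```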
Features

Reserved capacity.
Pay-as-you-go pricing.

We buy reserved GPU capacity in bulk, then let you tap into it on-demand.
You get the speed of provisioned throughput at a fraction of the cost.

Up to 25% lower costs

Access the same models for up to 25% less than standard pay-as-you-go pricing, made possible by reserved GPU capacity.

60% faster inference

Provisioned throughput means your requests skip the queue. No cold starts, no throttling.

All major models

GPT-5.2, Claude 4 Sonnet, Gemini 3.0 Flash—switch between providers with one line of code.

Built-in usage analytics

Track token usage, costs, and latency across all models. See exactly where your AI budget goes.

Drop-in replacement

Works with existing OpenAI, Anthropic, and Google SDKs. Just change the base URL.
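"Just change the base URL" means the only diff in an existing codebase is the client constructor's config. A sketch of the before/after, written as a plain object so it applies to whichever SDK you use (the OpenAI base URL is the standard public one; the HiCap URL is taken from the snippet at the top of this page):

```typescript
// The only change when migrating an existing OpenAI-SDK codebase:
// point the client at HiCap's base URL and pass your HiCap key.
function clientConfig(useHiCap: boolean) {
  return useHiCap
    ? { baseURL: "https://api.hicap.ai/v2", apiKey: process.env.HICAP_API_KEY }
    : { baseURL: "https://api.openai.com/v1", apiKey: process.env.OPENAI_API_KEY };
}

// new OpenAI(clientConfig(true)) — everything else in the codebase stays the same.
```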

Enterprise-grade reliability

99.9% uptime SLA. Your requests are load-balanced across multiple providers for redundancy.
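Cross-provider redundancy can be approximated on the client side with a simple failover loop; a sketch with an injected send function (the provider list and retry policy here are illustrative, not HiCap's internal routing):

```typescript
// Try each provider in order; return the first successful result.
// `send` is whatever issues the actual request (fetch, an SDK call, ...).
async function withFailover<T>(
  providers: string[],
  send: (provider: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await send(provider);
    } catch (err) {
      lastError = err; // fall through to the next provider
    }
  }
  throw lastError;
}

// Usage sketch: if the primary is down, the backup answers.
// const reply = await withFailover(["primary", "backup"], callProvider);
```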