Cerebras

Fastest inference via wafer-scale chips

https://cerebras.ai · 📍 Sunnyvale, USA
Tags: paid · inference · ultra-fast · hardware

API Services (1)

Chat Completions (REST, bearer auth)

Ultra-fast inference — 2,100 tok/s on Llama 3.3

https://api.cerebras.ai/v1

Documentation →
Tags: chat · ultra-fast · 2100-tok-per-sec
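The listing above gives only the base URL and the auth style (REST with a bearer token). A minimal sketch of a call against that endpoint, assuming an OpenAI-style `/chat/completions` route and JSON schema; the route name, field names, and the `CEREBRAS_API_KEY` environment variable are assumptions, so confirm them against the linked documentation:

```python
import json
import os
import urllib.request

# Base URL from the listing; the /chat/completions route and the
# request/response field names are assumptions (OpenAI-style schema).
API_BASE = "https://api.cerebras.ai/v1"
API_KEY = os.environ.get("CEREBRAS_API_KEY", "")  # hypothetical env var name

def build_chat_request(prompt: str, model: str = "llama-3.3-70b") -> dict:
    """Assemble URL, headers, and JSON body for one chat completion call."""
    return {
        "url": f"{API_BASE}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {API_KEY}",  # bearer auth, per the listing
            "Content-Type": "application/json",
        },
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(prompt)
    http_req = urllib.request.Request(
        req["url"],
        data=json.dumps(req["body"]).encode(),
        headers=req["headers"],
        method="POST",
    )
    with urllib.request.urlopen(http_req, timeout=30) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Separating request assembly from transport keeps the payload testable without network access.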

AI Models (1)

Llama 3.3 70B (on Cerebras) · llama-3.3-70b
Input: $0.60/M tokens · Output: $0.60/M tokens · Context: 131K tokens
Tags: ultra-fast · 2100-tok-per-sec
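At the listed $0.60 per million tokens (same rate for input and output) and the advertised 2,100 tok/s throughput, per-request cost and generation time are one-line calculations. A small sketch; the token counts in the example are illustrative only:

```python
PRICE_PER_M_TOKENS = 0.60  # USD, listed rate for both input and output

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the flat $0.60/M token rate."""
    return (input_tokens + output_tokens) * PRICE_PER_M_TOKENS / 1_000_000

def generation_seconds(output_tokens: int, tok_per_sec: float = 2100.0) -> float:
    """Rough time to generate a reply at the advertised 2,100 tok/s."""
    return output_tokens / tok_per_sec

# Example: a 2,000-token prompt with a 500-token reply costs $0.0015,
# and the reply generates in roughly a quarter of a second.
cost = request_cost(2000, 500)
secs = generation_seconds(500)
```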