Inference pricing
Over 100 leading open-source Chat, Language, Image, Code, and Embedding models are available through the Together Inference API. For these models you pay just for what you use.
Serverless Endpoints
Prices are per 1 million tokens, covering both input and output tokens for Chat, Language, and Code models; input tokens only for Embedding models; and based on image size and number of steps for Image models. Special promotional pricing applies to Llama-2 and CodeLlama models.
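As an illustration, per-token billing reduces to simple arithmetic. This sketch uses the $0.20 rate for the 4.1B - 8B tier from the table below; the actual rate depends on the model you call.

```python
# Estimate the cost of one request under per-token pricing.
# For Chat, Language, and Code models, input and output tokens
# are billed together at the same per-1M-token rate.

PRICE_PER_MILLION = 0.20  # USD per 1M tokens (4.1B - 8B tier, from the table below)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, counting input + output tokens."""
    total_tokens = input_tokens + output_tokens
    return total_tokens / 1_000_000 * PRICE_PER_MILLION

# e.g. a 1,500-token prompt with a 500-token reply (2,000 tokens total):
print(f"${request_cost(1500, 500):.6f}")
```

For Embedding models the same arithmetic applies, but only input tokens are counted.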
Chat, language, and code models

Model size      Price per 1M tokens
Up to 4B        $0.10
4.1B - 8B       $0.20
8.1B - 21B      $0.30
21.1B - 41B     $0.80
41B - 70B       $0.90
Mixture-of-experts

Model size                        Price per 1M tokens
Up to 56B total parameters        $0.60
56.1B - 176B total parameters     $1.20
176.1B - 480B total parameters    $2.40
Embedding models

Model size      Price per 1M tokens
Up to 150M      $0.008
151M - 350M     $0.016
Image models

Image size    25 steps    50 steps    75 steps    100 steps
512x512       $0.001      $0.002      $0.0035     $0.005
1024x1024     $0.01       $0.02       $0.035      $0.05
Genomic models

Model size    Price per 1M tokens
4.1B - 8B     $2.00
Dedicated instances
When hosting your own model, you pay hourly for the GPU instances, whether it is a model you fine-tuned with Together Fine-tuning or any other model you choose to host. You can start or stop your instance at any time through the web-based Playground or via the start/stop instance APIs.
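Since dedicated instances bill only for the hours they run, estimating a month of hosting is one multiplication. The $1.40/hr rate here is the 1x L40 48GB tier from the table below.

```python
# Sketch of hourly billing for a dedicated instance: you pay only
# for the hours the instance is running, at the tier's hourly rate.

HOURLY_RATE = 1.40  # USD per hour, 1x L40 48GB tier (from the table below)

def hosting_cost(hours_running: float, rate: float = HOURLY_RATE) -> float:
    """Total USD for the hours the instance was actually running."""
    return hours_running * rate

# Running 8 hours a day over a 30-day month (240 hours):
print(f"${hosting_cost(8 * 30):.2f}")
```

Stopping the instance outside working hours, as in this example, is what makes the hourly model cheaper than always-on hosting.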
Your fine-tuned models
Hardware type           Price per hour    Model size
Fractional L40 48GB     $0.70             Up to 4B
1x L40 48GB             $1.40             4.1B - 21B
2x L40 48GB             $2.80             21.1B - 41B
2x A100 80GB            $6.17             41.1B - 70B, 8x7B MoE
Interested in a dedicated instance for your own model?
Fine-tuning pricing
Pricing for fine-tuning is based on model size, dataset size, and the number of epochs.
Download checkpoints and final model weights.
View job status and logs through CLI or Playgrounds.
Deploy a model instantly once it’s fine-tuned.
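The three factors above combine multiplicatively: total training tokens are dataset tokens times epochs. This is only a sketch of how the cost scales; the per-token rate below is a placeholder, not a published price, so use the interactive calculator for real figures.

```python
# Hedged sketch of fine-tuning cost scaling with dataset size and epochs.
# The rate is a PLACEHOLDER assumption (this page does not list
# fine-tuning rates); model size would determine the real rate.

PLACEHOLDER_RATE_PER_MILLION = 1.00  # USD per 1M training tokens (hypothetical)

def finetune_cost(dataset_tokens: int, epochs: int,
                  rate_per_million: float = PLACEHOLDER_RATE_PER_MILLION) -> float:
    """Total training tokens = dataset tokens x epochs, billed per 1M."""
    total_tokens = dataset_tokens * epochs
    return total_tokens / 1_000_000 * rate_per_million

# e.g. a 10M-token dataset trained for 3 epochs at the placeholder rate:
print(f"${finetune_cost(10_000_000, 3):.2f}")
```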
Try the interactive calculator
Together GPU Clusters Pricing
Together Compute provides private, state-of-the-art clusters with H100 and A100 GPUs, connected over fast 200 Gbps non-blocking Ethernet or InfiniBand networks at up to 3.2 Tbps.
Hardware types available    Networking
A100 PCIe 80GB              200 Gbps non-blocking Ethernet
A100 SXM 80GB               200 Gbps non-blocking Ethernet or 1.6 Tbps InfiniBand configs available
H100 80GB                   3.2 Tbps InfiniBand