together.solutions

Together AI is the best end-to-end platform for developing your AI applications – no matter your starting point. Let’s build together.

What we offer

Together AI offers cutting-edge products to power AI for your application.


We have the fastest performance, effortless horizontal scalability, easy-to-use developer tools, and an expert team that’s excited to work closely with you.


We’ll make quick work of solving problems together and deploy at the scale of your enterprise.

unmatched performance
We are obsessed with being faster and more efficient.
Our research and innovations bring next-level efficiencies in training and inference that can scale with your needs. Together Inference Engine is the fastest inference stack available.
Built to scale with you
Together AI can go from prototype to production with you.
We’ve built a horizontally scalable platform that is optimized to deliver the highest performance while scaling to meet your traffic.
designed for rapid integration
It’s a zero-friction build experience for your developers.
And when you’re ready to bring your model into your apps, integration is snappy. With our easy-to-use API, your fine-tuned model can be seamlessly integrated into business processes in a matter of days.
world-class support
Put a world-class team of AI researchers to work for you.
We understand what it takes to train AI models to meet business goals. Our team can help you prepare your datasets, optimize them for accuracy, train your own private AI model, and deploy it in a scalable way – all to drive measurable results for your business.
collaborate
Your team can work together to build, tune, and test AI models on Together AI.
Share fine-tuned models across your team, collaborate on testing, analyze usage from team members, and set up API keys for each phase of your application development.

3x faster inference¹

117x lower network communication²

4x lower cost relative to AWS³

Together Inference: Faster and lower cost than GPT-3.5 Turbo

Together Inference is 1.3x faster.⁴
Together Inference is 6x lower cost.⁵

Cost and speed comparison with GPT-3.5 Turbo

Pricing

We have clusters available for you


Customer Stories

Together is the partner of choice for the world’s most innovative AI developers.

Pika Labs, a video generation company founded by two Stanford PhD students, built its text-to-video model on Together GPU Clusters. As it gained traction, Pika built new iterations of the model from scratch on Together GPU Clusters and scaled its inference volume as it grew to millions of videos generated per month.

$1.1 million
Saved over 5 months
4 hours
Time to training start
392,300
Discord users

Problem
Pika needed efficient compute capacity that could scale from prototype to production, and fast, efficient training performance was a must. The team had to move quickly: they didn’t have time to set up their own training infrastructure, and they needed a partner who could scale with their difficult-to-forecast traffic.
Solution
Pika used the Together Inference API and its easy-to-use library of open-source models to prototype rapidly. Once the team decided to build their own models from the ground up, they opted for the unparalleled compute power of Together GPU Clusters. And once they launched the product and saw user traction grow exponentially, Pika scaled inference seamlessly.
Results
Within 6 months of being founded, Pika grew to millions of videos generated per month, with top users spending ~10 hours per day on the platform.

“Together GPU Clusters provided a combination of amazing training performance, expert support, and the ability to scale to meet our rapid growth to help us serve our growing community of AI creators.”


—

Demi Guo

CEO, Pika Labs

Upstage is a leading LLM company specializing in customized, domain-specific models, and the builder of top-ranked models like Solar. With Together Inference, they were able to make their Solar model available to a wide audience including Together API customers, Poe.com users, and their own customers.

2.8 million
peak token volume per hour
45 tokens per second
Together AI TPS for SOLAR v0 (70B)

Problem
Upstage needed to host Solar, their most popular LLM, so that the widest possible audience could use it. When the model charted on the Hugging Face Open LLM Leaderboard, they also needed a platform that could scale to handle high traffic while maintaining fast performance and cost efficiency.
Solution
Upstage chose Together Inference serverless endpoints to host their model because of the API’s user-friendly interface, its competitive pricing, and Together AI’s expert support, which made bring-up easy.
Results
The Solar model was deployed on Together Inference and published on Poe.com. Together Inference easily scaled to serve a peak of over 2.8 million tokens per hour while sustaining over 45 tokens per second. The Upstage team expanded their partnership and began integrating Together AI into their own service.

"We chose Together AI for their competitive pricing, user-friendly interface, and quick service. Truly, it offers an exceptional service experience. I was particularly impressed when their CEO, Vipul, personally jumped in to help with technical questions."


—

Sung Kim

CEO of Upstage AI

Wordware, founded by Cambridge University ML experts Robert Chandler and Filip Kozera, enables seamless collaboration between domain experts and engineers, emphasizing a "prompt-first" approach to building LLM applications. This approach supports a wide range of AI-powered experiences, from simple workflows to intricate agents.

4 models
Integrated into Wordware's platform
16x
Cost reduction for AI-powered NPCs
3-4 hours
Time to integrate multiple models

Wordware's mission is to enhance the machine learning workflow by removing the dependency on extensive "ground truth" datasets. Their platform empowers domain experts to refine prompts quickly, improving collaboration and speeding up iterations. Wordware wanted to focus on building the best collaborative, web-based IDE for language model programming, with seamless model selection, rather than on the hassle of managing expensive infrastructure.
Wordware adopted Together's infrastructure for its versatility and user-friendly interface. The ability to prototype and scale rapidly with Together's Inference API and the service's powerful compute capabilities was integral to their progress. The platform's low latency, minimal cold-start times, and cost-effectiveness let Wordware experiment with various models and helped their customers transition from GPT-4 to Mistral, which brought significant cost reductions, improved reliability, and lower latency.
Wordware's approach has led to groundbreaking applications. One notable customer example is AI-powered NPC interactions, whose operating cost fell by 16x after the customer moved to Wordware. Wordware credits this efficiency to its token-based pricing and its ability to integrate multiple models, such as Mistral and OpenChat, seamlessly, a balance of speed, flexibility, and cost-effectiveness that Wordware attributes to Together’s API.

"I love the flexibility Together AI provides, from serverless inference endpoints to easy fine-tuning and hosted deployments. We like working with a company who knows what they’re doing. With Together AI, downtime is low and throughput is amazing. That matters so much for us and our end-customers."


—

Robert Chandler

Co-Founder of Wordware

Nexusflow, a leader in generative AI solutions for cybersecurity, relies on Together GPU Clusters to build robust cybersecurity models as they democratize cyber intelligence with AI.

40%
Cost savings per month
<90 minutes
Onboarding time
Zero
Downtime

Problem
To enhance the capabilities of existing base models with public data, Nexusflow required a cost-effective, reliable, and scalable compute partner. Traditional cloud providers were not able to simultaneously offer the cost-efficiency and the level of guaranteed availability that Nexusflow needed to scale their specialized workloads.
Solution
The team at Nexusflow opted for Together GPU Clusters, seeing it as the perfect "trifecta" of contract length, pricing, and compute availability. They used GPUs suited to their specific workload requirements and benefited from the unparalleled support of Together’s expert team.
Results
Nexusflow completed onboarding in under 90 minutes and began running workloads. Together's support team resolved initial hiccups, keeping the experience smooth. Nexusflow cut their R&D cloud compute costs by 40% while getting faster technical-support response times than they had with other cloud providers.

"In an industry where time and specialized capabilities can mean the difference between vulnerability and security, Together GPU Clusters has helped us scale compute resources quickly in a cost-effective way. Their high-performance infra and top-notch support lets us focus on building state-of-the-art generative AI solutions for cybersecurity."

—

Jian Zhang

CTO of Nexusflow

Arcee is a growing startup in the LLM space that builds domain-adaptive language models for organizations, and they are using Together Custom Models to fine-tune a model on a domain-specific dataset.

40B tokens
Used for fine-tuning
7B
Parameters

Problem
Arcee was looking for reliable, factual, and cost-effective systems to build domain-adaptive language models for its customers.
Solution
Arcee made a strategic decision to build a fine-tuned model with Together AI for several compelling reasons: the accessibility of the Together API, the quality of the Together AI team, and the team's commitment to building a good model as a collaborative partner, not just a technical provider.
Results
Arcee built their model with Together Custom Models, training on domain-specific data. To optimize model quality, the training data mixture was tuned with DoReMi, an algorithm that finds the optimal mixture of language datasets using Distributionally Robust Optimization.

"Our relationship with Together AI has yielded remarkable achievements, including state-of-the-art models. These models are specialized, grounded, and laser-focused on specific verticals and use cases. Working with Together AI helped us dramatically accelerate development."

—

Mark McQuade

CEO of Arcee

Why open-source

Open-source models are the best choice for your company. They are faster, more customizable, and more private.

PERFORMANCE
The open-source models featured on Together AI represent the world’s most cutting-edge models.
These models were developed by research communities at leading institutions across the globe, including Google, Meta, OpenAI, and Stanford. With these models, you’ll get high accuracy, fast performance, and the ability to fine-tune the model to your specific needs.
Explore 100+ models
PRIVACY
If your company values data privacy above all else, open-source is the best choice.
When you take a cutting-edge open-source model from Together AI and train it with your own private data, you’ll create a fine-tuned model that is completely yours – a private, proprietary tool that your company owns. Together AI enables you to do this in a fully private manner on Together Cloud, or in your existing Virtual Private Cloud. This means none of your private data is exposed to the world or used to improve someone else’s model.
transparency
Open-source models share how they were built, including training data, code, and comprehensive quality benchmarks.
This puts you and your developers in the driver’s seat. And it enables you to show your model review board, security team, and executives everything they need to green-light deploying generative AI in your application. At Together AI, we care about transparency so that we can give you more control. Let us help you understand the powerful tools at your fingertips.
control
With open-source, your developers will have the ability to do more.
You can fully fine-tune any open-source model. You can adjust every layer in the model. You don’t have to update the model on someone else’s schedule. You control what and when you deploy. Your developers will thank you.

Industries & use cases

Speed up your business processes, organize millions of documents, forecast demand for products, develop a conversational chatbot for your sales team — and so much more.


Harness the power of AI applications that are customized to you.

Defect detection

Boost quality control in production processes: use computer vision to automate visual inspection and identify missing components.

Text and data extraction

Extract and collate critical information from millions of documents at high speed.

Sentiment analysis

Understand the sentiment of words, sentences, paragraphs, or documents. Tune to your subject matter and language style for a high degree of precision.

News analysis

Pull names, events, and more from news so you can drive insights and make decisions.

Machine condition detection

Assess the condition of your machines through sensor data.

Image and video analysis

Automate editing workflows, catalog your assets and extract meaning from your images and videos.

Forecast business metrics

Create prediction models to forecast your business needs using your data.

Interaction analytics

Remove friction and improve customer journeys with deeper understanding of interactions across channels.

Script-writing

Generate creative starting points for books, movies, or other media. Leverage an AI co-pilot to help with editing scripts and creating a consistent tone.

Text to speech

Generate high quality, natural speech from any text.

Document intelligence

Identify, extract and organize custom data from complex documents to reduce manual operations and improve workflows. Extract clauses, dates, parties, and other custom entities from documents with ease.

Text and speech translation

Automatically translate text or speech across more than 100 languages.

Insights and analysis

Extract understanding and insights from unstructured text, and output in a structured form for use in a variety of formats (bullets, tables, sentences, or JSON).

Product summarization

Automate product titles and descriptions at scale, customizing to different regions or audiences to maximize engagement and SEO.

Image, video, and audio generation

Generate high-quality images, video, and audio from text prompts.

Personalization

Enhance the user experience by customizing content to each individual user.

Data audit

Detect and identify root causes of unexpected changes in metrics such as revenue and retention.

Code generation and understanding

Understand code in dozens of languages, summarize check-ins, identify bugs or issues, and automate code review processes.

Named entity recognition

Identify and extract known entities from bodies of text efficiently and accurately.

Customized document classification

Improve document classification by using features unique to your data.

Chatbots & virtual agents

Communicate with your end users 24/7 in natural language. Add an intelligent conversational layer to any application: customer support, sales, internal DevOps, legal assistant, coding co-pilot, social chat, and more. Easily extend to email, chat, and voice applications.

Summarization

Efficiently summarize anything from a few paragraphs to whole documents.

Automatic speech recognition

Add a voice interface to any product or feature so your users can interact with your application more efficiently or in new settings, such as while driving or hands-free.