Coding Mixture of Agents

A custom-built Mixture-of-Agents (MoA) synthesis mix optimized for challenging coding tasks. This mix leverages multiple 'proposer' models, including Claude 3.5 Sonnet and GPT-4 Turbo, with an 'aggregation' model that synthesizes their outputs. In benchmarks, it demonstrated 28% better performance compared to Claude 3.5 Sonnet alone, particularly excelling at complex programming challenges.

Updated Aug 26

$6.00/M input tokens

$20.00/M output tokens

Models

This mix uses the models below:

claude-3-5-sonnet-20241022

Provided by Anthropic

gpt-4-turbo-2024-04-09

Provided by OpenAI

gpt-4o-2024-08-06

Provided by OpenAI

Readme

This is a synthesis mix that uses a mixture-of-agents architecture to produce higher-quality answers than a single model. Your request is sent to two "proposer" models (Claude 3.5 Sonnet and GPT-4 Turbo). Their responses are then passed to an "aggregation" model (GPT-4o), which synthesizes the answers, corrects issues, and returns code.

To learn more about mixture of agents, see our GitHub repo.
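The proposer/aggregator flow described above can be sketched in a few lines of Python. This is a minimal illustration and not the mix's actual implementation: the `Model` type, `build_aggregation_prompt`, `mixture_of_agents`, and the prompt wording are all hypothetical, and running the proposers concurrently is an assumption made here for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# A "model" here is any function mapping a prompt string to a response string.
# In practice each would wrap a provider API call; those wrappers are not shown.
Model = Callable[[str], str]


def build_aggregation_prompt(request: str, proposals: Sequence[str]) -> str:
    """Combine the request and proposer answers into one prompt (hypothetical format)."""
    numbered = "\n\n".join(
        f"Candidate {i}:\n{text}" for i, text in enumerate(proposals, start=1)
    )
    return (
        f"Request:\n{request}\n\n{numbered}\n\n"
        "Synthesize the candidates into one corrected answer and return code."
    )


def mixture_of_agents(
    request: str, proposers: Sequence[Model], aggregator: Model
) -> str:
    """Send the request to every proposer, then let the aggregator synthesize."""
    # Assumption: proposers are independent, so this sketch calls them concurrently.
    # pool.map returns results in the same order as `proposers`.
    with ThreadPoolExecutor(max_workers=len(proposers)) as pool:
        proposals = list(pool.map(lambda model: model(request), proposers))
    return aggregator(build_aggregation_prompt(request, proposals))
```

With real clients, `proposers` would hold two callables wrapping Claude 3.5 Sonnet and GPT-4 Turbo, and `aggregator` one wrapping GPT-4o. Stub functions are enough to exercise the control flow.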

Categories

  • 👩🏽‍💻 Coding
  • 🦾 Mixture of Agents
  • 🧠 Reasoning

Quality

According to our evaluations, this mix scores about 18% higher (relative) than the current leader on BigCodeBench Instruct Hard, an evaluation that measures LLM performance on difficult coding tasks.

We ran the evaluation on 148 problems in the BigCodeBench Instruct Hard dataset using the provided Docker container. Our SynthCode Mix scored 31.1% (Pass@1) on this dataset, compared with 26.4% (Pass@1) for the next best model, GPT-4o.
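The two Pass@1 scores can be related to the headline percentage with a short calculation. The sketch below assumes the headline figure is the relative gain of the mix over GPT-4o; `relative_improvement` is a helper name introduced here for illustration.

```python
def relative_improvement(score: float, baseline: float) -> float:
    """Return how much higher `score` is than `baseline`, as a percentage of `baseline`."""
    return (score - baseline) / baseline * 100.0


mix_pass_at_1 = 31.1    # SynthCode Mix, Pass@1 in percent (from the text above)
gpt4o_pass_at_1 = 26.4  # GPT-4o, Pass@1 in percent (from the text above)

# Absolute gap in percentage points versus relative gain over the baseline.
print(f"absolute gap:  {mix_pass_at_1 - gpt4o_pass_at_1:.1f} points")  # 4.7 points
print(f"relative gain: {relative_improvement(mix_pass_at_1, gpt4o_pass_at_1):.1f}%")  # 17.8%
```

The absolute gap is 4.7 percentage points. The relative gain is 17.8%, which rounds to the 18% quoted above.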

Performance

This mix may take longer to respond than a single model because each request triggers multiple sub-requests. We are working to reduce latency while retaining the same quality.

Composition

This mix produces responses from the following models:

Model Name          Type
Claude 3.5 Sonnet   Proposer
GPT-4 Turbo         Proposer
GPT-4o              Aggregator