AI Decisions You Can Actually Trust

Stop relying on single, hallucination-prone AI models. The Arguing Machines subjects every automated task to rigorous adversarial debate, delivering bulletproof, enterprise-grade decisions via a simple API.

Bring Your Own Key: OpenAI, Anthropic, Google Gemini, and OpenAI-compatible APIs

The Core Value

We provide the cognitive rigor; you provide the intelligence. Stop writing validation loops and get deterministic decisions from non-deterministic models.

Say Goodbye to Hallucinations

Single AI models are eager to please and prone to making things up. Our platform pits AI against AI, forcing ideas to be defended with evidence. Weak logic is destroyed before it ever reaches your application.

Radical Transparency

Never guess why an AI made a choice. Every decision comes with a complete, human-readable audit trail. Usage and cost are visible per run, per day, and per agent, with self-serve dashboards and graphs so you always know what you're paying for.

Zero Vendor Lock-in (BYOK)

You shouldn't be tied to one AI provider. Plug in your own API keys for OpenAI, Anthropic, Google Gemini, or any OpenAI-compatible models. We orchestrate the debate; you own the intelligence.

Cognitive Rigor as a Service

Every task passes through six phases of structured debate, where ideas must be defended with evidence before a decision is reached.

  • Phase 0 (Decompose): Break complex, ambiguous tasks into actionable sub-decisions
  • Phase 1 (Specialists): Domain experts analyse the problem from multiple angles in parallel
  • Phase 2 (Challengers): Devil's advocates aggressively hunt for blind spots and logical flaws
  • Phase 3 (Synthesizer): Resolves conflicts using only surviving, validated evidence
  • Phase 4 (Consensus): Rules engine evaluates thresholds: proceed, loop to retry, or escalate
  • Phase 5 (Delivery): You receive a structured, actionable JSON decision with a full audit trail
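
To make the phases concrete, here is a minimal sketch of how a delivered decision and its audit trail might be modelled. The field names (`outcome`, `confidence`, `audit_trail`), the agent labels, and the JSON shape are illustrative assumptions, not the platform's actual response schema.

```python
import json
from dataclasses import dataclass, field, asdict
from enum import IntEnum


class Phase(IntEnum):
    """The six phases, numbered as on this page."""
    DECOMPOSE = 0
    SPECIALISTS = 1
    CHALLENGERS = 2
    SYNTHESIZER = 3
    CONSENSUS = 4
    DELIVERY = 5


@dataclass
class AuditEntry:
    phase: str    # phase name, e.g. "CHALLENGERS"
    agent: str    # hypothetical agent label
    summary: str  # what the agent contributed


@dataclass
class Decision:
    outcome: str       # "proceed", "loop", or "escalate"
    confidence: float  # hypothetical score between 0 and 1
    audit_trail: list[AuditEntry] = field(default_factory=list)

    def record(self, phase: Phase, agent: str, summary: str) -> None:
        """Append one audit entry for the given phase."""
        self.audit_trail.append(AuditEntry(phase.name, agent, summary))

    def to_json(self) -> str:
        """Serialize the decision and its full audit trail."""
        return json.dumps(asdict(self), indent=2)


decision = Decision(outcome="proceed", confidence=0.82)
decision.record(Phase.SPECIALISTS, "quant-analyst", "Revenue growth matches filings.")
decision.record(Phase.CHALLENGERS, "risk-officer", "No unresolved objections.")
print(decision.to_json())
```

Keeping the audit trail as an ordered list of per-phase entries is one simple way to preserve who said what and when; a real payload may well be richer.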

Every step of the deliberation is securely tracked and statefully managed, so complex debates never derail. Built-in institutional memory means the system learns from past outcomes, becoming smarter with every decision.

Built for High-Stakes Business Logic

Apply organizational friction to automated tasks. Configure the Boardrooms, consensus rules, and veto thresholds for your domain.

Investment & Risk

Automated Due Diligence

The system won't just summarize a pitch deck; it actively hunts for financial inconsistencies and raises vetoes before you deploy capital.

  • Quant + Macro + Fundamental analysts
  • Risk Officer veto at 70% confidence
  • Automatic loop if consensus is too low
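
As a rough illustration of the rules above, the sketch below evaluates one round of votes. It assumes a veto fires when the Risk Officer objects with at least 70% confidence, and that "consensus too low" means the approval share falls under a threshold. The `Vote` type, the `evaluate` function, and the 0.66 approval threshold are hypothetical, not the product's configuration format.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    agent: str
    approve: bool
    confidence: float  # 0.0 to 1.0


def evaluate(votes, veto_agent="risk_officer", veto_confidence=0.70, min_approval=0.66):
    """Return "veto", "proceed", or "loop" for one round of votes."""
    if not votes:
        raise ValueError("at least one vote is required")
    for v in votes:
        # Assumption: the veto holder must object with at least
        # `veto_confidence` confidence for the veto to count.
        if v.agent == veto_agent and not v.approve and v.confidence >= veto_confidence:
            return "veto"
    # Assumption: consensus is the share of approving agents.
    approval = sum(v.approve for v in votes) / len(votes)
    return "proceed" if approval >= min_approval else "loop"


votes = [
    Vote("quant", True, 0.90),
    Vote("macro", True, 0.80),
    Vote("fundamental", True, 0.70),
    Vote("risk_officer", False, 0.75),
]
print(evaluate(votes))
```

Here three analysts approve, but the Risk Officer objects at 0.75 confidence, so the round ends in a veto rather than a majority pass.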

Architecture Review

Violent Stress-Testing

Don't just auto-generate code. Have the system aggressively test it for security vulnerabilities and performance bottlenecks before it merges.

  • Security and privacy hold hard vetoes
  • Architects review all code proposals
  • Degraded agents trigger automatic retry

Medical & Legal Triage

Zero-Margin-of-Error

For environments where mistakes cost lives or lawsuits. The system requires hard consensus and mandatory evidence citations before outputting a recommendation.

  • Min 2 debate rounds enforced
  • Synthesizer must cite every participant
  • Escalates to a human on deadlock
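
The first two rules above lend themselves to a simple pre-delivery check. The following is a minimal sketch, assuming the synthesizer output exposes the set of agents it cited; `check_synthesis` and its parameters are hypothetical names chosen for illustration.

```python
def check_synthesis(rounds_completed, participants, citations, min_rounds=2):
    """Return a list of problems; an empty list means the synthesis passes."""
    problems = []
    if rounds_completed < min_rounds:
        problems.append(
            f"only {rounds_completed} of {min_rounds} required debate rounds completed"
        )
    # Every participant must appear among the cited agents.
    missing = sorted(set(participants) - set(citations))
    if missing:
        problems.append("synthesizer did not cite: " + ", ".join(missing))
    return problems


print(check_synthesis(1, ["clinician", "counsel"], ["clinician"]))
```

A non-empty result would be a natural trigger for another debate round or, on repeated failure, escalation to a human.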

Pay Per Decision

You bring the AI provider keys and pay them directly for inference. We charge only for the deliberation infrastructure. Start free, scale as you grow.

Free
$0 / month
  • 5 decisions / month
  • 22 default agents
  • 3 Boardroom templates
  • Any AI provider (Bring Your Own Key)
  • Dashboard + API

Team
$99 / month
  • 500 decisions / month
  • Everything in Builder
  • Overage at $0.25/decision
  • Priority support
  • Advanced analytics

Scale
$0.25 / decision
  • Unlimited decisions
  • Everything in Team
  • $0.15 each above 2,000/mo
  • Dedicated support
  • SLA guarantee
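
To compare plans at a given volume, the listed prices can be turned into a small calculator. This sketch works in integer cents and assumes that the Scale discount is marginal (the $0.15 rate applies only to decisions above 2,000 in a month) and that the Free plan simply stops at 5 decisions. The function name is hypothetical.

```python
def monthly_platform_fee_cents(plan: str, decisions: int) -> int:
    """Platform fee in cents for one month, excluding AI provider token costs."""
    if decisions < 0:
        raise ValueError("decisions must be non-negative")
    if plan == "free":
        # Assumption: no overage is available on the Free plan.
        if decisions > 5:
            raise ValueError("the Free plan includes 5 decisions per month")
        return 0
    if plan == "team":
        # $99 base covers 500 decisions; overage is $0.25 each.
        return 9900 + 25 * max(0, decisions - 500)
    if plan == "scale":
        # $0.25 each up to 2,000; assumed $0.15 for each one beyond that.
        return 25 * min(decisions, 2000) + 15 * max(0, decisions - 2000)
    raise ValueError(f"unknown plan: {plan}")


for plan in ("team", "scale"):
    print(plan, monthly_platform_fee_cents(plan, 600) / 100)
```

Under these assumptions, 600 decisions in a month come to $124.00 on Team and $150.00 on Scale.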

Frequently Asked Questions

Everything you need to know about multi-agent AI deliberation.

What is multi-agent AI deliberation?
Instead of asking one AI for an answer, deliberation forces multiple specialist AI agents to analyse the problem from different angles, then pits challenger agents against them to find weaknesses. A synthesizer resolves conflicts, and a consensus engine decides whether the outcome is strong enough to proceed -- or whether to loop back for more analysis or escalate to a human. This produces higher-quality decisions than any single model, with a full audit trail of who said what and why.

What does "Bring Your Own Key" mean?
You provide your own API key from any supported AI provider -- OpenAI, Anthropic, Google Gemini, or any OpenAI-compatible endpoint like Groq, Together AI, Mistral, or DeepSeek. Your key is encrypted with AES-256-GCM at rest and passed to agents per-request. You pay your AI provider directly. The Arguing Machines charges only for the deliberation infrastructure, not for inference.

Which AI providers are supported?
Anthropic (Claude Haiku, Sonnet, Opus), OpenAI (GPT-4o, GPT-4.1, o3, o4-mini), Google (Gemini 2.0 Flash, Gemini 2.5 Pro), and any OpenAI-compatible API. This covers Groq, Together AI, Mistral, DeepSeek, Ollama, Azure OpenAI, and more. You can switch providers at any time from the Settings page, and auto-migrate your agent fleet to the new provider's model lineup.

How much does a typical decision cost in AI tokens?
It depends on the provider and the number of agents. A typical finance deliberation with 3 specialists, 3 challengers, a synthesizer, and a decision exec costs roughly $0.30-0.60 in API tokens on Anthropic, $0.15-0.40 on OpenAI, or $0.05-0.15 on Gemini. The platform fee is separate and based on your plan.
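
For budgeting, the per-decision ranges quoted above can be scaled to a monthly volume. This is a back-of-the-envelope sketch in integer cents. It applies only to the typical finance deliberation described here, and the names are hypothetical.

```python
# Per-decision token cost ranges from the answer above, in cents (low, high).
TOKEN_COST_CENTS = {
    "anthropic": (30, 60),
    "openai": (15, 40),
    "gemini": (5, 15),
}


def token_spend_range_cents(provider: str, decisions: int) -> tuple[int, int]:
    """Estimated (low, high) token spend in cents, paid to the AI provider."""
    low, high = TOKEN_COST_CENTS[provider]
    return low * decisions, high * decisions


low, high = token_spend_range_cents("anthropic", 500)
print(f"${low / 100:.2f} to ${high / 100:.2f}")
```

At 500 decisions a month, that works out to roughly $150 to $300 in tokens on Anthropic, billed by the provider and separate from the platform fee.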

Can I create custom Boardrooms and Agents?
Yes, on the Builder plan and above. You can create custom Boardrooms with specific rules (like unanimous gate vs. weighted quorum) and staff them with custom agents. You define the agents' names, roles (analyst, skeptic, synthesizer), prompts, and model assignments. This lets you build dedicated decision flows for specific business problems.

What happens if agents disagree?
That is the entire point. When agents disagree, the synthesizer must resolve the conflict by citing evidence from each side. The consensus engine then evaluates whether the resolution meets the configured confidence threshold. If not, the system loops back to an earlier phase with the failure reason injected, forcing agents to address the specific weakness. After a configurable number of loops, it escalates to a human operator for final judgment.
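
The loop-and-escalate behaviour described above can be sketched as a small control loop. This is an illustration under stated assumptions, not the platform's engine: `run_round` is a hypothetical callback standing in for a full debate round, and the 0.8 threshold and limit of 3 loops are placeholder values.

```python
from typing import Callable, Optional


def deliberate(
    run_round: Callable[[Optional[str]], tuple[float, str]],
    threshold: float = 0.8,
    max_loops: int = 3,
) -> dict:
    """Run debate rounds until confidence meets the threshold or loops run out.

    `run_round` receives the previous failure reason (None on the first
    attempt) and returns (confidence, failure_reason).
    """
    failure_reason = None
    history = []
    for attempt in range(1, max_loops + 1):
        confidence, reason = run_round(failure_reason)
        history.append(
            {"attempt": attempt, "confidence": confidence, "injected": failure_reason}
        )
        if confidence >= threshold:
            return {"outcome": "proceed", "attempts": attempt, "history": history}
        # Inject the failure reason so the next round must address it.
        failure_reason = reason
    # Loop budget exhausted: hand the decision to a human operator.
    return {"outcome": "escalate", "attempts": max_loops, "history": history}


# Scripted rounds standing in for real agents: the first falls short, the second passes.
scripted = iter([(0.55, "cash-flow figures conflict"), (0.85, "")])
result = deliberate(lambda injected: next(scripted))
print(result["outcome"], result["attempts"])
```

In the scripted run, the second round receives the first round's failure reason and clears the threshold, so the outcome is to proceed after two attempts.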