Frontier Signal

GitHub Copilot CLI’s Rubber Duck Now Supports Claude for Cross-Model Review

GitHub Copilot CLI's 'Rubber Duck' feature now uses Claude as a critic agent for GPT sessions, enabling cross-model code review and expanding model flexibility.


GitHub Copilot CLI’s “Rubber Duck” feature, designed for cross-family code review, now integrates Anthropic’s Claude as a critic agent when the primary session uses a GPT model. The enhancement lets developers get a multi-model perspective on their code suggestions directly in the command-line interface, with the aim of improving code quality through diverse feedback. For sessions already running on Claude, the system uses a GPT-powered critic agent, so cross-model review is available regardless of the primary model in use.

  • GitHub Copilot CLI’s “Rubber Duck” feature now uses Claude as a critic agent for sessions primarily powered by GPT models.
  • Conversely, for Claude-powered sessions, a GPT model will act as the critic, enabling cross-model review.
  • This multi-model approach aims to provide diverse perspectives and potentially higher-quality code suggestions and reviews.
  • The feature is accessible via the /experimental command within the Copilot CLI.

What changed

GitHub has expanded model support for “Rubber Duck,” a cross-family review agent within the GitHub Copilot CLI. Previously, the model backing the critic role was not explicitly documented. Now, when a developer’s Copilot CLI session uses a GPT model, the “Rubber Duck” critic agent is powered by Claude; conversely, if the session primarily uses a Claude model, the critic agent uses a GPT model [1, 4]. This creates a deliberate cross-pollination of model perspectives for code review, moving beyond single-model dependency for critical feedback.

This update builds on the continuous evolution of GitHub Copilot CLI, which recently saw enterprise-managed plugins enter public preview, alongside ongoing enhancements to Copilot in Visual Studio Code [2, 3]. The “Rubber Duck” feature is part of a broader “agent mode” within Copilot CLI, which consumes premium requests and offers flexible model options depending on the specific functionality [7, 8]. The Claude critic for GPT sessions was noted in the github/copilot-cli releases, under an /experimental flag [5].

How it works

The “Rubber Duck” feature in GitHub Copilot CLI functions as a “cross-family review agent” [1]. When a developer uses Copilot CLI to generate or refine code, the primary model handles the initial suggestions. With this update, if that primary model is a GPT variant, the “Rubber Duck” agent will invoke Anthropic’s Claude model to act as a “critic.” This critic agent then provides an independent review or alternative perspective on the code or suggestions generated by the GPT model. The reverse applies if the primary session is running on Claude; a GPT model will step in as the critic [1].

This mechanism is designed to leverage the distinct strengths and potential biases of different large language models. By having a model from a different “family” review the output of another, the system aims to catch errors, suggest improvements, or offer alternative approaches that a single model might miss. This dual-model approach is part of GitHub Copilot’s broader agent mode, which consumes “Premium requests” and allows for varied model options depending on the specific task [7, 8]. Users can access this specific multi-model “Rubber Duck” functionality through the /experimental command within the Copilot CLI [5].
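The pairing logic described above is simple to picture in code. Below is a minimal Python sketch of the cross-family critic pattern: the function names and the `ask_model` callable are illustrative assumptions, not Copilot CLI internals; the only grounded behavior is the pairing itself (GPT session → Claude critic, Claude session → GPT critic).

```python
def pick_critic_family(primary_family: str) -> str:
    """Return the model family used for the "Rubber Duck" critic.

    A GPT-powered session gets a Claude critic, and vice versa, so the
    reviewer always comes from a different model family than the author.
    """
    pairing = {"gpt": "claude", "claude": "gpt"}
    if primary_family not in pairing:
        raise ValueError(f"unknown model family: {primary_family}")
    return pairing[primary_family]


def rubber_duck_review(code: str, primary_family: str, ask_model) -> str:
    """Send the primary model's output to a cross-family critic.

    `ask_model(family, prompt)` is a hypothetical stand-in for whatever
    call actually invokes an LLM; it is injected as a parameter so the
    sketch stays self-contained.
    """
    critic = pick_critic_family(primary_family)
    prompt = f"Review this code for bugs and clearer alternatives:\n{code}"
    return ask_model(critic, prompt)
```

The point of the dependency-injected `ask_model` is the same one the feature makes: the critic is selected structurally, by family, rather than being the same model grading its own homework.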

Why it matters for operators

For operators—be they engineering leads, startup founders, or individual contributors—this multi-model “Rubber Duck” update is more than just a feature; it’s a subtle but significant shift in how we should approach AI-assisted development. The explicit integration of cross-family models like Claude and GPT for review isn’t about mere model swapping; it acknowledges that no single LLM is a silver bullet. Each model has its unique training data, architectural biases, and resulting “personality” in code generation and critique.

In practice this means potentially higher-quality, more robust code. Operators should view it as a built-in “second opinion” mechanism that reduces the cognitive load of manual cross-verification. Instead of running code through GPT and then pasting it into a Claude-powered chat for review, the process is streamlined within the CLI. This is particularly valuable for complex logic, security-sensitive code, or performance-critical sections, where a fresh, algorithmically distinct perspective can surface subtle flaws or suggest more elegant solutions.

We contend that this move by GitHub validates the “ensemble AI” approach for critical tasks: combining outputs from diverse models to improve overall reliability and quality. Operators should experiment with this feature not as a novelty but as a standard step in their CI/CD pipelines for AI-generated or AI-assisted code, treating the “Rubber Duck” output as a preliminary, automated peer review. Integrating it proactively can catch issues earlier, reducing costly refactoring cycles and improving developer velocity.

How to try it today

To access the multi-model “Rubber Duck” feature, developers need to be using GitHub Copilot CLI. The functionality is currently available through an experimental flag. According to the github/copilot-cli releases, users can activate the “Rubber Duck” agent for GPT sessions, powered by Claude, by using the /experimental command [5].

Ensure your GitHub Copilot CLI installation is up-to-date, as shell completions are automatically installed and updated after running copilot update [5]. Once updated, you can explore the experimental features to leverage the cross-model review capabilities.
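Putting the steps above together, the flow looks roughly like this. Only `copilot update` and the `/experimental` command are taken from the release notes [5]; everything else (prompt layout, session startup) is an illustrative assumption about an interactive CLI.

```shell
# Update Copilot CLI first; shell completions are refreshed as part of this step.
copilot update

# Launch an interactive session, then type the slash command at the prompt
# (it is entered inside the session, not in your shell):
copilot
#   > /experimental
```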

Risks and open questions

  • Model Divergence and Contradictions: While diverse perspectives are beneficial, there’s a risk of the critic model offering conflicting advice or identifying “issues” that are merely stylistic preferences rather than genuine flaws. How will the system or the user reconcile potentially contradictory feedback from two different LLM families?
  • Performance Overhead: Running two distinct LLMs (even if one is for review) for a single interaction might introduce latency. Operators need to understand the performance impact on their development workflow, especially for real-time coding assistance.
  • Cost Implications: The “agent mode” and specific model options within Copilot CLI utilize “Premium requests” [7, 8]. While the specific pricing impact of this multi-model setup isn’t detailed, increased usage of such advanced features could translate to higher operational costs for organizations.
  • Transparency of Critic Logic: How transparent will the “Rubber Duck” critic agent be about its reasoning? Understanding why a Claude-powered critic flagged an issue in GPT-generated code is crucial for learning and trust, beyond just receiving a suggestion.
  • Customization and Configuration: Will operators eventually be able to configure which models act as the primary and critic agents, or fine-tune their review styles? This would allow for tailoring the “Rubber Duck” to specific project standards or coding philosophies.

Author

  • Siegfried Kamgo

    Founder and editorial lead at FrontierWisdom. Engineer turned operator-analyst writing about AI systems, automation infrastructure, decentralised stacks, and the practical economics of frontier technology. Focus: turning fast-moving releases into durable, implementation-ready playbooks.

