This data is great, and it is exciting to see the rapid growth of autonomous coding agents across GitHub.
One thing to keep in mind regarding merge rates is that each of these products creates the PR at a different phase of the work. So just tracking PR create to PR merge tells a different story for each product.
In some cases, the work to iterate on the AI generated code (and potentially abandon it if not sufficiently good) is done in private, and only pushed to a GitHub PR once the user decides they are ready to share/merge. This is the case for Codex for example. The merge rates for product experiences like this will look good in the stats presented here, even if many AI generated code changes are being abandoned privately.
For other product experiences, the Draft PR is generated immediately when a task is assigned, and users can iterate on this “in the open” with the coding agent. This creates more transparency into both the success and failure cases (including logs of the agent sessions for both). This is the case for GitHub Copilot coding agent for example. We believe this “learning in the open” is valuable for individuals, teams, and the industry. But it does lead to the merge rates reported here appearing worse - even if logically they are the same as “task assignment to merged PR” success rates for other tools.
We’re looking forward to continuing to evolve the notion of Draft PR to be even more natural for these use cases. And to enabling all of these coding agents to benefit from open collaboration on GitHub.
What is your team’s take on copyright for commits generated by AI agents? Would copyright protect them?
Current US stance seems to be:
https://www.copyright.gov/newsnet/2025/1060.html
“It concludes that the outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements”.
If the entire commit is generated by AI, then it is obvious what created it: AI. Such a commit might not be covered by copyright. Is this something your team has already analysed?
This is a very fascinating aspect which is not discussed much. So far in human history, every text was written by someone, and thus carried some kind of copyright.
Now we have text which is legally not owned by anybody. Is it "public domain", though? It is not possible to verify, so maybe it is, but it still poses legal risks.
For something like a compiler where the output is mostly deterministic[0] I agree. For an AI that was trained on an unknown corpus, and that corpus changes over time, the output is much less deterministic, and I would say you lose the human element needed for copyright claims.
If it can be shown that the same prompt, run through the AI several times over perhaps a year, results in the same output, then I will change my mind. Or if the AI achieves personhood.
[0] Allowances for register & loop optimization, etc.
> “It concludes that the outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements”
How would that work if it's a patch to a project with a copyleft license like GPL which requires all derivate work to be licensed the same?
IANAL, but it means the commit itself is public domain. When it is integrated into a code base with a more restrictive license, you can still use that isolated snippet in whatever way you want.
A more interesting question is whether one could remove the GPL restrictions on public code by telling an AI to rewrite the code from scratch, providing only the behavior of the code.
This could be accomplished by having the AI generate a comprehensive test suite first, and then letting it write the code of the app while seeing only the test suite.
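To make the information flow concrete, here is a toy sketch of that two-phase workflow in Python. The "agent" functions are trivial stand-ins (hypothetical, not a real coding-agent API); the point is the barrier: phase 1 sees only observed input/output behavior, and phase 2 sees only the resulting test suite, never the original source.

```python
# Toy sketch of the two-phase "clean room via tests" workflow.
# Phase 1: derive a test suite from observed behavior alone.
# Phase 2: write a fresh implementation against that test suite only.

observed_behavior = {("hello world",): "hello-world", ("A  B",): "a-b"}

def phase1_write_tests(behavior):
    # Emit (args, expected) pairs from observed input/output; no source is consulted.
    return [(args, expected) for args, expected in behavior.items()]

def phase2_reimplement(test_suite):
    # A fresh implementation, written while seeing nothing but the tests.
    def slugify(s):
        return "-".join(s.lower().split())
    # Self-check the reimplementation against the suite before handing it back.
    for args, expected in test_suite:
        assert slugify(*args) == expected
    return slugify

slugify = phase2_reimplement(phase1_write_tests(observed_behavior))
print(slugify("Clean Room"))  # -> clean-room
```

Whether this barrier would actually survive legal scrutiny is exactly the open question in the thread; the sketch only shows the mechanics.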
Hmm, so basically automated clean room reimplementation, using coding agents? Our concepts of authorship, copying, and equivalence are getting a real workout these days!
You'd need pretty good opsec, a non-search-capable agent, and logs of all its actions/chain of thought/process to be able to truly claim a cleanroom implementation, though.
The logs and traceability are the secret sauce here. It's one thing to have an artifact that mysteriously replicates the functionality of a well known IP-protected product without just straight up copying it. It's another thing to be able to demonstrate that said artifact was generated solely from information in the public domain or otherwise legally valid to use.
If it's of interest: I was investigating this and found that all the big labs like OpenAI offer an indemnity clause for enterprise customers, which is supposed to assure you that the model doesn't output non-compliant code (copyrighted, AGPL, or whatever). BUT you have to accept them keeping all your logs, give them access, and let them and their lawyers build their own case in the event you get sued.
I guess they're mostly selling insurance to BigCos, saying: hey, we have the money to go to court and an interest in winning such a case, so we'll handle it.
> If entire commit is generated by AI then it is obvious what created it - it’s AI.
This is not the case. The output of a compiler is 100% created by a compiler too. Copyright is based on where the creative aspect comes from.
I have had very little luck having 2025-era AIs manage the creative aspects of coding -- design, architecture, and similar -- and that's doubly true for what appears to be the relatively simplistic model in codex (as far as I can tell, codex trades off model complexity for model time; the model does a massive amount of work for a relatively small change).
However, it is much better than I am at the mechanical aspects. LLMs can fix mechanical bugs almost instantly (the sort of thing with a cut-and-paste fix in some build process from Stack Overflow), and generate massive amounts of code without typos or shallow bugs.
A good analogy is working with power tools versus hand tools. I can do much more in one step, but I'm still in creative control.
The codebase I'm working on is pretty sophisticated, and I might imagine they could implement more cookiecutter things (e.g. a standard OAuth workflow) more automatically.
However, even there -- or in discussions with larger models about my existing codebase -- they base their creativity in part on human contributions to their training set. I'm not sure how to weigh that. An LLM OAuth workflow might be considered the creative median of a lot of human-written code.
I write a lot of AGPL code, and at least in the 3.5 era, they were clearly trained on my code, and would happily print it out more-or-less verbatim. Indeed, it was to the point where I complained to OpenAI about it at the time, but never got a response. I suspect a lot of generated code will include some fractional contribution from me now (an infinitesimal fraction most of the time, but more substantial for niche code similar to my codebase).
So in generated code, we have a mixture of at least a few different pieces:
This is a great point! But there's an important tradeoff here about human engineering time versus the "learning in the open" benefits; a PR discarded privately consumes no human engineering time, a fact that the humans involved might appreciate. How do you balance that tradeoff? Is there such a thing as a diff that's "too bad" to iterate on with a human?
I do agree there is a balance here, and that the ideal point in the spectrum is likely in between the two product experiences that are currently being offered here. There are a lot of benefits to using PRs for the review and iteration - familiar diff UX, great comment/review feedback mechanisms, ability to run CI, visibility and auth tracked natively within GitHub, etc. But Draft PRs are also a little too visible by default in GitHub today, and there are times when you want a shareable PR link that isn't showing up by default on the Pull Requests list in GitHub for your repo. (I frankly want this even for human-authored Draft PRs, but it's even more compelling for agent-authored PRs.)
We are looking into paths where we can support this more personal/private kind of PR, which would provide the foundation within GitHub to support the best of both worlds here.
Yes. This is a really key part of why Copilot coding agent feels very different to use than Copilot agent mode in VS Code.
In coding agent, we encourage the agent to be very thorough in its work, and to take time to think deeply about the problem. It builds and tests code regularly to ensure it understands the impact of changes as it makes them, and stops and thinks regularly before taking action.
These choices would feel too “slow” in a synchronous IDE-based experience, but feel natural in an “assign to a peer collaborator” UX. We lean into this to provide as rich a problem-solving agentic experience as possible.
As peer commenters have noted, coding agent can be really good at improving test coverage when needed.
But also as a slightly deeper observation - agentic coding tools really do benefit significantly from good test coverage. Tests are a way to “box in” the agent and allow it to check its work regularly. While they aren’t necessary for these tools to work, they can enable coding agents to accomplish a lot more on your behalf.
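One way to picture that “box in” effect is a small behavioral test that pins down the contract, including the edge case where raising an error is the right answer. (This is an illustrative example; `parse_port` is a made-up function, not from any of the tools discussed.) An agent that “fixes” a failure by swallowing the error cannot pass such a suite.

```python
# A characterization test that boxes in an agent: the error path is part of
# the contract, so suppressing errors (instead of raising) fails the suite.

def parse_port(value: str) -> int:
    port = int(value)  # raises ValueError on non-numeric input, as the tests demand
    if not (0 < port < 65536):
        raise ValueError(f"port out of range: {port}")
    return port

def test_parse_port():
    assert parse_port("8080") == 8080
    # Edge cases must raise; an agent that returns a default here breaks the test.
    for bad in ("http", "-1", "70000"):
        try:
            parse_port(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")

test_parse_port()
print("ok")
```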
In my experience they write a lot of pointless tests that technically increase coverage while not actually adding much more value than a good type system/compiler would.
They also have a tendency to suppress errors instead of fixing them, especially when the right thing to do is throw an error on some edge case.
Ten is a statically typed tensor programming language for defining AI models.
Ten has the following features:
1. Succinct syntax and operators tailored to AI model definition
2. Fully statically typed tensors, including generic functions over tensor dimension and batch dimensions (...)
3. First-class hyper-parameters, model parameters and model arguments for explicit model specification
4. EinOps-style reshaping and reductions - tensor dimensions are explicit not implicit
Example (a functional GPT2 implementation):
  Gelu(x: {...}) -> {...}:
      return 0.5 * x * (1 + Tanh(0.7978845608 * x + 0.044715 * x**3))

  SoftMax[N](x: {...,N}) -> {...,N}:
      exp_x = Exp(x - Max(x))
      return exp_x / Sum(exp_x)

  LayerNorm[S,E]|g:{E},b:{E}|(x:{S,E}) -> {S,E}:
      mean = Mean(x)
      variance = Var(x)
      return g * (x - mean) / Sqrt(variance + 1e-5) + b

  Linear[N,K]|w:{N,K},b:{K}|(x:{...,N}) -> {...,K}:
      return x@w + b

  FFN[S,E]|c_fc, c_proj|(x:{S,E}) -> {S,E}:
      a = Gelu(Linear[E,E*4]|c_fc|(x))
      return Linear[E*4,E]|c_proj|(a)

  Attention[Q,K,N,V](q:{...,Q,K}, k:{...,N,K}, v:{...,N,V}, mask:{Q,N}) -> {...,Q,V}:
      return SoftMax[N](q @ Transpose[N,K](k) / Sqrt(K) + mask) @ v

  MHA[H,S,E,K]|c_attn, c_proj|(x:{S,E}) -> {S,E}:
      q, k, v = Linear[E,E*3]|c_attn|(x) {S,(3,H,K) -> 3,H,S,K}
      causal_mask = (Tri[S]() - 1) * 1e10
      out = Attention[S,K,S,K](q, k, v, causal_mask) {H,S,K -> S,(H,K)}
      return Linear[E,E]|c_proj|(out)

  Transformer[H,S,E]|mlp, attn, ln_1, ln_2|(x:{S,E}) -> {S,E}:
      y = x + MHA[H,S,E,E/H]|attn|(LayerNorm[S,E]|ln_1|(x))
      return y + FFN[S,E]|mlp|(LayerNorm[S,E]|ln_2|(y))

  GPT2[H,S,E,B,V]|wte, wpe, blocks, ln_f|(inputs:{S}) -> {S,V}:
      x = wte.[inputs] + wpe.[Range[S]()]
      z = for i in 0...B: x, y -> Transformer[H,S,E]|blocks.[i]|(y)
      return LayerNorm[S,E]|ln_f|(z) @ Transpose[V,E](wte)
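For readers less used to the shape notation, here is a rough NumPy rendering of the SoftMax and LayerNorm definitions above. This is a sketch for comparison only, without Ten's static shape checking, and it assumes the reductions are over the last dimension:

```python
# NumPy equivalents of Ten's SoftMax and LayerNorm (reductions over the last axis).
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability, mirroring Exp(x - Max(x)).
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

def layer_norm(x, g, b, eps=1e-5):
    mean = np.mean(x, axis=-1, keepdims=True)
    variance = np.var(x, axis=-1, keepdims=True)
    return g * (x - mean) / np.sqrt(variance + eps) + b

row = softmax(np.array([1.0, 2.0, 3.0]))
print(round(float(row.sum()), 6))  # -> 1.0
```

The Ten versions express the same thing, but the `{...,N}` annotations let the compiler check the reduced dimension statically instead of relying on `axis=-1` conventions.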
Status: Working prototype, but lots more I'd love to do to bring this to life (README has more details of future thoughts/plans).
Great suggestion! We haven't yet integrated ESC into Pulumi AI, but it's something we'll be looking into. Great opportunity to really make it easy to get started with ESC.
Agreed that YAML (and JSON) can be difficult to manage at large scale. This is actually a big part of why Pulumi ESC exists, to be able to decompose large YAML/JSON configuration files into smaller logical and composable units.
As you note, intellisense and error squiggles can also help a lot here - both for ensuring references to other environments are correct, and to get checking for your dynamic secrets providers. We’ve added the Monaco editor (from VS Code) into the Pulumi Cloud console to make it easier to offer these features in the very near future.
We also offer a preview pane to make it easy to interactively validate your environment documents while working on them directly in the console.
Lots more coming for providing an even richer experience working with environments in Pulumi ESC.
Closure compiler was actually one of the biggest influences on the design of TypeScript, and even the early motivation for the approach that TypeScript took.
> There were many options already available, but none seemed to be resonating well with a broad enough section of the market. Internally at Microsoft, Script# was being used by some large teams. It let them use C# directly instead of JavaScript, but as a result, suffered from the kind of impedance mismatch you get when trying to stand at arm's length from the runtime model you are really programming against. And there was Google’s Closure Compiler, which offered a rich type system embedded in comments inside JavaScript code to guide some advanced minification processes (and along the way, caught and reported type-related errors). And finally, this was the timeframe of a rapid ascendancy of CoffeeScript within the JavaScript ecosystem — becoming the first heavily used transpiled-to-JavaScript language and paving the way for transpilers in the JavaScript development workflow. (Aside — I often explained TypeScript in the early days using an analogy “CoffeeScript : TypeScript :: Ruby : C#/Java/C++”, often adding — “and there are 50x more C#/Java/C++ developers than Ruby developers :-)”)
> What we quickly discovered we wanted to offer was a “best of all worlds” at the intersection of these three — a language as close as possible to JavaScript semantics (like CoffeeScript) and syntax (like Closure Compiler) but able to offer typechecking and rich tooling (like Script#).
Excel Online, at least back in 2015, was written in Script#. Not only was the C# IDE support miles ahead (this was pre-VS Code, pre-TypeScript days), but the biggest thing was the ability to author unit tests that leveraged a lot of work from the then-dedicated testing organization. (Who was writing unit tests in JS 10 years ago? Anyone?)
:raises-hand: - I was certainly writing unit tests in JS in 2012. Jasmine came out in 2010 and was already widely adopted.
Also, Jasmine wasn't the first test runner by a long shot (John Resig wrote one for jQuery before Jasmine was a thing and there were earlier ones too).
> The self-hosted backend does have plenty of wrinkles though, like not supporting the same stack name across different projects with the same backend URL, contrary to how the hosted backend works.
Just a note that we did add support for projects to the self-hosted backend a few months ago, which does bring the self-hosted and managed backends into alignment here.
It’s funny - when we were first designing TypeScript - I often described it as "TypeScript is to CoffeeScript as C#/C++/Java is to Ruby" often adding "and there are 50x more of the former developers than the latter" [0]. And CoffeeScript’s approach of transpiling down to clean JavaScript was a big inspiration for TypeScript. In the 10 years since then, some of the Ruby/CoffeeScript aesthetic has become more mainstream in other programming languages (Swift, Rust), and gradual type systems have become more of an expectation even in dynamic languages like Ruby (Sorbet), Python (mypy) and PHP (Hack). So it does seem very natural to bring these back together now like Civet is doing.