Why AI Writes Better Code in Some Languages Than Others
A Research-Based Analysis of LLM Code Generation Quality
January 2026
The Theory
After months of working with AI coding assistants across different technology stacks, I've developed a theory: the quality of AI-generated code depends not just on how "smart" the model is, but on the intersection of three factors — training data quality, language constraint mechanisms, and framework establishment.
In practical terms: Go produces more reliable AI-generated code than C#. Orleans outperforms custom actor frameworks. Python beats JavaScript for consistency. These aren't random observations — they're predictable outcomes once you understand how LLMs actually learn to write code.
This post digs into the research behind these observations and proposes a model for predicting where AI coding assistance will excel (and where it will struggle). For teams making technology decisions in 2026, understanding this dynamic is becoming increasingly important.
The Quality-Over-Quantity Paradigm
Academic Consensus
Recent research has fundamentally shifted our understanding of what drives LLM code generation quality. A 2024 study presented at the IEEE/ACM Automated Software Engineering conference found that practitioners rank Reliability, Relevance, and Accuracy as the most important dataset characteristics, while sheer volume ranked significantly lower.^[1]
Li et al. (2023) in their paper From Quantity to Quality introduced the Instruction-Following Difficulty (IFD) score, demonstrating that careful selection of high-quality training samples can outperform larger datasets of mixed quality.^[2] A 2025 arXiv study on training data optimization found that "nearly all optimization techniques improve LLM-based code generation, underscoring data quality as a primary performance driver."^[3]
Critically, the same study revealed that "combining multiple techniques rarely produces additive gains in functional correctness, revealing a clear upper bound."^[3] Beyond a certain quality threshold, additional data provides diminishing returns.
The Python vs JavaScript Question
Consider why Python training data may be higher quality than JavaScript data. Python benefits from several quality advantages: extensive documentation in scientific computing, consistent coding standards (PEP 8), and a machine-learning ecosystem in which much of the public code is written by researchers who use Python professionally.
JavaScript's ecosystem fragmentation — multiple frameworks, rapid evolution, varying quality of npm packages — introduces noise into training data. The Continue.dev analysis of multilingual LLM performance noted that while JavaScript has one of the largest presences on GitHub and Stack Overflow, benchmark performance does not linearly correlate with dataset size.^[4]
The Constraint Hypothesis: Why "Opinionated" Languages Excel
Functional Programming and Predictability
Some of the most interesting observations come from functional programming communities. As one practitioner analysis noted: "In functional programming, everything is immutable, side effects are discouraged, and you don't have to worry about distant or abstract values hiding somewhere in your codebase... When an AI is trying to understand what a piece of code does, this predictability is invaluable."^[5]
Chris McCord, creator of Phoenix Framework, argued at a 2025 conference that Elixir's "cohesive tooling and language design" make it well-suited for AI coding agents. Unlike fragmented ecosystems like JavaScript, "we have Mix, your build tool" providing a unified experience.^[6]
Go's Design Philosophy
Go's strong alignment with AI code generation is rooted in its deliberate design constraints. Go enforces a single canonical code format (gofmt), explicit error handling, a small feature set (no inheritance, and no generics until Go 1.18), and strong static typing. These constraints shrink the "solution space" an LLM must navigate.
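The error-handling constraint is a good concrete illustration: Go has essentially one idiomatic way to propagate a failure, so a model has little room to improvise. A minimal sketch (`parsePort` is a hypothetical helper, not from any real codebase):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parsePort illustrates Go's single canonical error-handling idiom:
// every fallible call returns (value, error), and the caller must
// check err explicitly before using the value.
func parsePort(s string) (int, error) {
	p, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", s, err)
	}
	if p < 1 || p > 65535 {
		return 0, fmt.Errorf("port %d out of range", p)
	}
	return p, nil
}

func main() {
	port, err := parsePort("8080")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("listening on port", port)
}
```

There is no exception hierarchy to choose from, no optional try/catch style, and no way to silently ignore the failure without it being visible in review — exactly the kind of narrowed solution space the constraint hypothesis describes.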
Research on type-constrained code generation (Chaudhuri et al., 2025) demonstrates that "leveraging type systems to guide code generation" significantly reduces compilation errors and increases functional correctness.^[7] Go's strong type system provides exactly this kind of guidance.
The C# Paradox
C# presents an interesting paradox: it has strong typing and excellent documentation, yet AI often struggles more with it than Go. The likely explanation lies in C#'s extensive flexibility. C# supports multiple paradigms (OOP, functional, procedural), numerous ways to accomplish the same task (LINQ vs loops, async patterns, nullable reference types), and constant language evolution adding new syntax.
Research on design patterns and LLMs found that "LLMs often fail to properly understand existing design patterns and coding styles of a project, leading to generated code that does not meet project requirements."^[8] C#'s rich pattern library actually becomes a liability when the AI must choose among many valid approaches.
Established Frameworks vs Custom Code: The Orleans Evidence
BaxBench: Empirical Framework Comparison
The most direct evidence for why established frameworks outperform custom ones comes from BaxBench (Vero et al., 2025), a benchmark testing LLM backend generation across 14 frameworks and 6 languages. The key finding: "in less popular backend frameworks, models further struggle to generate correct and secure applications."^[9]
Even the best model (OpenAI o1) achieved only 62% correctness on established frameworks like Django and Express. Performance dropped significantly for less common frameworks. A custom framework, by definition, has zero training examples — placing it at maximum disadvantage.
At Infonuncio Consulting, we've seen this firsthand. When we work with Orleans — a Microsoft-backed, well-documented actor framework — AI assistance is genuinely helpful. When we've experimented with custom actor implementations, the AI produces code that looks plausible but fundamentally misunderstands the architecture.
The Library Bias Effect
A March 2025 paper, "LLMs Love Python: A Study of LLMs' Bias for Programming Languages and Libraries," found that LLMs "heavily favour well established libraries over high-quality alternatives." NumPy was used unnecessarily in up to 48% of cases when better alternatives existed.^[10]
This bias extends to frameworks: Orleans, as a Microsoft-backed actor framework with substantial training data, benefits from this preference. Custom frameworks, regardless of technical merit, cannot compete with Orleans' representation in training corpora.
Counterarguments and Limitations
The Benchmark Criticism
Most coding benchmarks (HumanEval, MBPP) are Python-centric, potentially inflating perceived Python performance.^[11] When Tencent's AutoCodeBench tested 20 languages equally, they found that "models showed small differences in popular languages like Python and JavaScript, but huge differences in less common languages."^[12]
The Elixir Paradox
Despite Elixir's theoretical advantages from functional programming, practitioners note that "LLMs sometimes mix syntaxes from different languages or suggest functions that don't exist" when generating Elixir code due to limited training data.^[13] The constraint benefits can be overwhelmed by data scarcity.
Productivity vs Quality
One counterintuitive finding: languages where AI assistance seems less impressive may actually be more productive overall. As one analysis noted: "Maybe the reason 'AI doesn't help much' with Elixir isn't because AI is bad at Elixir — maybe it's because Elixir problems are already well-structured enough that we don't need as much help."^[5]
A Revised Model for Predicting AI Code Quality
Based on the research, I propose a model for predicting LLM code generation quality:
LLM Code Generation Quality = f(Training Data Quality × Language Constraints × Framework Establishment)
| Factor | Description |
|---|---|
| Training Data Quality | Not just volume, but consistency, documentation quality, and adherence to best practices in training examples |
| Language Constraints | Strong typing, enforced conventions, limited idioms reduce the solution space and guide the model toward correct outputs |
| Framework Establishment | Well-documented, widely-used frameworks have more training examples and benefit from the LLM's library bias |
Practical Recommendations
For optimal AI-assisted development:
- Prefer established frameworks over custom solutions when AI assistance is important
- Choose languages with strong conventions (Go > Python > JavaScript for AI consistency)
- Document custom code extensively to help the AI understand your patterns
- Consider semantic search tools to surface relevant code context for the AI
Composite Scoring Results
The following table applies this model to common language/framework combinations found in fintech companies (startups and mid-stage). Each factor is scored 1-10, and the composite is the geometric mean of the three scores — a form chosen because all three factors matter: a very low score in any one category drags down the whole.
Scoring Methodology:
- Data Quality (DQ): Volume and quality of training examples, documentation, Stack Overflow presence
- Language Constraints (LC): Type system strength, enforced conventions, idiom consistency
- Framework Establishment (FE): Adoption rate, documentation depth, years in production use
- Composite: Geometric mean of the three factors, (DQ × LC × FE)^(1/3); since each factor is on a 1-10 scale, the result is too
| Language | Framework | Data Quality | Lang. Constraints | Framework Est. | Composite | Notes |
|---|---|---|---|---|---|---|
| Java | Spring Boot | 9 | 7 | 10 | 8.6 | Enterprise standard; massive training corpus |
| Go | Standard Library | 9 | 9 | 10 | 9.3 | Canonical examples; gofmt enforces consistency |
| Python | Django | 9 | 5 | 10 | 7.7 | Excellent docs; Django conventions help offset Python flexibility |
| Go | Gin | 8 | 9 | 8 | 8.3 | Strong constraints; popular API framework |
| Kotlin | Spring Boot | 7 | 8 | 9 | 8.0 | Leverages Java ecosystem; null safety helps |
| Python | FastAPI | 8 | 6 | 7 | 7.0 | Growing fast; type hints improve outcomes |
| Ruby | Rails | 8 | 6 | 9 | 7.6 | "Convention over configuration" aids AI |
| C# | Orleans | 7 | 6 | 7 | 6.6 | Good MS docs; actor model patterns established |
| TypeScript | NestJS | 7 | 7 | 7 | 7.0 | Decorators provide structure; growing adoption |
| C# | Dapr | 6 | 6 | 6 | 6.0 | Newer; sidecar pattern less common in training |
| Scala | Akka | 6 | 7 | 7 | 6.6 | Niche but quality; actor patterns well-documented |
| Elixir | Phoenix | 5 | 8 | 7 | 6.5 | Functional benefits offset by data scarcity |
| Rust | Actix | 6 | 10 | 6 | 7.1 | Compiler catches errors AI would make; smaller corpus |
| TypeScript | Express + Custom | 6 | 5 | 5 | 5.3 | Framework established but custom patterns fragment |
| C# | Custom Framework | 2 | 4 | 1 | 2.0 | Zero public examples; AI has nothing to learn from |
Key Observations:
Go + Standard Library achieves the highest score (9.3). The combination of canonical, high-quality examples with strict language enforcement creates an ideal environment for AI code generation.
The C# gradient is instructive. Orleans (6.6) → Dapr (6.0) → Custom (2.0) demonstrates how framework establishment dominates within a single language: C#'s constraints barely change across the three rows, yet the composite collapses as training-data representation disappears.
Rust's constraint advantage partially compensates for smaller corpus. Despite less training data than Python or Java, Rust's compiler enforcement (LC=10) helps the AI avoid errors it would otherwise make.
Elixir's paradox is visible. Strong constraints (LC=8) but limited data (DQ=5) results in a middle-tier score, explaining the mixed experiences developers report.
TypeScript + Express + Custom patterns score poorly (5.3) despite TypeScript's popularity — the "custom" element fragments the solution space.
Java Spring Boot rivals Go due to sheer training data volume and enterprise standardization offsetting Java's moderate constraint level.
Conclusion
The research suggests that constraint reduction (fewer valid ways to write code), data quality (not quantity), and ecosystem cohesion (unified tooling and conventions) are the key drivers of AI code generation quality.
The Orleans vs custom framework observation is particularly well-supported by the BaxBench findings. The Go vs C# experience aligns with the constraint hypothesis. And the Elixir observations from the functional programming community, while complicated by data scarcity, are consistent with functional programming's predictability advantages.
The practical implication: when planning AI-assisted development, optimize for constraint and convention over raw language power or flexibility.
Appendix: Sources and References
[1] Liu et al., "What Makes a High-Quality Training Dataset for Large Language Models: A Practitioners' Perspective," Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE'24), October 2024. https://dl.acm.org/doi/10.1145/3691620.3695061
[2] Li et al., "From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning," arXiv:2308.12032, 2023. https://arxiv.org/html/2308.12032v4
[3] Anonymous, "On the Effectiveness of Training Data Optimization for LLM-based Code Generation: An Empirical Study," arXiv:2512.24570, December 2025. https://arxiv.org/html/2512.24570v1
[4] Continue.dev, "LLMs are helpful with Python, but what about all of the other programming languages?" Continue Blog, November 2023. https://blog.continue.dev/programming-languages/
[5] Revelry, "Which Language Is Best For AI Code Generation?" Revelry Insights, October 2025. https://revelry.co/insights/artificial-intelligence/which-language-is-best-for-ai-code-generation/
[6] Yolanda, L., "Phoenix Creator Argues Elixir Is AI's Best Language," The New Stack, October 2025. https://thenewstack.io/phoenix-creator-argues-elixir-is-ais-best-language/
[7] Chaudhuri et al., "Type-Constrained Code Generation with Language Models," arXiv:2504.09246, 2025. https://arxiv.org/pdf/2504.09246
[8] Anonymous, "Do Code LLMs Understand Design Patterns?" arXiv:2501.04835, January 2025. https://arxiv.org/html/2501.04835v1
[9] Vero et al., "BaxBench: Can LLMs Generate Correct and Secure Backends?" arXiv:2502.11844, ICML 2025, February 2025. https://baxbench.com/
[10] Twist et al., "LLMs Love Python: A Study of LLMs' Bias for Programming Languages and Libraries," arXiv:2503.17181, March 2025. https://arxiv.org/html/2503.17181v1
[11] Chen et al., "Evaluating Large Language Models Trained on Code," arXiv:2107.03374 (HumanEval), 2021. https://github.com/openai/human-eval
[12] Chou et al. (Tencent Hunyuan), "AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators," arXiv:2508.09101, August 2025. https://arxiv.org/html/2508.09101v1
[13] Eberhardt, J., "Writing Elixir with LLMs: Maximizing Efficiency and Avoiding Pitfalls," Medium, November 2024. https://medium.com/@jonnyeberhardt7/writing-elixir-with-llms-maximizing-efficiency-and-avoiding-pitfalls-141a1b65374b
Tags: AI, LLM, Code Generation, Architecture, Orleans, Go, Python, Microservices