The AI Coding Divide is a Best Practices Divide
Turns out Kent Beck was right all along
If you’ve been paying attention to the AI coding discourse on X, you’ve probably noticed a pattern. Some groups talk about how incredible it’s been. Others report QA disasters, broken code, and AI that actually slows them down. The weird part is both groups are using the same tools.
This isn’t a judgment of AI. It’s a judgment of the teams using it.
Engineering teams with great practices and solid test coverage, the ones that care about quality and craftsmanship, are the teams seeing real gains from AI. Meanwhile, teams without tests, without discipline, without any quality culture to build on are the ones struggling. And frankly, it’s pretty easy to predict which camp you’ll fall into. Look at your codebase. If it looks like shit, AI’s not your savior.
TDD Is the Killer App
Who knew this is where we’d end up? The killer app for AI-assisted coding is Test Driven Development.
Not even TDD specifically, but the broader set of practices that accompanied it. The same orgs that used pair programming, that cared about not breaking the build, that actually did code reviews instead of rubber-stamping a +1. Sure enough, those are the same orgs adding AI tools and seeing the actual benefits.
TDD maps almost perfectly onto how AI agents work. Write a failing test, generate the code, let the agent iterate until it’s green. The test is the spec, the acceptance criteria, and the feedback loop all in one. Teams that already did TDD didn’t have to change their workflow at all.
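That loop is easy to sketch. Here is a minimal, hypothetical Python illustration (the `spec_slugify` test and both candidate implementations are invented for this example, not taken from any real agent): the failing test doubles as the spec, the acceptance criteria, and the feedback signal, and the loop just iterates candidates until green.

```python
def spec_slugify(impl):
    """The test IS the spec: any candidate implementation must pass all of these."""
    assert impl("Hello World") == "hello-world"
    assert impl("  spaced  out  ") == "spaced-out"
    assert impl("Already-Slugged") == "already-slugged"


def run_agent_loop(candidates):
    """Iterate candidate implementations (stand-ins for agent output)
    until the spec goes green. In a real loop, the AssertionError text
    is the feedback fed back to the agent for the next attempt."""
    last_error = None
    for attempt, impl in enumerate(candidates, start=1):
        try:
            spec_slugify(impl)
            return attempt, impl  # green: ship it
        except AssertionError as err:
            last_error = err  # becomes the prompt for the next generation
    raise RuntimeError(f"no candidate passed the spec: {last_error}")


# Two "agent attempts": the first forgets to collapse repeated whitespace.
naive = lambda s: s.lower().replace(" ", "-")
fixed = lambda s: "-".join(s.lower().split())

attempt, winner = run_agent_loop([naive, fixed])
print(attempt)  # the second attempt passes
```

The point isn’t the toy loop, it’s that a team already writing tests like `spec_slugify` had this harness before the agent ever showed up.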
Testing skeptics spent years telling us that writing tests slowed them down. Now those same people are complaining about how bad the quality of AI-generated code is (and oh the hubris to talk about how AI writes poor tests). They never cared about code quality before, and it turns out their AI agent doesn’t either.
The Gap Is Widening
The disciplined teams are in a different loop. AI ships code, tests catch mistakes, confidence stays high, so they delegate more to AI. They make discoveries about how to use it better because they have the tools to catch when they’re not using it correctly.
Undisciplined teams are stuck in the opposite cycle. AI generates code nobody understands, there are no tests to validate it, it breaks, confidence drops, and now every new batch of AI-generated code has to be reviewed manually. It’s a doom loop.
This gap isn’t closing, it’s accelerating. And just adopting AI tools isn’t going to solve it. Forget about the permanent underclass, and let’s start talking about the permanent tech debt class.
The Tech Debt Time Bomb
What was once a difficult codebase to work in is quickly becoming an unmaintainable nightmare. And it’s growing at 10x (like that 10x engineer who shipped nightmare 20k LOC PRs)! For some of these orgs, the best thing to do is actually to NOT adopt AI. Fix the underlying problem, or just don’t do anything. Don’t buy Claude Max subscriptions for your team. Accept that you’ve lost and let someone better take your place.
Because ultimately, AI agents are just junior devs with unlimited patience. You would never let a junior dev push without tests, without PRs, without code reviews, without the processes and culture that maintain quality. And yet teams are letting AI do exactly that, and then complaining about the results.
It’s Fixable
Look, it’s not all doom and gloom for the teams drowning in tech debt, but we can’t ignore how we got here and expect AI to fix it. It is fixable! But by us! AI will not solve these issues unless you step up your own team’s quality and craftsmanship.
It’s easy to look at a codebase today and the AI tools available and imagine that big rewrite you always wanted to do. A clean slate upon which to build the next version, free of tech debt. And while rewrites were almost always a disaster in the past, we’ve actually seen several teams pull it off this year thanks to AI coding. But it’s an exercise in futility if you don’t fix the problems that got you there. Don’t just plan a rewrite, figure out how to build a culture of engineering excellence alongside it. Without that planning, without that cultural shift, expect to be doing this rewrite every 6 months until the paychecks stop clearing.
No Secret Sauce
While AI seems pretty magical, ultimately it is just another tool, and when it’s applied incorrectly, by people who never cared about quality in the first place, the result is not all that surprising.
The winners in this space aren’t going to be the seat-of-the-pants vibe coders, or even the engineers who embrace AI the most fervently. It’s the engineers who gave a shit in the first place. The ones who had pity on whoever read their pull requests and did their best to make them concise and well explained. The same ones who went out of their way to not break the build, who wrote tests that actually tested something, who started with tests.
It turns out that AI just mirrors the team you built. If you built a B player engineering team, you get the B player AI to go along with it. And there’s no Claude Max plan / Ralph Wiggum loop / PR review agent that will pull you out of that hole.
Now with that out of the way, let’s get into what’s interesting this week in Open Source.
Testing the Machines
promptfoo (12k stars, +208 this week) — The #2 trending repo overall. Test your prompts, agents, and RAGs. Red-team your LLMs. Compare GPT vs Claude vs Gemini. Declarative configs, CI/CD integration. This is what happens when someone builds pytest for the AI era and actually gets the developer experience right. 30 stars/day. promptfoo.dev
anti-slop (299 stars) — A GitHub Action that detects and auto-closes AI-generated junk PRs. Your repo’s immune system, in YAML.
AsyncReview (285 stars) — Inspired by DevinReview. The pitch: AI that reviews code with the depth of a senior engineer. The reality: probably still better than the “LGTM” you were going to get, and not $25 a PR.
claude-plugins (71 stars, +24 this week) — Open-source plugins for Claude Code that add plan-first SDLC workflow, code review, and — this is the interesting bit — LLM quality judges. AI reviewing AI’s work, grounded in your actual codebase. The recursion is getting recursive.
SWE-AF (406 stars, +27 this week) — Autonomous software engineering fleet: plan, code, test, ship. Testing baked into the agent workflow rather than bolted on after.
evidently (7.3k stars) — ML and LLM observability. Evaluate, test, and monitor AI systems with 100+ metrics. Been around a while, still growing, still necessary. The “you should really have this in production” tool that everyone knows about and half of teams actually deploy.
This Week’s Top Shelf
The rest of the trending board.
omlx (2.8k stars, +350 this week) — #1 trending. LLM inference server for Apple Silicon with continuous batching and SSD caching, managed from the menu bar. 50 stars/day. The “run it locally on your Mac” crowd keeps winning. omlx.ai
CorridorKey (4.9k stars, +883 this week) — Perfect green screen keying. From Corridor Digital’s Niko Pueringer. 126 stars/day. When someone who actually works in VFX builds the tool they wanted, the results speak for themselves.
varlock (1.7k stars, +95 this week) — .env files built for sharing, powered by @env-spec decorator comments. Finally, a sane answer to “how do I share env vars with my team without a Slack DM and a prayer.” varlock.dev
Shadowbroker (1.4k stars, +301 this week) — OSINT for everything. Track private jets, spy satellites, seismic events. All in one interface. 43 stars/day. Someone built the movie hacker dashboard, but real.
CLI-Anything (2.9k stars, +408 this week) — “Making ALL Software Agent-Native.” Wraps any GUI application with a CLI interface so agents can use it. The idea that all software should have a CLI is not new. The idea that AI agents should be the reason why is very 2026.
IPED (2.2k stars, +109 this week) — Brazilian open-source digital forensics tool. Used by law enforcement for crime scene evidence processing. Not glamorous, extremely important, and suddenly trending. 15.6 stars/day.
Genuinely Technical
The stuff that might survive the HN comments section.
adobe/openpbr-bsdf (263 stars, +46 this week) — Adobe’s reference implementation of the OpenPBR BSDF. An open physically based rendering spec, from the company that makes half the tools the industry runs on. The kind of thing graphics people will quietly build on for years.
voltropy/mog (111 stars, +18 this week) — A new programming language designed for safe AI agents. Written in Rust. Whether “agent-safe PL” becomes a real category or not, it’s a bold thesis and someone had to try it.
shimmy (3.8k stars) — Python-free Rust inference server. OpenAI-API compatible, GGUF + SafeTensors, hot model swap, auto-discovery, single binary.
google-deepmind/simply (483 stars, +129 this week) — DeepMind’s minimal JAX codebase for frontier LLM research. Designed for rapid iteration on autoregressive models.
deepflow (3.8k stars) — eBPF-based distributed tracing and profiling. Steady, serious infrastructure work.
golar (191 stars) — Embedded languages framework built on typescript-go. If you’re into compiler tooling and language servers, this is interesting plumbing work.
freestiler (58 stars) — Rust-powered vector tile generator for R, Python, and DuckDB.
mlx-snn (215 stars, +37 this week) — Spiking neural networks built natively on Apple MLX. Neuromorphic computing on consumer hardware.
Security & Reverse Engineering
qualcomm_gbl_exploit_poc (681 stars, +206 this week) — Qualcomm bootloader unlock via GBL exploit. Qualcomm’s legal team is probably having a meeting about it right now.
vulhunt (553 stars, +157 this week) — Vulnerability detection framework from Binarly’s research team.
SysWhispers4 (313 stars, +44 this week) — Direct and indirect system calls for AV/EDR evasion. Covers Windows NT 3.1 through Windows 11 24H2, x64/x86/WoW64/ARM64.
CORUNA_IOS-MACOS_FULL_DUMP (28 stars) — Recovered samples, extracted Wasm/binaries, decoded payloads and analysis scripts from the Coruna iOS/macOS exploit kit. 28 JS modules, 6 Wasm, 13 ARM64 binaries. Primary source material for security researchers.
agent-safehouse (935 stars, +188 this week) — Sandbox your LLM coding agents on macOS so they can only touch the files they need.
Self-Hosted & Dev Tools
neko (19.6k stars, +532 this week) — Self-hosted virtual browser in Docker via WebRTC. 76 stars/day. Has been around for a while, now exploding. The use cases range from “collaborative browsing” to “I want a browser that isn’t on my machine.”
arbor (273 stars, +73 this week) — Native desktop app for agentic coding workflows with Git worktrees, terminals, and diffs. Written in Rust.
holysheep-cli (306 stars, +62 this week) — One command to configure Claude Code, Codex, Gemini CLI, Cursor, Aider, and whatever else you’re running. The fact that we need a meta-tool to configure our AI tools is peak 2026.
edgeFlow.js (463 stars, +68 this week) — Browser ML inference framework with task scheduling and smart caching. Run models client-side without melting the tab. edge-flow-js.vercel.app
scrutiny (300 stars, +17 this week) — Hard drive S.M.A.R.T. monitoring with historical trends and real-world failure thresholds. I prefer to run my smart tools when I hear the clicking.
batctl (88 stars, +24 this week) — Battery charge threshold manager for Linux laptops.
CircuitBreaker (139 stars, +23 this week) — Self-hosted IPAM with beautiful topology visualizations. For the infrastructure people who want their network documented in something prettier than a Visio diagram from 2019.
ai-rules (152 stars, +33 this week) — Governance framework that forces AI agents (Cursor, Windsurf, Copilot) to respect your project’s boundaries, UI libraries, and design patterns.
plannotator (2.7k stars, +73 this week) — Annotate and review coding agent plans visually, share with your team, send feedback to agents.
Fun & Finds
silent-hill-decomp (539 stars) — In-progress decompilation of Silent Hill (PS1, US 1.1). Preserving gaming history in C.
RuView (34.4k stars, +2,149 this week) — WiFi DensePose: human pose estimation and vital sign monitoring from commodity WiFi signals. You need some decent hardware to pull this off, but it’s very cool.
HappyTorch (171 stars, +40 this week) — “LeetCode, but for tensors.” PyTorch practice problems covering LLM, diffusion, PEFT. Self-hosted, supports both Jupyter and web.
vimalender (78 stars) — A calendar with vim keybindings. Not your father’s calendar… oh no, wait, it is.
xemu (147 stars) — Original Xbox emulator for Android.
agent-town (31 stars) — A pixel-art AI agent online collaboration platform. Goofy but very fun.
talkio (200 stars, +24 this week) — Local-first desktop app that puts GPT, Claude, Gemini, and DeepSeek into one group chat. Tauri 2 + React 19. Like a group chat where everyone is the smartest person in the room and nobody agrees.
Closing Picks
twitter-cli (1.5k stars, +227 this week) — Twitter/X feed, bookmarks, and timelines in terminal. 32 stars/day. The terminal is eating everything, and apparently that includes doomscrolling.
job-ops (1k stars, +133 this week) — “DevOps applied to job hunting.” Self-hosted pipeline to track, analyze, and assist your application process. CI/CD for your career. Even unemployment gets a Kanban board.
kerminal (413 stars) — ANOTHER terminal emulator to try out!