Claude Code Review: Six Months of Running It on a Real Codebase

Julia Mase · 12 min read

I have been running Claude Code on the ProTechStack codebase for about six months. Not the "I tried it for a week and wrote a hot take" kind of running. The "this is how I ship features now" kind. Every day, multiple sessions, on real work with real stakes.

This is the review I wish I had read when I was deciding whether to commit. I am going to tell you what is genuinely great about Claude Code, what is genuinely annoying about Claude Code, and the things I figured out too late that would have saved me weeks of friction if I had known them sooner. No punches pulled, no vendor talking points, just what I actually think after half a year of daily use.

#The short version

Claude Code is the best AI coding tool I have used, and it deserves the growth it is seeing. It is not a silver bullet, it has real failure modes, and you will get burned if you treat it as autocomplete. But for the kind of work I do (web applications, backend services, refactors, infrastructure scripts), the gap between Claude Code and the best alternative is larger than the gap between the best alternative and nothing at all.

My productivity on in-scope work is roughly 3x what it was before I started using Claude Code. That is a soft number from my own tracking, not a benchmark. In-scope means tasks I would describe as "medium complexity, well-specified, touching 3 to 20 files." Out-of-scope means new greenfield product decisions, UI polish, and debugging that requires context Claude Code cannot access. On out-of-scope work my productivity is maybe 1.3x, which is still positive but nothing like the step change I see on in-scope work.

#What works

I am going to be specific because "Claude Code is great" is a useless statement. Here are the things that actually work in practice.

Multi-file refactors are a different job. The single biggest improvement in my workflow is that refactors I used to dread now feel cheap. Renaming a concept across 30 files. Extracting a shared module. Migrating from one library to another. Plain-text search-and-replace was never enough for these because the meaning changes from context to context. Claude Code handles the semantic part and tests the result. I finish these tasks in 20% of the time they used to take.

Debugging with good error messages is near-magical. When I have a stack trace and a reproduction, Claude Code will trace through the code, explain what is happening, and propose a fix. Half the time it gets it right on the first try. The other half, the explanation itself is useful even if the fix is wrong, because now I know where to look.

CLAUDE.md is doing more work than I realized. The first CLAUDE.md I wrote was 40 lines. The one I have now is 180 lines. It covers our naming conventions, our preferred libraries, the specific test command to run, the common pitfalls in our codebase, and a list of "do not do this" patterns we have had to correct. Every new session starts with this context already loaded, which means Claude Code is 95% of the way to understanding our project before I type the first prompt.
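
For a sense of what goes in the file, here is a condensed sketch of a CLAUDE.md in the same spirit as mine. The specific commands, libraries, and rules below are illustrative, not our actual file:

```markdown
# CLAUDE.md

## Commands
- Test: `pnpm test --silent` (not `npm test`; it skips our pretest codegen)
- Lint: `pnpm lint --fix`

## Conventions
- Components are PascalCase; hooks are `useXxx` and live in `src/hooks/`
- Prefer `date-fns` over `moment`; use `zod` for runtime validation

## Do not
- Do not edit files under `src/generated/` — they are build artifacts
- Do not add default exports; we use named exports everywhere
```

The "Do not" section earns its keep: every entry is a correction we made once and never wanted to make again.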

Plan mode saves me from bad directions. Press Shift+Tab before giving Claude Code a task and it explores the codebase and proposes an approach before writing code. I use this on every non-trivial task now. The 90 seconds of plan-reading saves me from 20-minute dead ends where Claude Code builds the wrong thing.

Hooks turn a tool into a workflow. My current hooks: a post-edit formatter for TypeScript files, a pre-commit lint runner, and a test-output filter that strips everything except failures before Claude Code sees the results. Each one of these was the result of a specific annoyance I got tired of. Each one eliminated an entire class of small corrections I was making by hand. Hooks are underrated and underexplained in the official marketing.
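
For reference, the post-edit formatter is a few lines in `.claude/settings.json`. This is a hedged sketch of the shape, per the hooks documentation's event/matcher/command schema; the hook receives the tool call as JSON on stdin, and the `jq`-plus-Prettier pipeline is my choice, not the only option. Check the current docs before copying:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path // empty' | xargs -r npx prettier --write"
          }
        ]
      }
    ]
  }
}
```

The matcher filters on tool name, not file type; the `jq` stage is where you would narrow to TypeScript files if Prettier chokes on anything else in your repo.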

#What does not work

Now the honest part. Six months of daily use has shown me the ceiling.

It cannot read non-code context. Claude Code is excellent at reading code. It is bad at reading the rest of the engineering work. It does not know why we made the architecture decision we made last quarter. It does not know that the weird pattern in the auth module is because of a compliance requirement. It does not know that the flag we pass to the build script is load-bearing. You can put some of this in CLAUDE.md, but you cannot put everything, and the stuff you forgot to write down is the stuff that causes Claude Code to make a confidently wrong change.

It sometimes invents APIs. Not often, but often enough that I have been burned twice. Claude Code will generate a call to a function that does not exist, or use a library method that was deprecated three versions ago. This is the classic hallucination failure mode, and it is still present on both Sonnet 4.6 and Opus 4.6. The context7 MCP server helps a lot because it fetches live docs, but it is not a complete fix.

Long conversations drift. If I keep a session going for two hours across ten distinct tasks, Claude Code starts forgetting things from the first hour. This is a context window limitation, not a model limitation, but it manifests as "wait, why did it do that" moments. The workaround is to use /clear between unrelated tasks and to keep sessions focused. This is my single most common mistake.

Rate limits on Pro are a real ceiling. I started on Claude Pro at $20 a month. I hit the five-hour window cap every single day within two weeks. Max 5x at $100 felt like unlimited for about a month, then I hit the ceiling there too when I started running parallel agent teams. I am now on Max 20x at $200 and rarely hit it, but I notice. If you are a heavy user, budget for Max 20x from the start. If you are on Pro and you like the tool, you will upgrade within a month.

Opus is not usually worth it. I ran a month of A/B work between Sonnet 4.6 and Opus 4.6 on identical tasks. Opus was better on maybe 15% of tasks, and even then the difference was often subjective. At roughly 5x the cost, that makes Sonnet the right default, with Opus reserved for architecture decisions and the hardest debugging. If you use Opus by default, you will burn money and not notice a quality difference on 85% of your work.

#The features I underused for too long

This is the section I most want people to read. These are the features I knew existed but did not use for the first few months, and each one was a meaningful upgrade the moment I started using it.

[Chart: rough percent productivity gain from adding each feature, over a baseline of using Claude Code as a plain chat interface. Gains compound across features.]

CLAUDE.md. The biggest lift. I would say it is responsible for a quarter of my overall productivity gain. The tell that you need to write one is that you keep re-explaining the same project context every session. The fix is to just put that context in a markdown file at your project root.

Hooks. Second biggest. The tell that you need hooks is that you are making the same small correction over and over. "Claude, please format this file with Prettier before you finish." "Claude, only show me the failures from the test output." These should be hooks, not prompts.
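
To make the second complaint concrete: the "only show me failures" correction can become a tiny script your hook pipes test output through. This is a hypothetical sketch, not our production hook, and it assumes Jest-style output where interesting lines contain `FAIL`, `✕`, or a `Tests:` summary; adjust the patterns to your own runner:

```typescript
// filter-failures.ts — strip test output down to failures before Claude sees it.
// Hypothetical sketch; the patterns below assume a Jest-style reporter.

export function onlyFailures(output: string): string {
  // Keep failing suites (FAIL), failing cases (✕), detail blocks (●),
  // and the summary line (Tests:). Drop everything else.
  const keep = /FAIL|✕|●|Tests:/;
  return output
    .split("\n")
    .filter((line) => keep.test(line))
    .join("\n");
}
```

Wired into a hook (or even just a shell pipe like `pnpm test 2>&1 | node filter-failures.js`), a filter like this is what keeps hundreds of lines of passing-test noise out of the context window.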

Plan mode. Third biggest. The tell that you need plan mode is that Claude Code sometimes goes off in the wrong direction for 20 minutes and you have to course-correct. The fix is to use Shift+Tab at the start of non-trivial tasks.

Custom slash commands. You write a markdown file in .claude/commands/ and now /my-command runs that workflow. We have /review-pr, /migrate, /add-test, and several others. Every team ends up with their own vocabulary here. The tell is that you are typing the same multi-step prompt more than once a week.
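
A slash command file is just a markdown prompt. As a sketch, here is what a command in the spirit of our /review-pr might look like; the body below is illustrative, not our actual file:

```markdown
<!-- .claude/commands/review-pr.md — invoked as /review-pr -->
Review the diff on the current branch against main.

1. Run `git diff main...HEAD` and read every changed file in full.
2. Flag missing tests, unhandled errors, and naming that breaks our conventions.
3. Output blocking issues first, then nitpicks, each with file and line.
```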

Subagents. Lowest lift, but still positive. Useful for verbose operations where you do not want the output polluting your main context. Running a test suite, fetching a big documentation page, scraping through a long log file. The subagent runs in its own window and returns a summary.
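
Subagents can likewise be defined as markdown files with YAML frontmatter under .claude/agents/. A hedged sketch of a test-runner subagent, with field names per the subagents docs and an illustrative body:

```markdown
---
name: test-runner
description: Runs the test suite and reports only failures. Use after code changes.
tools: Bash, Read
---

Run `pnpm test`. Do not attempt fixes. Reply with the number of failures
and, for each, the test name, file, and assertion message.
If everything passes, reply "all green" and nothing else.
```

The verbose test run happens in the subagent's own context window; only that short summary comes back to the main session.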

#Community consensus, for what it is worth

I poked around the usual places (Hacker News, Reddit r/ClaudeAI, DEV.to, engineering blogs) and the consensus tracks my experience more closely than I expected.

The single most common complaint I see: the Pro plan usage cap bites fast, and the jump to Max feels expensive. I agree. Anthropic should probably offer a "Pro Plus" tier between $20 and $100 because the gap is currently uncomfortable.

The single most common piece of advice I see: invest in learning CLAUDE.md, hooks, and slash commands. I agree. This is the same advice I would give, and it is the same advice that shows up in every "six months with Claude Code" review worth reading.

The single most common praise: Claude Code is significantly better on multi-file refactors than every alternative. I agree. This is where I notice the biggest gap.

#Would I recommend it

Yes, with one condition. If you are going to use Claude Code, commit to learning the infrastructure features (CLAUDE.md, hooks, slash commands, plan mode) within the first week. If you treat Claude Code as a fancy autocomplete, you will be disappointed and quit. If you treat Claude Code as an agent that needs configuration, you will be delighted and never go back.

I would also caution against starting on Pro if you know you are a heavy user. You will hit the cap, be frustrated, and possibly quit before you experience the tool working well. Max 5x at $100 a month is the right starting tier for anyone who codes for more than four hours a day. Max 20x at $200 is the right tier if you run parallel agents or background tasks.

If you are casually exploring AI coding tools and want the cheapest entry, Claude Pro at $20 is still the honest answer. Just know that you may outgrow it quickly.


#FAQ


Is Claude Code worth it after 6 months?
Yes for most working developers. After six months of daily use I estimate a 3x productivity gain on in-scope tasks (medium complexity, well-specified, 3 to 20 files) and about 1.3x on out-of-scope work. The tool rewards deliberate investment in CLAUDE.md, hooks, and slash commands. Treat it like configuration-worthy infrastructure and it pays off within a week.
What are the biggest weaknesses of Claude Code?
It cannot read non-code context like architectural decisions, compliance constraints, or tribal team knowledge that is not in CLAUDE.md. It still occasionally invents APIs. Long sessions drift when they exceed the context window. And the Pro plan rate limits bite faster than most new users expect, pushing you toward Max 5x or Max 20x sooner than you thought.
Should I start on Pro, Max 5x, or Max 20x?
Start on Pro at $20 if you are evaluating or code a few hours a day. Start on Max 5x at $100 if you already know you code 4+ hours a day and want headroom. Start on Max 20x at $200 if you run parallel agents, multi-agent teams, or full-day Opus sessions. Most heavy users end up on Max 20x within a month of real use.
Is Sonnet or Opus better for daily coding?
Sonnet 4.6 is the right default for 85% of coding tasks and costs roughly a fifth as much as Opus 4.6. Use Opus for architecture decisions, complex multi-step reasoning, and hard debugging. Using Opus by default burns money without a meaningful quality improvement on typical work. Anthropic's own cost management docs make this exact recommendation.
What is the single most important Claude Code feature to learn?
CLAUDE.md. It is a markdown file at your project root that Claude Code reads at the start of every session. Put your coding standards, preferred libraries, build commands, and common pitfalls in it. This single file eliminates most of the context re-explaining that frustrates new users and is responsible for a significant fraction of the productivity gain from the tool.
How does Claude Code compare to Cursor after 6 months?
They are different tools for different jobs, not competitors in the strict sense. Claude Code wins on agent-driven refactors, terminal-first workflows, and hooks-based automation. Cursor wins on interactive UI work, multi-model switching, and inline tab completion. After six months I use both, not one. See our full Claude Code vs Cursor comparison for detail.
