Is Claude Code Actually Better? 1,820 Hours of Data

What is Reazy?

Reazy is a text-to-speech app that lets you listen to anything you read. Chrome extension, web app—you upload a document or visit a webpage, and it reads it to you with custom AI voices.

It’s the test case for this analysis. I’ve been building it solo for over 2.5 years, and I’ve tracked my focused hours for the last 22 months.


Here’s why this matters to me:

I’m not drawing a salary while I build Reazy, and I scoped a little too large for my first web development project. Shipping faster means revenue. A reduction in debug time isn’t just a stat; it’s runway.

The Setup

22 months • 1,820 hours tracked • 1,387 commits

Development Timeline

22 months • 4 AI tools • ~2,100 hours across all eras (1,820 of them tracked; hour tracking began April 2024)

  • GPT Era: 895h (Jan-Oct '24)
  • Cursor: 270h (Nov '24-Jan '25)
  • Windsurf: 415h (Feb-Jun '25)
  • Claude Code: 520h (Jul-Nov '25)

Major phases: Voice Training → Extension MVP → Web App → Auth + Lazy Loading → TanStack + Reader UX → Mobile + Capacitor → Voice Training v2 → Search + Polish.

Claude Code adopted July 2025: mobile optimization, Capacitor integration, voice training automation, search feature, scroll coordinator, and ongoing polish.

22 months of development. 1,820 hours tracked in a spreadsheet. 1,387 commits across five repositories.

The question I want to answer: Did Claude Code actually make me more productive?

Over those 22 months, I used four different tools:

  • GPT prompting + Copilot (Jan 2024-Oct 2024)
  • Cursor (Nov 2024-Jan 2025)
  • Windsurf (Feb 2025-Jun 2025)
  • Claude Code (Jul 2025-present)

That gives us a natural experiment. Same developer, same codebase, using different agentic AI tools at different times.

Why This Is Hard to Measure

The Measurement Problem

How do you compare 1 Claude commit (detailed) to 3 pre-Claude commits (terse)?

They might represent the same amount of work—or completely different amounts.

Pre-Claude Commit (May 2025):

  [speak-to-me] bug fix: auth delayed settings sync

One line. No context. No root cause. Future-you: ā€œWhat was this?ā€

Claude Code Commit (Nov 2025):

  fix(search): add bypassCoordinator to scrollToIndex

  Fixed nested scroll operation bug preventing search navigation
  to distant chunks.

  Root cause:
  - Search Phase 1 (priority 3.5) called scrollToIndex()
  - scrollToIndex created a TOC operation (priority 3)
  - ScrollCoordinator dropped the inner operation

  Solution:
  - Added bypassCoordinator parameter
  - Search Phase 1 passes bypassCoordinator: true

30+ lines. Full context. Root cause documented.
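
To make the mechanism concrete, here’s a minimal TypeScript sketch of the failure mode and the fix, assuming a priority-arbitrated coordinator. Only scrollToIndex, bypassCoordinator, and the priorities come from the commit message; the coordinator internals are illustrative.

```typescript
// Hypothetical sketch of the pattern the commit describes.
type ScrollOp = { priority: number; run: () => void };

class ScrollCoordinator {
  private active: ScrollOp | null = null;

  request(op: ScrollOp): void {
    // Arbitration: an operation that cannot preempt the one in flight is
    // dropped. This is what silently swallowed the nested search scroll.
    if (this.active && op.priority <= this.active.priority) return;
    const previous = this.active;
    this.active = op;
    op.run();
    this.active = previous;
  }
}

const coordinator = new ScrollCoordinator();

function scrollToIndex(
  index: number,
  opts: { bypassCoordinator?: boolean } = {},
): void {
  const doScroll = () => {
    // virtualizer.scrollToIndex(index) would go here
  };
  if (opts.bypassCoordinator) {
    doScroll(); // the fix: nested calls skip arbitration entirely
    return;
  }
  coordinator.request({ priority: 3, run: doScroll }); // TOC-level priority
}

// Search Phase 1 (priority 3.5) holds the coordinator while it runs, so its
// inner scrollToIndex would be dropped unless it bypasses arbitration.
function searchNavigate(targetIndex: number): void {
  coordinator.request({
    priority: 3.5,
    run: () => scrollToIndex(targetIndex, { bypassCoordinator: true }),
  });
}
```

The trade-off is typical: the coordinator prevents competing features from fighting over the scroll position, but any legitimately nested operation then needs an explicit escape hatch.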

Complicating Factors

šŸ“Commits vary in size— Claude era commits are more detailed
🧩Features vary in complexity— Stripe integration is not a bug fix
ā°Hours varied week to week— Some weeks 20h, some 40h
šŸ“ˆSkills grew over 22 months— Better developer doesn’t mean better tools
šŸ—ļøInfrastructure was built early— Early work enables later speed

Raw numbers are misleading. Here are four different ways to look at this data, each controlling for different complications.

Lens 1: Debug Session Duration

Debug sessions are complexity-independent. When you’re hunting a bug, the feature is already built. You’re just trying to find the root cause. It takes however long it takes.

This isolates ā€œproblem-solving speedā€ from ā€œfeature complexity.ā€

~45% shorter debug sessions with Claude Code
Pre-Claude avg: 54h → Claude avg: 30h


Pre-Claude debug sessions averaged around 54 hours. Claude Code sessions averaged around 30 hours. That’s roughly 45% shorter.
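
For concreteness, here’s a minimal sketch of how averages like these can be computed from a time log. The schema (era, tag, hours columns) is an assumption for illustration, not the actual layout of my hours.csv.

```typescript
import { readFileSync } from "node:fs";

interface Session {
  era: string;   // e.g. "gpt", "cursor", "windsurf", "claude-code" (assumed labels)
  tag: string;   // e.g. "debug", "feature", "infra" (assumed labels)
  hours: number;
}

function parseCsv(text: string): Session[] {
  const [, ...rows] = text.trim().split("\n"); // drop the header row
  return rows.map((row) => {
    const [era, tag, hours] = row.split(",");
    return { era, tag, hours: Number(hours) };
  });
}

// Average debug hours per session, grouped by tool era. Simplification:
// each row counts as one session.
function avgDebugHoursByEra(sessions: Session[]): Record<string, number> {
  const totals: Record<string, { sum: number; n: number }> = {};
  for (const s of sessions) {
    if (s.tag !== "debug") continue;
    const t = (totals[s.era] ??= { sum: 0, n: 0 });
    t.sum += s.hours;
    t.n += 1;
  }
  return Object.fromEntries(
    Object.entries(totals).map(([era, t]) => [era, t.sum / t.n]),
  );
}

console.log(avgDebugHoursByEra(parseCsv(readFileSync("hours.csv", "utf8"))));
```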

Debugging with Claude Code feels much easier. It feels smarter, even though I was already using Claude models inside Cursor and Windsurf. Often, pasting in the logs is enough to solve a bug within a couple of code-and-test iterations.

One caveat: the TanStack virtualization was more complex infrastructure work, which inflates the pre-Claude average. That said, I believe I would have implemented it faster using Claude Code.

Lens 2: Commit Behavior

Commits jumped from ~33/month to ~83/month
Not more output: a behavioral change in how I work

Before Claude Code, I averaged about 33 commits per month. After? About 83. That’s a 150% increase.

But here’s the thing: this isn’t ā€œmore output.ā€ It’s a behavioral change.

Why Commits Increased

Claude writes good commit messages

I can just ask ā€œcommit this with a good messageā€ and get a clear, descriptive commit. It removes friction from the process.

Frequent commits = safety net

More commits meant I felt safe breaking things and pushing forward. Easy to roll back if something goes wrong.

Claude Code made me a better developer

More commits isn’t ā€œmore outputā€; it’s better version-control habits. Claude made committing so easy that I actually started doing it more.

The tool changed my behavior, not just my productivity.

Lens 3: Documentation Behavior

3,600+ lines of documentation
Lessons learned, code complexities, debugging context—all easily saved after changes


Why Documentation Actually Happens Now

Claude makes it frictionless

During debugging, Claude explains what went wrong and why the fix works. That explanation becomes documentation with minimal effort.

AI_docs = Claude’s memory

Claude uses these docs as context. The documentation isn’t just for me—it’s how Claude remembers architectural decisions and past debugging sessions.

Pre-Claude

ai_docs/
└── (empty)

Other tools did not encourage this habit. It is almost essential with Claude Code.

Claude Code

ai_docs/
ā”œā”€ā”€ search-virtualization-deep-dive.md
ā”œā”€ā”€ search-feature-architecture.md
ā”œā”€ā”€ scroll-coordinator-guide.md
ā”œā”€ā”€ firebase-architecture.md
└── …

3,600+ lines of architecture docs, debug sessions, patterns.

AI_docs is the best habit for complex projects

It lets you build projects bigger than what fits in your head. Architecture decisions, debugging context, integration details: ask Claude. Those details belong in text so you can focus on higher-level issues like user experience.

Claude’s memory is only as good as what you document.
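
A sketch of what one of these files can cover, using scroll-coordinator-guide.md as the example (the outline is my illustration, not a Claude Code requirement):

ai_docs/scroll-coordinator-guide.md
  1. What the coordinator does and why it exists
  2. Priority levels and which features use them
  3. Known pitfalls (nested operations, when to bypass)
  4. Past debugging sessions that touched this code, with root causes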

Lens 4: Qualitative Pattern Changes

Numbers don’t capture everything. Some changes are about how work happens.

šŸ¤
Trust Over Time
Claude has earned trust through updates
I started with Claude 3 inside wrapper tools. The Claude Code harness and the underlying models receive continuous updates, and in just a few months the experience has improved significantly.
šŸ”„
Recovery
Easier to recover from mistakes
Cursor and Windsurf UIs got clunky on long threads, and complex multi-file work was slow. Claude Code makes it faster to jump back, whether via the ESC ESC shortcut or by reverting to a commit.
⚔
Tool Experience
Claude CLI is zippy and simple
Simple design, even toylike. It is the easiest dev tool I’ve ever used.

Conclusion

Claude Code was first, and as of December 2025 it’s still the best despite lots of competition.

Claude Code is a magical experience that makes coding easy

The experience is qualitatively better than any agentic integrated development environment I’ve tried. The traction and progress they’ve made—even over the last few months—is very real.

The Wrapper Experience vs Claude Code

Cursor / Windsurf
  • UI got clunky on long threads
  • Complex multi-file work was slow
  • Context compression issues
  • Had to re-explain things
Claude Code
  • Zippy CLI experience
  • Simple, clean design
  • Full context preserved
  • Easy to recover from mistakes

Evidence it’s catching on

Cursor’s Pivot

Cursor hired Claude Code’s creators, Boris and Cat (developer and product manager, respectively). They stayed a couple of weeks before Anthropic hired them back. Cursor now has its own CLI agentic coding experience.

Product-Market Fit

My feeling: Sonnet 4 hit product-market fit with Claude Code. Lots of people migrated to the better experience.

Competition Following

Gemini CLI, OpenAI Codex, and others have followed Claude’s lead. Proof that the approach is effective.

Not all agentic coding tools are equal.

What The Data Shows

šŸ“‰
Debug sessions ~45% shorter
From a 54-hour average to a 30-hour average
šŸ“
Documentation exists
3,600+ lines that didn’t exist before
šŸ’¾
Commit behavior changed
~150% more commits because Claude makes it frictionless
⚔
Workflow itself is different
More trust, easier recovery, zippy CLI experience

The Reality Check

As of the end of 2025, coding with agentic tools is still collaborative and iterative. A human in the loop is required; 100% vibe coding probably takes longer than steering smartly toward your goals. Testing is still required too. Claude Code is getting better at one-shot implementations, but you still have to test and fix almost anything moderately complex. You still have to steer it, especially when things get complex: documentation to follow, parts to integrate. You also have to mind your architecture. Sometimes Claude gets stuck in the weeds, and you have to step back and talk about the architecture at a higher level.

Documenting everything has been a game changer. AI_docs is Claude’s memory. When I hit the same issues again, Claude already has the context. And when I have documentation, I can use him as an oracle: I ask him for a command to do X, and he can look up the context and provide the terminal syntax I asked for.

Appendix: Methodology
Data sources:
  • hours.csv: 610 entries from April 2024 through December 2025
  • Git commit history: 1,387 commits across reazy-app, reazy-inference, tts-training-pipeline, and supporting repos
  • Personal notes and documentation
Tool era boundaries:
  • GPT Prompting / Copilot: January 2024 - October 2024
  • Cursor: November 2024 - January 2025
  • Windsurf: February 2025 - June 2025
  • Claude Code: July 2025 - present (first commit: July 14, 2025)
Commit attribution:
  • 282 commits (20% of total) explicitly marked with ā€œGenerated with Claude Codeā€
  • Many more Claude-era commits aren’t explicitly marked but were created with Claude assistance
Limitations:
  • Single developer, single project—may not generalize
  • Skill growth over 22 months is a complicating factor
  • Infrastructure maturity affects later development speed
  • Commit granularity changed between eras
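
For anyone who wants to reproduce the Lens 2 commit rates on their own repo, here’s a sketch that buckets commit dates from git log into the tool eras listed above. It’s an illustration of the method, not the tool I actually used for the tallies.

```typescript
import { execSync } from "node:child_process";

// Era boundaries from the methodology above.
const eras = [
  { name: "GPT / Copilot", from: "2024-01-01", to: "2024-10-31" },
  { name: "Cursor",        from: "2024-11-01", to: "2025-01-31" },
  { name: "Windsurf",      from: "2025-02-01", to: "2025-06-30" },
  { name: "Claude Code",   from: "2025-07-01", to: "2025-11-30" },
];

// One ISO-8601 committer date per line.
const dates = execSync("git log --pretty=%cI", { encoding: "utf8" })
  .trim()
  .split("\n")
  .map((line) => new Date(line));

const MS_PER_MONTH = 1000 * 60 * 60 * 24 * 30.44; // average month length

for (const { name, from, to } of eras) {
  const start = new Date(from);
  const end = new Date(to);
  const commits = dates.filter((d) => d >= start && d <= end).length;
  const months = (end.getTime() - start.getTime()) / MS_PER_MONTH;
  console.log(`${name}: ${commits} commits (~${(commits / months).toFixed(0)}/month)`);
}
```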

The goal was honest analysis, not advocacy. I tried to show the data fairly and let you draw your own conclusions.
