i gave it an empty repo. it gave back a live site.
most ai tools suggest. claude cowork ships. i pointed it at a near-empty repo and walked away with a live website - and after building the whole of this site inside it, the verdict is simple: promising. not finished, not magic, but the first one i've used that does the work instead of describing it.
what it actually shipped
not snippets to paste. files written, builds run, deploys pushed - the thing answering at a url. the design you're looking at, the hero, the tag system, the rss feed and sitemap and share cards and structured data, the security headers, the small reading furniture most sites never bother with.
the first few drafts landed clean. it didn't just scaffold and hand me homework - it shipped working versions i could look at, react to, and redirect. a few hours of conversation stood in for what used to be a week of plumbing.
and the context held. long sessions, a lot of files, and it kept the thread the whole way - no constant re-explaining what we were building or why. that alone is most of the difference.
where it still asks for rework
the honest part. it needs a second pass more often than i'd like. you point, it builds, you correct - and the correcting is real work, not a formality.
one bug made the point. the site went live and broke in a way that only showed at the edge - stale html cached against javascript that a later build had already replaced. it reached for the wrong fix first. what turned it around wasn't a better guess; it was a log. i handed over the actual server output and it landed the real fix in one move. more testing required before i'd let it run unwatched.
the new model holds the line
worth saying plainly: the current model is a step up from the ones before it. less drift, fewer confident wrong turns, steadier across a long task. the gap between "impressive demo" and "i shipped with it" is exactly the gap this version started to close.
the real win is the system, not the tool
here's the thing the reviews miss. the leverage isn't the model on its own. it's assembling a setup that matches how you actually work - the model, the host, the repo, the loop you trust - and keeping your hand on the wheel while it moves.
take control. experiment. build the system that fits you, not the one in the screenshot. the tool is a multiplier on someone who already reads what's happening; point it at a problem you can diagnose and it does a week in an afternoon.
every layer that ever promised to free you just moved the leash. this one's no different - the difference is where the leash ends up. for once it's short, and it's in your hand, as long as you can still tell when the machine is wrong.
i could. that's why this shipped. you're reading it on the site it built.