AI Agents that work while you sleep -- A review Jan 2026
What people are doing with AI agents now, and my reviews after having tried everything.
In the past month, I have repeatedly posted that the latest generation of LLMs crossed a threshold and are much more useful. In addition, agentic scaffolding has improved a lot, driven almost entirely by Anthropic with Claude Code, the Agent SDK, Cowork, as well as all the underlying innovation in context management like skills. OpenAI has now adopted two of Anthropic’s key standards for LLMs, MCP and Skills, an obvious testament to Anthropic’s productizing of LLMs for getting things done.
I had this to say after Claude Opus 4.5 came out:
Claude Opus 4.5 is different. It could be luck, but it actually completed one of these projects in about 10 prompts over the course of 24 hours. I’m impressed! This is real progress in that it can complete a project that AI could not before.
At the same time, I was noticing that parallel coding was taking off with, at the extreme, people trying to run 20-30 coding agents at the same time. And last, but not least, using Claude Code for non-coding went what I would call ‘Bay Area Mainstream’. Meaning at house parties in SF people feel bad in their Uber home if they’re not using Claude Code yet.
In this post, I’ll start by sharing a roundup of ‘The Cultural Moment’ when agents went mainstream. Then I’ll share a breakdown of what people are claiming they’re actually doing with these agents, and having tried everything in the past two weeks, my personal review of every major documented use case.
Oh, and this will be a strictly non-technical post.
The cultural moment
Scott Alexander has an incredible series called ‘Bay Area House Party’, where he paints a caricature of current culture in San Francisco. In Jan 14’s edition (which deserves a full read):
Lucy joins the conversation. “I fired all my startup’s employees and replaced them with seventy-four Claude Code instances. Then I replaced myself with a Claude Code that monitors if the other Claude Codes are doing a good job, and, if not, fires them and replaces them with even more Claude Codes. Profits are up 20% since last month, according to my accountant’s Claude Code.” — source
You look around. “Am I the only person here not running Claude Code yet?”
Somewhat more reflective, from Jack Clark, co-founder of Anthropic:
It’s common now for me to feel like I’m being lazy when I’m with my family. Not because I feel as though I should be working, but rather that I feel guilty that I haven’t tasked some AI system to do work for me while I play with Magna-Tiles with my toddler. — source
Here’s another representative interaction on X:
Even The Atlantic has caught the hype, and wrote a whole article about it.
The bot’s popularity truly exploded late last month. A recent model update improved the tool’s capabilities, and with a surplus of free time over winter break, seemingly everyone in tech was using Claude Code.
You spent your holidays with your family? That’s nice. I spent my holidays with Claude Code.
Then there is Clawdbot
Clawdbot is an AI Agent that runs on your own computer and has access to almost everything. You do control this, obviously, but the first thing you have to do when you start installing it is to answer yes to the question: “I understand this is powerful and inherently risky. Continue?”
Once Clawdbot is running, ideally on a separate computer or in the cloud, you can message it via WhatsApp or Telegram (and more). There are (regular) moments where it genuinely feels like a glimpse into the future. I love it and will give you my review later in this post. Clawdbot is way more niche but also much newer, and is getting some online vibes as well. VC Andrew Jiang turned it into a pet:
Clawdbot is on another level:
And last, but not least:
Giving Clawd my card details…yea, I’ve thought about that too. I think with a prepaid virtual card that requires 3DS for every transaction I would be willing to risk it. But….
“I understand this is powerful and inherently risky. Continue?” — Clawdbot 🦞
So people in tech are really into agents now. Claude Code and Claude Cowork are ‘Bay Area Mainstream’, and Clawd is how you ‘two cat’ [1] people at a Bay Area house party (“oh, you’re still using Claude Code? Yeah, no I bought a Mac Mini and Clawdbot now does everything for me”) [2].
Alright. Time to own up to my own embarrassing situationship here, which you may or may not have already been suspecting…
Let’s get into use cases. What are people using agents for? And how well does that actually work, based on my own experience?
“I definitely have some ideas of streamlining some stuff” 🫨
You might have missed this quote just two images up from that twitter post about buying a Mac Mini. This guy has bought a whole Mac Mini for Clawdbot, seemingly without having concrete ideas on what to do with it. Respect (not sarcastic). The only way to know what AI can do is to use it. Having browsed and researched quite a bit, here is a breakdown of the ‘stuff’ that people are streamlining with all this agenticness.
A quick aside: Claude Cowork is nothing more than a UI for Claude Code. It was built in a week by Claude Code at Anthropic, or so the rumors go.
Summarizing huge amounts of your own content
Lenny Rachitsky used Claude Cowork to go through 320 of his own podcast transcripts to extract ‘10 most important lessons’
This writer used it to analyze 46 blogpost drafts on his computer and tell him which ones are most ready to publish
I have done a bunch of stuff like this with Claude (Code) over time. Repurposing my Substack posts into Substack Notes and LinkedIn posts was the most successful. Extracting a ‘tone of voice’ for my Substack based on all my posts was less so. The resulting prompt still doesn’t really feel like my own writing. If you have a lot of content, AI certainly is helpful to summarize it. But, if I’m very honest, Lenny’s example linked above didn’t look so impressive to me. I’ve listened to quite a few of Lenny’s podcasts and there is NO WAY that these 20 things are the most interesting short nuggets from his entire corpus of podcasts. If anything, they are the most generic ones.
Analyse and visualize things
Download bank statements in CSV or PDF, plop them in a folder and ask Claude Code to categorize everything and turn it into an HTML file that you can view in your browser.
Someone made an iMessage wrapped for 2025, looking at all his messages over the year. Who he ghosted, if he was a ‘lol’ or ‘lmao’ person, and some other stuff.
This is awesome, in my opinion. Nothing else to say, just try it. You can do this with all kinds of stuff. I exported my journal and had Claude Code create a website with my year in review. Another fun one is to export a chat history from WhatsApp and use it to create a ‘year in review’ or to roast everyone in the group based on the way they chat.
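For the curious, here is a rough idea of the kind of throwaway script an agent might write for the bank statement example. It is only a sketch under assumptions of mine: the column names (`description`, `amount`), the keyword rules and the sample rows are all made up for illustration, and a real statement will need its own.

```python
import csv
import html
import io
from collections import defaultdict

# Hypothetical keyword-to-category rules; a real statement needs its own.
RULES = {
    "ride": "Transport",
    "supermarket": "Groceries",
    "netflix": "Subscriptions",
}


def categorize(description):
    """Return the first category whose keyword appears in the description."""
    text = description.lower()
    for keyword, category in RULES.items():
        if keyword in text:
            return category
    return "Other"


def summarize(csv_text):
    """Sum amounts per category from CSV text with 'description' and 'amount' columns."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[categorize(row["description"])] += float(row["amount"])
    return dict(totals)


def to_html(totals):
    """Render the totals as a small HTML table, largest category first."""
    rows = "".join(
        f"<tr><td>{html.escape(category)}</td><td>{amount:.2f}</td></tr>"
        for category, amount in sorted(totals.items(), key=lambda kv: -kv[1])
    )
    return f"<table><tr><th>Category</th><th>Total</th></tr>{rows}</table>"


if __name__ == "__main__":
    # Made-up sample rows standing in for a downloaded statement.
    sample = "description,amount\nRide home,12.50\nSupermarket,48.20\nCoffee,5.00\n"
    print(to_html(summarize(sample)))
```

Save the printed table as a `.html` file and it opens in any browser. In practice the agent does the tedious part, which is figuring out your bank's actual export format and coming up with sensible categories.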
Cleaning up your computer
Cleaning up your desktop, example one, and example two.
I’ve personally never understood people who make a mess of their computer desktop, but I’m aware it’s common (no judgement here, just not something I do). But I used Claude Code to clean up my Downloads folder, deleting 67% of it and saving 3 GB of hard drive space. Cool! On the other hand, if I hadn’t had Claude Code, I would have probably….not cleaned up my Downloads folder, or done it in a simpler way (sort by size and start deleting). I think this saved me perhaps 20 minutes at most? It would have been useful back in the day when we all had huge mp3 libraries on our hard drives.
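The ‘simpler way’ is also only a few lines if you want it as a script. This is a minimal sketch, not what Claude Code actually ran for me: the function name `largest_files` is my own, and it only lists files, it does not delete anything.

```python
from pathlib import Path


def largest_files(folder, limit=10):
    """Return (size_in_bytes, path) pairs for the biggest files under folder."""
    files = [(p.stat().st_size, p) for p in Path(folder).rglob("*") if p.is_file()]
    # Sort on size only, biggest first, and keep the top `limit` entries.
    return sorted(files, key=lambda pair: pair[0], reverse=True)[:limit]


# Example use, assuming your downloads live in ~/Downloads:
#   for size, path in largest_files(Path.home() / "Downloads"):
#       print(f"{size / 1_000_000:8.1f} MB  {path.name}")
```

Deciding what to delete is still on you, which is arguably the part where an agent helps most.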
Creating utilities on the fly
Jack Clark had Claude Code write a search system for his own content on his computer. I put it in this category because by actually building a tool, it’s more than summarizing.
Something cool I did in this category with Claude Code a while ago was crack open an old iPhone backup in search of memories and old photos. It went to work for over an hour writing code to decrypt and read the backup files and extract interesting content.
This is disposable software, or software on-demand. One could come up with several sensible names for it. Even when you just do things like analyze CSVs, Claude will write Python scripts and run them on your computer.
Mileage varies wildly here though. I wanted a better and faster way to read and edit markdown files, and I asked Claude Code to build it. Around 100 prompts in, over several days, it’s getting good but it’s still not quite there. It is VERY easy to get locked into projects that take way longer than you expected when vibe coding. Because the first version usually sort of works but is not quite there, you keep thinking you’re one prompt away from done, until 1AM….
Building small games
Ethan Mollick just keeps building small games. He is one of the most visible AI commentators. His content is balanced, positive, and simply focused on teaching people how to use AI. But most of what he seems to do with it now is build small games.
I’ve heard a few friends say they’ve done this with their kids to introduce them to vibe coding. Ok, cool. But is anyone playing those games? And what does it say? AI could make this document into a game. Right… The new thing looks like a toy until it suddenly isn’t. But I’m going to give this one a usefulness rating of 1 out of 10.
Compulsively researching things
Jack Clark, the Anthropic cofounder: “Because before I’d started my hike I had sat in a coffee shop and set a bunch of research agents to work.” and… “Later, feet aching and belly full of a foil-wrapped cheese sandwich, I got back to cell reception and accessed the reports. A breakdown of scores and trendlines for the arrival of machine intelligence. Charts on solar panel prices over time. Analysis of the forces that pushed for and against seatbelts being installed in cars. I stared at all this and knew that if I had done this myself it would’ve taken me perhaps a week of sustained work for each report.”
As another relentlessly curious person I can relate, up to a point. But we have to ask…why, Jack? I mean, is it just for fun, or can this really be used for anything? Jack admits that this is a hobby of his.
I am well calibrated about how much work this is, because besides working at Anthropic my weekly “hobby” is reading and summarizing and analyzing research papers - exactly the kind of work that these agents had done for me.
So, he is outsourcing his hobby to AI, essentially? Maybe I’m taking him too literally? Why would you let AI do something for you that you enjoy doing, if you gain no other benefit from having the output faster? I mean, it’s not like this research is going to be read by anyone else ever again. Couldn’t help but wonder what his hobby is, or was? Is it the satisfaction of knowing the ‘answer’, or the process of finding it?
Calendars, emails
You can now give most AI interfaces access to your email and calendar. I find calendar in particular to be very useful.
When my kids’ school sent a PDF with school closure days for 2026, instead of adding them into my calendar one by one I simply dropped the PDF into Claude (the website) and it created all the events (without error).
When my parents come to Singapore and share their flight details, I can just forward that to AI and it will go into my calendar.
I was invited to speak at a Claude Code event via WhatsApp, and I just forwarded the message to Clawd and it was in my calendar.
With regard to email, I find it less helpful for now. Somehow, Gemini seems unable to ‘reply all’, so it will send a reply as a new email outside of the thread, which is obviously super annoying for everyone. If you ask “can you find me that email about xyz” it works most of the time, but it fails too often.
Code from anywhere
Over the past few days, I’ve been vibing a macOS app to read and edit markdown and JSON files (the two key file formats you still have to manually edit when vibe coding). A lot of that work has happened from my phone on WhatsApp thanks to Clawd. That’s pretty amazing and has also forced me into some good new territory in general, like pushing the AI to create tests so I don’t waste time with: “Claude, the app doesn’t run, fix this error”.
The agents must never sleep
Last week I wrote that working with multiple agents in parallel makes for brutal context-switching and just overall cognitive load. The machines are simultaneously too fast to keep up with and so dumb that they need a lot of babysitting.
At the same time it’s super addictive to work with these agents. While I was at a concert, Clawd was optimizing the code in my new macOS app. It’s awesome to be building stuff while you are doing other things. But it’s also a weird lifestyle. I find myself grabbing my phone way more often, to read back what Clawd has built or answer questions, and to kick off tasks. Instead of saving something on my todo list or as a reminder, there are things I can just tell Clawd to do right away. “Hey Siri put milk on my shopping list” is now “Hey Clawd put milk in my shopping cart”. “Draft this email” instead of “Remind me to draft that email”.
But because of this, I am also doing way more with less thought upfront. I was chatting with another vibe coder yesterday, and he mentioned that he was creating ten different UIs for a page in his app in parallel and then reviewing them to choose the best one. It just generates a LOT of output. I notice that I have research in the Claude app that I forgot I even asked for, like “what’s the deal with the 2026 is the new 2016 trend”, and entire features on GitHub that I forgot about and never even checked out.
Another dangerous thing is the temptation to add too much stuff, or focus on more than what is absolutely needed. Implementing a nicer way to scroll, or fancy animations. The ‘old way’ of having to spec everything out in detail, then work with a designer to flesh it out in wireframes first, then in full fidelity designs, and then hand it off to engineers, was expensive and slow. Vibe coding is faster and more fun, but the deliberation time that came with the old way has benefits too. Less got built, and what got built was more thought through.
So yes, it’s chaotic and messy, but….
Clawd really feels like I’m living in the future
Siri, Google Assistant and Alexa have all been super disappointing. Clawd is really different. Yes, it’s risky giving an AI access to your computer, email, calendar, and a browser that’s logged into all kinds of sites. It opens you up to prompt injection attacks or an overeager agent spending your money.
Nothing has gone wrong for me yet (and I still haven’t given it any payment details), but the risk is real and you need to be comfortable with it. Because of this I’m not sure if this kind of functionality is coming to general consumers soon. It is probably incompatible with a company serving customers and being potentially on the hook if things go wrong.
Clawd also definitely requires some technical knowledge to set up. But for those who commit to learning how to run it, it is an awesome assistant and a platform to experiment with agents and learn what works and what doesn’t. For the time being, there is a lot that works, and a lot that sort of works. To give just one final example, Clawd signed me up for an event via a Luma link, after I just forwarded it on WhatsApp. I’m also using it (instead of the regular AI apps) to ask random questions, like giving me a profile of a pianist at a concert.
What I find coolest about it is that it integrates every other AI. Deep research? Clawd will use the Gemini CLI to do it. Coding? Clawd launches Claude Code. Images? Nano Banana Pro (and you get it on WhatsApp). In many ways, Clawdbot is what I intended to ultimately build with Magicdoor.ai… It is very, very cool!
[1] Oh cool you have a cat!? I have TWO CATS!!!!
[2] I should probably mention the crazy fact that Clawdbot is free (except for the optional Mac Mini and the ungodly amount of tokens it consumes).










This is the most honest roundup I've seen of the current agent moment. The Scott Alexander quote about feeling lazy if you haven't tasked AI while playing with Magna-Tiles perfectly captures the vibe.
I'm guilty of the Mac Mini purchase too - though I run my agent (Wiz) on my main machine. Your review of use cases matches my experience: summarizing content is useful but not transformative, analyze/visualize is genuinely awesome, and research agents are where the real magic is.
Your honesty about Lenny's podcast summary being 'generic' rather than insightful is refreshing. AI summaries tend to regress toward the most common patterns.
I documented my full agent setup here: https://thoughts.jock.pl/p/wiz-personal-ai-agent-claude-code-2026
Hahah could so relate to the part of “I’m one prompt away - aaaaand it’s 1am” 😅