Reflections on ~20 hours of building software with AI
Despite many flaws in current AI coding assistants, in just a few weeks I’ve learned more coding than in the seven years before. Coding now feels like a fun text-based video game instead of a boring slog.
I often say my coding is like my French: in many cases I can understand roughly what’s going on, I can follow the structure, and I can write HTML and very simple JavaScript. The most impressive things I’ve built so far are my personal website, which is written in plain HTML, and a bunch of helpful Apps Script automations for Google Sheets. But I am definitely not a developer.
On the other hand, I do have quite a lot of experience with just about every other domain of building startups, and I’ve been playing around with LLMs since GPT-3. When I started trying to build the above-mentioned things with LLMs this year, something clicked: there is an opportunity for me to become a ‘full-stack entrepreneur’. Creating the Perplexity add-in for Google Sheets gave me a childlike enthusiasm I haven’t felt in a long time, a bit like playing with LEGO. So I decided to try something more ambitious: a wrapper for a selection of generative AI models that I like to use. This solves a personal pain point: at one point I was paying $60 per month for AI (ChatGPT Plus, Perplexity Pro, a Google Sheets plugin, and a dedicated image tool). When I started using Perplexity via the API, I realized I had been subsidizing the power users: my actual usage across all these tools costs under $5 per month at API prices.
My readers are a mix of technical and non-technical folks, so I’ll write what follows according to these principles:
I will briefly explain technical concepts if they are fun and/or important to know.
Sometimes I’ll skip concepts that would take too long to explain; it’s OK to let those go over your head, as they won’t affect the overall story.
For those who are more technical I will not be able to resist humble-bragging about my newfound technical prowess, and probably make embarrassing errors. Please feel free to brutally DM me and fix my errors.
You’ll definitely walk away from this with a better, more nuanced understanding of the current state of building software with AI.
A short primer on the current hype stack
A website that has functionality is called a “web app”. Behind the scenes it is not much different from an app on your computer: a directory structure, with instructions that tell the computer what to do. But to enable people to access it over the internet, it has to be ‘deployed’ to the cloud, which means it runs on a bunch of computers in the cloud. With that in mind, the following things need to be done to build a web app:
You have to write the code, and get it to work on your own computer
You have to ‘deploy’ it to the computers in the cloud so it becomes accessible over the internet
Up until one year ago, a common web development stack would be (from front to back-end):
NextJS (front-end aka client)
NextJS or NodeJS (back-end aka server)
PostgreSQL database (on AWS)
Something like Firebase or Cognito for user authentication
AWS EC2 and S3 for server and storage respectively
Github to manage code versions and deployment pipeline
A text editor like VS Code on your computer to write code
The currently super hyped stack is:
TailwindCSS for styling with Shadcn/ui for components and Radix icons
NextJS for front and back end
Supabase for database and user auth
Vercel for storage and server
Github to manage code versions
Cursor to write code with lots of AI help
V0 to prototype and generate entire pages of front-end code
Marblism to even generate full-stack code
Decisions, decisions, decisions
One helpful thing about this development is that there is currently little question about which framework to use. Vercel is built by the same folks who build NextJS. Supabase has a set of ‘quick start’ instructions for NextJS that you just copy/paste. Next is an excellent framework; at Ox Street our team refactored the whole web client from vanilla React to NextJS[1]. Things like Tailwind, Shadcn/ui and Radix are amazing. It’s why all web apps look the same nowadays, but it’s a wonder that all of this stuff is free!!!
If you fire up Cursor and ask the side-panel AI to create a NextJS app, you will have just that in under a minute. But here it already gets a bit trickier: do you want to use the ‘App Router’ or not? A /src directory? TypeScript? Ehm… no idea. Ask the AI. It wasn’t long before I started to get into some trouble.
The bad and the ugly
As someone who tends to see the positive side, and because this whole journey is about the journey, this is OK with me. But I am essentially learning to develop software because the AI is so useless at it. It has become clear to me that currently there is no chance that someone with zero technical inclination can build anything more complicated than a simple single-page Trello clone or online quiz. 99% of people who create a concept with V0 are going to give up before getting a single user. This makes me happy: barriers to entry are still quite significant. Don’t believe anyone who says things like “building is solved” or “anyone can now build a full-fledged web app in a weekend.”
Your AI intern has terrible dementia
Every time you start a new chat with the LLM, it’s the model’s first day of work with you. It is tempting to think that you have an ongoing conversation with Claude 3.5 or ChatGPT, but that’s not how it works. You have to prompt it fresh in every context window.
In the course of building a project, many decisions are made. It matters whether you use the App Router or not. It matters which version of NextJS you use. All of this makes a difference later to how you should implement features. The further you get in the project, the more the AI is going to struggle.
V0 is impressive. It generates a functional single- or multi-page app from a screenshot or prompt. But it’s just the front-end, and that’s hard to mess up when you don’t have to bother with any logic. Start building actual user stories and things fall apart.
Cursor keeps making the same frustrating types of mistakes which I think are mainly caused by lack of context, or in other words, AI dementia:
Using features that no longer exist in NextJS 13 and later, or in one case even trying to implement a method that was deprecated in NextJS 9. Btw we are on version 14.2 right now.
Trying to store information in the ‘client’ local storage, which doesn’t work (easily) with NextJS server side rendering.
Referencing the wrong directories within the app. For example, I’m using /src/app, but another approach is to use just /app, and yet another is /pages (if not using the App Router). Cursor will often implement things assuming one or the other at random.
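The localStorage pitfall above can be avoided with a small guard. A minimal sketch (the helper name, key, and fallback are made up for illustration):

```javascript
// Hypothetical helper: read from localStorage only in the browser.
// During Next.js server-side rendering there is no `window`, so an
// unguarded localStorage call crashes the render.
function safeGetItem(key, fallback) {
  if (typeof window === "undefined") {
    // We're on the server: localStorage doesn't exist here.
    return fallback;
  }
  try {
    const value = window.localStorage.getItem(key);
    return value !== null ? value : fallback;
  } catch {
    // localStorage can also throw in the browser (e.g. private mode).
    return fallback;
  }
}
```

The same check is why tutorials often move localStorage reads into a useEffect: effects only run in the browser, never during server rendering.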
My initial approach was to start each little project by getting Cursor to index the whole codebase in the first prompt, but that really didn’t work. Then I asked o1 to index my entire codebase and create a prompt for its little brothers and cousins to understand the key aspects of the project. It looks like this:
Framework & Directory Structure:
- Built with Next.js using the App Router.
- All code is organized under the /src directory.
Language & Typing:
- The entire project is written in TypeScript.
Path Aliases:
- Configured in tsconfig.json:
  "paths": {
    "@/*": ["./src/*"]
  }
… (continues for 25 more lines)
I am now adding this description to my first prompt every time and that seems to help quite a lot.
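For non-technical readers, the ‘path aliases’ bit just means import shortcuts. A toy function showing what that mapping expands to (the component path is hypothetical):

```javascript
// The "@/*": ["./src/*"] entry in tsconfig.json tells the tooling that
// any import starting with "@/" resolves from the /src directory.
function resolveAlias(importPath) {
  return importPath.startsWith("@/")
    ? importPath.replace("@/", "./src/")
    : importPath;
}

// So `import ChatWindow from "@/components/ChatWindow"` finds the file
// at ./src/components/ChatWindow, no matter how deeply nested the
// importing file is, instead of a fragile "../../../components/..." path.
```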
The AI digs itself into holes that it won’t get out of
I’ve had many cases where, even within one chat, the LLMs make inexplicably dumb mistakes. The most frustrating is when it flips back and forth between two wrong answers:
Me: “we have this error: ‘error blablabla’”
AI: “Ok I understand the issue, here is the fix”
Me: *Implements fix, runs code, a different error: “we now get this error”
AI: “Ok, the issue is because of this and that, this is how to fix it”
Me: *Implements fix, runs code, back to original error: “Dude, now we have this error again..”
AI: … etc
It has been very hard for me to prompt my way out of these dead ends. Writing a long prompt with a detailed problem description, explaining what we already tried and what happened, and giving that to o1 to reason about works reasonably well, but it takes a lot of time.

In one funny exchange, the various AIs had me uninstall all dependencies, reinstall them, and were about to have me do it all again when I noticed a new file had appeared in my project. I deleted the file and the issue was solved. There went another hour.
This dumb intern thinks he knows it all
The weirdest, and definitely most annoying, type of mistake is what I can only describe as overreach by the AI: fixing one issue while creating two others, deleting entire blocks of logic, changing logic (and breaking it) while only asked to implement something else. This is suuuuuuper annoying!!
Comments in code (// this is a comment) don’t do anything except help you, the developer, remember what certain pieces of code do. Many, many, many times, when implementing something completely different, Cursor’s assistant will try to delete all the comments in a file. Wtf.
Overreaching and deleting “unneeded” code. If I ask the AI to implement a new UI component to start a new chat (for example), there is a chance it decides that all of the logic related to the ‘send chat’ button is not needed to achieve this. It has often tried to delete entire blocks of important logic that work perfectly.
Overreaching and breaking unrelated code. A related problem when we ask for that ‘new chat’ button is that the AI will randomly change unrelated parts of the code. At one point it changed something about my user authentication setup that broke everything, went down one of those unrecoverable holes, and created such an irreparable mess in just 15 minutes that I just reloaded my last ‘save game’ from GitHub.
This has taught me to end most prompts with something like “limit yourself to making only the strictly necessary changes. Do not remove any comments or refactor any code unless absolutely needed.” To be honest I think Cursor should just have that in the system prompt. Now that I mention it, maybe as a user you can add it in…
Implementing ‘full stack’ features is really, really, REALLY hard for LLMs
An example of a ‘full stack feature’ is adding authentication to your app. This requires setting new secret access codes (env vars), server-side routes, client-side code, and passing information between client, server and external services. Put the above pitfalls together and you can already see where this is going.
After implementing the server side logic and moving on to the front-end, the AI will forget how the server side works, or just go down a completely different path.
After updating the client side stuff, and trying to make a change somewhere, the AI will break everything in some way.
Every internal or external API call creates opportunities for the AI to forget a critical piece of information about one or the other file, or to implement a method that no longer exists or is incompatible with some aspect of the project.
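One concrete example of the moving parts: Next.js only exposes environment variables to the browser if their names start with NEXT_PUBLIC_; everything else stays server-only. A tiny sketch of that convention (the variable names are made up):

```javascript
// Next.js convention: env vars are server-only unless prefixed with
// NEXT_PUBLIC_. An AI that forgets this will happily reference a secret
// in client code, where it will simply be undefined at runtime.
function isClientVisible(envVarName) {
  return envVarName.startsWith("NEXT_PUBLIC_");
}

// "NEXT_PUBLIC_APP_NAME" -> safe to use in browser code
// "OPENAI_API_KEY"       -> server-only secret, keep it out of the client
```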
The good
It is fun! Compared to, say, tediously graduating from Codecademy or reading endless docs, this way of learning is much more hands-on and exciting. And also, dare I say it, less lonely. It is nice to have an AI companion who never gets tired, has an answer to every question and helps you code. The LLMs are friendly, helpful, patient, apologetic when things go wrong, and happy when things work.
It’s super good at explaining things
Whenever you take a break to ask stuff like “why does this work”, “how does it work”, “what is server-side rendering”, “what are the pros and cons of the App Router vs Pages, and which one do you recommend”, you get great answers! Those answers are easily verifiable with Google, or even just by double-checking with another AI. It’s a much more enjoyable way for me to learn than reading manuals. My way of learning is to just start pressing all the buttons and see what happens, and learning with AI is perfect for that.
I would hate debugging type and syntax errors by hand
If there is a bracket {} missing somewhere in the code, it won’t run. A semicolon where there should be a comma, etc. I know these kinds of issues very, very well from my days building financial models. Having spent countless hours with bloodshot eyes hunting for the bug in an Excel model, only to find one wrong reference or syntax error in a formula, I would really hate dealing with those issues when coding. The AIs catch and fix these kinds of small errors in seconds. That alone is a huge time saver for a beginning coder and, I would imagine, even for an experienced one.
When it gets things right, it’s awesome
There are glimpses of an exciting future every now and then. One cool thing was implementing a ‘dark mode’. Since I’m working on this mainly at night, my eyes started hurting from the white background so I started a side quest to implement dark mode. Now this is in very large part thanks to TailwindCSS, but it took about 5 minutes to implement a dark mode that follows system settings!
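For the curious, Tailwind’s system-following dark mode is essentially a one-line config. A sketch (assuming Tailwind CSS v3; the content globs are illustrative, not copied from my project):

```javascript
// tailwind.config.js (sketch)
// darkMode: "media" makes every `dark:` utility follow the operating
// system's prefers-color-scheme setting automatically, with no toggle code.
module.exports = {
  darkMode: "media",
  content: ["./src/**/*.{js,ts,jsx,tsx}"],
};

// In markup you then pair light/dark classes, e.g.:
// <body className="bg-white text-gray-900 dark:bg-gray-900 dark:text-gray-100">
```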
I’m finally learning to develop software!
Over the past seven years I’ve dabbled. In 2018 I googled which language a beginner should learn, and the internet said Python. So I went to Codecademy and worked my way through half of the Python course. It was boring, and I saw there was a looooong road ahead from learning some basic Python to developing an app. So I used Bubble instead to build something small. In retrospect, Python was also the wrong language for me to learn, even then: the web runs on JavaScript. But back then, even that information wasn’t easy to find. In the Ox Street days, I learned at one point to use GitHub and shipped small front-end experiments like changing a button’s color or position. It was satisfying! I looked at the JavaScript and had no idea how it worked, so I did some JavaScript on Codecademy and read some React documentation, but I quickly decided my time was better spent on other things.
Despite current AI coding assistants’ many flaws, in the past few weeks I’ve learned more about software development than in the seven years before. It’s not boring anymore; it feels like an amazing and fun text-based video game. My official point of view until six months ago was that it was too late to learn to code: at this point I’d be better off just hiring engineers, who would become less important anyway thanks to all the no-code/low-code tools out there. But it kept nagging at me that I wanted to be able to build at least MVPs all by myself.
Key learnings so far that will save you time
Use reasoning models like o1 to make an implementation plan, then implement in separate chats with Claude and GPT-4o.
Keep an up-to-date ‘onboarding prompt’ for the LLMs with all the key choices that matter in the project. Which versions of frameworks are used, which implementation choices have been made. It helps a lot to give the AI this context at the start of every chat.
Ask the AI to explain code, and go slower to make sure you understand it. I find myself more and more editing code directly instead of via prompts, especially for small changes. So I guess I’m finally learning some JavaScript.
Have multiple AIs work with you. Get o1 to generate an implementation plan, then have Claude 3.5 analyze and comment on it. Implement code with Cursor, then start a new chat and have another Claude window double-check it. Every AI answer has a certain probability of being wrong, but when you multiply several small, independent error probabilities together, the overall uncertainty goes down.
When implementing any library or integration, follow the damn official docs instead of asking the LLM. Tools change, most AI tools are not online, and their training data is not up to date. Supabase has deprecated one package (auth-helpers) in favor of another (supabase/ssr), and I wasted two hours trying to implement Supabase by following AI instructions that mixed up the two methods.
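The multiple-reviewers intuition from the tips above can be made concrete with a toy model (it assumes the reviewers are independent, which is optimistic since LLMs share a lot of training data):

```javascript
// Toy model: probability that `reviewers` independent checkers ALL miss
// a given bug, when each one individually misses it with probability
// `missProb`. Independence is the big assumption here.
function probAllMiss(missProb, reviewers) {
  return Math.pow(missProb, reviewers);
}

// With a 20% miss rate per reviewer, three reviewers all miss the same
// bug only 0.2 * 0.2 * 0.2 = 0.8% of the time.
```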
To end this, I will say that this experience has so far completely reaffirmed my experience with LLMs in other domains, like customer support: they enable novices to operate at the level of entry-level professionals, but that’s about it. They do not help experts all that much. We are still a long way from replacing [insert job].

[1] Doesn’t matter, but for completeness’ sake and to show my engineer readers that I do at least some homework: NextJS is a framework built on top of React, which in turn is a library for JavaScript. Each introduces shorthands and idiot-proof ways to do in a few lines of code things that would take thousands of lines in plain JavaScript.