State of Copilot
The Pragmatic Engineer newsletter recently ran a survey on AI dev tooling; out of 216 responses, sentiment was roughly 40% positive, 30% neutral, and 20% negative. This inspired me to share my own experience.
Personal history of editor choices
Let’s first quickly go over my history of editor tools.
My undergrad was in China, and back then we were using DOS / Win3.1 machines. Notepad was the main editor when I started programming. I still remember the day I opened Borland C++ 3.1, and was blown away by what an IDE could do.
When I came to the US for grad school, all development work was on Unix, and Emacs became my editor.
My first job was at Microsoft, so I went back to Windows machines, and Visual Studio was the IDE of choice.
At Google, I switched to a Mac laptop (and have stayed that way ever since). Development was on Linux, so I learned to do all my coding in Emacs over SSH.
Leaving Google, I re-entered the IDE world. I tried a lot of different tools, and eventually landed on Atom.
After selling Leap.ai to Facebook, I was happy that I could continue to use Atom (Atom came from GitHub, and Facebook had built tooling on top of it), but I was soon surprised to find that Facebook internally primarily used VS Code. I continued with Atom for a while, but eventually switched to VS Code as well.
At Gaida, we started with VS Code, but recently we all switched to Cursor.
Why did we pick Cursor?
Honestly, Cursor is just like VS Code: the UI is the same, and all configs / extensions work as-is. This made the initial adoption low-friction.
The built-in copilot is quite good, even on the free tier.
Git commands are overly complex and hard to remember. I used to keep a git cheatsheet tab open in my browser and consult it regularly. Now, in Cursor, I just describe what I want in plain language, and Cursor translates it into the actual command. This has been very helpful.
We spend a lot of time running build commands or tests, and when they fail, debugging what the problems are. Cursor offers insights on the errors and suggests potential fixes. When it works, it's amazing.
During debugging, we often need to add extra logging to help identify the problem. Now I just type "add some logs" in Cursor, and let it suggest the logging code change.
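To make this concrete, here is a minimal sketch of the kind of change "add some logs" produces. The function, parameters, and log messages are all hypothetical, invented for illustration; only the logging pattern itself is the point.

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

def apply_discount(price: float, rate: float) -> float:
    # The kind of lines a copilot typically inserts: entry, warning, and exit logs.
    logger.debug("apply_discount called with price=%s rate=%s", price, rate)
    if not 0 <= rate <= 1:
        logger.warning("discount rate %s outside [0, 1]; clamping", rate)
        rate = min(max(rate, 0.0), 1.0)
    result = price * (1 - rate)
    logger.debug("apply_discount returning %s", result)
    return result
```

The suggested logs are usually sensible defaults like these; I still skim them to make sure nothing sensitive gets logged.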
During coding, we often have parallel logic: e.g., we did something for 'foo' and now need something similar for 'bar'. In the old days: copy-and-paste, then replace foo with bar. Now Cursor automatically generates the code for bar.
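As an illustration of the 'foo'/'bar' pattern (these types and functions are invented, not from my codebase): given one hand-written serializer, the copilot infers the parallel one for the new type.

```python
from dataclasses import dataclass

@dataclass
class Foo:
    name: str
    count: int

# Existing code, written by hand:
def foo_to_dict(foo: Foo) -> dict:
    return {"name": foo.name, "count": foo.count}

@dataclass
class Bar:
    label: str
    weight: float

# The parallel function a copilot would auto-suggest after seeing foo_to_dict:
def bar_to_dict(bar: Bar) -> dict:
    return {"label": bar.label, "weight": bar.weight}
```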
Cursor's copilot is also very good at writing tests, truly enabling a TDD (Test-Driven Development) workflow.
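A sketch of what that workflow looks like, using a hypothetical function and pytest-style asserts: write the tests first (or have the copilot draft them), then let it fill in the implementation until they pass.

```python
def slugify(title: str) -> str:
    """The implementation a copilot might generate to satisfy the tests below."""
    return "-".join(title.lower().split())

# Tests written first, TDD-style:
def test_lowercases():
    assert slugify("Hello World") == "hello-world"

def test_collapses_whitespace():
    assert slugify("  a   b ") == "a-b"

test_lowercases()
test_collapses_whitespace()
```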
If I had to summarize Cursor's copilot, I would say it's as if Stack Exchange had been directly integrated and applied to your local code context.
Limitations of Copilot
Improvements are mostly at the tactical level
There seems to be a phenomenon where hands-on engineers consider AI tools a clear productivity improvement, while engineering leadership is much more neutral about them.
My explanation is that AI tools provide improvements on tactical tasks (all the examples above fall into this category). They make engineers' lives simpler, with less time spent on tedious, repetitive work.
However, copilots don't yet help much with strategic tasks like product direction, prioritization, and business logic, and those strategic tasks are the dominant factor in overall development speed (which is where eng leadership focuses).
Hallucination
Also, this article wouldn't be complete without mentioning the time copilot did the wrong thing and led to a disastrous outcome.
One Wednesday night, after finishing a key piece of functionality, I was so excited that I decided to also handle a task I'd punted on for a while: turning down the integration-test service after the integration tests finish. I typed a natural-language instruction, "bring down the service at the end", and Cursor translated it into a command in the GitHub workflow config. I test-ran it. Instead of stopping the integration-test service, it deleted the service entirely. I had to spend two hours recovering it.
Yes, it's my fault for not noticing that the command it generated was a "delete". Lesson learned: don't forget that LLMs can hallucinate, and copilot output still needs to be double-checked by a human.