We Are Thinking About AI Coding The Wrong Way

Marvin the Paranoid Android Lego Set from Rebrickable

For the past few years, AI coding and the attendant doomsday scenarios have been everywhere. The AI market bubble, the numerous layoffs, and outlandish statements by AI executives have been driving the conversation. A loud community seems to believe AI is simply going to be able to do EVERYTHING in software, while skeptics continue to claim it will never replace humans, based on the examples they have seen so far. I myself, despite (or maybe because of) my background (a PhD in ML), have so far refrained from making definitive statements about the capabilities of AI.

In any case, both parties sound equally (well, maybe not quite equally) deranged to me. On one hand, the pro-AI corner seems to have thrown all caution to the wind and is developing at breakneck speed without any concern for quality or accountability. On the other hand, the AI skeptics seem to be in denial about the massive capabilities these tools have brought to software development. Either way, my opinion is that whichever corner you find yourself in, software development (and our society) has fundamentally changed.

I spent part of my PhD on adversarial attacks and published papers on the topic. The whole point of an adversarial attack is to manipulate data such that the model's response becomes garbage. This is the data you send to the model, not the model's own training data. This post is not about that, but it feeds into a point I will be making. While models keep getting more resilient to these types of attacks, they are not infallible.
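To make the mechanism concrete, here is a toy sketch of an adversarial perturbation against a simple linear classifier. Real attacks in the literature target deep networks, but the core idea is the same: nudge the input slightly in the direction that most hurts the model. All numbers and names here are made up for illustration.

```python
import math

# Hypothetical linear model: score = w . x, classify positive if score > 0.
w = [2.0, -1.0, 0.5]
x = [0.3, 0.2, 0.4]  # an input the model legitimately classifies positive

score = sum(wi * xi for wi, xi in zip(w, x))  # 0.6 - 0.2 + 0.2 = 0.6 > 0

# FGSM-style perturbation: move each feature a small step *against* the
# classification, i.e. in the direction -sign(w) for a positive example.
eps = 0.25
x_adv = [xi - eps * math.copysign(1.0, wi) for xi, wi in zip(x, w)]

adv_score = sum(wi * xi for wi, xi in zip(w, x_adv))
# Each feature moved by only 0.25, yet the predicted label flips.
```

The per-feature change is small, but because it is aligned with the model's weights, the decision flips; that asymmetry between "small to a human" and "large to the model" is what makes these attacks hard to defend against.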

Furthermore, and this is the more important point, every single ML model is stochastic: its outputs are drawn from a probability distribution. In most cases the response will be internally coherent. Also in most cases, two responses to the same input won't be identical. This poses an idempotency problem.
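This stochasticity can be illustrated with a toy sketch (a stand-in probability distribution, not a real LLM): the model's output distribution is completely fixed, yet what you actually get out varies from run to run.

```python
import random

# Toy "model": a fixed distribution over next tokens, standing in for an
# LLM's softmax output. The distribution is deterministic; samples are not.
NEXT_TOKEN_PROBS = {"foo": 0.6, "bar": 0.3, "baz": 0.1}

def sample_token(rng: random.Random) -> str:
    tokens = list(NEXT_TOKEN_PROBS)
    weights = list(NEXT_TOKEN_PROBS.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

# Two runs with the same "prompt" (the same distribution) but different
# random states: both outputs are coherent, neither is guaranteed identical.
rng_a, rng_b = random.Random(1), random.Random(2)
run_a = [sample_token(rng_a) for _ in range(5)]
run_b = [sample_token(rng_b) for _ in range(5)]
```

Sampling with temperature zero (always picking the most likely token) restores determinism, but production LLM setups rarely run that way, so identical prompts generally do not mean identical code.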

However, if your use case is just taking pre-defined tasks and launching a bunch of AI agents to code for you, then as long as the solution does what it is supposed to do, you don't technically care all that much about the output varying slightly, given that the variable references, the logic, and everything else functionally remain the same. (I can hear devs yelling at me about all the things that can go wrong. I am aware, and I am being intentionally reductive; bear with me, I have a point to make.)

After I finished my PhD, I stopped working in the ML space and pivoted to data engineering instead. My use case is pretty similar to the one above. I have several integration and validation tests in place for what I expect from my data transformations, as well as resource-management monitoring. As for the actual syntax, I am fairly comfortable letting AI write it, mostly because I am intimately familiar with the tools I am using, as well as my data, and nothing goes into production without automated and manual checks. We are also able to recover raw data in all cases. I digress. My point is, I use AI as a coding buddy that makes my job a lot easier, and I have been doing so increasingly over the past two years.
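The kind of validation gate I mean can be sketched in a few lines. The transformation and the checks below are illustrative stand-ins, not my actual pipeline; the point is that the invariants hold regardless of who (or what) wrote the transformation code.

```python
# Hypothetical transformation: normalize monetary amounts to integer cents.
def transform(rows):
    return [{**r, "amount_cents": round(r["amount"] * 100)} for r in rows]

def validate(input_rows, output_rows):
    # Invariants that must survive any rewrite of transform(),
    # whether by a human or an LLM.
    assert len(output_rows) == len(input_rows), "row count changed"
    assert all(r["amount_cents"] is not None for r in output_rows), "nulls introduced"
    assert all(isinstance(r["amount_cents"], int) for r in output_rows), "wrong type"

raw = [{"id": 1, "amount": 12.5}, {"id": 2, "amount": 0.99}]
out = transform(raw)
validate(raw, out)  # raises before anything reaches production
```

Because the gate tests behavior rather than implementation, it doesn't matter that two AI-generated versions of `transform()` differ syntactically; either both pass or the broken one is caught.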

Having lore-dumped this massive background on you, I am sure you can see I sit somewhere between the two camps. I am both very aware of the potential failings of LLMs and a frequent user of them for coding. And I think we are thinking about it all wrong.

I don't think we should be asking whether LLMs can code a C compiler start to finish, or any task for that matter. I think we should start from a default of "They can, so what do we do about it?" Because one property of LLMs is that they will always output something. It may even work on the first go. Eventually they will be good enough to one-shot almost all decently complex problems, given enough context and processing power. You may not believe this, and maybe I am wrong, but for the sake of the argument, let's assume it is true.

Assuming this is true, and we integrate LLMs into every step of our lives and systems, the potential failures of those models and integrations become the main issue. The problem becomes the resiliency of our systems, not the coding capability of the models. (I am assuming that LLMs will digitize most of our systems, or at least write software that is critical to those systems.)

In my mind, the potential failures of AI-based software systems are akin to earthquakes: the same stochastic nature. For now, we cannot predict when an earthquake will occur, nor its instantaneous magnitude, depth, or duration. We can predict the maximum magnitude a hypothetical earthquake could have. We know how long it has been since the last one. And as far as I am aware, we can monitor the load on the fault line too (correct me if I am wrong, please). Based on these, we can run simulations of different scenarios for the real-life impact of an earthquake, given the following variables:

  • The magnitude
  • Depth
  • Location and the distance to residential/industrial areas
  • Duration
  • Time of day (i.e., whether the workforce is in offices vs. at home, etc.)

There may be more variables I am not aware of. Given all this, the way to prepare for an earthquake is not to predict it. It is to build societies and systems that are resilient to earthquakes in the first place.

As a Turkish millennial, I witnessed the destruction caused by earthquakes as I was growing up, as well as two years ago. As a child, we used to get training on what to do in case of an earthquake. During these trainings, a common talking point was that we should have social and governance systems in place to ensure the safety of the overall population, including regulations on construction and much more. Japan was often given as an example of a country living with the reality of constant, large earthquakes that has nevertheless managed to adapt in a way that keeps the population largely protected.

When you are using LLMs in your own development, you should be asking: what's the worst thing that can happen if this LLM-produced code gets into production? What is the size of the impact? How many users, hours of downtime, $$ lost, systems impacted, how much debugging effort, what time to recovery? There should always be a predictable and replicable way to reverse the changes, fast. This is just best practice; it doesn't have much to do with LLMs, to be honest.
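"A predictable and replicable way back" can be as simple as making every deploy record what it replaced, so rollback is one mechanical step rather than an investigation. A minimal sketch, with entirely hypothetical names:

```python
# Toy deployment ledger: each deploy pushes onto a history stack,
# so reverting is always a single, well-defined operation.
class Deployment:
    def __init__(self, initial_version: str):
        self.history = [initial_version]

    @property
    def current(self) -> str:
        return self.history[-1]

    def deploy(self, version: str) -> None:
        self.history.append(version)

    def rollback(self) -> str:
        if len(self.history) < 2:
            raise RuntimeError("nothing to roll back to")
        self.history.pop()
        return self.current

d = Deployment("v1.0")
d.deploy("v1.1-llm-generated")
# Something breaks in production:
restored = d.rollback()  # back on "v1.0", no archaeology required
```

Real systems layer database migrations, caches, and downstream consumers on top of this, which is exactly why the reversal path has to be designed and rehearsed up front rather than improvised during an incident.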

Honestly, the issues created by AI coding within the software development space are much easier to manage and solve if the affected tool is not integrated into broader social systems. When LLM output scrambles social security contributions, for instance, how do we recover? This is NOT a software development problem anymore; it is a system design problem. And the system in question is the social security system itself. If we want to integrate LLMs into everything, our fundamental workflows in these systems must change.

When planning to integrate AI into our financial, healthcare, social, and military systems, we should approach the design as if occasional errors are inevitable. Most of them will be harmless; much like earthquakes, you won't even notice them. Some will be noticeable but still harmless. Our designs should be such that when the inevitable monster earthquake hits, our systems can tolerate it and recover from it. And I don't believe this is only a software system design issue. In my mind, it is more of a societal design issue. You don't adapt the AI to your use case; you adapt the way you work to AI. And therefore, whether it is worth integrating AI at all should be carefully considered for each system.

I have no framework for how to go about this. My gut feeling is that we first need to take stock of where we are most vulnerable as a society. Hopefully, by that point, people smarter than me will be able to catalogue each potentially catastrophic failure and suggest strategies for mitigating its impact.

The energy and resource consumption of LLMs remains a giant issue. I am not convinced that in their current state LLMs are worth the cost. But much like any technological advancement, I hope the overall capabilities of AI will continue to increase while the resource requirements flatline. And hopefully, before it is too late, the very same AI consuming all these resources will help us replenish them (viable nuclear fusion cannot come soon enough, although by current estimates we are at least 70 years away).