05.06.2025 – Pavel Klavík
Well, at least the serious ones. In recent years, software has gotten much slower, basically erasing all advances in hardware performance. Finally, software performance has moved to the front of the debate. Yet the discussion is often twisted toward things that do not really matter. One such myth is that there are fast and slow programming languages, while in fact the difference in speed is rather negligible. Furthermore, languages differ in power, so it might make sense to trade a bit of performance for it. I will showcase this with my favorite programming language, Clojure.
#development, #languages, #programming, #optimization, #tech, #performance
Modern software has become incredibly sluggish and things keep getting worse every year. This completely contradicts trends in hardware—computers are getting faster all the time, so you would expect software to reap the benefits. Still, each new update just makes software slower and often even more broken than before. It is not unreasonable to say that modern software is often 1000× slower than it should be.
It is great that the discussion of slow software performance has reached mainstream developers. The topic found a much wider audience thanks to game developers Casey Muratori and Jonathan Blow, who have a large number of followers. It makes perfect sense: game developers work much closer to the hardware and try to squeeze as much performance from it as possible. This brings a deep understanding of what computers are capable of, without which it is harder to notice that things are too slow. Many new programmers have never actually seen how fast computers can be. In the past, computers were much slower, so code had to be optimized to run at all. Systems were also much simpler, so it was easier to see what was happening under the hood. That’s no longer true.
The debate about software performance is amazing and I am very grateful for it, because only by acknowledging the problem can we actually make things faster. Unfortunately, the debate is often twisted in counterproductive directions. In this article, I want to talk about one such misconception.
When I was at university, there were endless “holy wars” among students over which language was best. The contenders? C++, Java, and C#—three languages that, for most purposes, are nearly identical. Sure, there are more exotic options like Prolog or Lisp, bringing fresh ideas to the table, but most programmers never touch them.
The biggest myth is that some programming languages are slow while others are fast. I have heard many times that some software is slow just because it was written in Java, and that it would be much faster when written in C++. Well, slower by how much? If the answer is 30%, then that is entirely possible. But any larger gap is more likely the programmer’s fault—not the language’s.
Engineers working on languages, compilers, and virtual machines have put a lot of effort into making them very efficient, which mostly removes the speed differences between languages. Google’s V8 JS engine makes JavaScript a viable language; its string implementation with ropes, for example, is very impressive. And garbage collection algorithms have become much more advanced in recent years.
I searched for concrete numbers comparing the speed of programming languages. This is hard, since so many other factors influence how fast software runs. The challenges of language benchmarking are well explained in this discussion with Casey Muratori. For example, you cannot just write a tiny snippet in several languages and compare their speeds, because real code rarely looks like that. So you don’t learn much from such comparisons.
It is also hard to compare two similar projects written in two different languages under the same load. Suppose that we measure that Nginx written in C can serve static files 10× faster than our own server written in NodeJS. Does this imply that C is 10× faster than NodeJS? Of course not—Nginx developers just spent a crapload of time optimizing its performance, while we didn’t. A better NodeJS implementation might run only 2× slower.
And it might very well be that programmers who use C or Rust care much more about performance, so of course their software ends up faster. Whenever you see some flashy article claiming that rewriting a project to Rust magically made it 50× faster, ask yourself how bad the original code in the other language was. (Hint: Using nested immutable vectors as two-dimensional arrays is not a good idea.)
Let’s conclude by putting some upper bound on language speed differences. From experience, I would bet that pretty much every real programming language is at most 5× slower than the fastest one available, and this is probably a huge overestimate anyway. If you don’t like it, hopefully we can agree on 10× slower at least. On the other hand, modern software is often 1000× slower than it should be. Great! You still have two orders of magnitude to make your software more performant. Even if you only make it 10× faster, it will stand out from the slow software crowd and your users will appreciate it. So I wouldn’t stress much about the speed of your programming language; just use whatever you are familiar with.
Are you one of those low-level programmers who loves to flex about how much faster your software is? Well, first of all, thank you for using Rust. Second, you should understand how much of a disservice you are doing to the performance of software in general. Imagine someone has spent years building a project in Python. Then you show up, call their work “too slow,” and insist the only answer is to rewrite the entire project in Rust since it is SO MUCH faster. Their natural reaction: “Screw the performance, I am not doing that.” The real joke is, with a little profiling and a few tweaks here and there, the old code could get much faster—almost for free.
Even when the speed difference between “fast” and “slow” programming languages is small, shouldn’t we still take advantage of it? Sometimes, yes. If you’re writing something that does heavy computation or needs to run in a very limited environment, it makes sense to use a low-level language to get as close to the machine as possible. But in those situations, you probably already understand performance quite well and are using one of those languages. Most of the time, this just isn’t the main issue.
One reason to choose a language is your familiarity with it. Big companies might pick a language that’s popular or easy to hire for. The real technical reason is that some languages are more powerful than others. I believe the best decision is to choose the most powerful one.
If you think that all programming languages are pretty much equal in power, it will be hard to convince you otherwise. Paul Graham wrote an excellent essay where he describes this as the Blub Paradox. I was quite lucky that I tried various programming languages, so I could experience this first hand.
My favorite programming language is Clojure, which is a high-level, dynamically typed, functional Lisp running on top of the JVM. There are also other hosts available for it, in particular ClojureScript, which compiles to JavaScript, so one can target the frontend with it as well. The entire OrgPad is written in it: the backend in Clojure on the JVM and the frontend in ClojureScript. It has become pretty much the only language I have used for the last 7 years.
Clojure is the most powerful language I know and I don’t really plan to switch—unless something much more powerful comes along in the future, of course. It’s so simple that I can fully focus on solving business problems. This is a big part of its power.
Concerning performance, you might be thinking that Clojure has to be slow. There are no free lunches, are there? But Clojure is really fast. Its creator, Rich Hickey, has spent a lot of time making it fast. Since it compiles and runs on JVM, it can leverage the amazing performance of this platform. And its design makes concurrency easier, so you can squeeze even more performance out of today’s multithreaded hardware.
There is also another hidden performance benefit to using more powerful programming languages. Our time is limited, whether we are doing an open source project or working for a company. Whenever we finish an improvement in OrgPad, there are usually ten new meaningful changes we would like to accomplish in the future, including various performance improvements. For example, right now I am rewriting how cell sizes are calculated, which will improve the load time of OrgPad documents by at least 2×. If we can move forward faster, we actually have more time to work on performance, which in the end might make software even faster.
You probably wouldn’t guess that my second favorite programming language is C. I never liked C++ and other object-oriented languages such as Java or C#, since they never felt like they added much. The standard libraries in those languages are, of course, more powerful, but they come with a lot of unnecessary baggage. My problem with “Clean code” is not its horrible performance, but that it actually makes code less readable and clean. Software should be built from data and functions, nothing else is really needed. Similarly to Clojure, C is a very simple and straightforward language. I believe this is not a coincidence, and Robert Martin nicely points this out in his talk Clojure is the new C.
Clojure is more powerful because it lets you do things that are pretty much impossible in most other languages. For a quick introduction to Clojure, check out my video Building web apps with Clojure or my interactive Clojure tutorial built in OrgPad.
Getting quick feedback while programming is important. There is a saying that one should iterate quickly, which is quite wrong, as Rich Hickey points out: “iterate” means “do over”. We should instead improve quickly. This is done by running a lot of experiments and figuring out what works and what does not.
Unfortunately, things are actually moving in the opposite direction. Compile times keep growing, and it often takes minutes before you can test your code. TDD is a sort of hot-fix for these problems, where we write tiny programs on the side so we can check the functionality faster. When I worked at Google, I often had to deploy my code to their cloud, and I only got feedback on it after an hour. Figuring out problems there was hard and development was very slow.
Clojure brings a powerful interactive experience to programming. This is actually nothing new; these ideas were already explored in the ‘70s with Smalltalk and Lisp machines. Unfortunately, we have sort of collectively forgotten about them; for example, they are not taught in programming courses or university classes. At least I personally was introduced to them via Clojure, and it completely changed my view of programming. I even created an entire video about this called Clojure Superpower: Interactive Programming with REPL.
Hot code reloading was popularized on the web by Bruce Hauman, who created ClojureScript Figwheel; see this amazing video of interactive coding of Flappy Bird. You can run an app, change its code, and see those changes live without restarting. The essential part is that the state of the application is preserved; only its look and behavior are updated. This is great when working on user interfaces: we can quickly make tests and small adjustments, and coding becomes a lot of fun. It would be pretty much impossible for me to go back to the old write-compile-try-out cycle. This programming approach is so life-changing that it was adopted by various other frameworks, for example React Hot Loader, and it can even be implemented in C.
The other amazing tool is the REPL. It allows us to connect to a running application (on a server or in a browser) and execute arbitrary code there. Do you want to do some quick computation or update data stored in the database? You can run exactly the same commands as during your normal development, unlike altering the database with, for example, psql. The REPL is tightly integrated with my editor, so I can easily execute code or even reload changed files from there. When I am developing some new functionality, I can just reload it using the REPL and immediately test it out without having to recompile and restart the server.
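For illustration, here is the kind of quick session I mean—the namespace and functions are hypothetical stand-ins, not OrgPad’s real API:

(require '[myapp.users :as users])        ; hypothetical namespace

(users/find-by-email "jane@example.com")  ; run ordinary application code
;; => {:id 42, :name "Jane", :plan :trial}

(users/upgrade-plan! 42 :pro)             ; or fix data directly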
Now for something outright impossible in most programming languages, which scares a lot of programmers: the REPL does not have to connect to a running server on my development machine. It can also connect to the same application running on our staging or production machine, using an SSH tunnel.
This gives me a lot of new powers. I can quickly run computations there with production data, or make manual changes in them. When I do a bug fix on the server, I can reload the changed file without having to deploy and restart the server, and we will add the change to git and redeploy later together with other changes. And I sometimes use REPL to do development in staging/production which would otherwise be difficult. For example, we built and tested our Stripe integration directly on our staging machine, which was essential since Stripe could send messages back via a webhook.
Clojure has amazing built-in immutable data structures: lists, vectors, maps and sets, and these can be arbitrarily nested in each other. Immutability means that any change to a value basically creates a new updated copy on the side while the old one is preserved. Of course, copying the entire structure on every write would be too slow. Instead, these data structures are represented by trees in memory, so most data is shared between versions; we only have to store a new path from the root to the modified node. Parts which are no longer needed are later removed by garbage collection. For example, this is how appending 5 at the end of the vector [0 1 2 3 4] looks in memory. See this blog post if you are interested in the details.
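This behavior is easy to see in the REPL:

(def v [0 1 2 3 4])
(def v2 (conj v 5))  ; a new version that shares most of its tree with v

v   ;; => [0 1 2 3 4], the original is untouched
v2  ;; => [0 1 2 3 4 5]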
Immutable data structures simplify the code and reasoning about it by a lot. For example, you don’t have to worry whether some function will change its inputs, because they cannot be changed; only a new version can be returned. Clojure encourages storing the entire state in a few places and writing most of the code as pure functions. Pure functions are much easier to reason about and test, and they are great building blocks for software.
Rich Hickey has spent a lot of time optimizing these data structures. He actually had to invent how to implement fast immutable hash maps this way, and his approach has since been reimplemented in many other languages. The resulting Clojure data structures are 1-2× slower for reads and 1-4× slower for writes compared to mutable Java data structures, which is fast enough for real applications. They also allow much easier concurrency, as we will see, so you can better utilize the entire computer. And doing a lot of modifications to an immutable data structure can be sped up with transients.
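For example, a function can use a transient internally while still exposing an immutable result—a small sketch:

(defn build-vec [n]
  (persistent!                        ; freeze back into an immutable vector
    (reduce conj! (transient []) (range n))))

(count (build-vec 1000000)) ;; => 1000000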
Combining mutable data with concurrency is hard. In some cases, like processing pipelines, you can split data into pieces, process them independently, and then combine the results. But as soon as data needs to be shared, things get tricky. Usually, this is solved by locking: one thread locks the data while using it, so other threads can’t access it at the same time. Even if a thread just wants to read data, it often still needs a lock, to avoid another thread changing things mid-computation and causing inconsistencies. To speed things up, we might lock just parts of the data which are needed. This leads to problems where threads deadlock each other by holding parts of the data and waiting for the other parts to be released. I have hopefully illustrated that concurrency is really hard in the mutable world.
With immutable data, we can avoid locks completely. Clojure offers a few ways to handle concurrency, from simple atomic references to software transactional memory. The simplest one is an atom, which is a box pointing to some immutable value, where the target can change over time. Any thread can dereference an atom to get its current value. Since that value is immutable, it’s safe to use in any computation—it can’t change out from under your hands, so no locks are needed for reading.
The value of an atom can be changed with Clojure’s swap! function. It is given an atom and an update function. It applies the update function to the current atom value, and the result is then stored in the atom. What happens is that the new value is built in memory on the side, usually sharing most of the data with the previous value. When the computation is finished, the atom is atomically swapped to the newly generated value using Java’s AtomicReference. If some other thread retrieved the previous value, that value is still preserved and unchanged, so the thread can finish its computation. From now on, threads will receive the new value when dereferencing the atom.
So how do we avoid locks for writes? Suppose that two threads try to update the same atom at the same time. One of these threads finishes first and atomically updates where the box points. When the other thread tries to update the box, it discovers that the box already points to a different value. Therefore, it restarts the computation with the new input value, until it succeeds in being the first thread to change the box target.
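Conceptually, swap! is just a retry loop around compare-and-set!—a simplified sketch, not the actual implementation:

(defn my-swap! [a f]
  (loop []
    (let [old-val @a             ; read the current value
          new-val (f old-val)]   ; build the new value on the side
      (if (compare-and-set! a old-val new-val)
        new-val                  ; we were first, done
        (recur)))))              ; another thread won, retry with the fresh value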
Here is some example code showcasing atoms. We store the current value in m, then change the atom's value, but the original value stored in m is preserved.
(def a (atom {:a 42 :b "Hello"}))
=> #'user/a
(def m @a)
=> #'user/m
(swap! a assoc :a 101010 :c 42)
=> {:a 101010, :b "Hello", :c 42}
m
=> {:a 42, :b "Hello"}

Atoms work amazingly in situations where most threads are reading the values and changes are pretty quick. For example, most of the state of OrgPad’s server is stored in one big atom called cache-db, which takes about 2 GB of memory. It works as the source of truth for information about users, documents, permissions, and so on. When the server starts, it reads all data from multiple tables in PostgreSQL and builds cache-db, which takes about 15 s. Then all changes are quickly written to both cache-db and PostgreSQL. Most of the reads can be handled by cache-db, only occasionally loading some more data from PostgreSQL, which is much slower. There might be 50 different threads reading and updating cache-db all the time, and it works amazingly well. Using cache-db, we have greatly decreased latency and saved a lot of server resources.
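In simplified code, the pattern looks roughly like this—save-user-to-postgres! is a hypothetical stand-in for our persistence layer:

(defonce cache-db (atom {}))   ; one big map of users, documents, permissions, ...

(defn get-user [user-id]       ; reads are plain dereferences, no locks
  (get-in @cache-db [:users user-id]))

(defn update-user! [user-id f] ; writes go to the atom and to PostgreSQL
  (let [new-state (swap! cache-db update-in [:users user-id] f)]
    ;; save-user-to-postgres! is a hypothetical persistence call
    (save-user-to-postgres! (get-in new-state [:users user-id]))))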
In Clojure, we write code that transforms data structures into other data structures. For example, we might have some data as a vector of maps and write a function that turns it into Hiccup, a data structure representing HTML. Then Hiccup is turned into an HTML string, which the browser renders. Data is the main thing, which is why I don’t like the term “functional programming” and prefer data-oriented programming.
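For example, turning a (made-up) vector of user maps into a Hiccup list is just another data transformation:

(def users [{:name "Ada"} {:name "Alan"}])

(defn user-list [users]
  [:ul (for [{:keys [name]} users]
         [:li name])])

(user-list users)
;; => [:ul ([:li "Ada"] [:li "Alan"])]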
Since Clojure is a Lisp, its code is also written as Clojure data structures. So in the same way, we can write functions called macros which turn this code into another code. Macros run at compile time and allow powerful code generation. Lisp macros are great because the structure of the language is very regular. On the other hand, generating or transforming, say, C++ code is hard and error-prone.
Let’s look at an example. Say you want to take the numbers 0 through 9, increment each by one, and sum them up. You might write:
(reduce + (map inc (range 10))) ;; => 55

This works, but the nesting makes it harder to read—you have to start inside and work your way out. Add more transformations, and it quickly becomes even messier.
Clojure solves this elegantly with threading macros. With the ->> macro, you can express the same logic as a clear, readable pipeline:
(->> (range 10)  ;; => 0, ..., 9
     (map inc)   ;; => 1, ..., 10
     (reduce +)) ;; => 55

Threading macros aren’t some magical language feature—they’re just simple macros included in Clojure’s core library. At compile time, ->> transforms your pipeline into the nested form above. Below is the actual code for ->>. If you don’t follow every line, don’t worry! I just want to illustrate that the code is really short:
(defmacro ->>
  "Threads the expr through the forms. Inserts x as the
  last item in the first form, making a list of it if it is not a
  list already. If there are more forms, inserts the first form as the
  last item in second form, etc."
  {:added "1.1"}
  [x & forms]
  (loop [x x, forms forms]
    (if forms
      (let [form (first forms)
            threaded (if (seq? form)
                       (with-meta `(~(first form) ~@(next form) ~x) (meta form))
                       (list form x))]
        (recur threaded (next forms)))
      x)))

Macros let you build new language features right into Clojure. If you want to dive deeper, check out Mastering Clojure Macros for more details.
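You can even ask the REPL to show the expansion:

(macroexpand '(->> (range 10) (map inc) (reduce +)))
;; => (reduce + (map inc (range 10)))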
We personally don’t use macros much in OrgPad’s code, but they are very powerful and useful in the Clojure libraries we depend on.
Last but not least, Clojure is incredibly straightforward. Consider the problem of counting how often each word occurs in a given string. Here is the full implementation in Clojure:
(defn count-words [text]
  (-> (str/lower-case text)
      (str/split #"\W+")
      frequencies))

Not only is the code just three lines—it actually reads like a recipe: convert the text to lowercase, split it into words, then count how many times each word appears.
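Trying it out on a small input (str is the usual alias for clojure.string):

(require '[clojure.string :as str])

(count-words "Clojure is fun, and fun is good!")
;; => {"clojure" 1, "is" 2, "fun" 2, "and" 1, "good" 1}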
For comparison, consider how we could solve the same problem in Java. Nowadays, functional Java streams can be used:
public static Map<String, Integer> countWords(String text) {
    return Arrays.stream(text.toLowerCase().split("\\W+"))
                 .collect(Collectors.groupingBy(
                     Function.identity(),
                     Collectors.summingInt(s -> 1)));
}

It works similarly, but the result is much noisier than the Clojure version. Looking at the code, the underlying recipe is harder to see.
The classical Java implementation with a loop would be even more distant:
public static Map<String, Integer> countWords(String text) {
    Map<String, Integer> freq = new HashMap<>();
    String[] tokens = text.toLowerCase().split("\\W+");
    for (String token : tokens) {
        freq.put(token, freq.getOrDefault(token, 0) + 1);
    }
    return freq;
}

Clojure code is much more straightforward and concise, and this carries through your whole code base. Some Clojure companies estimated 5× to 10× savings in the number of lines. OrgPad currently has about 140k lines, divided between frontend, backend, and shared code in between. I certainly wouldn’t want to write and manage about 1M lines of Java/JS, doing pretty much the same thing but much worse.
I said comparing language performance is tricky, but let’s do a quick benchmark between Clojure and Java, just for the fun of it. I generated about 250k random lorem ipsum characters and ran both implementations multiple times. The Clojure code took about 3.7 ms while the Java loop code ran in 2.95 ms, so the Clojure code currently takes about 25% longer.
We can also write a more efficient Java implementation which sweeps the string only once and directly detects words without a regexp.
public static Map<String, Integer> countWordsCharByChar(String text) {
    Map<String, Integer> freq = new HashMap<>();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
        char c = text.charAt(i);
        if (Character.isLetterOrDigit(c)) {
            sb.append(Character.toLowerCase(c));
        } else if (sb.length() > 0) {
            String word = sb.toString();
            freq.put(word, freq.getOrDefault(word, 0) + 1);
            sb.setLength(0);
        }
    }
    if (sb.length() > 0) {
        String word = sb.toString();
        freq.put(word, freq.getOrDefault(word, 0) + 1);
    }
    return freq;
}

This code finishes in 1.24 ms, so the Clojure code now takes almost 3× longer.
Can we say that Java is 3× more efficient than Clojure? Well, no, as we have already discussed. Furthermore, I have actually tricked you. I said before that I only program in Clojure, and I didn’t want to spend time figuring out how to work with Java and do proper benchmarking there. Luckily, Clojure runs on top of the JVM and can access everything in Java using a special syntax called Java interop. So I actually wrote both “Java” versions in Clojure directly and ran all benchmarks there using the Clojure Criterium library. So we were not comparing Clojure vs Java performance, but the performance of three different Clojure implementations. Here is the code:
;; Assumes (:import (java.util HashMap)) in the namespace declaration.
(defn count-words-java [^String text]
  (let [freq (HashMap.)]
    (doseq [^String token (.split (.toLowerCase text) "\\W+")]
      (.put freq token (inc (.getOrDefault freq token 0))))
    freq))

(defn count-words-java-char-by-char [^String text]
  (let [^HashMap freq (HashMap.)
        ^StringBuilder sb (StringBuilder.)
        append-string! #(when (pos? (.length sb))
                          (let [word (.toString sb)]
                            (.put freq word (inc (.getOrDefault freq word 0)))
                            (.setLength sb 0)))]
    (dotimes [i (.length text)]
      (let [c (.charAt text i)]
        (if (Character/isLetterOrDigit c)
          (.append sb (Character/toLowerCase c))
          (append-string!))))
    (append-string!)   ; flush the final word
    freq))

Even this code is shorter and, at least for me, more readable than the equivalent Java code. Well, some people even say that Clojure is a better Java than Java.
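For reference, running a benchmark with Criterium is a one-liner—lorem-text stands for my generated test string:

(require '[criterium.core :as crit])

(crit/quick-bench (count-words lorem-text))  ; lorem-text is a hypothetical test string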
As you can see, there are always ways to optimize performance. I would probably stick with the original Clojure version since it is nicer and more readable. But if it turns out that performance of this particular operation matters for my application, I could easily make it more performant.
While modern software is incredibly slow, this is hardly the fault of the chosen languages. So what should we do to make things faster? First of all, we need to acknowledge the problem and care enough about it.
It is important to improve your overall understanding of how computers and programming work. Not only for performance—you will become a better programmer in general. You build knowledge by experimenting with various things and evaluating what works and what does not. Often the most random things and experiences turn out to be useful later.
If you have never written any low-level program, I encourage you to pick some simple computational problem and write a solution for it in, say, C. A long time ago, I did a lot of competitive programming, and I also wrote a simple solver for the game Sokoban in C. And while I don’t do these things nowadays, I have learned quite a lot along the way.
On the other hand, if you have never tried a high-level programming language, I encourage you to give Clojure a try. Lisp is very powerful once you get used to its unfamiliar syntax. And experiencing interactive programming might completely change the way you want to do development.
Last but not least, I plan to write another blog post with some practical tips on how to improve performance, together with war stories from OrgPad’s development.