Against Vibes: When is a Generative Model Useful?

:: academia, research

Let’s suppose I wanted to answer a question: is tool X useful for task Y? If I were scientific about this, I would analyze the properties of tool X and develop a model of it, analyze task Y and its requirements and develop a model of that, and then use my models to predict the behaviour of tool X in the context of task Y. “Can I use timber instead of stainless steel as a support beam for this structure?” “Will this acid be an appropriate solvent for this reaction?” “Will this programming language provide these real-time guarantees?”

The discourse on generative models is not like this. Instead, you get claims like “software engineering is dead”, and attempts to shove generative models into literally everything without a thought. Search? Generative models. Code completion? Generative models. Summarization? Generative models. Voice to text? Generative models. Stock images? Generative models.

Any attempt to criticize this tends to go in circles and/or have people arguing past each other. Is a generative model useful for internet search? Well, look, it produces text that is plausibly related to the input prompt, so. So… what? That doesn’t answer the question. But the newest models are so much better! Better at… what?

I was upset about this when it was being called “prompt engineering”, and found no sign of engineering, but instead a series of vibes about how to phrase a prompt in a particular version of a particular model, which sometimes produced output that was plausibly related to the input prompt and therefore plausibly close to what you might have intended. I’m upset now when people are making claims that agents are so useful, but can’t tell me when or why or how they’re useful beyond vibes about feeling more productive (vibes that have been refuted by real science contrasting objective measures of productivity with subjective reports), or examples of having produced a lot of plausible output.

(Okay, there are some researchers doing actual science and writing papers; I’m talking about the arguments being made as this stuff is being integrated into schools, workplaces, etc.)

I want to know when generative models are useful. I don’t want to feel like they’re useful; that’s just a vibe. I’ve been a generative model skeptic basically from the beginning. I could not convince myself that generative models were useful. But I was also skeptical of my own subjective experience. I could imagine that a model capable of producing code from natural language would be useful, in some use cases that I had not found. I imagine there must be a model of when a generative model X is useful for task Y.

In this post, I’m not addressing ethical, political, or social questions. Those questions are important, and I want to address them separately from what the technology is capable of. Just for context: I think the widespread deployment of this technology is deeply problematic and irresponsible. I think further investment in it at its current scale is an almost criminal level of fiduciary negligence and will cause economic harm. I think the ethics of all of this are deeply troubling.

But for now, I just want to know what they’re technically capable of.

Are the ACM’s profits supporting its mission yet?

:: academia

Last year, as UBC got involved in ACM Open negotiations, I got curious about ACM financials. As I dug into them, I didn’t like what I saw, and wrote a blog post and then a CACM Opinion piece. The ACM honoured me with a response article, which includes lots more data and a contrary perspective on the ACM’s $250,000,000 in assets and $10,000,000/yr in profit. The response also made an interesting claim:

In terms of DL subscription revenue alone, the ACM is projecting a loss of $6M to $8M in 2026 when the DL goes fully open. The five-year projection is for a loss of $25M to $30M.

I was very interested in this claim. In making it, they have activated the trap card of falsifiability.

I’m quite happy that the ACM is moving to an open access model, and want that to be financially viable. But, as I said last year,

I do not think it is in the public or professional interest, nor does it advance art, science, engineering, and so on, to charge unnecessarily high publication and conference fees, taken out of public, research, and educational funding, and to hold that surplus income as an increasingly large pile of assets.

So I was keen to pay attention to this claim about the need for that surplus, and its use in supporting ACM Open.

Well, I’ve gotten around to reading the new ACM 2023 and 2024 publication finance report and the new FY2023 IRS data, and attention I am paying. So… is the ACM losing money in the switch to ACM Open, and is its hundreds of millions in profit over the past decades helping it achieve its mission?

TLDR: Nah. The ACM remains profitable and its net assets have continued to grow by tens of millions of dollars a year. And IMO, some of their rhetoric is even more worrying than that.

Academic freedom, freedom of speech, and politics

:: academia

Recently, there has been an attack on academic freedom and free speech at UBC, by fringe right-wing extremists, under the guise of protecting free speech. They have now escalated this attack from bullshit internal political wrangling (which I’ve been helping fight) to a BC Supreme Court petition: Petition-VLC-S-S-252602-filed-7Apr2025.pdf

I’m so tired. So angry. This is such bullshit.

Let’s talk about academic freedom and freedom of speech.

A high-level summary and interpretation of ACM finances

:: academia

I’m in the middle of liaising with the UBC library to attempt to negotiate joining ACM Open. It’s not going well. While US universities appear to see cost increases of 2x–3x, Canadian universities are seeing increases of 10x–20x. And our budgets run much tighter, with far less research funding, particularly for article processing and other publication costs. A 10x–20x increase in publication costs is hard or impossible to swallow.

So, I’ve been forced to look at publishing—where should I publish, how much will it cost, etc—and one question I asked was: why does ACM Open cost so much? It’s a lot more than other open access journals. So I started looking at ACM finances and… well…

Here’s my dive into ACM finances, and questioning fundamental claims surrounding ACM publishing costs.


UPDATE, March 1 2024 After discussions with other ACM members and feedback on this article, I’ve realized there are several errors in the analysis and the interpretation. I’ll publish an updated version eventually, but leave this version for transparency and as a record.

In short, here is a list of errors in the facts and analysis below:

  1. The data relied upon, FY 2022, is atypical because it includes conferences run under COVID restrictions. Conferences generally pay for themselves.
  2. The $700 average cost of an article is justified by the Form 990s, despite the profit. This can be seen by cross-referencing the ACM publications finances report with the Form 990s, which is somewhat difficult to do because of differences in categorization and granularity between the reports.
  3. Eliminating membership fees would actually increase costs, since it would likely increase membership, and members receive various benefits which incur costs.
  4. The ACM is probably not as bad as it seems at investing. The calculated return on investment appears to be atypical or artificially low. Further, for very good reasons, the ACM has a conservative investment strategy.
  5. The assets, and income from assets, are at least partially restricted in various ways, both from donors (restricted or endowed funds), or because it is held for the SIGs, which limits how it could be spent down.

However, I still stand by my high-level interpretation of the high profit and high assets of the ACM.


What is a model?

:: notes, research, academia

What is a model, particularly of a programming language? I’ve been struggling with this question a bit for some time. The word “model” is used a lot in my research area, and although I have successfully (by some metrics) read papers whose topic is models, used other people’s research on models, built models, and trained others to do all of this, I don’t really understand what a model is.

Before I get into a philosophical digression on what it even means to understand something, let’s ignore all that and try to discover what a model is from first principles.

What is syntax?

:: notes, research, academia

I’m in the middle of confronting my lack of knowledge about denotational semantics. One of the things that has confused me for so long about denotational semantics, which I didn’t even realize was confusing me, was the use of the word “syntax” (and, consequently, “semantics”).

For context, the contents of this note will be obvious to perhaps half of programming languages (PL) researchers. Perhaps half enter PL through math. That is not how I entered PL. I entered PL through software engineering. I was very interested in building beautiful software and systems; I still am. Until recently, I ran my own cloud infrastructure—mail, calendars, reminders, contacts, file syncing, remote git syncing. I still run some of it. I run secondary spam filtering over university email for people in my department, because our department’s email system is garbage. I am way better at building systems and writing software than math, but I’m interested in PL and logic and math nonetheless. Unfortunately, I lack a lot of background and constantly struggle with a huge part, perhaps half, of PL research. The most advanced math course I took was Calculus 1. (Well, I took a graduate recursion theory course too, but I think I passed that course because it was a grad course, not because I did well.)

So when I hear “syntax”, I think: “Oh sure, I know what that is. It’s the grammar of a programming language. The string, or more often the tree structure, used to represent the program text.” And that led me to misunderstand half of programming languages research.