march 8, 2026

agentic os: the next ui revolution

all the building blocks for the next ui revolution are on the table. but nobody has put them together correctly. that reminds me of something.

three breaks

in my conscious lifetime i have witnessed two fundamental breaks in human-machine interaction: the first was the graphical interface replacing the command line. i remember the moment i first got to use windows 95; before that i only had an old 286 that could only be operated via cli, which had mostly overwhelmed me in primary school. windows after that was something i could use intuitively, opening up an entirely new world.

the second break was the smartphone: when i watched steve jobs on the apple keynote livestream, it was one of the few moments in my life where i had the unmistakable feeling: this moment changes the world.

in retrospect, both revolutions followed the same pattern: the technological building blocks existed long before a combination of all blocks was found that worked as a coherent whole. in both cases there was a transition phase — fascinating in hindsight — where new technology was pressed into old metaphors, before eventually a product came along that put everything together correctly. i always find it amusing when you read comments on hacker news or reddit that frame all these inventions as obvious, trivial, and not even new, even though closer inspection shows that this putting-together is much harder than people think.

now, i am certain we are facing another break with existing paradigms. agentic ai, tool use, emotional voice interaction, autonomous agent systems: once again there are many building blocks on the playing field. the assembled system, however, is still missing.

the mouse took 27 years

i didn't consciously experience the first break, but its history is instructive nonetheless. on december 9, 1968, douglas engelbart demonstrates something in san francisco that would later become known as the "mother of all demos" [1]: a mouse, overlapping windows, hypertext links, collaborative real-time work. the future of computing in 90 minutes.

five years later xerox parc builds the alto [2], the first computer with a graphical interface, mouse, icons and windows. the technology works, but the alto is not a product, it's a research project. the xerox star follows in 1981 as a commercial device, but it's too expensive and too slow. when the apple macintosh appears in 1984, it is the first usable consumer computer with a genuine mouse-first interface, and yet it remains a niche product. only windows 95, a full 27 years after engelbart's demo, feels to the general public like an operating system truly built for the mouse.

27 years from invention to mass-market product. and in between: a long phase where the mouse exists, but the interfaces are still designed for the keyboard.

the transition phase

norton commander, 1986. a file manager for dos, two panels side by side, entirely keyboard-driven. from version 3.0 (1989) it gets mouse support, which changes nothing about the fundamental design because the mouse operates an interface built for the keyboard. it works, for power users even superbly, and midnight commander, the open-source successor, is still used today. but it's not a mouse-first interface. it remains the old metaphor with a new input device.

this is not a design flaw. it's a pattern. when a new interaction technology emerges, it is first inserted into existing paradigms. the mouse clicks on text menus. the touchscreen operates miniaturized desktop windows. and the ai agent types into chat windows.

before the iphone

i think it was cebit 2002 when i took a regional train to hannover with a friend. deutsche telekom had set up a huge booth where magenta-clad people walked around pressing a nokia 7650, the first nokia with a built-in camera [3], into your hands. we took a photo and sent it via mms to my email. magical and simultaneously completely useless.

but in retrospect it was a moment when important new building blocks fell onto the playing field: a camera in a phone, a mobile network that could transmit data, and the internet that was fueling everyone's imagination at the time.

the years before the iphone are full of such moments. palm pilot, 1997: a stylus-based organizer with a handwriting recognition system called graffiti that you had to learn first [4]. windows mobile: a miniaturized desktop windows with a tiny stylus on a tiny screen where you had to tap open the start menu with a pen. nokia with symbian: internet via wap, keyboard-driven. blackberry: email machine with a physical keyboard, beloved by managers.

all the pieces existed. lithium-ion batteries enabled usable runtimes. umts brought mobile internet. touchscreens were invented. websites and email were widely used.

what was missing was not technology. what was missing was someone willing to say, radically: no stylus, no physical keyboard, finger-first, multi-touch, and the entire interface designed from scratch for this interaction. that someone was steve jobs on january 9, 2007. [11]

the first iphone didn't even have umts, just edge, which was slower than what the competition offered at the time. it had no copy-paste, no third-party apps, no mms. technically it was inferior in many individual categories. but the UX was so fundamentally right that none of it mattered.

we're sending ourselves mms

2026. i look at the building blocks of the next revolution and have the same feeling as back at the telekom booth.

llms can use tools, and with agentic tool use the core capability has been invented that enables everything else. mcp (model context protocol) standardizes how tools are declared and called. webmcp brings this to the browser [5]. hume ai has shown that emotional, natural voice interaction is possible [6]. agentic applications like claude code and cursor write software, openclaw offers a glimpse of autonomous agents, airpods sit in millions of ears and 5g networks are deployed, even though nobody quite knows what for yet, which is strikingly reminiscent of umts before the iphone.
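
the core of mcp is simple enough to sketch: a tool is declared as a name, a description and a json schema for its arguments; the model calls it by name with json arguments and gets structured content back. a minimal python sketch of that shape (the tool name, arguments and handler here are invented for illustration, not part of any real server):

```python
# an mcp-style tool declaration: name, description, and a json schema
# describing the arguments. the declaration shape follows the model
# context protocol; the concrete tool is made up for this example.
TOOL = {
    "name": "get_next_meeting",
    "description": "Return the user's next calendar appointment.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "calendar": {"type": "string", "description": "calendar id"},
        },
        "required": ["calendar"],
    },
}

def handle_call(name: str, arguments: dict) -> dict:
    """dispatch a tool call the way an mcp server would: the model sends
    a tool name plus json arguments, the server returns structured content."""
    if name != TOOL["name"]:
        raise ValueError(f"unknown tool: {name}")
    # a real server would query a calendar here; this stub returns a fixed answer
    text = f"next meeting in {arguments['calendar']}: 14:00"
    return {"content": [{"type": "text", "text": text}]}
```

the point is not this particular stub but the declaration shape: because tools are self-describing, any agent that speaks the protocol can discover and call them without a human-facing ui in between.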

and what do we do with all of this? we type text into electron webapps.

the chatgpt desktop app, claude desktop, gemini in the browser: these are the windows mobile phones of the ai era. you have a text window with a cursor, you type and wait, while underneath a technology works that would be capable of so much more. it works, for power users even superbly. claude code is something like the midnight commander of the agentic era, a tool for a domain that is extremely powerful, but not an entirely new paradigm of computer use.

there have been first attempts to think beyond the chat window. the humane ai pin had the right thesis at its core, namely that you don't need a big screen when the agent itself is the interface rather than the display. but the execution was so poor that hp bought the assets in early 2025 for 116 million dollars and shut down all pins [7]. the rabbit r1 launched with four app integrations and a battery that lasted four hours, which was roughly as convincing as it sounds [8]. these are the palms and pocket pcs of our time. the direction is right, but the execution shows how hard this putting-together really is.

the next paradigm

what does the operating system look like that is built for agents?

the honest answer: nobody knows. just as in 2006 nobody knew what a smartphone should look like that was built for fingers. but one can sketch the direction.

an agent that runs in the background, that filters, plans and acts, doesn't need a big screen. what it needs is a connection and a microphone. occasionally it needs a display, when you want to see something, compare, choose. but for most interactions audio is enough.

the device would perhaps be smaller again than current smartphones, wearable like a brooch, with a fold-out screen for the moments you do need a display. it would record around the clock, audio and perhaps video, and would thus always have all essential information at hand. together with airpods this creates a system that can handle most tasks that today require a screen.

star trek the next generation is — as so often — a guide here. "computer, when is my next meeting?" works wonderfully via voice. "computer, show me the way to the turbolift on deck 7" works better with a screen. neither voice nor touch alone is the solution, but the combination of both.

and the architecture would perhaps be a radical thin client with minimal local processing power, because the cloud handles everything heavy. if even games are now better streamed than computed locally, as nvidia geforce now demonstrates [9], then it's actually nonsensical to keep packing large processing power into mobile devices. what counts is battery life and size, and 5g, so far a solution in search of a problem, would finally be the infrastructure that makes this possible.

what disappears

settings for example, which nobody can navigate anyway because even at apple they've become so bloated that you get lost in them. an agent configures itself and asks the user about their preferences. app grids and home screens are relics of the app era, just as bookmark bars are relics of the website era. manual notification management becomes superfluous because the agent decides what deserves its user's attention and what doesn't.

the attention inversion

the last 10 to 15 years of the tech industry can be summarized in one sentence: platforms optimize for time-on-platform. infinite scroll, autoplay, push notifications, dark patterns that keep your face pointed at the screen. the entire platform economy is based on monetizing your attention.

an agentic operating system would invert this. the platform no longer decides what you see; your agent decides what deserves your attention. it filters, curates, summarizes and acts autonomously in the background, 24/7. attention could thereby move away from devices for the first time in 30 years.

in my last article about webmcp i described how the surface of the ad-funded internet dissolves when agents interact directly with structured interfaces. here it's about the consequence: when the surface disappears, the business model based on surfaces disappears with it.

from my perspective the question is not whether this happens. the question is for whom.

your agent works for you — or for someone else

those who have money already pay for at least one ai subscription today, if not the particularly powerful max tier. those who value productivity have long understood that social media harms its users and that these platforms don't make you feel good, even though few manage to act on that realization. this is no secret; it's a truism that changes nothing, because the conditioning of the attention mechanisms is more powerful than insight.

an agentic os could technically break this open. but the question is the business model.

class one: your agent works for you. you pay a subscription, maybe 200 or 300 euros a month, and the agent is loyal because you are its customer. it filters advertising, curates what's relevant, cancels subscriptions you don't need, negotiates prices and fundamentally acts in your interest.

class two: your agent is free, so you are the product. except this time it's not a feed you could theoretically close, but an autonomous agent making decisions on your behalf. one that "recommends" you the hotel paying the highest commission. that "forgets" to cancel your subscription. that summarizes news emphasizing certain perspectives and suggests "suitable" products.

this would be fundamentally worse than social media dark patterns. on instagram you at least theoretically know you're being manipulated. an agent that doesn't act in your interest but pretends to — that would be a new category of dark patterns.

the market structure pushes in exactly this direction: google and meta will have to offer their agents ad-funded, because that is their business model. the alternative would be abolishing their own business model, and no publicly traded company does that voluntarily. so free agents will come that are not neutral. the only question is how visible that will be.

the apple dilemma

from my perspective apple is the only major player whose business model doesn't depend solely on attention, because they live on hardware margins rather than ad revenue. apple has the ecosystem, the chips, the integration, the customers willing to pay, and the privacy positioning. you'd think if anyone can build the agentic os, it's apple.

and at the same time they seem directionless: mac studios with m-chips that happen to be powerful enough to run llms locally are appealing to customer segments apple had long since deprioritized.

the problem here is obviously not technology. the problem is that a real agentic os would turn the entire existing business model upside down. the app store, the ios ui philosophy, the relationship with content providers, the entire home screen metaphor: all of it would have to go. ceo tim cook appears to be unmatched at optimizing the existing, but i don't think he'll make a 180-degree turn and cannibalize his own platform before he retires.

steve jobs was different: he waged the flash war and took on the entire music industry. he would have destroyed his own platform at any time to build the next one. cook is not jobs. that is not meant as criticism of him personally, because almost nobody is jobs, and jobs also needed cook to turn apple into a money machine. but it means that apple's future stands on a knife's edge.

open questions

the parallels to the smartphone revolution are obvious to me: the building blocks exist, early attempts fail at their execution rather than the idea, the established players move cautiously and struggle to find the right combination of blocks.

but there are also differences: the smartphone was a consumer device with an immediately tangible benefit because it put phone, internet and camera in your pocket. an agentic device has a vaguer benefit because it does things you were already doing, just better and differently. and the leap from "i type a message" to "my agent handles it" is conceptually and psychologically wider than the leap from doing things on a pc to doing the same things on the go with a smartphone.

on top of that come conceptual questions that don't yet have good answers: 24/7 recording is technically possible and almost necessary for a good agent, but the microsoft recall debate [10] has shown how quickly something like that can backfire. a device that records everything is always also a surveillance device, and the line between "my agent remembers for me" and "my agent surveils me" is thinner than you'd think.

add to this that the platforms of recent years have dismantled open protocols rather than built them up. rss is de facto dead, twitter closed its api, reddit made its api paid. open standards would be incredibly powerful right now, but the incentives don't align because the major players lose more from openness in the short term than they gain.

and then there is the most fundamental question, namely who builds the whole thing: not the technology, that exists. but the product, the thing that in retrospect looks obvious, just as the iphone in retrospect looks obvious. someone needs to put voice, touch, agent, cloud, hardware, trust and connectivity together in a way that feels like it couldn't be any other way. that is not a technology question but a design question, and it is the biggest one since 2007.

the answer could come from a company that doesn't exist today. or from one we know but underestimate. or it could take another decade, just as it took 16 years from engelbart's demo to the macintosh. the patterns of the past say: it will come. the patterns of the past also say: it will look surprising.

until then, we're sending ourselves mms.

translated from the german original by claude opus 4.6

sources

[1] D. Engelbart, "the mother of all demos," fall joint computer conference, san francisco, december 9, 1968. wikipedia
[2] xerox parc, "xerox alto," 1973. first computer with graphical interface, mouse and windows. wikipedia
[3] nokia 7650, first nokia phone with built-in camera and mms support. released june 2002. wikipedia
[4] palm pilot, march 1997. stylus-based pda with graffiti handwriting recognition. wikipedia
[5] T. Zindler, "webmcp: the end of the surface web," 2026. zindler.dev
[6] hume ai, empathic voice interface (evi). hume.ai
[7] I. Mehta, "humane's ai pin is dead as hp buys startup's assets for $116m," techcrunch, february 2025. techcrunch.com
[8] rabbit r1, 2024. dedicated ai hardware device with four app integrations. wikipedia
[9] nvidia geforce now, cloud gaming platform. nvidia.com
[10] microsoft recall, announced may 2024. feature that records screenshots and makes them searchable. delayed multiple times after massive criticism. wikipedia
[11] S. Jobs, iphone introduction at macworld keynote, january 9, 2007. youtube