"What is AU?"

"A utility function says how good something is."

Two equations. The first equation shows u(apple) = 0. The second shows u(banana) = 5.

"If we aren't sure what the thing is, we use expected utility."

The EU of a maybe-banana / maybe-apple equals 50% times the utility of apple plus 50% times the utility of banana. In this case, the EU is 2.5.
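Written out as an equation, the calculation in that panel is:

$$\mathbb{E}[u] = 0.5 \cdot u(\text{apple}) + 0.5 \cdot u(\text{banana}) = 0.5 \cdot 0 + 0.5 \cdot 5 = 2.5$$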

"What about attainable utility?" in handwritten black text, framed by decorative gold swirls.

"People have a natural sense of what they 'could' do. If you're sad, it still feels like you 'could' do a ton of work anyways. It doesn't feel physically impossible."

A distressed stick figure has a thought bubble that reads, "I could be productive...". They imagine themself working at a desk.

"However, the part of you that predicts stuff doesn't buy that that's gonna happen."

New section: Beliefs about Future Actions.

'Imagine suddenly becoming not-sad. Now, you "could" work when you're sad, and you "could" work when you're not-sad, so if AU just compared the things you "could" do, you wouldn't feel impact here. But you did feel impact, didn't you? So not only is AU not using the "could" algorithm, but it also uses accurate predictions about how states of mind affect what you'll do later, and whether those actions will get you what you want.'

AU is roughly "Embedded Agentic" expected utility.

"It seems like we have beliefs about our future actions. Imagine having the following beliefs, where thinking more won't change the numbers:"

A three-panel cartoon showing probable actions. A sad stick figure under a cloud represents "Moping and watching Netflix, 90% chance." A figure texting an ex "hey, u up?" is "Texting an ex you shouldn't, 6% chance." A stick figure at a desk is "Actually working, 4% chance."

Handwritten text asking "What does your AU feel like?" It suggests learning you're 90% likely to mope doesn't feel impactful because "AU seems to simply use our expectations," not aspirational possibilities.
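A minimal sketch of that point, using the 90/6/4 split from the panel; the utility numbers attached to each action are purely illustrative assumptions:

```python
# Sketch: AU as an expectation over what you actually predict you'll do,
# not over everything you "could" do. Probabilities are from the panel;
# the utility values are illustrative assumptions.
predicted_actions = {
    "mope and watch Netflix": 0.90,
    "text an ex you shouldn't": 0.06,
    "actually work": 0.04,
}

utility = {  # hypothetical numbers for how well each action serves your goals
    "mope and watch Netflix": 1.0,
    "text an ex you shouldn't": -2.0,
    "actually work": 10.0,
}

# Attainable utility under your expected behavior.
au_expected = sum(p * utility[a] for a, p in predicted_actions.items())

# Contrast: the "could" algorithm just asks about the best thing you could
# conceivably do, ignoring how likely you are to actually do it.
au_could = max(utility.values())

print(f"AU from expectations: {au_expected:.2f}")  # 1.18
print(f"'Could'-style value:  {au_could:.2f}")     # 10.00
```

Learning the 90% figure doesn't move the first number, because it was already computed from those expectations; that's why the news doesn't feel impactful.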

Title: "Conservation of Expected AU." "Your beliefs should account for what you already know. If you believe that tomorrow you'll have good reason to expect to get an A on your final exam, then that should change your mind today. In the exact same way, your AU should account for what you already know - you don't expect to believe you can get a promotion without already believing it. However, sometimes we can predict impact to others."

A stick figure holding a gun stands outside a building labeled "BANK" and thinks, "They're not gonna like this..."

"From the perspective of the bank's clients, it feels like".

A line graph showing "Client AU" over time. AU starts high, then drops when a robber enters and the client thinks, "Am I gonna get shot?". The AU stays low until Batman arrives, causing AU to rise back to the original level.

"When the robber considers the client's' AU, it feels like".

A plot "Client AU" against time. The line is initially flat. An annotation marks when a "Robber enters," adding "he knows they won't like this." The line then rises sharply when "Batman arrives," an event noted as "also a surprise to the robber."

"Let's switch gears a bit; we're now ready to understand The Gears of Impact".

"How does a car crash affect you? Which consequences are important to you, exactly? Changes in your attainable utility, of course." Below, a car crash is juxtaposed with a graph labeled "Your AU" that shows a sharp drop.

"An event impacts us exactly and only when it changes what we think can happen for us. Probability mass has to shift."

Comparing "Before" and "After" probabilities for three outcomes. Before: "Go on a date" (Best) is 80%, "Stuck in traffic" is 20%, and "Paralyzed in hospital" (Worst) is 0%. After, a blue arrow indicates a shift in probabilities to 1% (date), 0% (stuck in traffic), and 99% (paralyzed).

"The degree to which important outcomes are affected by the event is the degree of the impact."

"At this point, this might feel obvious, but don't forget how far we've come! We can think about this process with the Frank analogy."

Frank is surrounded by three pink marbles. Two analogous scenarios begin: "Frank knows of pink objects, but doesn't think he can get to them." and "You've got plans tonight, but your friend flakes."

The two scenarios continue: "Frank considers alternatives..." and "You think about what you could do instead..."

A cartoon character with a square head stands on grass, surrounded by three gray circles. An exclamation point and a bright pink and yellow sun appear above its head. In the marble story, Frank finds a pink marble. In the friend-planning story, you remember another friend is available to hang out.

"Once promoted to your attention, you see that the new plan isn't so much worse after all. The impact vanishes. However, if you don't see a better alternative for some time, this becomes the new normal. If you then find a better alternative, this feels like a positive impact."

Two diagrams show a person with paths to goals like relaxing, grocery shopping, and hiking. In the first, all paths are open. In the second, the person is harmed, blocking all options. Text explains: "You are the common denominator. Objective impact involves harm to you or your resources, and this is why."

 "The first third of the sequence meets its close. We understood why some things seem like big deals, righted a wrong question, and just now skirted the fascinating deeper nature of objective impact... Objective impact, instrumental convergence, opportunity cost, the colloquial meaning of 'power'—these all prove to be facets of one phenomenon, one structure."

Frank and the Pebblehoarder sit together on a cliff's edge, overlooking a vast mountain range at sunset. The scene pays homage to the ending shot of the 2012 film, The Hobbit: An Unexpected Journey.

Exercise

Why does instrumental convergence happen? Would it be coherent to imagine a reality without it?

  • Here, our descriptive theory relies on our ability to have reasonable beliefs about what we’ll do, and how things in the world will affect our later decision-making process. No one knows how to formalize that kind of reasoning, so I’m leaving it a black box: we somehow have these reasonable beliefs which are apparently used to calculate AU.
  • In technical terms, AU calculated with the “could” criterion would be closer to an optimal value function, while actual AU seems to be an on-policy prediction, whatever that means in the embedded context. Felt impact corresponds to TD error (see the sketch after these notes).
  • Framed as a kind of EU, we plausibly use AU to make decisions.
  • I’m not claiming normatively that “embedded agentic” EU should be AU; I’m simply using “embedded agentic” as an adjective.
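To make the second note above concrete, here is a tiny sketch of the distinction (every number is an illustrative assumption): the "could" criterion behaves like a max over actions, actual AU behaves like a policy-weighted expectation, and felt impact shows up as a TD error.

```python
# Sketch of the "could"-criterion vs. on-policy distinction, and felt impact
# as a TD error. All numbers here are illustrative assumptions.

GAMMA = 0.9  # assumed discount factor

# Immediate rewards for a few available actions (made up).
action_reward = {"work": 10.0, "text ex": -2.0, "mope": 1.0}

# Honest predictions about what you'll actually do (as in the sad-day panel).
policy = {"work": 0.04, "text ex": 0.06, "mope": 0.90}

# "Could"-criterion value: assume you take the best available action.
v_optimal = max(action_reward.values())

# On-policy value: weight each action by how likely you are to actually take it.
v_on_policy = sum(policy[a] * action_reward[a] for a in policy)

# Felt impact as TD error: an observed reward plus the value of where you ended
# up, minus what you had predicted beforehand.
reward_observed = 1.0   # assumed: you moped
v_next = -5.0           # assumed: the resulting situation looks worse than expected
td_error = reward_observed + GAMMA * v_next - v_on_policy

print(f"'could' value (optimal):  {v_optimal:.2f}")    # 10.00
print(f"on-policy value (AU-ish): {v_on_policy:.2f}")  # 1.18
print(f"felt impact (TD error):   {td_error:+.2f}")    # -4.68
```

In a proper embedded-agency treatment, none of these quantities is this crisp; the point is just the shape of the distinction the note gestures at.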
