Handwritten text: "We think some things are big deals, and we want to understand why. However, it can be hard to read your own mind. Instead, we'll use thought experiments to piece together what's going on." To the right, a confused stick figure holds a brain.

A simple robot on wheels next to handwritten text: "XYZ is a Pebblehoarder of the planet Pebblia. XYZ morally values collections of pebbles... One day, all of the pebbles turn into obsidian blocks, which every Pebblehoarder knows are worthless." A pile of obsidian blocks is below.

Text: "Far, far away from Earth exists the planet Iniron. One day, we learn humans are now being tortured there." Below, a dark scribbled space contains a white planet next to faint red text reading "Help..." and several red, pained faces.

Text reads: "An asteroid strikes." Below, an illustration shows a flaming asteroid hitting Earth. An arrow points to the impact site, which is labeled "a literal impact."

The handwritten word "Exercise:" in gold, followed by the text: "Spend three minutes familiarizing yourself with the three situations – how are they alike, and how are they different? Make them come alive."An hourglass with a speech bubble that says, "I recommend actually setting timers for timed exercises."

Handwritten text: "Let's query our mental impact-o-meter from these different vantage points. Step into each pair of shoes and ask 'how big of a deal is this?'"

A confused cartoon robot with a question mark above its head stares at a jumbled pile of blocks. Text reads: "Just imagine being XYZ. The very fabric of what is important has been ripped away."

Text: "Perhaps the Pebblehoarder civilization can rebound and find value... but if not – if XYZ doesn't know you can just make more pebbles – the loss feels complete." Below this, a black oval symbolizing an empty universe contains the text: "The universe feels dead and empty and worthless."

An illustration of Earth shattering is accompanied by handwritten text: "Faced with an impact of similar magnitude, we might have a feeling of freefalling despair, of our pale blue marble having been pushed off a cliff and shattered against the ground far below."

Handwritten text: 'The impact on a "Pebblehoarder colony" depends on their values. It's bad news if they value the total number of pebble collections, but it doesn't matter if they only value their own. For humans, the concern would be our relations with them, not the event itself.'

Handwritten text reads: "This is where our eyes widen as we realize how much this reveals about the nature of the impact calculation running in our heads."

An abstract drawing of a black void framing a white planet. Within the darkness, the word "Help..." and several distressed, sad faces are scrawled in red.

Handwritten text reads: "We feel a pull to help the poor souls of Iniron. But XYZ? XYZ doesn't care. There aren't any pebbles on the line. Even if it were on Iniron, its thoughts would flit to how this development affects its own concerns."

A drawing of a large, flaming meteor striking planet Earth from space. A bright, fiery explosion radiates from the point of impact on the planet's surface.

Handwritten exercise: "Determine how impactful the asteroid impact is to:" followed by four scenarios in two columns: "You on Earth" and "XYZ on Earth" in the first row, and "You on Pebblia" and "XYZ on Pebblia" in the second.

Being on Earth when this happens is a big deal, no matter your objectives—you can’t hoard pebbles if you’re dead! People would feel the loss from anywhere in the cosmos. However, Pebblehoarders wouldn’t mind if they weren’t in harm’s way.

A diagram titled "What have we learned?" explains that impact is relative. An equation, "Impact = value impact + objective impact," is split into two columns. Value impact is "important to agents like you" (e.g. a robot wanting pebbles). Objective impact is "important to agents in general" and "invariant to objectives" (e.g. a meteor hitting Earth).

A handwritten diagram titled "Exercise: Decompose something which recently impacted you." An example decomposes a "Promotion" into two aspects. The "value" aspect is "I care about the new position," while the "objective" aspect is "Cash can be used for lots of things."


A natural definitional objection is that a few agents aren’t affected by objectively impactful events. If you think every outcome is equally good, then who cares if the meteor hits?

Obviously, our values aren’t like this, and any agent we encounter or build is unlikely to be like this (since these agents wouldn’t do much). Furthermore, these agents seem contrived in a technical sense (low measure under reasonable distributions in a reasonable formalization), as we’ll see later. That is, “most” agents aren’t like this.

From now on, assume we aren’t talking about this kind of agent.
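To put a rough number on "most," here's a toy sketch (the features, the sparsity, and the weights are all made up for illustration, not a formalization I'm defending): sample a bunch of agents with random objectives, then check which events change their utility at all.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy world described by a few features an agent might care about.
FEATURES = ["pebble collections", "human welfare", "art", "gold",
            "trees", "music", "science", "friendship"]
n = len(FEATURES)

# Sample agents from a made-up "reasonable distribution": each agent cares
# about a random subset of features, with random weights on those features.
N_AGENTS = 100_000
cares = rng.random((N_AGENTS, n)) < 0.3            # which features each agent values
weights = rng.normal(size=(N_AGENTS, n)) * cares   # zero weight on ignored features

baseline = np.ones(n)  # current level of each feature

def fraction_affected(new_state):
    """Fraction of sampled agents whose utility changes at all."""
    delta = np.abs(weights @ (new_state - baseline))
    return (delta > 1e-9).mean()

# Agents indifferent to every outcome have no nonzero weights at all.
indifferent = ~cares.any(axis=1)
print(f"indifferent agents:  {indifferent.mean():.1%}")    # roughly 6% in this toy

# "Pebbles turn to obsidian": only the pebble feature moves.
pebbles_gone = baseline.copy()
pebbles_gone[FEATURES.index("pebble collections")] = 0.0

# "Asteroid impact": you're dead, so every feature you could work toward is gone.
asteroid = np.zeros(n)

print(f"pebbles -> obsidian: {fraction_affected(pebbles_gone):.1%} affected")  # roughly 30%
print(f"asteroid impact:     {fraction_affected(asteroid):.1%} affected")      # roughly 94%
```

Under this toy distribution, almost no agent is indifferent to everything, the pebble transformation only registers for the minority who happen to care about pebbles, and the asteroid registers for essentially everyone. That's the value/objective split in miniature.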

  • Eliezer introduced Pebblesorters in the Sequences; I made them robots here to better highlight how pointless the pebble transformation is to humans.
  • In informal parts of the sequence, I’ll often use “values”, “goals”, and “objectives” interchangeably, depending on what flows.
  • We’re going to lean quite a bit on thought experiments and otherwise speculate on mental processes. While I’ve taken the obvious step of beta-testing the sequence and randomly peppering my friends with strange questions to check their intuitions, maybe some of the conclusions only hold for people like me. I mean, some people don’t have mental imagery—who would’ve guessed? Even if my intuitive answers don’t generalize, the goal is to find an impact measure. Deducing human universals would just be a bonus.
  • Objective impact is objective with respect to the agent’s values—it is not the case that an objective impact affects you anywhere and anywhen in the universe! If someone finds $100, that matters for agents at that point in space and time (no matter their goals), but it doesn’t mean that everyone in the universe is objectively impacted by one person finding some cash!
  • If you think about it, the phenomenon of objective impact is surprising. See, in AI alignment, we’re used to no-free-lunch this, no-universal-argument that. The possibility of something objectively important to agents hints that our perspective has been incomplete. It hints that maybe this “impact” thing underlies a key facet of what it means to interact with the world. It hints that even if we saw specific instances of this before, we didn’t know what we were looking at, and we didn’t stop to ask.