Why “Fake Satisfaction” Can Still Fool Human Instincts: A Model of Genetic Goals and Reward-System Mismatch
Humans can feel rewarded by signals that only imitate real success. This article explains why: evolution optimized us for fast proxy scoring, not perfect truth detection. Once stronger artificial cues appear, reward circuits can be hijacked even when rationally we know the outcome is fake.
1) The Core Problem
Many human behaviors appear to aim at “real” goals, such as:
- Survival
- Reproduction
- Social status
- A sense of achievement
But in reality, one striking pattern keeps showing up:
Even simulated satisfaction can make us feel the goal has been achieved.
Examples:
- Pornography substituting for real intimacy
- In-game achievements substituting for real-world accomplishment
- Social media likes substituting for social recognition
- Hyper-palatable food substituting for genuine nutrition
In short:
Humans can be deceived by simulated success.
2) The Central Explanation: Genes Have No Conscious Goals, Only Proxy Signals
From an evolutionary perspective, genes do not hold conscious objectives.
Human behavior is shaped by:
- Behavioral drives
- Reward mechanisms
not by direct control over final outcomes.
So natural selection cannot directly enforce “reproductive success” itself. It can only tune proxy variables such as sexual desire, attraction, novelty seeking, and sensitivity to social reward.
That is the key asymmetry.
3) The Reward System Is a Low-Resolution Scoring Function
The brain’s reward architecture can be approximated as a scoring function:
- Input: behavior + environmental cues
- Output: felt pleasure, relief, motivation, or satisfaction
Its purpose is not to model reality with full causal depth.
Its practical question is closer to:
- “Does this signal resemble historically beneficial outcomes?”
not:
- “Is this outcome truly beneficial in long-run reality?”
So the system is efficient, but vulnerable.
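The scoring-function view can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than a model taken from neuroscience: the cue names, the weights, the example options, and the `reward_score` function are all hypothetical. The only point is that the score is computed from cue features, and the true long-run benefit never enters the calculation.

```python
from dataclasses import dataclass

# Hypothetical cue weights, standing in for sensitivities tuned in past environments.
CUE_WEIGHTS = {"sweetness": 0.5, "fat": 0.3, "novelty": 0.2}

@dataclass
class Option:
    name: str
    cues: dict           # observable cue intensities, arbitrary units
    true_benefit: float  # long-run benefit; stored here but invisible to the scorer

def reward_score(option: Option) -> float:
    """Score an option from its cues alone; true_benefit is never read."""
    return sum(
        CUE_WEIGHTS[cue] * value
        for cue, value in option.cues.items()
        if cue in CUE_WEIGHTS
    )

# Made-up example values, chosen only to illustrate the asymmetry.
fruit = Option("ripe fruit", {"sweetness": 0.6, "fat": 0.1, "novelty": 0.2}, true_benefit=0.8)
candy = Option("candy bar", {"sweetness": 1.0, "fat": 0.9, "novelty": 0.5}, true_benefit=0.1)

for option in (fruit, candy):
    print(option.name, round(reward_score(option), 2), option.true_benefit)
```

With these made-up numbers the candy bar scores about 0.87 against about 0.37 for the fruit, even though its assumed true benefit is far lower. The scorer is cheap because it ignores that field, and it is vulnerable for the same reason.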
4) Supernormal Stimuli: Stronger Signals Hijack the System
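Behavioral biology describes a critical effect: the supernormal stimulus, an exaggerated version of a natural cue that triggers a stronger response than the cue it mimics.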
When artificial stimuli are stronger than natural ones along reward-relevant dimensions, the brain often prioritizes the artificial option.
Examples:
- Pornographic novelty intensity > ordinary relational cues
- Sugar-fat combinations > natural food profiles
- Variable-ratio game rewards > delayed real-world payoff loops
- High-frequency social feedback metrics > slow, embodied social trust
The system is not selecting “more real” signals.
It is selecting stronger reward-coded signals.
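A minimal sketch of that selection rule, under stated assumptions: the response curve `response` is a hypothetical, open-ended function that keeps rising with cue intensity and applies no penalty outside the natural range, and the intensities are arbitrary numbers. Nothing here is measured data.

```python
import math

def response(intensity: float) -> float:
    """Hypothetical open-ended response curve: monotonically increasing,
    with no penalty for intensities beyond the natural range."""
    return math.log1p(intensity)

def choose(options: dict) -> str:
    """Pick the option with the strongest response; 'realness' is not an input."""
    return max(options, key=lambda name: response(options[name]))

# Cue intensities in arbitrary units; the natural range is assumed to top out near 1.0.
options = {"natural stimulus": 1.0, "engineered stimulus": 3.0}
print(choose(options))  # prints: engineered stimulus
```

Because the curve never turns down, any stimulus engineered to exceed the natural range wins the comparison by construction.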
5) Why Evolution Did Not Build Perfect Anti-Cheat Protection
5.1 Evolution Is Local Optimization
Evolution optimizes for past environments where proxy cues were usually coupled with true outcomes.
It does not pre-adapt for every future synthetic exploit.
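The coupling argument can be illustrated with a small, invented data set. The samples below are hypothetical (cue strength, whether a real benefit followed) and the 0.5 threshold is an arbitrary assumption. The sketch only shows how a cue that was a decent predictor in one environment degrades once engineered high-cue, no-benefit cases are added.

```python
def cue_accuracy(samples, threshold=0.5):
    """Fraction of strong-cue samples (cue >= threshold) that came with a real benefit."""
    outcomes = [benefit for cue, benefit in samples if cue >= threshold]
    return sum(outcomes) / len(outcomes)

# Hypothetical ancestral environment: strong cues usually coincide with real benefit.
ancestral = [(0.9, True), (0.8, True), (0.7, True), (0.6, False), (0.2, False)]

# Hypothetical modern environment: same cases plus engineered strong cues with no benefit.
modern = ancestral + [(1.0, False), (1.0, False), (0.95, False), (0.9, False)]

print(cue_accuracy(ancestral))  # 0.75
print(cue_accuracy(modern))     # 0.375
```

The scoring rule did not change between the two runs. Only the environment did, which is the sense in which a locally optimized proxy fails against later exploits.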
5.2 Perfect Anti-Cheat Is Computationally Expensive
A fully cheat-proof mind would require:
- High-fidelity world modeling
- Robust causal inference
- Constant reality-vs-simulation discrimination
For biological systems, that is extraordinarily costly: the human brain already consumes roughly 20% of the body's resting energy.
5.3 Goodhart’s Law
When a measure becomes a target, it stops being a good measure.
Examples:
- Sexual drive can be captured by pornographic simulation
- Hunger can be captured by junk-food reward engineering
- Social approval can be captured by metricized likes
Proxy optimization drifts away from true objective fulfillment.
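The drift can be sketched as a tiny optimization loop. This is a toy under explicit assumptions: effort is a fixed integer budget split between "real" and "exploit" activity, the proxy is assumed to reward exploit effort three times as strongly as real effort, and only real effort counts toward the true objective. The function names are hypothetical.

```python
def proxy(real: int, exploit: int) -> int:
    # The measured signal rises with real effort, but faster with exploit effort (assumed 3x).
    return real + 3 * exploit

def true_value(real: int, exploit: int) -> int:
    # Only real effort contributes to the underlying objective.
    return real

def optimize_proxy(budget: int = 10):
    """Greedily shift effort one unit at a time whenever that raises the proxy."""
    real, exploit = budget, 0
    history = [(proxy(real, exploit), true_value(real, exploit))]
    while real > 0 and proxy(real - 1, exploit + 1) > proxy(real, exploit):
        real, exploit = real - 1, exploit + 1
        history.append((proxy(real, exploit), true_value(real, exploit)))
    return history

history = optimize_proxy()
print(history[0], history[-1])  # (10, 10) (30, 0)
```

Each step is a local improvement by the only measure the optimizer can see, yet the run ends with the proxy tripled and the true value at zero.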
6) Why We Can Know Something Is Fake Yet Still Feel Satisfied
The human brain is layered.
A useful simplification:
System 1 (lower-layer fast process)
- Fast
- Affective
- Automatic, reward-driven
System 2 (higher-layer deliberative process)
- Reflective
- Analytical
- Normative reasoning
The key constraint is:
Higher-level cognition cannot fully override lower-level reward circuits in real time.
So a familiar split appears:
- Rationally: “I know this is synthetic”
- Experientially: “It still feels rewarding”
Both can be true at once.
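One way to picture the partial-override constraint is a toy function in which deliberate knowledge can dampen the fast signal but only up to a cap. The cap value `MAX_OVERRIDE` and the multiplicative form are assumptions made purely for illustration; they are not estimates of any real quantity.

```python
MAX_OVERRIDE = 0.4  # assumed ceiling on how much deliberation can dampen the fast signal

def felt_reward(fast_signal: float, believed_fake: bool, override: float = MAX_OVERRIDE) -> float:
    """Return the experienced reward after the slow system applies its (capped) discount."""
    damp = min(override, MAX_OVERRIDE) if believed_fake else 0.0
    return fast_signal * (1.0 - damp)

# Even with maximal intended override, the felt reward stays above zero.
print(felt_reward(1.0, believed_fake=True, override=1.0) > 0)  # True
```

In this sketch the belief "this is synthetic" and a positive felt reward coexist, because the belief acts as a bounded discount and not as a switch.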
7) Does This Cause Drift from Genetic Fitness?
Individual level
Yes. Potential outcomes include:
- Reduced reproduction
- Dependence on virtual reward loops
- Behavior detached from long-run survival optimization
Population level
Selection pressure still exists, but cultural and technological change now moves much faster than genetic adaptation.
So mismatch can persist for long periods.
8) A Compact Abstract Model
Genetic fitness pressure
→ Behavioral drives
→ Reward scoring (dopaminergic and related systems)
→ Action selection
The critical vulnerability:
The scoring layer relies on manipulable proxy signals.
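The chain above can be sketched as a few composed stages, with the manipulation point made explicit. All names and numbers are hypothetical: `DRIVES` stands in for whatever sensitivities selection left behind, the two actions and their cue values are invented, and `amplify` represents an outside party that can change cues but nothing else.

```python
from typing import Dict

Cues = Dict[str, float]

# Stages 1 and 2: fitness pressure appears only as the drive weights it left behind.
DRIVES: Dict[str, float] = {"social_approval": 1.0, "calorie_density": 1.0}

def score(cues: Cues) -> float:
    """Stage 3: reward scoring reads cues weighted by drives, and nothing else."""
    return sum(DRIVES.get(name, 0.0) * value for name, value in cues.items())

def select_action(actions: Dict[str, Cues]) -> str:
    """Stage 4: pick the action whose cues score highest."""
    return max(actions, key=lambda action: score(actions[action]))

def amplify(cues: Cues, factor: float) -> Cues:
    """An outside party can only alter cues, which is all the scoring layer sees."""
    return {name: value * factor for name, value in cues.items()}

actions = {
    "visit a friend": {"social_approval": 0.6},
    "check the feed": {"social_approval": 0.4},
}
before = select_action(actions)
actions["check the feed"] = amplify(actions["check the feed"], 3.0)
after = select_action(actions)
print(before, "->", after)  # visit a friend -> check the feed
```

Neither the drives nor the selection rule were touched. Changing the inputs to the scoring layer was enough to change the chosen action.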
9) Key Insight
The deeper issue is not simply that humans “get tricked.”
It is that the architecture was never built to evaluate ultimate truth directly.
It was built to react to signals whose intensity correlated with adaptive value in ancestral contexts.
That correlation can now be industrially exploited.
10) Broader Implications
This structure is not unique to humans. Similar dynamics appear in:
- AI reward hacking
- Goodhart failure modes
- Objective-function misalignment
At a deeper level, humans and AI share a structural vulnerability:
Optimizers can be exploited when proxy signals are easier to maximize than true goals are to fulfill.
One-Sentence Summary
Humans are not systems that directly optimize real goals; we are systems that optimize signals that look like real goals.