kylec | 015: Reward and Punishment

It strikes me that one of the reasons it took me so long to understand what generosity really is and why it works from a natural systems perspective is that it’s hard to see how people are rewarded or punished within the system for their actions—especially for a child. If I get a Christmas present, that looks suspiciously like a reward, so I must be doing fine. No need to give, I just lost time and money. Meanwhile all around me, I can see jerks thriving and nice people getting shafted. Blind Luck meddles both ways, and Lady Justice peeks out from under the blindfold.

The problem is that man’s notions of justice—of reward and punishment—differ sharply from nature’s. Many Americans’ concept of fairness is that of a vending machine: press whatever button you think you deserve and have it plop into the tray. This causes quite a bit of grief when your just desserts get stuck, or someone else happens to get two bags of it. We have to remember that “justice” is an abstract concept, invented by us, and nature* is under no obligation to cooperate. Our notions of reward and punishment are mechanical. They follow straight lines to immediate dispensation, the same kind every time.

Nature’s system of rewards and punishment is organic. It is living, adaptive, self-correcting, and incorporates many complex elements, giving and receiving feedback between both parallel and nested systems (i.e. of a higher order). Our rules matter only as long as they don’t contradict natural laws, at which point they get wedged into a heavy boot tread. If we can figure out how those higher order laws work, how nature rewards and punishes and why, we can get a better sense of how they might work in our own lives, given that despite our best efforts we are still part of nature and subject to natural law.

*in this essay I use this word to mean the sum of all events in the world, to include all natural systems

Justice for Madmen
Our rewards and punishments are decided and dispensed by human arbitrators: judges, committees, academies of motion pictures, and parents. This is top-down justice, and the ones who aren’t corrupt really do try their best. But individuals and small groups are on the same, or a nearby, order of magnitude as the recipient in terms of mind. None of them claim to be infallible. They deal with limited information, and have a limited capacity to imagine the second and third order consequences of their choices.

They’re also remarkably inconsistent. While our justice systems deal only in punishments (I’ve never received a check on my windshield for parking really well), they make a good example of the problem in that they differ substantially according to time and place. The laws in one state are not the laws of another. What’s legal in the good ol’ US of A might be criminal in Pakistan. And what’s legal in a place in the year 2020 was probably illegal in the past, or carried a different sentence, and will change again in the future. Two different sets of parents on the same street teach their children different behavioral standards with differing methods. That alone should illustrate the arbitrary nature of the way we punish and reward one another. How can it be fair if no one can even agree what fair is? When I hear of people speaking of things that others deserve, or don’t deserve, all they’re really doing is signaling to their peers what kind of dictator they would make.

Our notions are fashion, at best. That doesn’t mean they don’t work. They may well be skillful reactions to circumstances that produce desired results (the desires being the judge’s). But it does mean that they are not now, nor will they ever be, absolute truths, appropriate in all situations, societies, and ages across the globe and human history.

Human justice systems isolate elements. They attempt to strip the actions that they punish or reward down to their bare essentials, removing most if not all context. This is essential, because it’s hard for a parent or a judge to take into account literally all information about the situation, the persons involved, their life histories, etc. Every decision would drown in complexity that our brains can’t handle. So if, for example, someone commits murder, we consider some of the circumstances, but try not to think of their gender and race, or what they had for lunch, while bringing context we deem relevant such as family history of violence and witnesses to good character when considering a sentence. It’s a messy process, and decisions about what to isolate and consider can have dramatic effects on the outcome. The contextual information is often less important than a comparison to similar cases and precedents for how they were handled. Who the victim was isn’t supposed to matter, either. A killing is a killing. Higher order abstractions like the crime, categories of premeditation, typical sentencing are more important.

Contrast that with another man-made system of the past: the weregild. Literally, “man gold,” it means the price that the convicted must pay in restitution based on the victim. It’s common in societies without mature and formal justice systems. A fixed value might be assigned based on the victim’s social standing, or it might be negotiated between the killer and the victim’s family. It sounds brutish, but in the absence of a prison system, and when losing a productive family member could mean struggling to put food on the table, it was often a sensible solution. It’s important to note that instead of isolating as many elements as possible, the entire context was factored into the decision. It was “unfair” in that two different murders in the same society may end in drastically different penalties based on who was involved. That’s unfair when justice is absolute, but when more elements of the system contribute feedbacks, what we find is not an absence of justice, but a contextually-sensitive and relative justice.

If I killed a man in 10th century Iceland, who that man was mattered a great deal. If he was a drifter and a scoundrel known for causing problems, I might even get off with a shrug of the shoulders from the community. If he was a beloved individual with a large family, that wouldn’t be the case. Failure to agree on a weregild would probably mean outlawry and death, and the price would have a suspicious correlation with how many people were willing to take up arms against me if the terms felt unjust. In other words, the context of my actions within the community and the harm it felt would result in a slightly more bottom-up dispensation of punishment. A rich man might seem to “get away with it,” but he would have to replace the victim’s productive value in the eyes of his kin, and if he killed someone else, his previous history might result in people being unwilling to accept any price but outlawry.

This isn’t an argument for the virtues of the weregild. Only an example of the extent to which context plays a role in punishment and reward, how it can differ over time and place, and whether “fair” means in accordance with an abstract principle, or a more complex system going through a series of feedbacks that returns it to a homeostasis. A 10th century Icelander might be equally appalled that a man be charged with manslaughter or 2nd degree murder for defending himself with lethal force, or that we lock petty criminals in massive prisons.

The All-Inclusive Package
In nature, things we might consider punishments or rewards are decided by complex feedback within whole systems, and consequences are also dispensed within those systems. The punishments and rewards are built into the system, not paired with an action and carried out by some agent. What goes up must come down, and what goes up very far goes splat. But what’s punished isn’t the high leap, it’s the ignorance and arrogance of it—the failure to observe gravity, and to consider one’s own vulnerability.

The notions of fairness are less fickle over time and space, because they’re governed by natural laws at every order of magnitude, gravity being an example of a very high one. There are also lower forces like weather, ecosystem balance, the biological limitations of species compared to other species, and the rules enforced within species, like ours. My actions take place in the context of my species’ rules, which take place in the context of the genetic capacities of the species, which take place in the context of a local biome, etc. Context is always critical, and when a lower context fails to map to a higher one, you get something like cognitive dissonance, where a single action can earn reward at one level, and punishment at the next. This contextual dissonance is the essence of Gregory Bateson’s double bind theory of schizophrenia, and one day I might just have to write about why context-blindness is literally madness.

Reward
Human rewards suffer from a poverty of imagination. While I know this is not true everywhere, or even of everyone here in modern America, broadly speaking, people tend to think of rewards in terms of fortune and fame (which can just as easily bring misery and ruin lives, or end them), or in terms of comforts (continuation of present conditions/ease of homeostasis).

We want to be rewarded immediately. The action induces a reward, and each reward is assigned neatly to an action. In a sense, it seems actions are done for rewards, as opposed to for their intrinsic value. Literal rewards are great for training dogs and children to behave a certain way. Puppies and kids can’t earn enough caloric energy to stay alive on their own, so adults provide it for them under conditions, until they can earn it for themselves. This process mimics the way that certain classes of actions tend to lead to sustenance as adults. Meaning sometimes, the chips get stuck in the vending machine even if you’re a really sweet person who paid full price.

For grown-ups to continue to expect such a mechanical reward system is silly. So if mature actions aren’t performed for doggie treats, what do we hope to get out of them? Information. Especially information on future actions. How often does the action bring pleasure, and how much? How often does the action bring pain, and how much? We don’t need consistency or absolutes. From observing the feedback of our actions, we learn to build classes of actions that bring us steak dinners and avoid spontaneous amputations. Cheap info is best. Pain avoidance is always a priority over pleasure seeking, because a neutral result leaves you more or less in the same position ready to try something else. Small failures—ones with tolerable pain associated—are as good as rewards, because we learn what not to do.

Imagine you awake on the floor in a pitch black room and you don’t know where you are, how you got there, or anything else about the situation. It could be said that you lack “information” (differences that make a difference). You could be surrounded by a moat of crocodiles, or a buffet table. Perhaps you’re here to be murdered, or you simply passed out and a nice person brought you somewhere safe to sleep it off. What would you do once you finished sobbing?

If you leap to your feet and sprint screaming in a random direction, you might happen to pick the hallway, and a trajectory for the exit, but you might also find the moat, or spill the food, or run face-first into a wall. A thoughtful person might first feel the ground to determine whether they’re indoors or out, and if the floor is marble or shag carpet (the preference of serial killers). Anything to get a piece of information to act on. Then she might stand and extend her hands slowly into the dark on all sides. Finding nothing, she might take a careful step and do it again, and again, until finally she contacts a wall. Then feel along the wall, maybe listening for sounds and sniffing for smells that might offer clues.

The only thing like a reward in this scenario is more information to act again. The only thing like a punishment might be a hand that fails to extend, i.e. hits a wall, providing more information at very little cost. Gradually, she builds a map of her surroundings, finds moulding, then a wooden door, a knob, and a lighted room in the home of a friend who was apparently not thoughtful enough to place her on a couch.

Reward and punishment blur together to form “information-seeking,” in which even the absence of new information is a type of information (as when the hands fail to contact anything, teaching her that she is not within arm’s-length of a wall). There is no reward that absolves anyone of the obligation to continue seeking information, and the only thing that prevents continuing is death, though we also prefer to avoid amputation and keep our information-seeking as low-cost as possible.

Children who are too young to have developed language skills instinctively grasp this system. They start by literally feeling around in the dark, then testing their own movements, their manipulation of objects, followed by testing how they should interact with the world and others by seeking approval and avoiding unpleasant experiences, adapting to that information, and testing again. They’re learning at all orders of magnitude: from parents, to society, to the hard lessons of gravity.

Where some go astray is to mistake the mechanical reward system employed by their community for the way things work at all levels. It’s hard to think of an example of another species that misleads itself like that. A hawk who misses a rabbit doesn’t hop along the ground crying about how much he deserved it. He goes right back up to circle again. And when he succeeds, the only reward is a full belly and the obligation to do it again tomorrow. In nature, reward is often little more than the absence of punishment. The fact that you’re still living.

Luck
Luck is the great thorn in the flip-flop of Man. A well-taken action that nearly-always paid off in the past can come up empty, or even result in catastrophe. No idget would put a dollar in a vending machine if what they got and whether or not they got it depended on unpredictable factors outside their control. (Just as I typed that sentence I remembered “slot machines”). Those of us who obsess over fairness must despise luck, like a disease to be vaccinated against. All efforts to ensure people are punished and rewarded fairly fall victim to it in some way, the most fundamental level being the circumstances of birth. We want the same actions to have the same results every time, for everyone. (These actions are only the “same” stripped of context and considered as pure abstract objects).

I’ll call luck the confluence of unpredictable forces as they manifest in our lives to boost (good luck) or backhand (bad luck). It’s always part of the equation, because there will always be things beyond our ability to predict, and even more beyond our ability to fathom at all. To admit it’s a factor isn’t to say it’s the only factor, though. If I do everything by the book in planting an orange tree, it might still succumb to a disease that sweeps through the region next year. But if I planted a whole grove, even if more than half are wiped out, I’ll still have my OJ and the means by which to replant from hardy stock. Was that a punishment, or a reward? In nature, they’re not opposites. They’re perspectives. Punishments and rewards are value judgments we place on information. The simple avoidance of punishment can be seen as a reward. So can being harmed to a lesser degree than your peers, some of whom lost 100% of their trees. Relative difference matters.

Again, the rewards and punishments are built-in, and they’re non-linear. Those who think and act well on information collect many opportunities to be rewarded. When some of them inevitably pay off, people will call it “luck.” You could instead call it “antifragility” in Nassim Nicholas Taleb’s terms. Consistent skillful action is a hedge against bad luck, and consistently terrible action is a hedge against good luck (“fragility” in Taleb’s terms). The former will be subject to very few forces that can ruin him, and the latter will protect himself against kindness and good fortune. Those who waffle between sensitive and insensitive actions will be flipping a quarter and calling it in the air.

When a fisherman adds to his trotline, he doesn’t expect to find a fish on every single one, but knows the more lines he sets over the greater distance, the more likely he is to have fried catfish for supper.

Luck is essential in nature because it’s basically all the different forces of nature operating according to their own natures and interacting in surprising ways. The randomness clears out systems that are fragile, and brings novel conditions to survivors. Novelty is essential for evolution. A system without novelty is a closed system, incapable of change, vulnerable to entropy.

To complicate matters, we’re not only subject to those sensitive or insensitive actions that we take, but also those that happen on the level of higher orders to which we belong (family, people at work, the city of Meridian, Mississippi, America, humans, etc.). Someone who acts skillfully but was born in the wrong place or fell into a job involving careless coworkers and high explosives has the potential punishments and rewards of those orders built into his circumstances. Avoiding the blast, he might then lose his job due to an economic recession. If he’s smart, he can spot and hedge against a lot of fragilities, or he may decide to leave and join some other group that makes better decisions. This is essentially what a refugee is doing—avoiding the collective fallout. If ultimately he resettles on the coast of Oregon, and is swallowed by a tsunami when the Cascadia fault finally gives, we might say that bad luck followed him. But even that’s just the result of a communal decision, very long ago, to establish a major settlement in an area prone to catastrophic tsunamis, and the subsequent decision not to relocate when that information came to light. Our poor guy wasn’t being punished for evil, just as he wasn’t being rewarded for good. He just experienced circumstances that followed from his—and others’—success or failure to observe the laws of nature.

What goes up must come down.

Failure to Observe
Attempting to game the system by fixing the rules on the lower order of magnitude in your favor may have a small initial payoff, but in the long term, actions are still subject to the slow, patient, non-linear feedback of higher order systems. Those whose mothers didn’t have to dig their participation trophies out of the trash will have to face the same higher order hardships as others, but will be ill-prepared by comparison. Mechanical reward and punishment are used in early development so that children can associate them with the actions that earned them. Action and consequence need to be tightly coupled, because the young mind can’t yet handle the abstract thinking required to see how things help or harm in the long run. So a candy or a switch makes an effective guide, until individual actions are learned, then understood as classes of action that set up favorable future conditions.

Unearned rewards train people to expect rewards without earning them. It might be necessary to initially reward the most basic example of the act, but as time goes on, standards rise and rewards come less frequently, for example only every third time the standard is met, then every tenth, until an individual understands which classes of action lead to what, and why that makes the actions intrinsically valuable, even if at times there’s pain involved.

On the flip side, unearned punishments can train people to be risk-averse, to assume minor acts will be associated with great suffering, and so to avoid most information-seeking due to past trauma and an inability to accurately assess situations. When immediate-term pain-avoidance becomes a priority, many potential long-term benefits are forfeited. That’s a natural reaction. If you were in excruciating pain that you knew would pass in a day or so, and were offered either a dose of morphine now, or a brand new Tesla tomorrow, many reasonable people would choose the painkiller. A healthy attitude is to identify and seek small pain, small failures, and avoid only the ruinous ones. Avoiding any semblance of pain means the best that can be hoped for is to continue in the current situation, which will degrade over time under the stress of entropy.

Those motivated most strongly by reward are seeking comfort and the ease of homeostasis. But all life requires energy, so energy must still come in from elsewhere, and at a greater rate than it’s spent by the comfort-seeker. It could be gotten with hard work, but that isn’t comfortable. Hard work involves seeking information and spending energy. The greatest rewards tend to be the ones that required the greatest discomfort.

From the standpoint of flow, more energy coming in than going out is an accumulation, and what can’t be stored is lost as waste or soon turns stagnant. Since there are no free lunches according to the laws of thermodynamics, that energy probably came from others’ labor. It’s not hard to see how the seeking of unearned rewards beyond the very early, find-your-feet stages of learning is parasitic to the system that provides them.

Seeing is Believing
We are obsessed with being able to see the consequences. If a criminal isn’t showering in prison, we think he got off scot-free, but that’s just one first-order consequence from a manmade system. We don’t know what unseen misery he endures on a daily basis, or endured his entire life before it led to the crime. The well-paid upper management suit scowling on his way to his 300th meeting of the quarter might not be as happy as the poor Cajun fisherman whistling down his trotline.

Or on the reward side, we get up in arms when the right movie star fails to win a little gold statue, missing the point that the reward of a popular performance is built into her reputation, paycheck, connections made, and ease of finding future work.

First-order cause and effect is useful when shooting pool. Recall that minds use collateral energy, though—the reaction might involve far less or far more energy than the thing that triggered it. Second-order and higher consequences are better thought of as relational than causal. An action, or many, sets up certain conditions. Many different elements in the system interact, transforming the input, and a new condition arises that is related to the old one. Fertile soil, good sun, and plenty of rainfall provide conditions under which we can expect plants to flourish, but they aren’t the cause, nor do they determine how many of what type of seeds happen to fall where. They provide a certain context, in which a certain range of specifics can be expressed. Maybe it’s too wet for desert plants, not wet enough for tropical ones, while those that prefer a Mediterranean climate do well if they can find a way into the soil. Likewise, the seed isn’t the cause of the tree. It doesn’t grow from shag carpet. It’s a relationship between the seed and certain conditions that brings about specific growth, in its own time.

Misalignment
When our notions of fairness don’t align with higher order realities, we can run into situations where the same action that earns a reward on one level earns harm at the next highest level (e.g. the participation trophy, more specifically the complacency it teaches). Or small failures and criticism from peers on the human level can teach perseverance and result in the person outlasting all of them to lead a happy life.

So where do these adult habits of mechanical reward-seeking and punishment-avoidance come from? In large part, they’re a holdover of a system of behavior training in early childhood that’s very useful at the time. The key is being properly weaned and made into an independent adult who can think and act on the second order, beyond the immediate and the abstractly human. Otherwise, you get a 200-lb entitled child.

I’m sure there are a variety of other factors. I suspect one might be the habits of thought that come with a stable, regulated fiat money economy. All goods and services are valued exactly, and in the same monetary terms. Payment and delivery happen in a timely manner. You don’t drop off $20 at the grocery store today, leave with nothing, and maybe collect $40 of goods a week later if that’s what you happen to need. You give it to the clerk for specific items valued at $20 and take them home. Seems obvious, but past societies haven’t always done things that way.

Lacking a stable currency, some system of barter or mutual support would evolve. Persons A, B, and C have different skills that are valuable to the community. Maybe carpentry, farming, and punching jerks in the face. All of them would likely be self-sufficient to a large degree, and their specialized needs would be met by other members in the community. That doesn’t mean that Person C would be unable to get help building a house if Person A doesn’t need any faces punched. Members of the community would serve whichever upstanding neighbors had needs, as those needs arose, knowing that all members of the community would also do so for them. Person A might end up working a lot harder, a lot more often, but he wouldn’t whine about being overworked and underpaid, because his value means his needs are constantly being met, and his lack of relative need may even be a point of pride.

Even in the very recent past—the first half of the 20th century, when paper money abounded—people had stronger family and community ties, which made it less difficult for them to imagine how punishment or reward might come about in nonlinear fashion over time, so they helped out more not because they expected compensation, but for the intrinsic value of the act. A century later, it’s more common for people to feel entitled to help from the community, but fail to contribute value or do the hard work of maintaining the strong community ties that build those mutual support networks.

Generosity
Putting energy out in an intelligent fashion tends to result in energy received, as with the fisherman and his trotline. Had he spent all his energy hanging his lines from the tops of trees in a great pine forest, he would starve (too much out, stupidly, not enough back). Failing to put out lines and demanding to be fed by everyone else, and screaming and shaking his fist when they refused, would be akin to setting a trotline of misery that’s sure to hit eventually. The best situation is flow: the near match of energy out and in, spreading the excess wisely throughout the system (which the recipients might call “luck”), and getting back enough to wake up the next morning and do it again. He shouldn’t expect to receive a reward (a survival boon) without first improving the survival conditions of others.

The Fates
Mechanical reward and punishment are useful for training children. Adults should orient toward seeking information, preferably at little or no cost to themselves, but always at little or no cost to others. It’s a mistake to get caught zoomed-in, seeing only the local and immediate consequences. We must always consider the contexts and the second-order consequences.

The Fates weave with threads of our own actions, as well as the actions of systems to which we belong, all pitted against other individuals and systems, with a healthy dose of luck thrown in. That means they’re not entirely in our control, nor entirely determined. Rather, we’re positioned at the eye of the needle, feeling our way through it all, and what we do with the information we get goes a long way toward deciding where and how it passes next.
