SPEAKER 1: OK, welcome everybody. Thanks for coming to our final lecture of the series. And as we always do, it's the best lecture, the last. It's our very own Joe Halpern.
Joe has a remarkably productive profile. He started out as an applied mathematician, then joined computer science. But he has worked in areas of economics, in psychology and philosophy, and computer science. He's contributed to several areas-- networking, security, artificial intelligence.
So he really spans the spectrum of [? all ?] scholarly activity. So that's amazing. And he will actually today bring us some of the technical work. So we've heard many talks about ethics and the challenges of AI and the future of AI and the challenges of how to give ethics and how to have AI systems behave ethically. Joe is one of the people who's actually spearheading this field, and will lay out hopefully an agenda that will make these machines behave properly.
So when I look at Joe's little bio, I guess, I realized he writes more books than most faculty members write papers. He wins more awards than most faculty [INAUDIBLE] get citations. So that's about as good as it gets. So let's welcome Joe.
JOE HALPERN: I feel like I should lower expectations.
So having sat through all the other wonderful talks we've had so far, and there were just wonderful videos and little kids hugging robots. And this is like a technical talk. So I'll try to make it as accessible as possible.
But the point of this talk is that it's hard to define things carefully. And to get into that, you have to look at the definitions. But let me give you some background.
So what exactly is moral responsibility and intention? So when I started looking at this, it's frightening. You do a web search on Amazon, and just put in "moral responsibility." So how many books have "moral responsibility" in the title? And I gave up when I got to 50.
You go to the Cornell library, and there's shelves of books that have things like "moral responsibility" in the title. So people have looked at this a lot. But what they haven't done very much is actually try to define these notions.
So I'm not a philosopher. But occasionally, I have to read philosophy papers. And the trouble with reading them, in some cases, I feel like as I read from paragraph to paragraph, the definition is shifting under my feet.
What it meant in paragraph one is not quite what it means in paragraph three. And the trouble with that is, if we're going to build computers and we want them to be ethical, whatever that means, we're going to have to build into the computer some notion of what it means to be responsible. We have to have a precise definition. You can't be fuzzy when you're writing code.
So the point of this exercise is to try to be a bit more formal. And it's hard. In the process of writing this paper, so far I've come up with three definitions, each of which I thought was the right one and discarded for the next one.
I've been through this game before. And the real trouble is that it's very hard to prove a theorem saying you have the right definition. So it is slippery. Let me just give you a sense of what's going on.
But, I say, why do we care? So we're building autonomous agents. They're going to have to make moral judgments, and I'll talk about this later.
Germany recently proposed a code for driverless cars. The proposal-- they didn't pass it yet, but they're actually starting to talk about legislation. The proposal specified, among other things, that if a driverless car has a choice between injuring people or damaging property, it should always opt to damage property, not injure people. Which seems very reasonable on the face of it.
But there is this slight problem. Suppose now we're building an autonomous vehicle that has lots of data. It can calculate probabilities, or at least frequencies. And it says the probability of $100,000 property damage is 0.999 and the probability of minor injury is 0.001.
How does it make that tradeoff? I mean, if you just read this, that you always prefer property damage over personal injury-- the trouble is in the real world, it's not a choice between definitely having property damage and definitely having personal injury. These things come with likelihoods attached, and then you have to make tradeoffs. But the law isn't very good about making tradeoffs, and this particular law didn't discuss tradeoffs at all.
So if you think this is just a minor issue, think about this: you're building an autonomous vehicle, and you're caught in traffic and you want to pass the car in front of you. Well, passing the car in front of you entails a very small risk, that there might be a car coming the other way that you don't notice and you have an accident. These things happen when you pass.
So if you're not going to take probabilities into account, you'll never pass any car in front of you. I don't know that we want our autonomous vehicles to drive that safely. So this is an issue. People have been thinking about this for a long time.
And those of you who are avid science fiction-- I was an avid science fiction reader in my youth. I read all of Asimov's books. And Isaac Asimov had his Three Laws of Robotics. And they go like this.
Number one is, "A robot may not injure a human being or, through inaction, allow a human being to come to harm." "A robot must obey orders given it by human beings, except where such orders would conflict with the First Law." And, "A robot must protect its own existence, as long as such protection does not conflict with" the first two laws.
So the trouble is that it's really hard to apply these laws. And in fact, most of Asimov's short stories in robotics were about the conflicts when it came to applying these laws because there were tradeoffs. We'll see, in fact, some examples later.
What if preventing harm to one person causes harm to another? How do you apply the First Law? You don't want to allow anybody to come to harm. But whatever you do, there's a chance that somebody will come to harm. You just have to trade off who it's going to be, and maybe how much harm it's going to be.
We get moral issues like-- we are in a situation. We're building robots that will help elderly people. This is a big deal in Japan. Now they have a rapidly aging population. They don't have enough caretakers for the rapidly aging population, so they're very interested in having robots assist.
Well, suppose an elderly patient wants to commit suicide. Is it appropriate for the robot to assist her? That's clearly an ethical issue. So these are the kinds of things we're going to have to design code to deal with.
I should just add, in passing, that in one of his later novels, Asimov introduced a Zeroth Law. That, "A robot may not harm humanity, or, by inaction, allow humanity to come to harm." I think it's great that he did that, because as we heard in some of our earlier talks, there's this concern that robots will get to the point where they're super-intelligent, and things that the robot could do could indeed harm humanity. We're not talking about individual people. But that's not the point of this talk.
So let me-- how many people have not heard of the trolley problem? A few of you haven't. Most of you have, it sounds like. So this is getting a lot of press.
So this is a problem that goes back to Philippa Foot, a philosopher at Oxford and Cambridge. She moved from one to the other, I believe. And then examined more carefully by Judith Thomson, who was at Harvard, other philosophers and many, many, many other people since. There are whole books written about the trolley problem, which are actually fun to read.
So it goes like this. Suppose a runaway trolley is heading down the tracks. There are five people tied up on the track who can't move, and the trolley is going to run them over and kill them if it continues.
You can't stop the trolley, but you've got a lever in front of you. You can shift the lever. And instead of going down the main track, it will go down the side track.
The bad news is, down the side track, there is one person who's tied up there, and that person will get killed if you pull that lever. So what should you do? That's the trolley problem. And what's your degree of moral responsibility for the outcome if you do or you don't pull the lever? So this is a problem that's caught the attention of a lot of people.
While we're at it, there are a number of variants to this problem. How would you feel? Would you feel different if instead of just moving the lever to the right, you actually-- there was a fat man sitting on top of a bridge. And the way you got to stop the train is you pushed the fat man over.
Now, it's the same problem. Either one person dies. So the fat man, being fat, will stop the train. That's the point. And so if you push the fat man over, one person will die. If you don't, five people will die.
So structurally, it's the same as this. Do you feel differently about it? Do you feel differently about pushing the fat man than you would about shifting the lever? People do.
I'm not saying if there's a right answer or a wrong answer. I'm pointing out that. And if you feel differently, why?
AUDIENCE: [INAUDIBLE] what if there's a 90% chance [INAUDIBLE]?
JOE HALPERN: Don't confuse me with probabilities.
Let's look at a more modern version of the problem. This actually appeared in a paper in Science last year-- less than a year ago, actually. So it's called "The Social Dilemma of Autonomous Vehicles."
It's the title of the paper. "Should an autonomous vehicle swerve and kill its passenger when otherwise it would kill five pedestrians?" So again, structurally the same.
Now, what was very interesting about this paper, these guys are not philosophers. They're computer scientists and, to some extent, social psychologists. So they actually asked people what they would do. And it turns out that people thought it should swerve and kill its passenger rather than killing the five pedestrians. They thought that was the right thing to do.
But they wouldn't buy a vehicle that was equipped this way. So they want other people to buy that. They don't want to buy it themselves. My car should protect me. Is that OK?
And notice, by the way, before we go on, this would be a conflict in Asimov's Laws of Robotics. That's why they're not so helpful. If you think of the car, an autonomous vehicle, as a robot-- which it could be.
I mean, it's got intelligence. Even though it's not humanoid, you should think of it as a robot. The laws say you shouldn't harm anybody. But here, you got to harm somebody. You just have a choice between harming one person or five. Yeah?
AUDIENCE: Right now, you're talking about the driverless cars being hacked, just like everything else is being hacked. And like I mentioned before at these lectures, the technology is moving quicker than the solutions for it. So we're putting out products before we could actually secure safety.
JOE HALPERN: So it's certainly true that security in these cars is a major issue that I'm certainly not going to talk about in this talk. It's yet another thing we should worry about. We're bringing out driverless cars. And for us as a society, there are going to be tradeoffs.
My sense is the driverless cars will probably save many, many more lives than they hurt, because I think the autonomous vehicles are ultimately going to be much safer than human drivers. But we definitely do run the risk of them being hacked, and people get hurt that way. So there are tradeoffs. There's no easy answers here.
So let me start talking a bit technically. And if you read the philosophy papers, there seems to be general agreement that moral responsibility involves causality. Another tricky word there. Agent a can't be morally responsible for an outcome that he didn't cause, or it didn't cause.
We're talking about autonomous vehicles. So now you've reduced the problem-- you haven't reduced it. But part of the problem of defining moral responsibility is defining causality. But let me come back to that in a second.
Another part of it is-- let me use the word "blameworthiness." To what extent is agent a to blame for the outcome? And what does that mean? Very roughly speaking, it means what could a have done to prevent the outcome from happening.
So to look at that, to explain blameworthiness, you have to look at the options that the agent had. So people do say, don't blame me. There's nothing I could have done about it.
We hear that in English. We want to make that precise. What are your options?
And the last thing that at least some people think matters for moral responsibility is intent. Did you want the outcome to happen, or was it an unintended byproduct of your real goal? So you could say, well, in the trolley problem, even though if you push the lever one person is going to die, you didn't intend for him to die. That was an unintended byproduct of your real goal, which was to save the five people.
But again, we want to make it precise. What does it mean that you intend something or don't intend something? I mean, I'd like to be able to say on the witness stand it's true that one person died.
But I didn't intend to kill him. I intended to save five people. That was my true goal. But what does that mean exactly?
So let me just say, as an aside, that not everybody agrees that intent is relevant when you're ascribing moral responsibility. On that view, it doesn't matter what you intended. All that matters is what actually happens. Your intent is irrelevant. So again, we could think about that.
Let me start talking about the first point, which I could give quite easily a whole talk about, which is causality. The literature considers two flavors of causality. There's what's called type causality. That's like smoking causes cancer. Statisticians look at that a lot.
Causality is not correlation. So we want to say smoking that-- we know that smoking is correlated with cancer. But we want to say more, that it is a cause. What does that mean exactly?
The other kind of causality, if you like, is token, or actual, causality. And that's the fact that Willard smoked for 30 years caused him to get cancer. So token causality looks at a specific person, rather than the general statistical claim that smoking causes cancer. So for moral responsibility, we care about token causality, because we're typically asking, are you responsible for this outcome? So we're talking about an individual person.
To give you a sense of difference, I can say something like, well, it's true it was pouring rain last night, and it's true I was drunk. But the cause of the accident was the faulty brakes, which is why I'm suing GM. So notice, in this case, this is token causality. I'm saying in this particular case, the cause of the accident was the faulty brakes. As far as type causality goes, we know in general that lousy weather causes accidents.
Like tonight, there are probably more accidents than there were the night before. Drunk driving certainly causes accidents. But I'm trying to say, in this particular case, those weren't the causes. So although we have type causality, that's not the actual cause in this case. What exactly does that mean?
So the intuition that people have used to capture token causality depends on what are called counterfactuals-- statements that are counter to fact. Now, it sounds like a big word. But I claim you're all used to thinking this way. The idea is to say that A is a cause of B if, had A not happened, B wouldn't have happened. That sounds like a lot.
But again, think about the drunk driving example. I want to say it's not the case that my drunk driving was the cause of the accident. Even if I hadn't been drunk, I still would have had the accident.
It's not the lousy weather that was the cause of the accident. Even if the weather had been great, I still would have had the accident. It's the faulty brakes that are the cause of the accident, because if the brakes hadn't been faulty, I wouldn't have had the accident. I drive great when I'm drunk.
So that's where the counterfactual comes in. And that's the kind of reasoning you have to do for causality. You have to think about what would have happened had this not happened. And in fact, in the law, the notion of causality is-- causality comes up all the time in the law. And the notion of causality used in law is exactly this notion of but-for causality.
So you talk to lawyers, they will tell you about but-for causality. The trouble is, lawyers also know that but-for causality is not enough. It doesn't work. So let me give you an example of where it doesn't work.
This is an example due to a philosopher named David Lewis, but there are lots and lots of examples in this spirit. So Suzy and Billy are both throwing rocks at a bottle. They're both great shots. They both want to hit the bottle.
But in fact, Suzy threw a little bit harder, or threw a few milliseconds earlier. Doesn't matter. So it was her rock that hit the bottle, and the bottle shattered.
Now, most of us, I think, would say that Suzy is a cause of the bottle shattering, and not Billy. Despite the fact that Billy threw, his rock didn't hit the bottle. So he's not the cause. Are we together?
OK, the trouble is, if you think about the standard but-for definition, the one they use in the law, had Suzy not thrown, the bottle still would have shattered. It doesn't work. Unlike the case with the driving, where I say, well, had the brakes not been faulty I wouldn't have had the accident. It's the brakes.
Here it's not the case. I want to say Suzy is the cause, but she's not a but-for cause. Had Suzy not thrown, the bottle would've shattered anyway. Billy would have shattered it. Does that make sense?
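The but-for test described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the encoding of each story as a Boolean function are assumptions made here, not something from the talk, and the toy accident model simply assumes that only the brakes matter.

```python
def accident(drunk, rain, faulty_brakes):
    # Assumed toy model of the driving story: only the brakes matter.
    return faulty_brakes

def bottle_shatters(suzy_throws, billy_throws):
    # Either rock on its own is enough to shatter the bottle.
    return suzy_throws or billy_throws

def is_but_for_cause(outcome, actual, variable):
    """True if the outcome occurred and flipping `variable` alone prevents it."""
    flipped = dict(actual, **{variable: not actual[variable]})
    return outcome(**actual) and not outcome(**flipped)

crash = {"drunk": True, "rain": True, "faulty_brakes": True}
throws = {"suzy_throws": True, "billy_throws": True}

brakes_result = is_but_for_cause(accident, crash, "faulty_brakes")
drunk_result = is_but_for_cause(accident, crash, "drunk")
suzy_result = is_but_for_cause(bottle_shatters, throws, "suzy_throws")
print(brakes_result, drunk_result, suzy_result)  # True False False
```

The last result is the problem case: the test says Suzy's throw is not a but-for cause, even though intuitively it is the cause, because this simple encoding cannot see whose rock actually hit.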
So getting a good, formal definition of causality is hard, because you have to be able to deal with examples like this. And trust me, the philosophy literature has hundreds. Every time somebody tries to propose a definition, a bunch of philosophers get to work, write a paper saying that won't work.
Here's an example. I speak from experience, having been on the other end. Lawyers have it even better. All the examples that the philosophers come up with, there is case law.
And we can't say so-and-so shot somebody else. Somebody poisoned them and somebody else shot him. He died because of the shot. This really happened. And we want to say the shot is the cause, not the poisoning.
And it's exactly this kind of thing, because even if the guy-- so I was at a workshop about a year and a half ago. I was, like, the token computer scientist, and there were all these philosophers and lawyers. And they were playing us a cassette tape of a Supreme Court case, where the justices were trying to decide-- it went all the way up to the Supreme Court, where the justices were trying to decide causality.
And it was a case of a drug dealer-- someone who died of a drug overdose, but he had a very bad heart condition, and would have died a couple of days later. So they're trying to charge the guy who sold him the drugs with murder. But it wasn't but-for causality.
You can hear Justice Kagan saying, this isn't but-for causality. How do you explain this, counsel? So lawyers haven't come around.
I'm not going to talk about that. If you're interested, I could easily give a whole hour talk. But let me tell you the models that we use for this, because that will come up.
So this is a bit of formalism. Bear with me. It's not bad.
So this is an idea due to Pearl, who's a very well known computer scientist. We want to think of the world as being described by a bunch of variables that affect each other. The variables are things like I drove, yes or no. So the variable is whether or not I drove.
So it either has value 0, I didn't drive, or 1, I did. Or I was drunk, yes or no. The car's brakes were faulty, yes or no. I mean, the variables can have a lot more values.
So I mean, things aren't always 0, 1. So think of variables that have some set of possible values, and the variables affect each other. So you could have another variable:
I had an accident, yes or no. What are the things that affect whether or not I have an accident? Things like whether or not it's pouring rain or whether or not I'm drunk, and so on.
So Pearl suggested we have models, we have a bunch of variables, and they're related by equations. And it becomes useful in these models-- bear with me, it's not that bad-- to split them up into two sets. There's what we call the exogenous variables. Those are the ones-- it's a modeling question.
We can decide these are the things we're going to take for granted. This is the way the world is. We can't affect these variables.
And then the endogenous variables are the ones that get affected by other variables that we can do something about. And then we have equations that relate the variables. And we have an equation for each variable that describes its value in terms of other values. The details don't matter here.
But let me give you an example of how these models are used, because this is going to come up. So imagine we have two arsonists who drop matches in different parts of a dry forest. Trees start to burn. And let's look at two scenarios.
One I'm going to call the disjunctive scenario. It takes only one match to burn down the forest. If I drop a match, just me, that starts the tree going. And then the whole forest blazes.
And the conjunctive scenario is where you take two matches to get the forest burning. One is not enough. It doesn't create a big enough fire. And we can describe these scenarios using a causal network.
So the nice thing about these structural equations is you can represent them using networks. And I'll explain this in a second. For those of you who know about Bayesian networks, who've taken, like, AI courses-- so Pearl was the person who did Bayesian networks. But then he moved from just modeling probability to modeling causality, but used the same kinds of models. Just an aside for people who've seen that.
So here's the idea. We have, in this case, four variables. Let's say ML1 is the first arsonist drops a match. It has value either 0 or 1. He either drops it or he doesn't.
ML2 is the second arsonist drops a match. FB stands for the forest burns down. Either it does or it doesn't. And the U at the top is an exogenous variable that determines whether or not these guys want to drop matches.
I don't know. They had an upbringing where they got into dropping matches. Or they were in a bad mood, and they just want to burn down the world because they're in a bad mood. So it's whatever it is that it affects their psychological mood that causes them to drop the match, yes or no.
But that's outside the scope of the model. There's just some external factors that make them decide to drop matches or not. And you can see, there's an arrow from U to ML1, U to ML2, and from ML1 and ML2 to FB. And what that's supposed to indicate is, the equation for ML2 just depends on U.
So technically, there's an equation for every variable, other than the ones at the top. There's no equation for the exogenous variable at the top. But there's an equation for ML1 and ML2 and FB, and it describes the value of that variable in terms of its parents in the graph, in terms of the nodes that are pointing to it in the graph. So something determines whether the first-- U determines whether or not the first arsonist wants to drop the match. U determines whether the second arsonist drops the match. That's outside the scope of the model.
The interesting thing is, the forest burns. So I said we have two models. One model is the disjunctive model, which says the forest burns down, FB is 1, if either of these is 1.
For those of you not familiar, this is an "or." It just says this is 1 if either this is 1 or this is 1. That's just saying the forest burns down if at least one of the guys drops a match.
The conjunctive scenario says the forest burns down if both of them drop a match. This is an "and." So the forest burns down if the first arsonist drops a match and the second arsonist drops a match.
So that's the difference between the two models. They have just different equations for FB. Does that make sense? That's as technical as it's going to get. Greg.
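As a rough sketch of the two arsonist models just described, here is one way to write the equations in Python. The variable names ML1, ML2, FB, and U come from the talk; encoding U as a pair of 0/1 values (whether each arsonist wants to drop a match) and the function name `solve` are assumptions made for illustration.

```python
def solve(u, mode):
    """Evaluate the arsonist model top-down from the exogenous value u.

    u is an assumed encoding of U: a pair (first drops, second drops) of 0/1 values.
    mode selects the equation for FB: "disjunctive" (or) or "conjunctive" (and).
    """
    ml1 = u[0]                    # equation for ML1 depends only on U
    ml2 = u[1]                    # equation for ML2 depends only on U
    if mode == "disjunctive":
        fb = max(ml1, ml2)        # one match is enough
    elif mode == "conjunctive":
        fb = min(ml1, ml2)        # both matches are needed
    else:
        raise ValueError("mode must be 'disjunctive' or 'conjunctive'")
    return {"ML1": ml1, "ML2": ml2, "FB": fb}

one_match_or = solve((1, 0), "disjunctive")
one_match_and = solve((1, 0), "conjunctive")
print(one_match_or["FB"], one_match_and["FB"])  # 1 0
```

The two models differ only in the equation for FB: with a single match dropped, the forest burns in the disjunctive model and does not in the conjunctive one.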
AUDIENCE: [INAUDIBLE] Do you [? really mean ?] equations, or do you mean [INAUDIBLE]?
JOE HALPERN: I meant equations. I mean, you can think of them as definitions. So the key point of these models is they tell me the counterfactual. They tell me what would happen if the first guy didn't drop a match.
AUDIENCE: [INAUDIBLE] directionality.
JOE HALPERN: Yeah, so I actually had it in the previous slide. So think of this as an assignment statement, like in a program. So the fact, FB is assigned this value. But that doesn't mean that you can figure out this value knowing these two. So there's definitely a direction here.
So yeah, so the equality is misleading. It should be an assignment statement. So if that's what you meant, yes, you're absolutely right. I think it actually even-- yes. So that bottom line was supposed to say that, that there was directionality.
So I'm going to be mainly interested in these models. That's the models I'm going to use. So intuitively, the arrows in this model determine the flow of causality. So an arrow going this way says the value of U is causing intuitively-- I haven't defined cause, the value of ML1 and ML2, and the values of ML1 and ML2 are together determining the value of FB. And the causality goes down.
So I'm going to restrict attention in this talk to scenarios where the causal network is what computer scientists would call acyclic. There's no cycles. So you can't have A affecting B, which affects C, which affects A. So if you think of these variables as being time-stamped, where further down is later, we don't have reverse causality.
So that's all I'm going to say about causality here. I'm going to assume I have a definition of causality, because I want to get to other stuff. But I have a whole book on it if you're interested.
I haven't written that many books. But it's one of the ones I've written. So that's part of it.
So suppose, let's say for the purpose of this talk, we have some formal definition of causality. I went through all that stuff before because I'm still going to use those causal networks, even for the other definition. So the definition is based on the causal networks. It tries to capture this intuition of counterfactuals.
So I should say before I leave it, these networks and these equations make it possible to describe the counterfactual. Because I can ask the question, what would have happened? So suppose I'm in a setting where both guys dropped matches. I can ask the question, what would have happened had this guy not dropped the match? Because I have the equations to tell me, and there's operations I can do on models.
And I can look at the model where both guys drop matches. Let's look at the model that would describe the situation had the first guy not dropped a match, or had the second guy not dropped a match, or had both of them not dropped the match. This framework will let us do that formally.
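The counterfactual questions above can be sketched as an "intervention" that overrides a variable's equation with a fixed value and then re-evaluates everything downstream. This is a minimal Python illustration, not the formal definition from the talk; the `do` parameter name, the encoding of U as a pair of 0/1 values, and the function name `evaluate` are all assumptions.

```python
def evaluate(u, mode, do=None):
    """Evaluate the arsonist model, optionally forcing some variables.

    u: assumed encoding of U as a pair (first drops, second drops) of 0/1 values.
    mode: "disjunctive" or "conjunctive" equation for FB.
    do: optional dict of variables to set by hand, ignoring their equations.
    """
    do = do or {}
    equations = [  # listed in an order where parents come before children
        ("ML1", lambda v: u[0]),
        ("ML2", lambda v: u[1]),
        ("FB", lambda v: max(v["ML1"], v["ML2"]) if mode == "disjunctive"
                         else min(v["ML1"], v["ML2"])),
    ]
    values = {}
    for name, equation in equations:
        values[name] = do[name] if name in do else equation(values)
    return values

both_drop = (1, 1)
actual = evaluate(both_drop, "disjunctive")
first_absent_or = evaluate(both_drop, "disjunctive", do={"ML1": 0})
first_absent_and = evaluate(both_drop, "conjunctive", do={"ML1": 0})
neither_or = evaluate(both_drop, "disjunctive", do={"ML1": 0, "ML2": 0})
print(actual["FB"], first_absent_or["FB"], first_absent_and["FB"], neither_or["FB"])
# 1 1 0 0
```

In the setting where both arsonists dropped matches, removing the first match alone changes the outcome only in the conjunctive model; in the disjunctive model both matches have to be removed.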
So it's not that super hard. But I would need another half an hour just to explain that. So let me not do that. Pretend you get the general flavor.
But assume we have a definition of causality. That's only one of the ingredients we need. We want to model uncertainty.
So the whole point here is that, in general, before you perform an action, first of all, you might not know what the causal model is. You might not know how the world works. You do something, you don't know what's going to happen. If you turn left, who knows what's coming down the pike. And so you might not know the causal model, and you might not know the context.
So in terms of the arsonist example, you might not know if it needs two matches to get the forest burning or one. So you might not know if the conjunctive model or the disjunctive model is the right one. You also might not know if the other guy's dropping a match.
So in this framework, you have uncertainty about two things. You have uncertainty about what's the right causal model. How do the equations work? And you have uncertainty about what actually happened.
And the way we're going to model what actually happened is, what is the value of the exogenous variable. Because once you give me the value of the exogenous variable, everything else is determined. So let me say that slowly. I'm not sure I stressed it enough.
But back here, the whole point of having these acyclic networks is, once you tell me what the value is of the variable at the top, everything else flows down. And I can tell you the values of ML1 and ML2. Once I know ML1 and ML2, I know FB. The whole point is, with these acyclic networks, once you tell me what happens with the exogenous variables that determines everything else. You had a question.
AUDIENCE: So considering the two scenarios where the two matches are an and or an or function, and so in each case, person 2 drops the match, and in each case, the forest burns down. In one case, is the person somehow more guilty than in the other case?
JOE HALPERN: Well-- so that's a good question. That's part of what the definitions are intended to capture. So I would say, in both cases, both disjunctive and conjunctive, suppose both people drop matches.
So in the conjunctive case, you're a but-for cause. Had you not dropped the match, the forest wouldn't have burned down. According to our definition, in the disjunctive case, you're also part of the cause. The two of you together, so to speak, are the cause, because you need to change both things to get a different outcome. I have a notion of degree of responsibility, and you're less responsible.
But the way it's going to play out here is, suppose you're trying to decide whether or not to drop the match. So you're looking at it before it's happened. And now you can say, OK, what's the probability that I'm going to make a difference? So let me get there.
So it's a good question. It turns out that there's a number of closely related notions of responsibility. That you could say-- let me try to answer the question a bit, and I'll get more formal.
So think about a vote where it's 11 to nothing versus 6 to 5. So somehow, you'd like to say in an 11 to nothing vote, OK, everybody who voted for the guy is responsible for the outcome. But somehow, you feel less responsible if it's 11 to nothing than if it's 6 to 5.
That's even after the fact. We voted, we look at what happened. And I can say, yeah, somehow it's less. So I have a definition of what's called degree of responsibility, which is how many things do you have to change before you become critical? It sounds complicated. It's not.
In the 6 to 5 case, you're already critical. If you flip your vote, that changes the outcome. So your degree of responsibility is basically 1. In the 11 to nothing case, you have to change five other votes before you become critical. Changing five votes will make it 6 to 5, and then you become critical.
So degree of responsibility is 1 over 5 plus 1, that is, 1/6. So in general, your degree of responsibility is 1 over the number of things you have to change plus one. It turns out to be a naive definition, but it turns out to match-- people have done experiments on this definition. It seems to match somewhat the way people ascribe responsibility.
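The voting example can be worked through in a short Python sketch. It assumes, for illustration only, that the outcome is decided by strict majority and that "changing" a thing means flipping one other winning-side vote; the function names are hypothetical.

```python
from fractions import Fraction

def changes_needed(votes_for, votes_against):
    """Smallest number of OTHER winning-side votes to flip before one
    winning-side voter becomes critical (assumes strict majority wins)."""
    k = 0
    # After flipping k others and then my own vote, does the winner still win?
    while (votes_for - k - 1) > (votes_against + k + 1):
        k += 1
    return k

def degree_of_responsibility(votes_for, votes_against):
    """1 / (k + 1), where k is the number of changes needed to become critical."""
    return Fraction(1, changes_needed(votes_for, votes_against) + 1)

close_vote = degree_of_responsibility(6, 5)
landslide = degree_of_responsibility(11, 0)
print(close_vote, landslide)  # 1 1/6
```

In the 6 to 5 vote no other changes are needed, so the degree is 1; in the 11 to nothing vote five other votes must change first, giving 1/6.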
That's one notion of responsibility. It's not quite the notion of moral responsibility. I'm getting there. But it just shows that the power of this framework lets you answer kinds of questions like that.
And in English, I think we have words like "blame" and "responsibility." And they're more or less interchangeable in English, but it turns out I'm going to argue there are two or three closely-related notions that are different, and we need to tease them apart. Actually, I'm not going to argue it so much in this talk. But I think that's the case.
So back to [INAUDIBLE]. So in general, as I said, an agent has uncertainty about what the causal model is. An agent has uncertainty about what actually happened. So I'm going assume, which is probably not reasonable, that you have a probability distribution on that uncertainty.
It's more reasonable when you think about autonomous vehicles or computer programs in general, because it is reasonable to assume they're getting data. They're getting lots and lots of data, and they can use that statistical information to get probabilities. With people, I think it's not quite so reasonable to assume that we're willing to assign numbers to the world. And I actually might have some time at the end to say more about that. But let me keep going.
So the whole point is that, because of this uncertainty, an agent doesn't know whether performing an action act will actually cause an outcome. You don't know what's going to happen when you do something, because you don't know the way the world works, or because you don't know what the other people are doing. But you can compute the probability that your action will cause something.
Essentially, you have a probability on models. In each model, you can figure out whether performing your action causes a certain outcome. So now, you can figure out the probability that your action will cause the outcome. Does that make some sense?
Again, you don't have to be an expert in probability here. All I'm saying is, you don't know the way the world works. You don't know what actually happened. So if I think that there's a chance that it's the disjunctive model, it takes two matches.
Now, what's the probability of my dropping a match, causing the forest to burn down? Well, it's exactly the probability that the other guy dropped a match if it's the disjunctive model. If it's the conjunctive model, then if I don't drop the match, then the forest won't burn down.
If I do, it will. I'm going to be a but-for cause. So I can compute the probability that my action will make a difference. I can compute the probability that I'll essentially be a but-for cause for the outcome, once I have a probability on models.
And the definition of degree of blameworthiness looks like a lot of words. The idea is really simple. So let me say in English, before you read the slide, I'm about to perform an action. Call it "act." I'm comparing it to a different action, act prime.
For each of those actions, I can say, well, how likely is it that that action will bring about a particular outcome? It could be that this one would bring it about with probability 2/3. That one would bring it about with probability 1/2. So I can look at the difference, to what extent is this action making the outcome more likely than that one? The difference, 2/3 minus 1/2, is 1/6.
So your degree of blameworthiness is the biggest difference. So you can say I'm to blame to degree 1/3 if I can make this outcome 1/3 less likely with probability of 1/3, compared to anything, compared to what I'm doing. What's the biggest difference? To what extent can I affect the outcome?
If you want to say, there's nothing I could have done, this outcome is going to happen no matter what I do, then your degree of blameworthiness is 0. It happens with probability 1 if I do action act. It'll happen with probability 1 if I do anything else.
Don't blame me. I couldn't have done anything about it. No matter what I did, the outcome would have happened. But if it happens with probability 1 with my action, but there's another action that would only make it happen with probability 1/2, then my degree of blameworthiness is 1/2. So degree of blameworthiness, it's got to be positive, because the [? lower bound of using ?] this action comparing act to itself gives you 0 in anything.
So the biggest difference is going to be a positive number. Your degree of blameworthiness is a number between 0 and 1. And think of the numbers representing the degree to which you could have made a difference in the outcome. [? Armand, ?] you had a question.
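As a rough sketch of the "biggest difference" idea just described: given, for each available action, the probability that it brings about the outcome, the degree of blameworthiness of the chosen action is the largest amount by which some alternative would have lowered that probability. The function name and the dictionary layout are illustrative assumptions, not notation from the talk.

```python
from fractions import Fraction

def degree_of_blameworthiness(chosen, prob_outcome):
    """prob_outcome maps each available action to the probability that it
    brings about the outcome. The chosen action is compared with every
    action, including itself, so the result is never below 0."""
    p_chosen = prob_outcome[chosen]
    return max(p_chosen - p_alt for p_alt in prob_outcome.values())

# The 2/3 versus 1/2 comparison from the talk.
example = {"act": Fraction(2, 3), "act_prime": Fraction(1, 2)}
print(degree_of_blameworthiness("act", example))

# Nothing I could have done: every action yields the outcome for sure.
unavoidable = {"act": Fraction(1), "act_prime": Fraction(1)}
print(degree_of_blameworthiness("act", unavoidable))
```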
AUDIENCE: So in the 11/0 voting case, [INAUDIBLE] not to blame at all [INAUDIBLE]?
JOE HALPERN: So there, the 11 to 0, you'd have to have a probability. So in that case, you'd say we're looking at it before you've performed the action. So if you ascribe probability 1 to at least 6 other people voting, then indeed, there is nothing you could have done to affect the outcome.
Does everybody understand? So in the 11 to nothing case, after the fact, we can see the vote was, in fact, 11 to nothing. But before you voted, you didn't know it was going to be 11 to nothing. But you had, let's say, some probability of what the other people are going to do.
So you can ask, what's the probability that I could have made a difference? So if you're certain that at least seven other people or six other people are going to vote, no matter what you do, then your degree of blame is zero, and you can say it didn't matter what I did. That was going to be the outcome.
And this actually comes up in Senator cases. People are willing to vote for something if they know [INAUDIBLE] actually willing to vote against it if they know it doesn't make any difference, because they want to be able to look good to the folks at home-- I didn't vote for it. But I didn't vote for it because my party assured me that it was OK not to vote for it, because they already had enough votes. So it was OK. So people do think about things like that.
AUDIENCE: So in that case, the degree of blame is exactly the probability of 5?
JOE HALPERN: In the 11 to nothing case, yes, it would be a probability of 5. In general, it's more complicated. But yeah. So again, intuitively, the degree of blameworthiness measures the extent to which performing an action, other than the one you're contemplating doing, can affect the outcome.
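To make the "probability of 5" answer concrete, here is a small sketch for an 11-person vote where 6 votes are needed. It assumes, purely for illustration, that each of the other 10 voters votes yes independently with the same probability p; that independence assumption and all names below are mine, not part of the talk.

```python
from math import comb

def prob_exactly(k, n, p):
    """Probability that exactly k of n independent voters vote yes."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k, n, p):
    """Probability that at least k of n independent voters vote yes."""
    return sum(prob_exactly(j, n, p) for j in range(k, n + 1))

def blame_for_voting_yes(p, n_others=10, needed=6):
    """How much more likely the measure is to pass if I vote yes than if I
    vote no. This equals the probability that exactly needed - 1 of the
    others vote yes, i.e. that my vote is the deciding one."""
    pass_if_yes = prob_at_least(needed - 1, n_others, p)
    pass_if_no = prob_at_least(needed, n_others, p)
    return pass_if_yes - pass_if_no

print(blame_for_voting_yes(0.5))  # my vote might matter
print(blame_for_voting_yes(1.0))  # everyone else votes yes for sure
```

When the others are certain to vote yes, the difference is zero, which matches the "nothing you could have done" case.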
Now, it sounds really straightforward. It's not quite so easy. And so let me point out some issues.
And one we've already pointed out, with the 11 to nothing vote, let's take it with the arsonist. To what extent is one of the arsonists to blame for the forest fire? Well, obviously, it depends on how likely the conjunctive case is versus a disjunctive case. You'll have a probability on that.
If it's conjunctive, for sure you're to blame, because obviously, you could have made a difference by not dropping your match. How likely is it the other arsonist dropped a match? But suppose each arsonist thinks that, with high probability, we're in a disjunctive scenario, so all you need is one match. And I'm also almost certain that the other guy's going to drop a match. So that says that each one has a low degree of blameworthiness, according to this definition.
And you might say, well, that's weird. So here, there's two arsonists. And both are maybe even correctly convinced that, almost certainly, the other guy is going to drop a match. So therefore, they both drop matches, and they both say, nothing I could have done. The other guy was going to do it.
And so the other thing I'm looking at now, but I'm just trying to point out-- that's why this stuff is not so easy. That between them, they were the cause of the fire, the two of them. And so although each individual has a low degree of blameworthiness, the group, the two of them together, have a very high degree of blameworthiness. And I think this is something we do in practice, that this comes up.
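The gap between individual and group blameworthiness in the arsonist story can be sketched numerically. The probabilities below (90% disjunctive, 95% that the other arsonist drops a match) are made-up numbers, the beliefs about the model and about the other arsonist are assumed independent, and treating the pair as a single agent choosing a joint action is just one possible way to model the group, not a settled definition.

```python
from fractions import Fraction
from itertools import product

def fire(model, a_drops, b_drops):
    """Conjunctive: both matches are needed. Disjunctive: one is enough."""
    if model == "conjunctive":
        return a_drops and b_drops
    return a_drops or b_drops

def models(p_disjunctive):
    return (("disjunctive", p_disjunctive), ("conjunctive", 1 - p_disjunctive))

def individual_blame(p_disjunctive, p_other_drops):
    """Blame of arsonist A for the fire: P(fire | drop) - P(fire | don't)."""
    def p_fire(a_drops):
        return sum(
            p_model * p_b
            for model, p_model in models(p_disjunctive)
            for b_drops, p_b in ((True, p_other_drops), (False, 1 - p_other_drops))
            if fire(model, a_drops, b_drops)
        )
    return max(p_fire(True) - p_fire(alt) for alt in (True, False))

def group_blame(p_disjunctive):
    """Blame of the pair, treated as one agent that chose (drop, drop)."""
    def p_fire(a_drops, b_drops):
        return sum(p for model, p in models(p_disjunctive) if fire(model, a_drops, b_drops))
    chosen = p_fire(True, True)
    return max(chosen - p_fire(a, b) for a, b in product((True, False), repeat=2))

print(individual_blame(Fraction(9, 10), Fraction(19, 20)))
print(group_blame(Fraction(9, 10)))
```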
So how many people have heard of the tragedy of the commons? Most of you have. Let me just repeat it for those who haven't.
So let me take one version of it with overfishing. So you have a bunch of fishermen. And this is really happening now. That if they all overfish, there won't be enough fish left to reproduce, and no more fishing next year.
So obviously, if each of them is convinced that all the other guys are going to overfish, they might as well overfish, too, because their family needs to eat this year. So there's no point in cutting back, because if everybody else overfishes, next year, there's not going to be any fish, no matter what I do. But together, we could do something different.
So we can talk about, in this definition, what's the degree of responsibility of the group? The group has a degree of responsibility 1. Clearly they could have done something about it all together. And I think that we, as a society, think about the interplay between individual responsibility and group responsibility.
And again, I'm not answering the questions of what they're-- I'm not trying to say what is the right thing here. But I am trying to say that this framework lets us look at group responsibility, as well as individual responsibility. And that's something we need to think about. There was a question back there.
AUDIENCE: Yeah, I think the tragedy of the commons [INAUDIBLE] the arsonists think of that [INAUDIBLE]. It's not what you call an acyclic network.
JOE HALPERN: The network, in terms of causality, is acyclic. There's game-theoretic concerns about what their beliefs are. So when I look at the network, I'm taking out beliefs.
So you're right. I look in your eyes. You look at mine. And I'm pretty sure you're going to do it. You're pretty sure I'm going to do it.
And that is cyclic. I'm taking that out when I look at these networks. But I understand your point.
So let me just say I understand the point, and there are real issues when you bring up what are called game-theoretic concerns. That's a whole other layer when you're talking about societal things. It's just one of the many complications with it. So you do need to think about that.
So this model is going to be acyclic if you take out the beliefs. But once you add the beliefs, you're right. There's a whole other set of issues. Question here.
AUDIENCE: Can you define the cost of not cooperating as the difference between the degree of blameworthiness of the group and the total degree of [INAUDIBLE]?
JOE HALPERN: You could. So you could start to think about how do I bring the group into this. And again, there are lots of ways. There's more than one way of doing it. So let me not get into detail.
When I say that nobody has looked at this, I really mean this is-- I mean, despite all these books, there are really no formal definitions. And I feel like, personally, I'm groping my way in this thicket of trying to figure out what are the issues and how to think about them. I don't want to claim that I've solved all the problems here. So that's why the title said "in search of definitions," rather than trying to say, got it, there they are.
Let me point out one other-- so Paul, let me just do this, and I'll [? get to you. ?] Here's another subtlety. So suppose you've got a doctor who uses a drug to treat a patient, and the patient dies thanks to side effects due to the drug.
Now, the doctor can honestly say, I had no idea that there would be these adverse side effects. So as far as his probabilities are concerned, the probability of him giving the patient this treatment, the probability of the outcome being death was essentially zero. He had no idea. Suppose the doctor is even telling the truth.
But society might say, wait a minute. What if there were articles in leading medical journals about the adverse effects of the drug? So the law takes that into account.
I'm not a lawyer, but this is my understanding, that there is a difference between-- so you're not guilty of intentional murder. And so you don't have any criminal liability, but you have civil liability. The family will sue you, and they will win if they can point out that in the medical literature, there are all these articles saying this drug had adverse side effects.
So the way we capture that, in this framework, is the definition is relative to a degree, to a probability distribution. But I don't have to take the doctor's probability distribution. The courts will say you should have known. So the probability distribution that we should be using for civil cases is the distribution that a reasonable doctor would have had. A reasonable doctor who'd kept up with the literature and knew all this stuff would have known not to do this.
So we can talk about-- so let me just repeat, the notion of degree of blameworthiness is relative to a probability distribution. But it's not always the case that you want to take the doctor's actual probability distribution. You may want to take the distribution he should have, had he been keeping up with the literature.
And as I say, the law uses both distributions, depending on whether it's a criminal case or a civil case. So it's not that there's one right distribution to take. You might want to take those different distributions at different times. Paul, you had a question.
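The point that blameworthiness is relative to a probability distribution can be illustrated by running the same calculation under two sets of beliefs. All the numbers here are hypothetical: the doctor puts probability 0 on the drug killing the patient, while a "reasonable doctor" who has read the literature is assumed to put it at 1/5, against 1/100 for an alternative treatment.

```python
from fractions import Fraction

def blame(chosen, p_death_by_action):
    """Largest reduction in the probability of death that some available
    action offers relative to the chosen one (0 if there is none)."""
    p_chosen = p_death_by_action[chosen]
    return max(p_chosen - p for p in p_death_by_action.values())

# Hypothetical beliefs about P(patient dies | treatment).
doctor_beliefs = {"drug": Fraction(0), "alternative": Fraction(1, 100)}
reasonable_beliefs = {"drug": Fraction(1, 5), "alternative": Fraction(1, 100)}

print(blame("drug", doctor_beliefs))      # relative to what the doctor believed
print(blame("drug", reasonable_beliefs))  # relative to what he should have known
```

The action is the same in both calls; only the distribution changes, and with it the degree of blameworthiness.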
AUDIENCE: It was more just a comment. [INAUDIBLE] extreme formulation of your [INAUDIBLE] position [INAUDIBLE] individuals burning fossil fuels [INAUDIBLE] in your model [INAUDIBLE] group [INAUDIBLE].
JOE HALPERN: Yeah. I mean, you're absolutely right. That's a tragedy of the commons for humanity, in some sense. And I don't have a good answer.
But each individual can correctly believe that everybody else is doing it. What I'm doing makes epsilon difference. But together, we're responsible for the outcome.
So that's exactly the kinds of issues that countries try-- all these conferences to deal with global warming and to cut back on the use of fossil fuels. Is it OK for maybe developing countries to use more because they need to catch up? So how do we apportion blame. I mean, these are clearly subtle issues, and I don't think there's an obvious right answer.
So let me stress that the notion of blameworthiness is relative to an outcome. So the definition talks about the degree of blameworthiness an agent has for this particular outcome. So if you think about the trolley problem again, depending on which way you pull the lever, either you have degree of blameworthiness 1 for one guy dying, or degree of blameworthiness 1 for five guys dying.
[INAUDIBLE] degree of blameworthiness doesn't talk about whether the action is OK or not. You are to blame for the outcome. So you should feel bad, no matter what happens. You still might feel you did the right thing, though.
So let me-- so clearly, you're going to be to blame for one person dying or five people dying. Which should you do? Somehow, you're going to have to evaluate tradeoffs. And the standard way of doing that is by using what's called a utility function.
So you assign a degree of-- I don't want to say goodness, a utility to each outcome. How bad is it or how good is it? And the standard thing that you'll read about, that you'll learn in business school, is choose the action that maximizes expected utility.
What does that mean? That means for each outcome, you multiply the probability of the outcome occurring if you perform the action. Because we have all these causal models, you can say how likely is this outcome to happen? And multiply that by the utility of the outcome. How good or bad is it?
So you can say, well, if I do this, then five people die. If I do this, then one person dies. Five is worse than one, so the expected utility of one person dying is higher than the expected utility of five people dying. Therefore, I should go for the action that only kills one person.
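Here is a minimal sketch of the expected-utility calculation for the trolley example. The utility function (minus the number of deaths) is only an illustrative assumption, as are the action and outcome names.

```python
def expected_utility(outcome_probs, utility):
    """Sum over outcomes of P(outcome | action) * utility(outcome)."""
    return sum(p * utility[outcome] for outcome, p in outcome_probs.items())

def best_action(model, utility):
    """Action with the highest expected utility."""
    return max(model, key=lambda action: expected_utility(model[action], utility))

# P(outcome | action); in the trolley story each action has a certain outcome.
trolley = {
    "pull_lever": {"one_dies": 1.0},
    "do_nothing": {"five_die": 1.0},
}
# Assumed utilities: minus the number of people who die.
utility = {"one_dies": -1, "five_die": -5}

print(best_action(trolley, utility))
```

Changing the utility numbers changes the answer, which is why the choice of utility function matters so much.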
But what utility function should you use? There's obviously no right utility function. We certainly tend to view some as more reasonable than others. You might say, well, it's OK if you're a mother and that one person is your child. Well, then maybe it's acceptable to kill the five people.
So at the end of the day-- so let me just say a few words about intention, and then wrap up. So, roughly speaking, I'd like to say an agent a who performs an action act intended an outcome O if, had a-- again, there's a counterfactual, and we can model it in a framework. Had a been unable to impact the outcome, he wouldn't have performed the action. So you intend something if you did the action in part to make it happen.
So think about the trolley problem. And I want to say, yeah, the reason I pulled the lever this way, one person died. But I didn't intend that person to die. My intention was to save the five people.
Well, I can make that precise by saying, even if that person hadn't died, I still would have pulled the lever. I didn't pull the lever to kill the person, because I would have pulled it anyway. I would've wanted the train to go down the track.
Even if there hadn't been a person, I would have been thrilled that the person hadn't been there. All the better. So I didn't intend that outcome, because I would have done the same thing, even if the outcome hadn't happened.
Is that making sense? That's the idea. Still, it's not quite right. It turns out to be a bit more subtle than that. But more or less, that's what's going on.
The reason it's not quite right is, well, here's one example. There's a number of reasons it's not right, but it's illustrating the subtleties. Suppose an assassin plants a bomb intending to kill two people.
He wants both of them dead. So he intended to kill person a and he intended to kill person b. He still would have planted the bomb, even if person a hadn't died.
So this definition doesn't quite work. He still would have done it. Even though he intended person a to die, he still would have done it, even if person a hadn't died, because he wanted b to die as well.
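The rough counterfactual test for intention, and the way the assassin example breaks it, can be sketched as follows. This encodes only the naive version described in the talk ("would the agent still have acted if the outcome had not happened?"), not the refined definition. The utilities and all names are illustrative assumptions.

```python
def best_action(actions, outcomes, utility, forced=None):
    """Utility-maximizing action. `forced` fixes some outcome variables to
    given values no matter which action is taken."""
    def value(action):
        result = dict(outcomes[action])
        result.update(forced or {})
        return utility(action, result)
    return max(actions, key=value)

def naively_intended(variable, actions, outcomes, utility):
    """Naive test: the agent intended the actual value of `variable` if,
    had that value been unattainable, a different action would be chosen."""
    chosen = best_action(actions, outcomes, utility)
    forced = {variable: not outcomes[chosen][variable]}
    return best_action(actions, outcomes, utility, forced) != chosen

# Trolley: assumed utility is minus the number of deaths.
trolley_actions = ["pull", "stay"]
trolley_outcomes = {
    "pull": {"one_dies": True, "five_die": False},
    "stay": {"one_dies": False, "five_die": True},
}
def trolley_utility(action, r):
    return -(1 * r["one_dies"] + 5 * r["five_die"])

# Assassin: assumed utility 2 per victim, cost 1 for planting the bomb.
bomb_actions = ["plant", "abstain"]
bomb_outcomes = {
    "plant": {"a_dies": True, "b_dies": True},
    "abstain": {"a_dies": False, "b_dies": False},
}
def bomb_utility(action, r):
    return 2 * r["a_dies"] + 2 * r["b_dies"] - (1 if action == "plant" else 0)

print(naively_intended("one_dies", trolley_actions, trolley_outcomes, trolley_utility))
print(naively_intended("five_die", trolley_actions, trolley_outcomes, trolley_utility))
print(naively_intended("a_dies", bomb_actions, bomb_outcomes, bomb_utility))
```

The last call reports that the assassin did not intend a's death, which is exactly the counterexample described above: the naive test needs repair.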
So you've got to play with it a bit to get it right. I think I've more or less got it right now. But this definition has been a moving target for the last few weeks.
So let me-- this is research. I mean, the paper isn't finalized yet, because I'm still thinking about it. And I'm sure after I've written it, there'll be a bunch of philosophers who will write papers saying, no, he hasn't got it right, here's why. Which will actually be good. At least they'll be thinking about the formal definitions, and they're good at coming up with examples. And I'm not sure I have exactly the right definitions.
So roughly speaking, I want to say, when is an act-- OK, let me use the words, morally acceptable. Well, if you're thinking in terms of maximizing expected utility, it's acceptable if it maximizes the agent's expected utility, and the agent had reasonable probabilities and reasonable utilities. What do I mean by that?
So the reasonable probability is, again, think of the doctor. If the doctor places zero probability on this drug having adverse side effects, and there were a bunch of articles in the literature saying this drug had adverse side effects, the law would say that's not a reasonable probability to have. You should have read the literature. Bad probability.
If I like the idea of burning down forests, and I get high utility from burning down forests, people would say that's not a reasonable utility function to have. I mean, I'm doing the thing that maximizes my expected utility. I'm dropping this match because I like to see forest fires blaze. But the law would say, or most people would say, that's not an OK utility.
But when we're talking about reasonable probabilities and reasonable utilities, it's obviously more complicated than that. The law certainly takes into account-- not just the law. I think most people reasonably take into account things like age and mental capacity.
The law has what are called M'Naghten rules that take into account somebody might have limited mental capacity. So the law doesn't apply quite the same way to those people. They didn't understand right and wrong. They didn't appreciate what was going on.
Or think of little kids who don't appreciate the consequences of their actions. So you might say a little kid has a bad-- I think my teenagers occasionally have very bad probability distributions over outcomes. But we're willing-- sometimes [? what ?] I remind myself, willing to be a bit more reasonable, because they're only kids after all. So what counts as reasonable?
Also, when we talk about reasonable, we-- and this is where the computer science comes in, might want to take into account computational limitations. That it's hard to figure out probabilities. I mean, you might intuitively perfectly reasonably not have the right probabilities, because it was hard for you to figure them out. And we'd let you off or not.
I mean, it's one thing to say the doctor should have read the literature if it's in a major journal. But if it's an obscure journal, maybe not. If it's hard to figure things out in general, I mean, what are the odds of something happening? Maybe you calculated it wrong. We'll let you off the hook if it's a hard calculation, at least to some extent.
When I talk about emotional state, I mean obviously, in the heat of the moment, if you have to make a decision right away, you might make a bad decision because you might calculate the utilities and probabilities inappropriately. And again, we let you off there. So when we're making moral judgments about acceptable actions, it's clear when it comes to people, we're not necessarily taking into-- we do take into account things like computational limitations and emotional state.
And of course, as I say, you can be held blameworthy for an outcome, even if the action is acceptable. So I want to distinguish blameworthiness from acceptability. You might decide it's acceptable to kill one person to save five. But you're still to blame for the one guy dying.
But now, let's talk about an autonomous vehicle, or in general an autonomous agent built by a computer scientist. In some ways, things get easier, because first of all, a computer scientist asks the computational question. If this program is supposed to decide what's the right thing to do, how hard is it to compute?
And the answer is, these computations are not hard. Once you have a probability and a utility, calculating degree of responsibility and so on, degree of blameworthiness, that's not that hard. So the hard part is giving the agent the right probability and the right utility.
Well, the right probability, that part is relatively easy. Autonomous agents are going to do much better than we are. They're going to have lots and lots of data, and access to lots of data. So they'll be able to calculate probabilities, I think, at least as well as people.
The hard part is the utilities. And this is not a talk that's going to give you answers. How do we give an autonomous agent reasonable utilities? And it's a question that some people are starting to think about.
Bart Selman and I are part of this grant, a large grant from the Open Philanthropy Foundation. And the PI of that grant, Stuart Russell at Berkeley, is looking at what he calls the value alignment problem. And that's the problem of getting artificial agents, autonomous agents, to learn utilities from data. And the alignment part is, you want their utilities to align with those of people. So you want the values of the systems we build to be the same as the values of people.
That's hard. Just watching humans might not reveal good behavior. There's a bunch of recent experiments that show that. On the course website, I actually pointed to a paper that appears in Science in the last couple of weeks, where we have agents looking at data of people talking.
And the agent correctly deduces-- I mean, doesn't correctly deduce. Learns, sees that these people-- the data is exhibiting all the behaviors we expect in terms of racial discrimination, not treating women as well. If you're learning from data, you'll learn maybe to treat women badly. Because if you look at historical data, that's historically what happened.
It's not clear we want our agents to learn from data, because the data might just get our agents to perpetuate all the bad things we've done up to now. So this is a hard problem. And again, just watching people, people aren't always as moral as we would like. So again, just watching what people do and learning what people do doesn't necessarily translate into the values of the system aligning with what we want them to be aligned with. Listen to what I say, don't do what I do kind of thing.
So let me stop by saying I don't want to pretend that I've solved any problems. So if you're hoping to come to this lecture to see how we should do things, I'm sorry. I think the formal definitions are really important.
So I will say, a, I don't think I've nailed the definitions yet. It could very well be if I gave this talk in a month, I would give you slightly-- well, I haven't really gone into the technical details of the definitions. But if you had forced me to give you the technical definitions, the technical definitions I give you in a month might be different from the ones I would give you today. Certainly, the ones I'm giving you today would be different from the ones I would have given you a month ago. So this is definitely ongoing research.
That said, I feel very strongly that, a, we need definitions. We're going to have to build them into our systems. We can't have serious discussions unless we know what we're talking about. And the formal definitions force us to make precise what it is we're saying. And I actually feel pretty strongly that the basic framework of thinking, in terms of these causal models, where you're forced to write down how one variable affects another.
Now, [? we ?] might disagree about what the causal model is. So lawyers have asked me, if we ditch the but-for definition, and go with your definition of causality, that doesn't mean the lawyers are out of a job. What will happen is the different lawyers will start arguing, it turns out that the definition of causality [? and ?] all these definitions are relevant to the model you built.
So if you build a lousy model, you're going to deduce from your model that a is a cause of b. Somebody else would come up with a different model, a is not a cause of b. So these definitions, even if you agree on the definitions, don't give you the answers, because they don't even tell you what is the right model to describe the world. So even if all the lawyers in the world accept our definition of causality, forget about [INAUDIBLE] of what's moral responsibility, all these definitions matter to the law, different lawyers can disagree about the model.
So we have, in one of our papers, some guidelines as to what constitutes a reasonable model. But you can definitely have disagreement. But at least you agree on what you're disagreeing about, once you have the formal definitions.
So the bottom line here is that it's a plea for getting good formal definitions. In English, when we use words like "responsibility" and "blame," they have at least two or three different flavors. And you can distinguish, using these models, two or three notions that are all reasonable. It's not that one's right, the other one's wrong.
And since this is the last lecture, I'll take the liberty of having a final slide that says look, this talk hasn't said anything about how we decide what counts as reasonable or acceptable. I doubt we'll get universal agreement. I mean, I remember talking to somebody about it who said almost any law that you think [? is, ?] everybody ought to agree with this one. "Thou shalt not kill."
Certainly there are times when people do kill. We honor people who kill in times of war. You shouldn't kill, except for war. It's very hard to come up with moral absolutes.
But nevertheless, we're going to have to reach some kind of consensus. If you're thinking about conferences on climate change and who gives what, I mean, we're going to have to reach consensus, at least when it comes to autonomous agents. And it's a task that we all need to be involved in. And that means you guys.
Pretty much all the ethical and social issues that we discussed in this course are subtle and complex. So I try to hint formally at some of the subtleties and complexities. And we're going to need informed citizens to make good decisions.
I don't think we can leave this to experts. These are not-- these kinds of decisions are not rocket science, where the work is building the rocket. These are decisions about what kinds of autonomous agents do we want.
What do we think-- how should we build them? What values are we going to instill them with? Don't leave it to the experts. So you guys should get involved. That's it.
I guess we have a few minutes for questions. I should say that, since it's the last talk, all the talks in this lecture series are on the course website. If you look at CS 4732, you can find-- except for mine. It isn't up yet.
But mine will be up in about three or four days. So you can see any talks that you missed. There were some great talks. Any questions, comments? Yeah, Martha.
AUDIENCE: [INAUDIBLE] question. So a long time ago, like 25 years ago, [INAUDIBLE] example of a terror bomber and a strategic bomber?
JOE HALPERN: I don't know the example.
AUDIENCE: OK, so both of them want to take out a munitions plant from the enemy. And the munitions plant happens to be right next to a school. Terror bomber thinks this is really good, because it will also create a lot of terror. The strategic bomber would prefer not to kill the kids in the school, [INAUDIBLE]. So then there's a lot of debate about whether they both intended to kill the kids, and whether [INAUDIBLE]. I'm just wondering--
JOE HALPERN: So these definitions would say terror bomber-- so let me repeat Martha's question. So there are two types of bombers. There's the terror bomber and the strategic bomber. Both of them decide to bomb a certain strategic munitions plant.
Next door to that is a school. From the terror bomber's point of view, feature, school. And from the strategic bomber's point of view, bad, school. Now, according to these definitions, neither one of them intended it, because both would have done the bombing anyway, even if the kids had been saved. So as far as intention goes, it was not intended, even by the terror bomber.
Now, if he would have put the bomb in anyway, even if somebody had told him the munitions plant wouldn't have blown up, just the school. I mean, somehow the munitions plant is hardened. The bomb just bounces off of it, but it kills the kids. So if he would have put the bomb in anyway, because he was thrilled that the kids were going to die, then he intended to both blow up the munitions plant and kill the kids. He intended both.
But if he wouldn't have put in the bomb, had the munitions plant been saved, then according to this definition, he didn't intend to kill the kids-- even though his utility might have increased had the kids died. So there's a different-- intending it doesn't mean, at least according to these definitions, that your utility increased. It means you would have done the act anyway, even if it hadn't-- he wouldn't have done it, had it not happened. So under reasonable assumptions, neither one of them intended to kill the kids, although in one case, the utility decreased because the kids dying--
AUDIENCE: [INAUDIBLE] blameworthy for killing the kids.
JOE HALPERN: Not for killing the kids, no. Yeah. [INAUDIBLE] so you could [INAUDIBLE].
Oh, no, hold on. No, no, no. I take it back. No, they are both blameworthy for killing the kids, because blameworthiness is for an outcome.
Blameworthiness, on my contention, says what could you have done? Is there another action you could've done that would have made a difference? Well, clearly, not planting the bomb at all would have made-- so they're both completely to blame.
Sorry, the definitions had it right, even though I had it wrong. [INAUDIBLE] Yeah.
AUDIENCE: So are you defining intention there as a binary quality of it is [? this ?] single factor that made the difference.
JOE HALPERN: Yeah, so under this definition, it is the degree of blameworthiness. But you either intended it or you didn't according to this definition. Now, the definition takes into account probabilities and utilities. Because whether or not you would have done the action, now you're looking at, well, how likely are things to happen?
So it's not that I'm not taking probabilities and utilities into account in what I'm figuring out. But at least according to this definition, you either intended it or you didn't. And I think that accords, at least with my usage of the word, did you intend to do it?
I didn't-- sort of kind of intended. I did intend it or I didn't, at least according to this definition. But that's today. Maybe speak to me in a month, and I might be saying something different. Paul.
AUDIENCE: [INAUDIBLE]. [? Follows ?] actually from an earlier discussion in the context of the course. So we heard John [INAUDIBLE] talking about this commercial software that he uses to take [INAUDIBLE] to [? jail ?] [INAUDIBLE]. And when Dan [INAUDIBLE] asked him again, he believes there shouldn't be any government evaluation for all of this. And he said basically no, we should [INAUDIBLE] the technology [INAUDIBLE] the [? direction ?] [INAUDIBLE].
And you know I'm not a Luddite. But we can [INAUDIBLE] foresee happen here in the case of autonomous vehicles is all of these algorithms will be implemented in a black box over which we have no say whatsoever. And unlike the doctors, nobody is going to expect them to read your paper.
JOE HALPERN: I mean, I think there's a chance that you're right, unfortunately. Especially about the part about reading my paper.
But we are-- this is one of those things where, as a group, this is where group blameworthiness is the appropriate concept. Each one of them can say, what can I do? I can't do anything. And I don't find it acceptable to say I personally can't do anything. Therefore, I won't.
So I don't claim to have nailed the definition of group blameworthiness yet, and how exactly if you want to ascribe blameworthiness of a group. Now what do you do with the individuals in the group? But again, I think that the framework is there.
And I think these issues become really relevant. It's not enough just to think on an individual level here. I'm starting to sound like a moral philosopher, which I'm not.
But I think when one thinks about moral issues, you can't just think as an individual. You have to think also as a member of a group. Yeah, OK, [? Russ. ?]
AUDIENCE: So you talked about degree of blameworthiness based on probability. Is there similarly a degree of intentionality?
JOE HALPERN: No, so that's what I said to this question here. That at least in this definition, intentionality is either 0 or 1. But the definition takes into account the probabilities, because whether or not you intended something, you're looking at the outcome, would you still have done it, had this not happened? Well, that'll depend on probabilities and utilities.
AUDIENCE: So if the doctor thought it was, like, a 5% chance of the side effects happening, and did it, anyway, I guess you'd say that was negligence? Is that intentionality?
JOE HALPERN: OK, so let me give you part of what makes this hard. So suppose I'm a doctor, and I have a patient who's extremely sick. And the only way I can think of to save him is this operation, which only has a 5% chance of success.
So I realize, going into it-- and it's not that I don't know the numbers. Let's say I'm right. With probability 0.95, the patient's going to die, anyway. The probability of him living is only probability 0.05.
I perform the operation, he dies. I certainly didn't intend for him to die. Had I known he was going to die, I would not have performed the operation. I mean, presumably, I would've put him in hospice care, or something like that.
So the definition works. So it turns out that I actually want two slightly related definitions. One is, did I perform the action intending to affect that variable? That's something I can decide upfront.
Now, which particular value of the variable did I intend to happen? Well, obviously, the good one. So let me make that a bit more precise.
So in the context of the doctor, why did he perform the operation? Well, he was intending to affect the health of the patient. He certainly wasn't intending the actual outcome, which is the patient dying.
Which outcome did he intend? Well, he intended for him to live. So it turns out that really, that's actually the product of the last two weeks.
That's not what I would've said a month ago. But that is something I've been thinking about. I want to distinguish which variable did you intend to influence, or did you intend to influence the variable. And what outcome, what value of the variable did you want, if that makes sense.
So it makes perfect sense that you should perform an action, even though you think it's quite unlikely to have a beneficial outcome if the alternative is the guy dies, anyway. I mean, doctors do this. They'll perform an operation, understanding that the probability of a good outcome is low.
But from their perspective, it's nevertheless the best they can do. And this happens all the time. We don't certainly want to say the doctor intended the patient to die. [INAUDIBLE] on that.
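The doctor example can be worked through as a small expected-utility calculation. The sketch below uses the 5% success probability from the example; the utility numbers and the names `expected_utility`, `choose`, and `intended_value` are assumptions made for illustration, and the two checks are simplified readings of the two questions distinguished above (which variable, and which value of it), not the formal definitions.

```python
# Assumed utilities: living is far better than either way of dying, and dying
# in hospice care is slightly better than dying after a failed operation.
UTILITY = {"lives": 100.0, "dies_in_surgery": 0.0, "dies_in_hospice": 1.0}

def expected_utility(action, p_success):
    """Expected utility of an action, given the chance that the operation succeeds."""
    if action == "operate":
        return (p_success * UTILITY["lives"]
                + (1.0 - p_success) * UTILITY["dies_in_surgery"])
    return UTILITY["dies_in_hospice"]

def choose(p_success):
    """Pick the action with the highest expected utility."""
    return max(("operate", "hospice"), key=lambda a: expected_utility(a, p_success))

def intended_value():
    """Of the health outcomes the operation can produce, the one the doctor wants."""
    return max(("lives", "dies_in_surgery"), key=UTILITY.get)

# With a 5% chance of success, operating is still the best available action.
operates = choose(0.05) == "operate"

# Did he intend to affect the patient's health? He operates, but would not
# have if the operation could not change the outcome (success probability 0).
intended_to_affect_health = operates and choose(0.0) != "operate"

# Did he intend the death? Only if he would have operated knowing the patient would die.
intended_death = choose(0.0) == "operate"

print(operates, intended_to_affect_health, intended_value(), intended_death)
```

Under these assumed numbers the doctor operates, intends to affect the patient's health, intends the value "lives", and does not intend the death that actually occurred.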
AUDIENCE: Is there a reason to expect that these definitions should simplify [? down ?] [INAUDIBLE]? Or maybe [INAUDIBLE] uncomplicated [INAUDIBLE]?
JOE HALPERN: So I think the basic-- I guess I have to think, or else I wouldn't be doing this. I don't want to say the definitions will totally simplify, but I think they're relatively simple. I think what's hard is applying them.
So I can say, look, what does it mean for an action to be morally acceptable? Well, it just means it's the best thing you could do, given reasonable probabilities and utilities. That's a good gloss, and the formal definition isn't far from that. And that's really simple.
Now, applying it is incredibly hard, because we have to agree on what reasonable probabilities and reasonable utilities are. So the problem isn't in this case that the definition is so complicated. It's that getting us to agree on what's reasonable is hard. But I don't-- I mean, it doesn't mean we shouldn't try, I think. I saw another hand back there?
AUDIENCE: I was going to ask about the doctor problem. Suppose [INAUDIBLE] interestingly [INAUDIBLE] So the doctor [INAUDIBLE] the medicine, but they know it might have adverse side effects [INAUDIBLE] that drug [INAUDIBLE] that [? properly ?] [INAUDIBLE] how's that affect [INAUDIBLE]?
JOE HALPERN: Well, yeah. I mean-- so again, without going into formal definitions, let's talk intuitively. This happens all the time. I mean, I think there's a low chance of adverse side effects. But this is the best medication.
On the whole, I think it's the right thing to do. So the expected utility is high. If there actually are adverse side effects, that could happen.
You say, well it's certainly not what I intended. But I understood it might happen. There is a 10% chance.
Even though the probability, according to all the models, that Donald Trump would win was low, he won. And it wasn't zero. So did you intend that outcome? No.
Are you blameworthy for the outcome? Yes, because had you done something else, maybe that outcome wouldn't have happened. Is the action morally acceptable? Yes, because at least under reasonable assumptions, it was the best thing you could have done with a reasonable probability and utility.
So no, yes, and yes. Even though you're to blame for the outcome, it was still a reasonable action to do. So, Anna.
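The three questions about the medication (intended? blameworthy? morally acceptable?) can be answered side by side in a toy model. This is a minimal sketch under stated assumptions: the 10% side-effect probability comes from the example, while the utilities, the two-action model, and the names `expected_utility` and `best` are hypothetical. The intention check reuses the counterfactual test from the bomb example (would you still act if the action brought about only this outcome?), and blameworthiness is reduced to "some other action made the outcome less likely".

```python
ACTIONS = ("prescribe", "withhold")
P_SIDE = {"prescribe": 0.10, "withhold": 0.0}  # chance of adverse side effects

# Assumed utilities: a cure is worth 10, side effects cost 6, no treatment is 0.
U_CURED, U_SIDE_EFFECTS, U_UNTREATED = 10.0, -6.0, 0.0

def expected_utility(action, cures=True, p_side=None):
    """Expected utility; `cures` and `p_side` let us pose counterfactual models."""
    if action == "withhold":
        return U_UNTREATED
    if p_side is None:
        p_side = P_SIDE[action]
    base = U_CURED if cures else U_UNTREATED
    return base + p_side * U_SIDE_EFFECTS

def best(**model):
    """Action with the highest expected utility in the given (counterfactual) model."""
    return max(ACTIONS, key=lambda a: expected_utility(a, **model))

action_taken = "prescribe"

# Intended? Would the doctor prescribe a drug that only produced the side effects?
intended = best(cures=False, p_side=1.0) == action_taken

# Blameworthy for the side effects? Another action made them less likely.
blameworthy = any(P_SIDE[a] < P_SIDE[action_taken] for a in ACTIONS)

# Morally acceptable? The action taken maximizes expected utility in the real model.
acceptable = best() == action_taken

print(intended, blameworthy, acceptable)
```

With these assumed numbers the answers are no, yes, and yes: the side effects were not intended, the doctor is to blame for them in the sense that withholding the drug would have avoided them, and prescribing was still the best available action.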
AUDIENCE: Yeah, so you mentioned there something else. Like there's something else you can do. And in all of the examples you presented, there always seemed to be two options [INAUDIBLE]. Is there a concept of, like, exploration, where it's like [INAUDIBLE] there's a third option [INAUDIBLE]?
JOE HALPERN: Of course. In the real world, things are much more complicated. So everything is relative to a model. So these models, if you want to allow an option of exploration, you're the modeler, you get to decide what the [? space of ?] possible actions are.
So this is, in some sense-- it doesn't address that issue at all. It's not that adding an exploration would be different. It would just be a different model.
And you would have a different model and say yeah, you should have explored. That's another action you could've done. And so it's just at the level at which we're willing to model.
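The point that everything is relative to the model can be shown with a toy calculation: the same evaluation rule gives a different verdict once the modeler adds an exploration action. This is only an illustrative sketch; the action names, probabilities, and utilities are invented, and `best_action` is a hypothetical helper.

```python
def best_action(model):
    """model maps each action to a list of (probability, utility) pairs."""
    def expected_utility(action):
        return sum(p * u for p, u in model[action])
    return max(model, key=expected_utility)

# A two-option model, as in the examples so far (numbers are assumptions).
two_options = {
    "treat_now": [(0.6, 10.0), (0.4, -5.0)],   # expected utility 4.0
    "do_nothing": [(1.0, 0.0)],                # expected utility 0.0
}

# The same situation, but the modeler also allows an exploration action.
with_exploration = dict(two_options)
with_exploration["run_more_tests"] = [(0.9, 8.0), (0.1, -1.0)]  # roughly 7.1

print(best_action(two_options))       # treat_now
print(best_action(with_exploration))  # run_more_tests
```

In the richer model, treating immediately is no longer the best action, so a verdict such as "you should have explored" comes from the choice of model rather than from any change to the definitions.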
I'm looking at the clock, and we have a class that starts-- I mean, the discussion portion of this class starts in two minutes. So we should probably end it. I mean, I guess I'll stick around--
Thanks for coming to the course. This is it.
The need for judging moral responsibility arises both in ethics and in law. In an era of autonomous vehicles and, more generally, autonomous AI agents, the issue has now become relevant to AI as well. Although hundreds of books and thousands of papers have been written on moral responsibility, blameworthiness, and intention, there is surprisingly little work on defining these notions formally. But we will need formal definitions in order for AI agents to apply these notions.
In this talk, given May 1, 2017, computer science professor Joe Halpern takes some preliminary steps towards defining moral responsibility, blameworthiness, and intention. Halpern works on reasoning about knowledge and uncertainty, game theory, decision theory, causality, and security. He is a fellow of AAAI, AAAS (American Association for the Advancement of Science), the American Academy of Arts and Sciences, ACM, IEEE, and SAET (Society for the Advancement of Economic Theory). Among other awards, he received the ACM SIGART Autonomous Agents Research Award, the Dijkstra Prize, the ACM/AAAI Newell Award, and the Gödel Prize, and was a Guggenheim Fellow and a Fulbright Fellow.