SPEAKER: So I would like to welcome you all to the last LPHD speaker of the year. You've heard me say before here that one of the terrific things about this job is you get to introduce old friends and feature their work. And this sort of goes double today.
Roddy's James S. McDonnell distinguished professor of psychology at WashU. And until recently, he was also associate dean in Arts and Sciences for Academic Planning, which we found out last night was a very different job than what the title would suggest.
His PhD is from Yale University, where he worked with another old friend, Bob Crowder, of happy memory. He's formerly been at Purdue. And he also taught a couple of different times at the University of Toronto and then at Rice before he went to WashU.
The important thing is that he is, by all odds, one of the most influential memory researchers of this generation. He's done hugely important work on false memory and implicit memory. I remember hypermnesia, too, and a lot of recent work on retrieval practice that turns out to be very influential in retention and education.
And that's actually why Roddy was here, to do the provost teaching seminar. And he very kindly agreed to speak in our speaker series today.
He's going to talk about the curious relation between memory confidence and memory accuracy today, something that in memory and the law is hugely important because we so often use confidence as a way of gauging the credibility that we place on people's memory, famously, of course, in eyewitness identification. But we use it all over the place, too, in forensic interviewing.
Also, in interrogation, a nonconfident memory report is often the entering wedge in interrogation to start breaking a suspect's story down. But I'll stop there and let Roddy get started.
HENRY ROEDIGER: OK, thanks very much, Chuck. And thanks for having me here. So what I want to do today is-- I shortened the title to be a little less cumbersome than the one I sent in a few months ago. But it's going to be about this problem of confidence and accuracy in memory and something I've come to relatively recently in the last few years. And I'll tell you that story.
So here's what the experts say. Dan Simons and Chris Chabris, who got their PhDs here at Cornell in the Psychology Department across the way, did a survey and published in 2011. What do experts believe about memory? And what do the people out on the street-- it was supposed to be a random survey. They didn't have a very high return rate. But nonetheless, they did the best they could.
And they found that people believe that confidence and accuracy are pretty tightly correlated if you ask them. And then the experts they polled said that confidence and accuracy are not correlated, which is the received wisdom-- or weakly correlated, the received wisdom from the psychology and law community. And that's what I want to talk about today.
There are really two traditions of confidence and accuracy work. And the quote that I just showed you comes from one of them.
If you look in cognitive psychology, the people that Chuck and I mostly hang around with, people who study simpler events in the lab, the story about confidence and accuracy usually is quite different. There's a very tight correlation.
This is from a book by John Dunlosky and Janet Metcalfe. And they said the relative accuracy of people's confidence is high-- higher confidence almost inevitably means an item has been presented. Low ratings mean that something hasn't been, if you use a particular type of scale.
So that's one tradition of confidence and accuracy research. But the other tradition-- whoa, back up-- is the one that comes out of psychology and law. These are mostly social psychologists who got interested in legal processes and started developing their own measures of memory and doing work.
And so here are just three quotes. I could multiply this by about 200, and you get the same picture. So "Confidence is neither a useful predictor of the accuracy of a particular witness nor of the accuracy of particular statements." That was from Smith, Kassin, and Ellsworth.
Gary Wells said in The New York Times-- it's a quote from The New York Times-- "There's little or no relationship between the accuracy of witness identification and confidence." And Odinot et al. said "Confidence should never be allowed as evidence in the courtroom." Pretty dramatic statement.
So I was blissfully going about my life, not paying too much attention to all this. And that changed in 2010. A lot of people get their research ideas from theories and stuff. I get mine from email.
Lynn Nadel, a friend at University of Arizona, was planning a book called Memory and Law. And he wrote me an email, a very long email, asking if I would write about confidence and memory.
And I studied the email for a moment and thought about it and wrote him back and said, Lynn, I think you must have made a terrible mistake, that I'm widely known for never having written one word about this. Why on Earth would you ask me to write about confidence and memory?
And I said, Gary Wells, there are experts out there who study confidence and memory. And he said yes, but all these experts have staked out positions on various issues. And we want somebody to come in and take-- in other words, they prized me for my ignorance of the issue.
We'd like somebody to come in and take a fresh look at all this and try to figure it out. Why is the literature so conflicting and confusing? Why can you find these different claims all from very credible people?
And so I almost said no. I was about to go on sabbatical. I had all this stuff I wanted to write up. And then I finally thought about it. And I thought, well, I'm going on sabbatical. You're supposed to learn to do something new on sabbatical. You're supposed to think new thoughts. And so maybe this would be fun to do.
I was going to University of California at San Diego, planning to sit on the beach a lot. But I could think thoughts about confidence and memory while I was doing that. And I was going to be at UCSD, too.
So I finally said yes. Why is this not working? So eventually I said yes. And then the first thing, of course, when you're embarking on some new project-- there we go-- you need to get collaborators.
I had a relatively new graduate student at the time named Andy DeSoto. So I enlisted him. And I was going to UCSD. I had a friend there who actually worked in the other tradition of confidence, John Wixted.
So John Wixted, I asked him. I came out. And I told him, look, I've been asked to write this chapter. How would you like to come in with me? And that was his expression kind of there. That's why I like this picture.
He didn't really want to. He said, well, I got a lot to do. I'm chair of the department. But finally I talked him into it. He felt like it would be-- he's a very nice guy. And he just couldn't tell me no in his department.
So I brought John in. And together we wrote a chapter on all of this. And then we started doing research on it. Whoops, this keeps going too fast.
So the relation of confidence and accuracy is inherently correlational. Psychologists have made more mischief about correlation, not just correlation being confused with causation. There are just all kinds of other sorts of mischief we can get ourselves into and, in fact, have over the years.
And so long ago, in 1955, Ernest Hilgard, a very famous psychologist at Stanford, said in Psychological Review, one of our most prestigious journals-- this is just not wanting to work today.
Here we go. "Correlation is an instrument of the devil." That's a quote, Psych Review. Put it down. Go back and look at it. And I'm taking that totally out of context, of course.
All of these kinds of things that we can-- if one of your measures is unreliable, it just has low reliability, you'll never get a correlation. If you have a lack of variability in one measure, you won't get a correlation.
If you have a variation of that problem, constricted range, then Hal Pashler and his colleagues pointed out certain fiercely high correlations, which were called voodoo correlations, that you can get. I don't want to go into conditions. None of this is what is the problem though in confidence and accuracy once I started looking at the literature.
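The restricted-range pitfall mentioned here can be illustrated with a small simulation. This is only a sketch under assumed conditions: the two variables, the noise level, and the cutoff of 0.5 are all made up for illustration and come from no study discussed in the talk. The same underlying relation yields a much smaller correlation once one variable is confined to a narrow band.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y depends on x plus independent noise.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(20000)]
ys = [x + rng.gauss(0, 1) for x in xs]

full_r = pearson_r(xs, ys)

# Keep only cases where x falls in a narrow band (restricted range).
kept = [(x, y) for x, y in zip(xs, ys) if abs(x) < 0.5]
restricted_r = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(f"full range r = {full_r:.2f}")
print(f"restricted range r = {restricted_r:.2f}")
```

The relation between x and y is identical in both cases; only the spread of x differs, which is why a low correlation by itself says little about how well one variable tracks the other.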
But one problem that is is just how the issue is framed. And in this case, the title of my talk, the title of lots of articles is, what is the relation between confidence and accuracy in reports from memory? And there's all kinds of ways you can measure that. You can mean different things.
And so you can have a zero correlation from one measure and a reasonably sized correlation from another measure. And that's going to be part of the point of my talk.
So let's just consider, here are several methods that we talk about in the chapter when we review the evidence. So suppose you're doing experiments like this. So you're doing a memory experiment. And you manipulate an independent variable-- study time, something boring like that, or different types of encoding strategies or types of materials, where some types are better remembered than other types.
And then you also collect confidence measures. And you ask, do the two go together? If you manipulate something that affects memory, does it also affect confidence?
And as far as I can tell, every experiment ever done on this problem shows a positive correlation. You never move memory with an independent variable when you don't also move confidence. So in this sense, it's always a positive correlation.
But what about across subjects? So what if Chuck and I see the same set of events and, on average, he has higher confidence than I do? Will he also have higher accuracy? So in other words, looking at confidence and accuracy as a subject variable, where we might have different confidences.
And you could imagine actually the correlation would be 0 or even negative. Maybe people who are really good are also kind of cautious about how good they are. They might think, oh, memory is fragile. I better not be too overconfident.
So you could imagine a 0 or negative correlation. What you actually get is positive. But you could imagine that. And you could do it across items.
So in this kind of analysis, you average across lots of things being remembered. In this kind, you average across people. And you say, are events that tend to be well remembered also events for which there's high confidence?
And then, finally-- not finally, there's other things I'm not even putting up here. But these are the main ones. So if I give you a hundred things to remember and you remember some better than others-- this is more the question in eyewitness testimony-- within subjects, are the things that I'm most confident about most accurate about, and so forth and so on?
So here are all these different things. And they don't have to agree. I mean, this always happens. So if you mean confidence and accuracy in this sense, which we don't usually-- but that's always true. And these are ones that you can ask questions about.
And the first thing I did was just start playing with numbers, what could be called simulations if you were doing it on a computer. But I was doing it on pencil and paper and calling it playing with numbers.
What kinds of data could you imagine? Could you find a positive correlation in one case and a 0 in the other? And the answer is yes. It's at least possible to do that.
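The kind of "playing with numbers" described here might look like the following sketch. The three subjects and their ratings are invented for illustration and are not data from any experiment in the talk. Within each subject, higher confidence goes with correct answers, yet across subjects average confidence is unrelated to average accuracy.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: each subject rates four items (confidence 0-100, correct 0/1).
subjects = {
    "S1": {"confidence": [10, 20, 30, 40], "correct": [0, 0, 1, 1]},
    "S2": {"confidence": [40, 50, 60, 70], "correct": [0, 1, 1, 1]},
    "S3": {"confidence": [70, 80, 90, 100], "correct": [0, 0, 1, 1]},
}

# Within-subject analysis: one correlation per person, across that person's items.
within_r = {
    name: pearson_r(data["confidence"], data["correct"])
    for name, data in subjects.items()
}

# Between-subject analysis: correlate each person's mean confidence with mean accuracy.
mean_conf = [sum(d["confidence"]) / len(d["confidence"]) for d in subjects.values()]
mean_acc = [sum(d["correct"]) / len(d["correct"]) for d in subjects.values()]
between_r = pearson_r(mean_conf, mean_acc)

print(within_r)
print(between_r)
```

Both numbers are legitimate answers to "what is the correlation between confidence and accuracy," which is the framing problem the talk is pointing at.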
But we also started doing some experiments, Andy DeSoto and I did. And I'll first tell you about a word list experiment, just showing you how this can be kind of tricky.
So this is a paper published in Memory a couple of years ago. We had people study categorized lists. So these are from just general categories, like, in this case, vegetables, articles of furniture, types of automobiles, those kinds of things.
And so what we did-- there are norms for these. You can go to the norms and see what kinds of things people generate when you get hundreds of undergraduates to generate vegetables. And so here's the first 20 vegetables.
What we did in this experiment is we presented numbers 6 to 20 in a random order to people, not just-- I mean, we had 10 categories altogether. I'm just showing you one.
So we had presented 15 words. We left out the top five. Those were never presented. So altogether, people studied 150 target words and 10 categories, 15 items per category like this. And then we gave them a test of 300 items.
So they saw 300 words on the test a little while later. And they were told to look at each word and say yes or no, was it in the list? And the 150 lures were-- 50 of them were these kinds of items, the top five in the category that we didn't present. And there were 10 categories. So that's 50.
And then there were another 100 lures taken from categories that weren't used in the experiment at all. So they were just taken from other categories from the norms. So those were very unrelated to the materials we presented.
So the recognition test looked like this. They would get a word like "cauliflower" and say, is the word old or new? They would pick one. And then there was a sliding scale.
It started here in the middle. And they were told, place this little cursor from, 100 being completely confident that my judgment is right, either old or new. 0 means I'm not confident at all. I'm just guessing. You made me say old or new. So I said old or new. But I'm just guessing.
So that's the confidence rating on a 100-point scale, something often used in the metacognition literature, not used too much in the-- although it's getting to be more used in the eyewitness literature. So what happens when you do this kind of experiment? What we did was to look over items. That was the main area of interest.
And if you look at studied items, here's people's accuracy, the hit rate. So this is the probability of calling cauliflower, let's say, is here. And people got it right-- whoops-- got it right 70% of the time. It's a hard test. 300 items, it's a hard test.
So they got it right this much. And here was their confidence. So over items, there's a positive correlation, plus 0.67. You might worry about this outlier. If you throw that out, the correlation drops a little bit. That is pumpkin.
I don't know why pumpkin is not very recognizable. It seems like a perfectly good item to me. But anyway, people, they don't have high accuracy. In fact, they're very low. And their confidence is low. But even if you throw that out, the correlation is still high.
But what happens to those first five items we left out? So these are items that are prototypical items in the category, like carrots, for vegetables. But now we've left them out.
Well, now if we look at confidence and accuracy, accuracy is simply 1 minus the false alarm rate. So it's correct rejections. I correctly rejected carrot, or I failed to.
And now what we see is a negative correlation. You can state this, too, as one is the-- I almost want to plot this the other way because it makes more intuitive sense. But basically, the less accurate you are, the more confident you are. Yeah, I just said that right. The less accurate, the more confident.
If you turn this around, if you plotted false alarm rate against confidence, you would see those items for which you are most likely to false alarm, you're also most likely to be high confident in. So you can get in the same experiment, here's a positive correlation with one set of items, the targets. And with these confusable lures, you get a negative correlation.
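The sign flip described here follows directly from defining accuracy on lures as 1 minus the false alarm rate. A minimal sketch, using invented false alarm rates and confidence ratings for five hypothetical lure items (not the values from the experiment):

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up item-level data for related lures: (false alarm rate, mean confidence).
lures = {
    "carrot": (0.7, 85),
    "lettuce": (0.6, 75),
    "corn": (0.5, 70),
    "peas": (0.3, 60),
    "potato": (0.2, 55),
}

false_alarm = [fa for fa, _ in lures.values()]
confidence = [conf for _, conf in lures.values()]
accuracy = [1 - fa for fa in false_alarm]  # correct rejection rate

r_accuracy = pearson_r(confidence, accuracy)
r_false_alarm = pearson_r(confidence, false_alarm)

print(f"confidence vs accuracy:         {r_accuracy:.2f}")
print(f"confidence vs false alarm rate: {r_false_alarm:.2f}")
```

The two correlations have the same size and opposite signs, so plotting either way tells the same story: the lures that draw the most false alarms also draw the most confidence.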
I don't have the unrelated lures. But then it goes back to being positive. The more accurate you are, the more confident you are. And so this is averaging across two experiments.
So we have studied items. If we do it, I just showed you the between events. If you average across both experiments, it's 0.70. If we do it between subjects, this is the, how confident on average is the subject? How accurate are they? It's still nice and positive.
If you do it within subjects, if you do it for individuals, it's still quite positive. For the strongly related lures-- these are the confusing items, the prototypical items we left out-- the correlations are negative. Although the between-subjects ones falls to about 0. And it's not statistically different from 0.
The unrelated lures are all positive. If you ask, what do people usually report in memory experiments, they report all items. And they don't usually break them down. So you can see the correlations between confidence and accuracy.
If you look at all the words in the list, they look pretty small. This is actually kind of what you get in the eyewitness literature. But you can see that's underlying this because we broke out different item types.
If you have strongly related lures, very tricky lures, ones that cause false memories, false recognition, then you get a negative relation between confidence and accuracy. For studied items, you get a positive relation between confidence and accuracy.
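Why pooling item types can hide both relations is easy to see in a deliberately extreme toy example. The numbers below are invented and idealized so that the cancellation is exact; real data, as in the talk, would show a small pooled correlation rather than exactly zero.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up item-level means. Studied items: more confidence, more accuracy.
target_conf = [60, 70, 80, 90]
target_acc = [0.5, 0.6, 0.7, 0.8]

# Related lures: more confidence, less accuracy.
lure_conf = [60, 70, 80, 90]
lure_acc = [0.8, 0.7, 0.6, 0.5]

r_targets = pearson_r(target_conf, target_acc)
r_lures = pearson_r(lure_conf, lure_acc)

# Reporting "all items" lumps the two item types together.
r_pooled = pearson_r(target_conf + lure_conf, target_acc + lure_acc)

print(f"targets: {r_targets:.2f}, lures: {r_lures:.2f}, pooled: {r_pooled:.2f}")
```

A weak pooled correlation is therefore compatible with confidence being strongly tied to accuracy within each item type, in opposite directions.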
And so Andy DeSoto and I have done a number of experiments like this. Chuck tells me he inflicted his class with this second paper, which I'm not going to talk about, where we showed similar things in another. And we've written a chapter about all this too in a [INAUDIBLE] book in honor of Larry Jacoby.
And Andy DeSoto, I hope, will be writing up four more [INAUDIBLE] like this soon from his dissertation work, which he just finished. So in other words, I convinced myself from doing these experiments with simple materials that the relation is you can't just frame it as, what is the correlation between confidence and accuracy? And yet that's the way it's been framed for 45 years now, like there's one answer to this.
There's not one answer. There can be several different answers. But let's cut to the chase. All this is word lists. This is a psych and law group. What about eyewitness memory, eyewitness situations?
Well, what I'll be telling you about next is one of the great things about the internet is you can do research with other people when they are not close to you. So John Wixted is happy now. After writing this chapter, he got all interested in these issues.
And he has totally converted his research program mostly to be about issues of psychology and law. And his former student, Laura Mickes, now at University College of London, and then Steve Clark and Scott Gronlund, two memory researchers who also have gotten interested in this--
SPEAKER: Steve will be here in September [INAUDIBLE].
HENRY ROEDIGER: Oh, OK, good. That we published a paper in American Psychologist last December. The paper was submitted. It was rejected very solemnly. John Wixted protested our rejection. It's a very slow process.
We have no idea what happened. But eventually it got accepted and came out, but over some very decided opposition that said, no, there is no correlation between confidence and accuracy, because we were saying that there was and, in other words, flying in the face of 35 or 40 years of research on this issue.
So how did we reach this conclusion in the eyewitness literature? First, to back up, one reason everybody believes the story that there is no relation between confidence and accuracy in eyewitness memory is the wrongful conviction problem. The Innocence Project has shown there are more than 300 wrongful convictions that have been reversed on DNA evidence.
And when you look at what kind of evidence got people convicted erroneously, mostly it was eyewitness testimony, about 75% of the cases. So people highly confident in court, they say, this is the person who did it. I'll never forget that face. And they're wrong.
So you can be highly confident but also wrong in court. And so that's one reason, kind of never mind the research, but just from first principles, there seem to be a lot of problems here.
And psychological science has been brought to bear on this problem. Why do these high confidence, false memories occur? Well, the traditional answer is because eyewitness confidence is worthless. It's uncorrelated or, at best, weakly correlated with truth.
And that's why these people get wrongly convicted. Is that the case? And I'll be saying that it's not the whole story. So today's argument, eyewitness memory researchers I think they've gotten it partly right and partly wrong.
Yes, confidence in court is pretty worthless. You don't get to court unless you're highly confident. If the witness is still saying six months later, I don't know, I think it was that guy, but I'm not really sure, well, that's not anybody a prosecutor is going to take to court.
But what I want to argue is that the right kind of confidence measure can be quite useful. And we're throwing away something that's really useful by ignoring confidence. And the trick is, it's when the confidence judgment is taken.
The very first time a witness is faced with a lineup, the very first time and probably only the very first time, that confidence rating can be very useful. By the time you get to court, it's not particularly useful because it's gotten inflated over time.
So let's just talk about-- you probably all know this. Let me run through this. There are two main procedures for looking at lineup identification. The standard one that's typically used is the simultaneous lineup.
So if you have a suspect, let's say that's the suspect in the lineup, then you put in five fillers who match in general characteristics as the suspect, hair color and that kind of thing. So it could be a live lineup. Or it could be a photo lineup like this. But the fillers are known innocents.
If a witness picks a filler, the filler is not going to go to jail, because the police-- often it's a police officer in fact. These are people picked from pictures, books, or from-- in real lineups, they're people who are known to be innocent.
So this is a simultaneous lineup. You look at all the faces at once. You make a judgment. And then, if they're doing it right, you make a confidence rating if you're a witness.
The other way of doing this is the sequential lineup. Oh, sorry, the fillers are all known to be innocent. You're asking, is that one person?
So the sequential lineup is the same idea. Now you see faces one at a time. And you say yes or no to each face and you stop-- at least in the real world, you stop when you get to one you say yes to.
In experiments, people do it both ways. You might see all the faces. You might stop when you get to this guy and the person said yes.
And so it's not part of my talk today. But one of the things John Wixted got interested in is this difference between simultaneous and sequential lineups. The one-minute version of this story is that police departments have used simultaneous lineups for a long time.
Gary Wells and colleagues did research some years ago suggesting that sequential lineups were better, using a particular measure, a ratio measure. It turns out, in retrospect, Gary Wells seems to have been wrong. There's now a huge amount of evidence coming from Wixted, Clark, and Gronlund, all the people shown there.
And I've actually published one experiment and have two more going on right now looking at this issue, too. And we all find, using signal detection theory and ROC analysis, simultaneous lineups provide better discriminability.
And yet Gary Wells and colleagues have convinced 30% of the police departments in the US to go from the superior simultaneous lineup to the inferior sequential lineup. It's going to be a real black eye for our field once all this comes out. But it's being fought, contested.
I'm telling you my opinion. If Gary Wells were standing here, he'd be telling you they're probably not different. He's come around to that point of view now. But he also doesn't believe in [INAUDIBLE] detection theory, it turns out.
So anyway, going back to all this, let's go back to confidence and accuracy. It doesn't really matter for this point. So the relationship between eyewitness confidence and accuracy again, you can have an initial lineup.
Let's say you're robbed. You got a look at the person who robbed you. The police later find people and ask you, put them in a lineup, and ask you, can you pick the person. And so you try to do that.
Let's say you do pick the person at the first identification. But then many identifications later, if you think about it-- you might have more than one lineup. Often, people have a photo lineup and a real lineup with real people. But then you're also thinking about it all the time. You're describing it to friends.
Once you've picked a person, you kind of imagine that person having done it as you think about the crime. And so when you get to court, you can be 100% confident. But maybe here you weren't 100% confident at the initial lineup. And that seems to be a big problem, where people's confidence grows over time. And we'll come to that.
Oh, I put this in to remind me. You can think of memory as being like DNA evidence or fingerprint evidence. Suppose you originally have a fingerprint here off a doorknob. But then 100 other people have touched the doorknob. Well, that doorknob will become contaminated.
Well, even a first identification, your memory is not pristine and pure like it was the moment after you saw the person. So even this first identification could be somewhat tainted by your own thought processes. But certainly by the time you get here, the memory is essentially contaminated.
What you say in court is influenced not just by what originally happened at the crime. But it's about all those retrievals that you did in between. And those, as we've known since the time of Professor Bartlett, can influence your memory.
The more you retrieve something, if you bias it in certain ways-- retrieval often helps the memory. But if you bias it in certain ways, it can also change it. And in particular, confidence becomes inflated.
So what we need to focus on, I'll argue by the end of the talk, is this first identification, asking about that. But even in the first stage identification-- so a typical experiment in this field is started by Lindsay and Wells in 1984. You see a simulated crime, a videotape maybe. Or maybe you see a real person out in the world. I'll show you an experiment like that in a moment.
And then later, it could be, and often is, the same session, just for convenience. Psychologists like to do experiments where it's one session. But it could be a delayed test, too. And a number of people looked at the confidence and accuracy relation in eyewitness identification on a first test, not later.
And so here's what Wells and Murray said about their results. The eyewitness confidence-accuracy relation is weak under good laboratory conditions and functionally useless in forensically representative settings.
Here's what Penrod and Cutler said in 1995. You shouldn't rely on eyewitness confidence. It's a weak indicator of accuracy even when measured at the time the ID is first made and under relatively pristine laboratory conditions. I'm going to argue this conclusion is wrong.
And the reason it's wrong was pointed out 20 years ago and still hasn't quite penetrated the field. I went the wrong way. It's not because correlations are from the devil. But that figures in again.
So in 1996, Peter Juslin, a Swedish psychologist, got interested in this problem. He did a straightforward experiment. And he analyzed the experiment in two different ways, so basically kind of like the Wells experiment.
So participants watched a videotaped theft. And later they attempted to identify the guilty suspect from a lineup. Standard stuff. And they gave a confidence rating.
And what they did was to plot the data a new way, a way that eyewitness researchers had never thought of plotting the data. And many people still don't if you go look at the literature.
They said, let's just put confidence averaging over everybody on the ordinate. And let's put the proportion correct on-- I'm sorry, the confidence of the [INAUDIBLE] proportion correct on the ordinate. And let's just plot what the relation is.
And guess what. It couldn't be much better. Low confidence is low accuracy. High confidence is high accuracy. And then they said, well, now let's go do what Wells and Lindsay and lots and lots-- all the eyewitness researchers are doing. They're using something called a point-biserial correlation.
The point-biserial correlation is just a Pearson r where one variable is a 0 or a 1. You either recognize the person or don't. And then you make a confidence rating.
And that's what everybody had used. And nobody had ever plotted the data this way in the field. And what they did was to show that the correlation coefficient, even-- so this is a calibration curve like metacognition people use, where you just plot confidence to accuracy.
So metacognition people in cognitive psychology do this all the time. But the psychology and law group never did this until the Juslin paper came along. And so they showed confidence was just about as good as it could be here. And yet the correlation they showed, the point-biserial correlation could be anything, even with data like these. It depends a little bit on the data.
But basically, even if you have perfect calibration in their experiment-- I think it was 0.3 something. So if you use the point-biserial correlation, which was popular in the research, it was about 0.40 in the Juslin.
And this is what people, after the Lindsay paper came out, had always shown. And they said maybe confidence is a little bit useful but not really very useful. And so what Juslin et al. pointed out was that the point-biserial correlation was simply an inappropriate way to measure this.
And they go into details about why this is so. But just take it from me that you can just see this looks like a better way if you want to know if confidence is related to accuracy without going into the details.
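One way to see the point without the full argument is to build perfectly calibrated data and compute the point-biserial correlation on it. The sketch below uses invented data (not the data from the Juslin paper), with ten hypothetical witnesses at each confidence level and accuracy set equal to confidence at every level. The helper names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def perfectly_calibrated(levels, per_level=10):
    """Made-up witnesses: at confidence level c percent, exactly c percent are correct."""
    confidence, correct = [], []
    for level in levels:
        n_correct = round(per_level * level / 100)
        for i in range(per_level):
            confidence.append(level)
            correct.append(1 if i < n_correct else 0)
    return confidence, correct


def calibration_points(confidence, correct):
    """Proportion correct at each confidence level (the calibration curve)."""
    by_level = {}
    for c, k in zip(confidence, correct):
        by_level.setdefault(c, []).append(k)
    return {c: sum(ks) / len(ks) for c, ks in sorted(by_level.items())}


narrow = perfectly_calibrated([50, 60, 70, 80, 90, 100])
wide = perfectly_calibrated([0, 20, 40, 60, 80, 100])

print(calibration_points(*narrow))  # accuracy equals confidence at every level
r_narrow = pearson_r(*narrow)
r_wide = pearson_r(*wide)
print(f"point-biserial r, confidence 50-100: {r_narrow:.2f}")
print(f"point-biserial r, confidence 0-100:  {r_wide:.2f}")
```

In both data sets confidence tells you exactly how often a witness is right, yet the correlation coefficient is well below 1 and shifts with how the confidence ratings happen to be spread out.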
And so what happened after this? This was published 20 years ago this year. Not much. Everybody in the psych and law community kept measuring point-biserial correlation, showing it was weak.
That's changed now. One person who changed it is Neil Brewer, who's done a lot of experiments at Flinders University in Australia. Let me just tell you about one experiment his group did. This was published in 2013.
Brewer is one of the people who caught on and said, yes, this is the way to do things. So here, the confidence-accuracy relationship, again, as though there can only be one, for eyewitness identification decisions. And he varied how long you see the person, how long you wait before you test them. And then you had a divided attention condition that I'm not going to talk much about.
So it was a neat experiment. It was done out in the field. So a researcher approached a subject. You can do it on the campus here.
So you come up to somebody and say, hi. I'm a psychology researcher. Would you mind being in a quick experiment for me. We'll pay you $10. And they say yes.
So then the second thing that happens is that another person who was standing behind a tree not far away moved out from behind the tree to be examined by the witness. Say, look at that person. They weren't committing a crime or anything. Just there they are.
And so in the long-exposure condition, they stood outside something like 15 seconds. And the short-exposure, it was 5 seconds. And then later, they were tested. In Australia, they use eight-person lineups. So they used an eight-person lineup.
And they ask later, either pretty shortly afterwards or they bring them back, find them later, and test them, using an eight-person lineup, getting accuracy and confidence from the person. And 908 subjects. Fun experiment to run if you're a graduate student. 908 times you have to stand out from behind a tree. So not one I would want to do. But I'm happy they did it.
And so what did they find? Well, here are the people who were choosers. Here are the ones who actually picked somebody. And what you see is it's not perfect. This is the calibration curve that would indicate accurate performance.
But I'm going to show you this plotted a couple of different ways. So what they found was that the most confident subjects were accurate about 80% of the time. The fact that the points are below the line shows that people were overconfident. They said, I'm 80% to 100% sure. They would be right there if they were accurate. But they were a little bit off.
And here, they're a little bit underconfident. Actually, when they expressed very low confidence, 20% confidence, they were accurate, more like 30% or 40% of the time. The red is the immediate test. The open is the delayed test. They both show pretty good calibration.
AUDIENCE: How long was the delayed round?
HENRY ROEDIGER: I think a week. I mean, it's not six months or anything. And I might be wrong about that. But I think it was a week.
And what seems to happen is you can change your subjective confidence. I mean, notice, the highest confidence are about equally accurate.
Now, probably there's not as many high-confidence judgments in the delayed case. But if they do make a high-confidence judgment, they're just about as accurate. So that seems reasonable from this.
So this is called a calibration plot. That line would be perfect calibration, like we saw Juslin had his subjects were somewhat better calibrated than these for whatever reason. But nonetheless, it's pretty good.
And oh, I think I just went through this with you, just showing if you're 20% confident here, you're more or less 35% accurate. But up here, it's more like you're a little bit under confident-- I mean, sorry, overconfident there. I think I said all that.
Now, if you go ask, they did the point-biserial correlation on these data. And the correlation is only 0.365. So they got the usual low correlation when you do it the way Wells and Lindsay suggested doing it and which people do it. But Brewer has shown, no, this isn't really the best way to assess confidence and accuracy.
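To make the contrast between the two measures concrete, here is a minimal Python sketch. The data are hypothetical and not taken from Palmer et al. or any other study: 100 witnesses at each of five confidence levels, with accuracy set equal to stated confidence so that calibration is perfect by construction. Even under that assumption, the point-biserial correlation (a Pearson correlation between a 0/1 outcome and confidence) comes out near 0.58, because each individual identification is either right or wrong.

```python
from math import sqrt

# Hypothetical, perfectly calibrated data (not from any study):
# 100 witnesses at each confidence level, and the proportion correct
# at each level equals the stated confidence.
levels = [0.2, 0.4, 0.6, 0.8, 1.0]
n_per_level = 100

confidence = []
correct = []
for level in levels:
    n_correct = round(level * n_per_level)
    confidence += [level] * n_per_level
    correct += [1] * n_correct + [0] * (n_per_level - n_correct)


def pearson(xs, ys):
    """Pearson r; with a 0/1 variable this is the point-biserial correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def calibration(conf, corr):
    """Proportion correct at each confidence level."""
    out = {}
    for level in sorted(set(conf)):
        outcomes = [c for lv, c in zip(conf, corr) if lv == level]
        out[level] = sum(outcomes) / len(outcomes)
    return out


r_pb = pearson(confidence, correct)
curve = calibration(confidence, correct)
print(f"point-biserial r = {r_pb:.2f}")
for level, acc in curve.items():
    print(f"confidence {level:.0%}: accuracy {acc:.0%}")
```

In this sketch the calibration curve sits exactly on the diagonal while the correlation stays well below 1, which is one way to see why a modest point-biserial value does not by itself show that confidence is uninformative.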
A bit about calibration plots. They almost always show what I just showed you when adults are tested. This will break down with kids, might break down with older adults. We really don't have much research there.
But at least with student populations, you almost always get the kind of calibration I have shown you. It's also when there's a fair lineup and when you don't have the administrator of the lineup, like a police detective, saying, hey, take another careful look at number three there just in case when you're about to reject the lineup.
And also, you have to have the confidence rating taken right after you make the judgment. You can't delay the confidence rating because then it tends to inflate.
But what Wixted has discovered and claimed-- and this is being debated a bit in the field-- is traditional calibration plots underestimate suspect ID accuracy because of the way they've been calculated. And he suggests changing that.
And the reason is they include-- the way this is typically done by Brewer, say, is they include filler IDs. They put that as part of the equation.
And Wixted argues we never have a problem with filler IDs. Filler IDs are identifying somebody known to be innocent. They're never going to be convicted.
He suggests a better way to do this is-- I just said that-- is to use a different procedure. So you've got the suspect and then the innocent fillers. But the filler IDs are not the problem. So he suggests using a better measure that he calls suspect ID accuracy.
And this is simply the correct-- so you see a target present lineup with a suspect in it. And the person picks that suspect, the one who, because it's an experiment, we know was the right one. And he suggests that the denominator should be correct suspect IDs plus incorrect suspect IDs.
Now, this is an issue I haven't talked about. But in these experiments, half the time you get a target absent lineup. So in other words, people would see that same lineup I just showed you. And yet they wouldn't have seen the person in the first place.
So the idea is this is kind of like a false alarm rate in a recognition memory experiment. The person who committed the crime or who stepped out from behind the tree is not there. But what's the probability of picking the person anyway maybe just because they stand out?
So if you use this measure, so suspect accuracy, correct suspect IDs divided by the same thing plus incorrect suspect IDs from target absent lineups-- I hope this is making sense-- then if you use that measure, you get calibration plots. And that stands to reason to me and to a lot of people because fillers are just not a problem. Why put fillers into the equation? You're never going to convict a filler.
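The measure just described can be written out as a short Python sketch. The function names and the counts are hypothetical, chosen only to illustrate the arithmetic for a single confidence level. The comparison score that keeps filler IDs in the denominator is an assumption about how the traditional calculation is set up, not a formula quoted from any paper.

```python
def suspect_id_accuracy(correct_suspect_ids, incorrect_suspect_ids):
    """Correct suspect IDs (target present lineups) divided by
    correct suspect IDs plus incorrect suspect IDs (target absent lineups)."""
    return correct_suspect_ids / (correct_suspect_ids + incorrect_suspect_ids)


def accuracy_with_fillers(correct_suspect_ids, incorrect_suspect_ids, filler_ids):
    """Assumed traditional score: filler IDs also count as errors."""
    total_ids = correct_suspect_ids + incorrect_suspect_ids + filler_ids
    return correct_suspect_ids / total_ids


# Hypothetical counts for the highest confidence level (illustration only).
correct = 90   # picked the guilty suspect in a target present lineup
innocent = 3   # picked the innocent suspect in a target absent lineup
fillers = 20   # picked a known-innocent filler

print(f"suspect ID accuracy:   {suspect_id_accuracy(correct, innocent):.1%}")
print(f"with fillers included: {accuracy_with_fillers(correct, innocent, fillers):.1%}")
```

With these made-up counts the two scores differ noticeably, which is the sense in which including filler IDs can pull a calibration curve down.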
And here's the Palmer et al. data that I just showed you, now using that formula. Palmer et al. published enough of their data so that you could recalculate these. And now look at the high confidence. They're almost perfect if you ignore fillers.
Here's the immediate. Immediate's a bit better than delayed. But interestingly-- and here's the main point-- what we really should be looking at is, are the people who are highly confident on an initial test also highly accurate? And the answer is yes.
High confidence is high accuracy when measured this way. If you use a point-biserial correlation, you still get 0.36. If you use this measure-- and again, computing the correlation coefficient is not correct here. You could. It would be very high. I can tell you why later.
But basically all you can do is point to this and say, is high confidence equal to high accuracy? Is low confidence equal to low accuracy? And it's at least much lower.
Here, even the people who weren't confident were still accurate about 80% of the time. So it's not worthless, even when it's low.
So a very influential paper came out in 1995. This was the first meta-analysis of this literature by Sporer, Penrod, Read, and Cutler. Reviewed the literature on confidence and accuracy. They reviewed 30 studies using simulated crimes like the kind I just talked about.
They found an overall point-biserial correlation of 0.29. And for choosers, people who actually make a choice, it was 0.41. And so they in the field, ever since this meta-analysis came out, they said this is a weak confidence accuracy, not strong enough to be used in court. We can pretty much ignore confidence has been the message to the field.
So we had the right idea, "we" being Wixted et al. again, of doing a new meta-analysis. Suppose we look at the new way. So the first thing we did was to say, can we find the data published in such a way that we can compute the measures we want-- the calibration plots that I've shown you that we wanted to measure?
So we wrote to all the authors who contributed papers to this meta-analysis to see if anybody happened to have their original data still sitting around. And the answer is John Read did, one person who contributed three experiments to that meta-analysis.
So we got those first to look at what's there. Here are John Read's data that he kindly provided. And remember, these are part of the data that led to this conclusion that confidence and accuracy are uncorrelated or weakly correlated.
Well, you see when you plot calibration plots, here's a Read paper from experiment 1 from 1992. Very good calibration. Experiment 2, good calibration. In fact, this is a seven-point confidence scale, five-point confidence scale. If they gave a five, they were 100% correct. It doesn't get much better than that.
Here, if they give a five, they were 100% correct. And yet from these data, because the point-biserial correlation was 0.37, the conclusion was reached that these data show that confidence is worthless when, in fact, they seem to have used the wrong measure would be the argument. The point-biserial correlation is just not informative to the issue.
AUDIENCE: Roddy? Can I ask you a question?
HENRY ROEDIGER: Yeah.
AUDIENCE: It'll be quick. It seems to me that the point-biserial correlation and these data answer different questions.
HENRY ROEDIGER: They do.
AUDIENCE: For example, let me ask the question, does the SAT predict success in college? If you look at these kind of data, you'll say overwhelmingly it's great. On average, it is.
If you look at individual correlations, you'll get about 0.37. In one case, you're looking at the accuracy for individual subjects. In the other case, you're looking at averaged accuracy.
So they're asking entirely different questions. The point-biserial, or the Pearson product moment, could be low and the average is extremely high.
So if you care about individual subject data, I would say that the point-biserial is the correct statistic. And if you care about group average data, on average how good is the SAT, or how good is the confidence, it is a different question from the question of, for me, individually, is the SAT going to be a good predictor for me individually as my confidence rating [INAUDIBLE]?
So it seems like there are two different questions. And it's not a question of which is better, but which question do you want to answer?
HENRY ROEDIGER: Yeah, well, I would--
AUDIENCE: Which question is [INAUDIBLE].
HENRY ROEDIGER: Yeah, I would frame that slightly differently. But I would argue this is the right way to look at it, how does confidence relate to accuracy? If I'm 100% confident, what's the likelihood that I'm highly accurate? And the answer is-- let me just finish up because I'm going to show you.
So we did a-- and we talk about this in our paper. So we've written a new paper. It's now been rejected three times-- because we're right.
And-- including your journal. Not you, but your journal. And so we decided, look, it's time for a new meta-analysis. It's been 20 years. Progress has happened. People are doing experiments. They're doing them better. Let's have a new meta-analysis.
And we'll look at point-biserial. Although we have a big section in the paper about why it's probably not the right way to do it. And we'll also look at it this way.
And I'm not going to go through the whole thing here. So this is Wixted et al. again. I'm part of the et al. Wixted's the heavy lifter here.
And let's see, I already said that. So I'm going to tell you about the meta-analysis. But it's been rejected three times. So if you want to ignore this, that's fine. You'd be like most of the field of psych and law. But I think there's an important point to be made here.
So I'm not going to show you. We've done this for every paper we can find in the literature. We include the three Read studies. We couldn't get data from the early days that were in the other. But there's no reason to think they'd be any different.
Here's one of the top three conditions-- I'm not going to tell you what they are-- from a paper by Brewer. Here's one by Brewer and Wells where they did identification for a thief and a [INAUDIBLE]. Here's one by Dobolyi and Dodson, simultaneous sequential paper.
Simultaneous produces a better-- these are ROC curves, confidence, accuracy. And you plot the curve. Simultaneous is better than sequential. But sequential is right on the line, too. Confidence is better.
Here's the average of those conditions over here. And you can see in every case high confidence is very high accuracy-- every case. And every case we've looked at, every experiment we've published-- not that we've published-- every experiment that everybody has published for which we can get the data look like this. We show lots of these graphs in the paper.
And here, if we aggregate over 10 studies that used 100-point confidence scales, here is what it looks like. And so high confidence is not perfect. But it's something like 97%, 98%. So it's very high. Averaging over all of these studies, there are no real exceptions.
This is fair lineup studies. If you used a biased lineup, this goes all to hell. And that is as it should be. Here, Mickes, Flowe, and Wixted used a biased lineup, where one of the people kind of stuck out like a sore thumb, one of the lures. And now you see confidence.
So the highest confidence led to only 70% accuracy. If you average those, same thing. So yeah, if you have a biased lineup, this doesn't help you. Some will say this is still a little bit better than sequential in terms of accuracy. But it goes away.
By the way, this shows why you can't just compute a correlation coefficient. The correlation coefficient would still be right here, right? It's monotonically increasing. It's probably 0.95 or something.
But what you really want to look at is, does high confidence give you high accuracy? And here it doesn't, even though the correlation coefficient would be high.
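A small Python sketch of that point, using invented numbers rather than the Mickes, Flowe, and Wixted data: accuracy climbs steadily across confidence bins, so a correlation computed over the bins is essentially perfect, yet accuracy at the highest confidence is only 70%.

```python
from math import sqrt

# Hypothetical biased-lineup curve (illustrative numbers only): accuracy
# rises steadily with confidence but tops out at 70%.
confidence_bins = [10, 30, 50, 70, 90]
accuracy = [0.30, 0.40, 0.50, 0.60, 0.70]


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r_bins = pearson(confidence_bins, accuracy)
top_accuracy = accuracy[-1]
print(f"r across bins = {r_bins:.2f}")
print(f"accuracy at highest confidence = {top_accuracy:.0%}")
```

The correlation describes how closely the curve follows a straight line, not how high the curve is, so it cannot answer whether a high-confidence ID is likely to be correct.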
So there don't seem to be exceptions here, somewhat surprisingly. So this is the take-home story based on the laboratory studies of the kind that have been done since the 1980s.
What about, people say, well, it's laboratory studies. Are confidence and accuracy really related in the real world? Well, it's really hard to ask that. There are two relevant studies. Let me just tell you about one that's the easiest one to explain.
Let me first get some water here or coffee. And this was a really interesting study done by Behrman and Davey. And they had analyzed eyewitness data from real crimes and real eyewitnesses in Sacramento, California.
And what makes this interesting-- I mean, the problem with doing this in the field is you don't know what ground truth is. You really don't know if the person identified is the right person. It could be mistaken.
But in this case what makes it a little bit different is there was strong independent evidence that the person was the right person. So let's say if you had a purse snatching, you have a lineup. The witness says that's the guy who snatched my purse. And later they find the purse in the guy's apartment. OK, probably that's the guy who snatched the purse.
So it's these kinds of data, so where they think a suspect ID, you probably got the right person because there was independent confirmatory evidence. So this analysis just depends on that.
And they had three response options in the lineup that they did. So it's, I'm sure that person, the one they picked out of the lineup, committed the crime. I'm not sure, but I think number so-and-so is the person who committed the crime. Or I don't recognize the person. I'm rejecting the lineup. It's nobody.
So this is really just a 2-point confidence scale. I'm sure it's the person. I think it's the person. And that's typical. Nobody in police departments typically uses a 100-point scale.
But let's see what happens with just a 2-point scale. And the data are thin. But when they said, I am sure that's the person, 92% were correct. And when they said, I think it's the person, they were correct half the time.
So there's another big study in Houston. Wixted published an analysis of it in the Proceedings of the National Academy of Sciences. It's too long for me to present here. But go read that if you're interested, where he came to the same conclusion.
And confidence and accuracy was not on the radar screen. It's not why they were doing these studies. The one in Houston was more about simultaneous sequential lineups. But they got accuracy for those.
So it seems like in the real world this might be true, too, from the two studies we have. So-- that's what I just said. So let me end up with a book by Brandon Garrett that you might know. It's a great book. I recommend it. It's called Convicting the Innocent and how-- Where Criminal Prosecutions Go Wrong. How do innocent people get convicted?
And so the usual story is because of high-confidence, erroneous eyewitness memory. And in a sense, that's true. So let me put this in Garrett's own words.
He got records from 161 DNA cases. And he went through the transcripts, everything they had on them. And in 57% of those cases-- so you've got 161-- in 57% of these erroneous cases, the initial eyewitness identification was made with low confidence.
This is the important point. So I've been concentrating on how high confidence means you're right, low confidence means you're probably making an error. This is really what police should attend to.
If a witness makes a confidence rating that's low, what the police typically do is say, yep, you got him. You're right. And their confidence shoots right up. And then they think about the crime over and over.
There's lots of famous cases where confidence just shoots right to the top. And by the time they get to the court, they've totally convinced themselves. It's like the famous Ronald Cotton case in North Carolina. That's exactly what happened there.
She was very-- it took her five minutes to make an initial identification. Jennifer Thompson took five minutes to make initial identification. Very low confidence. And then two days later, she was absolutely sure because the police told her, that's who we thought it was, too.
So a witness expressing low confidence is saying, I'm probably making an error. That's what low confidence means. But it only matters the first time you do it. That's what the police, detectives, and the defense should be concentrating on, not what they say in court.
It's very hard to ignore what they say in court. But what is critical is this first eyewitness one. And Garrett says in his book, "I expected to read that the eyewitnesses were certain at trial they'd identified the right person. They were, of course.
I did not expect, however, to read testimony by witnesses at the trial indicating they really had trouble identifying the defendants. Yet in 57% of the trial transcripts, 92 of the 161, the witnesses reported they had not been certain at the time of their earlier identifications."
Now, you could say, what about the other 43%? Were they high confidence at the beginning? The answer is we don't know, because nobody took a confidence rating in those cases. It's not that they were high confidence necessarily. They might have been. We just don't know.
But in many of these cases where justice was not served, an innocent person was put away, the witness was very low confidence starting out. And the confidence grew over time.
So to wind this up-- why do I have all these transitions? Don't discount initial eyewitness confidence. That should be the message, I think, to the legal system now.
Eyewitness confidence is telling you something, the initial eyewitness confidence. If it's high, I wouldn't personally say convict on that. You need converging evidence. But don't throw it away. It means something.
And if it's low, if the person says, I think it's that person, but on a five-point scale, I'm a two, well, that means you're really probably making an error. I mean, you're probably better than chance. But there's a high likelihood of making an error from low confidence. So initial confidence is very useful. And we're right now throwing it away by saying confidence isn't useful.
So the low confidence could be really useful for exonerating people. That's what should be focused on. And high confidence is fine in the courtroom. But you should go back and look, were they high confident at that initial-- I mean, they're always going to be high confident in the courtroom.
You should go back and look, were they high confident all the way along? That case is much more believable than if they go from being very low confident, over repeated testing, to very high confident.
Nobody knows exactly how confidence grows. I'm doing a bunch of experiments with some students right now just asking, how does confidence change with repeated tests?
I can't find but one experiment somewhat relevant to the issue, kind of surprisingly. We all know it happens. But how? And why? I think I've said that, too. And so I am done. Yes? Thank you.
AUDIENCE: I was just wondering if there's some benefit in not studying the relation to confidence and accuracy, because it seems to have a lot of debate about that, but instead studying the way that people parse sequential events and what they focus on. So if you focus on the victim, do not take their testimony as heavily as someone who would focus on the perpetrator.
HENRY ROEDIGER: I'm not sure I understand your qu-- so say it again. I'm not sure I'm quite getting it.
AUDIENCE: So instead of studying how confident someone is in identifying the subject, studying how they would-- so if they were given a narrative of the situation itself, studying how they parse the sequence of events, so whether they frame like the victim or if they frame it like the perpetrator.
HENRY ROEDIGER: Oh, yes. That would be interesting. I have a colleague, [INAUDIBLE], about event segmentation from different perspectives. And there is research like that. [CHUCKLES]
In talking to police, they keep thinking psychology keeps making their lives more difficult, which we do. But they mostly will-- prosecutors-- well, I'm trying to think, what would this do to help a police officer trying to get the truth, or a detective trying to get the truth of the matter, as opposed to eyewitness, a suspect ID obviously does.
So I think what you're asking is interesting. It reminds me of some famous experiments in our field, where if you walk through a home and you-- say, I show you a videotape of a house. Imagine you're a homebuyer, and you go through. And then you recall things from the perspective of a homebuyer.
And now you say, well, imagine you'd gone through as a burglar. Now, go back and recall the events of the house. Well, you get a bunch of different events or a bunch of different things in the house. Oh, because you're looking at it with a different perspective.
If you're getting at something like that, it really is true. As you analyze things from different perspectives, you encode and remember different details. I'm trying to think of how that might help in this case. By the way, confidence and accuracy for all kinds of events and stuff like that is-- I mean, people have looked. And it's pretty high across events like I showed.
You can get tricky events, like I also showed, where it goes away. So maybe that would work. I'm not sure. Yes?
AUDIENCE: What if you artificially manipulated somebody's feeling of confidence? Would that have any impact on their accuracy?
HENRY ROEDIGER: It's been done in the lab. Larry Jacoby has done studies like that in the lab. And you can, there, manipulate confidence and not affect accuracy. But I kind of doubt it.
I mean, I'm trying to think of what a plausible way would be. I mean, it's kind of what police do naturally when they tell a suspect who might have low confidence, we think you got the right one. Well, their confidence shoots up. But their accuracy stays right where it was.
AUDIENCE: But I could even imagine it having a negative impact on accuracy.
HENRY ROEDIGER: Oh, yeah. I don't know of experiments like that or how exactly you do that. Valerie?
AUDIENCE: Yeah, perfect timing because I am talking about eyewitness testimony in my social science and law class over at the law school across campus.
HENRY ROEDIGER: OK.
AUDIENCE: So thanks, this is a really interesting set of results. It strikes me-- it's actually in line with your comment-- that the biased lineup results that you were showing, where confidence and accuracy go down, is kind of like the courtroom testimony, right?
Lots of other factors are influencing confidence. It's something going on in the experiment that could be artificially increasing confidence, like ways in which a particular suspect looks unusual or sticks out or something like that.
HENRY ROEDIGER: Mhm, yeah. I mean, I'm sure there are ways you can gimmick confidence. The point, I think, is that in all these studies that over the years have been taken as having low confidence-accuracy relationship, whatever their benefits and flaws, we're using those same data, just analyzing them in a different way.
So it's hard to say we're using a biased data set, because we're using the set-- [CHUCKLES] we didn't generate many of these results. Wixted's doing some. I'm doing some.
But the relation still stands there every time we look at it, when you look at it that particular way. Now coming back to Bob's point, you can argue that maybe it's not the right way to look at it. But actually now, the whole field is swinging to say, yeah, Juslin was right in 1996. That's what we should be doing.
SPEAKER: Question behind you.
HENRY ROEDIGER: Oh, sorry. Yes?
AUDIENCE: Yes, so I got a two-part question actually. So I'm trying to put together the first experiment you had with just the related words. And then you found a negative correlation, like if people falsely accepted a related distractor, it's related to high confidence. So that's kind of like picking a filler in a lineup.
HENRY ROEDIGER: Yes, it absolutely is. So good point. Now, could you-- and it comes back to your point-- could you rig this so it doesn't work?
Well, I used to use this example that I thought was hypothetical. But then I googled it and discovered there are all kinds of cases like this.
Suppose one identical twin commits a crime, and the other identical twin is arrested and put in a lineup. Well, guess what? [CHUCKLES] Well, Google that problem. It seems to happen in California every week. Just in California, I found four or five cases like that.
So you could imagine, yes, if you have-- and that's a biased lineup essentially. If you have somebody in the lineup who looks very similar to the person who committed the crime but is not that person, you will get a mistake.
The point of all these studies is-- and even in the field ones-- have been looking-- some police departments take photographs of all their lineups, which I think is a great idea, because then you could just go through-- suppose you just gave a lot of people target absent lineups. You just showed them lineup after lineup and said, one of these persons we think is a criminal. Pick them.
Now, if people can do that much better than chance from the lineup, it's a biased lineup by definition. You didn't see the crime. But you're able to pick the criminal.
Well, they haven't put good fillers in. They haven't put plausible fillers. They've made the person stand out, even if the person didn't commit the crime. I haven't seen any evidence like that. But I think it would be really neat. And some police departments do this.
AUDIENCE: They used to do that in Arizona. Arizona has a mandatory eyewitness pre-trial, [INAUDIBLE] hearing. And it was legitimate to take the lineup and go out and ask 30 people if you're doing expert testimony.
HENRY ROEDIGER: Yeah.
AUDIENCE: One of these people committed a crime. Who do you think it is? And if you've got a significant chi-square, you could present it as evidence in court.
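The mock-witness check described here can be sketched in Python. The counts are invented for illustration (30 people who never saw the crime, and an assumed six-person lineup), and the test is a plain chi-square goodness-of-fit against equal choice across lineup members. The constant 11.070 is the standard tabled critical value for alpha = .05 with 5 degrees of freedom.

```python
# Hypothetical picks from 30 mock witnesses who never saw the crime,
# for an assumed six-person lineup (position 3 is the police suspect).
picks = [2, 3, 15, 4, 3, 3]

# Tabled chi-square critical value for alpha = .05 with df = 6 - 1 = 5.
CRITICAL_05_DF5 = 11.070


def chi_square_uniform(observed):
    """Goodness-of-fit statistic against equal expected counts per position."""
    total = sum(observed)
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


stat = chi_square_uniform(picks)
looks_biased = stat > CRITICAL_05_DF5
print(f"chi-square = {stat:.1f}, exceeds critical value: {looks_biased}")
```

If people who never saw the crime pick one lineup member far more often than chance, as in these made-up counts, the lineup itself is pointing at that person.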
HENRY ROEDIGER: Yeah, I mean, that's the way to do it. But I would be curious just if you took hundreds of lineups that police construct, is there some kind of bias in there? I mean, you could ask that just by-- you could do it on MTurk. [CHUCKLES]
Show all those MTurkers out there in their parents' basements lineups. And see if they can pick the right person. Yes? Oh, I'm sorry. Let me get her and come back to your second part.
AUDIENCE: This might be quick for you. There's been some claims by Simoni, Getty, and others that children actually are not so good at answering confidence questions, or actually they answer them well. And it correlates with their reaction time. So there's a speed-accuracy tradeoff, right? So if you're accurate, you may also be faster.
So her claim is that what they're doing is that they're kind of picking up on how quickly they responded to the question. And that actually correlates with their confidence ratings when they are old enough to make them explicitly. Is there something like that with adults? I mean, are those confidence ratings a response to speed?
HENRY ROEDIGER: I don't think in the real world they ever get speed. But usually speed and confidence and accuracy are all correlated. I mean, you can get--
AUDIENCE: And is there anything on the negative correlation side, so with those false alarms, is that when speed and confidence break?
HENRY ROEDIGER: They would fall apart because when you make-- Chuck and I both study high-confidence false alarms from not using this paradigm but another paradigm. So this is something called the Deese-Roediger-McDermott paradigm.
I'm not going to go into it. But it very reliably produces high-confidence false alarms. They're also usually very fast. We have looked at that, sometimes as fast as hits, so actual items.
And so you can get people fooled very badly by something. It would be like the identical twin experiment in a lineup. Yeah, I'm sure that's the one. But it's the twin brother or something. So it's a good point. And I really don't know the children's literature. You would.
AUDIENCE: It doesn't [INAUDIBLE]. With adults, the idea that, what is saying 'I'm confident'? What does that mean, "I'm sure'? Are you responding to your accuracy, your perceived accuracy? Or are you just responding to how quickly you can respond?
HENRY ROEDIGER: Well, the claim is from people who study recognition as a process, that there's a whole distribution of memories that have a certain strength. And the further you are onto the strong side of the distribution, you'll get more people saying yes to an item. And they'll be doing it faster. So that's one of the underlying theories behind all this.
So yes, those things would be correlated. Now, how subjects are making it, 'oh, because I did this quickly, I think I'm confident', or whether--
AUDIENCE: It's almost like you don't know that's what you're responding to. But what you're really responding to is how quickly you made the decision.
SPEAKER: The person to the right of you can explain to you a paper by a person who's very well known and beloved by her on this very issue, in which people who were very fast were very good eyewitness ID people. Quite controversial paper in its time.
AUDIENCE: Yeah, 1991 and '92 [INAUDIBLE].
HENRY ROEDIGER: OK.
SPEAKER: David Dunning.
AUDIENCE: David Dunning.
HENRY ROEDIGER: OK.
SPEAKER: One more question--
HENRY ROEDIGER: She has a second of her--
AUDIENCE: I'm sorry.
HENRY ROEDIGER: Yeah, she had a second part.
AUDIENCE: So going back to-- [CHUCKLES] sorry. If accepting a related filler is correlated to high confidence-- and you had that one slide where the calibration method was correct suspect ID over correct suspect ID plus the filler ID, and because that includes the filler ID and a filler ID has that wonky backwards correlation, then could that be why we're seeing the--
HENRY ROEDIGER: Well, the filler IDs add noise but not signal. I mean, no police department is going to say, oh, we thought it was this guy. But really this guy who's sitting out in the lobby, I guess it was him. Filler IDs are never going to be taken seriously. So there's no reason to put them in the equation.
I mean, sometimes people do make high-confidence filler IDs. I mean, it shows confidence is not perfect. You can make a high-confidence error. I'm not saying it's perfect. I'm just saying we shouldn't throw it out because-- and which study was it?
In the Houston study, I think, which was a huge study, I think there were 17 out of hundreds of target absent lineups. There were 17 filler IDs people made with high confidence. So it does happen.
I forgot what proportion that is. But it's not a vanishingly small problem. But you're never going to take someone to court. I mean, it's a filler ID. So I'm not sure that answered your question. But Chuck's right. I've got to go catch a bus to New York City.
SPEAKER: That was the last time [INAUDIBLE].
HENRY ROEDIGER: Thank you.
Henry L. Roediger, III, James S. McDonnell Distinguished University Professor, Washington University, presents different methods of analysis to understand why cognitive psychological studies show high correlations between confidence and accuracy in reports from memory in lab settings, but weak or nonexistent relations in simulated crime studies.