[MUSIC PLAYING] SPEAKER 1: I am delighted today to introduce Lenka Zdeborova, giving her second Bethe lecture. And I am charged with giving you an introduction again to Hans Bethe. I gave my funny introduction Monday when she gave the colloquium, and this is going to be a slightly more serious one.
Hans Bethe made tremendous contributions to an enormous number of fields of science. He came to Cornell in 1935 after several substantive contributions. In 1939, as one of his contributions, he showed how the stars shine. He worked out the nuclear physics of the stellar reactions. For that he got the Nobel Prize in 1967.
In the 1940s, he led the theory effort in the Manhattan Project. And Paul Ginsparg here just told me, or reminded me, that he was the one who figured out that the H bomb would not cause the atmosphere to explode, and not, as the Oppenheimer movie suggested, somebody else. He was a major influence in the Cornell physics department, naturally enough, in setting a tone of collegiality that survives to this day. So we have really very friendly, cooperative feelings about making our department strong and friendly.
In 1975, he retired after 40 years, but he continued to be active. And he would give the first colloquium every fall, the colloquium being named after him. And at 97, he was still reporting in the colloquium about new advances in the study of supernovas. In 1986, he explained why the sun hadn't shut down, the so-called solar neutrino problem.
And the other thing he did-- when you're famous, you can either strut around and be important, or you can try to do good things for society. And he would lend his name very selectively to things that he felt he was qualified to make judgments about, which include opposing the H bomb project, defending people in the McCarthy era, opposing Star Wars, and promoting and lobbying for the Limited Test Ban Treaty.
Now, this is not all about Hans. It's about Lenka Zdeborova and what she's presenting today. And I have a brief introduction. Some problems are easy to solve by computers, others are fiendish. Asking about the fiendishness of all the examples of a problem of a certain size-- a traveling salesman problem with 1,000 cities to visit, a spin glass with 1,000 spins-- turns out to be a wonderful playground of statistical physics.
Zdeborova today will introduce her powerful tools that not only predict the fiendishness distribution and explain why some problems are particularly evil, but also provide the best methods for solving these problems. Amazingly, just as large numbers of atoms form solids, liquids, and gases, she will show you how large numbers of spins or maybe cities form entire phases, each with new properties. And naturally, she'll tie it all into ideas about deep neural networks like ChatGPT.
Lenka did her graduate work in Paris and in Prague, had a prestigious fellowship at Los Alamos National Lab, and worked at one of the major universities in the Paris area for 10 years. She's won several prizes over the years. She now runs the statistical physics of computation lab at the university in Switzerland where all my European friends want to work. Let us watch her, as she says, erase the boundaries between theoretical physics, mathematics, and computer science.
[APPLAUSE]
LENKA ZDEBOROVA: Thank you so much for the introduction and for the opportunity. Let me try to set this up. There we go. Thanks for coming. So as introduced, I will get to the lecture on bridging physics and computer science and understanding hard problems, but talking to colleagues over the three days here already, I realized that not everybody knows where EPFL actually is. So seeing the nice view from the physics department, I thought I would also share the view of our campus. And that's our campus. The physics building is this one, and this is the lake and the Alps behind. That's where it is.
So without further ado, I will start by reminding you of some high school physics: phase transitions.
[AUDIO OUT]
And here is the temperature in Celsius. At 100 Celsius, water boils and becomes vapor. At 0 Celsius, water becomes ice. But there is a whole diagram here with the critical point, and this is where the water is, this is where the ice is, as the pressure is changing and the temperature is changing and it's--
SPEAKER 2: So I have to-- water vapor in English means particles of water in air. What you mean is steam.
[LAUGHTER]
SPEAKER 3: We call it water vapor here.
LENKA ZDEBOROVA: Yeah. This is just snapshotted somewhere from the internet, so--
[LAUGHTER]
To me, it makes sense. I speak French. The next slide is in French, actually, just if you want to be confused even more.
[LAUGHTER]
But I find this slide beautiful. It actually tells you where those pressures and temperatures, where these things happen, actually are, because what I hope you see here, this is the temperature and pressure on Venus. This is the temperature and pressure on Mars. This is Earth. So that puts in perspective where this triple point actually is and where this critical point is.
And I suppose that the French is also a universal language.
SPEAKER 2: [INAUDIBLE]
LENKA ZDEBOROVA: [LAUGHS] Great. So why do phase transitions happen? That was a problem in physics in the last century. We didn't really know. Does it mean that at that temperature the molecules suddenly start to interact differently, or what happens? And the really fascinating thing is that this all happens because the molecules are many.
And that's it. Nothing else changes at that critical temperature, critical pressure. It's because the molecules are many. And in one mole, there are 10 to the 23 of them. That's really, really many. And it's taking the limit of the number of molecules to infinity that makes these phase transitions emerge. So that's a fascinating result from last century's physics.
Now, "more is different." That's the way it was put by Phil Anderson in his quite famous essay, where he says the behavior of large and complex aggregates of elementary particles, it turns out, is not to be understood in terms of a simple extrapolation of what happens for a few particles. Instead, at each level of complexity, entirely new properties appear, and the understanding of the new behavior requires research which, I think, is as fundamental in its nature as any other, referring to understanding the universe. So understanding what is different when there is more is what he was arguing for.
So there is one property related to this phase diagram and phase transitions in water that we may be less familiar with, because it appears less often in everyday life. We boil water every day. In winter, we slide on ice. But this metastability, what is this about?
So this video shows you water that has been put in a freezer and cooled down below the freezing point, say minus 5 degrees Celsius. The water stayed liquid until you took it out and hit the table. As you hit the table, it kind of shakes the plastic. There is some microplastic that splits off and gets into the water, and creates a nucleus.
And it's around that nucleus that suddenly you see the wave, you see the opaque front propagating. That's the water freezing, and you see how it's propagating in the water. And it happens when the impurity appears, and the freezing starts around that impurity. So you can supercool water, and then suddenly freeze it this way. The water that was supercooled is called metastable.
A similar phenomenon can actually happen in clouds, where, if the air is clean, the droplets of water in clouds get supercooled. So they are still water. They are still rain, not snow. And as they are falling down, they keep being water. But once they hit a surface-- for instance, a car or something where there is enough dirt to nucleate on-- they immediately freeze. And that can create things like that. I hope this will not happen to any of our cars. That looks scary.
What happens as the water freezes, I want to demonstrate on this example, which is a hand warmer. This is a little-- I actually brought a few of them with me, and I also have a video which I will switch to. It has a liquid in it. This is liquid. It's soft. That is sodium acetate, a chemical that melts at 58 degrees Celsius. Here we are well below that. We are at about 20. And it's still liquid. That means it should be solid, because it melts at 58, so higher. So it should be solid at room temperature. Yet it's still liquid. It's a supercooled liquid.
And then there is this coin, this kind of gray part that you see. So let me try to switch to the video. Oh, this way. This way you can see, in big, what actually happens when I pinch the coin. And let me have at least one person in the audience do it too, so that you actually believe me that this is happening. Please do the same with me, Bart. There is the coin. You pinch the coin, then you see the crystal propagating as a wave through the whole hand warmer.
And it's a hand warmer, because as this is happening, it's warming. It's becoming quite warm. I can send it around. Like now, this is crystallized. It's quite warm. It stays like that for a while. And the fact that it became warm, it really is the heat that proves to you that the liquid was not the stable phase, that it was the crystal that was the stable phase, because it had lower energy. The liquid had to get rid of the energy so that it became the crystal at lower energy. And it has to release the excess energy, and that's the heat. That's what warms it. That's what happens here.
So why did it stop? I don't know, but hopefully it was visible. Maybe let's play it one more time so that we see this nucleus growing. You pinch it. And then you see the white around it, and then you see that it's growing at a certain speed. It's the front propagating through the whole material. So this is a consequence of a phase transition, a first order phase transition, that we are maybe less familiar with from everyday life.
Now, when we explain what happened with the crystal growing, we call it nucleation in physics. And when we teach undergraduate physics and explain nucleation, we give an argument that is summarized on that slide in some equations. But the essence of this slide is that, as you create the impurity, there is randomly a creation of the other phase in the system.
And then it's about the competition between the volume of this random impurity and the surface of the random impurity. The volume makes it gain energy, because the phase that is nucleating has lower energy, but it's losing on the surface. And then the droplet needs to be big enough so that the volume wins over the surface. And that's the key here.
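For those who want the equations, here is a minimal LaTeX sketch of that standard classroom argument; it is a reconstruction of classical nucleation theory, not the slide itself.

```latex
% Classical nucleation theory, sketched: a droplet of radius r of the
% stable phase gains bulk free energy but pays a surface cost.
\[
  \Delta G(r) \;=\; -\,\frac{4}{3}\pi r^{3}\,\Delta g \;+\; 4\pi r^{2}\sigma ,
  \qquad
  r^{*} \;=\; \frac{2\sigma}{\Delta g},
\]
% \Delta g > 0 : free-energy gain per unit volume of the stable phase
% \sigma      : surface tension of the droplet's interface
% r^{*}       : critical radius; droplets larger than r^{*} grow,
%               smaller ones shrink back.
```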
So the volume of a large droplet is larger than its surface. The way you can imagine it: imagine that you have a jar and you are putting balls, or say beans, into the jar. Now it's a medium-sized jar, and you're counting how many of your beans are touching the surface, and you're counting how many of your beans are not touching the surface.
And you are looking at what fraction of all of them are not touching the surface. And if your jar is small, well, maybe everybody will be touching the surface, or almost everybody. But as the jar grows, almost everybody will actually not be touching the surface.
So that's always happening in three-dimensional space. As jars grow bigger, the fraction of beans that are not touching the surface goes to almost everybody. And this is the key property that enables this nucleation. Now, remember, the volume is eventually larger than the surface. In our space, this is true. It won't be true in high-dimensional spaces, and we'll get there in a bit.
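To make the contrast concrete, here is a tiny Python sketch (my own illustration, not from the talk) comparing a 3D jar, where the boundary fraction shrinks, with a tree-like graph, previewing the case discussed later, where it never does.

```python
# Boundary fraction: 3D ball vs. a tree-like graph (illustrative sketch).
# In 3D, the fraction of beans touching the jar's surface shrinks as the
# jar grows; on a tree, the last generation always holds a constant
# fraction of all nodes.
for R in [2, 5, 10, 50, 100]:
    volume = R**3          # beans inside a ball of radius R (up to constants)
    surface = R**2         # beans in the outermost shell (up to constants)
    print(f"3D jar, R={R:3d}: boundary fraction ~ {surface / volume:.3f}")

k = 2  # each node of the tree has k children
for depth in [2, 5, 10, 20]:
    volume = sum(k**d for d in range(depth + 1))  # all nodes up to this depth
    surface = k**depth                            # nodes at the last depth
    print(f"tree, depth={depth:2d}: boundary fraction = {surface / volume:.3f}")
```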
So here is another example, because in this hand warmer, this metastability looked quite fragile. It's enough to pinch it and it becomes a crystal, and you can bring it back so that it melts at 58 degrees Celsius. But sometimes metastability is very stable. And that's the case of diamond. Did you know that diamond is actually not at its equilibrium, in the sense that it could suddenly turn into graphite-- which is what we write with. That's what the pencil has in it.
So it's the same element, and if diamond turned suddenly into graphite, it would warm up. It would release energy. Diamond at normal pressure, normal temperature, is metastable. It was created somewhere in the early history of Earth, and/or somewhere deep where there is big pressure, big temperature. That's where it was created. Then things reshuffled, it got to the surface, and we have some diamond today.
But if you want to create it artificially, you have to take graphite and go to very high pressure to create artificial diamond. It's not easy. At the same time, if you bought a diamond ring for someone close to you, you're not really worried that it will suddenly nucleate into graphite. You didn't hear of that happening to friends of yours. That's not happening. It's very, very stable. It lives forever, for all we are concerned.
So that was what we needed to know about phase transitions in physics for the purpose of the talk. So as said, now we will be talking about phase transitions in computer science and how that comes together. The example that I will be talking about is graph coloring. What is it? Let's introduce it on this natural problem of map coloring.
So this is a map of the Czech Republic, where I come from, with its regions. The blue one in the center here, that's Prague. Many of you know Prague as the capital of the Czech Republic. There are at least two other cities that you know. You may not know that you know them, but you know this one here that's called Plzen in Czech. That's where I was born. But in German, it's called Pils. [LAUGHS] That's where the Pilsner comes from-- er is the adjective ending for Pils in German.
And another one that is in this blue region here-- it's a smaller city than Plzen-- it's called Budweis. That's where the original Budweiser comes from. They're still brewing it, the Czech one. It's pretty good. No comment on the local one. I don't want to cause trouble for myself.
So it has been known for centuries that if you make a map and want to color every region with one color, so that neighbors do not have the same color and you can distinguish them visually from each other, then using four colors is enough. It's one of these mathematical properties that took humanity more than a century from its conjecture to its mathematical proof. So it's one of these famous problems in mathematics, computer science, science in general.
Now we will be talking about coloring of graphs. How do I create a graph from a map? I do the following thing. I make a point. Instead of coloring the whole region, I just put a point in the middle. And then if two regions are neighbors, meaning they share a boundary-- I can cross from one to another without going through yet a third one-- then I put an edge there. I put this black line in between. Those are all the edges between the neighbors.
And now I forget about the map, and I have a graph where the nodes are colored. And the rule of the game here is that if two nodes are connected with an edge, then they do not have the same color. Because they were neighbors, I colored them with different colors so that I don't confuse them.
So this was a so-called planar graph, because it was created from a map. The edges were not crossing. But we can also make graphs where the edges are crossing-- a lot of edges crossing. And if you make such generic graphs and ask whether they can be colored with a given number of colors, that's one of the evil problems in computer science. It's one of the NP-complete problems, where it's believed that a computer that solves the worst cases of this problem would need exponential time.
Now, this exponential-- unless you are a scientist, you may not have an intuition for what that actually means, so let me explain, right? In the worst case-- here, the nodes are colored, so you see that only those that have different colors are connected. But imagine that I did not put the colors and gave you this version. It's very hard to find how to put the colors so that this works. You would basically have to try all the possibilities, meaning the following.
This node can have three colors, so you try all three of them. This one can also have three colors, so you also try three of them. That makes 3 times 3. That's 9. The third one can also have three colors, so 9 times 3 is 27. And now, if these numbers, 9 and 27, were meters, OK? By the time you get to node number 12, you would be in New York.
And by node 18, you would be passing the moon. That's how fast the exponential grows. And that's how much time a computer would need to solve these problems in the worst case. This is just growing too much. We would not be able to solve any decently large problem. This is not possible. That's too much.
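A short Python sketch (my own illustration of the arithmetic in the talk) makes the point:

```python
# Brute-force 3-coloring of n nodes tries up to 3**n assignments. Reading
# that count as meters shows how fast the exponential grows.
MOON_M = 384_400_000  # Earth-Moon distance in meters

for n in [3, 12, 18, 30]:
    meters = 3**n
    print(f"n={n:2d}: 3^n = {meters:,} m  "
          f"(~{meters / MOON_M:.2g} Earth-Moon distances)")
```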
So computer science, over half a century, developed a beautiful theory. These are the foundational papers of the theory of how we actually recognize these NP-complete problems, which, in the worst case, we believe need such a long time. Now, the trouble with this theory is that there is no trouble with this theory. But what is actually happening today is that these NP-complete problems are being solved every day.
In machine learning, for instance-- that's just one example-- and in many other areas. We know how to teach a computer to play Go. We know how to teach a computer to fold proteins. We know how to teach a computer to converse with people, as in ChatGPT. Now, learning of neural networks, we know, is NP-complete in the worst case. It's one of those evil problems.
Yet for the problems we care about, some magic seems to be happening, and this is not preventing us from solving them. So what we do not know-- we don't have a theory that explains which instances, which cases of these hard problems, actually are not so hard, are actually computationally tractable. So we know that the worst ones are horrible, but not all of them are horrible. And we do not have a good, satisfactory characterization of how to distinguish the worst ones, the evil instances, from the ones where solving actually is possible.
So that's what motivates us: we need to work on this more. And now, talking about neural networks that learn all these things, we really don't understand even much simpler questions about neural networks, such as, in some very simple example, a neural network that takes images and classifies them between airplanes, and automobiles, and birds.
And this is done in many works and in many papers. We know how to teach computers to do that, but many papers showcase this on this same data set. CIFAR-10 is the name of a collection of images that everybody is using to showcase their algorithms, in which we have 5,000 pictures per category. So how many pictures do we need to actually recognize that a picture of a dog is a picture of a dog? 5,000? That seems a lot. 1,000? 100? You don't know. I don't know. Nobody knows.
We don't know how to characterize the minimum number of samples that is actually needed to achieve this task. We have the current system that does it in a certain way, but we don't know whether it can do better. And if we somehow knew that it could do better, we do not know whether it's because we need to improve the process, the algorithm, or whether we need to change the neural network somehow, or whether we need to do something completely different than training a neural network. We don't know.
So in order to advance on this problem, in science we simplify things. So we go back to graph coloring. That's a simpler setting. And we will be looking at a particular instance of graph coloring that I will define, or introduce to you, as a kind of card game between you, the audience, and me, the speaker, the presenter.
So the game will be the following. Imagine I have a deck of cards. I have three different cards in the deck, blue, red, green. And there is roughly the same number of each. And I shuffle my deck, and then I go around and I give one card to every person. You don't show me the cards. Everybody has one card. I don't see the colors. My goal will be to guess the colors.
Now, if I were a magician, I wouldn't have to do more than that. I would somehow guess the colors, right? But I'm not a magician. I am, at best, a mathematician, physicist, computer scientist, mushroom hunter, but that's not useful here. So I need to observe something. Here is what I will observe; I will ask you very kindly for the following thing. I will randomly choose-- or you will. You will randomly choose among you a couple that does not have the same two cards.
So if, I don't know, Bart and Jim both have red, then this couple will never come up, OK? But say Ahmed has green. So for instance, Jim and Ahmed can come up. Bart and Ahmed can come up. From all those that have differing colors, all those can come up. There are many possibilities. You just give me one, and then give me a total of m of these pairs that do not have the same color.
So I collect it, right? I write, OK, Jim-Ahmed, Ahmed-Bart, et cetera, et cetera. I make a list of these m pairs. And this is the information I have. Now, can I do it? Based on this information, without being a magician, can I compute back who had which card?
So you notice that I didn't really get any information about the colors. So at best, I can split the room into three groups so that each of the groups has the same card. So let's think about it this way. Can I split the room into three groups, without ever seeing your cards, just from the list, so that if you then show each other your cards within each group, most of you in a group will have one color, and the other two groups will have the other two colors?
So that's my problem here. Now I forget the people. I'm back at the graph. I forget the colors. You didn't show them to me. This is what I observe. I observe the list of edges, this graph. That's what we call a graph. And I'm asking, can I find the groups of nodes that had the same card? That's the problem at which we will be looking, OK?
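In code, the setup of this game might look like the following minimal Python sketch; it is my own illustration, and the function name and parameters are made up.

```python
import random

# A minimal sketch of the card game: colors are planted at random; we then
# reveal only m pairs whose colors differ, never the colors themselves.
def planted_coloring(n, m, q=3, seed=0):
    rng = random.Random(seed)
    colors = [rng.randrange(q) for _ in range(n)]   # the hidden cards
    edges = set()
    while len(edges) < m:
        i, j = rng.sample(range(n), 2)
        if colors[i] != colors[j]:                  # only differing pairs come up
            edges.add((min(i, j), max(i, j)))
    return colors, sorted(edges)

hidden_colors, observed_pairs = planted_coloring(n=300, m=600)
print(f"observed {len(observed_pairs)} pairs; "
      f"average appearances per person: {2 * len(observed_pairs) / 300:.1f}")
```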
So now, how do we solve this? That's where the hero of the day-- OK, no, it's not yet. Sorry, spoiling my punch line here. Now, this is just to say that this is not just a toy problem that I invented to amuse you, OK? This is a special case of clustering of data, and that's a problem that appears in the sciences and is very important in many applications. And this is just a toy realization, or toy model, for clustering of data. But in science, we have this hope that if we understand something non-trivial on a very simple model, then this will have consequences in the real world. And at least in physics, that's a strategy that has turned out correct in many cases. So we go with that strategy.
And now, that's where the hero of the day comes into play. When we actually ask how to solve this-- what should I do? I have a computer and your list. What should I program into the computer so that I can split the room into three groups? That's where I actually go back to yet another paper of Hans Bethe, called Statistical Theory of Superlattices, which is almost a century old now.
I also take some follow-up works. Thouless, who realized this can also be applied to disordered systems, where the graph is random. Mezard and Parisi, who realized that the Bethe lattice, which did not have any cycles, can be replaced by graphs that do have cycles, and that the method is truly applicable to the graph created in this game we are playing with the cards. And Kabashima and Saad, who realized the connection between what Bethe did and what has been done in information theory on one hand, and in computer science on the other hand, where there is an algorithmic version of what Bethe did.
So if I put all that together, I realize that science gave me the solution to this problem. And the solution to this problem, I have expressed here with an equation. But if you are scared of equations, just ignore it. It doesn't matter. You will still understand the rest. This equation, if you are not scared of equations: these are messages that nodes are sending to each other, and they can be interpreted as the probability that node i takes color s in the absence of the edge between node i and node j.
It has this very simple form, where the probability that I take one color must be that none of my neighbors actually had that color. That's what this 1 minus this probability is, and this product is there because I suppose that my neighbors are independent and take the product of those probabilities. So for those in the audience who know what probability is, this is very simple. And what is below the line, the denominator here, is just a normalization so that this actually is a probability.
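Written out, the update described in words above is the standard belief-propagation equation for graph coloring; this LaTeX rendering is a reconstruction from the verbal description, using conventional notation rather than the slide's exact symbols.

```latex
% Belief-propagation update for graph coloring, reconstructed from the
% verbal description (notation is a standard choice, not the slide's):
\[
  \chi^{i \to j}_{s}
  \;=\;
  \frac{1}{Z^{i \to j}}
  \prod_{k \in \partial i \setminus j}
  \left( 1 - \chi^{k \to i}_{s} \right)
\]
% \chi^{i \to j}_s : probability that node i takes color s in the absence
%                    of the edge (i, j)
% \partial i \setminus j : the neighbors of node i other than j
% Z^{i \to j} : the denominator, a normalization ensuring
%               \sum_s \chi^{i \to j}_s = 1
```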
So I have this simple kind of message that everybody needs to send to each other again, and again, and again, until the messages saturate to something that is not changing anymore. For those who are scared by the equation, Marc Mezard likes to explain this with this beautiful painting, where Jesus is calling Saint Matthew.
And they are playing cards here, and pointing at each other, showing who possibly is Saint Matthew in that painting. So it's this message-passing procedure between the nodes that has been discovered in science as the optimal algorithm for this particular game with colors and cards that we are playing today.
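As a concrete sketch, here is what that message-passing loop might look like in Python. This is my own minimal illustration, not the speaker's code; the usage comment assumes the hypothetical planted_coloring() helper from the earlier sketch, and the recovered groups are only defined up to a permutation of the colors.

```python
import random

# Minimal belief propagation for the card game: one message per directed
# edge, msgs[(i, j)][s] ~ P(node i has color s, ignoring the edge (i, j)).
def bp_coloring(edges, n, q=3, iters=100, seed=0):
    rng = random.Random(seed)
    neighbors = {v: [] for v in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    msgs = {}
    for i, j in edges:
        for a, b in [(i, j), (j, i)]:
            raw = [rng.random() for _ in range(q)]
            z = sum(raw)
            msgs[(a, b)] = [x / z for x in raw]   # random initial messages
    for _ in range(iters):
        for (i, j) in list(msgs):
            # i takes color s only if no neighbor k (other than j) took s
            out = [1.0] * q
            for k in neighbors[i]:
                if k != j:
                    for s in range(q):
                        out[s] *= 1.0 - msgs[(k, i)][s]
            z = sum(out) or 1.0                   # normalization (guarded)
            msgs[(i, j)] = [x / z for x in out]
    guesses = []
    for i in range(n):
        marginal = [1.0] * q
        for k in neighbors[i]:
            for s in range(q):
                marginal[s] *= 1.0 - msgs[(k, i)][s]
        guesses.append(max(range(q), key=lambda s: marginal[s]))
    return guesses

# usage (with the planted_coloring sketch from earlier):
# hidden, pairs = planted_coloring(n=300, m=600)
# print(bp_coloring(pairs, n=300)[:10])
```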
So now that I know that, I can program it into the computer and, based on my list, plot how well I will be doing. So what this plot is showing-- here I have a case with three colors. What this plot is showing is how well I'm doing; higher is better. 0.33 means that I'm just randomly guessing. If I have three colors and I randomly guess your color, with probability 0.33 I will be right. That's not much.
And what is here is how many of these pairs I actually observed, divided by the size of the room. So if I observed two pairs per person, that means that every person on average was in four of these pairs, because there are two persons per pair. So Ahmed came up with Jim and with Bart, say, but there must be roughly two other people with whom Ahmed came up.
And as long as I have more of these pairs than that, I will start to do something better than random guessing. And below that, I am strictly not able to do better than random guessing, if the number of people in the audience is large. And this audience here today is large enough for the curve to look very close to this curve.
So it's surprising that there is this sharp change of behavior. Suddenly, there is enough information to start to be able to say something. This is a phase transition. It's what we call in physics a second order phase transition. It's not the one that is happening in water. It's the one that is happening in magnets.
So the magnets on your fridge that are keeping there the photos of your children, or friends, or whatever-- if you heat them up to roughly 1,000 Kelvin, so something like 700 degrees Celsius, they will fall off. They will not be magnetic anymore. That's very hot. Many things in your home would burn at that temperature, so it's not advised. But physics tells us this. If you did that, they would fall off the fridge. They would not be magnetic anymore.
The nature of the phase transition in those magnets is exactly the same as what's happening here in our card game as we reach four pairs per person. So great. But we talked about water, and metastability, and all that. So how to tie it to that? Let's see.
That's called in physics a first order phase transition, as in water. So can we get that in our card game? Very easily. I just need five colors instead of three. So let's play exactly the same game, but I now add brown and yellow. And we do exactly the same thing. Exactly the same equation-- for those who are not scared of the equation, it was holding for any number of colors.
The code-- if I was not completely stupid, I also wrote it for a generic number of colors-- so I run the same code and now get a picture that is more complicated. Now there are three different curves. So it's the same equation, but now I have five colors. Now, what does this picture mean? Let me maybe just explain it in terms of--
So this c, which is the axis here, is, again, in how many pairs did your name need to appear before something happens. So now we took five colors, so there are more options, and I need more of these edges, right? There are more options. It will be more difficult for me to pin down who had what, so I need to be asking more questions. This is intuitive.
Before, it was about four. Now it is about 13 or 16. And there are actually now two numbers that are important. One number that is important is 16. If I have more than 16 pairs per person, then the algorithm that I wrote will actually give me a good answer. Very good, very close to 1. Suddenly very close to 1. Below that, there are the blue points here. So below 16, I will be just randomly guessing. And above 16, this algorithm gets me 95% right. That's great.
But there is this number 13.2. I can show mathematically that if I had fewer than 13.2, then it's impossible. I cannot do better than randomly guessing. The information is not there. I collected too few things. It's not possible. But in between, if say the number was 14 or 15, it is possible. If I had an algorithm that was flying to the moon, doing exponentially many steps, then I would manage to find a good split, and the goodness of that split would be the green curve here.
But I only have my algorithm, and that one will not give it to me, OK? You say there is a better one. You're lazy. Find a better one. OK. I work hard. I try 100 other ones. Nothing does better. Often they do worse, some of them do the same, but nothing does better. And there is a conjecture here that stands: that no polynomial algorithm-- meaning one that is not flying to the moon, not using exponentially many steps-- can do it.
Now, this is a conjecture. In mathematics, we say conjecture when we don't know how to prove it. If we knew how to prove it, we would, on the way, prove the biggest open problem in computer science. That's not happening. It will not be easy to prove something like that. But we can be bold and conjecture it, and it's a fascinating problem in computer science for this century: is this true? It may not be true, of course.
There might be an algorithm that actually does it. And if it is true, in which sense is it true? What do we need to add to the theory of computational complexity in computer science to be able to formalize this to the satisfaction of even the most demanding mathematicians among us? That, we don't know. But this conjecture stands, and it's 16. The number is 16.
As the graph gets large, it's actually the number of colors minus one, squared. That's the number. That's the number that you need for this algorithm to work; if you have fewer, it won't. This other number doesn't have a simple formula, but we can compute it. We can numerically compute it. And there is a gap. This gap grows larger and larger as the number of colors grows. So it's not disappearing. It is there. It is important.
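In symbols, the algorithmic threshold quoted in the talk is c_alg = (q - 1)^2 pairs per person for q colors; a tiny Python check (my own illustration) reproduces the numbers mentioned:

```python
# Algorithmic threshold for the card game with q colors, as quoted in the
# talk: message passing starts to work above (q - 1)**2 pairs per person.
for q in [3, 5]:
    print(f"q={q}: c_alg = {(q - 1) ** 2}")   # q=3 -> 4, q=5 -> 16
```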
So physically, what does it mean? This hard phase is the supercooled liquid. What is happening for this number of pairs is exactly what is happening in the supercooled liquid. It's in the metastable phase. Now, in the supercooled liquid, we had this nucleation. It actually crystallized into the ice when we hit something dirty, some impurity.
Now here in the graph, that's where we come back to the jars with the beans. On the graph as we defined it, where I choose the pairs at random, it is not true that the volume eventually becomes bigger than the surface. On a graph, if it is very, very large and I fix the distance to which I'm going-- meaning that, OK, Ahmed was connected to Jim and Bart, so they're at distance 1 from him. Then Bart is connected to someone else; that's distance 2. I go to distance, say, 10, and I count the people that are at distance 10. That's the surface.
Now, I count the people that are at distance less than 10. That's the volume. And I always get that they are comparable. I don't get that the volume grows bigger than the surface, provided the graph is large enough for the distance I fixed. If I then want to fix the distance to 100, I have to make the graph really, really, really large so that this is still true.
But fundamentally, it is this comparison of the surface to the volume that destroys the nucleation. So here, the metastability is really exponentially long-lived, and there is really nothing to do about it. At least, that's the conjecture.
So from this point of view, it's like the diamond. It's really hard. The diamond is still in finite dimension, so formally there is nucleation, but practically we don't worry that it will turn into graphite. Except it's kind of the other way around, right? The diamond-- we are happy that it stays diamond. But in this bad algorithmic solution that is in the metastable phase, we are unhappy that it actually stays there. We would like it to go to the better one, where it actually solves the problem, but it doesn't. So from this point of view, it's inverted. But the reason why it stays there is very similar.
So that's what's happening: a beautiful connection between phase transitions in physics of the first order type and conjectures about algorithmic hardness in the average case, over some very concrete ensemble of cases, of instances-- our random choices of the pairs in the graph. But now, can such understanding actually help us with something? Can it bring us to new algorithms? Can it do something?
So not in the planted coloring problem-- there, this conjecture stands, and we really believe that you cannot do better. And if somebody one day finds an algorithm that can do better, that will be a huge breakthrough. But sometimes, sometimes we can do something. And now I want to tell you, in the rest of the talk, about one case where we can do something.
So that's about the compressed sensing problem. What is this? There is a beautiful idea here. When you take pictures on your vacation, for instance, just photos of your family, of the sea, et cetera-- when you bought the camera, they advertised to you the number of pixels that it is actually taking, the number of little pieces of the picture that the camera is actually capturing. Now, that's in the millions, as you know.
Now, when you actually download the pictures from the camera and store them on your computer, you would think that for each pixel-- you need three colors and how intense each of them is-- you would think that you need a certain number of bits per pixel in the memory of the computer to store that picture. That's actually not true. We have algorithms that compress the information that is in the picture, so the computer actually stores it in a smaller memory than the number of pixels you originally had in the picture.
That's because the picture has a lot of structure in it, and this compression is hence possible. That's a standard property of structured data that is used in compression algorithms. Now, compressed sensing-- the idea here is, can we use this property, that natural images are compressible, to also measure them with fewer measurements to start with? Instead of taking every pixel one by one, can we take fewer measurements and still, from those fewer measurements, reconstruct the full picture?
And that's the idea of compressed sensing, which has been very influential and active, say, 15 or 20 years ago. This is one of the papers; I took a snapshot of it. Many people followed up on these ideas. How is this useful? Well, the main promise of compressed sensing is in medical imaging.
If some of you or some of your family members needed to take a CT scan in the hospital, or the nuclear magnetic resonance, the MRI scan-- even for a banal thing, like a knee that got twisted or something-- you need to lie in that machine for 20 minutes, and it's boom, boom, boom going around. It's pretty annoying. It needs to take all those measurements to reconstruct precisely what's wrong with your knee, so that the doctor can then actually help you.
If we could cut the number of measurements by a factor of 10, we would cut the time you need to spend there by a factor of 10. For the same cost, we could process 10 times more patients. That would be great. That's a great promise. So that's what compressed sensing is hoping to achieve. So as before, this is the general idea.
Now I need, as with my graph coloring game, some abstraction that brings it to a case that I can tackle with equations. And that, indeed, has been done in compressed sensing. You can formalize what this means mathematically. There is a matrix that represents the measurements-- what the machine is measuring-- that multiplies a vector that represents the picture, what is actually being measured. And the product gives us the measurements, another vector, from which we will actually be reconstructing.
So there is the matrix. This dimension here is the number of measurements, and this one here is the number of pixels. We are aiming at fewer measurements than the number of pixels. And the reason the pictures are compressible-- I actually should have said a few slides back-- that's what was illustrated here.
There is a mathematical way of transforming the picture so that it's represented by a list of numbers, among which many of those numbers are zeros and only a few of them are non-zero. That's called sparsity. So we use the sparsity of the vector to actually be able to reconstruct it from fewer measurements. So we just mathematically formulate this problem and take--
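In standard notation, the setup just described reads as follows; this is my reconstruction of it, with conventional symbols rather than the slide's.

```latex
% Compressed sensing in standard notation (reconstructed from the
% verbal description above):
\[
  y = F x, \qquad F \in \mathbb{R}^{M \times N}, \qquad M < N,
\]
% y : the vector of M measurements
% x : the N-pixel signal, sparse in a suitable basis (few non-zeros)
% F : the M-by-N measurement matrix, which we are free to design
```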
Now, the difference here is that this matrix, we can design it. It's the machine with which we are measuring. We have the freedom to design it as we want, in a generic system. That will be important. So if we take this matrix at random, in the same sense as we were taking the edges in the graph at random, the following happens. There is a phase diagram. So again, as for the water and ice, and steam or vapor, whichever you want, there is a phase diagram.
Here is how many measurements we are saving. So 1 means we are saving nothing, and 0.2 means that we manage with 1/5 of the time we would otherwise need. This axis is how compressible the image was. So at 1, it was not compressible at all. 0.2 means that we could compress it to 1/5 of its original size.
And then there is a green phase where the reconstruction is very easy with very easy algorithms. A red one where it's impossible, whatever you do. A blue one where, with the optimal algorithms that stem from the line of work that Bethe initiated, you can do it. And an orange one, which is this hard phase due to a first order phase transition.
Now, the difference here is that you actually have freedom in designing the matrix F. You don't have to take it at random. You can take it differently. And if you use that freedom, what can you do? You can start thinking back to the nucleation that we demonstrated with the hand warmers. For that to happen, we needed two things. We needed the seed, the impurity. When you pinch the coin, the impurity appears. That should be fine.
And we needed the low-dimensional structure that makes the surface of the jar eventually smaller than the volume of the jar. But we still also need some randomness here, so that the idea of compressed sensing works as it needs to. So we need the matrix F to be somewhat random, but we can make it random in a kind of band.
Each of these rectangles inside will be random, but we organize the randomness in a one-dimensional band-- that's where the low dimensionality comes from-- and the first rectangle, we make it larger. There's the seed. In the first rectangle, it's as if we lowered the temperature, so that it is actually easier to find the crystal there. That's the seed.
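A sketch of such a seeded, banded random matrix in Python follows; it is my own illustration of the design idea described here, and the block sizes, band width, and names are made-up parameters, not the published construction.

```python
import random

# Seeded, spatially coupled measurement matrix: random Gaussian entries
# only inside a one-dimensional band of blocks, with a larger first block
# of rows acting as the seed (the "impurity").
def seeded_matrix(blocks=8, block_rows=40, block_cols=100,
                  seed_rows=80, band=1, seed=0):
    rng = random.Random(seed)
    rows_per_block = [seed_rows] + [block_rows] * (blocks - 1)
    n_rows, n_cols = sum(rows_per_block), blocks * block_cols
    F = [[0.0] * n_cols for _ in range(n_rows)]
    r0 = 0
    for br, rows in enumerate(rows_per_block):
        for bc in range(blocks):
            if abs(br - bc) <= band:      # inside the band: random entries
                for r in range(r0, r0 + rows):
                    for c in range(bc * block_cols, (bc + 1) * block_cols):
                        F[r][c] = rng.gauss(0.0, 1.0)
        r0 += rows                        # outside the band: stays zero
    return F

F = seeded_matrix()
print(len(F), "rows x", len(F[0]), "columns")
```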
And so if we choose the matrix this way-- and in this application of compressed sensing, in principle, we can-- and then recompute what is going on, we get a phase diagram where this hard phase disappears. Instead, we can do the reconstruction down to where it is information-theoretically possible. Where before it was possible only by flying to the moon, now we can do it staying with our computer.
So in systems where you can design the randomness and induce the nucleation by having a seed and a low-dimensional structure, you can get rid of this hard phase. In the planted coloring, if we stick to the definition as we have it, this is not possible. But in some applications, it is possible.
Other examples appear in error correcting codes. Those are algorithms that run in your phones to correct the noise of the communication. So that's another application area where you actually design the structure of the error correcting code, and where this idea of inducing the nucleation actually brings you closer to the limits of what is possible in terms of correcting the errors that necessarily happen because the communication is noisy.
So here is just one nice illustration of the nucleation in the reconstruction of an image. There is a wave traveling in the algorithm too, the same wave that was spreading the crystal in the nucleation. So the seed gets reconstructed first, and then, through the one-dimensional structure, you get better and better reconstruction as the algorithm runs. And this is not directly visible in the image, because it's randomly reshuffled so that it actually works the way it needs to.
So, phase transitions in machine learning. Let's tie it back to machine learning, but I will cheat here a little bit. I will just tell you that this is something that has been studied for a long time. The first paper that I know of that describes a phase transition in learning with neural networks-- and indeed a first order phase transition-- is from the '90s, coming from the statistical physics community, the follow-ups of Bethe and other works. And I will tell you more about that on Friday, for those of you who are coming to the computer science AI lunch.
And just an appetizer of a very recent work. I was thinking, what can I say about ChatGPT? So what I can say is that the system is built from neural networks that are called transformers. The neural networks called transformers are built from layers that are called dot-product attention. These are just names.
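For the curious, "dot-product attention" is a concrete, small formula; here is a minimal Python sketch of the layer being named (a textbook version, my own illustration, not the specific model from the work discussed next).

```python
import math

# Dot-product attention: computes softmax(Q K^T / sqrt(d)) V.
def dot_product_attention(Q, K, V):
    # Q, K, V: lists of row vectors (queries, keys, values)
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]   # stable softmax
        z = sum(weights)
        weights = [w / z for w in weights]
        out.append([sum(w * v[i] for w, v in zip(weights, V))
                    for i in range(len(V[0]))])
    return out

# usage: three tokens with 2-dimensional embeddings attending to themselves
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(dot_product_attention(tokens, tokens, tokens))
```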
And what we can do-- and this is a very first step; it's not yet explaining why ChatGPT can do what it can do, far from it-- but we do have a case where, in a neural network that has the dot-product attention layer, there is a first order phase transition. There is a phase where, when we have not observed much data, just a few samples, we only learn based on the positions of the words in the sentence. And when we cross the threshold in the number of samples that corresponds to the phase transition, we start learning based on the semantics, the meaning of the words in the sentence. So we have a phase transition from positional to semantic learning.
So, OK, this may not mean much to many of you. Don't worry about it. This is just an appetizer showing that, indeed, even in those systems at which we are aiming-- and it is the challenge for the century to actually understand how they are doing what they are doing-- some traces of this phenomenology are appearing, and there will be many more works. And perhaps this will be the solution, and perhaps something else will be the solution, and this will end up as a scientific possibility that didn't materialize. Of course, we don't know yet.
And with this, I just want to show you the people with whom I have fun every day. That's my long-term collaborator and also husband, Florent, here, and this is our two groups, plus some former members of the group, this last summer at a conference in Corsica, which is the best place for conferences ever. In Cargese. So, colleagues among you, if you get invited to Cargese, take the opportunity. It's amazing.
And I will stop here. Thank you for your attention.
[APPLAUSE]