[AUDIENCE MURMURS] [LOUD CLICK]
SPEAKER 1: Well, if I were following Ira's lead, at this point I would [THREE LOUD CLICKS].
AUDIENCE: Bang it on the table there.
SPEAKER 1: OK. So this year, we're very pleased to welcome as Bethe Lecturer our own Saul Teukolsky. He opened with a wonderful colloquium on Monday on testing general relativity with LIGO. And last night he followed with a wide-ranging public talk on black holes, neutron stars, and what we've learned from gravitational waves. Part of the title was, Was Einstein Right? I bet you can guess the answer.
Here's the short version of an impressive CV. Grad student with Kip Thorne; inventor of an equation describing perturbations of Kerr black holes, now generally known as the Teukolsky equation; Cornell faculty since 1974; leader of an international collaboration responsible for the first accurate waveforms used to interpret LIGO's first gravitational wave detections. Not to mention, Bethe Professor of Physics and Astrophysics, and now, Bethe Lecturer. This week, he's had a hectic schedule meeting with faculty, students, and his research group.
But Saul is a master of time, handling without any obvious hiccups the separate callings of chair, classroom, teacher, author. Perhaps most impressively, he has a real open door policy. I can say from personal experience when you randomly show up at his office door, he straightens up, smiles, welcomes you in-- almost as if you were on time to an expected appointment, probably the scientific DNA of Hans Bethe.
Before he starts today, I also want to mention one last item. Saul repurposed various honoraria from this week to support the Wednesday astrophysics lunch. Repurposed means donated. Larry Kidder is in charge. See Larry with your suggestions. Today, Saul will preview the coming revolution in computational astrophysics.
SAUL TEUKOLSKY: Thank you. It's a great honor and a pleasure to be giving these lectures here among so many friendly faces-- at least, up till now, friendly. So I'm going to be talking about simulations in astrophysics.
There are many examples of these in all kinds of fields. In cosmology, everything from simulating the large scale structure of the universe to galaxy formation, globular clusters, tidal disruption, accretion-- all kinds of areas where we do these simulations. And I think Sidney Harris, as usual, says it the best. I'll give you a second to read the caption.
OK, so obviously an example near and dear to my heart-- this is the first gravitational wave detection. So shown, this is the figure from the detection paper. Here is the strain in the gravitational wave detector at Hanford in Washington. And the blue here is at Livingston, Louisiana. And then, you can put the red on the blue, and you can see just by eye it's the same event.
And then here is a slightly processed version of the signal in gray, with sort of a width to show some kind of estimate of errors. And then superposed on top of that is a waveform calculated by our collaboration, simulating extreme space times. And you can see the agreement is quite remarkable.
If you want a quantitative estimate, basically if you subtract this best-fitting waveform from the signal, the residual is consistent with noise at the level of about 4%. So in order to do this, one has to be able to solve Einstein's equations in all their gory, nonlinear grandeur to be able to make predictions like this.
Now the history of the subject of numerical relativity is very interesting because it's going to make one of the key points that I would like to make today. So here is actually the first paper ever published in numerical relativity. And it shows-- so you can see the date. It's 1964-- so more than 50 years ago, Hahn and Lindquist.
And what they did, in modern language we would call the head-on collision of two black holes. And in the paper, they describe how they do this. And they ended up doing 50 time steps, three CPU hours. Here's the machine they used, the grid size.
And the units here-- m is proportional to the mass of the black hole. You can think of it as a light crossing time, so barely two light crossing times. In a modern simulation, of course, we would do it in full 3D with typically many more grid points. And we would go to several thousand m.
And what's interesting is the conclusion of this paper. It says, "In summary, the numerical solution of the Einstein field equations presents no insurmountable difficulties." Now this is despite the fact that if you actually read the paper, you find the reason they stopped at 1.8m is their code blew up.
They got overflows. So in fact, the conclusion should have been completely the opposite. It presents completely insurmountable difficulties. In fact, it took until 2005 to surmount those difficulties.
So what I want to focus on is not all kinds of numerical computations one might do in astrophysics. I want to focus on solving partial differential equations. So this still covers a wide range of astrophysics. It includes doing hydrodynamics, magnetohydrodynamics, gravity-- whether it's Newtonian or Einstein-- radiation transport, and I'm sure you can think of other examples.
But the point is I'm not particularly interested for these purposes in talking about, say, N-body simulations, right-- well, for example, directly integrating equations of motion for point particles in some globular cluster or anything involving Monte Carlo simulations. Because what I want to do is look ahead to how computers are going to change over the next five to 10 years. And the way we currently do N-body or Monte Carlo simulations is not going to be particularly affected, as far as I can tell.
However, solving PDEs is going to become an even bigger challenge. And that's one of the things I want to explain. All right, so I'll let you in on a secret. If you look at the numerical algorithm in that Hahn and Lindquist paper, it's essentially the same algorithm that most people are using today to solve PDEs. OK?
And surely, you would think there's been some progress in 50 years. And the answer is yes, we have fancier ways of doing the kinds of things that were done there. But 95% of what is done would be comprehensible to Hahn and Lindquist from 1964.
And the method is called finite differencing. Basically, if you want to take a derivative, you take a function value here and a function value here. You take the difference. You divide by delta x. And that's your approximation to the derivative. OK?
And the more accuracy you want, the closer together you want to put your points. So the delta x gets small, so you need many more points. So you have a trade off between speed and accuracy. That's the basic idea.
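As a concrete illustration of that trade off, here is a minimal Python sketch of the two-point difference just described (the test function and evaluation point are arbitrary choices for illustration, not anything from a production code):

```python
import numpy as np

def forward_diff(f, x, dx):
    """Two-point finite difference: (f(x + dx) - f(x)) / dx."""
    return (f(x + dx) - f(x)) / dx

# The error of this first-order formula shrinks only linearly with dx,
# so each extra digit of accuracy costs 10x more points.
for dx in (0.1, 0.01, 0.001):
    err = abs(forward_diff(np.sin, 1.0, dx) - np.cos(1.0))
    print(f"dx = {dx:g}, error = {err:.2e}")
```

Shrinking delta x by a factor of 10 shrinks the error by roughly the same factor, which is exactly the speed-versus-accuracy trade off described above.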
Now there have been some stirrings already even within astrophysics to move away from this kind of technique. So for example, in our numerical relativity code, we do not use finite differencing to solve Einstein's equations. The reason is, because Einstein's equations are smooth-- right, so away from the singularities inside black holes, you don't have shock waves or anything like that-- so this suggests you should use higher order numerical methods.
So what that means is the simple idea of taking a derivative by taking the difference between two points. If you think about a Taylor series expansion, right, you're only keeping the first order term in that Taylor series. So there's errors at the next order.
In a higher order method you would, say, use three points, fit a quadratic through them, and take its derivative. Now you'd capture the first two terms in the Taylor series, and you would have a higher order error. So if you continue this idea-- so here I've shown a numerical grid. Let's just make sure I push this right.
So here's a nice uniform grid. Suppose you want the derivative at this point. So you can either work it out with Taylor series, or you can look up in books a formula which will tell you-- suppose I want to use these points to approximate the derivative there. There are coefficients you need to use to write down that approximation.
And what one can show is that, for a derivative like this, as you add more and more points, the coefficients remain bounded. The problem comes when you get to the boundary here. You see, you have to do the approximation only using points on one side.
And as you add more and more points to try to get higher accuracy, these coefficients diverge. They blow up. And so what you find is if you try to use higher order methods by this finite differencing idea, that unless you can do something special at the boundaries, you tend to run into these kinds of problems, especially near the edges.
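One can watch this happen numerically. The sketch below is a generic construction (not from any particular code): it computes derivative weights for a stencil by demanding exactness on polynomials, and for one-sided stencils at a boundary the largest weight grows as points are added.

```python
import numpy as np

def fd_weights(nodes, x0=0.0):
    """Weights w with sum_j w[j] * f(nodes[j]) ~ f'(x0), exact for all
    polynomials of degree < len(nodes).  Solves a small Vandermonde system."""
    nodes = np.asarray(nodes, dtype=float) - x0
    n = len(nodes)
    V = np.vander(nodes, n, increasing=True).T  # V[m, j] = nodes[j] ** m
    rhs = np.zeros(n)
    rhs[1] = 1.0        # derivative of x at x0 is 1; of x^m (m != 1), it's 0
    return np.linalg.solve(V, rhs)

# One-sided stencils at a boundary point: the biggest coefficient grows
# as the number of points increases, unlike the centered case.
for n in (3, 5, 9):
    w = fd_weights(np.arange(n))                # derivative at the left edge
    print(n, np.abs(w).max())
```

The growth is modest at these small orders but continues without bound, which is why naive high-order one-sided differencing runs into trouble at boundaries.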
The trick to get around this is tied up with the use of a uniform grid. The trick is to bunch the grid points non-uniformly, to bunch them near the edges. And this is the foundation of a method that's called the spectral method. So that's what I want to explain next.
So our code that we use for numerical relativity is called SpEC. That stands for Spectral Einstein Code. And right now, it's the best code that's out there. There are about, I don't know, 10 different groups that are in this business. And with spectral methods, we are about an order of magnitude more accurate and use an order of magnitude less CPU time, compared with the other groups who are using finite difference techniques.
So what's a spectral method? So one way of thinking about it is you all learnt about separation of variables, right? You want to solve a PDE, say for a function of x and t, and you expand it as a sum of products of basis functions, right?
So this is a basis function in x. And this is some coefficient, which will be a function of time. All right. And then, in a spectral method, these basis functions have an orthogonality condition. So you multiply both sides by a basis function, and you integrate. And then, from the orthogonality, you get a formula for this coefficient.
Now trying to do these integrals numerically, exactly, analytically, and so on can be complicated. And usually, that's not practical. And so Lanczos, who was one of these polymaths who did all kinds of neat stuff, invented what's called a pseudospectral method.
And the idea is you replace this integral by an approximate numerical quadrature. So that means you choose a bunch of collocation points, where you're going to evaluate these functions that are in the integral here-- those are the x_n's. And you choose some weights. And then you can approximate the integral.
So the trapezoidal rule is an example of a quadrature. And the idea is, if you're using N collocation points, you have a representation of these. You can calculate these coefficients.
And so that's equivalent to giving you the information that you wanted up here in your approximation to the function, right? You've made an approximation because you didn't let the sum go to infinity. You made it finite so it will fit on a computer.
And you can think of this as, if I know the function values at all the collocation points, I have the complete information about my approximate solution. Alternatively, if I know all of these coefficients, then I can just multiply them by the basis functions and also get my approximation to the solution. So it's a little bit like in quantum mechanics when you go back and forth between position space and momentum space. In fact, if these basis functions were e to the ikx, if I were doing Fourier series, this would exactly be the relationship between momentum space and position space, except I'm doing it with a finite basis, not an infinite basis.
And the nice thing, then, is I need to take a derivative of the function. Since I know what the basis function is, though, I could just take a derivative with respect to x analytically instead of doing the finite difference business. And the payoff for this is, if your function is smooth-- right, no shocks or discontinuities or anything like that-- as you increase n, the accuracy increases exponentially fast.
So usually in finite differencing, we're used to the idea that as we add more grid points-- say we double the number of grid points-- the error might go down, say, by a factor of 4 if it's a second order method. Here it goes down exponentially fast. And the problem is, in a lot of astrophysical problems, we want to deal with things smashing into each other. We've got shock waves. We may have the surface of a star.
We have discontinuities. And there, as you know, if you try to represent a discontinuity, a step function, say, with a Fourier series or any of these basis function expansions, you get Gibbs phenomenon. You get the oscillations, the ringing, and that kills you. You can't handle shocks.
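Both behaviors are easy to demonstrate with a Fourier pseudospectral derivative, where you differentiate by multiplying each mode by ik (a generic textbook construction, not the SpEC implementation): for a smooth periodic function the error plummets exponentially with the number of points, while a step function would produce the Gibbs ringing just mentioned.

```python
import numpy as np

def fourier_deriv(u):
    """Pseudospectral derivative of samples u on a uniform periodic grid
    over [0, 2*pi): transform, multiply mode k by ik, transform back."""
    n = len(u)
    ik = 1j * np.fft.fftfreq(n, d=1.0 / n)     # integer wavenumbers times i
    return np.real(np.fft.ifft(ik * np.fft.fft(u)))

# Smooth test function exp(sin x): the error drops exponentially with n.
for n in (8, 16, 32):
    x = 2 * np.pi * np.arange(n) / n
    err = np.abs(fourier_deriv(np.exp(np.sin(x)))
                 - np.cos(x) * np.exp(np.sin(x))).max()
    print(f"n = {n:2d}, max error = {err:.1e}")
```

Doubling n here gains several digits of accuracy at once, versus the fixed factor per doubling of a finite difference scheme.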
OK, so we want to include matter. And again, an example that's close to my heart is the collision of two neutron stars, or hopefully soon LIGO will see a black hole and a neutron star collision. So here's-- again, from the discovery paper, from the multimessenger paper on the neutron star merger last August-- here's the gravitational wave form as a function of time showing the frequency going up as the neutron stars merge together.
And so this is when they touch, the merger time. And then 1.7 seconds later is the start of a gamma ray burst. This is the output of the three gamma ray cameras on the Fermi satellite and on INTEGRAL.
OK, so we would like to be able to study these phenomena as gravitational wave sources, as short gamma ray bursts, as kilonovae, and so on. All right. So you need full general relativity to do this. So this is a sort of schematic slide showing this idea that, from an event like this, you get emission in gravitational waves and then all across the electromagnetic spectrum from gamma rays through to radio waves, and in principle also, neutrino emission. So if something like this had happened close enough to Earth, it might have been picked up by one of the neutrino detectors.
And then, again since last August, this is the real picture. And so this is again showing a selection of images from the various detectors. Here's Chandra X-ray. These are the various optical telescopes that picked it up so the crosshairs are showing the actual event.
And these are just, as a function of time, showing when the first set of observations were made in each wave band. You don't have to-- you probably can't read it from the back. But it says gravitational waves, gamma rays, X-rays, UV, optical, infrared, and radio. OK, so how do you simulate something like that on the computer?
By the way, I hate this word simulate, right, or simulations. Because it sounds like what we're doing is not real. You know, there's something kind of fake about it, right? It would be better to call these numerical computations.
Because you're taking the exact theoretical equations and you're solving them to some hopefully known accuracy on a computer by a numerical method. And if you want higher accuracy, you just pay more money and use more grid points. Get a better solution. But anyway, everyone calls them simulations, so I can't fight against that.
So here is some-- for the experts, just some jargon about what kinds of methods we're using. You can ignore that. What I want to draw your attention to is the typical accuracies or rather lack of accuracies that we get.
So even running on the biggest computers that we have access to, typically after doing 10 or 15 orbits, we have about 1 radian of accuracy in the gravitational wave phase. If we really push it, we can get this maybe a little bit smaller. But that's a typical number.
We get the properties, the black hole mass, maybe, to 1%. The properties of the matter as we fall, as this neutron star gets disrupted, much bigger errors-- maybe 10% to 50%. Now, just for fun, I'll show you a little movie of one of these simulations.
So this is a neutron star falling into a black hole. The black hole is about seven times the mass of the neutron star. There we go. OK.
And that little grid there is sort of-- we use a moving grid to follow the neutron star as it gets torn apart by the tidal forces of the black hole. And you end up with this accretion disk around the black hole. And hopefully this would have a very nice optical display.
And this is not the only thing that can happen. Depending on the mass and spin of the black hole, it can swallow the neutron star whole. In which case you would have a gravitational wave signal, but there would be no optical counterpart. So studying these kinds of events can give you information about the properties of the neutron star, how easily it's torn apart, about the nuclear equation of state, and so on.
AUDIENCE: Just for normalization, how expensive was this?
SAUL TEUKOLSKY: This is about a month of supercomputer time on a few hundred processors. This was not an exorbitant simulation.
AUDIENCE: Given that it took a month, can we see it again?
SAUL TEUKOLSKY: Sure. Yeah, all these movies are on YouTube by the way. If you just go to SXSCollaboration.
AUDIENCE: So you said that you had rather large errors.
SAUL TEUKOLSKY: Yes.
AUDIENCE: Is that because of the challenges with dealing with sharp interfaces of the neutron star?
SAUL TEUKOLSKY: No, not any of that. So in order to deal with the matter in this simulation, we had a hybrid code. We solved Einstein's equations with the spectral method. And the neutron star was handled with a conventional finite difference code.
And this was to show the finite difference grid we were using. And it's just the expense of that finite difference grid that limits our accuracy.
AUDIENCE: So your black hole simulations don't suffer from this--
SAUL TEUKOLSKY: Exactly.
AUDIENCE: --because they've been using [INAUDIBLE] spectral method.
SAUL TEUKOLSKY: That's right. Yeah. OK. All right.
So if we look ahead to what's going to happen-- and I've just picked gravitational wave physics as the example; you can make analogous arguments in many other areas-- in LIGO, the signal to noise is going to improve by a factor of three over the next, they say, three years, OK?
So that means a factor of three in amplitude, you're seeing three times further away. That's 27 in volume. So the event rate will go from roughly one a month, as it is today, to maybe one a day.
Some events-- so right now the biggest signal to noise at which LIGO has detected these events is roughly 25 or 30. That signal to noise will be multiplied by a factor of three, so roughly 100. So that means if you want to do a simulation that's as accurate as the measurement they'll be able to make, a rough rule of thumb is you want a phase accuracy at the end of the simulation of order 1 over that signal to noise. So you need to get the phase error of that waveform down to about 0.1 radians by the time of merger.
In the next 10 years, there are planned upgrades to LIGO, which will give you another factor of 4 increase. OK? So that's 64 in rate and another 4 in signal to noise. And then if you look ahead to LISA, the strongest events for LISA will be mergers of massive black holes. And the projected signal to noise there is 10,000. So compared with where we are today, right, that's an increase in precision that's needed of more than a factor of 100.
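The arithmetic behind these projections is just cubing the amplitude gain, since the event rate tracks the surveyed volume. A sketch with the round numbers quoted above (illustrative figures from the talk, not precise projections):

```python
# Detection rate scales as the cube of the amplitude sensitivity gain,
# since you see gain-times-further away in every direction.
gain_near_term = 3                 # LIGO improvement over the next few years
print(gain_near_term ** 3)         # 27x the event rate: ~1/month -> ~1/day

gain_upgrades = 4                  # further planned upgrades over a decade
print(gain_upgrades ** 3)          # another 64x in rate

snr_now = 30                       # loudest events detected so far, roughly
print(snr_now * gain_near_term)    # loudest signal to noise heads toward ~100
```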
The challenges for neutron star codes are that the computational errors, even today, are too large. In fact, it's very difficult even to quantify the errors. Because the way that we would like to quantify the errors is you run the code and then you run the code at a higher resolution. And by comparing the changes, you get an estimate of the errors.
Well, often we can't afford that. If we run it coarser than we're running now, we know that there's something wrong-- you can just see it. Usually we can just barely resolve the physics, and we can't afford to do one more level of refinement.
So we can't even tell you precisely what the errors are. And of course, the simulations take too long. Who wants to wait-- I mean, some of these simulations take a couple of months, right? Who wants to wait that long?
And then, if you start getting events one per day, even if you don't simulate all of them, there'll be some that you want to do. And who wants to wait months in order to do that? So just looking ahead, you can see the kind of problems. And then the punchline is that the methods that we're using today, particularly these finite difference methods-- if you look at the extreme scale machines that are coming up, those with millions of processors, and you took today's codes and tried to run them on those machines, they would actually run more slowly. They can't take advantage of having many more processors. And I'm going to explain why that is.
For the astrophysicists in the audience, you might be saying, well, come on. What's wrong with 1% accuracy? I mean, usually in astrophysics, a factor of two is good enough, right? What's this guy going on about 1% accuracy?
Well, so here are some examples. When that accretion disk forms from the tidally disrupted neutron star, it's about 1% of the rest mass that ends up in that disk. If the accuracy of your simulation is 1%, you basically have a 100% error in the properties of that disk.
You can't say anything about it in terms of what kind of optical emission or X-rays might be emitted. You can't say anything reliable. That doesn't stop some of my colleagues from saying things. But anyway, you shouldn't believe them, not yet.
The supernova problem is a problem that's been around for 50 years, and it's still not solved. We don't really understand what powers that explosion. In fact, if it weren't for the fact that we observe supernovae, if all we had was the theoretical simulations, we would say these stars just collapse and there's never an explosion.
And the reason is that-- it's not that the supernova people are not smart--
I won't-- OK, they're very smart. The problem is that the initial condition that you start with-- you have a very precise balance between gravitational potential energy and the thermal energy, the internal energy of the star that's about to collapse. They're very delicately balanced.
And then once the collapse starts, the potential energy gets converted into kinetic energy, but everything sort of proceeds more or less homologously in such a way that you're kind of preserving this cancellation of two numbers, a positive number and then a negative number for the gravitational energy. And so in order to predict the outcome, you have to track those differences to a precision. You're interested in that 1% difference between them. Because that's the whole story of whether you get an explosion or not. And so that makes the problem much more difficult than it might have been otherwise.
Some of us are very interested in studying these neutron star-neutron star mergers-- for example, the one that LIGO just detected-- to look at, as the neutron stars get close to each other, just before they merge, each one raising a tidal bulge on the other. That effect can be seen in the gravitational wave form because it's draining energy from the orbit, and so the bigger the tidal bulge that's raised, the faster the inspiral.
So if we understood how big a tidal bulge will be raised as these things are spiraling in during this dynamical phase, and that depends upon nuclear physics, the equation of state, and our ability to model this accurately enough, one can hope to try to learn a lot about nuclear physics from measuring this effect. And again, this requires much better than 1% accuracy.
And then there are examples where if you don't have enough accuracy, you get completely the wrong result. So for example, there are many problems involving magnetic fields in astrophysics. For example, in these accretion disks, the magnetic field is often the most important thing in determining the dynamics.
And there are various MHD instabilities. For example, this one called the MRI, the magnetorotational instability-- you don't have to know what it is. All you have to know is that, if the grid that you are using is too coarse, then that instability is suppressed. You'll have the wrong physics.
If you make the grid sufficiently fine so that instability is triggered, then you get turbulence and that completely changes the properties of the disk. So I hope I've convinced you that there are many reasons why, even for an astrophysicist, we need to worry about having accurate enough simulations. Why is it difficult?
All right, well, I already explained that neutron stars, for example, have surfaces and shocks. Those are nonsmooth features. And so you can't just use these spectral methods and hope to get good solutions.
There are many time scales in these problems-- everything from light crossing times, if you have a black hole in the problem, to sound crossing times. If you're going to emit a gamma ray burst, that can be seconds later, whereas the orbital period just before merger can be milliseconds, and you might want to wait for the gravitational wave to get out. So just following the multiple time scales, and similarly the multiple spatial scales, is a big challenge.
And then, of course, you have situations where the whole geometry changes. So you might have set up a nice grid that can capture the properties of, say, the two neutron stars. And then suddenly they decide to become one and now you need to figure out how you're going to follow that.
And then, of course, we're ambitious. We want to be able to follow all of the physics, the general relativity, hydrodynamics, the magnetic fields, neutrinos, photons, nuclear reactions, and so on. So how are we going to do that? I have an answer. All right?
It's called the Discontinuous Galerkin method, right? So it's abbreviated DG. So right now it looks to me like the only candidate to replace finite differencing and spectral methods and the other things that I'm not talking about today is this particular algorithm.
And I believe that it can handle all of the problems that I've listed up here. So I want to spend just a little bit of time-- it's not too complicated-- to show you how this method works and why I think it is the answer. All right. So here are some pictures to contrast. So the finite volume is just a fancy name for finite differencing.
So in a finite volume method or a finite difference method, we have some-- I've drawn three uniform intervals-- so you think of the center of each of these as having a grid point. OK? And then so the finite volume means, instead of focusing on the grid point, you focus on this interval. In three dimensions, it would be a little cube.
And since we only have one function value in here, we're essentially representing our functions-- so green is supposed to represent the solution. It has a constant value everywhere in that interval because we only have one number to represent it. So then, at the interfaces-- for example, let's think of this as a hydrodynamics problem-- the equations of hydrodynamics are a set of conservation laws.
For example, any mass that leaves from this interval to the right must be added to the mass in here. Right? I can't lose mass. Similarly, I can't lose energy and I can't lose momentum. So that's all the equations of hydrodynamics actually are.
So the flux reconstruction problem is just the rule you use to connect these elements by conserving the flux between them. And there are various recipes in the literature for doing this. And the key thing, the reason that finite volume methods have survived so long, is that people have figured out ways to do this flux reconstruction that it can handle shocks.
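A stripped-down illustration of that conservation property, using a first-order upwind flux for scalar advection (a standard textbook scheme, not anything from a production astrophysics code): whatever flux leaves one cell through a face enters its neighbor, so the total mass is conserved to round-off.

```python
import numpy as np

def fv_step(u, dt, dx, speed=1.0):
    """One finite-volume update for u_t + speed * u_x = 0, speed > 0,
    on a periodic grid.  The upwind flux through each cell's right face
    is subtracted from that cell and added to its right neighbor."""
    flux = speed * u                       # flux at each cell's right face
    return u - (dt / dx) * (flux - np.roll(flux, 1))

n = 100
dx = 1.0 / n
x = np.linspace(0.0, 1.0, n, endpoint=False)
u = np.exp(-100.0 * (x - 0.5) ** 2)        # a smooth blob of "mass"
mass0 = u.sum() * dx

for _ in range(50):
    u = fv_step(u, dt=0.5 * dx, dx=dx)

print(abs(u.sum() * dx - mass0))           # conserved to round-off
```

The scheme is diffusive and only first-order accurate, but the conservation is exact by construction, which is the property that lets more sophisticated flux reconstructions capture shocks correctly.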
But now suppose you have a region that's smooth, that has no shocks. So you'd like to use a higher order method. Well, you only have one grid point here. So you have to then look at this grid point here and this one here and maybe go this way.
So you're coupling together a whole bunch of intervals, right? The jargon people use is you're using a wide stencil to represent the solution. And that's a problem if you're going to distribute your work over a whole bunch of processors on a big computer. Because it might be that the function value here might be assigned to a different processor than the one over here.
So you have to communicate this information in order to be able to take the derivatives. And the wider the stencil, the worse the communication gets. Because you know, this might be on yet another processor and further and further apart. All right.
So now let's look at the spectral codes. So in a spectral code, you can again think of dividing up into subdomains. But now in each subdomain, we're going to make a spectral expansion in basis functions.
So here I've just drawn two basis functions, namely constant and linear in x. But in principle, you know, the next one would be some quadratic. And then you could have a cubic and so on. But let's just use those to make the picture simple.
All right, so now I'm approximating my green solution by a bunch of straight lines in each interval. So with this method, since I can put more and more basis functions locally in any interval, I can get my exponential convergence in smooth regions without using a wide stencil. It can all be done locally on a single processor.
Now I still have my flux issue. I have to connect across these intervals. And the flux that's used in spectral methods can't handle shocks. That's where you get the Gibbs phenomenon from.
A DG code is a hybrid of these two methods. So we still use local expansions in basis functions. So that lets us get high order locally if the solution is smooth.
But now we have more freedom-- the D here stands, remember, for discontinuous-- we don't require the solution to be continuous at these interfaces. Because if you think about it, since you're making an approximation in the interval anyway, why require the solution to be continuous here? It can have a small discontinuity, as long as the error you are making is small enough, and as long as that error converges away as you use more and more basis functions and more and more intervals.
So that extra degree of freedom to allow a discontinuity at the interfaces is the key to the success of this method. Because this formulation gives you much more freedom in the fluxes. I put arbitrary in quotes. It's not really completely arbitrary.
This is still an active area of research. But the point is there are flux rules that are known that let you handle shocks. So that's the power of the method.
And any set of PDEs can be formulated in this way. So you don't need one code to solve Einstein's equations, another one to do hydrodynamics, another one to do magnetic fields. You can do them all in the same formulation.
All right. So just for the expert-- I don't usually put equations up in talks. But I thought just for people who are interested, I'll show you a little bit-- how does this method work? So here's an equation which looks like a conservation law. So there's some quantity u.
In a real application, u would be a vector of things, like mass, density, and energy, and momentum, and so on. But you can just think of this as a scalar. So here's a divergence of some f, and there may be a source term on the right.
So here we expand in basis functions. So we expand the u, and then the flux of u gets expanded. And then the G-- the Galerkin part-- comes from Mr. Galerkin, who had this bright idea about approximations, sort of one way to think about what's the best I can do if I'm using n basis functions.
Well, the best I can do is take those n basis functions-- they span if I take linear combinations, some subset of all of the possible functions. And I can project my error against those basis functions and require that to be zero. And that's all that Galerkin means. OK?
So here-- sorry-- so here I've done that. So I take this quantity, the quantity you would like to be 0, taking the s to the right-hand side. I project that against each basis function and require that to be 0. So now, if you take this condition and you substitute the expansions for u and f in there, you see one of the terms you get is this divergence term here, right, multiplied by a basis function.
OK, so who can guess what you do next?
AUDIENCE: Integrate by parts.
SAUL TEUKOLSKY: Yes. This is someone who's taken a course from me.
Because this is how you tell the difference between a theorist and an experimentalist. A theorist is someone who knows that when you're stuck, you integrate by parts.
And that's not meant to be a joke. That's absolutely true. So you integrate by parts, which means you let the derivative operator act on the product of these functions and then subtract off the extra term.
And then you can use Gauss's theorem on the first term. You turn the divergence into a surface integral. And here's where the art comes.
You now take this flux-- so this is on the boundary. And you replace this by some prescription that's called a numerical flux, right? You make a choice of how you're going to communicate the information where the discontinuity is between the two sides.
OK, so this part is art. And it turns out that this procedure, which I've written down here in nice Cartesian-like coordinates, can easily be generalized to curved spacetime. So you can do anything that you could have done in terrestrial physics, you can do in general relativity as well with this kind of technology.
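The derivation just described can be summarized as follows. This is a reconstruction from the verbal account, for a scalar conservation law on one element; the phi_i are the basis functions and f* is the numerical flux:

```latex
% Conservation law with source term:
\partial_t u + \nabla \cdot \mathbf{f}(u) = s

% Expand u (and likewise f) in basis functions on each element:
u \approx \sum_j u_j(t)\,\phi_j(\mathbf{x})

% Galerkin condition: project the residual against each basis function,
\int_\Omega \phi_i \left(\partial_t u + \nabla \cdot \mathbf{f} - s\right) dV = 0 ,

% then integrate by parts and apply Gauss's theorem, replacing the
% boundary flux with a numerical flux f* that couples the two sides
% of each element interface:
\int_\Omega \phi_i\, \partial_t u \, dV
  - \int_\Omega \nabla \phi_i \cdot \mathbf{f}\, dV
  + \oint_{\partial\Omega} \phi_i\, \mathbf{f}^{*} \cdot \mathbf{n}\, dA
  = \int_\Omega \phi_i\, s\, dV
```

The choice of f* is the "art" referred to here: it encodes how the discontinuity at the interface is resolved.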
OK. So here's an example done by Francois Hebert when he was a graduate student here. So this is a sort of standard test in relativistic hydrodynamics. It's a 2D example. So you have a highly relativistic inflow in the upper half here and similarly, coming in here.
And the initial condition is a high density gas in this quadrant and a very low density gas here. So this comes in. You get the interacting shocks over here, and a contact discontinuity is formed.
And if you compare this to finite difference codes, basically the figures are indistinguishable. And typically, you need much higher resources in the finite difference version of doing this. So this is very encouraging.
AUDIENCE: So this method goes way back. It's not--
SAUL TEUKOLSKY: Yes. So it was invented in the-- well, it doesn't go way back. It goes back to the '70s, OK?
Whereas finite differencing goes back to Newton, right? Yes, but the D-- the discontinuous part-- was the key. So there is a version of this which is much better known, especially to engineers, where you require continuity of the basis functions. And that's the finite element method. And I'm not sure why the engineers stick with finite elements. To me, it's clear that this is superior. But they have a huge investment in finite element methods.
AUDIENCE: So for you and me, this may not be way back. But for others here, it probably is.
SAUL TEUKOLSKY: Yeah.
AUDIENCE: So the question is there must be a downside to everybody doing it.
SAUL TEUKOLSKY: No. So this method only became-- it was originally developed and used not so much for these kinds of applications. I would say it's only in the last 20 to 30 years that it's really become actively investigated by the applied math community as a good method for doing time-dependent evolutions of PDEs. It was used for other kinds of applications, mainly, before that. So this is, relatively speaking, recent-- 20 to 30 years.
AUDIENCE: So my question is really is there a downside? Is it more expensive in some way? Or does it have every advantage?
SAUL TEUKOLSKY: No, no, I mean, the downside is it's more complicated to write these codes. There's more infrastructure needed. There are some problems where-- so when I talked about those rules for handling the fluxes where there are shocks, there are some problems where finite difference methods work, but those flux rules-- people haven't figured out how to use them in here.
So I think this is still an active area of research. It's not the complete solution-- despite my Superman picture, it's not really, yet. But I think it's good enough, and it's--
AUDIENCE: I mean, you gave the extension to general relativity, citing yourself to just two years ago.
SAUL TEUKOLSKY: Well, because nobody in astrophysics has been looking at this method.
AUDIENCE: No, I'm not questioning the citation.
SAUL TEUKOLSKY: Yeah, that's why it's not earlier.
AUDIENCE: I agree.
SAUL TEUKOLSKY: OK.
AUDIENCE: But it wasn't hard-- not for Saul.
SAUL TEUKOLSKY: If you can do Gauss's law in flat space, you can do it in curved space, right. OK. All right, so this leads me to what a lot of people in our group are working on. So we had our code spec, spectral Einstein code.
So obviously, I proposed Son of SpEC. That got rejected. And the decision was made to call it SpECTRE. So how are we going to achieve our goal, right?
So we have this idea that maybe this DG method will allow us to set up a situation where the elements only have to communicate with nearest neighbors instead of having a wide stencil. So that would be a good thing. But in order to hit the kind of targets I was laying out earlier-- you know, to be able to handle a very high signal-to-noise event, whether it's hundreds or 10,000; to be able to simulate neutron stars smashing together with nuclear physics and neutrinos and magnetic fields, and do it to much better than 1% accuracy and so on-- you might think that all I have to do is wait a few years.
Computers get more powerful, and everything will work. OK, so what's wrong? Why is Moore's Law broken? So the reason Moore's Law is broken-- what I mean by Moore's Law here is the form that says processor speeds double every 18 months or so. And basically, the smallest structures on computer chips today are about 70 atoms wide, roughly. There's not a lot of room.
I mean, maybe they'll get to 60 or 50 or something. I don't know how many atoms you need, right? But the standard kind of designs-- we're basically done in terms of processor speed. So the way we're going to have more powerful machines is by having more processors.
So here I just got a picture. So this is actually not the latest generation. This is, I guess, at least two generations old in terms of supercomputer chips. But it's the one that I had the best picture of.
So what I want to draw your attention to here is this PU. Starts at 0, 1, 2, 3, going around up to 17. So there are 18 of them around on the edge here. So those are the processing units. So you can think of them as CPUs.
The reason they have 18 is actually interesting. So one of them is for controlling input output, I/O. So that leaves them with 17. And they actually only use 16 of them.
And that's a clever trick. It means that when they mass produced these chips, usually not all of them are perfect. But as long as 16 of them are OK, they can tolerate one bad one.
They just don't hook it up. So that's why there is an extra one there. That's cheaper than throwing away everything but the absolutely perfect chips that they produce.
So this thing is 3/4 of an inch in physical size. And it has these 16 CPUs on it. And as I said, this is already two generations old. The newer chips of the same physical size are up over 100 CPUs on a chip.
So basically, if you ask how are we going to get-- so people talk about exascale computing, all right? So meaning computing at exaflop rates-- and don't ask me what exa means, right? I have to start at something I know like tera and then count upwards.
SAUL TEUKOLSKY: Thank you. Right. So the next generation of machines are going to have-- and then you can put together thousands of these chips. The average user who gets an allocation on the XSEDE supercomputer facilities-- and basically any NSF person with a proposal that says, I'm going to do some computing, gets access to these machines-- will have millions of processors at their disposal.
OK? And now I've already tried to make the argument that if you take the existing codes with their wide stencils, they spend all their time communicating. That's no good. All right? So we're going to implement, maybe, the DG method.
But that's not enough, OK? So here I'm just saying again that the processors are often idle during the communication. Yeah?
AUDIENCE: Even in one node, you would have several different kinds of physics, several different kinds of--
SAUL TEUKOLSKY: Yes.
AUDIENCE: OK. Good.
SAUL TEUKOLSKY: All right. So now we get to the other stumbling block, and that's called load balancing. So we might have one processor just figuring out what's going on in some subdomain that's inside a turbulent region of a neutron star. And another processor could be out here calculating what's happening to some nice, smooth gravitational wave that's traveling out.
When we're doing black hole simulations, we're trying to locate where the surface of the black hole is. For the experts, it's the apparent horizon. That's because we don't want to solve the equations inside the black hole where there's a singularity. All right? The computer will not like us if we try to do that.
We might be doing radiative transfer, where we have to follow rays for neutrinos or for electromagnetic emission and so on. So these are all different kinds of tasks that have to be done. And the way that we parallelize now, we distribute the work by dividing up the grid into subdomains across the CPUs, and each CPU is responsible for the calculations on the piece of the grid that lives in its memory. But before you can take another time step, you need to wait for the slowest CPU to finish, because you need its information to be communicated, right, in order to do the derivatives or whatever it is that you're going to be doing.
All right. So we have a solution for that. And the solution is called task-based parallelism. And the reason I say "in principle" is because this is, again, something fairly recent. It's an old idea, but there haven't been implementations of it. You couldn't go out and find a nice package to download the way you can download MPI, which is a very standard package these days that distributes the work across processors.
So what is task-based parallelism? So in conventional parallelization-- like we use in our current code-- this is supposed to represent the work being done. The blue is different processors doing different work that takes different times. And the orange is the time they spend waiting for the slowest one to finish.
In task-based parallelization, you have a list of all the tasks that need to be done. And then you have all your CPUs waiting. And then they get assigned tasks. And when one task is finished and this processor is idle, you give it another task to do. So it's a very simple idea.
As you can imagine, the bookkeeping for this can get very complicated. Because you might have a task that needs some other task to have completed to supply it with its information so that it can do its thing. So making sure that all the dependencies are satisfied before you start the new task, making sure that there's always work to be done and so on, this can be very difficult.
OK, so as I said, there are no standard packages available for this. In principle, you could try to write your own task-based parallelism software, using existing tools like MPI and OpenMP. There are computer scientists who are really into this and are developing packages.
For example, HPX is a package being developed at Louisiana State University by computer scientists there. We tried it. It's too bleeding edge to use, right? It crashes all the time and things like that.
Luckily, we found a package called Charm++, which was originally developed by some quantum chemistry people at the University of Illinois. And this is at a sufficient state of maturity that we feel comfortable using it. And here's one example.
AUDIENCE: What are these written in?
SAUL TEUKOLSKY: C++. Yeah. Yeah. So here's an example. This was a test problem we ran on the Blue Waters machine. This was the largest machine we could get access to without being inside a weapons lab.
So this is just a standard test problem in nonrelativistic MHD, but we ran it with our relativistic code. And so what's plotted here is the number of CPUs that we were using. So we took a fixed problem-- so this is the number of domains.
And we kept this fixed. And then we ran it on-- so the reason this doesn't start at one processor is the smallest unit on Blue Waters is one node. So we had to start with at least 32 cores.
And then all the way up to the maximum was about-- depending how you count-- 300,000 or 600,000 threads were being used here. And the goal is that if you double the number of processors to do the same work, you would like it to go twice as fast, OK? And so you see the nice-- I mean, the green is the straight line. And you get--
AUDIENCE: This is log, log. That would be linear. This slope is super linear, the 1.25.
SAUL TEUKOLSKY: Um, not sure about that.
AUDIENCE: I'm just looking. You've got a little over 10 to 10 to the fifth in the vertical. Makes it--
SAUL TEUKOLSKY: Yeah, but we're starting-- right? I mean--
AUDIENCE: I'm sorry. You're right.
SAUL TEUKOLSKY: And extrapolated back. Right. Yeah. OK. What really impressed me and made me a convert-- I mean, if I sound a little evangelical, it's for good reason-- was the code that we used. We took it from the laptop to the supercomputer and made zero changes.
That is completely unheard of, right? When people take codes from a desktop or a laptop to a supercomputer, typically some poor graduate student or postdoc spends six months hand-tuning the code to get optimum performance, OK? So I'm completely sold on this idea of task-based parallelism for being able to not only get these kinds of performances, but also to minimize human time that's needed to do these things.
Now, this was a bit of a cheat, because this was a fairly homogeneous problem. We didn't have all the load-balancing issues that a real problem would have. All right, so here's a time profile of that particular piece of code just doing 10 time steps.
So the different colors are different tasks that are being done. So the black at the top-- so what's plotted here, these are the time steps. And this is the percentage of what the computer is doing for what fraction of the time.
So the black at the top, that's about a 10% overhead from the Charm++ system. The white that you see here, that's the startup. That's wasted time, OK? So as the processes get started, some of them are not active.
But then this pale blue is the initialization. And then these other colors are different pieces of the algorithm that are executing. And you can see that once things get going, there's no white. Nothing is idle, all right? The computer is completely busy doing everything.
And you can see there are fluctuations in the amount of time that's being spent on the different parts of the algorithm. And that's just the task-based machinery doing its thing-- when something is idle, sending things off and so on and doing various things. And then you see, at the end, you see some white appearing as you finish the simulation. And so if we could get something like this for a real, say, neutron star-neutron star simulation or something like that, that's our goal.
All right, there's some challenges to doing this. One of the things is-- so don't bother. I'm not going to go into some details on this.
But one of the ways you get efficiency in these codes is by adaptively changing the refinement of your mesh. So you might have some region of the mesh where some small scale physics becomes relevant. And so you need to put more grid points there in order to resolve that.
But now if you did that everywhere on the grid, that would be very wasteful. Who needs a fine grid where your length scales are quite large? So you do this adaptive mesh refinement.
And now you run into an issue: in the time stepping algorithm that's used, the size of the time step you're allowed to take is proportional to the grid spacing. If you make the time step too big relative to that grid spacing, your method is unstable. This is called the Courant limit, or the Courant condition.
So even if you have fine grid over here and coarse grid over here, you're required to take small time steps over here. Well, what are you going to do over here? Well, what most people do is they take small time steps over here as well. What you would like to do is to be able to take small time steps over here and big time steps over here.
And now you see, if you have task-based parallelism, you can do that. Because this task to take the big time step doesn't have to be done. You can put all your processors to work doing the small time steps where they need it.
All right. So I'm going to give you a secret. If you submit a proposal to a computer allocation committee, here's how to fool them, right? They ask you to demonstrate that you're going to use their precious resource responsibly.
So they ask you to show a kind of a scaling diagram like I showed earlier. That as you increase the number of processors, you get nice scaling law. It doesn't have to be completely linear, but it should go reasonably, increase as you increase the number of processors.
So on a conventional code, if you're using adaptive mesh refinement and you use this local time stepping idea, you get very poor scaling, right? Because the processors responsible for the big time steps sit there doing nothing while the processors taking the small time steps do something. So what everybody does is they run with the small time steps everywhere-- where they could have taken a nice big time step, they're taking a whole bunch of tiny time steps-- just so they can make a diagram for their proposal to the allocations committee showing nice, good scaling.
All right? So if you've not heard of that trick, now I've told you. And there was even a paper a little while ago where somebody did a survey of these things and they all got bad scaling if this local time stepping was turned on. There was one exception, which is a task-based parallelism code.
So to summarize, after 50 years of finite differencing, I think it's time to move on. OK? Algorithms like-- I mean, I don't-- yes, I do. I claim DG is the way to go.
But you have to have an open mind. Maybe there are other things out there. They are robust for shocks, and they get good scaling. And something like task-based parallelism is going to be essential if we're going to make use of machines that have millions of processors. Thank you.
AUDIENCE: Task-based method-- isn't there extra overhead in communicating because there's all this local data that formerly, [INAUDIBLE] looking at the neighbors. Now it has to-- when it's read-- it has to send off this data to this processor, task processor and then get the results back. It seems like it's a little bit of waste--
SAUL TEUKOLSKY: No, it doesn't have to go to the task-- the task processor just knows the tasks. It doesn't have to deal with the data, right? So it can assign the task.
So Charm++ is actually quite smart. It has a bunch-- I mean, I didn't go into detail. But for example, you can have data on this processor. This processor can become idle. The Master Scheduler says to this processor, OK, please compute the derivative of this data.
And the system knows that the derivative-- you know, the function to compute the derivative is being assigned here, and the data is sitting over there. And it can figure out-- either there is a default for it or you can actually go in and play with the various settings-- it can figure out whether it's actually better to reassign the function to this data over here or actually to move the data to here and then compute the function, rearrange things. So it has a bunch of facilities that can do load balancing.
And it can even do-- it can take measurements during the computation and decide OK, let's pause for a second, and let's move some things around. And then let's restart, because it will be more efficient. So we haven't really explored a lot of these facilities. But that kind of thing is possible.
AUDIENCE: Are you showing that the data transfer is not a large part of the time?
SAUL TEUKOLSKY: Yeah, because the idea, the whole idea is while the data transfer is happening, you don't wait. In a conventional MPI thing, what happens is I have some element here that needs information and it asks for that information and waits until that information arrives.
SAUL TEUKOLSKY: Yes. What happens here is I have something here that's producing information that this processor is going to need. When that information is ready, it just sends it and forgets about it and continues with its next thing. Meanwhile, this processor is doing its thing on some other task that it's been assigned.
So that time that it takes for that information to arrive is hidden while other work is being done. And then when this is received, when all the information is received, then the system knows that the next task that needs that data can be started.
AUDIENCE: And you were saying that it can actually do automatic configuration of the load balancing.
SAUL TEUKOLSKY: Yes, it can rebalance things based on measurements that are made during the startup of the calculation.
AUDIENCE: What's the fractional computational overhead of that?
SAUL TEUKOLSKY: About 10%, it seemed to be on that. I mean, so far that seems to be about the number, 10%. Yeah?
AUDIENCE: If you have several different simultaneous partial differential equations, the load on one node is going to be noticeable no matter how--
SAUL TEUKOLSKY: You don't have to have every u, every element of the u vector, on one processor. You could distribute the tasks-- you could--
SAUL TEUKOLSKY: Right? Because in other words, a task might be computing a right-hand side, OK? So you have this whole vector of right-hand sides, and you could decompose them into tasks.
AUDIENCE: In your description there, it did require an extra hardware feature, which was the ability to cache the incoming information while you weren't working on it. Is that something that any architecture can support?
SAUL TEUKOLSKY: Well, it's a question on how much RAM you have per processor. Ideally, this would be cached in RAM. Yeah?
AUDIENCE: MPI has that same problem.
SAUL TEUKOLSKY: Yeah, I guess that's a better answer. Right? Since we deal with that already. Yes?
AUDIENCE: Does this framework require people to code in a completely different way--
SAUL TEUKOLSKY: Well, somewhat different-- Nils thinks no.
SAUL TEUKOLSKY: So if you want to use a package like Charm++, there's a steep learning curve right now because it's one of these development packages and so on. But I think what's going to happen-- I mean, already people like Nils have figured out how to take the existing Charm++ stuff and wrap it in a way that nonexpert people, people who just know C++ but don't think about a lot of these things, can actually use these packages in a reasonably straightforward way. It does require you to think in a different way if you're used to MPI parallelization.
Because-- I gave one example with the communication. It's here. You don't ask for information.
You send the information, and then you just check, have I received the information? So that's one kind of different paradigm that you need. And then you have to make sure that your problem is subdivided enough, that there are enough tasks to keep the CPUs busy all the time. So there's a granularity of size that you need to have. But other than that, it's pretty standard.
AUDIENCE: The reason I ask is because it sounds like creating the task list itself is going to be a significant part of this problem.
SAUL TEUKOLSKY: No, you write just ordinary C++ function. Basically, it's a function call.
AUDIENCE: It turns out that-- what the people who have been working on this say-- is that you write it just like any normal C++ or Python program. You call the function. Some backend infrastructure goes, OK, where do I execute this function? And then it gets sent around the supercomputer and run wherever is necessary.
SAUL TEUKOLSKY: That's all hidden from the user. I mean, unless you really want to go and-- for some reason. But I mean, it's transparent.
AUDIENCE: Well, if it's all automated, can it ever make a mistake and do something wrong behind the scenes?
SAUL TEUKOLSKY: The same way MPI can make a mistake.
AUDIENCE: How do you benchmark it before you get it on the actual expensive production issue?
SAUL TEUKOLSKY: Well, like all attempts to parallelize something, you can never get good performance if you don't have good single-thread performance. So the first thing you do is make sure that the serial execution is profiled and optimized and so on. And then, once you have that, you can do the scaling test, right? Do I get the speedup when I subdivide the tasks? And if I don't get that, then I have to work out what it is that's blocking the system from--
AUDIENCE: Even after you have all that, how do you decide the actual task granularity you're going to use for the real calculation?
SAUL TEUKOLSKY: So the recommended way of doing this is to have-- I mean, let's take a simple example, right? I'm solving, say, just one single differential equation. And I have a bunch of grid points on which I'm going to solve it.
And then I subdivide the interval into subdomains. Not because there's any adaptivity or anything around, but just in order to parallelize it. Then a parameter at your disposal is, how many subdomains should I make? And that would determine what the granularity is.
So ideally, you'd like to have some parameter like that which has nothing to do with the physics, but purely controls how big or small tasks are, how many grid points are going to be assigned to a particular CPU, for example. And then you use that empirically, basically, to tune your code on a particular piece of hardware.
AUDIENCE: It seems like if you're going to send it to the processor that's going to use it it has to be scheduled ahead of time. And that brings up--
SAUL TEUKOLSKY: Well, not necessarily-- if there's enough work that can be done. Right? So ideally one would like to understand your algorithm well enough to start tasks that others depend on ahead of time. But it's not necessary, right?
Because suppose this task, this processor-- there's some task that's going to need some result. As long as there are other tasks that can be done, eventually this task gets scheduled and produces its data. And then meanwhile, this task which was sitting up in the queue, waiting for that data to arrive, it can now be started.
AUDIENCE: But it still has to know which processor it's going to send it to.
SAUL TEUKOLSKY: No. It just needs to know--
AUDIENCE: So you do need to know where to send it. But these libraries handle, OK, I want to compute on-- I want to call this function. In other words, like I was saying, you have some backend to that-- actual computer science people, right, that then goes, OK, where on my supercomputer does this data go?
SAUL TEUKOLSKY: So the user doesn't have to control that, right?
AUDIENCE: That I understand.
SAUL TEUKOLSKY: Right. And you know, these computer scientists are smart. We trust them.
AUDIENCE: So one thing with MPI, you basically know everything about how many things you have and roughly where they are. You don't have that with this. You have no idea.
You can never get a list of all of your tasks. It's just not supported. And that's the thing that makes it so parallel. So you basically send information and hope it gets somewhere.
SAUL TEUKOLSKY: So far, we haven't lost a single packet, right?
SPEAKER 1: All right. Well, OK. Jeffrey? Last question.
AUDIENCE: Sure. One thing I'm still fuzzy on is the relative importance of the method having a small stencil versus the task-based parallelism.
SAUL TEUKOLSKY: So in principle, I think you could use task-based parallelism with the finite difference code. But I think there the challenges would be that, as the stencil gets wider and wider, it's not obvious to me that you can hide all of those communication costs. All right?
I mean, you want to use-- you have a smooth gravitational wave going out. You want 12 grid points or something in each dimension. And that's a lot of communication. And now I'm not convinced that there's a way to use a million processors efficiently to have enough tasks to fill up a machine of that size. So I think it is crucial to keep the communication as small as you can, even with task-based parallelism.
SPEAKER 1: OK. Well, thank you so much.
Large-scale computer simulations are increasingly crucial in explaining astrophysical phenomena. One recent example is the stunning agreement of the full 3D solution of Einstein's equations for colliding black holes with the observed signal from LIGO. For the past 50 years, the dominant computer method for solving these kinds of equations has remained essentially unchanged. To keep up with continuing advances in observation, simulations will require more fidelity and higher accuracy. One might think that with exascale machines becoming available in the next 5 years, this will be easy. This is not true.
In a Department of Astronomy Colloquium on March 29, 2018, Saul Teukolsky explains why Moore's Law is broken, and how the next generation of supercomputers will instead get their power by having millions of processors. Current codes will not be able to use these machines efficiently. He describes new methods for harnessing the power of such exascale computers to solve some of the largest problems in astrophysics and other areas of science. Teukolsky is the Hans A. Bethe Professor of Physics and Astrophysics.