SPEAKER 1: This is a production of Cornell University.
SPEAKER 2: We're pleased to welcome you this morning to our program, Personalized Medicine and Nanobiotechnology: Your Health Care of the Future.
We have a wonderful panel for you this morning, and we need to ask one thing of your good graces. We are videotaping for future use, and we would kindly ask you to turn off your cell phones or place them on vibrate so that we can have a good filming session today.
I'm pleased to turn our program over to Vice Provost Dr. Stephen Kresovich, who will introduce our speakers.
STEPHEN KRESOVICH: Thank you. Good morning. I hope everybody's had a good week so far. I'm really excited about this presentation today. I've been working for a number of weeks trying to organize this, and I think you're going to really be pleased with what you see.
We have two presenters today. And both of these presenters are well-recognized across the university, across the nation, across the world, as being leading researchers in biological sciences and in nanotechnology. They're great teachers, and they're also great leaders on this campus that bring groups together and build bridges in new ways.
They're also key players as we try to link Ithaca and Weill for future research, bringing together the best science and teaching that we have on the Ithaca campus and linking it with the clinical and research programs in New York City.
Two presenters today-- Andy Clark is in the Department of Molecular Biology and Genetics and is the Schurman Professor of Population Genetics.
Our second presenter is Dr. Harold Craighead, the Lake Professor. And he's in the Department of Applied and Engineering Physics, and he's also a Director of the Nanobiotechnology Center. So I think you'll see presentations that fit together that highlight advances in sciences as it relates to diagnostics, biology, and therapeutics.
So with that, I'll turn it over to Andy.
ANDREW CLARK: Thanks, Steve. This really is an exciting time to be studying human genetics. As you know, the human genome was sequenced and published in 2001. And in short order after that, it became a real question of just how exactly was this going to be useful in medicine. How was it going to be actually applied to making medical discoveries?
And my field of study's actually population genetics, the study of variability within populations. And it was really amazing the extent to which that was the field that really had the keys to some major breakthroughs in using the genome sequence for understanding particularly complex disorders.
So complex disorders are diseases such as cardiovascular disease, diabetes, and many cancers. That is, they have multiple genetic and environmental causes: we often know that there are particular environmental cues that will increase the incidence of those disorders. But they also tend to aggregate in families, so they have some genetic component to them.
And these are particularly difficult to understand genetically because they're so complex in causation. Because we know that they don't transmit through families like a simple Mendelian gene, a regular single-gene kind of trait, it's much more difficult to study them.
Classically, the way we would study the genetics of disorders in humans is to collect pedigrees, collect families of individuals who are affected and then go collect siblings and parents and go back and collect the whole assortment of individuals who are related. We would then collect DNA samples from those individuals, score the genotypes at markers in the genome, and then use various statistical methods to try to infer, is there a transmission of a disease gene along with those markers?
The problem with this kind of method, called a "pedigree study," is that there are relatively few recombination events. In a family study of maybe 100 families of affected individuals, you only get a few hundred of these recombination events.
And this is a picture of one of the statistical methods, where you look at the log of the odds ratio, sort of a measure of to what degree do I believe that the gene is in that region of the genome, scanning along the chromosome of these individuals in this family. And the peak width is wide enough that hundreds of genes could fall in that region. And so there's a limit to the resolution of the classical ways of mapping genes.
So another way to think about this is rather than think of individual pedigrees or individual families of closely related individuals, use instead the whole population, because the whole population is also related to each other. We're all related to one another by fourth, fifth, sixth cousins or so. And so we can use that information, as well.
So chromosomes are always recombining, one with another. But suppose no recombination occurs. Then if one mutation occurs that generates what we call a "polymorphism," a variable position in the DNA, and another mutation occurs that causes a disease, there can be a statistical association between the two, where individuals with the big-A allele also have the big-D allele, which is, let's say, the non-disease allele. So even without looking at the little-d, or disease, allele, we might be able to detect its presence by its statistical association with the little-a allele.
So that's what we're looking for: this kind of statistical association. And this is getting a little more detailed than we need, but it's just to point out that there are measures of that degree of correlation that come straight from standard statistics. An r squared is like a correlation between those markers.
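As a concrete sketch of that statistic, here is how r squared between two markers can be computed from two-locus haplotype counts. The counts are made up for illustration; they don't come from any study mentioned in the talk.

```python
# Pairwise LD statistic r^2 for two biallelic markers,
# computed from hypothetical two-locus haplotype counts.
counts = {"AB": 60, "Ab": 10, "aB": 10, "ab": 20}  # made-up data
n = sum(counts.values())

# Haplotype and allele frequencies
p_AB = counts["AB"] / n
p_A = (counts["AB"] + counts["Ab"]) / n  # frequency of allele A
p_B = (counts["AB"] + counts["aB"]) / n  # frequency of allele B

# D measures how far the AB haplotype frequency deviates from what
# independent assortment would predict; r^2 normalizes it.
D = p_AB - p_A * p_B
r2 = D**2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
```

An r squared near 1 means the two markers are almost perfectly correlated in the population; near 0, they are effectively independent.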
Another, maybe easier, way to see it is to suppose you have a chromosome, and a mutation occurs that's disease-causing. Any disease-causing mutation always occurs on a chromosome that has a set of markers on it, in a particular configuration of that chromosome in that individual that's different from anybody else's. And so that means there's somewhat of a history of that particular disease mutation in its association with what's flanking it on the chromosome.
Over time-- that is, over generations-- recombination events will occur flanking this disease gene. And the region around that disease gene that has the same background, the same markers that were in the original mutation, gets smaller and smaller. And this is another way by which we can map: we look for the markers that are statistically associated with the disease allele.
And it turns out the field that I'm in, population genetics-- this is a textbook that Dan Hartl at Harvard and I have written. It's actually the most popular textbook in the field. There's a lot of algebra and mathematics behind modeling.
What do you expect to be the behavior of genes in populations? And this is the fundamental thing that drove this whole set of studies, which is that degree of correlation is expected to be strong for things that are close together. And if things are further apart-- that is, genes on a chromosome are further apart-- that correlation should be lower and lower.
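One classical result from that algebraic modeling is Sved's (1971) approximation for the expected correlation between two markers as a function of recombination distance. The speaker doesn't quote it, but it captures the expected decay just described. The population size and the per-base-pair recombination rate below are rough illustrative assumptions, not numbers from the talk.

```python
# Sved's (1971) approximation for the expected pairwise LD:
#   E[r^2] ~= 1 / (1 + 4 * N * c)
# where N is the effective population size and c is the recombination
# fraction between the two markers. Both inputs here are assumptions.
N = 10_000  # a commonly used rough figure for human effective population size
expected_r2 = {}
for kb in (1, 10, 100, 1000):  # marker separation in kilobases
    c = kb * 1000 * 1e-8  # ~1e-8 recombination per base pair (~1 cM/Mb)
    expected_r2[kb] = 1 / (1 + 4 * N * c)
```

Under these assumptions, expected r squared falls from roughly 0.71 at 1 kb to about 0.002 at 1 Mb, matching the qualitative pattern in the empirical data that follows: high correlation only between markers that are very close together.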
So about five years ago, we were able to start collecting empirical data. Each of these dots indicates a calculation of that statistic, that correlation statistic, for a pair of markers along the chromosome. Any pair of markers on the chromosome, we can calculate the distance between them. So this is the distance in thousands of base pairs of DNA sequence.
And what you can see is that you get some of these relatively high r squared, a high correlation for things that are far apart. But it's rare. Most of the time when you have a high r squared, the things are very close together. And as you go further away, you, in fact, see less correlation.
At another scale, this is particularly clear. This goes out to 30 million base pairs; we have chromosomes that are 250 million base pairs in length, so even this is a tiny portion of the chromosome. But you see that it's very clean: very little correlation unless you're very close to the actual gene.
So now, we can turn this crank backwards. If we can identify markers across the genome and look for ones that are correlated with diseases, we can then infer that the marker is close to the actual mutation that caused the disease.
So this launched the HapMap Project. You can read about it at this website, hapmap.org. It was a National Institutes of Health-funded project to look at 270 people from four different populations and to measure some four million of these markers across the genome. And I was on the advisory panel for this and was a player in many of the calculations that were done. It's a tremendous international collaboration to understand variability across the human genome in many different population groups.
And this is just showing that the rate of decay of this correlation-- we also call it "LD" or "linkage disequilibrium"-- varies across the chromosome. So there are regions where it goes down fast and regions where it goes down slowly. And that means our ability to map genes might differ. This is actually the whole human genome. We have these 22 autosomes plus the X chromosome, giving us 23 pairs of chromosomes.
One way to show this degree of correlation is with these figures, which show a scan along the chromosome. And what we're looking for are these red triangles, which indicate highly significant correlation between pairs of markers on the chromosome. So a region like this is less correlated than a region like that. You're looking for a sort of degree of structure to that variability in a population.
Now, this is going to relate to some of the comments Harold's going to make, which is along the lines that technology really does drive a lot of the progress in areas like this. And the watershed technology here was the advent of these chips called "gene chips." One of the companies that makes them-- and several do-- is Affymetrix.
And they make a chip like this that allows you to score the genotype of 500,000 of these markers across the genome in one sort of two-day operation, taking a blood sample from an individual and then just scoring them. They're up to 1 million markers now on these chips. So it's very fast-moving technology.
And the astonishing thing is how inexpensive these are: $200 or $300, on the order of a typical medical procedure. And now, you can get a million markers across your genome genotyped. So this is the design of the experiment that I'll talk about for the rest of the time. They call them "GWAS" or "Genome-Wide Association Studies."
And when I started saying this is a really exciting time, the first of these was published in 2005. And it was a quite small one. All the ones that were really the genome-wide scale have been published this year. So we're right at the watershed time of realizing all these studies.
Typically, they start by identifying 1,000 individuals who have some disorder, the cases, and 1,000 control individuals, matched for age and gender and so forth. We genotype them with the SNP chips. And then there's some statistical analysis, doing that correlation analysis I just described, and looking for clusters of significant associations.
This is the very first one. It was published in 2005 in Science by Klein et al. It's on Age-related Macular Degeneration, AMD. Of course, this is the predominant cause of eyesight loss in the elderly.
This was a minute study. It was not 1,000 cases and 1,000 controls. It was only 96 cases and 50 controls and not 500,000 SNPs, but 100,000. For each of those markers, they do a test of statistical association. It was actually your straight chi-square test you learned in high school or freshman year or wherever. And what they identified were two SNPs in an intron of complement factor H.
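The per-marker test described here really is just a 2x2 chi-square on allele counts. A minimal sketch, with hypothetical counts rather than the Klein et al. data:

```python
import math

# 2x2 chi-square test of association between an allele and case status.
# Counts are hypothetical, for illustration only.
#              risk allele   other allele
table = [[70, 30],   # cases
         [45, 55]]   # controls

row = [sum(r) for r in table]            # row totals
col = [sum(c) for c in zip(*table)]      # column totals
n = sum(row)

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected
chi2 = sum(
    (table[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
    for i in range(2)
    for j in range(2)
)

# For 1 degree of freedom, the upper-tail p value is erfc(sqrt(chi2 / 2)).
p = math.erfc(math.sqrt(chi2 / 2))
minus_log10_p = -math.log10(p)  # the quantity plotted on GWAS y-axes
```

In a real GWAS, this test is simply repeated at every one of the hundreds of thousands of markers, and the minus log 10 p values are what get plotted in the genome scans.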
So right out of the blocks, this study discovered a novel gene that had not been implicated in age-related macular degeneration. It implicates the immune system. It was a watershed change in direction for that whole research community, realizing we have to think about what activates the immune system.
How does complement factor H play? What are the possible drug interventions to play with this? And more progress has been made in the last year on macular degeneration than the previous decade as a result of this.
You may have heard of the Framingham study. This is a cardiovascular disease study in Framingham, Massachusetts. Essentially, the entire population was ascertained and is being followed longitudinally. Every individual in that study is being genotyped with these 1 million SNP chips, so they'll now be able to add these kinds of genetic studies on to all the epidemiology that's been done. Those analyses are just underway; the genotyping is actually done, and the data are available, but nothing has been published from it yet.
In the United Kingdom, there's a thing called the Wellcome Trust. It funds a good bit of the medical research in that country. They set up the Wellcome Trust Case Control Consortium, looking right off the bat at seven disorders: bipolar disorder, coronary artery disease, Crohn's disease, hypertension, rheumatoid arthritis, and type 1 and type 2 diabetes.
And they identified 2,000 cases of each disorder and 3,000 shared controls, an even larger study, a total of 17,000 individuals across all of these, and published it in one big paper. What these pictures show, again, is a scan across the entire genome, 500,000 markers. And the y-axis is the minus log 10 of the p value.
So a big number means a very low probability. It means a significant association. And you can see for bipolar disorder, they essentially didn't find anything. This is possible. And we have to recognize that even with all the power of this technology, there can be disorders that are so slippery, so difficult to define or so difficult to work with, that we might not succeed.
Coronary artery disease, I'll show you in a little bit more detail. There are a number of very, very important hits and so forth. So we're just at the level of finding many statistical hits and having to take the next steps of understanding what does it mean if that gene is associated with this disease.
This is just to show you a zoom-in for coronary artery disease. So these are the many, many SNPs as we scan along here. And again, it's the minus log 10 of the p value, so a value of 10 means 10 to the minus 10th, or one in 10 billion. So you can see the very low probabilities, and many, many SNPs right in a tiny region that are associated with this pair of genes.
Now, this pair of genes had never before been implicated in cardiovascular disease. So again, it immediately opens doors to figuring out why is that gene associated with cardiovascular disease. I'm involved in a study of breast cancer. It's actually the Ashkenazi Breast Cancer Consortium.
Globally, there is an enormous breast cancer consortium that's nearly 30,000 cases now. And the whole research community is recognizing that these tests are much more powerful if we can pool resources and work together on them. And so there's been a tremendous effort to do that pooling of resources.
This is the downside of it. I have to be honest that not every study has a whoppingly huge effect on the disease. We're seeing very low p values, which means we're very confident there is a significant effect. But the magnitudes of the effects are often relatively small. This axis is the odds ratio.
So an odds ratio of 1.2, 1.3 means a 20% or 30% increase in risk. That's pretty difficult to interpret what exactly that means, but there are plenty of things you do in your daily life that will double or triple your risk of cardiovascular disease or an accident here or there. So in fact, a 20% increase in risk is relatively small in the grand scheme of things.
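To make the odds ratio concrete, here is where a value like 1.2 comes from, computed from a hypothetical table of risk-allele carriers among cases and controls (the counts are made up for illustration):

```python
# Odds ratio from a hypothetical case-control table of genotype carriers.
cases_carrier, cases_noncarrier = 300, 700        # made-up counts
controls_carrier, controls_noncarrier = 260, 740  # made-up counts

# Odds of carrying the risk allele among cases vs. among controls
odds_cases = cases_carrier / cases_noncarrier
odds_controls = controls_carrier / controls_noncarrier
odds_ratio = odds_cases / odds_controls  # ~1.22: roughly a 20% increase in risk
```

A highly significant p value says only that the odds ratio is reliably different from 1, not that it is large; that's the gap between statistical and clinical significance the speaker is pointing at.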
So it's not a huge-- it's clearly only a portion of the total risk of breast cancer that's explained by these genes. And that's fairly typical. Most of these explain a small total percent of the risk. What this is showing is actually many, many different studies and how the studies, even though there's a small odds ratio, they're all remarkably consistent in the effects that they're seeing.
Despite that warning that we're seeing small effects, there is nevertheless major excitement about what these studies can show. I gave you the example of macular degeneration. That's a case where it's just changed the direction of the research. We now have a new gene to study.
Type 2 diabetes-- many of you are probably aware of the fact that we had long thought that type 2 diabetes was probably mediated by insulin resistance, that the body is still able to produce and regulate insulin, but the reception of the signal of insulin is what's reduced. These genome-wide studies pretty much throw that out the window.
It very clearly does not implicate genes involved in insulin resistance, but it implicates a number of genes that are associated with the actual amount of insulin produced and regulated. So insulin degrading enzyme was one of the key hits and a number of other genes that were not known to be involved in this pathway but subsequently are clearly involved in production and regulation of insulin amounts, rather than insulin resistance.
And so this is just, again, the list of the kinds of genes that we're finding from these things and very small p values that are repeated from many experiments. So we're getting more and more confident that these are actually playing a role.
So now, you have to take a step back and ask, what's the primary purpose here? Where are we going with all this? What are we really going to be able to do? And there's a primary split in the intention of these studies-- though in reality, of course, the studies are trying to do both.
Different individuals place very different emphasis on etiology, the cause of the disease: what's the mechanism of the disease? We always want to understand that, because the better you understand mechanism, the more likely you are to understand how a drug might work, how you might be able to ameliorate that disease.
So that's always a goal. But what that entails is identifying genes. Even if they have small effects, they might be in relevant pathways that let me gain some understanding of mechanism. Prediction, on the other hand-- useful for diagnostics-- asks: if I just get the genotype, can I say what the chance is that I'll get the disease?
And I think I've indicated already, with those very low odds ratios, that we're not there yet for any of these diseases. In fact, we're quite some distance from being able to predict whether one person is significantly more likely than another to get this disease or that one. There's starting to be some sense that we're seeing what the directions are.
But it's clear that all of these disorders are highly polygenic. They involve dozens of different genes. They involve many different environmental cues that are sometimes very difficult to measure and to understand. And even more complicated, their particular interactions-- only some particular genotypes in some particular environments or diets or whatever are at elevated risk. So this issue of prediction is particularly a challenge still.
One of the dangers is the idea that this is going to hit the market. People are going to see, oh, this is great. I can just genotype with these cheap chips. They're so inexpensive. Commercial outfits can get a hold of them and do them. And this is a website of one of these companies, and they're claiming that they can measure your genotype and now predict, tell you, all of these sorts of things.
And the scientific community's very worried about these things sort of running well ahead of the science, that they're just scientifically not justified to be making the claims that they are already. And so keep your eyes out for that.
This is actually from a talk by the former director of the National Heart, Lung, and Blood Institute, showing sort of her dream for the way we'll be able to use some of these approaches. And it's the four P's. So we'll be able to use these chips for at least getting some assessment of the genotype of an individual, to the extent that there is increasing knowledge of the use of that genotype for understanding the risk to that individual.
There'll be some degree of personalized prediction, or personalized recommendations at the bench. Often, I should say, this kind of prediction can be very, very useful and specific-- for instance, for drug dose determination. There are certain drugs now where we know much, much better the likely dose at which this individual will have toxic effects versus that individual, because there's a very clear association for that sort of biochemical change.
Many times, these sorts of medical care can be preemptive if we identify somebody at inflated risk. The best example of this is something like PKU, where a change in the diet can radically change the outcome of an individual with PKU. And finally, of course, this all has to be participatory. Everybody has to be engaged at the level of understanding ethical concerns, understanding the degree of confidence and accuracy.
There are many very subtle statistical ideas about future risk that are entailed by these technologies. Cornell's played a real major and leading role in many of the aspects of this work. We're particularly strong in statistical inference. There are a number of different studies we're involved with.
Particularly, chronic obstructive pulmonary disease-- there's a very good collaboration with the medical school now on that. COPD is the most rapidly increasing cause of death in the US. There's a project on prostate cancer. I mentioned the breast cancer study. And my laboratory's doing a lot on cardiovascular disease, in collaboration with a number of other laboratories.
Cornell was also very key in comparative genomics, using the analysis of the human genome, comparing it to other genomes of closely related primates to make inferences about what genes are important and differences between them and differences in, for instance, the physiologies of cardiovascular and atherosclerosis. There's work going on in genomic imprinting and how these sort of epigenetic phenomena, the sort of chromatin state, influences disease risk.
Another comparative study is with Drosophila. There are six very active fly labs. We have 12 fly genomes now. Drosophila's an amazingly good research organism still today for human disease. There are models of diabetes in Drosophila, believe it or not.
And finally, canine genetics and genomics-- we are one of the key players in the dog genome analysis. And Carlos Bustamante and collaborators in the vet school are really taking a lead. And we share about 350 genetic disorders with dogs, so it's a terrific model for that.
And before I get dragged off stage, I'll turn it over to Harold.
HAROLD CRAIGHEAD: If you want to answer a question or two while I'm unplugging here, I'd be--
ANDREW CLARK: Yeah. We probably have a minute. Yes?
AUDIENCE: How advanced are the ethical considerations, such as information, insurance companies might know or telling patients what they're--
ANDREW CLARK: Yeah. So on the official side: to what extent did the National Institutes of Health and others officially try to engage ethicists and so forth? Fully 10% of the budget of the Human Genome Project went to ethics. So at every meeting, they're present. There's a real presence of the question, how do we ensure this is used for the best?
On the other side, the fact is that this gets out there, and any company can hang up a sign and start going. It will be a real challenge to keep up with that.
AUDIENCE: So have there been any published recommendations on how to develop this for mass--
ANDREW CLARK: As for statements on how to interpret things like that website and the companies that are advertising these things-- we're behind on really raising the flags and saying how the public should interpret this. We need to get on the ball with that. And they know it. They're drafting it now.
Yeah. One more.
AUDIENCE: Are you working on or considering anything with the Melanoma Genome Project?
ANDREW CLARK: Melanoma-- so there are many projects going on with cancers. There's the Cancer Genome Atlas, where they're re-sequencing many, many genes in many, many tumors. And our connection at Cornell is somewhat less direct with that. But we're certainly aware.
I'm on the National Human Genome Research Advisory Council. So I sort of see all these projects. Richard Gibbs is a close collaborator of ours, and all of that sequencing is being done at his center. So we're kind of on top of it, but not direct players in it.
ANDREW CLARK: Yeah.
HAROLD CRAIGHEAD: So with any luck, there'll be chances for questions for all of us at the end. But I wanted to follow on from what you heard in a slightly larger context of technology.
There's quite a growing interest in the so-called "lab on a chip." And that is taking the capabilities to analyze all sorts of chemical and biochemical compounds not in a full laboratory, but on something that resembles a little microelectronic circuit chip, something that's embedded in all of our little pieces of electronics.
And within that is the class of devices for obtaining genetic information, which is directly related to what we heard before. But the state of the art now is not nearly as sophisticated as I think it's going to be. Right now, there are some chips at some level, but they're really just a little piece at the end of the process.
A lot of the processing is done by conventional techniques. What I'm talking about is a full lab on a chip, which is not widely used yet. So we're talking about something farther in the future. And this is just sort of my schematic of the types of things that might be done in a miniaturized format: everything from the initial sample collection, to sorting through the stuff that you want to look at and the stuff you don't, to multiplying, by different processes, the parts that you do want to look at.
And so that all happens not in a matter of days, but in a matter of, hopefully, minutes in something that you can then utilize rapidly, efficiently not only for gathering information for research purposes, but for diagnostics. And so I think much of the community is following along the revolution we saw in microelectronics, where things were done with big individual things that took up a lot of space.
And over time, those decreased in size and grew in functionality, so that we now take for granted that things happen fast, at low power, and in a compact package. And so that flavor of technology evolution, I think, is now in a lot of research laboratories, with long-term motivations for medical diagnostics-- identifying different pathogens and diseases-- and also for research.
And so what you heard today was that there's a pressing need for lots of information about many organisms and individuals. You heard about things related to human health, but there are people who care about plants and want to look not necessarily at diseases, but at desirable properties of plants and how they might be found in nature. So there's a tremendous hunger for information from all walks of the life sciences. And if technology can contribute to that, we would hope it might follow a similar path.
This is one of the original transistors at Bell Labs, which was something this big with a couple of wires stuck in it. And in about 50 years, it became what's an integrated circuit chip that has millions or billions of transistors not even individually visible on this image in a very tiny object. And so that took about 50 years.
And so where are we in sort of that? This has now become part of the-- everyone understands some level of what Moore's law is. This is sort of that same 50 years in terms of numbers of devices that are found on these chips. And this is an exponential behavior. So every three years, there are sort of 10 times more devices.
And that was enabled by the ability to make not one thing at a time, but many at a time-- so-called "planar processing"-- so that stencil-like processes done over and over can make many, many highly reproducible objects. And I put "basic research" up there because exactly how this should be done wasn't figured out by accident, but by a lot of research, largely in industrial research labs.
Big industrial research labs, like IBM and RCA at the time, and Bell Labs, were doing a lot of the basic materials and device processing studies. And I put that up there because I think there's been a change: if we're talking about something similar happening in the life sciences, those large labs don't really exist anymore.
And so I think that universities have to take up a lot more of the long-term research burden in making that happen. The nature of what allowed us to make more and more devices was the ability to manipulate smaller and smaller objects. And when we talk about nanofabrication, nanotechnology, we're talking about something on the order of a nanometer, which is just about the size of a molecule.
So that's where the ideas that we're starting to talk about molecules and use technologies that access those sciences come together. And that's been the flavor of what a group of us here at Cornell and other institutions have come together with our different techniques from different backgrounds to address that.
And so somewhat an analogy to that original transistor, a lot of the original devices that are testing the basic ways of how we might analyze things in little volumes look like this. That's a quarter. And so there are little things that instead of having wires only, they have tubes and wires stuck into them. And that's where we do the initial analyses.
But those are very simple devices. And in something like 50 years, where will we be? And I don't think I can stand here today and tell you. But probably some of the students at Cornell will be the people who are actually driving that. And maybe we will have provided them some information to guide them on that path.
But what is happening now is the development of some of these techniques that will allow us to further miniaturize and interface with biological systems. And so these are just different devices. These are little surfaces that have been modified with little spikes and structures, little things to carry liquid or to carry light or to carry electricity, and to have different chemical structures.
And those can all be brought together. And that's a much more diverse material system than that's involved in electronics. And so this is a major task to try to sort out what the capabilities and possibilities are.
But these are pictures of individual cells that we can access. So if we're talking about a diagnostic device, we may want to select out of an individual some cells that are behaving strangely and identify what their nature is at that time. That may not all be genetic. That might be what proteins are available at that time.
So in the long run, we might be able to select just a few cells that we care about-- ones that are looking a little suspicious and might be cancer, or floating around in the blood where they're not supposed to be, or that we find in body fluids. So you select a cell that you want, and then you can do a detailed evaluation of what's going on.
This, for example, is just a single bacterium that was grabbed on this little cantilever by selective chemistry. So we can, at some level, do these things today. We can grab things that we care about. And then we have technologies which are not fully brought to bear together, but we can also then access individual molecules.
And a molecule of great interest is the DNA molecule that carries the genetic information we're talking about. So the rest of what I want to say is targeted to that. But it's such a broad field that I'm not going to talk about all the other aspects of the analytical diagnostic capabilities.
But for genes in particular-- and I just want to make one more analogy to electronics. In electronics, we have wires that carry electric current. In these analytical systems, we tend to have tubes that carry chemistry. But it turns out those technologies are directly related. And so the same processes that make a wire on an integrated circuit chip, and make many of them at once, can make and connect many little tubes.
So this is an optical microscope image of a bunch of little tubes with some fluorescent dye so we can see them. And they're made by these so-called "lithographic and planar processes." And in those, we can move things by electric field. These are little fluorescent spheres, just so we can see them.
But those could be cells. Those could also be molecules. But we can do many of them at a time. And we can have them go through a little turnstile here so that we can see them one at a time. And with the power of modern electronics and optics, we can easily go 20 times faster than that.
And so this suggests that just like electronics, we can make many of them. And we can make them go faster. So that was only to make that point.
So this is not doing any analysis right now, but those are the types of simple tubes and optics that are being used to analyze the molecule we just heard about, which is the chromosomes, which are very long, which are macroscopic-length molecules but in nanometer dimensions.
And so those are typically very long molecules which, in conventional analysis, are extracted from many cells, amplified so you have lots of material, and then chopped up into lots of little pieces, which are analyzed, and then you try to put the information back together-- which is sometimes lost in the shuffle.
So it would be desirable in the long term to do the minimum amount of processing. And I suggested even extracting material from one of these cells that you are interested in-- one bacterium that you want to know what it really is and what drugs will kill it, or one cell that looks suspiciously like it'll become cancerous, that kind of thing. But we want to keep most of the information we would like to have, the correlations.
You saw some pictures of information that you want to know if this piece of genetic information exists. But you want to know not only that it exists, but where it is in proximity to something else. And so that relationship of things on this molecule is a useful piece of information that today is only derived very indirectly.
But it's in there, if we could just get to it. So some of the techniques we're talking about would allow us to do that. So this is one, a several thousand base-long DNA molecule, not a whole chromosome, but a fairly big chunk of one. And so what we're doing now is taking that individual molecule as if it was something we could stretch out like in that picture and look at.
So by applying electric fields-- what we've done now is, using a simple barrier here, use electric fields to pull on this molecule. So here's a single piece of DNA that we're sort of pulling on and stretching out in real time. So the ability to actually get a handle on individual molecules, using not any direct coupling but using these electric fields and nanostructures, is something we're studying right now.
And so the hope is we can actually orient this molecule like in the cartoon of a chromosome and then apply probes of the same kind you heard about and just read them off. So we would have the whole intact information about where this information is.
And that would be a much more direct, much more powerful approach of getting much more information about this molecule. And then if we could do that cheaply and rapidly so this was all hands-off and you just kind of throw something on there and it happens, that would be the long term, which is not where we are now.
But this is sort of the lab in the lab today, not the lab on a chip. So we have a bunch of lasers and things pointed at things. So we're actually deriving information from these little devices. The hope would be that the technology-- well, the technologies already exist to miniaturize this. But today, we're looking at what the basic properties are that we can use.
And this I can probably skip briefly. But these are not one, but many hundreds and thousands of molecules zipping by this little analytical station. So as this is happening, those little flashes of light are being analyzed. And each one of those tells us how big that molecule was.
And that piece of information of how big that molecule is is basically what's driving a lot of current genetic analysis, which is done by chopping things up and measuring how long they are. And so we can do that quite rapidly with small numbers of things.
Each little flash, if it's small, means it's a small molecule; if it's big, it means it's a big molecule. So it's a very direct way of getting something from individual molecules, which is not obtainable, not used, in the current techniques.
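The relationship Craighead describes-- burst brightness tracks fragment size-- can be sketched numerically. The following is a hypothetical illustration, not the lab's actual analysis code: it assumes detected photons scale roughly linearly with fragment length, and the calibration constant (photons per kilobase) is invented for the example.

```python
# Hypothetical sketch: sizing DNA fragments from fluorescence burst intensity.
# Assumes, as described in the talk, that dye stains DNA uniformly, so the
# photons detected in a burst scale with fragment length.
# PHOTONS_PER_KILOBASE is an invented calibration, not a measured value.

PHOTONS_PER_KILOBASE = 150.0  # assumed calibration: photons per kb of DNA

def estimate_length_kb(burst_photons):
    """Convert a burst's photon count into an estimated fragment length (kb)."""
    return burst_photons / PHOTONS_PER_KILOBASE

def size_histogram(bursts, bin_kb=1.0):
    """Bin a stream of burst photon counts into a fragment-size histogram."""
    hist = {}
    for photons in bursts:
        length = estimate_length_kb(photons)
        bin_index = int(length // bin_kb)
        hist[bin_index] = hist.get(bin_index, 0) + 1
    return hist

# Synthetic photon counts: three bursts near 2 kb, three near 10 kb.
bursts = [300, 315, 330, 1500, 1530, 1560]
print(size_histogram(bursts))  # -> {2: 3, 10: 3}
```

The point of the sketch is that each molecule is measured individually as it passes the detector, rather than inferred from an ensemble as in gel-based sizing.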
So we heard about individualized medicine, a little bit about drug development. I wanted to mention just about the basic issues of finding genes that might be useful. We talk mostly about identifying disease, and the knowledge that one gets from lots of individual genomes is valuable.
But prospecting for genes-- Craig Venter goes out on his sailing craft and scoops up parts of the Sargasso Sea and takes all the DNA and sort of sifts through it to see if there's something useful. And that's sort of prospecting for interesting genes. But it would be more valuable if you could take not that whole soup and sort of find something, but select out an organism and then do that sort of analysis on that one organism.
So that's the sort of thing that we're pointing to and also pathogen identification, that you want to know quite rapidly. If I'm exposed to something, I'd like to know very rapidly what it is and what antibiotics might deal with it or what we should do.
So the idea of obtaining full DNA information-- that is, the total sequence-- if you had the whole molecule and you had every base pair's position, you would have all the information there is. And we talked about the Human Genome Project, which was, of course, a major enterprise-- billions of dollars to do that, really, for one individual.
So when they say the "human genome," that doesn't mean that we know each of your genomes in this room. We know sort of on average what's happening in a human. We know where stuff is. But what's relevant for what drugs I should take would be my individual genome. That information is not practically obtainable.
But there's a drive-- the NIH, for example, has stated a goal of a $1,000 genome. So they want that any individual's genome could be obtained in a reasonable time for something under $1,000. So that's taken as a goal. And that's a government-funded project.
But a non-governmental motivation is the XPRIZE has just been announced for the first group that can sequence 100 individuals' genomes in 10 days. So it's been understood that this goal of rapidly obtaining sequence information is a target that people are going after.
And so what I've talked about before were sort of general techniques. But the idea of getting at single molecules that contain this sequence information is something that I think the word "nanobiotechnology" really focuses on.
So this is a model of the famous DNA, a ball-and-stick model. But the bases that carry the information are separated by a distance of about 3/10 of a nanometer, which is 3/10 of a billionth of a meter-- 3/10 of a billionth of this. So it's a dimension which is not easily accessed by existing technologies.
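That 3/10-of-a-nanometer spacing makes for a striking back-of-the-envelope calculation: at about 0.34 nm per base pair, the roughly 3 billion base pairs of a human genome, stretched end to end, span about a meter-- all packed into a cell nucleus a few microns across. A quick check of the arithmetic:

```python
# Back-of-the-envelope: total stretched length of one human genome copy.
BASE_PAIR_SPACING_NM = 0.34      # spacing between bases, ~3/10 nm (from the talk)
GENOME_SIZE_BP = 3_000_000_000   # ~3 billion base pairs

length_m = GENOME_SIZE_BP * BASE_PAIR_SPACING_NM * 1e-9  # nm -> m
print(f"{length_m:.2f} m")  # -> about 1.02 m of DNA per genome copy
```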
So it really is a question for nanotechnology. That's the question. Here's some information which is separated by nanometers. Go get it. So that's a target for nanotechnology, which a lot of people are doing.
But of course, nature solved that some billions of years ago in the basic development of life. Every time a cell divides, the entire genome is read and copied. And that's done routinely without direct human intervention.
So that sequence information is done by this little enzyme, which reads the sequence and makes a copy. That's the polymerase enzyme that lives in all replicating cells and is necessary for the propagation of the genetic information to the next generation.
And so this is a different model of the DNA. And what happens is this enzyme reads the sequence and fills in the complementary bases by reading that sequence and making the other strand. And so we don't often think of that as reading the sequence because we don't really know what's happening at a detailed level. But if we did, then we'd have the sequence, just like that enzyme did.
And so one approach that a number of different groups are looking at is trying to watch that enzyme as it functions. And so this is even a simpler-- I keep getting simpler and simpler models until it's something I can sort of think about. And this is a simpler and simpler model of the enzyme and the DNA, showing the different shapes that reflect the different bases.
So those bases then selectively bind to their complement, and that's how these things replicate. And so if this represents a C, it will only be bound at one of these complementary places. And so if we had the ability to harness this enzyme and then watch what it was doing-- so let's see. Every time it incorporates a C, it sends out a little red beacon and says, now, I've attached a C-- then that would give us the answer.
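The base-pairing rule described here-- C binds only opposite G, and A only opposite T-- is simple enough to write down directly. A minimal sketch of how a polymerase "reads" a template by complementarity:

```python
# Watson-Crick complementarity: each base binds only its partner.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complementary_strand(template):
    """Return the strand a polymerase would synthesize against this template.

    (Real synthesis runs antiparallel; the reversal is omitted here to keep
    the base-pairing rule itself front and center.)
    """
    return "".join(PAIR[base] for base in template)

print(complementary_strand("GATTACA"))  # -> CTAATGT
```

Sequencing-by-synthesis approaches like the one described next exploit exactly this determinism: if you can observe which base the enzyme incorporates at each step, you have read the template.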
And so this actually suggests one of the processes for monitoring that enzyme sequencing activity, which has now disappeared into the-- let's see. There it is. I hit the wrong button. There.
So the problem is, how do you read that beacon? And that beacon, if it's like a flashing light, is not alone. It's surrounded by many others. And so a technique that started here at Cornell was to use just a little narrow window so we could isolate what we look at only in proximity to this little enzyme.
So by using just a little mechanical optical window, we can shine light only near the enzyme and watch these little beacons flash only when they're right at the enzyme and ignore all these other ones that are not nearby. So that very simple nanostructure, which is trivial in comparison to what nature has given us, is what brings together the engineered nanotechnologies and the naturally occurring ones to try to give us that access to that sequence information.
So the human-engineered devices are these simple little holes that can be made in arrays. And then each one of those would then hopefully contain a little active enzyme that we could witness its function and use that to read the individual sequence.
And if we had that, if we had the complete sequence from individual entities, then we would have all the possible information that we would really want to put together these complex understandings of what influences disease or how different organisms are related to each other in ecological or evolutionary senses.
And I will skip to the-- I started out talking about labs on a chip. And so it's a very broad class. And the analysis of genetic information is part of that. But this is not only in research labs that this is being discussed.
This is from the National Institutes of Health website from a few years ago-- "Chips Ahoy. The lab-on-a-chip revolution is near." So the question is how near. Some simple ones are fairly, fairly near.
But I think the richness that we're going to see is going to take a long time. And so that's going to take some sustained effort, some sustained work at universities, working with industry.
But as I said, I think universities are going to have to shoulder more of the effort, now that the large research labs for those sustained efforts are not so in evidence. So with any luck, the groups here at Cornell and in other comparable institutions can pool their efforts and contribute to this coming revolution.
So all of the work that I showed was done in collaboration. The parts derived from the local work were done in collaboration with faculty here, mostly at the Nanobiotechnology Center, and with hardworking students and postdocs, with mostly national funding for this research.
So with that, I'll turn it back to our moderator.
STEPHEN KRESOVICH: Good. These guys are very prompt. They're right on time. So basically, we have about 10 minutes for a question-and-answer session.
So I see somebody already jumping to an opportunity.
AUDIENCE: Harold, if industrial labs were the ones being hit-- and I think the electronics labs were the ones that were pulling back-- is the pharmaceutical industry, in terms of its acquisitions of our bio enterprises, going to be the place where industrial research will maybe plumb the depths of this?
HAROLD CRAIGHEAD: My opinion-- and this is editorial opinion-- is that they're not in a position to do this long-term technology development. Their goal is to get compounds that are useful in as near a term as possible. And there's talk about high-throughput drug screening using advancing techniques.
And so they will push some market for advances in high-throughput handling. But I don't see them-- and there are probably representatives of the drug companies in here who can address that. So we can debate this. But I don't see them providing the long-term resources, the basic issues of what are the devices that we can make.
How can we use these? I think going from 96 wells to 384 to some higher number of multiples of ways we're doing it now and faster robots, I think that's clear. And some level of effort will be driven by these places.
But I don't see any entity that has the long-term ability to drive the technology. And I'm willing to hear other opinions.
AUDIENCE: Well, I sell chemotherapy for breast cancer. And I sell nanoparticle Taxol packaged in human albumin. And it's, to my knowledge, the first nanotechnology chemotherapy on the market. I work for AstraZeneca. We do a co-production with Abraxis.
And a surgeon in Los Angeles figured out the technology, how to take this old chemo molecule and package it in human albumin so we can then deliver it to cancer tissues in a less toxic fashion. But to my knowledge-- and I try to keep an eye on these things-- there aren't new products coming soon that take this technology into account.
The other thing I wanted to sort of comment and ask about was the Oncotype DX testing. Are you familiar with that? Dr. Soonmyung Paik came out with a test. He did a retrospective analysis of breast cancer patients who were node-negative, hormone receptor-positive, thousands of tissue samples that he gathered from clinical trials, and does a genetic array.
And looking at outcomes, he says he can predict whether patients will respond or whether they need chemotherapy upfront or not. And so now, this test is on the market. And when women have early-stage breast cancer and they're not sure whether they're at high risk for recurrence later on, they run this test-- it's about a $3,000 test-- and do this genetic analysis of their tumor tissue. And they break them into three groups based on this retrospective analysis.
Now, it bothered me. They're just now doing the prospective trials in these patients. But having worked in [? Volker Folk's ?] Lab when I was an undergraduate here, soon after graduation-- these studies kind of make me a little nervous because there are so many variables. I'm not sure how they can--
ANDREW CLARK: Well, you did put your finger on what has been the key progress in cancer research recently, which is to recognize that cancer isn't one entity. There are many, many entities and that if you could just stratify them into subclasses that are different in mechanism somehow, you could direct therapies much, much better.
And there's very rapid progress in that area. That's sort of, to me, after the fact-- by that point, the person's already got cancer. You'd like to push it up front and be able to predict and prevent, rather than to just deal with it. But for improving therapies, it's really proving useful already.
AUDIENCE: I've got a question-- actually, just a comment on another question. I was in academic medical research for 10 years. Now, I'm in the pharmaceutical industry, and I agree with your comment.
Between Medicare's pressure on the cost of drugs, Wall Street and the investment community's pressure on getting products out, pharmaceutical companies are not going to be able to invest. And that's been a long-term strategy. And I think you're right that it's really universities and collaborations that will develop that.
But my question is-- I'm not sure if you're familiar with early drug development, which is what I do. About a year and a half ago, there was a trial of a new antibody at Northwick Park in London. And there were six healthy, normal volunteers given these antibodies, all taken to the emergency room and into intensive care for weeks.
When they analyzed the toxicology in monkeys, it was perfectly fine. They started the dosing at 1/1,000th of that dose. And the genome of the human obviously is different from the monkey's. And so the humans reacted very differently than the monkeys did.
I guess my question is, do you see applications looking at comparative genomes of the experimental animals where new drugs are tested and then it goes first into man where they could make those kinds of correlations with the drugs?
ANDREW CLARK: Yeah. So there have been some efforts to essentially put a human immune system in non-human model organisms-- like mouse, for instance. But to me, the most exciting thing here is to sort of understand it at the level of being able to predict all the epitopes that the human immune system recognizes, all the epitopes of all human proteins. And we're a ways from that still, but people are starting to think about how to model that.
But you put your finger on a really important point for any of these things. The sort of law of unintended consequences bites us all the time in drug development. And to sort of have the ego to think that we really do understand everything here is always a mistake, that there can always be unintended problems and that the only solution is to tread very cautiously.
And that's why these trials are done. They're done at a small scale, and they try to minimize risk. But it happens.
STEPHEN KRESOVICH: Another question.
AUDIENCE: Yeah. Professor Clark, it sounds like a lot of your research focuses on the statistical analysis and the data that's providing you. And it sounds like Professor Craighead is working at getting at that data.
So if you have an Affymetrix chip that looks at a million base pairs, is that-- that's correct, right?
ANDREW CLARK: It's a million positions that we know are variable among individuals.
AUDIENCE: But the human genome is what, two billion base pairs?
ANDREW CLARK: Three billion.
AUDIENCE: Yeah. So if you were provided that data, that $1,000 genome data, what does that do to your field of study? Obviously, you have a job for a long time.
ANDREW CLARK: Yeah. So it wouldn't instantly give all the answers, first of all. There really are challenges to understanding what all that data would mean at the level of being able to predict anything.
To backtrack a little bit, though, there are only about 10 million of those three billion bases that have significant variability among individuals. So most of the research effort has been to go after those first-- recognizing that, yes, those are the more common variants.
Probably, each individual is going to have their own separate minor mutations, as well. So we're going to be able to make a lot of progress in the next 10 years just looking at those polymorphic sites, without necessarily getting whole sequence.
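The proportions in Clark's answer are worth making concrete: roughly 10 million commonly variable sites out of about 3 billion bases is only about a third of one percent of the genome, which is why genotyping those sites is so much cheaper than full sequencing. In numbers:

```python
# Rough arithmetic from the talk: common variable sites vs. genome size.
common_variable_sites = 10_000_000   # ~10 million polymorphic positions
genome_size_bp = 3_000_000_000       # ~3 billion base pairs

fraction = common_variable_sites / genome_size_bp
print(f"{fraction:.2%}")  # -> 0.33%
```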
STEPHEN KRESOVICH: Also, I would like to add, as a plant person or as an organismal biologist, the things that Harold and Andy do are applicable across a broad range of organisms and a broad range of questions. So whether you're doing plant breeding for drought tolerance or looking for diseases in humans, these are fundamental technologies and concepts that have great applications across the life sciences.
OK. I think we have time for one last question. Yeah.
AUDIENCE: I wonder how much of the research in your fields, how much of it is in the US compared with other countries? And how much cross-border collaboration is going on in these areas?
ANDREW CLARK: So the Human Genome Project was first funded primarily by the US. So there really was a sort of an effort that was, I would say, perhaps 2/3 or so US-centered for the whole human genome. They tried to pull in the UK, and they contributed. The Chinese contributed.
With the HapMap Project-- that one where they're looking at variation across individuals-- again, it was primarily the US, the UK, and the Chinese. It's still, I would guess, at least half in the US.
We do have a markedly-- and the reason is because of the National Institutes of Health. It is the best life sciences research institution in the world. It's just a gem of the US government. Our NIH is terrific.
STEPHEN KRESOVICH: Harold?
HAROLD CRAIGHEAD: If I answer for all of nanotechnology and not necessarily the life science component, there are surprisingly comparable amounts of funding in the US, Europe, and Asia. The US is maybe slightly ahead, but not by that much. So those three regions have comparable amounts of investment.
There's much more discussion of international collaboration these days. The National Science Foundation, for example, will provide some small amounts of funds for people to travel around. But I find a very strange dichotomy between collaboration and competition. So you have sort of the same people saying, for national competitiveness, we have to do that.
And then another side as a group says, we have to collaborate. So I think the scientific community is quite used to collaborating and moving forward. But I think there's a strange discussion of simultaneous collaboration and competition.
STEPHEN KRESOVICH: OK. I think we're coming to an end. And you can catch Harold or Andy subsequently.
I'd like to thank you all for coming and one more time to thank Harold and Andy.
From nano-scale diagnostic and therapeutic tools to medication designed and developed specifically for you, new research directions at Cornell are breaking ground and shaping the health care experience of the future. Two of Cornell's preeminent faculty members with research at the forefront of these areas will explain how their work may impact human health.
Andrew G. Clark, Jacob Gould Schurman Professor of Population Genetics, Department of Molecular Biology and Genetics

Harold G. Craighead, Charles W. Lake, Jr. Professor of Engineering; Professor of Applied and Engineering Physics; and Director, Nanobiotechnology Center