Remarks at Western Digital Conference
I am so pleased to be here at Western Digital, one of the key players in the global data ecosystem, at a moment when so many interesting things are happening in that ecosystem, and because of that ecosystem: some bad, some spectacularly good.
Here are just a few examples:
- Hackers connected to the Russian government almost certainly interfered in our recent Presidential election;
- The recent WannaCry ransomware attack exploited a vulnerability in an older version of Microsoft Windows, using a tool developed by the National Security Agency and stolen by hackers, to freeze computer systems all over the world—affecting the British National Health Service, as well as banks, cellphone operators, and railroads in Russia;
- Yahoo, with its core business about to be acquired by Verizon, was forced to lower its price by $350 million after two data breaches were revealed—a steep cost for a failure to make security a priority;
- Harvard University just rescinded acceptances to 10 incoming freshmen for highly offensive posts in a Facebook group chat that was intended to be private;
- Rensselaer Professor Juergen Hahn has developed the very first physiological, rather than behavioral, test for autism by applying Big Data analytics to numerous metabolites in a blood sample—and creating an algorithm to determine whether someone is on the autism spectrum—and indeed, to suggest where on the spectrum they might land. This breakthrough will not only allow earlier interventions for children—it may point the way to potential treatments.
- Cancer patients who never before could have been saved, are being saved by our expanding capacity to collect and analyze genomic data, with targeted therapies based on the sequencing of their tumor genomes. This data from tumor genomics is teaching doctors that where a cancer occurs in the body is far less important than which particular genetic mutations the tumor has, and which genes are being expressed. For example, a drug that targets a genetic mutation found in adult patients with melanoma now is generating dramatic improvements in children with brain tumors with the same mutation.
We humans are generating enormous amounts of data—according to one estimate, 16 zettabytes in 2016, which is expected to increase ten-fold by 2025.
But data is as data does. Its usefulness is linked to its connectivity, and to ours. And together, as my examples suggest, data and connectivity create both opportunity and vulnerability.
Today, I will offer you my perspective on both, as the President of the nation’s oldest technological research university, but also as someone whose government service, policymaking roles, and business experience have required consideration of the possibilities, implications, and unintended consequences of the digital age in which we find ourselves.
Most recently, I served as co-Chair of the President's Intelligence Advisory Board from 2014 to 2017, which assessed issues pertaining to the quality, quantity, and adequacy of the intelligence activities of the U.S. Government. I also served as a member of the President's Council of Advisors on Science and Technology from 2009 to 2014, of the U.S. Secretary of State's International Security Advisory Board, and of the U.S. Secretary of Energy Advisory Board, where I co-Chaired the Task Force on Next Generation High-Performance Computing.
Today, at Rensselaer, we are deeply engaged in research that explores the promise and the perils of our digital future. In envisioning this future, I have drawn upon the research of our remarkable faculty and students, and the capabilities provided by our robust computational ecosystem, which includes AMOS, our Advanced Multi-Processing Optimized System, a petascale supercomputer that is the most powerful at an American private university, and the IBM Watson cognitive computing system.
Today, digitally-linked research at Rensselaer is a continuation of achievements of our faculty and graduates over many decades. It may surprise you to learn that Rensselaer alumni…
- co-founded Texas Instruments, where the first integrated circuit was invented—one based on germanium;
- co-founded Fairchild Semiconductor, where the first integrated circuit based on silicon was invented;
- invented, at Intel, the microprocessor;
- invented the Internet protocol that allowed email to be sent from one computer to another—a protocol that gave us the @ symbol as well;
- invented the digital camera—as well as technologies to store and transmit digital images;
- pioneered graphical computing and co-founded NVIDIA, whose GPUs have important applications in self-driving vehicles—and are used in deep learning, artificial intelligence, and high-performance computing;
- and created the cognitive computing system, Watson, at IBM.
We are very proud, as well, of our alumnus, Dr. Sivaram, and his colleagues at Western Digital, who are pushing forward new ways to capture and store data for future use, and innovations that allow us to access that data at near RAM-speed.
As we consider what our data-driven, digital future holds, we also must consider what a modern technological research university education must encompass. Two factors are paramount.
First are contemporary challenges such as a changing climate; our food, water, and energy supplies; national and global security; human health and the mitigation of disease; our need for sustainable infrastructure; and the allocation of valuable natural resources.
Each of these challenges impacts the others, and indeed, human civilization. Climate change exacerbates issues surrounding our food, water, and energy security. It also is likely to increase the spread of vector-borne diseases, such as malaria, as our planet warms. It influences our national and global security, and the interactions among nations. For example, vast reserves of petroleum, natural gas, and mineral wealth in the Arctic, made accessible by melting sea ice, are likely to be a source of new geopolitical tensions.
Climate change also is likely to worsen the inequalities between rich nations and poor ones, as it undermines food and water security at the lower latitudes, which may fuel terrorism.
When such natural and social challenges are linked with the interconnectedness enabled by technology—and the dependence of our automated systems on the perfect operation of a network of devices—it is clear that when there is a triggering event, intersecting vulnerabilities can, and do, result in cascading consequences.
Just two weeks ago, British Airways was forced to cancel all flights from Heathrow and Gatwick, stranding 75,000 passengers, after a power surge at a single data center was followed by the failure of its backup systems. This is not the first time something like this has happened: Last summer, Delta Air Lines was forced to cancel more than 2,000 flights over a two-day period when faulty power supply equipment caused a fire at a data center. Shortly before the Delta event, Southwest Airlines also cancelled thousands of flights over several days—thanks to a faulty router. When one considers how many systems in air travel are automated—everything from the scheduling of flights and crews, to reservations, to check-in, to the routing of bags, to the routing and tracking of flights, to monitoring engine performance—and how essential air transport is to many other systems, the possibility for chaos seems large indeed; and the economic and safety considerations are legion.
Another instructive example is offered by the Great Sendai Earthquake of 2011 in Japan and the disaster at the Fukushima Daiichi Nuclear Power Plant. When the earthquake knocked out the electricity at the power plant, the reactors successfully shut down, and the backup generators kicked in to deal with the “decay heat” from the fuel rods. However, when the backup generators were flooded by the subsequent tsunami, cooling systems failed, and hydrogen explosions occurred in overheated spent fuel pools, allowing radioactive material to escape.
As a result, not only was there the loss of life caused by the natural disaster—as well as the loss of electrical, transportation, and housing infrastructure—and worldwide economic effects—there was also environmental contamination, and the long-term risks of cancer from radiation exposures.
Arguably, the natural disaster itself could not have been avoided, but one weakness exposed at Fukushima was that a tsunami from a much earlier historical period was not taken into account, and therefore not incorporated into the modeling used for plant design—which allowed a retaining wall to be constructed at under 6 meters, when the tsunami that caused the damage reached 14 meters.
Again, our intersecting systems and societies leave us vulnerable to domino effects—and as you know, with the Internet of Things, such connections and vulnerabilities are proliferating beyond previous imagining.
At the same time, as we move into the Fourth Industrial Revolution, with its merging of the digital, physical, and biological—the advent of powerful new tools of discovery and understanding offer us new hope for addressing complex problems and improving lives on a grand scale.
Rensselaer is taking full advantage of these opportunities—including a new ability to collect and analyze masses of data to address questions such as, how do we preserve the world’s fresh water resources?
Given a world population that will expand by over a billion people in just the next dozen years, increasing droughts and other forms of extreme weather due to climate change, the fact that many subsistence farmers around the world remain dependent on rainfed agriculture, and crises of water contamination in the United States and worldwide—this is a crucial question indeed.
But now, we have the resources and tools to address such a question in a more data-informed, science-based, way. Indeed, at Rensselaer, we are using Lake George, a beautiful lake at the foot of the Adirondack Mountains with famously clear water, as a testbed for a new paradigm for fresh water conservation.
A partnership of Rensselaer, IBM, and The Fund for Lake George—The Jefferson Project —has turned Lake George into the smartest lake in the world, using a network of 41 sensor platforms—including weather stations, vertical profilers, and tributary monitors, some of which were invented for this very purpose. These smart sensors can communicate with one another, and autonomously adapt to changing environmental conditions—for example, taking additional measurements in a storm. They supply the Jefferson Project’s 60-plus scientists with 9 terabytes of streaming data per year about the physical, chemical, and biological properties of the lake.
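For those who enjoy seeing ideas in code, the adaptive behavior just described can be rendered as a deliberately simplified sketch. The thresholds, rates, and function name below are invented for illustration; the actual Jefferson Project platforms use far more sophisticated logic.

```python
# A toy sketch of adaptive sampling: a sensor platform shortens its
# measurement interval when storm conditions are detected. All
# thresholds and rates here are invented for illustration.

def sampling_interval_seconds(rain_mm_per_hour: float, wind_kph: float) -> int:
    """Choose how often to sample, based on current conditions."""
    storm = rain_mm_per_hour > 10 or wind_kph > 50
    # Sample every minute during a storm, every ten minutes otherwise.
    return 60 if storm else 600

print(sampling_interval_seconds(0.0, 12.0))   # calm weather -> 600
print(sampling_interval_seconds(25.0, 12.0))  # heavy rain   -> 60
```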
Advanced computer modeling (bolstered by water – and land-based experiments) allows us to use this data to consider the lake not merely through a single lens—but as a system of systems, encompassing weather, water circulation, the food web, and runoff, and all that the runoff introduces into the lake. The sensor data allows our models to be continually refined over time—and simulations allow us to predict the impacts on all systems, for example, of chemicals used to make human traffic around the lake safer in the winter.
The Jefferson Project is so named because Thomas Jefferson declared in 1791, “Lake George is, without comparison, the most beautiful water I ever saw.”
Please allow me to tell you about one other great research project taking place at Rensselaer, that is allowing us to turn data into intelligence. At our new Cognitive and Immersive Systems Laboratory, or CISL at EMPAC—another partnership with IBM—we are creating an astonishing platform for collaboration. The CISL brings together data analytics and Rensselaer research in high-performance, neuromorphic, and cognitive computing; in artificial intelligence and human cognition—with research in computer vision, acoustics, haptics, and immersive technologies of all kinds, at human scale.
The goal is to create Situations Rooms that bridge human perception with intelligent systems in an immersive, interactive setting—enabling environments such as a cognitive design studio, a cognitive boardroom, a cognitive medical diagnosis room, a cognitive analyst room, or a cognitive classroom. These are rooms that see, hear, and understand their occupants. They will be able to take in and use structured and unstructured data from embedded databases and the World Wide Web. They also will be able to take in not just unstructured natural language data, but perceptual data such as gestures and tones—in order to follow a conversation, with all of its ellipses, ambiguities, trust considerations, and status signals. These smart rooms then can anticipate the need for information, and inform their occupants in multiple modes—with the ultimate goal of vastly enhancing group decision-making and learning.
As important as the work of the CISL is, another kind of group decision-making and collaboration occurs in social cognitive networks that reside on the Internet. At Rensselaer Polytechnic Institute, the Army Research Laboratory Social Cognitive Network Academic Research Center (SCNARC), led by Professor Boleslaw Szymanski, has been created and funded as a part of the US Army Network Science Collaborative Technology Alliance, together with three other centers focusing on different kinds of networks.
The rapid growth of web-based social networks has redefined social interactions. Web-based networks do not require personal, direct contact, but they do provide rich traces of data about activities on the network. These kinds of social networks, and the behaviors that govern their dynamics and evolution, are the subject of this research. The current work of the Center is organized into projects focusing on the fundamental science and engineering of such networks, and applications ranging from military to industrial to personal. One project, led by Dr. Ching-Yung Lin of IBM, investigates social networks in formal organizations.
Another project, led by Professor Malik Magdon-Ismail of Rensselaer, studies hidden communities, adverse to prevailing ideologies, that often arise within large interacting groups on the Internet. For example, some terrorist cells develop through electronic communications. The basic questions are “How can we use these massive streams of data to detect adversarial networks?” and “How do adversarial networks evolve?” The project also will further our understanding of how information flows within such networks.
A project on the cognitive aspects of social networks is led by Professor Wayne Gray of Rensselaer. The main focus of this research is to understand how the limits of human cognition influence our interactions over networks, how they dictate the way in which network information should be presented, and how to include such limitations in realistic models of network interactions. The project also investigates how a social and cognitive network can quickly extract the most meaningful information for a soldier or other decision-maker—across operations ranging from humanitarian relief, to force protection, to full combat.
Because we are concerned with data literacy, we have instituted a program at Rensselaer entitled DATUM, or Data Analytics Through Undergraduate Mathematics, which incorporates data analytics into our mathematics courses, beginning in the freshman year. Some of our DATUM students—working with a visualization technology developed at CISL called Campfire—used a semantic data tool for analyzing brain development in embryos, and saw an unusual window of susceptibility to disease in brain development. This suggested that, when the Zika virus is present in this window, microcephaly can occur. Indeed, laboratory experiments using neural stem cells infected with Zika have supported this hypothesis. Once again, we see the power of data analytics tools.
As enchanted as we are at Rensselaer with the data-driven tools we, and others, are helping to shape, and the opportunity they offer for us to collaborate and to understand vastly more about our world, this era is not without challenges. Let us consider some of those challenges, since Western Digital clearly will be instrumental in helping us to address them.
The first great challenge, as data becomes an ever more valuable asset, is capacity. As you know, we store just the tiniest fraction of the data we produce today—and both the quantity of data and its potential value are soaring. The new technologies this audience is developing will be crucial to answer this demand for storage—as well as our need to find, to access, to analyze, and to share that data quickly.
Clearly, we also need smart technologies to whittle the data we are producing, particularly in dense formats such as video, down to size. But we also need technologies that keep us from being too frugal.
For example, the detectors at the Large Hadron Collider, the world's most powerful particle accelerator, record 600 million collision events per second, each yielding about one million bytes of raw data—or 600 terabytes of data per second in total.
CERN, the European Organization for Nuclear Research, cannot store all of this data, so it runs algorithms to focus only on 100 or 200 events per second. Losing the rest may be a practical reality, but it is nonetheless a shame. In a world in which we know next to nothing about the 95% of our universe represented by dark matter and dark energy, can we tolerate such dark data?
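The arithmetic behind those figures is simple enough to sketch directly, using only the numbers just quoted:

```python
# Unit-conversion sketch of the LHC data rates quoted above.
# The event rate and per-event size are the figures from the speech;
# everything else is arithmetic.

events_per_second = 600_000_000   # collision events recorded per second
bytes_per_event = 1_000_000       # roughly one million bytes per event

raw_rate_bytes = events_per_second * bytes_per_event
raw_rate_terabytes = raw_rate_bytes / 1e12
print(f"Raw data rate: {raw_rate_terabytes:.0f} TB/s")   # 600 TB/s

# Trigger algorithms keep only on the order of 100-200 events per second.
kept_events = 200
kept_fraction = kept_events / events_per_second
print(f"Fraction of events kept: {kept_fraction:.2e}")
```

The last line makes the point vividly: only a few parts in ten million of what the detectors see is ever stored.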
In the past, scientists used to think of a journal article announcing their findings as the product of record, and the data and data tools that led to a discovery as relatively meaningless. Today, many scientific endeavors, such as the Human Microbiome Project, seek to produce reference data, and there is increasing recognition that the data produced by any scientific investigation is a treasure trove, not to be carelessly discarded—in that it may propel new discoveries by other researchers in years to come.
Professor Jim Hendler, Director of The Rensselaer Institute for Data Exploration and Applications, or The Rensselaer IDEA, is one of the inventors of the Semantic Web, which structures and tags the data in webpages, so that it can be identified by, accessed by, and aggregated by computers. He and his colleagues are making it increasingly easy for researchers to access each other’s data, and to connect data from different sources—revealing new correlations and new answers to questions that never could be answered before.
The second great challenge is data-centric computing. As we seek insight within the flood of data derived from connected devices, social media, digital photography, scientific and medical journals, and many other sources, we must absorb enormous amounts of diverse data quickly, recognize patterns within that data, and make inferences and predictions from those patterns. This is driving new data-centric computing designs and the use of artificial intelligence, and new approaches to putting the computing where the data is.
Professor Chris Carothers and his students at Rensselaer are using the supercomputer AMOS to model the next generation of exascale computing, including a hybrid design that combines traditional CPUs with GPU processors. The hybrid machine will combine data analytic and machine-learning techniques with sophisticated high-performance modeling and simulation methods. The same team is exploring the exciting field of neuromorphic computing, which mimics the functioning of the human brain with very low power requirements compared to traditional computers.
The third great challenge is, of course, privacy, as we leave digital clues about ourselves now, with nearly every step we take. Governments and societies always lag behind new technologies, but as we move forward in this world, we must consider the social implications of technological development.
In 2014, Facebook secured a patent connected with the idea that a borrower’s credit risk could be gauged from the creditworthiness of other people in his or her social network. I am assuming that most of us are quite proud of our friends—but would not necessarily welcome being turned down for a mortgage because of them.
And, of course, given an explosion in personal health-related data, including both genomics and the lifestyle data tracked by our smartphones and Fitbits, it is easy to imagine a world in which health or life insurers assess the risks we represent—not by employing population-based statistics—but using the most personal and intimate information about our lives. In such an environment, all of us need to be empowered to be stronger players in what happens to us in the healthcare space.
Through a new multi-year partnership with IBM—the Center for Health Empowerment by Analytics, Learning, and Semantics, or HEALS—we are focused on preventing the progression of chronic diseases such as diabetes and hypertension—which are so costly in terms of both human suffering and health care resources. HEALS is bringing together Big Data analytics, state-of-the art machine learning, and the technologies of the Semantic Web, to find insights within data from many different sources, including clinical data, lifestyle data provided by the patient, health or wellness data from mobile fitness tracking devices, and social network data from shared online activities. The goal is to enable individuals to improve their own health by offering them recommendations customized for their specific medical, environmental, and life situations.
The final great challenge of the age is data security.
Before this great age of interconnection, data security was largely about disaster recovery—the physical protection of data from a power outage or a flood.
Today, data no longer sits behind a moat. Instead, it lives within an ecosystem where there are bad actors ranging from hackers whose purpose is mischief, to cybercriminals, to cyberterrorists, to states engaging in cyberwarfare—all capable of exploiting innumerable points of entry.
The more data we store, the harder it is to protect, as the surface area of exposure grows ever larger. The threat is end-to-end, beginning with the chips themselves: as we have moved from a fab to a foundry model, the supply chain is no longer under one unified corporate control umbrella, leading to fear that “backdoors” can be added in manufacturing. To this point, a group of researchers at the University of Michigan last year demonstrated that a minute, undetectable backdoor could be added to a computer chip that would, over time, give a hacker access to the full operating system.
We have ample evidence of the dangers of network compromise, which is growing as the Internet of Things offers new avenues into networks through connected devices with weak security controls—such as DVRs and home baby monitors. In 2016, the Mirai malware, for example, infected cheap Internet-connected devices such as routers, DVRs, and surveillance cameras, and used them for distributed denial of service attacks in the US, Germany, and other nations.
While the Industrial Internet and sensor technologies are a great boon to advanced manufacturers, allowing them to monitor and to service their products remotely—opening remote connections to their systems and products around the world represents a great security concern for them.
There are also ransomware attacks such as WannaCry that simply stand between us and our data. These regularly target hospitals, which often pay the ransom because they cannot function without access to patient data, and because of the sensitivity of the information they hold.
And of course, we are arriving at an era in this Fourth Industrial Revolution, when data, and its use, are more strongly linked to our physical reality—and if someone mucks around with the data, physical destruction can result.
We already have seen the compromise of critical infrastructure through digital industrial control systems—including those that are accessible through virtual private networks. Ukraine's power grid was disrupted in December of 2015 by remote cyber intrusions at three regional electric power distribution companies, using such industrial control systems. There are current concerns about new, easier-to-use variants of the malware used for such intrusions and disruptions.
Airports now are moving to virtual control towers. Again, what if the data is manipulated, so that what air traffic controllers see no longer matches physical reality? The potential damage could make TSA checkpoints meaningless.
Finally, there are security risks even in the sheer ubiquity and accessibility of information in our world, which offer bad actors recipes for bombs, and the ability to radicalize people.
Clearly, our reliance on data—and our interconnectedness—encompasses many intersecting vulnerabilities with potentially cascading consequences.
Can we deter all such security risks? With the advent of new cognitive systems such as the IBM Watson, it is more likely that we will be able to detect and to mitigate them. But we need additional technological answers. As you innovate speedier access to stored data, what are the implications of emergent technologies such as blockchain, or of different approaches to data architecture?
It is likely that blockchain technologies, which underlie the cryptocurrency Bitcoin, will find new uses. These are, as you know, distributed databases that are not necessarily maintained by a single enterprise. Once data is entered, it is time-stamped and encrypted so that it cannot be altered or removed without leaving a record of the earlier version. Any modification requires confirmation and agreement, which allows the verification of changes, and provides an audit trail of transactions. Such blockchains could be strengthened even further using cognitive assistants that could further verify the trustworthiness of any changes.
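The tamper-evidence at the heart of that idea rests on hash chaining, which is simple enough to sketch. This is an illustration only, assuming nothing beyond a standard hashing library; real blockchains add distribution across many parties, consensus protocols, and digital signatures.

```python
# Minimal sketch of hash chaining: each block commits to the previous
# block's hash, so altering earlier data breaks every later link.
import hashlib
import json
import time

def _digest(block):
    # Hash only the committed fields, in a deterministic order.
    payload = {k: block[k] for k in ("timestamp", "data", "prev_hash")}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def make_block(data, prev_hash):
    block = {"timestamp": time.time(), "data": data, "prev_hash": prev_hash}
    block["hash"] = _digest(block)
    return block

def chain_is_valid(chain):
    for i, block in enumerate(chain):
        if block["hash"] != _digest(block):
            return False                      # block contents were altered
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False                      # the chain of links was broken
    return True

chain = [make_block("genesis", "0")]
chain.append(make_block("sensor reading A", chain[-1]["hash"]))
chain.append(make_block("sensor reading B", chain[-1]["hash"]))
print(chain_is_valid(chain))    # True

chain[1]["data"] = "tampered"   # any alteration is immediately detectable
print(chain_is_valid(chain))    # False
```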
Can we design our data architectures differently, so that our systems have an immune-like response to malware, to isolate and limit the damage?
This would be an end-to-end immune response that detects patterns of suspicious activity in any part of the computer system or network, and shuts down the compromised part—possibly overseen by cognitive systems such as Watson that are extremely good at recognizing patterns and departures from them.
Our data architectures should have not only such triggered isolation responses—but also should be fluidly reconfigurable. We need to engineer both for resilience and real-time response. Patching after the fact is not nearly enough in the digital age we live in.
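To make the triggered-isolation idea concrete, here is a toy sketch using a robust statistical baseline to flag and quarantine a misbehaving node. The metric, threshold, and node names are all invented for illustration; a production system would monitor many signals and be overseen by far more capable pattern-recognition machinery.

```python
# Toy "immune response": flag nodes whose activity departs sharply
# from the fleet's baseline, then isolate only the compromised part.
from statistics import median

def find_anomalies(traffic, threshold=3.5):
    """Return nodes whose traffic is far from the fleet median,
    scored with the robust median-absolute-deviation (MAD)."""
    values = list(traffic.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return set()
    return {node for node, v in traffic.items()
            if 0.6745 * abs(v - med) / mad > threshold}

def quarantine(active_nodes, anomalous):
    """Shut down (isolate) only the compromised part of the network."""
    return active_nodes - anomalous

# Packets per second per node; "dvr-3" is flooding, Mirai-style.
traffic = {"router-1": 120, "router-2": 120,
           "cam-1": 100, "cam-2": 115, "dvr-3": 90000}
bad = find_anomalies(traffic)
remaining = quarantine(set(traffic), bad)
print(bad)        # {'dvr-3'}
```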
In the end, we need a culture change in every part of the digital economy. Up to this point, security challenges have allowed for a kind of planned obsolescence, in which tech companies have benefitted economically, by persuading consumers to buy the newest and safest version of their products. And “safe harbor” provisions have protected online platforms from the actions of third parties, in terms of copyright. But the Fourth Industrial Revolution, in which physical harm can result from security lapses—in which even medical devices are Internet-connected—has raised the stakes. A new sense of responsibility is in order.
End-to-end security that allows for the immune-like response to intrusion that I described earlier probably will require consortia to set standards. Moreover, privacy concerns are linked to issues of data sovereignty, i.e., where should data reside, and who controls it—a major issue, which has been subject to new regulations in the European Union. Given such data sovereignty concerns, as well as issues of national and global security, we may also need an international agreement on the use of cyberspace—something analogous to the United Nations Convention on the Law of the Sea (UNCLOS). UNCLOS defines the rights and responsibilities of nations with respect to their use of the world's oceans, establishing guidelines for governments, businesses, environmental protection, and the management of marine natural resources. The Convention came into force in 1994, and as of June 2016, 167 countries and the European Union have acceded to it. Since the Internet supports a “sea” of data, the undergirding premises of this convention may be helpful in achieving global consensus around some key principles in the digital realm.
Clearly, we all must work together to secure this great natural resource called data—and to make sure our interconnections are used for good, not ill—not for crime, terrorism, or acts of war, but to uplift humanity.
I hope that the examples of medical, environmental, and technological breakthroughs that I have described point the way to the future. We should be cheered that the next generation is already on the job! One of the most popular clubs on our campus is called RPI-SEC—students with a passionate interest in cybersecurity. In November, at the world’s largest student-run security competition, the 13th annual New York University Cyber Security Awareness Week, the Rensselaer club won second place in Capture the Flag, the signature event of the competition.
I am confident that these students will help us to turn what most challenges us in the Fourth Industrial Revolution, into great opportunities, and change the world in the process.
Thank you for listening, and now I would be delighted to answer any questions…