GIS Day 2020: Mapping the Pandemic Cases, Traces & Mutations

>> Paulette Hasier: Hello, I am Dr. Paulette Hasier, chief of the Geography and Map Division at the Library of Congress. And I want to welcome you to GIS Day 2020, Mapping the Pandemic. The Geography and Map Division has been sponsoring GIS Day talks for more than a decade, many of which, like today’s program, have concerned themselves with important applications of GIS and cartography to the critical issues of the day. This year, we are presenting four speakers who are at the frontline of using GIS and cartography to help understand the COVID-19 pandemic. I want to thank them and all of you for participating here today.

>> Este Geraghty: Happy GIS Day, everyone. It is my honor to have this opportunity to celebrate this day and to build awareness about the various uses of GIS in organizations around the world. This year, GIS has been front and center in health in particular and the fight against the SARS-CoV-2 pandemic, which has commanded so much of our global attention. And perhaps you’ll find it a bit ironic that this all happened in the year 2020. I’m sure you’ve all seen the memes in which people are sharing their frustrations regarding this challenging year by likening 2020 to a four letter word. Well, today, to the extent possible, I’d like to put a positive spin on this year and think about how GIS has brought increased clarity to our lives in ways that not only helped us to respond to the pandemic but also have the potential to significantly enhance our future resilience.

If I were to put it very simply, I’d say that GIS helps us go beyond collecting the dots, which is something that health organizations are very good at doing. With GIS, we can use place and location analytics to connect the dots to create new insights, new information, and informed action. So, let’s test this thought for just a moment. What might the world look like without GIS?
Well, this here is one example, data that resides in a spreadsheet or a PDF document. Well, if we add graphics, it can really help make greater sense of the data, and it improves our vision and the ways that we connect the dots. But it simply doesn’t hold a candle to the way that geographic information puts data into context, showing how places compare, what the trends are in the short term and in the long term. If you just look at this last graphic for a moment and focus on California, you are seeing and interpreting data from 928,000 cases across 58 counties over more than 280 days. I mean, geographic data provides something special that helps us to understand patterns and relationships and use that information to intelligently respond. And people did just that. I’d like to show you a short video highlighting some of the innovation that we saw early on in this crisis. [ Music ]

So, how did it all start with COVID-19 and the geographic innovations that helped us to connect those dots? Well, to summarize, I’d say that it started with a bang, and it moved very fast. Starting here with the world famous, and in fact, I’d say iconic, Johns Hopkins University dashboard. This dashboard built with GIS technology is the most viral map-based application in history. And it really woke up the world to the evolving outbreak. Now, they collected the dots for sure, and they continue to do so to this day. And one thing that’s so important about this dashboard is that from the beginning the JHU team made sharing those dots their first priority. Now this enabled all kinds of organizations to engage and connect the dots. Through their visualizations, their analysis, and their planning scenarios, they really enhanced their decision making for their own localized COVID-19 responses.

The dashboard had tremendous uptake; it quickly became a backdrop in operation centers around the world, from the US Health and Human Services Secretary’s Operations Center to emergency response meetings that were held in Germany and Coronavirus briefings going on in Italy. It was a backdrop in Ireland, among so many others. The influence of this dashboard broadened when respected journalists started writing about it. And the coverage was extensive, but a few notable examples included the New York Times, BBC World News, and our own Esri website used it for a landing page that we had focused on COVID-19 resources. The story of the dashboard was featured on National Public Radio. It was in “Science”, “Nature Index”, and the “Wall Street Journal”, among many other publications. And actually, I was even invited to talk about the dashboard and mapping of Coronavirus on CNBC news.

The dashboard went viral very quickly. This graphic is showing you the first three months of feature requests starting from the dashboard’s introduction. Now just take a really close look at the scale on that vertical
axis. Each division is actually 500 million feature requests. Now at a peak in late March, the dashboard was being requested four and a half billion times per day. Now compare that to the current world population. And you’ll start to get a good sense of how remarkable those numbers are.

Beyond alerting the world to the developing crisis, the JHU dashboard inspired others around the globe to configure their own dashboards to map and monitor the disease, like this one that you’re seeing built by the Africa CDC showing their COVID-19 status for an entire continent, with the option to focus in on various regions across Africa. In South America, we can look at the country scale, and see that Colombia is paying close attention to the origin and importation of cases. In Australia, for the state of New South Wales, the University of Sydney took their analysis all the way down to the postal code. In Asia, Hong Kong is sharing data on cases and hospitalizations, of course, and they’re also mapping the actual buildings where there have been confirmed cases. Here in North America, the state of Maryland shares data at the zip code level, along with detailed information on testing volume and the percent of positive tests. As a leader in transparency,

this state is also sharing this dashboard indicating ventilator trends and ventilator availability, as well as this remarkable dashboard that’s giving us statistical information for each and every nursing home. So, you can see across every continent at every scale, even all the way to here to the region of Timbuktu in Mali, people have been using map-based dashboards to understand the novel coronavirus. And people learned a lot from these dashboards, and the experience of using geospatial information inspired them to see things a little bit differently, and the GIS innovation for COVID-19 started to evolve.

And with COVID-19, governments wanted to start better understanding gaps in services [ Inaudible ] who is more or less likely to be affected. Now, those that are more likely to be affected by something are considered vulnerable populations. And with COVID-19 people can be vulnerable for any number of reasons, and we can visualize and analyze those. Now, it may be older age or medical comorbidities that make them more susceptible to severe disease. People might be vulnerable because they’re essential workers, and they’re more likely to be exposed to the virus. Or perhaps the vulnerability is because they live in a congregate housing situation, like nursing homes, university dormitories, or prisons. We can overlay all of these various vulnerabilities to find the highest risk places for COVID-19.

But people also used analysis to perform capacity forecasting. And this is where you can take a model that pulls in hospital bed utilization data, along with social distancing behaviors, and understand what might happen over time with local hospital bed capacity. Every hospital and every government wanted to predict whether their curve was going to stay flat, maintaining that health system’s ability to care for each and every person that walks through their doors, or if some hospitals may become overwhelmed and exceed their capacity to provide care. But capacity
goes even beyond the idea of looking at hospital beds and looking at ventilators. It also includes the availability of personal protective equipment. Simple survey tools that are connected to dashboards help collect and then manage the critical supply inventories. That could be for each hospital within a health system or all of the hospitals in a jurisdiction, and it allowed them to report in regional or national ways to get a more cogent, big picture.

Now, it turns out that two staples of geographic problem solving are site selection and calculating the geographic accessibility for places. Now, these methods have been used for things like standing up new testing sites, distributing different kinds of food resources, and extending healthcare services in those places where hospital capacity actually does become overwhelmed. And to do this, it uses a process called location allocation. Let’s say that we want to stand up new testing centers. We would begin by determining the population demand, or you might say the population need, actually, for this service. And we can do this by overlaying all of the factors that are contributing to each of the five COVID risk types. That is transmission risk, socioeconomic risk, susceptibility to the disease risk, exposure risk, and resource insufficiency risk. Now, in the end, these create a risk ranking over small geographies, let’s say neighborhoods, that you can use in the next part of the analysis.

So, to calculate the best testing center locations, you pull in those risk rankings that you just calculated. And then you look at things like road network data, so you can calculate drive times and drive distances to new centers. Then you input any constraints that you may have for your model. Let’s say you only have a budget for 10 new sites, or maybe you only have staff that can support five new sites,

you put all of those different kinds of constraints in. And the model will take them all into consideration and select the most optimal sites from your entire population of options.

So, you’ve done all of that. You’ve placed your sites. Well, how do people find the resources that they’re going to need? Well, they can use what we call resource locators, like this one that came from Valencia in Spain. When placing a pin on the map, the user can get information about the nearest hospital, pharmacies, social services, or testing sites, and it’s in their immediate area. So, looking at all of these things, I think there’s no question that COVID-19 inspired great innovation in connecting the dots to support response to the pandemic.

Now, here’s an application that I’d like to show you with several innovations combined. You’re seeing the state of California with social distancing grades over time, where a red indicates an F grade, and blue is an A grade. This comes from aggregated and anonymized cellphone data, where we can add information like the date that California put their stay-at-home order into effect. And then you can start to see that the coastal counties did pretty well with social distancing, while the inland counties did a little bit less well with their social distancing measures. Well, then something happened a little bit later toward the end of April. And we can add that to the map: we started to see crowds at the beaches in California. So, you want to think about what are these influences on social distancing behaviors?
Well, the weather certainly came into play here. So, we can add data that comes from NOAA to examine the weather patterns. We can build that up over three dimensions and over time in a layer that we call a voxel layer. That’s really the combination of volume and a pixel to create a voxel. So, the dark orangey, brown areas are higher temperatures, and the blue areas are lower temperatures. And now you can start to make visual comparisons. Now, I will tell you that Esri headquarters is in San Bernardino County. We’re going to focus in on that county, and there you go. It turns out that as the weather got warmer, San Bernardino County started heading to the beaches, maybe. Maybe it was with some of my Esri colleagues, I don’t know. But this kind of information, I think you can see, not only helps you to understand at a deeper level what’s going on, but potentially put into place new policy initiatives.

Well, I know that a lot of people are wondering, what does the future hold? Well, the first thing that I’ll mention is the vaccine for this virus. The trick is, how can we distribute a vaccine to all of the people of the world in a way that is equitable and ends the pandemic as soon as possible?
Now, there’s some really very complex and challenging issues in this whole vaccination effort that GIS can nicely support, like identifying the facilities that can store and administer vaccines, determining where the concentrations of high priority populations are, and exposing gaps in access to vaccine, formulating plans so that we can reach everyone. GIS can also help manage and monitor the vaccine administration and inventory system, as well as share progress on vaccination delivery across our communities.

One of the key tasks for health departments will be to figure out how to prioritize vaccine delivery in a way that ends the pandemic as quickly as possible, as I mentioned before. And this means considering vaccinating healthcare workers who are on the front line first, then prioritizing other populations in consideration of things like those most likely to suffer severe disease, like the older adults, people who are most exposed, like essential workers, and those most likely to transmit disease, like people in congregate housing situations. GIS can make understanding where all of those populations are much easier. Now, once you know where all those critical populations are, and perhaps you did some of that location allocation I talked about to find all of the best vaccine venues, where are you going to be able to administer the vaccine?

Well, that’s when you can start to calculate things like their reach. How many of the critical population can be served by each of your vaccine venues? Now, it’s inevitable that there will be gaps. Some populations are going to be out of reach of the vaccine venues. And we can make plans to site new vaccine venues and see how well we can serve that underserved population. We could also partner with locations that already exist, like retail pharmacies who may be able to support a broader distribution for some difficult to reach populations.

Our fourth key task is to talk about tracking and managing the vaccine inventory, along with the vaccine distribution kits and the PPE equipment needed to be able to safely administer vaccine. Easy to use mobile tools can read the barcodes on the vaccine cartons, and they can parse out the information into relevant fields, like the quantity, the lot number, and the expiration date. Now this is going to make inventory check in really quite quick and easy. In the very same way, inventory can be checked out when supplies are taken from that inventory and are ready for use out in the field. All of this movement of supplies is monitored in real time with connected dashboards.

As this enormous global challenge continues to evolve, it’s going to be really important for decision makers to constantly evaluate the process and the productivity of each vaccination venue. And I think it’s going to be equally important to share vaccine progress with the public and keep them well informed of what’s going on and why certain decisions are being made. Now, speaking of vaccines, did you know that childhood immunizations are decreasing during the pandemic?
And this is really just one of several health issues that are worsening due to COVID-19. Now I call these ignored health needs the COVID orphans. People are delaying their important health needs like cancer screening and chronic disease management. They’re suffering from the mental health burdens related to the pandemic, which are leading to increased substance misuse and depression. It’s really pretty easy to see how this could happen when all of our attention seems to be focused in just one direction, but the results could be dire. And so, we need to encourage people to reconnect with their healthcare.

Now, one way that that can happen is through changes in healthcare delivery. And we’re seeing that there’s a lot of healthcare organizations who are ramping up on their telehealth and their e-Health options. Well, folks are using GIS to determine if there’s sufficient need for those kinds of services by assessing things like the travel burden across their communities, checking to see if their broadband coverage is adequate, and refining the service offerings by understanding the underlying population health needs. This is really pretty much the same pattern that I showed you for determining the COVID-19 testing center locations or planning vaccination venues. These methods are actually quite repeatable for many different kinds of health challenges.

All the way from Hippocrates in the 5th Century BC to modern days, the connections between health and place are strong, and they go well beyond all of the amazing work that’s been done for COVID-19. People are doing things like providing community resources to patients and mapping social vulnerabilities and disparities. They’re improving access to health care and planning new resources to address various risk factors for things like homelessness. They’re plotting preferred mosquito habitats, understanding what mosquitoes like and don’t like, and then using that to prepare control measures to decrease the risk for Zika virus, dengue
fever, and chikungunya. The benefits of making stronger connections between health and place are great. So, in this unprecedented year, 2020, we’ve seen some of the tremendous power and understanding

that applied GIS can bring to our world. At the same time, our technology gaps have become clearer. And I think we need to ask ourselves a few questions. Were we as prepared as we could have been? How can we better interconnect our systems at all levels and support a clear need for finer scale and real time mapping? Can GIS help us better coordinate our responses across governments and various scales? How can we do a better job of sharing authoritative data that everyone can use to make better decisions? As a group of GIS enthusiasts, I think we can all contribute to incorporating these lessons that we’ve learned from COVID-19 into our organizational workflows in a way that will prepare us for whatever comes next. Because it’s really never been enough to simply collect the dots. We need technologies like GIS to support health by helping us to connect the dots in ways that inform, inspire, and initiate change that makes us all healthier. And with that, I wish you all a very happy GIS Day.

>> Ensheng Dong: Hello. I’m Ensheng Dong. I’m a second-year PhD student at Johns Hopkins University. Today, I want to share some background stories about the world famous Coronavirus dashboard. In this presentation, you will learn how we got started with this dashboard, who our users are, the data collection and assurance strategy, and the challenges we have been facing and how we solve them.

First, let’s take a look at the dashboard. You may have been very familiar with this dashboard since back in January. The main part of this dashboard is the map, and each red dot represents the total number of the cases in that location. For example, we are tracking the US at the county level. Canada, Mexico, the UK, Australia, China, and several other countries are at the subnational level, for instance the province or the state level. And the rest of the world we are tracking at the country level. And for the country level, we call that [inaudible] zero level. At your left-hand side and right-hand side next to the
map, you can see several lists of numbers. They include total cases and recovered cases by country or region. We also have more specific lists, which include recovered cases and testing rate, but they are all at the state level. At the lower-right corner of the dashboard, you will see a couple of [inaudible] curves, which can provide bar charts for daily cases and the line graphs for the total cases and the log [inaudible] for the total cases too.

Down below the map, we can see we have tabs, and each of them is a map with other epidemiology variables. And they provide you a different perspective of understanding the Coronavirus. The first map would be the active map. Active cases means the number of total cases minus recovered cases minus the deaths. And the second one is the incidence [inaudible] map. Incidence refers to cases per 100,000 persons. And the third one is the case-fatality ratio, which is calculated as the number of recorded deaths over the number of cases. And the last one is the testing rate, and this is at the US state level only.

We also collaborate with the JHU Coronavirus Resource Center and see the impact. And here, it’s the US Coronavirus dashboard. This provides more detailed information about the local demographics, public health facilities, and other information. For example, if you click on a county here in the map, you will see a popup window. And in the popup window, click on the infographic, and you can get more information for that county. For example, the number of the residents and the number of the beds, the number of the physicians in that county. So, this one is really helpful for the researchers to conduct a further understanding

of the Coronavirus.

You know, many people ask us, why did you start this Coronavirus dashboard? And this is because my advisor, Dr. Lauren Gardner, and I, we had a meeting back on January 21st. Dr. Gardner brought me a cup of coffee in the library, and we were discussing what we should do next semester. And in the meeting, I told her I was collecting the data of the Coronavirus, and she suggested that I create a dashboard, and I’m so happy about that. And right after the meeting, I created the first version of the dashboard, and on the second day, January 22nd, Dr. Gardner posted the tweet to promote our dashboard. Since then, Johns Hopkins University and [inaudible] disease, they all relate to our dashboard.

So, how many users or how many [inaudible] have we received? This is usage [inaudible] in the early days. We can see the highest daily requests can go up to one and a half [inaudible]. If you compare that with the world population, you can get a sense of how many: almost everyone in the world has [inaudible] in our dashboard. And as of November second, for the feature layers we received approximately 200 daily requests, and that’s a really impressive number.

Let’s break down the usage into countries. As you can see, each color represents a region or a country. And the ups and downs go with the surges of the Coronavirus. For instance, when Italy got their first Coronavirus cases, we got a surge, and most of the users are from Europe. And with the second surge in the European Union and the Middle East, we have another surge. When the US got the first surge in early March, we have another surge from the US.

[inaudible], and this is one of our users, [inaudible] Mike Pence, when he was visiting the Department of Health and Human Services, and in the back is our dashboard. And this is in the German government when they are discussing the policies to control the Coronavirus. And here is another photo from an Italian cabinet. The leaders of the government, they were also using our dashboards to discuss
the solutions to control the Coronavirus. And you can see in the early days, there were only so many publicly available aggregated sources for the Coronavirus. So, the leaders worldwide, they rely on our dashboard as the only reliable data source updated in real time. And the same thing is for the United Nations stats.

We are also very proud that our data has been used to make projections. For example, the left one is from the Institute for Health Metrics and Evaluation at the University of Washington. They were using our data to make the projection for the future Coronavirus situation. And their results have been used by the White House. And also, our data is adopted by the projections from the CDC projection groups. Like the “New York Times”, they were using our data for the global Coronavirus case tracking. And this is a typical [inaudible] and then screenshot, and you can see, our number is always shown on the upper-left corner. And below our data is the stock market. Sometimes you can see when the Coronavirus cases are going up, you can see the stock market is going down.

And our story is also reported by NPR. This one is reported by [inaudible], one of the [inaudible] reporters from NPR. And our story is also covered by Science [inaudible] and Nature Index, along with a variety of other media. And the Wall Street Journal, they even call us a historical first. That’s how I got my title, historical first.

Some other similar dashboards were inspired by our dashboard. So, for example, this one is from the UN WHO from the early days. They adopt a similar layout of the dashboard. And this is from Japan. And this one is from Hong Kong, and this one is from [inaudible], and this one from China CDC. They all use the similar layouts or design of the dashboard

to keep the public informed. And this one is from Italy, Singapore, Thailand, and Israel. You can see for the Thailand and the Israel dashboards, they adopt our data but use their own languages. And this one is for the Philippines, and another one is for Australia.

As you all know, the Coronavirus data is very important to set up the dashboards. There is a saying: garbage in, garbage out. So, from the beginning, we are very cautious about the data quality of the dashboard. So, as you can see, in the early days of the dashboard, we are manually collecting all the data. We use a Google Sheet to help us to allocate the tasks. For example, in each Google Sheet you can find [inaudible] is the [inaudible] order and which region he or she is working on and what the data source is. So, this helps us trace back to each case during that time.

As the cases were becoming more and more numerous, and especially for the US, we decided to track at the county level, and in the US, we have more than 3,000 counties. It’s kind of impossible for us to track the cases manually. So, we adopted automation for this process. There are three major blocks in this new automation process: data collection, data curation and sharing, and the visualization. In the first procedure, data collection, we are getting the data from the local health departments and the major news media. We collect all the information, translate the language, and then unify and standardize all the formats into one. And then we post all the data to GitHub. That’s [inaudible] data curation process.

In the GitHub, we share the data in two formats. One is the daily report, which has all the raw data each day you’re seeing in the dashboard. And the other one is a time series table. This one, it’s designed for the researchers: not only do you have access to the time series table, you can download the table, open it in Excel, and draw the time series [inaudible] easily in a Microsoft Excel table. And then we’re also sharing the data
through the ArcGIS hosted [inaudible] layer. This one enables the government agencies or a third party to use our data with a few locations.

When we are collecting all the data, there is another well-written [inaudible] procedure, which is the anomaly detection. Why is that important? Because sometimes you will see some weird data coming in. For example, if a location normally has only 100 cases reported daily, but some days they recorded 1,000 cases, this is really suspicious. And if that happens, our team member will go through the data and see if that’s an error because of our pipeline or because of the data source. One solution involves pushing the data manually to the system. This is anomaly detection. And right now, it’s partially automated.

After we collect and share the data, we bring the data to the [inaudible] platform and make two versions of the dashboard. One is the desktop version. The other one is a mobile version. We can switch between versions by clicking the button at the upper-right corner of the dashboard. And we are also proud that our GitHub repo got over 24,000 stars. And we were even ranked number one in their [inaudible] list.

Someone may ask me, why did you switch from Google Sheet to GitHub? Well, the reason is very straightforward and simple. With Google Sheet, only the first 100 users can download our data, which means the 101st user can only read the data, but they will never get a chance to download the raw data. It’s not very convenient. Because of the scope of our users, we decided to move from the Google Sheet to GitHub. As you can see, since early February, we are tracking the data and providing the data in the GitHub [inaudible] poll. Every day we are doing that, and we will keep working on that until the end of the pandemic. Here is an overview of the GitHub. If you log in to our GitHub, you can see we are sharing data

within three different folders. The first and the second, they are raw data from the dashboard. The first one is for the whole data, for the whole world. And the second one is US only, because sometimes researchers prefer to study the US data, because US data has the county level and state level, which is a much finer spatial resolution. And the third one, the time series tables, we include the global cases and global deaths and the US cases and US deaths, and also the recovered cases.

For all the data sources, we are also making a list on the GitHub. You can see, we have the aggregated data sources and the US specific data sources. So, sometimes for each state, we are referring to the data from both the state level health department and also the local health department. So, we list all the data sources here.

Recently, we updated our terms of use. Previously, we only allowed our data to be used by non-commercial organizations or only for educational purposes. But right now, we are opening our data to almost all organizations and persons in the world. And Google Maps is one of our first users under the new terms of use, and they use our data to show the Coronavirus cases in the Google Maps app. So, right now, you turn on Google Maps and tap on the [inaudible], and then you can choose the COVID-19 information. And it will show you the detailed information of the Coronavirus trend in your location.

From a geographer’s perspective, we always [inaudible] challenges. Here, I will show you something. Here, I give you two different dashboards. If you have been following us, you can tell the left one is the first version of the dashboard. Compared with the latest version, we changed a lot. For example, the default extent we changed from Eastern Asia to the whole world. And we also added many epidemiology variables such as the global total cases, the global deaths, global recovered. And we also have a plot for the daily cases and daily deaths. Those are all the improvements, and some of them,
they are requested by the users.

Here, I will give you an example about the [inaudible] location. When the US government decided to evacuate some patients from the Diamond Princess back to the continental US, we wanted to denote those cases on the map. However, those cases, they were allocated to different military bases or different hospitals countrywide. So, it’s really hard for us to track their locations. Even though we can make a dot for each patient on the map, this kind of violates the person’s privacy. Because it’s really easy to find them. If we have a patient, you search the military base, and he or she might be the only one. And it’s easier for the public to find or trace her name or his name. So, it’s not very good for the privacy protection.

So, what did we do in that scenario? We aggregated all the cases into one dot, and that dot was placed at the geographic centroid of the US, which is in Kansas. That turned out to be a temporary solution, because the second day, in the morning, the residents of the [inaudible] in that location emailed us and told us, we don’t have any Coronavirus in our apartments. Why did you place a dot on top of our apartment? Well, we recognized the situation, so we had to move these dots back to Japan, where the Diamond Princess cruise ship was. However, that’s not very informative because that’s not the real location of those patients. Eventually, we adopted a technique called Null Island, which means you won’t see those patients on the map because their location is kind of privacy related. At the same time, we can show the data in the list, so the total number of the patients will be counted toward the total number of cases. However, they won’t show up on the map. This is our solution.

Another visualization for the cartography related issue is the

level of the dots So, previously, I mentioned for the US, we are tracking Coronavirus at the county level However, some places, they don’t provide the county level information For example, in Utah, some users asked us, where are the cases for certain counties? I cannot find that in the map That’s the reason because of the Utah Department of Health They only [inaudible] cases at the jurisdiction district level like here So, we follow the states guidance and only report the cases for those jurisdiction level instead of the county level The data validation This is another very important data strategy to tell us if our data is reliable and reported in real time We compare that with data [inaudible] show In the early days, you can see our bar charts are kind of matched with the next day’s bar chart of the [inaudible], which means our data is accurate But we report the cases much earlier than [inaudible] And another example is we are catching up with the latest criteria that’s [inaudible] show For example, you can see a couple of spikes Here, this is the China data in the early days The orange color, orange bars are JHU data, and the blue bars are WHO’s data We catch up the criteria change for the [inaudible] much earlier than WHO’s So, that why we have a couple of extra bars for this than WHO And we also compare when the country or region receive their first Coronavirus cases From this graph on the lower right corner you can see, all the blue countries, that means JHU, CSSE data isn’t reporting the first case of that location earlier than WHO And the red ones means we are a little behind WHO One reason is in the early days especially and generally, all the data is collected by myself manually And I do remember Australia got their first case on a Saturday morning And I was so exhausted that morning collecting all the — collecting data for the whole week, and I missed the first cases in Australia But Dr. 
Gardner called me and asked me, hey, Ensheng, we need to get updated for the first case of Australia, and I did it, but still a little late, on the same day So, we published our comparison in [inaudible] infectious disease, and up until now, we received more than 2,000 citations, which is very impressive For the US data, we have comparisons with other data sources For example, the US [inaudible] New York Times, and the COVID Tracking Project As you can see, most of the time, we are kind of [inaudible] with each other But sometimes, some dataset is higher than the others The reason is the probable cases Maybe some datasets, they don’t recognize the probable cases as part of the total cases Another interesting example is the [inaudible] data Compared with WHO, you can see we have huge spikes compared with their reports That’s because we report both the confirmed cases and the probable cases as the total number of cases If you take a look on the right, that’s how the French health department reports their cases The light blue block means the confirmed cases inside of the hospital and also out of the hospital And the dark blue one, the dark blue box, means all the cases, which means confirmed cases and the probable cases, out of the hospital So, as you can tell, there is some overlap between those two blocks, the light blue one and the dark blue one And the overlap will be the confirmed cases outside of the hospital So, if we want to get all those cases, which means both the confirmed cases and the probable cases inside and outside of the hospital, we need to figure out what the overlap numbers are But the French government or the major media, they never report such a number So, we have to ask our volunteers whose mother tongue is French to listen to the breaking news every day and to figure out what the exact numbers are So, we update our data accordingly
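The overlap arithmetic described for the French figures can be sketched as simple inclusion-exclusion. This is only an illustration under stated assumptions: it assumes one published figure covers confirmed cases inside and outside hospitals, the other covers confirmed plus probable cases outside hospitals, and the overlap is the confirmed cases outside hospitals. The function name, parameter names, and numbers are hypothetical and are not taken from the JHU CSSE pipeline or from real French data.

```python
def combined_total(confirmed_all: int, outside_all: int, confirmed_outside: int) -> int:
    """Add both published figures, then subtract the overlap once.

    confirmed_all     -- confirmed cases, inside and outside hospitals (assumed meaning)
    outside_all       -- confirmed + probable cases outside hospitals (assumed meaning)
    confirmed_outside -- the overlap: confirmed cases outside hospitals
    """
    if confirmed_outside < 0 or confirmed_outside > min(confirmed_all, outside_all):
        raise ValueError("overlap must be between 0 and the smaller published figure")
    return confirmed_all + outside_all - confirmed_outside


# Hypothetical numbers for illustration only (not real French data).
total = combined_total(confirmed_all=1000, outside_all=400, confirmed_outside=250)
print(total)  # 1150
```

Without the overlap number, the two figures can only bound the total (between the larger figure and their sum), which is why the missing number had to be tracked down by hand.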

Another reason why our French report is higher than WHO is we are using the US State Department Naming Convention, which means for all the overseas territories of France, the cases from them are counted towards the French total cases So, that’s another reason our numbers are higher than WHO Here is another example of [inaudible] As you can see, our data works perfectly with WHO data, but if you are familiar with the geopolitics of the island of Cyprus, then you can see there may be several political entities on the same island We are not so sure the data collected is only for the south part or for the whole island Luckily, I have a classmate, and his father is a Greek diplomat And he has some connection with the Department of Health of the Republic of Cyprus From his connection, we got to know how this data is collected [inaudible] for the Republic of Cyprus is that of the whole island And then, we may also be facing other challenges, like can Iranian people use our dashboard under the US international law?
And another interesting data collection question is, should we report the cases from Crimea to Ukraine or to Russia So, because of the geopolitical issues, we have to pick how we name them So, we decided to use the US Department of State’s Naming Convention and solve all the problems Another challenge we face is when to update the daily cases For instance, in the early days, if we only used midnight, US eastern time to update the data, you will see that at the time we updated the daily cases, the western part of the US had not updated their cases yet So, that may give you some gap And also, for some other countries like India, they are reporting their data irregularly Sometimes this [inaudible] after 3:00 AM UTC, sometimes it’s later So, when we report the daily cases, we may only catch up with the old data, or we may catch up two days’ information We are constantly updating our data update time Because of that, we released all the data modifications in the GitHub repo So, all the users, if you have any questions on more data like the French or the India or the time or the updating frequency, you can refer to our data modification records to get more information So, ever since we started this project, everything is new because no other studies have been done previously for us to make a reference For example, the [inaudible] definitions are really ambiguous because there is no standard line For example, for the date, what do you name a date? Is that the report date or the onset date or the JHU collection date? So, we have different definitions on that We have to standardize that, and another thing is the discrepancies in reporting across different resources For example, in the state of Texas, they are reporting [inaudible] with one number, while the [inaudible] they are reporting following another standard So, they always have a gap between the state reporting and the county reporting How can we deal with that topic?
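The update-time problem described above can be sketched in a few lines. This is a minimal illustration, assuming the daily snapshot closes at midnight US Eastern time; the function name and the example timestamps are hypothetical and not taken from the JHU CSSE repository. It needs Python 3.9 or later for `zoneinfo`.

```python
from datetime import date, datetime
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def snapshot_date(published_at: datetime, cutoff_tz: ZoneInfo = EASTERN) -> date:
    """Return the daily snapshot a report lands in when each day closes at midnight in cutoff_tz."""
    if published_at.tzinfo is None:
        raise ValueError("published_at must be timezone-aware")
    return published_at.astimezone(cutoff_tz).date()


# Hypothetical West Coast report published at 22:30 local time on July 1.
west = datetime(2020, 7, 1, 22, 30, tzinfo=ZoneInfo("America/Los_Angeles"))
print(snapshot_date(west))   # 2020-07-02: it misses the July 1 Eastern snapshot

# Hypothetical reports from a source that publishes irregularly around the cutoff (times in UTC).
early = datetime(2020, 7, 2, 3, 30, tzinfo=ZoneInfo("UTC"))
late = datetime(2020, 7, 2, 4, 30, tzinfo=ZoneInfo("UTC"))
print(snapshot_date(early))  # 2020-07-01
print(snapshot_date(late))   # 2020-07-02
```

Two reports published an hour apart land in different daily snapshots, which is how one day can show stale data and the next day can show two days of information at once.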
And another challenge could be the inconsistencies in reporting medium If all the local health departments were using the [inaudible] platform or a certain standardized platform, that would make our life much easier, but that’s not always the case For example, some locations, they prefer to use a PDF file, and it’s really hard for us to scrape the data from a PDF file And sometimes, they report the cases on Facebook or Twitter, which is not very formal But we have to assign real people to look at their Facebook or Twitter social account and to get the exact numbers

This is not very convenient In the future, we hope the public health organizations can unify their reporting systems Another concern is on the privacy Should we report the cases per county when that’s less than five cases? That’s something we need to consider And then, I wanted to standardize the whole reporting system and who should report the cases Should the government work on that or local health department or third-party organizations? Those are all the things we need to discuss So, basically, we published a paper in the [inaudible] practices to discuss all those questions You’re welcome to take a look And you’re also welcome to take a look at the Johns Hopkins Coronavirus Center, where you can get more information of the data and the bot and the other latest technology to handle the Coronavirus I’d like to say thank you to our JHU CSSE Center and also Applied Physics Lab And also, Esri, they provided a lot of technical support to our dashboard We also are grateful to our supporters Because of their support, people will see Johns Hopkins is not only a medical school They see us as a Coronavirus lab school We received the Making a Difference Award from Esri, and this is all our lovely team members And I’m so proud that my advisor, Dr. Gardner, she was on the list of 500 most influential people in 2020 For more information, you’re welcome to visit our systems website at systems.jhu.edu Thank you >> Mike Schoelen: Hey, everyone I hope you’re enjoying the virtual edition of GIS Day 2020 In this brief presentation, we’re going to be talking about supply chain at a pretty introductory level, specifically as it relates to COVID-19 So, hopefully by the end of all this, you’ll understand how location intelligence is a critical element when designing a supply chain that is resilient enough to get materials like vaccines and protective equipment to the right people at the right time But before we dive into that, who am I?
So, my name is Mike Schoelen I’m a solution engineer with Esri Health and Human Services My team, alongside our Disaster Response Program, has been working almost around the clock with federal agencies and local governments to uncover how GIS can be applied to solve the tough challenges related to supply chain during the COVID-19 pandemic Prior to coming to Esri, I was actually the contracted GIS administrator for a branch at the Defense Health Agency on the Integrated Virus Surveillance Team And then actually prior to that, I actually worked at the Library of Congress more recently as a contracted GIS administrator with the GAT project But before that, I was a research fellow alongside Evan, Erin, and Amanda [inaudible] digitizing content down in the Geography and Map Division So, it’s really great to be back So, while I have been around the block a good bit in GIS, I do want to highlight something small I’m not an epidemiologist Now, I want to highlight that for two reasons Number one, don’t take any medical advice from me But number two, if you understand GIS, you don’t have to be a subject matter expert to assist in solving some of the world’s toughest challenges Some challenges are just basic GIS, and I hope to highlight that today So, before we dive into supply chain, let’s just get a quick refresher on GIS in the real world So, if you’re watching this, you might already have some familiarity with GIS But let’s go over a practical example that highlights how we interact with these systems every day If you’ve ever been house shopping or apartment shopping, and you find that right place, what factors make it good? I’m not talking about, you know, stainless steel appliances or a laundry room on like the second floor We’re thinking, is it in a good school district? What’s the distance from this house to my work? What’s the commute like, or what’s the land value like in comparison to the properties around me?
Or more likely, rather than each one independently it’s some kind of weighted matrix of all three of these kind of balancing out in your head together? Well, GIS allows us to break down that conceptualization into what we call layers of information So, think of a map of the distance to the metro or the distance to the parks and the amenities

and the restaurants and then another map of the distance to your work all stacked on top of one another And then you take this information, and then you have a detailed understanding of a particular place on the map Well, the same workflow is going to be used by the government and commercial groups to understand things like, where do we put a restaurant that serves a target demographic, or where’s the best place to put a library? Without GIS, each of these pieces of information would exist separately Fortunately, geography is that glue that binds information together Everything occurs somewhere So, this is more than just making pretty maps, and GIS can do a really good job at that But it’s really about using location to make very difficult decisions quickly with a whole range of information All right So, what does all of that have to do with building a resilient supply chain for COVID-19? Well, it’s because there’s a very similar planning and information framework You’re going to have to, you know, rather than selecting an apartment or a house or a restaurant, select manufacturing facilities that have the optimal distance to the demand You have to create navigational routes for thousands of workers and vehicles through the optimal territory to save both money and energy You have to monitor your destination facilities for disasters that could further disrupt distribution And then once that chain is built, you just casually have to deal with this thing called supply and demand But to top it all off, this is going to be made significantly more complex by the specific impacts of COVID-19 Like the new and emerging hotspots of cases, the impacts of social vulnerability, and the need for these resources, and above all else, uncertainty Because we’ve never dealt with something like this before, but not to worry, this is all spatial information If we layer our facilities and our routes with inclement weather, we can get a forecast of not just snowfall
but the impact to the supply chain if one facility were to close And if we layer population estimates with our distribution centers, we can get an accurate prediction of demand at each facility And because this is occurring within a GIS, we can model it over and over again So, when something bad happens, there’s almost no impact to those at the receiving end of the chain Let’s proceed And this isn’t conceptual This is a real concern And we’ve seen time and time again where the sole manufacturer of a critical element of a pharmaceutical or rubber gloves is in a region that’s hit by natural disaster So, using GIS to plan for these vulnerabilities, it allows us to avoid the worst case scenario Like where, let me get hypothetical here, you have a COVID-19 vaccine, but the manufacturer who makes the models was hit by a blizzard And so, you can’t get the product to the end user So, that was a good deal of background here into the fundamentals of GIS But let’s get technical, and let’s see how we can use specific tools to gather intelligence, to preplan scenarios, and then observe and attempt to keep our supply chain running And we’re going to do this for both national distribution and also for local distribution, which is a little bit different So, the map we’re looking at here is what we call a common operating picture, or a COP It’s arguably the simplest application of GIS, which also makes it the most widely used With modern GIS, the layers of information, they’re not really stored locally in most cases anymore They stream in over what we call services So, in this example, we have weather coming in from NOAA We have live earthquake data coming in from USGS And then finally, we have our data that we’re contributing here, which is our supply chain or manufacturing facilities and distribution centers, and the routes layered in the middle of all of it And then as the data updates from those sources, it’s also going to update here on our COP This is
where GIS takes it to the next level You can build a system or a script that says, all right, every minute, I’m going to look for the overlap between these layers So, if there’s a severe weather alert within 10 miles of a major hub, I want you to send me an alert And that’s something you just can’t do with a table of information You have to account for space and location Otherwise, you’ll see an alert for the zip code of say 21771 but your facility’s in 21770, and so, you know, by our table, everything’s okay
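The "alert within 10 miles of a major hub" rule just described can be sketched with a plain great-circle distance check. This is a simplified illustration, not Esri's implementation: real weather alerts are usually polygons streamed from a service, while this sketch assumes each alert is a single point, and the facility names and coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def facilities_to_alert(facilities: dict, alert_points: list, radius_miles: float = 10.0) -> list:
    """Return the names of facilities within radius_miles of any alert point."""
    flagged = []
    for name, (lat, lon) in facilities.items():
        if any(haversine_miles(lat, lon, a_lat, a_lon) <= radius_miles
               for a_lat, a_lon in alert_points):
            flagged.append(name)
    return flagged


# Hypothetical facilities and one hypothetical severe-weather alert location.
facilities = {"Hub A": (39.40, -77.15), "Hub B": (41.00, -75.00)}
alerts = [(39.45, -77.20)]
print(facilities_to_alert(facilities, alerts))  # ['Hub A']
```

The check uses distance rather than matching zip codes, so a facility gets flagged even when the alert falls in a neighboring zip code.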

But next thing you know, you have no staff working because they all got snowed in, in the neighboring zip code And so, the culmination of all this kind of spatial thinking is what we see on the left-hand side here It’s that location intelligence saying, not all these layers, X, Y, and Z, but facility three alert That’s it It just gives you an actionable piece of information that doesn’t necessarily even require reading a map It’s just intelligence and information And so, that’s where it’s going to get really critical here, that we can bring this data together It’s a lot like the difference of someone giving you the readout of barometric pressure, versus just telling you to grab an umbrella We want to just tell you about it With COVID-19 supplies, we can actually even use, rather than live weather data, historical weather patterns to find the most vulnerable facilities and their routes, and then start planning alternates before anything bad actually happens But let’s say hypothetically, we want to start doing this preplanning of scenarios We want to model what happens if a facility goes offline, like hypothetically, a hurricane were to shut down a distribution center Well, by layering the service population or the demand, these people, with our supply chain, we can actually make an accurate estimation of the impact of closing one facility So, in this case, we see that losing this facility would have caused most of Florida to become un-serviced Now that wouldn’t be great But it might be possible then in our GIS to add another facility in the same general region, and it would pick up the demand and keep things running But as we move up the supply chain, things get far more complex, and the stakes get far higher So, for example, if we remove a major distribution hub from the system, it actually impacts the entire east coast And GIS can show us the exact sub-distribution sites and the communities that would be impacted if that facility
went out So, up until now, we’ve kind of been acting in reaction to these events But what if, kind of like I had mentioned earlier, we got ahead of the ball What if we tasked our GIS to make a plan? So, if any given facility went offline, there was a backup plan ready to go Now, again, we’re talking about the power of GIS And in this example on the screen here, we’re looking at a GIS function called location allocation So, for every county in the US, the tool assigns the nearest distribution site that will be responsible for servicing it It goes over and over again, iterates, for every single county in the nation It can work at any scale These tools can even account for traffic and blockages related to the storm So, if, you know, a road gets shut down, it would maybe potentially reroute a county to a different distribution site So, what this means is that we can create an unlimited number of models, some people use the phrase digital twins, that are going to show us a few things It’s going to show us digitally, what do things look like under normal operations, which is where everything is orange here And then we see what happens if we shut down a facility, then we see another facility pick up that load And we can even calculate if that added load is going to strain the system, or if we need to, in the third case here, bring in a backup facility, which is highlighted here in green on the screen very quickly So, what we see here is there’s that word that we used in the description of this talk, it’s a resilient supply chain It’s not a supply chain where every element is unbreakable That’s unrealistic And that’s how the Titanic got in trouble But it’s this chain where if something breaks, there’s no surprises There’s active monitoring, and then there’s this robust plan for every single facility to keep parts moving There’s backups and reiterations of [inaudible] So, that’s where GIS is just so cool All right Well, up until this point, we were national We were successful
at getting our supplies and our vaccines and our PPE, personal protective equipment, down to a regional distribution center And that might just be a few places within each state or country or county But now we need to get that last mile delivery to actual people, to the clinics, to the pharmacies, where they’re going to be used and injected and given to actual citizens Well, GIS is actually really going to help us with this as well So, let’s take that vaccination example again We know that it is not going to be ready all at once It’s going to be released in phases, specifically, three phases, where first responders and those that are at risk of severe COVID-19 are at the very front of the line Well, because geography, again, is the glue that binds information together, we can get a layer of hospital information to see the number of healthcare workers We can get a layer of market potential data to see who’s buying inhalers and who’s buying insulin We can get a layer of demographics from the US Census, and suddenly GIS has given us this framework and these estimated groupings of how much of the population falls into each phase in the community

This is going to give local governments the ordering estimates they need to give to the national supplier to say, Hey, here’s how much of each vaccine dose that we need and each of the pieces Again, it wouldn’t be possible without being able to bind that information together as [inaudible] GIS GIS is also going to allow us to account for an ongoing situation like active COVID-19 cases So, what we have here on the map is a sample dataset of COVID-19 cases, each dot is one case that someone [inaudible] And point data can be very helpful when there’s a small number of clusters or a few cases where you can easily look at a map and say, okay, it’s going on here, here, here, and here Well, with something as widespread as COVID-19, point data maps don’t help us out too much All I see here is just the outline of the sample region But using GIS, we can actually create information products that tell a much more detailed story We can use a process called emerging hotspot analysis We’re not going to just see a hotspot or a cumulative case count of cases forever Because that really won’t help out too much But we’re going to find these areas where things have been intensifying over time and where they are currently intensifying, which would be these bright red areas on this map So, imagine you had an area with a massive spike in COVID-19 cases, but it’s since started to dramatically decline Maybe hypothetically, I haven’t seen a case of this yet We’ve reached herd immunity from just natural infection, and, you know, the most that’s happened there has happened Well, in that case, you might want to prioritize a different neighborhood where cases are still very much on the rise, again, in this hypothetical scenario Well, these are the areas where things are on the rise where the vaccine is going to be extremely helpful And it should be considered and is being considered by the counties to make these distribution decisions So, now you get to kind of wrap your head around that on
[inaudible] GIS Day It’s not just about space It’s also about time as that third or fourth dimension, depending on how you look at it And finally, here, GIS can ingest data from CDC on social vulnerability We’ve seen this kind of throughout the pandemic, that those who are in highly vulnerable categories are being hit the hardest by severe COVID-19 for a number of reasons Then adding this index into our calculations, we’re ensuring equitable access to those resources for those who need it most So, let’s bring it back again to this concept of location intelligence Taking all of this information together, we’ve got the total population in each phase as a layer We’ve got a layer of the status of COVID-19 in that region, and then we have this new layer from CDC, the social vulnerability index We can combine all of these together into GIS into what is called a risk surface This is a weighted surface that tells us what communities relative to each other to focus on first, given whatever input factors we want to add to the system Now in most GIS software, you can actually even calibrate that weight So, say focusing 70% on social vulnerability and 30% on emerging hotspots or vice versa The controls can get extremely finite, even considering the capacity of how many vaccines might be available But ultimately, some pretty basic algebra just kind of put into the spatial conceptualization, which means that when it’s finally time, after all of this supply chain talk and finally getting the vaccine ready to go and it’s time to distribute it, the GIS can use the risk surface And place resources, not just where there’s the most people, which is what we saw kind of early on COVID-19 testing, testing system put in just large cities and people in the rural areas who didn’t have access But we’re actually now able to place resources where they’re needed most So, for example, imagine two neighboring towns, each with one potential distribution site, and we as the city planners or the 
local health department have to pick one Now one town has a very affluent population Everyone owns a vehicle And then right next door, a neighborhood is less affluent, and there’s many homes that don’t have a vehicle They might not be able to drive to go get a vaccine from as far away That’s actually a metric that influences the CDC social vulnerability index by the way But using that kind of weight within our GIS, the tool will place the resource in the town with the greatest need, which is going to skew towards this ensuring of equitable access to resources, which again, is super technical and super, you know, nerdy GIS But the impact in the real world is astronomical and so impactful Then as we move forward towards web GIS, or you may hear the phrase Enterprise GIS, you can create browser-based simulations, which means that anyone with a computer now without GIS software on their laptop and without this severe technical know-how or a number

of degrees on their wall, they can now just log into a browser and see the impact of adding a new distribution facility in one place versus another just with a click of a mouse and very limited know-how in GIS And you can even build mobile applications that will navigate citizens to their nearest distribution site, which, again, is a huge part of this supply and demand thing of actually getting people to be able to access these resources And so, kind of finally here, kind of bring it all to an endpoint, we’re seeing that GIS, especially now that COVID-19 is kind of in the middle of our lives, it’s no longer just a group of people who have been working in the basement office printing paper maps all day Granted, that’s still an important role And that’s how, you know, I got my start at GIS was printing maps in the basement But instead, it’s expanding out into this capability that is so central to combining layers of information into something actionable, that can be interacted with by anyone, regardless of their GIS experience So, if you’re interested in learning more or if you have any questions, please feel free to send me an email at mschoelen@esri.com It’s on the screen right there But most importantly, thank you so much to you all, and thank you to the Library of Congress Please stay safe and happy GIS day 2020 >> John Hessler: Hello, everyone, I’m John Hessler I’m a specialist in computational geography and GIS at the Library of Congress And today, I’m going to talk about mapping the mutations of COVID-19 Now COVID-19, obviously, we’ve had some papers before of mine, which talked about cases and talked about numbers But we’re going to go a little bit deeper into the cartography of actually tracking the mutations around the world And this really gives us a really good sense and a really good idea of how the virus has moved around the planet And we’re going to use some really technical and advanced ways of doing this, which combined both cartography and 
geospatial data along with phylogenetics So, we’re going to begin by talking about the structure of the Coronavirus itself And this is really going to be important when we start talking about mapping mutations because we’re going to have to talk really specifically about certain structures and certain parts of the Coronavirus Now, the Coronavirus is an RNA virus And what that really means is that it’s part of a whole family of emerging viruses that of late have come to the fore We don’t really know why RNA viruses have become so prevalent in zoonotic and in emerging disease at this point But it’s just one of the group of viruses that have sort of emerged from animal hosts in recent years It has a very small genome So, we’re only talking about 30,000 base pairs, 30,000 nucleotides, and it has a super high mutation rate And the real thing that we’re going to concentrate on in a lot of the talk is really the spike protein, really, these spikes that are coming out of the capsid of the virus, which kind of give the virus really its name And it’s called the Coronavirus because it looks like a crown Now, when we talk about mapping mutations, what we’re really going to be doing is we’re really going to be looking at how the various mistakes and how the various mutations that accumulate in the virus as it has moved around the world And so, on the left-hand side of your screen right here, you see a phylogenetic tree with all sorts of colors And every one of the dots that you see there is a different genome that has been sequenced after the virus has been inside a human host And on the right side, what you see is you see how those mutations have moved around the world The phylogenetics and the mutations are color coded to the map And we’ll be talking a little bit about that in just a bit And on the bottom of your screen, what you really see is you see the sequence of the genome itself And that graph at the bottom shows the mutations in each of the different points in the 
genome and how many there are So, the larger the little line coming up from the base there, the more mutations and the more frequency of the mutations at those particular sites And we’ll get into this in some detail Now, all of this has been made possible just recently This is really one of the first pandemics or large-scale outbreaks of disease where we’ve been able to map the phylogenetics and the geospatial movement of the mutations of the virus around the world in real time And that’s really taking place simply because there are organizations like GISAID

which are bringing together all of the genome sequences that we’ve seen done around the world into one place And it allows qualified researchers like myself to go in there and actually pull out all of those genome sequences and actually map them and look at what’s going on with them The other group that has been really important is a group called Nextstrain Nextstrain has built a platform, an open source platform, where we’ve been able to use the data from GISAID to actually track in real time the actual evolution of the Coronavirus And so, we’ll be talking a lot about the data from GISAID and the platform of Nextstrain So, all of the data and all of the visualizations that you’re going to see in this presentation for the most part come from those two sources So, we’re going to begin by kind of just talking really briefly about a phylogenetic tree This is the actual tree of the coronavirus What we see here is we see all of the mutations As I said before, every one of these dots is a genome sequence They’re color coded to each of the places where each of these tests and each of these genomes originates And you’ll see that I’ve divided this up into six basic regions And we’ll be talking in a little bit more detail actually about North America more specifically than the global phylogenetic tree that you see here But really, what this is, is it shows us the relationship of the mutations So, in other words, if a virus comes into a human host, and it mutates, in other words, if the genome changes in some random way, in some place, and then I pass that disease onto the next person, those mutations will travel with that disease, with that virus entering the other person And in tracking these mutations, which are in a way mistakes in the genome, we can track how the virus has moved around the planet And that’s really what we’re going to be focusing on here So, if we blow that tree up, so in other words what we’re doing is we’re zeroing in and kind of expanding that
tree And we’re looking at a snapshot of just a specific area here And what this is actually showing us is really the first case that entered the United States, and that was into Washington State What you see on the left-hand side in the purple are all of the various mutations and all of the various genome sequences that were done in Asia and China in the earliest phase of the outbreak And so, what we can see here is we can see that red dot That red dot signifies North America Red is North America, and we can see that the first Washington state case came from China It actually came from a group of mutations that are coming from Asia And then you can see that that red dot expands into this huge mass of other red dots And that is actually tracking the virus as it spreads into North America Now, if we go to another part of the tree, so this basic tree here, the Washington state case that we just talked about, cases down here, and we can then expand the tree to look a little bit deeper, a little bit further out Again, these are all of the mutations in purple that are coming from Asia And then we see here, this branch, which is a branch that is coming from the larger Wuhan group, in other words, the larger place where this virus originated in China And we see that it’s expanding into Europe So, basically, this particular branch here is the first cases in green that are coming into Europe And we’ve been able to do this We’ve been able to actually look at the geospatial mapping of the disease in this way simply because we have three things We’ve got the time that the genome is actually sequenced We have an idea of when the person whose genome has been sequenced actually got sick We have the genome itself In other words, we have a real tracking of all the mutations and all of the various nucleotides and amino acids in the virus itself And then we have where the person was So, we’ve got these three things which allow us to basically trace a network of transmission around
the world Now, if we blow this up again, down here we see that original part that we just showed, originating in France And then we see this branch here spreading out into Europe, and these are European cases, and, of course, the red are North American cases But then we see these particular branches up at the top And so, what this actually tells us is these are the actual mutations that are coming into New York The sequences on the right-hand side, these dot sequences, are from New York City, as are some of the cases down here

And so, what this basically has allowed us to ascertain is that there are two introductions of the coronavirus into the United States There’s one in Washington, Washington state, coming directly from China, from the Chinese cases and the Asian cases And then we see another group, which is actually coming from the European branch, and so we see these two independent introductions At the time of the European introduction, there had been flight restrictions coming from China, but there were no travel restrictions coming from Europe So, we see this introduction coming from a European source into New York So, using the phylogenetic tree, using the mutations, and using the geospatial data associated with those, we’ve actually been able to determine that there are these two important introductions into the US Now we can blow this up even more And really, this is just to give you an idea of how complex a problem tracing and tracking the virus has actually been So, we can see here, we’ve shrunk the tree back down And there is that case, that first case coming into the United States in Washington State You see a little bit below that, also from the Asian branches, there’s two introductions into Illinois and one into Arizona But if we really expand that Washington state case, we can see here this entire sequence of related places that these genomes came from and went to So, we can see Texas We can see Ecuador We can see New Zealand, all of which are associated with this group of mutations that are the same as the Washington state case So, in trying to track how the virus moved around the world in these early stages, we see really the complexity of travel We see the complexity of how people in the early stages of the pandemic were mixing And this is something that right now we just have so much data that we’re trying to look at, it’s going to be a long time before cartographers and
epidemiologists really get a handle on exactly how the virus moved during these early parts of the pandemic What’s important now is we have so much data, so much geospatial data, and so much possibility for mapping the genome, that we’re going to find out a lot of information about this pandemic, more than we’ve probably had for any other outbreak of any other disease in history And so, it’s going to be a long sequence of investigations But this is kind of the data that we’re working through right now Now, this can be even more fine grained When we talk about the introduction into Washington, that first introduction, we can see there’s actually even two introductions into actual Washington state There’s one that comes out of a very large outbreak, which basically had 384 sample genomes after it And then a second introduction, which is a smaller outbreak, a smaller phylogenetic tree, as you can see, with 39 samples And in this case, I’m not looking at time down the bottom here, but actually looking at the number of mutations And really, this allows us to kind of follow the trail of each of these introductions as it spreads, as it fans out across the country and across the world And so, when we start looking at these and we start looking at how it travels from Washington state and then moves across North America, it’s a very complicated network But we have the data to actually be able to figure it out And that’s really what we’re kind of mapping at this point Now, why is this important? Why is it important to really track and map the mutations?
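To make the idea of following an introduction by its mutations a little more concrete, here is a minimal sketch in Python. It assumes short, made-up aligned sequences and invented sample names (none of this is real GISAID data), counts each sample's differences from a reference, and then lists the samples that carry every mutation of a chosen founder sample. Real phylogenetic inference is statistical and far more involved; this only illustrates the principle that inherited mutations travel with the virus.

```python
# Toy aligned sequences (made up for illustration); real SARS-CoV-2
# genomes are roughly 30,000 nucleotides long.
REFERENCE = "ACGTACGTAC"
SAMPLES = {
    "asia_1": "ACGTACGTAC",
    "europe_1": "ACGTACTTAC",
    "washington_1": "ACGAACGTAC",
    "washington_2": "ACGAACGTCC",
    "texas_1": "TCGAACGTCC",
}


def mutations(reference, sample):
    """Return the set of (1-based position, reference base, sample base) differences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return {
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    }


def carriers_of(founder, samples, reference=REFERENCE):
    """Names of samples that carry every mutation seen in the founder sample."""
    founder_muts = mutations(reference, samples[founder])
    return sorted(
        name
        for name, seq in samples.items()
        if founder_muts and founder_muts <= mutations(reference, seq)
    )


if __name__ == "__main__":
    # Order samples by mutation count, as on a tree drawn against mutations
    # rather than against time.
    for name in sorted(SAMPLES, key=lambda n: len(mutations(REFERENCE, SAMPLES[n]))):
        print(name, len(mutations(REFERENCE, SAMPLES[name])))
    print(carriers_of("washington_1", SAMPLES))
```

In this toy data the hypothetical Texas sample carries the founder's mutation plus two more, which is the kind of pattern that links places like Texas, Ecuador, and New Zealand back to the Washington state case.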
Well, one of the things that COVID-19 is, it seems to be optimized in its jump from bats or pangolins, or whatever animal it actually made the jump from, it seems to be optimized to bind to what’s called an ACE2 human receptor And this is the way that the virus actually links itself up with a human cell And this linkage happens when the virion, one of these little pieces of virus, actually links itself up with a human cell And it does that by attaching its spike protein, and these are the spikes

And there’s particular proteins that are in these spikes, and it also happens to be one of the most variable parts of the actual genome This is a computer reconstruction, a molecular reconstruction of what the molecular structure of the coronavirus looks like as it is basically coming to attach itself to this ACE2 receptor in a human cell Now, one of the things we’ve been able to look at is a particular mutation in this area So, this is the actual spike proteins And one of the things that’s been happening is there’s been mutations occurring in this particular region of the genome, in other words, in that structure, the spike proteins And in the last couple of weeks, there have been a couple of very important papers on a mutation called D614G And this is basically an amino acid change that occurred at the 614 location in the spike protein And it’s been a very important mutation to track, simply because it doesn’t seem to be affecting the mortality or the severity of the disease, but it does seem to be giving us hints that it is allowing the virus to attach to human cells in a much more efficient manner And where this actually occurs, so this is a blow up of the spike protein And so, it’s occurring at some very important locations in the spike protein This is where the receptor binding domain is And these top proteins here, these top areas are really where the first contact of the virion comes when it attaches itself to a cell Now, if we decide to map the genome and this particular mutation, what we can see is that, now, we haven’t colored this based on geography I’ve colored it based on the switch in mutation So, the turquoise at the bottom keeps the D amino acid And then at the 614 location when it mutates, we see the yellow here And this is really important because what we’ve been able to look at is when we look at the cases and you look at this, you can see that there’s a huge number
of cases after it’s expanded out of Asia This encompasses almost all of the European and North American outbreak And what we see, and we can see this in the pattern of actual cases, is that in the early part of the pandemic, we don’t see very many cases with this mutation But then when we get into July, we see a huge spike in this mutation traveling through the cases that we’ve been able to look at And so, in mapping these mutations, we can start to look at exactly the way the disease has changed, where it has changed, how it has spread, and then some of the important mutations that have been occurring around the world and really what their effect has been on the transmissibility of the disease itself And as I said, all of this has been done by the combination of really complicated phylogenetics and GIS and mapping resources It’s important to understand that these are in fact models We don’t have all of the data So, all the conclusions that I’m drawing are based on the fact that I’ve only looked at a certain percentage of the genomes Obviously, not every person who’s had coronavirus has had their genome sequenced And we’re really only talking now about 150,000 complete genome sequences of the millions of cases So, we’re working with limited data But we think the trends that we’re looking at are predictive and representative of what’s going on in the cases around the world Now, the last thing I’m going to mention is the fact that we see really five basic clades In other words, five basic overall families of mutations that have occurred during the pandemic up until now There’s the two branches down here, the one in turquoise and the one in blue, which are really prevalent in Asia during the first months of the pandemic Then we have this major clade, which is called 20A, and this is where the European outbreak begins And we really see two European outbreaks, a really large one and then a smaller one that happens a little bit later
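The rise of D614G described above can be tabulated as the share of sequenced genomes that carry G rather than D at spike position 614 in each month. The sketch below is only illustrative: the records are invented, the function names are hypothetical, and the label parser simply reads "D614G" as original amino acid, position, new amino acid.

```python
import re
from collections import defaultdict


def parse_substitution(label):
    """Split a label such as 'D614G' into (original residue, position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    old, position, new = match.groups()
    return old, int(position), new


# Hypothetical records: (collection month, amino acid at spike position 614).
RECORDS = [
    ("2020-02", "D"), ("2020-02", "D"), ("2020-02", "D"), ("2020-02", "G"),
    ("2020-04", "D"), ("2020-04", "G"), ("2020-04", "G"), ("2020-04", "G"),
    ("2020-07", "G"), ("2020-07", "G"), ("2020-07", "G"), ("2020-07", "G"),
]


def variant_share_by_month(records, label="D614G"):
    """Fraction of samples per month that carry the new residue."""
    old, _position, new = parse_substitution(label)
    counts = defaultdict(lambda: {old: 0, new: 0})
    for month, residue in records:
        if residue in (old, new):  # ignore ambiguous or other residues
            counts[month][residue] += 1
    return {
        month: tally[new] / (tally[old] + tally[new])
        for month, tally in sorted(counts.items())
    }


if __name__ == "__main__":
    for month, share in variant_share_by_month(RECORDS).items():
        print(month, f"{share:.0%}")
```

Because only a fraction of cases are ever sequenced, a share computed this way describes the sampled genomes, not every infection.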

And then, of course, up here called 20C is the North American outbreak Now, when we map these, when we actually map the representation of these clades across the planet, which is shown on the left-hand side here, we can see, for instance, in the United States that this particular group of mutations is, of course, dominant in the United States, but we still have representative mutations of all of the other clades in the US and pretty much around the world You can see that even though these clades began and are somewhat geographically isolated and prevalent in particular areas, they have mixed across the entire spectrum of the world So, this is really where we are right now in the research and in the network mapping of the coronavirus based on mutations As I said, it’s a very complicated and data rich area of research right now We’re running lots of different kinds of mathematical models and Bayesian analysis on this There’s some machine learning algorithms that have started to be worked into the statistics, so we can map it a little bit more precisely But in effect, at this point, obviously, the cases are growing The data is growing, and we, at this point, see no end in sight to the analysis and data that we have available And I want to thank everyone for listening And I want to thank you all for coming in and participating in the Library of Congress’s GIS Day for 2020 Thanks very much
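As a companion to the clade maps described in the talk, the following sketch shows one way the share of each clade per region could be tabulated. The sample list is invented for illustration. Only 20A and 20C are named in the talk; the other labels (19A, 19B, 20B) follow Nextstrain's clade naming from that period and are an assumption here, as are the function names.

```python
from collections import Counter, defaultdict

# Hypothetical (region, clade) pairs, one per sequenced genome.
SAMPLES = [
    ("North America", "20C"), ("North America", "20C"),
    ("North America", "20A"), ("North America", "19B"),
    ("Europe", "20A"), ("Europe", "20A"),
    ("Europe", "20B"), ("Europe", "20C"),
    ("Asia", "19A"), ("Asia", "19A"),
    ("Asia", "19B"), ("Asia", "20A"),
]


def clade_shares(samples):
    """Return {region: {clade: fraction of that region's samples}}."""
    by_region = defaultdict(Counter)
    for region, clade in samples:
        by_region[region][clade] += 1
    shares = {}
    for region, counts in by_region.items():
        total = sum(counts.values())
        shares[region] = {clade: n / total for clade, n in counts.items()}
    return shares


def dominant_clade(shares):
    """Return the most common clade in each region."""
    return {region: max(s, key=s.get) for region, s in shares.items()}


if __name__ == "__main__":
    shares = clade_shares(SAMPLES)
    for region, clade in dominant_clade(shares).items():
        print(region, clade, f"{shares[region][clade]:.0%}")
```

Even in this toy data each region has one dominant clade but also contains others, which mirrors the mixing across regions that the maps show.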