Azure Responds to COVID-19

>> Hi. My name is Mark Russinovich. I'm Chief Technology Officer at Microsoft Azure. I'm going to give you a behind-the-scenes look at how Azure has responded, and continues to respond, to the COVID-19 crisis. This crisis truly is a humanitarian crisis. Tragedy has been unfolding all around us. At the same time, it's forced just about everybody on the planet to adapt to new ways of living, whether that means working from home instead of going into an office, teaching and learning online instead of going into a classroom, or staying connected with friends and family virtually rather than physically. The rise of digital technology usage has been dramatic, and I think it promises changes that are here to stay. It's also had a huge impact on our health care systems, and the rise of digital technologies to help us and them respond has been instrumental. Huge amounts of data are being collected all the time that we're rushing to analyze to better understand how the virus is spreading, how to mitigate its spread, and how to provide the most effective treatment options, and we're, of course, in a race for cures. Throughout the crisis, Microsoft and Azure have been focused on prioritizing our most critical customers. Those customers, of course, are the ones at the front lines of this crisis as it's been unfolding: the first responders, the doctors and nurses in the hospitals, the emergency management services, and the critical government infrastructure that we rely on to keep society operating. From Microsoft's perspective, we want to make sure that our technologies can help people adapt to this new world of living, learning, and working online. So in this presentation, I'm going to start by talking a little bit about some of our Azure for Good programs, those programs where we donate resources and expertise to help people that are working to find ways to address the crisis, then I'll talk about 
our response framework. As people shifted the way they work and live to online modes, we saw a huge demand for Azure resources and Microsoft service resources. So I'll talk a little bit about the response framework we applied, which is meeting immediate demand and then optimizing. Then I'll get into some specifics to show you just how that surge impacted our services, including our global WAN and our networking services, and how we responded to increase network capacity; how we saw huge spikes in usage for Windows Virtual Desktop and Teams, and how we responded to scale up and then optimize the usage of those resources so that we could have Azure resources for use by critical customers in other domains; and how we also applied security services and expertise to help our customers respond to new threats, threats from actors trying to take advantage of the situation. Then I'm going to conclude by taking a look at confidential computing, a technology that's been innovated in Azure and that we've been working on for several years, which could help combine different datasets in service of getting better insights into all those ways that we deal with the virus. Let's start by taking a look at Azure for Good. Azure for Good is founded on four guiding principles, just like Microsoft's For Good programs. We don't want to overwhelm the people that we want to help. We want to be outcome-driven, focused on the most urgent priorities; in the case of COVID-19, that means flattening the curve and improving health care system capacity. We also want to make sure that we're bringing truly unique Microsoft value to affect outcomes. Then finally, we want to be open and collaborative so we can involve and participate with the whole community. One of the projects we launched is an employee hackathon. We had over 1,000 employees participate, delivering over 180 different projects across a variety of different areas. Some of those projects have actually already made it into production. The one 
you're looking at here came out of a 10-person team in Taiwan. This one is focused on helping hospitals detect when people who might be at risk are coming into the hospital, those people that have high temperatures and aren't wearing masks. It consists of AI models that detect face masks as well as high temperatures. It's connected up to Azure so that it connects to apps on PCs and on phones to let health care workers know when somebody's entering that might be at risk. It's actually been deployed at the 300-bed Cardinal Tien Hospital in Taipei. Since its deployment, the lines at that hospital to get in are much, much shorter, now that this automated system can make sure that people are coming in safely. We've also been developing and delivering AI health bots. These health bots are built on top of a bot service,
which is an Azure app for building AI bots that interact with people in a conversational interface through websites or applications. This frees up doctors, nurses, administrators, and other health care professionals to provide critical care, because patients and potential patients can interact with the bot to get answers to their questions. You can see two of the health care bots here that we've deployed: a CDC self-checker bot that you can use to see if you might be at risk of having the virus, and a COVID-19 plasma bot, which can be used to see if you're a candidate for donating plasma to help other people. We've had over 1,300 deployments of the AI bot, including at the UAE Ministry of Health and in the state of Washington. Just like the other services that I'll be talking about here, this is built entirely on top of Azure. You can see here the architecture of the AI bot service, which consists of SDKs at the heart of the AI bot service that makes use of other Azure services like Azure Kubernetes Service and Cosmos DB, and then connects out to apps and websites, like I mentioned, that provide the interface and hosting for interacting with the bot service. One of our For Good programs is donating resources and expertise, in this case to ImmunityBio, a late-stage biotech company that's focused on immunotherapy programs in oncology and infectious diseases. They turned their attention to trying to understand the COVID-19 spike protein, which opens up and binds to the ACE2 receptor on cells in the respiratory tract to cause an infection. By understanding how the spike works, that could lead to vaccines and a better understanding of how the virus might mutate. We donated 1,800 NVIDIA V100 Tensor Core GPUs, which together provided 24 petaflops of computing power. This allowed them to simulate 198 microseconds of molecular dynamics modeling of that spike in a single day. Now, that sounds like a long time for a short period of modeling time, but what it let them do was 
process in two months what would have taken them three and a half years on their existing resources. In a similar way, the Folding@home project, which came out of Stanford to leverage donated compute resources from home PCs to study proteins, focused its resources on COVID-19. It's arguably the only exaflop computer in the world because of all the donated compute capacity, including compute capacity from Azure. We donated hundreds of GPUs to help them also understand the spike protein. You can see right here that they were able to model the spike protein opening up, which you can see it doing in the video there as it attaches. I also made available, if you're interested in donating your compute resources in Azure, an ARM template, an Azure Resource Manager template, that you can deploy into Azure and that has the Folding@home client installed, so that the VMs you donate can participate in furthering understanding of the virus. Another For Good program is in the Teams space. Teams is, of course, used in online learning and collaboration, but we also see potential for Teams to be used in events like graduations. Faculty with the A1 faculty license for Office 365 can host graduation ceremonies on top of the Teams platform, including with Minecraft-created campuses. It's free for up to 20,000 attendees at a live graduation event, up from the 10,000 limit that we've previously had in Teams, for a short time to support these graduations that are now going to be remote. Now let's talk a little bit about our response framework. We published a series of blog posts on the way that we responded. Like I mentioned, this talk is going to go one level deeper to show you, behind the scenes on the engineering side, how we responded. When it came to the engineering teams and the response, the first thing that we absolutely had to do was meet the incredible spike in demand that we saw for cloud services of all types, from IaaS to PaaS to SaaS. As lockdowns went into effect, people started to work from 
home and teach and learn from home. We saw Xbox spike; we saw Teams, of course, spike. Our first goal was therefore to meet that demand by addressing capacity in those regions that were the hardest hit. Now, after we scaled our services to meet that demand, the next thing we did was start to forecast the demand that we'd see if the spike continued at the rate it was on. We wanted to make sure that we were prepared for that potential case, so we immediately started to implement brownout controls
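Conceptually, a brownout control is a load-triggered switch that sheds non-essential work as demand climbs. Here's a minimal Python sketch of the pattern; the thresholds and feature names are invented for illustration and aren't the actual Azure implementation.

```python
# Hypothetical brownout control: as measured load crosses successive
# thresholds, disable progressively more non-essential features to shed cost.

BROWNOUT_LEVELS = [
    # (load as a fraction of capacity, features disabled at that level)
    (0.80, {"typing_indicators"}),
    (0.90, {"typing_indicators", "read_receipts"}),
    (0.95, {"typing_indicators", "read_receipts", "presence_refresh"}),
]

def disabled_features(load: float) -> set:
    """Return the set of features to switch off at the given load level."""
    disabled = set()
    for threshold, features in BROWNOUT_LEVELS:
        if load >= threshold:
            disabled = features  # each level's set includes the lower levels
    return disabled

print(disabled_features(0.70))  # empty set: normal operation
print(disabled_features(0.92))  # typing indicators and read receipts off
```

The key property is that the controls are reversible: once load falls back below a threshold, the features come back on without a deployment.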

I'll talk about some of them as we go along. Now, most of these brownout controls we never had to activate, but we had them there just in case the demand exceeded our ability to keep up with capacity. Then finally, we went and started optimizing services. We wanted to free up capacity as customers continued to adopt our cloud services, to make room for them. So we went into services like Xbox, Teams, Windows Virtual Desktop, and others that I'll talk about, and optimized their CPU usage as well as their cloud service resource usage. We also shifted their loads from the hot regions into ones that had available capacity, so that our customers would have capacity in those hot regions. The first area we'll take a look at that we applied the framework to is our network, and our network is one of the largest in the world. We've got over 160,000 miles of fiber and subsea cable connecting our 61 regions together. When a network packet enters our network in one region and traverses to another one around the world, it stays completely on that dark fiber network that is owned and managed by us. To connect our network to the outside world, we've got over 170 edge sites, sites where traffic comes in from ISPs and corporate networks into our backbone. Within our data centers themselves, we have over one million miles of fiber connecting our servers together. The number of regions we've got is something that is differentiating in the cloud, and we continue to add them all the time. Just in the last two months or so, we've added three new regions: one in Italy, one in Poland, and one in New Zealand. One of the services that saw an incredible spike in usage is our Azure virtual private network service, Azure VPN. VPN services allow corporate users to connect into their Azure networks, which then connect to their corporate networks securely over encrypted channels, leveraging Microsoft's backbone to talk between their edge sites themselves, their corporate sites that are in different parts of the world, 
or from their home PCs. We've got multiple VPN applications, clients that customers can download, and we saw a 700 percent increase; you can see that here on this chart, which tracks downloads of those VPN clients over time, starting in February and going into mid-March, and we saw 94 percent growth in connections to the Azure VPN service during that time. We also saw incredible growth in WAN usage. Traffic connecting our regions together and connecting our regions to the outside world spiked by 40x, and we saw, coincident with that, a 50 percent increase in DDoS attacks, where malicious actors were trying to take down our corporate customers, which we defended through the Azure DDoS service. Fascinatingly, you can see a direct correlation between lockdowns around the world and an increase in WAN usage. Here on this graph, you can see China, where Wuhan locked down on January 23rd, immediately followed by a surge in WAN traffic that continued to grow out through May. Similarly, in Italy, you can see some WAN usage leading up to the March 9th lockdown date and, immediately after, a 300 percent and growing increase in WAN usage. Then finally, in the United States, you can see WAN usage that immediately jumped, by 60 percent in some cases, after the US lockdown in early March. To meet this demand, we scaled out our WAN. We started by adding 12 new edge sites around the world so that customer traffic could enter our backbone as quickly as possible and not have to traverse between regions. We also increased our peering capacity by 25 percent, which meant signing contracts with ISPs and deploying network gear to expand the capacity into our networks. You can see on this chart that the growth is pretty modest up until the US lockdown and then continues to accelerate as other countries began to lock down and network traffic grew. In total, we added 110 terabits of increased capacity to our WAN in less than two months. Just to give you a 
concrete example of how we increased our WAN bandwidth, here you can see the four network cables that we've got connecting the European and US continents. One of them, the Marea cable, is one that I've talked about before, one of the largest subsea cables on the planet in terms of the bandwidth that it supports. If you take a look at the network traffic on the Marea cable, you can see that it's pretty constant right up until the US and Europe lockdowns, and then we saw a massive spike, and so we needed to increase capacity on other cables. The AEC cable that you can also see in the picture is the one that we focused on

At the time we ran into this surge in demand, the AEC cable was closed, meaning we didn't even have the physical ability to add extra bandwidth to it. What we required was networking gear on either side that supports dense wavelength division multiplexing, or DWDM, traffic, which is what traverses over those cables. What we did was take new DWDM equipment that was being deployed for another project in another region of the world, borrowing it to expand the AEC cable and open it up on both ends. We did that in less than six weeks and were able to expand the capacity of the AEC cable, which relieved pressure on the Marea cable. The other thing that we did to optimize our WAN traffic in the optimization phase was to load balance traffic. If you take a look at all the ways traffic flows into and out of the WAN, we've got the Internet, we've got our edge sites, and we've got our datacenters connecting our regions together. We have a tool called ORCAS that we've developed that lets us model network traffic, taking into account aspects like constraints and forecasts on network bandwidth usage, time-of-day changes and shifts in network usage, and failure modeling: what happens if a link goes down? What happens to the traffic, and how do we reshape it? 
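The core of that kind of modeling can be sketched in a few lines: given link capacities and current loads, shift demand off over-utilized links onto links with headroom until everything sits below a target utilization. This is a deliberately toy Python sketch; the link names, numbers, and greedy logic are invented for illustration and bear no relation to the real ORCAS model.

```python
# Toy traffic rebalancer: move load from over-utilized links to links with
# spare capacity until every link is at or below a target utilization.

def rebalance(links, target=0.8):
    """links: dict name -> {"capacity": Gbps, "load": Gbps}. Mutates loads in
    place and returns the total Gbps shifted."""
    moved = 0.0
    for name, link in links.items():
        excess = link["load"] - target * link["capacity"]
        if excess <= 0:
            continue  # this link is already healthy
        for other, o in links.items():
            if other == name:
                continue
            headroom = target * o["capacity"] - o["load"]
            if headroom <= 0:
                continue
            shift = min(excess, headroom)  # shift as much as fits
            link["load"] -= shift
            o["load"] += shift
            moved += shift
            excess -= shift
            if excess <= 0:
                break
    return moved

links = {
    "marea": {"capacity": 100.0, "load": 95.0},  # hot transatlantic link
    "aec":   {"capacity": 100.0, "load": 40.0},  # has headroom
}
rebalance(links)
print(links["marea"]["load"])  # 80.0: back at the 80% utilization target
```

A production tool additionally has to respect latency penalties, forecasted growth, and failure scenarios, which is what makes a dedicated modeling system like ORCAS necessary.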
You can see from the picture that we had many links that were incredibly highly utilized. Using ORCAS, we were able to model changes to the network flows across regions, deprioritizing traffic and shifting traffic to different times of day, and with that tool we were able to smooth the usage across those links and get everything into a healthy state. One of the services we relied on to help manage our traffic across our WAN is Azure Traffic Manager, a native Azure service. Azure Traffic Manager lets you create DNS-based profiles to route traffic from outside of Azure to the regions that you want it to go to. For example, your users in Ireland can have their requests routed to a deployment of your application in our Irish regions, while a user in the UK would get routed to a UK region. As the spike happened, we saw a 46 percent month-over-month increase from January to March in ATM usage. What this meant is that we needed to expand capacity across regions for ATM. The team was spending morning and night checking demand, working with our services inside the company to help them shape their traffic across our regions. One of those teams was the Teams service itself. When you look at this chart, you can see that as users wake up and go to work after a weekend, there are spikes right at the beginning of the morning as they log in and start Teams calls to talk about what they're going to do in the day, and the spikes are therefore offset by several hours across regions because of the time zones they're in. To relieve pressure on our WAN and our hottest regions during the start of the morning, the Teams service, working with the ATM service, came up with a plan to shift some subset of Teams traffic from the hot regions to a more lightly loaded region nearby. What they realized as they were coming up with this plan is that they were missing the ATM feature they wanted for this, which is a time-based profile, one where you could say, at eight o'clock in the morning, 
shift 20 percent of the traffic that would normally go to this region to another one. So they came up with an innovative idea, which is to use another Azure service, Azure Automation runbooks, to create a scheduled task that would update the profile at the appropriate times, and with this they were able to help smooth traffic across our WAN and relieve pressure on those hot regions. Now let's talk about how we optimized some of our services and how we put in kill switches and brownout switches to help relieve capacity. The first one is Microsoft Teams, the service that saw the highest spike in usage, achieving 75 million daily active users, 200 million daily meeting participants, and 4.1 billion daily meeting minutes. You can see very directly on this chart exactly how lockdowns started to shift people into making heavy use of Teams. Now, here you can see the Teams architecture. Teams is 100 percent built on top of native Azure services and tightly integrated with Office 365. You can see the clients across the top and the core Teams services in the middle of the picture. It leverages microservices all around it, from Office, from Azure Active Directory, and its own microservices that manage chats and video streams. One of the microservices that they focused
on was the chat and presence microservice. This is the one that's responsible for letting people know when other people are active, for example, typing; it's the one that will show the animation on the three dots to show that they're actually typing something. To relieve CPU usage of that service, they decreased the refresh rate, meaning they reduced the polling rate at which they would check to see if somebody was actually typing. They also added a kill switch for read receipts, which is the indication that somebody has actually received and read a message that you've typed to them. The kill switch was used in the hardest-hit regions at peak load to reduce CPU usage. In fact, both of these kill switches, for typing indicators and read receipts, enabled a 30 percent reduction in cores during peak load. The next service that they focused on was Exchange. Now, Exchange is used by Teams to fetch calendar information. They added a kill switch to stop fetching the next week's calendar, and that reduced calls to their Azure Active Directory Graph service by 25 percent, reducing load on Office 365. They also implemented code optimizations; for example, there's a service that relies on a Redis cache to keep track of chats, and that Redis cache had a deployment per data center. They re-architected that to have one Redis cache for an entire region, and that gave a much higher hit rate for caching, which gave back a substantial number of cores. Lastly, in the logging and metrics area, they switched to a last-known-good solution for metrics and heartbeats, which saved two orders of magnitude of logging volume. Then, like I mentioned, they leveraged the Azure Traffic Manager time-based profiles to shift load between regions and reduce utilization of our WAN. The next service that we optimized was Windows Virtual Desktop, which also saw a giant spike in usage as lockdowns happened and corporations went from working on PCs in an office environment to working on virtual desktops in the cloud. We 
saw 3x growth in just four weeks, and Microsoft itself became a heavy user of Windows Virtual Desktop to support our employees. Windows Virtual Desktop is built 100 percent on top of Azure, just like Teams is; you can see the architecture for it here. To optimize usage and to meet demand, one of the first things we had to do was scale out the cluster partition, which is how traffic enters the Windows Virtual Desktop service and where the RDP traffic flows through. So we added more gateways and front ends per cluster, we added additional clusters per region, and we deployed to more regions, with more regions coming, because customers want Windows Virtual Desktop servers to be in the regions that their corporations are in. We also optimized, just like the Teams service did. One of the services that was being hit the hardest is an Azure SQL Database instance that stores information about which Windows Virtual Desktop sessions are active. That service had inefficiencies in the way its indexes were organized, and so by optimizing the indexes, they were able to drastically reduce the CPU load on the Azure SQL Database and avoid having to scale out. The team also made other optimizations, like leveraging the read-only replicas of that Azure SQL Database instance to serve queries instead of going to the primary all the time; that also helped reduce CPU load. Then finally, they increased client-side caches on the front-end services so that they could avoid going to the SQL database altogether. Finally, they also rebalanced traffic routing to nearby regions to help reduce the load on the WAN. Now, another service that I'll talk about is Xbox. The Xbox Live service is one that people are turning to in order to socially connect, to play online, and also to entertain themselves. We've seen, since the lockdowns, a 50 percent increase in multiplayer usage and a 30 percent increase in concurrent usage, meaning the times when people are playing at the same time. We've also seen a 50 percent increase 
in daily new accounts. All of this, when you chart it on time charts just like the ones I've shown previously, correlates very strongly with when countries went into lockdown. For example, here you can see the city of Wuhan's lockdown, after which usage of Xbox Live apps and games went up to over 180 percent. In Italy you see the same pattern: lockdown happens, and usage went up to more than 180 percent. Then finally, in the United States, usage continued to grow after that. Another way of looking at it is the number of daily active users, the number of users that are playing
Xbox games every day consistently. You can see the same thing happened here, where the number of people playing Xbox games every single day has grown to a new level. Now, Xbox is a little bit different than our other services in that we're able to move microservices out of a region to other regions, potentially in different geographies. For example, we shifted our authentication and authorization service back to the US East region from Dublin in order to relieve pressure on the Dublin region. In Asia and the EU, we limited our capacity footprints for PlayFab multiplayer party chat and game sessions by moving meaningful amounts of the user activity to US East and US West. We also scaled out to meet demand, because the Xbox Live service built on Azure is a microservice-based architecture, which allowed us to do targeted scale-out of the busiest services. The result of these optimizations was that we returned tens of thousands of cores back to Azure for use by our other customers. In addition, we wanted to make sure that Xbox traffic from Azure to home environments wasn't competing with corporate traffic or teaching and learning traffic. So what we did was work with ISPs, regulators, and the game publishers to ensure that game updates and releases were only happening outside business hours in the particular regions that they were targeting. Now let's take a look at how Azure responded from a security perspective. Of course, there were many new challenges that our corporate users faced as they shifted from an office environment to working from home, as well as for IT in helping manage this: unmanaged user devices; new phishing and ransomware attacks that were taking advantage of the COVID situation; the requirement to collaborate remotely, but yet do it securely; scaling VPN endpoints and monitoring the traffic across those VPN endpoints; and also leveraging internal business applications from outside the corporate firewall. During this increase in usage of cloud services, we achieved 300 
million active Azure Active Directory users. We saw 30 percent growth in authentication requests, and we saw 21 percent growth in third-party SaaS app monthly active users, meaning the number of users leveraging third-party SaaS apps and authenticating to them from Azure Active Directory. You can see here the Azure Active Directory architecture, which consists of a number of different core services. One of them that's incredibly relevant in a work-from-home environment, where we need to be even more secure than ever, is the multi-factor authentication service. We've seen that multi-factor authentication blocks 99.9 percent of identity attacks, and it's not surprising, therefore, that we saw a 250 percent spike in usage of MFA. One of the services that you see in the picture is the application proxy service, another service that saw incredible growth in usage as lockdowns happened. What application proxy does is allow you to publish web services and line-of-business apps from inside your corporate firewall so that they're accessible from outside, using Azure Active Directory as the way to authenticate to them. You can see here on this time-based chart, just like the other ones, that after the US lockdown in early March, we saw a 100 percent increase in global traffic and a 40 percent increase in monthly active users of this service. Another security-related service that saw a massive spike in growth is Microsoft Cloud App Security, Microsoft's cloud access security broker service. This one, instead of making your line-of-business apps available through Azure Active Directory authentication from outside the corporate firewall, actually makes cloud services available to your corporate users through a gateway that controls and monitors their access. It also includes a discovery system that allows you to see what cloud apps your corporate users are using, but where we saw the big spike in growth was in that gateway capability. That gateway capability grew by 100 percent month over month 
in proxy load. You can see here the MCAS proxy architecture from before we started to see this giant spike in demand; it has completely independent deployments across our different datacenters. The team had already been on a journey to re-architect it to be more efficient and, instead of duplicating resources, consolidate them at the gateways, so that traffic could go to the appropriate datacenters and the appropriate regions, and we could
leverage the scaled-out front ends across different regions and that way relieve pressure on the hottest regions. To support that 100 percent growth, we added four new proxy regions. By consolidating the architecture in the way we did, we were able to reduce resource consumption by 75 percent, and what that meant is that all those freed cores got returned to Azure customers. I mentioned before that we saw an increase in denial-of-service attacks on our customers. We also saw new campaigns that were COVID-related, and we saw new techniques being deployed, like Excel VBA stomping, which is a new technique in malicious documents to try to get users to execute code. In all, we've seen thirty new crime families so far. In order to protect customers from these malicious attachments, whether they're documents or remote access trojans that attackers would then use to gain control of a target's machine, we leveraged a service that we internally call Sonar. It's really a detonation service: when a user gets an attachment, or when they try to execute a piece of code on their machine with Microsoft Defender ATP, that piece of code or attachment gets sent to a Microsoft cloud service and deployed into a virtual machine, which is a lockbox where it actually detonates, or runs, and we see what happens. We see if it's talking to malicious websites, we see if it's modifying the system in a malicious way, and that lets us understand: is this malicious or benign? 
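The verdict step of a detonation service can be pictured as scoring the behaviors a sample exhibits in the sandbox. Here's a toy Python sketch of that idea; the behavior names and weights are entirely invented for illustration, and a real service observes far richer signals than this.

```python
# Toy detonation verdict: run a sample in a sandbox, record the behaviors it
# exhibits, then score and classify them. Weights here are made up.

SUSPICIOUS = {
    "contacts_known_bad_domain": 5,
    "modifies_autorun_keys": 3,
    "drops_executable": 2,
    "reads_config_file": 0,  # ordinary, benign behavior
}

def verdict(observed_behaviors, threshold=4):
    """Score the observed sandbox behaviors; 'malicious' at or above
    the threshold, 'benign' below it."""
    score = sum(SUSPICIOUS.get(b, 0) for b in observed_behaviors)
    return "malicious" if score >= threshold else "benign"

print(verdict(["reads_config_file"]))                          # benign
print(verdict(["drops_executable", "modifies_autorun_keys"]))  # malicious
```

Once a sample is judged malicious, its hash becomes a signature that can be pushed out to block the same file everywhere, which is what the next step describes.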
For the malicious ones, we get signatures for them, and then we start to block them and notify our users that they're being attacked; you can see examples of that here, with URLs, attachments, and malicious executables being blocked to protect our customers. In a first, we're also making those signatures available to our customers. Any customer that is using Microsoft ATP, Advanced Threat Protection, or Windows Defender Advanced Threat Protection is automatically protected with these signatures. But in order to help customers that aren't using those services defend themselves against these new threats, we've published the hashes for them in a GitHub repository here, so that they can be incorporated into other security services that you might have deployed. Similarly, we've got a service called Azure Sentinel, which you can think of as a cloud-based SIEM. Sentinel lets you pump data into it from a diversity of sources, including Microsoft's own services, Azure, and Office 365; your own on-premises SIEMs and event logging systems can also pump data into it, and you then get a common security data pool to which you can apply machine learning algorithms and queries that look for specific threats. What we've done is take the queries that are relevant for online collaboration, hacks, and signs of compromise, and publish them so that customers can use them directly on their Sentinel deployments. You can see here a number of different Teams detections, which include new users being added and new applications being added to Teams sites, and we've got an equivalent set of detections for customers that are using Zoom. Now that I've told you how we scaled our services and met demand, I'm going to turn my attention to something slightly different, which is an Azure innovation that has huge potential to help us understand and fight the COVID virus. That technology is confidential computing. To understand it, you need to look at the way that we've been protecting data up to this point, where 
we protect it when it's at rest: when data gets written to disk, it gets encrypted. We protect it when it's in transit: when it's being sent between different servers, we use TLS, for example, to keep it encrypted so that nobody can snoop it on the wire. What we've been missing up to this point are technologies that allow you to protect that data while it's in use. If you consider, for example, an analytics job processing data that's encrypted at rest, as the job reads it into the server to process it, that data ends up being in the clear. What that means is that somebody with physical access to the server might get access to that data, a malicious insider might get access to that data, and the administrator of the server has access to that data. What confidential computing promises is protection against all these threats. It does so by leveraging a technology called trusted execution environments, or TEEs, which are enclaves, or black boxes, into which you can place the code that will process the data. That code, once it's in the black box, can't be accessed from anything outside, not the operating system,

not the hypervisor, not the host OS, not even people with physical access to the servers That means your analytics jobs can run inside that TEE on the data, and yet the data is effectively encrypted while it’s in use

When you look at applying confidential computing to COVID scenarios, a few come to mind One is contact tracing, where you can combine different datasets, for example Bluetooth Low Energy tracking, plus GPS tracking, plus information from travel systems like airlines about where users have been, so that you get a richer view of who might have been infected and of how the virus is spreading One of the concerns with all the different data sources available from healthcare providers, government organizations, and corporations is that privacy concerns limit their willingness, and even their ability, to share those datasets with each other and get deeper insights out of them Then finally, if we could combine these diverse datasets, we could also get better insights into treating the symptoms and understanding the behavior of the virus

Here’s one concrete example that I want to show you of real datasets that can be combined to get deeper insights These are both public datasets You can see a COVID-19 case dataset from the University of Montreal available on GitHub, and on the right side, a dataset from the Radiological Society of North America That one doesn’t have anything to do with COVID, it’s actually pneumonia-labeled chest X-rays But by combining the two, we can increase the accuracy of COVID detections, because they won’t be confused with pneumonia detections When these datasets come from different hospitals or organizations that might be unwilling or unable to share their data with each other, they can still combine them in a completely private, privacy-preserving way using confidential computing So let’s take a look at the architecture of the demo
This slide shows the two hospitals, hospital A and hospital B They each have their own subscriptions with their own datasets Hospital A is where we’re going to run the confidential computing experiment What we’re leveraging is a DCsv2 confidential VM This is built on Intel SGX hardware, SGX being a type of hardware enclave that protects the data while it’s in use, even from physical access to those servers You can see that VM at the center of the picture, and that’s where we’re going to run our ML training algorithm Azure Machine Learning is what we’re going to use to create a learning pipeline that takes the data from those two hospitals and deploys it into that confidential VM, where it gets preprocessed and then trained, and what comes out is a model That model is then encrypted and made available to both hospitals, so they can both take advantage of the deeper insights without having seen each other’s datasets Even though that VM is in hospital A’s subscription, hospital A has no access to hospital B’s data as it’s being computed over in that confidential virtual machine

Now, here I’ve got Visual Studio Code open, and you can see I’ve got several Python scripts The first preprocess script is going to normalize that dataset from the University of Montreal into a set of features Remember, this is the COVID-19 dataset You can see the dataset itself with its metadata, including the different columns with the patient information We’ve got a similar script for hospital B, preprocess script two, and then we have a merge Python script, which takes the results of those two preprocessed output files and merges them into one dataset That dataset is going to be computed over using a Docker container running a version of Alpine Linux 3.1, and that leverages a new capability we’ve built
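The preprocess-and-merge flow just described can be sketched in miniature This is a toy illustration with invented column names, not the actual scripts (the real metadata is far richer):

```python
# Toy stand-ins for the two hospitals' metadata files
covid_rows = [
    {"filename": "c1.png", "finding": "COVID-19"},
    {"filename": "c2.png", "finding": "COVID-19"},
]
pneumonia_rows = [
    {"patientId": "p1", "Target": 1},  # 1 = pneumonia opacity present
    {"patientId": "p2", "Target": 0},
]

def preprocess_covid(rows):
    """Normalize the COVID-19 metadata onto a shared (image, label) schema."""
    return [{"image": r["filename"], "label": "covid"} for r in rows]

def preprocess_pneumonia(rows):
    """Normalize the pneumonia metadata onto the same schema."""
    label = {1: "pneumonia", 0: "normal"}
    return [{"image": r["patientId"] + ".dcm", "label": label[r["Target"]]}
            for r in rows]

def merge(*datasets):
    """Concatenate the normalized datasets into one training table."""
    return [row for ds in datasets for row in ds]

training = merge(preprocess_covid(covid_rows),
                 preprocess_pneumonia(pneumonia_rows))
```

The point is only the shape of the pipeline: two per-source normalization steps feeding one merge step, whose output the training step consumes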
Confidential Linux containers let you place a Linux operating system and Linux-compatible code inside an enclave and run it there What we’re doing in this case is adding TensorFlow and a set of dependencies for the machine learning training into that container When we launch the container, it’s ready to do its part in the AML pipeline, the Azure Machine Learning pipeline that we create with a script That pipeline does the preprocessing, does the merge, does the training, and then produces the output model The next step is to have each hospital encrypt and upload its data That encryption process ensures that nothing but that confidential container has access to the data
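Conceptually, this encrypt-then-release flow looks like the sketch below It simulates the idea with a toy cipher and a fake attestation report, purely for illustration (real deployments use hardware attestation and authenticated encryption such as AES-GCM, and every name here is invented):

```python
import hashlib
import os

def toy_encrypt(key, data):
    """Stand-in for real authenticated encryption: XOR with a
    SHA-256-derived keystream, for illustration only."""
    out = bytearray()
    for i, b in enumerate(data):
        keystream = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.append(b ^ keystream[0])
    return bytes(out)

toy_decrypt = toy_encrypt  # the XOR keystream is its own inverse

# The code measurement the hospital expects the enclave to attest to
EXPECTED_MEASUREMENT = hashlib.sha256(b"trusted-training-container").hexdigest()

def release_key(attestation_report, data_key):
    """Hand the dataset key over only if the enclave's attested code
    measurement matches what the hospital expects (simulated check)."""
    if attestation_report.get("measurement") != EXPECTED_MEASUREMENT:
        raise PermissionError("attestation failed; key withheld")
    return data_key

# Hospital side: encrypt the dataset under a fresh data key before upload
data_key = os.urandom(32)
dataset = b"patient-id,finding\n123,COVID-19\n"
ciphertext = toy_encrypt(data_key, dataset)

# Enclave side: present an attestation report, receive the key, decrypt
report = {"measurement": hashlib.sha256(b"trusted-training-container").hexdigest()}
recovered = toy_decrypt(release_key(report, data_key), ciphertext)
```

The design point is that the key, not the data, is what gets gated on attestation: the ciphertext can sit anywhere, but only an enclave running the expected code can ever decrypt it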

The upload script verifies that it’s talking to that confidential container and then releases a key to it that allows the container to decrypt the hospital’s dataset Here you can see hospital B doing the same thing, and then, after we’ve deployed the AML pipeline, I’ve already run the training job, so you can see green ticks on the various parts of the training pipeline showing that they’ve completed Once they’re completed, we have our confidential model We don’t necessarily want to make that model available to anybody in the clear, but we can release it to the hospitals and also have it run inside confidential enclaves, and then they can submit queries to it, including chest X-rays, to ask the model, is it COVID-19 or not? What you can see here is that we’ve deployed the model and gotten back estimates on whether those chest X-rays show pneumonia or COVID-19 So again, what we’ve just seen is two fictitious hospitals sharing datasets that are different, coming from different sources, in a way where neither hospital got access to the other’s data, and yet both were able to take advantage of the better predictive insights that resulted from combining them

So that brings me to the conclusion of the presentation Azure was really architected to scale like this The people, the architecture, and the processes that we’ve got in place, the 61 regions that we have, and the trusted and compliant processes we put in place to meet customer demand and understand the needs of our customers The 200 services, the hardware and supply chain innovations, and the elbow grease we applied, for example when we pulled in DWDM hardware borrowed from another region to expand our WAN and add capacity on our subsea cables, are all built on top of the expertise we’ve gained running Azure for the last decade So with that, I want to thank you very much, thank you for having confidence in Microsoft, and I wish
you all health and safety during these challenging times