Superpowers with TensorFlow.js (TF Fall 2020 Updates)

JASON MAYES: Hello, everyone. I’m Jason Mayes, Developer Advocate for TensorFlow.js here at Google, which basically means if you’re using machine learning in JavaScript, there’s a good chance our paths will cross. Now, today, we’ll talk about how we can achieve superpowers in the browser with TensorFlow.js. So let’s get started.

Now, the first question you might ask yourself is why would you want to do machine learning in JavaScript in the first place? JavaScript has many unique selling points to consider here, so let’s look at some of those. First, JavaScript enables you to use machine learning anywhere that JavaScript can run. That includes the browser, server side, desktop, mobile, and even Internet of Things based devices. And if we dive into each one of these stacks in more detail, you can see many of the technologies we know and love. On the browser stack, we’ve got modern web browsers. Server side is driven by Node.js, then there’s React Native for mobile native apps, Electron for desktop native apps, and of course Raspberry Pi for IoT. Now, JavaScript is one of the only languages that can run across all of these devices without extra plugins being required, giving you the ability to deploy and run anywhere with one code base. And that’s very powerful stuff.

Now, with TensorFlow.js, you can run, retrain via transfer learning, or write your own models completely from a blank canvas, just like you can do in Python with the original TensorFlow, but in JavaScript. And with this, you can use it for anything you might dream up: things like sound recognition, gesture-based interaction, sentiment analysis, conversational AI, and much, much more.

Now, there are three main ways you can use TensorFlow.js based on your familiarity with machine learning, JavaScript, or both. The first way is to use our pre-trained models. These are easy-to-use JavaScript classes that can be used for many common use cases. There are many situations where you don’t need to train a brand new model from scratch,
and instead you can leverage existing work. So let’s take a look at some of those. Here you can see several popular premade models available with TensorFlow.js today: things like object detection, or body segmentation, the act of classifying each pixel in an image to determine if it belongs to a human body or not. Or what about pose estimation, to understand where the joints in the human skeleton may actually be? There are various natural language processing models, such as sentence encoding or our BERT Q&A model, and many, many more. Let’s see some of them in action.

First up is object detection. This uses a model known as COCO-SSD behind the scenes and is trained on 90 common objects. As such, it can recognize those objects in images and provide us with the location of each object via the bounding box information it also exposes to us, as you can see in the image on the right. Now, notice how it can detect multiple objects and multiple classes of objects at the same time. This is different from image recognition, which understands that something might be in an image but won’t tell us where or how many. And this is why COCO-SSD is super useful. So let’s see it in action with a live demo.

Now, if we [INAUDIBLE] to the web page that I’ve created, you can see COCO-SSD running in the browser. I can click on any one of these images like so, and we can see we get real-time classifications coming back. And notice how it recognizes the dog and the ball and the cup, all different types of objects, with high accuracy and the context of where they are in the image. But we can go even better than this. We can enable the webcam, and we can see what’s happening in real time. And here I am, talking to you live right now today. And you can see it’s recognizing me as a person with 90% confidence, which is pretty damn good. Now, of course, what’s really good to note here is that not only is this working really fast in the web browser with a high frames per second, it’s able to do this all in
the browser on the client side. So none of the webcam imagery is being sent to a server for classification. All the inference is happening locally on the client’s machine. And that’s really useful for privacy, because all their data stays on the client machine, which is very top of mind these days. OK, back to the slides.

Now, next up, we’ve got face mesh. This model is just three megabytes in size and has the ability to recognize 468 facial landmarks on the human face. And not only does this work super robustly, we’re starting to see real-world use cases of people using this in production, too. On the right-hand side, we see a demo by ModiFace, which is part of the L’Oréal group, for AR makeup try-on. In the image on the right, it should be noted that the lady is not actually wearing any lipstick. Instead, this demo uses TensorFlow.js combined with WebGL shaders to augment the chosen color onto the person’s lips in real time in the web browser. And let’s see some face mesh in action, because it’s actually a really cool model, so I want to show it to you live today as well. So over to the demo.

OK, so now we can see the demo. And you can see face mesh running live
in the web browser. And on the left-hand side, you can see the machine learning in action, highlighting on my face where it believes the key points to be. But because this is JavaScript, we can go beyond just doing the raw machine learning, and we can combine this with 3D graphics libraries such as Three.js to produce a beautiful point cloud on the right-hand side as well, which you can see me dragging around and rotating in real time at the same time as I’m speaking. And you can see how it reacts and updates in much the same way. JavaScript is very powerful for this kind of stuff. It’s been designed for the presentation of information from day one. And its very, very rich libraries for 3D graphics, data visualization, and so on and so forth enable you to do this kind of stuff with ease. Now, you can also see here that this is running at a good frames per second, around 15 in my browser right now, whilst I’m livestreaming at the same time. So do bear that in mind. However, we can actually switch the back end to different forms, such as WebAssembly to get more performance on the CPU, or we can stay in WebGL to use my graphics card to get better acceleration on that. So it’s up to you what environments you want to execute on. OK, back to the slides.

And next up, we’ve got body segmentation. This model can distinguish 24 body areas across multiple bodies, all in real time. Now, this is really hard to demo live, as I need more space in my bedroom. But notice from the image on the right how the bodies of each person are correctly segmented, with different colors for different body parts. Now even better, we can get the pose estimation too, those lines in blue, to estimate where the skeleton is, so we can do things like gesture recognition and much more. In fact, with this model, we can be super creative, as we’ll see on the next slide.

So with a little bit of imagination, we can emulate some of the superpowers we were promised from sci-fi movies. And I’d like to show you some demos today from both myself and
the community to show this in action. So first off, invisibility, which is more advanced than simply replacing backgrounds with a static image. For that, you wouldn’t even need machine learning. But notice how when I go onto the bed, the bed still deforms in the image on the right as I move around, or how the laptop screen still plays as I move behind it. Now, this prototype uses BodyPix, which we just spoke about in the previous slides, to calculate where the body is not, so it can eventually learn all the background and keep updating parts where it’s safe to do so. Now, even better, this whole prototype was made in just one day using our premade model and runs entirely in the browser, meaning many people can try it out globally, even without having any machine learning background. Simply click a link, and it just works. No images are sent to the server for classification, leading to real-time results.

Or next, what about lasers? Another member of the community from the USA combined his love for WebGL shaders with TensorFlow.js to enable him to shoot lasers from his eyes and mouth, much like Iron Man in the movies. Now, this uses the face mesh model that we just spoke about to run in real time in the browser without issue. And whilst it’s a fun demo, you can imagine using this for a movie launch to amplify reach with creative experiences for fans and much more.

Or how about teleportation?
Combining TensorFlow.js with other emerging web technologies such as WebRTC for real-time communication, A-Frame for web mixed reality, or even Three.js for 3D, we can now create even more beautiful digital experiences, such as teleporting ourselves anywhere in the world in real time. And here, you can see me segmenting myself from the bedroom. I then transmit my segmentation anywhere in the world and recreate myself in a different physical location with WebXR. Now remember, all of this is running in the web browser. No app needs to be installed, leading to a frictionless experience for the end user. And having tried this myself, it really feels much more personal than a regular video call, as I can walk up to the person and hear the audio coming from the correct direction. And maybe next time I’m presenting to you, I’ll be able to do so in your own room like this, as if I was really standing in front of you.

And of course, there are many other delightful creations we can make too, beyond just superpowers. How about this clothing size estimator that I created? Here I’ve created a tool that can estimate your clothing size in under 15 seconds in the web browser to automatically select for you the correct size of clothing on a website. Now, I don’t know about you, but I can never remember my sizes for clothing. And now with this tool, I can simply enter my height, stand facing the camera once and then once to the side, and it can automatically choose for me the correct size of clothing at checkout. And, of course, this means fewer returns and less time wasted overall. This was created in just two days and can potentially be used by anyone with a single click at the point of checkout on a website. And even better, this is running entirely in the web browser. So user privacy is preserved, as no images are sent to a third-party server for classification.
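
[Editor's sketch] The invisibility prototype described earlier in the talk keeps a stored copy of the background and refreshes it only where no body is detected. The core of that idea might look roughly like the following. This is a minimal sketch, not the code behind the demo: the function name, the flat RGBA pixel layout, and the mask format (one entry per pixel, 1 where a person was detected, 0 elsewhere) are all assumptions made for illustration, and no TensorFlow.js call is made here.

```javascript
// Hypothetical helper, not part of TensorFlow.js or BodyPix.
// `background` and `frame` are flat RGBA arrays (4 values per pixel).
// `personMask` is assumed to hold one value per pixel:
// 1 where a person was detected, 0 where it is safe to update.
function updateBackground(background, frame, personMask) {
  for (let i = 0; i < personMask.length; i++) {
    if (personMask[i] === 0) {
      const offset = i * 4;
      // Copy the R, G, B and A channels of this person-free pixel.
      for (let c = 0; c < 4; c++) {
        background[offset + c] = frame[offset + c];
      }
    }
  }
  return background;
}

// Tiny two-pixel example: pixel 0 is covered by a person, pixel 1 is not.
const storedBackground = new Uint8ClampedArray([0, 0, 0, 255, 0, 0, 0, 255]);
const currentFrame = new Uint8ClampedArray([10, 20, 30, 255, 40, 50, 60, 255]);
const mask = new Uint8Array([1, 0]);
updateBackground(storedBackground, currentFrame, mask);
console.log(Array.from(storedBackground)); // [0, 0, 0, 255, 40, 50, 60, 255]
```

Drawing the stored background in place of the person's pixels would then give the "invisible" effect. A real implementation would probably also need to grow the mask slightly so that edge pixels of the body do not leak into the stored background.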

And one more example from the community: here, someone has managed to bring an image of a model from a magazine to life using a combination of WebXR and WebGL. Note that even with these fancy particle effects and machine learning running, this is actually running on a two-year-old Android device. And still, the performance is great. The creator of this piece is currently working on making these avatars speak, too. So that’s super exciting, and stay tuned for more.

Now, the second way you can use TensorFlow.js is via transfer learning. At some point, you’re going to outgrow the premade models that we’ve created and you’ll want to use your own custom data. Transfer learning allows us to take an existing model, use what it’s learned, and then apply it to similar problems in the same domain, such as recognizing a cat instead of a dog using the same image recognition model. Now, if you’re familiar with machine learning, you can of course do this programmatically in code. But today, I want to show you two easier ways to get started.

Now, first up is Teachable Machine. This is super easy to use and runs entirely in the web browser, both for training and for inference, which is basically the act of using the model to classify something new. Now, the best way to explain this is with a demo. So let’s go ahead and try it out.

OK, so now we can see the demo screen. This is on the website teachablemachine.withgoogle.com. And if you go to that site, you’re presented with three options here. You can see we’ve got an image project, an audio project, or a pose project. Today, we’re going to try a custom image project. So let’s click on that. Now we’re presented with the following screen. We’re allowed to add different types of classes on the left-hand side. So let’s go ahead and give them some more meaningful names. Today, I’m going to try and recognize my face, so let’s put my name here as Jason, along with a deck of playing cards, so let’s put the name Cards in
class 2. And of course, if we want it to recognize more than this, we can add more classes by clicking the Add class button at the bottom. But now we need to provide some training data. So we click on Webcam and allow access, and now we get a live preview from the webcam. And here, I can simply add some recordings of my face at different angles and rotations to give it some variety to understand what a Jason face actually looks like. So let’s do that right now. OK. So I’ve got about 30 images of me moving my head around, which is enough for this exercise. We’re now going to do the same thing for cards. And you can see here I’ve got a deck of playing cards that I’m going to bring to the screen and do a similar thing. And it’s important to get roughly the same number of images to avoid any bias. So I’ve got 42 versus 38. That’s close enough.

So now, we click on Train Model. And what’s going to happen now is that, live in the web browser, it’s going to retrain the top layers to attempt to classify the differences between the training data I presented to it. And you can see in just under 30 seconds, it’s already come back with a live trained model, and is currently predicting Jason on the output here, as you can see on the bottom right. If I bring the cards into view, it says Cards. And Jason. Cards. Jason. Cards. And you can see how fast and responsive that is. Now, if this is good enough for what your needs are, you can actually click on Export Model at the top right here, and click on Download. And you can download the resulting TensorFlow.js model files that you can then use on any website you wish to then do something useful with. So maybe you can control some animation when it spots a cat in your room, or have it send you an alert, or whatever you want it to do. It’s completely up to you. OK, back to the slides.

Now, Teachable Machine is great for prototypes. But if you want to launch a production model with gigabytes of training data, then maybe Cloud AutoML can be used for this instead.
It even supports exporting to TensorFlow.js. In this example, we can see someone trying to classify flowers. All they’ve done is upload the folders of flowers to Google Cloud Storage, and then we can move on to the next step of the training process. In the next step, we can see how a user can now select if they want to train for higher accuracy or faster prediction times. Of course, there’s usually a trade-off between the two, so you can select your preference at this point. You then simply set your budget and continue to allow the model to train. And at the end, it’ll give you the option to download. You can see here, once complete, you can now export to TensorFlow.js as shown. Simply download the files, and host them on your website or content delivery network.

You now may be wondering, how hard is it to use this model that you just generated in TensorFlow.js? Well, actually, it’s pretty easy. In fact, it’s so easy it fits onto a single slide. Let me walk you through this. First, at the top, we’ve got two HTML script imports. The first one is for the TensorFlow.js library, and the second one is for the Cloud AutoML library. Next, we have an image tag for a new image that we want to classify. In this case, I grabbed an image from the internet, a daisy,
but it could be anything. It could even be an image from a webcam stream, if you wanted. And then finally, we have the actual JavaScript code. It’s actually just three lines of JS to do the hard work. In the first line, we simply call await tf.automl.loadImageClassification and pass to it the location of the machine learning model that we trained. In this case, it’s called model.json and is located in the same directory. This is the file we downloaded in the previous step and is simply hosted somewhere on your web server. We use the await keyword here because the model load is asynchronous, meaning that it takes some time to complete. This allows us to wait for it to finish before continuing sequentially. Now, once the model’s loaded, we can then grab a reference to the image we want to classify using document.getElementById and pass the ID of the image we wish to use. In this case, it’s the daisy image, which refers to the image tag above, as you can see at the top here. Now finally, we can call await model.classify and pass to it the image we want to classify. Again, depending on the model, this can take several milliseconds to execute, so this also uses the await keyword. And then you’ll get a JSON object returned to our predictions constant, which you can then loop through and print the results or do something useful with. It should also be noted that you can call model.classify as many times as you like with different images once the model is loaded, which is how we can achieve webcam detection in real time.

Now finally, the third way to use TensorFlow.js is to write your own models from a blank canvas. Now, of course, to give a tutorial on this would require a whole new talk. So today, we’re going to focus on why you might want to consider doing this in JavaScript and the benefits you can get if you choose to do so. First, let’s start by expanding the TensorFlow.js architecture. We’ve got two APIs. We have a high-level API known as the Layers API, which is very similar
to Keras, if you’re using Python already. In fact, if you know Keras, you’ll feel very comfortable with our Layers API. Next, we’ve got a low-level API known as the Ops API, which is more mathematical in nature and allows you to do things like linear algebra and so on and so forth. This is similar to the original TensorFlow API. So let’s see how these come together. Here, you can see how our premade models sit upon the Layers API, which itself sits above the Core, or Ops, API. Now, this lower-level API can speak to many different environments, such as the client side, which includes things like the web browser, for example. Each one of these environments can execute on a number of different back ends: for example, the CPU, which is always available. But we’ve also got WebGL for GPU acceleration if it’s supported, or WebAssembly (WASM, for short), if supported, for faster performance on CPUs. And there’s a similar story for server-side environments via Node.js, too. Note here that our Node.js implementation can talk to the same TF CPU and TF GPU bindings that Python TensorFlow talks to. So yes, that means you can get the same or better performance as Python for model inference with the same CUDA and AVX support, as you’ll see in a few slides’ time.

Now, if you do prefer to use Python for your machine learning research, you can still continue to do so, using Keras models and TensorFlow SavedModels in Node.js without conversion by loading them directly with our Layers and Ops APIs, accordingly. And this is great as it allows you to integrate with web teams who are highly likely using Node.js, allowing you to then make your model available to the world with the reach and scale of the web. And even better, if you want to convert your TensorFlow SavedModel to run in the web browser directly on the client side, you can use our command line converter to do this, as long as the ops are supported by TensorFlow.js in our client-side implementation. Now, this will convert the SavedModel to
the JSON format required so the model can run on the client side in the web browser.

And let’s talk about performance. Here, you can see results for MobileNet V2 running on GPU and CPU in Python and in Node.js. And as you can see, the GPU has a very negligible performance difference between Python and JS. In fact, I think it’s less than one millisecond there, so within margin of error, essentially. For all intents and purposes, this is basically the same result. However, it gets much more interesting when you consider more than just the inference times. Typically, an ML model requires pre- and post-processing code as well. And if you convert this code to Node.js, we see much faster performance in Node for some architectures, as shown on the next slide. And here, we can see how Hugging Face, who are very popular for natural language processing models, managed to convert their DistilBERT implementation into a full Node.js one for the pre- and post-processing layers. This led to a two times speed boost for full end-to-end results, which
is just one reason you might want to consider using Node.js too. And this is made possible by the just-in-time compiler that JavaScript has to optimize its code at runtime, and is a unique feature of JavaScript.

So now let’s talk about client-side superpowers that can actually only be achieved by running in the web browser. Now, the first is privacy. Inference is performed on the client machine, so that means no data is ever sent to third-party servers, maintaining data privacy for the end user. This is particularly important for the medical or legal industries, where it might be a requirement not to transfer user data to third parties. And there are growing concerns around privacy these days. So with TensorFlow.js, you get it for free.

Next up is lower latency. As JavaScript has direct access to the sensors on the device, such as the microphone, the camera, the accelerometer, and much, much more, there’s no round-trip time to the server and back again. Latency could be close to 100 milliseconds or more if using a mobile connection. Even assuming zero latency for processing and inference, that caps the maximum at 10 frames per second if you’re sending images one by one, which is less than ideal. With TensorFlow.js running on device, we can go much faster than that, of course.

Next up is cost. If you’re not sending data to the server, then of course you can save on the costs of hiring the CPUs, GPUs, and RAM that you might otherwise need on the server side. And assuming you need to fire up just 10 high-memory machines with a GPU, you can easily be hitting an additional $60,000 per year. And as there’s no server, you just pay for the model hosting and website assets, which is far cheaper than running an ML server.

And then finally, here are some benefits to using Node.js. We can use the TensorFlow SavedModel format as discussed without conversion, and of course this is compatible with TensorFlow and Keras models. We can run larger models than on the client side. I believe
there are some limitations due to GPU memory limits and so on and so forth that limit the upper bounds of the size of models you can run in the web browser. And with Node.js, we can leverage the full power of the server hardware, just like Python can. It also allows us to code in just one language, and this is great for code reuse across the stack. And if your existing developers already know JavaScript (67% of devs choose to use JavaScript in production, according to the Stack Overflow survey of 2020), that is a great win for you, too. Node.js itself was the most popular choice by developers in that same survey for frameworks and libraries, with over 50% of respondents using it. And JavaScript, of course, is the world’s most used language right now. There’s a huge community and support for this if you choose to do so. And then finally, we have performance. It uses the same C bindings that Python has. So you’ve got the same CUDA acceleration and AVX support on the GPU and CPU, respectively. And due to the just-in-time compiler in JavaScript, you get that boost if you’re doing a lot of pre- and post-processing, too, which you wouldn’t get in Python.

So finally, we’re going to wrap up with some resources for you to get started if you want to continue your TensorFlow.js journey. If there’s one slide you should bookmark and share with folk, let it be this one. This slide has all the resources you need to get started. Our website and API are available at tensorflow.org/js, and our models are also available to use on that same website. Now, today, we just covered three of them. There are actually many more to be used, too. So do go check that out. We’re fully open source, so do look on GitHub, and we welcome contributions. And for more technical questions, check out our Google Group. We’ve also got lots of boilerplate code showing how easy it is to use our premade models in minutes over on CodePen and glitch.com. And finally, if you’re looking for an all-in-one
book, Deep Learning with JavaScript from Manning Publications was written by folk on our team and takes you from zero to ML hero in just a few chapters. And as long as you know JavaScript, that’s all you need to get started.

So finally, a quick shout-out to our community. Do check out the #MadeWithTFJS hashtag on Twitter or LinkedIn to see many more amazing examples that I couldn’t fit into the presentation today. New content is coming out every week, and it’s a great way to get inspired and learn more about what the community is up to. Now, if you are making something with TensorFlow.js yourself, please also use the hashtag for a chance to be featured at our future events and blog posts.

And finally, the only question left is, what will you make? This final example comes from a community member over in Tokyo, Japan. By day, he’s a dancer. But he’s managed to use TensorFlow.js to create this amazing hip-hop video with some pretty cool visual effects using BodyPix. Now, the reason I show you this is that machine learning really now is for everyone, and I’m super excited to see how TensorFlow.js will enable more people to start their journey with machine learning. Creatives, artists, musicians: no matter what your background,
you can still use models in ways never even dreamt up by the model creators. And as you saw from just a few of the demos today, we’re super excited to see what you create, too. Please do use the #MadeWithTFJS hashtag if you choose to do so, so we can find your work. And with that, feel free to stay in touch with me on LinkedIn or Twitter. And if you’ve got any further questions, I’m happy to answer them over there. Thank you for listening.
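
[Editor's sketch] For reference, the JavaScript portion of the single-slide Cloud AutoML example walked through in the talk might look roughly like the sketch below. `tf.automl.loadImageClassification`, `document.getElementById`, and `model.classify` are the calls named in the talk. The wrapper function, its parameters, and the `topPrediction` helper are hypothetical names added for illustration, and the `{label, prob}` shape of each prediction is an assumption about the library's output. `tf` and `doc` are passed in as arguments so that the sketch does not depend on a browser being present.

```javascript
// Sketch only: assumes the page has already loaded the TensorFlow.js and
// Cloud AutoML libraries via two <script> tags, so that `tf.automl` exists.
async function classifyImage(tf, doc, modelUrl, imageId) {
  // 1. Load the exported model. This is asynchronous, hence `await`.
  const model = await tf.automl.loadImageClassification(modelUrl);
  // 2. Grab a reference to the <img> element we want to classify.
  const image = doc.getElementById(imageId);
  // 3. Run the classification, which is also asynchronous.
  return model.classify(image);
}

// Hypothetical helper: returns the prediction with the highest probability.
// Assumes a non-empty array of objects shaped like {label, prob}.
function topPrediction(predictions) {
  return predictions.reduce((best, p) => (p.prob > best.prob ? p : best));
}
```

In a page that hosts `model.json` next to it and contains an image tag with the ID `daisy`, this could be called as `classifyImage(tf, document, 'model.json', 'daisy').then((p) => console.log(topPrediction(p)));`. Because the model is loaded once and `classify` can be called repeatedly, a webcam version would load the model up front and call only `model.classify` for each frame.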