Statistics, Open Source & ML Research | Python for ML | Interview with Sebastian Raschka

Sanyam Bhutani: Hey, this is Sanyam Bhutani and you're listening to "Chai Time Data Science": a podcast for data science enthusiasts, where I interview practitioners, researchers, and Kagglers about their journey and experience, and talk all things data science.

Hello, and welcome to another episode of the "Chai Time Data Science" show. In this episode, I interview Dr. Sebastian Raschka, currently an assistant professor of statistics at the University of Wisconsin-Madison and the author of the book Python Machine Learning. Sebastian has a background in biology and holds a PhD in quantitative biology, biochemistry, and molecular biology. In this interview, he talks all about his journey into the intersection of biology, machine learning, statistics, open source, and machine learning research. Yes, these are all topics that Sebastian is currently involved in. We also talk about his journey with writing the book. I'm sure you're all familiar with it; Python Machine Learning has been rewritten three times over the past few years, and we discuss how the book has evolved, and the latest additions to the version that has just come out, the third edition. We also discuss Sebastian's current research interests and his efforts in the areas he is actively contributing to. We talk about another area that Sebastian is active in, which is open source. Sebastian is an assistant professor at UW, but he shares advice that applies to anyone in the field of machine learning, at a university or otherwise, looking to learn anything. So I'm really excited about this interview.

A quick reminder to the non-native English speakers: this will have checked subtitles on YouTube, so if you are watching it on YouTube, please remember to enable the subtitles. The blog version of this interview will later be published at the links you can find in the description of this podcast, if you'd like to check it out later. For now, here's my interview with Dr. Sebastian Raschka, all about machine learning, statistics, open source, and research. Please enjoy the show.

Hello, everyone. I'm on the call with one of my favorite teachers, not from university, unfortunately, but through his book: Dr. Sebastian Raschka. Thank you so much for joining me on the "Chai Time Data Science" podcast.

Dr. Sebastian Raschka: Yeah, thank you for the invitation to be on the Chai Time Data Science podcast. I haven't done a podcast for a long time, and I'm maybe a little bit rusty, but I am excited. So yeah, I'm happy to be here.

Sanyam Bhutani: I'm really excited as well. I've held all three versions of your book, which unfortunately I didn't finish properly as my homework, but I'm really excited to be talking to you.

Dr. Sebastian Raschka: So you have a homework assignment right there, right? All three, so yeah, I appreciate your support there. Hope you liked them. If you haven't finished them, I hope it's not because you didn't like them; it's more like maybe you were busy.

Sanyam Bhutani: Certainly the latter. We'll definitely talk a lot about the book, but I want to talk about how you got started. You're currently an assistant professor of statistics, and you're doing a lot of interesting research at the intersection of a few fields that we've just talked about. But I want to talk about how you got started. You did your bachelor's in biology, and then you discovered an exchange program by stumbling upon a flyer, I believe, during your undergrad days.
Where did stats and machine learning start to come into the picture of biology for you?

Dr. Sebastian Raschka: Yeah, that's an interesting question. It's actually a long time ago, but you're right, there was a flyer. I was studying in Dusseldorf, where I did my undergrad in biology, and there was this exchange program with Michigan State University. And you know, Germany is a small country; at some point you've pretty much seen everything, I thought, so this was an interesting, exciting opportunity to just see something different. So I went to Michigan State, and we could pick our own courses. Back then I was already dabbling in bioinformatics classes, Perl and stuff like that. And then I took a statistics class, an advanced molecular evolution class that was computation focused, where you basically also use statistical techniques. I got really interested in statistics, and then I later joined Michigan State University for the PhD program. During my first semester, I took a statistical data classification class in computer science, and that got me hooked, basically. It was mostly Bayesian statistics: we focused on the Bayes optimal classifier, naive Bayes, and everything revolving around Bayes' theorem. Of course, that is just a small picture of the machine learning field. I also took a data mining class, and that was also the time when I started writing my blog and got super excited about everything. Yeah, I was just getting hooked back then, and I could also see how it applied to my work, and I kept studying, and that's basically how I ended up in machine learning, I guess.

Sanyam Bhutani: And I believe this is sort of a unique situation, because maybe I have the wrong friends, but my friends from biology usually stay completely distant from statistics and programming. What led you to become passionate about these things specifically?

Dr. Sebastian Raschka: Um, I think it was more that I always liked to create or build things, or tinker with things. Maybe a little bit of a sidetrack here, but as a kid I was always into computers and video games and stuff like that. This is really a long time ago, but I was, for example, into the video game modding scene. This was before online gaming; the internet was new, and it was super exciting. You had a level editor, you were making levels, and I even ran a web server for a certain computer game back then. I like to tinker with things, and programming is kind of like that. It's freeform: you have a canvas and you can create pretty much anything, and that is what I find exciting. I was never the kind of bioinformatics person who just uses a tool without wanting to know how it works. Maybe I was sometimes making my life harder, but I was trying to do things myself. If I had to, for example, analyze a DNA sequence, I would just write my own Python script to give me statistics on the different bases, like adenine and guanine, their distribution and stuff like that. But this is a long time ago; I don't work much with biological data these days anymore. I'm in statistics now, focused on machine learning and deep learning, with some collaborations with colleagues in the application areas.
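[Editor's note: a minimal sketch of the kind of base-counting script Sebastian describes above. This is hypothetical illustration code, not his original script:]

    from collections import Counter

    def base_distribution(seq):
        """Relative frequency of each DNA base in a sequence."""
        counts = Counter(seq.upper())
        total = sum(counts[b] for b in "ACGT")
        return {b: counts[b] / total for b in "ACGT"}

    print(base_distribution("ATGCGGACCTA"))
    # {'A': 0.2727..., 'C': 0.2727..., 'G': 0.2727..., 'T': 0.1818...}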
Sanyam Bhutani: Interesting. And I believe in your undergrad itself — we'll talk more about this — you were even exposed to open source contributions, which you really enjoyed. What made you pick research over industry? Because you were also oriented toward industry in terms of coding practices and open source contributions. Why go ahead with research?

Dr. Sebastian Raschka: Um, that is an interesting question. I would say it's maybe related to why I like coding: I like my freedom. In academia, it's also the way you structure your day, structure your research. For me it's a lot of responsibility, but I also like being flexible in terms of what I work on and how I work on things. But also, to be honest — I mean, I know other people are listening — I was kind of busy in my last semester, with teaching, getting all the papers out, writing the thesis, and just getting ready for the PhD defense, and honestly, I was procrastinating a little bit on the job application side. I didn't send a single one. My strategy — maybe I wouldn't recommend it to anyone — was: let's finish the PhD first, then I'll take maybe a one-to-two-month break and think about what I want to do next. Because like you said, there are pros and cons to both industry and academia; I wouldn't say one is universally better than the other, it's a trade-off in both. So I thought, let me finish one thing first, and then I will focus on making a decision, rather than thinking about it early on. There were some small opportunities back then where companies reached out; I had started interviews with Google. But at that point I really didn't know where I wanted to be; I really wanted to take a break first. Then my PhD defense was in early December, and a professor here from my current department sent me an email: hey, we have openings here, why don't you apply? We would be excited to invite you to give a talk and basically interview you. And I thought, huh, yeah, why not? So that was basically the only job application I sent. I got kind of lucky that I was invited to interview here and give a talk, and I really liked my current colleagues, I liked the department, and they liked me, apparently. So that's how it happened, basically. And it was really exciting, because in my first year I was designing two new courses, a machine learning course and a deep learning course. Right now I'm really doing what I love: I teach and do research, both, and this is perfect for me. I would say that's a good fit; I was lucky in that way.

Sanyam Bhutani: Awesome. I really envy your students who get to study with you in person, rather than those of us who have to rely on your book, but we're fortunate that we have it. What does a day in your life currently look like? Because I believe you're juggling multiple roles as well. Do you have chai and code, and then you have to teach a class? How do you maintain the balance across all these areas?

Dr. Sebastian Raschka: Yeah, this is a very good point; you have a lot of things on your plate. You have to teach, you want to do your research, but your students also help you with research, or you help them with research — it's kind of a symbiosis. But it can be a little bit much sometimes, so you have to find a balance at some point. For me — maybe that's my German gene — I try to be organized. I have a daily to-do list, and every Sunday I basically do a review of what I want to do this month and the upcoming week, and then try to find a balance between teaching, research, and spending time with my students, but also doing some coding. My classes involve theory and concepts, but they also have practical portions, so I want to keep up to date with current technology. I don't want to be a dinosaur at some point, teaching from the last decade or something like that. Nothing against my grad school experience, but I remember too well writing things in MATLAB, and I don't want to put my students in a similar situation. So I try to find a balance between staying up to date with coding and technology and doing my teaching, which is two times a week currently: my Monday is basically preparing for Tuesday, and my Wednesday is preparing for Thursday. Beyond that, I try to do as much research as possible, meet with my students, attend seminars, and talk to colleagues. It's a trade-off; you can't do everything, the day only has so many hours. But because it's such a mix, it's never boring — there's always something new, something interesting.

Sanyam Bhutani: Definitely.

Dr. Sebastian Raschka: I wouldn't say I have a super great tip for balance or something like that. But planning ahead definitely helps: writing things down, making a plan for where I want to be at the end of the year, what I want to accomplish, and then focusing on the things that are related to those goals.

Sanyam Bhutani: I think there's no secret; it's just planning and solid discipline, like you said. Talking about;

Dr. Sebastian Raschka: Hm.

Sanyam Bhutani: Sorry. Talking about research: I haven't worked in a research setting, but I believe research is about asking the right questions that lead you to working on a problem. Can you tell us what a research pipeline looks like for you? How do you approach new ideas? And what questions, if I may ask, do you ask while working on a new problem?

Dr. Sebastian Raschka: Yeah, that is also a good question. Here, too, there's maybe no golden rule, but I think there are multiple ways you can identify an interesting problem to work on or find ideas. One would be reading papers, and then maybe noticing something that doesn't quite make sense.
Or you get ideas, for example, for improving something. Traditionally — I mean, deep learning is a fast-moving field, you have a lot of new things — but traditionally, science is a continuous progression where you take something and improve it over multiple stages. So in that way, seeing what other people do definitely helps, although I wouldn't do it too much, because then you're a little bit limited in your horizon. But yeah, reading other people's work helps. For example, when I was working on this new network architecture for ordinal regression, that was inspired by a collaboration where we had ordinal labels. I was thinking about what methods we could use instead of just a classification or regression method: what is out there for deep learning? I was looking at the literature, and I read a paper where they presented such a method, but while reading it I noticed, okay, this method works really well — so no criticism here — but it could be improved, because it didn't have rank consistency. I don't want to go into too much detail there, but then I thought, okay, we can make an improvement that has this rank consistency, and that led to a research project.

When designing research projects, you also always have your other research projects in the back of your mind. You never really finish; a research project is a process where you start working on something, you get some exciting results, and at some point you write them up and share them. But there's always more to be done, always something you can improve. It's an endless process, infinite in a way. For this project, for example, I used a face image dataset where we had age labels, which can be thought of as an ordinal regression problem, because aging is a non-stationary process. If you think about it, the difference between, let's say, a five and a seven year old is much more noticeable than the age difference between 70 and 72, where over two years you maybe get a few more wrinkles, but nothing really noticeable. So it's more like a non-stationary process, and that is something where you can apply ordinal regression. And face images then related to another research area of mine, where we were working on protecting privacy in face images. So I always try to find some connection when designing a research project, so that you don't just do some random thing because it's the cool idea of the day — to maintain some focus and not get sidetracked.

But regarding your question about the research pipeline, how to approach the problem: I try to write down as much as possible and brainstorm ideas, but then also talk to people.

Sanyam Bhutani: Okay.

Dr. Sebastian Raschka: To my students, first of all. The project must be a good fit for my students, because otherwise it's maybe not interesting, the student wouldn't be motivated, or it's not the direction the student wants to go. It has to be a mutual interest, something we both want to work on. For example, one of my students specializes more in transformer architectures, while another focuses more on graph neural networks. So I try to make sure a project is aligned with what the student is interested in and has worked on before. Of course, it's always good to learn something new, but there should be a balance; you don't want to overwhelm students, because it's not motivating to work on something and never get results. So you want to design the research project — and this is a very important point — in a way that gives you a sense of incremental progress. You don't want to pick a project so ambitious that you maybe never get there, at least not in the next five years. You want checkpoints, because these checkpoints are basically your conference papers, updates, and talks you may give. If you work on something and have nothing to present for the next five years, that can be very frustrating. So designing a project so that you can solve it in multiple steps is, I think, also helpful.
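[Editor's note: to make the ordinal regression idea above concrete — rank-consistent approaches such as the one Sebastian describes typically expand an ordinal label into K-1 binary "is the label greater than k?" tasks. A rough, hypothetical sketch:]

    import numpy as np

    def extended_binary_labels(label, num_classes):
        """Expand an ordinal label into K-1 binary 'is label > k?' tasks."""
        return (label > np.arange(num_classes - 1)).astype(int)

    # An age rank of 3 out of 6 classes becomes [1, 1, 1, 0, 0]:
    print(extended_binary_labels(3, 6))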
Sanyam Bhutani: It's similar to how the professors at MIT, I think, sat down and decided to solve computer vision over a summer in the 1960s — solving maybe object detection for certain objects, for example.

Dr. Sebastian Raschka: Yeah, this is also a good point: summer is the productive time, when you don't teach. Sometimes I really leave my most ambitious projects for the summer, where I can spend a few days uninterrupted. It's sometimes very helpful to have a chunk of uninterrupted time.

Sanyam Bhutani: On the flip side, another question: when you have this uninterrupted time, you might continue exploring, or you might have to put the brakes on a project, because at least in machine learning, nothing works until it does; no model works until it perfectly does. How do you decide whether you want to continue exploring, or whether it's time to put an end to an experiment?

Dr. Sebastian Raschka: Maybe I have a hard time, when I've already spent a lot of hours on a project, saying, okay, this is not going anywhere. But there were times like that. For example, in the past we designed a new approach for privacy related to face recognition, where we wanted to protect the privacy of face images. Working on this project, I noticed that there was an imbalance in terms of certain groups not being represented well in the dataset, and that led to higher error rates, for example, for individuals with darker skin. We worked on investigating that for different commercial classifiers and also open source versions, but then at some point I got a Google Scholar recommendation for a paper — the Gender Shades paper, basically — where they did exactly that. So there was no point in continuing, and that way we didn't spend too much time on it. I would say fortunately and unfortunately: fortunately, because someone else also cares about this problem, so we now know more than before, because someone already looked into it; unfortunately, because I had a student working on this, and the student was of course a little bit disappointed, because some of the work was basically for nothing. But then the student worked on something else, and it was still a useful experience, because by working on something you learn things, so it's never completely wasted; it's part of the training. On the other hand, it's sometimes important to say, hey, maybe we should work on something else, because this is not the best investment of my time. But it's hard sometimes, because if you've spent hours on something, you want to somehow finish it; otherwise it feels like you wasted your time. But that's not really true.

Sanyam Bhutani: I think with intuition, you maybe start to get an idea of what areas to look at, even in terms of literature, and then decide about the project.

Dr. Sebastian Raschka: Yeah, literature is a big part of this. You want to work on something no one else is working on, but you also want to have some framework; you don't want to do something crazy that no one cares about.

Sanyam Bhutani: Yeah. Talking about training: there's also this — actually a message that I try to get across to the audience — there is this misconception that researchers always have to have a lot of graphics cards. Maybe you're hiding graphics cards on the other end of the camera. Can you confirm or deny that belief?

Dr. Sebastian Raschka: So yeah, GPUs. I think for deep learning work they are important, but they are not everything. One of my students — I think he just made this up — said: a famous researcher once said you have to have at least eight GPUs to be productive.

Sanyam Bhutani: Hehehe.

Dr. Sebastian Raschka: I think that was just his way of saying, hey, I need my eight GPUs all the time.

Sanyam Bhutani: Hehehe.

Dr. Sebastian Raschka: So he has eight GPUs; he's doing a lot of hyperparameter tuning with those. Another of my students only uses one GPU, and it's enough, because it depends on what you work on. I also have my own GPUs; I use maybe two to four at a time, both for teaching and for experimenting with things. But it really depends on what you're working on. What makes GPUs helpful is that in deep learning there's no really principled way of finding out which hyperparameter choices you should use, or which of the different architectures you want to explore may be candidates for your network design. Even just comparing your method to related methods takes computing time. So usually I would say the more the merrier, but of course you also need time to write things up; you have to write your paper at some point. It's always a trade-off: if you have too many, maybe you finish your experiments but didn't even have time to think about what you want to work on, so you run things and never use the results. So there's a balance. A certain number is good, so you can run the things you want to run, but at some point you may have too many and you just have idle GPUs.
So it's also a balancing act, if you have a limited budget, how much you should invest in new hardware. We are fortunate at UW-Madison: we have a system called Condor, basically, which connects all the computers on campus.

Sanyam Bhutani: Okay.

Dr. Sebastian Raschka: You can use that for CPU jobs, so we basically have a giant, enormous CPU resource there, because all the campus computers are connected — not only the data center machines, but also personal desktop computers; if they are idle, they can be utilized too. GPUs, though, they are just starting to add, so we don't have that many yet. But last year we got a grant approved to add more GPUs to this, and I think this year they've already bought like 32 new GPUs. So it's becoming easier for students to get resources; of course, it will take some time. And in my classes, I use a mix of local GPUs and also Google Colab and Kaggle kernels, which are great tools for students. They are free; you can only use maybe one GPU at a time, but for learning, that is usually already very helpful.

Sanyam Bhutani: I think even people who just want to get a taste of the field can go ahead and check out Colab, Kaggle kernels, even the free resources that I hope GCP continues to provide, and they'll get a taste, and then maybe decide if they really want to continue being frustrated, or maybe invest some money into an expensive graphics card.

Dr. Sebastian Raschka: Yeah, and also one thing I have to think about when teaching: in the class I'm teaching, many students are new to computing. And I mean, Slurm and Linux are not super complicated, but they take time to learn. I don't want to overwhelm my students with, hey, you have to set up an AWS instance — it's not trivial if you have never worked with Linux. So Google Colab is awesome; I really appreciate it, because you can just go online and have, in your web browser, a Jupyter notebook with a GPU connected, and you can just start working. This really lowers the friction people face when starting with deep learning, because deep learning already is a lot of stuff to learn, and if you also have to worry about computers and how to set up your Linux stuff, it can easily get frustrating. So it's at least one barrier less if you can use something like Google Colab.

Sanyam Bhutani: I like to joke that once you own the hardware, you have to take a secret oath of walking through fire: setting up CUDA drivers, which are always annoying. You tend to mess something up, and I think that's also a barrier, like you said.

Dr. Sebastian Raschka: Mhm.

Sanyam Bhutani: Now, coming back to your current research, can you tell us more about what you're working on? You're still very active in the intersection of privacy and semi-adversarial networks. Can you tell us more about why these are interesting to you, and more about the field?

Dr. Sebastian Raschka: So yeah, the privacy and semi-adversarial networks. That was a series of papers that started with a friend of mine, Vahid Mirjalili. We were both students at Michigan State University, and he was developing a method with his professor based on face swapping, if I remember correctly. Back then we talked about deep learning methods we could use to achieve maybe better results. Our goal was basically to protect privacy in images in that context. We started with hiding a certain face attribute; here, what we did was basically trying to hide gender. Because, for instance, data collection on the internet is very prevalent nowadays; you can just download certain information on people that you're maybe not supposed to — it's private. So we wanted to develop a general method where you can retain the utility of your data — let's say the face image for face recognition, say a biometric scan, airport scanners and so forth — but you shouldn't be able to extract more information than you're supposed to. For example, if you have a security camera, you still have the security footage, so if there was some incident you can recognize who is in the picture; but you shouldn't be able to do a large-scale study of, let's say, the gender distribution of the people visiting your store, and things like that. So we designed a semi-adversarial network with two goals in mind: retaining face recognition accuracy while hiding the gender attribute. It's basically a kind of constrained optimization problem. For that, we started very simple, using a simple autoencoder. We had the first idea a little bit before generative adversarial networks took off, but later we recognized that adversarial networks could be even more useful in that way.
And the last paper we had was basically using a GAN with cycle consistency. But going back to this idea: we had this autoencoder, and we gave the autoencoder a face image. Attached to it, we had a gender classifier and a face recognition model. They could be anything, but we used convolutional networks for all of the modules, the sub-networks. When you feed the image to the autoencoder, it tries to reconstruct the image, but then we add a constraint: the gender classifier's prediction should be flipped. That's the adversarial part. And the non-adversarial goal is retaining the face matching accuracy, which is why we call it semi-adversarial — it's half adversarial, basically.

Sanyam Bhutani: Okay.

Dr. Sebastian Raschka: Yeah. And we have a series of papers on that, improving it incrementally, like we talked about. You have this idea, you get it to work. So we proposed the method in the first paper, but we knew, okay, we can actually improve this in many different ways. If we did all of them at once, it would be a multi-year project and a 50-page-long paper. So we did this incrementally: the first paper was the basic idea.

Sanyam Bhutani: Okay.

Dr. Sebastian Raschka: In the second one, we addressed issues like dataset imbalance, because people from certain subpopulations were over- or under-represented. The third one was basically the idea of connecting different modules of these networks, because every module adds a certain perturbation to the image, and we can control how much we perturb the image by having multiple of these semi-adversarial networks in a sequence. The input is a face image and the output is a modified face image, so you can take the modified face image and give it to another semi-adversarial network to add another perturbation on top. You can modify it step by step. And the last one was basically extending this idea to hide not only gender information, but also race information and age information — doing three things at once. That was also with cycle consistency. And it is also a selective framework, so you can say, I want to hide the gender but I don't want to hide the age, or I want to hide both at the same time, and so forth. So it was a little more complex, and also more general, in the way that you can target specific attributes. And we borrowed the cycle consistency idea from the CycleGAN paper, which also made the results look much better.

Sanyam Bhutani: Okay.

Dr. Sebastian Raschka: A fun area of research, right? Yeah.

Sanyam Bhutani: That was a high-level overview, but in case anyone wants to check those papers out, I'll have them linked in the description of this podcast.
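[Editor's note: a rough, hypothetical sketch of the semi-adversarial objective described above — not the papers' actual code. The perturbation network's loss combines reconstruction, an adversarial term pushing an auxiliary gender classifier toward the flipped label, and a non-adversarial term preserving face-matching utility; the weights are made-up placeholders:]

    import torch.nn.functional as F

    def semi_adversarial_loss(x, x_pert, gender_logits, flipped_gender,
                              face_emb_orig, face_emb_pert,
                              w_rec=1.0, w_adv=1.0, w_match=1.0):
        # Reconstruction: the perturbed image stays close to the original.
        rec = F.mse_loss(x_pert, x)
        # Adversarial: push the gender classifier toward the *flipped* label.
        adv = F.binary_cross_entropy_with_logits(gender_logits, flipped_gender)
        # Non-adversarial: keep the face-matching embeddings similar.
        match = F.mse_loss(face_emb_pert, face_emb_orig)
        return w_rec * rec + w_adv * adv + w_match * match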
Sanyam Bhutani: Now, another aspect that not many researchers are active in — I wish they were — is open source. You are always kind enough to share open source implementations of your research, and you've also been contributing outside of it. I think you discovered your passion for this during your undergraduate days. Can you tell us about your open source journey? You have contributed to scikit-learn, TensorFlow, and even PyTorch.

Dr. Sebastian Raschka: Yeah, that started basically out of passion. I like coding, and at some point I discovered, hey, I'm using open source software, and if I find some issue with it, it might be useful to not only fix it on my computer but also share the solution with others, so that everyone can benefit. And I have to say, in the early days when I contributed to scikit-learn, I really learned a lot, because the people working on scikit-learn were really good coders. What I really liked is that scikit-learn has a very rigorous style expectation: you can't just upload any code, you have to have documentation alongside it and unit tests, and style-wise it has to be good code. That was really inspiring; it inspired me to do a proper job when coding. I mean, there are days when I have to sort some files or rename some things on my computer, and I don't write documentation for little things like that. But if I write research code, I always try to document it and follow good practices, because even though I may not use it every day or share it with other people right now, there may be a time two months later when a student works on some project, and I can give the student my old code and it's readable. In that way, open source really encourages good habits. It's like, if you write personal notes — let's say a diary — you maybe don't care as much about your handwriting. But if you put an article online where many people can see it, you want to make sure it's legible and there are no errors in there. It's the same with code. What I also like about open source is that it brings out the best in you; it makes you do a good job. And it's very satisfying when you hear from people who find it useful and give you feedback. That's another aspect: when I contributed to or had my own open source libraries, other people were using the code, and they had ideas I didn't have, and they improved the code a lot. They pointed out errors, and I learned so much just by interacting with the people who use my code.

One recent example: I have this mlxtend library, which started as seriously just a loose collection of things related to machine learning. That's also why it's called mlxtend — machine learning extensions. It wasn't maybe the best name, but I just wanted to;

Sanyam Bhutani: Hehehehe.

Dr. Sebastian Raschka: I just wanted to have a GitHub project, basically, so I just picked the name. And then, every time I needed some new functionality that was not implemented elsewhere, I implemented it and added it. For example, for one research project I needed to do some frequent pattern mining — frequent itemsets and things like that — using the Apriori algorithm first to find the frequent itemsets. And I couldn't find a good Python solution. There was this C++ package, but it was a command line tool, and what I wanted was something where I can have the output in a pandas DataFrame for the project. So I wrote my own Apriori implementation and put it into mlxtend. And this is now, I would say, one of the most widely used parts of it, the frequent pattern mining functions. But I only uploaded a very rudimentary — rudimentary, but working — version of Apriori. And then other people used the code and made it much faster. They saw, okay, why are you doing this? You can do the combinatorial search, generating the subsets, more efficiently using this and that. So they improved the code, and it was amazing. It kept growing; now there are, I mean, not that many, but at least three different frequent itemset mining algorithms, like FP-Growth, FP-Max and so on, and it's much faster than before, and it's only because other people contributed. I wouldn't be able to do that myself; it would take a considerable amount of effort. And this way I really learned a lot, just by having other people make contributions and point things out. It's kind of amazing; it helps you learn, because these are things you don't learn from a textbook. There's no textbook that will explain how to take this algorithm and make it faster. So I think it's very useful to be involved in open source. Of course, there's always a balance — you also have other responsibilities — but it's a nice free-time activity, I think.
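[Editor's note: for readers curious about the workflow Sebastian describes, a minimal example with mlxtend's frequent pattern mining API; the toy transactions are made up:]

    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori

    transactions = [["milk", "bread"], ["milk", "eggs"], ["milk", "bread", "eggs"]]

    # One-hot encode the transactions into a boolean DataFrame.
    te = TransactionEncoder()
    df = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

    # Frequent itemsets with support >= 0.6, returned as a pandas DataFrame.
    print(apriori(df, min_support=0.6, use_colnames=True))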
Sanyam Bhutani: You're still actively contributing to mlxtend, and there's another project of yours, BioPandas, that I believe you're still active on. Can you tell us why, and a little bit more about BioPandas?

Dr. Sebastian Raschka: BioPandas, right. There I'm not that active anymore. I wouldn't say it's finished, but it was maybe a smaller idea. Back then I did a lot of small molecule screening, which is basically related to drug discovery: you want to find similar molecules, do some screening, compare molecules, and compute statistics about them. There are many different tools you can use to compute certain things about a molecule. A molecule is basically — you can think of it as a graph: atoms that are connected by bonds. In case you don't know what these files are: there is, for example, one format called MOL2. And for proteins — a protein is a larger molecule; you can think of it as a large peptide — the major file format is PDB, the Protein Data Bank file, which is basically a text file. It was developed for Fortran back in the day, so it's kind of dated. You have the coordinates of each atom and the atom type — you don't even have bond information, just the coordinates and the atom types. It's just a list of these things. There are many, many different tools that let you read that in, each with their own special API to parse and analyze it, and honestly, I found it a little cumbersome to learn all these different APIs of the specialized tools. So I thought it might be convenient to just have a pandas DataFrame of it, because it's just a table: just the coordinates and the atom types. Why not have a pandas DataFrame? Then I don't have to learn a new API. If I want to count certain atoms, I just use sum() in pandas, for example, or I can compute other statistics with it — basically just using pandas methods, or doing a selection, the powerful selections in pandas. So BioPandas was really just inspired by making my life easier. It's basically a tool that takes such a data file and converts it into a pandas DataFrame, which is why it's called BioPandas, hehe.

Sanyam Bhutani: Makes sense.

Dr. Sebastian Raschka: And the same thing for small molecules with the MOL2 files. I used this pretty heavily in one screening project where we screened 15 million molecules; I also added multiprocessing to it and things like that. BioPandas was basically the core engine for reading in these files. So I wouldn't say it's finished — nothing is ever really finished, you can always add more functionality — but the goal was really just being able to use the pandas API on molecule data.

Sanyam Bhutani: Awesome. I'll again have both GitHub links in the description of the podcast, in case anyone wants to check them out or maybe add functionality. But talking more about open source, you've also contributed to TensorFlow, in a book format, and the book has now been updated to TensorFlow 2.0. Can you tell us more about the book? How did that journey start, and what led you to writing it?

Dr. Sebastian Raschka: Yeah, good question. The TensorFlow part came a little bit later. The first edition didn't have any TensorFlow, if I remember correctly — it's already four years ago; I had a very brief section on Theano, which I used back then. The book started as Python Machine Learning, and back then I think it was even called Python Machine Learning Essentials. I had 250 pages, so I had to keep everything very short. I remember writing chapters that were like 30 pages and sending them to the publisher, and they said, uh-uh, if you have 12 chapters, you can't have 30 pages per chapter. So I had to keep everything pretty short. But then the reviewers really liked the book, and the publisher liked it too, so they dropped the Essentials from the title, and I could really expand it. Of course, there was not much time left for the first edition, so it focused on machine learning and scikit-learn. Later, we added deep learning and TensorFlow, and this is where Vahid Mirjalili helped me a lot, because that was 2017, kind of late in my PhD, and I was also working on other projects. So we worked together on the deep learning and TensorFlow chapters.

Sanyam Bhutani: Okay.

Dr. Sebastian Raschka: And then last summer we started working on the third edition, which was basically updating to TensorFlow 2.0. It was a large rewrite: all the chapters related to TensorFlow had to be rewritten. For the CNN and RNN chapters we could still use the general explanations, but TensorFlow 2 changed a lot — a lot of little things, but also the whole paradigm of using dynamic graphs now makes everything a little different. And then, of course, there are the new chapters on GANs and reinforcement learning, which was pretty exciting, because reinforcement learning is a hot research field right now and we had never covered it. People always said, hey, you mention reinforcement learning in the introduction, but you never talk about it in the book. So we finally had a chance to expand on it. Like I said before, it was due to page and time constraints that we couldn't do everything earlier, but over the years it has been really cool that we could finally go where we wanted to go: having chapters on reinforcement learning and GANs.

Sanyam Bhutani: I think this was at the point when you were already an active blogger, and you were really enjoying blogging. Did you see a gap in the resources, which led you to writing this book? Or is there a story?

Dr. Sebastian Raschka: That's a very good point; I think one led to the other. I started with blogging, but you may notice that I'm not blogging that much anymore, because the book is basically an extension of the blog. I think that's maybe why the publisher contacted me: they liked my blog articles and asked me to write a book. I thought about it, and I thought it might be a good opportunity to do what you do in a blog, but as a book you can reference. In a blog you have an article here and an article there; in a book, the chapters form a joined sequence. You can maybe write a series of blog posts, but then it is basically a book. So it was an opportunity for me to write about everything in a more structured way, a more complete resource, instead of doing a random blog post here and there. That is also where I initially wanted to go with the blog.
I think I had an article like an introduction to single-layer neural networks, which ended up being the second chapter of my book — the perceptron and the adaptive linear neuron, basically. And the PCA chapter in the book is also inspired by a blog post. So it's all kind of connected in that way.

Sanyam Bhutani: Did you expect the book to become this famous — if I may say famous — or is it still behind your vision?

Dr. Sebastian Raschka: Honestly, I didn't expect that at all. Before this, I wrote a book on heat maps in R, in 2013, and I don't know, maybe 100 people read it. That was not a very popular book, and I expected about the same for this book, too. I didn't write it because I wanted it to be popular; as a student, I was just excited to write, and this was a good opportunity, I thought, because you have someone reviewing your work and helping you put it together in one framework. But I never expected that people would like it that much. It's really nice to see.

Sanyam Bhutani: Now, the book has been rewritten thrice, I think, which also reflects the pace at which our field grows. It must have taken a lot of effort. Can you tell us what's new in the third edition, and what the exciting updates are?

Dr. Sebastian Raschka: I hinted at this a few moments ago, but let me start again, because I was going off on tangents a little. What we basically did is take all the first chapters — up to and including the clustering chapter, everything before the first TensorFlow chapter. There was not much to change there, but there was a lot of reader feedback, a lot of questions I got by email about little things that I maybe didn't explain well. So I went back and updated all the explanations where things were unclear. You won't maybe notice a big change, but it's a little bit polished, I would say — smoother — and everything is adjusted to the latest version of scikit-learn, to make sure everything still works. The real change happened from the first TensorFlow chapter onward: we basically rewrote the first two chapters on TensorFlow with TensorFlow 2.0 in mind. The first one is more of a general introduction to deep learning and deep learning libraries, and the second one covers the TensorFlow mechanics in a little more detail. Those had to be almost 90% rewritten; I think we maybe kept some of the sub-headers, but all the content and code had to change. That was, I would say, the most significant update. The CNN and RNN chapters also got an update, because the new version of TensorFlow changed a lot of things, especially around data loading. Then there's the GAN chapter, a new chapter we wrote from scratch. We also cover the Wasserstein GAN, but we didn't go into too much detail on all the different GAN architectures — there is, I think, a GitHub repository that lists like 200 or 300 different GANs — so we only focus on the fully connected one, convolutional GANs, and the Wasserstein GAN. And then the next chapter is the reinforcement learning one, which is maybe the most exciting, because it's really new. It's something I felt was missing in all the previous versions, and people really wanted it; they said, hey, you mention it in the introduction, why is there no chapter on reinforcement learning? Personally, my research is not on reinforcement learning, so that chapter was also the majority of the work — a lot of work to get good examples to run. It is not easy to train reinforcement learning agents. But I hope we did a good job there. That's maybe my highlight, I would say.

Sanyam Bhutani: How did you decide on the topics you wanted to include? Because I'm sure there's a huge number of things happening in machine learning. How did you narrow your focus, deciding what makes it into the book and what to skip?

Dr. Sebastian Raschka: Yeah, it's really hard to say no to certain topics. But I thought, RNNs and CNNs at least: having one deep learning chapter on images and one on analyzing text. GANs are very interesting because, like I mentioned, those privacy-related projects involved autoencoders and then GANs, so that is something I wanted in there. Also, a lot of people work with GANs nowadays, across different application and research areas, so even if you as a reader may not use GANs yourself, I think it's good to know about them. And the reinforcement learning, first of all, I was excited about it myself, because I had never really worked with it in an application before, and also people were really interested in it. So I thought it was doing both of us a favor: I get to do something new, and the readers get what they wanted.
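[Editor's note: for context, the standard GAN objective that such a chapter builds on fits in a few lines of TensorFlow 2 — a generic sketch, not the book's code:]

    import tensorflow as tf

    bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

    def discriminator_loss(real_logits, fake_logits):
        # Real images should be classified as 1, generated images as 0.
        return (bce(tf.ones_like(real_logits), real_logits)
                + bce(tf.zeros_like(fake_logits), fake_logits))

    def generator_loss(fake_logits):
        # The generator tries to make the discriminator output 1 on fakes.
        return bce(tf.ones_like(fake_logits), fake_logits)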
Sanyam Bhutani: Okay. Now, coming to the focus of the book, which is TensorFlow 2.0: I've noticed that recent research papers, I believe, are mostly in the PyTorch area, and many people, especially students, struggle with this question — should I learn PyTorch? Should I learn TensorFlow? Do you have any insights on which package they should use? Do you punish your students if they don't use TensorFlow 2.0?

Dr. Sebastian Raschka: So yeah, that's a very interesting and really sensitive question. My research students use a little of both. I think knowing both and being able to use both is really helpful, because if you read research papers today, there's maybe a 50-50 chance it's either one. And if you want to compare your methods with other methods, you need to run both. Usually it's not as easy as downloading a package and running it; you have to make some tweaks and modifications, maybe even port part of the architecture to your framework. So you have to be able to understand what's going on. And what's interesting — yeah, like you mentioned, the book is TensorFlow 2.0.

Sanyam Bhutani: Yes.

Dr. Sebastian Raschka: In my research I also use PyTorch, and in my classes I currently use PyTorch. There are advantages and disadvantages on both sides, but what I've noticed especially is that both are kind of converging, which is interesting to see. TensorFlow started with a static graph paradigm, and then people wanted more of the dynamic graphs, because they're easier to debug, easier to work with, and more familiar if you come from NumPy. So they added this eager execution mode, which now makes TensorFlow more dynamic-graph-like, like PyTorch. In PyTorch, however, people who liked it were saying, okay, this is not so good for production: how do I export my model to C++ so I can put it on my mobile device, and whatever people do? So they added things like quantization, to make things more efficient for the mobile stuff, and also TorchScript, which — if I get this right — can convert PyTorch code into a kind of intermediate representation that can then be exported to C++. And Caffe2 is also now part of PyTorch under the hood. So PyTorch got static graph capabilities, so you can export a model to a static representation, and TensorFlow got this eager mode; they kind of converged to the same thing. Right now, I think it's more a matter of preference. They have slightly different APIs, but they're both user friendly now. In TensorFlow 2.0 you use tf.keras now; before, it was a contrib module, not an official API in TensorFlow, so that has also become easier to use. So right now I wouldn't say there's really a clear pro or con on either side — it's like the R versus Python debate, always a hot one. I think knowing both is useful, because both are about equally used in research, and then you have the opportunity to make the best of both worlds.

Sanyam Bhutani: I think one thing most students also miss — and I'm sorry, I haven't completed the book, and not taking aim at any other books — is that in your book you try to teach the concepts and then show how it's done in TensorFlow, rather than "here's how you use an API". So if you really know the concepts, it's also really easy to pick up another framework, assuming they're well documented.

Dr. Sebastian Raschka: This is absolutely right. PyTorch is kind of like NumPy: you basically have this tensor library under the hood — ATen — and it just adds an API on top of it, which makes it convenient. But it's helpful to understand what's going on underneath. Then you only need to learn the specific API — Keras is a little bit different from PyTorch — but if you know all the building blocks under the hood, how convolutional layers work and things like that, it's super helpful to understand the whole concept.
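[Editor's note: the convergence Sebastian describes is easy to see side by side — both frameworks now compute gradients eagerly, with no static graph required. A minimal sketch:]

    import tensorflow as tf
    import torch

    # TensorFlow 2.x: eager execution is the default.
    x_tf = tf.Variable(3.0)
    with tf.GradientTape() as tape:
        y_tf = x_tf ** 2
    print(tape.gradient(y_tf, x_tf))  # 6.0

    # PyTorch: dynamic graphs from the start.
    x_pt = torch.tensor(3.0, requires_grad=True)
    (x_pt ** 2).backward()
    print(x_pt.grad)  # tensor(6.)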
Sanyam Bhutani: Yeah. Now, coming to another topic that I'm personally always interested in: learning about learning. Your book, and you yourself, are definitely an example of putting out updated resources on learning, which isn't the case with most books. You're also an active teacher at UW. What are your suggestions for students who maybe aren't fortunate enough to take a course at a university and are using your book or online courses to fill the gaps?

Dr. Sebastian Raschka: This is an interesting question. I think online courses have come a long way. There are a lot of different online courses, but official documentation has also become much better. There are so many tutorials out there now; back then you had a textbook that maybe explained something, and you had to wait a few years until they updated things. Nowadays I would even go so far as to say: maybe prefer the official documentation if you want to learn about a specific tool. My book is more about learning the concepts; there are applications in a certain language to give you examples, but I would say the examples are not the core of it — it's more about learning the concepts. If you want to learn a specific tool, I think learning on a high level from a book, where you get an idea of what it's about, is useful, but then also go to the website and look at what official documentation is out there. PyTorch, for example, has a lot of tutorials, and TensorFlow does too. These are more in depth; they help you do certain things.

And what's also very important for learning, I think, is collaborating with someone — in open source, but also in forums, let's say the PyTorch discussion forum, or Stack Overflow for TensorFlow, where you get feedback and see what questions people have. You communicate, because there are so many little things you may not know you're not doing right, because things are complicated. Having a second pair of eyes look over your code, or helping other people yourself, is always a good way to learn. It's a good learning experience to have discussions: why do you do it this way and not the other way? That's very helpful, I think. And in general, just working together with people — and even writing a book, basically. You learn about something new, and then you write about it. That is partly how I learned: by writing things down, because then you know what you don't know, and you'll automatically be motivated to learn more than you already know, so that you can expand. If you only know something on a surface level, that's usually not sufficient to write about it. So writing encourages you to dig a little deeper, and then you also publish it, like a blog post, and people have additional ideas: hey, I noticed you suggest doing this, but have you thought about doing it this way? This way you stumble upon new things far more efficiently than by reading everything out there, because if you start reading all the material on a topic, you spend a lot of time reading the same thing over and over; the basics are covered in similar ways by different authors, and there's maybe only a very small chunk that is new. Whereas if you write about something, there may be an expert who points out exactly what you need, and you get there more efficiently.

Sanyam Bhutani: People usually get afraid — I can speak on a personal level; I was afraid that it's a scary world out there. And it's generally the opposite: the machine learning community, on Twitter or otherwise, is very warm and welcoming. They provide useful feedback, not criticizing feedback, most of the time.

Dr. Sebastian Raschka: Except for Reddit, maybe.

Sanyam Bhutani: Hehehe.

Dr. Sebastian Raschka: I would stay away from that, maybe — or maybe just read it, not comment unnecessarily.

Sanyam Bhutani: I've been banned from it because I keep sharing Chai Time Data Science and apparently they don't like;

Dr. Sebastian Raschka: Yeah, I don't think I comment much either; I just read the news there sometimes. It's a little bit not so welcoming, maybe. But other than that, yeah, Twitter is generally very welcoming, and it's my main source for new information. There are also some newsletters I subscribe to, which are great for staying up to date, next to podcasts. For newsletters — let me see if I get this right — there's The Batch, I think; this is by Andrew Ng, who was doing the Coursera course, and he also has a newsletter where he covers some of the recent machine learning stuff. And the one by Jack Clark from OpenAI;

Sanyam Bhutani: Yes.

Dr. Sebastian Raschka: is also a good newsletter. It's always helpful to see a brief summary of what's new, for staying up to date.

Sanyam Bhutani: I'll try to find those and have them linked in the description, in case anyone wants to check them out. Now, coming to general tips: do you have any tips for beginners who have maybe just purchased your book and are listening to this podcast, or who are just getting started in deep learning?

Dr. Sebastian Raschka: Oh yeah. What I would say is, you know, getting started is always a good thing; don't get hung up on details. You have to find the balance between doing things perfectly and just getting started. For example, learning math is very important.
But if you buy five textbooks right now, one on linear algebra, one on calculus, and maybe one on statistics, and so forth, and you read all these books, you will spend like five years without doing anything that may be exciting. I mean, some people will say reading these books is already exciting. But what I mean is that doing things alongside is, I think, very motivating: picking some project or problem to solve and then working on it. That keeps you motivated. Because otherwise, I mean, it may be beneficial to just learn all the math up front; it's maybe efficient, because then you don't have to look things up when you learn deep learning. But I think you will have a hard time maintaining your focus, because at some point you get bored or something like that. So you will think, why am I doing this, uh; Sanyam Bhutani: Hehe Dr. Sebastian Raschka: nothing is exciting, right? So one way I also learned a lot, the same way I picked up coding practices with pandas, is by picking a project that is kind of interesting or cool, something I liked. It was fun. In my case, I was doing fantasy sports predictions. There was, I don't know if the website still exists, but there was a website where you could, on weekends, submit your roster for Premier League soccer, basically. So you selected the 11 players for your team. And it was basically a constrained optimization problem: you had a certain budget, you could buy certain players, and they got scores based on how well they did in the real world. You also had injuries and things like that, and you wanted to maximize the number of points given your salary budget. So there was a lot of data, and it was kind of exciting because I like soccer. It was interesting to watch soccer, observe, and then tinker with the data.
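[Editor's note: for readers curious what such a roster-selection problem looks like in code, here is a minimal, hypothetical sketch in Python. The player pool, costs, and predicted points are invented, the team size is shrunk from 11 to 4 to keep the search tiny, and real fantasy rules add positional constraints that this brute-force version ignores. The predicted-points column is where a machine learning model would plug in.

    # Pick the roster that maximizes predicted points under a salary cap.
    from itertools import combinations

    players = {
        # name: (cost, predicted_points), made-up numbers
        "Keeper A": (4.5, 12.0),
        "Defender B": (5.0, 15.5),
        "Defender C": (4.0, 11.0),
        "Midfielder D": (8.5, 24.0),
        "Midfielder E": (7.0, 19.5),
        "Forward F": (11.0, 30.0),
        "Forward G": (6.5, 16.0),
    }

    BUDGET = 26.0   # total salary budget
    TEAM_SIZE = 4   # a real roster would have 11 slots

    best_points, best_team = 0.0, None
    for team in combinations(players, TEAM_SIZE):
        cost = sum(players[p][0] for p in team)
        points = sum(players[p][1] for p in team)
        if cost <= BUDGET and points > best_points:
            best_points, best_team = points, team

    print(best_team, best_points)

At real scale, with hundreds of players and 11 slots, brute force becomes infeasible; an integer-programming solver is the usual tool for this kind of knapsack-style problem.]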

Dr. Sebastian Raschka: Collecting real data from the web, writing my own scripts to automate everything, and using machine learning to make the predictions: that was very exciting. I had this project where, whenever I learned about something new, I implemented it in my framework and experimented with it. I think that is what kept me motivated, basically. Sanyam Bhutani: I think finding a passion project, so that you don't get frustrated while your model isn't converging, is also important, especially in this field. Dr. Sebastian Raschka: Yeah, it's, I think, very important, because otherwise, I would say, it's hard to maintain the focus. So finding something you're excited about is always a good idea. Sanyam Bhutani: Yeah. Now, this has been a great interview. My final question to you is, generally speaking about the field: I believe you are also one of the moderators on arXiv, so I think you'd be seeing a lot of papers every day. Do you have any insights into where the field is headed, and how we can stay up to date with this huge overflow of trends and literature, with the Sesame Street models and things that keep shifting every year? Dr. Sebastian Raschka: Yeah, the moderation process is pretty interesting. One thing I should say about it is that we don't review papers; we just check that they have the right category, you know, whether something is categorized as machine learning or computer vision and so forth. There are actually two machine learning categories, one in computer science and one in statistics, but they are cross-referenced: if an article gets uploaded in one category, the other category is automatically assigned. We are basically three moderators for machine learning in computer science, Tom Dietterich, another moderator, and I, and we kind of share the work, so I don't do this every day, but two to three times a week. And every day there are at least 100 to 200 new papers uploaded. So this is a huge amount of information going onto arXiv for machine learning alone, and yeah, it's impossible to keep up with everything; basically, you read the headlines. But your question was basically what the current trends are, I would say. You see a little bit of everything, but certain things come up more frequently. Right now, I would say, topics related to self-supervised learning. Self-supervised learning is, I think, one of the hottest trends recently. Self-supervised learning is basically supervised learning, but you use information that is already in the image or in the data. For example, if you have images, there is this jigsaw example, where you chop up an image into different sub-images, shuffle them, and have the network predict the right order in which to reassemble them. And if you think about it, it is kind of a supervised learning problem, but you can do this with unlimited data, because you can just generate your labels: the label is the specific order the pieces are supposed to be in. So I think it's very powerful for pre-training your networks, basically.
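[Editor's note: a minimal, hypothetical sketch of the jigsaw pretext task in Python with NumPy, just to make the "generate your own labels" idea concrete. The 2x2 grid and image size are arbitrary; the published jigsaw work uses larger grids and a convolutional network that classifies which permutation it sees.

    # Jigsaw pretext task: shuffle image patches; the permutation is the label.
    import numpy as np

    rng = np.random.default_rng(0)

    def make_jigsaw_example(image, grid=2):
        """Split a square image into grid*grid patches, shuffle them,
        and return (shuffled_patches, permutation_label)."""
        h = image.shape[0] // grid
        patches = [image[r*h:(r+1)*h, c*h:(c+1)*h]
                   for r in range(grid) for c in range(grid)]
        perm = rng.permutation(len(patches))   # e.g. [2, 0, 3, 1]
        shuffled = [patches[i] for i in perm]
        return np.stack(shuffled), perm        # free label, no annotation needed

    image = rng.random((64, 64))               # stand-in for a real photo
    x, y = make_jigsaw_example(image)
    print(x.shape, y)                          # (4, 32, 32) and the permutation

A network trained to predict the permutation has to learn something about visual structure, which is why the resulting features transfer to downstream tasks.]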
Dr. Sebastian Raschka: The other one that is very popular, and one of my students is working in that area, is graph neural networks. Recently, we've seen a lot of new projects on graph neural networks, first of all method development, but also a lot of applications: to social network graphs, but also to computational biology, everything related to molecules, basically drug discovery. Just last week, we worked on a review article; it was basically on machine learning and AI-based methods for bioactive ligand discovery and GPCR ligand recognition. We can maybe also put this in the show notes if people are interested. The main thing that came out of it, when we were reviewing the recent deep learning methods, was that they are mostly graph-based. That is actually very interesting, because traditionally, machine learning was only good for text and images; but besides text and images, there are so many other types of problems, especially in biology. So extending our toolkit of deep learning towards graphs is, I think, very cool to see.
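[Editor's note: a toy, hypothetical illustration of what "graph-based" means for molecules, in Python with NumPy: atoms become nodes, bonds become edges, and a single simplified GCN-style update mixes each atom's features with its neighbors'. The three-atom "molecule", the features, and the weights are all made up.

    # One message-passing step on the heavy atoms of ethanol (C-C-O).
    import numpy as np

    # Adjacency matrix with self-loops: atom 0 (C) - atom 1 (C) - atom 2 (O)
    A = np.array([[1, 1, 0],
                  [1, 1, 1],
                  [0, 1, 1]], dtype=float)

    # Toy node features: one-hot atom type [is_C, is_O]
    H = np.array([[1, 0],
                  [1, 0],
                  [0, 1]], dtype=float)

    W = np.random.default_rng(0).normal(size=(2, 4))  # stand-in for learned weights

    # Average over neighbors (degree-normalized), transform, apply ReLU.
    D_inv = np.diag(1.0 / A.sum(axis=1))
    H_next = np.maximum(0.0, D_inv @ A @ H @ W)
    print(H_next.shape)  # (3, 4): one new embedding per atom

Stacking a few such layers and pooling the atom embeddings into a single vector gives a molecule-level representation that can feed a property or bioactivity predictor.]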

Dr. Sebastian Raschka: Yeah, so these are, I would say, the hottest ones. But I also see a lot of papers on another very important topic: fairness in AI. There are many, many different tools for interpretability and fairness in AI, for diagnosing problems, and even for visualizing what the network is doing. That is also a very important and hot topic right now. Sanyam Bhutani: What advice do you have for someone who wants to keep up with this huge overflow, like you do? Should they go to arXiv and look at every paper that's being uploaded every day? Or how should they go about it? Dr. Sebastian Raschka: Yeah, that is a good point, how do you keep up. I think the two newsletters I mentioned are kind of pre-filtering things, so that is helpful. There's also the arXiv Sanity server. And if you have a specific research area, periodically checking for related articles, to see what is coming out and whether there are some new articles, helps. On Google Scholar, there's also this recommendation alert feature, which is one little tool in your toolbox that can also help you stay up to date. Yeah, and the one we were just talking about, where I wouldn't recommend commenting because it may be a little frustrating: reading the machine learning subreddit, just to see what new research articles are out there, is maybe also nice, because people pre-select good research articles by voting, mostly the general ones. So all these resources are very helpful for keeping up to date. But that is more like a general cross-section of the field; if you are more specific than that, I think you still need to use search engines like Google Scholar to find related work. Sanyam Bhutani: I think the Twitter community is also great, and the recommender engine on top of it as well. If you follow your favorite researchers and they like a certain post, it always pops up at the top, and maybe in the first three or four tweets you can find your paper to read for the day. Dr. Sebastian Raschka: Yeah, Twitter is, I think, the only social media website I'm really using, and my favorite one by far. Because what I like about it is that you know the people and their work; it's not anonymous. At some point, you know the different interests people have, and you have your audience, so basically you can tailor it to your interests, which is nice. Sanyam Bhutani: Yeah. Now, one last thing before we end the call: what would be the best platforms for listeners who want to follow you and your work? Dr. Sebastian Raschka: Um, yeah, I think that would be Twitter. That is an easy one. Sanyam Bhutani: Could you spell out your Twitter handle? I'll have it linked, but just in case anyone is too lazy to scroll to the bottom. Dr. Sebastian Raschka: So yeah, my Twitter handle is, uh, RASBT. So 'rasbt', R-A-S-B-T. The name is a little bit weird, but I wanted to do something that was related to my name, and all the short ones were gone back then, I think. And back then it was really cutting into your character limit, so I wanted to do something short. It's just short for Raschka Sebastian. So that's where it's coming from, anyway. Sanyam Bhutani: Okay, awesome. Thank you so much for joining me on the podcast. And thank you so much for your contributions: to open source, to research, and for creating the book. Dr. Sebastian Raschka: Yeah, thanks for the interview. That was actually really fun; I really enjoyed it. And it's Monday right now, so that was a great start to the week. I wish I could start every week like that. Sanyam Bhutani: It was an honor to have you on the show. Thanks so much. Dr. Sebastian Raschka: Thank you. Bye bye. Sanyam Bhutani: Thank you so much for listening to this episode. If you enjoyed the show, please be sure to give it a review, or feel free to shoot me a message; you can find all of the social media links in the description. If you like the show, please subscribe and tune in each week to "Chai Time Data Science."