{"id":811969,"date":"2019-04-24T19:15:20","date_gmt":"2019-04-25T02:15:20","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?post_type=msr-blog-post&#038;p=811969"},"modified":"2022-01-13T19:28:28","modified_gmt":"2022-01-14T03:28:28","slug":"froid-and-the-relational-database-query-quandary-with-dr-karthik-ramachandra","status":"publish","type":"msr-blog-post","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/articles\/froid-and-the-relational-database-query-quandary-with-dr-karthik-ramachandra\/","title":{"rendered":"Microsoft Research Podcast: Froid and the relational database query quandary with Dr. Karthik Ramachandra"},"content":{"rendered":"<p><a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/04\/Karthik-Ramachandra_Podcast_Site_11_2018_1400x788.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-580048 size-large\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/04\/Karthik-Ramachandra_Podcast_Site_11_2018_1400x788-1024x576.png\" alt=\"Dr. 
Karthik Ramachandra\" width=\"1024\" height=\"576\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/04\/Karthik-Ramachandra_Podcast_Site_11_2018_1400x788-1024x576.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/04\/Karthik-Ramachandra_Podcast_Site_11_2018_1400x788-300x169.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/04\/Karthik-Ramachandra_Podcast_Site_11_2018_1400x788-768x432.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/04\/Karthik-Ramachandra_Podcast_Site_11_2018_1400x788-1066x600.png 1066w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/04\/Karthik-Ramachandra_Podcast_Site_11_2018_1400x788-655x368.png 655w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/04\/Karthik-Ramachandra_Podcast_Site_11_2018_1400x788-343x193.png 343w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2019\/04\/Karthik-Ramachandra_Podcast_Site_11_2018_1400x788.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/p>\n<p><iframe loading=\"lazy\" src=\"https:\/\/player.blubrry.com\/id\/43462564\/\" width=\"100%\" height=\"138px\" frameborder=\"0\" scrolling=\"no\"><\/iframe><\/p>\n<h3>Episode 73 | April 24, 2019<\/h3>\n<p>In the world of relational databases, structured query language, or SQL, has long been King of the Queries, primarily because of its ubiquity and unparalleled performance. But many users prefer a mix of imperative programming, along with declarative SQL, because its user-defined functions (or UDFs) allow for good software engineering practices like modularity, readability and re-usability. 
Sadly, these benefits have traditionally come with a huge performance penalty, rendering them impractical in most situations. That bothered <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/karam\/\">Dr. Karthik Ramachandra<\/a>, a Senior Applied Scientist at <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/lab\/microsoft-research-india\/\">Microsoft Research India<\/a>, so he\u2019s spent a great deal of his career working on improving an imperative complement to SQL in database systems.<\/p>\n<p>Today, Dr. Ramachandra gives us an overview of the historic trade-offs between declarative and imperative programming paradigms, tells us some fantastic stories, including The Tale of Two Engineers and The UDF Story, Parts 1 and 2, and introduces us to <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/froid\/\">Froid<\/a> \u2013 that\u2019s F-R-O-I-D, not the Austrian psychoanalyst \u2013 which is an extensible, language-agnostic framework for optimizing imperative functions in databases, offering the benefits of UDFs without sacrificing performance.<\/p>\n<h3>Related:<\/h3>\n<ul type=\"disc\">\n<li><a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/froid\/\">FROID<\/a>: View more about Froid<\/li>\n<li><a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/podcast\">Microsoft Research Podcast<\/a>: View more podcasts on Microsoft.com<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/itunes.apple.com\/us\/podcast\/microsoft-research-a-podcast\/id1318021537?mt=2\">iTunes<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: Subscribe and listen to new podcasts each week on iTunes<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" 
href=\"https:\/\/subscribebyemail.com\/www.blubrry.com\/feeds\/microsoftresearch.xml\">Email<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: Subscribe and listen by email<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/subscribeonandroid.com\/www.blubrry.com\/feeds\/microsoftresearch.xml\">Android<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: Subscribe and listen on Android<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/open.spotify.com\/show\/4ndjUXyL0hH1FXHgwIiTWU\">Spotify<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: Listen on Spotify<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.blubrry.com\/feeds\/microsoftresearch.xml\">RSS feed<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/note.microsoft.com\/ww-registration-microsoft-research-newsletter-s.html?wt.mc_id=S-webpage_podcast\">Microsoft Research Newsletter<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: Sign up to receive the latest news from Microsoft Research<\/li>\n<\/ul>\n<hr \/>\n<h3>Transcript<\/h3>\n<p>Karthik Ramachandra: To start the story right, if you look at a database like <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/sql-server\/sql-server-2017\">Microsoft SQL Server<\/a>, which is what the focus of our work has been so far, SQL Server introduced scalar user-defined functions way back in 2000 as a means for users to be able to express their custom behavior, you know? 
Some of this custom logic is more easily expressed using imperative code, so there was a demand for it, and they introduced this feature. It was good and happy, but in a few years, people realized that scalar UDFs are good when it comes to modularity and code reuse and some other metrics, but with respect to performance, it turns out that they\u2019re evil.<\/p>\n<p><strong>Host: You\u2019re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I\u2019m your host, Gretchen Huizinga.<\/strong><\/p>\n<p><strong>Host: In the world of relational databases, structured query language, or SQL, has long been King of the Queries, primarily because of its ubiquity and unparalleled performance. But many users prefer a mix of imperative programming, along with declarative SQL, because its user-defined functions (or UDFs) allow for good software engineering practices like modularity, readability and re-usability. Sadly, these benefits have traditionally come with a huge performance penalty, rendering them impractical in most situations. That bothered Dr. Karthik Ramachandra, a Senior Applied Scientist at Microsoft Research India, so he\u2019s spent a great deal of his career working on improving an imperative complement to SQL in database systems.<\/strong><\/p>\n<p>Today, Dr. Ramachandra gives us an overview of the historic trade-offs between declarative and imperative programming paradigms, tells us some fantastic stories, including The Tale of Two Engineers and The UDF Story, Parts 1 and 2, and introduces us to Froid \u2013 that\u2019s F-R-O-I-D, not the Austrian psychoanalyst \u2013 which is an extensible, language-agnostic framework for optimizing imperative functions in databases, offering the benefits of UDFs without sacrificing performance. 
That and much more on this episode of the Microsoft Research Podcast.<\/p>\n<p><strong>Host: Karthik Ramachandra, welcome to the podcast.<\/strong><\/p>\n<p>Karthik Ramachandra: Thank you, I\u2019m happy to be here!<\/p>\n<p><strong>Host: You\u2019re a senior applied scientist at the Microsoft Research Lab in India in Bangalore, and I\u2019m lucky to have you in the booth today. Good to see you face to face. Met you at TechFest.<\/strong><\/p>\n<p>Karthik Ramachandra: Yup.<\/p>\n<p><strong>Host: Now we\u2019re in the booth.<\/strong><\/p>\n<p>Karthik Ramachandra: Yup.<\/p>\n<p><strong>Host: Tell us, what does a senior applied scientist do for a living? What gets you up in the morning?<\/strong><\/p>\n<p>Karthik Ramachandra: Well, as an applied scientist, the good thing about my job is that I get to ask the really hard questions which are not commonly asked. And the other thing is I get to work with really smart people to solve those problems. And the third thing is that the problems that I solve have the potential to impact a huge customer base, worldwide, which is very inspiring to me, and that gets me excited every day.<\/p>\n<p><strong>Host: Yeah, so on the spectrum there, there\u2019s a sliding scale of pure research to applied research, or industrial research. Where do you fall on that spectrum? Because you\u2019re working in a pure research institution, really, but it has some play in it, right?<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah, so I like to place myself in the middle of the spectrum. My interest is in taking ideas and technologies that are coming out of pure research, and then coming up with a way to make them practical or make them real in real systems and reach customers and users. 
So, I like being in the middle there, to be a bridge between research and practice, so that\u2019s where I think I would place myself.<\/p>\n<p><strong>Host: I love that, because, as we\u2019re going to find out shortly in this interview, you\u2019re also bridging some other kinds&#8230;<\/strong><\/p>\n<p>Karthik Ramachandra: Yes.<\/p>\n<p><strong>Host: &#8230;of technologies together. So maybe that\u2019s your calling in life is to \u201cbe the bridge.\u201d<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah, yeah, I think so, maybe.<\/p>\n<p><strong>Host: Well, let\u2019s set up our podcast with a \u201c<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/microsoft-virtual-earth-3d.en.uptodown.com\/windows\">Virtual Earth 3D<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u201d view of relational databases, which is the heart and soul of your work.<\/strong><\/p>\n<p>Karthik Ramachandra: Yes.<\/p>\n<p><strong>Host: Tell us about the two main programming paradigms in the relational database world and their relative strengths and weaknesses, so we\u2019re at least in a mindset to understand how your current work is reconciling them.<\/strong><\/p>\n<p>Karthik Ramachandra: Yup. So, if you look at relational databases today, the primary way to interact with the database is through this language called SQL, or structured query language, which falls under this declarative paradigm of programming, which basically says the user needs to tell the system what they need in this declarative high-level language, and the system figures out an efficient way to do what the user has asked. So that\u2019s sort of one main paradigm, or the primary way we interact with databases today. That comes with the advantage that, you know, the users can stay at a higher level of abstraction, not having to go to the detailed implementation of how things are done. 
And it also allows the system to optimize and come up with efficient algorithms to solve the query or the question that the user is trying to ask.<\/p>\n<p><strong>Host: Yeah.<\/strong><\/p>\n<p>Karthik Ramachandra: That is one paradigm, and on the other side, we have this imperative programming style, which is at a slightly lower level of abstraction in the sense that you are basically telling the system how to go about doing what you want it to do. And, as a result, you\u2019re sort of binding the system to implement it in the way you are telling it to. The advantage in imperative programming languages is that you have more scope for modularizing and reusing code and so on, but there\u2019s limited scope for the system to figure out efficient ways to do data processing.<\/p>\n<p><strong>Host: So declarative, you\u2019re telling the computer what to do but not how to do it\u2026<\/strong><\/p>\n<p>Karthik Ramachandra: Yes, yes.<\/p>\n<p><strong>Host: \u2026or telling the database what you want, but not how.<\/strong><\/p>\n<p>Karthik Ramachandra: Yes, yes. Yes.<\/p>\n<p><strong>Host: And the imperative, you have to know a bit more, or you have to be more specific about how you want it carried out?<\/strong><\/p>\n<p>Karthik Ramachandra: It\u2019s more like a preference, sort of. In many cases, you can express your requirement in either paradigm. But in my opinion, it\u2019s just a matter of choice.<\/p>\n<p><strong>Host: Well, there you go. Because on the larger scale, we\u2019re going to be talking today about why you\u2019re leaning towards the imperative paradigm, and so let\u2019s go there right now. I usually try to set my questions up in a progressive, somewhat linear manner. But your life and career and your work all kind of go together and they\u2019re intertwined, so I\u2019m going to go a little more freeform today and have you tell us some stories. Because I know you\u2019re good at storytelling. 
I want to start with one that will put a finer point on this declarative\/imperative approach, the tension between the two.<\/strong><\/p>\n<p>Karthik Ramachandra: Yes, yes.<\/p>\n<p><strong>Host: You call it a Tale of Two Engineers. You actually have a name for this story, which I love, being a lit major. Uh, tell us the story.<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah, yeah. So, this is a tale that I always use when I want to drive home the point about this problem. So, let\u2019s say there\u2019s this e-commerce company like an online retail firm, right? Typically, these companies have a large database of customers who are placing orders and also the order information, like what orders are placed and so on. So, it turns out that in one such company, there is like a manager who owns this data and is responsible for doing some data analytics on this data. And this company now has this new requirement that they want to introduce something like a rewards program where they want to help the loyal customers by having some offers and so on. So, they want to basically categorize their customers into, let\u2019s say, three categories like platinum, gold and regular, and they have some simple logic to do this. If you bought stuff worth some amount of money or more, then you are platinum, otherwise you are regular and so on. Based on how much business you do with them you fall into one of these buckets. So, she calls two of her engineers on her team and tells them, look, we have this new requirement, please go and implement this and give me a report which shows all the customers, and which bucket they fall into.<\/p>\n<p><strong>Host: Right.<\/strong><\/p>\n<p>Karthik Ramachandra: So, it turns out these two engineers get into a fight about how to do this. I mean, this happens. This is not very uncommon.<\/p>\n<p><strong>Host: Software wars.<\/strong><\/p>\n<p>Karthik Ramachandra: Yes. 
So, it turns out that these two engineers decide that they will do it in their own way and they both go off on their own. One of them happens to be an SQL expert, right? So, he\u2019s done a lot of SQL in his past years of work. So, he comes up with one complex SQL query which can answer this problem right away, right? So, it\u2019s one query, but it does the job, but probably only he can understand it because it\u2019s quite complex.<\/p>\n<p><strong>Host: Right.<\/strong><\/p>\n<p>Karthik Ramachandra: The other engineer is a programmer. He\u2019s not an SQL expert, so he writes a simple query and writes an imperative, user-defined function which, again, does the same thing in a different way, right? So, he writes it using variables and if\/then\/else and conditional branching and so on and different constructs that are common. So now both of them come back to the manager with their respective solutions. Both of them think that their solution is better, so they come to the manager and show their solution. Well, it turns out that the manager runs both of those and decides to promote one of them and fire the other one. So, this is a more dramatic part of the story. We may not fire them, but the point is that one of them is promoted and the other one is not. Can you guess the reason?<\/p>\n<p><strong>Host: I cannot. Not even. I would imagine&#8230;<\/strong><\/p>\n<p>Karthik Ramachandra: Can you guess who was fired and who was promoted?<\/p>\n<p><strong>Host: Well, I\u2019ve seen the slides, so I know the end of the story, but I know our listeners don\u2019t!<\/strong><\/p>\n<p>Karthik Ramachandra: So, it turns out that the SQL expert was promoted, and the imperative programmer was fired in this case. And the reason being that the SQL query ran in a couple of minutes over the database of millions of customers and billions of orders, whereas the imperative function took a few hours to run on their database. 
And that was not acceptable to the business, so they had to choose the more efficient solution, despite the fact that the function had other benefits to it.<\/p>\n<p><strong>Host: Okay.<\/strong><\/p>\n<p>Karthik Ramachandra: So, this sort of demonstrates the point: both solutions are correct. They give the right answer, right? They\u2019re not doing something wrong. But just because you wrote the program in a different way, you are penalized, right? So that\u2019s the tension there that I\u2019m trying to reconcile.<\/p>\n<p><strong>Host: Okay, and I love this because I already know the other end of the story, which is the cool technology you\u2019re working on right now. But let\u2019s diverge again, because there are some other setups I want to do. Normally, I wait to have you tell us about yourself until the end of the podcast, but I want you to do that now, since I also happen to know that your early academic and professional experiences led directly to the research you\u2019re doing. So, tell us how the first job you got after your graduation led to a nagging dissatisfaction that got you here now.<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah. Yeah, that\u2019s actually an interesting story. So, as part of my earlier job as a software developer and a tech lead, we had this requirement from a client where we had to build a dashboard based on data that was present in a relational database backend. 
So, we built a nice little dashboard tool which could generate these reports as the customer wanted, and we used all the good programming practices by writing modular code and following all the design patterns that are recommended as best practices for programming and so on.<\/p>\n<p><strong>Host: Hmm.<\/strong><\/p>\n<p>Karthik Ramachandra: But it turned out that the tool that we built, although it was doing its job, failed the performance requirement miserably because the scale of the data was huge, and the way we had written our tool, it could not scale to, you know, large data sets and multiple concurrent users using it at the same time. So, it was basically not able to match the performance requirement. As a result, when we did the analysis and figured out the reason, it turned out that we had to manually remove some of these good programming practices and undo some of these good things that we had done in terms of software engineering practices, and we had to rewrite a lot of our programs as huge SQL queries which did the job more efficiently, but in terms of maintainability and other factors, we lost out on something. So, in some sense, we had to trade off readability and modularity for the sake of performance. So, we did this, and the customer was happy after that, but what kept nagging me about this whole experience was, why should we make this trade-off, right? Is there a way to get both performance and this flexibility and this modularity together? I don\u2019t want to give up on either of them, right? So that\u2019s sort of what got me thinking. And when I left my job and went to grad school at IIT Bombay, I saw that my advisor, along with another PhD student at the time, was actually doing something very similar. In fact, they had recently started looking at the same problem. So, at that moment, I realized that I had come to the right place, and I just joined them and took that project forward. 
So that\u2019s how this all started.<\/p>\n<p><strong>Host: So, did you actually quit your job to go back to school because you were bothered by this, or&#8230;<\/strong><\/p>\n<p>Karthik Ramachandra: Well, uh&#8230;<\/p>\n<p><strong>Host: &#8230;was it more complex than that?<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah, I mean, it was not that I quit my job and went to grad school purely for this particular problem.<\/p>\n<p><strong>Host: Yeah.<\/strong><\/p>\n<p>Karthik Ramachandra: I had the desire to go back to school and go deeper into some area of computer science anyway, and this fell into place really well because this was also a problem that was at the back of my mind, and I saw that my advisor at IIT Bombay, Sudarshan, was also looking at the same problem with another PhD student, so it was just something that clicked immediately.<\/p>\n<p><strong>Host: I\u2019m just going to say, too, this is a kind of \u201clife wisdom\u201d thing: get a job after undergrad before you go back to graduate school to find out what the real world is doing and what problems there are, and maybe it\u2019ll inspire you for&#8230;<\/strong><\/p>\n<p>Karthik Ramachandra: Exactly, yeah. Especially for these applied research areas, right, where the problems need to be motivated by real needs.<\/p>\n<p><strong>Host: Yeah.<\/strong><\/p>\n<p>Karthik Ramachandra: It was really valuable, in hindsight. I knew that I would go to grad school when I took up my first job, but in hindsight, it\u2019s very clear to me that my experience before getting into grad school has helped a lot in the research that I\u2019ve been doing and all the work that I\u2019ve been doing.<\/p>\n<p>(music plays)<\/p>\n<p><strong>Host: All right, well let\u2019s keep this story train going. You have another one, and it has two parts. I love the fact that you title your stories. 
This one is called The UDF Story, and we haven\u2019t addressed what a UDF is, but it\u2019s integral to the imperative paradigm, so tell us the UDF Story, Part 1 and Part 2, or how UDFs went from evil to magic in less than 20 years!<\/strong><\/p>\n<p>Karthik Ramachandra: Yup. That\u2019s a nice subtitle for my story. Well, so UDF stands for user-defined function, and it\u2019s essentially a way where, in a database system, you can write imperative programs like this as user-defined functions which can be called from an SQL query. That is, from a declarative query you can call into this imperative piece of code, which will get executed as part of the query. So that\u2019s what UDF stands for. So, to start the story right, if you look at a database like Microsoft SQL Server, which is what the focus of our work has been so far, SQL Server introduced scalar user-defined functions way back in 2000 as a means for users to be able to express their custom behavior, you know? Some of this custom logic is more easily expressed using imperative code, so there was a demand for it, and they introduced this feature. It was good and happy, but in a few years, people realized that scalar UDFs are evil when it comes to performance. So they are good when it comes to modularity and code reuse and some other metrics, but with respect to performance, it turns out that they\u2019re evil, and evil is not my choice of word; it comes from articles that experts have written.<\/p>\n<p><strong>Host: And tweets and&#8230;<\/strong><\/p>\n<p>Karthik Ramachandra: And tweets and, yeah, blog posts and all that. On the internet, you will find a lot of those articles. And this is not specific to SQL Server. This problem is common to all relational databases.<\/p>\n<p><strong>Host: Sure.<\/strong><\/p>\n<p>Karthik Ramachandra: But our focus was SQL Server. 
So, it turned out that, yeah, it went to such an extent that we ourselves, as Microsoft, had to advise our customers to avoid using user-defined functions whenever performance mattered to them.<\/p>\n<p><strong>Host: So, so let\u2019s clarify. Evil means slow\u2026<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah, yeah, evil with respect to performance, yes.<\/p>\n<p><strong>Host: I mean, so \u2013 so that\u2019s just an indictment of our culture. We have zero patience. I get it, though. I mean, time is money for corporations, so it does matter. But really, we\u2019re talking slow.<\/strong><\/p>\n<p>Karthik Ramachandra: Yes, uhh\u2026<\/p>\n<p><strong>Host: All right, so I interrupted the story, and you were&#8230;<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah.<\/p>\n<p><strong>Host: &#8230;at the point where you said, Microsoft said, \u201cDon\u2019t use UDFs.\u201d<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah, so we had blog posts on MSDN and our own blogs where we advised customers to avoid using UDFs. So, there was a lot of this negativity, and that continued; I mean, there are articles even as late as 2016 where people kept complaining about it. So, this was around 2010, I think, and that was when I had started my PhD at IIT Bombay. And around 2012, together with my advisor and this other PhD student, who had by then moved to IIT Hyderabad, another IIT in India, we started working on this idea of a way to optimize such user-defined functions. So, this was a collaboration with a bunch of people, and we came up with a publication in 2014 which was sort of one of the first papers that said, you know, you can actually optimize these user-defined functions that run in a database. 
And I graduated in the same year, 2014, and joined Microsoft in Madison, where we have this lab called Gray Systems Lab.<\/p>\n<p><strong>Host: Okay.<\/strong><\/p>\n<p>Karthik Ramachandra: And in 2015, I had this question at the back of my mind: how can I take the ideas that I worked on during my PhD and make them real in the context of Microsoft\u2019s products and services? So that\u2019s when I was inspired by this paper that we had written about UDFs, and I said, okay, can I try to implement this or make this real inside of SQL Server? So I started building a prototype inside Microsoft SQL Server, and that\u2019s where Froid was born. After a year, I showed it to some people, and that got people interested, and more people joined the team to help me with this, and we were able to convince the product team that this is an improvement that should be a part of SQL Server. And so, then we went on towards productizing it, and in 2017, we had this publication that we brought out in <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/www.vldb.org\/2019\/\">VLDB<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, which is one of the popular database systems conferences.<\/p>\n<p><strong>Host: VLDB.<\/strong><\/p>\n<p>Karthik Ramachandra: VLDB.<\/p>\n<p><strong>Host: Yeah.<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah, this is one of the top-tier database journals and conferences. And then in 2018, I moved back to Bangalore, and later that same year, we publicly announced Froid as a feature of SQL Server 2019. 
And we allowed users to start using a beta or a preview version of this and we have received like really positive feedback and response from users who have tried it out and that\u2019s where some of the users said that it\u2019s now magic and all that, so those are, again, responses that we got over the internet or tweets and other social media.<\/p>\n<p><strong>Host: Yeah.<\/strong><\/p>\n<p>Karthik Ramachandra: So, it took a while but, you know, I think we are at a point where we have made some significant progress in that direction.<\/p>\n<p><strong>Host: Well, we\u2019ve kind of buried the lede here, as they say in journalism. And we\u2019re talking about this project called Froid, F-R-O-I-D. Not Freud like the psychoanalyst.<\/strong><\/p>\n<p>Karthik Ramachandra: Yes.<\/p>\n<p><strong>Host: Although, software problems could put you on the couch\u2026 Um, so tell us sort of high level \u2013 because I\u2019m going to ask you next to get us into the technical weeds \u2013 but tell us what Froid is.<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah.<\/p>\n<p><strong>Host: Specifically.<\/strong><\/p>\n<p>Karthik Ramachandra: So Froid is basically a framework for optimizing these imperative programs inside of a database system. So, database systems are known for optimizing SQL queries and running SQL queries, but not imperative programs. So&#8230;<\/p>\n<p><strong>Host: Hence the performance gap.<\/strong><\/p>\n<p>Karthik Ramachandra: Yes. So, Froid basically tries to address that gap by coming up with a novel way to optimize imperative user-defined functions inside of a relational database. So that\u2019s like the one sentence description, if you will.<\/p>\n<p><strong>Host: And that\u2019s perfect as a segue into how you did this. Let\u2019s go into the weeds and talk about Froid in more depth. 
How, technically, did you bridge the gap between these two \u201cfrenemies\u201d of declarative SQL and imperative UDFs with Froid?<\/strong><\/p>\n<p>Karthik Ramachandra: Yup. So, as I was mentioning, the intuition comes from this paper that we wrote way back in 2014, and the key intuition is that the imperative and declarative paradigms deal with the system at two different levels of abstraction. So, in order to bridge this gap, we came up with an automatic and systematic way to take these imperative programs and translate them into equivalent declarative programs, or equivalent relational algebraic expressions, to be more precise.<\/p>\n<p><strong>Host: Yeah.<\/strong><\/p>\n<p>Karthik Ramachandra: So, in a sense, we are taking imperative programs and translating them into a declarative form. So now SQL and your imperative program are both expressed in a declarative form, and now that you\u2019re in the same world, today\u2019s database query optimizers can understand what\u2019s going on inside the user-defined function and also execute it efficiently. So, the key idea, to summarize, is that we show how you can systematically and automatically translate imperative programs into a declarative form, which is relational algebra in our setting.<\/p>\n<p><strong>Host: Hence bridging the gap between the two.<\/strong><\/p>\n<p>Karthik Ramachandra: Hence bridging, exactly.<\/p>\n<p><strong>Host: And gaining performance with UDF abstraction levels in a SQL environment.<\/strong><\/p>\n<p>Karthik Ramachandra: Yes, so we have done several experiments on many of our customer workloads as well as benchmarks and so on. And we have seen really significant, order-of-magnitude performance improvements from bridging this gap. 
Mainly in situations where users face a tradeoff between good, modular code and performance.<\/p>\n<p><strong>Host: Yeah.<\/strong><\/p>\n<p>Karthik Ramachandra: With this, now we are able to say that users can still write code the way they like, maintain all the good programming practices, and not compromise on performance.<\/p>\n<p><strong>Host: All right, so give us an overview of the stats, because they\u2019re pretty impressive. Give us a sort of verbal dashboard, since this whole thing began with a dashboard, on how Froid has performed compared with the other paradigms.<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah, as I was mentioning, we have tried Froid on several workloads, and many of our early adopters who have already tried out Froid have even started writing a lot of articles about it, where they have done the comparison themselves on their own workloads. And it\u2019s very heartening to see others trying this out on their queries and UDFs and finding positive results. So, we have seen, like I said, improvements of orders of magnitude, up to, you know, hundreds of times faster, or even more than that in several cases. And the other interesting thing is that the larger your database is, the bigger the performance improvement you get.<\/p>\n<p><strong>Host: Oh, really?<\/strong><\/p>\n<p>Karthik Ramachandra: So, for smaller datasets, this may not even matter a lot, because you\u2019re dealing with a small amount of data.<\/p>\n<p><strong>Host: Yeah.<\/strong><\/p>\n<p>Karthik Ramachandra: It starts mattering all the more when you have larger datasets, which is more often the case nowadays, right? 
So, people have huge databases with terabytes of data, and being able to crunch through these large volumes of data efficiently is critical.<\/p>\n<p><strong>Host: In a short amount of time.<\/strong><\/p>\n<p>Karthik Ramachandra: In a short amount of time.<\/p>\n<p><strong>Host: Short enough amount of time.<\/strong><\/p>\n<p>Karthik Ramachandra: Yes.<\/p>\n<p><strong>Host: Can people get this code?<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah, so Froid is available as part of the Microsoft SQL Server 2019 beta, or public preview release, which is available for download on the Microsoft SQL Server website.<\/p>\n<p><strong>Host: Yeah.<\/strong><\/p>\n<p>Karthik Ramachandra: So, people can download it and try it out, and we are happy to get any feedback or thoughts that people may have about it. I would encourage people to try this out and let us know what they think.<\/p>\n<p><strong>Host: What\u2019s on the horizon for the future of Froid? What are the open problems still, and how are you tackling them as researchers and developers and engineers?<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah, Froid is just a first step, in my opinion. There\u2019s still a long way to go. Currently, Froid has some limitations in terms of the kinds of functions that it can optimize, right? So, one of the main limitations is that we don\u2019t handle loops at this point: if your imperative UDF has a loop inside it, we currently do not optimize it. But that is actively being worked on, and we have some ideas and some prototypes that we are building to address that, so that is one of the areas. The other is to expand this to a broader set of languages. Currently, we can do this for Transact-SQL, or the procedural extensions of SQL. But we plan to expand this to languages like C# and Python, which are used in the context of machine learning and data science. 
So, we want to broaden the scope in those directions.<\/p>\n<p><strong>Host: And those are active research threads that are happening now?<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah, these are active, yes.<\/p>\n<p><strong>Host: Some people have called what you have now magic. Where do you go in the next 20 years?<\/strong><\/p>\n<p>Karthik Ramachandra: (laughter) Yeah, that\u2019s uh&#8230;<\/p>\n<p><strong>Host: Don\u2019t even know.<\/strong><\/p>\n<p>Karthik Ramachandra: Leave it to them to say\u2026<\/p>\n<p>(music plays)<\/p>\n<p><strong>Host: This is about the time when I ask the question, is there anything that keeps you up at night? And there don\u2019t seem to be too many looming issues that should scare us about databases, I think. Perhaps there are some things we want to be mindful of as we outsource more of our decision-making to our understanding of data dashboards.<\/strong><\/p>\n<p>Karthik Ramachandra: Mm-hmm.<\/p>\n<p><strong>Host: What\u2019s your thinking on that? Is there anything that keeps you up at night?<\/strong><\/p>\n<p>Karthik Ramachandra: Well, I think, you know, to put it in a different way, right? Even today, what a database says to a user is, hey, talk to me in SQL, right? And then, if you talk to me in SQL, I can do what you want me to do as efficiently as I can.<\/p>\n<p><strong>Host: Mm-hmm.<\/strong><\/p>\n<p>Karthik Ramachandra: So that\u2019s, in some sense, a narrow restriction: the database expects users to talk to it in a specific way, right? I think what keeps me interested, or keeps me thinking, is that people talk in different languages and they have different ways to express their requirements. 
And in database systems, we have a lot of good technology, built over the last several decades, that has gone into these systems, and I think there is an opportunity to broaden the scope, or the reach, of database systems by accepting that there\u2019s a diversity of languages and trying to make database technology reach a broader set of users who may not be SQL experts, right? So even if I don\u2019t know SQL, I should be able to get the best out of a database system, which is not entirely true today. Froid takes one step towards that. But as I said, there\u2019s still a long way to go, and making databases more user-friendly, or programmer-friendly in some sense, or, you know, broader in reach is something that I keep thinking about. And that\u2019s the direction I want to keep pushing in. Even if I don&#8217;t know SQL really well, I should be able to get the best out of a database system. And today, that\u2019s not entirely true.<\/p>\n<p><strong>Host: So, you may not be losing sleep over it, but it\u2019s certainly something that motivates you.<\/strong><\/p>\n<p>Karthik Ramachandra: Yes, yes. I do get good sleep. It\u2019s not&#8230;<\/p>\n<p><strong>Host: That\u2019s good. Good to know.<\/strong><\/p>\n<p>Karthik Ramachandra: I don\u2019t think I\u2019m losing sleep over it. Yeah.<\/p>\n<p><strong>Host: So, I usually ask people to tell us about themselves right now\u2026<\/strong><\/p>\n<p>Karthik Ramachandra: Mm-hmm.<\/p>\n<p><strong>Host: \u2026and how they came to MSR. But you basically covered that earlier on.<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah.<\/p>\n<p><strong>Host: It\u2019s such a good story. So, let\u2019s get a little more personal here. 
What\u2019s one interesting thing, maybe a trait, a characteristic, a life event, that people might not know about you that may have influenced your career as a researcher?<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah, one thing that I can say is that I listen to a lot of music, especially Indian classical music. And I also play the tabla, which is an Indian musical instrument. Since a young age, that has been my hobby, and it has been with me in inexplicable ways. I think it has sort of enriched my life, and also, you know, I feel rejuvenated and refreshed whenever I go back to music. So, although I can\u2019t explicitly quantify how it has helped my career and my research and so on, I think it has played a very important role in my life as a whole, and I keep going back to music. In particular, you know, I\u2019m fascinated by the intricacy and the depth behind these highly evolved styles of music, one of which is Indian classical music. That\u2019s something really fascinating, and it keeps amazing me.<\/p>\n<p><strong>Host: That is not at all where I thought you might go with this, and that is fascinating. There\u2019s a big connection between math and music.<\/strong><\/p>\n<p>Karthik Ramachandra: Yes.<\/p>\n<p><strong>Host: Drilling in a little bit there, you say you can\u2019t really quantify it, and I love that you can\u2019t quantify it, because not everything is quantifiable\u2026<\/strong><\/p>\n<p>Karthik Ramachandra: Yes.<\/p>\n<p><strong>Host: \u2026but what do you think? What does your soul tell you that music does for your work?<\/strong><\/p>\n<p>Karthik Ramachandra: See, I think there are hidden connections between all these things that look different on the surface. Again, as I told you, I don&#8217;t know what the connection is explicitly, but my gut feeling is that there are strong connections. And I don&#8217;t know what happens in the brain when you listen to music. 
I mean, there have been a lot of studies that try to understand this, but I think overall it has a really positive impact, uh&#8230;<\/p>\n<p><strong>Host: Wow.<\/strong><\/p>\n<p>Karthik Ramachandra: So yeah.<\/p>\n<p><strong>Host: All right, so this has been a great conversation. As we close, I like to ask my guests to share something meaningful with our listeners. This could be advice, wisdom, inspiration, maybe even \u201cthings I know now that I wish I knew then.\u201d Is there anything you\u2019d like to say to would-be database researchers? Because now\u2019s your chance to say it.<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah, so, over the years, what I\u2019ve realized is that database systems offer a wide spectrum of interesting research challenges. All the more so as data sizes keep growing and technologies like machine learning and AI make inroads into everything that we deal with on a day-to-day basis. As those technologies become commoditized, or democratized in some sense, I think database researchers have a very important role to play in how they can be integrated more easily into applications and software in general. And so I would like to say that, you know, as a researcher, I think it\u2019s very important to continue to focus on the fundamentals, you know, the principles behind the research that we do, and always keep an eye out for the real problems that practitioners face and keep complaining about. Because as researchers and scientists, it\u2019s very easy to get attracted to intellectually challenging problems. I mean, I\u2019m not saying that\u2019s wrong, but at least as applied researchers, I think we need to be cognizant of what\u2019s happening on the ground. We need to be grounded by, you know, the problems that users face, the directions these technologies are taking us, and where there are gaps to fill. 
So, I think balancing intellectually challenging problems with practically relevant problems is very important, at least for applied research areas.<\/p>\n<p><strong>Host: Sort of the balance between the solution in search of a problem and the problem in search of a solution.<\/strong><\/p>\n<p>Karthik Ramachandra: Yeah, that\u2019s a nice way to put it, yeah.<\/p>\n<p><strong>Host: Karthik Ramachandra, thank you so much for joining us today. It\u2019s been awesome.<\/strong><\/p>\n<p>Karthik Ramachandra: It was great talking to you. Thank you so much for having me.<\/p>\n<p>(music plays)<\/p>\n<p>To learn more about Dr. Karthik Ramachandra and how Froid is delivering the best of both worlds to database users, visit <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/\">Microsoft.com\/research<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Episode 73 | April 24, 2019 In the world of relational databases, structured query language, or SQL, has long been King of the Queries, primarily because of its ubiquity and unparalleled performance. 
But many users prefer a mix of imperative programming, along with declarative SQL, because its user-defined functions (or UDFs) allow for good [&hellip;]<\/p>\n","protected":false},"author":39507,"featured_media":580048,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-content-parent":199562,"msr_hide_image_in_river":0,"footnotes":""},"research-area":[],"msr-locale":[268875],"msr-post-option":[],"class_list":["post-811969","msr-blog-post","type-msr-blog-post","status-publish","has-post-thumbnail","hentry","msr-locale-en_us"],"msr_assoc_parent":{"id":199562,"type":"lab"},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/811969","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-blog-post"}],"author":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/users\/39507"}],"version-history":[{"count":2,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/811969\/revisions"}],"predecessor-version":[{"id":811978,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/811969\/revisions\/811978"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media\/580048"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=811969"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=811969"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\
/v2\/msr-locale?post=811969"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=811969"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}