He, along with a stellar group of cross-disciplinary colleagues, is bridging the gap with CHET, a compiler and runtime for homomorphic evaluation of tensor programs, that keeps data private while making the complexities of homomorphic encryption schemes opaque to users. Saarikivi tells us all about CHET, gives us an overview of some of his other projects, including Parasail, a novel approach to parallelizing seemingly sequential applications, and tells us how a series of unconventional educational experiences shaped his view of himself and his career as a researcher.
So, how do you even evaluate a neural network model on top of homomorphic encryption?
Which is a thing you need to be able to do before you can actually do training. So, what CHET is doing is, it is building a compiler for homomorphic encryption that automates many of these concerns that we would otherwise have to deal with by hand.
That and much more on this episode of the Microsoft Research Podcast.
Olli Saarikivi: Yes. But for now, I want to start kind of broad strokes. Olli Saarikivi: So, currently, all of my projects are, in some way or the other, about performance. And, um, well there was that privacy-preserving thing… Olli Saarikivi: …preserving privacy more accessible, while giving them a good performance when doing that. Host: Okay. But your most recent work has shifted to the thing you just mentioned: performance.
And so, I want to know what you mean by that and what prompted the pivot from your interest in testing and verification to performance. Olli Saarikivi: Yeah. It is a very broad term. So, indeed my background is in program analysis topics, like symbolic execution, software verification, that kind of stuff.
And really, we need to start thinking about these problems in like a very domain-specific way, or what are the specific constraints of what we are compiling to. On a very high level, it looks a lot like one of these accelerators. You get things like a very constrained programming model, weird performance constraints and all of these kind of low-level details that a typical developer has a hard time grappling with. Olli Saarikivi: …a bit of a higher-level language. So, is that how you would define your target audience, is developers? Now, the thing is that that developer probably will not be a cryptographer who is like intimately familiar with all the details of homomorphic encryption, but they have all of this domain-specific knowledge for their own domain.
Olli Saarikivi: And now we want to enable them to effectively use tools from homomorphic encryption in their own domain without burdening them with all of the crypto-specific details. Host: So, they can be using the tools, but not being experts in the science behind the tools. What are the technical underpinnings, motivation, rationale, and what do you hope the outcomes will be?
Olli Saarikivi: So, this is work that I did during my two internships at Microsoft Research before I became a post-doc. So, again, the point here is to make performance accessible without having to kind of go in and do all the low-level details yourself. So, you have kind of like many stages when you process input into output. And a nice way to write these kinds of programs is to write them as separate stages. One reason is that, if you write it as separate stages, you typically get some kind of buffering in between the stages.
Sometimes buffering is good, but typically, you would get kind of excessive buffering…. Olli Saarikivi: …if you just write it in these small stages that you compose together. But if you now compose it with some component that is actually correct, and guaranteed to produce properly formatted data, then all of those defensive checks in these latter components are unnecessary.
Like, you should just remove them. So, what we do instead is that we actually compile all of these stages separately into this model of computation called symbolic transducers, which is very suited to representing stream computations. And the nice thing about these is that we have a fusion operator defined, which allows us to combine many of these stages as symbolic transducers, into one big symbolic transducer. Olli Saarikivi: …happening inside there. And then when we generate code for this, we can actually get some very efficient code that does these inter-stage optimizations and removes buffering and stuff like that.
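The fusion idea described here can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Transducer` class, the `fuse` function, and the two example stages are hypothetical names invented for this sketch, and the transducers are plain (not symbolic), so the guard simplification that symbolic transducers make possible is not shown. What the sketch does show is how two separately written stages can be combined into one machine whose state is the pair of both states, so no intermediate buffer is built between them.

```python
from typing import Any, Callable, Iterable, List, Tuple

# A step function maps (state, input) to (new_state, list_of_outputs).
Step = Callable[[Any, Any], Tuple[Any, List[Any]]]


class Transducer:
    """A plain streaming stage: an initial state plus a step function."""

    def __init__(self, init: Any, step: Step) -> None:
        self.init = init
        self.step = step

    def run(self, inputs: Iterable[Any]) -> List[Any]:
        state, out = self.init, []
        for x in inputs:
            state, ys = self.step(state, x)
            out.extend(ys)
        return out


def fuse(first: Transducer, second: Transducer) -> Transducer:
    """Compose two stages into one; the fused state is the pair of states."""

    def step(state, x):
        s1, s2 = state
        s1, mids = first.step(s1, x)
        outs = []
        # Feed each intermediate value straight into the second stage
        # instead of collecting the first stage's whole output first.
        for m in mids:
            s2, ys = second.step(s2, m)
            outs.extend(ys)
        return (s1, s2), outs

    return Transducer((first.init, second.init), step)


# Stage 1 (hypothetical): parse comma-terminated decimal integers.
def parse_step(acc, ch):
    if ch == ",":
        return 0, [acc]
    return acc * 10 + int(ch), []


# Stage 2 (hypothetical): emit the running sum of the integers seen so far.
def sum_step(total, value):
    total += value
    return total, [total]


parse_ints = Transducer(0, parse_step)
running_sum = Transducer(0, sum_step)
fused = fuse(parse_ints, running_sum)

text = "12,30,5,"
staged = running_sum.run(parse_ints.run(text))  # builds an intermediate list
direct = fused.run(text)                        # single pass, no intermediate list
```

Both `staged` and `direct` hold the same running sums; the fused version simply never materializes the list of parsed integers.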
Olli Saarikivi: So, the targets for this kind of thing are mainly when you are actually dealing with enough data that throughput matters. So, typical things might be cloud query applications, so we were actually looking at an internal database system for integrating this into systems where you are already burning like lots of computational power and like running queries against your system and you want to reduce that to a lower level to save money. Host: In computation and other areas. So, are there other areas like in the regex field…? Yeah, so that is actually a direction we took this project.
So, we actually looked at doing regular expression matching using theory based on symbolic automata, which is actually important because regular expressions are not just over some small alphabets. Olli Saarikivi: But symbolic automata allow you to deal with Unicode and larger alphabets, which is the reality in dealing with strings and doing pattern matching these days.
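To make the point about large alphabets concrete, here is a minimal sketch of an automaton whose transitions are labelled with predicates over characters rather than with individual characters. The names `TRANSITIONS`, `ACCEPTING`, `is_letter`, `is_digit`, and `matches` are assumptions made up for this example and are not taken from the project discussed here. The automaton accepts one or more letters followed by one or more digits, for any Unicode letters and decimal digits, without enumerating the alphabet.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    # Any Unicode letter category (Lu, Ll, Lt, Lm, Lo).
    return unicodedata.category(ch).startswith("L")


def is_digit(ch: str) -> bool:
    # Any Unicode decimal digit, not just ASCII 0-9.
    return unicodedata.category(ch) == "Nd"


# state -> list of (predicate, next_state); the predicates leaving each
# state are disjoint, so the automaton is deterministic.
TRANSITIONS = {
    0: [(is_letter, 1)],
    1: [(is_letter, 1), (is_digit, 2)],
    2: [(is_digit, 2)],
}
ACCEPTING = {2}


def matches(text: str) -> bool:
    """Return True if text is letters followed by digits (both non-empty)."""
    state = 0
    for ch in text:
        for predicate, next_state in TRANSITIONS[state]:
            if predicate(ch):
                state = next_state
                break
        else:
            return False  # no transition applies to this character
    return state in ACCEPTING
```

A table-driven automaton would need an entry per character for every state; here three states and two predicates cover the whole of Unicode.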
And, yeah, that was actually a very fruitful line of work we are…. Olli Saarikivi: …currently beating RE2, which is this well-known kind of default library that people go for…. And how does it defy conventional wisdom? So, Parasail is a line of research that is concerned with parallelizing seemingly sequential computations. But it actually turns out that you can do symbolic execution on each of these stages, given a concrete input, and by doing symbolic execution such that you kind of assume that the starting state is unknown, you can do some pre-computation based on that input.
And this allows you to parallelize a lot of the computation before you actually have to do this final sequential step of stringing together these individually executed steps.
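A small sketch may help show what "assume that the starting state is unknown" can look like in practice. The example below is an assumption-laden toy, not the Parasail implementation: the update rule, the constant `MOD`, and the function names are all invented for illustration. The loop `acc = (acc * 31 + v) % MOD` looks inherently sequential, but each chunk of input can be executed with a symbolic start state `x`, which yields a summary of the form `a*x + b`. The summaries are computed independently, and only the short final step of stringing them together is sequential.

```python
from concurrent.futures import ThreadPoolExecutor

MOD = 1_000_003  # hypothetical modulus for the toy update rule


def run_sequential(start, values):
    """Reference: the seemingly sequential loop."""
    acc = start
    for v in values:
        acc = (acc * 31 + v) % MOD
    return acc


def summarize(chunk):
    """Execute one chunk with an unknown start state x.

    The state is tracked symbolically as a*x + b (mod MOD), so the result
    is a pair (a, b) that is valid for any concrete start state.
    """
    a, b = 1, 0
    for v in chunk:
        # (a*x + b) * 31 + v  ==  (31*a)*x + (31*b + v)
        a = (a * 31) % MOD
        b = (b * 31 + v) % MOD
    return a, b


def run_parallel(start, values, chunk_size=4):
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # Chunk summaries do not depend on each other, so they can be computed
    # concurrently. The thread pool here only shows the structure.
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(summarize, chunks))
    # Final sequential step: plug each concrete state into the next summary.
    acc = start
    for a, b in summaries:
        acc = (a * acc + b) % MOD
    return acc


values = list(range(1, 20))
sequential_result = run_sequential(5, values)
parallel_result = run_parallel(5, values)
```

The stitching loop does one multiply and one add per chunk rather than per element, which is where the available parallelism comes from in this toy setting.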
For example, doing large aggregated queries over cloud-scale databases, which have data split onto terabytes and terabytes of data style things. And then, obviously, machine learning. So, we were actually looking at parallelizing streaming computations represented as symbolic transducers. Olli Saarikivi: …that I was mentioning. But it would also be useful to parallelize that. And now the idea is to do symbolic execution on a symbolic transducer to do the parallelization. And this is why I find this project especially interesting. The meta-level idea of this kind of parallelization has been very fruitful, and when we look at specific problems, for example, this stream processing stuff or machine learning stuff, you end up with very different instantiations of the same idea.
So, we are kind of getting a lot out of this very simple idea just by applying it to different domains. Olli Saarikivi: And to be clear, it is a lot of work to apply it to a new domain, but it kind of sets the framework for the research. So, it actually turns out that many existing encryption schemes are slightly homomorphic with respect to some operations.
So, it has the property that, if you encrypt an integer A, using RSA and you get a ciphertext for A, and then you encrypt an integer B, also with RSA, so now you have two ciphertexts…. Olli Saarikivi: …so, you can multiply these two ciphertexts together. So, RSA has a special homomorphic property that if you multiply two ciphertexts together, you get a new ciphertext that is the encryption of what the multiplication of A and B would have been.
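The multiplicative property described here can be checked with a toy example. The sketch below uses unpadded "textbook" RSA with tiny primes chosen only for illustration (the values of `p`, `q`, `e` and the helper names are assumptions for this example). It is not secure, and RSA as deployed with padding does not keep this property.

```python
# Toy textbook RSA: tiny primes, no padding. Illustration only, not secure.
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)


def encrypt(m: int) -> int:
    return pow(m, e, n)


def decrypt(c: int) -> int:
    return pow(c, d, n)


a, b = 7, 12
c_a = encrypt(a)
c_b = encrypt(b)

# Multiply the ciphertexts without ever decrypting them.
c_product = (c_a * c_b) % n

# Decrypting the product of ciphertexts gives the product of the plaintexts,
# as long as a * b is smaller than n.
product = decrypt(c_product)
```

This works because `(a**e) * (b**e)` equals `(a*b)**e` modulo `n`. Note that only multiplication carries over; adding the two ciphertexts does not produce an encryption of `a + b`.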
So, what, in effect, you have done is that you have done computation on encrypted values. So, homomorphic encryption is a form of encryption that allows you to do computation on encrypted data, without having read-access to the data. For example, if you want to evaluate a polynomial, you need both multiplication and addition. And that is actually the hard part for the cryptographers to arrive at. So, the first homomorphic encryption scheme that supported both operations, and could be called fully homomorphic, was introduced ten years ago.
And the encryption schemes have come a long way since then. So, now we have encryption schemes that support both addition and multiplication of encrypted integers. The thing is that it is still a bit slower than normal computation, but the great thing about it is that it gives you a trust model that really nothing else can. Host: I want to do a little bit of a detour and bring the issue of privacy front and center because good artificial intelligence requires data and a great deal of data is stuff that we gather from what we do on the internet or put in the cloud, and without certain safeguards, like homomorphic encryption, generally things are not private, right?
What is Private AI and how can it help us? And it is very broad because there are lots of very different kinds of privacy concerns. So, if we take, for example, homomorphic encryption, what homomorphic encryption allows you to do is you can make some parts of your data encrypted. So, as a privacy concern, it addresses the concern of your data being leaked as you hand it off to someone else.
It has to learn something about your data. And this is a form of privacy that is addressed by something called differential privacy.
And, as a technique, this is completely orthogonal to homomorphic encryption and it addresses a concern that is very orthogonal to what homomorphic encryption can address. Olli Saarikivi: Making people realize, what are the actual implications of giving out their data or using it in training?
Host: I love it. So, this is actually a project that got started with this Parasail research that I was talking about previously. So, it actually turns out that doing parallelization helps with homomorphic encryption because homomorphic encryption behaves better with parallel computations than serial computations. So, this is how we got into working with homomorphic encryption in the first place. So, how do you even like evaluate a neural network model on top of homomorphic encryption?
Olli Saarikivi: For example, it selects encryption parameters automatically based on the program you want to run.