Sliding Right into Information Theory

Published 2018-12-03 in sections English, Math.

It’s hardly news any more, but it seems I have not blogged about my involvement last year with an interesting cryptanalysis project, which resulted in the publication Sliding right into disaster: Left-to-right sliding windows leak by Daniel J. Bernstein, me, Daniel Genkin, Leon Groot Bruinderink, Nadia Heninger, Tanja Lange, Christine van Vredendaal and Yuval Yarom, which was published at CHES 2017 and on ePrint (ePrint is the cryptographer’s version of arXiv).

This project nicely touched upon many fields of computer science: First we need systems expertise to mount a side-channel attack that uses cache timing difference to observe which line of a square-and-multiply algorithm the target process is executing. Then we need algorithm analysis required to learn from these observations partial information about the bits of the private key. This part includes nice PLy concepts like rewrite rules (see Section 3.2). Oncee we know enough about the secret keys, we can use fancy cryptography to recover the whole secret key (Section 3.4). And finally, some theoretical questions arise, such as: “How much information do we need for the attack to succeed?” and “Do we obtain this much information”, and we need some nice math and information theory to answer these.

Initially, I focused on the PL-related concepts. We programming language people are yak-shavers, and in particular “rewrite rules” just demands the creation of a DSL to express them, and an interpreter to execute them, doesn’t it? But it turned out that these rules are actually not necessary, as the key recovery can use the side-channel observation directly, as we found out later (see Section 4 of the paper). But now I was already hooked, and turned towards the theoretical questions mentioned above.

Shannon vs. Rényi

It felt good to shake the dust of some of the probability theory that I learned for my maths degree, and I also learned some new stuff. For example, it was intuitively clear that whether the attack succeeds depends on the amount of information obtained by the side channel attack, and based on prior work, the expectation was that if we know more than half the bits, then the attack would succeed. Note that for this purpose, two known “half bits” are as good as knowing one full bit; for example knowing that the secret key is either 01 or 11 (one bit known for sure) is just as good as knowing that the key is either 00 or 11.

Cleary, this is related to entropy somehow – but how? Trying to prove that the attack works if the entropy rate of the leak is > 0.5 just did not work, against all intuition. But when we started with a formula that describes when the attack succeeds, and then simplified it, we found a condition that looked suspiciously like what we wanted, namely H > 0.5, only that H was not the conventional entropy (also known as the Shannon entropy, H = −∑p ⋅ log p), but rather something else: H = −∑p², which turned to be called the collision entropy or Rényi entropy.

This resulted in Theorem 3 in the paper, and neatly answers the question when the Heninger and Shacham key recovery algorithm, extended to partial information, can be expected to succeed in a much more general setting that just this particular side-channel attack.

Markov chains and an information theoretical spin-off

The other theoretical question is now: Why does this particular side channel attack succeed, i.e. why is the entropy rate H > 0.5. As so often, Markov chains are an immensly powerful tool to answer that question. After some transformations, I managed to model the state of the square-and-multiply algorithm, together with the side-channel leak, as a markov chain with a hidden state. Now I just had to calculate its Rényi entropy rate, right? I wrote some Haskell code to do this transformation, and also came up with an ad-hoc, intuitive way of calculating the rate. So when it was time to write up the paper, I was searching for a reference that describes the algorithm that I was using…

Only I could find none! I contacted researchers who have published related to Markov chains and entropies, but they just referred me in circles, until one of them, Maciej Skórski responded. Our conversation, highly condendensed, went like this: “Nice idea, but it can’t be right, it would solve problem X” – “Hmm, but it feels so right. Here is a proof sketch.” – “Oh, indeed, cool. I can even generalize this! Let’s write a paper”. Which we did! Analytic Formulas for Renyi Entropy of Hidden Markov Models (preprint only, it is still under submission).

More details

Because I joined the sliding-right project late, not all my contributions made it into the actual paper, and therefore I published an “inofficial appendix” separately on ePrint. It contains

an alternative way to find the definitively knowable bits of the secret exponent, which is complete and can (in rare corner cases) find more bits than the rewrite rules in Section 3.1
an algorithm to calculate the collision entropy H, including how to model a side-channel attack like this one as a markov chain, and how to calculate the entropy of such a markov chain, and
the proof of Theorem 3.

I also published the Haskell code that I wrote for this projects, including the markov chain collision entropy stuff. It is not written with public consumption in mind, but feel free to ask if you have questions about this.

Note that all errors, typos and irrelevancies in that document and the code are purely mine and not of any of the other authors of the sliding-right paper. I’d like to thank my coauthors for the opportunity to join this project.

Comments

Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.

Go up