
I came across a recent paper, "Discriminating Traces with Time", which proposes a program-analysis technique based on machine learning, with an application to security (side-channel attacks). Some information:

  • This paper was published at TACAS 2017, which is a very good conference on program analysis, but one where most (if not all) attendees have little background in security or machine learning.
  • The second and third authors are professors with strong publication records in program analysis, but I couldn't find a single earlier paper of theirs related to machine learning. So I guess this is all the work of the first author, who is a PhD student.

There are 3 reasons I think this is research misconduct:

1) On page 14, under "Threats to Validity.", the authors write:

Clearly, the most significant threat to validity is whether these programs are representative of other applications. To mitigate, we considered programs not created by us nor known to us prior to this study.

This is a lie.

On the same page, immediately preceding that paragraph, is the discussion of the TextCrunchr case study. The authors provide four types of text input, and one of the four, reversed-shuffled arrays of characters, is the one that leads to the worst-case behavior the tool needs to detect. They even admit:

Although the input provided to Discriminer for analyzing TextCrunchr include carefully crafted inputs

How did they craft the input if they didn't know the program prior to the study? By their own words, this is the most significant threat to validity. So should the conclusion be that the method is not valid?
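For readers unfamiliar with why such crafted inputs matter: if a program contains a routine with a quadratic worst case, such as insertion sort (a hypothetical stand-in here, since TextCrunchr's internals aren't shown above), a reversed input maximizes its running time, and that is exactly the behavior a worst-case detector has to be fed in order to find:

```python
import time

def insertion_sort(a):
    # Classic O(n^2)-worst-case sort: a reversed input forces the
    # inner loop to shift every earlier element on each insertion.
    a = list(a)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

n = 2000
best = list(range(n))           # already sorted: near-linear pass
worst = list(range(n))[::-1]    # reversed: quadratic behavior

t0 = time.perf_counter(); insertion_sort(best);  t1 = time.perf_counter()
t2 = time.perf_counter(); insertion_sort(worst); t3 = time.perf_counter()
print(f"sorted input:   {t1 - t0:.4f}s")
print(f"reversed input: {t3 - t2:.4f}s")  # much slower
```

The point is that whoever supplies the reversed input already knows where the program's worst case lies, which is the crux of my objection.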


2) The experiments are inadequate.

The TextCrunchr case study above is said to be taken from a DARPA program. Searching for the keywords "DARPA" and "TextCrunchr" led me to another paper that demonstrates on exactly the same case study:

Symbolic Complexity Analysis Using Context-Preserving Histories. ICST 2017.

However, that paper used a sophisticated analysis to derive the inputs with reversed-shuffled arrays of characters, instead of assuming they were given.

Another DARPA case study discussed in the paper is SnapBuddy. The authors claim to have discovered a vulnerability with their tool at the beginning of page 14:

...What leaks is thus the number of 1s in the private key.

I was surprised. This case study is about modular exponentiation (modpow). It has been well known for decades that several implementations of modpow have a timing channel that leaks the entire private key, whereas in this case study, knowing only the number of 1s is practically useless. So again I searched for DARPA and SnapBuddy, and managed to find the source code of this case study in the appendix of another paper.
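For background on why the "number of 1s" finding is so weak: in a textbook square-and-multiply modpow, a data-dependent branch performs one extra multiplication exactly when a key bit is 1, so total running time correlates with the key's Hamming weight, while observing *which* iterations run longer recovers every individual bit. A minimal sketch (my own illustration, not the SnapBuddy code):

```python
def modpow_square_and_multiply(base, exponent, modulus):
    # Textbook left-to-right square-and-multiply. The branch on each
    # exponent bit is the classic timing channel: one extra modular
    # multiplication per 1-bit of the exponent.
    result = 1
    multiply_count = 0
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # always square
        if bit == '1':
            result = (result * base) % modulus    # only on 1-bits
            multiply_count += 1
    return result, multiply_count

# Total time tracks multiply_count, i.e. the number of 1s (what the
# paper's tool reports); timing each iteration separately leaks the
# position of every 1-bit, i.e. the whole key.
r, m = modpow_square_and_multiply(5, 0b1011, 97)
print(r, m)  # 71 3 — matches pow(5, 11, 97), exponent has three 1-bits
```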

This confirmed my guess: there is a vulnerability in the method standardMultiply that can leak the entire private key, which the authors of the machine-learning paper failed to recognize.

So the tool is useless even with crafted inputs.


3) This point may be controversial: I don't think this approach will ever work in practice. For example, TextCrunchr takes text as input. Assume the text is all ASCII and only 11 characters long, enough to store "HelloWorld!". The state space is 2^88. How does sampling work in this input space? The encryption in SnapBuddy handles more than 1500 bytes, i.e. the input space has more than 2^1500 elements.
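To make the scale concrete, here is the arithmetic behind those figures (my own back-of-envelope check; the 5000-sample count is illustrative, not the paper's exact number):

```python
# 11 ASCII characters at 8 bits each
textcrunchr_space = 2 ** (8 * 11)
assert textcrunchr_space == 2 ** 88

# 1500 bytes give 2^(8 * 1500) = 2^12000 possible inputs,
# so "more than 2^1500" is already a huge understatement.
snapbuddy_space = 2 ** (8 * 1500)
assert snapbuddy_space == 2 ** 12000

# Fraction of the *smaller* space covered by a few thousand samples.
samples = 5000
print(f"coverage: {samples / textcrunchr_space:.1e}")  # ~1.6e-23
```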

Amazingly, the authors managed to discover the vulnerability by sampling only a couple of thousand inputs. How can I believe their results?


Given the facts above, is this a case of data manipulation? What should I do?

To be honest, I'm very upset that a paper with this quality could get into TACAS.

John Doe
  • As should probably be clear from my answer, I really don't think this is "Primarily Opinion-based" and indeed has a pretty clear answer. – Fomite Sep 21 '17 at 16:57

1 Answer


To address one of your points:

How did they craft the input if they hadn't known the program prior to that study? By their own words, this is the most significant threat to validity. So the conclusion is this is not a valid method?

They said "prior to this study", meaning before this research began. Essentially, they weren't using a fixed set of test programs they already had on hand (and thus could presumably have designed their tool to be specifically robust to). Instead, they went out and found something new, and then used it. Using it can include things like crafting the input.

Your other objections are in the realm of "Do I think this is a good paper or not?", not research misconduct or data manipulation.

Fomite