I've developed an algorithm that I believe should theoretically improve upon a previous work. In that work, the authors reported results for both a baseline method and their own method, but they did not release any code. Since this is in the machine learning domain, I understand that it can be difficult to replicate their method's results for a variety of reasons. However, the baseline they used is very simple, and I am confident I can implement it correctly.
After running some experiments, it turns out that my method's results were not better than the numbers reported for their method. However, my method improved upon my own baseline numbers by a greater margin than their method improved upon their reported baseline numbers. The problem is that my replication of the baseline gives different (worse) results than their reported baseline numbers.
Now I am considering a few options for benchmarking my work:
- Ignore their reported results and implement their method myself, then compare my method against my own implementation of theirs
- Compare the difference-of-differences, i.e. how much my method improves upon my baseline numbers versus how much their method improves upon their reported baseline numbers
- Just mention that I was unable to replicate their results due to the lack of code, and present my method's results only
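To make the second option concrete, here is a toy numeric sketch of the difference-of-differences comparison. All scores below are invented placeholders, not numbers from any paper:

```python
# Hypothetical scores purely for illustration of the diff-of-diff idea.
their_baseline = 70.0   # baseline score as reported in their paper
their_method   = 73.0   # their method's reported score
my_baseline    = 68.0   # my own replication of the baseline (worse)
my_method      = 72.5   # my method's score

their_gain = their_method - their_baseline  # improvement they report: 3.0
my_gain    = my_method - my_baseline        # improvement I observe:   4.5

# Under this comparison my method shows a larger improvement (4.5 vs 3.0)
# even though its absolute score (72.5) is below their reported 73.0.
print(f"their gain: {their_gain}, my gain: {my_gain}")
```

The catch, of course, is that this comparison is only meaningful if the two baselines are genuinely the same method run under comparable conditions, which is exactly what my failed replication calls into question.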
What would be the best option for benchmarking my method?