This is a difficult project. The blog post seems to hint at reasonable feasibility, this stuff is hard! We build a less ambitious tool in the university lab: "ASTANA: Practical String Deobfuscation for Android Applications Using Program Slicing" [0].
Would advise to first read the reverse engineering related work. Genetic programming is just a technique best used when everything else has failed :-)
> Theoretically, we could also go for finding the semantically equivalent C code. However, last time I researched, checking semantic equivalency is a very complex problem. I think it was NP hard.
Already deciding whether two finite automata decide the same (regular) language is PSPACE-complete; it's undecidable for anything that can decide arbitrary context-free languages (which C programs can clearly do).
This is a difficult project. The blog post seems to hint at reasonable feasibility, this stuff is hard! We build a less ambitious tool in the university lab: "ASTANA: Practical String Deobfuscation for Android Applications Using Program Slicing" [0].
Would advise to first read the reverse engineering related work. Genetic programming is just a technique best used when everything else has failed :-)
[0] https://arxiv.org/pdf/2104.02612
Not exactly the same goal but in the ballpark, reminded me of the linux-libre patches (removes binary blobs and non-gpl code from the kernel)-
https://www.fsfla.org/ikiwiki/selibre/linux-libre/
> Theoretically, we could also go for finding the semantically equivalent C code. However, last time I researched, checking semantic equivalency is a very complex problem. I think it was NP hard.
Already deciding whether two finite automata decide the same (regular) language is PSPACE-complete; it's undecidable for anything that can decide arbitrary context-free languages (which C programs can clearly do).
I thought this would be an actual process, not a blog post with idle musings.
I'm having a really hard time parsing this title.
A dash between GPL and violated would probably help.