Cem makes the point that all the crypto and execution protection magic that ARM is building is limited by the question of what the human holding the phone thinks is going on. If a malicious app fakes up the UI, then it can get stuff from the human, and abuse it. This problem was well known, and was the reason that NT 3.51 got a “secure attention sequence” when it went in for C2 certification under the old Orange Book. Sure, it lost its NIC and floppy drive, but it gained Control-Alt-Delete, which really does make your computer more secure.
But what happens when your phone or tablet has a super-limited set of physical buttons? Even assuming that the person knows they want to be talking to the right program, how do they know what program they’re talking to, and how do they know that in a reliable way?
One part of an answer comes from work by Chris Karlof on Conditioned-safe Ceremonies. The essential idea is that you apply Skinner-style conditioning so people get used to doing something that helps make them more secure.
One way we could bring this to the problem that Cem is looking at would be to require a physical action to enable TrustZone. Perhaps the ceremony should be that you shake your phone over an NFC pad. That’s detectable at the gyroscope level, and could bring up the authentic payments app. An app that wanted payments could send a message into a queue, and the payments app would read that queue when it comes up. (I’m assuming that there’s a shake that’s feasible for those with limited motion capabilities.)
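To make the queue idea a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: `PaymentRequest`, `PaymentQueue`, `TrustedPaymentsApp` and the `on_ceremony` hook are hypothetical names, not TrustZone or any phone OS API, and the shake and NFC signals are just booleans standing in for whatever the hardware would report. The one property it tries to show is that an ordinary app can only drop a request into the queue, and the payments UI only comes up as a result of the physical ceremony.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentRequest:
    app_id: str        # hypothetical identifier of the requesting app
    amount_cents: int
    payee: str


class PaymentQueue:
    """Hypothetical queue that ordinary apps can write to but never display."""

    def __init__(self):
        self._pending = deque()

    def submit(self, request):
        # Apps can only enqueue a request; they never draw the payment UI.
        self._pending.append(request)

    def drain(self):
        # Meant to be called only by the trusted payments app.
        items = list(self._pending)
        self._pending.clear()
        return items


class TrustedPaymentsApp:
    """Stand-in for the authentic payments app (an assumption, not a real API)."""

    def __init__(self, queue):
        self._queue = queue
        self.shown = []  # requests that have been put in front of the human

    def on_ceremony(self, shake_detected, nfc_pad_present):
        # Come up only when both halves of the physical ceremony are observed.
        if not (shake_detected and nfc_pad_present):
            return []
        requests = self._queue.drain()
        self.shown.extend(requests)
        return requests
```

In this sketch a request submitted by a shopping app just sits in the queue until `on_ceremony(True, True)` fires; a shake with no NFC pad nearby returns nothing and leaves the queue untouched. A real design would also need the queue and the sensor path to live somewhere the malicious app can’t reach, which this toy version doesn’t attempt to model.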
There are probably other conditioned-safe ceremonies that the phone creator could create, but Cem is right: indicators by themselves (even if they pass the white-hot parts COGs gauntlet) will not be noticed. If a solution exists, it will probably involve conditioning people to do the right thing without noticing.