How to make your software tamperproof

This is a chapter excerpt from: Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection. Inside you will learn about tamperproofing and securing your software application.

Book cover

To tamperproof a program is to ensure that it "executes as intended," even in the presence of an adversary who tries to disrupt, monitor, or change the execution. Note that this is different from obfuscation, where the intent is to make it difficult for the attacker to understand the program. In practice, the boundary between tamperproofing and obfuscation is a blurry one: A program that is harder to understand because it's been obfuscated ought to also be more difficult to modify! For example, an attacker who can't find the decrypt () function in a DRM media player because it's been thoroughly obfuscated also won't be able to modify it or even monitor it by setting a breakpoint on it.

The dynamic obfuscation algorithms in Chapter 6 (Dynamic Obfuscation), in particular, have often been used to prevent tampering. In this book, we take the view that a pure tamperproofing algorithm not only makes tampering difficult but is also able to detect when tampering has occurred and to respond to the attack by in some way punishing the user. In practice, tamperproofing is always combined with obfuscation:

  1. 1. If you both obfuscate and tamperproof your code, an attacker who, in spite of the tamperproofing, is able to extract the code still has to de-obfuscate it in order to understand it;
  2. 2. Code that you insert to test for tampering or affect a response to tampering must be obfuscated in order to prevent the attacker from easily discovering it.
About this book
This is a chapter excerpt from, Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection.

About the authors:
Christian Collberg and Jasvir Nagra bring together techniques drawn from related areas of computer science, including cryptography, steganography, watermarking, software metrics, reverse engineering, and compiler optimization. Using extensive sample code, they show readers how to implement protection schemes ranging from code obfuscation and software fingerprinting to tamperproofing and birthmarking, and discuss the theoretical and practical limitations of these techniques.

Watermarking and tamperproofing are also related. In fact, if perfect tamperproofing were available, watermarking would be easy: Just watermark with any trivial algorithm, tamperproof, and by definition the attacker will not be able to destroy the mark! It's precisely because we don't have perfect tamperproofing that we need to worry about watermarking stealth: We have to assume that an attacker who can find a watermark will also be able to modify the program to destroy the mark.

The prototypical tamperproofing scenario is preventing an adversary from removing license-checking code from your program. So ideally, if the adversary is able to change your code on the left to the code on the right, the program would stop working for him:

One way of thinking about this is to note that the program consists of two pieces of semantics: the piece that both you and the adversary want to maintain (because it constitutes the core functionality of the program) and the piece that you want to maintain but that the adversary wants to remove or alter (the license check). Clearly, maintaining the core semantics and removing the checking semantics are both important to the adversary. If not, it would be easy for him to destroy the tamperproofing code: Simply destroy the program in its entirety! The adversary may also want to add code to the program in order to make it perform a function that you have left out, such as the print function in an evaluation copy of a program.

There are many kinds of invariants that you may want to maintain but that your user may want to violate. For example, a free PDF reader might let you fill in a form and print it but not save it for later use. This is supposed to act as an incentive for buying the premium product with more functionality. However, from the user's point of view it's also an incentive for hacking the code to add this "missing feature." Other products don't allow you to print, games don't provide you with an infinite supply of ammunition, evaluation copies stop working after a certain period of time, VoIP clients charge you money to make phone calls, DRM media players and TV set-top boxes charge you to watch movies or listen to music, and so on. In all these scenarios, someone's revenue stream is depending on their program executing exactly as they intended it to, and an adversary's revenue stream (or debit stream) depends on modifying the semantics of the program to execute the way they want it to.

While not technically a case of modifying the program, observing its execution to steal algorithms, cryptographic keys, or any other proprietary code or data from the program secrets is sometimes also considered a form of tampering. For example, a common application of tamperproofing is to protect cleartext data or the cryptographic keys themselves in a digital rights management system. If an attacker can insert new code (shown here shaded) in the media player, he will be able to catch and save the decrypted media:

Observing the player under a debugger can achieve the same result without actually modifying the code:

Here, we've set a breakpoint on the decrypt function such that, whenever we enter it, the cleartext media gets printed and execution continues. Technically, an attacker typically modifies the program with the intent to force it to choose a different execution path than the programmer intended. He can achieve this by:

  1. Removing code from and/or inserting new code into the executable file prior to execution;
  2. Removing code from and/or inserting new code into the running program;
  3. Affecting the runtime behavior of the program through external agents such as emulators, debuggers, or a hostile operating system.

A protected program can try to detect that it's running under emulation, on a modified operating system, or inside a debugger, but this turns out to be hard to do reliably. For example, how can you detect that a user has turned back the system clock so as not to trigger your "license-expired" check?We will show a few popular techniques, but in practice, tamperproofing algorithms tend to focus on making sure that the program's static code and data haven't been changed, even if this certainly isn't enough to ensure it's running properly.

Conceptually, a tamperproofing system performs two tasks. First, it monitors the program (to see if its code or data have been modified), and the execution environment (to see if this is hostile in any way). Second, once it has determined that tampering has occurred or is likely to occur, a response mechanism takes over and retaliates in a suitable manner. This can range from making the program exit gracefully to punishing the user by destroying his home directory. In simple systems, detection and response are tightly integrated, such as if (emulated()) abort(), but this makes it easy for an attacker to work backwards from the point of response to the point of detection and modify the program to bypass the tamperproofing code. It's therefore important to separate the checking and the response functions as widely as possible, both spatially (they should be far away from each other in the executable) and temporally (they should be far away from each other in the execution trace).

Typical software tamperproofing approaches rely on self-checking code, selfmodifying code, or adding a layer of interpretation. In general, these techniques are not suitable for type-safe distribution formats like Java bytecode—in Java it's simply not possible to stealthily examine your own code or generate and execute code on the fly. In this chapter, therefore, you will mostly see algorithms that protect binary executables. Some algorithms are based on the idea of splitting the program into two pieces, allowing most of the program to run without performance penalty in the clear (open to user tampering) and the remainder to run slower, but highly protected. The protected part could be run on a remote trusted machine, in a tamper-resistant hardware module such as a smartcard, or in a separate thread whose code has been heavily obfuscated.

This chapter is organized as follows. In Section 7.1_405, we give essential definitions. In Section 7.2_412, we present algorithms based on the idea of introspection, i.e., tamperproofed programs that monitor their own code to detect modifications. In Section 7.3_440, we discuss various kinds of response mechanisms. In Section 7.4_444, we cover so-called oblivious hashing algorithms that examine the state of the program for signs of tampering. In Section 7.5_453, we discuss remote tamperproofing, i.e., how we can determine that a program running on a remote machine has not been tampered with. In Section 7.6_464, we summarize the chapter.

7.1 Definitions

An adversary's goal is to force your program P to perform some action it wasn't intended to, such as playing a media file without the proper key or executing even though a license has expired. The most obvious way to reach this goal is to modify P's executable file prior to execution. But this is not the only way. The adversary could corrupt any of the stages needed to load and execute P, and this could potentially force P to execute in an unanticipated way. For example, he could force a modified operating system to be loaded; he could modify any file on the file system, including the dynamic linker; he could replace the real dynamic libraries with his own; he could run P under emulation; or he could attach a debugger and modify P's code or data on the fly.

Your goal, on the other hand, is to thwart such attacks. In other words, you want to make sure that P's executable file itself is healthy (hasn't been modified) and that the environment in which it runs (hardware, operating system, and so on) isn't hostile in any way. More specifically, you want to ensure that P is running on unadulterated hardware and operating systems; that it is not running under emulation; that the right dynamic libraries have been loaded; that P's code itself hasn't been modified; and that no external entity such as a debugger is modifying P's registers, stack, heap, environment variables, or input data.

In the following definition, we make use of two predicates, Id (P, E) and Ia (P, E), which respectively describe the integrity of the application (what the defender would like to maintain) and what constitutes a successful attack (what the attacker would like to accomplish):

Definition 7.1 (Tampering and Tamperproofing). Let Id (P, E) and Ia (P, E) be predicates over a program P and the environment E in which it executes. P is successfully tamperproofed if, throughout the execution of P, Id (P, E) holds. It is successfully attacked if, at some point during the execution of P, Ia (P, E)∧not Id (P, E), holds and this is not detectable by P.

For example, in a cracking scenario, Ia could be, "P executes like a legally purchased version of Microsoft Word," and Id could be, "The attacker has entered a legal license code, and neither the OS nor the code of P have been modified." In a DRM scenario, Ia could be, "P is able to print out the private key," and Id could be, "The protected media cannot be played unless a valid user key has been entered ∧ private keys remain private."

Conceptually, two functions, CHECK and RESPOND, are responsible for the tamperproofing. CHECK monitors the health of the system by testing a set of invariants and returning true if nothing suspicious is found. RESPOND queries CHECK to see if P is running as expected, and if it's not, issues a tamper response, such as terminating the program.

7.1.1 Checking for Tampering

CHECK can test any number of invariants, but these are the most common ones:

code checking: Check that P's code hashes to a known value:

result checking: Instead of checking that the code is correct, CHECK can test that the result of a computation is correct. For example, it is easy to check that a sorting routine hasn't been modified by testing that its output is correct:

Checking the validity of a computed result is often computationally cheaper than performing the computation itself. For example, while sorting takes O(n log n) time, checking that the output of a sort routine is in sorted order can be done in almost linear time. Result checking was pioneered by Manuel Blum [43] and has been used in commercial packages such as LEDA [145].

environment checking: The hardest thing for a program to check is the validity of its execution environment. Typical checks include, "Am I being run under emulation?", "Is there a debugger attached to my process?", and, "Is the operating system at the proper patch level?" While it might be possible to ask the operating system these questions, it's hard to know whether the answers can be trusted or if we're being lied to! The actual methods used for environment checking are highly system-specific [166].

As an example, let's consider how a Linux process would detect that it's attached to a debugger. As it turns out, a process on Linux can be traced only once. This means that a simple way to check if you're being traced is to try to trace yourself:

If the test fails, you can assume you've been attached to a debugger:

Dig Deeper on Topics Archive