With the proliferation of IoT devices come increased embedded security attacks. Historically, embedded system engineers have ignored device-layer security despite the many areas of embedded devices that are vulnerable to bugs. Serial ports, radio interfaces, and even programming/debugging interfaces can all be exploited by hackers. Fuzz testing represents an important avenue available to engineers for finding weaknesses in embedded devices, and should be considered for hardening IoT device interfaces.
What is fuzz testing?
Fuzz testing is like the mythical million monkeys typing randomly to write Shakespeare. In practice, works of fiction require many random combinations to produce a simple phrase, but for embedded systems, we just need to change a few letters from a known good sentence.
Numerous commercial and open-source tools are available for implementing fuzz attacks. These tools generate strings of random bytes, also called fuzz vectors or attack vectors, and submit them to the interface being tested, keeping track of resulting behavior that could signify a bug.
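To make the idea of changing a few letters in a known good sentence concrete, here is a minimal sketch of mutation-based fuzz vector generation. The packet layout in `KNOWN_GOOD` and the function names are assumptions made up for illustration; they do not come from any particular fuzzing tool.

```python
import random


def mutate(seed: bytes, rng: random.Random, max_changes: int = 3) -> bytes:
    """Return a copy of `seed` with up to `max_changes` bytes overwritten."""
    data = bytearray(seed)
    if not data:
        return bytes(data)
    for _ in range(rng.randint(1, max_changes)):
        pos = rng.randrange(len(data))
        data[pos] = rng.randrange(256)  # may occasionally write the same value back
    return bytes(data)


def fuzz_vectors(seed: bytes, count: int, rng_seed: int = 0) -> list:
    """Generate `count` fuzz vectors from one known good packet.

    A fixed `rng_seed` makes the sequence repeatable, which helps when a
    crashing vector has to be reproduced later.
    """
    rng = random.Random(rng_seed)
    return [mutate(seed, rng) for _ in range(count)]


# Hypothetical known good packet: type byte, length byte, payload.
KNOWN_GOOD = bytes([0x01, 0x04]) + b"ping"

for vector in fuzz_vectors(KNOWN_GOOD, 5):
    print(vector.hex())
```

Each vector would then be submitted to the interface under test. Real tools apply many more mutation strategies, such as inserting, deleting, or splicing bytes.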
Fuzz testing is a numbers game, but we cannot try an infinite number of possible inputs. Instead, we focus on optimizing test time by maximizing the rate of fuzz vector submission and the effectiveness of both the fuzz vectors and the bug detection algorithms.
Fuzz testing concepts
Because many fuzz testing tools were designed to test PC applications, it’s easier to adapt them if you run your embedded code as a natively compiled PC application. Running embedded code on a PC yields a huge performance advantage, but has two drawbacks. First, PC microprocessors do not react the same as embedded microcontrollers. Second, we must re-write any code that touches hardware. However, in practice, the advantages of running on a PC outweigh the disadvantages. The real barrier is the difficulty in porting code to compile natively on the PC.
How do we know when a fuzz vector triggers a bug? A crash is easy to spot, but it’s harder to identify fuzz vectors that cause a reset. Memory overflow bugs or stray pointer writes—the type of bugs most valuable to hackers—are almost impossible to discern from outside the system as they typically do not result in a crash or a reset.
Many modern compilers, such as GCC and Clang, support memory sanitization (for example, AddressSanitizer, enabled with the -fsanitize=address flag). This marks blocks of memory as either clean or dirty, depending on whether they are in use, and flags any attempt to access dirty memory. However, memory sanitization consumes flash, RAM, and CPU cycles, making it difficult to run on embedded devices. So, instead, we may test a subset of code, build a version of the device with more resources, or use a PC.
A test’s effectiveness can be evaluated by the amount of code exercised. Here too, compilers can help, tracking which code executes by inserting bread crumb subroutine calls. The code coverage library maintains a table of usage values for each code path, incrementing them when the bread crumb executes.
However, code coverage numbers are tricky to interpret for embedded fuzz testing because much of the code is inaccessible to the fuzz vectors; for example, a device driver for a peripheral running independently of the interface. Therefore, it’s difficult to define “complete code coverage” for embedded systems—perhaps only 20% of the embedded code is accessible. Code coverage also consumes large amounts of flash, RAM, and CPU cycles and would need specialized hardware or a PC target to run.
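The bread crumb mechanism can be pictured with a small hand-instrumented sketch. A real compiler inserts the calls automatically; here the `crumb` function, the path labels, and the toy `parse_packet` routine are all hypothetical and exist only to show how the usage table fills in.

```python
from collections import Counter

# Usage table: one counter per instrumented code path.
coverage = Counter()

# Paths we consider reachable from the interface (an assumption for this toy parser).
ALL_PATHS = {"parse:enter", "parse:too_short", "parse:ping", "parse:unknown"}


def crumb(path_id: str) -> None:
    """Bread crumb: record that the code path `path_id` executed."""
    coverage[path_id] += 1


def parse_packet(packet: bytes) -> str:
    """Toy packet parser with a bread crumb on every branch."""
    crumb("parse:enter")
    if len(packet) < 2:
        crumb("parse:too_short")
        return "drop"
    if packet[0] == 0x01:
        crumb("parse:ping")
        return "pong"
    crumb("parse:unknown")
    return "drop"


def coverage_ratio() -> float:
    """Fraction of the reachable paths that have executed at least once."""
    return len(ALL_PATHS & set(coverage)) / len(ALL_PATHS)


parse_packet(b"\x01\x00")
parse_packet(b"\x01\x04ping")
print(dict(coverage), coverage_ratio())
```

Note that the ratio is computed against `ALL_PATHS`, the paths reachable from the interface, rather than against the whole image. Choosing that denominator is the hard part for embedded code.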
When the fuzz test finds a vector that causes undesired behavior, we need detailed information. Where did the bug happen? What is the state of the call stack? What is the specific type of bug? All this information helps to triage and eventually fix the bug.
Bug triage is crucial in fuzz testing. New fuzz projects often find many bugs, and we need an automatic way to determine their severity. Also, fuzz bugs tend to be blocking bugs because they often mask additional bugs further down the code path. We need a quick workaround for issues as they arise during fuzz testing.
Embedded devices do not reveal as much information about a failure as PCs do. Usually, a crash will simply cause the device to reset and restart. While this is desired in the field, it erases the device’s state, making it difficult to learn whether a crash occurred, where or why it happened, or the code path taken. The engineer must find a consistent reproducing vector and then use a debugger to trace the bad behavior and find the bug.
In fuzz testing, a test may yield thousands of crash vectors for a few bugs, giving the false impression of a buggy system. It’s important to quickly determine which vectors are associated with the same underlying bug. For embedded devices, the location of the crash itself will typically be unique for the bug, and it’s usually not required to find the full call stack trace.
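Grouping crash vectors by crash location can be sketched as follows. The `handle_packet` target and its two seeded bugs are hypothetical, and a Python exception stands in for a device crash. On real hardware the location would more likely be a faulting program counter captured by a fault handler.

```python
import traceback
from collections import defaultdict


def handle_packet(packet: bytes) -> int:
    """Hypothetical target with two seeded bugs."""
    length = packet[1]      # bug 1: IndexError when the packet is shorter than 2 bytes
    return 256 // length    # bug 2: ZeroDivisionError when the length byte is 0


def crash_location(exc: BaseException) -> str:
    """Identify a crash by the innermost function and line that raised it."""
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    return f"{frame.name}:{frame.lineno}"


def triage(vectors) -> dict:
    """Bucket crashing vectors by crash location; non-crashing vectors are dropped."""
    buckets = defaultdict(list)
    for vector in vectors:
        try:
            handle_packet(vector)
        except Exception as exc:
            buckets[crash_location(exc)].append(vector)
    return dict(buckets)


vectors = [b"", b"\x01", b"\x01\x00", b"\x02\x00", b"\x01\x04"]
buckets = triage(vectors)
for location, crashing in buckets.items():
    print(location, len(crashing), "vectors")
```

Here four crashing vectors collapse into two buckets, one per underlying bug, without needing the full call stack.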
Continuous fuzz testing
Because of the stochastic nature of fuzz tests, running them for longer periods increases their chances of finding issues. But no project plan could absorb delays from a lengthy fuzz testing cycle at the end of development.
In practice, fuzz testing would begin on its own branch after the release process. Any newly discovered bugs would be fixed in the local branch, so that the testing could continue without the new bugs blocking additional bug discovery. As part of the release cycle, bugs discovered by fuzz testing prior releases would be evaluated for inclusion in new releases. Finally, fuzz vectors that have discovered a bug should be added to normal quality assurance processes to verify the fix and to ensure these bugs aren’t inadvertently reintroduced into the code.
We should run fuzz tests of devices in different scenarios; for instance, a device responds to connection requests differently if networked. It’s impractical to run fuzz testing on every possible scenario, but we can include fuzz tests for each value of each state variable. For example, run fuzz tests with each different device type while keeping other variables the same. Then run different values for another variable, such as network connectivity state, for one device type.
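This scenario selection strategy, varying one variable at a time from a baseline, can be sketched as below. The variable names and values in `VARIABLES` are invented for illustration, and the first value of each variable is assumed to be the baseline.

```python
def one_factor_at_a_time(variables: dict) -> list:
    """Build scenarios that vary one variable at a time from a baseline.

    `variables` maps a variable name to its list of possible values; the
    first value in each list is treated as the baseline.
    """
    baseline = {name: values[0] for name, values in variables.items()}
    scenarios = [baseline]
    for name, values in variables.items():
        for value in values[1:]:
            scenarios.append({**baseline, name: value})
    return scenarios


# Hypothetical state variables for a networked device.
VARIABLES = {
    "device_type": ["sensor", "switch", "gateway"],
    "network_state": ["joined", "unjoined"],
}

scenarios = one_factor_at_a_time(VARIABLES)
for scenario in scenarios:
    print(scenario)
```

For these two variables the sketch yields four scenarios instead of the six in the full combination. The saving grows quickly as variables are added, at the cost of missing bugs that only appear when two non-baseline values combine.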
Fuzz testing architectures
Two prominent fuzz testing architectures are directed fuzzing, where fuzz vectors are specified by an engineer before the test, and coverage-guided fuzz testing, where the fuzz tool begins with an initial set of test vectors and automatically mutates them based on how well packets penetrate the code.
Additionally, not all code will run on a PC, and developing a PC simulator for an embedded application may be impractical, depending on what is being tested.
Below is a summary of four fuzz testing architectures:
- Direct interface testing on embedded hardware—running the normal production image on the embedded device with fuzz packets injected over the interface
- Packet (stack) injection testing—calling incoming packet routines directly without having to exercise the interface over the air
- Directed fuzzing with a simulator—using PC-based simulation techniques for developing and testing embedded code
- Coverage-guided fuzzing with a simulator (shown as Libfuzz below)
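The coverage-guided approach can be illustrated with a toy loop. This is a sketch under simplifying assumptions, not how any particular tool is implemented: the `target` parser and its path names are hypothetical, the target reports its own coverage rather than being instrumented by a compiler, and the only mutation is a single byte overwrite.

```python
import random


def target(packet: bytes) -> frozenset:
    """Hypothetical packet parser; returns the set of code paths it executed."""
    hit = {"enter"}
    if len(packet) >= 1 and packet[0] == 0x7E:
        hit.add("sync_ok")
        if len(packet) >= 2 and packet[1] == 0x01:
            hit.add("type_ok")
            if len(packet) >= 3 and packet[2] == 0xFF:
                hit.add("deep_path")
    return frozenset(hit)


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte (assumes `data` is not empty)."""
    out = bytearray(data)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)


def coverage_guided(seed: bytes, iterations: int, rng_seed: int = 0):
    """Keep any mutated vector that reaches code not seen before."""
    rng = random.Random(rng_seed)
    corpus = [seed]
    seen = set(target(seed))
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        hit = target(candidate)
        if not hit.issubset(seen):  # candidate penetrated further into the code
            seen |= hit
            corpus.append(candidate)
    return corpus, seen


corpus, seen = coverage_guided(b"\x00\x00\x00", iterations=20000)
print(sorted(seen))
print([vector.hex() for vector in corpus])
```

Because each retained vector becomes a starting point for further mutation, the loop can work through the nested checks one byte at a time. Purely random three-byte vectors would need all three bytes to be right at once to reach the deepest path.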
Multiple fuzz testers
After locking down an embedded device with debug interface lock and secure boot, we need to consider fuzz testing of the device’s interfaces. Many of the same tools and concepts used to secure web servers can be adapted for use with embedded devices.
Use the right tool for the job. Coverage-guided fuzzing is necessary for continuous fuzz testing, but if your code only executes on embedded hardware, directed fuzzers can be a good choice for providing some level of fuzz test coverage.
Finally, you should employ multiple fuzz testers in as many scenarios as possible, as each will test the device slightly differently, maximizing coverage and hence the security of your embedded device.
DeWitt C. Seward is a principal engineer at Silicon Labs.