'Trojan Source' Attack Method Allows the Injection of Vulnerabilities Into Open-Source Code


Unfortunately, vulnerabilities introduced with this method are very hard for human reviewers to detect.

To produce vulnerable binaries, Trojan Source relies on a simple technique that does not require modifying the compiler.

Malicious actors can use the technique to mount supply chain attacks because it works with some of the most widely used programming languages today.

The “Trojan Source” class of attacks, which could compromise first-party software and supply chains, was disclosed and demonstrated by researchers from the University of Cambridge.

They illustrate how an attacker may target the encoding of source code files to introduce vulnerabilities in projects written in C, C++, C#, JavaScript, Java, Rust, Go, and Python.

The trick is to use Unicode control characters to reorder tokens in source code at the encoding level.
These visually reordered tokens can be used to display logic that, while semantically correct, diverges from the logic presented by the logical ordering of source code tokens.
Compilers and interpreters adhere to the logical ordering of source code, not the visual order.

Source

By embedding control characters in comments and strings, a threat actor can reorder how source code is displayed so that the logic a reviewer sees differs from the logic that runs, creating an exploitable vulnerability.

We have discovered ways of manipulating the encoding of source code files so that human viewers and compilers see different logic. One particularly pernicious method uses Unicode directionality override characters to display code as an anagram of its true logic.

Source

According to the researchers, Unicode controls for bidirectional text (e.g., LRI, left-to-right isolate, and RLI, right-to-left isolate) can be used to specify the direction in which content is displayed. This technique has been assigned CVE-2021-42574.

LRI and RLI are invisible bidirectional (Bidi) control characters, but they are not the only ones. The others are LRE, RLE, LRO, RLO, FSI, PDF, and PDI.
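Because these characters are invisible in most editors, it can help to look for them programmatically. The following is a minimal sketch, not a tool from the researchers: it assumes the source text is already decoded to a Python string, and the function name `find_bidi_controls` and the sample text are hypothetical. The control characters are written as escape sequences so that the example itself contains none.

```python
import unicodedata

# Bidi control characters, keyed by code point and mapped to their
# usual abbreviations. Escapes keep this listing free of raw controls.
BIDI_CONTROLS = {
    "\u202a": "LRE",
    "\u202b": "RLE",
    "\u202c": "PDF",
    "\u202d": "LRO",
    "\u202e": "RLO",
    "\u2066": "LRI",
    "\u2067": "RLI",
    "\u2068": "FSI",
    "\u2069": "PDI",
}


def find_bidi_controls(text):
    """Return (line, column, abbreviation, Unicode name) for each Bidi control."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, BIDI_CONTROLS[ch], unicodedata.name(ch)))
    return hits


# Hypothetical sample: a comment that hides an override and an isolate.
sample = "access = 'user'\n# check \u202e admin \u2066 only\n"
hits = find_bidi_controls(sample)
for hit in hits:
    print(hit)
```

Reporting the line and column, rather than only a yes/no answer, makes it easier for a reviewer to open the file and inspect the exact spot.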

Figure: Trojan Source attack, Bidi override characters

By inserting these control characters, an attacker can make the compiler process code that looks nothing like what a person sees.

An attacker might “create syntactically-valid source code in most current languages where the display order of characters displays reasoning that differs from the true logic” by injecting Unicode Bidi override characters into comments and strings.

The researchers also describe a related homoglyph variant. In a homoglyph Trojan Source attack, two function names look identical to the human eye, but the compiler distinguishes the Latin “H” from the Cyrillic “Н” and treats them as two separate functions, so a call can reach a different function than the reviewer expects.
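The homoglyph case can be sketched in a few lines. This is an illustrative example under stated assumptions, not code from the study: the names `sayHello` and `scripts_in` are hypothetical, and the script check simply takes the first word of each letter's Unicode name, which is a rough heuristic rather than a full confusables check.

```python
import unicodedata


def scripts_in(identifier):
    """Return the first word of each letter's Unicode name (e.g. LATIN, CYRILLIC)."""
    return {unicodedata.name(ch).split()[0] for ch in identifier if ch.isalpha()}


latin_name = "sayHello"
spoofed_name = "say\u041dello"  # U+041D CYRILLIC CAPITAL LETTER EN, looks like "H"

# The interpreter compares code points, so the two names are different.
same_to_interpreter = latin_name == spoofed_name

# Defining both names yields two distinct functions in one namespace.
source = (
    "def sayHello():\n"
    "    return 'original'\n"
    "def say\u041dello():\n"
    "    return 'impostor'\n"
)
namespace = {}
exec(source, namespace)
say_functions = sorted(name for name in namespace if name.startswith("say"))

print(same_to_interpreter)       # False
print(len(say_functions))        # 2
print(scripts_in(spoofed_name))  # mixes LATIN and CYRILLIC
```

An identifier that mixes scripts in this way is a reasonable thing to flag for manual review, although legitimate code in non-Latin scripts should not be rejected outright.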

In a paper outlining the new Trojan Source attack, the researchers point out that bidirectional (Bidi) override characters survive copy/paste operations on most browsers, editors, and operating systems.

The researchers tested the Trojan Source attack against a variety of code editors and web-based repository front ends that are commonly used in programming and found that it worked on many of them.

As explained by BleepingComputer, the researchers identified several techniques for exploiting source code in this way:

Early Returns – a genuine ‘return’ statement is disguised so that it appears to be part of a comment or string, causing a function to return earlier than it appears to
Commenting Out – code that appears active to a human reviewer is actually placed inside a comment, so the compiler or interpreter disregards it
Stretched Strings – reordering makes code appear to sit outside a string literal when it is actually inside it
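The first of these techniques can be reproduced safely by building the malicious source from escape sequences. The sketch below is modeled on the kind of early-return example the researchers describe, but the names `bank` and `subtract_funds` and the exact layout are assumptions made for illustration. In an affected editor, the RLI character makes the trailing `''' ;return` render as if it were still inside the docstring, while the interpreter sees a docstring followed by a real `return`.

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE, written as an escape on purpose

# Source text as the interpreter receives it (logical order).
source = (
    "bank = {'alice': 100}\n"
    "\n"
    "def subtract_funds(account, amount):\n"
    "    ''' Subtract funds from bank account then " + RLI + "''' ;return\n"
    "    bank[account] -= amount\n"
    "    return\n"
)

namespace = {}
exec(compile(source, "<trojan-demo>", "exec"), namespace)

# The call returns before the subtraction ever runs.
namespace["subtract_funds"]("alice", 50)
balance = namespace["bank"]["alice"]
print(balance)  # 100
```

The balance is unchanged because the statement after the docstring is a live `return`, which is exactly the gap between the displayed logic and the executed logic that the attack depends on.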

One defense against Trojan Source is to reject text-directionality control characters in language specifications and in the compilers that implement those languages.

In most settings, this simple solution may well be sufficient. If an application wishes to print text that requires Bidi overrides, developers can generate those characters using escape sequences rather than embedding potentially dangerous characters into source code.

Source
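The escape-sequence suggestion can be illustrated with a short sketch. It assumes a check that looks only for raw Bidi controls in source text; the helper name `has_raw_bidi` is hypothetical. A string literal that spells the override as `\u202e` contains only visible ASCII characters in the file, yet still evaluates to the control character at run time.

```python
import ast

# The nine Bidi control characters, written as escapes.
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def has_raw_bidi(source_text):
    """Return True if the text contains an unescaped Bidi control character."""
    return any(ch in BIDI_CONTROLS for ch in source_text)


# Source text of a literal that uses an escape sequence: visible ASCII only.
escaped_literal = r'"\u202e"'
# Source text of the same literal with the raw character embedded.
raw_literal = '"' + "\u202e" + '"'

print(has_raw_bidi(escaped_literal))  # False: safe to accept
print(has_raw_bidi(raw_literal))      # True: should be rejected

# Both spellings produce the same run-time value.
value_from_escape = ast.literal_eval(escaped_literal)
value_from_raw = ast.literal_eval(raw_literal)
print(value_from_escape == value_from_raw)  # True
```

A check like this could run in a pre-commit hook or a CI step, so that applications which genuinely need Bidi overrides can still produce them without hiding anything from reviewers.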

Even though over two dozen software vendors are aware of the problem, various compilers are still unable to block the Trojan Source attack technique.

Because many maintainers have yet to apply a patch, the two researchers advise governments and businesses to identify their suppliers and press them to implement the necessary defenses.


Dora Tudor

Cyber Security Enthusiast

Dora is a digital marketing specialist within Heimdal™ Security. She is a content creator at heart - always curious about technology and passionate about finding out everything there is to know about cybersecurity.
