
Avoiding Source Code Spoofing

Karl Williamson
March 3, 2022 18:49
The Unicode Consortium has convened a group of experts in programming languages, tooling, and 
security to provide guidance and recommendations on how to better handle 
international text in source code, as well as providing code to help 

Recent reports have highlighted problems in the review of source code 
containing non-ASCII Unicode characters (the so-called “Trojan Source 
exploit”). A person reviewing a submission of source code could be 
fooled into thinking that the code was okay, when it was actually 
malicious. The basic problem occurs when the actual text is different 
from what the reader perceives it to be, based on what is displayed. 
This can result either from the presence of characters used in 
right-to-left scripts (such as Arabic or Hebrew) that can change the 
visual ordering of text, or from the presence of characters that look 
like others (also known as “confusables”).
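As a rough illustration of the two mechanisms described above, here is a minimal Python sketch that makes both kinds of characters visible to a reviewer. It is not part of the Consortium's guidance; the function names `find_bidi_controls` and `find_non_ascii` are hypothetical, and the set of characters checked is only the explicit bidirectional formatting characters (U+202A to U+202E and U+2066 to U+2069), which is an assumption about what a reviewer would want flagged.

```python
import unicodedata

# Explicit bidirectional formatting characters: embeddings, overrides,
# isolates, and their terminators. (Assumed scope for this sketch.)
BIDI_CONTROLS = {
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def find_bidi_controls(source):
    """Return (line, column, character name) for each bidi control found."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


def find_non_ascii(source):
    """Return (line, column, character name) for each non-ASCII character.

    This only names the characters so that look-alikes become visible;
    it does not decide whether they are legitimate.
    """
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # Fall back to the code point if the character has no name.
                name = unicodedata.name(ch, "U+%04X" % ord(ch))
                hits.append((line_no, col, name))
    return hits
```

For example, `find_non_ascii("p\u0430ssword = 1")` reports a CYRILLIC SMALL LETTER A at column 2, a character that most fonts render identically to the Latin "a". Naming the characters rather than rejecting them keeps legitimate non-ASCII text usable while still showing the reviewer what is actually in the file.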

The problems here are not solely a security issue: text with different 
writing directions or confusable characters can be hard to work with. 
Finding a solution here is important from both security and usability 
points of view. Developers of source code editors or compilers should 
not be required to have a deep knowledge of Unicode to provide good user 
experience and robust security mitigations.

Unicode’s mission is to allow everyone to use their own languages on 
computers and mobile devices. The above issues are part and parcel of a 
character set that covers all the writing systems of the world – and 
have been documented in the Unicode Standard since its very first 
version in 1991. Unicode’s past efforts have focused on misleading URLs 
and identifiers, and correct visual ordering of plain text. And while 
much of this material is relevant to source code, this group of experts 
will now collect, curate, and supplement that early documentation with 
concrete recommendations to support source code editors and compilers.

While it may seem that it is easiest to simply go back to limiting 
source code to only ASCII characters, ASCII-only environments make it 
much harder to write and maintain software that can be used all over the 
world – a fundamental requirement for modern software. Moreover, this 
approach disadvantages software developers who use languages other than 

More details on the source code spoofing issue, the proposed plan, and 
formation of this group are found in document L2/22-007R2 

/Over 144,000 characters are available for adoption to help the 
Unicode Consortium’s work on digitally disadvantaged languages/
