bugtraq - Re: Thoughts and a possible solution on homograph attacks

Message-ID: <42400216.2065.24383521@localhost>
Date: Tue, 22 Mar 2005 11:31:34 +1200
From: Nick FitzGerald <nick@...us-l.demon.co.uk>
To: bugtraq@...urityfocus.com
Subject: Re: Thoughts and a possible solution on homograph attacks


Duncan Simpson wrote:

> Homograph attacks might be a closed subject but nobody has mentioned this, so
> maybe I should. Surely it is possible for a web browser to apply some similar
> character mapping rules and react only if it finds something.
> 
> Thus if the IDN looks like www.ebay.com on the screen the web browser will
> notice www.ebay.com exists, pop up a warning and deny access if you just click
> OK. An option safe from those who just click OK without reading anything could
> allow access to those websites. 
> 
> The best fix would be to stop the registries granting homograph names to random
> people and revoking the existing ones with immediate effect but I do think
> this is within the power of bugtraq.

In fact, it seems that the Unicode folk are thinking about these issues 
and similar solutions.

   http://unicode.org/reports/tr36/

   Draft Unicode Technical Report #36

   Security Considerations for the Implementation of Unicode and
   Related Technology

   Summary

   This document describes security considerations that are important
   to be aware of when working with Unicode, and provides specific
   recommendations for dealing with the issues that arise.

   [ToC]

   1. Introduction

   Unicode represents a very significant advance over all previous
   methods of encoding characters. For the first time, all of the
   worlds characters could be represented in a uniform manner, for the
   first time making it feasible for the vast majority of programs to
   be globalized: built to handle any language in the world.

   In many ways, the use of Unicode makes programs much more robust and
   secure. When systems need to use a hodge-podge of different charsets
   for representing characters, it was possible to take advantage of
   differences between those charsets, or in the way in which programs
   converted to and from them.

   However, because Unicode contains such a large number of characters,
   and because it incorporates the varied writing systems of the world,
   incorrect usage can expose programs or systems to possible security
   attacks. This document describes some of the security considerations
   that should be taken into account by programmers, system analysts,
   standards-developers, and users.

   We anticipate that this document will grow over time, adding
   additional sections as needed. Initially, there are two areas that
   will be discussed: canonical representation and visual spoofing. For
   more information, see also the Unicode FAQ on Security Issues.

   Each section below presents a background information on the kinds of
   problems that can occur, then a list of specific recommendations for
   avoiding the problems.

This report discusses most of the issues raised in this thread, often
with a greater depth of understanding.  For example, several posters
disagreed with my suggestion that "homograph" was the wrong term, but
seemed to be aware only of the problems that can arise from Latin
letters that have effective equivalents in non-Latin character sets
(necessarily represented by different codepoints in Unicode).  The
examples in TR36, although its author still notes that further samples
are needed, cover many other cases of characters that are clearly not
equivalent but are visually confusable, especially when displayed at
the small-ish point/pixel sizes common in web browser
address/location/status/etc. bars.
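
As a small illustration of the "different codepoints" point (a sketch
only, not something taken from TR36), the Python below compares an
all-Latin string with one in which the Latin "a" has been swapped for
the Cyrillic small letter a.  The helper name describe() and the sample
strings are made up for this example.

```python
import unicodedata

def describe(s):
    """Return a (codepoint, Unicode name) pair for each character in s."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in s]

latin = "paypal"
# Hypothetical spoof: U+0430 CYRILLIC SMALL LETTER A replaces each Latin "a".
mixed = "p\u0430yp\u0430l"

# The two strings render almost identically in many fonts,
# but they are different sequences of codepoints.
print(latin == mixed)
for codepoint, name in describe(mixed):
    print(codepoint, name)
```

Listing the character names like this makes the substitution obvious
even when the rendered glyphs do not.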

   3. Visual Spoofing

   Visual spoofing is where a similarity in visual appearance fools a
   user, and causes him or her to take unsafe actions. This is not new
   to Unicode: it was possible to spoof simply with ASCII character:
   "inteI.com" for example, uses a capital I instead of an L. The
   infamous example here is of course "paypaI.com": 

      ... Not only was "Paypai.com" very convincing, but the scam
      artist even goes one step further. He or she is apparently
      emailing PayPal customers, saying they have a large payment
      waiting for them in their account.

      The message then offers up a link, urging the recipient to claim
      the funds. But the URL that is displayed for the unwitting victim
      uses a capital "i" (I), which looks just like a lowercase "L"
      (l), in many computer fonts. ...

                                             Beware the 'PaypaI' scam

   And the spoofs nowadays are pretty clever. One is an email that
   looks like it comes from a trusted source, like your bank. It even
   has an explicit disclaimer to not trust links in email, and directs
   you to copy text to your address bar in your browser. The text looks
   ok to you, so you won't realize that you are going to a completely
   different site, which is then set up to simulate your bank well
   enough to get your password.

   These spoofs depend on the use of visually confusable strings:

   D1.  Two different strings of Unicode characters are said to be
        visually confusable when their appearance in common fonts in
        small sizes at screen resolutions is sufficiently close that
        people easily mistake one for the other.  
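
A rough sketch of how a check along the lines of D1 (and of Duncan's
suggestion above) might look follows.  Everything in it is an
assumption made for illustration: the tiny CONFUSABLES table is
hand-written rather than taken from any Unicode data file, and
KNOWN_NAMES, skeleton() and find_lookalike() are hypothetical names.
The idea is to fold each character to one representative of its
look-alike group and then compare the folded strings.

```python
# Hand-made, deliberately tiny look-alike table (an assumption for this
# sketch; a real table would be far larger and font-dependent).
CONFUSABLES = {
    "I": "l",        # capital I vs lowercase L
    "1": "l",        # digit one vs lowercase L
    "0": "o",        # digit zero vs lowercase o
    "\u0430": "a",   # Cyrillic small a
    "\u0435": "e",   # Cyrillic small ie
    "\u043e": "o",   # Cyrillic small o
    "\u0440": "p",   # Cyrillic small er
}

# Hypothetical list of names worth protecting.
KNOWN_NAMES = ["paypal.com", "intel.com", "ebay.com"]

def skeleton(s):
    """Fold every character to the representative of its look-alike group."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in s)

def find_lookalike(displayed):
    """Return the known name that `displayed` imitates, or None.

    A name only counts as a look-alike if it differs from the known
    name as a string but folds to the same skeleton.
    """
    folded = skeleton(displayed)
    for known in KNOWN_NAMES:
        if displayed != known and skeleton(known) == folded:
            return known
    return None

print(find_lookalike("paypaI.com"))   # capital I in place of lowercase L
print(find_lookalike("paypal.com"))   # the genuine name
```

The comparison is made on the string as displayed, without lowercasing
it first, because lowercasing would turn the capital "I" into "i" and
hide exactly the substitution being looked for.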

I'll leave the rest of TR36 to those interested in reading it (it deals
extensively with IDN-based spoofing), and recommend it even though I
still disagree with the term "homograph", which TR36 uses often.  All
the definitions of "homograph" I've seen have it meaning "same spelling
but different meaning and/or pronunciation and/or origin".  My
suggestion of "pseudograph" much better captures the point of these
things, which is that they are (deliberately) constructed
"misspellings" that falsely represent something they are not, or
disguise that they are not what they represent themselves as (think of
"pseudonym" as something of an analog).

Anyway, discussion of the terminology is probably less important than
working out some useful ameliorative actions, so I'm going to drop the
homograph vs. pseudograph issue and suggest that anyone interested read
TR36.  It makes several suggestions and recommendations, but it is
still a work in progress and may be amenable to change through your
input.


Regards,

Nick FitzGerald