full-disclosure - Re: [WEB SECURITY] Unicode Left/Right Pointing Double Angel Quotation Mark bypass?

Message-id: <005e01c9e709$04c21540$0e463fc0$@com>
Date: Sat, 06 Jun 2009 17:43:47 -0700
From: "Chris Weber" <chris@...abasec.com>
To: "'Arian J. Evans'" <arian.evans@...chronic.com>,
	"'Stephen de Vries'" <stephen@...steddelight.org>,
	"'Full-Disclosure'" <full-disclosure@...ts.grok.org.uk>,
	<websecurity@...appsec.org>
Subject: Re: [WEB SECURITY] Unicode Left/Right Pointing
	Double Angel Quotation	Mark bypass?

Here's a common pattern in software, especially transcoders, converters,
parsers, filters, the good stuff:

1. accept input from any charset (e.g. shift_jis, iso-8859-x)
2. convert input to Unicode for further processing
3. convert input back to its original charset for output/display

Now, lots can go wrong here, too much to get into right now, but in general,
what happens when there's no clear mapping between steps 1 and 2, or between
steps 2 and 3?  Quite often characters will get mapped to the private use
area (PUA), and at that point it's up to the vendor/framework to handle it
how they want.
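
To make the three steps concrete, here is a minimal sketch using Python's
built-in codecs. The helper name round_trip is made up for illustration, and
Python's codecs either raise an error or substitute '?' rather than mapping
to the PUA, so this only shows where a lossy policy decision has to be made,
not how any particular vendor makes it.

```python
def round_trip(raw: bytes, charset: str) -> bytes:
    """Steps 1-3: accept bytes in a charset, go to Unicode, convert back."""
    text = raw.decode(charset)      # steps 1 and 2: charset -> Unicode
    return text.encode(charset)     # step 3: Unicode -> original charset


# Clean mapping in both directions: the bytes come back unchanged.
latin1_guillemet = round_trip(b"\xab", "iso-8859-1")   # 0xAB <-> U+00AB

# No mapping: U+FF1C (fullwidth less-than) has no ISO-8859-1 equivalent.
fullwidth_lt = "\uff1c"
try:
    fullwidth_lt.encode("iso-8859-1")
    strict_result = "encoded"
except UnicodeEncodeError:
    strict_result = "error"

# A lenient policy silently substitutes a different character instead.
lenient_result = fullwidth_lt.encode("iso-8859-1", errors="replace")
```

The strict call refuses the conversion, while the lenient call quietly
changes the data. Anything a converter does beyond these two options
(best-fit, PUA, dropping the character) is its own policy choice.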

But that's not normalization.  I just want readers to be clear that
'Normalization' in Unicode isn't a squishy 'maybe this maybe that' sort of
thing where characters and strings transform by the whim of a given
framework.  It's a well-defined standard of both 'compatibility' and
'canonical' decomposition and recomposition.  Some of the mappings are
data-driven, X == Y, and some are algorithmic.  If all frameworks implement
this standard, they should all normalize the same.  Differences would occur
when one framework uses an old version of the Unicode database, or of
course if they botched their implementation of the spec.
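
As a small illustration of how fixed these mappings are, the sketch below
uses Python's standard unicodedata module. The variable names are only for
illustration. It shows one canonical mapping, one compatibility mapping, and
the fact that U+00AB (the left-pointing double angle quotation mark from the
subject line) has no decomposition, so no normalization form changes it.

```python
import unicodedata

# Canonical: 'e' + combining acute accent (U+0301) composes to U+00E9 in NFC.
composed = unicodedata.normalize("NFC", "e\u0301")

# Compatibility: fullwidth '<' (U+FF1C) folds to ASCII '<' under NFKC,
# but is left alone by the canonical-only form NFC.
nfc_fullwidth = unicodedata.normalize("NFC", "\uff1c")
nfkc_fullwidth = unicodedata.normalize("NFKC", "\uff1c")

# U+00AB has no decomposition mapping, so all four forms leave it unchanged.
guillemet_forms = {
    form: unicodedata.normalize(form, "\u00ab")
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# The Unicode database version this interpreter was built with; differing
# versions are one legitimate source of differing results.
unicode_db_version = unicodedata.unidata_version
```

So if an application turns U+00AB into '<', something other than Unicode
normalization is doing it.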

Your discussion point #2 seems to digress; the confusables and lookalikes
don't seem to bear on the original subject.  Unless you're suggesting that
they somehow add to the canonicalization of strings that WhiteHat is
seeing?  If someone wants to study the 'best-fit' mappings
between charsets and Unicode, they're mostly available at
http://unicode.org/Public/MAPPINGS/.  Whether or not they've been
implemented that way by each vendor is another study.

-Chris




-----Original Message-----
From: arian.evans@...il.com [mailto:arian.evans@...il.com] On Behalf Of
Arian J. Evans
Sent: Friday, June 05, 2009 4:31 PM
To: Stephen de Vries; Full-Disclosure; websecurity@...appsec.org
Subject: Re: [WEB SECURITY] Unicode Left/Right Pointing Double Angel
Quotation Mark bypass?

response inline

On Thu, Jun 4, 2009 at 11:23 PM, Stephen de Vries
<stephen@...steddelight.org> wrote:
>
> Hi Arian,
>> http://jeremiahgrossman.blogspot.com/2009/06/results-unicode-leftright-pointing.html
>
> Was there a common library or framework in all the vulnerable sites that
> was responsible for this?

Excellent question. Chris's response covers part of this, but I will
add below where I disagree with his 99% excellent response.


Long and short: Yes, I think so, though blind black-box testing only
tells you that the culprit is guilty, not *who* the culprit is.
:)

There are three classes of these issues, and they all occur for
different reasons. Chris already excellently addressed this,
ironically using three categories as well, though I label them a bit
differently:

1. Valid but Alternate Encodings that are normalized
2. Literal Transcodings that occur to avoid one issue (security,
false-familiar, name-collision, etc.) while creating a new
vulnerability
3. Interpreter Bugs that were truly unintended

---

1. Valid Alternate Unicode Encodings that are Normalized:

Some of these issues, like Fullwidth encoding, use valid and
legitimate Unicode representations that the software normalizes to a
canonical form. However uncommon and unexpected the encoding may be --
when you find these they tend to be broadly spread in an application,
and my speculation is that they are the result of a framework or
near-universally used library.

These are the most common issues (though not in the dataset I
reported yesterday). They tend to be "vendor" or "open source
framework" issues.
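
A minimal sketch of how this class of issue typically bites, assuming a
hypothetical input filter (naive_filter) that runs before a framework-level
normalization step (framework_normalize, modeled here as plain NFKC). Both
function names are invented for illustration and do not refer to any
specific product.

```python
import unicodedata


def naive_filter(value: str) -> bool:
    """Hypothetical filter: accept input only if it has no ASCII < or >."""
    return "<" not in value and ">" not in value


def framework_normalize(value: str) -> str:
    """Stand-in for a library step that normalizes to NFKC."""
    return unicodedata.normalize("NFKC", value)


payload = "\uff1cscript\uff1e"   # fullwidth '<' and '>' around 'script'

# Vulnerable order: filter first, normalize afterwards.
passed_filter = naive_filter(payload)
rendered = framework_normalize(payload)

# Safer order: normalize first, then filter the normalized value.
safe_order_passed = naive_filter(framework_normalize(payload))
```

The fullwidth payload passes the filter and then becomes ASCII markup in the
normalization step. Because that step lives in a shared library, the same
behavior shows up everywhere the library is used.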


2. Transliteral transcodings (a=A because they look the same) including:

+ in Unicode terms "whole & mixed-script confusables"
+ in language terms "false-familiars"
+ as Chris described "best-fit mappings".

All names == the same. Which, incidentally, is the problem. :)
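
For illustration only, a best-fit style conversion can be sketched as a
lookup table applied when converting to a smaller charset. The table below
(BEST_FIT) and the function to_ascii_best_fit are hypothetical; the
mapping of the double angle quotation marks to '<' and '>' is assumed for the
sake of the example and is not taken from any vendor's actual table.

```python
# Hypothetical best-fit table. Real tables are vendor-specific and differ.
BEST_FIT = {
    "\u00ab": "<",   # left-pointing double angle quotation mark (assumed)
    "\u00bb": ">",   # right-pointing double angle quotation mark (assumed)
}


def to_ascii_best_fit(text: str) -> str:
    """Convert to ASCII, using a 'looks similar' substitute when one is listed."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)                      # already ASCII
        else:
            out.append(BEST_FIT.get(ch, "?"))   # best fit, else '?'
    return "".join(out)


converted = to_ascii_best_fit("\u00abscript\u00bb")
```

With a table like this, input that contained no ASCII angle brackets comes
out of the conversion with them, which is the kind of behavior a filter
placed earlier in the pipeline cannot see.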

These are usually found in one specific location, usually specific to
a function or set of functions in an application, from what I see.

My further speculation is that they are the result of a specific
library (or emergent behavior due to a specific combination of
libraries) to facilitate a specific function in that location. Chris
addressed most of the "why" in his response, and I agree with most of
what Chris said.

These are, to a degree, Unicode problems insomuch as I think the Unicode
Consortium's recommendations cause some of them. To solve the
"confusables" and
"false-familiars" problems that are leveraged for multiple types of
fraudulent and criminal activity (phishing, luring, fraud and stalking
on social networks) Unicode recommended a set of practices to minimize
or avoid name-collisions for Security's Sake:

Unicode Security Considerations
http://unicode.org/reports/tr36/

Unicode Security Mechanisms
http://unicode.org/reports/tr39/


These security recommendations lead to another set of security issues:
the increase in available characters that can be used to launch any
given type of syntax attack.

These are the next most-common issues we see of the 3 types.

These happen for a variety of reasons: sometimes custom coding, and
increasingly in Europe we see libraries and functions baked into
frameworks and packages to do these types of things, sometimes
transparently to the end-developer using the framework.


3. Bizarre Behavior By Interpreters:

And some are simply the result of interpreter bugs/unfathomable
behavior. None of the examples I have given so far are ones that I put
in the #3 category. Yosuke Hasegawa, linked in Jeremiah's blog post on
this, gives some really good examples of what I mean here.

Namely -- sometimes interpreters (browsers like IE) do really weird or
unsafe things for no clear good reason. I spend very little time
researching these as my focus is on web software and not compiled code
interpreters, so I/we/WhiteHat tend to find these only by
accident.

There is a fuzzy line here as some interpreter bugs allow you to
exploit an application with #3, but it's not my specialty. At WhiteHat
we have been heavily researching #1 and #2 and actually have a wealth
of exploit data to share. The problem was bigger than we thought, so we
have continued to expand the scope and range of our testing in these
areas, which sometimes allows us to stumble upon a #3 issue.

These are all simply my speculations...to be clear.

I do not feign expertise on this subject like I do when it comes to
motorcycles, mistresses, and martinis.

---
Arian Evans


