full-disclosure - Re: [WEB SECURITY] Unicode Left/Right Pointing Double Angel Quotation Mark bypass?

Message-id: <005c01c9e5ab$5e5cd400$1b167c00$@com>
Date: Fri, 05 Jun 2009 00:00:53 -0700
From: "Chris Weber" <chris@...abasec.com>
To: "'Arian J. Evans'" <arian.evans@...chronic.com>,
	"'Prasad Shenoy'" <prasad.shenoy@...il.com>
Cc: 'Full-Disclosure' <full-disclosure@...ts.grok.org.uk>,
	'3APA3A' <3APA3A@...urity.nnov.ru>, websecurity@...appsec.org
Subject: Re: [WEB SECURITY] Unicode Left/Right Pointing
	Double Angel Quotation	Mark bypass?

Two patterns in Unicode account for these behaviors:



  1. Normalization (less of what's happening here) and

  2. best-fit mappings (most of what's happening here)



The first is a true Unicode standard; the second is a loosely defined set of
mappings provided as a convenience to software vendors.  In fact, best-fit
mappings are mostly a vendor problem, and arguably not Unicode's issue at
all.



The characters WhiteHat found in their study are a mix of things.  Only two
of the characters you listed have Normalization mappings in Unicode, which
suggests most weren't normalized by some API in the stack.  You can always
count on the fullwidth Latin characters having normalization mappings,
because they all do.



U+FF1C and U+FF1E (＜ ＞) normalize to < and >, but only under the two
'compatibility' decomposition forms, NFKC and NFKD; the canonical forms NFC
and NFD leave them unchanged.
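

A quick way to see this is with Python's standard `unicodedata` module (a minimal sketch; the sample string is made up for illustration):

```python
import unicodedata

# U+FF1C FULLWIDTH LESS-THAN SIGN, "script", U+FF1E FULLWIDTH GREATER-THAN SIGN
fullwidth = "\uff1cscript\uff1e"

results = {form: unicodedata.normalize(form, fullwidth)
           for form in ("NFC", "NFD", "NFKC", "NFKD")}

for form, text in results.items():
    # Only the compatibility forms (NFKC, NFKD) fold the fullwidth
    # brackets to ASCII '<' and '>'; NFC and NFD leave them as-is.
    print(form, ascii(text))
```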



In any browser, click the following link containing fullwidth Latin
characters, and you'll see they all get transformed to their ASCII
equivalents.  That's because IDNA calls for Normalization Form KC (NFKC),
which all browsers implement in URL/IRI handling.  It's useful as a
quick-and-dirty Normalization test.



http://ＡＢＣ１２３.nottrusted.com
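

For an offline version of the same test, Python's built-in `idna` codec can stand in for the browser. It implements IDNA 2003, whose nameprep step case-folds and NFKC-normalizes non-ASCII labels; browsers may use a newer mapping, so treat this as an approximation rather than a model of any particular browser:

```python
# Python's built-in "idna" codec implements IDNA 2003; its nameprep step
# case-folds each non-ASCII label and applies NFKC before ASCII conversion.
host = "\uff21\uff22\uff23\uff11\uff12\uff13.nottrusted.com"  # ＡＢＣ１２３.nottrusted.com

ascii_host = host.encode("idna")
print(ascii_host)  # b'abc123.nottrusted.com'
```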



The other characters you guys found point to other mechanisms, not
Normalization.



U+00AB and U+00BB (« ») have no suggested mappings to U+003C and U+003E,
so this may be a case where the Web-app has implemented its own mapping.
The same goes for U+27E8 and U+27E9 (⟨ ⟩).  It's fairly uncommon for a
developer to go to that extreme, but I've seen it done too.
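

A short Python sketch of that distinction, using only the standard library: normalization leaves « » and ⟨ ⟩ untouched, so turning them into < and > takes an application-level table. `APP_FOLD` and `app_fold` below are hypothetical, written only to illustrate the kind of hand-rolled mapping described here:

```python
import unicodedata

# « » (U+00AB, U+00BB) and ⟨ ⟩ (U+27E8, U+27E9) have no decomposition
# mappings, so NFKC returns them unchanged.
suspects = "\u00ab\u00bb\u27e8\u27e9"
print(unicodedata.normalize("NFKC", suspects) == suspects)  # True

# HYPOTHETICAL application-level table, for illustration only; it is not
# part of Unicode normalization or of any vendor best-fit file.
APP_FOLD = str.maketrans({
    "\u00ab": "<",  # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    "\u00bb": ">",  # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    "\u27e8": "<",  # MATHEMATICAL LEFT ANGLE BRACKET
    "\u27e9": ">",  # MATHEMATICAL RIGHT ANGLE BRACKET
})

def app_fold(text: str) -> str:
    """Apply the hypothetical application mapping."""
    return text.translate(APP_FOLD)

print(app_fold("\u00abscript\u00bb"))  # <script>
```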



All of the others, like U+3008 to U+2039, have best-fit mappings suggested in
the Unicode documentation, located at
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt.
It's important to realize that those mappings are not a standard; they're
provided as a convenience.  Chasing these characters down can prove
useful (you've found stuff, I've found stuff), but as you're well aware
it's more complicated than that.  Vendors can implement best-fit mappings
however they want.
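

Python's codecs do not apply best-fit mappings themselves (an unencodable character raises an error by default), but a custom error handler registered with `codecs.register_error` can emulate the behavior. In this sketch the table entries and the handler name `hypothetical_best_fit` are illustrative assumptions, not data copied from bestfit1252.txt:

```python
import codecs

# HYPOTHETICAL best-fit table for illustration only. Real vendor data
# (e.g. bestfit1252.txt) may differ; these entries are NOT copied from it.
HYPOTHETICAL_BEST_FIT = {
    "\u3008": "<",  # LEFT ANGLE BRACKET
    "\u3009": ">",  # RIGHT ANGLE BRACKET
}

def best_fit_handler(exc):
    # Called for each run of characters the target code page cannot encode;
    # returns (replacement text, position to resume encoding at).
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join(HYPOTHETICAL_BEST_FIT.get(ch, "?") for ch in bad)
    return replacement, exc.end

codecs.register_error("hypothetical_best_fit", best_fit_handler)

payload = "\u3008script\u3009"
print(payload.encode("cp1252", errors="hypothetical_best_fit"))  # b'<script>'
```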



In fact, many of the major frameworks implement things differently,
including ICU, Java, .NET, and some of the native Windows libraries.  This
happens unbeknownst to many developers as strings get transformed along the
stack between APIs that use wide chars and others that use native chars.
It also happens when:



  1. a given character doesn't have a direct mapping,

  2. it's been transcoded to a different character set, or

  3. the API's design simply chose to behave that way - see
http://msdn.microsoft.com/en-us/library/dd374047(VS.85).aspx and
http://msdn.microsoft.com/en-us/library/ms404377.aspx.
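

The practical consequence for filtering is ordering: if any of these transformations happens after validation, the validator never sees the characters that end up in the output. A minimal Python sketch, using NFKC as a stand-in for whatever transformation a later layer applies (the function names are hypothetical):

```python
import unicodedata

def naive_filter(raw: str) -> bool:
    # Checks for ASCII angle brackets BEFORE any transformation happens.
    return "<" not in raw and ">" not in raw

def downstream(raw: str) -> str:
    # Hypothetical later layer that normalizes with NFKC.
    return unicodedata.normalize("NFKC", raw)

def safer_filter(raw: str) -> bool:
    # Canonicalize first, then validate the form that will actually be used.
    canonical = unicodedata.normalize("NFKC", raw)
    return "<" not in canonical and ">" not in canonical

payload = "\uff1cscript\uff1e"
print(naive_filter(payload), downstream(payload))  # True <script>
print(safer_filter(payload))                        # False
```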



Good finds, fun fun,

Chris



PS. The upcoming release of our Watcher security testing tool includes
detection of character best-fit mappings in Web-apps.





-----Original Message-----
From: arian.evans@...il.com [mailto:arian.evans@...il.com] On Behalf Of
Arian J. Evans
Sent: Thursday, June 04, 2009 4:42 PM
To: Prasad Shenoy
Cc: 3APA3A; Full-Disclosure; websecurity@...appsec.org
Subject: Re: [WEB SECURITY] Unicode Left/Right Pointing Double Angel
Quotation Mark bypass?



On Thu, Jun 4, 2009 at 4:22 PM, Prasad Shenoy <prasad.shenoy@...il.com>
wrote:

> Has %uff1c %uff1e become very common?



We have seen 44 sites in the last year at WhiteHat Security that were
vulnerable to fullwidth Unicode-encoded attacks. This one tends to be
more ubiquitous than others when you find it. In the applications weak
to this, we found roughly 200 locations vulnerable to attack across
those 44 applications, and each location would have multiple inputs,
so you are probably talking 1,000+ inputs vulnerable to attack using
this encoding.
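

The `%uff1c`/`%uff1e` form asked about above is the non-standard `%uXXXX` escape (not RFC 3986 percent-encoding). A minimal Python sketch of how a server that decodes it, and later applies NFKC, ends up with real markup; `decode_pct_u` is a hypothetical helper, not any particular server's decoder:

```python
import re
import unicodedata

# Matches the non-standard "%uXXXX" escape (four hex digits). It is not
# RFC 3986 percent-encoding; it resembles JavaScript's legacy escape() output.
PCT_U = re.compile(r"%u([0-9A-Fa-f]{4})")

def decode_pct_u(s: str) -> str:
    """Replace each %uXXXX escape with the code point it names."""
    return PCT_U.sub(lambda m: chr(int(m.group(1), 16)), s)

decoded = decode_pct_u("%uff1cscript%uff1e")
print(ascii(decoded))                          # '\uff1cscript\uff1e'
print(unicodedata.normalize("NFKC", decoded))  # <script>
```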



> I have found a few places where these
> are still exploitable. Sometime in the coming week I will post my
> observations from one particular encounter with this vulnerability to get some
> responses on what, why and how it is happening.





Interesting. I did some research here too, and found a new Unicode
standard that I think might be a culprit.

I won't be posting any more of the data in this thread. There is
simply too much of it.



Jeremiah will be posting some of it on his blog (linked below), and ultimately
there needs to be a good paper on canonicalization. None has yet been
written for the web world. The VXer crowd went through this in the
'90s with all of their encoding-evasion techniques for viruses, and
then K2's Polymorphic Shell Code tool brought similar concepts to
payloads delivered across network protocols.



Now the same notions of multiple representations and re-assembly of
data, in this case to form exploits, are rearing their head in the
webappsec world. Nothing is new under the sun. :) Attackers already
use encoding in the wild for SQL injection, and for at least one XSS
attack I have seen.



Probably 50% of the encoding techniques I know of that can be
leveraged to form attacks are not documented anywhere I can find. So I
know our community has some large knowledge gaps on this subject at the
moment, and more work is needed here.



-ae







> This email gave a good head start.....
>
> Cheers,
> Prasad Shenoy
>
> On Thu, Jun 4, 2009 at 6:10 PM, Arian J. Evans <arian.evans@...chronic.com>
> wrote:
>>
>> Hello 3APA3A -- Remember this thread you started 2 years ago? Long
>> time, no discussion on this topic... :)
>>
>> Turns out you were spot-on. We verified six different variants of
>> this. Jeremiah Grossman published details on his blog:
>>
>> http://jeremiahgrossman.blogspot.com/2009/06/results-unicode-leftright-pointing.html
>>
>> It is important to note that when you read the number counts that say:
>>
>> 11 exploitable XSS in 8 websites:
>> %u00ABscript%u00BB
>>
>> The count of "11" is "11 /path/ locations or forms in a web
>> application", not "11 vulnerable inputs". The location might be a .cgi
>> or a servlet, with 1 or dozens of inputs in that same location that
>> are all "vulnerable" to the same attack technique.
>>
>> (We call the individual inputs "attack vectors" instead of
>> "vulnerabilities" to help people group them and make them more
>> actionable. E.g., people usually don't go fix one input, but instead
>> fix the CGI, servlet, form-input/request-handler and all the
>> associated inputs at once. So reporting each input individually
>> doesn't provide any benefit besides making reports bigger.)
>>
>> Anyway, there are many more of these kinds of
>> false-familiar/transliteral transcoding and canonicalization issues.
>>
>> I will continue to feed anything interesting to Jeremiah and it will
>> probably wind up on his blog.
>>
>> Thanks again for opening my mind up to some new angles for
>> filter-evasion tricks! :)
>>
>> ciao
>>
>> --
>> Arian Evans
>> I invest most of my money in motorcycles, mistresses, and martinis.
>> The rest of it I squander.
>>
>> On Tue, May 22, 2007 at 9:52 AM, Arian J. Evans <arian@...chronic.com>
>> wrote:
>> >
>> > I'll let you know if this hits. I am running this test currently on
>> > about 600+ sites.
>> >
>> > -ae
>> >
>> > On 5/22/07, 3APA3A <3APA3A@...urity.nnov.ru> wrote:
>> >>
>> >> Dear full-disclosure@...ts.grok.org.uk,
>> >>
>> >>   By  the  way:  I saw Unicode Left Pointing Double Angle Quotation Mark
>> >>   (%u00AB) / Unicode Right Pointing Double Angle Quotation Mark (%u00BB)
>> >>   are  sometimes  translated  to '<' and '>'. Has anybody experimented
>> >>   with
>> >>
>> >>   %u00ABscript%u00BB
>> >>
>> >>   in different environments to bypass filtering in this way?
>> >>
>> >> --
>> >> http://securityvulns.com/
>> >>          /\_/\
>> >>         { , . }     |\
>> >> +--oQQo->{ ^ }<-----+ \
>> >> |  ZARAZA  U  3APA3A   } You know my name - look up my number (The Beatles)
>> >> +-------------o66o--+ /
>> >>                     |/

>>

>>

>>
----------------------------------------------------------------------------

>> Join us on IRC: irc.freenode.net #webappsec

>>

>> Have a question? Search The Web Security Mailing List Archives:

>> http://www.webappsec.org/lists/websecurity/archive/

>>

>> Subscribe via RSS:

>> http://www.webappsec.org/rss/websecurity.rss [RSS Feed]

>>

>> Join WASC on LinkedIn

>> http://www.linkedin.com/e/gis/83336/4B20E4374DBA

>>

>
> --
> Thought for the day -
> "Emails can hurt feelings. If this one did, please ignore your feelings."
>






_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/