Date: Tue, 22 May 2007 13:54:14 -0700
From: "Arian J. Evans" <arian.evans@...chronic.com>
To: "Brian Eaton" <eaton.lists@...il.com>, 
	"Web Security" <websecurity@...appsec.org>, 
	Full-Disclosure <full-disclosure@...ts.grok.org.uk>
Subject: Re: [WEB SECURITY] Re: noise about full-width
	encoding bypass?

comments inline

On 5/22/07, Brian Eaton <eaton.lists@...il.com> wrote:
>
> What surprises me is that not all codepage conversion libraries are
> doing the same thing with this data.  I've tested a few, and some of
> them are canonicalizing full-width unicode to ASCII equivalents, and
> others are not.  Where we run into trouble is where one component
> doing input validation uses one technique for canonicalization, and
> another component trying to do the actual work is using a different
> technique.  Figuring out exactly what different application platforms
> are doing would help to figure out how much of a problem this poses in
> the real world.
>
> Somebody ought to put together a test suite for this, just to see what
> different vendors have done.


Funny you should say that. :) That's one of the exact things we are
working on, but specifically from a "software defect with security
implications" perspective. What you probably really need are some
unit-test-type suites that ram a huge charset through in different
encoding types and see what happens. I am focusing on testing a small
subset of that (primarily metacharacter transforms) across a lot of
software as efficiently as possible. Anyway...
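
For a flavor of what I mean, here's a rough Python sketch of that kind
of probe (stdlib only; the handful of characters and conversion routes
below are just examples I picked, not an exhaustive suite):

    import unicodedata

    PROBES = {
        "fullwidth less-than":  "\uFF1C",   # '＜'
        "fullwidth apostrophe": "\uFF07",   # '＇'
        "fullwidth semicolon":  "\uFF1B",   # '；'
    }

    def routes(ch):
        # A few of the canonicalization behaviors different libraries exhibit.
        yield "NFKC normalize", unicodedata.normalize("NFKC", ch)
        yield "NFC normalize", unicodedata.normalize("NFC", ch)
        # Lossy downconversion to a legacy codepage, as some stacks do.
        yield "latin-1 w/ replace", ch.encode("latin-1", "replace").decode("latin-1")

    for name, ch in PROBES.items():
        for route, out in routes(ch):
            collapsed = out.isascii() and out != ch
            print("%-22s via %-20s -> %r  collapses-to-ascii=%s"
                  % (name, route, out, collapsed))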

This subject gets really confusing because people mean many different
things when they say "encoding attack" or "encoding bypass". The two
common meanings are:

1. Obfuscation of the attack through encoding, e.g., slipping by the
hall monitor (K2's polymorphic shellcode in the IDS/AV static
string-match days, etc.).

2. Evading very specific input filters: taking an attack that will get
interpreted or executed by a specific parser, and encoding it in a way
that evades those input filters but still reaches the parser in an
interpretable state.
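
A contrived sketch of #2, in Python -- the blocklist filter here is a
stand-in I made up, not any real product's filter:

    # Made-up input filter: checks for the ASCII metacharacters it knows
    # about, but is codepoint-literal, so full-width variants sail through.
    BLOCKLIST = ["<script", "'", ";", "--"]

    def naive_filter(user_input):
        lowered = user_input.lower()
        return not any(bad in lowered for bad in BLOCKLIST)   # True == "clean"

    ascii_attack = "<script>alert(1)</script>"
    fullwidth_attack = "\uFF1Cscript\uFF1Ealert(1)\uFF1C/script\uFF1E"  # ＜script＞...

    print(naive_filter(ascii_attack))       # False -- rejected
    print(naive_filter(fullwidth_attack))   # True  -- never matched the blocklist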

#2 gets confusing because people myopically focus on the
parser/interpreter that is the *target* of the attack, and debate that
parser's ability to execute a given input encoding type... which may
have nothing to do with the intermediary functions & transforms
performed on the attack data on the way to the target parser. Things
like canonicalization, or normalizing data for full-text searching, are
examples of key intermediary transforms performed upon one's data.
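
Continuing that sketch: if some component downstream of the filter does
compatibility normalization for its own reasons (NFKC is a common
choice for search indexing), the "harmless" full-width input comes out
the other side as exactly the ASCII attack the filter was built to
block. The normalize call is the Python stdlib one; the surrounding
pipeline is invented:

    import unicodedata

    def index_normalize(text):
        # e.g. a search/storage layer folding compatibility characters
        # (which include the full-width forms) back to ASCII equivalents.
        return unicodedata.normalize("NFKC", text)

    fullwidth_attack = "\uFF1Cscript\uFF1Ealert(1)\uFF1C/script\uFF1E"
    print(index_normalize(fullwidth_attack))
    # -> <script>alert(1)</script> -- exactly what the upstream filter rejected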

This leaves the sub-points:

2.1 What parser are you targeting?
2.2 What encoding types will that parser interpret/execute?
2.3 What intermediary decoding/canonicalization steps will *all*
software & hardware involved in the transaction take prior to the
target parser? (sketched below)
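
To make 2.3 concrete, here is a toy composition of layers, each doing
its own decode/transform before the data ever reaches the target parser
-- the layer names and their ordering are hypothetical, the decode
calls are Python stdlib:

    import unicodedata
    from urllib.parse import unquote

    def web_server_layer(raw):
        return unquote(raw)                       # percent-decoding

    def framework_layer(s):
        return unicodedata.normalize("NFKC", s)   # compatibility normalization

    def app_filter(s):
        return "'" not in s                       # the only check anyone wrote

    raw = "%EF%BC%87%20OR%201=1--"       # UTF-8 percent-encoding of: ＇ OR 1=1--
    step1 = web_server_layer(raw)        # "＇ OR 1=1--"  no ASCII quote, filter passes
    assert app_filter(step1)
    step2 = framework_layer(step1)       # "' OR 1=1--"   the quote reappears downstream
    print(step2)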

Sub-point 2.3 is a real bear, eh? People underestimate this one. I've
gotten several direct inquiries from folks who usually ask some form of
the question:

:: Which of these is responsible for the issue? ::

+ Is it the client? (Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7)

+ Is it the protocol? (Content-Type: text/html; charset=utf-8;
charset=iso-8859-1, etc.)

+ Is it the web server? (IIS Hex URL & Unicode decode/double-decode issues)

+ Is it the framework? (.NET's Hex canonicalization issue from 2005)

+ Is it the language? (glued-together open source PHP crap; huge
monolithic J2EE projects)

+ Is it our custom code? (insert random canonicalization library, add a
random canonicalization step to your software for some situational
normalization issue you run into... but make it global for all data
passing through those functions)


A: Yes, Yes, Yes, etc.
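
For the web-server flavor in particular, the classic shape is the
double-decode: one layer turns %25 into %, a later layer decodes again,
and a string that looked harmless at the first checkpoint has grown a
path separator by the second. A minimal sketch (obviously not the
actual IIS code path):

    from urllib.parse import unquote

    encoded = "..%255c..%255cwinnt"
    first = unquote(encoded)    # "..%5c..%5cwinnt"  -- still no literal backslash to flag
    second = unquote(first)     # "..\..\winnt"      -- the traversal appears after the check
    print(first)
    print(second)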

One part of this, then, is clearly defining your target.

The other part is evaluating the transforms performed on your data, and
the transforms & canonicalization your software is *capable* of. We can
directly deduce this in some situations, I believe, given a valid data
type and the ability to correlate output, but in cases where we are
targeting a parser internal to the system (e.g., a SQL interpreter)
this will have to be inferred from some state change or context change,
which is going to be very difficult to do in an automated fashion with
any sense of reliability.
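
The kind of black-box probe I mean looks roughly like this -- the URL,
the parameter, and the leap from "different response" to "the quote
reached the parser" are all hypothetical, and the reliability caveat
above is exactly where it gets hard:

    import urllib.request
    from urllib.parse import quote

    TARGET = "http://app.example/search?q="   # hypothetical endpoint

    def fetch(payload):
        url = TARGET + quote(payload)
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()

    baseline = fetch("smith")
    ascii_probe = fetch("smith'")            # ASCII quote -- presumably filtered/escaped
    fullwidth_probe = fetch("smith\uFF07")   # full-width quote -- may be normalized en route

    # If the full-width probe perturbs the response the way the ASCII quote
    # does (error page, different result count), something in the middle is
    # canonicalizing on the way to the internal parser.
    print(ascii_probe == baseline, fullwidth_probe == baseline)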

But, definitely, that problem is being worked on.

I think this is a classic case where run-time black-box analysis is
essential.

There is simply no way a source code or controls audit or binary
analysis is going to find the majority of issues in this case (when
evaluating real-world production software deployments), because they
are usually the result of emergent behaviors of complex, glued-together
systems with many different components (including even things like
firewalls/IPS that may "fix" or "re-code" protocols in transit, etc.,
assuming they really even understand the protocol).

-- 
Arian Evans
software security stuff

"Diplomacy is the art of saying "Nice doggie" until you can find a rock." --
Will Rogers

