Date: Thu, 21 Jan 2010 17:12:17 +0100
From: Dan Kaminsky <dan@...para.com>
To: Michal Zalewski <lcamtuf@...edump.cx>
Cc: full-disclosure@...ts.grok.org.uk
Subject: Re: Two MSIE 6.0/7.0 NULL pointer crashes

On Thu, Jan 21, 2010 at 1:53 AM, Michal Zalewski <lcamtuf@...edump.cx> wrote:
>> Testing takes time.  That's why both Microsoft and Mozilla test.
>
> Testing almost never legitimately takes months or years, unless the
> process is severely broken; contrary to the popular claims,
> personally, I have serious doubts that QA is a major bottleneck when
> it comes to security response - certainly not as often as portrayed.

There are a lot of factors that go into how long it takes to run QA.
Here are a few (I'll leave out the joys of multi-vendor coordination for now):

1) How widespread is the deployment?  A little while ago, Google had
an XSS on Google Maps.  An hour later, they didn't.  About a decade
ago, AOL Instant Messenger had a remote code execution vulnerability.
Eight hours later, they didn't.  Say what you will about
centralization, but it *really* makes it easier and safer to deploy a
fix, because the environment is totally known, and you have only one
environment.  There are a couple of dimensions at play here:

a) How many versions do you need to patch?
b) How many different deployment classes are there?  If your
developers are making a bunch of enterprise assumptions (there's a
domain, there's group policy, there's an IT guy, etc.) and the fix is
going to Grandma, let me tell you, something's not going to work.
c) What's at stake?  Your D-Link router has very different deployment
characteristics than your Juniper router.

2) How complicated is the fix?  Throwing in a one-liner to make sure
an integer doesn't overflow is indeed relatively straightforward.  But
imagine an old-school application drenched in strcpy, where you lost
track of the length of that buffer five functions ago (see the sketch
after the list below).  Or imagine the modern browser bug, where
you're going up against an attacker who *by design* has a
Turing-complete capability to manipulate your object tree, complete
with control over time.  Or, worst of all, take a design flaw like
Marsh Ray's TLS renegotiation bug.  People are still working out how
to fix that one properly, months later.  Complexity introduces three
issues:

a) You have to fix the entire flaw, and related flaws.  We've all seen
companies that deploy fixes like "if this argument contains alert(1),
abort".  Yeah, that's not enough (sketch below).
b) You have to not introduce new flaws with your fix -- added
complexity doesn't stop breeding vulnerabilities just because you're
writing a security fix.
c) The system needs to work exactly the same afterward.  That means
you don't get to significantly degrade performance, usability,
reliability, or especially compatibility.  Particularly with design
bugs, other systems grow to depend on their presence.  No software
lives in a vacuum, so you have to actually _find_ those other pieces
of code and make sure they still work.
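
To make that contrast concrete, here's a toy C sketch -- the names and
context are invented purely for illustration, not taken from any real
codebase:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Easy case: the overflow check really is a one-liner, and every
       bit of context you need is sitting right in front of you. */
    void *alloc_array(size_t count, size_t size)
    {
        if (size != 0 && count > SIZE_MAX / size)   /* the entire fix */
            return NULL;
        return malloc(count * size);
    }

    /* Hard case: the destination's size was known five callers up and
       was never passed down, so "just bound the copy" means re-plumbing
       a length through the whole call chain -- and hoping nothing
       depended on the old behavior. */
    void render_title(char *dst, const char *title)
    {
        strcpy(dst, title);   /* how big is dst?  nobody down here knows */
    }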
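
Same caveat -- a made-up sketch of the "contains alert(1)" class of
fix, not anyone's actual code:

    #include <string.h>

    /* The band-aid: reject anything containing the proof-of-concept
       string from the advisory. */
    int looks_malicious(const char *arg)
    {
        return strstr(arg, "alert(1)") != NULL;
    }
    /* alert(2), alert (1), confirm(1), prompt(document.cookie)... all
       sail straight through, because the underlying injection was
       never actually fixed. */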

3) How many people do you actually expect to deploy your patch?
There's an entire continuum here, from "only the other developers on
SVN", through "the people who call to complain", to "everybody who
clicks 'I accept the risk of patching'", all the way to "my entire
customer base with zero user interaction whatsoever".  A patch that
causes problems 0.005% of the time is acceptable if 1,000 people are
deploying it, but not if 1,000,000 people are.  Note that security
research attention is very strongly correlated with deployment
numbers, to the point that vulnerability count correlates much more
with popularity than with code quality.  So you have this interesting
situation where the more widely your fix is pushed, the more scrutiny
there will be upon it.
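
(To put rough numbers on that failure rate: 0.005% of 1,000
deployments is 0.05 of a machine -- statistically nobody -- while
0.005% of 1,000,000 deployments is 50 broken machines, every one of
them a support call.)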

Now, you can consider these all excuses.  Believe me, QA people have
no shortage of guys who look down on them and their problems.  But
certainly different bugs have different characteristics, and assuming
that all things can be fixed in the same time window with the same
output quality is just factually untrue.  You might as well be
claiming the next version of HTML5 will include an <antigravity> tag
that will make your laptop float in the air and spin around.

There is a balancing act.  Years is, of course, ridiculous.  In many
situations, so too is a couple of weeks.  If the goal is to get the
best-quality patches, then you want the issue _prioritized heavily_,
but not _on public fire_.  The latter encourages people to skip
testing, and you know, of course, what happens when you skip testing?

You end up with Gigabit Ethernet drivers that can't actually handle
all frame lengths.  (Epic Intel find from Fabian Yamaguchi.  Wow.)

Responsible disclosure has its risks.  You really can be jerked around
by a company, *especially* one that hasn't experienced the weirdness
of an external security disclosure before.  But engineering realities
don't go away just because the suits finally got out of the way.
Testing matters.

