linux-kernel - Re: [PATCH v3] scsi: csiostor: Use kcalloc() instead of kzalloc()

Message-ID: <c358208c5d4c823e3373aca4fe42998a6edd12fb.camel@HansenPartnership.com>
Date: Wed, 01 May 2024 10:39:02 -0400
From: James Bottomley <James.Bottomley@...senPartnership.com>
To: Kees Cook <keescook@...omium.org>, "Martin K. Petersen"
	 <martin.petersen@...cle.com>
Cc: Erick Archer <erick.archer@...look.com>, Bjorn Helgaas
 <bhelgaas@...gle.com>,  Justin Stitt <justinstitt@...gle.com>, "Gustavo A.
 R. Silva" <gustavoars@...nel.org>,  linux-scsi@...r.kernel.org,
 linux-kernel@...r.kernel.org,  linux-hardening@...r.kernel.org
Subject: Re: [PATCH v3] scsi: csiostor: Use kcalloc() instead of kzalloc()

On Mon, 2024-04-29 at 13:13 -0700, Kees Cook wrote:
> On Mon, Apr 29, 2024 at 02:31:19PM -0400, Martin K. Petersen wrote:
> > 
> > Kees,
> > 
> > > > This patch seems to be lost. Gustavo reviewed it on January 15,
> > > > 2024 but the patch has not been applied since.
> > > 
> > > This looks correct to me. I can pick this up if no one else snags
> > > it?
> > 
> > I guess my original reply didn't make it out, I don't see it in the
> > archives.
> > 
> > My objections were:
> > 
> >  1. The original code is more readable to me than the proposed
> >     replacement.
> 
> I guess this is a style preference. I find the proposed easier to
> read. It also removes lines while doing it. :)
> 
> >  2. The original code has worked since introduced in 2012. Nobody
> >     has touched it since, presumably it's fine.
> 
> The code itself is fine unless you have a 32-bit system with a
> malicious card, so yeah, near zero risk.

Well, no, actually zero: we assume plugged-in hardware to operate
correctly (had this argument in the driver hardening thread a while
ago), but in this particular case you'd have to have a card with a very
high number of ports, which would cause kernel allocations to fail long
before anything could introduce an overflow of sizeof(struct csio_lnode
*) * hw->num_lns.
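
To put rough numbers on that argument, here is a minimal userspace
sketch, not driver code. It assumes a 32-bit size_t and 4-byte
pointers, and both helper names are hypothetical. It computes the
smallest element count at which an open-coded count * size
multiplication would wrap, and models the wrapping multiply itself.

```c
#include <stdint.h>

/*
 * Hypothetical helper: smallest element count for which
 * count * elem_size no longer fits in an unsigned type whose
 * largest value is size_max. elem_size must be non-zero.
 */
static uint64_t min_overflowing_count(uint64_t size_max, uint64_t elem_size)
{
	return size_max / elem_size + 1;
}

/*
 * Hypothetical model of the open-coded multiplication as it would
 * behave with a 32-bit size_t: the result wraps modulo 2^32.
 */
static uint32_t open_coded_size32(uint32_t count, uint32_t elem_size)
{
	return count * elem_size;
}
```

Under these assumptions, min_overflowing_count(UINT32_MAX, 4) is
1073741824 (2^30), so the pointer array alone would have to be about
4 GiB before the multiplication could wrap.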

> >  3. I don't have the hardware and thus no way of validating the
> >     proposed changes.
> 
> This is kind of an ongoing tension we have between driver code and
> refactoring efforts.

That's because we keep having cockups where we accept so-called "zero
risk" changes to older drivers only to have people with the hardware
turn up months to years later demanding to know why we broke it.

Security is about balancing risks and the risk here of a malicious
adversary crafting an attack based on a driver so few people use (and
given they'd have to come up with modified hardware) seems equally
zero.

> And this isn't a case where we can show identical binary output,
> since this actively adds overflow checking via kcalloc() internals.

Overflow checking which is unnecessary as I showed above.
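
For reference, the check under discussion amounts to roughly the
following. This is a userspace sketch with a hypothetical helper
name, not the kernel's implementation, and it assumes GCC or Clang
for __builtin_mul_overflow().

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for a kcalloc()-style allocator:
 * refuse the request if n * size does not fit in size_t, otherwise
 * return zeroed memory for n elements of the given size.
 */
static void *checked_zalloc_array(size_t n, size_t size)
{
	size_t bytes;
	void *p;

	/* Returns true when the mathematical product overflows size_t. */
	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;

	p = malloc(bytes);
	if (p)
		memset(p, 0, bytes);
	return p;
}
```

A caller would write something like
p = checked_zalloc_array(count, sizeof(*p)); the only behavioural
difference from the open-coded multiply is the NULL return when the
product cannot be represented.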

> > So what is the benefit of me accepting this patch? We have had
> > several regressions in these conversions. Had one just last week,
> > almost identical in nature to the one at hand.
> 
> People are working through large piles of known "weak code patterns"
> with the goal of reaching 0 instances in the kernel. Usually this is
> for ongoing greater compiler flag coverage, but this particular one
> is harder for the compiler to warn on, so it's from Coccinelle
> patterns.

We understand the problem and we're happy to investigate and then
explain why something like this can't be exploited, so what's the issue
with adding it to the exceptions list given that, as you said, it's
never going to be compiler detected?

> > I am all for fixing code which is undergoing active use and
> > development. But I really don't see the benefit of updating a
> > legacy driver which hasn't seen updates in ages. Why risk
> > introducing a regression?
> 
> I see a common pattern where "why risk introducing a regression?"
> gets paired with "we can't test this code". I'm really not sure what
> to do about this given how much the kernel is changing all the time.

Well, it's a balance of risks, but given that there's zero chance of
exploitation of the potential overflow, it would seem that balance lies
on the side of not risking the regression.  I think if you could
demonstrate you were fixing an exploitable bug (without needing
modified hardware) the balance would lie differently.

> In this particular case, I guess all I can say is that it is a
> trivially correct change that uses a more robust API and more
> idiomatic allocation sizeof()s (i.e. use the sizeof() of what is
> being allocated, not a potentially disconnected struct name).

Which is somewhat similar to the statement other people made about the
strncpy replacement which eventually turned out to cause a problem.

James