netdev - Re: Kernel oops with 6.14 when enabling TLS

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Z8cv9VKka2KBnBKV@casper.infradead.org>
Date: Tue, 4 Mar 2025 16:53:09 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Hannes Reinecke <hare@...e.com>
Cc: Vlastimil Babka <vbabka@...e.cz>, Boris Pismenny <borisp@...dia.com>,
	John Fastabend <john.fastabend@...il.com>,
	Jakub Kicinski <kuba@...nel.org>, Sagi Grimberg <sagi@...mberg.me>,
	"linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
	"linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
	linux-mm@...ck.org, Harry Yoo <harry.yoo@...cle.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: Kernel oops with 6.14 when enabling TLS

On Tue, Mar 04, 2025 at 05:32:32PM +0100, Hannes Reinecke wrote:
> On 3/4/25 17:14, Matthew Wilcox wrote:
> > I thought we'd done all the work needed to get rid of these pointless
> > refcount bumps.  Turns out that's only on the block side (eg commit
> > e4cc64657bec).  So what does networking need in order to understand
> > that some iovecs do not need to mess with the refcount?
> 
> The network stack needs to get hold of the page while transmission is
> ongoing, as there is potentially rather deep queueing involved,
> requiring several calls to sendmsg() and friends before the page is finally
> transmitted. And maybe some post-processing (checksums,
> digests, you name it), too, all of which require the page to be there.
> 
> It's all so jumbled up ... personally, I would _love_ to do away with
> __iov_iter_get_pages_alloc(). Allocating a page array? Seriously?
> 
> And the problem with that is that it's always takes a page(!) reference,
> completely oblivious to the fact whether you even _can_ take a page
> reference (eg for tail pages); we've hit this problem several times now
> (check for sendpage_ok() ...).

Calling get_page() / put_page() on a tail page is fine -- that just
redirects to the head page.  But calling it on a slab never made any
sense; at best it gets you the equivalent of TYPESAFE_BY_RCU -- that is,
the object can be freed and reallocated, but the underlying slab will
not be reallocated to some other purpose.

> But that's not the real issue; real issue is that the page reference is
> taken down in the very bowels of __iov_iter_get_pages_alloc(), but needs
> to be undone by the _caller_. Who might (or might not) have an idea
> that he needs to drop the reference here.
> That's why there is no straightforward conversion; you need to audit
> each and every caller and try to find out where the page reference (if any)
> is dropped.
> Bah.
> 
> Can't we (at the very least) leave it to the caller of
> __iov_iter_get_pages() to get a page reference (he has access to the page
> array, after all ...)? That would make the interface slightly
> better, and it'll be far more obvious to the caller what needs
> to be done.

Right, that's what happened in the block layer.  We mark the bio with
BIO_PAGE_PINNED if the pincount needs to be dropped.  As a transitional
period, we had BIO_PAGE_REFFED which indicated that the page refcount
needed to be dropped.  Perhaps there's something similar that network
could be doing.