lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 23 Sep 2022 17:13:42 +0100
From:   Al Viro <viro@...iv.linux.org.uk>
To:     Christoph Hellwig <hch@...radead.org>
Cc:     Jan Kara <jack@...e.cz>, John Hubbard <jhubbard@...dia.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jens Axboe <axboe@...nel.dk>,
        Miklos Szeredi <miklos@...redi.hu>,
        "Darrick J . Wong" <djwong@...nel.org>,
        Trond Myklebust <trond.myklebust@...merspace.com>,
        Anna Schumaker <anna@...nel.org>,
        David Hildenbrand <david@...hat.com>,
        Logan Gunthorpe <logang@...tatee.com>,
        linux-block@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-xfs@...r.kernel.org, linux-nfs@...r.kernel.org,
        linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 4/7] iov_iter: new iov_iter_pin_pages*() routines

On Fri, Sep 23, 2022 at 01:44:33AM -0700, Christoph Hellwig wrote:

> Why would I?  We generall do have or should have the iov_iter around.

Not for async IO.

> And for the common case where we don't (bios) we can carry that
> information in the bio as it needs a special unmap helper anyway.

*Any* async read_iter is like that.

> > Where are they getting
> > dropped and what guarantees that IO is complete by that point?
> 
> The exact place depens on the exact taaraget frontend of which we have
> a few.  But it happens from the end_io callback that is triggered
> through a call to target_complete_cmd.

OK...

> > The reason I'm asking is that here you have an ITER_BVEC possibly fed to
> > __blkdev_direct_IO_async(), with its
> >         if (iov_iter_is_bvec(iter)) {
> >                 /*
> >                  * Users don't rely on the iterator being in any particular
> >                  * state for async I/O returning -EIOCBQUEUED, hence we can
> >                  * avoid expensive iov_iter_advance(). Bypass
> >                  * bio_iov_iter_get_pages() and set the bvec directly.
> >                  */
> >                 bio_iov_bvec_set(bio, iter);
> > which does *not* grab the page referneces.  Sure, bio_release_pages() knows
> > to leave those alone and doesn't drop anything.  However, what is the
> > mechanism preventing the pages getting freed before the IO completion
> > in this case?
> 
> The contract that callers of bvec iters need to hold their own
> references as without that doing I/O do them would be unsafe.  It they
> did not hold references the pages could go away before even calling
> bio_iov_iter_get_pages (or this open coded bio_iov_bvec_set).

You are mixing two issues here - holding references to pages while using
iov_iter instance is obvious; holding them until async IO is complete, even
though struct iov_iter might be long gone by that point is a different
story.

And originating iov_iter instance really can be long-gone by the time
of IO completion - requirement to keep it around would be very hard to
satisfy.  I've no objections to requiring the pages in ITER_BVEC to be
preserved at least until the IO completion by means independent of
whatever ->read_iter/->write_iter does to them, but
	* that needs to be spelled out very clearly and
	* we need to verify that it is, indeed, the case for all existing
iov_iter_bvec callers, preferably with comments next to non-obvious ones
(something that is followed only by the sync IO is obvious)

That goes not just for bio - if we make get_pages *NOT* grab references
on ITER_BVEC (and I'm all for it), we need to make sure that those
pages won't be retained after the original protection runs out.  Which
includes the reference put into struct nfs_request, for example, as well
as whatever ceph transport is doing, etc.  Another thing that needs to
be sorted out is __zerocopy_sg_from_iter() and its callers - AFAICS,
all of those are in ->sendmsg() with MSG_ZEROCOPY in flags.

It's a non-trivial amount of code audit - we have about 40 iov_iter_bvec()
callers in the tree, and while many are clearly sync-only... the ones
that are not tend to balloon into interesting analysis of call chains, etc.

Don't get me wrong - that analysis needs to be done, just don't expect
it to be trivial.  And it will require quite a bit of cooperation from the
folks familiar with e.g. drivers/target, or with ceph transport layer,
etc.

FWIW, my preference would be along the lines of

	Backing memory for any non user-backed iov_iter should be protected
	from reuse by creator of iov_iter and that protection should continue
	through not just all synchronous operations with iov_iter in question
	- it should last until all async operations involving that memory are
	finished.  That continued protection must not depend upon taking
	extra page references, etc. while we are working with iov_iter.

But getting there will take quite a bit of code audit and probably some massage
as well.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ