lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1880793.1673257404@warthog.procyon.org.uk>
Date:   Mon, 09 Jan 2023 09:43:24 +0000
From:   David Howells <dhowells@...hat.com>
To:     Jens Axboe <axboe@...nel.dk>
Cc:     dhowells@...hat.com, Al Viro <viro@...iv.linux.org.uk>,
        Christoph Hellwig <hch@....de>,
        Matthew Wilcox <willy@...radead.org>,
        Logan Gunthorpe <logang@...tatee.com>,
        Christoph Hellwig <hch@...radead.org>,
        Jeff Layton <jlayton@...nel.org>,
        linux-fsdevel@...r.kernel.org, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 7/7] iov_iter, block: Make bio structs pin pages rather than ref'ing if appropriate

Jens Axboe <axboe@...nel.dk> wrote:

> > A field, bi_cleanup_mode, is added to the bio struct that gets set by
> > iov_iter_extract_pages() with FOLL_* flags indicating what cleanup is
> > necessary.  FOLL_GET -> put_page(), FOLL_PIN -> unpin_user_page().  Other
> > flags could also be used in future.
> > 
> > Newly allocated bio structs have bi_cleanup_mode set to FOLL_GET to
> > indicate that attached pages are ref'd by default.  Cloning sets it to 0.
> > __bio_iov_iter_get_pages() overrides it to what iov_iter_extract_pages()
> > indicates.
> 
> What's the motivation for this change?

DIO reads in most filesystems and, I think, the block layer are currently
broken with respect to concurrent fork in the same process because they take
refs (FOLL_GET) on the pages involved which causes the CoW mechanism to
malfunction, leading (I think) the parent process to not see the result of the
DIO.  IIRC, the pages undergoing DIO get forcibly copied by fork - and the
copies given to the parent.  Instead, DIO reads should be pinning the pages
(FOLL_PIN).  Maybe Willy can weigh in on this?

Further, getting refs on pages in, say, a KVEC iterator is the wrong thing to
do as the kvec may point to things that shouldn't be ref'd (vmap'd or
vmalloc'd regions, for example).  Instead, the in-kernel caller should do what
it needs to do to keep hold of the memory and the DIO should not take a ref at
all.

> It's growing struct bio, which we can have a lot of in the system. I read
> the cover letter too and I can tell what the change does, but there's no
> justification really for the change.

The FOLL_* flags I'm getting back from iov_iter_extract_pages() can be mapped
to BIO_* flags in the bio.  For the moment, AFAIK, I think only FOLL_GET and
FOLL_PIN are necessary.  There are three cleanup types: put, unpin and do
nothing.

David

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ