Message-ID: <303059e6-3a33-99cb-2952-82fe8079fa45@nvidia.com>
Date:   Fri, 25 Feb 2022 16:41:14 -0800
From:   John Hubbard <jhubbard@...dia.com>
To:     Theodore Ts'o <tytso@....edu>
Cc:     Eric Biggers <ebiggers@...nel.org>,
        Lee Jones <lee.jones@...aro.org>, linux-ext4@...r.kernel.org,
        Christoph Hellwig <hch@....de>,
        Dave Chinner <dchinner@...hat.com>,
        Goldwyn Rodrigues <rgoldwyn@...e.com>,
        "Darrick J . Wong" <darrick.wong@...cle.com>,
        Bob Peterson <rpeterso@...hat.com>,
        Damien Le Moal <damien.lemoal@....com>,
        Andreas Gruenbacher <agruenba@...hat.com>,
        Ritesh Harjani <riteshh@...ux.ibm.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Johannes Thumshirn <jth@...nel.org>, linux-xfs@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, cluster-devel@...hat.com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH -v3] ext4: don't BUG if kernel subsystems dirty pages
 without asking ext4 first

On 2/25/22 15:21, Theodore Ts'o wrote:
...
> For process_vm_writev() this is a case where user pages are pinned and
> then released in short order, so I suspect that race with the page
> cleaner would also be very hard to hit.  But we could completely
> remove the potential for the race, and also make things kinder for

Completely removing the race would be wonderful, because large
supercomputer installations are good at hitting "rare" cases.


> f2fs and btrfs's compressed file write support, by making things work
> much like the write(2) system call.  Imagine if we had a
> "pin_user_pages_local()" which calls write_begin(), and a
> "unpin_user_pages_local()" which calls write_end(), and the

Right, that would supply the missing connection to the filesystems.

In fact, maybe these names are about right:

     pin_user_file_pages()
     unpin_user_file_pages()

...and then put them in a filesystem header file, because these are now
tightly coupled to filesystems, what with the need to call
.write_begin() and .write_end().
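
To make the shape of that pairing concrete, here is a minimal user-space
sketch. It is only a model under stated assumptions, not kernel code: the
mock_page and mock_fs_ops types, the "modified" argument, and the signatures
of pin_user_file_pages() and unpin_user_file_pages() are all hypothetical,
since neither function exists today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page cache page; not the kernel's struct page. */
struct mock_page {
    int write_in_progress; /* set between write_begin and write_end */
    int dirty;             /* set by write_end when the page was modified */
};

/* Hypothetical stand-in for the filesystem's write_begin/write_end hooks. */
struct mock_fs_ops {
    int  (*write_begin)(struct mock_page *page);
    void (*write_end)(struct mock_page *page, int modified);
};

int mock_write_begin(struct mock_page *page)
{
    if (page->write_in_progress)
        return -1; /* someone else is already writing to this page */
    page->write_in_progress = 1;
    return 0;
}

void mock_write_end(struct mock_page *page, int modified)
{
    if (modified)
        page->dirty = 1; /* the filesystem now knows about the dirtying */
    page->write_in_progress = 0;
}

const struct mock_fs_ops mock_ops = { mock_write_begin, mock_write_end };

/* Assumed behavior: ask the filesystem first, page by page; undo on failure. */
int pin_user_file_pages(const struct mock_fs_ops *ops,
                        struct mock_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ops->write_begin(&pages[i]) != 0) {
            while (i-- > 0)
                ops->write_end(&pages[i], 0); /* nothing was written */
            return -1;
        }
    }
    return 0;
}

/* Assumed behavior: tell the filesystem the short-lived write is finished. */
void unpin_user_file_pages(const struct mock_fs_ops *ops,
                           struct mock_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        ops->write_end(&pages[i], 1);
}
```

The point of the model is only the ordering: the filesystem hears about the
write before any page is touched, and again once the write is done.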

OK...

> presumption with the "[un]pin_user_pages_local" API is that you don't
> hold the pinned pages for very long --- say, not across a system call
> boundary, and then it would work the same way the write(2) system call
> works, except that in the case of process_vm_writev(2) the pages
> are identified by another process's address space where they happen to
> be mapped.
> 
> This obviously doesn't work when pinning pages for remote DMA, because
> in that case the time between pin_user_pages_remote() and
> unpin_user_pages_remote() could be a long, long time, so that means we
> can't use write_begin/write_end; we'd need to call page_mkwrite()
> when the pages are first pinned and then somehow prevent the page
> cleaner from touching a dirty page which is pinned for use by the
> remote DMA.
> 
> Does that make sense?
> 
> 							- Ted

Yes, I really like this suggestion. It would neatly solve most short
term pinning cases, without interfering with any future solutions for
the long term pinning cases. Very nice.
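
For contrast, the long term case described above might be modeled roughly as
follows. Again this is a user-space sketch under assumptions: the model_page
type and all model_* functions are hypothetical, and "skip pinned pages" is
just one possible reading of "somehow prevent the page cleaner from touching"
them.

```c
#include <assert.h>

/* Hypothetical stand-in for a file-backed page; not the kernel's struct page. */
struct model_page {
    int pin_count; /* number of outstanding long term pins */
    int dirty;     /* page contents may differ from what is on disk */
    int notified;  /* filesystem was told the page is becoming writable */
};

/* Models the filesystem being asked before the page is dirtied. */
void model_page_mkwrite(struct model_page *p)
{
    p->notified = 1;
    p->dirty = 1;
}

/* Assumed behavior: notify the filesystem once, when the page is first pinned. */
void model_pin_longterm(struct model_page *p)
{
    if (!p->notified)
        model_page_mkwrite(p);
    p->pin_count++;
}

/* The device may have written at any time while pinned, so stay dirty. */
void model_unpin_longterm(struct model_page *p)
{
    assert(p->pin_count > 0);
    p->pin_count--;
    p->dirty = 1;
}

/* Returns 1 if the page was written back, 0 if it was left alone. */
int model_page_cleaner(struct model_page *p)
{
    if (!p->dirty)
        return 0;
    if (p->pin_count > 0)
        return 0; /* pinned for DMA: leave it dirty and untouched */
    p->dirty = 0;
    p->notified = 0; /* the next writer has to ask the filesystem again */
    return 1;
}
```

Here the pin can be held for an arbitrarily long time, which is why the model
has the cleaner skip the page instead of bracketing the write the way the
short term sketch does.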


thanks,
-- 
John Hubbard
NVIDIA
