Date: Fri, 25 Feb 2022 16:41:14 -0800
From: John Hubbard <jhubbard@...dia.com>
To: Theodore Ts'o <tytso@....edu>
Cc: Eric Biggers <ebiggers@...nel.org>, Lee Jones <lee.jones@...aro.org>,
    linux-ext4@...r.kernel.org, Christoph Hellwig <hch@....de>,
    Dave Chinner <dchinner@...hat.com>, Goldwyn Rodrigues <rgoldwyn@...e.com>,
    "Darrick J . Wong" <darrick.wong@...cle.com>,
    Bob Peterson <rpeterso@...hat.com>, Damien Le Moal <damien.lemoal@....com>,
    Andreas Gruenbacher <agruenba@...hat.com>,
    Ritesh Harjani <riteshh@...ux.ibm.com>,
    Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
    Johannes Thumshirn <jth@...nel.org>, linux-xfs@...r.kernel.org,
    linux-fsdevel@...r.kernel.org, cluster-devel@...hat.com,
    linux-kernel@...r.kernel.org
Subject: Re: [PATCH -v3] ext4: don't BUG if kernel subsystems dirty pages without asking ext4 first

On 2/25/22 15:21, Theodore Ts'o wrote:
...
> For process_vm_writev() this is a case where user pages are pinned and
> then released in short order, so I suspect that race with the page
> cleaner would also be very hard to hit. But we could completely
> remove the potential for the race, and also make things kinder for

Completely removing the race would be wonderful, because large
supercomputer installations are good at hitting "rare" cases.

> f2fs and btrfs's compressed file write support, by making things work
> much like the write(2) system call. Imagine if we had a
> "pin_user_pages_local()" which calls write_begin(), and a
> "unpin_user_pages_local()" which calls write_end(), and the

Right, that would supply the missing connection to the filesystems.
In fact, maybe these names are about right:

    pin_user_file_pages()
    unpin_user_file_pages()

...and then put them in a filesystem header file, because these are now
tightly coupled to filesystems, what with the need to call .write_begin()
and .write_end(). OK...

> presumption with the "[un]pin_user_pages_local" API is that you don't
> hold the pinned pages for very long --- say, not across a system call
> boundary, and then it would work the same way the write(2) system call
> works, except that in the case of process_vm_writev(2) the pages
> are identified by another process's address space where they happen to
> be mapped.
>
> This obviously doesn't work when pinning pages for remote DMA, because
> in that case the time between pin_user_pages_remote() and
> unpin_user_pages_remote() could be a long, long time, so that means we
> can't use write_begin/write_end; we'd need to call page_mkwrite()
> when the pages are first pinned and then somehow prevent the page
> cleaner from touching a dirty page which is pinned for use by the
> remote DMA.
>
> Does that make sense?
>
> - Ted

Yes, I really like this suggestion. It would neatly solve most short-term
pinning cases, without interfering with any future solutions for the
long-term pinning cases. Very nice.

thanks,
--
John Hubbard
NVIDIA
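
For readers following the thread, the short-term pairing discussed above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not kernel code: `struct mock_page`, `struct mock_fs_ops`, `mock_writeback()` and `demo_short_term_write()` are hypothetical names invented here, and `pin_user_file_pages()` / `unpin_user_file_pages()` are the names floated in the message, which did not exist as kernel APIs at the time of writing. The mock hooks only loosely mirror `.write_begin()` and `.write_end()`; the real operations take different arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a page-cache page; NOT the kernel's struct page. */
struct mock_page {
    char data[32];
    int fs_notified; /* set once the filesystem has seen write_begin */
    int dirty;       /* page holds data awaiting writeback */
};

/* Hypothetical hooks, loosely modeled on .write_begin()/.write_end(). */
struct mock_fs_ops {
    int (*write_begin)(struct mock_page *page);
    int (*write_end)(struct mock_page *page);
};

static int mock_write_begin(struct mock_page *page)
{
    page->fs_notified = 1;
    return 0;
}

static int mock_write_end(struct mock_page *page)
{
    if (!page->fs_notified)
        return -1; /* write_end without a matching write_begin */
    page->dirty = 1;
    return 0;
}

static const struct mock_fs_ops mock_ops = {
    .write_begin = mock_write_begin,
    .write_end = mock_write_end,
};

/* Hypothetical: announce the upcoming write to the filesystem per page. */
static int pin_user_file_pages(const struct mock_fs_ops *ops,
                               struct mock_page **pages, size_t n)
{
    assert(ops && pages);
    for (size_t i = 0; i < n; i++) {
        int ret = ops->write_begin(pages[i]);
        if (ret) {
            /* Undo the pages already announced. */
            while (i--)
                pages[i]->fs_notified = 0;
            return ret;
        }
    }
    return 0;
}

/* Hypothetical: complete the write; the filesystem marks pages dirty. */
static int unpin_user_file_pages(const struct mock_fs_ops *ops,
                                 struct mock_page **pages, size_t n)
{
    int err = 0;

    assert(ops && pages);
    for (size_t i = 0; i < n; i++) {
        int ret = ops->write_end(pages[i]);
        if (ret && !err)
            err = ret; /* remember the first failure, keep going */
    }
    return err;
}

/* Toy page cleaner: rejects dirty pages the filesystem never heard about. */
static int mock_writeback(struct mock_page *page)
{
    if (page->dirty && !page->fs_notified)
        return -1;
    page->dirty = 0;
    page->fs_notified = 0;
    return 0;
}

/* Short-term pin: announce, modify, complete, all within one call. */
static int demo_short_term_write(struct mock_page *page, const char *src)
{
    struct mock_page *pages[1] = { page };
    int ret = pin_user_file_pages(&mock_ops, pages, 1);

    if (ret)
        return ret;
    strncpy(page->data, src, sizeof(page->data) - 1);
    page->data[sizeof(page->data) - 1] = '\0';
    return unpin_user_file_pages(&mock_ops, pages, 1);
}
```

In this model, a page that goes through `demo_short_term_write()` is accepted by `mock_writeback()`, while a page whose `dirty` flag is set without the pin/unpin pair is rejected, which corresponds to the "dirtied without asking" case in the subject line. The long-term (remote DMA) case is deliberately left out, since the thread itself leaves its design open.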