[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210414122319.GD31323@quack2.suse.cz>
Date: Wed, 14 Apr 2021 14:23:19 +0200
From: Jan Kara <jack@...e.cz>
To: Dave Chinner <david@...morbit.com>
Cc: Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org,
Ted Tso <tytso@....edu>, Christoph Hellwig <hch@...radead.org>,
Amir Goldstein <amir73il@...il.com>
Subject: Re: [PATCH 2/7] mm: Protect operations adding pages to page cache
with i_mapping_lock
On Wed 14-04-21 10:01:13, Dave Chinner wrote:
> On Tue, Apr 13, 2021 at 01:28:46PM +0200, Jan Kara wrote:
> > *
> > * ->mmap_lock
> > * ->i_mmap_rwsem
> > @@ -85,7 +86,8 @@
> > * ->i_pages lock (arch-dependent flush_dcache_mmap_lock)
> > *
> > * ->mmap_lock
> > - * ->lock_page (access_process_vm)
> > + * ->i_mapping_sem (filemap_fault)
> > + * ->lock_page (filemap_fault, access_process_vm)
> > *
> > * ->i_rwsem (generic_perform_write)
> > * ->mmap_lock (fault_in_pages_readable->do_page_fault)
> > @@ -2276,16 +2278,28 @@ static int filemap_update_page(struct kiocb *iocb,
> > {
> > int error;
> >
> > + if (iocb->ki_flags & IOCB_NOWAIT) {
> > + if (!down_read_trylock(&mapping->host->i_mapping_sem))
> > + return -EAGAIN;
> > + } else {
> > + down_read(&mapping->host->i_mapping_sem);
> > + }
>
> We really need a lock primitive for this. The number of times this
> exact lock pattern is being replicated all through the IO path is
> getting out of hand.
>
> static inline bool
> down_read_try_or_lock(struct rwsem *sem, bool try)
> {
> if (try) {
> if (!down_read_trylock(sem))
> return false;
> } else {
> down_read(&mapping->host->i_mapping_sem);
> }
> return true;
> }
>
> and the callers become:
>
> if (!down_read_try_or_lock(sem, (iocb->ki_flags & IOCB_NOWAIT)))
> return -EAGAIN;
>
> We can do the same with mutex_try_or_lock(), down_try_or_lock(), etc
> and we don't need to rely on cargo cult knowledge to propagate this
> pattern anymore. Because I'm betting relatively few people actually
> know why the code is written this way because the only place it is
> documented is in an XFS commit message....
>
> Doing this is a separate cleanup, though, and not something that
> needs to be done in this patchset.
Yep, good idea but let's do it in a separate patch set.
> > index c5b0457415be..ac5bb50b3a4c 100644
> > --- a/mm/readahead.c
> > +++ b/mm/readahead.c
> > @@ -192,6 +192,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> > */
> > unsigned int nofs = memalloc_nofs_save();
> >
> > + down_read(&mapping->host->i_mapping_sem);
> > /*
> > * Preallocate as many pages as we will need.
> > */
>
> I can't say I'm a great fan of having the mapping reach back up to
> the host to lock the host. THis seems the wrong way around to me
> given that most of the locking in the IO path is in "host locks
> mapping" and "mapping locks internal mapping structures" order...
>
> I also come back to the naming confusion here, in that when we look
> at this in long hand from the inode perspective, this chain actually
> looks like:
>
> lock(inode->i_mapping->inode->i_mapping_sem)
>
> i.e. the mapping is reaching back up outside it's scope to lock
> itself against other inode->i_mapping operations. Smells of layering
> violations to me.
>
> So, next question: should this truncate semanphore actually be part
> of the address space, not the inode? This patch is actually moving
> the page fault serialisation from the inode into the address space
> operations when page faults and page cache operations are done, so
> maybe the lock should also make that move? That would help clear up
> the naming problem, because now we can name it based around what it
> serialises in the address space, not the address space as a whole...
I think that moving the lock to address_space makes some sence although the
lock actually protects consistency of inode->i_mapping->i_pages with
whatever the filesystem has in its file_offset->disk_block mapping
structures (which are generally associated with the inode). So it is not
only about inode->i_mapping contents but I agree that struct address_space
is probably a bit more logical place than struct inode.
Regarding the name: How about i_pages_rwsem? The lock is protecting
invalidation of mapping->i_pages and needs to be held until insertion of
pages into i_pages is safe again...
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists