Message-ID: <20150721054612.GZ7943@dastard>
Date:	Tue, 21 Jul 2015 15:46:12 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Ming Lei <ming.lei@...onical.com>
Cc:	Michal Hocko <mhocko@...e.cz>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>, xfs@....sgi.com
Subject: Re: [regression 4.2-rc3] loop: xfstests xfs/073 deadlocked in low
 memory conditions

On Tue, Jul 21, 2015 at 12:05:56AM -0400, Ming Lei wrote:
> On Mon, Jul 20, 2015 at 9:59 PM, Dave Chinner <david@...morbit.com> wrote:
> > Hi Ming,
> >
> > With the recent merge of the loop device changes, I'm now seeing an
> > XFS deadlock on my single-CPU, 1GB RAM VM running xfs/073.
> >
> > The deadlock is as follows:
> >
> > kloopd1: loop_queue_read_work
> >         xfs_file_iter_read
> >         lock XFS inode XFS_IOLOCK_SHARED (on image file)
> >         page cache read (GFP_KERNEL)
> >         radix tree alloc
> >         memory reclaim
> >         reclaim XFS inodes
> >         log force to unpin inodes
> >         <wait for log IO completion>
> >
> > xfs-cil/loop1: <does log force IO work>
> >         xlog_cil_push
> >         xlog_write
> >         <loop issuing log writes>
> >                 xlog_state_get_iclog_space()
> >                 <blocks due to all log buffers under write io>
> >                 <waits for IO completion>
> >
> > kloopd1: loop_queue_write_work
> >         xfs_file_write_iter
> >         lock XFS inode XFS_IOLOCK_EXCL (on image file)
> >         <wait for inode to be unlocked>
> >
> > [The full stack traces are below].
> >
> > i.e. kloopd, with its split read and write work queues, has
> > introduced a dependency through memory reclaim: writes need to be
> > able to progress for reads to make progress.
> 
> This kind of change just means READs and WRITEs are submitted to the
> fs concurrently; shouldn't this use case be reproducible from user
> space on one regular XFS file too?

Assuming the "regular XFS file" is on a normal block device (i.e.
not a loop device) then this will not deadlock as there is not
dependency on vfs level locking for log writes.

i.e. the normal userspace IO path is:

userspace read
	vfs_read
	  xfs_read
	    page cache alloc (GFP_KERNEL)
	      direct reclaim
	        xfs_inode reclaim
		  log force
		    CIL push
		      <workqueue>
		        xlog_write
			  submit_bio
			    -> hardware.

And then the log IO completes, and everything continues onward.
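
The "direct reclaim -> xfs_inode reclaim" step there is the
superblock shrinker, which only re-enters the filesystem when the
allocation context allows it, along these lines (abridged sketch of
super_cache_scan(), not the exact source):

	static unsigned long super_cache_scan(struct shrinker *shrink,
					      struct shrink_control *sc)
	{
		struct super_block *sb = container_of(shrink,
					struct super_block, s_shrink);

		/* GFP_NOFS/GFP_NOIO callers must not re-enter fs reclaim */
		if (!(sc->gfp_mask & __GFP_FS))
			return SHRINK_STOP;

		/*
		 * Otherwise prune dentries and inodes; for XFS, inode
		 * reclaim can force the log and wait on log IO.
		 */
		return prune_dcache_sb(sb, sc) + prune_icache_sb(sb, sc);
	}

GFP_KERNEL has __GFP_FS set, so the read above is allowed to wait on
log IO; that's fine on a real block device because the log IO goes
straight to the hardware. The same check is what lets the GFP_NOFS
allocation below skip filesystem reclaim entirely.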

What the loop device used to do:

userspace read
	vfs_read
	  xfs_read
	    page cache alloc (GFP_KERNEL)
	    submit_bio
	      loop device
	        splice_read (on image file)
		  xfs_splice_read
		    page cache alloc (GFP_NOFS)
		      direct reclaim
		        <skip filesystem reclaim>
		      submit_bio
		        -> hardware.

And when the read IO completes, everything moves onwards.
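
That GFP_NOFS is no accident: when a loop device is bound to a
backing file, loop strips __GFP_IO and __GFP_FS from the image
file's mapping so that IO to the image file cannot recurse into IO
or filesystem reclaim. Roughly (paraphrasing loop_set_fd() in
drivers/block/loop.c):

	lo->old_gfp_mask = mapping_gfp_mask(mapping);
	mapping_set_gfp_mask(mapping,
			lo->old_gfp_mask & ~(__GFP_IO | __GFP_FS));

Any page cache allocation on the image file that honours the
mapping's gfp mask therefore behaves as GFP_NOFS.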

What the loop device now does:

userspace read
	vfs_read
	  xfs_read
	    page cache alloc (GFP_KERNEL)
	    submit_bio
	      loop device
	        <workqueue>
	        vfs_read (on image file)
		  xfs_read
		    page cache alloc (GFP_KERNEL)
		      direct reclaim
			xfs_inode reclaim
			  log force
			    CIL push
			      <workqueue>
				xlog_write
				  submit_bio
				    loop device
				    <workqueue>
				      vfs_write (on image file)
				        xfs_write
					  <deadlock on image file lock>


> > The problem, fundamentally, is that mpage_readpages() does a
> > GFP_KERNEL allocation, rather than paying attention to the inode's
> > mapping gfp mask, which is set to GFP_NOFS.
> 
> That looks like the root cause, and I guess the issue is just
> triggered after commit aa4d86163e4 ("block: loop: switch to VFS
> ITER_BVEC"), which changed splice to a bvec iterator.

Yup - you are the unfortunate person who has wandered into the
minefield I'd been telling people about for quite some time. :(
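
The obvious direction for a fix is to make mpage_readpages()
constrain its allocation with the mapping's gfp mask rather than
hard-coding GFP_KERNEL, something like this (untested sketch, not a
real patch):

	/* honour the address_space's gfp mask, not raw GFP_KERNEL */
	gfp_t gfp = GFP_KERNEL & mapping_gfp_mask(mapping);
	...
	if (!add_to_page_cache_lru(page, mapping, page->index, gfp)) {
		...
	}

With loop having masked __GFP_FS off the image file's mapping, that
turns the read-side page cache allocation back into an effective
GFP_NOFS allocation and breaks the reclaim dependency.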

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com