[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170330161204.GD4326@dhcp22.suse.cz>
Date: Thu, 30 Mar 2017 18:12:06 +0200
From: Michal Hocko <mhocko@...nel.org>
To: Ilya Dryomov <idryomov@...il.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
stable@...r.kernel.org, Sergey Jerusalimov <wintchester@...il.com>,
Jeff Layton <jlayton@...hat.com>, linux-xfs@...r.kernel.org
Subject: Re: [PATCH 4.4 48/76] libceph: force GFP_NOIO for socket allocations
On Thu 30-03-17 17:06:51, Ilya Dryomov wrote:
[...]
> > But if the allocation is stuck then the holder of the lock cannot make
> > a forward progress and it is effectivelly deadlocked because other IO
> > depends on the lock it holds. Maybe I just ask bad questions but what
>
> Only I/O to the same OSD. A typical ceph cluster has dozens of OSDs,
> so there is plenty of room for other in-flight I/Os to finish and move
> the allocator forward. The lock in question is per-ceph_connection
> (read: per-OSD).
>
> > makes GFP_NOIO different from GFP_KERNEL here. We know that the later
> > might need to wait for an IO to finish in the shrinker but it itself
> > doesn't get the lock in question directly. The former depends on the
> > allocator forward progress as well and that in turn wait for somebody
> > else to proceed with the IO. So to me any blocking allocation while
> > holding a lock which blocks further IO to complete is simply broken.
>
> Right, with GFP_NOIO we simply wait -- there is nothing wrong with
> a blocking allocation, at least in the general case. With GFP_KERNEL
> we deadlock, either in rbd/libceph (less likely) or in the filesystem
> above (more likely, shown in the xfs_reclaim_inodes_ag() traces you
> omitted in your quote).
I am not convinced. It seems you are relying on something that is not
guaranteed fundamentally. AFAIU all the IO paths should _guarantee_
and use mempools for that purpose if they need to allocate.
But, hey, I will not argue as my understanding of ceph is close to
zero. You are the maintainer so it is your call. I would just really
appreciate if you could document this as much as possible (ideally
at the place where you call memalloc_noio_save and describe the lock
dependency there).
Thanks!
--
Michal Hocko
SUSE Labs
Powered by blists - more mailing lists