Message-ID: <20170330143652.GA4326@dhcp22.suse.cz>
Date:   Thu, 30 Mar 2017 16:36:52 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Ilya Dryomov <idryomov@...il.com>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        stable@...r.kernel.org, Sergey Jerusalimov <wintchester@...il.com>,
        Jeff Layton <jlayton@...hat.com>, linux-xfs@...r.kernel.org
Subject: Re: [PATCH 4.4 48/76] libceph: force GFP_NOIO for socket allocations

On Thu 30-03-17 15:48:42, Ilya Dryomov wrote:
> On Thu, Mar 30, 2017 at 1:21 PM, Michal Hocko <mhocko@...nel.org> wrote:
[...]
> > familiar with Ceph at all but does any of its (slab) shrinkers generate
> > IO to recurse back?
> 
> We don't register any custom shrinkers.  This is XFS on top of rbd,
> a ceph-backed block device.

OK, that was the part I was missing. So you depend on XFS to make
forward progress here.

> >> Well,
> >> it's got to go through the same ceph_connection:
> >>
> >> rbd_queue_workfn
> >>   ceph_osdc_start_request
> >>     ceph_con_send
> >>       mutex_lock(&con->mutex)  # deadlock, OSD X worker is knocked out
> >>
> >> Now if that was a GFP_NOIO allocation, we would simply block in the
> >> allocator.  The placement algorithm distributes objects across the OSDs
> >> in a pseudo-random fashion, so even if we had a whole bunch of I/Os for
> >> that OSD, some other I/Os for other OSDs would complete in the meantime
> >> and free up memory.  If we are under the kind of memory pressure that
> >> makes GFP_NOIO allocations block for an extended period of time, we are
> >> bound to have a lot of pre-open sockets, as we would have done at least
> >> some flushing by then.
> >
> > How is this any different from xfs waiting for its IO to be done?
> 
> I feel like we are talking past each other here.  If the worker in
> question isn't deadlocked, it will eventually get its socket and start
> flushing I/O.  If it has deadlocked, it won't...

But if the allocation is stuck, then the holder of the lock cannot make
forward progress, and it is effectively deadlocked because other IO
depends on the lock it holds. Maybe I am just asking bad questions, but
what makes GFP_NOIO different from GFP_KERNEL here? We know that the
latter might need to wait for an IO to finish in the shrinker, but it
itself doesn't get the lock in question directly. The former depends on
allocator forward progress as well, and that in turn waits for somebody
else to proceed with the IO. So to me, any blocking allocation while
holding a lock which blocks further IO from completing is simply broken.
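
To make the dependency chain concrete, here is a toy wait-for sketch. It
is not kernel code: the node names and the three scenarios are made up
for illustration, and it only models "A cannot proceed until B does".
The third scenario is the case where the IO that frees memory does not
need the lock, which is the assumption under which GFP_NOIO would help.

```python
# Toy wait-for model: waits_for[a] == b means "a cannot proceed until b does".
# All node names are illustrative labels, not kernel symbols.

def wait_chain(waits_for, start):
    """Follow the wait-for edges from start until they end or repeat."""
    chain = [start]
    node = start
    while node in waits_for:
        node = waits_for[node]
        chain.append(node)
        if node in chain[:-1]:
            break  # we came back to something already on the chain
    return chain


def is_deadlocked(waits_for, start):
    """start is stuck for good if its wait chain runs into a cycle."""
    chain = wait_chain(waits_for, start)
    return chain[-1] in chain[:-1]


HOLDER = "worker holding con->mutex"

# GFP_KERNEL: the allocator may wait in a shrinker for IO, and that IO
# needs the lock the worker already holds.
gfp_kernel = {
    HOLDER: "allocator",
    "allocator": "shrinker waits for IO",
    "shrinker waits for IO": "con->mutex",
    "con->mutex": HOLDER,
}

# GFP_NOIO: the allocator waits for somebody else's IO to free memory;
# if that IO needs the same lock, the chain closes just the same.
gfp_noio_same_lock = {
    HOLDER: "allocator",
    "allocator": "memory freed by other IO",
    "memory freed by other IO": "con->mutex",
    "con->mutex": HOLDER,
}

# GFP_NOIO where the other IO does not depend on the held lock.
gfp_noio_independent_io = {
    HOLDER: "allocator",
    "allocator": "memory freed by other IO",
}
```

In this model the first two scenarios both end in a cycle through the
lock holder, and only the third one does not. Whether the real workload
looks like the second or the third scenario is exactly the open question.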
-- 
Michal Hocko
SUSE Labs
