linux-kernel - Re: [PATCH 4.4 48/76] libceph: force GFP_NOIO for socket allocations

Message-ID: <20170330112126.GE1972@dhcp22.suse.cz>
Date:   Thu, 30 Mar 2017 13:21:26 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Ilya Dryomov <idryomov@...il.com>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        stable@...r.kernel.org, Sergey Jerusalimov <wintchester@...il.com>,
        Jeff Layton <jlayton@...hat.com>, linux-xfs@...r.kernel.org
Subject: Re: [PATCH 4.4 48/76] libceph: force GFP_NOIO for socket allocations

On Thu 30-03-17 12:02:03, Ilya Dryomov wrote:
> On Thu, Mar 30, 2017 at 8:25 AM, Michal Hocko <mhocko@...nel.org> wrote:
> > On Wed 29-03-17 16:25:18, Ilya Dryomov wrote:
[...]
> >> We got rid of osdc->request_mutex in 4.7, so these workers are almost
> >> independent in newer kernels and should be able to free up memory for
> >> those blocked on GFP_NOIO retries with their respective con->mutex
> >> held.  Using GFP_KERNEL and thus allowing the recursion is just asking
> >> for an AA deadlock on con->mutex OTOH, so it does make a difference.
> >
> > You keep saying this, but so far I haven't heard how the AA deadlock is
> > possible. Both GFP_KERNEL and GFP_NOIO can stall for an unbounded amount
> > of time and that would cause you problems AFAIU.
> 
> Suppose we have an I/O for OSD X, which means it's got to go through
> ceph_connection X:
> 
> ceph_con_workfn
>   mutex_lock(&con->mutex)
>     try_write
>       ceph_tcp_connect
>         sock_create_kern
>           GFP_KERNEL allocation
> 
> Suppose that generates another I/O for OSD X and blocks on it.

Yeah, I understand that, but I am asking _who_ is going to generate
that IO. We do not do writeback from the direct reclaim path. I am not
familiar with Ceph at all, but do any of its (slab) shrinkers generate
IO that would recurse back?

> Well,
> it's got to go through the same ceph_connection:
> 
> rbd_queue_workfn
>   ceph_osdc_start_request
>     ceph_con_send
>       mutex_lock(&con->mutex)  # deadlock, OSD X worker is knocked out
> 
> Now if that was a GFP_NOIO allocation, we would simply block in the
> allocator.  The placement algorithm distributes objects across the OSDs
> in a pseudo-random fashion, so even if we had a whole bunch of I/Os for
> that OSD, some other I/Os for other OSDs would complete in the meantime
> and free up memory.  If we are under the kind of memory pressure that
> makes GFP_NOIO allocations block for an extended period of time, we are
> bound to have a lot of pre-opened sockets, as we would have done at least
> some flushing by then.

How is this any different from xfs waiting for its IO to be done?
-- 
Michal Hocko
SUSE Labs