linux-kernel - Re: [PATCH 4.4 48/76] libceph: force GFP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAOi1vP_hF5Z3gO3jfwL-fUFFZdk5X+ESvKF5gyc9UAzyqKhdXA@mail.gmail.com>
Date:   Thu, 30 Mar 2017 19:19:59 +0200
From:   Ilya Dryomov <idryomov@...il.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        stable@...r.kernel.org, Sergey Jerusalimov <wintchester@...il.com>,
        Jeff Layton <jlayton@...hat.com>, linux-xfs@...r.kernel.org
Subject: Re: [PATCH 4.4 48/76] libceph: force GFP_NOIO for socket allocations

On Thu, Mar 30, 2017 at 6:12 PM, Michal Hocko <mhocko@...nel.org> wrote:
> On Thu 30-03-17 17:06:51, Ilya Dryomov wrote:
> [...]
>> > But if the allocation is stuck then the holder of the lock cannot make
>> > a forward progress and it is effectivelly deadlocked because other IO
>> > depends on the lock it holds. Maybe I just ask bad questions but what
>>
>> Only I/O to the same OSD.  A typical ceph cluster has dozens of OSDs,
>> so there is plenty of room for other in-flight I/Os to finish and move
>> the allocator forward.  The lock in question is per-ceph_connection
>> (read: per-OSD).
>>
>> > makes GFP_NOIO different from GFP_KERNEL here. We know that the later
>> > might need to wait for an IO to finish in the shrinker but it itself
>> > doesn't get the lock in question directly. The former depends on the
>> > allocator forward progress as well and that in turn wait for somebody
>> > else to proceed with the IO. So to me any blocking allocation while
>> > holding a lock which blocks further IO to complete is simply broken.
>>
>> Right, with GFP_NOIO we simply wait -- there is nothing wrong with
>> a blocking allocation, at least in the general case.  With GFP_KERNEL
>> we deadlock, either in rbd/libceph (less likely) or in the filesystem
>> above (more likely, shown in the xfs_reclaim_inodes_ag() traces you
>> omitted in your quote).
>
> I am not convinced. It seems you are relying on something that is not
> guaranteed fundamentally. AFAIU all the IO paths should _guarantee_
> and use mempools for that purpose if they need to allocate.
>
> But, hey, I will not argue as my understanding of ceph is close to
> zero. You are the maintainer so it is your call. I would just really
> appreciate if you could document this as much as possible (ideally
> at the place where you call memalloc_noio_save and describe the lock
> dependency there).

It's certainly not perfect (especially this socket case -- putting
together a pool of sockets is not easy) and I'm sure one could poke
some holes in the entire thing, but I'm convinced we are much better
off with the memalloc_noio_{save,restore}() pair in there.

I'll try to come up with a better comment, but the problem is that it
can be an arbitrary lock in an arbitrary filesystem, not just libceph's
con->mutex, so it's hard to be specific.

Do I have your OK to poke Greg to get the backports going?

Thanks,

                Ilya