Date: Wed, 29 Mar 2017 13:16:51 +0200
From: Michal Hocko <mhocko@...nel.org>
To: Ilya Dryomov <idryomov@...il.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	stable@...r.kernel.org,
	Sergey Jerusalimov <wintchester@...il.com>,
	Jeff Layton <jlayton@...hat.com>,
	linux-xfs@...r.kernel.org
Subject: Re: [PATCH 4.4 48/76] libceph: force GFP_NOIO for socket allocations

On Wed 29-03-17 13:10:01, Ilya Dryomov wrote:
> On Wed, Mar 29, 2017 at 12:55 PM, Michal Hocko <mhocko@...nel.org> wrote:
> > On Wed 29-03-17 12:41:26, Michal Hocko wrote:
> > [...]
> >> > ceph_con_workfn
> >> >   mutex_lock(&con->mutex)  # ceph_connection::mutex
> >> >   try_write
> >> >     ceph_tcp_connect
> >> >       sock_create_kern
> >> >         GFP_KERNEL allocation
> >> >           allocator recurses into XFS, more I/O is issued
> >
> > One more note. So what happens if this is a GFP_NOIO request which
> > cannot make any progress? Your IO thread is blocked on con->mutex
> > as you write below but the above thread cannot proceed as well. So I am
> > _really_ not sure this actually helps.
>
> This is not the only I/O worker. A ceph cluster typically consists of
> at least a few OSDs and can be as large as thousands of OSDs. This is
> the reason we are calling sock_create_kern() on the writeback path in
> the first place: pre-opening thousands of sockets isn't feasible.

Sorry for being dense here but what actually guarantees the forward
progress? My current understanding is that the deadlock is caused by
con->mutex being held while the allocation cannot make a forward
progress. I can imagine this would be possible if the other io flushers
depend on this lock. But then NOIO vs. KERNEL allocation doesn't make
much difference. What am I missing?

-- 
Michal Hocko
SUSE Labs
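
For readers following the thread, the difference between the two allocation contexts being debated can be sketched in userspace. The kernel offers memalloc_noio_save() and memalloc_noio_restore() for marking a scope in which nested allocations are treated as GFP_NOIO even when the callee asks for GFP_KERNEL. The model below is only an illustration of that save/mask/restore pattern: every `model_*` and `MODEL_*` name is hypothetical, the flag values are made up, and no real allocation or socket is involved. It does not address the forward-progress question raised above; it only shows what happens to the allocation mask.

```c
/*
 * Userspace model of a scoped "no I/O" allocation context.
 * All names and flag values here are hypothetical stand-ins,
 * not the kernel's definitions.
 */
#define MODEL_GFP_IO     0x1u   /* allocator may start block I/O */
#define MODEL_GFP_FS     0x2u   /* allocator may call into a filesystem */
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_PF_NOIO    0x4u   /* per-task "force NOIO" marker */

/* Stand-in for the per-task flags word. */
static _Thread_local unsigned int task_flags;

/* Enter the NOIO scope; return the previous state of the marker. */
static unsigned int model_noio_save(void)
{
    unsigned int old = task_flags & MODEL_PF_NOIO;

    task_flags |= MODEL_PF_NOIO;
    return old;
}

/* Leave the scope, restoring whatever state the caller saved. */
static void model_noio_restore(unsigned int old)
{
    task_flags = (task_flags & ~MODEL_PF_NOIO) | old;
}

/* What the allocator would do with a requested mask. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
    if (task_flags & MODEL_PF_NOIO)
        gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
    return gfp;
}

/* Stand-in for a callee that always asks for the GFP_KERNEL mask. */
static unsigned int model_sock_create(void)
{
    return model_effective_gfp(MODEL_GFP_KERNEL);
}

/* Stand-in for the connect path: wrap the callee in the NOIO scope. */
static unsigned int model_tcp_connect(void)
{
    unsigned int saved = model_noio_save();
    unsigned int used = model_sock_create();

    model_noio_restore(saved);
    return used;
}
```

In this model, model_tcp_connect() yields a mask with both the I/O and filesystem bits cleared, while a direct call to model_sock_create() outside the scope keeps the full MODEL_GFP_KERNEL mask. Saving and restoring the previous marker, rather than clearing it unconditionally, is what lets such scopes nest.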