Message-ID: <YfmwJe9cUQnBV311@carbon.dhcp.thefacebook.com>
Date:   Tue, 1 Feb 2022 14:11:49 -0800
From:   Roman Gushchin <guro@...com>
To:     Jeremy Linton <jeremy.linton@....com>
CC:     <linux-mm@...ck.org>, <cgroups@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [BUG/RFC] mm/memcg: Possible cgroup migrate/signal deadlock

On Tue, Feb 01, 2022 at 02:56:23PM -0600, Jeremy Linton wrote:
> With CONFIG_MEMCG_KMEM and CONFIG_PROVE_LOCKING enabled (Fedora
> Rawhide kernel), running a simple podman test triggers a circular
> locking dependency warning. The podman container in question simply
> contains the echo command and the libc/ld-linux needed to run it. The
> warning can be reproduced with just a single `podman build --network
> host --layers=false -t localhost/echo .` command, although the exact
> sequence that triggers the warning also requires the task's frozen
> state to be changing at the same time. So, it's easier to reproduce
> with a slightly longer test case.
> 
> I've attempted to trigger the actual deadlock with some standalone
> code and have been unsuccessful, but looking at the code it appears to
> be a legitimate deadlock if a signal is being sent to the process from
> another thread while the task is migrating between cgroups.
> 
> Attached is a fix which I'm confident fixes the problem, but I'm not
> really that confident the fix is correct overall, since I don't fully
> understand all the possible states in the cgroup code. The fix avoids
> the deadlock by shifting the objcg->list manipulation to another
> spinlock and then using list_del_rcu in obj_cgroup_release.
> 
> There is a bit more information in the actual BZ
> https://bugzilla.redhat.com/show_bug.cgi?id=2033016 including a shell
> script with the podman test/etc.
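
Lockdep (CONFIG_PROVE_LOCKING) reports a potential deadlock once it has seen lock classes acquired in an order that forms a cycle, even if the conflicting paths never actually ran concurrently. That is consistent with the warning firing here while standalone attempts to hit the real deadlock failed. The following is a toy model of that idea, not lockdep itself: lock classes are hypothetical small integers, every acquisition records "held -> new" edges, and an acquisition that closes a cycle is flagged. The kernel's lockdep does much more (for example, it also tracks interrupt context).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8  /* hypothetical number of lock classes in this toy */

/* dep[a][b] becomes true once class b has been taken while a was held. */
static bool dep[MAX_LOCKS][MAX_LOCKS];
static int held[MAX_LOCKS];  /* classes currently held, in acquisition order */
static int nheld;

/* Depth-first search: is `to` reachable from `from` in the dependency graph? */
static bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_LOCKS; next++)
        if (dep[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    return false;
}

/* Record acquisition of `lock`. Returns false if any new edge held -> lock
 * closes a cycle, i.e. some earlier path took these classes the other way. */
static bool lock_acquire(int lock)
{
    bool ok = true;
    for (int i = 0; i < nheld; i++) {
        int h = held[i];
        bool seen[MAX_LOCKS] = { false };
        if (reachable(lock, h, seen))
            ok = false;          /* lock already leads back to h: cycle */
        dep[h][lock] = true;     /* remember the ordering h -> lock */
    }
    assert(nheld < MAX_LOCKS);
    held[nheld++] = lock;
    return ok;
}

/* Drop `lock` from the held set; order-independent release is allowed. */
static void lock_release(int lock)
{
    for (int i = 0; i < nheld; i++) {
        if (held[i] == lock) {
            memmove(&held[i], &held[i + 1],
                    (size_t)(nheld - i - 1) * sizeof(held[0]));
            nheld--;
            return;
        }
    }
    assert(!"releasing a lock that is not held");
}
```

For example, taking class 0 then class 1 on one path and, later, class 1 then class 0 on another makes the second lock_acquire(0) return false, even though no two threads ever contended for the locks.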

Hi Jeremy!

Thank you for the report and the patch!

We've discussed this issue some time ago and I posted a very similar patch:
https://marc.info/?l=linux-cgroups&m=164221633621286&w=2 .
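
Jeremy's fix, which Roman describes as very similar to his own patch, moves the objcg list manipulation to a separate spinlock. A minimal userspace sketch of that idea follows, assuming pthread mutexes as stand-ins for kernel spinlocks: `shared_lock` plays the role of a broad lock that unrelated paths also take, and `list_lock` protects only the list. All names here (`struct obj`, `obj_add`, `obj_put`, `shared_lock`, `list_lock`, `put_while_holding_shared`) are hypothetical and are not the kernel's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical object on a circular doubly linked list (not struct obj_cgroup). */
struct obj {
    struct obj *prev, *next;
    int refs;
};

/* Hypothetical broad lock that unrelated paths also take. */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
/* Dedicated lock protecting only the object list; always taken innermost. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj list_head = { &list_head, &list_head, 0 };

/* Link a new object at the head of the list under list_lock only. */
static void obj_add(struct obj *o)
{
    pthread_mutex_lock(&list_lock);
    o->next = list_head.next;
    o->prev = &list_head;
    list_head.next->prev = o;
    list_head.next = o;
    pthread_mutex_unlock(&list_lock);
}

/* Drop a reference; the last put unlinks the object under list_lock only,
 * so the release path never needs shared_lock. */
static void obj_put(struct obj *o)
{
    pthread_mutex_lock(&list_lock);
    assert(o->refs > 0);
    if (--o->refs == 0) {
        o->prev->next = o->next;
        o->next->prev = o->prev;
        o->next = o->prev = NULL;
    }
    pthread_mutex_unlock(&list_lock);
}

/* Example caller that already holds the broad lock when it drops the last
 * reference: this only nests shared_lock -> list_lock. */
static void put_while_holding_shared(struct obj *o)
{
    pthread_mutex_lock(&shared_lock);
    obj_put(o);
    pthread_mutex_unlock(&shared_lock);
}

/* Count list entries under list_lock. */
static size_t list_len(void)
{
    size_t n = 0;
    pthread_mutex_lock(&list_lock);
    for (struct obj *p = list_head.next; p != &list_head; p = p->next)
        n++;
    pthread_mutex_unlock(&list_lock);
    return n;
}
```

Because obj_put() only takes list_lock, the release path adds no dependency on the broad lock; the rule to keep is that list_lock stays innermost and nothing takes shared_lock while holding it. The list_del_rcu part of Jeremy's description, which lets RCU readers keep walking the list while an entry is unlinked, is not modeled here. Build with -pthread.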

I also resent the latest version a few hours ago, but somehow the
mail didn't make it to the mailing lists. Anyway, I've added you
explicitly to Cc and just resent it.

Thanks!
