lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <CAOUHufahcS0G_GApTdmzE4_Nb_70LGaCkgV0NR_xJuWN2NdJVg@mail.gmail.com>
Date:   Sun, 15 Jan 2023 16:13:44 -0700
From:   Yu Zhao <yuzhao@...gle.com>
To:     msizanoen1 <msizanoen@...labs.xyz>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, stable@...r.kernel.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm: do not try to migrate lru_gen if it's not
 associated with a memcg

On Sun, Jan 15, 2023 at 6:47 AM msizanoen1 <msizanoen@...labs.xyz> wrote:
>
> In some cases, memory cgroup migration can be initiated by userspace
> right after a process was created and right before `lru_gen_add_mm()` is
> called (e.g. by some program watching a cgroup and moving away any
> processes it detects[1]), which results in the following sequence of
> WARNs followed by an Oops as the kernel attempts to perform a
> `lru_gen_add_mm()` twice on the same `mm`:

...
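The race described above can be replayed in a simplified, single-threaded userspace model. Everything in it is hypothetical: `struct toy_mm`, `toy_add_mm()`, `toy_del_mm()` and `toy_migrate_mm_old()` are stand-ins for the kernel's `mm_struct`, `lru_gen_add_mm()`, `lru_gen_del_mm()` and the pre-patch `lru_gen_migrate_mm()`. A WARN is modeled as a counter plus a message, and the model assumes (as the description implies) that migration deletes the mm and then re-adds it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; not the kernel's real types or functions. */
struct toy_memcg {
	const char *name;
};

struct toy_mm {
	struct toy_memcg *memcg; /* models mm->lru_gen.memcg */
	bool on_list;            /* models !list_empty(&mm->lru_gen.list) */
	int adds;                /* number of times the mm was added */
};

static int warnings; /* number of modeled WARNs that fired */

/* Models a VM_WARN_ON_ONCE()-style check as a counter plus a message. */
static void toy_warn_on(bool cond, const char *what)
{
	if (cond) {
		warnings++;
		printf("WARN: %s\n", what);
	}
}

/* Models lru_gen_add_mm(): associate the mm with the task's memcg. */
static void toy_add_mm(struct toy_mm *mm, struct toy_memcg *task_memcg)
{
	toy_warn_on(mm->on_list, "mm added twice");
	mm->memcg = task_memcg;
	mm->on_list = true;
	mm->adds++;
}

/* Models lru_gen_del_mm(); assumed to do nothing if the mm is not listed. */
static void toy_del_mm(struct toy_mm *mm)
{
	if (!mm->on_list)
		return;
	mm->memcg = NULL;
	mm->on_list = false;
}

/* Models the pre-patch lru_gen_migrate_mm(), assuming it re-adds the mm. */
static void toy_migrate_mm_old(struct toy_mm *mm, struct toy_memcg *task_memcg)
{
	if (task_memcg == mm->memcg)
		return;
	toy_warn_on(mm->memcg == NULL, "mm has no memcg");
	toy_warn_on(!mm->on_list, "mm is not on a list");
	toy_del_mm(mm);
	toy_add_mm(mm, task_memcg);
}

/* Migration runs before the fork path adds the mm; returns the add count. */
static int toy_replay_race(void)
{
	struct toy_memcg target = { "target" };
	struct toy_mm mm = { NULL, false, 0 };

	warnings = 0;
	toy_migrate_mm_old(&mm, &target); /* userspace moved the new task */
	toy_add_mm(&mm, &target);         /* fork path's add is now the second */
	assert(mm.on_list);
	return mm.adds;
}
```

Calling `toy_replay_race()` prints three WARN lines and returns 2: the migration path performs the first add, so the fork path's add is a duplicate, which is the double `lru_gen_add_mm()` the description says ends in an Oops.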

> Fix this by simply leaving the lru_gen alone if it has not been
> associated with a memcg yet, as it should eventually be assigned to the
> right cgroup anyway.
>
> [1]: https://gitlab.freedesktop.org/benzea/uresourced/-/blob/master/cgroupify/cgroupify.c
>
> v2:
>         Added stable cc tags
>
> Signed-off-by: N/A (patch should not be copyrightable)
> Cc: stable@...r.kernel.org

Thanks for the fix. Cc'ing stable is the right thing to do. The
commit message and the comment style could easily be adjusted to
align with the kernel's patch-submission and coding-style guidelines.

I don't think "N/A" is acceptable in the Signed-off-by line, though. I
fully respect it if you wish to remain anonymous; I can send a similar
fix crediting you as the "anonymous user <msizanoen@...labs.xyz>" who
reported this bug.

A bit of background on how I broke it: an old version I have on 4.15
calls lru_gen_add_mm() before cgroup_post_fork(), i.e., while cgroup
migrations are still excluded by cgroup_threadgroup_rwsem. When I
rebased it, I made lru_gen_add_mm() rely on task_lock for
synchronization with cgroup migrations (the decoupling seemed, and
still seems, less complicated), but this is not safe without the check
below.
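With only task_lock between them, the fork path and a cgroup migration are atomic with respect to each other, but either one can run first. The sketch below is a hypothetical userspace model of the patched logic: the `toy_*` names are stand-ins rather than kernel APIs, a plain sequential call stands in for running under task_lock, and an `assert()` stands in for the list corruption a double add would cause. It replays both orders and checks that the mm ends up in the task's final memcg either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's real types or functions. */
struct toy_memcg {
	const char *name;
};

struct toy_mm {
	struct toy_memcg *memcg; /* models mm->lru_gen.memcg */
	bool on_list;            /* models !list_empty(&mm->lru_gen.list) */
	int adds;                /* total number of adds performed */
};

/* Models lru_gen_add_mm(): pick up the task's memcg at add time. */
static void toy_add_mm(struct toy_mm *mm, struct toy_memcg *task_memcg)
{
	assert(!mm->on_list); /* a second add would corrupt the list */
	mm->memcg = task_memcg;
	mm->on_list = true;
	mm->adds++;
}

/* Models lru_gen_del_mm(). */
static void toy_del_mm(struct toy_mm *mm)
{
	mm->memcg = NULL;
	mm->on_list = false;
}

/* Models the patched lru_gen_migrate_mm(). */
static void toy_migrate_mm(struct toy_mm *mm, struct toy_memcg *task_memcg)
{
	/* Not associated with a memcg yet: the later add picks up task_memcg. */
	if (!mm->memcg)
		return;
	if (task_memcg == mm->memcg)
		return;
	toy_del_mm(mm);
	toy_add_mm(mm, task_memcg);
}

/*
 * Replays the fork path and a migration in either order. The task starts
 * in "from" and is moved to "to"; returns the mm's final memcg.
 */
static struct toy_memcg *toy_replay(bool migrate_first, struct toy_mm *mm,
				    struct toy_memcg *from, struct toy_memcg *to)
{
	mm->memcg = NULL;
	mm->on_list = false;
	mm->adds = 0;

	if (migrate_first) {
		toy_migrate_mm(mm, to); /* early return: no memcg yet */
		toy_add_mm(mm, to);     /* fork path sees the new cgroup */
	} else {
		toy_add_mm(mm, from);   /* fork path */
		toy_migrate_mm(mm, to); /* normal move: delete, then re-add */
	}
	return mm->memcg;
}
```

In the migrate-first order the mm is added exactly once, by the fork path, which already sees the new cgroup; in the other order the migration performs an ordinary delete and re-add.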


> ---
>  mm/vmscan.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bd6637fcd8f9..0cac40e7484c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3323,13 +3323,19 @@ void lru_gen_migrate_mm(struct mm_struct *mm)
>         if (mem_cgroup_disabled())
>                 return;
>
> +       /* This could happen if cgroup migration is invoked before the process
> +        * lru_gen is associated with a memcg (e.g. during process creation).
> +        * Simply ignore it in this case as the lru_gen will get assigned the
> +        * right cgroup later. */
> +       if (!mm->lru_gen.memcg)
> +               return;
> +
>         rcu_read_lock();
>         memcg = mem_cgroup_from_task(task);
>         rcu_read_unlock();
>         if (memcg == mm->lru_gen.memcg)
>                 return;
>
> -       VM_WARN_ON_ONCE(!mm->lru_gen.memcg);
>         VM_WARN_ON_ONCE(list_empty(&mm->lru_gen.list));
>
>         lru_gen_del_mm(mm);
