Message-ID: <CAMgjq7CapiW2h2pzcKQBhwf_5rs5fgMGHw1E2YJYwEiY6zc=LQ@mail.gmail.com>
Date: Thu, 3 Oct 2024 02:58:19 +0800
From: Kairui Song <ryncsn@...il.com>
To: Dan Carpenter <dan.carpenter@...aro.org>
Cc: Naresh Kamboju <naresh.kamboju@...aro.org>, open list <linux-kernel@...r.kernel.org>, 
	lkft-triage@...ts.linaro.org, Linux Regressions <regressions@...ts.linux.dev>, 
	linux-mm <linux-mm@...ck.org>, Andrew Morton <akpm@...ux-foundation.org>, 
	Arnd Bergmann <arnd@...db.de>, Anders Roxell <anders.roxell@...aro.org>
Subject: Re: next-20241001: WARNING: at mm/list_lru.c:77 list_lru_del
 (mm/list_lru.c:212 mm/list_lru.c:200)

On Wed, Oct 2, 2024 at 7:28 PM Dan Carpenter <dan.carpenter@...aro.org> wrote:
>
> On Wed, Oct 02, 2024 at 02:25:34PM +0300, Dan Carpenter wrote:
> > On Wed, Oct 02, 2024 at 02:24:20PM +0300, Dan Carpenter wrote:
> > > Let's add Kairui Song to the  CC list.
> > >
> > > One simple thing is that we should add a READ_ONCE() to the comparison.  Naresh,
> > > could you test the attached diff?  I don't know that it will fix it but it's
> > > worth checking the easy stuff first.
> > >
> >
> > Actually that's not right.  Let me write a different patch.
>
> Try this one.
>
> regards,
> dan carpenter
>
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index 79c2d21504a2..2c429578ed31 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -65,6 +65,7 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
>                        bool irq, bool skip_empty)
>  {
>         struct list_lru_one *l;
> +       long nr_items;
>         rcu_read_lock();
>  again:
>         l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
> @@ -73,8 +74,9 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
>                         spin_lock_irq(&l->lock);
>                 else
>                         spin_lock(&l->lock);
> -               if (likely(READ_ONCE(l->nr_items) != LONG_MIN)) {
> -                       WARN_ON(l->nr_items < 0);
> +               nr_items = READ_ONCE(l->nr_items);
> +               if (likely(nr_items != LONG_MIN)) {
> +                       WARN_ON(nr_items < 0);
>                         rcu_read_unlock();
>                         return l;
>                 }
>

Thanks. The warning comes from a newly added sanity check, so I'm not
sure whether this WARN_ON was triggered by an existing list_lru leak or
by a new issue.

Unfortunately I can't reproduce it locally on my ARM machine so far,
although according to the description it should be easily
reproducible. If the WARN only triggered once, and only during boot,
maybe some static data wasn't initialized correctly? Or enabling memcg
caused a list_lru leak (mem_cgroup_from_slab_obj changed from returning
NULL to returning the actual memcg, so an item that was added to the
root cgroup's list earlier will later be removed from the actual
memcg's list, which looks like a real race). In the latter case, it's
an existing issue caught by the new sanity check.
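
The second hypothesis can be illustrated with a minimal userspace
sketch. It is not kernel code: struct lru_model and the model_* helpers
are hypothetical stand-ins for struct list_lru_one and the list_lru
add/del paths, and locking is left out entirely. It only shows how an
add accounted to one list followed by a del accounted to another leaves
a negative counter, which is what the new check reports.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct list_lru_one. */
struct lru_model {
	long nr_items;
};

/* Account one item as added to this list. */
static void model_add(struct lru_model *l)
{
	l->nr_items++;
}

/* Account one item as removed from this list. */
static void model_del(struct lru_model *l)
{
	l->nr_items--;
}

/*
 * Mirrors the condition in the quoted patch: LONG_MIN marks a dead
 * list and is skipped; any other negative count would hit WARN_ON.
 */
static bool model_check_would_warn(const struct lru_model *l)
{
	long nr_items = l->nr_items;

	return nr_items != LONG_MIN && nr_items < 0;
}

/*
 * Assumed sequence: the memcg lookup returns NULL at add time, so the
 * item is accounted to the root list; by del time the lookup returns
 * the real memcg, so the decrement lands on a different list.
 */
static bool model_mismatched_del_warns(void)
{
	struct lru_model root = { 0 };
	struct lru_model memcg = { 0 };

	model_add(&root);			/* accounted to root */
	assert(!model_check_would_warn(&memcg));	/* still consistent */
	model_del(&memcg);			/* removed from the wrong list */

	return model_check_would_warn(&memcg);
}
```

In this model the root list is left at 1 (the leak) and the memcg list
at -1 (the underflow that the sanity check would flag on a later lock).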

The READ_ONCE patch may be worth trying. I'll also do more debugging on
this and try to send a fix later.
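
For reference, the pattern in the quoted diff is "read the counter
once, then make every decision on that snapshot". Below is a userspace
sketch of that idea, assuming a C11 relaxed atomic load as a rough
substitute for the kernel's READ_ONCE(); struct lru_snapshot_model,
enum lru_state and classify_nr_items() are hypothetical names used only
for illustration. Whether the double read actually matters for this
WARN is not established here.

```c
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the counter in struct list_lru_one. */
struct lru_snapshot_model {
	atomic_long nr_items;
};

enum lru_state {
	LRU_DEAD,	/* counter is LONG_MIN: list marked dead */
	LRU_OK,		/* counter is zero or positive */
	LRU_NEGATIVE,	/* counter is negative: would hit WARN_ON */
};

/*
 * Load the counter a single time and classify that snapshot, so both
 * comparisons see the same value even if the counter changes between
 * them.
 */
static enum lru_state classify_nr_items(struct lru_snapshot_model *l)
{
	long nr_items = atomic_load_explicit(&l->nr_items,
					     memory_order_relaxed);

	if (nr_items == LONG_MIN)
		return LRU_DEAD;
	if (nr_items < 0)
		return LRU_NEGATIVE;
	return LRU_OK;
}
```

With two separate reads, the first could see a normal value and the
second LONG_MIN, which would make the "< 0" test fire on a list that
was only being marked dead; a single snapshot rules that case out.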
