Message-ID: <CAGWkznGiVrqMs9fX2WGG9QsfTm72ffFj-cWXSUo3azrgeBOgAg@mail.gmail.com>
Date: Sat, 16 Mar 2024 16:53:09 +0800
From: Zhaoyang Huang <huangzhaoyang@...il.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: "zhaoyang.huang" <zhaoyang.huang@...soc.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org, steve.kang@...soc.com
Subject: Re: [PATCH] mm: fix a race scenario in folio_isolate_lru

On Fri, Mar 15, 2024 at 8:46 PM Matthew Wilcox <willy@...radead.org> wrote:
>
> On Thu, Mar 14, 2024 at 04:39:21PM +0800, zhaoyang.huang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@...soc.com>
> >
> > Panic[1] reported which is caused by lruvec->list break. Fix the race
> > between folio_isolate_lru and release_pages.
> >
> > race condition:
> > release_pages could meet a non-refered folio which escaped from being
> > deleted from LRU but add to another list_head
>
> I don't think the bug is in folio_isolate_lru() but rather in its
> caller.
>
>  * Context:
>  *
>  * (1) Must be called with an elevated refcount on the folio. This is a
>  *     fundamental difference from isolate_lru_folios() (which is called
>  *     without a stable reference).
>
> So when release_pages() runs, it must not see a refcount decremented to
> zero, because the caller of folio_isolate_lru() is supposed to hold one.
>
> Your stack trace is for the thread which is calling release_pages(), not
> the one calling folio_isolate_lru(), so I can't help you debug further.

Thanks for the comments. As I understand it, folio_put_testzero() does
the decrement before the test, so release_pages() can see the refcount
reach zero and proceed further while folio_get() in folio_isolate_lru()
has not run yet.

   #0 folio_isolate_lru          #1 release_pages
   BUG_ON(!folio_refcnt)
                                 if (folio_put_testzero())
   folio_get(folio)
   if (folio_test_clear_lru())
