linux-kernel - Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap


Message-ID: <Z7ySUwdkXRsBjhLR@casper.infradead.org>
Date: Mon, 24 Feb 2025 15:37:55 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Barry Song <21cnbao@...il.com>
Cc: mawupeng <mawupeng1@...wei.com>, akpm@...ux-foundation.org,
	david@...hat.com, kasong@...cent.com, ryan.roberts@....com,
	chrisl@...nel.org, huang.ying.caritas@...il.com,
	schatzberg.dan@...il.com, hanchuanhua@...o.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry
 found during do_swap_page

On Mon, Feb 24, 2025 at 08:11:47PM +1300, Barry Song wrote:
> Please send a V2 and update your changelog to accurately describe the real
> issue. Additionally, clarify how frequently this occurs and why resolving
> the root cause is challenging. Gaoxu reported a similar case on the Android
> kernel 6.6, while you're reporting it on 5.10. He observed an occurrence
> rate of 1 in 500,000 over a week on customer devices but was unable to
> reproduce it in the lab.
> 
> BTW, your patch is incorrect, because there is a normal case in which
> _swap_info_get() returns NULL:
> thread 1                                           thread 2
>
> 1. page fault happens
>    with an entry pointing
>    to a swapfile;
>                                                    swapoff()
> 2. do_swap_page()
> 
> In this scenario, _swap_info_get() may return NULL, which is expected,
> and we should not return -ERRNO; the subsequent page fault will
> detect that the PTE has changed. Since you have never enabled any
> swap, the appropriate action is to do the following:
> 
>         /* Prevent swapoff from happening to us. */
>         si = get_swap_device(entry);
> -       if (unlikely(!si))
> +       if (unlikely(!si)) {
> +               /*
> +                * Return VM_FAULT_SIGBUS if the swap entry points to
> +                * a never-enabled swap file, caused by either hardware
> +                * issues or a kernel bug. Return an error code to prevent
> +                * an infinite page fault (#PF) loop.
> +                */
> +               if (WARN_ON_ONCE(!swp_swap_info(entry)))
> +                       ret = VM_FAULT_SIGBUS;
>                 goto out;
> +       }

This is overly specific to the case that you're tracking down.
So it's entirely appropriate to apply to _your_ kernel while you work on
tracking it down, but completely inappropriate to upstream.
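
For readers following the thread, the difference between the two cases Barry describes can be modelled in userspace. The sketch below is only an illustration under stated assumptions: every name in it (model_pte, model_fault, run_faults, the FAULT_* values) is hypothetical and none of it is kernel code. It assumes that a fault returning "retry" is simply taken again, and that a racing swapoff rewrites the PTE before the next fault.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum fault_result { FAULT_RETRY, FAULT_DONE, FAULT_SIGBUS };

struct model_pte {
	bool is_swap_entry;       /* PTE still holds a swap entry */
	bool device_enabled;      /* swap device is usable right now */
	bool device_ever_enabled; /* device was swapon'd at some point */
};

/* One pass through the modelled fault handler. */
static enum fault_result model_fault(const struct model_pte *pte,
				     bool sigbus_on_bogus)
{
	if (!pte->is_swap_entry)
		return FAULT_DONE;      /* PTE changed, nothing to swap in */
	if (!pte->device_enabled) {
		/* Mirrors the debugging check in the quoted diff. */
		if (sigbus_on_bogus && !pte->device_ever_enabled)
			return FAULT_SIGBUS;
		return FAULT_RETRY;     /* no error: the access faults again */
	}
	return FAULT_DONE;              /* device present, swap-in proceeds */
}

/*
 * Re-take the fault until it resolves or 'limit' attempts are used up.
 * Returns the number of faults taken, or -1 if the limit was reached,
 * which stands in for the infinite loop.
 */
static int run_faults(struct model_pte *pte, bool swapoff_races,
		      bool sigbus_on_bogus, int limit,
		      enum fault_result *last)
{
	assert(limit > 0);
	for (int i = 1; i <= limit; i++) {
		*last = model_fault(pte, sigbus_on_bogus);
		if (*last != FAULT_RETRY)
			return i;
		/* Assumption: a racing swapoff rewrites the PTE. */
		if (swapoff_races)
			pte->is_swap_entry = false;
	}
	return -1;
}
```

In this model the swapoff race resolves on the second fault, because the PTE no longer holds a swap entry, while an entry that names a never-enabled device keeps retrying until the limit unless the SIGBUS check is switched on.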