Message-ID: <877bvymaau.fsf@DESKTOP-5N7EMDA>
Date: Mon, 10 Nov 2025 18:50:01 +0800
From: "Huang, Ying" <ying.huang@...ux.alibaba.com>
To: Kairui Song <ryncsn@...il.com>
Cc: Kairui Song via B4 Relay <devnull+kasong.tencent.com@...nel.org>,
  linux-mm@...ck.org,  Andrew Morton <akpm@...ux-foundation.org>,  Kemeng
 Shi <shikemeng@...weicloud.com>,  Nhat Pham <nphamcs@...il.com>,  Baoquan
 He <bhe@...hat.com>,  Barry Song <baohua@...nel.org>,  Chris Li
 <chrisl@...nel.org>,  Johannes Weiner <hannes@...xchg.org>,  Yosry Ahmed
 <yosry.ahmed@...ux.dev>,  Chengming Zhou <chengming.zhou@...ux.dev>,
  Youngjun Park <youngjun.park@....com>,  linux-kernel@...r.kernel.org,
  stable@...r.kernel.org,  Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Subject: Re: [PATCH] Revert "mm, swap: avoid redundant swap device pinning"

Kairui Song <ryncsn@...il.com> writes:

> On Mon, Nov 10, 2025 at 9:56 AM Huang, Ying
> <ying.huang@...ux.alibaba.com> wrote:
>>
>> Hi, Kairui,
>>
>> Kairui Song via B4 Relay <devnull+kasong.tencent.com@...nel.org> writes:
>>
>> > From: Kairui Song <kasong@...cent.com>
>> >
>> > This reverts commit 78524b05f1a3e16a5d00cc9c6259c41a9d6003ce.
>> >
>> > While reviewing recent leaf entry changes, I noticed that commit
>> > 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning") isn't
>> > correct. It's true that almost all callers of __read_swap_cache_async are
>> > already holding a swap entry reference, so the repeated swap device
>> > pinning isn't needed on the same swap device, but it is possible that
>> > VMA readahead (swap_vma_readahead()) may encounter swap entries from a
>> > different swap device when there are multiple swap devices, and call
>> > __read_swap_cache_async without holding a reference to that swap device.
>> >
>> > So it is possible to cause a UAF if swapoff of device A raced with
>> > swapin on device B, and VMA readahead tries to read swap entries from
>> > device A. It's not easy to trigger but in theory possible to cause real
>> > issues. And besides, that commit made swap more vulnerable to issues
>> > like corrupted page tables.
>> >
>> > Just revert it. __read_swap_cache_async isn't that sensitive to
>> > performance after all, as it's mostly used for SSD/HDD swap devices with
>> > readahead. SYNCHRONOUS_IO devices may fall back onto it for swap count >
>> > 1 entries, but very soon we will have a new helper and routine for
>> > such devices, so they will never touch this helper or have redundant
>> > swap device reference overhead.
>>
>> Is it better to add get_swap_device() in swap_vma_readahead()?  Whenever
>> we get a swap entry, the first thing we need to do is call
>> get_swap_device() to check the validity of the swap entry and prevent
>> the backing swap device from going under us.  This helps us to avoid
>> checking the validity of the swap entry in every swap function.  Does
>> this sound reasonable?
>
> Hi Ying, thanks for the suggestion!
>
> Yes, that's also a feasible approach.
>
> What I was thinking is that currently, except for the readahead path,
> every swapin entry goes through the get_swap_device() helper. That
> helper also helps to mitigate swap entry corruption that may cause an
> OOB access or NULL deref. I don't think mitigating page table
> corruption from the kernel side is really that helpful, but it doesn't
> seem like a bad idea to have.
>
> And the code is simpler this way, and seems more suitable for a stable
> & mainline fix. If we want to add get_swap_device() in
> swap_vma_readahead(), we need to do that for every entry that doesn't
> match the target entry's swap device. The reference overhead is
> trivial compared to readahead and the bio layer, and only non
> SYNCHRONOUS_IO devices use this helper (madvise is a special case, we
> may optimize that later). ZRAM may fall back to the readahead path but
> this fallback will be eliminated very soon in swap table p2.

We have 2 choices in general.

1. Add get/put_swap_device() in every swap function.

2. Add get/put_swap_device() in every caller of the swap functions.

Personally, I prefer 2.  It works better in situations like calling
multiple swap functions.  It can reduce duplicated references.  It helps
improve code reasoning and readability.
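
To make the difference between the two choices concrete, here is a
minimal user-space sketch. It is not kernel code: struct model_swap_dev,
model_get(), model_put() and the run_choice*() helpers are hypothetical
stand-ins that only model the pin counting done around
get/put_swap_device(). The point it illustrates is that with choice 2 a
caller invoking several swap functions takes one reference instead of
one per function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a swap device; not the kernel struct. */
struct model_swap_dev {
    bool online;     /* cleared by a simulated swapoff */
    int  refs;       /* pins currently held */
    int  get_calls;  /* successful pin operations, kept for comparison */
};

/* Models get_swap_device(): fail if the device is gone, otherwise pin it. */
static bool model_get(struct model_swap_dev *dev)
{
    if (dev == NULL || !dev->online)
        return false;
    dev->refs++;
    dev->get_calls++;
    return true;
}

/* Models put_swap_device(): drop one pin. */
static void model_put(struct model_swap_dev *dev)
{
    assert(dev->refs > 0);
    dev->refs--;
}

/* Choice 1: the swap function pins the device itself. */
static bool swap_op_self_pinning(struct model_swap_dev *dev)
{
    if (!model_get(dev))
        return false;
    /* ... work that needs the device to stay around ... */
    model_put(dev);
    return true;
}

/* Choice 2: the swap function relies on the caller's pin. */
static void swap_op_caller_pinned(struct model_swap_dev *dev)
{
    assert(dev->refs > 0); /* the caller must already hold a reference */
    /* ... work that needs the device to stay around ... */
}

/* Caller under choice 1: one get/put pair per swap function call. */
static int run_choice1(struct model_swap_dev *dev, int nr_ops)
{
    int done = 0;

    for (int i = 0; i < nr_ops; i++)
        if (swap_op_self_pinning(dev))
            done++;
    return done;
}

/* Caller under choice 2: a single get/put pair around all calls. */
static int run_choice2(struct model_swap_dev *dev, int nr_ops)
{
    if (!model_get(dev))
        return 0;
    for (int i = 0; i < nr_ops; i++)
        swap_op_caller_pinned(dev);
    model_put(dev);
    return nr_ops;
}
```

In this model, three operations cost three pin/unpin pairs under
choice 1 and one pair under choice 2. The validity check also happens
once, at the point where the caller first obtains the entry.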

> Another approach I thought about is that we might want readahead to
> stop when it sees entries from a different swap device. That swap
> device might be ZRAM where VMA readahead is not helpful.
>
> What do you think?

One possible solution is to skip, or stop at, a swap entry that belongs
to a SYNCHRONOUS_IO swap device.
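
A rough user-space sketch of the "skip" variant, combined with pinning
only those entries that live on a different device than the target.
Everything here is an assumption made for illustration: struct ra_dev,
struct ra_entry, MODEL_SYNCHRONOUS_IO, ra_get(), ra_put() and
model_vma_readahead() are hypothetical names, and the real
swap_vma_readahead() walks page table entries rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for the SYNCHRONOUS_IO property. */
#define MODEL_SYNCHRONOUS_IO 0x1u

/* Hypothetical stand-in for a swap device. */
struct ra_dev {
    bool online;         /* cleared by a simulated swapoff */
    unsigned int flags;
    int refs;            /* pins currently held */
};

/* Hypothetical stand-in for a swap entry found in the readahead window. */
struct ra_entry {
    struct ra_dev *dev;  /* NULL models a slot that holds no swap entry */
};

/* Models get_swap_device(): fail if the device is gone, otherwise pin it. */
static bool ra_get(struct ra_dev *dev)
{
    if (dev == NULL || !dev->online)
        return false;
    dev->refs++;
    return true;
}

/* Models put_swap_device(): drop one pin. */
static void ra_put(struct ra_dev *dev)
{
    assert(dev->refs > 0);
    dev->refs--;
}

/*
 * Walk a readahead window. The caller already holds a pin on @target.
 * Returns how many entries would have been read ahead.
 */
static int model_vma_readahead(struct ra_dev *target,
                               const struct ra_entry *win, size_t n)
{
    int issued = 0;

    assert(target->refs > 0);
    for (size_t i = 0; i < n; i++) {
        struct ra_dev *dev = win[i].dev;
        bool pinned = false;

        if (dev == NULL)
            continue;
        if (dev != target) {
            /* Readahead is not helpful for synchronous devices: skip. */
            if (dev->flags & MODEL_SYNCHRONOUS_IO)
                continue;
            /* Different device: pin it, or skip if swapoff raced with us. */
            if (!ra_get(dev))
                continue;
            pinned = true;
        }
        issued++;  /* the read of this entry would be issued here */
        if (pinned)
            ra_put(dev);
    }
    return issued;
}
```

The "stop" variant would replace the first continue inside the
dev != target branch with a break.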

---
Best Regards,
Huang, Ying
