Message-ID: <CAJD7tkamKcaqHR5V+4+9ixmFc3dC2NnGcu7YzdXqxqNEe8FqqA@mail.gmail.com>
Date: Mon, 23 Sep 2024 19:15:47 -0700
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>
Cc: Nhat Pham <nphamcs@...il.com>, akpm@...ux-foundation.org, hannes@...xchg.org,
hughd@...gle.com, shakeel.butt@...ux.dev, ryan.roberts@....com,
ying.huang@...el.com, chrisl@...nel.org, david@...hat.com, kasong@...cent.com,
willy@...radead.org, viro@...iv.linux.org.uk, baohua@...nel.org,
chengming.zhou@...ux.dev, linux-mm@...ck.org, kernel-team@...a.com,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/2] remove SWAP_MAP_SHMEM
On Mon, Sep 23, 2024 at 6:55 PM Baolin Wang
<baolin.wang@...ux.alibaba.com> wrote:
>
>
>
> On 2024/9/24 07:11, Nhat Pham wrote:
> > The SWAP_MAP_SHMEM state was originally introduced in the commit
> > aaa468653b4a ("swap_info: note SWAP_MAP_SHMEM"), to quickly determine if a
> > swap entry belongs to shmem during swapoff.
> >
> > However, swapoff has since been rewritten drastically in the commit
> > b56a2d8af914 ("mm: rid swapoff of quadratic complexity"). Now
> > having swap count == SWAP_MAP_SHMEM value is basically the same as having
> > swap count == 1, and swap_shmem_alloc() behaves analogously to
> > swap_duplicate().
> >
> > This RFC proposes the removal of this state and the associated helper to
> > simplify the state machine (both mentally and code-wise). We will also
> > have an extra state/special value that can be repurposed (for swap entries
> > that never get re-duplicated).
> >
> > Another motivation (albeit a bit premature at the moment) is the new swap
> > abstraction I am currently working on, that would allow for swap/zswap
> > decoupling, swapoff optimization, etc. The fewer states and swap API
> > functions there are, the simpler the conversion will be.
> >
> > I am sending this series first as an RFC, just in case I missed something
> > or misunderstood this state, or if someone has a swap optimization in mind
> > for shmem that would require this special state.
>
> The idea makes sense to me. I did a quick test with shmem mTHP, and
> encountered the following warning which is triggered by
> 'VM_WARN_ON(usage == 1 && nr > 1)' in __swap_duplicate().

Apparently __swap_duplicate() does not currently handle increasing the
swap count for multiple swap entries by 1 (i.e. usage == 1) because it
does not handle rolling back count increases when
swap_count_continued() fails.

I guess this voids my Reviewed-by until we sort this out. Technically
swap_count_continued() won't ever be called for shmem because we only
ever increment the count by 1, but there is no way to know this in
__swap_duplicate() without SWAP_MAP_SHMEM.

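To make the rollback concern concrete, here is a toy userspace model. It is
not kernel code: TOY_COUNT_MAX, toy_dup_nr_no_rollback() and
toy_dup_nr_rollback() are made-up names, and "a slot already at
TOY_COUNT_MAX" is only a stand-in for the case where swap_count_continued()
would fail. It shows how a batched increment that stops midway leaves some
entries bumped unless the caller undoes them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-slot limit; reaching it models a failed continuation. */
enum { TOY_COUNT_MAX = 3 };

/*
 * Increment nr consecutive counts by 1, stopping at the first slot that
 * cannot be incremented. Returns how many slots were incremented, so a
 * return value < nr means the map was left partially updated.
 */
static size_t toy_dup_nr_no_rollback(unsigned char *map, size_t start, size_t nr)
{
    size_t i;

    assert(map != NULL);
    for (i = 0; i < nr; i++) {
        if (map[start + i] >= TOY_COUNT_MAX)
            break; /* stand-in for swap_count_continued() failing */
        map[start + i]++;
    }
    return i;
}

/*
 * Same batched increment, but undo the partial work on failure.
 * Returns 0 on success, -1 if the batch could not be completed.
 */
static int toy_dup_nr_rollback(unsigned char *map, size_t start, size_t nr)
{
    size_t done = toy_dup_nr_no_rollback(map, start, nr);

    if (done == nr)
        return 0;
    while (done > 0)
        map[start + --done]--; /* roll back the slots already incremented */
    return -1;
}
```

With a map of {1, 1, 3, 1}, the first helper bumps the first two slots and
then stops, while the second leaves the map unchanged and reports failure.
For shmem every count starts at 0 and goes to 1, so the failing branch would
never be reached, which is the point above.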
>
> [ 81.064967] ------------[ cut here ]------------
> [ 81.064968] WARNING: CPU: 4 PID: 6852 at mm/swapfile.c:3623 __swap_duplicate+0x1d0/0x2e0
> [ 81.064994] pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
> [ 81.064995] pc : __swap_duplicate+0x1d0/0x2e0
> [ 81.064997] lr : swap_duplicate_nr+0x30/0x70
> [......]
> [ 81.065019] Call trace:
> [ 81.065019] __swap_duplicate+0x1d0/0x2e0
> [ 81.065021] swap_duplicate_nr+0x30/0x70
> [ 81.065022] shmem_writepage+0x24c/0x438
> [ 81.065024] pageout+0x104/0x2e0
> [ 81.065026] shrink_folio_list+0x7f0/0xe60
> [ 81.065027] reclaim_folio_list+0x90/0x178
> [ 81.065029] reclaim_pages+0x128/0x1a8
> [ 81.065030] madvise_cold_or_pageout_pte_range+0x80c/0xd10
> [ 81.065031] walk_pmd_range.isra.0+0x1b8/0x3a0
> [ 81.065033] walk_pud_range+0x120/0x1b0
> [ 81.065035] walk_pgd_range+0x150/0x1a8
> [ 81.065036] __walk_page_range+0xa4/0xb8
> [ 81.065038] walk_page_range+0x1c8/0x250
> [ 81.065039] madvise_pageout+0xf4/0x280
> [ 81.065041] madvise_vma_behavior+0x268/0x3f0
> [ 81.065042] madvise_walk_vmas.constprop.0+0xb8/0x128
> [ 81.065043] do_madvise.part.0+0xe8/0x2a0
> [ 81.065044] __arm64_sys_madvise+0x64/0x78
> [ 81.065046] invoke_syscall.constprop.0+0x54/0xe8
> [ 81.065048] do_el0_svc+0xa4/0xc0
> [ 81.065050] el0_svc+0x2c/0xb0
> [ 81.065052] el0t_64_sync_handler+0xb8/0xc0
> [ 81.065054] el0t_64_sync+0x14c/0x150
>
> > Swap experts, let me know if I'm mistaken :) Otherwise if there is no
> > objection I will resend this patch series again for merging.
> >
> > Nhat Pham (2):
> >   swapfile: add a batched variant for swap_duplicate()
> >   swap: shmem: remove SWAP_MAP_SHMEM
> >
> >  include/linux/swap.h | 16 ++++++++--------
> >  mm/shmem.c           |  2 +-
> >  mm/swapfile.c        | 28 +++++++++-------------------
> >  3 files changed, 18 insertions(+), 28 deletions(-)
> >
> >
> > base-commit: acfabf7e197f7a5bedf4749dac1f39551417b049