linux-kernel - Re: [PATCH 08/24] mm/swap: check readahead policy per entry

Message-ID: <87a5r7c3o1.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date:   Tue, 21 Nov 2023 09:10:06 +0800
From:   "Huang, Ying" <ying.huang@...el.com>
To:     Kairui Song <ryncsn@...il.com>
Cc:     linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>,
        Hugh Dickins <hughd@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Matthew Wilcox <willy@...radead.org>,
        Michal Hocko <mhocko@...e.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 08/24] mm/swap: check readahead policy per entry

Kairui Song <ryncsn@...il.com> writes:

> Huang, Ying <ying.huang@...el.com> wrote on Mon, Nov 20, 2023 at 14:07:
>>
>> Kairui Song <ryncsn@...il.com> writes:
>>
>> > From: Kairui Song <kasong@...cent.com>
>> >
>> > Currently VMA readahead is globally disabled when any rotational disk is
>> > used as a swap backend. So when multiple swap devices are enabled, if a
>> > slower hard disk is set as a low-priority fallback and a high-performance
>> > SSD is used as the high-priority swap device, VMA readahead is disabled
>> > globally. The SSD swap device's performance will drop by a lot.
>> >
>> > Check the readahead policy per entry to avoid this problem.
>> >
>> > Signed-off-by: Kairui Song <kasong@...cent.com>
>> > ---
>> >  mm/swap_state.c | 12 +++++++-----
>> >  1 file changed, 7 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/mm/swap_state.c b/mm/swap_state.c
>> > index ff6756f2e8e4..fb78f7f18ed7 100644
>> > --- a/mm/swap_state.c
>> > +++ b/mm/swap_state.c
>> > @@ -321,9 +321,9 @@ static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_
>> >       return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1;
>> >  }
>> >
>> > -static inline bool swap_use_vma_readahead(void)
>> > +static inline bool swap_use_vma_readahead(struct swap_info_struct *si)
>> >  {
>> > -     return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap);
>> > +     return data_race(si->flags & SWP_SOLIDSTATE) && READ_ONCE(enable_vma_readahead);
>> >  }
>> >
>> >  /*
>> > @@ -341,7 +341,7 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
>> >
>> >       folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry));
>> >       if (!IS_ERR(folio)) {
>> > -             bool vma_ra = swap_use_vma_readahead();
>> > +             bool vma_ra = swap_use_vma_readahead(swp_swap_info(entry));
>> >               bool readahead;
>> >
>> >               /*
>> > @@ -920,16 +920,18 @@ static struct page *swapin_no_readahead(swp_entry_t entry, gfp_t gfp_mask,
>> >  struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
>> >                             struct vm_fault *vmf, bool *swapcached)
>> >  {
>> > +     struct swap_info_struct *si;
>> >       struct mempolicy *mpol;
>> >       struct page *page;
>> >       pgoff_t ilx;
>> >       bool cached;
>> >
>> > +     si = swp_swap_info(entry);
>> >       mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
>> > -     if (swap_use_no_readahead(swp_swap_info(entry), entry)) {
>> > +     if (swap_use_no_readahead(si, entry)) {
>> >               page = swapin_no_readahead(entry, gfp_mask, mpol, ilx, vmf->vma->vm_mm);
>> >               cached = false;
>> > -     } else if (swap_use_vma_readahead()) {
>> > +     } else if (swap_use_vma_readahead(si)) {
>>
>> It's possible that some pages are swapped out to SSD while others are
>> swapped out to HDD in a readahead window.
>>
>> I doubt that there are practical requirements to use swap on SSD and
>> HDD at the same time.
>
> Hi Ying,
>
> Thanks for the review!
>
> For the first issue, the "fragmented readahead window", I was planning to
> add an extra check in the readahead path to skip readahead entries that
> are on different swap devices, which is not hard to do,

This is a possible solution.
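
As a rough user-space illustration of that check (not kernel code), the
sketch below filters a readahead window so that only entries on the
faulting entry's swap device are kept.  `struct model_entry` and
`filter_ra_window()` are hypothetical names introduced here; the real
readahead path works on `swp_entry_t` values and would need to handle
more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for swp_entry_t: device index plus slot offset. */
struct model_entry {
	unsigned int type;	/* which swap device holds the entry */
	unsigned long offset;	/* slot offset within that device */
};

/*
 * Copy into out[] only those window entries that live on the same device
 * as the faulting entry; entries on any other device are skipped.
 * out[] must have room for n entries.  Returns the number kept.
 */
static size_t filter_ra_window(struct model_entry fault,
			       const struct model_entry *win, size_t n,
			       struct model_entry *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (win[i].type != fault.type)
			continue;	/* different swap device: skip */
		out[kept++] = win[i];
	}
	return kept;
}
```

With a window that mixes entries from device 0 and device 1 and a fault
on device 0, only the device 0 entries would be handed to readahead.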

> but this series is growing too long, so I thought it would be better
> done later.

You don't need to keep everything in one series.  Just use multiple
series, even if they are all swap-related.  They are in fact dealing
with different problems.

> For the second issue, "is there any practical use for multiple swap
> devices", I think there actually is. For example, we are trying to use
> multi-layer swap to offload memory of different hotness on servers. We
> also tried to implement a mechanism to migrate long-sleeping swap
> entries from high-performance SSD/RAMDISK swap to a cheap HDD swap
> device, with more than two layers of swap, which worked except for the
> upstream issue that the readahead policy no longer works as expected.

Thanks for the information.
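
To make the policy difference in the quoted diff concrete, here is a
small user-space model that contrasts the global check (any rotational
device disables VMA readahead for everyone) with the per-device check
the patch introduces.  The struct, the flag value, and the function
names are assumptions made for this sketch; they only mirror the shape
of `swap_use_vma_readahead()` before and after the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this model; not the kernel's actual value. */
#define MODEL_SWP_SOLIDSTATE 0x1u

struct model_swap_dev {
	unsigned int flags;
};

/* Pre-patch shape: one rotational device turns VMA readahead off globally. */
static bool use_vma_ra_global(bool enable_vma_readahead,
			      const struct model_swap_dev *devs, size_t n)
{
	size_t nr_rotate = 0;

	for (size_t i = 0; i < n; i++)
		if (!(devs[i].flags & MODEL_SWP_SOLIDSTATE))
			nr_rotate++;
	return enable_vma_readahead && nr_rotate == 0;
}

/* Post-patch shape: the device holding the entry decides. */
static bool use_vma_ra_per_dev(bool enable_vma_readahead,
			       const struct model_swap_dev *si)
{
	return (si->flags & MODEL_SWP_SOLIDSTATE) && enable_vma_readahead;
}
```

With one SSD and one HDD configured, the global check returns false for
every entry, while the per-device check still returns true for entries
on the SSD.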

>> >               page = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf);
>> >               cached = true;
>> >       } else {

--
Best Regards,
Huang, Ying