lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACSyD1P6HmH9tSvONnNxYv8P+am_hH2dK3UJQd9_+o6EWkPsXA@mail.gmail.com>
Date: Tue, 2 Jan 2024 19:39:00 +0800
From: Zhongkun He <hezhongkun.hzk@...edance.com>
To: Nhat Pham <nphamcs@...il.com>
Cc: akpm@...ux-foundation.org, hannes@...xchg.org, yosryahmed@...gle.com, 
	sjenning@...hat.com, ddstreet@...e.org, vitaly.wool@...sulko.com, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag
 in zswap_writeback_entry

On Sat, Dec 30, 2023 at 10:09 AM Nhat Pham <nphamcs@...il.com> wrote:
>
> On Tue, Oct 24, 2023 at 7:27 AM Zhongkun He
> <hezhongkun.hzk@...edance.com> wrote:
> >
>
> My apologies for the delayed response. I have a couple of questions.
>
> > The zswap_writeback_entry() will add a page to the swap cache, decompress
> > the entry data into the page, and issue a bio write to write the page back
> > to the swap device. Move the page to the tail of lru list  through
> > SetPageReclaim(page) and folio_rotate_reclaimable().
> >
> > Currently, about half of the pages will fail to move to the tail of lru
>
> May I ask what's the downstream effect of this? i.e so what if it
> fails? And yes, as Andrew pointed out, it'd be nice if the patch
> changelog spells out any observable or measurable change from the
> user's POV.
>

The swap cache page used to decompress zswap_entry should be
moved  to the tail of the inactive list after end_writeback, We can release
them in time.Just like the following code in zswap_writeback_entry().

   /* move it to the tail of the inactive list after end_writeback */
   SetPageReclaim(page);

After the writeback is over, the function of
folio_rotate_reclaimable() will fail
because the page is not in the LRU list but in some of the cpu folio batches.
Therefore we did not achieve the goal of setting SetPageReclaim(page), and
the pages could not be free in time.

> > list because there is no LRU flag in page which is not in the LRU list but
> > the cpu_fbatches. So fix it.
>
> This sentence is a bit confusing to me. Does this mean the page
> currently being processed for writeback is not in the LRU list
> (!PageLRU(page)), but IN one of the cpu folio batches? Which makes
> folio_rotate_reclaimable() fails on this page later on in the
> _swap_writepage() path? (hence the necessity of lru_add_drain()?)
>

Yes, exactly.

> Let me know if I'm misunderstanding the intention of this patch. I
> know it's a bit pedantic, but spelling things out (ideally in the
> changelog itself) will help the reviewers, as well as future
> contributors who want to study the codebase and make changes to it.
>

Sorry,my bad.

> >
> > Signed-off-by: Zhongkun He <hezhongkun.hzk@...edance.com>
>
> Thanks and look forward to your response,
> Nhat
>
> P/S: Have a nice holiday season and happy new year!

Here are the steps and results of the performance test:
1:zswap+ zram (simplified model with on IO)
2:disabel zswap/parameters/same_filled_pages_enabled (stress have same pages)
3:stress --vm 1 --vm-bytes 2g --vm-hang 0   (2Gi anon pages)
4: In order to quickly release zswap_entry, I used the previous
     patch (I will send it again later).
https://lore.kernel.org/all/20231025095248.458789-1-hezhongkun.hzk@bytedance.com/

Performance result:
   reclaim 1Gi zswap_entry

time echo 1 > writeback_time_threshold
(will release the zswap_entry,  not been accessed for more than 1 seconds )

Base                                 With this patch
real    0m1.015s               real    0m1.043s
user    0m0.000s              user    0m0.001s
sys     0m1.013s               sys     0m1.040s
So no obvious performance regression was found.

After writeback, we perform the following steps to release the memory again
echo 1g > memory.reclaim

Base:
                     total        used        recalim        total        used
Mem:           38Gi       2.5Gi       ---->             38Gi       1.5Gi
Swap:         5.0Gi       1.0Gi       ---->              5Gi        1.5Gi
used memory  -1G   swap +0.5g
It means that  half of the pages are failed to move to the tail of lru list,
So we need to release an additional 0.5Gi anon pages to swap space.

With this patch:
                     total        used        recalim        total        used
Mem:           38Gi       2.6Gi       ---->             38Gi       1.6Gi
Swap:         5.0Gi       1.0Gi       ---->              5Gi        1Gi

used memory  -1Gi,  swap +0Gi
It means that we release all the pages which have been add to the tail of
lru list in zswap_writeback_entry() and folio_rotate_reclaimable().


Thanks for your time Nhat and Andrew. Happy New Year!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ