Message-ID: <CAJD7tkY6XF_rhnAzqhZ-mo8yw-W4hOjxFsbvH04oqVr0u8mOzQ@mail.gmail.com>
Date: Thu, 28 Dec 2023 07:33:34 -0800
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Chris Li <chrisl@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org, Wei Xu <weixugc@...gle.com>, 
	Yu Zhao <yuzhao@...gle.com>, 
	Greg Thelen <gthelen@...gle.com>, Chun-Tse Shao <ctshao@...gle.com>, 
	Suren Baghdasaryan <surenb@...gle.com>, 
	Brain Geffon <bgeffon@...gle.com>, Minchan Kim <minchan@...nel.org>, Michal Hocko <mhocko@...e.com>, 
	Mel Gorman <mgorman@...hsingularity.net>, Huang Ying <ying.huang@...el.com>, 
	Nhat Pham <nphamcs@...il.com>, Johannes Weiner <hannes@...xchg.org>, Kairui Song <kasong@...cent.com>, 
	Zhongkun He <hezhongkun.hzk@...edance.com>, Kemeng Shi <shikemeng@...weicloud.com>, 
	Barry Song <v-songbaohua@...o.com>
Subject: Re: [PATCH] mm: swap: async free swap slot cache entries

On Thu, Dec 21, 2023 at 10:25 PM Chris Li <chrisl@...nel.org> wrote:
>
> We discovered that 1% of swap page faults take 100us+ while 50% of
> swap faults take under 20us.
>
> Further investigation shows that, in the long tail case, a large
> portion of the time is spent in the free_swap_slots() function.
>
> The percpu cache of swap slots is freed in a batch of 64 entries
> inside free_swap_slots(). These cache entries are accumulated
> from previous page faults, which may not be related to the current
> process.
>
> Doing the batch free in the page fault handler causes longer
> tail latencies and penalizes the current process.
>
> Move free_swap_slots() outside of the swapin page fault handler into an
> async work queue to avoid such long tail latencies.
>
> Testing:
>
> Chun-Tse ran some benchmarks on a Chromebook, showing that
> zram_wait_metrics improve about 15% with 80% and 95% confidence.
>
> I recently ran some experiments on about 1000 Google production
> machines. It shows swapin latency drops in the long tail
> 100us - 500us bucket dramatically.
>
> platform        (100-500us)             (0-100us)
> A               1.12% -> 0.36%          98.47% -> 99.22%
> B               0.65% -> 0.15%          98.96% -> 99.46%
> C               0.61% -> 0.23%          98.96% -> 99.38%

I recall you mentioning that mem_cgroup_uncharge_swap() is the most
expensive part of the batched freeing. If that's the case, I am
curious what happens if we move that call outside of the batching
(i.e. once the swap entry is no longer used and will be returned to
the cache). This should amortize the cost of memcg uncharging and
reduce the tail fault latency without extra work. Arguably, it could
increase the average fault latency, but not necessarily in a
significant way.
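
To make the amortization argument concrete, here is a toy user-space cost
model, not kernel code. The struct, the function names, and the cost units
are all hypothetical; only the batch size of 64 comes from the patch
description. It assumes every fault returns one entry to the cache, and that
uncharging costs the same per entry whether it is done inline or in a batch.

```c
#include <assert.h>

#define SWAP_BATCH 64	/* batch size quoted in the patch description */

/* Hypothetical per-entry costs in arbitrary time units (not measurements). */
struct cost_model {
	unsigned uncharge;	/* stands in for mem_cgroup_uncharge_swap() */
	unsigned free_slot;	/* stands in for the rest of the per-entry free */
};

typedef unsigned (*fault_cost_fn)(const struct cost_model *m, unsigned n);

/* Current scheme: every SWAP_BATCH-th fault pays for the whole batch. */
unsigned fault_cost_batched(const struct cost_model *m, unsigned n)
{
	assert(n >= 1);		/* fault index is 1-based */
	if (n % SWAP_BATCH == 0)
		return SWAP_BATCH * (m->uncharge + m->free_slot);
	return 0;
}

/* Suggested scheme: uncharge per entry, batch only the remaining work. */
unsigned fault_cost_split(const struct cost_model *m, unsigned n)
{
	unsigned cost;

	assert(n >= 1);
	cost = m->uncharge;
	if (n % SWAP_BATCH == 0)
		cost += SWAP_BATCH * m->free_slot;
	return cost;
}

/* Largest freeing cost paid by any single fault among the first nfaults. */
unsigned worst_fault_cost(fault_cost_fn fn, const struct cost_model *m,
			  unsigned nfaults)
{
	unsigned worst = 0;

	for (unsigned n = 1; n <= nfaults; n++) {
		unsigned c = fn(m, n);

		if (c > worst)
			worst = c;
	}
	return worst;
}

/* Sum of the freeing cost over the first nfaults faults. */
unsigned total_fault_cost(fault_cost_fn fn, const struct cost_model *m,
			  unsigned nfaults)
{
	unsigned total = 0;

	for (unsigned n = 1; n <= nfaults; n++)
		total += fn(m, n);
	return total;
}
```

In this model the total work over a full batch is identical for both
schemes. The split scheme only changes who pays: the worst fault gets
cheaper by roughly SWAP_BATCH * uncharge, while every other fault gets
slower by one uncharge. Whether that holds in practice depends on how much
the real uncharge path benefits from batching, which the model ignores.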

Ying pointed out something similar if I understand correctly (and
other operations that can be moved).

Also, if we choose to follow this route, I think we should flush
the async worker in drain_slots_cache_cpu(), right?
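
To spell out the concern, here is a single-threaded user-space model, not
kernel code. Every name in it is hypothetical and it does not reflect the
actual patch. It only tracks whether a queued free is still outstanding when
the cache is torn down. In the kernel, the flush step would presumably be a
flush_work() on whatever work item the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SIZE 64	/* batch size quoted in the patch description */

/* Hypothetical model of one per-CPU cache of entries awaiting free. */
struct slot_cache_model {
	size_t n_ret;		/* entries waiting to be freed */
	size_t freed;		/* entries freed so far */
	bool work_pending;	/* models a queued, not yet run, work item */
	bool torn_down;		/* models the cache having been drained */
};

/* Models the async work function: free everything accumulated so far. */
void model_free_work(struct slot_cache_model *c)
{
	assert(!c->torn_down);	/* work must never run on a drained cache */
	c->freed += c->n_ret;
	c->n_ret = 0;
	c->work_pending = false;
}

/* Models the fault path: stash one entry, queue work when the cache fills. */
void model_cache_put(struct slot_cache_model *c)
{
	if (c->n_ret < CACHE_SIZE)
		c->n_ret++;
	if (c->n_ret == CACHE_SIZE)
		c->work_pending = true;
}

/* Models the drain path, with or without flushing the worker first. */
void model_drain(struct slot_cache_model *c, bool flush_first)
{
	if (flush_first && c->work_pending)
		model_free_work(c);	/* stands in for flushing the worker */
	c->freed += c->n_ret;		/* drain frees any leftovers directly */
	c->n_ret = 0;
	c->torn_down = true;
}

/* True if a queued work item could still run against a drained cache. */
bool model_stale_work(const struct slot_cache_model *c)
{
	return c->torn_down && c->work_pending;
}
```

Without the flush, the model ends up with a work item still queued against
a cache that has already been drained, which is the state the flush is
meant to rule out. With the flush, nothing is pending after the drain.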
