Message-Id: <20240215161114.6bd444ed839f778eefdf6e0a@linux-foundation.org>
Date: Thu, 15 Feb 2024 16:11:14 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Chris Li <chrisl@...nel.org>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, Wei Xu
 <weixugc@...gle.com>, Yu Zhao <yuzhao@...gle.com>, Greg Thelen
 <gthelen@...gle.com>, Chun-Tse Shao <ctshao@...gle.com>, Yosry Ahmed
 <yosryahmed@...gle.com>, Michal Hocko <mhocko@...e.com>, Mel Gorman
 <mgorman@...hsingularity.net>, Huang Ying <ying.huang@...el.com>, Nhat Pham
 <nphamcs@...il.com>, Kairui Song <kasong@...cent.com>, Barry Song
 <v-songbaohua@...o.com>, Tim Chen <tim.c.chen@...ux.intel.com>
Subject: Re: [PATCH v4] mm: swap: async free swap slot cache entries

On Wed, 14 Feb 2024 17:02:13 -0800 Chris Li <chrisl@...nel.org> wrote:

> We discovered that the slowest 1% of swap page faults take 100us or
> more, while 50% of swap faults complete in under 20us.
> 
> Further investigation shows that, in the long-tail case, a large
> portion of the time is spent in the free_swap_slots() function.
> 
> The percpu cache of swap slots is freed in a batch of 64 entries
> inside free_swap_slots(). These cache entries are accumulated
> from previous page faults, which may not be related to the current
> process.
> 
> Doing the batch free in the page fault handler causes longer
> tail latencies and penalizes the current process.
> 
> When the swap slot cache is full, schedule an async free of the
> cached swap slots in a work queue, before the next swap fault comes
> in. If the next swap fault arrives before the async free gets a
> chance to run, the fault path directly frees all the cached swap
> slots, the same way as before.
> 
> Testing:
> 
> Chun-Tse ran some benchmarks on a Chromebook, showing that
> zram_wait_metrics improves by about 15% at the 80% and 95%
> confidence levels.
> 
> I recently ran some experiments on about 1000 Google production
> machines. They show that swapin latency in the long-tail
> 100us - 500us bucket drops dramatically.
> 
> platform	(100-500us)	 	(0-100us)	[% of swapin faults, before -> after]
> A		1.12% -> 0.36%		98.47% -> 99.22%
> B		0.65% -> 0.15%		98.96% -> 99.46%
> C		0.61% -> 0.23%		98.96% -> 99.38%
> 
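
To make the quoted mechanism concrete, here is a minimal sketch of the
shape of the change, assuming a simplified per-CPU cache; identifiers
such as slot_cache, drain_worker and SLOTS_BATCH are invented for
illustration and do not match the actual mm/swap_slots.c code.

/*
 * Simplified sketch only -- not the actual mm/swap_slots.c code.
 * Freed swap entries accumulate in a per-CPU cache; once the cache
 * fills up, the batched free is pushed to a workqueue instead of
 * being done synchronously in the fault path.  If the next fault
 * arrives before the worker has run, the fault path frees the batch
 * directly, exactly as the pre-patch code did.
 */
#include <linux/spinlock.h>
#include <linux/swap.h>
#include <linux/workqueue.h>

#define SLOTS_BATCH	64

struct slot_cache {
        spinlock_t lock;
        int nr;                                 /* entries currently cached */
        swp_entry_t entries[SLOTS_BATCH];
        struct work_struct drain_work;          /* INIT_WORK()ed at setup */
};

static void drain_worker(struct work_struct *work)
{
        struct slot_cache *cache = container_of(work, struct slot_cache,
                                                drain_work);

        spin_lock_irq(&cache->lock);
        if (cache->nr == SLOTS_BATCH) {         /* fault path didn't beat us */
                swapcache_free_entries(cache->entries, cache->nr);
                cache->nr = 0;
        }
        spin_unlock_irq(&cache->lock);
}

static void cache_free_slot(struct slot_cache *cache, swp_entry_t entry)
{
        spin_lock_irq(&cache->lock);
        if (cache->nr == SLOTS_BATCH) {
                /* the async free never ran: fall back to the old behaviour */
                swapcache_free_entries(cache->entries, cache->nr);
                cache->nr = 0;
        }
        cache->entries[cache->nr++] = entry;
        if (cache->nr == SLOTS_BATCH)
                /* defer the expensive 64-entry free out of the fault path */
                schedule_work(&cache->drain_work);
        spin_unlock_irq(&cache->lock);
}

The race between the worker and the next fault over who drops the batch
is exactly what the last paragraph of the quoted changelog describes.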

What this changelog lacks is any description of why anyone cares.

The patch clearly decreases overall throughput (speed-vs-latency is a
common tradeoff).

And the "we don't know how to fix this properly so punt it into a
kernel thread" approach remains lame.  For example, there is the risk
that the now-liberated allocator can outpace the async freeing,
resulting in unlimited object windup.

And here's a fun one: what happens if the producer of these objects has
SCHED_FIFO policy and it's a uniprocessor machine?  What if the producer
sits there allocating objects and the freeing thread never gets a chance
to execute?  Has this been considered, and tested for?
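
Purely to illustrate the scenario being asked about (and not something
taken from the patch or any existing test), one rough userspace harness
might look like the following: pin a SCHED_FIFO producer to a single CPU
and keep it faulting, so the normal-priority per-CPU kworker that would
do the deferred free is starved except for whatever RT throttling allows.

/* Hypothetical harness, illustrative only.  A SCHED_FIFO task pinned to
 * one CPU keeps dirtying anonymous memory so swap-ins/swap-outs never
 * stop, while the per-CPU kworker doing the deferred free runs at
 * normal priority and (modulo RT throttling) never gets the CPU.
 * Requires CAP_SYS_NICE; error handling mostly omitted.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>

int main(void)
{
        struct sched_param sp = { .sched_priority = 50 };
        cpu_set_t set;
        size_t len = 512UL << 20;               /* big enough to push into swap */
        volatile char *buf = malloc(len);

        if (!buf)
                return 1;

        CPU_ZERO(&set);
        CPU_SET(0, &set);
        sched_setaffinity(0, sizeof(set), &set);        /* "uniprocessor" */
        sched_setscheduler(0, SCHED_FIFO, &sp);         /* outrank kworkers */

        for (;;)                                        /* keep faulting */
                for (size_t i = 0; i < len; i += 4096)
                        buf[i]++;
}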


All these concerns, risks and complexity, and the changelog offers us no
reason to take any of them on.  What's wrong with the existing code?
Please exhaustively describe the issues which are being seen, and explain
why they are sufficiently serious to justify leaving the above concerns
and risks unaddressed.

