Message-ID: <k3aiufe36mb2re3fyfzam4hqdeshvbqcashxiyb5grn7w2iz2s@2oeaei6klok3>
Date: Thu, 18 Jul 2024 17:40:53 -0700
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Jesper Dangaard Brouer <hawk@...nel.org>
Cc: Yosry Ahmed <yosryahmed@...gle.com>, tj@...nel.org, 
	cgroups@...r.kernel.org, hannes@...xchg.org, lizefan.x@...edance.com, longman@...hat.com, 
	kernel-team@...udflare.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V7 1/2] cgroup/rstat: Avoid thundering herd problem by
 kswapd across NUMA nodes

Hi Jesper,

On Wed, Jul 17, 2024 at 06:36:28PM GMT, Jesper Dangaard Brouer wrote:
> 
[...]
> 
> 
> Looking at the production numbers for the time the lock is held for level 0:
> 
> @locked_time_level[0]:
> [4M, 8M)     623 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@               |
> [8M, 16M)    860 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [16M, 32M)   295 |@@@@@@@@@@@@@@@@@                                   |
> [32M, 64M)   275 |@@@@@@@@@@@@@@@@                                    |
> 

Is it possible to get the above histogram for the other levels as well?
I know this is a 12 NUMA node machine; how many CPUs does it have in
total?

> The time is in nanoseconds, so M corresponds to ms (milliseconds).
> 
> With 36 flushes per second (as shown earlier) this is a flush every
> 27.7ms.  It is not unreasonable (given the data above) that each flush
> also takes ~27ms, which means we spend a full CPU second every second
> just flushing.  That is too much time spent flushing.

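That matches the arithmetic: 1000ms / 36 ≈ 27.8ms between flushes, and
36 * 27ms ≈ 0.97s of lock hold time per second, i.e. roughly one full
CPU doing nothing but flushing.
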
One idea to further reduce this time is more fine-grained flush
skipping. At the moment we either skip the whole flush or do it in
full. How about we make this decision per-cpu? We already track the
update count per cpu, and if a given cpu's count is less than
MEMCG_CHARGE_BATCH, we can skip the flush on that cpu. A rough sketch
of the idea is below.
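
A minimal userspace sketch of that per-cpu skip decision (not kernel
code: cpu_pending[], NR_CPUS and flush_cpu() here are made-up stand-ins
for the per-cpu stats_updates counters and the per-cpu part of the
rstat flush):

#include <stdio.h>

#define NR_CPUS            128   /* stand-in for nr_cpu_ids */
#define MEMCG_CHARGE_BATCH  64   /* batch threshold, 64 in current kernels */

/* Hypothetical per-cpu pending-update counters; the kernel tracks the
 * real ones in memcg_vmstats_percpu. */
static unsigned int cpu_pending[NR_CPUS];

/* Stand-in for flushing one cpu's pending deltas up the hierarchy. */
static void flush_cpu(int cpu)
{
	printf("flushing cpu %d (%u pending updates)\n",
	       cpu, cpu_pending[cpu]);
	cpu_pending[cpu] = 0;
}

/* The per-cpu decision: only pay the flush cost for cpus whose pending
 * updates crossed the batch threshold; the rest keep accumulating and
 * stay stale by at most MEMCG_CHARGE_BATCH - 1 updates. */
static void flush_if_needed(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu_pending[cpu] < MEMCG_CHARGE_BATCH)
			continue;	/* skip: error is bounded */
		flush_cpu(cpu);
	}
}

int main(void)
{
	cpu_pending[3]  = 200;	/* busy cpu, gets flushed  */
	cpu_pending[17] = 10;	/* quiet cpu, gets skipped */
	flush_if_needed();
	return 0;
}

The upside is that a mostly idle cpu no longer contributes to the lock
hold time; the downside is bounded staleness per skipped cpu, which is
the same kind of error the existing whole-flush skip already accepts.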
