Message-Id: <20240905221908.1960-1-hdanton@sina.com>
Date: Fri,  6 Sep 2024 06:19:08 +0800
From: Hillf Danton <hdanton@...a.com>
To: Marcelo Tosatti <mtosatti@...hat.com>
Cc: Leonardo Bras <leobras@...hat.com>,
	Michal Hocko <mhocko@...nel.org>,
	Roman Gushchin <roman.gushchin@...ux.dev>,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [RFC PATCH v1 0/4] Introduce QPW for per-cpu operations

On Tue, 23 Jul 2024 14:14:34 -0300 Marcelo Tosatti <mtosatti@...hat.com> wrote:
> On Sat, Jun 22, 2024 at 12:58:08AM -0300, Leonardo Bras wrote:
> > The problem:
> > Some places in the kernel implement a parallel programming strategy
> > consisting on local_locks() for most of the work, and some rare remote
> > operations are scheduled on target cpu. This keeps cache bouncing low since
> > cacheline tends to be mostly local, and avoids the cost of locks in non-RT
> > kernels, even though the very few remote operations will be expensive due
> > to scheduling overhead.
> > 
> > On the other hand, for RT workloads this can represent a problem: getting
> > an important workload scheduled out to deal with remote requests is
> > sure to introduce unexpected deadline misses.
> 
> Another hang with a busy polling workload (kernel update hangs on
> grub2-probe):
> 
> [342431.665417] INFO: task grub2-probe:24484 blocked for more than 622 seconds.
> [342431.665458]       Tainted: G        W      X  -------  ---  5.14.0-438.el9s.x86_64+rt #1
> [342431.665488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [342431.665515] task:grub2-probe     state:D stack:0     pid:24484 ppid:24455  flags:0x00004002
> [342431.665523] Call Trace:
> [342431.665525]  <TASK>
> [342431.665527]  __schedule+0x22a/0x580
> [342431.665537]  schedule+0x30/0x80
> [342431.665539]  schedule_timeout+0x153/0x190
> [342431.665543]  ? preempt_schedule_thunk+0x16/0x30
> [342431.665548]  ? preempt_count_add+0x70/0xa0
> [342431.665554]  __wait_for_common+0x8b/0x1c0
> [342431.665557]  ? __pfx_schedule_timeout+0x10/0x10
> [342431.665560]  __flush_work.isra.0+0x15b/0x220

The newly added flush_percpu_work() is a no-op with CONFIG_PREEMPT_RT enabled,
so why are you testing with 5.14.0-438.el9s.x86_64+rt instead of mainline? Or
what exactly are you testing?

BTW, the hang reported here does not demonstrate the unexpected deadline misses.

> [342431.665565]  ? __pfx_wq_barrier_func+0x10/0x10
> [342431.665570]  __lru_add_drain_all+0x17d/0x220
> [342431.665576]  invalidate_bdev+0x28/0x40
> [342431.665583]  blkdev_common_ioctl+0x714/0xa30
> [342431.665588]  ? bucket_table_alloc.isra.0+0x1/0x150
> [342431.665593]  ? cp_new_stat+0xbb/0x180
> [342431.665599]  blkdev_ioctl+0x112/0x270
> [342431.665603]  ? security_file_ioctl+0x2f/0x50
> [342431.665609]  __x64_sys_ioctl+0x87/0xc0
