Message-ID: <tencent_347CD8D5F7FC23EFE58BBACBA98894DB6E05@qq.com>
Date: Wed, 28 Jan 2026 11:29:51 +0800
From: Yuwen Chen <ywen.chen@...mail.com>
To: tglx@...nel.org
Cc: akpm@...ux-foundation.org,
	andrealmeid@...lia.com,
	bigeasy@...utronix.de,
	colin.i.king@...il.com,
	dave@...olabs.net,
	dvhart@...radead.org,
	edliaw@...gle.com,
	justinstitt@...gle.com,
	kernel-team@...roid.com,
	licayy@...mail.com,
	linux-kernel@...r.kernel.org,
	linux-kselftest@...r.kernel.org,
	luto@....edu,
	mingo@...hat.com,
	morbo@...gle.com,
	nathan@...nel.org,
	ndesaulniers@...gle.com,
	peterz@...radead.org,
	shuah@...nel.org,
	usama.anjum@...labora.com,
	wakel@...gle.com,
	ywen.chen@...mail.com
Subject: Re: [PATCH v2] selftests/futex: fix the failed futex_requeue test issue

On Tue, 27 Jan 2026 19:30:31 +0100, Thomas Gleixner wrote:
> Extremely high?
> 
> The main thread waits for 10000us aka. 10 seconds to allow the waiter
> thread to reach futex_wait().
> 
> If anything is extreme then it's the 10 seconds wait, not the
> requirements. Please write factual changelogs and not fairy tales.

10,000 us is 10 ms, not 10 seconds. On a specific ARM64 platform it is
quite common for this test case to fail with only a 10 ms wait.
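For context, the pattern in question looks roughly like this (a
simplified, self-contained sketch of the race, not the exact selftest
code; raw syscall() is used here instead of the futextest.h helpers,
and the waiter gets a bounded timeout so the sketch always terminates):

#define _GNU_SOURCE
#include <linux/futex.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

static unsigned int f1, f2;

static long futex(unsigned int *uaddr, int op, unsigned int val,
                  void *timeout_or_nr, unsigned int *uaddr2,
                  unsigned int val3)
{
        return syscall(SYS_futex, uaddr, op, val, timeout_or_nr,
                       uaddr2, val3);
}

static void *waiterfn(void *arg)
{
        /* bounded wait so the sketch exits even when the race below
         * is lost and nobody requeues/wakes us */
        struct timespec to = { .tv_sec = 1 };

        futex(&f1, FUTEX_WAIT_PRIVATE, 0, &to, NULL, 0);
        return NULL;
}

int main(void)
{
        pthread_t waiter;
        long requeued;

        pthread_create(&waiter, NULL, waiterfn, NULL);

        usleep(10000);  /* 10,000 us == 10 ms, not 10 s */

        /* requeue up to one waiter from f1 to f2; if the waiter was
         * scheduled late it has not reached futex_wait() yet, this
         * returns 0, and the test reports a failure */
        requeued = futex(&f1, FUTEX_CMP_REQUEUE_PRIVATE, 0,
                         (void *)1 /* nr_requeue */, &f2, 0);
        printf("requeued %ld waiter(s)\n", requeued);

        futex(&f2, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
        pthread_join(waiter, NULL);
        return 0;
}

On a loaded system, 10 ms is simply not always enough for the waiter
to be scheduled and reach the kernel.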

> That's a known issue for all futex selftests when the test system is
> under extreme load. That's why there is a gracious 10 seconds timeout,
> which is annoyingly long already.
> 
> Also why is this special for the requeue_single test case?
> 
> It's exactly the same issue for all futex selftests including the multi
> waiter one in the very same file, no?

Yes, this is common to all of the futex selftests. For ease of
illustration, only the requeue_single case was listed here.

> Why do you need an atomic store here?
> 
> pthread_barrier_wait() is a full memory barrier already, no?

Yes, an atomic is not strictly necessary here. In the kernel, WRITE_ONCE()
and READ_ONCE() would normally be used, but since they are not readily
available in the selftest, an atomic store was adopted instead.
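Concretely, what I had in mind is roughly the following (a sketch with
a hypothetical flag name, not the patch itself): the barrier already
provides all the ordering, and the relaxed atomic store/load pair is
only the userspace analogue of WRITE_ONCE()/READ_ONCE(), i.e. it keeps
the flag accesses tear-free without claiming any extra ordering:

#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

static pthread_barrier_t barrier;
static bool waiter_ready;               /* hypothetical flag name */

static void *waiterfn(void *arg)
{
        pthread_barrier_wait(&barrier); /* already a full memory barrier */

        /* relaxed atomic store: the WRITE_ONCE() analogue; it only
         * guarantees the store is not torn, nothing more */
        __atomic_store_n(&waiter_ready, true, __ATOMIC_RELAXED);

        /* sys_futex(... FUTEX_WAIT) would follow here */
        return NULL;
}

int main(void)
{
        pthread_t waiter;

        pthread_barrier_init(&barrier, NULL, 2);
        pthread_create(&waiter, NULL, waiterfn, NULL);
        pthread_barrier_wait(&barrier);

        /* the paired READ_ONCE() analogue on the main-thread side */
        while (!__atomic_load_n(&waiter_ready, __ATOMIC_RELAXED))
                usleep(100);

        pthread_join(waiter, NULL);
        return 0;
}

If the barrier alone is deemed sufficient, a plain store works just as
well; the atomic merely makes the intent explicit.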

> What's wrong with reading /proc/$PID/wchan ?
> 
> It's equally unreliable as /proc/$PID/stat because both can return the
> desired state _before_ the thread reaches the inner workings of the test
> related sys_futex(... WAIT).

Can waiterfn enter the sleep state between pthread_barrier_wait() and
futex_wait()? If so, would checking the call stack be a solution?
Perhaps /proc/$PID/wchan is the better approach. So far I have not
found any problems using /proc/$PID/stat on our platform.
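For reference, what I am doing amounts to something like the sketch
below (thread_state() and its wiring are mine for illustration, not
from the patch): poll the state letter in /proc/<pid>/task/<tid>/stat
until it reads 'S':

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the state letter of a thread of the current process, parsed
 * from /proc/<pid>/task/<tid>/stat, or '?' on error. */
static char thread_state(pid_t tid)
{
        char path[64], buf[512], *p;
        FILE *f;

        snprintf(path, sizeof(path), "/proc/%d/task/%d/stat",
                 getpid(), tid);
        f = fopen(path, "r");
        if (!f)
                return '?';
        p = fgets(buf, sizeof(buf), f) ? strrchr(buf, ')') : NULL;
        fclose(f);

        /* the state letter follows the last ')' of the comm field */
        return p ? p[2] : '?';
}

The main thread would then spin on thread_state(tid) != 'S' with a
short usleep(), the tid coming from gettid() in the waiter. Your
caveat still applies, of course: 'S' can also be reported while the
thread sleeps somewhere before the test's sys_futex(... WAIT), so this
narrows the race window rather than closing it.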

Thanks

    Yuwen

