Message-ID: <f79735e1-1625-4746-98ce-a3c40123c5af@linux.dev>
Date: Sat, 23 Aug 2025 12:47:49 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: Finn Thain <fthain@...ux-m68k.org>,
Geert Uytterhoeven <geert@...ux-m68k.org>, mhiramat@...nel.org
Cc: akpm@...ux-foundation.org, will@...nel.org, peterz@...radead.org,
mingo@...hat.com, longman@...hat.com, anna.schumaker@...cle.com,
boqun.feng@...il.com, joel.granados@...nel.org, kent.overstreet@...ux.dev,
leonylgao@...cent.com, linux-kernel@...r.kernel.org, rostedt@...dmis.org,
tfiga@...omium.org, amaindex@...look.com, jstultz@...gle.com,
Mingzhe Yang <mingzhe.yang@...com>, Eero Tamminen <oak@...sinkinet.fi>,
linux-m68k <linux-m68k@...ts.linux-m68k.org>,
Lance Yang <ioworker0@...il.com>, senozhatsky@...omium.org
Subject: Re: [PATCH v5 2/3] hung_task: show the blocker task if the task is
hung on semaphore
Hi Finn,
On 2025/8/23 08:27, Finn Thain wrote:
>
> On Sat, 23 Aug 2025, Lance Yang wrote:
>
>>>
>>> include/linux/hung_task.h-/*
>>> include/linux/hung_task.h- * @blocker: Combines lock address and blocking type.
>>> include/linux/hung_task.h- *
>>> include/linux/hung_task.h- * Since lock pointers are at least 4-byte aligned(32-bit) or 8-byte
>>> include/linux/hung_task.h- * aligned(64-bit). This leaves the 2 least bits (LSBs) of the pointer
>>> include/linux/hung_task.h- * always zero. So we can use these bits to encode the specific blocking
>>> include/linux/hung_task.h- * type.
>>> include/linux/hung_task.h- *
>
> That comment was introduced in commit e711faaafbe5 ("hung_task: replace
> blocker_mutex with encoded blocker"). It's wrong and should be fixed.
Right, the problematic assumption was introduced in that commit ;)
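To make the failure concrete, below is a minimal userspace sketch of the encoding that the quoted header describes. The lock address and a 2-bit type share one unsigned long, and this only works if every lock address has its two low bits clear. The BLOCKER_TYPE_* values are copied from the header quoted in this thread. The helper names (blocker_addr_ok, blocker_encode, blocker_addr, blocker_type) are hypothetical and are not the kernel's functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the quoted include/linux/hung_task.h. */
#define BLOCKER_TYPE_MUTEX        0x00UL
#define BLOCKER_TYPE_SEM          0x01UL
#define BLOCKER_TYPE_RWSEM_READER 0x02UL
#define BLOCKER_TYPE_RWSEM_WRITER 0x03UL
#define BLOCKER_TYPE_MASK         0x03UL

/* Hypothetical helper: true if the address leaves the two type bits free. */
static inline bool blocker_addr_ok(unsigned long lock)
{
	return (lock & BLOCKER_TYPE_MASK) == 0;
}

/* Hypothetical helper: pack a lock address and a type into one word. */
static inline unsigned long blocker_encode(unsigned long lock,
					   unsigned long type)
{
	return lock | type;
}

/* Hypothetical helper: recover the lock address by clearing the type bits. */
static inline unsigned long blocker_addr(unsigned long blocker)
{
	return blocker & ~BLOCKER_TYPE_MASK;
}

/* Hypothetical helper: recover the type from the two low bits. */
static inline unsigned long blocker_type(unsigned long blocker)
{
	return blocker & BLOCKER_TYPE_MASK;
}
```

With a 4-byte-aligned address such as 0x1000, encoding and decoding round-trip cleanly. With a 2-byte-aligned address such as 0x1002 (possible on m68k), encoding it as BLOCKER_TYPE_SEM yields 0x1003. That word decodes as address 0x1000 and type BLOCKER_TYPE_RWSEM_WRITER, so both fields come back wrong.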
>
>>> include/linux/hung_task.h- * Type encoding:
>>> include/linux/hung_task.h- * 00 - Blocked on mutex (BLOCKER_TYPE_MUTEX)
>>> include/linux/hung_task.h- * 01 - Blocked on semaphore (BLOCKER_TYPE_SEM)
>>> include/linux/hung_task.h- * 10 - Blocked on rw-semaphore as READER (BLOCKER_TYPE_RWSEM_READER)
>>> include/linux/hung_task.h- * 11 - Blocked on rw-semaphore as WRITER (BLOCKER_TYPE_RWSEM_WRITER)
>>> include/linux/hung_task.h- */
>>> include/linux/hung_task.h-#define BLOCKER_TYPE_MUTEX 0x00UL
>>> include/linux/hung_task.h-#define BLOCKER_TYPE_SEM 0x01UL
>>> include/linux/hung_task.h-#define BLOCKER_TYPE_RWSEM_READER 0x02UL
>>> include/linux/hung_task.h-#define BLOCKER_TYPE_RWSEM_WRITER 0x03UL
>>> include/linux/hung_task.h-
>>> include/linux/hung_task.h:#define BLOCKER_TYPE_MASK 0x03UL
>>>
>>> On m68k, the minimum alignment of int and larger is 2 bytes.
>>
>> Ah, thanks, that's good to know! It clearly explains why the
>> WARN_ON_ONCE() is triggering.
>>
>>> If you want to use the lowest 2 bits of a pointer for your own use,
>>> you must make sure data is sufficiently aligned.
>>
>> You're right. Apparently I missed that :(
>>
>> I'm wondering if there's a way to check an architecture's minimum
>> alignment at compile-time. If so, we could disable this feature on
>> architectures that don't guarantee 4-byte alignment.
>>
>
> As Geert says, the compiler can give you all the bits you need, so you
> won't have to contort your algorithm to fit whatever free bits happen to
> be available. Please see for example, commit 258a980d1ec2 ("net: dst:
> Force 4-byte alignment of dst_metrics").
Yes, thanks, it's a helpful example!
I see your point that explicitly enforcing alignment is a very clean
solution for the lock structures supported by the blocker tracking
mechanism.
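The sketch below shows the "force the alignment" approach, in the spirit of the dst_metrics commit Finn cited. It uses a made-up stand-in struct (demo_lock), not the real struct semaphore. The kernel spells the attribute `__aligned(4)`, which expands to GCC's `__attribute__((__aligned__(4)))`. The static_assert also answers the compile-time question: the build fails outright if the two low bits are not guaranteed free.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a lock type; NOT the real struct semaphore.
 * Without the attribute, the alignment follows the members. That is
 * 4 bytes on most ABIs, but only 2 on m68k, where int is 2-byte aligned.
 */
struct demo_lock {
	int count;
	char flag;
} __attribute__((aligned(4)));	/* kernel spelling: __aligned(4) */

/* Compile-time guard: fail the build if the two low bits are not free. */
static_assert(_Alignof(struct demo_lock) >= 4,
	      "blocker encoding needs 2 free low bits in the lock address");

/* Each element of an array of such structs is 4-byte aligned as well. */
static struct demo_lock demo_locks[3];
```

The cost of this approach is the one raised in reply. The attribute changes the layout of every instance of the lock type on architectures with weaker natural alignment, not just the instances the hung task detector looks at.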
However, I'm thinking about the "principle of minimal impact" here.
Forcing alignment on the core lock types themselves (like struct
semaphore) feels like a broad change to fix an issue that's local to the
hung task detector :)
>
>> If not, the fallback is to adjust the runtime checks.
>>
>
> That would be a solution to a different problem.
Given that, I would prefer to simply adjust the runtime checks within
the hung task detector. It feels like a more generic and self-contained
solution: it works out of the box on the majority of architectures and
provides a safe fallback on those that don't guarantee 4-byte alignment.
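Below is a minimal sketch of what "adjust the runtime checks" could look like. It is not the actual patch. The names demo_task, demo_set_blocker and demo_clear_blocker are hypothetical stand-ins for the task field and its setter. The idea is that a lock whose address overlaps the type bits is simply not tracked. This avoids both a warning and a misleading report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCKER_TYPE_SEM  0x01UL
#define BLOCKER_TYPE_MASK 0x03UL

/* Hypothetical stand-in for the task field that stores the encoded blocker. */
struct demo_task {
	unsigned long blocker;	/* 0 means "no blocker recorded" */
};

/*
 * Hypothetical setter. It records the blocker only when the lock address
 * leaves the type bits free. Otherwise it skips tracking, so the detector
 * later reports no blocker instead of a wrong one. Returns true if the
 * blocker was recorded.
 */
static inline bool demo_set_blocker(struct demo_task *t, const void *lock,
				    unsigned long type)
{
	unsigned long addr = (unsigned long)lock;

	if (!lock || (addr & BLOCKER_TYPE_MASK)) {
		t->blocker = 0;
		return false;
	}
	t->blocker = addr | type;
	return true;
}

/* Hypothetical helper: forget the blocker once the lock is acquired. */
static inline void demo_clear_blocker(struct demo_task *t)
{
	t->blocker = 0;
}
```

The trade-off is visible in the sketch. On architectures where a lock can be 2-byte aligned, hung-task reports silently omit the blocker for such locks rather than fixing their alignment.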
Happy to hear what you and others think about this trade-off. Perhaps
there's a perspective I'm missing ;)
Thanks,
Lance