Message-ID: <0ca188a3-9861-49a4-a6f1-ba6ad726c5f0@redhat.com>
Date: Wed, 7 Aug 2024 13:41:50 -0400
From: Waiman Long <longman@...hat.com>
To: Miaohe Lin <linmiaohe@...wei.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Huang Ying <ying.huang@...el.com>, Len Brown <len.brown@...el.com>,
Juri Lelli <juri.lelli@...hat.com>, Andrew Morton
<akpm@...ux-foundation.org>, Naoya Horiguchi <nao.horiguchi@...il.com>
Subject: Re: [PATCH v2] mm/memory-failure: Use raw_spinlock_t in struct memory_failure_cpu
On 8/6/24 23:15, Miaohe Lin wrote:
> On 2024/8/7 0:41, Waiman Long wrote:
>> The memory_failure_cpu structure is a per-cpu structure. Access to its
>> content requires the use of get_cpu_var() to lock in the current CPU
>> and disable preemption. The use of a regular spinlock_t for locking
>> purposes is fine for a non-RT kernel.
>>
>> Since the integration of RT spinlock support into the v5.15 kernel,
>> a spinlock_t in an RT kernel becomes a sleeping lock. Taking a
>> sleeping lock in a preemption-disabled context is illegal and results
>> in the following kind of warning.
>>
>> [12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
>> [12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0
>> [12135.732252] preempt_count: 1, expected: 0
>> [12135.732255] RCU nest depth: 2, expected: 2
>> :
>> [12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021
>> [12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred
>> [12135.732433] Call Trace:
>> [12135.732436] <TASK>
>> [12135.732450] dump_stack_lvl+0x57/0x81
>> [12135.732461] __might_resched.cold+0xf4/0x12f
>> [12135.732479] rt_spin_lock+0x4c/0x100
>> [12135.732491] memory_failure_queue+0x40/0xe0
>> [12135.732503] ghes_do_memory_failure+0x53/0x390
>> [12135.732516] ghes_do_proc.constprop.0+0x229/0x3e0
>> [12135.732575] ghes_proc+0xf9/0x1a0
>> [12135.732591] ghes_notify_hed+0x6a/0x150
>> [12135.732602] notifier_call_chain+0x43/0xb0
>> [12135.732626] blocking_notifier_call_chain+0x43/0x60
>> [12135.732637] acpi_ev_notify_dispatch+0x47/0x70
>> [12135.732648] acpi_os_execute_deferred+0x13/0x20
>> [12135.732654] process_one_work+0x41f/0x500
>> [12135.732695] worker_thread+0x192/0x360
>> [12135.732715] kthread+0x111/0x140
>> [12135.732733] ret_from_fork+0x29/0x50
>> [12135.732779] </TASK>
>>
>> Fix it by using a raw_spinlock_t for locking instead. Also move the
>> pr_err() out of the lock critical section to avoid indeterminate latency
>> of this call.
>>
>> Fixes: ea8f5fb8a71f ("HWPoison: add memory_failure_queue()")
> This problem could not occur before RT spinlock support was merged, could it? If so, this Fixes tag might be wrong.
OK, I can take out the Fixes tag. It is hard to pinpoint a particular
RT-related commit.
>
>> Signed-off-by: Waiman Long <longman@...hat.com>
>> ---
>> mm/memory-failure.c | 18 ++++++++++--------
>> 1 file changed, 10 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 581d3e5c9117..7aeb5198c2a0 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2417,7 +2417,7 @@ struct memory_failure_entry {
>> struct memory_failure_cpu {
>> DECLARE_KFIFO(fifo, struct memory_failure_entry,
>> MEMORY_FAILURE_FIFO_SIZE);
>> - spinlock_t lock;
>> + raw_spinlock_t lock;
>> struct work_struct work;
>> };
>>
>> @@ -2443,19 +2443,21 @@ void memory_failure_queue(unsigned long pfn, int flags)
>> {
>> struct memory_failure_cpu *mf_cpu;
>> unsigned long proc_flags;
>> + bool buffer_overflow;
>> struct memory_failure_entry entry = {
>> .pfn = pfn,
>> .flags = flags,
>> };
>>
>> mf_cpu = &get_cpu_var(memory_failure_cpu);
>> - spin_lock_irqsave(&mf_cpu->lock, proc_flags);
>> - if (kfifo_put(&mf_cpu->fifo, entry))
>> + raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags);
>> + buffer_overflow = !kfifo_put(&mf_cpu->fifo, entry);
>> + if (!buffer_overflow)
>> schedule_work_on(smp_processor_id(), &mf_cpu->work);
>> - else
>> + raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
>> + if (buffer_overflow)
>> pr_err("buffer overflow when queuing memory failure at %#lx\n",
>> pfn);
> Should we move pr_err() further down, below put_cpu_var()?
Yes, we should probably re-enable preemption before calling pr_err().
Will make the change in v2.
Thanks,
Longman
>
>> - spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
>> put_cpu_var(memory_failure_cpu);
>> }
> Would the diff below be more straightforward?
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index b68953dc9fad..be172cbc6ca9 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2553,20 +2553,23 @@ void memory_failure_queue(unsigned long pfn, int flags)
> {
> struct memory_failure_cpu *mf_cpu;
> unsigned long proc_flags;
> + bool buffer_overflow = false;
> struct memory_failure_entry entry = {
> .pfn = pfn,
> .flags = flags,
> };
>
> mf_cpu = &get_cpu_var(memory_failure_cpu);
> - spin_lock_irqsave(&mf_cpu->lock, proc_flags);
> + raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags);
> if (kfifo_put(&mf_cpu->fifo, entry))
> schedule_work_on(smp_processor_id(), &mf_cpu->work);
> else
> + buffer_overflow = true;
> + raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
> + put_cpu_var(memory_failure_cpu);
> + if (buffer_overflow)
> pr_err("buffer overflow when queuing memory failure at %#lx\n",
> pfn);
> - spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
> - put_cpu_var(memory_failure_cpu);
> }
> EXPORT_SYMBOL_GPL(memory_failure_queue);
>
> But no strong opinion.
>
> Thanks.
> .
>
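For illustration only (this is not part of the patch), the ordering discussed in this thread can be sketched in user space: record the overflow in a local flag while the lock is held, then report it after the lock has been dropped. Everything named here is hypothetical. A C11 atomic_flag spin loop stands in for raw_spinlock_t, a fixed array stands in for the kfifo, and fprintf() stands in for pr_err(). The sketch has no equivalent of get_cpu_var()/put_cpu_var() or schedule_work_on().

```c
/* User-space analogue only; not kernel code. All names are hypothetical. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define MF_FIFO_SIZE 4	/* assumed size; stands in for MEMORY_FAILURE_FIFO_SIZE */

struct mf_entry {
	unsigned long pfn;
	int flags;
};

struct mf_queue {
	struct mf_entry slots[MF_FIFO_SIZE];
	unsigned int count;
	atomic_flag lock;	/* busy-wait lock; stands in for raw_spinlock_t */
};

static void mf_queue_init(struct mf_queue *q)
{
	q->count = 0;
	atomic_flag_clear(&q->lock);
}

static void mf_queue_lock(struct mf_queue *q)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;
}

static void mf_queue_unlock(struct mf_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Returns true if the entry was queued, false if the queue was full. */
static bool mf_queue_put(struct mf_queue *q, unsigned long pfn, int flags)
{
	bool buffer_overflow = false;

	mf_queue_lock(q);
	if (q->count < MF_FIFO_SIZE) {
		q->slots[q->count].pfn = pfn;
		q->slots[q->count].flags = flags;
		q->count++;
	} else {
		/* Only remember the failure here; do not log under the lock. */
		buffer_overflow = true;
	}
	mf_queue_unlock(q);

	/* Report after the critical section, mirroring the moved pr_err(). */
	if (buffer_overflow)
		fprintf(stderr,
			"buffer overflow when queuing memory failure at %#lx\n",
			pfn);

	return !buffer_overflow;
}
```

With MF_FIFO_SIZE set to 4, the first four mf_queue_put() calls on a freshly initialized queue return true, and the fifth returns false and prints the overflow message with the lock no longer held.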