linux-kernel - Re: [PATCH v2] mm/memory-failure: Use raw_spinlock_t in struct memory_failure

Message-ID: <0ca188a3-9861-49a4-a6f1-ba6ad726c5f0@redhat.com>
Date: Wed, 7 Aug 2024 13:41:50 -0400
From: Waiman Long <longman@...hat.com>
To: Miaohe Lin <linmiaohe@...wei.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 Huang Ying <ying.huang@...el.com>, Len Brown <len.brown@...el.com>,
 Juri Lelli <juri.lelli@...hat.com>, Andrew Morton
 <akpm@...ux-foundation.org>, Naoya Horiguchi <nao.horiguchi@...il.com>
Subject: Re: [PATCH v2] mm/memory-failure: Use raw_spinlock_t in struct
 memory_failure_cpu

On 8/6/24 23:15, Miaohe Lin wrote:
> On 2024/8/7 0:41, Waiman Long wrote:
>> The memory_failure_cpu structure is a per-cpu structure. Access to its
>> content requires the use of get_cpu_var() to lock in the current CPU
>> and disable preemption. The use of a regular spinlock_t for locking
>> purposes is fine for a non-RT kernel.
>>
>> Since the integration of RT spinlock support into the v5.15 kernel,
>> a spinlock_t in an RT kernel becomes a sleeping lock, and taking a
>> sleeping lock in a preemption-disabled context is illegal, resulting in
>> the following kind of warning:
>>
>>    [12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
>>    [12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0
>>    [12135.732252] preempt_count: 1, expected: 0
>>    [12135.732255] RCU nest depth: 2, expected: 2
>>      :
>>    [12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021
>>    [12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred
>>    [12135.732433] Call Trace:
>>    [12135.732436]  <TASK>
>>    [12135.732450]  dump_stack_lvl+0x57/0x81
>>    [12135.732461]  __might_resched.cold+0xf4/0x12f
>>    [12135.732479]  rt_spin_lock+0x4c/0x100
>>    [12135.732491]  memory_failure_queue+0x40/0xe0
>>    [12135.732503]  ghes_do_memory_failure+0x53/0x390
>>    [12135.732516]  ghes_do_proc.constprop.0+0x229/0x3e0
>>    [12135.732575]  ghes_proc+0xf9/0x1a0
>>    [12135.732591]  ghes_notify_hed+0x6a/0x150
>>    [12135.732602]  notifier_call_chain+0x43/0xb0
>>    [12135.732626]  blocking_notifier_call_chain+0x43/0x60
>>    [12135.732637]  acpi_ev_notify_dispatch+0x47/0x70
>>    [12135.732648]  acpi_os_execute_deferred+0x13/0x20
>>    [12135.732654]  process_one_work+0x41f/0x500
>>    [12135.732695]  worker_thread+0x192/0x360
>>    [12135.732715]  kthread+0x111/0x140
>>    [12135.732733]  ret_from_fork+0x29/0x50
>>    [12135.732779]  </TASK>
>>
>> Fix it by using a raw_spinlock_t for locking instead. Also move the
>> pr_err() out of the lock critical section to avoid indeterminate latency
>> of this call.
>>
>> Fixes: ea8f5fb8a71f ("HWPoison: add memory_failure_queue()")
> We shouldn't have had this problem before RT spinlock was supported, should we? If so, this Fixes tag might be wrong.
OK, I can take out the Fixes tag. It is hard to pinpoint a particular
RT-related commit.
>
>> Signed-off-by: Waiman Long <longman@...hat.com>
>> ---
>>   mm/memory-failure.c | 18 ++++++++++--------
>>   1 file changed, 10 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 581d3e5c9117..7aeb5198c2a0 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2417,7 +2417,7 @@ struct memory_failure_entry {
>>   struct memory_failure_cpu {
>>   	DECLARE_KFIFO(fifo, struct memory_failure_entry,
>>   		      MEMORY_FAILURE_FIFO_SIZE);
>> -	spinlock_t lock;
>> +	raw_spinlock_t lock;
>>   	struct work_struct work;
>>   };
>>   
>> @@ -2443,19 +2443,21 @@ void memory_failure_queue(unsigned long pfn, int flags)
>>   {
>>   	struct memory_failure_cpu *mf_cpu;
>>   	unsigned long proc_flags;
>> +	bool buffer_overflow;
>>   	struct memory_failure_entry entry = {
>>   		.pfn =		pfn,
>>   		.flags =	flags,
>>   	};
>>   
>>   	mf_cpu = &get_cpu_var(memory_failure_cpu);
>> -	spin_lock_irqsave(&mf_cpu->lock, proc_flags);
>> -	if (kfifo_put(&mf_cpu->fifo, entry))
>> +	raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags);
>> +	buffer_overflow = !kfifo_put(&mf_cpu->fifo, entry);
>> +	if (!buffer_overflow)
>>   		schedule_work_on(smp_processor_id(), &mf_cpu->work);
>> -	else
>> +	raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
>> +	if (buffer_overflow)
>>   		pr_err("buffer overflow when queuing memory failure at %#lx\n",
>>   		       pfn);
> Should we move pr_err() further down, after put_cpu_var()?

Yes, we should probably enable preemption before calling pr_err().
Will make the change in v3.

Thanks,
Longman

>
>> -	spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
>>   	put_cpu_var(memory_failure_cpu);
>>   }
> Would the diff below be more straightforward?
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index b68953dc9fad..be172cbc6ca9 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2553,20 +2553,23 @@ void memory_failure_queue(unsigned long pfn, int flags)
>   {
>          struct memory_failure_cpu *mf_cpu;
>          unsigned long proc_flags;
> +       bool buffer_overflow = false;
>          struct memory_failure_entry entry = {
>                  .pfn =          pfn,
>                  .flags =        flags,
>          };
>
>          mf_cpu = &get_cpu_var(memory_failure_cpu);
> -       spin_lock_irqsave(&mf_cpu->lock, proc_flags);
> +       raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags);
>          if (kfifo_put(&mf_cpu->fifo, entry))
>                  schedule_work_on(smp_processor_id(), &mf_cpu->work);
>          else
> +               buffer_overflow = true;
> +       raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
> +       put_cpu_var(memory_failure_cpu);
> +       if (buffer_overflow)
>                  pr_err("buffer overflow when queuing memory failure at %#lx\n",
>                         pfn);
> -       spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
> -       put_cpu_var(memory_failure_cpu);
>   }
>   EXPORT_SYMBOL_GPL(memory_failure_queue);
>
> But no strong opinion.
>
> Thanks.
> .
>
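
For readers following the thread, the pattern both diffs converge on (record the overflow condition while holding the lock, then report it only after the lock has been dropped) can be illustrated outside the kernel. What follows is a userspace sketch only, not the kernel code from the patch: the names demo_queue, demo_lock(), demo_unlock(), demo_queue_init(), demo_queue_put() and DEMO_FIFO_SIZE are hypothetical, a C11 atomic_flag spin loop stands in for raw_spinlock_t, a plain array stands in for the kfifo, and fprintf() stands in for pr_err(). It does not model preemption, interrupts, or per-cpu data.

```c
/* Userspace analogy only: NOT the kernel code discussed in this thread. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_FIFO_SIZE 4	/* hypothetical; small so overflow is easy to hit */

struct demo_queue {
	unsigned long slots[DEMO_FIFO_SIZE];	/* stands in for the kfifo */
	unsigned int count;
	atomic_flag lock;			/* stands in for raw_spinlock_t */
};

/* Busy-wait until the flag is acquired. */
static void demo_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;
}

static void demo_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void demo_queue_init(struct demo_queue *q)
{
	assert(q != NULL);
	q->count = 0;
	atomic_flag_clear(&q->lock);
}

/* Returns true if pfn was queued, false if the queue was full. */
static bool demo_queue_put(struct demo_queue *q, unsigned long pfn)
{
	bool buffer_overflow;

	demo_lock(&q->lock);
	/* Only record the outcome inside the critical section. */
	buffer_overflow = (q->count == DEMO_FIFO_SIZE);
	if (!buffer_overflow)
		q->slots[q->count++] = pfn;
	demo_unlock(&q->lock);

	/* Potentially slow output happens after the lock is released. */
	if (buffer_overflow)
		fprintf(stderr,
			"buffer overflow when queuing memory failure at %#lx\n",
			pfn);

	return !buffer_overflow;
}
```

Calling demo_queue_put() DEMO_FIFO_SIZE + 1 times on a freshly initialized queue would be expected to succeed for the first DEMO_FIFO_SIZE calls and print the overflow message on the last one, with the message emitted while no lock is held.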