linux-kernel - Re: [PATCH] ALSA: seq: Fix RCU stall in snd_seq


Message-ID: <2d05ceab-b8b7-0c7b-f847-69950c6db14e@gmail.com>
Date:   Tue, 2 Nov 2021 17:41:57 +0800
From:   Zqiang <qiang.zhang1211@...il.com>
To:     Takashi Iwai <tiwai@...e.de>
Cc:     tiwai@...e.com, alsa-devel@...a-project.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ALSA: seq: Fix RCU stall in snd_seq_write()


On 2021/11/2 4:33 PM, Takashi Iwai wrote:
> On Tue, 02 Nov 2021 04:32:22 +0100,
> Zqiang wrote:
>> If we have a lot of cell objects, this loop may take a long time and
>> trigger an RCU stall. Insert a conditional reschedule point to fix it.
>>
>> rcu: INFO: rcu_preempt self-detected stall on CPU
>> rcu: 	1-....: (1 GPs behind) idle=9f5/1/0x4000000000000000
>> 	softirq=16474/16475 fqs=4916
>> 	(t=10500 jiffies g=19249 q=192515)
>> NMI backtrace for cpu 1
>> ......
>> asm_sysvec_apic_timer_interrupt
>> RIP: 0010:_raw_spin_unlock_irqrestore+0x38/0x70
>> spin_unlock_irqrestore
>> snd_seq_prioq_cell_out+0x1dc/0x360
>> snd_seq_check_queue+0x1a6/0x3f0
>> snd_seq_enqueue_event+0x1ed/0x3e0
>> snd_seq_client_enqueue_event.constprop.0+0x19a/0x3c0
>> snd_seq_write+0x2db/0x510
>> vfs_write+0x1c4/0x900
>> ksys_write+0x171/0x1d0
>> do_syscall_64+0x35/0xb0
>>
>> Reported-by: syzbot+bb950e68b400ab4f65f8@...kaller.appspotmail.com
>> Signed-off-by: Zqiang <qiang.zhang1211@...il.com>
>> ---
>>   sound/core/seq/seq_queue.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/sound/core/seq/seq_queue.c b/sound/core/seq/seq_queue.c
>> index d6c02dea976c..f5b1e4562a64 100644
>> --- a/sound/core/seq/seq_queue.c
>> +++ b/sound/core/seq/seq_queue.c
>> @@ -263,6 +263,7 @@ void snd_seq_check_queue(struct snd_seq_queue *q, int atomic, int hop)
>>   		if (!cell)
>>   			break;
>>   		snd_seq_dispatch_event(cell, atomic, hop);
>> +		cond_resched();
>>   	}
>>   
>>   	/* Process time queue... */
>> @@ -272,6 +273,7 @@ void snd_seq_check_queue(struct snd_seq_queue *q, int atomic, int hop)
>>   		if (!cell)
>>   			break;
>>   		snd_seq_dispatch_event(cell, atomic, hop);
>> +		cond_resched();
>
> It's good to have cond_resched() in those places but it must be done
> more carefully, as the code path may be called from the atomic
> context, too.  That is, it must check the atomic argument, and
> cond_resched() may be applied only when atomic==false.
>
> But I still wonder how this gets an RCU stall all of a sudden.  Looking
> through https://syzkaller.appspot.com/bug?extid=bb950e68b400ab4f65f8
> it's been triggered by many cases since the end of September...

I did not find useful information in the log. From the call trace, I
guess it may be triggered by the long loop time, which kept the RCU
quiescent state from being reported in time.

I overlooked the atomic parameter check; I will resend v2. In
non-atomic context, we can insert cond_resched() to avoid this
situation, but in atomic context the RCU stall may still be triggered.
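
For illustration only (this is not the v2 patch): a minimal user-space
sketch of the atomic check discussed above. Every sketch_* name is a
hypothetical stand-in, not a kernel symbol; the real loop calls
snd_seq_prioq_cell_out() and snd_seq_dispatch_event(), and the real
cond_resched() is provided by the scheduler. The assert() only mimics
the rule that rescheduling is not allowed in atomic context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
static bool sketch_in_atomic;              /* pretend "atomic context" flag */
static unsigned int sketch_resched_calls;  /* how often we offered to yield */

/* Stand-in for cond_resched(): it must never run in atomic context. */
static void sketch_cond_resched(void)
{
	assert(!sketch_in_atomic);
	sketch_resched_calls++;
}

/* Stand-in for the dispatch loop: drain ncells pending cells. */
static unsigned int sketch_check_queue(unsigned int ncells, bool atomic)
{
	unsigned int dispatched = 0;

	sketch_in_atomic = atomic;
	while (ncells > 0) {
		ncells--;		/* stands in for cell_out + dispatch */
		dispatched++;
		if (!atomic)		/* only yield when sleeping is allowed */
			sketch_cond_resched();
	}
	sketch_in_atomic = false;
	return dispatched;
}
```

With atomic == true the loop still drains every cell but never offers to
yield, which is why a long queue may still stall in that case.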

thanks
Zqiang

>
>
> thanks,
>
> Takashi