Date: Fri, 7 Jun 2024 10:37:58 +0800
From: Chengming Zhou <chengming.zhou@...ux.dev>
To: Friedrich Weber <f.weber@...xmox.com>, axboe@...nel.dk,
 ming.lei@...hat.com, hch@....de, bvanassche@....org
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
 zhouchengming@...edance.com, Thomas Lamprecht <t.lamprecht@...xmox.com>
Subject: Re: [PATCH] block: fix request.queuelist usage in flush

On 2024/6/6 16:44, Friedrich Weber wrote:
> On 05/06/2024 16:27, Chengming Zhou wrote:
>> On 2024/6/5 21:34, Friedrich Weber wrote:
>>> On 05/06/2024 12:54, Friedrich Weber wrote:
>>> [...]
>>>
>>> My results:
>>>
>>> Booting the Debian (virtual) machine with mainline kernel v6.10-rc2
>>> (c3f38fa61af77b49866b006939479069cd451173):
>>> works fine, no crash
>>>
>>> Booting the Debian (virtual) machine with patch "block: fix
>>> request.queuelist usage in flush" applied on top of v6.10-rc2: The
>>> Debian (virtual) machine crashes during boot with [1].
>>>
>>> Hope this helps! If I can provide anything else, just let me know.
>>
>> Thanks for your help, I still can't reproduce it myself, don't know why.
> 
> Weird -- when booting the Debian machine into mainline kernel v6.10-rc2
> with "block: fix request.queuelist usage in flush" applied on top, it
> crashes reliably for me. The machine having its root on LVM seems to be
> essential to reproduce the crash, though.

Yeah, right. It seems LVM may create a special request that has only
PREFLUSH | POSTFLUSH set, without any DATA, and this request goes into the
flush state machine. That causes list_add_tail() to be called twice on the
request without a list_del_init() in between. I don't know the reason behind
such requests, but they are allowed by the current flush code.
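
To illustrate why a second list_add_tail() without list_del_init() is a
problem, here is a minimal userspace sketch. It is not the kernel code: the
list helpers are simplified re-implementations of the kernel ones, and the
names `pending`, `running`, `list_is_consistent` and
`requeue_keeps_lists_consistent` are hypothetical, chosen only for this
example.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	struct list_head *last = head->prev;

	entry->next = head;
	entry->prev = last;
	last->next = entry;
	head->prev = entry;
}

/* Unlink entry from whatever list it is on and make it self-linked. */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}

/*
 * Hypothetical checker: walk forward at most `limit` steps and verify that
 * every next/prev pair is symmetric and that the walk returns to head.
 */
static bool list_is_consistent(const struct list_head *head, int limit)
{
	const struct list_head *pos = head;

	for (int i = 0; i <= limit; i++) {
		if (pos->next->prev != pos)
			return false;
		pos = pos->next;
		if (pos == head)
			return true;
	}
	return false;
}

/*
 * Hypothetical scenario: a request sits on one list and is then added to a
 * second one, with or without being unlinked from the first list beforehand.
 */
static bool requeue_keeps_lists_consistent(bool unlink_first)
{
	struct list_head pending, running, rq;

	INIT_LIST_HEAD(&pending);
	INIT_LIST_HEAD(&running);

	list_add_tail(&rq, &pending);
	if (unlink_first)
		list_del_init(&rq);
	/* Without the unlink, pending still points at rq after this call. */
	list_add_tail(&rq, &running);

	return list_is_consistent(&pending, 4) &&
	       list_is_consistent(&running, 4);
}
```

In this sketch, requeue_keeps_lists_consistent(false) leaves `pending` with a
stale pointer to the request, while requeue_keeps_lists_consistent(true)
keeps both lists intact.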

> 
> Maybe the fact that I'm running the Debian machine virtualized makes the
> crash more likely to trigger. I'll try to reproduce on bare metal to
> narrow down the reproducer and get back to you.

Thanks a lot for describing your process in such detail in that thread!

> 
>> Could you help to test with this diff?
>>
>> diff --git a/block/blk-flush.c b/block/blk-flush.c
>> index e7aebcf00714..cca4f9131f79 100644
>> --- a/block/blk-flush.c
>> +++ b/block/blk-flush.c
>> @@ -263,6 +263,7 @@ static enum rq_end_io_ret flush_end_io(struct request *flush_rq,
>>                 unsigned int seq = blk_flush_cur_seq(rq);
>>
>>                 BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH);
>> +               list_del_init(&rq->queuelist);
>>                 blk_flush_complete_seq(rq, fq, seq, error);
>>         }
> 
> I used mainline kernel v6.10-rc2 as base and applied:
> 
> - "block: fix request.queuelist usage in flush"
> - Your `list_del_init` addition from above
> 
> and if I boot the Debian machine into this kernel, I do not get the
> crash anymore.

Good to hear. So can I merge these two diffs into one patch and add
your Tested-by?

> 
> Happy to run more tests for you, just let me know.

Thanks again!
