Message-ID: <aabbc521-b263-2d5f-efc6-72d500ab5c71@tomt.net>
Date:   Sat, 14 Dec 2019 15:13:57 +0100
From:   Andre Tomt <andre@...t.net>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        linux-kernel@...r.kernel.org
Cc:     stable@...r.kernel.org, Christoph Hellwig <hch@....de>,
        Ming Lei <ming.lei@...hat.com>,
        Jianchao Wang <jianchao.w.wang@...cle.com>,
        Jens Axboe <axboe@...nel.dk>, Sasha Levin <sashal@...nel.org>
Subject: Re: [PATCH 4.19 153/306] block: fix the DISCARD request merge
 (4.19.87+ crash)

On 27.11.2019 21:30, Greg Kroah-Hartman wrote:
> From: Jianchao Wang <jianchao.w.wang@...cle.com>
> 
> [ Upstream commit 69840466086d2248898020a08dda52732686c4e6 ]
> 
> There are two cases when handle DISCARD merge.
> If max_discard_segments == 1, the bios/requests need to be contiguous
> to merge. If max_discard_segments > 1, it takes every bio as a range
> and different range needn't to be contiguous.
> 
> But now, attempt_merge screws this up. It always consider contiguity
> for DISCARD for the case max_discard_segments > 1 and cannot merge
> contiguous DISCARD for the case max_discard_segments == 1, because
> rq_attempt_discard_merge always returns false in this case.
> This patch fixes both of the two cases above.
> 
> Reviewed-by: Christoph Hellwig <hch@....de>
> Reviewed-by: Ming Lei <ming.lei@...hat.com>
> Signed-off-by: Jianchao Wang <jianchao.w.wang@...cle.com>
> Signed-off-by: Jens Axboe <axboe@...nel.dk>
> Signed-off-by: Sasha Levin <sashal@...nel.org>
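
The merge rule described in the quoted commit message can be sketched in standalone C. This is only an illustration under stated assumptions, not the kernel code: struct discard_req, enum merge_kind and discard_merge_kind() are hypothetical names, and the segment-count check is an assumed way of enforcing a multi-range limit.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a DISCARD request. */
struct discard_req {
    unsigned long long start;  /* first sector of the range */
    unsigned long long nr;     /* number of sectors in the range */
    unsigned int segments;     /* ranges already carried by this request */
};

enum merge_kind { NO_MERGE, CONTIG_MERGE, MULTI_RANGE_MERGE };

/*
 * Decide how request b may be merged behind request a.
 *
 * max_discard_segments > 1: every bio is treated as its own range, so the
 * ranges need not be contiguous; only the (assumed) segment budget matters.
 *
 * max_discard_segments == 1: the device takes a single range, so b must
 * start exactly where a ends.
 */
static enum merge_kind discard_merge_kind(const struct discard_req *a,
                                          const struct discard_req *b,
                                          unsigned int max_discard_segments)
{
    if (max_discard_segments > 1) {
        if (a->segments + b->segments <= max_discard_segments)
            return MULTI_RANGE_MERGE;
        return NO_MERGE;
    }

    if (a->start + a->nr == b->start)
        return CONTIG_MERGE;

    return NO_MERGE;
}
```

With a limit of 1, ranges at sectors 0..7 and 8..15 merge as contiguous, while 0..7 and 100..107 do not merge. With a limit above 1, the second pair merges as two separate ranges.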

4.19.87, 4.19.88, 4.19.89 all lock up frequently on some of my systems. 
The same systems run 5.4.3 fine, so the newer trees are probably OK.
Reverting this commit on top of 4.19.87 makes everything stable.

To trigger it, all I have to do is re-rsync a directory tree in which
some files have changed; it usually crashes within 10 to 30 minutes.

The crashing systems have an ext4 filesystem on a two-SSD md RAID1,
mounted with the discard mount option. When it is mounted without
discard, the crashes no longer seem to occur.

No oops/panic made it to the IPMI console. I suspect the console is just
misbehaving and it didn't really livelock. At one point, one line of the
crash made it to the console (kernel BUG at block/blk-core.c:1776), and
that was enough to pinpoint this commit. Note that the line number might
be off, as I was attempting a bisect at the time.

This commit also made it into 4.14.x, but I have not tested that tree.
