linux-kernel - Re: blktests: block/009 next-20210304 failure rate average of 1/448

Message-ID: <20210318175424.GR13911@42.do-not-panic.com>
Date:   Thu, 18 Mar 2021 17:54:24 +0000
From:   Luis Chamberlain <mcgrof@...nel.org>
To:     linux-block@...r.kernel.org, jejb@...ux.ibm.com,
        martin.petersen@...cle.com
Cc:     linux-scsi@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: blktests: block/009 next-20210304 failure rate average of 1/448

Adding linux-fsdevel as folks working on fstests might be
interested.

On Tue, Mar 16, 2021 at 05:46:45PM +0000, Luis Chamberlain wrote:
> My personal suspicion is not on the block layer but on scsi_debug
> because this can fail:
> 
> modprobe scsi_debug; rmmod scsi_debug
> 
> This second issue may be a separate issue, but I figured
> I'd mention it. To fix this latter issue I've looked at ways to
> make scsi_debug_init() wait until its scsi devices are probed,
> however it's not clear how to do this correctly. If someone has
> an idea let me know. If that fixes this issue then we know it was
> that.

OK, so this other issue with scsi_debug indeed deserves its own tracking,
so I filed a bug for it. I also looked into it and tried to see how to
resolve it.

Someone who works on scsi should review my work, as I haven't touched
scsi before except for the recent block layer work I did for the
blktrace races. However, my own analysis is that this should not be
fixed in scsi_debug but instead in the users of scsi_debug.

The rationale for that is here:

https://bugzilla.kernel.org/show_bug.cgi?id=212337

The skinny of it is that we have no control over when userspace may muck
with the newly exposed devices as they are being initialized, and
shoe-horning a solution into scsi_debug_init() will always be prone to a
race, with userspace never letting scsi_debug_init() complete.

So the best we can do is have the tools which use scsi_debug run
something like lsof *prior* to mucking with the devices and / or
removing the module.
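
As a rough illustration of that idea, here is a minimal sketch of a helper
a test could call before touching the devices or removing the module. The
function name wait_until_unused and the BUSY_CHECK variable are hypothetical,
made up for this example, and are not taken from blktests or fstests. It
relies on lsof exiting 0 when at least one process has the given path open.
It only narrows the window: something can still open the device between the
check and the rmmod.

```shell
#!/bin/sh
# Hypothetical sketch, not blktests/fstests code.
# BUSY_CHECK is a command that exits 0 when the path passed to it is
# held open by some process. It defaults to lsof; making it overridable
# is an assumption of this sketch so the check can be stubbed out.
: "${BUSY_CHECK:=lsof}"

# wait_until_unused <tries> <path>...
# Polls once per second, up to <tries> times, until none of the paths
# is reported as open. Returns 0 once all are unused, 1 on timeout.
wait_until_unused() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        busy=0
        for dev in "$@"; do
            if $BUSY_CHECK "$dev" >/dev/null 2>&1; then
                busy=1
                break
            fi
        done
        if [ "$busy" -eq 0 ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example use (the device path is a placeholder):
#   wait_until_unused 10 /dev/sdX && rmmod scsi_debug
```

If the helper times out, the test should probably report that the device
was still busy rather than treat the rmmod failure as a kernel bug.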

I'll follow up with respective blktests / fstests patches, which I
suspect may address a few false positives.

  Luis