lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1a00da0e-cb8e-30ea-8d17-120f97242b2f@kernel.dk>
Date:   Mon, 11 Feb 2019 08:46:04 -0700
From:   Jens Axboe <axboe@...nel.dk>
To:     James Bottomley <James.Bottomley@...senPartnership.com>,
        Mikael Pettersson <mikpelinux@...il.com>
Cc:     Linux SPARC Kernel Mailing List <sparclinux@...r.kernel.org>,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5
 minute delay during boot on Sun Blade 2500

On 2/11/19 8:42 AM, James Bottomley wrote:
> On Mon, 2019-02-11 at 08:28 -0700, Jens Axboe wrote:
>> On 2/11/19 8:25 AM, James Bottomley wrote:
>>> On Sun, 2019-02-10 at 09:35 -0700, Jens Axboe wrote:
>>>> On 2/10/19 9:25 AM, James Bottomley wrote:
>>>>> On Sun, 2019-02-10 at 09:05 -0700, Jens Axboe wrote:
>>>>>> On 2/10/19 8:44 AM, James Bottomley wrote:
>>>>>>> On Sun, 2019-02-10 at 10:17 +0100, Mikael Pettersson wrote:
>>>>>>>> On Sat, Feb 9, 2019 at 7:19 PM James Bottomley
>>>>>>>> <James.Bottomley@...senpartnership.com> wrote:
>>>>>>>
>>>>>>> [...]
>>>>>>>>> I think the reason for this is that the block mq path
>>>>>>>>> doesn't feed the kernel entropy pool correctly, hence
>>>>>>>>> the need to install an entropy gatherer for systems
>>>>>>>>> that don't have other good random number sources.
>>>>>>>>
>>>>>>>> That does sound plausible, I admit I didn't even consider
>>>>>>>> the possibility that the old block I/O path also was an
>>>>>>>> entropy source.
>>>>>>>
>>>>>>> In theory, the new one should be as well since the
>>>>>>> rotational entropy collector is on the SCSI completion
>>>>>>> path.   I'd seen the same problem but had assumed it was
>>>>>>> something someone had done to our internal entropy pool and
>>>>>>> thus hadn't bisected it.
>>>>>>
>>>>>> The difference is that the old stack included ADD_RANDOM by
>>>>>> default, so this check:
>>>>>>
>>>>>> 	if (blk_queue_add_random(q))
>>>>>> 		add_disk_randomness(req->rq_disk);
>>>>>>
>>>>>> in scsi_end_request() would be true, and we'd add the
>>>>>> randomness. For sd, it seems to set it just fine for non-
>>>>>> rotational drives. Could this be because other devices don't?
>>>>>> Maybe the below makes a difference.
>>>>>
>>>>> No, in both we set it per the rotational parameters of the disk
>>>>> in 
>>>>>
>>>>> sd.c:sd_read_block_characteristics()
>>>>>
>>>>> 	rot = get_unaligned_be16(&buffer[4]);
>>>>>
>>>>> 	if (rot == 1) {
>>>>> 	
>>>>> 	blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
>>>>> 	
>>>>> 	blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
>>>>> 	} else {
>>>>> 	
>>>>> 	blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
>>>>> 	
>>>>> 	blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, q);
>>>>> 	}
>>>>>
>>>>>
>>>>> That check wasn't changed by the code removal.
>>>>
>>>> As I said above, for sd. This isn't true for non-disks.
>>>
>>> Yes, but the behaviour above doesn't change across a switch to MQ,
>>> so I don't quite understand how it bisects back to that change.  If
>>> we're not gathering entropy for the device now, we wouldn't have
>>> been before the switch, so the entropy characteristics shouldn't
>>> have changed.
>>
>> But it does, as I also wrote in that first email. The legacy queue
>> flags had QUEUE_FLAG_ADD_RANDOM set by default, the MQ ones do not.
>> Hence any non-sd device would previously ALWAYS have ADD_RANDOM
>> set, now none of them do. Also see the patch I sent.
> 
> So your theory is that the disk in question never gets to the
> rotational check?  because the check will clear the flag if it's non-
> rotational and set it if it's not, so the default state of the flag
> shouldn't matter.

No, my point is about non-disks, devices that aren't driven by sd. The
behavior for sd hasn't changed, as it sets/clears it unconditionally.
That's not true for something driven by sr, for instance, and anything
else non-sd. For those we defaulted to adding randomness for !scsi-mq,
and default to not adding randomness for scsi-mq.

The patch I included would have the same behavior for scsi-mq as we had
for non-mq.

-- 
Jens Axboe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ