Date:   Thu, 2 Aug 2018 19:35:12 +0800
From:   Ming Lei <tom.leiming@...il.com>
To:     Guenter Roeck <linux@...ck-us.net>, linux-ide@...r.kernel.org,
        Tejun Heo <tj@...nel.org>
Cc:     James Bottomley <James.Bottomley@...senpartnership.com>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        Linux-Next Mailing List <linux-next@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-scsi <linux-scsi@...r.kernel.org>,
        Ming Lei <ming.lei@...hat.com>
Subject: Re: linux-next: Tree for Aug 1

On Thu, Aug 2, 2018 at 12:58 PM, Guenter Roeck <linux@...ck-us.net> wrote:
> On 08/01/2018 05:03 PM, James Bottomley wrote:
>>
>> On Thu, 2018-08-02 at 07:57 +0800, Ming Lei wrote:
>>>
>>> On Thu, Aug 2, 2018 at 7:47 AM, Guenter Roeck <linux@...ck-us.net>
>>> wrote:
>>>>
>>>> On Wed, Aug 01, 2018 at 03:52:45PM -0700, James Bottomley wrote:
>>>>>
>>>>> On Wed, 2018-08-01 at 15:48 -0700, Guenter Roeck wrote:
>>>>>>
>>>>>> On Wed, Aug 01, 2018 at 05:58:52PM +1000, Stephen Rothwell
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Changes since 20180731:
>>>>>>>
>>>>>>> The pci tree gained a conflict against the pci-current tree.
>>>>>>>
>>>>>>> The net-next tree gained a conflict against the bpf tree.
>>>>>>>
>>>>>>> The block tree lost its build failure.
>>>>>>>
>>>>>>> The staging tree still had its build failure due to an
>>>>>>> interaction with the vfs tree for which I disabled CONFIG_EROFS_FS.
>>>>>>>
>>>>>>> The kspp tree lost its build failure.
>>>>>>>
>>>>>>> Non-merge commits (relative to Linus' tree): 10070
>>>>>>>   9137 files changed, 417605 insertions(+), 179996 deletions(-)
>>>>>>>
>>>>>>> ----------------------------------------------------------------------------
>>>>>>>
>>>>>>
>>>>>> The widespread kernel hang issues are still seen. I managed
>>>>>> to bisect it after working around the transient build failures.
>>>>>> Bisect log is attached below. Unfortunately, it doesn't help much.
>>>>>> The culprit is reported as:
>>>>>>
>>>>>> 2d542828c5e9 Merge remote-tracking branch 'scsi/for-next'
>>>>>>
>>>>>> The preceding merge,
>>>>>>
>>>>>> 453f1d821165 Merge remote-tracking branch 'cgroup/for-next'
>>>>>>
>>>>>> checks out fine, as does the tip of scsi-next (commit 103c7b7e0184,
>>>>>> "Merge branch 'misc' into for-next"). No idea how to proceed.
>>>>>
>>>>>
>>>>> This sounds like you may have a problem with this patch:
>>>>>
>>>>>      commit d5038a13eca72fb216c07eb717169092e92284f1
>>>>>       Author: Johannes Thumshirn <jthumshirn@...e.de>
>>>>>       Date:   Wed Jul 4 10:53:56 2018 +0200
>>>>>
>>>>>           scsi: core: switch to scsi-mq by default
>>>>>
>>>>> To verify, boot with the additional kernel parameter
>>>>>
>>>>> scsi_mod.use_blk_mq=0
>>>>>
>>>>> Which will reverse the effect of the above patch.
>>>>>
>>>>
>>>> Yes, that fixes the problem.
>>>
>>>
>>> That may not be the root cause, given that this issue has only been
>>> seen since next-20180731, while d5038a13eca7 (scsi: core: switch to
>>> scsi-mq by default) has been in -next for quite a while.
>>>
>>> It seems something new is causing this issue.
>>
>>
>> Read my other email about how to find this.
>>
>> https://marc.info/?l=linux-scsi&m=153316446223676
>>
>> Now that we've confirmed the issue, Guenter, could you attempt to bisect
>> it as that email describes?
>>
>
> So, I am more and more baffled.
>
> I ran another round of bisect, this time each test executing twice,
> once with "scsi_mod.use_blk_mq=1" and once with "scsi_mod.use_blk_mq=0",
> requiring both to pass. Bisect still points to the merge as culprit.
>
> Ok, one step further: Actually _revert_ commit d5038a13eca72 before running
> each test, meaning the default is use_blk_mq=0. Still run both tests.
> Bisect _still_ points to the merge of scsi-next as culprit.
>
> So, to me it looks like the problem is triggered by _something_ in
> scsi-next, combined with _something_ in -next prior to the merge,
> not specifically associated with use_blk_mq=[0|1] or d5038a13eca72,
> but to a combination of some patch in scsi-next and some other patch.
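
For readers following along, the two-pass test described above (every bisect step must boot with both "scsi_mod.use_blk_mq=1" and "scsi_mod.use_blk_mq=0") could be wired into `git bisect run` roughly as sketched below. This is only an illustration, not the script used in this thread: `build` and `boot` are hypothetical callables standing in for whatever build and boot-test harness is available, and the names `bisect_verdict` and `run_step` are made up here. The only fixed part is the exit-code convention of `git bisect run` (0 = good, 125 = skip, other codes from 1 to 127 = bad).

```python
from typing import Callable, Iterable

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125

# The two settings exercised at every bisect step.
CMDLINE_VARIANTS = ("scsi_mod.use_blk_mq=1", "scsi_mod.use_blk_mq=0")


def bisect_verdict(built: bool, boot_results: Iterable[bool]) -> int:
    """Map the outcome of one bisect step to a `git bisect run` exit code."""
    if not built:
        # The commit cannot be tested (e.g. a transient build failure),
        # so ask git to skip it instead of marking it bad.
        return SKIP
    # The step is good only if every boot variant passed.
    return GOOD if all(boot_results) else BAD


def run_step(build: Callable[[], bool], boot: Callable[[str], bool]) -> int:
    """Build once, then boot once per kernel command-line variant.

    `build` and `boot` are hypothetical hooks: `build()` returns True when
    the kernel built, `boot(arg)` returns True when a boot with the extra
    kernel parameter `arg` completed without hanging.
    """
    if not build():
        return bisect_verdict(False, [])
    # A list (not a generator) so that both variants are always booted.
    results = [boot(arg) for arg in CMDLINE_VARIANTS]
    return bisect_verdict(True, results)
```

A wrapper script would end with something like `sys.exit(run_step(my_build, my_boot))` and be passed to `git bisect run`; returning 125 for unbuildable commits keeps build breakage from being blamed as the culprit.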

I am a bit busy today and have not traced it much.

So far, I have found that the code hangs in scsi_test_unit_ready()
<- get_capabilities() <- sr_probe(): scsi_queue_rq()/ata_scsi_queuecmd()
queues the command successfully, but the command never completes.

I also tried reverting the commits merged into the ata tree on the 30th
and 31st, but it made no difference.


Thanks,
Ming Lei
