Date:   Tue, 21 Nov 2017 11:27:23 -0700
From:   Jens Axboe <axboe@...nel.dk>
To:     Christian Borntraeger <borntraeger@...ibm.com>,
        Bart Van Assche <Bart.VanAssche@....com>,
        "virtualization@...ts.linux-foundation.org" 
        <virtualization@...ts.linux-foundation.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "mst@...hat.com" <mst@...hat.com>,
        "jasowang@...hat.com" <jasowang@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Christoph Hellwig <hch@....de>
Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with
 virtio-blk (also 4.12 stable)

On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>> Bisect points to
>>>>
>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>> Author: Christoph Hellwig <hch@....de>
>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>
>>>>     blk-mq: Create hctx for each present CPU
>>>>     
>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>     
>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>     the code.
>>>>     
>>>>     Signed-off-by: Christoph Hellwig <hch@....de>
>>>>     Reviewed-by: Jens Axboe <axboe@...nel.dk>
>>>>     Cc: Keith Busch <keith.busch@...el.com>
>>>>     Cc: linux-block@...r.kernel.org
>>>>     Cc: linux-nvme@...ts.infradead.org
>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>     Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
>>>>     Cc: Oleksandr Natalenko <oleksandr@...alenko.name>
>>>>     Cc: Mike Galbraith <efault@....de>
>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
>>>
>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>> take a look.
>>
>> Can't make it trigger here. We do init for each present CPU, which means
>> that if I offline a few CPUs here and register a queue, those still show
>> up as present (just offline) and get mapped accordingly.
>>
>> From the looks of it, your setup is different. If the CPU doesn't show
>> up as present and it gets hotplugged, then I can see how this condition
>> would trigger. What environment are you running this in? We might have
>> to re-introduce the CPU hotplug notifier; right now we just monitor
>> for a dead CPU and handle that.
> 
> I am not doing a hot unplug and then a replug; I use KVM and add a previously
> unavailable CPU.
> 
> in libvirt/virsh speak:
>   <vcpu placement='static' current='1'>4</vcpu>

So that's why we run into problems. The CPU is not present when we load the
device, but becomes present and online afterwards.
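
To make the difference between the two cases concrete, here is a rough toy
model in Python. It is only a sketch and not the kernel's mapping code: the
helper names (build_queue_map, unmapped_online_cpus) and the CPU numbers are
made up for illustration. It assumes the map is built once, at queue
registration, from whatever CPUs are present at that time.

```python
# Toy model only: these helpers are hypothetical and are NOT blk-mq APIs.

def build_queue_map(present_cpus, nr_queues):
    """Map every CPU present at registration time to a queue index."""
    return {cpu: i % nr_queues for i, cpu in enumerate(sorted(present_cpus))}


def unmapped_online_cpus(queue_map, online_cpus):
    """Return the online CPUs that have no entry in the queue map."""
    return sorted(cpu for cpu in online_cpus if cpu not in queue_map)


# Case A: all four CPUs are present, two of them are soft-offlined when the
# queue is registered. Bringing them online later is harmless because they
# were already mapped.
map_a = build_queue_map(present_cpus={0, 1, 2, 3}, nr_queues=2)
print(unmapped_online_cpus(map_a, online_cpus={0, 1, 2, 3}))

# Case B: only CPU 0 is present when the queue is registered (as with
# current='1' in the guest definition). CPU 1 is added afterwards and comes
# online without ever having been mapped.
map_b = build_queue_map(present_cpus={0}, nr_queues=1)
print(unmapped_online_cpus(map_b, online_cpus={0, 1}))
```

Case A yields no unmapped CPUs, while case B leaves CPU 1 online without a
mapping, which is the situation described above.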

Christoph, we used to handle this just fine; your patch broke it.

I'll see if I can come up with an appropriate fix.

-- 
Jens Axboe
