netdev - Re: [BUG] Inconsistent lock state in virtnet poll

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <1255e119-c58b-f91f-c44f-cf6a0b47fb4b@redhat.com>
Date:   Wed, 6 May 2020 16:15:34 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>,
        Eric Dumazet <eric.dumazet@...il.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>, netdev@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [BUG] Inconsistent lock state in virtnet poll


On 2020/5/6 下午3:31, Michael S. Tsirkin wrote:
> On Tue, May 05, 2020 at 07:24:18PM -0700, Eric Dumazet wrote:
>> On 5/5/20 6:25 PM, Michael S. Tsirkin wrote:
>>> On Tue, May 05, 2020 at 06:19:09PM -0700, Eric Dumazet wrote:
>>>> On 5/5/20 5:43 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, May 05, 2020 at 03:40:09PM -0700, Eric Dumazet wrote:
>>>>>> On 5/5/20 3:30 PM, Thomas Gleixner wrote:
>>>>>>> "Michael S. Tsirkin"<mst@...hat.com>  writes:
>>>>>>>> On Tue, May 05, 2020 at 02:08:56PM +0200, Thomas Gleixner wrote:
>>>>>>>>> The following lockdep splat happens reproducibly on 5.7-rc4
>>>>>>>>> ================================
>>>>>>>>> WARNING: inconsistent lock state
>>>>>>>>> 5.7.0-rc4+ #79 Not tainted
>>>>>>>>> --------------------------------
>>>>>>>>> inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
>>>>>>>>> ip/356 [HC0[0]:SC1[1]:HE1:SE0] takes:
>>>>>>>>> f3ee4cd8 (&syncp->seq#2){+.?.}-{0:0}, at: net_rx_action+0xfb/0x390
>>>>>>>>> {SOFTIRQ-ON-W} state was registered at:
>>>>>>>>>    lock_acquire+0x82/0x300
>>>>>>>>>    try_fill_recv+0x39f/0x590
>>>>>>>> Weird. Where does try_fill_recv acquire any locks?
>>>>>>>    u64_stats_update_begin(&rq->stats.syncp);
>>>>>>>
>>>>>>> That's a 32bit kernel which uses a seqcount for this. sequence counts
>>>>>>> are "lock" constructs where you need to make sure that writers are
>>>>>>> serialized.
>>>>>>>
>>>>>>> Actually the problem at hand is that try_fill_recv() is called from
>>>>>>> fully preemptible context initialy and then from softirq context.
>>>>>>>
>>>>>>> Obviously that's for the open() path a non issue, but lockdep does not
>>>>>>> know about that. OTOH, there is other code which calls that from
>>>>>>> non-softirq context.
>>>>>>>
>>>>>>> The hack below made it shut up. It's obvioulsy not ideal, but at least
>>>>>>> it let me look at the actual problem I was chasing down:)
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>          tglx
>>>>>>>
>>>>>>> 8<-----------
>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>> @@ -1243,9 +1243,11 @@ static bool try_fill_recv(struct virtnet
>>>>>>>   			break;
>>>>>>>   	} while (rq->vq->num_free);
>>>>>>>   	if (virtqueue_kick_prepare(rq->vq) && virtqueue_notify(rq->vq)) {
>>>>>>> +		local_bh_disable();
>>>>>> Or use u64_stats_update_begin_irqsave() whic is a NOP on 64bit kernels
>>>>> I applied this, but am still trying to think of something that
>>>>> is 0 overhead for all configs.
>>>>> Maybe we can select a lockdep class depending on whether napi
>>>>> is enabled?
>>>> Do you_really_  need 64bit counter for stats.kicks on 32bit kernels ?
>>>>
>>>> Adding 64bit counters just because we can might be overhead anyway.
>>> Well 32 bit kernels don't fundamentally kick less than 64 bit ones,
>>> and we kick more or less per packet, sometimes per batch,
>>> people expect these to be in sync ..
>> Well, we left many counters in networking stack as 'unsigned long'
>> and nobody complained yet of overflows on 32bit kernels.
> Right.  For TX it is helpful that everything is maintained
> atomically so we do need the seqlock machinery anyway:
>
>          u64_stats_update_begin(&sq->stats.syncp);
>          sq->stats.bytes += bytes;
>          sq->stats.packets += packets;
>          sq->stats.xdp_tx += n;
>          sq->stats.xdp_tx_drops += drops;
>          sq->stats.kicks += kicks;
>          u64_stats_update_end(&sq->stats.syncp);
>
> for RX kicks are currently updated separately.  Which I guess is more or
> less a minor bug.
>
>          if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) {
>                  if (!try_fill_recv(vi, rq, GFP_ATOMIC))
>                          schedule_delayed_work(&vi->refill, 0);
>          }
>
>          u64_stats_update_begin(&rq->stats.syncp);
>          for (i = 0; i < VIRTNET_RQ_STATS_LEN; i++) {
>                  size_t offset = virtnet_rq_stats_desc[i].offset;
>                  u64 *item;
>
>                  item = (u64 *)((u8 *)&rq->stats + offset);
>                  *item += *(u64 *)((u8 *)&stats + offset);
>          }
>          u64_stats_update_end(&rq->stats.syncp);
>
> we should update kicks in virtnet_receive.
>
> And as long as we do that there's no cost to 64 bit counters ...
>
>

Or find way to use u32 to count kick in workqueue (which should be rare) 
separately.

Thanks