linux-kernel - Re: [PATCH RFC 1/2] virtio-net: bql support

Message-ID: <88db987e-b519-5c1f-f64f-6f65f8415799@redhat.com>
Date:   Mon, 7 Jan 2019 14:31:47 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     linux-kernel@...r.kernel.org, maxime.coquelin@...hat.com,
        tiwei.bie@...el.com, wexu@...hat.com, jfreimann@...hat.com,
        "David S. Miller" <davem@...emloft.net>,
        virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org
Subject: Re: [PATCH RFC 1/2] virtio-net: bql support


On 2019/1/7 12:01 PM, Michael S. Tsirkin wrote:
> On Mon, Jan 07, 2019 at 11:51:55AM +0800, Jason Wang wrote:
>> On 2019/1/7 11:17 AM, Michael S. Tsirkin wrote:
>>> On Mon, Jan 07, 2019 at 10:14:37AM +0800, Jason Wang wrote:
>>>> On 2019/1/2 9:59 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Jan 02, 2019 at 11:28:43AM +0800, Jason Wang wrote:
>>>>>> On 2018/12/31 2:45 AM, Michael S. Tsirkin wrote:
>>>>>>> On Thu, Dec 27, 2018 at 06:00:36PM +0800, Jason Wang wrote:
>>>>>>>> On 2018/12/26 11:19 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
>>>>>>>>>> On 2018/12/6 6:54 AM, Michael S. Tsirkin wrote:
>>>>>>>>>>> When use_napi is set, let's enable BQLs.  Note: some of the issues are
>>>>>>>>>>> similar to wifi.  It's worth considering whether something similar to
>>>>>>>>>>> commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
>>>>>>>>>>> beneficial.
>>>>>>>>>> I played with a similar patch several days ago. The tricky part is the mode
>>>>>>>>>> switching between napi and no napi. We should make sure that when a packet is
>>>>>>>>>> sent and tracked by BQL, it is consumed by BQL as well. I did it by
>>>>>>>>>> tracking it through skb->cb, and dealt with the freeze by resetting the BQL
>>>>>>>>>> status. Patch attached.
>>>>>>>>>>
>>>>>>>>>> But when testing with vhost-net, I don't see very stable performance,
>>>>>>>>> So how about increasing TSQ pacing shift then?
>>>>>>>> I can test this. But changing default TCP value is much more than a
>>>>>>>> virtio-net specific thing.
>>>>>>> Well same logic as wifi applies. Unpredictable latencies related
>>>>>>> to radio in one case, to host scheduler in the other.
>>>>>>>
>>>>>>>>>> it was
>>>>>>>>>> probably because we batch the used ring updating so tx interrupt may come
>>>>>>>>>> randomly. We probably need to implement time bounded coalescing mechanism
>>>>>>>>>> which could be configured from userspace.
>>>>>>>>> I don't think it's reasonable to expect userspace to be that smart ...
>>>>>>>>> Why do we need time bounded? used ring is always updated when ring
>>>>>>>>> becomes empty.
>>>>>>>> We don't add used, which means BQL may not see the consumed packet in time.
>>>>>>>> And the delay varies based on the workload since we count packets not bytes
>>>>>>>> or time before doing the batched updating.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>> Sorry I still don't get it.
>>>>>>> When nothing is outstanding then we do update the used.
>>>>>>> So if BQL stops userspace from sending packets then
>>>>>>> we get an interrupt and packets start flowing again.
>>>>>> Yes, but what about the case of multiple flows? That's where I see unstable
>>>>>> results.
>>>>>>
>>>>>>
>>>>>>> It might be suboptimal, we might need to tune it but I doubt running
>>>>>>> timers is a solution, timer interrupts cause VM exits.
>>>>>> Probably not a timer but a time counter (or even a byte counter) in vhost to
>>>>>> add used and signal the guest if it exceeds a value, instead of waiting for the
>>>>>> number of packets.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>> Well we already have VHOST_NET_WEIGHT - is it too big then?
>>>> I'm not sure, it might be too big.
>>>>
>>>>
>>>>> And maybe we should expose the "MORE" flag in the descriptor -
>>>>> do you think that will help?
>>>>>
>>>> I don't know. But how can a "more" flag help here?
>>>>
>>>> Thanks
>>> It sounds like we should be a bit more aggressive in updating used ring.
>>> But if we just do it naively we will harm performance for sure as that
>>> is how we are doing batching right now.
>>
>> I agree but the problem is to balance the PPS and throughput. More batching
>> helps for PPS but may damage TCP throughput.
> That is what the more flag is supposed to be, I think - it is only set if
> there's a socket that actually needs the skb freed in order to go on.


I'm not quite sure I get it, but is this something similar to what you want?

https://lists.linuxfoundation.org/pipermail/virtualization/2014-October/027667.html

Which enables tx interrupt for TCP packets, and you want to add used 
more aggressively for those sockets?


Thanks


>>>    Instead we could make guest
>>> control batching using the more flag - if that's not set we write out
>>> the used ring.
>>
>> It's under the control of the guest, so I'm afraid we still need some more guards
>> (e.g. time/byte counters) on the host.
>>
>> Thanks
> Point is, if the guest does not care about the skb being freed, then there is no
> rush on the host side to mark the buffer used.
>
>
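
To make the two proposals in this thread concrete, here is a minimal user-space sketch of a host-side flush decision. It combines a guest hint (standing in for the "more" flag) with host-side packet, byte and time bounds. This is not vhost code: struct flush_limits, struct used_batch, batch_add(), batch_reset() and the may_batch parameter are all hypothetical names, and the exact semantics of the flag are an assumption, since the thread leaves them open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thresholds; names and values are illustrative only. */
struct flush_limits {
    uint32_t max_pkts;   /* flush after this many pending completions */
    uint64_t max_bytes;  /* or after this many pending bytes */
    uint64_t max_ns;     /* or once the oldest pending completion is this old */
};

/* Completions the host has consumed but not yet written to the used ring. */
struct used_batch {
    uint32_t pkts;
    uint64_t bytes;
    uint64_t first_ns;   /* timestamp of the oldest pending completion */
};

/*
 * Record one completed tx buffer of 'len' bytes at time 'now_ns'.
 * 'may_batch' is an assumed guest hint: when false, the guest wants the
 * buffer marked used right away. Returns true if the host should write
 * out the used ring and signal the guest now.
 */
bool batch_add(struct used_batch *b, const struct flush_limits *lim,
               uint32_t len, uint64_t now_ns, bool may_batch)
{
    assert(lim->max_pkts > 0);

    if (b->pkts == 0)
        b->first_ns = now_ns;
    b->pkts++;
    b->bytes += len;

    /* Guest-controlled part: no batching allowed for this buffer. */
    if (!may_batch)
        return true;

    /* Host-side guards, so a guest hint alone cannot delay forever. */
    return b->pkts >= lim->max_pkts ||
           b->bytes >= lim->max_bytes ||
           now_ns - b->first_ns >= lim->max_ns;
}

/* Call after the pending completions have been written to the used ring. */
void batch_reset(struct used_batch *b)
{
    b->pkts = 0;
    b->bytes = 0;
    b->first_ns = 0;
}
```

Note that the time bound here is only checked when another completion arrives, which matches the "time counter, not a timer" idea above. An idle queue would still rely on the existing rule that the used ring is updated when the ring becomes empty.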