Message-ID: <55259031.5040309@redhat.com>
Date: Wed, 08 Apr 2015 13:31:45 -0700
From: Alexander Duyck <alexander.h.duyck@...hat.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
CC: linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org, rusty@...tcorp.com.au
Subject: Re: [PATCH] virtio_ring: Update weak barriers to use dma_wmb/rmb
On 04/08/2015 11:37 AM, Michael S. Tsirkin wrote:
> On Wed, Apr 08, 2015 at 07:41:49AM -0700, Alexander Duyck wrote:
>> On 04/08/2015 01:42 AM, Michael S. Tsirkin wrote:
>>> On Tue, Apr 07, 2015 at 05:47:42PM -0700, Alexander Duyck wrote:
>>>> This change makes it so that instead of using smp_wmb/rmb, which vary
>>>> depending on the kernel configuration, we can use dma_wmb/rmb, which for
>>>> most architectures should be equal to or slightly more strict than
>>>> smp_wmb/rmb.
>>>>
>>>> The advantage to this is that these barriers are available to uniprocessor
>>>> builds as well so the performance should improve under such a
>>>> configuration.
>>>>
>>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@...hat.com>
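
(For anyone skimming the thread: the helpers being changed here are the
virtio_wmb()/virtio_rmb() wrappers in include/linux/virtio_ring.h.  Roughly,
the weak-barrier path ends up looking like the sketch below - this is an
approximation, not the exact hunk from the patch.)

static inline void virtio_rmb(bool weak_barriers)
{
	if (weak_barriers)
		dma_rmb();	/* previously smp_rmb() on SMP, rmb() on UP */
	else
		rmb();
}

static inline void virtio_wmb(bool weak_barriers)
{
	if (weak_barriers)
		dma_wmb();	/* previously smp_wmb() on SMP, wmb() on UP */
	else
		wmb();
}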
>>> Well the generic implementation has:
>>> #ifndef dma_rmb
>>> #define dma_rmb() rmb()
>>> #endif
>>>
>>> #ifndef dma_wmb
>>> #define dma_wmb() wmb()
>>> #endif
>>>
>>> So for these arches you are slightly speeding up UP but slightly hurting SMP -
>>> I think we did benchmark the difference as measurable in the past.
>> The generic implementation for the smp_ barriers does the same thing when
>> CONFIG_SMP is defined. The only spot where there should be an appreciable
>> difference between the two is on ARM, where we define the dma_ barriers as
>> being in the outer shareable domain, while the smp_ barriers are in the
>> inner shareable domain.
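
(Roughly what this looks like, going from memory of asm-generic/barrier.h
and arch/arm/include/asm/barrier.h around this time - treat the exact
definitions as approximate:)

/* asm-generic/barrier.h fallbacks: */
#ifdef CONFIG_SMP
#define smp_rmb()	rmb()
#define smp_wmb()	wmb()
#else
#define smp_rmb()	barrier()
#define smp_wmb()	barrier()
#endif

/* ARMv7 and later: the dma_ barriers target the outer shareable domain,
 * the smp_ barriers the inner shareable domain, and the mandatory
 * barriers use the much heavier dsb: */
#define rmb()		dsb()
#define wmb()		dsb(st)
#define dma_rmb()	dmb(osh)
#define dma_wmb()	dmb(oshst)
#define smp_rmb()	dmb(ish)
#define smp_wmb()	dmb(ishst)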
>>
>>> Additionally, isn't this relying on undocumented behaviour?
>>> The documentation says:
>>> "These are for use with consistent memory"
>>> and virtio does not bother to request consistent memory
>>> allocations.
>> Consistent in this case refers to memory that exists within one coherency
>> domain. So in the case of x86, for instance, this covers writes only to
>> system memory. If you mix writes to system memory and device memory (PIO)
>> then you should be using the full wmb/rmb to guarantee ordering between the
>> two memories.
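
(A made-up fragment to illustrate the distinction - the structures and
field names here are hypothetical:)

/* All of these stores go to coherent system memory (a shared ring), so
 * the lighter dma_wmb() is enough to order them against each other: */
vring->desc[head].addr = buf_dma;
vring->desc[head].len  = len;
dma_wmb();			/* descriptor writes are visible before... */
vring->avail->idx++;		/* ...the index write that publishes them */

/* Ordering a system memory write against a write to device (MMIO)
 * memory, on the other hand, needs the full mandatory barrier: */
ring->tail = tail;
wmb();				/* RAM write ordered before the doorbell */
writel(tail, hw->doorbell);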
>>
>>> One wonders whether these will always be strong enough.
>> For the purposes of weak barriers they should be, and they are only slightly
>> stronger than the smp_ barriers in one case, so odds are strength will not be
>> the issue. As far as speed goes, I would suspect that the difference between
>> the inner and outer shareable domains is negligible compared to the
>> difference between a dsb() and a dmb().
>>
>> - Alex
> Maybe it's safe, and maybe there's no performance impact. But what's
> the purpose of the patch? From the commit log, it sounds like it's an
> optimization, but it's not an obvious win, and it's not accompanied by
> any numbers.
The win would be that non-SMP should get the same performance from the
barriers as SMP. Based on the numbers for commit 7b21e34fd1c2 ("virtio:
harsher barriers for rpmsg.") it sounds like the gains could be pretty
significant (TCP_RR test improved by 35% CPU, 14% throughput). The idea
is to get the same benefits in a uniprocessor environment. If needed I
can gather data for x86 for both SMP and non-SMP; however, I had
considered the patch to be low-hanging fruit on that architecture since
the smp_ and dma_ barriers are the same.
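
To make the uniprocessor case concrete: before this change the helpers in
include/linux/virtio_ring.h fall back to the full mandatory barriers on
!CONFIG_SMP builds, roughly like this (approximate, from the
7b21e34fd1c2-era code):

#ifdef CONFIG_SMP
static inline void virtio_wmb(bool weak_barriers)
{
	if (weak_barriers)
		smp_wmb();
	else
		wmb();
}
#else /* !CONFIG_SMP */
static inline void virtio_wmb(bool weak_barriers)
{
	wmb();	/* full barrier even for weak_barriers, since the
		 * "other side" may be running on another CPU */
}
#endif

So a UP kernel currently pays for a full wmb()/rmb() on every ring update,
which is exactly what switching to dma_wmb()/dma_rmb() avoids.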
The performance numbers that I would like to collect, but can't, would be
on ARMv7 or later, as that is the only spot where the smp_ and dma_
barriers differ in any significant way; however, I don't have an ARM
platform that I could test this patch on to generate such data.
- Alex