lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cbd0f41d-bc01-3f93-9171-2795c61afdc9@cogentembedded.com>
Date:   Wed, 12 Oct 2016 19:11:08 +0300
From:   Nikita Yushchenko <nikita.yoush@...entembedded.com>
To:     David Laight <David.Laight@...LAB.COM>,
        'Alexander Duyck' <alexander.duyck@...il.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        David Miller <davem@...emloft.net>,
        Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
        intel-wired-lan <intel-wired-lan@...ts.osuosl.org>,
        Netdev <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Chris Healy <cphealy@...il.com>
Subject: Re: igb driver can cause cache invalidation of non-owned memory?

>>> To get some throughput improvement, I propose removal of that
>>> sync_for_device() before reusing buffer. Will you accept such a patch ;)
>>
>> Not one that gets rid of sync_for_device() in the driver.  From what I
>> can tell there are some DMA APIs that use that to perform the
>> invalidation on the region of memory so that it can be DMAed into.
>> Without that we run the risk of having a race between something the
>> CPU might have placed in the cache and something the device wrote into
>> memory.  The sync_for_device() call is meant to invalidate the cache
>> for the region so that when the device writes into memory there is no
>> risk of that race.
> 
> I'm not expert, but some thought...
> 
> Just remember that the cpu can do speculative memory and cache line reads.
> So you need to ensure there are no dirty cache lines when the receive
> buffer is setup and no cache lines at all at before looking at the frame.

Exactly.

And because of that, arm does cache invalidation both in sync_for_cpu()
and sync_for_device().

> If you can 100% guarantee the cpu hasn't dirtied the cache then I
> think the invalidate prior to reusing the buffer can be skipped.
> But I wouldn't want to debug that going wrong.

I've written earlier in this thread why it is the case for igb - as long
as Alexander's statement that igb's buffers are readonly for the stack
is true. If Alexander's statement is wrong, then igb is vulnerable to
issue I've described in the first message of this thread.

Btw, we are unable get any breakage with that sync_to_device() removed
already for a day. And - our tests run with software checksumming,
because on our hardware i210 is wired connected to DSA switch and in
this config, no i210 offloading works (i210 hardware gets frames with
eDSA headers that it can't parse). Thus any breakage should be
immediately discovered.

And throughput measured by iperf raises from 650 to 850 Mbps.

Nikita

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ