lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150416185426.GH5622@wotan.suse.de>
Date:	Thu, 16 Apr 2015 20:54:26 +0200
From:	"Luis R. Rodriguez" <mcgrof@...e.com>
To:	Hyong-Youb Kim <hkim@...i.com>
Cc:	Andy Walls <awalls@...metrocast.net>,
	Hyong-Youb Kim <hykim@...i.com>, netdev@...r.kernel.org,
	Andy Lutomirski <luto@...capital.net>,
	Toshi Kani <toshi.kani@...com>,
	"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...nel.org>,
	linux-kernel@...r.kernel.org,
	Hal Rosenstock <hal.rosenstock@...il.com>,
	Sean Hefty <sean.hefty@...el.com>,
	Suresh Siddha <sbsiddha@...il.com>,
	Rickard Strandqvist <rickard_strandqvist@...ctrumdigital.se>,
	Mike Marciniszyn <mike.marciniszyn@...el.com>,
	Roland Dreier <roland@...estorage.com>,
	Juergen Gross <jgross@...e.com>,
	Mauro Carvalho Chehab <mchehab@....samsung.com>,
	Borislav Petkov <bp@...e.de>, Mel Gorman <mgorman@...e.de>,
	Vlastimil Babka <vbabka@...e.cz>,
	Davidlohr Bueso <dbueso@...e.de>,
	Johannes Berg <johannes@...solutions.net>,
	Felix Fietkau <nbd@...nwrt.org>,
	Benjamin Poirier <bpoirier@...e.de>,
	dave.hansen@...ux.intel.com, plagnioj@...osoft.com,
	tglx@...utronix.de,
	Ville Syrjälä <syrjala@....fi>,
	linux-fbdev@...r.kernel.org, linux-media@...r.kernel.org,
	x86@...nel.org
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Thu, Apr 16, 2015 at 01:18:37PM +0900, Hyong-Youb Kim wrote:
> On Thu, Apr 16, 2015 at 01:58:16AM +0200, Luis R. Rodriguez wrote:
> > 
> > An alternative... is to just ioremap_wc() the entire region, including
> > MMIO registers for these old devices. I see one ethernet driver that does
> > this, myri10ge, and am curious how and why they ended up deciding this
> > and if they have run into any issues. I wonder if this is a reasonable
> > comrpomise for these 2 remaining corner cases.
> > 
> 
> For myri10ge, it a performance thing.  Descriptor rings are in NIC
> memory BAR0, not in host memory.  Say, to send a packet, the driver
> writes the send descriptor to the ioremap'd NIC memory.  It is a
> multi-word descriptor.  So, to send it as one PCIE MWr transaction,
> the driver maps the whole BAR0 as WC and does "copy descriptor; wmb".

Interesting, so you burst write multi-word descriptor writes using
write-combining here for the Ethernet device.

> Without WC, descriptors would end up as multiple 4B or 8B MWr packets
> to the NIC, which has a pretty big performance impact on this
> particular NIC.

How big are the descriptors?

> Most registers that do not want WC are actually in BAR2, which is not
> mapped as WC.  For registers that are in BAR0, we do "write to the
> register; wmb".  If we want to wait till the NIC has seen the write,
> we do "write; wmb; read".

Interesting, thanks, yeah using this as a work around to the problem sounds
plausible however it still would require likely making just as many changes to
the ivtv and ipath driver as to just do a proper split. I do wonder however if
this sort of work around can be generalized somehow though so that others could
use, if this sort of thing is going to become prevalent. If so then this would
serve two purposes: work around for the corner cases of MTRR use on Linux and
also these sorts of device constraints.

In order to determine if this is likely to be generally useful could you elaborate
a bit more about the detals of the performance issues of not bursting writes
for the descriptor on this device.

Even if that is done a conversion over to this work around seems it may require
device specific nitpicks. For instance I note in myri10ge_submit_req() for
small writes you just do a reverse write and do the first set last, then
finally the last 32 bits are rewritten, I guess to trigger something?

> This approach has worked for this device for many years.  I cannot say
> whether it works for other devices, though.

I think it should but the more interesting question would be exactly
*why* it was needed for this device, who determined that, and why?

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ