lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 30 Apr 2015 10:03:18 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	"Luis R. Rodriguez" <mcgrof@...e.com>
Cc:	Bjorn Helgaas <bhelgaas@...gle.com>,
	"Luis R. Rodriguez" <mcgrof@...not-panic.com>,
	"Michael S. Tsirkin" <mst@...hat.com>,
	Jean-Christophe Plagniol-Villard <plagnioj@...osoft.com>,
	Tomi Valkeinen <tomi.valkeinen@...com>,
	David Airlie <airlied@...ux.ie>, daniel.vetter@...el.com,
	Linux Fbdev development list <linux-fbdev@...r.kernel.org>,
	cocci@...teme.lip6.fr,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Toshi Kani <toshi.kani@...com>,
	Suresh Siddha <sbsiddha@...il.com>,
	Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Juergen Gross <jgross@...e.com>,
	Daniel Vetter <daniel.vetter@...ll.ch>,
	Dave Airlie <airlied@...hat.com>,
	Antonino Daplas <adaplas@...il.com>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	Arnd Bergmann <arnd@...db.de>, venkatesh.pallipadi@...el.com,
	Stefan Bader <stefan.bader@...onical.com>,
	Ville Syrjälä <syrjala@....fi>,
	Mel Gorman <mgorman@...e.de>, Vlastimil Babka <vbabka@...e.cz>,
	Borislav Petkov <bp@...e.de>, Davidlohr Bueso <dbueso@...e.de>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	ville.syrjala@...ux.intel.com,
	David Vrabel <david.vrabel@...rix.com>,
	Jan Beulich <jbeulich@...e.com>,
	Roger Pau Monné <roger.pau@...rix.com>,
	xen-devel@...ts.xensource.com,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>
Subject: Re: [PATCH v4 1/5] pci: add pci_iomap_wc() variants

On Thu, Apr 30, 2015 at 9:52 AM, Luis R. Rodriguez <mcgrof@...e.com> wrote:
> On Thu, Apr 30, 2015 at 10:59:17AM -0500, Bjorn Helgaas wrote:
>> [+cc linux-pci]
>>
>> Hi Luis,
>>
>> On Wed, Apr 29, 2015 at 02:36:08PM -0700, Luis R. Rodriguez wrote:
>> > From: "Luis R. Rodriguez" <mcgrof@...e.com>
>> >
>> > This allows drivers to take advantage of write-combining
>> > when possible. Ideally we'd have pci_read_bases() just
>> > peg an IORESOURCE_WC flag for us
>>
>> This makes it sound like pci_read_bases() could do a better job
>> if we just tried harder, but I don't think that's the case.  All
>> pci_read_bases() can do is look at the bits in the BAR.  For
>> memory BARs, there's a "prefetchable" bit and a "64-bit" bit.
>>
>> If you just want to complain that the PCI spec didn't define a
>> way for software to discover whether a BAR can be mapped with WC,
>> that's fine, but it's misleading to suggest that pci_read_bases()
>> could figure out WC without some help from the spec.
>
> You're right sorry about that, in my original patch this was more
> of a question and I did not have a full answer for but mst had
> clarified before the spec doesn't allow for this [0] and you are
> confirming this now as well.
>
> [0] https://lkml.org/lkml/2015/4/21/714
>
> I'll update the patch and at least document we did think about
> this and that its a shortcoming of the spec.
>
>> > but where exactly
>> > video devices memory lie varies *largely* and at times things
>> > are mixed with MMIO registers, sometimes we can address
>> > the changes in drivers, other times the change requires
>> > intrusive changes.
>> >
>> > Although there is also arch_phys_wc_add() that makes use of
>> > architecture specific write-combining alternatives (MTRR on
>> > x86 when a system does not have PAT) we void polluting
>> > pci_iomap() space with it and force drivers and subsystems
>> > that want to use it to be explicit.
>>
>> I'm not quite sure I understand the point you're making here
>> about not polluting pci_iomap_wc() with arch_phys_wc_add().  I
>> think the choice is for a driver to do either this:
>>
>>   info->screen_base = pci_iomap_wc(dev, 0, 0);
>>
>> or this:
>>
>>   info->screen_base = pci_iomap_wc(dev, 0, 0);
>>   par->wc_cookie = arch_phys_wc_add(pci_resource_start(dev, 0),
>>                                   pci_resource_len(dev, 0));
>>
>> The driver is *already* being explicit because it calls
>> pci_iomap_wc() instead of pci_iomap().
>>
>> It seems like it would be ideal if ioremap_wc() could call
>> arch_phys_wc_add() internally.
>
> Indeed, that's what I was alluding to.
>
>> Doesn't any caller of
>> arch_phys_wc_add() have to also do some sort of ioremap()
>> beforehand?
>
> This is not a requirement as the physical address is used,
> not the virtual address.
>
>> I assume there's some reason for separating them,
>
> Well a full sweep to change to arch_phys_wc_add() was never done,
> consider this part of the last effort to do so. In retrospect now
> that I've covered all other drivers in 12 different series of patches
> I think its perhaps best to not mesh them together as we're phasing
> out MTRR and the only reason to have arch_phys_wc_add() is for MTRR
> which is legacy.

I would say it much more strongly.

Drivers for new hardware SHOULD NOT call arch_phys_wc_add, directly or
otherwise.  MTRRs are crap.  They have nasty alignment requirements,
they are a very limited and unpredictable resource, and the interact
poorly with BIOS.  They should really only be used for old video
framebuffers and such.

Anything new should use PAT (it's been available for a long time) and
possibly streaming memory writes.  Even fancy server gear (myri10ge,
for example) should stay far away from MTRRs and such: it's very easy
to put enough devices in a server board that you simply run out of
MTRRs and arch_phys_wc_add will stop working.

If we make ioremap_wc and similar start automatically adding MTRRs,
then performance will vary wildly with the order of driver loading,
because we'll run out of MTRRs part-way through bootup.

ioremap_wc via PAT, on the other hand, is 100% reliable on newer hardware.

Maybe I should have called it arch_phys_wc_add_awful_legacy_hack.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ