Open Source and information security mailing list archives
 
Message-ID: <ZP9MQdRYmlawNsbC@nvidia.com>
Date:   Mon, 11 Sep 2023 14:20:01 -0300
From:   Jason Gunthorpe <jgg@...dia.com>
To:     Lorenzo Pieralisi <lpieralisi@...nel.org>
Cc:     ankita@...dia.com, maz@...nel.org, oliver.upton@...ux.dev,
        catalin.marinas@....com, will@...nel.org, aniketa@...dia.com,
        cjia@...dia.com, kwankhede@...dia.com, targupta@...dia.com,
        vsethi@...dia.com, acurrid@...dia.com, apopple@...dia.com,
        jhubbard@...dia.com, danw@...dia.com,
        linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 2/2] KVM: arm64: allow the VM to select DEVICE_* and
 NORMAL_NC for IO memory

On Mon, Sep 11, 2023 at 04:57:51PM +0200, Lorenzo Pieralisi wrote:
> On Thu, Sep 07, 2023 at 11:14:59AM -0700, ankita@...dia.com wrote:
> > From: Ankit Agrawal <ankita@...dia.com>
> > 
> > Linux allows device drivers to map IO memory on a per-page basis using
> > "write combining" or WC. This is often done using
> > pgprot_writecombing(). The driver knows which pages can support WC
> 
> pgprot_writecombine() ?
> 
> > access and the proper programming model to generate this IO. Generally
> > the use case is to boost performance by using write combining to
> > generate larger PCIe MemWr TLPs.
> 
> First off, this changeset does not affect *only* Linux guests, obviously.

I think everyone understands that. It can be clarified.

> I understand that's the use case you are after but this change is
> targeting all VMs, it must be clear.
> 
> Then WC and mapping to PCI TLPs, either you describe that in details
> (NormalNC vs device-nGnRE and resulting SystemBus<->PCI transactions) or
> you don't describe it at all, as it stands I don't know how to use
> this information.

How about another paragraph:

 KVM prevents all VMs (including Linux) from accessing NORMAL_NC
 mappings, which is how Linux implements pgprot_writecombine(). This
 prevents using this performance optimization within VMs.

I don't think we need to go into detail about how it works, beyond
saying that it requires NORMAL_NC.

> > Allow VMs to select DEVICE_* or NORMAL_NC on a page by page basis for
> > all IO memory. This puts the VM in charge of the memory attributes,
> > and removes the KVM override to DEVICE_nGnRE.
> > 
> > Ultimately this makes pgprot_writecombing() work correctly in VMs and
> 
> pgprot_writecombine() ?
> 
> > allows drivers like mlx5 to fully operate their HW.
> > 
> > After some discussions with ARM and CPU architects we reached the
> > conclusion there was no need for KVM to prevent the VM from selecting
> > between DEVICE_* and NORMAL_NC for IO memory in VMs. There was a fear
> > that NORMAL_NC could result in uncontained failures, but upon deeper
> > analysis it turns out there are already possible cases for uncontained
> > failures with DEVICE types too. Ultimately the platform must be
> > implemented in a way that ensures that all DEVICE_* and NORMAL_NC
> > accesses have no uncontained failures.
> 
> I would reorder/rephrase this changelog as follows:
> 
> - Describe what the problem is (ie KVM default s2 mappings)

The problem is that pgprot_writecombine() doesn't work in Linux
VMs. That is the first paragraph.

> - Describe how you are solving it

That is the middle paragraph "Allow VMs to select DEVICE_* or
NORMAL_NC on a page by page basis"

> - Add a link to the documentation that states why it is safe to do
>   that and the resulting s1/s2 mappings combination

AFAIK there is no documentation beyond the combining rules. Exactly
what should happen in various error conditions is implementation
defined. Catalin, did you ever find anything?
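
As a rough illustration of the combining rules mentioned above, here is
a minimal sketch in plain C. It is not kernel code: the enum, its
ordering and combine_s1_s2() are hypothetical names made up for this
example, and it assumes no FEAT_S2FWB and ignores shareability and the
inner/outer cacheability split. Under those assumptions the resulting
type is simply the more restrictive of the stage 1 and stage 2 types.

```c
#include <assert.h>

/*
 * Hypothetical model of arm64 memory types, ordered from most
 * restrictive to least restrictive. Illustration only.
 */
enum mem_type {
	DEVICE_nGnRnE = 0,
	DEVICE_nGnRE,
	DEVICE_nGRE,
	DEVICE_GRE,
	NORMAL_NC,
	NORMAL_WT,
	NORMAL_WB,
};

/*
 * Combine the guest's stage 1 type with the hypervisor's stage 2 type.
 * Assumption: without FEAT_S2FWB the more restrictive type wins, which
 * in this ordering is the smaller enum value.
 */
static enum mem_type combine_s1_s2(enum mem_type s1, enum mem_type s2)
{
	assert(s1 <= NORMAL_WB && s2 <= NORMAL_WB);
	return s1 < s2 ? s1 : s2;
}
```

With stage 2 forced to DEVICE_nGnRE, combine_s1_s2(NORMAL_NC,
DEVICE_nGnRE) gives DEVICE_nGnRE, so the guest never gets write
combining. Assuming stage 2 is NORMAL_NC instead, a guest NORMAL_NC
mapping stays NORMAL_NC and any guest DEVICE_* mapping stays DEVICE_*.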

> It must be clear why from a legacy standpoint this is a safe change
> to apply.

This is why:
 
> > Fortunately real platforms do tend to implement this.

It is why it is safe today: real platforms don't throw uncontained
errors from the typical PCI accesses that VFIO allows. I think the
conclusion was that this is just because they don't raise errors at
all, not because DEVICE_* prevents it.

> Remove this sentence, it adds no information for someone who
> is chasing bugs or just wants to understand the change itself.

So, if you hit a bug here you might evaluate whether there is something
wrong with your platform, i.e. whether it is allowing uncontained errors
in unexpected places.

Jason
