Message-ID: <21407a96-5b20-3fae-f1c8-895973b655ef@ozlabs.ru>
Date:   Fri, 9 Apr 2021 14:36:16 +1000
From:   Alexey Kardashevskiy <aik@...abs.ru>
To:     Michael Ellerman <mpe@...erman.id.au>,
        Leonardo Bras <leobras.c@...il.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Paul Mackerras <paulus@...ba.org>, brking@...ux.vnet.ibm.com
Cc:     linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] powerpc/iommu: Enable remaining IOMMU Pagesizes
 present in LoPAR



On 08/04/2021 19:04, Michael Ellerman wrote:
> Alexey Kardashevskiy <aik@...abs.ru> writes:
>> On 08/04/2021 15:37, Michael Ellerman wrote:
>>> Leonardo Bras <leobras.c@...il.com> writes:
>>>> According to LoPAR, the ibm,query-pe-dma-window output named "IO Page
>>>> Sizes" will let the OS know all possible page sizes that can be used
>>>> for creating a new DDW.
>>>>
>>>> Currently Linux will only try using 3 of the 8 available options:
>>>> 4K, 64K and 16M. According to LoPAR, the hypervisor may also offer
>>>> 32M, 64M, 128M, 256M and 16G.
>>>
>>> Do we know of any hardware & hypervisor combination that will actually
>>> give us bigger pages?
>>
>>
>> On P8, 16MB host pages and 16MB hardware IOMMU pages worked.
>>
>> On P9, a VM's 16MB IOMMU pages worked on top of 2MB host pages + 2MB
>> hardware IOMMU pages.
> 
> The current code already tries 16MB though.
> 
> I'm wondering if we're going to ask for larger sizes that have never
> been tested and possibly expose bugs. But it sounds like this is mainly
> targeted at future platforms.


For fun, I tried passing a PCI device through to a guest with this patch applied, launched as:

pbuild/qemu-killslof-aiku1904le-ppc64/qemu-system-ppc64 \
-nodefaults \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline \
-nographic \
-vga none \
-enable-kvm \
-m 16G \
-kernel ./vmldbg \
-initrd /home/aik/t/le.cpio \
-device vfio-pci,id=vfio0001_01_00_0,host=0001:01:00.0 \
-mem-prealloc \
-mem-path qemu_hp_1G_node0 \
-global spapr-pci-host-bridge.pgsz=0xffffff000 \
-machine cap-cfpc=broken,cap-ccf-assist=off \
-smp 1,threads=1 \
-L /home/aik/t/qemu-ppc64-bios/ \
-trace events=qemu_trace_events \
-d guest_errors,mmu \
-chardev socket,id=SOCKET0,server=on,wait=off,path=qemu.mon.1_1_0_0 \
-mon chardev=SOCKET0,mode=control
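
The pgsz property here is, as far as I remember, a bitmask with one bit per supported IOMMU page shift, so 0xffffff000 enables shifts 12 through 35 (4K up to 32G). A standalone sketch of decoding it (not QEMU code, names are mine):

/* Decode a spapr-pci-host-bridge pgsz mask into page shifts.
 * Standalone illustration; assumes a 64-bit unsigned long.
 */
#include <stdio.h>

int main(void)
{
	unsigned long pgsz = 0xffffff000UL;	/* bits 12..35 set */
	int shift;

	/* each set bit N allows IOMMU pages of size 2^N */
	for (shift = 0; shift < 64; shift++)
		if (pgsz & (1UL << shift))
			printf("allowed page shift: %d\n", shift);
	return 0;
}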


The guest created a huge window:

xhci_hcd 0000:00:00.0: ibm,create-pe-dma-window(2027) 0 8000000 20000000 22 22 returned 0 (liobn = 0x80000001 starting addr = 8000000 0)

The first "22" is page_shift in hex (16GB), the second "22" is 
window_shift (so we have 1 TCE).
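
Spelled out (a standalone sketch of that arithmetic, not code from the patch):

/* page_shift and window_shift from the RTAS call above, spelled out. */
#include <stdio.h>

int main(void)
{
	unsigned int page_shift = 0x22;		/* hex 22 = 34 -> 16GB pages */
	unsigned int window_shift = 0x22;	/* window is 2^34 bytes = 16GB */

	/* number of TCEs = window size / IOMMU page size */
	unsigned long long ntces = 1ULL << (window_shift - page_shift);

	printf("page size = %llu bytes, TCEs = %llu\n",
	       1ULL << page_shift, ntces);	/* 17179869184 bytes, 1 TCE */
	return 0;
}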

On the host side, window#1 was created with 1GB pages:
pci 0001:01     : [PE# fd] Setting up window#1 800000000000000..80007ffffffffff pg=40000000


The XHCI device seems to work. Without the patch, 16MB was the maximum page size.


> 
>>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>>>> index 9fc5217f0c8e..6cda1c92597d 100644
>>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>>> @@ -53,6 +53,20 @@ enum {
>>>>    	DDW_EXT_QUERY_OUT_SIZE = 2
>>>>    };
>>>
>>> A comment saying where the values come from would be good.
>>>
>>>> +#define QUERY_DDW_PGSIZE_4K	0x01
>>>> +#define QUERY_DDW_PGSIZE_64K	0x02
>>>> +#define QUERY_DDW_PGSIZE_16M	0x04
>>>> +#define QUERY_DDW_PGSIZE_32M	0x08
>>>> +#define QUERY_DDW_PGSIZE_64M	0x10
>>>> +#define QUERY_DDW_PGSIZE_128M	0x20
>>>> +#define QUERY_DDW_PGSIZE_256M	0x40
>>>> +#define QUERY_DDW_PGSIZE_16G	0x80
>>>
>>> I'm not sure the #defines really gain us much vs just putting the
>>> literal values in the array below?
>>
>> Then someone says "uuuuu magic values" :) I do not mind either way. Thanks,
> 
> Yeah that's true. But #defining them doesn't make them less magic, if
> you only use them in one place :)

Defining them with "QUERY_DDW" in the names kinda tells where they are 
from. Can also grep QEMU using these to see how the other side handles 
it. Dunno.
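
FWIW, here is a sketch of what the array could look like; the struct and
function names below are made up, not necessarily what the patch does:

/* Sketch: map "IO Page Sizes" query bits to IOMMU page shifts,
 * largest first. Illustrative only.
 */
#include <linux/types.h>	/* u32 */
#include <linux/kernel.h>	/* ARRAY_SIZE */
#include <linux/sizes.h>	/* SZ_* */

struct iommu_ddw_pagesize {
	u32 mask;	/* bit in the "IO Page Sizes" query output */
	int shift;	/* corresponding IOMMU page shift */
};

static const struct iommu_ddw_pagesize iommu_ddw_pagesizes[] = {
	{ QUERY_DDW_PGSIZE_16G,  __builtin_ctzl(SZ_16G)  },
	{ QUERY_DDW_PGSIZE_256M, __builtin_ctzl(SZ_256M) },
	{ QUERY_DDW_PGSIZE_128M, __builtin_ctzl(SZ_128M) },
	{ QUERY_DDW_PGSIZE_64M,  __builtin_ctzl(SZ_64M)  },
	{ QUERY_DDW_PGSIZE_32M,  __builtin_ctzl(SZ_32M)  },
	{ QUERY_DDW_PGSIZE_16M,  __builtin_ctzl(SZ_16M)  },
	{ QUERY_DDW_PGSIZE_64K,  __builtin_ctzl(SZ_64K)  },
	{ QUERY_DDW_PGSIZE_4K,   __builtin_ctzl(SZ_4K)   },
};

/* Return the largest page shift the hypervisor offers, or -1 */
static int ddw_best_page_shift(u32 query_page_size)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(iommu_ddw_pagesizes); i++)
		if (query_page_size & iommu_ddw_pagesizes[i].mask)
			return iommu_ddw_pagesizes[i].shift;

	return -1;
}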

btw the bot complained about __builtin_ctz(SZ_16G), which should be
__builtin_ctzl(SZ_16G), so we have to ask Leonardo to repost anyway :)
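
To spell out why the bot is right (a standalone illustration, not patch code):

/* __builtin_ctz() takes an unsigned int, so the 64-bit SZ_16G value
 * truncates to 0, and ctz(0) is undefined behaviour. __builtin_ctzl()
 * takes an unsigned long and gives the expected shift.
 */
#include <stdio.h>

#define SZ_16G 0x400000000UL	/* 16GB, needs more than 32 bits */

int main(void)
{
	printf("truncated to u32: 0x%x\n", (unsigned int)SZ_16G);	/* 0x0 */
	printf("__builtin_ctzl:   %d\n", __builtin_ctzl(SZ_16G));	/* 34 */
	return 0;
}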



-- 
Alexey
