linux-kernel - Re: Probing nvme disks fails on Upstream kernels on powerpc Maxconfig


Message-ID: <2a80cb20-0c9f-2d0c-e951-c4f005f3e4b3@ozlabs.ru>
Date:   Thu, 13 Apr 2023 22:09:22 +1000
From:   Alexey Kardashevskiy <aik@...abs.ru>
To:     Michael Ellerman <mpe@...erman.id.au>,
        "Linux regression tracking (Thorsten Leemhuis)" 
        <regressions@...mhuis.info>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Cc:     Nicholas Piggin <npiggin@...il.com>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org,
        sachinp@...ux.vnet.ibm.com,
        Linux kernel regressions list <regressions@...ts.linux.dev>
Subject: Re: Probing nvme disks fails on Upstream kernels on powerpc Maxconfig



On 05/04/2023 15:45, Michael Ellerman wrote:
> "Linux regression tracking (Thorsten Leemhuis)" <regressions@...mhuis.info> writes:
>> [CCing the regression list, as it should be in the loop for regressions:
>> https://docs.kernel.org/admin-guide/reporting-regressions.html]
>>
>> On 23.03.23 10:53, Srikar Dronamraju wrote:
>>>
>>> I am unable to boot upstream kernels from v5.16 to the latest upstream
>>> kernel on a maxconfig system. (Machine config details given below)
>>>
>>> At boot, we see a series of messages like the below.
>>>
>>> dracut-initqueue[13917]: Warning: dracut-initqueue: timeout, still waiting for following initqueue hooks:
>>> dracut-initqueue[13917]: Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f93dc0767-18aa-467f-afa7-5b4e9c13108a.sh: "if ! grep -q After=remote-fs-pre.target /run/systemd/generator/systemd-cryptsetup@...ervice 2>/dev/null; then
>>> dracut-initqueue[13917]:     [ -e "/dev/disk/by-uuid/93dc0767-18aa-467f-afa7-5b4e9c13108a" ]
>>> dracut-initqueue[13917]: fi"
>>
>> Alexey, did you look into this? This is apparently caused by a commit of
>> yours (see quoted part below) that Michael applied. Looks like it fell
>> through the cracks from here, but maybe I'm missing something.
> 
> Unfortunately Alexey is not working at IBM any more, so he won't have
> access to any hardware to debug/test this.
> 
> Srikar are you debugging this? If not we'll have to find someone else to
> look at it.

Has this been fixed, and did I miss a cc:? Anyway, even without the full
log I can see that this is a huge guest, so chances are the guest could
not map all of its RAM and instead uses the biggest possible DDW with 2M
pages. If that's the case, this might help:

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 614af78b3695..996acf245ae5 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -906,7 +906,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
         unsigned int nio_pages, io_order;
         struct page *page;

-       size = PAGE_ALIGN(size);
+       size = _ALIGN(size, IOMMU_PAGE_SIZE(tbl));
         order = get_order(size);

         /*
@@ -949,10 +949,9 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size,
         if (tbl) {
                 unsigned int nio_pages;

-               size = PAGE_ALIGN(size);
+               size = _ALIGN(size, IOMMU_PAGE_SIZE(tbl));
                 nio_pages = size >> tbl->it_page_shift;
                 iommu_free(tbl, dma_handle, nio_pages);
-               size = PAGE_ALIGN(size);
                 free_pages((unsigned long)vaddr, get_order(size));
         }
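
To illustrate why the alignment matters, here is a rough user-space sketch of the arithmetic, not kernel code. The helper names (align_up, nio_pages_for) are hypothetical, and the sizes are assumptions: a 64K kernel PAGE_SIZE and a 2M IOMMU page (it_page_shift of 21). Under those assumptions, a small allocation aligned only to PAGE_SIZE shifts down to zero IOMMU pages, while aligning to the IOMMU page size gives one.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to a multiple of align; align must be a power of two. */
static size_t align_up(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/*
 * Number of IOMMU pages for a request of `size` bytes after rounding it
 * up to `align_to` bytes, mirroring "size >> tbl->it_page_shift".
 */
static size_t nio_pages_for(size_t size, size_t align_to,
			    unsigned int it_page_shift)
{
	return align_up(size, align_to) >> it_page_shift;
}

/* Assumed values: 64K PAGE_SIZE, 2M IOMMU pages (shift 21). */
void example(void)
{
	const size_t page_size = (size_t)1 << 16;
	const unsigned int it_page_shift = 21;
	const size_t iommu_page_size = (size_t)1 << it_page_shift;

	/* Aligning a 4K request to PAGE_SIZE yields zero IOMMU pages. */
	assert(nio_pages_for(4096, page_size, it_page_shift) == 0);

	/* Aligning to the IOMMU page size yields one IOMMU page. */
	assert(nio_pages_for(4096, iommu_page_size, it_page_shift) == 1);
}
```

If the guest really does end up with 2M IOMMU pages, a zero page count like this would be consistent with the mapping failing, though I have not confirmed it on hardware.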


And there may be other places where PAGE_SIZE is used instead of 
IOMMU_PAGE_SIZE(tbl). Thanks,


-- 
Alexey