Date:   Fri, 19 Nov 2021 15:43:58 +0100
From:   Paul Menzel <pmenzel@...gen.mpg.de>
To:     Krzysztof Wilczyński <kw@...ux.com>
Cc:     Jörg Rödel <joro@...tes.org>,
        Suravee Suthikulpanit <suravee.suthikulpanit@....com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        iommu@...ts.linux-foundation.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, LKML <linux-kernel@...r.kernel.org>,
        linux-pci@...r.kernel.org
Subject: Re: How to reduce PCI initialization from 5 s (1.5 s adding them to
 IOMMU groups)

Dear Krzysztof,


On 10.11.21 at 00:10, Krzysztof Wilczyński wrote:

> [...]
>>> I am curious - why is this a problem?  Are you power-cycling your servers
>>> so often that the cumulative time spent in enumerating PCI devices and
>>> adding them later to IOMMU groups becomes a problem?
>>>
>>> I am simply wondering why you decided to single out the PCI enumeration as
>>> slow in particular, especially given that large server hardware tends to
>>> have (most of the time, as per my experience) rather long initialisation
>>> times either from being powered off or after being power cycled.  It can
>>> take a while before the actual operating system itself will start.
>>
>> It’s not a problem per se, but more a pet peeve of mine. Systems get faster
>> and faster, and boot time gets slower and slower. On desktop systems, it
>> matters much more, with firmware like coreboot taking less than one second
>> to initialize the hardware and pass control to the payload/operating system.
>> If we are lucky, we are going to have servers with FLOSS firmware.
>>
>> But even now, using kexec to reboot a system avoids the problems you
>> pointed out on servers, and being able to reboot a system as quickly as
>> possible lowers the bar for people to reboot systems more often, for
>> example so that updates take effect.
> 
> A very good point about the kexec usage.
> 
> This is definitely often invaluable to get security updates out of the door
> quickly, to update the kernel version, or to switch operating systems
> quickly (a trick that companies like Equinix Metal use when offering their
> bare metal as a service).
> 
>>> We talked about this briefly with Bjorn, and there might be an option to
>>> add some caching, as we suspect that the culprit here is doing PCI
>>> configuration space reads for each device, which can be slow on some
>>> platforms.
>>>
>>> However, we would need to profile this to get some quantitative data to see
>>> whether doing anything would even be worthwhile.  It would definitely help
>>> us understand better where the bottlenecks really are and of what magnitude.
>>>
>>> I personally don't have access to such large hardware as the one you
>>> have access to, thus I was wondering whether you would have some time, and
>>> be willing, to profile this for us on the hardware you have.
>>>
>>> Let me know what you think.
>>
>> Sounds good. I’d be willing to help. Note that I won’t have time before
>> Wednesday next week, though.
> 
> Not a problem!  I am very grateful you are willing to devote some of your
> time to help with this.
> 
> I only have access to a few systems such as some commodity hardware like
> a desktop PC and notebooks, and some assorted SoCs.  These are sadly not
> even close to proper server platforms, and trying to measure anything on
> these does not really yield any useful data as the delays related to PCI
> enumeration on startup are quite insignificant in comparison - there is
> just not enough hardware there, so to speak.
> 
> I am really looking forward to the data you can gather for us and what
> insight it might provide us with.

So, kexec seems to work aside from some DMAR-IR warnings [1]. 
`initcall_debug` increases the Linux boot time by over 50 %, from 
7.7 s to 12 s, which I didn’t expect.

Here are the functions taking more than 200 ms:

     initcall pci_apply_final_quirks+0x0/0x132 returned 0 after 228433 usecs
     initcall raid6_select_algo+0x0/0x2d6 returned 0 after 383789 usecs
     initcall pcibios_assign_resources+0x0/0xc0 returned 0 after 610757 usecs
     initcall _mpt3sas_init+0x0/0x1c0 returned 0 after 721257 usecs
     initcall ahci_pci_driver_init+0x0/0x1a returned 0 after 945094 usecs
     initcall pci_iommu_init+0x0/0x3f returned 0 after 1487134 usecs
     initcall acpi_init+0x0/0x349 returned 0 after 7291015 usecs

Some of them are run later during boot though, but `acpi_init` sticks out at 7.3 s.
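
In case it is useful, here is a rough sketch of how the list above can 
be pulled out of an `initcall_debug` dmesg log. The log file name and 
the 200 ms threshold are just placeholders I picked to match the list 
above, not part of any existing tooling:

     #!/usr/bin/env python3
     # Rough sketch: list initcalls slower than a threshold from a dmesg
     # log captured with initcall_debug. The default file name and the
     # threshold are placeholders; adjust as needed.
     import re
     import sys

     THRESHOLD_USECS = 200_000  # 200 ms

     # Matches lines like:
     #   initcall acpi_init+0x0/0x349 returned 0 after 7291015 usecs
     PATTERN = re.compile(r"initcall (\S+) returned (-?\d+) after (\d+) usecs")

     def main(path):
         slow = []
         with open(path) as log:
             for line in log:
                 match = PATTERN.search(line)
                 if match and int(match.group(3)) > THRESHOLD_USECS:
                     slow.append((int(match.group(3)), match.group(1)))
         # Print the slow initcalls, slowest last, in seconds.
         for usecs, name in sorted(slow):
             print(f"{usecs / 1_000_000:8.3f} s  {name}")

     if __name__ == "__main__":
         main(sys.argv[1] if len(sys.argv) > 1 else "dmesg.txt")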


Kind regards,

Paul


[1]: 
https://lore.kernel.org/linux-iommu/40a7581d-985b-f12b-0bb2-99c586a9f968@molgen.mpg.de/T/#u
[Attachment: "furoncles-linux-5.10.70-dmesg-initcall_debug-kexec-2.txt" (text/plain, 272428 bytes)]
