Message-ID: <juiog3337iozva23zpf4apdydegj4z7jibqykfvcgnkabemw4w@z5g5hhwrqr2w>
Date: Wed, 9 Jul 2025 07:23:44 -0700
From: Breno Leitao <leitao@...ian.org>
To: Mark Rutland <mark.rutland@....com>, ankita@...dia.com, 
	bwicaksono@...dia.com
Cc: rmk+kernel@...linux.org.uk, catalin.marinas@....com, 
	linux-serial@...r.kernel.org, rmikey@...a.com, linux-arm-kernel@...ts.infradead.org, 
	usamaarif642@...il.com, leo.yan@....com, linux-kernel@...r.kernel.org, 
	paulmck@...nel.org
Subject: Re: arm64: csdlock at early boot due to slow serial (?)

On Tue, Jul 08, 2025 at 07:00:45AM -0700, Breno Leitao wrote:
> On Thu, Jul 03, 2025 at 05:31:09PM +0100, Mark Rutland wrote:
> 
> Here is more information I got about this problem. TL;DR: While the
> machine is booting, it is throttled by the UART speed, while having IRQ
> disabled.

Quick update: I've identified a solution that significantly improves the
situation. The serial issue was heavily affecting boot time, and that
bottleneck is now gone.

After applying the following fix, the boot speed has improved
dramatically. It's the fastest I've seen, and the CSD lockups are gone.

If no concerns are raised in the next few days, I will send it officially
to the serial maintainers.

Author: Breno Leitao <leitao@...ian.org>
Date:   Wed Jul 9 05:57:06 2025 -0700

    serial: amba-pl011: Fix boot performance by switching to console_initcall()

    Replace arch_initcall() with console_initcall() for PL011 driver initialization
    to resolve severe boot performance issues.

    The current arch_initcall() registration causes the console to initialize
    before the printk subsystem is ready, forcing the driver into atomic mode
    during early boot. This results in:

    - 5-8 second boot delay while ~700 boot messages are processed
    - System freeze with IRQs disabled during message output
    - Each character transmitted synchronously with cpu_relax() polling
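
    As a rough sanity check on the delay above, the time a UART needs to
    drain the boot log can be estimated from the line rate. This is only a
    back-of-the-envelope sketch: the helper name uart_drain_seconds(), the
    10 bits per character (8N1 framing), and the sample values of 80
    characters per message and 115200 baud are assumptions for
    illustration, not measurements from this machine.

```c
#include <assert.h>

/*
 * Seconds needed to push `messages` lines of `chars_per_msg` characters
 * through a UART running at `baud` bits per second.
 * Assumes 10 bits on the wire per character (1 start + 8 data + 1 stop).
 * Hypothetical helper for estimation only; not part of the driver.
 */
static double uart_drain_seconds(unsigned messages, unsigned chars_per_msg,
                                 unsigned baud)
{
    const double bits_per_char = 10.0;   /* assumed 8N1 framing */
    double chars_per_second;

    assert(baud > 0);
    chars_per_second = baud / bits_per_char;
    return (double)messages * chars_per_msg / chars_per_second;
}
```

    With those assumed inputs, uart_drain_seconds(700, 80, 115200) comes
    to roughly 4.9 seconds, which is the same order of magnitude as the
    delay listed above.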

    This is what pushes the driver into atomic mode during early boot
    (output is only offloaded once printk_kthreads_running is set):

      static inline void printk_get_console_flush_type(struct console_flush_type *ft)
      {
            ....
            if (printk_kthreads_running)
                    ft->nbcon_offload = true;

    The atomic path processes each character individually through
    pl011_console_putchar(), waiting for UART transmission completion
    before proceeding. With only one CPU online during early boot,
    this creates a bottleneck where the system spends excessive time
    in interrupt-disabled state.

    Here is what the code does:

      1) disable interrupts
      2) for each of these 700 messages, call pl011_console_write_atomic()
      3) for each character in the message, call pl011_console_putchar(),
         which waits for the character to be transmitted
      4) once the whole line is transmitted, wait for the UART to be idle
      5) re-enable interrupts

    Here is the code representation of the above:

            pl011_console_write_atomic() {
                    ...
                    // For each char in the message
                    pl011_console_putchar() {
                            while (pl011_read(uap, REG_FR) & UART01x_FR_TXFF)
                                    cpu_relax();
                    }
                    while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
                            cpu_relax();
            }
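
    To make the cost of this polling pattern concrete, here is a small
    user-space model of the same loop shape. It is a sketch only: struct
    fake_uart, its fields, and the fake_* functions are hypothetical names
    invented for illustration, and the FIFO depth and drain rate are
    made-up parameters rather than PL011 values. It counts how many
    status-register reads the CPU burns while waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a UART with a TX FIFO; not the PL011 driver. */
struct fake_uart {
    unsigned fifo_depth;        /* capacity of the TX FIFO (assumed) */
    unsigned fifo_level;        /* characters currently queued */
    unsigned polls_per_char;    /* status reads needed to drain one char */
    unsigned poll_count;        /* status reads since the last drain */
    unsigned long total_polls;  /* every status read, i.e. every spin */
    unsigned long sent;         /* characters that have left the FIFO */
};

/* Model one read of the flag register: time advances, FIFO may drain. */
static void fake_uart_tick(struct fake_uart *u)
{
    u->total_polls++;
    if (u->fifo_level > 0 && ++u->poll_count >= u->polls_per_char) {
        u->poll_count = 0;
        u->fifo_level--;
        u->sent++;
    }
}

/* Stand-in for testing the "TX FIFO full" flag. */
static bool fake_uart_tx_full(struct fake_uart *u)
{
    fake_uart_tick(u);
    return u->fifo_level == u->fifo_depth;
}

/* Stand-in for testing the "UART busy" flag. */
static bool fake_uart_busy(struct fake_uart *u)
{
    fake_uart_tick(u);
    return u->fifo_level > 0;
}

/* Same shape as the loop above: spin while full, queue, then spin until idle. */
static void fake_console_write_atomic(struct fake_uart *u, const char *s,
                                      size_t len)
{
    (void)s;  /* the model only counts characters, it does not store them */
    assert(u->fifo_depth > 0 && u->polls_per_char > 0);

    for (size_t i = 0; i < len; i++) {
        while (fake_uart_tx_full(u))
            ;                   /* stands in for cpu_relax() */
        u->fifo_level++;        /* character queued */
    }
    while (fake_uart_busy(u))
        ;                       /* wait for the line to drain completely */
}
```

    In this model every transmitted character costs at least
    polls_per_char status reads, no matter how deep the FIFO is, so the
    total spin count grows linearly with the amount of text printed while
    interrupts would be disabled.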

    Using console_initcall() ensures proper initialization order,
    allowing the printk subsystem to use threaded output instead
    of atomic mode, eliminating the performance bottleneck.

    Performance improvement: 16x faster kernel boot time on my GRACE SoC
    machine.

      - Before: 10.08s to reach init process
      - After: 0.62s to reach init process
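
    The 16x figure follows directly from the two timings above. A minimal
    check, where speedup() is a hypothetical helper written only for this
    illustration:

```c
#include <assert.h>

/* Ratio of two durations measured in the same unit (here, seconds). */
static double speedup(double before_s, double after_s)
{
    assert(after_s > 0.0);
    return before_s / after_s;
}
```

    speedup(10.08, 0.62) is about 16.26, which rounds to the 16x quoted
    above.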

    Here are more timing details, collected from Linus' upstream, where the
    only difference is this patch:

    Linus upstream:
      [    0.305109] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA
      [   10.081742] Run /init as init process
      [   13.288717] loop: module loaded
      [   22.919934] Adding 134199168k swap on /swapvol/swapfile.

    With this patch:
      [    0.616203] printk: legacy console [netcon_ext0] enabled
      [    0.627469] Run /init as init process
      [    0.837477] loop: module loaded
      [    8.354803] Adding 134199360k swap on /swapvol/swapfile.

    Link: https://lore.kernel.org/all/aGVn%2FSnOvwWewkOW@gmail.com/ [1]

    Signed-off-by: Breno Leitao <leitao@...ian.org>

diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index 22939841b1de..0cf251365825 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -3116,7 +3116,7 @@ static void __exit pl011_exit(void)
  * While this can be a module, if builtin it's most likely the console
  * So let's leave module_exit but move module_init to an earlier place
  */
-arch_initcall(pl011_init);
+console_initcall(pl011_init);
 module_exit(pl011_exit);

 MODULE_AUTHOR("ARM Ltd/Deep Blue Solutions Ltd");





