Message-ID: <CAHhAz+iHsEaEkhEFNPyiiR-N-eLYYa3dFArO3rLvGdGKnWbm2w@mail.gmail.com>
Date: Sun, 20 Oct 2024 00:39:47 +0530
From: Muni Sekhar <munisekharrms@...il.com>
To: kernelnewbies <kernelnewbies@...nelnewbies.org>, 
	kernel-hardening-sc.1597159196.oakfigcenbmaokmiekdo-munisekharrms=gmail.com@...ts.openwall.com, 
	LKML <linux-kernel@...r.kernel.org>
Subject: Assistance Needed for Kernel mode driver Soft Lockup Issue

Dear Linux Kernel Developers,

I am encountering a soft lockup issue in my system related to the
continuous while loop in the empty_rx_fifo() function. Below is the
relevant code:


#include <linux/io.h> // For readw()

#define FIFO_STATUS 0x0014
#define FIFO_MAN_READ 0x0015
#define RX_FIFO_EMPTY 0x01 // Assuming RX_FIFO_EMPTY is defined as 0x01

static inline uint16_t read16_shifted(void __iomem *addr, u32 offset)
{
    // Left shift the offset by 1 and add to the base address
    void __iomem *target_addr = addr + (offset << 1);
    // Read the 16-bit value from the calculated address
    uint16_t value = readw(target_addr);
    return value;
}

void empty_rx_fifo(void __iomem *addr)
{
    while (!(read16_shifted(addr, FIFO_STATUS) & RX_FIFO_EMPTY)) {
        // Keep reading from the FIFO until it's empty
        read16_shifted(addr, FIFO_MAN_READ);
    }
}

Explanation:
read16_shifted() reads a 16-bit value from a shifted offset: it shifts
the offset left by 1 (offset << 1), adds the result to the base
address, and reads the value at that address with readw().
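
To make the address arithmetic concrete, here is a small userspace
sketch of the offset calculation only. It assumes the register numbers
are 16-bit word indices (which would explain the shift); the helper
name shifted_byte_offset() is hypothetical and not part of the driver.

```c
#include <stdint.h>

#define FIFO_STATUS   0x0014
#define FIFO_MAN_READ 0x0015

/*
 * Hypothetical helper: the byte offset from the base address that
 * read16_shifted() ends up reading for a given register number.
 */
static inline uint32_t shifted_byte_offset(uint32_t offset)
{
    return offset << 1;
}
```

Under that assumption, FIFO_STATUS (0x0014) is read at base + 0x28 and
FIFO_MAN_READ (0x0015) at base + 0x2A.
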
The empty_rx_fifo function is designed to clear out the RX FIFO, but
I've encountered soft lockup issues. Specifically, the system logs
repeated soft lockup messages in the kernel log, with a time gap of
roughly 28 seconds between them (as per the kernel log timestamps).
Here's an example log:

watchdog: BUG: soft lockup - CPU#0 stuck for 23s!

In all cases, the RIP points to:
RIP: 0010:read16_shifted+0x11/0x20


Analysis:
The soft lockup seems to be caused by the continuous while loop in the
empty_rx_fifo() function. The RX FIFO takes a considerable amount of
time to empty, sometimes up to 1000 seconds. As a result, once the
first soft lockup trace appears, the message repeats approximately
every 28 seconds for the entire 1000-second duration. After that, the
system resumes normal operation.
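
One possible direction, shown only as a userspace sketch and not as a
tested fix, is to bound the work done per call so the caller gets a
chance to yield between batches. The struct mock_fifo and the mock_*
helpers below are hypothetical stand-ins for the device registers. In
the real driver, and assuming empty_rx_fifo() runs in process context,
the caller could call cond_resched() between batches or move the drain
into a workqueue.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_FIFO_EMPTY 0x01

/* Hypothetical stand-in for the device: words still queued in the RX FIFO. */
struct mock_fifo {
    size_t words_left;
};

/* Mimics reading FIFO_STATUS: sets RX_FIFO_EMPTY once nothing is queued. */
static uint16_t mock_read_status(const struct mock_fifo *f)
{
    return f->words_left == 0 ? RX_FIFO_EMPTY : 0;
}

/* Mimics reading FIFO_MAN_READ: consumes one word from the FIFO. */
static uint16_t mock_read_data(struct mock_fifo *f)
{
    if (f->words_left > 0)
        f->words_left--;
    return 0;
}

/*
 * Drain at most `budget` words, then return so the caller can yield.
 * Returns true once the FIFO reports empty, false if the budget ran out.
 */
static bool drain_rx_fifo_bounded(struct mock_fifo *f, size_t budget)
{
    while (!(mock_read_status(f) & RX_FIFO_EMPTY)) {
        if (budget == 0)
            return false;
        (void)mock_read_data(f);
        budget--;
    }
    return true;
}
```

A caller would invoke drain_rx_fifo_bounded() repeatedly, yielding
after each call that returns false, and could also enforce an overall
timeout so that a FIFO that never empties is reported as an error.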

Questions:
1. How should I best handle this kind of issue? Even if the hardware
takes time, I would like advice on the best approach to prevent these
lockups.
2. Do soft lockup issues auto-recover like this? Is this something I
should consider serious, or can it be ignored?

I would appreciate any guidance on how to resolve or mitigate this problem.


-- 
Thanks,
Sekhar
