Message-ID: <CAHhAz+i+4iCn+Ddh1YvuMn1v-PfJj72m6DcjRaY+3vx7wLhFsQ@mail.gmail.com>
Date: Fri, 15 Nov 2024 21:47:59 +0530
From: Muni Sekhar <munisekharrms@...il.com>
To: kernel-hardening@...ts.openwall.com, 
	kasan-dev <kasan-dev@...glegroups.com>, LKML <linux-kernel@...r.kernel.org>, 
	kernelnewbies <kernelnewbies@...nelnewbies.org>
Subject: Help Needed: Debugging Memory Corruption results GPF

Hi all,

I am encountering a memory corruption issue in the function
msm_set_laddr() from the Slimbus MSM Controller driver source code.
https://android.googlesource.com/kernel/msm/+/refs/heads/android-msm-sunfish-4.14-android12/drivers/slimbus/slim-msm-ctrl.c

In msm_set_laddr(), one of the arguments is ea (enumeration address),
which is a pointer to constant data. While testing, I observed strange
behavior:

The contents of the ea buffer get corrupted during a timeout scenario
in the call to:

timeout = wait_for_completion_timeout(&done, HZ);

Specifically, the ea buffer's contents differ before and after the
wait_for_completion_timeout() call, even though ea is declared as a
pointer to constant data (const u8 *ea).

To debug this issue, I enabled KASAN, but it did not report any memory
corruption. After the buffer is corrupted, unrelated memory allocations
in other parts of the kernel occasionally crash with a general
protection fault (GPF).
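
As a side note on the const qualifier: in C, const u8 *ea only forbids
writes made through ea itself. It does not stop any other code that holds
a writable pointer to the same memory from changing it. A minimal
userspace sketch of that behavior follows; pack_ea_word(), other_writer()
and const_alias_demo() are hypothetical names used only for illustration
and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a reader such as msm_set_laddr(): it can only
 * read through `ea`; the compiler rejects writes made through this pointer. */
static uint32_t pack_ea_word(const uint8_t *ea)
{
    return (uint32_t)ea[3] | ((uint32_t)ea[2] << 8) |
           ((uint32_t)ea[1] << 16) | ((uint32_t)ea[0] << 24);
}

/* Hypothetical stand-in for any other code path that holds a writable
 * pointer to the same bytes. */
static void other_writer(uint8_t *alias)
{
    alias[0] = 0xFF;
}

/* Returns 1 if the data seen through the const pointer changed after the
 * writable alias was used, 0 otherwise. */
static int const_alias_demo(void)
{
    uint8_t storage[6] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06 };
    const uint8_t *ea = storage;      /* read-only view of the buffer */
    uint32_t before = pack_ea_word(ea);

    other_writer(storage);            /* write through the non-const alias */

    assert(before == 0x01020304u);
    return pack_ea_word(ea) != before;
}
```

So a changed ea buffer does not by itself contradict the const
declaration; it only shows that something else wrote to that memory.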

Here is the relevant part of the code:

static int msm_set_laddr(struct slim_controller *ctrl, const u8 *ea,
                         u8 elen, u8 laddr)
{
    struct msm_slim_ctrl *dev = slim_get_ctrldata(ctrl);
    struct completion done;
    int timeout, ret, retries = 0;
    u32 *buf;
retry_laddr:
    init_completion(&done);
    mutex_lock(&dev->tx_lock);
    buf = msm_get_msg_buf(dev, 9, &done);
    if (buf == NULL)
        return -ENOMEM;
    buf[0] = SLIM_MSG_ASM_FIRST_WORD(9, SLIM_MSG_MT_CORE,
                                     SLIM_MSG_MC_ASSIGN_LOGICAL_ADDRESS,
                                     SLIM_MSG_DEST_LOGICALADDR,
                                     ea[5] | ea[4] << 8);
    buf[1] = ea[3] | (ea[2] << 8) | (ea[1] << 16) | (ea[0] << 24);
    buf[2] = laddr;
    ret = msm_send_msg_buf(dev, buf, 9, MGR_TX_MSG);
    timeout = wait_for_completion_timeout(&done, HZ);
    if (!timeout)
        dev->err = -ETIMEDOUT;
    if (dev->err) {
        ret = dev->err;
        dev->err = 0;
    }
    mutex_unlock(&dev->tx_lock);
    if (ret) {
        pr_err("set LADDR:0x%x failed:ret:%d, retrying", laddr, ret);
        if (retries < INIT_MX_RETRIES) {
            msm_slim_wait_retry(dev);
            retries++;
            goto retry_laddr;
        } else {
            pr_err("set LADDR failed after retrying:ret:%d", ret);
        }
    }
    return ret;
}

What I've Tried:
- KASAN: Enabled it but couldn't identify the source of the corruption.
- Debugging Logs: Added logs to print the ea contents before and after
  the wait_for_completion_timeout() call. The logs show a mismatch in
  the data.
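
The before/after comparison described above can be sketched in plain C
roughly as follows. This is a userspace illustration, not the actual
debugging patch: struct ea_snapshot and both helpers are hypothetical
names, and EA_LEN = 6 is an assumption based on the ea[0]..ea[5] accesses
in msm_set_laddr() (the real length is passed in elen).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: six bytes, matching the ea[0]..ea[5] accesses in the driver. */
#define EA_LEN 6

/* Private copy of the buffer taken before the wait. */
struct ea_snapshot {
    uint8_t bytes[EA_LEN];
};

/* Copy the current contents of ea into the snapshot. */
static void ea_snapshot_take(struct ea_snapshot *snap, const uint8_t *ea)
{
    memcpy(snap->bytes, ea, EA_LEN);
}

/* Return the index of the first byte that differs from the snapshot,
 * or -1 if the buffer is unchanged. */
static int ea_snapshot_first_diff(const struct ea_snapshot *snap,
                                  const uint8_t *ea)
{
    for (size_t i = 0; i < EA_LEN; i++) {
        if (snap->bytes[i] != ea[i])
            return (int)i;
    }
    return -1;
}
```

Taking the snapshot just before the wait and checking it just after
reports which byte offset changed first, which is more specific than
comparing two printed dumps by eye.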

Questions:
1. How can I efficiently trace the source of the memory corruption in
   this scenario?
2. Could wait_for_completion_timeout() or a related function cause
   unintended side effects?
3. Are there additional tools or techniques (e.g., dynamic debugging or
   specific kernel config options) that can help identify this
   corruption?

Any insights or suggestions would be greatly appreciated!



-- 
Thanks,
Sekhar
