lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240318100108.4fafd72b@namcao>
Date: Mon, 18 Mar 2024 10:01:08 +0100
From: Nam Cao <namcao@...utronix.de>
To: Eva Kurchatova <nyandarknessgirl@...il.com>
Cc: linux-riscv <linux-riscv@...ts.infradead.org>, bugs@...ts.linux.dev,
 linux-i2c@...r.kernel.org, jikos@...nel.org, benjamin.tissoires@...hat.com,
 dianders@...omium.org, mripard@...nel.org, johan+linaro@...nel.org,
 linux-input@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Boot hang with SiFive PLIC when routing I2C-HID level-triggered
 interrupts

On 18/Mar/2024 Eva Kurchatova wrote:
> On Sun, Mar 17, 2024 at 11:27 PM Nam Cao <namcao@...utronix.de> wrote:
> > It seems I2C HID's interrupt handler (i2c_hid_irq) returns immediately if
> > I2C_HID_READ_PENDING is set. This flag is supposed to be cleared in
> > i2c_hid_xfer(), but since the (threaded) interrupt handler runs at higher
> > priority, the flag is never cleared. So we have a lock-up: interrupt
> > handler won't do anything unless the flag is cleared, but the clearing of
> > this flag is done in a lower priority task which never gets scheduled while
> > the interrupt handler is active.
> >
> > There is RT throttling to prevent RT tasks from locking up the system like
> > this. I don't know much about scheduling stuffs, so I am not really sure
> > why RT throttling does not work. I think because RT throttling triggers
> > when RT tasks take too much CPU time, but in this case hard interrupt
> > handlers take lots of CPU time too (~50% according to my measurement), so
> > RT throttling doesn't trigger often enough (in this case, it triggers once
> > and never again). Again, I don't know much about scheduler so I may be
> > talking nonsense here.
> >
> > The flag I2C_HID_READ_PENDING seems to be used to make sure that only 1
> > I2C operation can happen at a time. But this seems pointless, because I2C
> > subsystem already takes care of this. So I think we can just remove it.
> >
> > Can you give the below patch a try?
> >
> > diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c
> > index 2735cd585af0..799ad0ef9c4a 100644
> > --- a/drivers/hid/i2c-hid/i2c-hid-core.c
> > +++ b/drivers/hid/i2c-hid/i2c-hid-core.c
> > @@ -64,7 +64,6 @@
> >  /* flags */
> >  #define I2C_HID_STARTED                0
> >  #define I2C_HID_RESET_PENDING  1
> > -#define I2C_HID_READ_PENDING   2
> >
> >  #define I2C_HID_PWR_ON         0x00
> >  #define I2C_HID_PWR_SLEEP      0x01
> > @@ -190,15 +189,10 @@ static int i2c_hid_xfer(struct i2c_hid *ihid,
> >                 msgs[n].len = recv_len;
> >                 msgs[n].buf = recv_buf;
> >                 n++;
> > -
> > -               set_bit(I2C_HID_READ_PENDING, &ihid->flags);
> >         }
> >
> >         ret = i2c_transfer(client->adapter, msgs, n);
> >
> > -       if (recv_len)
> > -               clear_bit(I2C_HID_READ_PENDING, &ihid->flags);
> > -
> >         if (ret != n)
> >                 return ret < 0 ? ret : -EIO;
> >
> > @@ -566,9 +560,6 @@ static irqreturn_t i2c_hid_irq(int irq, void *dev_id)
> >  {
> >         struct i2c_hid *ihid = dev_id;
> >
> > -       if (test_bit(I2C_HID_READ_PENDING, &ihid->flags))
> > -               return IRQ_HANDLED;
> > -
> >         i2c_hid_get_input(ihid);
> >
> >         return IRQ_HANDLED;  
> 
> Patch applied cleanly on top of 6.7.9, builds OK (No warns, etc).
> 
> This indeed fixes the hang completely.

Nice! I assume I can add
    Reported-and-tested-by: Eva Kurchatova <nyandarknessgirl@...il.com>
to the patch?

> I modified RVVM to send millions of keystroke events per second,
> and put `reboot` as a service hook in the guest. It has been continuously
> rebooting without a hitch for the last 30 minutes or so (Full boot takes
> around 2 seconds), whereas unpatched 6.7.9 hangs almost immediately
> in such conditions (Reverted your patch & rebuilt to be sure).
> 
> Thank you very much for this! Hope to see it upstreamed soon

Thank you for the report, your investigation helped a lot.

I am still confused why RT throttling doesn't unstuck the kernel in this
case. I will consult some people and investigate more on this. But I think
this patch is good on its own, so I will send a proper patch shortly.

Best regards,
Nam

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ