linux-kernel - Re: [RFC] staging: r8188eu: Sleeping in atomic context (SAC) bugs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20220207173808.GR1978@kadam>
Date:   Mon, 7 Feb 2022 20:38:08 +0300
From:   Dan Carpenter <dan.carpenter@...cle.com>
To:     "Fabio M. De Francesco" <fmdefrancesco@...il.com>
Cc:     linux-staging@...ts.linux.dev, gregkh@...uxfoundation.org,
        phil@...lpotter.co.uk, larry.finger@...inger.net,
        julia.lawall@...ia.fr, linux-kernel@...r.kernel.org,
        Michael Straube <michael.straube@...d.uni-goettingen.de>,
        martin@...ser.cx
Subject: Re: [RFC] staging: r8188eu: Sleeping in atomic context (SAC) bugs

On Mon, Feb 07, 2022 at 03:18:52PM +0100, Fabio M. De Francesco wrote:
> 1) Leave the code as-is with those msleep(10) and risk to hang the driver (or 
> the whole kernel?).
> 2) Change msleep(10) to mdelay(10). It's a huge amount of time, but we are 
> guaranteed that it is safe.
> 3) Change to mdelay() and try to figure out what's the minimum amount of time 
> that suffice. I forgot to say that those msleep(10) are within "while" loops, 
> therefore they are called an unknown amount of times.

You are talking about the:
rtw_set_802_11_disassociate() <- disables preempt
-> _rtw_pwr_wakeup()

_rtw_pwr_wakeup() calls msleep() on two different paths.

Let's just add it to the TODO list.

The code is a mess, right?  Maybe if we clean it up a bit first the
answer will become obvious.

The first loop is trying to prevent racing with rtw_ps_processor().
Sleeping is not a valid way to prevent race conditions.  Especially
since it just gives up and it's like, "Oh well, three seconds are up.
Let's just race."

I don't really think that sleeping in atomic bugs are likely to lock up
the whole system these days because even the cheapest mobile phones have
something like 64 cores in them.  But imagine it was twenty years and
you had only one core with CONFIG_PREEMPT turned on.  The CPU was in
ips_enter() and it was preempted by this thread, which is totally
possible in theory.  Then this thread took the spinlock and we changed
the msleep to mdelay().  There is now no way for the other thread to
complete ips_enter() because we don't give up the CPU so we just sit for
3 seconds and then timeout.

In other words, changing from msleep to mdelay could make bad code even
worse.  Plus it silences the warning so now the problem is hard to find.

regards,
dan carpenter