linux-kernel - Re: [PATCH] drivers/firmware_loader: remove list entry before deallocation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Y5bEYfQOMyA4XQEW@bombadil.infradead.org>
Date:   Sun, 11 Dec 2022 22:04:17 -0800
From:   Luis Chamberlain <mcgrof@...nel.org>
To:     Michal Lach <michal.lach@...sung.com>
Cc:     linux-kernel@...r.kernel.org, russell.h.weight@...el.com,
        gregkh@...uxfoundation.org, rafael@...nel.org,
        ming.lei@...onical.com, tiwai@...e.de
Subject: Re: [PATCH] drivers/firmware_loader: remove list entry before
 deallocation

Hey Michal! Thanks for your patch! I have a few doubts though!

On Wed, Nov 23, 2022 at 12:14:56PM +0100, Michal Lach wrote:
> If CONFIG_FW_LOADER_USER_HELPER is enabled, it is possible to interrupt
> the loading process after adding pending_list to pending_fw_list.

In that case wouldn't fw_load_abort() get called and do the right thing?
That in the end calls __fw_state_set() which does:

static inline void __fw_state_set(...)
{
	struct fw_state *fw_st = &fw_priv->fw_st;

	WRITE_ONCE(fw_st->status, status);
	if (status == FW_STATUS_DONE || status == FW_STATUS_ABORTED) {
#ifdef CONFIG_FW_LOADER_USER_HELPER
	list_del_init(&fw_priv->pending_list);
#endif
	complete_all(&fw_st->completion);
	}
}

So the question I have for you is -- what path could be cancelled
which we don't end up calling a respective abort fw_load_abort()?

> Subsequently, if user calls release_firmware() which deallocates the
> fw_priv structure which pending_list is a member of, the entry in the
> list is not deleted. This causes a use-after-free on further attempts
> to add an entry to the list or on list traversal.

Now we're dealing with the possibility of a driver interaction drivers
can be buggy too. So I'm curious what driver triggers this?

> While not problematic in most drivers since this function is mainly used
> in probe or init routines, some drivers expose firmware loading
> functionality via user-accessible functions like write() etc.
> In this case during the sysfs loading process, we can send SIGKILL to the
> thread which is then in kernel, leave the entry in the list and then free
> the structure.

To account for not having to deal with specific drivers we have the
Linux kernel selftests. And so you can test the firmware loader with all
sorts of crazy situations which any driver could use and try to see
if you can re-recreate the issue.

The kernel selftests driver for the firmware loader is in
lib/test_firmware.c and you can use thetools/testing/selftests/firmware/fw_run_tests.sh
to run all the tests. To test the firmware fallback alone you can use
just fw_fallback.sh.

If you want to just virtualize this and you can also use kdevops [0] and
enable the firmware loader selftest and use:;

make menuconfig          #  enable selftests and just the firmware test
make linux               #  build linux, pick linux-next
make selftests
make selftests-firmware

But this may be something more you can use later once you get your flow
going. Just compiling the kernel and running the selftest manually with
fw_fallback.sh should suffice.

[0] https://github.com/linux-kdevops/kdevops

> Example kernel panics with CONFIG_DEBUG_LIST turned on:
> 
> kernel BUG at lib/list_debug.c:25!
> /* ... */
> Call trace:
>  __list_add_valid+0x7c/0x98
>  fw_load_sysfs_fallback+0xd4/0x334
>  fw_load_from_user_helper+0x148/0x1f8
>  firmware_fallback_sysfs+0xe0/0x17c
>  _request_firmware+0x1a0/0x470
>  request_firmware+0x50/0x78
> /* ... */
> 
> or
> 
> kernel BUG at lib/list_debug.c:56!
> /* ... */
> Call trace:
>  __list_del_entry_valid+0xa0/0xa4
>  fw_load_abort+0x38/0x64
>  fw_load_sysfs_fallback+0x354/0x468
>  fw_load_from_user_helper+0x17c/0x1c0
>  firmware_fallback_sysfs+0xc0/0x11c
>  _request_firmware+0xe0/0x4a4
>  request_firmware+0x20/0x2c
> /* ... */

OK so this proves the bug can happen but I'd like to see the full trace
and the exact kernel version showing that this can happen on a recent
kernel. Without that I'm not seeing how this can trigger yet.

It would be easy to prove with the selftests.

  Luis