linux-kernel - Re: [PATCH] virt: acrn: fix invalid check past list iterator

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <F54F358F-FFBC-4D6D-AB5C-9BC4A617C9BF@gmail.com>
Date:   Fri, 1 Apr 2022 05:22:36 +0200
From:   Jakob Koschel <jakobkoschel@...il.com>
To:     Li Fei1 <fei1.li@...el.com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Mike Rapoport <rppt@...nel.org>,
        Brian Johannesmeyer <bjohannesmeyer@...il.com>,
        Cristiano Giuffrida <c.giuffrida@...nl>, h.j.bos@...nl
Subject: Re: [PATCH] virt: acrn: fix invalid check past list iterator


> On 1. Apr 2022, at 03:15, Li Fei1 <fei1.li@...el.com> wrote:
> 
> On Thu, Mar 31, 2022 at 01:20:50PM +0200, Jakob Koschel wrote:
>> 
>>> On 30. Mar 2022, at 09:57, Li Fei1 <fei1.li@...el.com> wrote:
>>> 
>>> On Sat, Mar 19, 2022 at 09:38:19PM +0100, Jakob Koschel wrote:
>>>> The condition retry == 0 is theoretically possible even if 'client'
>>>> does not point to a valid element because no break was hit.
>>>> 
>>>> To only execute the dev_warn if actually a break within the loop was
>>>> hit, a separate variable is used that is only set if it is ensured to
>>>> point to a valid client struct.
>>>> 
>>> Hi Koschel
>>> 
>>> Thanks for you to help us to try to improve the code. Maybe you don't get the point.
>>> The dev_warn should only been called when has_pending = true && retry == 0
>> 
>> Maybe I don't understand but looking isolated at this function I could see a way to call
>> the dev_warn() with has_pending = false && retry == 0.
> Yes, even has_pending = true && retry == 0 at the beginning, we could not make sure
> has_pending is true after schedule_timeout_interruptible and we even didn't check
> there're other pending client on the ioreq_clients (because we can't make sure
> when we dev_warn this clent is still pending). So we just use dev_warn not higher log level.

I'm sorry, I don't quite understand what you mean by that.
Do you agree that has_pending = false && retry == 0 is possible when calling the dev_warn()
or not?

>> 
>>> 		list_for_each_entry(client, &vm->ioreq_clients, list) {
>>> 			has_pending = has_pending_request(client);
>>> 			if (has_pending)
>>> 		}
>>> 		spin_unlock_bh(&vm->ioreq_clients_lock);
>> 
>> imagine has_pending == false && retry == 1 here, then client will not hold a valid list entry.
> What do you mean "client will not hold a valid list entry" ? 

Imagine a very simple example:

	struct acrn_ioreq_client *client;
	list_for_each_entry(client, &vm->ioreq_clients, list) {
		continue;
	}

	dev_warn(acrn_dev.this_device,
		 "%s cannot flush pending request!\n", client->name); /* NOT GOOD */


Since there is no break for the list_for_each_entry() iterator to return early,
client *cannot* be a valid entry of the list. In fact if you look at the list_for_each_entry()
macro, it will be an 'bogus' pointer, pointing somewhere into 'vm'.
Essentially before the terminating condition of the list traversal the following code is called:

	list_entry(&vm->ioreq_clients, struct acrn_ioreq_client *, list);

resulting in a:

	container_of(&vm->ioreq_clients, struct acrn_ioreq_client *, list);

&vm->ioreq_clients however is not contained in a struct acrn_ioreq_client, making
this call compute an invalid pointer.
Therefore using 'client' as in the example above (e.g. client->name) after the loop is
not safe. Since the loop can never return early in the simple example above it will
always break. On other cases (the one we are discussing here) there might be a chance that
there is one code path (in theory) where the loop did not exit early and 'client'
holds that 'invalid entry'.

This would be the case with has_pending = false && retry == 0.

I hope this makes sense.

> 
>> 
>>> 
>>> 		if (has_pending)
>>> 			schedule_timeout_interruptible(HZ / 100);
>>> 	} while (has_pending && --retry > 0);
>> 
>> since has_pending && --retry > 0 is no longer true the loop stops.
>> 
>>> 	if (retry == 0)
>>> 		dev_warn(acrn_dev.this_device,
>>> 			 "%s cannot flush pending request!\n", client->name);
>> client->name is accessed since retry == 0 now, but client is not a valid struct ending up
>> in a type confusion.
>> 
>>> 
>>> If retry > 0 and has_pending is true, we would call schedule_timeout_interruptible
>>> to schedule out to wait all the pending I/O requests would been completed.
>>> 
>>> Thanks.
>> 
>> Again, I'm not sure if this is realistically possible. I'm trying to remove
>> any use of the list iterator after the loop to make such potentially issues detectable
> You may think we still in the loop (could we ?), even that we didn't write the list iterator then.

I'm not exactly sure which loop you are referring to but no, I don't think we are still in
the do while loop.

The only thing we know after the do while loop is that: !has_pending || retry == 0
And iff has_pending && retry == 0, then we shouldn't call the dev_warn().

>> at compile time instead of relying on certain (difficult to maintain) conditions to be met
>> to avoid the type confusion.
>> 
>> Thanks,
>> Jakob

	Jakob