linux-kernel - Re: [PATCH] virt: acrn: fix invalid check past list iterator

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <4198F3E8-7FBF-4A54-9AD3-C8599AA2702F@gmail.com>
Date:   Fri, 1 Apr 2022 11:24:43 +0200
From:   Jakob Koschel <jakobkoschel@...il.com>
To:     Li Fei1 <fei1.li@...el.com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        rppt@...nel.org, bjohannesmeyer@...il.com, c.giuffrida@...nl,
        h.j.bos@...nl
Subject: Re: [PATCH] virt: acrn: fix invalid check past list iterator



> On 1. Apr 2022, at 11:12, Li Fei1 <fei1.li@...el.com> wrote:
> 
> On Fri, Apr 01, 2022 at 11:08:02AM +0200, Jakob Koschel wrote:
>> 
>> 
>>> On 1. Apr 2022, at 11:05, Li Fei1 <fei1.li@...el.com> wrote:
>>> 
>>> On Fri, Apr 01, 2022 at 10:50:51AM +0200, Jakob Koschel wrote:
>>>> 
>>>> 
>>>>> On 1. Apr 2022, at 09:57, Li Fei1 <fei1.li@...el.com> wrote:
>>>>> 
>>>>> On Fri, Apr 01, 2022 at 09:16:48AM +0200, Jakob Koschel wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 1. Apr 2022, at 05:57, Li Fei1 <fei1.li@...el.com> wrote:
>>>>>>> 
>>>>>>> On Fri, Apr 01, 2022 at 05:22:36AM +0200, Jakob Koschel wrote:
>>>>>>>> 
>>>>>>>>> On 1. Apr 2022, at 03:15, Li Fei1 <fei1.li@...el.com> wrote:
>>>>>>>>> 
>>>>>>>>> On Thu, Mar 31, 2022 at 01:20:50PM +0200, Jakob Koschel wrote:
>>>>>>>>>> 
>>>>>>>>>>> On 30. Mar 2022, at 09:57, Li Fei1 <fei1.li@...el.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On Sat, Mar 19, 2022 at 09:38:19PM +0100, Jakob Koschel wrote:
>>>>>>>>>>>> The condition retry == 0 is theoretically possible even if 'client'
>>>>>>>>>>>> does not point to a valid element because no break was hit.
>>>>>>>>>>>> 
>>>>>>>>>>>> To only execute the dev_warn if actually a break within the loop was
>>>>>>>>>>>> hit, a separate variable is used that is only set if it is ensured to
>>>>>>>>>>>> point to a valid client struct.
>>>>>>>>>>>> 
>>>>>>>>>>> Hi Koschel
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for you to help us to try to improve the code. Maybe you don't get the point.
>>>>>>>>>>> The dev_warn should only been called when has_pending = true && retry == 0
>>>>>>>>>> 
>>>>>>>>>> Maybe I don't understand but looking isolated at this function I could see a way to call
>>>>>>>>>> the dev_warn() with has_pending = false && retry == 0.
>>>>>>>>> Yes, even has_pending = true && retry == 0 at the beginning, we could not make sure
>>>>>>>>> has_pending is true after schedule_timeout_interruptible and we even didn't check
>>>>>>>>> there're other pending client on the ioreq_clients (because we can't make sure
>>>>>>>>> when we dev_warn this clent is still pending). So we just use dev_warn not higher log level.
>>>>>>>> 
>>>>>>>> I'm sorry, I don't quite understand what you mean by that.
>>>>>>>> Do you agree that has_pending = false && retry == 0 is possible when calling the dev_warn()
>>>>>>>> or not?
>>>>>>>> 
>>>>>>> Yes, so what ? It just a hint there may have pending request.
>>>>>> 
>>>>>> if has_pending == false && retry == 0 when calling dev_warn() there are very clear
>>>>>> dependencies met:
>>>>>> 
>>>>>> * has_pending == false means that the list_for_each_entry() macro it *not* exit early.
>>>>>> * since list_for_each_entry() did *not* exit early, client will not hold a valid list entry
>>>>>> * using client->name is not safe and will not point to a valid string (perhaps not even an address)
>>>>> So you'd better to check when the client in ioreq_clients would been destroyed, right ?
>>>> 
>>>> I'm afraid I don't know exactly what you mean here.
>>>> 
>>>> More specifically:
>>>> to check what? and I'm also not sure what you mean with "when client in ioreq_clients would been destroyed".
>>>> 
>>>> Actually thinking about it more the check should be
>>>> 
>>>> 	if (client && retry == 0)
>>>> 
>>>> to be correct.
>>> The client in ioreq_clients would always been "valid" (here valid means the client struct would not
>>> been destroyed) when this function been called. That's guaranteed by the code logic.
>> 
>> Now I'm very confused.
>> Didn't you say the dev_msg() can be called with has_pending == false && retry == 0?
>> Then the 'client' used in the dev_msg() cannot be valid.
> I think I would not reply you before you understand how ACRN ioreq works.

I've asked you before if dev_msg() can be called with such requirements. Here:

>> Do you agree that has_pending = false && retry == 0 is possible when calling the dev_warn()
>> or not?
>> 
> Yes, so what ? It just a hint there may have pending request.

I'm just making the point that in such a case 'client' cannot be valid. For that I don't
need to understand the entirety how ACRN ioreq. Hence I asked you that question.

If you can't keep this to an objective discussion I'll just leave it as is.

> 
>> 
>>>> 
>>>> if has_pending == false I read the code as if no client was found that has a pending request 
>>>> so I don't follow how:
>>>> 
>>>> 	"%s cannot flush pending request!\n", client->name);
>>>> 
>>>> can be valid since no client has a pending request.
>>>> 
>>>>> 
>>>>>> 
>>>>>> I'm *only* talking about the case where has_pending == false, in case that's not clear.
>>>>>> 
>>>>>> 
>>>>>>> Even retry == 0 && has_pending = true,
>>>>>>> When we call dev_warn, the clent may not is pending.
>>>>>>> 
>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 		list_for_each_entry(client, &vm->ioreq_clients, list) {
>>>>>>>>>>> 			has_pending = has_pending_request(client);
>>>>>>>>>>> 			if (has_pending)
>>>>>>>>>>> 		}
>>>>>>>>>>> 		spin_unlock_bh(&vm->ioreq_clients_lock);
>>>>>>>>>> 
>>>>>>>>>> imagine has_pending == false && retry == 1 here, then client will not hold a valid list entry.
>>>>>>>>> What do you mean "client will not hold a valid list entry" ? 
>>>>>>>> 
>>>>>>>> Imagine a very simple example:
>>>>>>>> 
>>>>>>>> 	struct acrn_ioreq_client *client;
>>>>>>>> 	list_for_each_entry(client, &vm->ioreq_clients, list) {
>>>>>>>> 		continue;
>>>>>>>> 	}
>>>>>>>> 
>>>>>>>> 	dev_warn(acrn_dev.this_device,
>>>>>>>> 		 "%s cannot flush pending request!\n", client->name); /* NOT GOOD */
>>>>>>>> 
>>>>>>> If there're pending request, we would call schedule to schedule out then schedule back
>>>>>>> to check the list from the beginning. If there's no pending client, we point to the last
>>>>>>> client and break out the while loop.
>>>>>>> 
>>>>>>> The code doesn't want to find the pending client and break out the while loop and call
>>>>>>> dev_warn. Please see the function comment.
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> Since there is no break for the list_for_each_entry() iterator to return early,
>>>>>>>> client *cannot* be a valid entry of the list. In fact if you look at the list_for_each_entry()
>>>>>>>> macro, it will be an 'bogus' pointer, pointing somewhere into 'vm'.
>>>>>>>> Essentially before the terminating condition of the list traversal the following code is called:
>>>>>>>> 
>>>>>>>> 	list_entry(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
>>>>>>>> 
>>>>>>>> resulting in a:
>>>>>>>> 
>>>>>>>> 	container_of(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
>>>>>>>> 
>>>>>>>> &vm->ioreq_clients however is not contained in a struct acrn_ioreq_client, making
>>>>>>>> this call compute an invalid pointer.
>>>>>>>> Therefore using 'client' as in the example above (e.g. client->name) after the loop is
>>>>>>>> not safe. Since the loop can never return early in the simple example above it will
>>>>>>>> always break. On other cases (the one we are discussing here) there might be a chance that
>>>>>>>> there is one code path (in theory) where the loop did not exit early and 'client'
>>>>>>>> holds that 'invalid entry'.
>>>>>>>> 
>>>>>>>> This would be the case with has_pending = false && retry == 0.
>>>>>>>> 
>>>>>>>> I hope this makes sense.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 		if (has_pending)
>>>>>>>>>>> 			schedule_timeout_interruptible(HZ / 100);
>>>>>>>>>>> 	} while (has_pending && --retry > 0);
>>>>>>>>>> 
>>>>>>>>>> since has_pending && --retry > 0 is no longer true the loop stops.
>>>>>>>>>> 
>>>>>>>>>>> 	if (retry == 0)
>>>>>>>>>>> 		dev_warn(acrn_dev.this_device,
>>>>>>>>>>> 			 "%s cannot flush pending request!\n", client->name);
>>>>>>>>>> client->name is accessed since retry == 0 now, but client is not a valid struct ending up
>>>>>>>>>> in a type confusion.
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> If retry > 0 and has_pending is true, we would call schedule_timeout_interruptible
>>>>>>>>>>> to schedule out to wait all the pending I/O requests would been completed.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks.
>>>>>>>>>> 
>>>>>>>>>> Again, I'm not sure if this is realistically possible. I'm trying to remove
>>>>>>>>>> any use of the list iterator after the loop to make such potentially issues detectable
>>>>>>>>> You may think we still in the loop (could we ?), even that we didn't write the list iterator then.
>>>>>>>> 
>>>>>>>> I'm not exactly sure which loop you are referring to but no, I don't think we are still in
>>>>>>>> the do while loop.
>>>>>>>> 
>>>>>>>> The only thing we know after the do while loop is that: !has_pending || retry == 0
>>>>>>>> And iff has_pending && retry == 0, then we shouldn't call the dev_warn().
>>>>>>>> 
>>>>>>>>>> at compile time instead of relying on certain (difficult to maintain) conditions to be met
>>>>>>>>>> to avoid the type confusion.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Jakob
>>>>>>>> 
>>>>>>>> 	Jakob
>>>>>> 
>>>>>> 	Jakob