linux-kernel - Re: [PATCHv2 0/3] Enable no_cache flag to driver

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <917cb457-d0ae-aee6-246e-b3d80e6a9c45@linux.intel.com>
Date:   Wed, 7 Jun 2017 16:00:42 -0500
From:   "Li, Yi" <yi1.li@...ux.intel.com>
To:     "Luis R. Rodriguez" <mcgrof@...nel.org>
Cc:     atull@...nel.org, gregkh@...uxfoundation.org, wagi@...om.org,
        dwmw2@...radead.org, rafal@...ecki.pl,
        arend.vanspriel@...adcom.com, rjw@...ysocki.net,
        moritz.fischer@...us.com, pmladek@...e.com,
        johannes.berg@...el.com, emmanuel.grumbach@...el.com,
        luciano.coelho@...el.com, kvalo@...eaurora.org, luto@...nel.org,
        takahiro.akashi@...aro.org, dhowells@...hat.com, pjones@...hat.com,
        linux-kernel@...r.kernel.org, linux-fpga@...r.kernel.org
Subject: Re: [PATCHv2 0/3] Enable no_cache flag to driver_data

On 6/7/2017 12:59 PM, Luis R. Rodriguez wrote:
> On Tue, Jun 06, 2017 at 02:31:54PM -0500, Li, Yi wrote:
>>> We use the cache upon suspend to cache the firmware so that upon resume a
>>> request will use that cache, to avoid the file lookup on disk. Doing a test
>>> with qemu suspend + resume is possible but that requires having access to
>>> the qemu monitor interface and doing something like this to trigger a wakeup:
>>>
>>> echo system_wakeup | socat - /dev/pts/7,raw,echo=0,crnl
>
> BTW if it helps for testing:
>
> https://github.com/mcgrof/kvm-boot

Thanks, will explore that.

>
>> I am studying at the firmware caching codes and have to say it's very
>> complicated. :-( Here are some questions:
>
> It is! For this reason I tried to document it, have you read this:
>
> https://www.kernel.org/doc/html/latest/driver-api/firmware/firmware_cache.html

Good info, thanks.

>
> This is certainly missing the notes about how some of this is loosely re-used
> for possible duplicate requests though, but I think I've mentioned this on
> my review of your patches so far.
>
>> 1. Since device_cache_fw_images invokes dev_cache_fw_image through
>> dpm_for_each_dev, adding a debugfs driver to kick it can only cache firmware
>> for those associated with devices which has PM enabled, which do not include
>> the driver_data_test_device. Any suggestions?
>
> Ah, consider adding PM to driver_data_test_device ?
>

That's what I am looking now. But per my understanding, the misc_device 
does not have the hook to add PM support like those platform device 
drivers do. Please let me know if there is a way to do that.

> May be worth updating the documentation bout this requirement too!
>
>> 2. Look into dev_cache_fw_image function, devres_for_each_res will walk
>> through the firmware have been loaded before (through assign_firmware_buf ->
>> fw_add_devm_name) and add to the todo list, eventually it will create the
>> fw_names list. So in the test driver, we need to load the firmware once
>> before calling the kick?
>
> Yes, makes sense, given its a firmware cache, it would only make sense to
> cache firmware for requests already made.
>
> An important note I made in the documentation which you did not mention
> but should be important for your own sanity:
>
> "The firmware devres entry is maintained throughout the lifetime of the device.
> This means that even if you release_firmware() the firmware cache will still be
> used on resume from suspend."
>
> This means the cache will be tried even if the driver did use release_firmware()
> after request_firmware().

Yes, it makes sense and good to know and it's explained in the 
kernel.org link you pointed out earlier. The devres entry records all 
the firmware have been loaded before, it will save restore prepare time 
for all the drivers to mark their firmware "no cache" if there is no 
need to reload the firmware during restore.

>
>>> static void device_cache_fw_images(void)
>>> {
>>> 	...
>>>          fwc->state = FW_LOADER_START_CACHE;
>>>          dpm_for_each_dev(NULL, dev_cache_fw_image);
>>> 	...
>>> }
>> This only applies to the devices have PM enabled.
>
> Makes sense!
>
>>
>> device_uncache_fw_images_delay() will be called for PM_POST_RESTORE, which
>> means the hibernation restore got error. How about the successful restore
>> case, calling driver will free its firmware_buf after loading?
>
> As it is on my latest 20170605-driver-data [0] (but should be just as yours if you
> used a recent branch of mine), fw_pm_notify() uses:
>
> static int fw_pm_notify(struct notifier_block *notify_block,
> 			unsigned long mode, void *unused)
> {
> 	switch (mode) {
> 	...
> 	case PM_POST_SUSPEND:
> 	case PM_POST_HIBERNATION:
> 	case PM_POST_RESTORE:
> 		/*
> 		 * In case that system sleep failed and syscore_suspend is
> 		 * not called.
> 		 */
> 		mutex_lock(&fw_lock);
> 		fw_cache.state = FW_LOADER_NO_CACHE;
> 		mutex_unlock(&fw_lock);
> 		enable_firmware();
>
> 		device_uncache_fw_images_delay(10 * MSEC_PER_SEC);
> 		break;
>         }
> }
>
> So it uses also PM_POST_RESTORE. I think that covers it all ? As it
> is today we handle the firmware cache for all drivers, it should be
> transparent to drivers.

Ah, the comment of "sleep failed" mislead me. :)

>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20170605-driver-data
>
>>> To me this last part smells like a possible source of issue (not sure) if we might
>>> suspend/resume a lot in short period of time, this theory could be tested by toggling
>>> on/of this debugfs interface I suggested while having requests of different types
>>> blast in.
>> Agree, it might create an issue if the system is get into restore_prepare
>> again before the device_uncache_fw_images_delay clear the cache, why we need
>> the 10 * MSEC_PER_SEC delay? In theory, fwc->name_lock should protect the
>> case though.
>
> One possible solution may be to cancel_delayed_work(&fw_cache.work)
> or cancel_delayed_work_sync(&fw_cache.work) on suspend, and then
> force it to run *right away* so we clear any pending old cache.
>
> One performance issue with this is we are clearing some possible cache we are
> about to re-create, so mod_delayed_work() seems more efficient -- provided we
> have accounting notes to know no new firmware requests have trickled in since
> the last cache build. This seems rather complicated so a first initial approach
> to just kill all known cache before suspend seems easy and fair.
>
> BTW such a fix might be a stable candidate if you find a real behavioural
> issue with the kernel !
>
>>> This is the sort of testing which would really help here.
>>>
>>> Likewise, if you are extending functionality please consider ways to break it :)
>> Understand the need to test the firmware caching part. For non-caching test
>> case, will it be enough if we can test that the noncache setting will ban
>> the firmware name be added to the fwc->fw_names list?
>
> Good question. Let's see...  in the future a device might have two requests
> with the same firmware, one with a cache enabled and one without it -- I suppose
> for now we should perhaps disable a non-cache request if one already exists
> with a cache request ? The code for that probably is not there... But other than
> this corner case...
>
> Yeah I think that's a reasonable test case. Such a code seems may need a debug
> export symbol to allow drivers to query for this info, if so feel free to a
> respective debug kconfig entry for it.

Sounds good, thanks!

>
>   Luis
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fpga" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>