linux-kernel - Re: [PATCHv2 0/3] Enable no_cache flag to driver

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170607175922.GM27288@wotan.suse.de>
Date:   Wed, 7 Jun 2017 19:59:22 +0200
From:   "Luis R. Rodriguez" <mcgrof@...nel.org>
To:     "Li, Yi" <yi1.li@...ux.intel.com>
Cc:     "Luis R. Rodriguez" <mcgrof@...nel.org>, atull@...nel.org,
        gregkh@...uxfoundation.org, wagi@...om.org, dwmw2@...radead.org,
        rafal@...ecki.pl, arend.vanspriel@...adcom.com, rjw@...ysocki.net,
        moritz.fischer@...us.com, pmladek@...e.com,
        johannes.berg@...el.com, emmanuel.grumbach@...el.com,
        luciano.coelho@...el.com, kvalo@...eaurora.org, luto@...nel.org,
        takahiro.akashi@...aro.org, dhowells@...hat.com, pjones@...hat.com,
        linux-kernel@...r.kernel.org, linux-fpga@...r.kernel.org
Subject: Re: [PATCHv2 0/3] Enable no_cache flag to driver_data

On Tue, Jun 06, 2017 at 02:31:54PM -0500, Li, Yi wrote:
> > We use the cache upon suspend to cache the firmware so that upon resume a
> > request will use that cache, to avoid the file lookup on disk. Doing a test
> > with qemu suspend + resume is possible but that requires having access to
> > the qemu monitor interface and doing something like this to trigger a wakeup:
> > 
> > echo system_wakeup | socat - /dev/pts/7,raw,echo=0,crnl

BTW if it helps for testing:

https://github.com/mcgrof/kvm-boot

> I am studying at the firmware caching codes and have to say it's very
> complicated. :-( Here are some questions:

It is! For this reason I tried to document it, have you read this:

https://www.kernel.org/doc/html/latest/driver-api/firmware/firmware_cache.html

This is certainly missing the notes about how some of this is loosely re-used
for possible duplicate requests though, but I think I've mentioned this on
my review of your patches so far.

> 1. Since device_cache_fw_images invokes dev_cache_fw_image through
> dpm_for_each_dev, adding a debugfs driver to kick it can only cache firmware
> for those associated with devices which has PM enabled, which do not include
> the driver_data_test_device. Any suggestions?

Ah, consider adding PM to driver_data_test_device ?

May be worth updating the documentation bout this requirement too!

> 2. Look into dev_cache_fw_image function, devres_for_each_res will walk
> through the firmware have been loaded before (through assign_firmware_buf ->
> fw_add_devm_name) and add to the todo list, eventually it will create the
> fw_names list. So in the test driver, we need to load the firmware once
> before calling the kick?

Yes, makes sense, given its a firmware cache, it would only make sense to
cache firmware for requests already made.

An important note I made in the documentation which you did not mention 
but should be important for your own sanity:

"The firmware devres entry is maintained throughout the lifetime of the device.
This means that even if you release_firmware() the firmware cache will still be
used on resume from suspend."

This means the cache will be tried even if the driver did use release_firmware()
after request_firmware().

> > static void device_cache_fw_images(void)
> > {
> > 	...
> >          fwc->state = FW_LOADER_START_CACHE;
> >          dpm_for_each_dev(NULL, dev_cache_fw_image);
> > 	...
> > }
> This only applies to the devices have PM enabled.

Makes sense!

> 
> device_uncache_fw_images_delay() will be called for PM_POST_RESTORE, which
> means the hibernation restore got error. How about the successful restore
> case, calling driver will free its firmware_buf after loading?

As it is on my latest 20170605-driver-data [0] (but should be just as yours if you
used a recent branch of mine), fw_pm_notify() uses:

static int fw_pm_notify(struct notifier_block *notify_block,
			unsigned long mode, void *unused)
{
	switch (mode) {
	...
	case PM_POST_SUSPEND:
	case PM_POST_HIBERNATION:
	case PM_POST_RESTORE:
		/*
		 * In case that system sleep failed and syscore_suspend is
		 * not called.
		 */
		mutex_lock(&fw_lock);
		fw_cache.state = FW_LOADER_NO_CACHE;
		mutex_unlock(&fw_lock);
		enable_firmware();

		device_uncache_fw_images_delay(10 * MSEC_PER_SEC);
		break;
        } 
}

So it uses also PM_POST_RESTORE. I think that covers it all ? As it
is today we handle the firmware cache for all drivers, it should be
transparent to drivers.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20170605-driver-data

> > To me this last part smells like a possible source of issue (not sure) if we might
> > suspend/resume a lot in short period of time, this theory could be tested by toggling
> > on/of this debugfs interface I suggested while having requests of different types
> > blast in.
> Agree, it might create an issue if the system is get into restore_prepare
> again before the device_uncache_fw_images_delay clear the cache, why we need
> the 10 * MSEC_PER_SEC delay? In theory, fwc->name_lock should protect the
> case though.

One possible solution may be to cancel_delayed_work(&fw_cache.work)
or cancel_delayed_work_sync(&fw_cache.work) on suspend, and then
force it to run *right away* so we clear any pending old cache.

One performance issue with this is we are clearing some possible cache we are
about to re-create, so mod_delayed_work() seems more efficient -- provided we
have accounting notes to know no new firmware requests have trickled in since
the last cache build. This seems rather complicated so a first initial approach
to just kill all known cache before suspend seems easy and fair.

BTW such a fix might be a stable candidate if you find a real behavioural
issue with the kernel !

> > This is the sort of testing which would really help here.
> > 
> > Likewise, if you are extending functionality please consider ways to break it :)
> Understand the need to test the firmware caching part. For non-caching test
> case, will it be enough if we can test that the noncache setting will ban
> the firmware name be added to the fwc->fw_names list?

Good question. Let's see...  in the future a device might have two requests
with the same firmware, one with a cache enabled and one without it -- I suppose
for now we should perhaps disable a non-cache request if one already exists
with a cache request ? The code for that probably is not there... But other than
this corner case...

Yeah I think that's a reasonable test case. Such a code seems may need a debug
export symbol to allow drivers to query for this info, if so feel free to a
respective debug kconfig entry for it.

  Luis