[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170607175922.GM27288@wotan.suse.de>
Date: Wed, 7 Jun 2017 19:59:22 +0200
From: "Luis R. Rodriguez" <mcgrof@...nel.org>
To: "Li, Yi" <yi1.li@...ux.intel.com>
Cc: "Luis R. Rodriguez" <mcgrof@...nel.org>, atull@...nel.org,
gregkh@...uxfoundation.org, wagi@...om.org, dwmw2@...radead.org,
rafal@...ecki.pl, arend.vanspriel@...adcom.com, rjw@...ysocki.net,
moritz.fischer@...us.com, pmladek@...e.com,
johannes.berg@...el.com, emmanuel.grumbach@...el.com,
luciano.coelho@...el.com, kvalo@...eaurora.org, luto@...nel.org,
takahiro.akashi@...aro.org, dhowells@...hat.com, pjones@...hat.com,
linux-kernel@...r.kernel.org, linux-fpga@...r.kernel.org
Subject: Re: [PATCHv2 0/3] Enable no_cache flag to driver_data
On Tue, Jun 06, 2017 at 02:31:54PM -0500, Li, Yi wrote:
> > We use the cache upon suspend to cache the firmware so that upon resume a
> > request will use that cache, to avoid the file lookup on disk. Doing a test
> > with qemu suspend + resume is possible but that requires having access to
> > the qemu monitor interface and doing something like this to trigger a wakeup:
> >
> > echo system_wakeup | socat - /dev/pts/7,raw,echo=0,crnl
BTW if it helps for testing:
https://github.com/mcgrof/kvm-boot
> I am studying at the firmware caching codes and have to say it's very
> complicated. :-( Here are some questions:
It is! For this reason I tried to document it, have you read this:
https://www.kernel.org/doc/html/latest/driver-api/firmware/firmware_cache.html
This is certainly missing the notes about how some of this is loosely re-used
for possible duplicate requests though, but I think I've mentioned this on
my review of your patches so far.
> 1. Since device_cache_fw_images invokes dev_cache_fw_image through
> dpm_for_each_dev, adding a debugfs driver to kick it can only cache firmware
> for those associated with devices which has PM enabled, which do not include
> the driver_data_test_device. Any suggestions?
Ah, consider adding PM to driver_data_test_device ?
May be worth updating the documentation bout this requirement too!
> 2. Look into dev_cache_fw_image function, devres_for_each_res will walk
> through the firmware have been loaded before (through assign_firmware_buf ->
> fw_add_devm_name) and add to the todo list, eventually it will create the
> fw_names list. So in the test driver, we need to load the firmware once
> before calling the kick?
Yes, makes sense, given its a firmware cache, it would only make sense to
cache firmware for requests already made.
An important note I made in the documentation which you did not mention
but should be important for your own sanity:
"The firmware devres entry is maintained throughout the lifetime of the device.
This means that even if you release_firmware() the firmware cache will still be
used on resume from suspend."
This means the cache will be tried even if the driver did use release_firmware()
after request_firmware().
> > static void device_cache_fw_images(void)
> > {
> > ...
> > fwc->state = FW_LOADER_START_CACHE;
> > dpm_for_each_dev(NULL, dev_cache_fw_image);
> > ...
> > }
> This only applies to the devices have PM enabled.
Makes sense!
>
> device_uncache_fw_images_delay() will be called for PM_POST_RESTORE, which
> means the hibernation restore got error. How about the successful restore
> case, calling driver will free its firmware_buf after loading?
As it is on my latest 20170605-driver-data [0] (but should be just as yours if you
used a recent branch of mine), fw_pm_notify() uses:
static int fw_pm_notify(struct notifier_block *notify_block,
unsigned long mode, void *unused)
{
switch (mode) {
...
case PM_POST_SUSPEND:
case PM_POST_HIBERNATION:
case PM_POST_RESTORE:
/*
* In case that system sleep failed and syscore_suspend is
* not called.
*/
mutex_lock(&fw_lock);
fw_cache.state = FW_LOADER_NO_CACHE;
mutex_unlock(&fw_lock);
enable_firmware();
device_uncache_fw_images_delay(10 * MSEC_PER_SEC);
break;
}
}
So it uses also PM_POST_RESTORE. I think that covers it all ? As it
is today we handle the firmware cache for all drivers, it should be
transparent to drivers.
[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20170605-driver-data
> > To me this last part smells like a possible source of issue (not sure) if we might
> > suspend/resume a lot in short period of time, this theory could be tested by toggling
> > on/of this debugfs interface I suggested while having requests of different types
> > blast in.
> Agree, it might create an issue if the system is get into restore_prepare
> again before the device_uncache_fw_images_delay clear the cache, why we need
> the 10 * MSEC_PER_SEC delay? In theory, fwc->name_lock should protect the
> case though.
One possible solution may be to cancel_delayed_work(&fw_cache.work)
or cancel_delayed_work_sync(&fw_cache.work) on suspend, and then
force it to run *right away* so we clear any pending old cache.
One performance issue with this is we are clearing some possible cache we are
about to re-create, so mod_delayed_work() seems more efficient -- provided we
have accounting notes to know no new firmware requests have trickled in since
the last cache build. This seems rather complicated so a first initial approach
to just kill all known cache before suspend seems easy and fair.
BTW such a fix might be a stable candidate if you find a real behavioural
issue with the kernel !
> > This is the sort of testing which would really help here.
> >
> > Likewise, if you are extending functionality please consider ways to break it :)
> Understand the need to test the firmware caching part. For non-caching test
> case, will it be enough if we can test that the noncache setting will ban
> the firmware name be added to the fwc->fw_names list?
Good question. Let's see... in the future a device might have two requests
with the same firmware, one with a cache enabled and one without it -- I suppose
for now we should perhaps disable a non-cache request if one already exists
with a cache request ? The code for that probably is not there... But other than
this corner case...
Yeah I think that's a reasonable test case. Such a code seems may need a debug
export symbol to allow drivers to query for this info, if so feel free to a
respective debug kconfig entry for it.
Luis
Powered by blists - more mailing lists