lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <s5h7dxuizep.wl-tiwai@suse.de>
Date:   Sat, 02 May 2020 09:11:58 +0200
From:   Takashi Iwai <tiwai@...e.de>
To:     Nicholas Johnson <nicholas.johnson-opensource@...look.com.au>
Cc:     Alex Deucher <alexdeucher@...il.com>,
        "Zhou, David(ChunMing)" <David1.Zhou@....com>,
        "alsa-devel@...a-project.org" <alsa-devel@...a-project.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "amd-gfx@...ts.freedesktop.org" <amd-gfx@...ts.freedesktop.org>,
        Takashi Iwai <tiwai@...e.com>, Lukas Wunner <lukas@...ner.de>,
        "Deucher, Alexander" <Alexander.Deucher@....com>,
        "Koenig, Christian" <Christian.Koenig@....com>
Subject: Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

On Thu, 30 Apr 2020 19:49:03 +0200,
Takashi Iwai wrote:
> 
> On Thu, 30 Apr 2020 19:38:16 +0200,
> Nicholas Johnson wrote:
> > 
> > On Thu, Apr 30, 2020 at 07:01:08PM +0200, Takashi Iwai wrote:
> > > On Thu, 30 Apr 2020 18:52:20 +0200,
> > > Nicholas Johnson wrote:
> > > > 
> > > > On Thu, Apr 30, 2020 at 05:14:56PM +0200, Takashi Iwai wrote:
> > > > > On Wed, 29 Apr 2020 18:19:57 +0200,
> > > > > Alex Deucher wrote:
> > > > > > 
> > > > > > On Wed, Apr 29, 2020 at 12:05 PM Takashi Iwai <tiwai@...e.de> wrote:
> > > > > > > Well, but the code path there is the runtime PM resume of the audio
> > > > > > > device and it means that GPU must have been runtime-resumed again
> > > > > > > beforehand via the device link.  So, it should have worked from the
> > > > > > > beginning but in reality not -- that is, apparently some inconsistency
> > > > > > > is found in the initial attempt of the runtime resume...
> > > > > > 
> > > > > > Yeah, it should be covered, but I wonder if there is something in the
> > > > > > ELD update sequence that needs to call pm_runtime_get_sync()?  The ELD
> > > > > > sequence on AMD GPUs doesn't work the same as on other vendors.  The
> > > > > > GPU driver has a backdoor into the HDA device's verbs to set update
> > > > > > the audio state rather than doing it via an ELD buffer update.  We
> > > > > > still update the ELD buffer for consistency.  Maybe when the GPU
> > > > > > driver sets the audio state at monitor detection time that triggers an
> > > > > > interrupt or something on the HDA side which races with the CPU and
> > > > > > the power down of the GPU.  That still seems unlikely though since the
> > > > > > runtime pm on the GPU side defaults to a 5 second suspend timer.
> > > > > 
> > > > > I'm not sure whether it's the race between runtime suspend of GPU vs
> > > > > runtime resume of audio.  My wild guess is rather that it's the timing
> > > > > GPU notifies to the audio; then the audio driver notifies to
> > > > > user-space and user-space opens the stream, which in turn invokes the
> > > > > runtime resume of GPU. But in GPU side, it's still under processing,
> > > > > so it proceeds before the GPU finishes its initialization job.
> > > > > 
> > > > > Nicholas, could you try the patch below and see whether the problem
> > > > > still appears?  The patch artificially delays the notification and ELD
> > > > > update for 300msec.  If this works, it means the timing problem.
> > > > The bug still occurred after applying the patch.
> > > > 
> > > > But you were absolutely correct - it just needed to be increased to 
> > > > 3000ms - then the bug stopped.
> > > 
> > > Interesting.  3 seconds are too long, but I guess 1 second would work
> > > as well?
> > 1000ms indeed worked as well.
> > 
> > > 
> > > In anyway, the success with a long delay means that the sound setup
> > > after the full runtime resume of GPU seems working.
> > > 
> > > > Now the question is, what do we do now that we know this?
> > > > 
> > > > Also, are you still interested in the contents of the ELD# files? I can 
> > > > dump them all into a file at some specific moment in time which you 
> > > > request, if needed.
> > > 
> > > Yes, please take the snapshot before plugging, right after plugging
> > > and right after enabling.  I'm not sure whether your monitor supports
> > > the audio, and ELD contents should show that, at least.
> > The monitor supports the audio. There is 3.5mm audio out jack. No 
> > inbuilt speakers, although Samsung did sell a sound bar to suit it. The 
> > sound bar, which I do not own, presumably attaches via 3.5mm jack.
> > 
> > I am not sure if by plugging, you mean hot-adding Thunderbolt GPU or 
> > plugging the monitor to the GPU, so I have covered extra cases to be 
> > sure. I have taken the eld# files with the 1000ms patch applied, so the 
> > error is not triggered.
> 
> OK, thanks.  If I understand correctly...
> 
> > ####
> > Before hot-adding the Thunderbolt GPU:
> > /proc/asound/card1 not present
> > ####
> > ####
> > After hot-adding the GPU with no monitor attached:
> > 
> > /proc/asound/card1 contains:
> > eld#0.0  eld#0.1  eld#0.2  eld#0.3  eld#0.4  eld#0.5
> > 
> > All of the above have the same contents:
> > 
> > monitor_present         0
> > eld_valid               0
> > ####
> > ####
> > Monitor attached to Fiji GPU but not enabled:
> > 
> > Same as above
> > ####
> > ####
> > Monitor enabled:
> 
> ... the error is triggered at this moment, right?
> 
> 
> > All files with same contents except for eld#0.1 which looks like:
> > 
> > monitor_present         1
> > eld_valid               1
> > monitor_name            U32E850
> > connection_type         DisplayPort
> > eld_version             [0x2] CEA-861D or below
> > edid_version            [0x3] CEA-861-B, C or D
> > manufacture_id          0x2d4c
> > product_id              0xce3
> > port_id                 0x0
> > support_hdcp            0
> > support_ai              0
> > audio_sync_delay        0
> > speakers                [0x1] FL/FR
> > sad_count               1
> > sad0_coding_type        [0x1] LPCM
> > sad0_channels           2
> > sad0_rates              [0xe0] 32000 44100 48000
> > sad0_bits               [0xe0000] 16 20 24
> 
> So your monitor supports the audio :)

Now I took a look at the actual code again, and I found that I
remembered wrongly.  Namely, the device link isn't created in the
audio component framework but only in the graphics side, so currently
only i915 has it.

Could you try the patch below?

Note that i915 binding has no DL_FLAG_PM_RUNTIME as it manages the
power via get_power/put_power ops pair.


thanks,

Takashi

--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -673,6 +673,12 @@ static int amdgpu_dm_audio_component_bind(struct device *kdev,
 	struct amdgpu_device *adev = dev->dev_private;
 	struct drm_audio_component *acomp = data;
 
+	if (!device_link_add(hda_kdev, kdev, DL_FLAG_STATELESS |
+			     DL_FLAG_PM_RUNTIME)) {
+		DRM_ERROR("DM: cannot add device link to audio device\n");
+		return -ENOMEM;
+	}
+
 	acomp->ops = &amdgpu_dm_audio_component_ops;
 	acomp->dev = kdev;
 	adev->dm.audio_component = acomp;
@@ -690,6 +696,8 @@ static void amdgpu_dm_audio_component_unbind(struct device *kdev,
 	acomp->ops = NULL;
 	acomp->dev = NULL;
 	adev->dm.audio_component = NULL;
+
+	device_link_remove(hda_kdev, kdev);
 }
 
 static const struct component_ops amdgpu_dm_audio_component_bind_ops = {

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ