lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20c49150-8de3-4230-4ba1-b4db6b68d096@linux.intel.com>
Date:   Mon, 22 Aug 2016 09:27:34 +0200
From:   Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>
To:     Lyude Paul <cpaul@...hat.com>, intel-gfx@...ts.freedesktop.org,
        Ville Syrjälä <ville.syrjala@...ux.intel.com>,
        Matt Roper <matthew.d.roper@...el.com>
Cc:     Daniel Vetter <daniel.vetter@...ll.ch>,
        linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
        Daniel Vetter <daniel.vetter@...el.com>, stable@...r.kernel.org
Subject: Re: [PATCH v12 2/7] drm/i915/skl: Add support for the SAGV, fix
 underrun hangs

Op 18-08-16 om 16:05 schreef Lyude Paul:
> On Thu, 2016-08-18 at 09:39 +0200, Maarten Lankhorst wrote:
>> Hey,
>>
>> Op 17-08-16 om 21:55 schreef Lyude:
>>> Since the watermark calculations for Skylake are still broken, we're apt
>>> to hitting underruns very easily under multi-monitor configurations.
>>> While it would be lovely if this was fixed, it's not. Another problem
>>> that's been coming from this however, is the mysterious issue of
>>> underruns causing full system hangs. An easy way to reproduce this with
>>> a skylake system:
>>>
>>> - Get a laptop with a skylake GPU, and hook up two external monitors to
>>>   it
>>> - Move the cursor from the built-in LCD to one of the external displays
>>>   as quickly as you can
>>> - You'll get a few pipe underruns, and eventually the entire system will
>>>   just freeze.
>>>
>>> After doing a lot of investigation and reading through the bspec, I
>>> found the existence of the SAGV, which is responsible for adjusting the
>>> system agent voltage and clock frequencies depending on how much power
>>> we need. According to the bspec:
>>>
>>> "The display engine access to system memory is blocked during the
>>>  adjustment time. SAGV defaults to enabled. Software must use the
>>>  GT-driver pcode mailbox to disable SAGV when the display engine is not
>>>  able to tolerate the blocking time."
>>>
>>> The rest of the bspec goes on to explain that software can simply leave
>>> the SAGV enabled, and disable it when we use interlaced pipes/have more
>>> then one pipe active.
>>>
>>> Sure enough, with this patchset the system hangs resulting from pipe
>>> underruns on Skylake have completely vanished on my T460s. Additionally,
>>> the bspec mentions turning off the SAGV	with more then one pipe
>>> enabled
>>> as a workaround for display underruns. While this patch doesn't entirely
>>> fix that, it looks like it does improve the situation a little bit so
>>> it's likely this is going to be required to make watermarks on Skylake
>>> fully functional.
>>>
>>> This will still need additional work in the future: we shouldn't be
>>> enabling the SAGV if any of the currently enabled planes can't enable WM
>>> levels that introduce latencies >= 30 µs.
>>>
>>> Changes since v11:
>>>  - Add skl_can_enable_sagv()
>>>  - Make sure we don't enable SAGV when not all planes can enable
>>>    watermarks >= the SAGV engine block time. I was originally going to
>>>    save this for later, but I recently managed to run into a machine
>>>    that was having problems with a single pipe configuration + SAGV.
>>>  - Make comparisons to I915_SKL_SAGV_NOT_CONTROLLED explicit
>>>  - Change I915_SAGV_DYNAMIC_FREQ to I915_SAGV_ENABLE
>>>  - Move printks outside of mutexes
>>>  - Don't print error messages twice
>>> Changes since v10:
>>>  - Apparently sandybridge_pcode_read actually writes values and reads
>>>    them back, despite it's misleading function name. This means we've
>>>    been doing this mostly wrong and have been writing garbage to the
>>>    SAGV control. Because of this, we no longer attempt to read the SAGV
>>>    status during initialization (since there are no helpers for this).
>>>  - mlankhorst noticed that this patch was breaking on some very early
>>>    pre-release Skylake machines, which apparently don't allow you to
>>>    disable the SAGV. To prevent machines from failing tests due to SAGV
>>>    errors, if the first time we try to control the SAGV results in the
>>>    mailbox indicating an invalid command, we just disable future attempts
>>>    to control the SAGV state by setting dev_priv->skl_sagv_status to
>>>    I915_SKL_SAGV_NOT_CONTROLLED and make a note of it in dmesg.
>>>  - Move mutex_unlock() a little higher in skl_enable_sagv(). This
>>>    doesn't actually fix anything, but lets us release the lock a little
>>>    sooner since we're finished with it.
>>> Changes since v9:
>>>  - Only enable/disable sagv on Skylake
>>> Changes since v8:
>>>  - Add intel_state->modeset guard to the conditional for
>>>    skl_enable_sagv()
>>> Changes since v7:
>>>  - Remove GEN9_SAGV_LOW_FREQ, replace with GEN9_SAGV_IS_ENABLED (that's
>>>    all we use it for anyway)
>>>  - Use GEN9_SAGV_IS_ENABLED instead of 0x1 for clarification
>>>  - Fix a styling error that snuck past me
>>> Changes since v6:
>>>  - Protect skl_enable_sagv() with intel_state->modeset conditional in
>>>    intel_atomic_commit_tail()
>>> Changes since v5:
>>>  - Don't use is_power_of_2. Makes things confusing
>>>  - Don't use the old state to figure out whether or not to
>>>    enable/disable the sagv, use the new one
>>>  - Split the loop in skl_disable_sagv into it's own function
>>>  - Move skl_sagv_enable/disable() calls into intel_atomic_commit_tail()
>>> Changes since v4:
>>>  - Use is_power_of_2 against active_crtcs to check whether we have > 1
>>>    pipe enabled
>>>  - Fix skl_sagv_get_hw_state(): (temp & 0x1) indicates disabled, 0x0
>>>    enabled
>>>  - Call skl_sagv_enable/disable() from pre/post-plane updates
>>> Changes since v3:
>>>  - Use time_before() to compare timeout to jiffies
>>> Changes since v2:
>>>  - Really apply minor style nitpicks to patch this time
>>> Changes since v1:
>>>  - Added comments about this probably being one of the requirements to
>>>    fixing Skylake's watermark issues
>>>  - Minor style nitpicks from Matt Roper
>>>  - Disable these functions on Broxton, since it doesn't have an SAGV
>>>
>>> Signed-off-by: Lyude <cpaul@...hat.com>
>>> Cc: Matt Roper <matthew.d.roper@...el.com>
>>> Cc: Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>
>>> Cc: Daniel Vetter <daniel.vetter@...ll.ch>
>>> Cc: Ville Syrjälä <ville.syrjala@...ux.intel.com>
>>> Cc: stable@...r.kernel.org
>>> ---
>>>  drivers/gpu/drm/i915/i915_drv.h      |   7 ++
>>>  drivers/gpu/drm/i915/i915_reg.h      |   4 +
>>>  drivers/gpu/drm/i915/intel_display.c |  11 +++
>>>  drivers/gpu/drm/i915/intel_drv.h     |   3 +
>>>  drivers/gpu/drm/i915/intel_pm.c      | 148
>>> +++++++++++++++++++++++++++++++++++
>>>  5 files changed, 173 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>> b/drivers/gpu/drm/i915/i915_drv.h
>>> index 35caa9b..f20530bb 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -1949,6 +1949,13 @@ struct drm_i915_private {
>>>  	struct i915_suspend_saved_registers regfile;
>>>  	struct vlv_s0ix_state vlv_s0ix_state;
>>>  
>>> +	enum {
>>> +		I915_SKL_SAGV_UNKNOWN = 0,
>>> +		I915_SKL_SAGV_DISABLED,
>>> +		I915_SKL_SAGV_ENABLED,
>>> +		I915_SKL_SAGV_NOT_CONTROLLED
>>> +	} skl_sagv_status;
>>> +
>>>  	struct {
>>>  		/*
>>>  		 * Raw watermark latency values:
>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>>> b/drivers/gpu/drm/i915/i915_reg.h
>>> index 7419fbf..be82c49 100644
>>> --- a/drivers/gpu/drm/i915/i915_reg.h
>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>>> @@ -7153,6 +7153,10 @@ enum {
>>>  #define   HSW_PCODE_DE_WRITE_FREQ_REQ		0x17
>>>  #define   DISPLAY_IPS_CONTROL			0x19
>>>  #define	  HSW_PCODE_DYNAMIC_DUTY_CYCLE_CONTROL	0x1A
>>> +#define   GEN9_PCODE_SAGV_CONTROL		0x21
>>> +#define     GEN9_SAGV_DISABLE			0x0
>>> +#define     GEN9_SAGV_IS_DISABLED		0x1
>>> +#define     GEN9_SAGV_ENABLE 	             0x3
>>>  #define GEN6_PCODE_DATA				_MMIO(0x138128)
>>>  #define   GEN6_PCODE_FREQ_IA_RATIO_SHIFT	8
>>>  #define   GEN6_PCODE_FREQ_RING_RATIO_SHIFT	16
>>> diff --git a/drivers/gpu/drm/i915/intel_display.c
>>> b/drivers/gpu/drm/i915/intel_display.c
>>> index 781d22e..ca4b83f 100644
>>> --- a/drivers/gpu/drm/i915/intel_display.c
>>> +++ b/drivers/gpu/drm/i915/intel_display.c
>>> @@ -14156,6 +14156,13 @@ static void intel_atomic_commit_tail(struct
>>> drm_atomic_state *state)
>>>  		     intel_state->cdclk_pll_vco != dev_priv-
>>>> cdclk_pll.vco))
>>>  			dev_priv->display.modeset_commit_cdclk(state);
>>>  
>>> +		/*
>>> +		 * SKL workaround: bspec recommends we disable the SAGV
>>> when we
>>> +		 * have more then one pipe enabled
>>> +		 */
>>> +		if (IS_SKYLAKE(dev_priv) && !skl_can_enable_sagv(state))
>>> +			skl_disable_sagv(dev_priv);
>>> +
>>>  		intel_modeset_verify_disabled(dev);
>>>  	}
>>>  
>>> @@ -14229,6 +14236,10 @@ static void intel_atomic_commit_tail(struct
>>> drm_atomic_state *state)
>>>  		intel_modeset_verify_crtc(crtc, old_crtc_state, crtc-
>>>> state);
>>>  	}
>>>  
>>> +	if (IS_SKYLAKE(dev_priv) && intel_state->modeset &&
>>> +	    skl_can_enable_sagv(state))
>>> +		skl_enable_sagv(dev_priv);
>>> +
>>>  	drm_atomic_helper_commit_hw_done(state);
>>>  
>>>  	if (intel_state->modeset)
>>> diff --git a/drivers/gpu/drm/i915/intel_drv.h
>>> b/drivers/gpu/drm/i915/intel_drv.h
>>> index 1c700b0..d203c77 100644
>>> --- a/drivers/gpu/drm/i915/intel_drv.h
>>> +++ b/drivers/gpu/drm/i915/intel_drv.h
>>> @@ -1722,6 +1722,9 @@ void ilk_wm_get_hw_state(struct drm_device *dev);
>>>  void skl_wm_get_hw_state(struct drm_device *dev);
>>>  void skl_ddb_get_hw_state(struct drm_i915_private *dev_priv,
>>>  			  struct skl_ddb_allocation *ddb /* out */);
>>> +bool skl_can_enable_sagv(struct drm_atomic_state *state);
>>> +int skl_enable_sagv(struct drm_i915_private *dev_priv);
>>> +int skl_disable_sagv(struct drm_i915_private *dev_priv);
>>>  uint32_t ilk_pipe_pixel_rate(const struct intel_crtc_state *pipe_config);
>>>  bool ilk_disable_lp_wm(struct drm_device *dev);
>>>  int sanitize_rc6_option(struct drm_i915_private *dev_priv, int enable_rc6);
>>> diff --git a/drivers/gpu/drm/i915/intel_pm.c
>>> b/drivers/gpu/drm/i915/intel_pm.c
>>> index b4cf988..fed2bae8 100644
>>> --- a/drivers/gpu/drm/i915/intel_pm.c
>>> +++ b/drivers/gpu/drm/i915/intel_pm.c
>>> @@ -2860,6 +2860,7 @@ bool ilk_disable_lp_wm(struct drm_device *dev)
>>>  
>>>  #define SKL_DDB_SIZE		896	/* in blocks */
>>>  #define BXT_DDB_SIZE		512
>>> +#define SKL_SAGV_BLOCK_TIME 30 /* µs */
>>>  
>>>  /*
>>>   * Return the index of a plane in the SKL DDB and wm result
>>> arrays.  Primary
>>> @@ -2883,6 +2884,153 @@ skl_wm_plane_id(const struct intel_plane *plane)
>>>  	}
>>>  }
>>>  
>>> +/*
>>> + * SAGV dynamically adjusts the system agent voltage and clock frequencies
>>> + * depending on power and performance requirements. The display engine
>>> access
>>> + * to system memory is blocked during the adjustment time. Because of the
>>> + * blocking time, having this enabled can cause full system hangs and/or
>>> pipe
>>> + * underruns if we don't meet all of the following requirements:
>>> + *
>>> + *  - <= 1 pipe enabled
>>> + *  - All planes can enable watermarks for latencies >= SAGV engine block
>>> time
>>> + *  - We're not using an interlaced display configuration
>>> + */
>>> +int
>>> +skl_enable_sagv(struct drm_i915_private *dev_priv)
>>> +{
>>> +	int ret;
>>> +
>>> +	if (dev_priv->skl_sagv_status == I915_SKL_SAGV_NOT_CONTROLLED ||
>>> +	    dev_priv->skl_sagv_status == I915_SKL_SAGV_ENABLED)
>>> +		return 0;
>>> +
>>> +	DRM_DEBUG_KMS("Enabling the SAGV\n");
>>> +	mutex_lock(&dev_priv->rps.hw_lock);
>>> +
>>> +	ret = sandybridge_pcode_write(dev_priv, GEN9_PCODE_SAGV_CONTROL,
>>> +				      GEN9_SAGV_ENABLE);
>>> +
>>> +	/* We don't need to wait for the SAGV when enabling */
>>> +	mutex_unlock(&dev_priv->rps.hw_lock);
>>> +
>>> +	/*
>>> +	 * Some skl systems, pre-release machines in particular,
>>> +	 * don't actually have an SAGV.
>>> +	 */
>>> +	if (ret == -ENOSYS) {
>>> +		DRM_DEBUG_DRIVER("No SAGV found on system, ignoring\n");
>>> +		dev_priv->skl_sagv_status = I915_SKL_SAGV_NOT_CONTROLLED;
>>> +		return 0;
>>> +	} else if (ret < 0) {
>>> +		DRM_ERROR("Failed to enable the SAGV\n");
>>> +		return ret;
>>> +	}
>>> +
>>> +	dev_priv->skl_sagv_status = I915_SKL_SAGV_ENABLED;
>>> +	return 0;
>>> +}
>>> +
>>> +static int
>>> +skl_do_sagv_disable(struct drm_i915_private *dev_priv)
>>> +{
>>> +	int ret;
>>> +	uint32_t temp = GEN9_SAGV_DISABLE;
>>> +
>>> +	ret = sandybridge_pcode_read(dev_priv, GEN9_PCODE_SAGV_CONTROL,
>>> +				     &temp);
>>> +	if (ret)
>>> +		return ret;
>>> +	else
>>> +		return temp & GEN9_SAGV_IS_DISABLED;
>>> +}
>>> +
>>> +int
>>> +skl_disable_sagv(struct drm_i915_private *dev_priv)
>>> +{
>>> +	int ret, result;
>>> +
>>> +	if (dev_priv->skl_sagv_status == I915_SKL_SAGV_NOT_CONTROLLED ||
>>> +	    dev_priv->skl_sagv_status == I915_SKL_SAGV_DISABLED)
>>> +		return 0;
>>> +
>>> +	DRM_DEBUG_KMS("Disabling the SAGV\n");
>>> +	mutex_lock(&dev_priv->rps.hw_lock);
>>> +
>>> +	/* bspec says to keep retrying for at least 1 ms */
>>> +	ret = wait_for(result = skl_do_sagv_disable(dev_priv), 1);
>>> +	mutex_unlock(&dev_priv->rps.hw_lock);
>>> +
>>> +	if (ret == -ETIMEDOUT) {
>>> +		DRM_ERROR("Request to disable SAGV timed out\n");
>>> +		return -ETIMEDOUT;
>>> +	}
>>> +
>>> +	/*
>>> +	 * Some skl systems, pre-release machines in particular,
>>> +	 * don't actually have an SAGV.
>>> +	 */
>>> +	if (result == -ENOSYS) {
>>> +		DRM_DEBUG_DRIVER("No SAGV found on system, ignoring\n");
>>> +		dev_priv->skl_sagv_status = I915_SKL_SAGV_NOT_CONTROLLED;
>>> +		return 0;
>>> +	} else if (result < 0) {
>>> +		DRM_ERROR("Failed to disable the SAGV\n");
>>> +		return result;
>>> +	}
>>> +
>>> +	dev_priv->skl_sagv_status = I915_SKL_SAGV_DISABLED;
>>> +	return 0;
>>> +}
>>> +
>>> +bool skl_can_enable_sagv(struct drm_atomic_state *state)
>>> +{
>>> +	struct drm_device *dev = state->dev;
>>> +	struct drm_i915_private *dev_priv = to_i915(dev);
>>> +	struct intel_atomic_state *intel_state =
>>> to_intel_atomic_state(state);
>>> +	struct drm_crtc *crtc;
>>> +	enum pipe pipe;
>>> +	int level, plane;
>>> +
>>> +	/*
>>> +	 * SKL workaround: bspec recommends we disable the SAGV when we
>>> have
>>> +	 * more then one pipe enabled
>>> +	 *
>>> +	 * If there are no active CRTCs, no additional checks need be
>>> performed
>>> +	 */
>>> +	if (hweight32(intel_state->active_crtcs) == 0)
>>> +		return true;
>>> +	else if (hweight32(intel_state->active_crtcs) > 1)
>>> +		return false;
>>> +
>>> +	/* Since we're now guaranteed to only have one active CRTC... */
>>> +	pipe = ffs(intel_state->active_crtcs) - 1;
>>> +	crtc = dev_priv->pipe_to_crtc_mapping[pipe];
>>> +
>>> +	if (crtc->state->mode.flags & DRM_MODE_FLAG_INTERLACE)
>>> +		return false;
>>> +
>>> +	for_each_plane(dev_priv, pipe, plane) {
>>> +		/* Skip this plane if it's not enabled */
>>> +		if (intel_state->wm_results.plane[pipe][plane][0] == 0)
>>> +			continue;
>>> +
>>> +		/* Find the highest enabled wm level for this plane */
>>> +		for (level = ilk_wm_max_level(dev);
>>> +		     intel_state->wm_results.plane[pipe][plane][level] ==
>>> 0;
>>> +		     --level);
>>> +
>>> +		/*
>>> +		 * If any of the planes on this pipe don't enable wm levels
>>> +		 * that incur memory latencies higher then 30µs we can't
>>> enable
>>> +		 * the SAGV
>>> +		 */
>>> +		if (dev_priv->wm.skl_latency[level] < SKL_SAGV_BLOCK_TIME)
>>> +			return false;
>> Shouldn't this check be >= BLOCK_TIME?
>>
> That's the requirement for the sagv but the conditional here is still correct.
>
> WM0 - 2ms
> WM1 - 5ms
> WM2 - 10ms
> WM3 - 20ms
> WM4+- disabled
>
> (20ms < BLOCK_TIME) == true, which indicates we don't have any watermark levels
> with latency values >= 30ms. We can't enable so return false.
>
> WM0 - 2ms
> WM1 - 5ms
> WM2 - 10ms
> WM3 - 20ms
> WM4 - 33ms
> WM5 - 50ms
> WM6 - 70ms
> WM7 - 99ms
>
> (99ms < BLOCK_TIME) == false, and 99 >= BLOCK_TIME so we end up returning true
> to indicate it's safe to enable.
Ah right, thanks for explaining.

~Maarten

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ