linux-kernel - Re: [PATCH v12 2/7] drm/i915/skl: Add support for the SAGV, fix underrun hangs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b0bc1bec-7ec2-4575-f83e-119bd240f3eb@linux.intel.com>
Date:	Thu, 18 Aug 2016 09:39:03 +0200
From:	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>
To:	Lyude <cpaul@...hat.com>, intel-gfx@...ts.freedesktop.org,
	Ville Syrjälä <ville.syrjala@...ux.intel.com>,
	Matt Roper <matthew.d.roper@...el.com>
Cc:	Daniel Vetter <daniel.vetter@...ll.ch>, stable@...r.kernel.org,
	Daniel Vetter <daniel.vetter@...el.com>,
	Jani Nikula <jani.nikula@...ux.intel.com>,
	David Airlie <airlied@...ux.ie>,
	dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v12 2/7] drm/i915/skl: Add support for the SAGV, fix
 underrun hangs

Hey,

Op 17-08-16 om 21:55 schreef Lyude:
> Since the watermark calculations for Skylake are still broken, we're apt
> to hitting underruns very easily under multi-monitor configurations.
> While it would be lovely if this was fixed, it's not. Another problem
> that's been coming from this however, is the mysterious issue of
> underruns causing full system hangs. An easy way to reproduce this with
> a skylake system:
>
> - Get a laptop with a skylake GPU, and hook up two external monitors to
>   it
> - Move the cursor from the built-in LCD to one of the external displays
>   as quickly as you can
> - You'll get a few pipe underruns, and eventually the entire system will
>   just freeze.
>
> After doing a lot of investigation and reading through the bspec, I
> found the existence of the SAGV, which is responsible for adjusting the
> system agent voltage and clock frequencies depending on how much power
> we need. According to the bspec:
>
> "The display engine access to system memory is blocked during the
>  adjustment time. SAGV defaults to enabled. Software must use the
>  GT-driver pcode mailbox to disable SAGV when the display engine is not
>  able to tolerate the blocking time."
>
> The rest of the bspec goes on to explain that software can simply leave
> the SAGV enabled, and disable it when we use interlaced pipes/have more
> then one pipe active.
>
> Sure enough, with this patchset the system hangs resulting from pipe
> underruns on Skylake have completely vanished on my T460s. Additionally,
> the bspec mentions turning off the SAGV	with more then one pipe enabled
> as a workaround for display underruns. While this patch doesn't entirely
> fix that, it looks like it does improve the situation a little bit so
> it's likely this is going to be required to make watermarks on Skylake
> fully functional.
>
> This will still need additional work in the future: we shouldn't be
> enabling the SAGV if any of the currently enabled planes can't enable WM
> levels that introduce latencies >= 30 µs.
>
> Changes since v11:
>  - Add skl_can_enable_sagv()
>  - Make sure we don't enable SAGV when not all planes can enable
>    watermarks >= the SAGV engine block time. I was originally going to
>    save this for later, but I recently managed to run into a machine
>    that was having problems with a single pipe configuration + SAGV.
>  - Make comparisons to I915_SKL_SAGV_NOT_CONTROLLED explicit
>  - Change I915_SAGV_DYNAMIC_FREQ to I915_SAGV_ENABLE
>  - Move printks outside of mutexes
>  - Don't print error messages twice
> Changes since v10:
>  - Apparently sandybridge_pcode_read actually writes values and reads
>    them back, despite it's misleading function name. This means we've
>    been doing this mostly wrong and have been writing garbage to the
>    SAGV control. Because of this, we no longer attempt to read the SAGV
>    status during initialization (since there are no helpers for this).
>  - mlankhorst noticed that this patch was breaking on some very early
>    pre-release Skylake machines, which apparently don't allow you to
>    disable the SAGV. To prevent machines from failing tests due to SAGV
>    errors, if the first time we try to control the SAGV results in the
>    mailbox indicating an invalid command, we just disable future attempts
>    to control the SAGV state by setting dev_priv->skl_sagv_status to
>    I915_SKL_SAGV_NOT_CONTROLLED and make a note of it in dmesg.
>  - Move mutex_unlock() a little higher in skl_enable_sagv(). This
>    doesn't actually fix anything, but lets us release the lock a little
>    sooner since we're finished with it.
> Changes since v9:
>  - Only enable/disable sagv on Skylake
> Changes since v8:
>  - Add intel_state->modeset guard to the conditional for
>    skl_enable_sagv()
> Changes since v7:
>  - Remove GEN9_SAGV_LOW_FREQ, replace with GEN9_SAGV_IS_ENABLED (that's
>    all we use it for anyway)
>  - Use GEN9_SAGV_IS_ENABLED instead of 0x1 for clarification
>  - Fix a styling error that snuck past me
> Changes since v6:
>  - Protect skl_enable_sagv() with intel_state->modeset conditional in
>    intel_atomic_commit_tail()
> Changes since v5:
>  - Don't use is_power_of_2. Makes things confusing
>  - Don't use the old state to figure out whether or not to
>    enable/disable the sagv, use the new one
>  - Split the loop in skl_disable_sagv into it's own function
>  - Move skl_sagv_enable/disable() calls into intel_atomic_commit_tail()
> Changes since v4:
>  - Use is_power_of_2 against active_crtcs to check whether we have > 1
>    pipe enabled
>  - Fix skl_sagv_get_hw_state(): (temp & 0x1) indicates disabled, 0x0
>    enabled
>  - Call skl_sagv_enable/disable() from pre/post-plane updates
> Changes since v3:
>  - Use time_before() to compare timeout to jiffies
> Changes since v2:
>  - Really apply minor style nitpicks to patch this time
> Changes since v1:
>  - Added comments about this probably being one of the requirements to
>    fixing Skylake's watermark issues
>  - Minor style nitpicks from Matt Roper
>  - Disable these functions on Broxton, since it doesn't have an SAGV
>
> Signed-off-by: Lyude <cpaul@...hat.com>
> Cc: Matt Roper <matthew.d.roper@...el.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>
> Cc: Daniel Vetter <daniel.vetter@...ll.ch>
> Cc: Ville Syrjälä <ville.syrjala@...ux.intel.com>
> Cc: stable@...r.kernel.org
> ---
>  drivers/gpu/drm/i915/i915_drv.h      |   7 ++
>  drivers/gpu/drm/i915/i915_reg.h      |   4 +
>  drivers/gpu/drm/i915/intel_display.c |  11 +++
>  drivers/gpu/drm/i915/intel_drv.h     |   3 +
>  drivers/gpu/drm/i915/intel_pm.c      | 148 +++++++++++++++++++++++++++++++++++
>  5 files changed, 173 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 35caa9b..f20530bb 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1949,6 +1949,13 @@ struct drm_i915_private {
>  	struct i915_suspend_saved_registers regfile;
>  	struct vlv_s0ix_state vlv_s0ix_state;
>  
> +	enum {
> +		I915_SKL_SAGV_UNKNOWN = 0,
> +		I915_SKL_SAGV_DISABLED,
> +		I915_SKL_SAGV_ENABLED,
> +		I915_SKL_SAGV_NOT_CONTROLLED
> +	} skl_sagv_status;
> +
>  	struct {
>  		/*
>  		 * Raw watermark latency values:
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 7419fbf..be82c49 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -7153,6 +7153,10 @@ enum {
>  #define   HSW_PCODE_DE_WRITE_FREQ_REQ		0x17
>  #define   DISPLAY_IPS_CONTROL			0x19
>  #define	  HSW_PCODE_DYNAMIC_DUTY_CYCLE_CONTROL	0x1A
> +#define   GEN9_PCODE_SAGV_CONTROL		0x21
> +#define     GEN9_SAGV_DISABLE			0x0
> +#define     GEN9_SAGV_IS_DISABLED		0x1
> +#define     GEN9_SAGV_ENABLE 	             0x3
>  #define GEN6_PCODE_DATA				_MMIO(0x138128)
>  #define   GEN6_PCODE_FREQ_IA_RATIO_SHIFT	8
>  #define   GEN6_PCODE_FREQ_RING_RATIO_SHIFT	16
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 781d22e..ca4b83f 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -14156,6 +14156,13 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
>  		     intel_state->cdclk_pll_vco != dev_priv->cdclk_pll.vco))
>  			dev_priv->display.modeset_commit_cdclk(state);
>  
> +		/*
> +		 * SKL workaround: bspec recommends we disable the SAGV when we
> +		 * have more then one pipe enabled
> +		 */
> +		if (IS_SKYLAKE(dev_priv) && !skl_can_enable_sagv(state))
> +			skl_disable_sagv(dev_priv);
> +
>  		intel_modeset_verify_disabled(dev);
>  	}
>  
> @@ -14229,6 +14236,10 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
>  		intel_modeset_verify_crtc(crtc, old_crtc_state, crtc->state);
>  	}
>  
> +	if (IS_SKYLAKE(dev_priv) && intel_state->modeset &&
> +	    skl_can_enable_sagv(state))
> +		skl_enable_sagv(dev_priv);
> +
>  	drm_atomic_helper_commit_hw_done(state);
>  
>  	if (intel_state->modeset)
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 1c700b0..d203c77 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -1722,6 +1722,9 @@ void ilk_wm_get_hw_state(struct drm_device *dev);
>  void skl_wm_get_hw_state(struct drm_device *dev);
>  void skl_ddb_get_hw_state(struct drm_i915_private *dev_priv,
>  			  struct skl_ddb_allocation *ddb /* out */);
> +bool skl_can_enable_sagv(struct drm_atomic_state *state);
> +int skl_enable_sagv(struct drm_i915_private *dev_priv);
> +int skl_disable_sagv(struct drm_i915_private *dev_priv);
>  uint32_t ilk_pipe_pixel_rate(const struct intel_crtc_state *pipe_config);
>  bool ilk_disable_lp_wm(struct drm_device *dev);
>  int sanitize_rc6_option(struct drm_i915_private *dev_priv, int enable_rc6);
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index b4cf988..fed2bae8 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -2860,6 +2860,7 @@ bool ilk_disable_lp_wm(struct drm_device *dev)
>  
>  #define SKL_DDB_SIZE		896	/* in blocks */
>  #define BXT_DDB_SIZE		512
> +#define SKL_SAGV_BLOCK_TIME 30 /* µs */
>  
>  /*
>   * Return the index of a plane in the SKL DDB and wm result arrays.  Primary
> @@ -2883,6 +2884,153 @@ skl_wm_plane_id(const struct intel_plane *plane)
>  	}
>  }
>  
> +/*
> + * SAGV dynamically adjusts the system agent voltage and clock frequencies
> + * depending on power and performance requirements. The display engine access
> + * to system memory is blocked during the adjustment time. Because of the
> + * blocking time, having this enabled can cause full system hangs and/or pipe
> + * underruns if we don't meet all of the following requirements:
> + *
> + *  - <= 1 pipe enabled
> + *  - All planes can enable watermarks for latencies >= SAGV engine block time
> + *  - We're not using an interlaced display configuration
> + */
> +int
> +skl_enable_sagv(struct drm_i915_private *dev_priv)
> +{
> +	int ret;
> +
> +	if (dev_priv->skl_sagv_status == I915_SKL_SAGV_NOT_CONTROLLED ||
> +	    dev_priv->skl_sagv_status == I915_SKL_SAGV_ENABLED)
> +		return 0;
> +
> +	DRM_DEBUG_KMS("Enabling the SAGV\n");
> +	mutex_lock(&dev_priv->rps.hw_lock);
> +
> +	ret = sandybridge_pcode_write(dev_priv, GEN9_PCODE_SAGV_CONTROL,
> +				      GEN9_SAGV_ENABLE);
> +
> +	/* We don't need to wait for the SAGV when enabling */
> +	mutex_unlock(&dev_priv->rps.hw_lock);
> +
> +	/*
> +	 * Some skl systems, pre-release machines in particular,
> +	 * don't actually have an SAGV.
> +	 */
> +	if (ret == -ENOSYS) {
> +		DRM_DEBUG_DRIVER("No SAGV found on system, ignoring\n");
> +		dev_priv->skl_sagv_status = I915_SKL_SAGV_NOT_CONTROLLED;
> +		return 0;
> +	} else if (ret < 0) {
> +		DRM_ERROR("Failed to enable the SAGV\n");
> +		return ret;
> +	}
> +
> +	dev_priv->skl_sagv_status = I915_SKL_SAGV_ENABLED;
> +	return 0;
> +}
> +
> +static int
> +skl_do_sagv_disable(struct drm_i915_private *dev_priv)
> +{
> +	int ret;
> +	uint32_t temp = GEN9_SAGV_DISABLE;
> +
> +	ret = sandybridge_pcode_read(dev_priv, GEN9_PCODE_SAGV_CONTROL,
> +				     &temp);
> +	if (ret)
> +		return ret;
> +	else
> +		return temp & GEN9_SAGV_IS_DISABLED;
> +}
> +
> +int
> +skl_disable_sagv(struct drm_i915_private *dev_priv)
> +{
> +	int ret, result;
> +
> +	if (dev_priv->skl_sagv_status == I915_SKL_SAGV_NOT_CONTROLLED ||
> +	    dev_priv->skl_sagv_status == I915_SKL_SAGV_DISABLED)
> +		return 0;
> +
> +	DRM_DEBUG_KMS("Disabling the SAGV\n");
> +	mutex_lock(&dev_priv->rps.hw_lock);
> +
> +	/* bspec says to keep retrying for at least 1 ms */
> +	ret = wait_for(result = skl_do_sagv_disable(dev_priv), 1);
> +	mutex_unlock(&dev_priv->rps.hw_lock);
> +
> +	if (ret == -ETIMEDOUT) {
> +		DRM_ERROR("Request to disable SAGV timed out\n");
> +		return -ETIMEDOUT;
> +	}
> +
> +	/*
> +	 * Some skl systems, pre-release machines in particular,
> +	 * don't actually have an SAGV.
> +	 */
> +	if (result == -ENOSYS) {
> +		DRM_DEBUG_DRIVER("No SAGV found on system, ignoring\n");
> +		dev_priv->skl_sagv_status = I915_SKL_SAGV_NOT_CONTROLLED;
> +		return 0;
> +	} else if (result < 0) {
> +		DRM_ERROR("Failed to disable the SAGV\n");
> +		return result;
> +	}
> +
> +	dev_priv->skl_sagv_status = I915_SKL_SAGV_DISABLED;
> +	return 0;
> +}
> +
> +bool skl_can_enable_sagv(struct drm_atomic_state *state)
> +{
> +	struct drm_device *dev = state->dev;
> +	struct drm_i915_private *dev_priv = to_i915(dev);
> +	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> +	struct drm_crtc *crtc;
> +	enum pipe pipe;
> +	int level, plane;
> +
> +	/*
> +	 * SKL workaround: bspec recommends we disable the SAGV when we have
> +	 * more then one pipe enabled
> +	 *
> +	 * If there are no active CRTCs, no additional checks need be performed
> +	 */
> +	if (hweight32(intel_state->active_crtcs) == 0)
> +		return true;
> +	else if (hweight32(intel_state->active_crtcs) > 1)
> +		return false;
> +
> +	/* Since we're now guaranteed to only have one active CRTC... */
> +	pipe = ffs(intel_state->active_crtcs) - 1;
> +	crtc = dev_priv->pipe_to_crtc_mapping[pipe];
> +
> +	if (crtc->state->mode.flags & DRM_MODE_FLAG_INTERLACE)
> +		return false;
> +
> +	for_each_plane(dev_priv, pipe, plane) {
> +		/* Skip this plane if it's not enabled */
> +		if (intel_state->wm_results.plane[pipe][plane][0] == 0)
> +			continue;
> +
> +		/* Find the highest enabled wm level for this plane */
> +		for (level = ilk_wm_max_level(dev);
> +		     intel_state->wm_results.plane[pipe][plane][level] == 0;
> +		     --level);
> +
> +		/*
> +		 * If any of the planes on this pipe don't enable wm levels
> +		 * that incur memory latencies higher then 30µs we can't enable
> +		 * the SAGV
> +		 */
> +		if (dev_priv->wm.skl_latency[level] < SKL_SAGV_BLOCK_TIME)
> +			return false;
Shouldn't this check be >= BLOCK_TIME?

~Maarten