Message-ID: <20150522115805.GR20750@edamame.cdg.redhat.com>
Date:	Fri, 22 May 2015 13:58:05 +0200
From:	Christophe Fergeau <cfergeau@...hat.com>
To:	Frediano Ziglio <fziglio@...hat.com>
Cc:	spice-devel@...ts.freedesktop.org, David Airlie <airlied@...ux.ie>,
	dri-devel@...ts.freedesktop.org, Dave Airlie <airlied@...hat.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [Spice-devel] [PATCH] Do not loop on ERESTARTSYS using
 interruptible waits

Hey,

On Tue, May 19, 2015 at 05:54:54AM -0400, Frediano Ziglio wrote:
> This problem happens when using KMS surfaces with the QXL driver.
> To reproduce it easily, use KDE Plasma (which uses surfaces a lot) and make
> sure you are using KMS surfaces (the QXL driver on Fedora/Red Hat has a patch
> to stop using them). Open a complex application such as LibreOffice and
> after a while the machine gets stuck with Xorg using 100% CPU.
> The problem occurs because interruptible waits are used when creating new
> surfaces; however, instead of returning ERESTARTSYS to userspace, the
> code loops, and the wait routines keep returning ERESTARTSYS once the
> signal is marked.
> Under out-of-memory conditions the TTM module tries to move objects to
> system memory, and QXL makes sure the surface is updated before the move.
> The fix handles this case differently, using a non-interruptible wait so
> that the wait functions wait instead of returning ERESTARTSYS.
> Note that when the loop occurs the driver sends a lot of update requests,
> causing more CPU usage on the QEMU side too.
> 
> Signed-off-by: Frediano Ziglio <fziglio@...hat.com>
> ---
>  qxl/qxl_cmd.c   | 12 +++---------
>  qxl/qxl_drv.h   |  2 +-
>  qxl/qxl_ioctl.c |  2 +-
>  3 files changed, 5 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/qxl/qxl_cmd.c b/drivers/gpu/drm/qxl/qxl_cmd.c
> index 9782364..bd5404e 100644
> --- a/drivers/gpu/drm/qxl/qxl_cmd.c
> +++ b/drivers/gpu/drm/qxl/qxl_cmd.c
> @@ -317,14 +317,11 @@ static void wait_for_io_cmd(struct qxl_device *qdev, uint8_t val, long port)
>  {
>  	int ret;
>  
> -restart:
>  	ret = wait_for_io_cmd_user(qdev, val, port, false);
> -	if (ret == -ERESTARTSYS)
> -		goto restart;

I think this one is not directly related to the fix, but it can be removed
because wait_for_io_cmd_user(qdev, val, port, false) calls
wait_event_timeout(), which cannot return ERESTARTSYS? Or was this loop
causing issues too?
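
To make the question above concrete, here is a minimal userspace sketch of
the return-value contract being discussed. It is a model only, not kernel
code: struct wait_model_task and model_wait() are hypothetical names, and the
"intr" flag is assumed to select between wait_event_interruptible_timeout()
and wait_event_timeout() style behaviour, as the last argument of
wait_for_io_cmd_user() appears to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ERESTARTSYS 512	/* value the Linux kernel uses internally */

/* Hypothetical stand-in for the calling task's pending-signal state. */
struct wait_model_task {
	bool signal_pending;
};

/*
 * Hypothetical model of a timed wait.
 *
 * Returns  1            if the condition is already satisfied,
 *          -ERESTARTSYS if the wait is interruptible and a signal is pending,
 *          0            otherwise (the wait ran into its timeout).
 *
 * With intr == false the pending signal is never looked at, so
 * -ERESTARTSYS is impossible on that path.
 */
static long model_wait(const struct wait_model_task *t, bool cond_done,
		       bool intr)
{
	assert(t != NULL);

	if (cond_done)
		return 1;
	if (intr && t->signal_pending)
		return -ERESTARTSYS;
	return 0;
}
```

In this model, a caller that always passes intr == false can only ever see 0
or 1, so a "retry on -ERESTARTSYS" branch after it would be dead code.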

>  }
>  
>  int qxl_io_update_area(struct qxl_device *qdev, struct qxl_bo *surf,
> -			const struct qxl_rect *area)
> +			const struct qxl_rect *area, bool intr)
>  {
>  	int surface_id;
>  	uint32_t surface_width, surface_height;
> @@ -350,7 +347,7 @@ int qxl_io_update_area(struct qxl_device *qdev, struct qxl_bo *surf,
>  	mutex_lock(&qdev->update_area_mutex);
>  	qdev->ram_header->update_area = *area;
>  	qdev->ram_header->update_surface = surface_id;
> -	ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, true);
> +	ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, intr);
>  	mutex_unlock(&qdev->update_area_mutex);
>  	return ret;
>  }
> @@ -588,10 +585,7 @@ int qxl_update_surface(struct qxl_device *qdev, struct qxl_bo *surf)
>  	rect.right = surf->surf.width;
>  	rect.top = 0;
>  	rect.bottom = surf->surf.height;
> -retry:
> -	ret = qxl_io_update_area(qdev, surf, &rect);
> -	if (ret == -ERESTARTSYS)
> -		goto retry;
> +	ret = qxl_io_update_area(qdev, surf, &rect, false);

My understanding is that the fix is this hunk? If so, this could be made
more obvious with an intermediate commit adding the 'bool intr' arg to
qxl_io_update_area and only calling it with 'true' in the appropriate
places.
This code path is only triggered from qxl_surface_evict(), which I assume
is not necessarily easily interruptible, so this change makes sense to
me. However, it would be much better to get a review from Dave Airlie ;)
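
For readers following along, the behaviour this hunk changes can be sketched
in userspace as below. This is an illustration under stated assumptions, not
the driver code: all names are hypothetical, the pending signal is assumed to
stay set until the task returns to userspace (which a loop inside the kernel
never does), and max_tries exists only so the demonstration terminates,
whereas the removed loop had no such bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ERESTARTSYS 512	/* value the Linux kernel uses internally */

/* Hypothetical stand-in for the calling task's pending-signal state. */
struct evict_model_task {
	bool signal_pending;
};

/* Model of the update call: fails with -ERESTARTSYS only when the wait is
 * interruptible and a signal is pending. */
static int model_update_area(const struct evict_model_task *t, bool intr)
{
	assert(t != NULL);

	if (intr && t->signal_pending)
		return -ERESTARTSYS;
	return 0;
}

/* Old shape: retry while the interruptible call reports -ERESTARTSYS.
 * Nothing in the loop clears the pending signal, so every attempt fails
 * the same way. Returns the number of attempts made. */
static unsigned int model_update_surface_retry(const struct evict_model_task *t,
					       unsigned int max_tries)
{
	unsigned int tries = 0;
	int ret;

	do {
		ret = model_update_area(t, true);
		tries++;
	} while (ret == -ERESTARTSYS && tries < max_tries);

	return tries;
}

/* New shape: a single uninterruptible call, no retry needed. */
static int model_update_surface_fixed(const struct evict_model_task *t)
{
	return model_update_area(t, false);
}
```

With a signal pending, the retry variant uses up every attempt it is given,
while the fixed variant completes on its first call.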

Christophe
