Message-ID: <bdaf62020388d3dcf8d3a95ae465c0ae2db7eca1.camel@intel.com>
Date: Mon, 4 Mar 2024 14:29:03 +0000
From: "Souza, Jose" <jose.souza@...el.com>
To: "intel-xe@...ts.freedesktop.org" <intel-xe@...ts.freedesktop.org>,
	"johannes@...solutions.net" <johannes@...solutions.net>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: "Vivi, Rodrigo" <rodrigo.vivi@...el.com>, "quic_mojha@...cinc.com"
	<quic_mojha@...cinc.com>, "Cavitt, Jonathan" <jonathan.cavitt@...el.com>
Subject: Re: [PATCH v2 2/4] devcoredump: Add dev_coredumpm_timeout()

On Fri, 2024-03-01 at 09:38 +0100, Johannes Berg wrote:
> > On Wed, 2024-02-28 at 17:56 +0000, Souza, Jose wrote:
> > > > 
> > > > In my opinion, the timeout should depend on the type of device driver.
> > > > 
> > > > In the case of server-class Ethernet cards, where corporate users automate most tasks, five minutes might even be considered excessive.
> > > > 
> > > > For our case, GPUs, users might experience minor glitches and only search for what happened after finishing their current task (writing an email,
> > > > ending a gaming match, watching a YouTube video, etc.).
> > > > If they land on https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html or the future Xe version of that page, following the
> > > > instructions alone may take inexperienced Linux users more than five minutes.
> > 
> > That's all not wrong, but I don't see why you wouldn't automate this
> > even on end user machines? I feel you're boxing the problem in by
> > wanting to solve it entirely in the kernel?

The other part of the stack that we provide is the libraries implementing the Vulkan and OpenGL APIs. I don't think we could ship scripts that need
elevated privileges to read and store the coredump.

> > 
> > > > I have set the timeout to one hour in the Xe driver, but this could increase if we start receiving user complaints.
> > 
> > At an hour now, people will probably start arguing that "indefinitely"
> > is about right? But at that point you're probably back to persisting
> > them on disk anyway? Or maybe glitches happen during logout/shutdown ...

The i915 driver doesn't use coredump; it persists the error dump in memory until the user frees it or reboots, and we have received no complaints.

> > 
> > Anyway, I don't want to block this because I just don't care enough
> > about how you do things, but I think the kernel is the wrong place to
> > solve this problem... The intent here was to give some userspace time to
> > grab it (and yes for that 5 minutes is already way too long), not the
> > users. That's also part of the reason we only hold on to a single
> > instance, since I didn't want it to keep consuming more and more memory
> > for it if happens repeatedly.
> > 

Okay, so I will move forward with another version that applies your suggestion to make dev_coredumpm() static inline and move it to the header.

Thank you for the feedback.

> > johannes
