lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 17 May 2022 16:09:36 -0700
From:   "T.J. Mercier" <tjmercier@...gle.com>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     Christian König <christian.koenig@....com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Kalesh Singh <kaleshsingh@...gle.com>,
        Minchan Kim <minchan@...gle.com>,
        Greg Kroah-Hartman <gregkh@...gle.com>,
        John Stultz <jstultz@...gle.com>,
        Sumit Semwal <sumit.semwal@...aro.org>,
        Daniel Vetter <daniel.vetter@...ll.ch>,
        Hridya Valsaraju <hridya@...gle.com>, kernel-team@...roid.com,
        linux-media@...r.kernel.org, dri-devel@...ts.freedesktop.org,
        linaro-mm-sig@...ts.linaro.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] dma-buf: Move sysfs work out of DMA-BUF export path

On Mon, May 16, 2022 at 11:13 PM Greg Kroah-Hartman
<gregkh@...uxfoundation.org> wrote:
>
> On Mon, May 16, 2022 at 05:08:05PM -0700, T.J. Mercier wrote:
> > On Mon, May 16, 2022 at 12:21 PM Christian König
> > <christian.koenig@....com> wrote:
> > >
> > > Am 16.05.22 um 20:08 schrieb T.J. Mercier:
> > > > On Mon, May 16, 2022 at 10:20 AM Christian König
> > > > <christian.koenig@....com> wrote:
> > > >> Am 16.05.22 um 19:13 schrieb T.J. Mercier:
> > > >>> Recently, we noticed an issue where a process went into direct reclaim
> > > >>> while holding the kernfs rw semaphore for sysfs in write (exclusive)
> > > >>> mode. This caused processes who were doing DMA-BUF exports and releases
> > > >>> to go into uninterruptible sleep since they needed to acquire the same
> > > >>> semaphore for the DMA-BUF sysfs entry creation/deletion. In order to avoid
> > > >>> blocking DMA-BUF export for an indeterminate amount of time while
> > > >>> another process is holding the sysfs rw semaphore in exclusive mode,
> > > >>> this patch moves the per-buffer sysfs file creation to the default work
> > > >>> queue. Note that this can lead to a short-term inaccuracy in the dmabuf
> > > >>> sysfs statistics, but this is a tradeoff to prevent the hot path from
> > > >>> being blocked. A work_struct is added to dma_buf to achieve this, but as
> > > >>> it is unioned with the kobject in the sysfs_entry, dma_buf does not
> > > >>> increase in size.
> > > >> I'm still not very keen of this approach as it strongly feels like we
> > > >> are working around shortcoming somewhere else.
> > > >>
> > > > My read of the thread for the last version is that we're running into
> > > > a situation where sysfs is getting used for something it wasn't
> > > > originally intended for, but we're also stuck with this sysfs
> > > > functionality for dmabufs.
> > > >
> > > >>> Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
> > > >>> Originally-by: Hridya Valsaraju <hridya@...gle.com>
> > > >>> Signed-off-by: T.J. Mercier <tjmercier@...gle.com>
> > > >>>
> > > >>> ---
> > > >>> See the originally submitted patch by Hridya Valsaraju here:
> > > >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.org%2Flkml%2F2022%2F1%2F4%2F1066&amp;data=05%7C01%7Cchristian.koenig%40amd.com%7C794614324d114880a25508da37672e4b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637883213566903705%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=bGlA2FeubfSeL5XDHYyWMZqUXfScoCphZjjK4jrqQJs%3D&amp;reserved=0
> > > >>>
> > > >>> v2 changes:
> > > >>> - Defer only sysfs creation instead of creation and teardown per
> > > >>> Christian König
> > > >>>
> > > >>> - Use a work queue instead of a kthread for deferred work per
> > > >>> Christian König
> > > >>> ---
> > > >>>    drivers/dma-buf/dma-buf-sysfs-stats.c | 56 ++++++++++++++++++++-------
> > > >>>    include/linux/dma-buf.h               | 14 ++++++-
> > > >>>    2 files changed, 54 insertions(+), 16 deletions(-)
> > > >>>
> > > >>> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c
> > > >>> index 2bba0babcb62..67b0a298291c 100644
> > > >>> --- a/drivers/dma-buf/dma-buf-sysfs-stats.c
> > > >>> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
> > > >>> @@ -11,6 +11,7 @@
> > > >>>    #include <linux/printk.h>
> > > >>>    #include <linux/slab.h>
> > > >>>    #include <linux/sysfs.h>
> > > >>> +#include <linux/workqueue.h>
> > > >>>
> > > >>>    #include "dma-buf-sysfs-stats.h"
> > > >>>
> > > >>> @@ -168,10 +169,46 @@ void dma_buf_uninit_sysfs_statistics(void)
> > > >>>        kset_unregister(dma_buf_stats_kset);
> > > >>>    }
> > > >>>
> > > >>> +static void sysfs_add_workfn(struct work_struct *work)
> > > >>> +{
> > > >>> +     struct dma_buf_sysfs_entry *sysfs_entry =
> > > >>> +             container_of(work, struct dma_buf_sysfs_entry, sysfs_add_work);
> > > >>> +     struct dma_buf *dmabuf = sysfs_entry->dmabuf;
> > > >>> +
> > > >>> +     /*
> > > >>> +      * A dmabuf is ref-counted via its file member. If this handler holds the only
> > > >>> +      * reference to the dmabuf, there is no need for sysfs kobject creation. This is an
> > > >>> +      * optimization and a race; when the reference count drops to 1 immediately after
> > > >>> +      * this check it is not harmful as the sysfs entry will still get cleaned up in
> > > >>> +      * dma_buf_stats_teardown, which won't get called until the final dmabuf reference
> > > >>> +      * is released, and that can't happen until the end of this function.
> > > >>> +      */
> > > >>> +     if (file_count(dmabuf->file) > 1) {
> > > >> Please completely drop that. I see absolutely no justification for this
> > > >> additional complexity.
> > > >>
> > > > This case gets hit around 5% of the time in my testing so the else is
> > > > not a completely unused branch.
> > >
> > > Well I can only repeat myself: This means that your userspace is
> > > severely broken!
> > >
> > > DMA-buf are meant to be long living objects
> > This patch addresses export *latency* regardless of how long-lived the
> > object is. Even a single, long-lived export will benefit from this
> > change if it would otherwise be blocked on adding an object to sysfs.
> > I think attempting to improve this latency still has merit.
>
> Fixing the latency is nice, but as it's just pushing the needed work off
> to another code path, it will take longer overall for the sysfs stuff to
> be ready for userspace to see.
>
> Perhaps we need to step back and understand what this code is supposed
> to be doing.  As I recall, it was created because some systems do not
> allow debugfs anymore, and they wanted the debugging information that
> the dmabuf code was exposing to debugfs on a "normal" system.  Moving
> that logic to sysfs made sense, but now I am wondering why we didn't see
> these issues in the debugfs code previously?
>
The debugfs stuff doesn't happen on every export, right?

> Perhaps we should go just one step further and make a misc device node
> for dmabug debugging information to be in and just have userspace
> poll/read on the device node and we spit the info that used to be in
> debugfs out through that?  That way this only affects systems when they
> want to read the information and not normal code paths?  Yeah that's a
> hack, but this whole thing feels overly complex now.
>

And deprecate sysfs support? I'm happy to try out anything you think
might be a better way. As far as complexity of this patch, this
revision is a much simpler version of the one from Hridya you already
reviewed.

> thanks,
>
> greg k-h

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ