linux-kernel - Re: [RFC v3 5/8] dmabuf: Add gpu cgroup charge transfer function

Message-ID: <CABdmKX3Un=k3yU1BuCnEEoZkOqMovVrjcg=GiqDEtLZD_awX3g@mail.gmail.com>
Date:   Wed, 23 Mar 2022 16:37:08 -0700
From:   "T.J. Mercier" <tjmercier@...gle.com>
To:     Michal Koutný <mkoutny@...e.com>
Cc:     Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
        Maxime Ripard <mripard@...nel.org>,
        Thomas Zimmermann <tzimmermann@...e.de>,
        David Airlie <airlied@...ux.ie>,
        Daniel Vetter <daniel@...ll.ch>,
        Jonathan Corbet <corbet@....net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Arve Hjønnevåg <arve@...roid.com>,
        Todd Kjos <tkjos@...roid.com>,
        Martijn Coenen <maco@...roid.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Christian Brauner <brauner@...nel.org>,
        Hridya Valsaraju <hridya@...gle.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Sumit Semwal <sumit.semwal@...aro.org>,
        Christian König <christian.koenig@....com>,
        Benjamin Gaignard <benjamin.gaignard@...aro.org>,
        Liam Mark <lmark@...eaurora.org>,
        Laura Abbott <labbott@...hat.com>,
        Brian Starkey <Brian.Starkey@....com>,
        John Stultz <john.stultz@...aro.org>,
        Tejun Heo <tj@...nel.org>, Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Shuah Khan <shuah@...nel.org>,
        Kalesh Singh <kaleshsingh@...gle.com>, Kenny.Ho@....com,
        dri-devel@...ts.freedesktop.org, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-media@...r.kernel.org,
        linaro-mm-sig@...ts.linaro.org, cgroups@...r.kernel.org
In-Reply-To: <CABdmKX3+mTjxWzgrv44SKWT7mdGnQKMrv6c26d=iWdNPG7f1VQ@...l.gmail.com>
Subject: Re: [RFC v3 5/8] dmabuf: Add gpu cgroup charge transfer function

On Tue, Mar 22, 2022 at 9:47 AM T.J. Mercier <tjmercier@...gle.com> wrote:
>
> On Tue, Mar 22, 2022 at 2:52 AM Michal Koutný <mkoutny@...e.com> wrote:
> >
> > On Mon, Mar 21, 2022 at 04:54:26PM -0700, "T.J. Mercier"
> > <tjmercier@...gle.com> wrote:
> > > Since the charge is duplicated in two cgroups for a short period
> > > before it is uncharged from the source cgroup I guess the situation
> > > you're thinking about is a global (or common ancestor) limit?
> >
> > The common ancestor was on my mind (after the self-shortcut).
> >
> > > I can see how that would be a problem for transfers done this way and
> > > an alternative would be to swap the order of the charge operations:
> > > first uncharge, then try_charge. To be certain the uncharge is
> > > reversible if the try_charge fails, I think I'd need either a mutex
> > > used at all gpucg_*charge call sites or access to the gpucg_mutex,
> >
> > Yes, that'd provide safe conditions for such operations, although I'm
> > not sure these special types of memory can afford global lock on their
> > fast paths.
>
> I have a benchmark I think is suitable, so let me try this change to
> the transfer implementation and see how it compares.

I added a mutex to struct gpucg. It is held when the cgroup is first
charged during allocation, and during dma_buf_charge_transfer it is
held only for the source cgroup. Then I used a multithreaded
benchmark where each thread allocates 4, 8, 16, or 32 DMA buffers and
then sends them through Binder to another process with charge transfer
enabled. This was intended to generate contention for the mutex in
dma_buf_charge_transfer. The results of this benchmark show that the
difference between a mutex-protected charge transfer and an
unprotected charge transfer is within measurement noise. The worst
data point shows about 3% overhead for the mutex.

So I'll prep this change for the next revision. Thanks for pointing it out.
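
To make the ordering concrete, here is a minimal userspace sketch of an
uncharge-then-try_charge transfer with rollback. It is not the kernel
code: struct toy_gpucg, toy_try_charge, toy_transfer and the limit field
are hypothetical names invented for illustration (the series itself only
does accounting, with no limits), and a single global pthread mutex
stands in for the per-gpucg mutex described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a gpucg; not the kernel structure. */
struct toy_gpucg {
	struct toy_gpucg *parent;	/* NULL for the root */
	long usage;			/* bytes charged here and below */
	long limit;			/* assumed limit, for illustration only */
};

/* Simplification: one lock for every charge operation. */
static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add delta to cg and all of its ancestors. Caller holds toy_lock. */
static void apply_locked(struct toy_gpucg *cg, long delta)
{
	for (struct toy_gpucg *c = cg; c; c = c->parent)
		c->usage += delta;
}

/* Charge only if no level of the hierarchy would exceed its limit. */
static bool charge_locked(struct toy_gpucg *cg, long size)
{
	for (struct toy_gpucg *c = cg; c; c = c->parent)
		if (c->usage + size > c->limit)
			return false;
	apply_locked(cg, size);
	return true;
}

bool toy_try_charge(struct toy_gpucg *cg, long size)
{
	bool ok;

	pthread_mutex_lock(&toy_lock);
	ok = charge_locked(cg, size);
	pthread_mutex_unlock(&toy_lock);
	return ok;
}

/*
 * Move a charge from src to dst: uncharge first, then try to charge.
 * Because the lock is held throughout, nobody can consume the room
 * freed by the uncharge, so the rollback always restores the old state.
 */
bool toy_transfer(struct toy_gpucg *src, struct toy_gpucg *dst, long size)
{
	bool ok;

	assert(size >= 0);
	pthread_mutex_lock(&toy_lock);
	apply_locked(src, -size);		/* uncharge the source */
	ok = charge_locked(dst, size);		/* may fail on a limit */
	if (!ok)
		apply_locked(src, size);	/* undo the uncharge */
	pthread_mutex_unlock(&toy_lock);
	return ok;
}
```

With a common ancestor limited to 100 and 80 charged to one child,
toy_transfer to a sibling succeeds because the ancestor never sees the
charge twice. Charging the destination first would briefly ask the
ancestor for 160 and fail.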
>
> >
> > > which implies adding transfer support to gpu.c as part of the gpucg_*
> > > API itself and calling it here. Am I following correctly here?
> >
> > My idea was to provide a special API (apart from
> > gpucg_{try_charge,uncharge}) to facilitate transfers...
> >
> > > This series doesn't actually add limit support just accounting, but
> > > I'd like to get it right here.
> >
> > ...which could be implemented (or changed) depending on how the charging
> > is realized internally.
> >
> >
> > Michal