lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEXW_YSODXRfgkR0D2G-x=0uZdsqvF3kZL+LL3DcRX-5CULJ1Q@mail.gmail.com>
Date:   Thu, 13 Jul 2023 10:07:10 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     Gao Xiang <hsiangkao@...ux.alibaba.com>
Cc:     paulmck@...nel.org, Sandeep Dhavale <dhavale@...gle.com>,
        Frederic Weisbecker <frederic@...nel.org>,
        Neeraj Upadhyay <quic_neeraju@...cinc.com>,
        Josh Triplett <josh@...htriplett.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Zqiang <qiang.zhang1211@...il.com>,
        Matthias Brugger <matthias.bgg@...il.com>,
        AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>,
        linux-erofs@...ts.ozlabs.org, xiang@...nel.org,
        Will Shiu <Will.Shiu@...iatek.com>, kernel-team@...roid.com,
        rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org,
        linux-mediatek@...ts.infradead.org
Subject: Re: [PATCH v1] rcu: Fix and improve RCU read lock checks when !CONFIG_DEBUG_LOCK_ALLOC

On Thu, Jul 13, 2023 at 12:59 AM Gao Xiang <hsiangkao@...ux.alibaba.com> wrote:
> On 2023/7/13 12:52, Paul E. McKenney wrote:
> > On Thu, Jul 13, 2023 at 12:41:09PM +0800, Gao Xiang wrote:
> >>
> >>
>
> ...
>
> >>
> >> There are lots of performance issues here and even a plumber
> >> topic last year to show that, see:
> >>
> >> [1] https://lore.kernel.org/r/20230519001709.2563-1-tj@kernel.org
> >> [2] https://lore.kernel.org/r/CAHk-=wgE9kORADrDJ4nEsHHLirqPCZ1tGaEPAZejHdZ03qCOGg@mail.gmail.com
> >> [3] https://lore.kernel.org/r/CAB=BE-SBtO6vcoyLNA9F-9VaN5R0t3o_Zn+FW8GbO6wyUqFneQ@mail.gmail.com
> >> [4] https://lpc.events/event/16/contributions/1338/
> >> and more.
> >>
> >> I'm not sure if it's necessary to look info all of that,
> >> andSandeep knows more than I am (the scheduling issue
> >> becomes vital on some aarch64 platform.)
> >
> > Hmmm...  Please let me try again.
> >
> > Assuming that this approach turns out to make sense, the resulting
> > patch will need to clearly state the performance benefits directly in
> > the commit log.
> >
> > And of course, for the approach to make sense, it must avoid breaking
> > the existing lockdep-RCU debugging code.
> >
> > Is that more clear?
>
> Personally I'm not working on Android platform any more so I don't
> have a way to reproduce, hopefully Sandeep could give actually
> number _again_ if dm-verity is enabled and trigger another
> workqueue here and make a comparsion why the scheduling latency of
> the extra work becomes unacceptable.
>

Question from my side, are we talking about only performance issues or
also a crash? It appears z_erofs_decompress_pcluster() takes
mutex_lock(&pcl->lock);

So if it is either in an RCU read-side critical section or in an
atomic section, like the softirq path, then it may
schedule-while-atomic or trigger RCU warnings.

z_erofs_decompressqueue_endio
-> z_erofs_decompress_kickoff
 ->z_erofs_decompressqueue_work
  ->z_erofs_decompress_queue
   -> z_erofs_decompress_pcluster
    -> mutex_lock

Per Sandeep in [1], this stack happens under RCU read-lock in:

#define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops) \
[...]
                rcu_read_lock();
                (dispatch_ops);
                rcu_read_unlock();
[...]

Coming from:
blk_mq_flush_plug_list ->
                           blk_mq_run_dispatch_ops(q,
                                __blk_mq_flush_plug_list(q, plug));

and __blk_mq_flush_plug_list does this:
          q->mq_ops->queue_rqs(&plug->mq_list);

This somehow ends up calling the bio_endio and the
z_erofs_decompressqueue_endio which grabs the mutex.

So... I have a question, it looks like one of the paths in
__blk_mq_run_dispatch_ops() uses SRCU.  Where are as the alternate
path uses RCU. Why does this alternate want to block even if it is not
supposed to? Is the real issue here that the BLK_MQ_F_BLOCKING should
be set? It sounds like you want to block in the "else" path even
though BLK_MQ_F_BLOCKING is not set:

/* run the code block in @dispatch_ops with rcu/srcu read lock held */
#define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops) \
do {                                                            \
        if ((q)->tag_set->flags & BLK_MQ_F_BLOCKING) {          \
                struct blk_mq_tag_set *__tag_set = (q)->tag_set; \
                int srcu_idx;                                   \
                                                                \
                might_sleep_if(check_sleep);                    \
                srcu_idx = srcu_read_lock(__tag_set->srcu);     \
                (dispatch_ops);                                 \
                srcu_read_unlock(__tag_set->srcu, srcu_idx);    \
        } else {                                                \
                rcu_read_lock();                                \
                (dispatch_ops);                                 \
                rcu_read_unlock();                              \
        }                                                       \
} while (0);

It seems quite incorrect to me that erofs wants to block even though
clearly BLK_MQ_F_BLOCKING is not set. Did I miss something? I believe
it may be more beneficial to correct this properly now, rather than
maintaining the appearance of non-blocking operations and potentially
having to manage atomicity detection with preempt counters at a later
stage.

thanks,

 - Joel

[1] https://lore.kernel.org/all/CAB=BE-QV0PiKFpCOcdEUFxS4wJHsLCcsymAw+nP72Yr3b1WMng@mail.gmail.com/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ