[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220304015650.GC6112@X58A-UD3R>
Date: Fri, 4 Mar 2022 10:56:51 +0900
From: Byungchul Park <byungchul.park@....com>
To: Jan Kara <jack@...e.cz>
Cc: torvalds@...ux-foundation.org, damien.lemoal@...nsource.wdc.com,
linux-ide@...r.kernel.org, adilger.kernel@...ger.ca,
linux-ext4@...r.kernel.org, mingo@...hat.com,
linux-kernel@...r.kernel.org, peterz@...radead.org,
will@...nel.org, tglx@...utronix.de, rostedt@...dmis.org,
joel@...lfernandes.org, sashal@...nel.org, daniel.vetter@...ll.ch,
chris@...is-wilson.co.uk, duyuyang@...il.com,
johannes.berg@...el.com, tj@...nel.org, tytso@....edu,
willy@...radead.org, david@...morbit.com, amir73il@...il.com,
bfields@...ldses.org, gregkh@...uxfoundation.org,
kernel-team@....com, linux-mm@...ck.org, akpm@...ux-foundation.org,
mhocko@...nel.org, minchan@...nel.org, hannes@...xchg.org,
vdavydov.dev@...il.com, sj@...nel.org, jglisse@...hat.com,
dennis@...nel.org, cl@...ux.com, penberg@...nel.org,
rientjes@...gle.com, vbabka@...e.cz, ngupta@...are.org,
linux-block@...r.kernel.org, axboe@...nel.dk,
paolo.valente@...aro.org, josef@...icpanda.com,
linux-fsdevel@...r.kernel.org, viro@...iv.linux.org.uk,
jack@...e.com, jlayton@...nel.org, dan.j.williams@...el.com,
hch@...radead.org, djwong@...nel.org,
dri-devel@...ts.freedesktop.org, airlied@...ux.ie,
rodrigosiqueiramelo@...il.com, melissa.srw@...il.com,
hamohammed.sa@...il.com
Subject: Re: Report 2 in ext4 and journal based on v5.17-rc1
On Thu, Mar 03, 2022 at 10:54:56AM +0100, Jan Kara wrote:
> On Thu 03-03-22 10:00:33, Byungchul Park wrote:
> > Unfortunately, it's neither perfect nor safe without another wakeup
> > source - rescue wakeup source.
> >
> > consumer producer
> >
> > lock L
> > (too much work queued == true)
> > unlock L
> > --- preempted
> > lock L
> > unlock L
> > do work
> > lock L
> > unlock L
> > do work
> > ...
> > (no work == true)
> > sleep
> > --- scheduled in
> > sleep
> >
> > This code leads a deadlock without another wakeup source, say, not safe.
>
> So the scenario you describe above is indeed possible. But the trick is
> that the wakeup from 'consumer' as is doing work will remove 'producer'
> from the wait queue and change the 'producer' process state to
> 'TASK_RUNNING'. So when 'producer' calls sleep (in fact schedule()), the
> scheduler will just treat this as another preemption point and the
> 'producer' will immediately or soon continue to run. So indeed we can think
> of this as "another wakeup source" but the source is in the CPU scheduler
> itself. This is the standard way how waitqueues are used in the kernel...
Nice! Thanks for the explanation. I will take it into account if needed.
> > Lastly, just for your information, I need to explain how Dept works a
> > little more for you not to misunderstand Dept.
> >
> > Assuming the consumer and producer guarantee not to lead a deadlock like
> > the following, Dept won't report it a problem:
> >
> > consumer producer
> >
> > sleep
> > wakeup work_done
> > queue work
> > sleep
> > wakeup work_queued
> > do work
> > sleep
> > wakeup work_done
> > queue work
> > sleep
> > wakeup work_queued
> > do work
> > sleep
> > ... ...
> >
> > Dept does not consider all waits preceeding an event but only waits that
> > might lead a deadlock. In this case, Dept works with each region
> > independently.
> >
> > consumer producer
> >
> > sleep <- initiates region 1
> > --- region 1 starts
> > ... ...
> > --- region 1 ends
> > wakeup work_done
> > ... ...
> > queue work
> > ... ...
> > sleep <- initiates region 2
> > --- region 2 starts
> > ... ...
> > --- region 2 ends
> > wakeup work_queued
> > ... ...
> > do work
> > ... ...
> > sleep <- initiates region 3
> > --- region 3 starts
> > ... ...
> > --- region 3 ends
> > wakeup work_done
> > ... ...
> > queue work
> > ... ...
> > sleep <- initiates region 4
> > --- region 4 starts
> > ... ...
> > --- region 4 ends
> > wakeup work_queued
> > ... ...
> > do work
> > ... ...
> >
> > That is, Dept does not build dependencies across different regions. So
> > you don't have to worry about unreasonable false positives that much.
> >
> > Thoughts?
>
> Thanks for explanation! And what exactly defines the 'regions'? When some
> process goes to sleep on some waitqueue, this defines a start of a region
> at the place where all the other processes are at that moment and wakeup of
> the waitqueue is an end of the region?
Yes. Let me explain it more for better understanding.
(I copied it from the talk I did with Matthew..)
ideal view
-----------
context X context Y
request event E ...
write REQUESTEVENT when (notice REQUESTEVENT written)
... notice the request from X [S]
--- ideally region 1 starts here
wait for the event ...
sleep if (can see REQUESTEVENT written)
it's on the way to the event
...
...
--- ideally region 1 ends here
finally the event [E]
Dept basically works with the above view with regard to wait and event.
But it's very hard to identify the ideal [S] point in practice. So Dept
instead identifies [S] point by checking WAITSTART with memory barriers
like the following, which would make Dept work conservatively.
Dept's view
------------
context X context Y
request event E ...
write REQUESTEVENT when (notice REQUESTEVENT written)
... notice the request from X
--- region 2 Dept gives up starts
wait for the event ...
write barrier
write WAITSTART read barrier
sleep when (notice WAITSTART written)
ensure the request has come [S]
--- region 2 Dept gives up ends
--- region 3 starts here
...
if (can see WAITSTART written)
it's on the way to the event
...
...
--- region 3 ends here
finally the event [E]
In short, Dept works with region 3.
Thanks,
Byungchul
Powered by blists - more mailing lists