Date:   Fri, 3 Apr 2020 14:04:56 +0200
From:   Andrea Parri <parri.andrea@...il.com>
To:     Vitaly Kuznetsov <vkuznets@...hat.com>
Cc:     Michael Kelley <mikelley@...rosoft.com>,
        Dexuan Cui <decui@...rosoft.com>,
        KY Srinivasan <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Wei Liu <wei.liu@...nel.org>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        Boqun Feng <boqun.feng@...il.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 02/11] Drivers: hv: vmbus: Don't bind the
 offer&rescind works to a specific CPU

On Mon, Mar 30, 2020 at 02:24:16PM +0200, Vitaly Kuznetsov wrote:
> Michael Kelley <mikelley@...rosoft.com> writes:
> 
> > From: Andrea Parri <parri.andrea@...il.com> Sent: Saturday, March 28, 2020 10:09 AM
> >> 
> >> > In case we believe that the OFFER -> RESCIND sequence is always ordered
> >> > by the host AND we don't care about other offers in the queue the
> >> > suggested locking is OK: we're guaranteed to process RESCIND after we
> >> > finished processing OFFER for the same channel. However, waiting for
> >> > 'offer_in_progress == 0' looks fishy so I'd suggest we at least add a
> >> > comment explaining that the wait is only needed to serialize us with
> >> > possible OFFER for the same channel - and nothing else. I'd personally
> >> > still slightly prefer the algorithm I suggested as it guarantees we take
> >> > channel_mutex with offer_in_progress == 0 -- even if there are no issues
> >> > we can think of today (not strongly though).
> >> 
> >> Does it?  offer_in_progress is incremented without channel_mutex...
> >> 
> 
> No, it does not, you're right, by itself the change is insufficient.
> 
> >> IAC, I have no objections to applying the changes you suggested.  To avoid
> >> misunderstandings: vmbus_bus_suspend() presents a similar usage...  Are
> >> you suggesting that I apply similar changes there?
> >> 
> >> Alternatively:  FWIW, the comment in vmbus_onoffer_rescind() does refer
> >> to "The offer msg and the corresponding rescind msg...".  I am all ears
> >> if you have any concrete suggestions to improve these comments.
> >> 
> >
> > Given that waiting for 'offer_in_progress == 0' is the current code, I think
> > there's an argument to be made for not changing it if the change isn't strictly
> > necessary.  This patch set introduces enough change that *is* necessary. :-)
> >
> 
> Sure. I was thinking a bit more about this and it seems that over the years
> we've made the synchronization of the channels code too complex (every time
> for a good reason but still). Now (before this series) we have at least:
> 
> vmbus_connection.channel_mutex
> vmbus_connection.offer_in_progress
> channel.probe_done
> channel.rescind
> Workqueues (vmbus_connection.work_queue,
>  queue_work_on(vmbus_connection.connect_cpu),...)
> channel.lock spinlock (the least of the problems)
> 
> Maybe there's room for improvement? Off the top of my head I'd suggest a
> state machine for each channel (e.g. something like
> OFFERED->OPENING->OPEN->RESCIND_REQ->RESCINDED->CLOSED) + refcounting
> (subchannels, open/rescind/... requests in progress, ...) + non-blocking
> request handling like "Can we handle this rescind offer now? No,
> refcount is too big. OK, rescheduling the work". Maybe not the best
> design ever and I'd gladly support any other which improves the
> readability of the code and makes all state changes and synchronization
> between them more obvious.
> 
> Note, VMBus channel handling, driven by messages (unlike events for the ring
> buffer), is not performance critical; we just need to ensure completeness
> (all requests are handled correctly) with forward progress guarantees
> (no deadlocks).
> 
> I understand the absence of 'hot' issues in the current code is what makes
> the virtue of a redesign questionable, and sorry for hijacking the series,
> which doesn't seem to make things worse :-)

(Back here...  Sorry for the delay.)

FWIW, what you wrote above makes sense to me; and *yes*, the series in
question was not intended as "let us undertake such a redesign" (quite
the opposite in fact...)
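
Just to be sure I'm reading you correctly, here's a rough sketch of the
per-channel state machine + refcounting you describe (purely illustrative:
the type, field and function names below are made up for the example, this
is not actual vmbus code):

#include <linux/refcount.h>
#include <linux/spinlock.h>
#include <linux/types.h>

enum vmbus_chan_state {
	CHAN_OFFERED,
	CHAN_OPENING,
	CHAN_OPEN,
	CHAN_RESCIND_REQ,
	CHAN_RESCINDED,
	CHAN_CLOSED,
};

struct vmbus_chan_sm {
	enum vmbus_chan_state	state;
	refcount_t		inflight;	/* open/rescind/subchannel requests in flight */
	spinlock_t		lock;
};

/*
 * "Can we handle this rescind now?"  If not, the caller re-queues the
 * work instead of blocking, so forward progress never depends on another
 * request completing first.
 */
static bool vmbus_chan_try_rescind(struct vmbus_chan_sm *sm)
{
	bool ok = false;

	spin_lock(&sm->lock);
	if ((sm->state == CHAN_OFFERED || sm->state == CHAN_OPEN) &&
	    refcount_read(&sm->inflight) == 1) {
		sm->state = CHAN_RESCIND_REQ;
		ok = true;
	}
	spin_unlock(&sm->lock);

	return ok;
}

The appeal, IIUC, is that every state transition happens under a single
per-channel lock and no handler ever sleeps waiting for other work to
drain.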

With regard to this specific patch, it seems to me that we've reached
a certain consensus or, at least, I don't see complaints  ;-).  Please
let me know if I misunderstood.
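
Concretely, for this patch I'd keep the wait as is and just fold in the
comment Vitaly asked for; something along these lines (a sketch of the
intent against the existing offer_in_progress counter and channel_mutex,
not the exact code):

	/*
	 * Assuming the host sends the rescind message only after the
	 * corresponding offer message, the wait below is needed solely to
	 * serialize against a possible OFFER for the *same* channel still
	 * being processed -- nothing else.
	 */
	while (atomic_read(&vmbus_connection.offer_in_progress) != 0)
		msleep(1);

	mutex_lock(&vmbus_connection.channel_mutex);
	/* ... look up the channel and handle the rescind ... */
	mutex_unlock(&vmbus_connection.channel_mutex);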

Thanks,
  Andrea
