[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201110160254.GB17288@roeck-us.net>
Date: Tue, 10 Nov 2020 08:02:54 -0800
From: Guenter Roeck <linux@...ck-us.net>
To: Brad Campbell <brad@...rfbargle.com>
Cc: Henrik Rydberg <rydberg@...math.org>,
Arnd Bergmann <arnd@...db.de>, linux-hwmon@...r.kernel.org,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
hns@...delico.com, Andreas Kemnade <andreas@...nade.info>,
Jean Delvare <jdelvare@...e.com>
Subject: Re: [PATCH v3] applesmc: Re-work SMC comms
On Tue, Nov 10, 2020 at 04:40:23PM +1100, Brad Campbell wrote:
> On 10/11/20 3:55 pm, Guenter Roeck wrote:
> > On Tue, Nov 10, 2020 at 01:04:04PM +1100, Brad Campbell wrote:
> >> On 9/11/20 3:06 am, Guenter Roeck wrote:
> >>> On 11/8/20 2:14 AM, Henrik Rydberg wrote:
> >>>> On Sun, Nov 08, 2020 at 09:35:28AM +0100, Henrik Rydberg wrote:
> >>>>> Hi Brad,
> >>>>>
> >>>>> On 2020-11-08 02:00, Brad Campbell wrote:
> >>>>>> G'day Henrik,
> >>>>>>
> >>>>>> I noticed you'd also loosened up the requirement for SMC_STATUS_BUSY in read_smc(). I assume
> >>>>>> that causes problems on the early Macbook. This is revised on the one sent earlier.
> >>>>>> If you could test this on your Air1,1 it'd be appreciated.
> >>>>>
> >>>>> No, I managed to screw up the patch; you can see that I carefully added the
> >>>>> same treatment for the read argument, being unsure if the BUSY state would
> >>>>> remain during the AVAILABLE data phase. I can check that again, but
> >>>>> unfortunately the patch in this email shows the same problem.
> >>>>>
> >>>>> I think it may be worthwhile to rethink the behavior of wait_status() here.
> >>>>> If one machine shows no change after a certain status bit change, then
> >>>>> perhaps the others share that behavior, and we are waiting in vain. Just
> >>>>> imagine how many years of cpu that is, combined. ;-)
> >>>>
> >>>> Here is a modification along that line.
> >>>>
> >>>
> >>> Please resend this patch as stand-alone v4 patch. If sent like it was sent here,
> >>> it doesn't make it into patchwork, and is thus not only difficult to apply but
> >>> may get lost, and it is all but impossible to find and apply all tags.
> >>> Also, prior to Henrik's Signed=off-by: there should be a one-line explanation
> >>> of the changes made.
> >>>
> >>> Thanks,
> >>> Guenter
> >>>
> >>>> Compared to your latest version, this one has wait_status() return the
> >>>> actual status on success. Instead of waiting for BUSY, it waits for
> >>>> the other status bits, and checks BUSY afterwards. So as not to wait
> >>>> unneccesarily, the udelay() is placed together with the single
> >>>> outb(). The return value of send_byte_data() is augmented with
> >>>> -EAGAIN, which is then used in send_command() to create the resend
> >>>> loop.
> >>>>
> >>>> I reach 41 reads per second on the MBA1,1 with this version, which is
> >>>> getting close to the performance prior to the problems.
> >>>>
> >>
> >> Can I get an opinion on this wait statement please?
> >>
> >> The apple driver uses a loop with a million (1,000,000) retries spaced with a 10uS delay.
> >>
> >> In my testing on 2 machines, we don't busy wait more than about 2 loops.
> >> Replacing a small udelay with the usleep_range kills performance.
> >> With the following (do 10 fast checks before we start sleeping) I nearly triple the performance
> >> of the driver on my laptop, and double it on my iMac. This is on an otherwise unmodified version of
> >> Henriks v4 submission.
> >>
> >> Yes, given the timeouts I know it's a ridiculous loop condition.
> >>
> >> static int wait_status(u8 val, u8 mask)
> >> {
> >> unsigned long end = jiffies + (APPLESMC_MAX_WAIT * HZ) / USEC_PER_SEC;
> >> u8 status;
> >> int i;
> >>
> >> for (i=1; i < 1000000; i++) {
> >
> > The minimum wait time is 10 us, or 16uS after the first 10
> > attempts. 1000000 * 10 = 10 seconds. I mean, it would make
> > some sense to limit the loop to APPLESMC_MAX_WAIT /
> > APPLESMC_MIN_WAIT iterations, but why 1,000,000 ?
>
> I had to pick a big number and it was easy to punch in 6 zeros as that is what is in
> the OSX driver. In this instance it's more a proof of concept rather than sane example
> because it'll either abort on timeout or return the correct status anyway.
> Could also have been a while (true) {} but then I'd need to manually increment i.
>
> >> status = inb(APPLESMC_CMD_PORT);
> >> if ((status & mask) == val)
> >> return status;
> >> /* timeout: give up */
> >> if (time_after(jiffies, end))
> >> break;
> >> if (i < 10)
> >> udelay(10);
> >> else
> >> usleep_range(APPLESMC_MIN_WAIT, APPLESMC_MIN_WAIT * 16);
> >
> > The original code had the exponential wait time increase.
> > I don't really see the point of changing that. I'd suggest
> > to keep the exponential increase but change the code to
> > something like
> > if (us < APPLESMC_MIN_WAIT * 4)
> > udelay(us)
> > else
> > usleep_range(us, us * 16);
>
> The main reason I dropped the exponential was best case on this laptop the modified code with exponential
> wait as described above increase increases the performance from ~40 -> 62 reads/sec, whereas the version
> I posted with the first 10 loops at 10uS goes from ~40 -> 100 reads/sec.
>
> About 95% of waits never get past a few of iterations of that loop (so ~30-60uS). With a
> modified exponential starting at 8uS a 30uS requirement ends up at 56uS. If it needed 60us with
> the original we end up waiting 120uS.
>
> If it needs longer than 120uS it's rare enough that a bigger sleep isn't going to cause much
> of a performance hit.
>
> Getting completely pathological and busy waiting with a fixed 10uS delay like the OSX driver
> does give about 125 reads/sec but I was trying to find a balance and 10 loops seemed to hit that mark.
>
> > Effectively that means the first wait would be 16 uS,
> > followed by 32 uS, followed by increasingly larger sleep
> > times. I don't know the relevance of APPLESMC_MIN_WAIT
> > being set to 16, but if you'd want to start with smaller
> > wait times you could reduce it to 8. If you are concerned
> > about excessively large sleep times you could reduce
> > the span from us..us*16 to, say, us..us*4 or us..us*2.
>
> I was tossing up here between the overhead required to manage a tighter usleep_range
> vs some extra udelays.
>
> It's not exactly a performance sensitive driver, but I thought there might be a balance to be
> struck between performance and simplicity. The exponential delay always struck me as odd.
>
> If the preference is to leave it as is or modify as suggested I'm ok with that too.
> Appreciate the input.
Ok, not worth arguing about.
Guenter
>
> Regards,
> Brad
Powered by blists - more mailing lists