[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bc5db9c7-a290-b49e-0217-879817c8cc87@acm.org>
Date: Tue, 30 Jan 2018 19:52:41 -0600
From: Corey Minyard <minyard@....org>
To: "Wang, Haiyue" <haiyue.wang@...ux.intel.com>, joel@....id.au,
avifishman70@...il.com, openbmc@...ts.ozlabs.org,
openipmi-developer@...ts.sourceforge.net,
linux-kernel@...r.kernel.org
Cc: andriy.shevchenko@...el.com
Subject: Re: [PATCH arm/aspeed/ast2500 v2] ipmi: add an Aspeed KCS IPMI BMC
driver
On 01/30/2018 07:37 PM, Wang, Haiyue wrote:
>
>
> On 2018-01-31 09:25, Corey Minyard wrote:
>> On 01/30/2018 07:02 PM, Wang, Haiyue wrote:
>>>
>>>
>>> On 2018-01-31 08:52, Corey Minyard wrote:
>>>> On 01/30/2018 06:02 PM, Wang, Haiyue wrote:
>>>>>
>>>>>
>>>>> On 2018-01-30 21:49, Corey Minyard wrote:
>>>>>> On 01/29/2018 07:57 AM, Wang, Haiyue wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2018-01-26 22:48, Corey Minyard wrote:
>>>>>>>> On 01/26/2018 12:08 AM, Wang, Haiyue wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2018-01-25 01:48, Corey Minyard wrote:
>>>>>>>>>> On 01/24/2018 10:06 AM, Haiyue Wang wrote:
>>>>>>>>>>> The KCS (Keyboard Controller Style) interface is used to
>>>>>>>>>>> perform in-band
>>>>>>>>>>> IPMI communication between a server host and its BMC
>>>>>>>>>>> (BaseBoard Management
>>>>>>>>>>> Controllers).
>>>>>>>>>>>
>>>>>>>>>>> This driver exposes the KCS interface on ASpeed SOCs
>>>>>>>>>>> (AST2400 and AST2500)
>>>>>>>>>>> as a character device. Such SOCs are commonly used as BMCs
>>>>>>>>>>> and this driver
>>>>>>>>>>> implements the BMC side of the KCS interface.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Haiyue Wang <haiyue.wang@...ux.intel.com>
>>>>>>>>>>>
>>>>>>>>>>> ---
>>>>>>>>>>> v1->v2
>>>>>>>>>>>
>>>>>>>>>>> - Divide the driver into two parts, one handles the BMC KCS
>>>>>>>>>>> IPMI 2.0 state;
>>>>>>>>>>> the other handles the BMC KCS controller such as AST2500
>>>>>>>>>>> IO accessing.
>>>>>>>>>>> - Use the spin lock APIs to handle the device file
>>>>>>>>>>> operations and BMC chip
>>>>>>>>>>> IRQ inferface for accessing the same KCS BMC data structure.
>>>>>>>>>>> - Enhanced the phases handling of the KCS BMC.
>>>>>>>>>>> - Unified the IOCTL definition for IPMI BMC, it will be used
>>>>>>>>>>> by KCS and BT.
>>>>>>>>>>>
>>>>>>>>>>> ---
>>>>>>>>>>>
>>>>>>>>>>> +
>>>>>>>>>>> +static void kcs_bmc_handle_data(struct kcs_bmc *kcs_bmc)
>>>>>>>>>>> +{
>>>>>>>>>>> + u8 data;
>>>>>>>>>>> +
>>>>>>>>>>> + switch (kcs_bmc->phase) {
>>>>>>>>>>> + case KCS_PHASE_WRITE:
>>>>>>>>>>> + set_state(kcs_bmc, WRITE_STATE);
>>>>>>>>>>> +
>>>>>>>>>>> + /* set OBF before reading data */
>>>>>>>>>>> + write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>> +
>>>>>>>>>>> + if (kcs_bmc->data_in_idx < KCS_MSG_BUFSIZ)
>>>>>>>>>>> + kcs_bmc->data_in[kcs_bmc->data_in_idx++] =
>>>>>>>>>>> + read_data(kcs_bmc);
>>>>>>>>
>>>>>>>> I missed this earlier, you need to issue a length error if the
>>>>>>>> data is too large.
>>>>>>>>
>>>>>>>>>>> + break;
>>>>>>>>>>> +
>>>>>>>>>>> + case KCS_PHASE_WRITE_END:
>>>>>>>>>>> + set_state(kcs_bmc, READ_STATE);
>>>>>>>>>>> +
>>>>>>>>>>> + if (kcs_bmc->data_in_idx < KCS_MSG_BUFSIZ)
>>>>>>>>>>> + kcs_bmc->data_in[kcs_bmc->data_in_idx++] =
>>>>>>>>>>> + read_data(kcs_bmc);
>>>>>>>>>>> +
>>>>>>>>>>> + kcs_bmc->phase = KCS_PHASE_WAIT_READ;
>>>>>>>>>>> + if (kcs_bmc->running) {
>>>>>>>>>>
>>>>>>>>>> Why do you only do this when running is set? It won't hurt
>>>>>>>>>> anything if it's not
>>>>>>>>>> set. As it is, you have a race if something opens the device
>>>>>>>>>> while this code
>>>>>>>>>> runs.
>>>>>>>>>>
>>>>>>>>>> Also, don't set the state to wait read until the "write" has
>>>>>>>>>> finished (userland has
>>>>>>>>>> read the data out of the buffer. More on that later.
>>>>>>>>>>
>>>>>>>>> Understood.
>>>>>>>>>>> + kcs_bmc->data_in_avail = true;
>>>>>>>>>>> + wake_up_interruptible(&kcs_bmc->queue);
>>>>>>>>>>> + }
>>>>>>>>>>> + break;
>>>>>>>>>>> +
>>>>>>>>>>> + case KCS_PHASE_READ:
>>>>>>>>>>> + if (kcs_bmc->data_out_idx == kcs_bmc->data_out_len)
>>>>>>>>>>> + set_state(kcs_bmc, IDLE_STATE);
>>>>>>>>>>> +
>>>>>>>>>>> + data = read_data(kcs_bmc);
>>>>>>>>>>> + if (data != KCS_CMD_READ_BYTE) {
>>>>>>>>>>> + set_state(kcs_bmc, ERROR_STATE);
>>>>>>>>>>> + write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>> + break;
>>>>>>>>>>> + }
>>>>>>>>>>> +
>>>>>>>>>>> + if (kcs_bmc->data_out_idx == kcs_bmc->data_out_len) {
>>>>>>>>>>> + write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>> + kcs_bmc->phase = KCS_PHASE_IDLE;
>>>>>>>>>>> + break;
>>>>>>>>>>> + }
>>>>>>>>>>> +
>>>>>>>>>>> + write_data(kcs_bmc,
>>>>>>>>>>> + kcs_bmc->data_out[kcs_bmc->data_out_idx++]);
>>>>>>>>>>> + break;
>>>>>>>>>>> +
>>>>>>>>>>> + case KCS_PHASE_ABORT_ERROR1:
>>>>>>>>>>> + set_state(kcs_bmc, READ_STATE);
>>>>>>>>>>> +
>>>>>>>>>>> + /* Read the Dummy byte */
>>>>>>>>>>> + read_data(kcs_bmc);
>>>>>>>>>>> +
>>>>>>>>>>> + write_data(kcs_bmc, kcs_bmc->error);
>>>>>>>>>>> + kcs_bmc->phase = KCS_PHASE_ABORT_ERROR2;
>>>>>>>>>>> + break;
>>>>>>>>>>> +
>>>>>>>>>>> + case KCS_PHASE_ABORT_ERROR2:
>>>>>>>>>>> + set_state(kcs_bmc, IDLE_STATE);
>>>>>>>>>>> +
>>>>>>>>>>> + /* Read the Dummy byte */
>>>>>>>>>>> + read_data(kcs_bmc);
>>>>>>>>>>> +
>>>>>>>>>>> + write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>> + kcs_bmc->phase = KCS_PHASE_IDLE;
>>>>>>>>>>> +
>>>>>>>>>>> + break;
>>>>>>>>>>> +
>>>>>>>>>>> + default:
>>>>>>>>>>> + set_state(kcs_bmc, ERROR_STATE);
>>>>>>>>>>> +
>>>>>>>>>>> + /* Read the Dummy byte */
>>>>>>>>>>> + read_data(kcs_bmc);
>>>>>>>>>>> +
>>>>>>>>>>> + write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>> + break;
>>>>>>>>>>> + }
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static void kcs_bmc_handle_command(struct kcs_bmc *kcs_bmc)
>>>>>>>>>>> +{
>>>>>>>>>>> + u8 cmd;
>>>>>>>>>>> +
>>>>>>>>>>> + set_state(kcs_bmc, WRITE_STATE);
>>>>>>>>>>> +
>>>>>>>>>>> + /* Dummy data to generate OBF */
>>>>>>>>>>> + write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>> +
>>>>>>>>>>> + cmd = read_data(kcs_bmc);
>>>>>>>>>>
>>>>>>>>>> Shouldn't you check the phase in all the cases below and do
>>>>>>>>>> error
>>>>>>>>>> handling if the phase isn't correct?
>>>>>>>>>>
>>>>>>>>>> Similar thing if the device here isn't open. You need to handle
>>>>>>>>>> that gracefully.
>>>>>>>>>>
>>>>>>>>>> Also, you should remove data_in_avail and data_in_idx setting
>>>>>>>>>> from
>>>>>>>>>> here, for reasons I will explain later.
>>>>>>>>>>
>>>>>>>>> If host software sends the data twice such as a retry before
>>>>>>>>> the BMC's IPMI service starts,
>>>>>>>>> then the two IPMI requests will be merged into one, if not
>>>>>>>>> clear data_in_idx after receving
>>>>>>>>> KCS_CMD_WRITE_START. Most of the states are driven by host
>>>>>>>>> software (SMS). :(
>>>>>>>>
>>>>>>>> True, but what if the host issues WRITE_START or a WRITE_END
>>>>>>>> while this driver is in read
>>>>>>>> state? The spec is unclear on this, but it really only makes
>>>>>>>> sense for the host to issue
>>>>>>>> WRITE_START in idle stat and WRITE_END in write state. IMHO it
>>>>>>>> should go to error
>>>>>>>> state. You might make the case that a WRITE_START anywhere
>>>>>>>> restarts the transaction,
>>>>>>>> but the feel of the error state machine kind of goes against
>>>>>>>> that. WRITE_END is definitely
>>>>>>>> wrong anywhere but write state.
>>>>>>>>
>>>>>>>> I just found the following in the spec (section 9.12):
>>>>>>>>
>>>>>>>> Thus, since the interface will allow a command transfer to be
>>>>>>>> started or restarted
>>>>>>>> at any time when the input buffer is empty, software could
>>>>>>>> elect to
>>>>>>>> simply retry
>>>>>>>> the command upon detecting an error condition, or issue a
>>>>>>>> ‘known good’
>>>>>>>> command in order to clear ERROR_STATE
>>>>>>>>
>>>>>>>> So a WRITE_START anywhere is ok. A WRITE_END in the wrong
>>>>>>>> state should probably
>>>>>>>> still go to error state. This means the user needs to be able
>>>>>>>> to handle a write error at
>>>>>>>> any time. It also means it's very important to make sure the
>>>>>>>> user does a read before
>>>>>>>> doing a write. If the host re-issues a WRITE_START and writes
>>>>>>>> a new command
>>>>>>>> between the time the use reads the data and writes the
>>>>>>>> response, the response would
>>>>>>>> be for the wrong command.
>>>>>>>>
>>>>>>>>>>> + switch (cmd) {
>>>>>>>>>>> + case KCS_CMD_WRITE_START:
>>>>>>>>>>> + kcs_bmc->data_in_avail = false;
>>>>>>>>>>> + kcs_bmc->data_in_idx = 0;
>>>>>>>>>>> + kcs_bmc->phase = KCS_PHASE_WRITE;
>>>>>>>>>>> + kcs_bmc->error = KCS_NO_ERROR;
>>>>>>>>>>> + break;
>>>>>>>>>>> +
>>>>>>>>>>> + case KCS_CMD_WRITE_END:
>>>>>>>>>>> + kcs_bmc->phase = KCS_PHASE_WRITE_END;
>>>>>>>>>>> + break;
>>>>>>>>>>> +
>>>>>>>>>>> + case KCS_CMD_ABORT:
>>>>>>>>>>> + if (kcs_bmc->error == KCS_NO_ERROR)
>>>>>>>>>>> + kcs_bmc->error = KCS_ABORTED_BY_COMMAND;
>>>>>>>>>>> +
>>>>>>>>>>> + kcs_bmc->phase = KCS_PHASE_ABORT_ERROR1;
>>>>>>>>>>> + break;
>>>>>>>>>>> +
>>>>>>>>>>> + default:
>>>>>>>>>>> + kcs_bmc->error = KCS_ILLEGAL_CONTROL_CODE;
>>>>>>>>>>> + set_state(kcs_bmc, ERROR_STATE);
>>>>>>>>>>> + write_data(kcs_bmc, kcs_bmc->error);
>>>>>>>>>>> + kcs_bmc->phase = KCS_PHASE_ERROR;
>>>>>>>>>>> + break;
>>>>>>>>>>> + }
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +int kcs_bmc_handle_event(struct kcs_bmc *kcs_bmc)
>>>>>>>>>>> +{
>>>>>>>>>>> + unsigned long flags;
>>>>>>>>>>> + int ret = 0;
>>>>>>>>>>> + u8 status;
>>>>>>>>>>> +
>>>>>>>>>>> + spin_lock_irqsave(&kcs_bmc->lock, flags);
>>>>>>>>>>> +
>>>>>>>>>>> + status = read_status(kcs_bmc) & (KCS_STATUS_IBF |
>>>>>>>>>>> KCS_STATUS_CMD_DAT);
>>>>>>>>>>> +
>>>>>>>>>>> + switch (status) {
>>>>>>>>>>> + case KCS_STATUS_IBF | KCS_STATUS_CMD_DAT:
>>>>>>>>>>> + kcs_bmc_handle_command(kcs_bmc);
>>>>>>>>>>> + break;
>>>>>>>>>>> +
>>>>>>>>>>> + case KCS_STATUS_IBF:
>>>>>>>>>>> + kcs_bmc_handle_data(kcs_bmc);
>>>>>>>>>>> + break;
>>>>>>>>>>> +
>>>>>>>>>>> + default:
>>>>>>>>>>> + ret = -1;
>>>>>>>>>>> + break;
>>>>>>>>>>> + }
>>>>>>>>>>> +
>>>>>>>>>>> + spin_unlock_irqrestore(&kcs_bmc->lock, flags);
>>>>>>>>>>> +
>>>>>>>>>>> + return ret;
>>>>>>>>>>> +}
>>>>>>>>>>> +EXPORT_SYMBOL(kcs_bmc_handle_event);
>>>>>>>>>>> +
>>>>>>>>>>> +static inline struct kcs_bmc *file_kcs_bmc(struct file *filp)
>>>>>>>>>>> +{
>>>>>>>>>>> + return container_of(filp->private_data, struct kcs_bmc,
>>>>>>>>>>> miscdev);
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static int kcs_bmc_open(struct inode *inode, struct file
>>>>>>>>>>> *filp)
>>>>>>>>>>> +{
>>>>>>>>>>> + struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>> + int ret = 0;
>>>>>>>>>>> +
>>>>>>>>>>> + spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>> +
>>>>>>>>>>> + if (!kcs_bmc->running) {
>>>>>>>>>>> + kcs_bmc->running = 1;
>>>>>>>>>>> + kcs_bmc->phase = KCS_PHASE_IDLE;
>>>>>>>>>>> + kcs_bmc->data_in_avail = false;
>>>>>>>>>>
>>>>>>>>>> If you do everything right, setting the phase and
>>>>>>>>>> data_in_avail should not
>>>>>>>>>> be necessary here.
>>>>>>>>>>
>>>>>>>>>>> + } else {
>>>>>>>>>>> + ret = -EBUSY;
>>>>>>>>>>> + }
>>>>>>>>>>> +
>>>>>>>>>>> + spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>> +
>>>>>>>>>>> + return ret;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static unsigned int kcs_bmc_poll(struct file *filp,
>>>>>>>>>>> poll_table *wait)
>>>>>>>>>>> +{
>>>>>>>>>>> + struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>> + unsigned int mask = 0;
>>>>>>>>>>> +
>>>>>>>>>>> + poll_wait(filp, &kcs_bmc->queue, wait);
>>>>>>>>>>> +
>>>>>>>>>>> + spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>> +
>>>>>>>>>>> + if (kcs_bmc->data_in_avail)
>>>>>>>>>>> + mask |= POLLIN;
>>>>>>>>>>> +
>>>>>>>>>>> + spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>> +
>>>>>>>>>>> + return mask;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static ssize_t kcs_bmc_read(struct file *filp, char *buf,
>>>>>>>>>>> + size_t count, loff_t *offset)
>>>>>>>>>>> +{
>>>>>>>>>>> + struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>> + ssize_t ret = -EAGAIN;
>>>>>>>>>>> +
>>>>>>>>>>
>>>>>>>>>> This function still has some issues.
>>>>>>>>>>
>>>>>>>>>> You can't call copy_to_user() with a spinlock held or
>>>>>>>>>> interrupts disabled.
>>>>>>>>>> To handle readers, you probably need a separate mutex.
>>>>>>>>>>
>>>>>>>>>> Also, this function can return -EAGAIN even if O_NONBLOCK is
>>>>>>>>>> not set if
>>>>>>>>>> kcs_bmc->data_in_avail changes between when you wait on the
>>>>>>>>>> event
>>>>>>>>>> and when you check it under the lock.
>>>>>>>>>>
>>>>>>>>>> You also clear data_in_avail even if the copy_to_user()
>>>>>>>>>> fails, which is
>>>>>>>>>> wrong.
>>>>>>>>>>
>>>>>>>>>> I believe the best way to handle this would be to have the
>>>>>>>>>> spinlock
>>>>>>>>>> protect the inner workings of the state machine and a mutex
>>>>>>>>>> handle
>>>>>>>>>> copying data out, setting/clearing the running flag (thus a
>>>>>>>>>> mutex
>>>>>>>>>> instead of spinlock in open and release) and the ioctl
>>>>>>>>>> settings (except
>>>>>>>>>> for abort where you will need to grab the spinlock).
>>>>>>>>>>
>>>>>>>>>> After the wait event below, grab the mutex. If data is not
>>>>>>>>>> available
>>>>>>>>>> and O_NONBLOCK is not set, drop the mutex and retry. Otherwise
>>>>>>>>>> this is the only place (besides release) that sets
>>>>>>>>>> data_in_avail to false.
>>>>>>>>>> Do the copy_to_user(), grab the spinlock, clear data_in_avail
>>>>>>>>>> and
>>>>>>>>>> data_in_idx, then release the lock and mutex. If you are really
>>>>>>>>>> adventurous you can do this without grabbing the lock using
>>>>>>>>>> barriers, but it's probably not necessary here.
>>>>>>>>>>
>>>>>>>>
>>>>>>>> With the state machine being able to be restarted at any time,
>>>>>>>> you need
>>>>>>>> something a little different here. You still need the mutex to
>>>>>>>> handle
>>>>>>>> multiple readers and the copy. I think the function should be
>>>>>>>> something
>>>>>>>> like:
>>>>>>>>
>>>>>>> Since KCS is not a multi-reader protocol from BMC's view, you
>>>>>>> makes things complex. :-)
>>>>>>
>>>>>> No, I don't think you understand. The primary purpose of the
>>>>>> complexity
>>>>>> here is to protect the driver from the host system (on the other
>>>>>> side of
>>>>>> the KCS interface). Without this protection, it is possible for
>>>>>> the host
>>>>>> system to start a new write while the user on the BMC side is
>>>>>> reading
>>>>>> data out, resulting in corrupt data being read.
>>>>>>
>>>>>> I haven't thought too much about this. There may be a simpler way,
>>>>>> but the protection needs to be there.
>>>>>>
>>>>>> And you may not think you need to protect the driver against a
>>>>>> malicious BMC side user code, but you would be wrong. You can
>>>>>> only have one opener, but with threads or a fork you can have
>>>>>> multiple readers. And you don't know if a malicious piece of
>>>>>> code has taken over userland. You always need to protect the
>>>>>> kernel.
>>>>>>
>>>>> Sure, the read/write have protected the critical data area with
>>>>> IRQ, and also, these
>>>>> functions should be thread local safe I believe.
>>>>>
>>>>> spin_lock_irq(&kcs_bmc->lock);
>>>>> ...
>>>>> spin_unlock_irq(&kcs_bmc->lock);
>>>>>
>>>>
>>>> But remember, you can't call copy_to_user() when IRQs are off or
>>>> when you are holding
>>>> a spinlock. That is an absolute no. It can crash the kernel.
>>>>
>>>> So you need a design that takes this into account, but will not
>>>> result in the possibility
>>>> of bad data being read.
>>>>
>>> Yes, sure, as I said before: access_ok(VERIFY_WRITE, to, n), then
>>> memcpy in spin_lock.
>>
>> Where did you get the idea that this was ok? It's not. access_ok()
>> is not actually very
>> useful, since the permissions on memory can change at any time unless
>> you are holding
>> the mm lock, which is also not an ok thing to do. It is entirely
>> possible for access_ok()
>> to pass and copy_to_user() to fail.
>>
> I thought memcpy will not fail. :(
Oh, memcpy won't fail as long as the source and destination is kernel
memory.
I was a little confused by the access_ok() thing, it's common for people to
assume that if they do access_ok(), that copy_to_user() won't fail.
>> I'm not exactly sure what you are saying, though. In any event, a
>> well-designed read()/write()
>> operation should leave the system unchanged if it gets an error.
>>
> I saw BT use a local buffer, If I change the '#define
> KCS_MSG_BUFSIZ 1024' to ".. 512", should it be OK
> as BT ?
>
> static ssize_t bt_bmc_read(struct file *file, char __user *buf,
> size_t count, loff_t *ppos)
> {
> struct bt_bmc *bt_bmc = file_bt_bmc(file);
> u8 len;
> int len_byte = 1;
> u8 kbuffer[BT_BMC_BUFFER_SIZE]; --> #define BT_BMC_BUFFER_SIZE 256
It's good practice to keep larger things off the stack, which is why I
dynamically
allocated it. But if you have a mutex, you can put that buffer in
struct bt_bmc
since it would only be accessed when holding the mutex.
>
>> -corey
>>
>>>>>>>> static ssize_t kcs_bmc_read(struct file *filp, char *buf,
>>>>>>>> size_t count, loff_t *offset)
>>>>>>>> {
>>>>>>>> struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>> ssize_t ret;
>>>>>>>> bool avail;
>>>>>>>> size_t data_size;
>>>>>>>> u8 *data;
>>>>>>>>
>>>>>>>> data = kmalloc(KCS_MSG_BUFSIZ, GFP_KERNEL);
>>>>>>>> if (!data)
>>>>>>>> return -ENOMEM;
>>>>>>>>
>>>>>>>> retry:
>>>>>>>> ret = -EAGAIN;
>>>>>>>> if (!(filp->f_flags & O_NONBLOCK))
>>>>>>>> wait_event_interruptible(kcs_bmc->queue,
>>>>>>>> kcs_bmc->data_in_avail);
>>>>>>>>
>>>>>>>> mutex_lock(&kcs_bmc->read_mutex);
>>>>>>>>
>>>>>>>> spin_lock_irq(&kcs_bmc->lock);
>>>>>>>> avail = kcs_bmc->data_in_avail;
>>>>>>>> if (avail) {
>>>>>>>> memcpy(data, kcs_bmc->data_in, kcs_bmc->data_in_idx);
>>>>>>>> data_size = kcs_bmc->data_in_idx;
>>>>>>>> }
>>>>>>>> spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>
>>>>>>>> if (!avail) {
>>>>>>>> if (filp->f_flags & O_NONBLOCK)
>>>>>>>> goto out_mutex_unlock;
>>>>>>>> mutex_unlock(&kcs_bmc->read_mutex);
>>>>>>>> goto retry;
>>>>>>>> }
>>>>>>>>
>>>>>>>> if (count < data_size) {
>>>>>>>> ret = -EOVERFLOW;
>>>>>>>> ? I'm not sure about the error, but userspace
>>>>>>>> needs to know.
>>>>>>>> goto out_mutex_unlock;
>>>>>>
>>>>>> Maybe a length error to the host side here?
>>>>
>>>> You didn't comment on this or the other length error. That needs
>>>> to be
>>>> handled.
>>>>
>>> Yes, will send a length error by following KCS spec.
>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> if (!copy_to_user(buf, data, data_size)) {
>>>>>>>> ret = -EFAULT;
>>>>>>>> goto out_mutex_unlock;
>>>>>>>> }
>>>>>>>>
>>>>>>>> ret = data_size;
>>>>>>>>
>>>>>>>> spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>
>>>>>>>> if (kcs_bmc->phase != KCS_PHASE_WRITE_END_DONE)
>>>>>>>> /* Something aborted or restarted the state
>>>>>>>> machine. */
>>>>>>>> ? Maybe restart if O_NONBLOCK is not set and
>>>>>>>> -EAGAIN if it is?
>>>>>>>> ret = -EIO;
>>>>>>>> } else {
>>>>>>>> kcs_bmc->phase = KCS_PHASE_WAIT_READ;
>>>>>>>> kcs_bmc->data_in_avail = false;
>>>>>>>> kcs_bmc->data_in_idx = 0;
>>>>>>>> }
>>>>>>>>
>>>>>>>> spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>
>>>>>>>> out_mutex_unlock:
>>>>>>>> mutex_unlock(&kcs_bmc->read_mutex);
>>>>>>>>
>>>>>>>> kfree(data);
>>>>>>>>
>>>>>>>> return ret;
>>>>>>>> }
>>>>>>>> Note that I added a state, KCS_PHASE_WRITE_END_DONE, which
>>>>>>>> would be
>>>>>>>> set after the final byte from the host is received. You want
>>>>>>>> the read here
>>>>>>>> done before you can do the write below to avoid the race I
>>>>>>>> talked about.
>>>>>>>>
>>>>>>>> There is a local copy made of the data. What you *never* want
>>>>>>>> to happen
>>>>>>>> here is for the state machine to start processing a new write
>>>>>>>> command
>>>>>>>> while the data is being copied. It could result in corrupt
>>>>>>>> data being read
>>>>>>>> and some random operation being done by the BMC.
>>>>>>>>
>>>>>>>> If you want to avoid the local copy, it could be done, but it's
>>>>>>>> more complex.
>>>>>>>>
>>>>>>>>>>> + if (!(filp->f_flags & O_NONBLOCK))
>>>>>>>>>>> + wait_event_interruptible(kcs_bmc->queue,
>>>>>>>>>>> + kcs_bmc->data_in_avail);
>>>>>>>>>>> +
>>>>>>>>>>> + spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>> +
>>>>>>>>>>> + if (kcs_bmc->data_in_avail) {
>>>>>>>>>>> + kcs_bmc->data_in_avail = false;
>>>>>>>>>>> +
>>>>>>>>>>> + if (count > kcs_bmc->data_in_idx)
>>>>>>>>>>> + count = kcs_bmc->data_in_idx;
>>>>>>>>>>> +
>>>>>>>>>>> + if (!copy_to_user(buf, kcs_bmc->data_in, count))
>>>>>>>>>>> + ret = count;
>>>>>>>>>>> + else
>>>>>>>>>>> + ret = -EFAULT;
>>>>>>>>>>> + }
>>>>>>>>>>> +
>>>>>>>>>>> + spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>> +
>>>>>>>>>>> + return ret;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static ssize_t kcs_bmc_write(struct file *filp, const char
>>>>>>>>>>> *buf,
>>>>>>>>>>> + size_t count, loff_t *offset)
>>>>>>>>>>> +{
>>>>>>>>>>> + struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>> + ssize_t ret = count;
>>>>>>>>>>> +
>>>>>>>>>>> + if (count < 1 || count > KCS_MSG_BUFSIZ)
>>>>>>>>>>> + return -EINVAL;
>>>>>>>>>>> +
>>>>>>>>>>> + spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>> +
>>>>>>>>>>> + if (kcs_bmc->phase == KCS_PHASE_WAIT_READ) {
>>>>>>>>>>> + if (copy_from_user(kcs_bmc->data_out, buf, count)) {
>>>>>>>>>>> + spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>> + return -EFAULT;
>>>>>>>>>>> + }
>>>>>>>>>>> +
>>>>>>>>>>> + kcs_bmc->phase = KCS_PHASE_READ;
>>>>>>>>>>> + kcs_bmc->data_out_idx = 1;
>>>>>>>>>>> + kcs_bmc->data_out_len = count;
>>>>>>>>>>> + write_data(kcs_bmc, kcs_bmc->data_out[0]);
>>>>>>>>>>> + } else if (kcs_bmc->phase == KCS_PHASE_READ) {
>>>>>>>>>>> + ret = -EBUSY;
>>>>>>>>>>> + } else {
>>>>>>>>>>> + ret = -EINVAL;
>>>>>>>>>>
>>>>>>>>>> Is there a reason you return -EINVAL here? Why not just
>>>>>>>>>> -EBUSY in all
>>>>>>>>>> cases? Is there something that userland will need to do
>>>>>>>>>> differently?
>>>>>>>>>>
>>>>>>>>>>> + }
>>>>>>>>>>> +
>>>>>>>>>>> + spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>> +
>>>>>>>>>>> + return ret;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static long kcs_bmc_ioctl(struct file *filp, unsigned int cmd,
>>>>>>>>>>> + unsigned long arg)
>>>>>>>>>>> +{
>>>>>>>>>>> + struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>> + long ret = 0;
>>>>>>>>>>> +
>>>>>>>>>>> + spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>> +
>>>>>>>>>>> + switch (cmd) {
>>>>>>>>>>> + case IPMI_BMC_IOCTL_SET_SMS_ATN:
>>>>>>>>>>> + update_status_bits(kcs_bmc, KCS_STATUS_SMS_ATN,
>>>>>>>>>>> + KCS_STATUS_SMS_ATN);
>>>>>>>>>>> + break;
>>>>>>>>>>> +
>>>>>>>>>>> + case IPMI_BMC_IOCTL_CLEAR_SMS_ATN:
>>>>>>>>>>> + update_status_bits(kcs_bmc, KCS_STATUS_SMS_ATN,
>>>>>>>>>>> + 0);
>>>>>>>>>>> + break;
>>>>>>>>>>> +
>>>>>>>>>>> + case IPMI_BMC_IOCTL_FORCE_ABORT:
>>>>>>>>>>> + set_state(kcs_bmc, ERROR_STATE);
>>>>>>>>>>> + read_data(kcs_bmc);
>>>>>>>>>>> + write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>> +
>>>>>>>>>>> + kcs_bmc->phase = KCS_PHASE_ERROR;
>>>>>>>>>>> + kcs_bmc->data_in_avail = false;
>>>>>>>>>>> + break;
>>>>>>>>>>> +
>>>>>>>>>>> + default:
>>>>>>>>>>> + ret = -EINVAL;
>>>>>>>>>>> + break;
>>>>>>>>>>> + }
>>>>>>>>>>> +
>>>>>>>>>>> + spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>> +
>>>>>>>>>>> + return ret;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static int kcs_bmc_release(struct inode *inode, struct file
>>>>>>>>>>> *filp)
>>>>>>>>>>> +{
>>>>>>>>>>> + struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>> +
>>>>>>>>>>
>>>>>>>>>> What happens if the device gets closed in the middle of a
>>>>>>>>>> transaction? That's
>>>>>>>>>> an important case to handle. If something is in process, you
>>>>>>>>>> need to abort it.
>>>>>>>>>>
>>>>>>>>> The device just provides the read & write data, the
>>>>>>>>> transaction is handled in the KCS
>>>>>>>>> controller's IRQ handler.
>>>>>>>>
>>>>>>>> From the spec, section 9.14:
>>>>>>>>
>>>>>>>> The BMC must change the status to ERROR_STATE on any
>>>>>>>> condition where it
>>>>>>>> aborts a command transfer in progress.
>>>>>>>>
>>>>>>>> So you need to do something here.
>>>>>>>>
>>>>>>> In practice, we do this as spec said in ipmid, NOT in driver,
>>>>>>> driver can't handle anything, let's
>>>>>>> make it simple, thanks!
>>>>>>
>>>>>> If ipmid crashes or is killed, how does it accomplish this?
>>>>>>
>>>>> Every time ipmids (or kcsd) crashed or killed, it needs start to
>>>>> call FORCE_ARBORT firstly, to sync with
>>>>> host side software.
>>>>>>>
>>>>>>> Whenever the BMC is reset (from power-on or a hard reset), the
>>>>>>> State Bits are initialized to “11 - Error State”. Doing so
>>>>>>> allows SMS to detect that the BMC has been reset and that any
>>>>>>> message in process has been terminated by the BMC.
>>>>>>
>>>>>> Right, that's fine, like it should be. But we are not talking
>>>>>> about a reset.
>>>>>>
>>>>> I think the final error handling solution is that kcsd (user land)
>>>>> runs, otherwise, the host software side still got stuck. We meet
>>>>> this kind of issue, so in general, we just doesn't handle some
>>>>> mirror errors in driver, then in kcsd, when it can provide the real
>>>>> IPMI service, it will reset the channel firstly to sync with host
>>>>> side software.
>>>>
>>>> "Userland will do the right thing" is not very convincing to a
>>>> kernel developer.
>>>>
>>>> Plus if the above is true, I would think that you would just want
>>>> to hold the device
>>>> in an error state when it wasn't opened.
>>>>
>>> I understand your concern, of course, driver need handles things
>>> well. But in fact, if a user app is truly a bad boy, it still can hang
>>> the host side: set SMS_ATN, but no message returned when software
>>> host side requests, then host open-ipmi driver will hang, we
>>> meet this kind of error to hang the customer's host. :) In my
>>> understanding, kcs-bmc should do the right thing about read and write,
>>> the real transaction should be handled correctly by the kcsd.
>>>
>>> And if no kcsd starts, then this kind of BMC can't be sold out. :)
>>
>> True. I'm not as concerned about this sort of thing. It's nicer to
>> the host side if
>> it can detect problems quickly, but it will eventually time out.
>>
>> From what I can tell from the current design, if the BMC userland is
>> not running,
>> the driver will step through the state machine until it hits read
>> state, then it
>> will sit there until the host times out and aborts the operation.
>>
>> IMHO, it would be better for the host side if the driver just stayed
>> in error state
>> if nothing had it open. It would think the spec says that in the
>> quote I referenced
>> above, but that quote, like many things in the IPMI spec, is fairly
>> vague and could
>> be interpreted many ways.
>>
> Well, I will try to fix this errors as possible.
>> -corey
>>
>>
>>>> -corey
>>>>
>>>>>> -corey
>>>>>>
>>>>>>>>>>> + spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>> +
>>>>>>>>>>> + kcs_bmc->running = 0;
>>>>>>>>>>> +
>>>>>>>>>>> + spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>> +
>>>>>>>>>>> + return 0;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Powered by blists - more mailing lists