Speaking CAN: Write Buffer Overflow

In the first post of the Speaking CAN series, we learned how the terminal sends a read-parameter request to the ECU and how the ECU sends a response with the value of the parameter back to the terminal. This works fine, as long as the terminal does not send too many requests too fast.

We set the size txqueuelen of the write or TX buffer to 10, which is the default for many Yocto-based Linux systems. If the terminal writes, say, 50 requests to the CAN bus without any pause, we’ll see several error messages No buffer space available in the log window of the terminal. The terminal caused an overflow of the write buffer in the CAN controller.

If we have a request-and-response scenario, the terminal can wait for the response before it sends the next request. If the response does not arrive within a certain time, the terminal flags an error.

Avoiding a write buffer overflow becomes more difficult if the terminal sends out messages without expecting a response or if the terminal expects a response only every, say, 200 messages (e.g., an acknowledgement of how many messages the ECU has received). The solution is to configure the CAN controller of the terminal to receive its own messages.

Triggering a Write Buffer Overflow

In the previous post Speaking CAN: Request and Response, I described in detail what happens when we press the button TX 10. The terminal requests the values of the parameters with IDs 1 to 10. It sends the 10 read-parameter requests over the CAN bus to the ECU without any pause. For each request, the ECU reads the parameter value and sends back a CAN response with this value. The terminal sends 10 requests and receives 10 responses. Everything works fine and as expected.

Now, we become more daring and press the button TX 50. The function TerminalModel::simulateTxBufferOverflow writes 50 read-parameter requests to the CAN bus without any pause. The terminal emits eight error messages No buffer space available in its log window, as shown in the next figure. The number 2 at the end of the error message means that a write error QCanBusDevice::WriteError occurred.

We see error messages in the log window, because we connected the signal QCanBusDevice::errorOccurred with the slot EcuBase::onErrorOccurred in the EcuBase constructor.

    connect(m_canBus.get(), &QCanBusDevice::errorOccurred,
            this, &EcuBase::onErrorOccurred);

The slot emits a log message with the human-readable description of the last error and the error number.

void EcuBase::onErrorOccurred(QCanBusDevice::CanBusError error)
{
    emit logMessage(QString("ERROR: %1 (%2).")
        .arg(canBus()->errorString()).arg(error));
}

A closer analysis of the ECU’s log window reveals that the ECU never receives the requests for parameters 0x0c (highlighted with the red frame in the next figure), 0x15, 0x19, 0x1c, 0x21, 0x25, 0x2b and 0x32. The number and the IDs of the missing read-parameter requests may vary slightly from button press to button press.

The terminal (left) sends 50 requests, which makes its write buffer overflow. The ECU (right) does not receive all 50 requests. The requests for parameter 0x0c and 7 other parameters never arrive at the ECU.

The error message and the observed behaviour suggest that the write buffer in the terminal’s CAN controller overflows. This doesn’t come as a surprise, because we set the size txqueuelen of the write buffer to 10 in the systemd script can0.service.

Our first idea to fix the overflow problem might be to increase the write buffer size txqueuelen, say, to 64 or even 128. This is certainly a good idea, because it will eliminate the problem most of the time. However, the day will come when the error “No buffer space available” will reappear.

One scenario could be that the user of a harvester terminal wants to export all the parameters whose value differs from the default value. The user wants to import these parameters on another harvester to have exactly the same settings. Harvesters easily have 1500-2000 parameters. This number easily exceeds reasonable buffer sizes like 64 or 128.

Another scenario could be the firmware update of ECUs. The terminal sends the contents of a firmware image in 1-KB blocks to the ECU. It splits up each 1-KB block into 147 CAN messages with a 1-byte sequence number and seven data bytes. The ECU sends a response every 147 CAN messages to acknowledge the receipt of a 1-KB block. Again, buffer sizes of 64 or 128 won’t do.
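The frame counts in this scenario follow from a ceiling division. The following is a minimal sketch of that arithmetic. The helper framesPerBlock is hypothetical and not part of the example code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of CAN frames needed to carry blockSize bytes
// when every frame carries dataBytesPerFrame bytes of data (rounded up).
std::size_t framesPerBlock(std::size_t blockSize,
                           std::size_t dataBytesPerFrame)
{
    assert(dataBytesPerFrame > 0);
    return (blockSize + dataBytesPerFrame - 1) / dataBytesPerFrame;
}

// framesPerBlock(1024, 7) yields 147: one sequence byte plus seven data bytes.
// framesPerBlock(1024, 8) yields 128: all eight payload bytes carry data.
```

Either count is far larger than a write buffer of 64 or 128 frames can absorb once a firmware image consists of more than one block.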

Our second idea to fix the overflow problem might be to add a pause of 5 ms after every CAN frame written or to add a pause of 40 ms after every eight CAN frames written. We would have to choose the pauses longer than strictly necessary, because we would have to cater for the worst-case load of the CAN bus and the worst-case number of frames written into the buffer. The utilisation of the CAN bus would be very poor. Even then, we wouldn’t have a guarantee that the pauses are long enough to avoid a write buffer overflow in every possible scenario.
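To get a feeling for the cost of fixed pauses, we can add up the pause time for one 1-KB firmware block of 147 frames. The following is only a rough sketch. The helper totalPauseMs is hypothetical, and it ignores the time the frames themselves need on the bus.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: total time in ms spent pausing when the sender sleeps
// for pauseMs after every framesPerPause frames written. The transmission
// time of the frames is not included.
std::size_t totalPauseMs(std::size_t frames,
                         std::size_t framesPerPause,
                         std::size_t pauseMs)
{
    assert(framesPerPause > 0);
    return (frames / framesPerPause) * pauseMs;
}

// totalPauseMs(147, 1, 5) yields 735: 5 ms after every frame.
// totalPauseMs(147, 8, 40) yields 720: 40 ms after every eight frames.
```

Both variants spend roughly 0.7 seconds per 1-KB block on waiting alone, whether the bus is busy or idle.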

A good solution should guarantee that a write buffer overflow never occurs and that the terminal adapts the number of outgoing frames dynamically to the CAN bus load.

Waiting for Receipt of Response

The terminal sends a read-parameter request and waits for the response. When the terminal receives the response from the ECU, it sends the next read-parameter request, waits for the response, sends the next request and so on. The terminal can send as many requests as it wants. The buffer of the CAN controller will never overflow.

TerminalModel::simulateTxBufferOverflow still calls EcuProxy::sendReadParameter 50 times or 500 times without a pause. However, EcuProxy::sendReadParameter doesn’t write the read-parameter requests directly on the CAN bus. It appends the requests to the end of a queue of outgoing CAN frames.

void EcuProxy::sendReadParameter(quint16 pid, quint32 value)
{
    enqueueOutgoingFrame(QCanBusFrame(0x18ef0201U,
                         encodedReadParameter(pid, value)));
}

The function EcuBase::enqueueOutgoingFrame writes the CAN frame directly to the CAN bus, if the outgoing queue is empty. This is the case, when the terminal sends the first read-parameter request and after the terminal receives the response of the last read-parameter request. In any case, the function stores the frame in the queue m_outgoingQueue. The function temporally decouples the call to sendReadParameter from the call to writeFrame.

void EcuBase::enqueueOutgoingFrame(const QCanBusFrame &frame)
{
    auto empty = m_outgoingQueue.isEmpty();
    m_outgoingQueue.append(frame);
    if (empty) {
        canBus()->writeFrame(frame);
    }
}

When the terminal receives the response to a read-parameter request in EcuBase::onFramesReceived, it handles the response as before by calling EcuProxy::receiveReadParameter. In addition, it calls the new function EcuBase::dequeueOutgoingFrame.

if (isReadParameter(frame)) {
    receiveReadParameter(frame);
    dequeueOutgoingFrame();
}

When everything goes well, the first frame in the outgoing queue is the request that belongs to the response just received. The function dequeueOutgoingFrame removes this request frame from the queue and writes the new first frame of the queue to the CAN bus. When the response to the next request is received, dequeueOutgoingFrame is called again, and so on.

void EcuBase::dequeueOutgoingFrame()
{
    if (!m_outgoingQueue.isEmpty()) {
        m_outgoingQueue.removeFirst();
    }
    if (!m_outgoingQueue.isEmpty()) {
        canBus()->writeFrame(m_outgoingQueue.first());
    }
}

When we press the buttons Tx 50 or Tx 500, the error messages No buffer space available (2) are gone. The TX buffer of the CAN controller doesn’t overflow any more. The good case works.
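The effect can be reproduced without Qt and without a CAN bus. The following is a minimal sketch in plain C++. FakeController and QueuedSender are hypothetical stand-ins for the CAN controller and for the queue logic in EcuBase. The model assumes that the controller transmits nothing during a burst of direct writes, so it loses more frames than the eight we observed on the real bus.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for the CAN controller: a TX buffer with a fixed
// number of slots. write() fails once all slots are occupied.
class FakeController
{
public:
    explicit FakeController(std::size_t capacity) : m_capacity(capacity) {}

    bool write(std::uint16_t pid)
    {
        if (m_txBuffer.size() >= m_capacity) {
            ++m_overflows;              // "No buffer space available"
            return false;
        }
        m_txBuffer.push_back(pid);
        return true;
    }

    // Puts the oldest buffered frame on the bus. Returns false if idle.
    bool transmitOne()
    {
        if (m_txBuffer.empty()) {
            return false;
        }
        m_txBuffer.pop_front();
        return true;
    }

    std::size_t overflows() const { return m_overflows; }

private:
    std::size_t m_capacity;
    std::size_t m_overflows = 0;
    std::deque<std::uint16_t> m_txBuffer;
};

// Hypothetical stand-in for the queue logic of EcuBase.
class QueuedSender
{
public:
    explicit QueuedSender(FakeController &controller)
        : m_controller(controller) {}

    // Like enqueueOutgoingFrame: write only if no request is in flight.
    void enqueue(std::uint16_t pid)
    {
        const bool empty = m_outgoing.empty();
        m_outgoing.push_back(pid);
        if (empty) {
            m_controller.write(pid);
        }
    }

    // Like dequeueOutgoingFrame: drop the answered request, write the next.
    void onResponse()
    {
        if (!m_outgoing.empty()) {
            m_outgoing.pop_front();
        }
        if (!m_outgoing.empty()) {
            m_controller.write(m_outgoing.front());
        }
    }

private:
    FakeController &m_controller;
    std::deque<std::uint16_t> m_outgoing;
};

// Writes count requests back to back; the controller transmits nothing
// in the meantime. Returns the number of rejected writes.
std::size_t overflowsWithDirectWrite(std::size_t count, std::size_t capacity)
{
    FakeController controller(capacity);
    for (std::size_t pid = 1; pid <= count; ++pid) {
        controller.write(static_cast<std::uint16_t>(pid));
    }
    return controller.overflows();
}

// Queues count requests and lets a simulated ECU answer each transmitted
// request at once. Returns the number of rejected writes.
std::size_t overflowsWithQueue(std::size_t count, std::size_t capacity)
{
    FakeController controller(capacity);
    QueuedSender sender(controller);
    for (std::size_t pid = 1; pid <= count; ++pid) {
        sender.enqueue(static_cast<std::uint16_t>(pid));
    }
    std::size_t answered = 0;
    while (controller.transmitOne()) {
        ++answered;
        sender.onResponse();
    }
    assert(answered == count);
    return controller.overflows();
}
```

With a buffer of 10 slots, overflowsWithDirectWrite(50, 10) reports rejected writes, whereas overflowsWithQueue(50, 10) and overflowsWithQueue(500, 10) report none, because the queue never has more than one request in the controller at a time.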

Let us now look at the error case, where the ECU doesn’t respond to the request. We change Ecu::receiveReadParameter such that it doesn’t respond to the read-parameter request for parameter 5.

void Ecu::receiveReadParameter(const QCanBusFrame &frame)
{
    quint16 pid = 0U;
    quint32 value = 0U;
    std::tie(pid, value) = decodedReadParameter(frame);
    if (pid == 5U) {
        return;
    }
    sendReadParameter(pid, QRandomGenerator::global()->generate());
}

When we press the button Tx 10 in the terminal, the ECU’s log messages stop after sending the value of parameter 4.

Ecu/Recv: Read(0x0001, 0x00000000)
Ecu/Send: Read(0x0001, 0xd26a77c4)
Ecu/Recv: Read(0x0002, 0x00000000)
Ecu/Send: Read(0x0002, 0xee3572aa)
Ecu/Recv: Read(0x0003, 0x00000000)
Ecu/Send: Read(0x0003, 0x25f00549)
Ecu/Recv: Read(0x0004, 0x00000000)
Ecu/Send: Read(0x0004, 0x07947167)

The terminal never calls dequeueOutgoingFrame to send the next read-parameter request, because it doesn’t receive a response to the request for parameter 5. The trigger for the next round of request and response is missing.

We introduce a timer m_receiptTimer to check whether the last read-parameter request has gone unanswered for too long. The code for setting up the timer is in the constructor of EcuBase.

    connect(&m_receiptTimer, &QTimer::timeout, [this]() {
        if (!m_outgoingQueue.isEmpty() &&
                isReceiptMissing(toMs(m_outgoingQueue.first()))) {
            emit logMessage(
                QString("ERROR: No response for request %1.")
                .arg(m_outgoingQueue.first().toString()));
            dequeueOutgoingFrame();
        }
    });
    m_receiptTimer.start(receiptTimeOut() / 2);

When the timer fires, the connected slot checks whether the time elapsed between writing the frame to the CAN bus – toMs(m_outgoingQueue.first()) – and the current time is greater than a threshold – receiptTimeOut() (100 ms by default). If the receipt for the first request in the queue has been missing for longer than the threshold, the slot emits an error message, removes the unanswered request from the queue and writes the next request to the CAN bus. The execution continues with the next round of request and response.

bool EcuBase::isReceiptMissing(qint64 stamp) const
{
    return QDateTime::currentMSecsSinceEpoch() - stamp >
           receiptTimeOut();
}

A QCanBusFrame has a time stamp. The function timeStamp returns the time when the frame is received. We are interested, however, in the time when the terminal writes the frame to the CAN bus. We must set this time on our own in enqueueOutgoingFrame and dequeueOutgoingFrame. Otherwise, the time stamp of the frames in the queue would always be 0.

void EcuBase::enqueueOutgoingFrame(const QCanBusFrame &frame)
{
    auto empty = m_outgoingQueue.isEmpty();
    m_outgoingQueue.append(frame);
    if (empty) {
        m_outgoingQueue.first()
           .setTimeStamp(currentTimeStampSinceEpoch());
        canBus()->writeFrame(m_outgoingQueue.first());
    }
}

void EcuBase::dequeueOutgoingFrame()
{
    if (!m_outgoingQueue.isEmpty()) {
        m_outgoingQueue.removeFirst();
    }
    if (!m_outgoingQueue.isEmpty()) {
        m_outgoingQueue.first()
            .setTimeStamp(currentTimeStampSinceEpoch());
        canBus()->writeFrame(m_outgoingQueue.first());
    }
}

The class QCanBusFrame defines a simple type for time stamps: QCanBusFrame::TimeStamp. We introduce two helper functions in the file ecubase.cpp to convert between time stamps and milliseconds. The first helper function currentTimeStampSinceEpoch converts QDateTime::currentMSecsSinceEpoch() into a TimeStamp object. The second helper function toMs takes a CAN frame and converts its time stamp into milliseconds since the epoch.
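The conversions are simple arithmetic. The following is a sketch in plain C++ of how such helpers might look. It is not the code from ecubase.cpp. The struct TimeStamp is a stand-in for QCanBusFrame::TimeStamp, which stores seconds and microseconds, and std::chrono takes the place of QDateTime. Unlike the original toMs, the sketch takes the time stamp instead of the frame, and isReceiptMissing takes the current time and the timeout as arguments so that it can be checked in isolation.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Stand-in for QCanBusFrame::TimeStamp (seconds plus microseconds).
struct TimeStamp
{
    std::int64_t seconds;
    std::int64_t microSeconds;
};

// Splits milliseconds since the epoch into seconds and microseconds.
TimeStamp toTimeStamp(std::int64_t ms)
{
    assert(ms >= 0);
    return TimeStamp{ms / 1000, (ms % 1000) * 1000};
}

// Joins seconds and microseconds back into milliseconds since the epoch.
std::int64_t toMs(const TimeStamp &stamp)
{
    return stamp.seconds * 1000 + stamp.microSeconds / 1000;
}

// Current wall-clock time as a TimeStamp. Assumes that system_clock counts
// from the Unix epoch, which is guaranteed only from C++20 on.
TimeStamp currentTimeStampSinceEpoch()
{
    using namespace std::chrono;
    const auto now = duration_cast<milliseconds>(
        system_clock::now().time_since_epoch());
    return toTimeStamp(now.count());
}

// True if more than timeoutMs have passed since the frame was written.
bool isReceiptMissing(std::int64_t nowMs, std::int64_t stampMs,
                      std::int64_t timeoutMs)
{
    return nowMs - stampMs > timeoutMs;
}
```

The comparison is strict: with a timeout of 100 ms, a frame written at 1000 ms counts as unanswered from 1101 ms on.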

Waiting for Receipt of Own Frame

Our approach to avoid write buffer overflows works fine, as long as every request has a response. The approach breaks down, if a CAN device regularly sends frames without ever expecting a response. The drive, engine or implement ECUs of a harvester easily send 50, 100 or more messages per second. Such ECUs must ensure that their write buffers don’t overflow.

The approach also breaks down, if a CAN device expects a response only after sending a certain number of frames. The classical example is when the terminal updates the firmware of an ECU over CAN bus. The terminal splits up the firmware image into 1-KB blocks. As CAN frames have a maximum payload of 8 bytes, the terminal must split up every 1-KB block into at least 128 frames. The ECU acknowledges the receipt of the full 1-KB block or raises an error if a frame is missing.

We solve this problem by configuring the CAN controller to receive its own frames. When the terminal writes a frame on the CAN bus, every CAN device will see this frame – including the terminal itself. The CAN controller normally filters out the frames sent by itself. We change this behaviour by setting the configuration parameter QCanBusDevice::ReceiveOwnKey to true in the constructors of TerminalModel and EcuModel.

    m_can0.reset(CanBus::setUp(QStringLiteral("socketcan"),
                 QStringLiteral("can0"), errorStr));
    ...
    m_can0->setConfigurationParameter(QCanBusDevice::ReceiveOwnKey, 
                                      true);

The CAN controller will then forward its own frames to the terminal. If the terminal sees one of its own frames in EcuBase::onFramesReceived, it calls dequeueOutgoingFrame to remove the frame last written from the queue and to write the next frame from the queue to the CAN bus. The for loop in onFramesReceived looks as follows.

    for (qint64 i = count; i > 0; --i) {
        auto frame = canBus()->readFrame();
        if (isOwnFrame(frame)) {
            dequeueOutgoingFrame();
        }
        else if (isReadParameter(frame)) {
            receiveReadParameter(frame);
        }
    }

We moved the call to dequeueOutgoingFrame from the isReadParameter case to the isOwnFrame case. The function isOwnFrame compares the frame just received with the first frame in the outgoing frame queue.

bool EcuBase::isOwnFrame(const QCanBusFrame &frame) const
{
    return !m_outgoingQueue.isEmpty() &&
           frame.frameId() == m_outgoingQueue.first().frameId() &&
           frame.payload() == m_outgoingQueue.first().payload();
}

Checking for its own frame is the most general solution to avoid a write buffer overflow. It doesn’t increase the bus load, because the frames are written to the CAN bus anyway. It adapts nicely to the bus load. If the bus load is high, the terminal will see its own frame later. Otherwise, it will see its own frame earlier.

Compared with waiting for a response, the terminal saves the time the response needs for the trip from the ECU back to the terminal. Hence, the pause between writing two frames to the CAN bus becomes shorter.
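The pacing by own frames can be checked with a small model in plain C++. The following is a sketch under simplifying assumptions. Frame and EchoPacedSender are hypothetical stand-ins for QCanBusFrame and EcuBase, the frame ID is simply reused from the read-parameter request, the payloads are dummy data, and the simulated bus echoes every written frame immediately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Stand-in for QCanBusFrame: frame ID plus payload bytes.
struct Frame
{
    std::uint32_t id;
    std::vector<std::uint8_t> payload;
};

class EchoPacedSender
{
public:
    // Like enqueueOutgoingFrame: write only if no frame is in flight.
    void enqueue(const Frame &frame)
    {
        const bool empty = m_outgoing.empty();
        m_outgoing.push_back(frame);
        if (empty) {
            write(frame);
        }
    }

    // Called for every frame read from the bus, own or foreign.
    void onFrameReceived(const Frame &frame)
    {
        if (!isOwnFrame(frame)) {
            return;                     // foreign frame: queue is untouched
        }
        m_outgoing.pop_front();
        if (!m_outgoing.empty()) {
            write(m_outgoing.front());
        }
    }

    bool isOwnFrame(const Frame &frame) const
    {
        return !m_outgoing.empty() &&
               frame.id == m_outgoing.front().id &&
               frame.payload == m_outgoing.front().payload;
    }

    // Frames handed to the simulated controller, in order.
    const std::vector<Frame> &written() const { return m_written; }
    std::size_t pending() const { return m_outgoing.size(); }

private:
    void write(const Frame &frame) { m_written.push_back(frame); }

    std::deque<Frame> m_outgoing;
    std::vector<Frame> m_written;
};

// Queues the 147 frames of one 1-KB block (sequence byte plus dummy data)
// and echoes every written frame back to the sender. Returns the largest
// number of frames that were written but not yet echoed.
std::size_t maxFramesInFlight(EchoPacedSender &sender)
{
    for (std::uint8_t seq = 0; seq < 147; ++seq) {
        sender.enqueue(Frame{0x18ef0201U, {seq, 0, 0, 0, 0, 0, 0, 0}});
    }
    std::size_t echoed = 0;
    std::size_t maxInFlight = 0;
    while (echoed < sender.written().size()) {
        const std::size_t inFlight = sender.written().size() - echoed;
        if (inFlight > maxInFlight) {
            maxInFlight = inFlight;
        }
        const Frame echo = sender.written()[echoed];  // copy: vector may grow
        ++echoed;
        sender.onFrameReceived(echo);
    }
    assert(sender.pending() == 0);
    return maxInFlight;
}
```

In this model, at most one frame is ever waiting in the controller, although the ECU never sends a single response. A frame from another device does not match the first frame of the queue and therefore does not trigger the next write.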

Getting the Example Code and Trying It Out

The example code is available on GitHub. We get it with the following commands.

$ git clone https://github.com/bstubert/embeddeduse.git
$ cd embeddeduse
$ git checkout can-comm-2
$ cd BlogPosts/CanComm

CanComm is a normal CMake-based Qt project.

In addition to the pure CAN communication code described above, the version can-comm-2 provides a couple of check buttons in the terminal and ECU. We can easily try out different scenarios by checking and unchecking these buttons.

Let us start with the two check buttons of the terminal. When we check the button Direct Write and press one of the buttons Tx 50 or Tx 500 in the terminal, the terminal will show several error messages No buffer space available (2). The write buffer in the CAN controller of the terminal overflows. The log messages of the ECU show that several messages from the terminal are missing. The write buffer overflow motivated this post.

When the button Direct Write is unchecked, the terminal will write the next frame only after it has received its own previous frame. It does not write frames directly to the CAN bus but queues them temporarily. When the button Skip Write is unchecked (the default), the terminal will write all the 10, 50 or 500 frames to the CAN bus.

When the button Skip Write is checked, the terminal will skip every 8th frame, that is, it will not write this frame to the CAN bus. Hence, the terminal will not receive any of these skipped frames. The receipt timer will kick in, the terminal will remove the missing frame from the outgoing queue, and the terminal will continue with the next frame from the queue. For every skipped frame, the terminal will display an error message starting with ERROR: Frame not written to CAN bus: <frame>.

Let us now move to the two check buttons of the ECU. When the button No Resp is checked (the default), the ECU will not respond to any request. This scenario motivated the check for own frames. When the button No Resp is unchecked, the ECU answers every request with a response.

When we check the Miss Resp button, the ECU drops every 8th response. We will see neither errors about missing responses nor any other errors. The CAN communication continues correctly after the dropped responses, because the terminal waits for the receipt of its own frame and not for the receipt of the response.
