Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto keep alive not working #123

Closed
ctron opened this issue Mar 20, 2019 · 46 comments
Closed

Auto keep alive not working #123

ctron opened this issue Mar 20, 2019 · 46 comments
Assignees
Milestone

Comments

@ctron
Copy link
Contributor

ctron commented Mar 20, 2019

The auto keep alive seems not to be working. Checking the source I assume this is due to the following way it is implemented:

pipeline.addBefore("handler", "idle",new IdleStateHandler(0, this.options.getKeepAliveTimeSeconds(), 0));

This will register an idle detector which only listens for write events. However, opening a connection and only pushing QoS messages will not trigger a response from the server,
so the client does never receive anything from the server.

The client should send a PINGREQ when the reader gets idle. Which shouldn't happen
when you send QoS 1 or 2 messages and the server sends responses. However if it does,
then sending a PINGREQ would be the appropriate action.

If the server doesn't response then, it should be considered dead.

@vietj
Copy link
Contributor

vietj commented Mar 20, 2019

@ctron so this is mainly an mqtt-client concern right ?

@ctron
Copy link
Contributor Author

ctron commented Mar 20, 2019

@ctron so this is mainly an mqtt-client concern right ?

Yes, this is an issue in the MQTT client.

@vietj
Copy link
Contributor

vietj commented Mar 20, 2019

so basically we should set a timer on the client to periodically send PING commands to the server and that's it right ?

@ctron
Copy link
Contributor Author

ctron commented Mar 20, 2019

so basically we should set a timer on the client to periodically send PING commands to the server and that's it right ?

Well you don't need a timer, a simple "reader idle" condition should work, swapping that with the current "writer idle".

@vietj
Copy link
Contributor

vietj commented Mar 20, 2019

what do you mean ?

@ctron
Copy link
Contributor Author

ctron commented Mar 21, 2019

Swapping the current reader idle condition with writer idle …

@vietj
Copy link
Contributor

vietj commented Mar 21, 2019

ahhh... that makes sense lol :-) sorry

@vietj
Copy link
Contributor

vietj commented Mar 21, 2019

how can we best make a reproducer test for this ?

@vietj vietj added this to the 3.7.0 milestone Mar 21, 2019
@ctron
Copy link
Contributor Author

ctron commented Mar 21, 2019

how can we best make a reproducer test for this ?

I am not that familiar with testing in vertx. However the test should be:

Common start

  • Create a connection, with an auto-handshake period of X seconds
  • Start sending QoS 0 messages to the server, every 1 second
  • After X seconds, the client should automatically send a PINGREQ

Variant A:

  • The server does not respond
  • The connection must be closed after X mutliplied by 1.5 seconds (MQTT spec)

Variant B:

  • The server responds with a PINGRESP
  • After X more seconds, the connection must still be open

@vietj
Copy link
Contributor

vietj commented Mar 21, 2019 via email

@ctron
Copy link
Contributor Author

ctron commented Mar 21, 2019

that's what I actually asked for :-) thanks.

Cool 😄 … thanks for looking into this!

@vietj
Copy link
Contributor

vietj commented Mar 25, 2019

I'm working on it @ctron , I can't see where the 1.5 multiplier is specified in the spec, all I can read is If a Client does not receive a PINGRESP Packet within a reasonable amount of time after it has sent a PINGREQ, it SHOULD close the Network Connection to the Server.

@vietj vietj self-assigned this Mar 25, 2019
@ctron
Copy link
Contributor Author

ctron commented Mar 25, 2019

MQTT Spec:

If the Keep Alive value is non-zero and the Server does not receive a Control Packet from the Client within one and a half times the Keep Alive time period, it MUST disconnect the Network Connection to the Client as if the network had failed [MQTT-3.1.2-24].

@vietj
Copy link
Contributor

vietj commented Mar 25, 2019

@ctron it seems this statement applies to the server and not the client

@vietj
Copy link
Contributor

vietj commented Mar 25, 2019

@ctron please see #124 which provides a better handling of client ping expiration

@konradmichael
Copy link
Contributor

You are aware, that I made a simmilar issue? Including a PR?
Issue: #117
PR: #118

I have no Idea why you closed it.

@vietj
Copy link
Contributor

vietj commented Mar 25, 2019

@konradmichael yes I remember now, I tried also to use the READER idle state but it did not work in all cases. Basically writing to the channel a message defeats the idle state and the client does not get properly timeout, so I had to use Vertx timers instead.

@vietj
Copy link
Contributor

vietj commented Mar 25, 2019

also the 1.5 factor does not apply to the client but to the server instead. That's why there is a keepAliveTimeout value to specify how long on the client side a disconnect should happen when a ping response is not received from the server

@vietj
Copy link
Contributor

vietj commented Mar 25, 2019

you can help to review the PR @konradmichael

@ctron
Copy link
Contributor Author

ctron commented Mar 25, 2019

@ctron it seems this statement applies to the server and not the client

Maybe you should have a look at the whole section, that might explain it:

If a Client does not receive a PINGRESP Packet within a reasonable amount of time after it has sent a PINGREQ, it SHOULD close the Network Connection to the Server.

@vietj
Copy link
Contributor

vietj commented Mar 25, 2019 via email

@ctron
Copy link
Contributor Author

ctron commented Mar 25, 2019

So what would be the definition of "reasonable"?

@vietj
Copy link
Contributor

vietj commented Mar 25, 2019

there is a new property in the MqttClientOptions called keepAliveTimeout of 10 seconds by default in the PR

@ctron
Copy link
Contributor Author

ctron commented Mar 26, 2019

there is a new property in the MqttClientOptions called keepAliveTimeout of 10 seconds by default in the PR

With the meaning of "10 seconds after the PINGREQ was sent, a response is overdue"? … I guess that would work.

@ctron
Copy link
Contributor Author

ctron commented Mar 26, 2019

I just had a look at the PR, and I think this really overcomplicates things.

When you send out a PINGREQ, you expected a PINGRESP to come back. In most cases you don't really care about getting back and PINGRESP or "something else" from the server. The PINGREQ is just a trigger to make the server send you something. This way you can reset the "idle reader" condition.

My proposal would be to:

  • set an IDLE reader timeout to "keepalive"
  • when the IDLE reader condition triggers, send a PINGREQ and set the IDLE reader timeout to "keepalive timeout"

Option A:

  • on the next packet received reset the IDLE reader timeout back to "keepalive"

Option B:

  • when the IDLE reader is triggered again, close the connection

@ppatierno
Copy link
Member

Finally I had the time to take a quick look at the PR.
First of all, I think that we are creating an ambiguity for who knows the MQTT world and now he sees keepAliveTimeSeconds and keepAliveTimeout. The second thing, is that the code seems to be over complicated. I would love having the result from @konradmichael if the idle just works to simplify.
As far as I understand from the code, the pingCheck related timer just starts on connect even if there is no PINGREQ sent. At some point it starts to send PINGREQ. Of course, it's possible that the MQTT client is so active that there is never need for sending a PINGREQ because the connection has always traffic so the timer is sending not usefull PINGREQ, or am I missing something?

@vietj
Copy link
Contributor

vietj commented Mar 26, 2019

As I said, I believe it is not possible to implement this with @konradmichael solution for the reason I explained. Of course I might be wrong and I would be glad, so I believe that @konradmichael can provide a new PR. His original PR did not have any tests, mine has and I believe they are correct. @konradmichael is welcome to reuse the tests I wrote.

In addition @konradmichael was wrong on the 1.5 factor applied to the client side, whereas the spec clearly say that this is a server concern and he should have introduced a configurable timeout on the client.

The timer only starts when the autoKeepAlive setting is on. In @konradmichael solution, there is also a timer in the IdleStateHandler, it's just that we don't see it. In addition it requires 2 Netty handlers which add overhead at runtime (handlers modify the pipeline and they don't come for free both in CPU and object creation).

I believe that my current PR could also be improved to clear the timeout when a message is written to the channel.

I named it keepAliveTimeout because MqttClientOptions extends NetClientOptions which has idleTimeout so I believe actually that

  • it is correct to name it keepAliveTimeout
  • keepAliveTimeSeconds is actually an incorrect name and instead if should be keepAlivePeriod as we are specifying a period, so keepAliveTimeSeconds should be deprecated in favor of the more correct keepAlivePeriod instead

So for now I believe we are going to leave this out for 3.7.0 (that shall be released this week), until we figure out the issues mentioned above.

@vietj vietj modified the milestones: 3.7.0, 3.7.1 Mar 26, 2019
@ppatierno
Copy link
Member

Did you test what I mention about having PINGREQ sent even when the channel is busy writing?
It seems to me that even when the MQTT client is sending data (i.e. PUBLISH messages, or PUBACK as ack for received packets, ...) the timer always sends PINGREQ which is not necessary.
I would agree on the KeepAliveTimeSeconds name but I would use keepAliveInterval as mentioned in the spec and heavily used in the Eclipse Paho library which is the reference for MQTT implementations. Btw even in this case I would leave it for a different PR.

@vietj
Copy link
Contributor

vietj commented Mar 26, 2019 via email

@ppatierno
Copy link
Member

I have just created issue #126

@konradmichael
Copy link
Contributor

New Pull request: #129

@vietj vietj modified the milestones: 3.7.1, 3.8.0 May 23, 2019
@vietj vietj modified the milestones: 3.8.0, 3.8.1 Jul 19, 2019
@vietj vietj modified the milestones: 3.8.1, 3.8.2 Aug 23, 2019
@vietj vietj modified the milestones: 3.8.2, 3.8.3 Oct 4, 2019
@vietj vietj modified the milestones: 3.8.3, 3.8.4 Oct 22, 2019
@vietj vietj modified the milestones: 3.9.3, 3.9.4 Sep 17, 2020
This was referenced Oct 7, 2020
@vietj vietj closed this as completed Oct 7, 2020
@AlmostRuio
Copy link

@vietj @konradmichael Hello, any chance that this fix will be applied to 4.0.0? In our project we are actually using 4.0.0.CR1 and we have this problem. Thank you and have a nice day!

@vietj
Copy link
Contributor

vietj commented Dec 1, 2020

it is supposedly applied #173

do you have a reproducer ?

@AlmostRuio
Copy link

AlmostRuio commented Dec 1, 2020

@vietj thank you for the fast answer :)
i hope i didn't forgot something in my code, if is the case i'm really sorry.
The server is moquitto, running in a local docker image.
vertx 4.0.0.RC2


    @Override
    public void start() {
        String brokerHost = getenv("BROKER_HOST");
        String portFromEnv = getenv("BROKER_PORT");
        int brokerPort = portFromEnv != null ? parseInt(portFromEnv) : BROKER_PORT;

        MqttClientOptions options = new MqttClientOptions();
        options.setAutoKeepAlive(true);
        client = MqttClient.create(vertx, options);

        Handler<AsyncResult<MqttConnAckMessage>> connectionHandler = ar -> {
            if (ar.succeeded()) {
                LOGGER.info("connected to {}:{}", brokerHost, brokerPort);
            } else {
                LOGGER.error("failed to connect to MQTT broker", ar.cause());
            }
        };
        client.connect(brokerPort, brokerHost, connectionHandler);

        client.closeHandler(handler -> {
            LOGGER.warn("connection to broker closed, reconnect");

            // the handling of the ping-req for connection "keepAlive" does not work correctly, if there
            // is no publishing (only consuming messages disables the ping's in the implementation of
            // io.vertx.mqtt.impl.MqttClientImpl#initChannel since version 3.9.4 due to the changes described
            // in this issue: https://github.com/vert-x3/vertx-mqtt/issues/123)
            // workaround for now is to reconnect if the MQTT server closes the connection
            client.connect(brokerPort, brokerHost, connectionHandler);
        });

        scheduleNextUpdate();

        LOGGER.info("{} ready!", SimulatorExternalData.class.getName());
    }

@vietj
Copy link
Contributor

vietj commented Dec 1, 2020

can you open a new issue with the reproducer ? if you can provide a project it is better.

@AlmostRuio
Copy link

@vietj done
#183

@vietj
Copy link
Contributor

vietj commented Dec 2, 2020

thanks @AlmostRuio

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants