Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[netatmo] Refresh misbehavior #16485

Open
jlaur opened this issue Mar 4, 2024 · 5 comments · May be fixed by #18029
Open

[netatmo] Refresh misbehavior #16485

jlaur opened this issue Mar 4, 2024 · 5 comments · May be fixed by #18029
Assignees
Labels
bug An unexpected problem or unintended behavior of an add-on

Comments

@jlaur
Copy link
Contributor

jlaur commented Mar 4, 2024

I have had many alerts about old data from my Netatmo weather stations today and am now in the progress of digging into the logs. Luckily I already had trace level logging enabled since I've experienced stability issues since 4.1.

The full logs are pretty extensive, and can be requested by private mail.

Here's a preliminary analysis...

Everything was normal until around 8:46. I have two weather stations, and here are the logs from that point filtered by lines containing "executeUri GET":

2024-03-04 08:44:22.467 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 08:45:43.138 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 08:54:27.832 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:00:43.454 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:00:47.986 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:00:48.084 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:15:48.179 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:15:50.516 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:15:50.568 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:26:18.453 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:26:28.114 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:26:28.231 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:28:28.250 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:28:28.559 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:28:32.814 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:28:32.898 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false

at the end of this snippet, the number of calls are doubled, and the frequency is down to two minutes.

Around 9:00 when three calls are made instead of two, what can be noticed:

2024-03-04 09:00:48.167 [WARN ] [handler.capability.WeatherCapability] - Error retrieving weather data 'station46' : Request interrupted

It seems this can only be caused by:

} catch (InterruptedException e) {
Thread.currentThread().interrupt();
updateStatus(ThingStatus.OFFLINE, ThingStatusDetail.COMMUNICATION_ERROR, e.getMessage());
throw new NetatmoException("Request interrupted");

and handled here:

try {
return List.of(owned ? api.getOwnedStationData(handler.getId()) : api.getStationData(handler.getId()));
} catch (NetatmoException e) {
logger.warn("Error retrieving weather data '{}' : {}", handler.getId(), e.getMessage());
}

This repeated around 9:15 two times and 9:28 one time.

After that things gradually went worse and worse. more and more additional calls are made:

2024-03-04 09:30:28.611 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:30:28.880 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:30:31.773 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:30:31.794 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:30:52.851 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:30:53.050 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:30:57.071 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:30:57.084 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:32:28.923 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:32:29.184 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:32:31.469 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:32:31.532 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:32:53.076 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:32:53.268 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:32:56.942 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:32:56.987 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:34:29.205 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:34:31.491 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:34:31.618 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:34:31.744 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:34:53.286 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 09:34:53.503 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:34:56.953 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 09:34:56.962 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
[cut long history]
2024-03-04 10:33:33.948 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 10:33:34.157 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 10:33:41.141 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 10:33:41.275 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 10:33:41.382 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 10:33:49.842 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 10:33:49.948 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 10:33:52.801 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 10:34:20.184 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 10:34:20.291 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 10:34:20.504 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 10:34:20.736 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 10:34:20.837 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station46&get_favorites=false
2024-03-04 10:34:32.049 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false
2024-03-04 10:34:32.419 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false

And as a result:

2024-03-04 10:34:32.476 [TRACE] [mo.internal.handler.ApiBridgeHandler] -  -returned: code [403 Forbidden] body {"error":{"code":26,"message":"User usage reached"}}

This error was not taken into account, and the aggressive polling went on until late evening when I started looking into it.

At this point I disabled all Things by removing my .things file:

2024-03-04 22:21:15.122 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:outdoor:home:indoor:outdoor' changed from OFFLINE (BRIDGE_OFFLINE) to UNINITIALIZED
2024-03-04 22:21:15.142 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:outdoor:home:indoor:outdoor' changed from UNINITIALIZED to UNINITIALIZED (HANDLER_MISSING_ERROR)
2024-03-04 22:21:15.145 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor:indoor3' changed from OFFLINE (BRIDGE_OFFLINE) to UNINITIALIZED
2024-03-04 22:21:15.157 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor:indoor3' changed from UNINITIALIZED to UNINITIALIZED (HANDLER_MISSING_ERROR)
2024-03-04 22:21:15.160 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor:indoor2' changed from OFFLINE (BRIDGE_OFFLINE) to UNINITIALIZED
2024-03-04 22:21:15.172 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor:indoor2' changed from UNINITIALIZED to UNINITIALIZED (HANDLER_MISSING_ERROR)
2024-03-04 22:21:15.175 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:weather-station:home:indoor' changed from OFFLINE (BRIDGE_OFFLINE) to UNINITIALIZED
2024-03-04 22:21:15.194 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:weather-station:home:indoor' changed from UNINITIALIZED to UNINITIALIZED (HANDLER_MISSING_ERROR)
2024-03-04 22:21:15.196 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed' changed from OFFLINE (BRIDGE_OFFLINE) to UNINITIALIZED
2024-03-04 22:21:15.208 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed' changed from UNINITIALIZED to UNINITIALIZED (HANDLER_MISSING_ERROR)
2024-03-04 22:21:15.211 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_alva' changed from OFFLINE (BRIDGE_OFFLINE) to UNINITIALIZED
2024-03-04 22:21:15.226 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_alva' changed from UNINITIALIZED to UNINITIALIZED (HANDLER_MISSING_ERROR)
2024-03-04 22:21:15.230 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_office' changed from OFFLINE (BRIDGE_OFFLINE) to UNINITIALIZED
2024-03-04 22:21:15.251 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_office' changed from UNINITIALIZED to UNINITIALIZED (HANDLER_MISSING_ERROR)
2024-03-04 22:21:15.255 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_linus' changed from OFFLINE (BRIDGE_OFFLINE) to UNINITIALIZED
2024-03-04 22:21:15.277 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_linus' changed from UNINITIALIZED to UNINITIALIZED (HANDLER_MISSING_ERROR)
2024-03-04 22:21:15.285 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:weather-station:home:indoor_small_bathroom' changed from OFFLINE (BRIDGE_OFFLINE) to UNINITIALIZED
2024-03-04 22:21:15.307 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:weather-station:home:indoor_small_bathroom' changed from UNINITIALIZED to UNINITIALIZED (HANDLER_MISSING_ERROR)
2024-03-04 22:21:15.310 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:account:home' changed from OFFLINE (COMMUNICATION_ERROR): Rest call failed: statusCode=MAXIMUM_USAGE_REACHED, message=User usage reached to UNINITIALIZED
2024-03-04 22:21:15.357 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:account:home' changed from UNINITIALIZED to UNINITIALIZED (HANDLER_MISSING_ERROR)

However, this did not stop the polling:

2024-03-04 22:21:25.476 [DEBUG] [er.capability.CacheWeatherCapability] - Requesting fresh data
2024-03-04 22:21:25.478 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false 
2024-03-04 22:21:25.482 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler ApiBridgeHandler of thing netatmo:account:home tried checking if channel monitoring#request-count is linked although the handler was already disposed.
2024-03-04 22:21:25.484 [TRACE] [mo.internal.handler.ApiBridgeHandler] -  -with headers: Accept-Encoding: gzip, User-Agent: Jetty/9.4.52.v20230823, Authorization: Bearer xxx
2024-03-04 22:21:25.531 [DEBUG] [er.capability.CacheWeatherCapability] - Requesting fresh data
2024-03-04 22:21:25.534 [TRACE] [mo.internal.handler.ApiBridgeHandler] -  -returned: code [403 Forbidden] body {"error":{"code":26,"message":"User usage reached"}}
2024-03-04 22:21:25.536 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler ApiBridgeHandler tried updating the thing status although the handler was already disposed.
2024-03-04 22:21:25.537 [WARN ] [handler.capability.WeatherCapability] - Error retrieving weather data 'station32' : Rest call failed: statusCode=MAXIMUM_USAGE_REACHED, message=User usage reached
2024-03-04 22:21:25.537 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=70%station32&get_favorites=false 
2024-03-04 22:21:25.538 [DEBUG] [handler.capability.RefreshCapability] - Module is not ONLINE; special refresh interval is used
2024-03-04 22:21:25.539 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 900s
2024-03-04 22:21:25.540 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler ApiBridgeHandler of thing netatmo:account:home tried checking if channel monitoring#request-count is linked although the handler was already disposed.
2024-03-04 22:21:25.541 [TRACE] [mo.internal.handler.ApiBridgeHandler] -  -with headers: Accept-Encoding: gzip, User-Agent: Jetty/9.4.52.v20230823, Authorization: Bearer xxx
2024-03-04 22:21:25.569 [DEBUG] [handler.capability.RefreshCapability] - Module is not ONLINE; special refresh interval is used
2024-03-04 22:21:25.570 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 900s
2024-03-04 22:21:25.588 [TRACE] [mo.internal.handler.ApiBridgeHandler] -  -returned: code [403 Forbidden] body {"error":{"code":26,"message":"User usage reached"}}
2024-03-04 22:21:25.590 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler ApiBridgeHandler tried updating the thing status although the handler was already disposed.
2024-03-04 22:21:25.591 [WARN ] [handler.capability.WeatherCapability] - Error retrieving weather data 'station32' : Rest call failed: statusCode=MAXIMUM_USAGE_REACHED, message=User usage reached
2024-03-04 22:21:25.592 [DEBUG] [handler.capability.RefreshCapability] - Module is not ONLINE; special refresh interval is used
2024-03-04 22:21:25.593 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 900s
2024-03-04 22:21:25.663 [DEBUG] [handler.capability.RefreshCapability] - Module is not ONLINE; special refresh interval is used
2024-03-04 22:21:25.664 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 900s
2024-03-04 22:21:25.764 [DEBUG] [handler.capability.RefreshCapability] - Module is not ONLINE; special refresh interval is used
2024-03-04 22:21:25.766 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 900s

So it seems there are three distinct issues identified here:

  1. The binding started to generate too many calls.
  2. When HTTP code 403/Netatmo error code 26 ("User usage reached") is returned, this is not respected/taken into consideration.
  3. When the Things are removed, the handlers are not correctly disposed. This is probably the most severe, because it requires a full system restart to recover from the other issues.

Expected Behavior

Handlers should be disposed when Things are disabled or removed.

Netatmo error code 26 should be handled appropriately, for examle by lowering the polling frequency or waiting for some time before resuming any polling. Perhaps there are some guidelines to be found in the API documentation.

The binding should not increase the number of calls after experiencing issues such as InterruptedException.

Current Behavior

See description.

Possible Solution

Steps to Reproduce

Since it's a cloud service, it would be hard to reproduce natually, but can perhaps be reproduced artifically by some code modifications mimicking the observed events in my logs.

Your Environment

  • Version used: 4.1.1
@jlaur jlaur added the bug An unexpected problem or unintended behavior of an add-on label Mar 4, 2024
@clinique clinique self-assigned this Mar 25, 2024
@clinique
Copy link
Contributor

Situation now seems stabilized https://community.openhab.org/t/netatmo-bridge-offline-99-of-time/149168/65?u=glhopital. @jlaur : can we close the issue ?

@jlaur
Copy link
Contributor Author

jlaur commented Dec 13, 2024

Situation now seems stabilized

I agree, it's much more stable now. I had a scenario more than a month ago where one of my stations was offline and didn't come online again until I disabled/enabled the Thing. Unfortunately I didn't secure the logs and performed investigations because I was short on time. In any case, it's rare and it's unrelated to the issues that were already fixed by your PR's. I'll create a new issue if/when it happens again, and I have collected the needed evidence. 🙂

can we close the issue ?

The only thing mentioned in the issue remaining is:

When HTTP code 403/Netatmo error code 26 ("User usage reached") is returned, this is not respected/taken into consideration.

You have already created #16489, which went off my radar a few times, sorry. The issue is that it's hard to test in real life, so I have to run some simulations in different scenarios, which is time-consuming and has to be done on my production system because of the way Netatmo handles OAuth2 sessions. Since it was also waiting for some fixes for a long time, I forgot how exactly I performed those tests previously. 😆 I should be able to get back to this during Christmas holiday, so we can finally merge that PR and close this issue.

@jlaur
Copy link
Contributor Author

jlaur commented Dec 29, 2024

@clinique - unfortunately, as mentioned here and repeated here and here, #16489 is still not working as intended (I guess).

When hitting 429, all Things will go offline:

2024-12-29 01:38:23.420 [TRACE] [mo.internal.handler.ApiBridgeHandler] -  -returned: code [429 Too Many Requests] body {"body":{"devices":[{"_id":"xx:00","date_setup":1612625135,"last_setup":1612625135,"type":"NAMain","last_status_store":1735432698,"m>
2024-12-29 01:38:23.425 [DEBUG] [tatmo.internal.handler.DeviceHandler] - bridgeStatusChanged for bridge netatmo:weather-station:home:indoor_small_bathroom to OFFLINE (COMMUNICATION_ERROR): @text/maximum-usage-reached [ "3600" ]
2024-12-29 01:38:23.425 [DEBUG] [tatmo.internal.handler.DeviceHandler] - bridgeStatusChanged for bridge netatmo:weather-station:home:indoor to OFFLINE (COMMUNICATION_ERROR): @text/maximum-usage-reached [ "3600" ]
2024-12-29 01:38:23.427 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - Reconnection scheduled in 3600 seconds
2024-12-29 01:38:23.427 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor_small_bathroom:indoor_alva to OFFLINE (BRIDGE_OFFLINE)
2024-12-29 01:38:23.427 [WARN ] [handler.capability.WeatherCapability] - Error retrieving weather data 'xx:00':
2024-12-29 01:38:23.427 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor_small_bathroom:indoor_linus to OFFLINE (BRIDGE_OFFLINE)
2024-12-29 01:38:23.427 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:weather-station:home:indoor_small_bathroom
2024-12-29 01:38:23.427 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:outdoor:home:indoor:outdoor to OFFLINE (BRIDGE_OFFLINE)
2024-12-29 01:38:23.428 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:indoor:home:indoor_small_bathroom:indoor_office
2024-12-29 01:38:23.427 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor_small_bathroom:indoor_office to OFFLINE (BRIDGE_OFFLINE)
2024-12-29 01:38:23.428 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed
2024-12-29 01:38:23.428 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor:indoor3 to OFFLINE (BRIDGE_OFFLINE)
2024-12-29 01:38:23.427 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor:indoor2 to OFFLINE (BRIDGE_OFFLINE)
2024-12-29 01:38:23.427 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed to OFFLINE (BRIDGE_OFFLINE)
2024-12-29 01:38:23.431 [DEBUG] [handler.capability.RefreshCapability] - Thing 'netatmo:weather-station:home:indoor_small_bathroom' is not ONLINE, using special refresh interval
2024-12-29 01:38:23.433 [DEBUG] [handler.capability.RefreshCapability] - 'netatmo:weather-station:home:indoor_small_bathroom' next refresh in PT15M

Notice "Reconnection scheduled in 3600 seconds" and the last log line: "next refresh in PT15M". And surely enough:

2024-12-29 01:53:23.434 [DEBUG] [l.handler.capability.CacheCapability] - WeatherCapability requesting fresh data for netatmo:weather-station:home:indoor_small_bathroom
2024-12-29 01:53:23.437 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=xx%3A00&get_favorites=false

and everything came back online.

@jlaur
Copy link
Contributor Author

jlaur commented Jan 2, 2025

@clinique - I believe all calls are going through the bridge handler. So if we store Instant.now().plusSeconds(3600) after this line:

prepareReconnection(API_LIMIT_INTERVAL_S,
"@text/maximum-usage-reached [ \"%d\" ]".formatted(API_LIMIT_INTERVAL_S), null, null);

we could check here:

and refrain from making any calls within the next hour. WDYT?

EDIT: But probably we also need this to affect the calculations here, so we don't just skip, but actually adjust the rescheduled jobs according to the "grace" period for reconnecting:

private void proceedWithUpdate() {
Duration delay;
handler.proceedWithUpdate();
if (!ThingStatus.ONLINE.equals(handler.getThing().getStatus())) {
delay = OFFLINE_DELAY;
logger.debug("Thing '{}' is not ONLINE, using special refresh interval", thingUID);
} else {
delay = calcDelay();
}
rescheduleJob(delay);
}
private void rescheduleJob(Duration delay) {
if (refreshJob.isPresent()) {
ScheduledFuture<?> job = refreshJob.get();
Instant now = Instant.now();
Instant expectedExecution = now.plus(delay);
Instant scheduledExecution = now.plusMillis(job.getDelay(TimeUnit.MILLISECONDS));
if (Math.abs(ChronoUnit.SECONDS.between(expectedExecution, scheduledExecution)) <= 3) {
logger.debug("'{}' refresh as already pending roughly as the same time, will not reschedule", thingUID);
return;
} else {
stopJob();
}
}
logger.debug("'{}' next refresh in {}", thingUID, delay);
refreshJob = handler.schedule(this::proceedWithUpdate, delay);
}

@clinique clinique linked a pull request Jan 3, 2025 that will close this issue
@clinique
Copy link
Contributor

clinique commented Jan 3, 2025

@jlaur : let me know if my proposal suits your thoughts.
Edit : did not had time to test this. Wanted to ack to your proposal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of an add-on
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants