Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved logging when plugins are cancelled for taking too long #323

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

yrro
Copy link
Contributor

@yrro yrro commented Mar 12, 2024

With these changes, it's a lot more obvious when reported errors are caused by a health check timeout.

Starting Execute IPA Healthcheck for Prometheus...
Internal server error 'Link'
--- Health check pki.server.healthcheck.clones.connectivity_and_data:ClonesConnectivyAndDataCheck cancelled after after 7 sec ---
  File "/usr/bin/ipa-healthcheck", line 11, in <module>
    load_entry_point('ipahealthcheck==0.12', 'console_scripts', 'ipa-healthcheck')()
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/main.py", line 52, in main
    sys.exit(ipachecks.run_healthcheck())
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 494, in run_healthcheck
    config.timeout))
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 238, in run_plugins
    for result in run_plugin(plugin, available, timeout):
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 74, in run_plugin
    for result in plugin.check():
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/plugin.py", line 18, in wrapper
    for result in f(*args, **kwds):
  File "/usr/lib/python3.6/site-packages/pki/server/healthcheck/clones/connectivity_and_data.py", line 161, in check
    self.check_kra_clones()
  File "/usr/lib/python3.6/site-packages/pki/server/healthcheck/clones/connectivity_and_data.py", line 56, in check_kra_clones
    '/kra/admin/kra/getStatus')
  File "/usr/lib/python3.6/site-packages/pki/server/healthcheck/clones/plugin.py", line 52, in get_status
    self.instance.export_ca_cert()
  File "/usr/lib/python3.6/site-packages/pki/server/__init__.py", line 277, in export_ca_cert
    nssdb.extract_ca_cert(self.ca_cert, nickname)
  File "/usr/lib/python3.6/site-packages/pki/nssdb.py", line 1865, in extract_ca_cert
    output_format="PEM")
  File "/usr/lib/python3.6/site-packages/pki/nssdb.py", line 1449, in export_cert_from_db
    subprocess.check_call(cmd)
  File "/usr/lib64/python3.6/subprocess.py", line 306, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib64/python3.6/subprocess.py", line 289, in call
    return p.wait(timeout=timeout)
  File "/usr/lib64/python3.6/subprocess.py", line 1477, in wait
    (pid, sts) = self._try_wait(0)
  File "/usr/lib64/python3.6/subprocess.py", line 1424, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 61, in signal_handler
    traceback.print_stack()
--- The following messages were logged by the plugin after it was cancelled. They may not indicate the reason why the plugin took too long to run. ---
Unable to reach KRA at https://ipa6.example.com:443: Health check pki.server.healthcheck.clones.connectivity_and_data:ClonesConnectivyAndDataCheck cancelled after after 7 sec
--- Increasing the timeout in /etc/ipahealthcheck/ipahealthcheck.conf may allow this healthcheck to complete before the deadline. ---
FileNotFoundException: /tmp/tmpk1c1jgm2/password.txt (No such file or directory)
--- Health check ipahealthcheck.ipa.certs:IPACertRevocation cancelled after after 7 sec ---
  File "/usr/bin/ipa-healthcheck", line 11, in <module>
    load_entry_point('ipahealthcheck==0.12', 'console_scripts', 'ipa-healthcheck')()
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/main.py", line 52, in main
    sys.exit(ipachecks.run_healthcheck())
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 494, in run_healthcheck
    config.timeout))
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 238, in run_plugins
    for result in run_plugin(plugin, available, timeout):
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 74, in run_plugin
    for result in plugin.check():
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/plugin.py", line 18, in wrapper
    for result in f(*args, **kwds):
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/ipa/certs.py", line 1303, in check
    all=True)
  File "/usr/lib/python3.6/site-packages/ipalib/frontend.py", line 471, in __call__
    return self.__do_call(*args, **options)
  File "/usr/lib/python3.6/site-packages/ipalib/frontend.py", line 499, in __do_call
    ret = self.run(*args, **options)
  File "/usr/lib/python3.6/site-packages/ipalib/frontend.py", line 816, in run
    return self.execute(*args, **options)
  File "/usr/lib/python3.6/site-packages/ipaserver/plugins/cert.py", line 1388, in execute
    chain=chain,
  File "/usr/lib/python3.6/site-packages/ipalib/frontend.py", line 471, in __call__
    return self.__do_call(*args, **options)
  File "/usr/lib/python3.6/site-packages/ipalib/frontend.py", line 499, in __do_call
    ret = self.run(*args, **options)
  File "/usr/lib/python3.6/site-packages/ipalib/frontend.py", line 816, in run
    return self.execute(*args, **options)
  File "/usr/lib/python3.6/site-packages/ipaserver/plugins/ca.py", line 252, in execute
    msg = set_certificate_attrs(result['result'], options)
  File "/usr/lib/python3.6/site-packages/ipaserver/plugins/ca.py", line 192, in set_certificate_attrs
    der = ca_api.read_ca_cert(ca_id)
  File "/usr/lib/python3.6/site-packages/ipaserver/plugins/dogtag.py", line 2113, in read_ca_cert
    headers={'Accept': 'application/pkix-cert'})
  File "/usr/lib/python3.6/site-packages/ipaserver/plugins/dogtag.py", line 1267, in _ssldo
    method=method, headers=headers, body=body
  File "/usr/lib/python3.6/site-packages/ipapython/dogtag.py", line 218, in https_request
    method=method, headers=headers)
  File "/usr/lib/python3.6/site-packages/ipapython/dogtag.py", line 272, in _httplib_request
    res = conn.getresponse()
  File "/usr/lib64/python3.6/http/client.py", line 1365, in getresponse
    response.begin()
  File "/usr/lib64/python3.6/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python3.6/http/client.py", line 281, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib64/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib64/python3.6/ssl.py", line 1005, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib64/python3.6/ssl.py", line 867, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib64/python3.6/ssl.py", line 590, in read
    v = self._sslobj.read(len, buffer)
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 61, in signal_handler
    traceback.print_stack()
--- The following messages were logged by the plugin after it was cancelled. They may not indicate the reason why the plugin took too long to run. ---
--- Increasing the timeout in /etc/ipahealthcheck/ipahealthcheck.conf may allow this healthcheck to complete before the deadline. ---
GSSAPI client step 1
GSSAPI client step 1
GSSAPI client step 1
GSSAPI client step 2
--- Health check ipahealthcheck.meta.core:MetaCheck cancelled after after 7 sec ---
  File "/usr/bin/ipa-healthcheck", line 11, in <module>
    load_entry_point('ipahealthcheck==0.12', 'console_scripts', 'ipa-healthcheck')()
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/main.py", line 52, in main
    sys.exit(ipachecks.run_healthcheck())
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 494, in run_healthcheck
    config.timeout))
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 238, in run_plugins
    for result in run_plugin(plugin, available, timeout):
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 74, in run_plugin
    for result in plugin.check():
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/plugin.py", line 18, in wrapper
    for result in f(*args, **kwds):
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/meta/core.py", line 65, in check
    raiseonerr=False,)
  File "/usr/lib/python3.6/site-packages/ipapython/ipautil.py", line 538, in run
    stdout, stderr = p.communicate(stdin)
  File "/usr/lib64/python3.6/subprocess.py", line 863, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib64/python3.6/subprocess.py", line 1534, in _communicate
    ready = selector.select(timeout)
  File "/usr/lib64/python3.6/selectors.py", line 376, in select
    fd_event_list = self._poll.poll(timeout)
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 61, in signal_handler
    traceback.print_stack()
--- The following messages were logged by the plugin after it was cancelled. They may not indicate the reason why the plugin took too long to run. ---
--- Increasing the timeout in /etc/ipahealthcheck/ipahealthcheck.conf may allow this healthcheck to complete before the deadline. ---

One other significant change: when a plugin is interrupted by a timeout, a CRITICAL result is returned with the key healthcheck_timeout.

  • this gives monitoring systems something specific to look for, that indicates that a check did not complete, therefore something may be broken that has not been reported on
  • if the check itself catches the TimeoutException and does not throw anything, the healthcheck_timeout result is still returned - previously there would be no failures raised by the plugin
  • but if a plugin catches the TimeoutException and raises its own exception, there will now be two failures returned, because the plugin caller can't know if an exception thrown by a plugin was really caused by a TimeoutException (well, perhaps it could if it examined the exception chain, that part of Python is a bit new to me, and it relies on the plugin code chaining the exception it throws to the TimeoutException it caught).

Fixes: #322

@yrro yrro marked this pull request as draft March 12, 2024 17:11
@yrro yrro force-pushed the plugin-timeout-logging branch from 2fc63e6 to e82e074 Compare March 12, 2024 19:47
@yrro yrro marked this pull request as ready for review March 13, 2024 18:19
@rcritten
Copy link
Collaborator

Thanks for the patch. I'll try to review it soon.

@rcritten
Copy link
Collaborator

I don't think including a traceback is necessary when a timeout occurs. I think this would lead to confusion in users thinking that this is a code issue and not merely a timeout. If it is persistent then users can debug logging to identify where it is timing out.

I set timeout=1 and ran --source ipahealthcheck.ipa.certs and the tracebacks don't really tell what is going on other than reads are failing (which isn't the root cause of the issue).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve reporting when checks time out
2 participants