
[WIP] Support nomad #750

Open
asauray wants to merge 14 commits into develop

Conversation

@asauray commented Oct 16, 2019

Nomad API

We can use the Nomad API to place Docker containers on nodes. I chose to use a Python library to keep things simple: https://github.com/jrxFive/python-nomad
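For illustration, here is a minimal sketch of submitting a Docker job through python-nomad; the server address, job name, image and resource values are hypothetical, not the exact payload built in this PR.

import nomad

# Hypothetical Nomad server address; 4646 is Nomad's default HTTP port.
n = nomad.Nomad(host="10.65.30.43", timeout=5)

# Minimal Docker job in Nomad's JSON job format (illustrative values only).
job = {
    "Job": {
        "ID": "clipper-default-redis",
        "Name": "clipper-default-redis",
        "Datacenters": ["dc1"],
        "Type": "service",
        "TaskGroups": [{
            "Name": "redis",
            "Count": 1,
            "Tasks": [{
                "Name": "redis",
                "Driver": "docker",
                "Config": {"image": "redis:alpine"},
                "Resources": {"CPU": 500, "MemoryMB": 256},
            }],
        }],
    }
}

n.job.register_job("clipper-default-redis", job)  # submit (or update) the job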

How to address containers

To address those containers by IP and port, we have two options:

  1. Use a load balancer; but Nomad environments are very flexible and there are many options on the market (Fabio, Nginx, HAProxy, ...).
  2. Use the DNS server to gather the IPs and ports of containers through SRV DNS requests. This requires DNS caching, but the mechanism is quite standardized.

I chose to go with option (2) because it is more standard across environments. Option (1) could be supported, but it would require load-balancer-specific code.
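A rough sketch of option (2) using dnspython against Consul's DNS interface; the resolver address below is hypothetical, and Consul serves DNS on port 8600 by default.

import dns.resolver

resolver = dns.resolver.Resolver()
resolver.nameservers = ['10.65.30.43']   # hypothetical Consul agent / dnsmasq address
resolver.port = 8600                     # Consul's default DNS port

# One SRV record is returned per healthy instance of the service.
for srv in resolver.query('redis.service.consul', 'SRV'):
    node = str(srv.target).rstrip('.')
    ip = str(resolver.query(node, 'A')[0])   # resolve the node name to an IP
    print(ip, srv.port)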

Workflow when connecting

When the Clipper admin connects, it tries to determine the IP and port of each service in order to know whether it needs to submit a new job or can use what already exists.

Example: for Redis, a redis.service.consul SRV request is sent. If it returns at least one ip:port, that endpoint is used; otherwise a Redis job is submitted and the admin keeps sending SRV requests until the service is up, stopping the process if it never comes up.
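A sketch of that check-then-submit logic; resolve_srv is a hypothetical helper on the DNS abstraction and job_payload a prepared Nomad job dict, neither taken verbatim from the PR.

import time

def ensure_redis(dns, nomad_client, job_payload, retries=30, delay=2):
    # Reuse Redis if Consul already advertises at least one instance.
    endpoints = dns.resolve_srv('redis.service.consul')
    if endpoints:
        return endpoints[0]
    # Otherwise submit the job and poll the SRV record until it shows up.
    nomad_client.job.register_job('clipper-default-redis', job_payload)
    for _ in range(retries):
        endpoints = dns.resolve_srv('redis.service.consul')
        if endpoints:
            return endpoints[0]
        time.sleep(delay)
    raise RuntimeError('Redis service never appeared in Consul DNS')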

Selecting containers

Nomad does not have the notion of selectors. I propose to use naming conventions to solve this problem: jobs are prefixed with clipper-{cluster-name}. This allows us to select them by name (when we want to stop a container, for instance).

This is how it looks in the Consul UI.
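For instance, stopping every job of a cluster then becomes a prefix match over the Nomad job list (a sketch; the address and cluster name are hypothetical).

import nomad

n = nomad.Nomad(host="10.65.30.43")
prefix = "clipper-default"                  # clipper-{cluster-name} convention

for job in n.jobs.get_jobs():               # list all registered jobs
    if job["ID"].startswith(prefix):
        n.job.deregister_job(job["ID"])     # stop the jobs belonging to this cluster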

Managing the connection between Models and Query Frontend

This one is tricky. The problem is that both the IP and port of the Query Frontend are going to change over time, meaning we would have to submit new model jobs every time they change.

The only way I could solve this was to use a load balancer (namely Fabio, one of those mentioned earlier) and do TCP forwarding. This leaves Fabio with the responsibility of routing to the correct IP and port. But this implementation is Fabio-specific.

That means we boot the model containers with CLIPPER_IP='fabio.service.consul' and CLIPPER_PORT='7000'. This part needs to be improved, though.
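Concretely, the model task in the Nomad job payload would carry an environment block along these lines (a sketch in Nomad's JSON job format; the task and image names are illustrative).

model_task = {
    "Name": "clipper-default-model-mymodel",
    "Driver": "docker",
    "Config": {"image": "clipper/python36-closure-container:develop"},
    "Env": {
        "CLIPPER_IP": "fabio.service.consul",   # resolved to Fabio through Consul DNS
        "CLIPPER_PORT": "7000",                 # Fabio TCP route forwarding to the query frontend
    },
}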

If you have any questions, don't hesitate to ask; I know this is quite a big description.

@AmplabJenkins

Can one of the admins verify this patch?

@asauray changed the title from Support nomad to [WIP] Support nomad on Oct 16, 2019
@rkooo567 (Collaborator) commented Oct 16, 2019

@antoinesauray Thanks for making this PR. This will be a really helpful feature. Would you be able to:

  1. Tag the issues you created? (I remember you wrote about this on an issue page)
  2. Write a bit more description of the design / implementation?
  3. Write down a way to QA this PR (because it is a big change, we want to QA on top of the tests)?
  4. Add some basic integration tests, at least?

Thank you!

@rkooo567 (Collaborator) left a comment:

Do you think it is possible to expose the resource allocation & port numbers to the ContainerManager? It seems like you hardcode those values in the job description.

qf_http_thread_pool_size,
qf_http_timeout_request,
qf_http_timeout_content,
num_frontend_replicas=1):
Collaborator:

Based on your description of DNS resolution, it seems like Consul should also run as part of the boot-up process? Is it going to be built into the Nomad cluster?

@asauray (Author):

Nomad leaves the DNS responsibility to another service; Consul is one of them, but the design should be flexible enough to support alternatives. An abstract class / interface is the better choice here, I think.

Also, it should be configured on the host (through dnsmasq). I'll document the setup I have in the PR.
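A rough sketch of what that interface could look like (the method name is hypothetical and may differ from the actual PR code):

from abc import ABC, abstractmethod

class DNS(ABC):
    """Resolves a service name to the (ip, port) pairs of its healthy instances."""

    @abstractmethod
    def resolve_srv(self, service_name):
        """Return a list of (ip, port) tuples for the given service name."""
        raise NotImplementedError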

import dns.resolver
import socket

class ConsulDNS(DNS):
Collaborator:

Couldn't find the usage in this PR. Where is it used?

@asauray (Author) commented Oct 17, 2019

It should be used on the client side as follows:

from clipper_admin.deployers import python as python_deployer
from clipper_admin import ClipperConnection, DockerContainerManager, NomadContainerManager, ConsulDNS

nomad_ip_addr = '10.65.30.43'
dns = ConsulDNS() # We use Consul for DNS resolution
container_manager = NomadContainerManager(
    nomad_ip=nomad_ip_addr,
    dns=dns
)
clipper_conn = ClipperConnection(container_manager)
clipper_conn.connect()

I will document this as well

@asauray (Author) commented Oct 17, 2019

#749

@asauray (Author) commented Oct 17, 2019

> Do you think it is possible to expose the resource allocation & port numbers to the ContainerManager? It seems like you hardcode those values in the job description.

I was thinking about it; it could be done at the instantiation of NomadContainerManager.
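For example, those values could become constructor arguments with sensible defaults; the extra parameter names below are hypothetical and not yet part of the PR.

from clipper_admin import NomadContainerManager, ConsulDNS

container_manager = NomadContainerManager(
    nomad_ip='10.65.30.43',
    dns=ConsulDNS(),
    # Hypothetical knobs for what is currently hardcoded in the job descriptions:
    cpu_mhz=500,                 # Nomad CPU allocation per task, in MHz
    memory_mb=256,               # memory allocation per task, in MB
    redis_port=6379,
    query_frontend_port=7000,
)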

@asauray (Author) commented Nov 8, 2019

I need support here; I'm still stuck because of #751.

@rkooo567 (Collaborator) commented Nov 8, 2019

@antoinesauray I don't have enough bandwidth to handle this at the moment. For the testing stuff, I will try to resolve it by next Tuesday. Please leave one more message if I haven't come back by then.
