proper server? #3

Open · sckott opened this issue Jul 6, 2021 · 11 comments

sckott (Contributor) commented Jul 6, 2021

The only server I'm familiar with is Caddy, which is really easy to use if you have a domain name, and it handles Let's Encrypt TLS certs for you. However, since there's no domain name here, I couldn't figure out how to easily use plain HTTP.

Anyway, I have some local work with docker compose and Caddy if you want to see that.

@mpadge

mpadge (Member) commented Jul 7, 2021

Caddy looks great - I've not used it, but the docs make it all look very straightforward, so yes, that would be great. HTTP doesn't matter at all anymore; it's fine that everything is HTTPS.

sckott (Contributor, Author) commented Jul 7, 2021

Okay, cool, but are you planning on using a domain name or just using the server IP?

mpadge (Member) commented Jul 9, 2021

Server IP is fine, and it's not planned to be a general public endpoint, rather only one for buffy to directly call.

sckott (Contributor, Author) commented Jul 9, 2021

Okay, IP it is. I think nginx would make sense in that case.

mpadge (Member) commented Jul 9, 2021

One maybe important issue here, reflective of my ignorance in these domains: the server has to be set up to deliver a response semi-immediately, and then trigger a background process which later dumps results via a system call to the GitHub CLI. So whatever system is used also has to be able to launch a background process, and ideally offer a way to monitor it (in order to serve the second log endpoint, which reads the system output of the background process).
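
Roughly, the pattern I have in mind is something like the following sketch with plumber and callr (all names here are hypothetical placeholders, not the actual code):

```r
# Hypothetical sketch of the "respond immediately, work in the background" pattern.
library(plumber)
library(callr)

bg_procs <- new.env()  # keep handles + log paths so checks can be monitored later

#* Start an editor check and return immediately
#* @param repo URL of the repository to check
#* @post /editorcheck
function(repo) {
  logfile <- tempfile(fileext = ".log")
  px <- callr::r_bg(
    function(repo) {
      # long-running work happens here; stdout/stderr end up in `logfile`
      Sys.sleep(5)  # placeholder for the real check
      paste("checked", repo)
    },
    args = list(repo = repo),
    stdout = logfile,
    stderr = "2>&1"
  )
  bg_procs[[as.character(px$get_pid())]] <- list(px = px, log = logfile)
  list(message = "Editor check started", id = px$get_pid())
}
```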

sckott (Contributor, Author) commented Jul 9, 2021

You're talking about all routes here? Or primarily /editorcheck?

> dumps results via system call to the GitHub cli

What do you mean by the GitHub cli?

I see you're using callr::r_bg on the /editorcheck route to run the check as a background process, so it seems you've got the run-in-the-background part covered. I do wonder, though, how it will work if there are many simultaneous calls to /editorcheck. I ran 3 rather large packages the other day and it seemed to get bogged down; calling /stdlogs suggested that all 3 processes were hung and not progressing, but I wasn't sure how to dig into the docker instance to find out more.

Maybe (in addition to using the repo url in /stdlogs) it's a good idea to return an identifier that can then be passed to an API route to get progress on the specific R session, e.g.,

```
# a call to /editorcheck?... returns something like
{
  "message": "Editor check started",
  "id": 234238923
}
# where id is perhaps the output of `$get_pid()`, which uniquely identifies the background R session
```

Then use the identifier to check on progress, e.g. /check-progress?id=234238923 or /stdlogs?id=234238923 - something like the sketch below. But maybe that's not helpful? Feel free to disregard this idea!
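
As a rough sketch of the lookup side (assuming the callr::r_bg handles and their log file paths are kept somewhere keyed by pid, here in a `bg_procs` store; all names are made up):

```r
# Hypothetical sketch: look up a background check by the id returned from /editorcheck.
# Assumes `bg_procs` holds, per pid, the callr::r_bg handle and the path to its log file.
library(callr)

bg_procs <- new.env()  # filled by the /editorcheck endpoint

#* Report progress for one background check
#* @param id process id returned by /editorcheck
#* @get /check-progress
function(id) {
  rec <- bg_procs[[as.character(id)]]
  if (is.null(rec)) {
    return(list(error = "no such check", id = id))
  }
  list(
    id    = id,
    alive = rec$px$is_alive(),
    # tail of whatever the background process has written so far
    log   = utils::tail(readLines(rec$log, warn = FALSE), 20)
  )
}
```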

mpadge (Member) commented Jul 9, 2021

No disregarding - that is indeed a great idea. I'd just been concentrating on getting the logs working, and hadn't given enough thought to the actual endpoint design. It shall be done! (And the GitHub cli is just that: cli.github.com, the advent of which means I've effectively stopped using ghql, because the cli has built-in mutation queries and takes 1-2 lines instead of 20-30.)
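
For reference, the kind of system call I mean is roughly this (the repo and issue number here are placeholders, not the real values):

```r
# Hypothetical sketch: post check results back to an issue via the GitHub CLI from R.
post_results <- function(body, issue, repo = "owner/repo") {
  system2(
    "gh",
    args = c(
      "issue", "comment", as.character(issue),
      "--repo", repo,
      "--body", shQuote(body)
    )
  )
}
```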

mpadge (Member) commented Jul 9, 2021

One thing that would help, @sckott, would be to provide some links to vaguely equivalent set-ups with "proper servers" that we could use as templates. You said you had some local work, which would be great to see, but even better for posterity would be any public links you might be able to share. Anything you could offer in that regard?

sckott (Contributor, Author) commented Jul 9, 2021

Ah cool, I use gh as well.

> provide some links

Yes, I'll have a think and share some links.

sckott (Contributor, Author) commented Jul 9, 2021

The "proper server" question here was really just about something like nginx or caddy "in front of" the plumber service (see nginx point below), but I'll try to answer more broadly about other issues on the machine itself. The only other stuff I had local was some docker-compose stuff I was playing with for nginx, but like I said below, gave up on that.

  • nginx: I ended up not getting it to work in the few hours I had left today; I haven't used it before and it's very complicated compared to Caddy (but Caddy is mostly for use with domain names). Not sure if you really need this or not.
  • Better introspection of each editor check run would be good. I played around running some big packages with many dependencies, e.g., taxize, and it would run for hours and hours while the /stdlogs output stayed the same every time I checked, so either something was still going on or something was broken, because the output never got posted to the assigned issue. Maybe, since you designed this, you have a way to dig into these issues.
  • I don't know whether you expect to have many simultaneous /editorcheck runs going on or not. If you do, perhaps one of these approaches would work:
    • spin up a separate docker instance from a pre-built image for each call to /editorcheck, e.g., using stevedore, then destroy each instance once you've collected the results
    • (stay with one docker container &) use a lightweight queue (e.g., https://github.com/r-lib/liteq) to put each run into a queue and run N at a time (N = 1 or 2?), so you have a max of N running simultaneously; see the sketch after this list. This would also, at least in part, help avoid any abuse/spam.
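
To make the liteq idea a bit more concrete, here's a rough sketch (the queue name, database path, and worker function are all made up for illustration):

```r
# Hypothetical sketch: /editorcheck only enqueues a job, and a separate worker
# process runs at most one check at a time.
library(liteq)

db <- tempfile(fileext = ".db")  # placeholder; use a persistent path in practice
q  <- ensure_queue("editorchecks", db = db)

# called from the /editorcheck endpoint: just record the request and return
enqueue_check <- function(repo) {
  publish(q, title = repo, message = repo)
}

# worker loop (e.g. run in its own callr::r_bg() process): N = 1 check at a time
run_worker <- function() {
  repeat {
    msg <- try_consume(q)
    if (is.null(msg)) {
      Sys.sleep(5)
      next
    }
    ok <- tryCatch({
      # placeholder for the actual editor check on msg$message
      TRUE
    }, error = function(e) FALSE)
    if (ok) ack(msg) else nack(msg)
  }
}
```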

mpadge (Member) commented Jun 28, 2022

The queueing stuff was mostly due to #7. Since that was resolved almost a year ago, no processes have hung or frozen, including simultaneous processes. The only issue which I think really needs to be resolved to address this is authentication, which is actually pretty straightforward and described in the nginx docs. Passwords for actual users are easy to configure; the bot less so.

If Heroku has a static IP address (@maelle?), then we could allow that address but require all others to submit a password, and combine the two methods with `satisfy any`.
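
Something along these lines in the nginx config is what I have in mind (the IP address, paths, and upstream port are placeholders only):

```nginx
# Hypothetical sketch: allow the (assumed static) Heroku/buffy IP without a password,
# and require basic auth from everyone else.
location / {
    satisfy any;

    allow 203.0.113.10;    # placeholder for the static buffy/Heroku address
    deny  all;

    auth_basic           "restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;

    proxy_pass http://127.0.0.1:8000;   # the plumber service
}
```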
