Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shutting down our build servers #45956

Open
jaredpar opened this issue Jan 14, 2025 · 11 comments
Open

Shutting down our build servers #45956

jaredpar opened this issue Jan 14, 2025 · 11 comments
Labels
Area-NetSDK needs team triage Requires a full team discussion untriaged Request triage from a team member
Milestone

Comments

@jaredpar
Copy link
Member

Motivation

The .NET build contains a number of long lived server processes: msbuild nodes, razor and roslyn. A requirement of the .NET SDK is for customers to be able to reliably shut down all servers created during any build. This is important for shutting down a build operation and ensuring that no files remain locked on disk.

This challenging because there is no easy answer to discovering which processes are build servers:

  • These are primarily .NET SDK process which means they all share the same process name: dotnet
  • These processes do not have a strict parent / child relationship with build commands. Several, like roslyn, live past the msbuild nodes that created them.
  • The ones inside the .NET SDK can be overridden by installing custom NuPkg.

Each build server has taken a different approach to implementing shutdown. Each of these approaches has its own weakness and it's left us with a rather disjoint experience in shutdown.

  • Razor uses a PID file that is subject to PID recycling. It also doesn't differentiate between servers created in the current SDK vs. any other SDK.
  • Roslyn looks for common install locations and tries to send a shutdown command. This falls apart when a custom NuPkg is installed or doing shutdown across SDK versions.

To solve this we're going to implement a uniform approach to shutdown for all of our custom build servers using named pipes.

Overview

The .NET SDK will be responsible for creating a new environment variable called $DOTNET_HOST_SERVER_PATH. This will point to a directory on disk that is restricted to the current user (chmod 700). On startup a build server will take the following actions:

The build server will use a pipe named $DOTNET_HOST_SERVER_PATH/<pid>.pipe where <pid> is the PID of the server process. On startup if the named pipe exists it will be deleted as it's an indication of a previous build server that did not shut down gracefully. The server will then create the named pipe and begin listening for input. If any input is received then it will shut down the server process. During normal shut down the server will close the named pipe and delete the file.

If there are any issues creating the named pipe on start up the server will log an appropriate error to the command line as a warning but continue starting up. This is not a fatal issue for build.

The dotnet build-server shutdown command will be implemented by iterating over every file in $DOTNET_HOST_SERVER_PATH, connecting to it, sending the value 1 and waiting for the corresponding PID to terminate. If no process with the PID exists then the command will not attempt to connect to the named pipe. At the conclusion of the command all files in the directory will be deleted.

The use of the $DOTNET_HOST_SERVER_PATH also allows us to transparently apply different policies around server shutdown. For example by default this should be partitioned by .NET SDK major versions. For example .NET SDK 8 and 9 would produce distinct paths like /home/jaredpar/.dotnet/server/8 and /home/jaredpar/.dotnet/server/9. Then build-servershutdown could easily be tweaked to shutdown the current .NET SDK or all servers.

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-NetSDK untriaged Request triage from a team member labels Jan 14, 2025
@jaredpar
Copy link
Member Author

@ViktorHofer, @baronfel, @rainersigwald FYI

@KalleOlaviNiemitalo
Copy link
Contributor

KalleOlaviNiemitalo commented Jan 14, 2025

Does this support parallel independent builds on the same user account, e.g. by setting $DOTNET_HOST_SERVER_PATH in a script that then runs dotnet msbuild and dotnet build-server shutdown? So that dotnet doesn't change the the environment variable if it has been set already.

@jaredpar
Copy link
Member Author

So that dotnet doesn't change the the environment variable if it has been set already.

I've gone back and forth on this. Having .NET SDK respect this if set is seemingly only useful if either:

  1. They're willing to commit to this implementation detail thus allowing consumers to say write a python script that enumerates the named pipes in the directory and send the close message.
  2. They're willing to add a directory argument to dotnet build-server shutdown at which point the developer could set $DOTNET_HOST_SERVER_PATH then later uses it with the shutdown command

Unless they're willing to do one of those I'm struggling to think of a good use case for respecting the existing set. Am I missing one?

@KalleOlaviNiemitalo
Copy link
Contributor

Scenario: A persistent build agent running multiple parallel builds (of branches of the same repository) on the same user account. The NuGet global-packages folder is shared between builds, to save time in package restore; but the temp directories are separate so that they can be eagerly cleaned up after each build. The build servers that use the temp directories should also be separate, and shutting down the build servers of one build should not affect the build servers of other parallel builds. I'd make the build script point $DOTNET_HOST_SERVER_PATH to a subdirectory of the temp directory that is already specific to the build. Then build servers would create pipes in that directory and finally the script would run dotnet build-server shutdown and delete the temp directory.

@baronfel
Copy link
Member

This might be a good way to implement "one shot" execution modes: #44846

If a user requested the "one shot" mode then the CLI would synthesize a more unique directory to store the PIDs, and then call shutdown at the end of the command.

@KalleOlaviNiemitalo
Copy link
Contributor

One-shot mode would work for my purposes, but there are multiple build steps in the script, so I think it can save time if the build servers are kept alive between the steps and only shut down near the end of the whole script.

@jaredpar
Copy link
Member Author

Then build servers would create pipes in that directory and finally the script would run dotnet build-server shutdown and delete the temp directory

This is effectively (2) of what I said above. Basically if the build-server shutdown command has the capability to take a path argument then I agree there is sense in allowing the path to be customized. But I'm not sure it makes sense otherwise.

As for whether we should do that, will leave that to @baronfel but it sounds reasonable

@KalleOlaviNiemitalo
Copy link
Contributor

If the script has to preset the environment variable anyway, in order to isolate the parallel builds from each other, then it is easier to pass that environment variable to dotnet build-server shutdown as well than add a command-line option.

@marcpopMSFT
Copy link
Member

Triage: Adding this to .NET 10 as there seems to be enough compelling benefit to centralize this. The main issue to sort out is where to store these. The recommendation from us is to potentially hash the dotnet.dll directory path cause then it should be unique per SDK instance. We'd want to run this by the security folks if we're using a crypto hash or we can just use a non-crypto hash and risk some small chance of collision.

Needed on the sdk side:

  • agree on env variable
  • agree on path
  • implement setting that in the sdk
  • implement shutting down anything in that location in the sdk on server shutdown call
  • remove shutdown implementations as each consumer implements this

@marcpopMSFT marcpopMSFT added this to the 10.0.1xx milestone Jan 14, 2025
@jaredpar
Copy link
Member Author

e'd want to run this by the security folks if we're using a crypto hash or we can just use a non-crypto hash and risk some small chance of collision

Where would a hash come into play here?

@baronfel
Copy link
Member

For determining what the SDK's default 'pid file root' might be:

The recommendation from us is to potentially hash the dotnet.dll directory path cause then it should be unique per SDK instance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-NetSDK needs team triage Requires a full team discussion untriaged Request triage from a team member
Projects
None yet
Development

No branches or pull requests

4 participants