Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core feature] Pass a dict to containertask #6154

Open
2 tasks done
doLei-2001 opened this issue Jan 9, 2025 · 4 comments
Open
2 tasks done

[Core feature] Pass a dict to containertask #6154

doLei-2001 opened this issue Jan 9, 2025 · 4 comments
Labels
enhancement New feature or request needs discussion

Comments

@doLei-2001
Copy link

Motivation: Why do you think this is important?

As far as I know, only basic types such as str-int-float can be passed into containertask. If you want to pass dict/json type into it, you can only use json.dumps to convert it to str first, and then pass it. It still needs to be loaded into dict-json type inside the image. This is very troublesome.

for 2 ways:

  1. json.dumps() / json.loads()
  2. flytefile pass a file into containertask

Goal: What should the final outcome look like, ideally?

We can directly pass dict type data to containertask, and we can easily get it in the image

Describe alternatives you've considered

easy to use

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@doLei-2001 doLei-2001 added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers labels Jan 9, 2025
@wild-endeavor
Copy link
Contributor

hey @doLei-2001 - thank you for the issue. Could we get you to give some more information though please? What's the ux look like on the python side (when you're declaring the container task)? what types of inputs are you envisioning - is it just an untyped dict? or will dataclasses and pydantics be passed as well?

I think you know this, but as background, something to consider is that dict[str, MyType] gets interpreted differently than dict[something-other-than-str, MyType]. the first gets interpreted as a flyte-level map whereas the latter gets interpreted as a scalar.

To clarify this issue a bit more - if you have a task that produces an untyped dict, that gets stored in the Flyte backend blob store as a flyte scalar of a msgpack (or a generic protobuf struct). This is fine if a downstream task consuming this output is another Python task that has flytekit installed. flytekit has all the libraries it needs in order to understand both msgpack and proto structs. This request is for flyte copilot (the helper sidecar container that downloads inputs) to be able to deserialize msgpack to json right?

@Future-Outlier
Copy link
Member

container task is language agnostic, so it means that you have to support
python dict(output) -> copilot task, java dict(input) ... or something like this.

prerequiste: dataclass and dict will be serialized and deserialized through msgpack bytes and protobuf struct.

so in your copilot task, your customized image should know how to handle the deserialization logic for protobuf struct or msgpack bytes, which is inconvenient.

@doLei-2001
Copy link
Author

In other words, Are there any other ways to pass parameters to the image besides using environment variables and command lines?

If there are only these two ways, then it seems that what can be passed in is limited to these two ways.

  1. command=["python", "calculate-ellipse-area-new.py", "{{.inputs.a}}", "{{.inputs.b}}", "/var/outputs",],
  2. environment={"API_KEY": "{{.inputs.api_key}}"},
    

so, the most important question is that is there only the two ways to pass the variable into image?

@eapolinario
Copy link
Contributor

eapolinario commented Jan 16, 2025

@doLei-2001 , copilot translates the Flyte objects into values accessible through the {{.input.X}} syntax, where X is a Flyte Literal rendered via

func Render(ctx context.Context, inputTemplate []string, params Parameters) ([]string, error) {
if len(inputTemplate) == 0 {
return []string{}, nil
}
// TODO: Change GetGeneratedName to follow these conventions
var perRetryUniqueKey = params.TaskExecMetadata.GetTaskExecutionID().GetGeneratedName()
perRetryUniqueKey = startsWithAlpha.ReplaceAllString(perRetryUniqueKey, "a")
perRetryUniqueKey = alphaNumericOnly.ReplaceAllString(perRetryUniqueKey, "_")
logger.Debugf(ctx, "Using [%s] from [%s]", perRetryUniqueKey, params.TaskExecMetadata.GetTaskExecutionID().GetGeneratedName())
if params.Inputs == nil || params.OutputPath == nil {
return nil, fmt.Errorf("input reader and output path cannot be nil")
}
res := make([]string, 0, len(inputTemplate))
for _, t := range inputTemplate {
updated, err := render(ctx, t, params, perRetryUniqueKey)
if err != nil {
return res, err
}
res = append(res, updated)
}
return res, nil
}

As of flyte 1.14.0, untyped dictionaries are encoded using msgpack, so we could turn those into json strings. Keep in mind that if we go that route we'll have to update the documentation as well.

@eapolinario eapolinario removed the untriaged This issues has not yet been looked at by the Maintainers label Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request needs discussion
Projects
Status: Backlog
Development

No branches or pull requests

4 participants