Example creating workflow programmatically #56

Open
esanzgar opened this issue Oct 16, 2017 · 6 comments

Comments

@esanzgar

Please, could you provide an example of how to create a workflow from scratch?

It would be nice if it contained some of these:

  • Add workflow's input and output
  • Add a couple of steps with their inputs and outputs
  • Edit label and doc
  • Serialize

Thanks!

@esanzgar esanzgar changed the title Example file Example creating workflow programatically Oct 17, 2017
@esanzgar esanzgar changed the title Example creating workflow programatically Example creating workflow programmatically Oct 17, 2017
@esanzgar
Author

esanzgar commented Oct 17, 2017

Is the following approach correct?

I have two issues with serialization:

  • requirements are not serialized
  • all inputs and outputs are single parameters, but they are serialized as arrays. That forces me to use the linkMerge option and MultipleInputFeatureRequirement.

import {
    V1WorkflowModel,
    V1StepModel,
    V1WorkflowInputParameterModel,
    V1WorkflowOutputParameterModel
} from 'cwlts/models/v1.0';
import {
    RequirementBaseModel
} from 'cwlts/models/generic';

export function createWorkflow(){
        const wf = new V1WorkflowModel();
        wf.label = 'My label';
        wf.description = 'My doc'; // It is serialized as 'doc'
        wf.requirements.push(new RequirementBaseModel({class: 'SubworkflowFeatureRequirement'}));

        // Add workflow inputs
        const inputs = new V1WorkflowInputParameterModel({
            id: 'protein',
            label: 'UniProt ID',
            doc: 'Enter UniProt identifier',
            type: 'string?',
            default: 'uniprot:P01038'
        });
        wf.addEntry(inputs, 'inputs');

        // Add two steps
        const step1 = new V1StepModel({
            id: 'sss',
            label: 'NCBI BLAST',
            doc: 'Sequence similarity search',
            in: {
                sequence: 'protein'
            },
            out: ['proteins'],
            run: 'https://raw.githubusercontent.com/psafont/gluetools-cwl/master/ncbiblast/ncbiblast.cwl'
        });
        wf.addEntry(step1, 'steps');

        const step2 = new V1StepModel({
            id: 'filter',
            label: 'Top 20 sequences',
            doc: 'Use DbFetch to get the 20 top most similar sequences',
            in: {
                accessions: 'sss/proteins'
            },
            out: ['sequences'],
            run: 'https://raw.githubusercontent.com/psafont/gluetools-cwl/master/workflows/fetch-proteins.cwl'
        });
        wf.addEntry(step2, 'steps');

        // Add workflow outputs
        const outputs = new V1WorkflowOutputParameterModel({
            id: 'result',
            label: 'Filtered sequences',
            doc: 'Top X sequences',
            type: 'File',
            outputSource: 'filter/sequences'
        });
        wf.addEntry(outputs, 'outputs');

        // wf.serialize()
        return wf;
}

@mayacoda
Contributor

Hi @esanzgar, sorry for the late reply.

I will admit the API behind the workflow model isn't the prettiest or most consistent; I've mostly been developing it to satisfy the needs of the Rabix Composer. The lack of documentation is also unfortunate.

There are specific ways in which the Composer creates a WorkflowModel which aren't the easiest or most convenient to replicate programmatically. The philosophy behind workflow creation in the Composer is as follows:

  • Workflow creation starts with adding steps
  • Steps are added as resolved tools (the whole object, not just the path) with references to their location for later serialization, calling the method addStepFromProcess
  • Step inputs and outputs are generated from the step's run property (this is why the model needs the whole tool/workflow instead of just the path)
  • Workflow inputs and outputs are created from ports on the step, calling the methods createInputFromPort and createOutputFromPort
  • Direct manipulation of objects on the model is avoided in favor of helper methods, as they ensure a consistent state of the model's graph, validation tree and validity.
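
The order of operations in the list above can be sketched with a toy model. This is only an illustration: ToyTool, ToyPort and ToyWorkflow are hypothetical stand-ins written for this example, not cwlts classes. The three method names are borrowed from the list, but their signatures here are assumptions, and the real cwlts methods may take different arguments and also maintain the graph and validation state.

```typescript
// Toy stand-ins, NOT the cwlts classes: they only mirror the order of
// operations described in the list (steps first, then ports, then
// workflow inputs and outputs created from those ports).
interface ToyTool {
    id: string;
    inputs: string[];   // ids of the tool's input ports
    outputs: string[];  // ids of the tool's output ports
}

interface ToyPort {
    stepId: string;
    portId: string;
    source?: string;
}

interface ToyStep {
    id: string;
    run: ToyTool;
    in: ToyPort[];
    out: ToyPort[];
}

class ToyWorkflow {
    steps: ToyStep[] = [];
    inputs: { id: string }[] = [];
    outputs: { id: string; outputSource: string }[] = [];

    // Ports are derived from the resolved tool, which is why the whole
    // object is needed rather than just its path.
    addStepFromProcess(tool: ToyTool): ToyStep {
        const step: ToyStep = {
            id: tool.id,
            run: tool,
            in: tool.inputs.map((portId): ToyPort => ({ stepId: tool.id, portId })),
            out: tool.outputs.map((portId): ToyPort => ({ stepId: tool.id, portId }))
        };
        this.steps.push(step);
        return step;
    }

    // Create a workflow input and connect it to the given step input port.
    createInputFromPort(port: ToyPort): { id: string } {
        const input = { id: port.portId };
        this.inputs.push(input);
        port.source = input.id;
        return input;
    }

    // Create a workflow output fed by the given step output port.
    createOutputFromPort(port: ToyPort): { id: string; outputSource: string } {
        const output = {
            id: port.portId,
            outputSource: `${port.stepId}/${port.portId}`
        };
        this.outputs.push(output);
        return output;
    }
}

const toyWf = new ToyWorkflow();
const blast = toyWf.addStepFromProcess({
    id: 'sss',
    inputs: ['sequence'],
    outputs: ['proteins']
});
toyWf.createInputFromPort(blast.in[0]);
toyWf.createOutputFromPort(blast.out[0]);
```

In this toy, the helper methods are the only code that touches the steps, inputs and outputs arrays, which is the point of the last bullet: callers never edit the model's collections directly.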

That being said, the example you show could also work. The issue you have with requirements is actually a bug, as we haven't had a need for adding/serializing requirements in the Composer so the functionality was never added.

I'm not sure I understand the issue related to serializing inputs and outputs, though. Workflow.inputs and Workflow.outputs are always serialized as an array out of habit; they could just as easily be a map<id, input>, as this is just syntactic sugar. linkMerge and MultipleInputFeatureRequirement are only necessary when you have multiple incoming connections on a single step, which serialization does not affect.
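
To illustrate the array and map forms of inputs being equivalent, here is a minimal sketch on plain objects. InputEntry, InputMap and inputsArrayToMap are hypothetical names made up for this example and are not part of cwlts.

```typescript
// Plain-object shapes for illustration only; these are not cwlts types.
interface InputEntry {
    id: string;
    type: string;
    label?: string;
}

type InputMap = { [id: string]: { type: string; label?: string } };

// Convert the array form of `inputs` into the equivalent map form:
// each entry's `id` becomes the key, the remaining fields become the value.
function inputsArrayToMap(inputs: InputEntry[]): InputMap {
    const map: InputMap = {};
    for (const { id, ...rest } of inputs) {
        map[id] = rest;
    }
    return map;
}

const asArray: InputEntry[] = [
    { id: 'protein', type: 'string?', label: 'UniProt ID' }
];
const asMap = inputsArrayToMap(asArray);
// asMap is { protein: { type: 'string?', label: 'UniProt ID' } }
```

Both forms carry the same information; only the position of the id differs.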

@esanzgar
Author

Maya,

Thank you for your reply.

Would you mind posting an example of a Composer workflow creation approach (with the addStepFromProcess, createInputFromPort and createOutputFromPort)?

@esanzgar
Author

Regarding the potential bug (serialising requirements), would you like me to create an independent issue?

@esanzgar
Author

esanzgar commented Oct 23, 2017

Sorry, my explanation about MultipleInputFeatureRequirement was not accurate.

If I define the input of a step in this way:

            in: {
                accessions: 'sss/proteins'
            },

It is serialized in this way:

        "in": [{
            "id": "accessions",
            "source": ["sss/proteins"]
        }],

However, I was expecting this:

        "in": [{
            "id": "accessions",
            "source": "sss/proteins"
        }],

Because source is an array I have to add the requirement MultipleInputFeatureRequirement (linkMerge defaults to "merge_nested") to make it work.

Because there is a problem serialising requirements I am in a little predicament.

Thanks
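
One possible way around the array-valued source, sketched under assumptions: post-process the serialized plain object and collapse every one-element source array into a string. SerializedStep, SerializedStepInput and collapseSingleSources are hypothetical names for this example, the shapes are inferred from the snippets above, and this assumes the target runner accepts a scalar source without MultipleInputFeatureRequirement.

```typescript
// Shape of a serialized step input as shown in the thread; not a cwlts type.
interface SerializedStepInput {
    id: string;
    source?: string | string[];
}

interface SerializedStep {
    id: string;
    in: SerializedStepInput[];
}

// Return a copy of the steps in which every one-element `source` array
// is replaced by its single string. Longer arrays are left untouched,
// since they really do describe multiple incoming links.
function collapseSingleSources(steps: SerializedStep[]): SerializedStep[] {
    return steps.map(step => ({
        ...step,
        in: step.in.map(input =>
            Array.isArray(input.source) && input.source.length === 1
                ? { ...input, source: input.source[0] }
                : input
        )
    }));
}

const serializedSteps: SerializedStep[] = [
    { id: 'filter', in: [{ id: 'accessions', source: ['sss/proteins'] }] }
];
const collapsed = collapseSingleSources(serializedSteps);
// collapsed[0].in[0].source is the string 'sss/proteins'
```

The function returns new objects rather than editing the serialized output in place, so the original can still be inspected afterwards.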

@esanzgar
Author

Workaround to serialise requirements:

        // Standard way of adding requirements doesn't work
        // wf.requirements.push(new RequirementBaseModel({class: 'SubworkflowFeatureRequirement'}));
        // wf.requirements.push(new RequirementBaseModel({class: 'MultipleInputFeatureRequirement'}));

        // Workaround
        wf.customProps.requirements = [];
        wf.customProps.requirements.push({class: 'SubworkflowFeatureRequirement'});
        wf.customProps.requirements.push({class: 'MultipleInputFeatureRequirement'});
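
The workaround above could be wrapped in a small helper that avoids adding the same requirement twice. This is a sketch only: HasCustomProps and addRequirement are hypothetical names, and the only assumption about the model is that it exposes a writable customProps object, as used in the workaround.

```typescript
interface Requirement {
    class: string;
}

// Minimal shape assumed for the model: anything with a `customProps` bag.
interface HasCustomProps {
    customProps: { [key: string]: any };
}

// Append a requirement to customProps.requirements unless one with the
// same class is already present.
function addRequirement(model: HasCustomProps, className: string): void {
    const reqs: Requirement[] = model.customProps.requirements || [];
    if (!reqs.some(r => r.class === className)) {
        reqs.push({ class: className });
    }
    model.customProps.requirements = reqs;
}

// Stand-in object used here instead of a real workflow model.
const fakeModel: HasCustomProps = { customProps: {} };
addRequirement(fakeModel, 'SubworkflowFeatureRequirement');
addRequirement(fakeModel, 'MultipleInputFeatureRequirement');
addRequirement(fakeModel, 'SubworkflowFeatureRequirement'); // duplicate, ignored
```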
