Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JBrowse2: #5695 rebased with @abretaud (#3997) #5697

Closed
wants to merge 9 commits into from

Conversation

fubar2
Copy link
Member

@fubar2 fubar2 commented Jan 8, 2024

  • - I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • - License permits unrestricted use (educational + commercial)
  • - This PR adds a new tool or tool collection

Merges #5695 derived from @hexylena JBrowse1 tool with most of the improvements made by @abretaud #3997 such as adding track controls to the tool form.

Sad I didn't know about #3997 until after repeating a lot of the same updates - but that PR included all the track formatting UI form code that #5695 was missing.

Merge adds:

  • track json without calls to jbrowse cli - can be more finely controlled.
  • added tests for some common track types
  • working default session
  • Form has track options including colour/styling for different track types. Expanded UI sample image shown below.
  • MAF tracks will work when bcbio-gff is updated when updates for biopython 1.82 chapmanb/bcbb#141 is accepted

Removes some #3997 tests failing because looking for JB1 elements - need updating.
Fixes testing in biocontainers - different path in Conda so now use jbrowse create.

Available for testing from the toolshed

Screenshot shows test bed bam vcf maf bigwig gff sample data tracks
image

and HiC
image

  • SPARQL and paf remain to be implemented.
  • Needs a test for each track input type

expanded UI example:

image

@fubar2 fubar2 mentioned this pull request Jan 8, 2024
3 tasks
@fubar2 fubar2 changed the title JBrowse2: manually merged @abretaud #3997 into #5695 JBrowse2: manually merged #5695 with lots of fixes from @abretaud (#3997) Jan 8, 2024
@hexylena
Copy link
Member

hexylena commented Jan 8, 2024

It will take some time to review. Would it be possible to re-base your work atop #3997 and retain the commits? @abretaud has been working on this for a long time, it would be a pity to lose those commits and authorship attribution fully (merging in isn't quite the same). It would really help the reviewing process, it'd be clearer what was new, relative to the work Anthony has already done, so he could say whether the changes made sense or what might still be needed.

Copy link
Member

@hexylena hexylena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments in line, i'm really excited for jb2, but there's a lot of small details to get the tool optimal.

tools/jbrowse2/servejb2.py Outdated Show resolved Hide resolved
tools/jbrowse2/static/images/bam.png Outdated Show resolved Hide resolved
tools/jbrowse2/static/images/blast.png Outdated Show resolved Hide resolved
tools/jbrowse2/static/images/sections.png Outdated Show resolved Hide resolved
tools/jbrowse2/test-data/index.html Show resolved Hide resolved
tools/jbrowse2/macros.xml Show resolved Hide resolved
tools/jbrowse2/macros.xml Outdated Show resolved Hide resolved
tools/jbrowse2/jbrowse2.py Outdated Show resolved Hide resolved
tools/jbrowse2/jbrowse2.py Outdated Show resolved Hide resolved
tools/jbrowse2/jbrowse2.py Outdated Show resolved Hide resolved
@abretaud
Copy link
Contributor

abretaud commented Jan 8, 2024

I still have to look into details but for now:

URLs are used wherever easy to do to minimise the size of JBrowse2 files folder so these are much smaller than jb1 where all were copied. (from #5695)

That can be a problem, ie the output dataset is no longer self-containing but depends on the existence/sharing of other ones, that can be problematic. And I have a use case where I use galaxy to generate jbrowse archives that I can download/deploy on another server.
Having at least an option for that would be helpful

The default session / track styling was still largely WIP, I remember I wasn't sure of the proper way to implement this, and opened a discussion there: GMOD/jbrowse-components#3568
I don't know if there wer changes in jb code to make this easier since my last try

@fubar2
Copy link
Member Author

fubar2 commented Jan 9, 2024

@abretaud URLs no longer used for reference and track files because too unreliable - requires the history to be published and the server must allow anonymous access - either of those could change any time so IMHO too fragile.
A pity as it saves a lot of disk/archive bloat.

I still have to look into details but for now:

The default session / track styling was still largely WIP, I remember I wasn't sure of the proper way to implement this, and opened a discussion there: GMOD/jbrowse-components#3568 I don't know if there wer changes in jb code to make this easier since my last try

The default session using mostly unchanged code from your PR seems to work perfectly well - so well done!!

@fubar2
Copy link
Member Author

fubar2 commented Jan 9, 2024

Would it be possible to re-base your work atop #3997 and retain the commits?

@hexylena: rebased so those commits are now visible above.
An, um, interesting experience since I've never tried it before.

Thanks for all the helpful suggestions - incorporated and resolved I think, other than:

  • blastxml gff rendering
  • waiting on bcbbio-gff PR update for MAF

There's lots more to do and it needs feedback from users to improve, but arguably useful enough to expose as is so we can get that user feedback.

@fubar2 fubar2 requested a review from hexylena January 9, 2024 02:33
@fubar2 fubar2 changed the title JBrowse2: manually merged #5695 with lots of fixes from @abretaud (#3997) JBrowse2: #5695 rebased with @abretaud (#3997) Jan 9, 2024
@hexylena
Copy link
Member

An, um, interesting experience since I've never tried it before.

did a bang up job, cheers. really appreciate it!

There's lots more to do and it needs feedback from users to improve, but arguably useful enough to expose as is so we can get that user feedback.

yeah, i mean, it sounds like there's a lot of pressure from one side to get this shipped, it could be installed just on that server in the current as-is state for better "real life" testing before we get it finalised since, yeah it's a massive tool.

Copy link
Contributor

@abretaud abretaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've made a first review of the code, with a few comments/changes to do. Can you look into it before merging?
In the meantime I'd like to also make a few tests locally

tools/jbrowse2/apache2_license.txt Show resolved Hide resolved
tools/jbrowse2/macros.xml Show resolved Hide resolved
tools/jbrowse2/readme.rst Show resolved Hide resolved
tools/jbrowse2/readme.rst Outdated Show resolved Hide resolved
tools/jbrowse2/macros.xml Outdated Show resolved Hide resolved
tools/jbrowse2/jbrowse2.py Outdated Show resolved Hide resolved
tools/jbrowse2/jbrowse2.py Show resolved Hide resolved
tools/jbrowse2/jbrowse2.py Show resolved Hide resolved
dataset_path,
outputTrackConfig,
)
elif dataset_ext == "bam":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cram is missing, it's very similar to bam, I had a add_xam for both in my PR IIRC

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR please. I cannot do this as I have no data or experience.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unresolving this conversation.

It's not a task for Anthony to do; you dropped his changes, and then ask him to spend the effort to bring his already functioning code back?

Had this PR been properly rebased on top of the existing PR, we would've gotten support for this "for free" (i.e. the code that was already written, could still be used.)

tools/jbrowse2/jbrowse2.py Outdated Show resolved Hide resolved
@fubar2
Copy link
Member Author

fubar2 commented Jan 17, 2024

@bgruening @hexylena @abretaud

Simplified. Useful enough to serve as a starting point. Plenty of creative opportunities remain. Send code.

Track details, like colours, can only be configured by the user in the track browser - not on the tool form.

Making them configurable requires hacking the internal default state.
It is not documented, so requires reverse engineering and it has no long term stable guarantees.
It can be added if anyone wants.

Limits

I really appreciate the effort required for review but please give some thought about how best to share responsible for this tool going forward. Advice is wonderful, but I cannot sustain all the work of implementing it, so if you think this tool is worth having, please pitch in, as it is not going to be sustainable otherwise.

Features remaining require an understanding of paf, cram and sparql - not me. Jbrowse2 is a very substantial undertaking, and as far as I can tell, this PR is now sufficiently useful to have available as a starting point, to which others can contribute the bits they know and care about?

Remaining features to complete

  • testing/fixing PAF synteny tracks - the code is there but untested as I have no data or clue;
  • testing/fixing SPARQL tracks - as above
  • modifying the working bam code and testing with cram input - as above.
  • update to 2.10.1 and bcbbio-gff to restore MAF tracks - awaiting Brad merging a PR to deal with BioPython deprecations
    • Biopython has temporarily restored the deprecation with a warning but only until next release - it will all go then so we need Brad's help. @bgruening: if you need to manually fiddle with the biocontainer when the requirements change, best to do both at once?

@hexylena
Copy link
Member

hexylena commented Jan 17, 2024

Track details, like colours, can only be configured by the user in the track browser - not on the tool form.

That's unfortunate, but no, let's not undertake modifying internal state, that sounds fraught. We should consider flagging this upstream, since it was a nice feature in the past and gave users a bit of control, but also maybe we can live without it. (I'm saying: I used it and loved it; not sure anyone else did.)

Advice is wonderful, but I cannot sustain all the work of implementing it.

I mean, yeah... this is why it took us so long to get here, none of us have been able to allocate the time and energy to implementing such a big change. @abretaud had done all the work up until here, I would have loved to help but haven't been able to find free time. It doesn't help that for many of our use cases, jb1 is still sufficient (especially e.g. apollo.), and there wasn't an immediate need for jb2 even though it's a massive improvement in many many ways.

Sorry you had to find out the hard way :( doubly so, for learning about the existing PR after expending so much energy.

Please pitch in if you think this tool is worth having.

We definitely do, but "things we think are worth having" and "science-capitalism allocates us available bandwidth to work" are two very, very separate axes.

Jbrowse2 is a very substantial undertaking, and as far as I can tell, this PR is now sufficiently useful to have available as a starting point

yes, we can look towards merging. I guess you'll have to give @abretaud and I some time to install and test it

testing/fixing SPARQL tracks - as above

not a priority.

@fubar2
Copy link
Member Author

fubar2 commented Jan 17, 2024

and I some time to install and test it

That would be wonderful when you can find time. No more I can do at this point. It installs from the toolshed and works with the test data and on some VGP data I've tried but nobody else has tried it AFAIK.

We should consider flagging this upstream

I did some thread-jacking here - Colin was very responsive and is giving it some thought. Not a new idea - there is a now outdated CL jbrowse create tool for adding a default view to a track configuration so it has been on their radar.

@bgruening
Copy link
Member

@hexylena would it help if I deploy it on EU as test tool?

@hexylena
Copy link
Member

@bgruening that would be a fantastic help!!

@bgruening
Copy link
Member

@hexylena @abretaud here you go: https://usegalaxy.eu/?tool_id=jbrowse2

@fubar2
Copy link
Member Author

fubar2 commented Jan 19, 2024

whoopsie

@fubar2
Copy link
Member Author

fubar2 commented Jan 19, 2024

not intended

@fubar2
Copy link
Member Author

fubar2 commented Jan 19, 2024

This is being reopened as a new PR as some git repairs went bad.

@hexylena
Copy link
Member

Danke sehr @bgruening

@hexylena
Copy link
Member

https://usegalaxy.eu/u/helena-rasche/h/jbrowse2-testing looks like some issues unfortunately, not sure what's happening there.

@fubar2
Copy link
Member Author

fubar2 commented Jan 19, 2024

https://usegalaxy.eu/u/helena-rasche/h/jbrowse2-testing looks like some issues unfortunately, not sure what's happening there.

dependencies were not installed so jbrowse not found :(
I just ran it on EU - please try again?

@bgruening
Copy link
Member

please try again, works now for me.

@hexylena
Copy link
Member

success-ish

is it expected that no track data is rendering for me? (chrome + firefox on linux)
https://usegalaxy.eu/datasets/4838ba20a6d8676556a41e2bf6a52c7d/preview?__vkey__=1705659579999

jbrowse with uniformly empty tracks

@fubar2
Copy link
Member Author

fubar2 commented Jan 20, 2024

Thanks @hexylena! Very glad to see that some things can be made to work.

Since you have chosen to continue using the trackXML that we devised,

Not just that. Most code and logic is unchanged from what you devised. Configuration json is the only real addition.

Much remains to do. Question is, when is a big complex tool ready to expose for proper testing?
So far, only three of us have tried it AFAIK.

See below for default location fix that may help your test case - fixed one of mine.

I think this will also be relevant when we want to display multiple genome assemblies together for the synteny view. E.g. we'll need to add a "grape" and "peach" assembly separately, with separate genomes and separate tracks.

The paf code has been rewritten and so has the genome maker, so syntenic tracks can add genomes. The jbrowseConnector now has a list of genomes for that purpose. It keeps the default genome for non-synteny tracks like vcf that need a genome. It is set when the first assembly is added. May need rethinking but that's what's there. Problem is I have neither a paf file nor a clue about synteny, to test the new synteny track.

and some other comments
track groupings do not seem to be used, they should be.
I don't understand the purposes of the colours. I changed mine, nothing seemed to change in the output. Maybe they aren't passed correctly? I'd like to use them that's a cool feature.
as far as i can tell font size also isn't taking effect.
why does setting the default location break the session? I am no longer loaded directly into jb2 with the track view, instead I'm in a "create a new session" view where i have to select that yes, I really want that file. Should this be passed via the URL or something instead to prevent that? I just wanted to load a specific location.

Yes - a very helpful list thanks!
+1 on fixing all these.
They are of varying levels of criticality to different users, have arisen in different ways and will take a wide range of specialised skill contributions to fix.

  • Code referring to display and general browser settings such as colour, fonts etc is working better now - was overwriting some configuration each time. Big fonts at last. Toolshed updating now. Will require manual fix on EU. You have an account - check out https://dev.gvl.org.au/histories/list_published - the top two are the ones I'm working on if you want to play there it has the latest version.
  • Which tool detailed track settings worked in your PR code @abretaud?
  • Default location parsing a period bug fixed - please test with that failing data?
  • I've never ever tried more than one track group, so yes please.
  • Like most things, the xml has not been altered much and essentially drives the whole process, so generating a jbrowse2 config.json makes sense to me - particularly for complex configurations - I've only tested with a handful of tracks at a time. There it's not a big deal to start again but it's tedious when there are dozens of tracks...
  • Synteny needs testing.
  • sparql needs work if it will be used.
  • What to do about bcbio-gff PR delay?
    • it's a hack, but could include the working PR code as a module until Brad responds and updates.
    • that would allow blastxml and maf gff conversion to work.

@fubar2
Copy link
Member Author

fubar2 commented Jan 20, 2024

Grrr. Now that I've fixed that, the genome name is wierd. Hang about.

Ah. It's the regexp not recognising anything with a period as the contig name - fixed.
Was breaking with H1.scaffold_1:1-10 - note the period - the underscore is not a problem.

fixed case where contig has . breaking the regexp for defaultLocation
@fubar2
Copy link
Member Author

fubar2 commented Jan 20, 2024

why does setting the default location break the session?

@hexylena: It shouldn't break now, but the genome.fa assembly contig name contains a period - NC_00913.3 and that period was breaking the rebased regexp - re.search(r"^(\w+):(\d+)\.+(\d+)$", data["defaultLocation"]).

Added a period to first regexp group, and it seems to work for the similar failing VGP example I found.

The updated code is installed on https://dev.gvl.org.au - your gmail address is already registered there. Best place to test for now with small data, because I can keep the code updated as you find more bugs. I managed to cause a space crisis with VGP samples so will move those to EU where there's more space available for testing the workflows to generate JB2 configurations.

@hexylena
Copy link
Member

hexylena commented Jan 22, 2024

Much remains to do. Question is, when is a big complex tool ready to expose for proper testing?
So far, only three of us have tried it AFAIK.

I would be open to shipping a first version when:

  • top level is wrapped in an "assembly" repeat, in planning for future multiple genome builds, without any expectation of that working now/PAF/etc.
  • it opens on the first contig

the rest can be skipped. The checklist in my last issue, yeah, they're issues but they can wait, none of them are experience breaking. I do not expect synteny to be done soon, that will probably need some discusison with jbrowse regarding CLI tools for configuration that into a a good user experience in galaxy.

@hexylena
Copy link
Member

hexylena commented Jan 22, 2024

Oh, good work, colours are there, and it's opening on the locaion correctly!

image

That is fascinating behaviour though, it really subsets the genome, it doesn't just focus the entire genome's view there, it gives you an actual subset that you cannot scroll past the end of. Note the disabled zoom out box, you cannot zoom out. Hilarious.

Ok, then I change my sine qua non list to be:

  • initial location renamed to "subset to display to new users", because it isn't an initial location, it's an initial subset. You have to change your viewed contig to fix that. (just a tiny text change)
  • Wrap it in an (unused) repeat for "assembly" with min=1/max=1 (should be very easy)
  • it opens on the first contig by default (a must-have for UX) (currently still not working? https://dev.gvl.org.au/u/hexylena/h/imported-from-archive-jbrowse2-testing still goes to genome.fa rather than something useful)

after that you've got my approval for merging.

@hexylena
Copy link
Member

re.search(r"^(\w+):(\d+)\.+(\d+)$",

that regex is quite restrictive, what about re.search(r"^([^:]+):(\d+)\.+(\d+)$"?

@hexylena
Copy link
Member

hexylena commented Jan 22, 2024

The paf code has been rewritten and so has the genome maker, so syntenic tracks can add genomes. The jbrowseConnector now has a list of genomes for that purpose. It keeps the default genome for non-synteny tracks like vcf that need a genome. It is set when the first assembly is added. May need rethinking but that's what's there. Problem is I have neither a paf file nor a clue about synteny, to test the new synteny track.

(not for now) I mean, for my use case it would look like this:

  • i have a reference genome
  • I have a couple of related genomes I want to visualise in synteny
  • as a user, I want to go to jbrowse and add my assembly + track, and then additional assemblies with their own datasets, all from my own data.

so, we'll need the ability to provide multiple "assemblies" eventually to the jb2 tool.

(for now) I don't think we can do the synteny on a decent time frame now, so, again let's just wrap it all in one big repeat for assemblies, and then in the future we'll tackle that in a good way. I just feel quite certain we'll want to support multiple assemblies, and if the tool already had the XML element for it, it won't break re-run for our users.

@fubar2
Copy link
Member Author

fubar2 commented Jan 22, 2024

top level is wrapped in an "assembly" repeat, in planning for future multiple genome builds, without any expectation of that working now/PAF/etc.

The code is already a loop for a list, but the XML does not allow multiselect.

def process_genomes(self):
 
       for genome_node in self.genome_paths:

The code isn't changed much so genome list at jbrowseConnector construction items will all be compressed and indexed into the browser. The very first one will be used when a vcf or other track wants a genome. I'm not clear what to change - what's the use case? Can a user put tracks with two or more different reference sequences to good use other than synteny? How do the tracks know which reference they are supposed to be related to? Other than synteny, not sure how multiple genomes works. Even MAF all use the same reference.

For synteny, JB2 expects a paf and a list of genomes. It adds any new genomes ready for a track to the list from the paf form choice, ready for display - untested.

@fubar2
Copy link
Member Author

fubar2 commented Jan 22, 2024

so, we'll need the ability to provide multiple "assemblies" eventually to the jb2 tool.

You can. When a paf track is specified, it includes another genome. That gets indexed and added to the directory ready for tracks if the code works right which it probably does not. As best as I could manage, it does what the docs seem to suggest - but needs some testing and fixin'

image

@fubar2
Copy link
Member Author

fubar2 commented Jan 22, 2024

Ah. Here's what I got by rerunning the most recent job in that history, setting the default location (name has been changed to subset as requested...will update after testing) to NC_000913.3:1..5000 Interestingly it is not exactly what is showing - it's also the session name.
There's something fishy going on - it does go to the right contig but not the specified range. No idea why. Both galaxy.xml and config.json are correctly showing the range from the form so it may be a jbrowse2 bug?

For JB2, the reference label - your genome.fa is the equivalent of your reference or assembly dbkey for MAF or similar uses.
The display has to be start..end of one specific contig or chromosome in that assembly.

image

change regexp for that subset
suggestions from @hexylena
@hexylena
Copy link
Member

hexylena commented Jan 22, 2024

You can. When a paf track is specified, it includes another genome. That gets indexed and added to the directory ready for tracks if the code works right which it probably does not. As best as I could manage, it does what the docs seem to suggest - but needs some testing and fixin'

Duplicated/rewritten to public discussion from backchannel.

We are talking past each other, a bit. That's not the same thing as what I'm requesting, PAF (synteny) != assembly. Two separate (but related concepts.)

A PAF file does not include multiple genome sequences: https://github.com/lh3/miniasm/blob/master/PAF.md I do not see how, by adding only a PAF file (which is just chr1 start1 end 1 chr2 start2 end2 type mapping) would include a whole new genome. Yes it is a required input for synteny to work, but it is not the entire thing required.

Multiple assemblies are possible in the jbrowse2 interface

image

that means multiple genomes with their own tracks. This is needed for when you want to do synteny. E.g. this was done manually, here you can see I added both an "ecoli" and "mrsa" assembly, two different assemblies with different tracks + sequencing data + genome sequence.

image

A PAF file would need to be added to one, which would then let the user do the comparison in the synteny view

Again I'm not asking you to implement any of this, I was asking for a top level repeat, wrapping the rest of the parameters, to be added, supporting the "Assembly" concept

now the current tool looks like:

  • genome
  • track
  • track

We will need it to look like this for synteny:

  • assembly
    • genome
    • track
    • track
  • assembly 2
    • genome
    • track
    • track

So multiple assemblies can be added, each with their own genome + own tracks.

If we make this change now (again should literally just be a <repeat name="assembly" min="1" max="1" title="Assembly"> that's it the entire diff I'm asking for), then when users re-run these jobs, and we've finished the synteny support, they won't have to completely re-configure their JBrowse2. Because if it looks like

  • assembly
    • genome
    • track
    • track

Galaxy will of course be able to map that to a newer version of the tool.

I'm sorry this has been so difficult to communicate, I just want a repeat, wrapping everything, min=1 max=1.

Can a user put tracks with two or more different reference sequences to good use other than synteny?

It is mainly for synteny, but I can see the following use case: they want to publish all of their genome data, from the genomes they annotate, in one jbrowse that a user can view. I would've done this with phages, maybe pick 5 phages, their annotation tracks, and configure that as a genome browser. There'd be no synteny view, but the user could still swap between all of our different cool annotated genomes.

How do the tracks know which reference they are supposed to be related to? Other than synteny, not sure how multiple genomes works. Even MAF all use the same reference.

Yes, exactly, the tracks need to be associated with a genome at the top level. They cannot use the same reference. That's what the assembly repeat is for. Then the user can

  1. Add assembly -> specify genome + track.
  2. Add new assembly -> specify genome + track

And the genome + track are each associated to a different assembly (this requires a command line parameter in the add-track -a, --assemblyNames=<value> Assembly name or names for track as comma separated string. If none, will default to the assembly in your config file, but I'm not asking you to add that) That's why the repeat is needed.

Ah. Here's what I got by rerunning the most recent job in that history, setting the default location (name has been changed to subset as requested...will update after testing) to NC_000913.3:1..5000 Interestingly it is not exactly what is showing - it's also the session name. There's something fishy going on - it does go to the right contig but not the specified range. No idea why. Both galaxy.xml and config.json are correctly showing the range from the form so it may be a jbrowse2 bug?

Ok, I will cede this one. Fine, not required. JBrowse2 requires you to select an assembly on first view. I do not like it, but, ok, let's call that a JBrowse2 problem.

@hexylena
Copy link
Member

Track categories are pretty trivial to support properly, you just need to pass --category="Some Text"

npx @jbrowse/cli add-track topics/assembly/tutorials/mrsa-nanopore/data/raw/d6de62f94b0de381be4ee8f27dd61ab9_0.gff.gz --out /tmp/jb2/jbrowse -a mrsa --load copy --category=Testing

Copy link
Member

@hexylena hexylena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My previous review(s) have been looking only at the functionality of the UI/UX as a user, we've reached an agreement on that for all intents and purposes:

  • I've ceded a default view as an upstream issue, and all of the other potential things I flagged, it's fine, don't do them.
  • (except for the repeat I want wrapping the entire inputs section to support 'assemblies' as a top level concept to make it more future proof. If you cannot add this, tell me, I will once we get there. Ignoring for the test cases, it will be a 4 line diff.)

Now I'm looking at the code

Since the rebasing hasn't happened yet (or rather it did, and then was subsequently un-done), I'm looking primarily at a diff of the two branches, to see what's actually changed (git diff -w pr-3997:tools/jbrowse2/jbrowse2.py pr-5697:tools/jbrowse2/jbrowse2.py, if you have two appropriately named branches on hand.)

jbrowse.xml diff, compared to #3997

  • mostly things that are supported in anthony's branch, are removed (cram)
  • some new test cases added (yay!)
  • most of the help text deleted (shrug. I don't think anyone was ever reading it.)

jbrowse.py diff

  • clearly run through black (yay)
  • all of the tracks are manually configured via JSON, rather than using the built in add-track command. (This is not good.)
  • categories support lost
  • assemblies support lost (rather, assemblies are made per-provided .fa file which is not desirable from my view.)

You've added some scripts and utilities and nice stuff (cool! it's nice! the webserver I think would be more generally useful as redbean, but I also don't think anyone will actually use this, so, I do not care, just leave it as is.)

That said, I do not think I can accept this PR, I am -1 on it as it stands.

Anthony's code is significantly cleaner, it has native support for assemblies (in the .py at least, already handling an edge case I noticed of naming collisions when testing myself with the CLI), and more features like CRAM support. I want all of that work to still be included. I can't accept it being dropped. Nor can I accept you asking him to PR his changes into your branch, when it should be the other way around.

The diff over what is currently present in @abretaud's PR is maybe interesting, but the replacement of add-track with more brittle json configuration is also sub-optimal, though I suspect a relic of your first PR where you mention using pointing inputs at URLs to avoid copies (I would have rejected that too, sorry, it will fail in far too many environments.). The result however, is that it relies too much on jbrowse2 internals when there's no need and they provide really nice CLI tools, whose cli interface is going to be more stable over time, and doesn't require reverse engineering their internal configuration, which makes adding new formats a cinch because they're already supported by the underlying add-track function.

I really do want whatever useful changes you've made!! I'm really happy you're motivated. I've probably killed a lot of that motivation, and I'm sorry for that, but I have my standards for what I approve, and I'd like to see all of our goals met; you, me, GGA, Anthony. You've done great work getting the dependencies updated and worrying about a lot of the little details that needed to be done, Thank you.

Still, I feel really sad at how much code has been written, how many commits, how many adventures you've had with git, how many PRs that have been opened and closed, how much stress was generated for all of us. It's not fun for anyone, I'm sorry it happened, that sucks.

As you've told me, this is not a pleasant review process, it's not a pleasant feeling, and I can completely understand and empathise there, I think all three of us feel that way. I really had hoped that after the first discussion, this could have been rebased on @abretaud 's PR, and we could review the you know, 5 commits or whatever, and merge it. Instead we've had a lot of miscommunications that lead to long discussions (assemblies), and multiple branches opened and closed, dozens of commits made that have to be re-reviewed, and that has been really unfortunate. I don't know how we could work around that, if you have suggestions here for next time I am very open to what could change.

The way I see to move forward is to:

  • open a PR to anthony's branch
  • format his code with black as one commit, with no other changes, to make diffs clearer.
  • add in the bits of your code that are missing to his, as one big commit. I didn't want you to lose all of your commits, but, maybe rebasing is too complicated due to the very divergent histories.
  • leave it, make no more changes, we will review that and try and merge it quickly for you.

Perhaps I shouldn't've suggested rebasing, perhaps I should've suggested something like this from the beginning.

dest = self.outdir
cmd = ["rm", "-rf", dest + "/*"]
self.subprocess_check_call(cmd)
cmd = ["jbrowse", "create", dest, "-t", JB2VER, "-f"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is an acceptable compromise for reproducibility, though it's still sub-optimal as it will not work for any cluster without internet access (the entire norwegian secure data center), and will cause jobs to fail if the internet is interuppted where they would not otherwise.

"type": "AlignmentsTrack",
"trackId": tId,
"name": trackData["name"],
"assemblyNames": [self.genome_name],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is: why are we doing all of this in json? Why not use the add-track interface?

This entire thing should be something like

subprocess.check_call(['jbrowse', 'add-track', fname, '--indexFile', fname + '.bai',
 '--load', 'copy', '--assemblyNames', self.genome_name, '--category', trackData['category']])

we shouldn't be figuring out things like adapter and index location much less displayId that seems like a very internal detail that better handled by the jbrowse cli command. You could delete 80% of the lines from each of these methods, just like @abretaud was already doing, it makes these methods so small and clean abretaud/tools-iuc@a462ff9/tools/jbrowse2/jbrowse2.py#L592

dataset_path,
outputTrackConfig,
)
elif dataset_ext == "bam":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unresolving this conversation.

It's not a task for Anthony to do; you dropped his changes, and then ask him to spend the effort to bring his already functioning code back?

Had this PR been properly rebased on top of the existing PR, we would've gotten support for this "for free" (i.e. the code that was already written, could still be used.)

@fubar2
Copy link
Member Author

fubar2 commented Jan 22, 2024

@hexylena Thanks!
A lot to digest. I think it makes more sense for you to make #3997 work the way you want, rather than me trying to clean up the mess I've made here. I'll close this so there's less distraction.

@fubar2
Copy link
Member Author

fubar2 commented Jan 26, 2024

tl;dr Would far prefer to hand this important tool over to the IUC to manage, but regretfully closing this since I cannot complete the required repairs.

Will continue development at https://github.com/usegalaxy-eu/temporary-tools/tree/master/jbrowse2 if anyone wants to contribute.

@hexylena: Correct me if I got this wrong, but it looks like this requires fixes beyond my capacities. There also seems to be some confusion about how it diverged from #3997

  • rebase this PR to an independent previous pull request.
    • Jbrowse2 tool #3997 was not functional and sat idle for most of the year before I started on this code.
    • Annoying I didn't find it before I started, but that's how things have worked out.
    • Got a rebase done after much pfaffing about somehow, and then later screwed up the branch history totally.
    • Default/general session code was adapted, but nothing else that I recall.
    • A PR from someone competent is required.
  • swap out functional generated json for the CLI.
    • The CLI requires writing less code, so is simpler and cleaner, when it works.
    • Not always reliable in my hands, so swapped out for json that is.
      • CLI default session generator acknowledged broken but no sign of a fix.
      • Assemblies need json - the old PR used it it seems.
      • A PR from someone who cares is required.
  • Why was cram removed.
    • Confused with a different PR. This PR has never dealt with it. Not a requirement.
    • When first proposed, this PR was feature incomplete, but not broken, working very well for VGP workflows.
    • Adding cram from Jbrowse2 tool #3997 was not possible by then. Too much divergence.
    • Will be added when I can unless someone else contributes before then.
  • Multiple separate sets of assemblies and their tracks are mandatory for approval.
    • I don't understand or have any uses for these.
    • No objections - just no interest or insight.
    • Synteny works without this feature.
    • A PR from someone who understands and cares is required.

Synteny is now working on the Australian dev server for anyone with an account there. MAF and blastxml remain broken until Brad updates bcbio-gff.
image

@fubar2 fubar2 closed this Jan 26, 2024
@hexylena
Copy link
Member

Will continue development at usegalaxy-eu/temporary-tools@master/jbrowse2 if anyone wants to contribute.

EU's temporary tools repo is not intended for outside contributors. It's just used for deploying existing repo (from elsewhere) to EU. I would recommend against it.

It is really unfortunate that you can't contribute on top of Anthony's PR. I guess the two of us will cherry-pick from your branch when we find time to continue that PR.

And in the end we will have two competing jbrowse2 implementations. I don't think that's the best result for the community, but ok, maybe that's the best solution in this case.

swap out functional generated json for the CLI.

If it's not always reliable, I think those are upstream issues! The whole community would benefit from us reporting.

Multiple separate sets of assemblies and their tracks are mandatory for approval.

I get that you don't have a use case, but I do, I'm sorry we can't reach agreement on that.

for the record again, I was not asking for full support for multiple assemblies, but only a 2 line diff (ok some some more to fix up test cases but all changes not requiring a lot of work), adding a repeat element I could help fill in later, that would help users in the future.

Thank you for the time spent on this discussion, I wish you much success with your implementation.

@hexylena
Copy link
Member

hexylena commented Jan 26, 2024

@fubar2
Copy link
Member Author

fubar2 commented Jan 27, 2024

tl;dr Compared to the more typical method of providing a working PR, this seems an unusually coercive method for a reviewer to add features they want to a submission that already works satisfactorily without them. Please show a 2 line working patch and it will be merged and the problem solved the usual way - bet it's a lot more than 2 lines since every template variable name inside a new repeat will change.

for the record again, I was not asking for full support for multiple assemblies, but only a 2 line diff (ok some some more to fix up test cases but all changes not requiring a lot of work), adding a repeat element I could help fill in later, that would help users in the future.

If I understand correctly, @hexylena will not approve this until a placeholder outermost form repeat is added, and all the resulting template breakage fixed, so when the code needed to make it work is added, users will be able to rerun old version jobs with the updated tool version, something Galaxy has never previously guaranteed.

This novel future-proofing is so important to the community we serve, that the reviewer feels justified in blocking the merge, demanding additional non-trivial work, that fixes no reported defects, and adds no new functionality. Am I really the only one who thinks that the community could be better served by a merge, followed by the usual PR improvement process with new versions introducing new features, leaving old jobs as reproducible as they always have been?

@abretaud
Copy link
Contributor

abretaud commented Feb 1, 2024

Giving my point of view now as I didn't have much time to follow closely enough before. I have to say I'm very troubled to see this unfortunate turn of events.

First I'm 100% happy to see interest in a jbrowse2 tool, and can't wait to see it working. I need it for different projects, I have no interest in blocking or delaying anything.

I'm guilty of not finishing what I started in #3997, but I've always been completely open to anyone contributing to it, that's exactly why I opened a WIP PR. I think the jb code has also evolved in good ways since then, probably solving a few of the problems I encountered.

It's unfortunate that you hadn't seen #3997 before you started, but it was clearly not hidden, it was evoked at last GCC and various annotation meetings, and on GGA matric channel. But ok, that's how things have worked out, we can't change this now.

So you opened your new PR out of nowhere and have been super pushy to get it reviewed and merged quick. @hexylena and I have spent time reviewing it, although we had other plans. Now what I see is that :

  • apart from a first rebase that was then lost in opened/closed PR and forced pushes, you haven't made more efforts to integrate your work with what had already been done in Jbrowse2 tool #3997. I know this is a huge PR, and it's not easy to do, but that's how it works: when you contribute you need to adapt to what is existing.
  • you declined many of our comments with "A PR from someone more xxx is required" : again, you propose a contribution, you can ask politely for help but you can't just ignore other peoples' use cases
  • I don't see how these comments were so hard to address, given that @hexylena proposed to focus on a shortlist, and that for CRAM it works essentially as BAM files and there's working code in Jbrowse2 tool #3997 (coded it in like 30min in my PR without ever using CRAM format before), and adding a top level repeat is maybe boring (renaming fields) but without any programming challenge
  • you end up with an attack against @hexylena who have spent hours reviewing/testing your work and trying to find solutions constructively
  • and finally you seem to continue working on your forked tool on a repo that has never been made for that, which will only bring confusion to us contributors, and to Galaxy users ("wait, which jbrowse2 tool did you use?").

I don't like any of this. I understand that there's pressure to get something out quick, but it's no reason for this kind of turn of events.

So yes, I'm troubled. I'm confident that the latest work from @hexylena will allow us to merge #3997 soon, I just hope that we can integrate your valuable work at some point.

@fubar2
Copy link
Member Author

fubar2 commented Feb 1, 2024

@abretaud
This has indeed turned unpleasant through no fault of yours or mine as far as I can tell, but please understand I had independently started with jbrowse1, and submitted a functional, useful PR that was being used in VGP demonstration workflows, before your work was pointed out.

There are now 10 track types currently working with test data, and when the IUC reproduces that work, I'll use it. Whoever wrote it is of no interest to me.

No matter what, I refuse to be bullied into making a change someone else wants that has no impact on what I am busy working on. Apparently all it needs is a 2 line diff to make multiple genomes work. That exercise is left to someone who understands how those work.

As always, feel free to submit a PR, or do whatever you need to do.

I will add a cool input as I have a hic converter working and then will stop adding features to that repo of mine, as I have all I need for TreeValGal.

Thanks.

@hexylena hexylena mentioned this pull request May 22, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants