-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JBrowse2: #5695 rebased with @abretaud (#3997) #5697
Conversation
It will take some time to review. Would it be possible to re-base your work atop #3997 and retain the commits? @abretaud has been working on this for a long time, it would be a pity to lose those commits and authorship attribution fully (merging in isn't quite the same). It would really help the reviewing process, it'd be clearer what was new, relative to the work Anthony has already done, so he could say whether the changes made sense or what might still be needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments in line, i'm really excited for jb2, but there's a lot of small details to get the tool optimal.
I still have to look into details but for now:
That can be a problem, ie the output dataset is no longer self-containing but depends on the existence/sharing of other ones, that can be problematic. And I have a use case where I use galaxy to generate jbrowse archives that I can download/deploy on another server. The default session / track styling was still largely WIP, I remember I wasn't sure of the proper way to implement this, and opened a discussion there: GMOD/jbrowse-components#3568 |
@abretaud URLs no longer used for reference and track files because too unreliable - requires the history to be published and the server must allow anonymous access - either of those could change any time so IMHO too fragile.
The default session using mostly unchanged code from your PR seems to work perfectly well - so well done!! |
@hexylena: rebased so those commits are now visible above. Thanks for all the helpful suggestions - incorporated and resolved I think, other than:
There's lots more to do and it needs feedback from users to improve, but arguably useful enough to expose as is so we can get that user feedback. |
did a bang up job, cheers. really appreciate it!
yeah, i mean, it sounds like there's a lot of pressure from one side to get this shipped, it could be installed just on that server in the current as-is state for better "real life" testing before we get it finalised since, yeah it's a massive tool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've made a first review of the code, with a few comments/changes to do. Can you look into it before merging?
In the meantime I'd like to also make a few tests locally
dataset_path, | ||
outputTrackConfig, | ||
) | ||
elif dataset_ext == "bam": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cram is missing, it's very similar to bam, I had a add_xam for both in my PR IIRC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR please. I cannot do this as I have no data or experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unresolving this conversation.
It's not a task for Anthony to do; you dropped his changes, and then ask him to spend the effort to bring his already functioning code back?
Had this PR been properly rebased on top of the existing PR, we would've gotten support for this "for free" (i.e. the code that was already written, could still be used.)
@bgruening @hexylena @abretaud Simplified. Useful enough to serve as a starting point. Plenty of creative opportunities remain. Send code. Track details, like colours, can only be configured by the user in the track browser - not on the tool form.Making them configurable requires hacking the internal default state. LimitsI really appreciate the effort required for review but please give some thought about how best to share responsible for this tool going forward. Advice is wonderful, but I cannot sustain all the work of implementing it, so if you think this tool is worth having, please pitch in, as it is not going to be sustainable otherwise. Features remaining require an understanding of paf, cram and sparql - not me. Jbrowse2 is a very substantial undertaking, and as far as I can tell, this PR is now sufficiently useful to have available as a starting point, to which others can contribute the bits they know and care about? Remaining features to complete
|
That's unfortunate, but no, let's not undertake modifying internal state, that sounds fraught. We should consider flagging this upstream, since it was a nice feature in the past and gave users a bit of control, but also maybe we can live without it. (I'm saying: I used it and loved it; not sure anyone else did.)
I mean, yeah... this is why it took us so long to get here, none of us have been able to allocate the time and energy to implementing such a big change. @abretaud had done all the work up until here, I would have loved to help but haven't been able to find free time. It doesn't help that for many of our use cases, jb1 is still sufficient (especially e.g. apollo.), and there wasn't an immediate need for jb2 even though it's a massive improvement in many many ways. Sorry you had to find out the hard way :( doubly so, for learning about the existing PR after expending so much energy.
We definitely do, but "things we think are worth having" and "science-capitalism allocates us available bandwidth to work" are two very, very separate axes.
yes, we can look towards merging. I guess you'll have to give @abretaud and I some time to install and test it
not a priority. |
That would be wonderful when you can find time. No more I can do at this point. It installs from the toolshed and works with the test data and on some VGP data I've tried but nobody else has tried it AFAIK.
I did some thread-jacking here - Colin was very responsive and is giving it some thought. Not a new idea - there is a now outdated CL |
@hexylena would it help if I deploy it on EU as test tool? |
@bgruening that would be a fantastic help!! |
@hexylena @abretaud here you go: https://usegalaxy.eu/?tool_id=jbrowse2 |
whoopsie |
not intended |
This is being reopened as a new PR as some git repairs went bad. |
Danke sehr @bgruening |
https://usegalaxy.eu/u/helena-rasche/h/jbrowse2-testing looks like some issues unfortunately, not sure what's happening there. |
dependencies were not installed so jbrowse not found :( |
please try again, works now for me. |
success-ish is it expected that no track data is rendering for me? (chrome + firefox on linux) |
Thanks @hexylena! Very glad to see that some things can be made to work.
Not just that. Most code and logic is unchanged from what you devised. Configuration json is the only real addition. Much remains to do. Question is, when is a big complex tool ready to expose for proper testing? See below for default location fix that may help your test case - fixed one of mine.
The paf code has been rewritten and so has the genome maker, so syntenic tracks can add genomes. The
Yes - a very helpful list thanks!
|
Grrr. Now that I've fixed that, the genome name is wierd. Hang about. Ah. It's the regexp not recognising anything with a period as the contig name - fixed. |
fixed case where contig has . breaking the regexp for defaultLocation
@hexylena: It shouldn't break now, but the Added a period to first regexp group, and it seems to work for the similar failing VGP example I found. The updated code is installed on https://dev.gvl.org.au - your gmail address is already registered there. Best place to test for now with small data, because I can keep the code updated as you find more bugs. I managed to cause a space crisis with VGP samples so will move those to EU where there's more space available for testing the workflows to generate JB2 configurations. |
I would be open to shipping a first version when:
the rest can be skipped. The checklist in my last issue, yeah, they're issues but they can wait, none of them are experience breaking. I do not expect synteny to be done soon, that will probably need some discusison with jbrowse regarding CLI tools for configuration that into a a good user experience in galaxy. |
Oh, good work, colours are there, and it's opening on the locaion correctly! That is fascinating behaviour though, it really subsets the genome, it doesn't just focus the entire genome's view there, it gives you an actual subset that you cannot scroll past the end of. Note the disabled zoom out box, you cannot zoom out. Hilarious. Ok, then I change my sine qua non list to be:
after that you've got my approval for merging. |
that regex is quite restrictive, what about |
(not for now) I mean, for my use case it would look like this:
so, we'll need the ability to provide multiple "assemblies" eventually to the jb2 tool. (for now) I don't think we can do the synteny on a decent time frame now, so, again let's just wrap it all in one big repeat for assemblies, and then in the future we'll tackle that in a good way. I just feel quite certain we'll want to support multiple assemblies, and if the tool already had the XML element for it, it won't break re-run for our users. |
The code is already a loop for a list, but the XML does not allow multiselect.
The code isn't changed much so For synteny, JB2 expects a paf and a list of genomes. It adds any new genomes ready for a track to the list from the paf form choice, ready for display - untested. |
You can. When a paf track is specified, it includes another genome. That gets indexed and added to the directory ready for tracks if the code works right which it probably does not. As best as I could manage, it does what the docs seem to suggest - but needs some testing and fixin' |
Ah. Here's what I got by rerunning the most recent job in that history, setting the For JB2, the reference label - your |
change regexp for that subset suggestions from @hexylena
Duplicated/rewritten to public discussion from backchannel. We are talking past each other, a bit. That's not the same thing as what I'm requesting, PAF (synteny) != assembly. Two separate (but related concepts.) A PAF file does not include multiple genome sequences: https://github.com/lh3/miniasm/blob/master/PAF.md I do not see how, by adding only a PAF file (which is just chr1 start1 end 1 chr2 start2 end2 type mapping) would include a whole new genome. Yes it is a required input for synteny to work, but it is not the entire thing required. Multiple assemblies are possible in the jbrowse2 interface that means multiple genomes with their own tracks. This is needed for when you want to do synteny. E.g. this was done manually, here you can see I added both an "ecoli" and "mrsa" assembly, two different assemblies with different tracks + sequencing data + genome sequence. A PAF file would need to be added to one, which would then let the user do the comparison in the synteny view Again I'm not asking you to implement any of this, I was asking for a top level repeat, wrapping the rest of the parameters, to be added, supporting the "Assembly" concept now the current tool looks like:
We will need it to look like this for synteny:
So multiple assemblies can be added, each with their own genome + own tracks. If we make this change now (again should literally just be a
Galaxy will of course be able to map that to a newer version of the tool. I'm sorry this has been so difficult to communicate, I just want a repeat, wrapping everything, min=1 max=1.
It is mainly for synteny, but I can see the following use case: they want to publish all of their genome data, from the genomes they annotate, in one jbrowse that a user can view. I would've done this with phages, maybe pick 5 phages, their annotation tracks, and configure that as a genome browser. There'd be no synteny view, but the user could still swap between all of our different cool annotated genomes.
Yes, exactly, the tracks need to be associated with a genome at the top level. They cannot use the same reference. That's what the assembly repeat is for. Then the user can
And the genome + track are each associated to a different assembly (this requires a command line parameter in the add-track
Ok, I will cede this one. Fine, not required. JBrowse2 requires you to select an assembly on first view. I do not like it, but, ok, let's call that a JBrowse2 problem. |
Track categories are pretty trivial to support properly, you just need to pass
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My previous review(s) have been looking only at the functionality of the UI/UX as a user, we've reached an agreement on that for all intents and purposes:
- I've ceded a default view as an upstream issue, and all of the other potential things I flagged, it's fine, don't do them.
- (except for the repeat I want wrapping the entire inputs section to support 'assemblies' as a top level concept to make it more future proof. If you cannot add this, tell me, I will once we get there. Ignoring for the test cases, it will be a 4 line diff.)
Now I'm looking at the code
Since the rebasing hasn't happened yet (or rather it did, and then was subsequently un-done), I'm looking primarily at a diff of the two branches, to see what's actually changed (git diff -w pr-3997:tools/jbrowse2/jbrowse2.py pr-5697:tools/jbrowse2/jbrowse2.py
, if you have two appropriately named branches on hand.)
jbrowse.xml diff, compared to #3997
- mostly things that are supported in anthony's branch, are removed (cram)
- some new test cases added (yay!)
- most of the help text deleted (shrug. I don't think anyone was ever reading it.)
jbrowse.py diff
- clearly run through black (yay)
- all of the tracks are manually configured via JSON, rather than using the built in
add-track
command. (This is not good.) - categories support lost
- assemblies support lost (rather, assemblies are made per-provided .fa file which is not desirable from my view.)
You've added some scripts and utilities and nice stuff (cool! it's nice! the webserver I think would be more generally useful as redbean, but I also don't think anyone will actually use this, so, I do not care, just leave it as is.)
That said, I do not think I can accept this PR, I am -1 on it as it stands.
Anthony's code is significantly cleaner, it has native support for assemblies (in the .py at least, already handling an edge case I noticed of naming collisions when testing myself with the CLI), and more features like CRAM support. I want all of that work to still be included. I can't accept it being dropped. Nor can I accept you asking him to PR his changes into your branch, when it should be the other way around.
The diff over what is currently present in @abretaud's PR is maybe interesting, but the replacement of add-track
with more brittle json configuration is also sub-optimal, though I suspect a relic of your first PR where you mention using pointing inputs at URLs to avoid copies (I would have rejected that too, sorry, it will fail in far too many environments.). The result however, is that it relies too much on jbrowse2 internals when there's no need and they provide really nice CLI tools, whose cli interface is going to be more stable over time, and doesn't require reverse engineering their internal configuration, which makes adding new formats a cinch because they're already supported by the underlying add-track function.
I really do want whatever useful changes you've made!! I'm really happy you're motivated. I've probably killed a lot of that motivation, and I'm sorry for that, but I have my standards for what I approve, and I'd like to see all of our goals met; you, me, GGA, Anthony. You've done great work getting the dependencies updated and worrying about a lot of the little details that needed to be done, Thank you.
Still, I feel really sad at how much code has been written, how many commits, how many adventures you've had with git, how many PRs that have been opened and closed, how much stress was generated for all of us. It's not fun for anyone, I'm sorry it happened, that sucks.
As you've told me, this is not a pleasant review process, it's not a pleasant feeling, and I can completely understand and empathise there, I think all three of us feel that way. I really had hoped that after the first discussion, this could have been rebased on @abretaud 's PR, and we could review the you know, 5 commits or whatever, and merge it. Instead we've had a lot of miscommunications that lead to long discussions (assemblies), and multiple branches opened and closed, dozens of commits made that have to be re-reviewed, and that has been really unfortunate. I don't know how we could work around that, if you have suggestions here for next time I am very open to what could change.
The way I see to move forward is to:
- open a PR to anthony's branch
- format his code with black as one commit, with no other changes, to make diffs clearer.
- add in the bits of your code that are missing to his, as one big commit. I didn't want you to lose all of your commits, but, maybe rebasing is too complicated due to the very divergent histories.
- leave it, make no more changes, we will review that and try and merge it quickly for you.
Perhaps I shouldn't've suggested rebasing, perhaps I should've suggested something like this from the beginning.
dest = self.outdir | ||
cmd = ["rm", "-rf", dest + "/*"] | ||
self.subprocess_check_call(cmd) | ||
cmd = ["jbrowse", "create", dest, "-t", JB2VER, "-f"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this is an acceptable compromise for reproducibility, though it's still sub-optimal as it will not work for any cluster without internet access (the entire norwegian secure data center), and will cause jobs to fail if the internet is interuppted where they would not otherwise.
"type": "AlignmentsTrack", | ||
"trackId": tId, | ||
"name": trackData["name"], | ||
"assemblyNames": [self.genome_name], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question is: why are we doing all of this in json? Why not use the add-track
interface?
This entire thing should be something like
subprocess.check_call(['jbrowse', 'add-track', fname, '--indexFile', fname + '.bai',
'--load', 'copy', '--assemblyNames', self.genome_name, '--category', trackData['category']])
we shouldn't be figuring out things like adapter
and index location
much less displayId
that seems like a very internal detail that better handled by the jbrowse cli command. You could delete 80% of the lines from each of these methods, just like @abretaud was already doing, it makes these methods so small and clean abretaud/tools-iuc@a462ff9/tools/jbrowse2/jbrowse2.py#L592
dataset_path, | ||
outputTrackConfig, | ||
) | ||
elif dataset_ext == "bam": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unresolving this conversation.
It's not a task for Anthony to do; you dropped his changes, and then ask him to spend the effort to bring his already functioning code back?
Had this PR been properly rebased on top of the existing PR, we would've gotten support for this "for free" (i.e. the code that was already written, could still be used.)
tl;dr Would far prefer to hand this important tool over to the IUC to manage, but regretfully closing this since I cannot complete the required repairs. Will continue development at https://github.com/usegalaxy-eu/temporary-tools/tree/master/jbrowse2 if anyone wants to contribute. @hexylena: Correct me if I got this wrong, but it looks like this requires fixes beyond my capacities. There also seems to be some confusion about how it diverged from #3997
Synteny is now working on the Australian dev server for anyone with an account there. MAF and blastxml remain broken until Brad updates bcbio-gff. |
EU's temporary tools repo is not intended for outside contributors. It's just used for deploying existing repo (from elsewhere) to EU. I would recommend against it. It is really unfortunate that you can't contribute on top of Anthony's PR. I guess the two of us will cherry-pick from your branch when we find time to continue that PR. And in the end we will have two competing jbrowse2 implementations. I don't think that's the best result for the community, but ok, maybe that's the best solution in this case.
If it's not always reliable, I think those are upstream issues! The whole community would benefit from us reporting.
I get that you don't have a use case, but I do, I'm sorry we can't reach agreement on that. for the record again, I was not asking for full support for multiple assemblies, but only a 2 line diff (ok some some more to fix up test cases but all changes not requiring a lot of work), adding a repeat element I could help fill in later, that would help users in the future. Thank you for the time spent on this discussion, I wish you much success with your implementation. |
I would recommend changing your toolshed API key @fubar2 |
tl;dr Compared to the more typical method of providing a working PR, this seems an unusually coercive method for a reviewer to add features they want to a submission that already works satisfactorily without them. Please show a 2 line working patch and it will be merged and the problem solved the usual way - bet it's a lot more than 2 lines since every template variable name inside a new repeat will change.
If I understand correctly, @hexylena will not approve this until a placeholder outermost form repeat is added, and all the resulting template breakage fixed, so when the code needed to make it work is added, users will be able to rerun old version jobs with the updated tool version, something Galaxy has never previously guaranteed. This novel future-proofing is so important to the community we serve, that the reviewer feels justified in blocking the merge, demanding additional non-trivial work, that fixes no reported defects, and adds no new functionality. Am I really the only one who thinks that the community could be better served by a merge, followed by the usual PR improvement process with new versions introducing new features, leaving old jobs as reproducible as they always have been? |
Giving my point of view now as I didn't have much time to follow closely enough before. I have to say I'm very troubled to see this unfortunate turn of events. First I'm 100% happy to see interest in a jbrowse2 tool, and can't wait to see it working. I need it for different projects, I have no interest in blocking or delaying anything. I'm guilty of not finishing what I started in #3997, but I've always been completely open to anyone contributing to it, that's exactly why I opened a WIP PR. I think the jb code has also evolved in good ways since then, probably solving a few of the problems I encountered. It's unfortunate that you hadn't seen #3997 before you started, but it was clearly not hidden, it was evoked at last GCC and various annotation meetings, and on GGA matric channel. But ok, that's how things have worked out, we can't change this now. So you opened your new PR out of nowhere and have been super pushy to get it reviewed and merged quick. @hexylena and I have spent time reviewing it, although we had other plans. Now what I see is that :
I don't like any of this. I understand that there's pressure to get something out quick, but it's no reason for this kind of turn of events. So yes, I'm troubled. I'm confident that the latest work from @hexylena will allow us to merge #3997 soon, I just hope that we can integrate your valuable work at some point. |
@abretaud There are now 10 track types currently working with test data, and when the IUC reproduces that work, I'll use it. Whoever wrote it is of no interest to me. No matter what, I refuse to be bullied into making a change someone else wants that has no impact on what I am busy working on. Apparently all it needs is a 2 line diff to make multiple genomes work. That exercise is left to someone who understands how those work. As always, feel free to submit a PR, or do whatever you need to do. I will add a cool input as I have a hic converter working and then will stop adding features to that repo of mine, as I have all I need for TreeValGal. Thanks. |
Merges #5695 derived from @hexylena JBrowse1 tool with most of the improvements made by @abretaud #3997 such as adding track controls to the tool form.
Sad I didn't know about #3997 until after repeating a lot of the same updates - but that PR included all the track formatting UI form code that #5695 was missing.
Merge adds:
Removes some #3997 tests failing because looking for JB1 elements - need updating.
Fixes testing in biocontainers - different path in Conda so now use
jbrowse create
.Available for testing from the toolshed
Screenshot shows test bed bam vcf maf bigwig gff sample data tracks
and HiC
expanded UI example: