Outputs and deliverables from the Galaxy project. WIP! Mostly stubs. Help wanted

Author(s): ggsc, Ross Lazarus

Overview
Questions:

What are the main identifiable project outputs?

How can Galaxy offer ‘free’ analysis computational resources?

How does the open source community support Galaxy?

How are new project outputs or resources created?

Objectives:

Understand Galaxy outputs and open science services

Understand how different communities work together to make things happen

Understand opportunities for engagement and contributing your skills

Time estimation: 10 minutes

Supporting Materials:

Available on these Galaxies

Known Working

Possibly Working

UseGalaxy.eu

UseGalaxy.org

UseGalaxy.org.au

Containers

Docker image

Last modification: May 22, 2023

License: Tutorial content is licensed under the Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.

Comment: Note to contributors

Work in progress!! First draft to try to get a structure to make sense.

Needs many contributors to make it useful. What do you wish you had known when you first tried to get things done in Galaxy? Please add what’s missing and fix what’s broken. Headings are mostly stubs waiting to be edited and extended.

Trying to describe the big picture will necessarily be big. Will probably need to break this already very wordy module into separate parts.

Add your story or stories to the stories tutorial too please!

Ross has strong opinions.

Many of them are probably wrong but he doesn’t know which ones yet.

Please feel free to contribute your own, to make this more useful to future readers.

Comment: Note to readers

The most important outputs of the Galaxy project are grouped arbitrarily here, and there are many overlaps, because Galaxy grows organically through collaborations, rather than by design.

The project continues to expand rapidly, so this module will need updating regularly.

The Hub provides much more detail about many of the same structures and their activities, but this material is designed to provide simplified views of the project, so the Hub becomes easier to navigate.

This is an attempt at a kind of field guide to the ecosystem generating those Hub activities, for the use of participants trying to navigate it.

Agenda: Field Guide Part 3. Project outputs and impact

3. Project outputs

Source code

Open science analysis services

Capacity building: training resources and services

Downstream impacts: Normalising transparent, shareable open science analyses

Further reading

3. Project outputs

Galaxy source code is the core deliverable. Many other project activities build on it to add value. This makes the code an essential resource for the project as well as an output but, for clarity, it is arbitrarily grouped as a deliverable in this guide.

The project-supported resources that matter most to researchers are running deployments of the source code, which provide a web-accessible scientific workflow application. There are an unknown number of private deployments, and 100+ specialised public servers are listed on the Hub. For many researchers, the most important examples are the free usegalaxy.* services. These, in turn, depend on all the other parts of the project, including the source code, the people and the resources described in this guide.

The source code includes pluggable, shareable tool infrastructure, which allows third-party open source command line analysis packages to be integrated. These can be installed to give each framework deployment a specific local “flavour”. In this way, generic project deliverables can be tailored to serve different kinds of science. The GTN supports integrated training resources, which make the framework and tools more useful and valuable to researchers. The Galaxy communities of practice illustrate how efficiently the project can be extended to serve the analysis needs of whole communities of researchers in new fields.

Source code

The core framework source code is supported by many other project repositories. These include tools for system administrators and developers, and the ToolShed code for tool distribution services. They are of little use outside the project, since they are specific to Galaxy; their impact on open science comes through their support for Galaxy.

Comment: Do we need to enumerate these?

Will most readers care about the details?

Generic analysis framework source code
ToolShed source code
Developer and system administrator utilities
- planemo, ephemeris, ansible…
Tool wrappers to “flavour” framework instances
- 8,000+ tools of variable quality in the public ToolShed
- Click to install from the ToolShed, in any framework server
- Many well maintained tools from the IUC and communities of practice
Your ideas here please?
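
As a rough illustration of how a set of tool wrappers can define an instance “flavour”, the sketch below renders a small tool list in the YAML layout that the Ephemeris shed-tools command reads. It is a minimal sketch under stated assumptions, not project code: the helper name tool_list_yaml is hypothetical, and the tool names, owners and panel section label are illustrative examples only.

```python
def tool_list_yaml(tools, section_label):
    """Render (name, owner) pairs as a minimal Ephemeris-style tool list.

    Hypothetical helper for illustration; real tool lists may carry more keys.
    """
    lines = ["tools:"]
    for name, owner in tools:
        lines.append(f"- name: {name}")
        lines.append(f"  owner: {owner}")
        lines.append(f'  tool_panel_section_label: "{section_label}"')
    return "\n".join(lines) + "\n"


# Illustrative entries only: a small quality-control "flavour".
example_tools = [("fastqc", "devteam"), ("multiqc", "iuc")]
yaml_text = tool_list_yaml(example_tools, "Quality Control")
print(yaml_text)
```

A file with this content could then be passed to shed-tools install with the -t option, alongside the -g and -a options for the target server and its API key.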

Open science analysis services

“Free” project supported services
- usegalaxy.*
- Large Australian, European and US research infrastructure allocations
- Tens of thousands of users
- Professional user support
- Stress testing framework code and tools
100+ specialised public instances
Unknown number of private installations
Your ideas here please?

Capacity building: training resources and services

Providing training to build community capacity is an essential activity for the project, to ensure wide, well managed deployment and long term sustainability.

GTN user training, integrated directly into the Galaxy user interface, helps new users gain the skills they need to be productive and efficient.
Training system administrators helps support the public usegalaxy.* and the many private servers that operate in academic and commercial laboratories.
Training for external developers makes it easier for them to contribute efficiently, improving Galaxy code and wrapping new tools.
The Galaxy Training Network (GTN) is central to building community capability.
Offers free training to enhance global open science research capacity.
- Generic aspects of using Galaxy for new users
- Specific kinds of analyses with common types of open data.
- System administrators are key to running reliable framework services
- Software developers can contribute more efficiently with appropriate training
- Trainers can learn how to prepare material for new GTN topics
Your ideas here please?

Downstream impacts: Normalising transparent, shareable open science analyses

Scientific scrutiny and trustworthiness: Problems with “black box” analyses.

An analysis is effectively a “black box” unless all of its scripts, package source code, assumptions, settings and methods can be readily shared and made accessible for independent replication. Without that transparency, the scientific trustworthiness of the results cannot be routinely tested. It is said that many eyes make bugs shallow, but commercial or other unshareable and opaque analyses cannot easily be scrutinised, so their results must be taken on trust.

Commercial or other “black box” analysis code and settings may be perfect. Unfortunately, experience suggests that all complex software contains errors, many of which can only be found after widespread and thorough independent scrutiny. This is as true of expensive commercial software as it is of open source software. The difference is that open source package assumptions, methods and code are readily accessible for review, testing and improvement. Open projects encourage and facilitate scrutiny and replication, in order to decrease the risks to scientific integrity and trustworthiness from hidden coding or methodological errors. Making any new analysis transparent and reproducible is a hard technical problem, but one that is largely solved for Galaxy users without requiring any special effort on their part.

Transparent open science analysis for any researcher.

The downstream impact of Galaxy on open science analysis practice is important and probably large, but it is hard to measure, because open science outputs are hard to identify.

Research outputs from Galaxy users in open science indirectly reflect increased analysis productivity for researchers. This is a process measure, suggesting that Galaxy is useful to researchers. If scientific discovery is the desired output, the measure is far removed from it, but it is at least a hopeful indicator of activity.
More than 10,000 publications of all types are another, more tangible and direct measure of project impact.
Access to efficient and reliable analysis methods for large, complex data resources probably increases their application in research, and thus their scientific value. Galaxy enables this for very large data collections in any scientific field, with configurable remote data sources. Measuring this impact is challenging, just as the opportunity costs of data lying idle because it is too hard to analyse are unknown.
Improved trustworthiness of shareable, replicable analyses has an important and lasting impact on open science. Again, this is very challenging to measure.
Analyses of provable scientific integrity are arguably the most important project deliverable.
10k+ publications
Tens of thousands of scientists trained
Millions of jobs run.
Your ideas here please?

Frequently Asked Questions

Have questions about this tutorial? Check the FAQ page for the Galaxy project guide topic to see if your question is listed there. If not, please ask your question on the GTN Gitter Channel or the Galaxy Help Forum.

Citing this Tutorial

ggsc, Ross Lazarus, Outputs and deliverables from the Galaxy project. WIP! Mostly stubs. Help wanted (Galaxy Training Materials). http://0.0.0.0:4000/training-material/topics/galaxy-project/tutorials/outputs/tutorial.html Online; accessed TODAY
Batut et al., 2018. Community-Driven Data Analysis Training for Biology. Cell Systems. 10.1016/j.cels.2018.05.012

@misc{galaxy-project-outputs,
author = "ggsc and Ross Lazarus",
title = "Outputs and deliverables from the Galaxy project. WIP! Mostly stubs. Help wanted (Galaxy Training Materials)",
year = "",
month = "",
day = "",
url = "\url{http://0.0.0.0:4000/training-material/topics/galaxy-project/tutorials/outputs/tutorial.html}",
note = "[Online; accessed TODAY]"
}
@article{Hiltemann_2023,
	doi = {10.1371/journal.pcbi.1010752},
	url = {https://doi.org/10.1371%2Fjournal.pcbi.1010752},
	year = 2023,
	month = {jan},
	publisher = {Public Library of Science ({PLoS})},
	volume = {19},
	number = {1},
	pages = {e1010752},
	author = {Saskia Hiltemann and Helena Rasche and Simon Gladman and Hans-Rudolf Hotz and Delphine Larivi{\`{e}}re and Daniel Blankenberg and Pratik D. Jagtap and Thomas Wollmann and Anthony Bretaudeau and Nadia Gou{\'{e}} and Timothy J. Griffin and Coline Royaux and Yvan Le Bras and Subina Mehta and Anna Syme and Frederik Coppens and Bert Droesbeke and Nicola Soranzo and Wendi Bacon and Fotis Psomopoulos and Crist{\'{o}}bal Gallardo-Alba and John Davis and Melanie Christine Föll and Matthias Fahrner and Maria A. Doyle and Beatriz Serrano-Solano and Anne Claire Fouilloux and Peter van Heusden and Wolfgang Maier and Dave Clements and Florian Heyl and Björn Grüning and B{\'{e}}r{\'{e}}nice Batut and},
	editor = {Francis Ouellette},
	title = {Galaxy Training: A powerful framework for teaching!},
	journal = {{PLoS} Computational Biology}
}

                   

Congratulations on successfully completing this tutorial!

Galaxy Administrators: Install the missing tools

You can use Ephemeris's shed-tools install command to install the tools used in this tutorial.
shed-tools install [-g GALAXY] [-a API_KEY] -t <(curl http://localhost:4000//training-material//api/topics/galaxy-project/tutorials/outputs/tutorial.json | jq .admin_install_yaml -r)
Alternatively you can copy and paste the following YAML
--- {}