Introduction to the user guide

Author(s): ggsc, Ross Lazarus
Overview
Creative Commons License: CC-BY
Questions:
  • What are the main identifiable project components?

  • How do they fit together and interact?

  • Who pays for the ‘free’ analysis computational resources?

  • How are decisions really made?

  • Who is in charge?

  • Why do they give everything away?

  • How can researchers join in?

Objectives:
  • Understand how the Galaxy project works

  • Understand how people make things happen

  • Understand opportunities for engagement and contributing your skills

Time estimation: 20 minutes
Last modification: May 22, 2023
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
Comment: Note to contributors
  • Work in progress!! First draft to try to get a structure to make sense.
  • Needs many contributors to make it useful. What would you like to have known, when you first tried getting things done in Galaxy? Please add what’s missing and fix what’s broken. Headings are mostly stubs waiting to be edited and extended
  • Trying to describe the big picture will necessarily be big. Will probably need to break this already very wordy module into separate parts.
  • Add your story or stories to the stories tutorial too please!
  • Ross has strong opinions.
  • Many of them are probably wrong but he doesn’t know which ones yet.
  • Please feel free to contribute your own, to make this more useful to future readers.
Comment: Note to readers

This module introduces the most important parts of the Galaxy project. They are grouped arbitrarily and there are many overlaps, because Galaxy grows organically through collaborations, rather than by design. The Hub provides much more detail about many of the same structures and their activities, but this material is designed to provide simplified views of the project, so the Hub becomes easier to navigate. This is a user manual, or field guide to the ecosystem generating those Hub activities, to assist participants trying to navigate it.

Agenda: Contents
  1. Motivation, goals and definitions
    1. Why does Galaxy need a user guide?
    2. How to use the guide.
    3. Structure for the guide
  2. A field guide to the Galaxy collaboration.
    1. People and their interactions
    2. Resources used in project activities
    3. Project outputs and deliverables
    4. Community development success stories

Motivation, goals and definitions

Why does Galaxy need a user guide?

Because it is a widely used, adaptable open science workflow platform embedded in the global open source and open science ecosystems.

Galaxy is the result of a global open science collaboration, sustained by communities and contributors. It has widespread impact on open science analysis practice, by making every analysis and result transparent and reproducible, without any technical effort from the user. Galaxy was designed as a scientific analysis workflow platform, with pluggable tools to suit any kind of research data.

Not just source code

The project produces much more than source code. Many project activities add additional value. For example, topic based activities, such as for Proteomics and Climate science, add value to the source code, by making Galaxy useful for anyone in those fields. These self managed open collaborations maintain “best practice” tools and workflows for common analyses, improving productivity, reliability and transparency for results. To make their work more useful and accessible, they provide integrated training resources, through the Galaxy Training Network (GTN).

In short, there is a lot going on, even in this subset of all project activity. In addition to software engineering, a wide range of different skills is needed to manage and run all the activities that make up the project.

Many ways to see Galaxy

Complexity and diversity offer a multitude of ways to engage, to suit individual interests. For example, it serves as

  • A convenient and reliable way to share complex open science analyses, for researchers,
  • Inter-dependent open source code repositories, for software developers,
  • A global open science collaboration, for investigators.

Each of these perspectives is valid, but incomplete. Seeing the big picture requires many perspectives and opinions. The project has grown so large that few individuals can devote enough time to remain engaged in, and fully informed about, every project activity.

At this point in the growth of the project, the need for documenting the project itself is becoming increasingly apparent, partly to guide newcomers finding ways to engage on their own terms, and partly to help optimise leadership, governance and resilience.

All in one place

This material serves as a field guide to a complicated project, embedded and co-dependent in the global open science ecosystem. Many perspectives are needed for a comprehensive view. It aims to provide structured, orderly information about:

  • Major components and their interactions,
  • Resources consumed and outputs produced,
  • How communities and collaborations interact to sustain the project,
  • How individuals can become engaged, on their own terms.

Galaxy is not controlled by any single institution, so participants must take responsibility for governing and leading activities. This is challenging in a distributed global enterprise. There are recognisable corporate structures, with an Executive Board at the top, providing leadership and direction, and many Working Groups and other sub-communities, where most of the day to day work in the project is done.

Scientific goals and direction, standards and milestones are discussed and determined by consensus, in an open, collaborative process involving the engaged community and professional staff. Ongoing survival depends on recognising the harsh realities of competitive grant deliverables and timetables, for investigators who pay the project’s bills. In contrast to most global commercial entities, ensuring that decisions are equitable, transparent and acceptable to the community is essential for project sustainability. While shared open values help in this process, it is not sustainable in the long term with volunteer goodwill alone.

When activities satisfy genuine community needs, they grow and succeed in an open project. Investigators bringing ideas and resources to support new collaborations are always welcomed and assisted. That is how open projects like Galaxy grow, when participants lead and contribute to new activities that relate to their own research work. In undertaking that, they improve the value of the project for the entire community. It is hard to imagine a reason for an open project to reject a new self-sustaining open collaboration that serves real community need.

How to use the guide.

The Hub provides much more detail with current activities. This training material refers to the Hub where possible, but aims to present an orderly overview of the main structures.

This introduction is recommended reading as a guide to the guide, making it easier to follow. It also provides a restricted definition for the term “community” to simplify things. Related topic sections are linked at the end of this module, or from the main Topic page.

The guide includes descriptions by those involved, of how activities were started. These community development success stories can provide information and strategies for participants thinking about initiating new activities, based on existing successful initiatives.

Every open science project functions in, and depends on, the context of the global open science ecosystem. That context is an essential part of a complete project description, and it is assumed that the reader is familiar with it, because they have found this guide.

Given the complexities, descriptive categories and divisions are necessarily arbitrary. The project has grown organically, and of its own accord, not neatly from a blueprint. It is driven by shared vision and values, and constantly adapting to rapidly changing externalities and project growth.

A useful guide breaks all the complexities into smaller, more manageable chunks, since the material is potentially overwhelming. Large open projects are complex, and each has particular complexities, so there are many alternative ways to describe them.

Many participants, structures and processes could fit into more than one category, with many users engaging with more than one community for example. It seems that there is no obvious, “best” way to describe an open science collaboration. It is a kind of virtual ecosystem with many interdependent and dynamic components with complex interactions and synergies. Descriptive terminology for the components and interactions needs to be invented, since it is not yet in wide use.

Structure for the guide

The arbitrary division used for this guide gives 3 high level categories, related in the following trivial model:

People + Resource inputs = Project outputs

This simplifies the challenge of seeing it all at once a little, and allows the guide to be split into corresponding sections. “Details” below provide more information about each component in overview. This division does not change the fact that in practice, the project depends on them all working together for success. Galaxy flourishes, because all these components govern themselves, in an efficient and productive global collaboration.

Open source is a very efficient way of delivering reliable complex applications, allowing Galaxy to build on and add value to the work of thousands of other open source communities. Shared values and participatory self-governance help people get things done. Galaxy participants must govern the collaboration for themselves, because there are many independent institutions and investigators, so no single institution has complete control.

Individuals engage with the project according to their interests and on their own terms, such as:

  • Analysing and sharing experimental data
  • Working as a member of the core professional team
  • Contributing skills, support, code and ideas while working in a related field.
  • Building best practice tool kits and workflows for specific fields
  • Leading new activities in the collaboration by providing community and project leadership.
  • Contributing to project governance

Shared values encourage participants to take responsibility for fixing things, and to lead new initiatives, where they can, in all open source communities. A detailed module on people and their interactions is available.

Computational resource allocations used for free services are provided by collaborating institutions. They add value to the source code, in the form of large scale computing power and professional staff, to provide free analysis and training services.

Core “corporate” project services like outreach, communication and administration are needed to keep the project on track. Professional, dedicated staff are needed for source code curation, user and contributor support, usegalaxy.* services, ToolShed maintenance, GTN infrastructure and services, and many other related resources. These also depend on community contributions, and there are many project management and administrative tasks in coordinating such a large and complicated global enterprise, that require dedicated effort, to support community volunteers.

The open science value added by these services is likely to be a multiple of the total grant investment in terms of return. Skilled community effort helps multiply total project impact and value, greatly exceeding the investment in collaborating grants.

A detailed guide to Resources used in project activities is available.

Galaxy source code is substantial, but it is probably a small fraction of the total lines of source code involved, once the thousands of other open source libraries, utilities, and analysis packages that it depends on to work properly are counted. Galaxy builds on and adds value to all those resources.

Flavours for Galaxy servers depend on open source command line packages. Any of those can become integrated tools, with effort from a skilled developer. Potential flavours for users can be delivered, for any field where reliable, transparent complex analysis computing is needed and suitable command line packages are available.

Wrapping a command line package as a tool makes it interoperable and reproducible, and exposes it through a uniform graphical and forms based interface. Galaxy provides reproducibility, sharing, GUI, interoperability and workflows for users, without requiring effort from the package developer, or the user.

Galaxy source code is the core project deliverable. It is widely deployed in public and many more private settings, and it also supports many other important project outputs. Those other project activities build on the source code, to provide a range of open science benefits such as:

  • The usegalaxy.* public deployments are large, free analysis services that support tens of thousands of researchers each day.
  • The Galaxy Training Network supports training to improve researcher productivity, integrated into Galaxy’s user interface.
  • Specialised training, toolkits and workflows for specific fields are supported and distributed by communities of practice.

These and other outputs are described in more detail in the Project outputs and deliverables lesson.

Communities: Definition and importance in this guide.

When individuals collaborate on an important shared interest, and organise activities that improve or extend the project, they may attract enough participants to engage and form an active project community. All active communities advertise open activities on the Hub. They are an important resource, adding value and helping sustain successful, self-governing open projects.

Communities organise open project related activities, allowing participants sharing particular common interests to work together to extend and improve the project.

Active, self-governing communities are important in the project ecology, because their activities engage users and extend the project, producing added value for open science.

When participants with a shared interest are sufficiently motivated to organise themselves to work on project activities related to that shared interest, they attract a new community, that extends and improves the project with related activities. They are a major source of added value, growth and sustainability for self-governing open projects. They are open, and welcome anyone sharing that interest to join in the activities they advertise on the Hub.

Definition

For clarity, the term is used here in a restrictive way, referring to communities that organise publicised activities for interested participants. Participants cannot engage easily if a community does not organise open activities, although it may be important in other ways. The functional definition used in this guide is that a project community arises when those interactions become sufficiently frequent and distinct from existing communities to require their own pages on the Hub.

Origins and sustainability

Communities form when participants come together to work on a project related initiative. Participants who share a common interest are more likely to interact when there are project activities related to that interest. That special interest might be ecology, proteomics, muon science, India or almost anything else that generates enough activity and participation.

Communities organise and publicise activities for interested participants through the Hub. Leadership and participation are responsive to the real needs of the community because they are motivated and organised by participants. The project provides support and publicity, but the community must be able to sustain itself, with minimal project resources.

Some specialised, small communities were initiated by the project team, but many are from participants organising themselves around an interest. All are open, and welcome interested participants. The origins of some are described in the companion community development success stories lesson.

In some specialised communities, “open” may need some nuance.

In some small communities, such as the IUC, formal decision making on technical issues is restricted to designated members. These groups manage and maintain important integrated research infrastructure, including tools, data and workflows. They are self-governing and open, because regular communication channels, code and documentation are public and visible. Their work depends on highly specialised skills, and they welcome contribution and ideas from anyone with an interest. In these community structures regular contributors are routinely recruited as formal members.

Importance

Communities are a core element of the success of any large open project. They represent a mix of self-selected individuals and professional staff, who work on a project activity in which they have a particular shared interest. Communities feature as a descriptive model for the companion People and their interactions lesson.

Ironically, the largest “community” of all, comprising the tens of thousands of researchers who use Galaxy for open science analysis each day, does not organise its own activities, so fails the restrictive definition. That does not diminish their importance: they provide evidence of productivity for grant renewals, and serve as the major recruitment source of individuals choosing to engage in active project communities. The sharable, reproducible analysis results those users generate using Galaxy are important downstream open science outputs, so they appear in the Project outputs and deliverables lesson.

Shared values help communities flourish

Open source shared values, supported by an explicit participant Code of Conduct, ensure that all community activities are safe, productive and enjoyable for all participants:

  • Inclusive, participatory, professional behaviours are encouraged.
  • The project explicitly strives to welcome and engage contributors and users.
  • Community members are encouraged to help make the project better.

Communities are a core resource, adding substantial value to project grant resources. Project success is the result of efficient and coherent self-governing collaboration, involving many specialised active communities of contributors.

A field guide to the Galaxy collaboration.

For more on the main components, and stories of how people get things done in the project, choose from the other lessons in this Topic: