Overview

The key aim of this GitHub project is to provide a space where the key questions, analytical steps, and results of our community modelling can be collated. Some members of the R-GEMS project team are reasonably fluent in Git and GitHub; for those less familiar with how we tend to use the platform, this page outlines how collaborators can contribute to the project.

This project manages these components in three places, which are also where collaborators can contribute:

  1. The Wiki (i.e. where you are at the moment).
  2. Issues and milestones.
  3. The code.

If you are new to this project, please spend some time familiarising yourself with how best to contribute.

Details for the individual elements are provided below:

The Wiki

Individual wiki pages aim to summarise a key conceptual or analytical problem that this study will formally address. Each page provides some basic background to the question, why it is important, and maps out the steps to address it. This provides a space for collaborators to contribute alternative perspectives, relevant works, or additional steps that should be considered in the process.

As analyses progress, these pages will be updated with key results and preliminary interpretation. We will aim to keep live links to our result outputs, so that these are continually updated. Contributors are encouraged to provide feedback on results & interpretations and to raise new issues or tasks on emergent themes [more on this below].

Issues and Milestones

Issues raised on this platform are one of the principal ways in which we track the progress of analyses and tasks, and basically “who is doing what”. We have chosen to create Milestones as a means of setting targets for when a suite of tasks should be completed. Some of these are somewhat arbitrary, but others represent hard deadlines that we must meet (e.g. conference presentation, manuscript submission date).

Issues largely stem from the key steps/approach outlined in the wiki [described above], but can also arise from other conversations. The important thing is that issues are documented, including any comments related to them, and that each is assigned to a particular person to ensure it is addressed (or closed). GitHub provides a way to assign labels to individual issues (e.g. data, workflow, visualisations), which help us understand what the problem is and how collaborators can help with closing it.

The issues platform provides a way to reference individual collaborators on specific elements of, or comments on, an issue. Collaborators are encouraged to use this functionality, but should be aware that it sends an automatic message to alert that person. Its usefulness is lost when individuals stop ‘watching’ the project or put a filter on their email, and are therefore not notified when their input is needed most. Just keep that in mind.

The issues are necessarily dynamic and, more often than not, relate to one another. This reflects the nature of the work. Again, the important thing is that the issues (tasks) are documented, and that we know how we are progressing towards our milestones. Issues can be cross-referenced with one another, so collaborators are encouraged to use this functionality where appropriate.

Our preference is to include a single task (or group of closely related tasks) in a single issue. Again, this helps in being able to track the progress of individual tasks, but also helps align them to a particular conceptual question [outlined in the wiki above] or part of the workflow [described below].

The Code

The workflow for this project is managed through the integrate.R script, which provides the basic functionality, documentation on the individual steps of the analysis, and materials for dissemination. Individual scripts are created for each analysis or task (e.g. data grooming), and the embedded folder structure attempts to organise the different data sources & products, exploratory analyses, and outputs.

The intention is that the entire project can be run from integrate.R and will reproduce all of the data grooming, analyses and outputs for the thesis - something akin to reproducible research.
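As a rough illustration of this idea, a driver script can source each modular script in order, so that later steps see the objects created by earlier ones. This is only a minimal sketch and not the actual contents of integrate.R: the function name run_workflow, the folder name and the script names are all hypothetical, and the demo writes two throwaway scripts to a temporary folder so that it can be run as-is.

```r
# Illustrative sketch only: run_workflow() and the script names below are
# hypothetical and do not describe the real integrate.R.
run_workflow <- function(script_dir, scripts) {
  env <- new.env()  # shared environment, so later scripts can use earlier results
  for (s in scripts) {
    path <- file.path(script_dir, s)
    if (!file.exists(path)) stop("Missing script: ", path)
    message("Running ", s)
    source(path, local = env)  # evaluate the script inside the shared environment
  }
  invisible(env)
}

# Demo with two throwaway scripts written to a temporary folder.
demo_dir <- file.path(tempdir(), "scripts")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("raw <- c(3, 1, 2)", file.path(demo_dir, "01_groom_data.R"))
writeLines("sorted <- sort(raw)", file.path(demo_dir, "02_analyse.R"))

result <- run_workflow(demo_dir, c("01_groom_data.R", "02_analyse.R"))
```

Listing the scripts explicitly, rather than sourcing everything found in a folder, keeps the order of the analysis visible in one place.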

Each modular script describes its own purpose and approach, and individual comments embedded in the code document technical aspects of the data, analysis and visualisation. Blocks or snippets of code can be embedded in the issues [mentioned above], which can also be a good way to document what is going on in individual scripts (if there are any issues that need to be resolved).
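One possible shape for such a self-describing script is sketched below. The header fields, the script name and the groom_data function are assumptions made for illustration, not a template this project prescribes.

```r
# ------------------------------------------------------------------
# Script:   groom_data.R   (hypothetical name, for illustration only)
# Purpose:  Remove incomplete records before analysis.
# Approach: Keep only the rows that have no missing values.
# Input:    A data frame of raw records.
# Output:   A data frame containing complete rows only.
# ------------------------------------------------------------------

groom_data <- function(df) {
  # Technical note: complete.cases() flags rows with no NA in any column,
  # so a row is dropped even if only one of its fields is missing.
  df[stats::complete.cases(df), , drop = FALSE]
}

# Small made-up example: only the first row has no missing values.
example_records <- data.frame(
  site  = c("A", "B", NA),
  count = c(10, NA, 7)
)
groomed <- groom_data(example_records)
```

A header of this kind states what the script is for, while the inline comment records a technical detail that a collaborator could otherwise miss.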

Our experience is that including technical comments in the code itself provides a much easier way to track issues at the code level. The ultimate documentation should be reflected in the code itself, and its development is tracked through Git, so collaborators should be aware that meaningful contributions can be made directly in the scripts.