The key aim of this GitHub project is to provide a space where the key questions, analytical steps, and results of our community modelling can be collated. Some members of the R-GEMS project team are reasonably fluent in Git and GitHub; for those less familiar with how we tend to use the platform, this page outlines how collaborators can contribute to the project.
There are several ways in which this project manages these different components and, perhaps more importantly, several places where collaborators can contribute. These include the wiki, the issues (and their milestones), and the analysis workflow.
If you are new to this project, please spend some time familiarising yourself with how best to contribute.
Details for the individual elements are provided below:
Each wiki page summarises a key conceptual or analytical problem that this study will formally address. Each page provides some basic background to the question, explains why it is important, and maps out the steps to address it. This gives collaborators a space to contribute alternative perspectives, relevant work, or additional steps that should be considered in the process.
As analyses progress, these pages will be updated with key results and preliminary interpretation. We will aim to keep live links to our result outputs, so that these are continually updated. Contributors are encouraged to provide feedback on results and interpretations, and to raise new issues or tasks on emergent themes [more on this below].
Issues raised on this platform are one of the principal ways in which we track the progress of analyses and tasks, and basically “who is doing what”. We have chosen to create Milestones as a means of setting targets for when a suite of tasks should be completed. Some of these are somewhat arbitrary, but others represent hard deadlines that we must meet (e.g. a conference presentation or a manuscript submission date).
Issues largely stem from the key steps and approach outlined in the wiki [described above], but can also arise from other conversations. The important thing is that each issue is documented, including any comments related to it, and is assigned to a particular person to ensure that it is addressed (or closed). GitHub provides a way to assign labels to individual issues (e.g. data, workflow, visualisations), which helps us understand what the problem is and how collaborators can help close it.
The issues platform provides a way to reference individual collaborators on specific elements or comments on an issue (by typing @ followed by their GitHub username). Collaborators are encouraged to use this functionality, but should be aware that it sends an automatic message to alert that person. The usefulness of this functionality can be lost when individuals stop ‘watching’ the project or put a filter on their email, and are therefore not notified when their input is needed most. Just keep that in mind.
The issues are necessarily dynamic and, more often than not, relate to one another. This reflects the nature of the work. Again, the important thing is that the issues (tasks) are documented, and that we know how we are progressing towards our milestones. Issues can be cross-referenced with one another (by typing # followed by the issue number in a comment), so collaborators are encouraged to use this functionality when appropriate.
Our preference is to include a single task (or group of closely related tasks) in a single issue. This makes it easier to track the progress of individual tasks, and also helps align each task with a particular conceptual question [outlined in the wiki above] or part of the workflow [described below].
The workflow for this project is managed through the integrate.R script, which provides the basic functionality, documentation on the individual steps of the analysis, and materials for dissemination. Individual scripts are created for each analysis or task (e.g. data grooming), and the folder structure attempts to organise the different data sources and products, exploratory analyses, and outputs.
The intention is that the entire project can be run from integrate.R and will reproduce all of the data grooming, analyses and outputs for the thesis - something akin to reproducible research.
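As a rough illustration of this idea, a driver script of this kind might look something like the sketch below. The script paths and the run_pipeline() helper are hypothetical and are not taken from the project; the actual integrate.R may be organised quite differently.

```r
# integrate.R: illustrative sketch only.
# The script paths and run_pipeline() are hypothetical, not the project's actual files.

# Modular scripts, listed in the order in which they should run.
scripts <- c(
  "scripts/01_groom_data.R",
  "scripts/02_run_analyses.R",
  "scripts/03_make_outputs.R"
)

# Source each script in turn and return (invisibly) the paths that were run.
run_pipeline <- function(paths) {
  ran <- character(0)
  for (path in paths) {
    if (!file.exists(path)) {
      message("Skipping missing script: ", path)
      next
    }
    message("Running ", path)
    # Evaluate each script in its own environment so that objects
    # created by one step do not leak into the next.
    source(path, local = new.env(parent = globalenv()))
    ran <- c(ran, path)
  }
  invisible(ran)
}

run_pipeline(scripts)
```

Because each step runs in a fresh environment in this sketch, steps would need to pass intermediate products to one another through files (for example with saveRDS() and readRDS()) rather than through objects held in memory.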
Each modular script describes its own purpose and approach, and individual comments embedded in the code document technical aspects of the data, analysis, and visualisation. Blocks or snippets of code can be embedded into the issues [mentioned above], which can also be a good way to document what is going on in individual scripts (if there are any problems that need to be resolved).
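By way of example, a self-describing script might open with a short header and carry technical comments alongside the code, roughly as sketched here. The file name, the header fields, and the groom_records() function are all hypothetical, chosen only to illustrate the style.

```r
# Script:   groom_data.R (hypothetical name)
# Purpose:  Remove incomplete records before analysis.
# Approach: Drop any row with a missing value in the required columns.

groom_records <- function(records, required = names(records)) {
  # complete.cases() flags rows with no missing values in the chosen columns.
  keep <- complete.cases(records[, required, drop = FALSE])
  # drop = FALSE keeps the result a data frame even if one column remains.
  records[keep, , drop = FALSE]
}

# Small made-up data set, used only to show the call.
example_records <- data.frame(
  site  = c("A", "B", "C"),
  count = c(10, NA, 7)
)
groomed <- groom_records(example_records, required = "count")
```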
Our experience is that including technical comments in the code itself provides a much easier way to track issues at the code level. The ultimate documentation should be reflected in the code itself, and we track its development through Git, so collaborators should be aware that meaningful contributions can be made directly in the scripts.