Knowledge management and version control

Previous steps

If you would like to return to information from the previous session, please click here.

Context

To ensure that the code we produce is transferable, well-documented, and reproducible, we will introduce approaches and tools that record each progressive step in data standardisation, analysis, and the production of results.

This wiki page introduces some aspects of Knowledge Management (KM) that are important for monitoring programmes, along with the core tools for KM and version control. We will be using these tools throughout the training course, so the real learning will come from applying them.

The conversion of raw data to Knowledge

One way to think about the “knowledge production” process is the relationship between data diversity (i.e. different types of information) and their integration (i.e. how well they are connected):

In this depiction, raw data has little diversity and is not integrated (i.e. it sits in the lower left-hand corner of the figure). If we collect a lot of similar data, with little diversity (i.e. moving vertically in the figure), we tend to refer to these data as “mundane”. At the other extreme, if we collect lots of different data types, but do not integrate them, we tend to refer to these data sets as “complex”.

If we increase the diversity and integrate data, we begin to observe patterns. This process leads to the production of “information”. This is the level commonly provided in data portals: that is, they provide a bunch of different data types and allow one to link or integrate them in a limited way (like this one).

Management, however, requires knowledge of process, which goes beyond information. We reach this level when we integrate different patterns. Humans can actually do this very well: we often take different bits of “information” (or patterns observed in nature) and integrate them to form Knowledge. However, this process is much more difficult to serve up in a data portal.

One of the aims of this training course is to provide the skills and tools that can facilitate the conversion from raw data to information to knowledge. By recording this conversion in a well-documented, reproducible script, we provide a register of the process of transforming data to knowledge. In addition, we will teach the use of “version control”, which provides another layer of documentation of how we got from the raw data to the knowledge of process that informs management.

Keen observers will have noticed another level in our “Knowledge conversion” diagram, where the integration of different processes gives rise to principles, which lead to Wisdom! That is, people who have “wisdom” are those who are able to integrate different processes to arrive at general principles.

Paying off the Knowledge Debt

Furthermore, there is an expected relationship between data and knowledge. In science, if we don’t know something, the response often is “we need data”. This relationship could be “step shaped” if there are major breakthroughs, or have a gradual slope (e.g. in a slower-moving discipline like ecology), but there is general agreement that the relationship is positive (i.e. increasing data increases knowledge).

What tends to happen in a monitoring programme is that data are collected but not converted to Knowledge. This “plateau”, where data keep accumulating without a corresponding gain in knowledge, creates a “Knowledge gap”.

This is often not intentional, but rather a consequence of competing demands and other “priority” activities.

One of the aims of this training course is to provide the skills and a set of tools that allow for the routine processing of coral reef monitoring data, which can facilitate the routine production of knowledge and help keep everyone out of debt!

Version control

In order to document our steps of converting data to knowledge, we will be using a system of version control, which records all of the changes in the history of the project, provides documentation, and can be shared across collaborators so that everyone can continually build on others’ experience.

There are a number of such version control systems and we will be using one called git:

One of the other advantages of using git is that we will only be “tracking” raw data and code, and not the ‘binary’ outputs (e.g. .pdf, .png, et cetera). This keeps bandwidth use low when synchronising, which is very valuable if working from remote locations or locations with limited bandwidth!
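
One common way to keep binary outputs out of the repository is a `.gitignore` file in the top-level folder of the project. The sketch below is a minimal illustration, assuming outputs are written as `.pdf` and `.png` files. The scratch folder and the file names (`analysis.R`, `reef_summary.pdf`) are hypothetical and only serve to show the effect.

```shell
# Work in a throwaway repository so nothing real is touched (illustrative only)
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q

# Tell git to ignore binary outputs; these patterns are assumptions, adjust to your project
printf '%s\n' '*.pdf' '*.png' > .gitignore

# Create a hypothetical script and a hypothetical binary output
echo '# analysis code goes here' > analysis.R
echo 'placeholder' > reef_summary.pdf

# Stage everything: the script and .gitignore are staged, the ignored .pdf is not
git add -A

# List the files git is now tracking
staged_files=$(git ls-files)
echo "$staged_files"
```

The ignored files stay on your own computer; they are simply left out of what is committed and synchronised.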

To better understand how git works, it is useful to think about how work is “staged” before “committing” to the repository. This provides an opportunity to review the changes prior to saving them and sharing them with colleagues.

This is helpful when we are working on multiple tasks and only a subset of them is ready to be “committed” and shared with colleagues.
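
The staging step described above can be sketched as follows: one finished file is staged and committed, while unfinished work is left out of the commit. This is a minimal illustration only. The scratch repository, the file names (`clean_data.R`, `draft_plots.R`), and the user name and email are all hypothetical and exist only so that the example runs on its own.

```shell
# Throwaway repository for demonstration (illustrative only)
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q

# Identity for this demo repository only; placeholder values
git config user.name "Course Participant"
git config user.email "participant@example.org"

# Two pieces of work: one finished, one still in progress
echo '# finished cleaning script' > clean_data.R
echo '# unfinished plotting script' > draft_plots.R

# Stage only the finished file
git add clean_data.R

# Review what is staged before committing:
# staged files are marked 'A', untracked files are marked '??'
git status --short

# Commit the staged file; draft_plots.R stays uncommitted in the working folder
git commit -q -m 'add data cleaning script'

# List the files recorded in the repository
committed_files=$(git ls-files)
echo "$committed_files"
```

Naming individual files in `git add` (rather than using `git add -A`) is what allows a subset of the work to be committed.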

Interfacing with GitHub

For this course, we are using GitHub as a way to exchange and synchronise code. This platform not only provides a secure place to save and distribute the project contents among collaborators, it also provides a number of useful features, including:

Creating & commenting on issues
Setting project deadlines and tracking progress
Project documentation using *.Rmd and Wiki pages

We will use these functions for aspects of the Homework and in the Project Documentation & Reporting Module.

Quick Instructions

Git works through a few command line instructions (although there are other “push button” and “menu driven” interfaces available). Most of the time, we only need to use a few commands:

  # stage all changes (new, modified, and deleted files)
    git add -A

  # commit the staged changes to the local repository with a message
    git commit -m 'explain whatever it was you were doing'

  # send your local commits to the remote repository (e.g. GitHub)
    git push

  # receive changes that others have pushed
    git pull
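
To see how these four commands fit together, the sketch below runs a complete add, commit, push, and pull cycle on one computer. It is an illustration under stated assumptions, not the course setup: a local “bare” repository (`shared.git`) stands in for the GitHub repository, the folders `mine` and `colleague` stand in for two collaborators, and the file `survey.csv`, its made-up record, and the user name and email are all hypothetical.

```shell
# Throwaway folder so nothing real is touched (illustrative only)
demo_dir=$(mktemp -d)
cd "$demo_dir"

# A bare repository stands in for the shared GitHub repository
git init -q --bare shared.git

# "mine" is our own working copy (git warns that the clone is empty; that is expected)
git clone -q shared.git mine
git -C mine config user.name "Course Participant"        # placeholder identity
git -C mine config user.email "participant@example.org"  # placeholder identity

# Make a change, stage it, commit it, and push it to the shared repository.
# "origin HEAD" spells out the remote and the current branch explicitly.
echo 'site,cover' > mine/survey.csv
git -C mine add -A
git -C mine commit -q -m 'add survey data template'
git -C mine push -q origin HEAD

# A colleague clones the shared repository and receives the file
git clone -q shared.git colleague

# We record and push a further change (made-up record) ...
echo 'reef_01,35' >> mine/survey.csv
git -C mine add -A
git -C mine commit -q -m 'add first survey record'
git -C mine push -q origin HEAD

# ... and the colleague receives it with a pull
git -C colleague pull -q
cat colleague/survey.csv
```

The `-C` option simply tells git which folder to run in, so the example does not need to change directory between the two working copies.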

Next Steps

Now that we have some background on knowledge management and approaches to version control, we should make sure that our computers are set up.