Continuous Integration

In continuous integration, the practices of version control, automated building, automated configuration, and automated testing are combined so that, as changes are checked in to the version control repository, the system is automatically rebuilt and tested, reports are generated, and the results are posted to a project website.

1 Big Builds

Think of everything we have started to put into our automated builds:

fetching and setup of 3rd party libraries
static analysis
compilation
unit testing
documentation generation
static analysis reports
packaging of artifacts
deployment/publication of artifacts
updating of project website

and, coming up, we will want to expand our testing to include

integration testing
test coverage reporting
system testing

There’s a danger of the builds becoming so unwieldy and slow that programmers will start to look for ways to circumvent steps.

Do We Need to do All of Those Steps, All of the Time?

One possible breakdown:

Every build	Occasional
fetching and setup of 3rd party libraries	documentation generation
static analysis	static analysis reports
compilation	deployment/publication of artifacts
unit testing	updating of project website
packaging of artifacts	integration testing
	test coverage reporting
	system testing

This should provide someone actively working on a specific module/story the info they need, deferring some of the more time-consuming build activities.

How do we divide these steps in the build?

Even the “occasional” activities may be done many times over the history of a project.
So we want to keep them automated, both for ease of performing them and to ensure they are performed consistently each time.
With make/ant/maven, we can have different targets/goals for the frequent and the occasional cases.
- But we have to remember to use the proper targets at the right time.
  - Maybe not a big deal…
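As a rough illustration of splitting the build into a frequent and an occasional target, here is a minimal JavaScript sketch. The target names `everyBuild` and `occasional` and the helper `stepsFor` are hypothetical, invented for this example; in a real project the same grouping would be written as make/ant/maven targets rather than JavaScript.

```javascript
// Hypothetical build targets, following the "Every build / Occasional"
// breakdown above. Each target lists its own steps plus the targets it
// depends on.
const targets = {
  everyBuild: {
    dependsOn: [],
    steps: [
      "fetch 3rd party libraries",
      "static analysis",
      "compilation",
      "unit testing",
      "packaging of artifacts",
    ],
  },
  occasional: {
    dependsOn: ["everyBuild"],
    steps: [
      "documentation generation",
      "static analysis reports",
      "deployment/publication of artifacts",
      "updating of project website",
      "integration testing",
      "test coverage reporting",
      "system testing",
    ],
  },
};

// Return the ordered list of steps for a target, dependencies first.
function stepsFor(name) {
  const target = targets[name];
  if (!target) {
    throw new Error("Unknown target: " + name);
  }
  let steps = [];
  for (const dep of target.dependsOn) {
    steps = steps.concat(stepsFor(dep));
  }
  return steps.concat(target.steps);
}

console.log(stepsFor("everyBuild").length);  // 5
console.log(stepsFor("occasional").length);  // 12
```

The occasional target reuses the frequent one as a dependency, so both cases stay automated and consistent; the only thing left to the developer is choosing which target to invoke.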
But there’s an opportunity here to do something much more interesting…

2 Continuous Integration

When we combine

Automated testing (unit, integration, system, and regression)
Centralized version control
- or distributed VC with a central “official” repository
Automated builds, capable of running tests, running analysis tools, and publishing the results on a project web site

we can rebuild and retest automatically as developers check in changes.

2.1 Key Ideas

Our project should have the following characteristics:

Version control with a clearly identified main branch or set of main development branches.
Automated build is set up as usual.
Developers commit frequently (maybe many times per day)
- Commits to “private” branches (or local copies of a distributed repository) are ignored.
- Every commit of a tracked branch to the main repository is built on a separate server
  - The build includes all integration-related tasks (for early detection of integration problems).
  - Can also include more time-consuming reporting tasks.
Testing is done, ideally, in a clone of the production environment(s)
- May differ from development environments
- Probably not checked frequently under normal practice
- Can use multiple remote machine “runners” to provide varying target operating systems and environments.
Make the results highly visible

2.1.1 Advantages

Integration problems caught early and fixed fast
- avoids “integration hell”
Immediate testing of all changes
Emphasis on frequent check-ins encourages modularity
Visible code quality metrics motivate developers.

2.1.2 Disadvantages

Initial setup effort
Level of sophistication required of the team to put build, configuration management, testing, and reporting into an automated build

2.2 Continuous Integration Systems

A CI system consists of a server/manager and one or more runners…

2.2.1 The Continuous Integration Server

A continuous integration server is a network-accessible machine that

Can be told of development projects under way, including
- location & access info to version control (VC) repository
- which branch(es) to watch
- how to build the project
- what reports are produced by the build
Monitors, in some fashion, the VC repository for commits
When a commit (to a monitored branch) takes place, the CI server notifies one or more runners.

2.2.2 Continuous Integration Runners

A CI runner (a.k.a. a node or slave processor) is a process on a machine that

has the necessary compilers and other tools for building a project.
is managed by the CI server.

When notified by the server, the runner

Checks out a designated branch of a project from its version control system.
Runs the build.
Publishes reports on the results of the build(s).

Runners are usually separate machines from the CI server.
A CI project may launch several different runners, each with a different configuration environment (e.g., different operating systems) to test the build under multiple configurations.
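The division of labor between server and runners can be sketched as a toy model. Everything named here (`CIServer`, `Runner`, `register`, `onCommit`, the project and runner names) is hypothetical and exists only for illustration; real CI systems communicate over a network and actually check out and build code.

```javascript
// A toy model of the server/runner split. No real CI system's API is shown.

class Runner {
  constructor(name, environment) {
    this.name = name;
    this.environment = environment; // e.g. "linux" or "windows"
    this.reports = [];
  }

  // Stand-in for: check out the branch, run the build, publish a report.
  build(project, branch, commitId) {
    const report = {
      runner: this.name,
      environment: this.environment,
      project: project,
      branch: branch,
      commit: commitId,
    };
    this.reports.push(report);
    return report;
  }
}

class CIServer {
  constructor() {
    this.projects = {}; // project name -> { branches, runners }
  }

  // Tell the server about a project: which branches to watch and
  // which runners may build it.
  register(project, branches, runners) {
    this.projects[project] = { branches: branches, runners: runners };
  }

  // Called when the repository reports a commit. Commits to branches
  // that are not monitored are ignored.
  onCommit(project, branch, commitId) {
    const config = this.projects[project];
    if (!config || !config.branches.includes(branch)) {
      return [];
    }
    return config.runners.map(function (runner) {
      return runner.build(project, branch, commitId);
    });
  }
}

const server = new CIServer();
const linux = new Runner("runner-1", "linux");
const windows = new Runner("runner-2", "windows");
server.register("demo", ["master"], [linux, windows]);

console.log(server.onCommit("demo", "master", "abc123").length);  // 2
console.log(server.onCommit("demo", "scratch", "def456").length); // 0
```

One commit to a monitored branch fans out to every registered runner, which is how a single check-in ends up tested under several configurations.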

3 Case study: Jenkins

Jenkins is a popular CI server.

The CS Dept runs its own Jenkins server

An example of a Jenkins project

3.1 Projects on Jenkins

When you set up a project on Jenkins you must supply:

Basic project info:
- name and description,
- public/private,
- who can access.
Version control:
- What kind of version control is used,
- URL and access info to check out a copy of the project.
Build management:
- What build manager is used,
- where the build file can be found within the project directories
- what target/goal to use with the build
  (I usually add a special “jenkins” target to my Ant build.xml files.)
- Which of Jenkins’s nodes can be used for the build.
Reporting
- What reports Jenkins should publish.
- Where in your project directories the raw data for these reports can be found.

3.1.1 Jenkins and Project Reports

Many report-generating programs (e.g., JUnit, FindBugs, etc.) have separate “collection” and “reporting” stages.
- Typically the collection step writes raw data out in an XML format.
- Normally, you then run a separate task to reformat that XML into HTML or some other readable format.
Jenkins, however, has its own formatting functions for many common reports.
Among other things, these often add “historical” or “trend” reporting on how the collected data has varied over a period of time.
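To illustrate the idea of turning raw collected data into a trend, here is a small sketch. It assumes the commonly used JUnit-style XML in which a `testsuite` element carries `tests`, `failures`, and `errors` attributes; the function names and the sample XML strings are made up for this example, and this is not how Jenkins itself is implemented.

```javascript
// Pull one numeric attribute out of the opening <testsuite ...> tag.
// A regular expression is enough for a sketch; a real tool would use
// an XML parser.
function suiteAttribute(xml, name) {
  const tag = /<testsuite\b[^>]*>/.exec(xml);
  if (!tag) {
    return null;
  }
  const attr = new RegExp("\\b" + name + '="(\\d+)"').exec(tag[0]);
  return attr ? Number(attr[1]) : null;
}

// Summarize one build's raw XML as a single point for a trend report.
function summarize(buildNumber, xml) {
  const tests = suiteAttribute(xml, "tests");
  const failures = suiteAttribute(xml, "failures");
  const errors = suiteAttribute(xml, "errors");
  return {
    build: buildNumber,
    passed: tests - failures - errors,
    failed: failures + errors,
  };
}

// Made-up raw data from two successive builds.
const build1 = '<testsuite name="demo" tests="10" failures="2" errors="1"></testsuite>';
const build2 = '<testsuite name="demo" tests="12" failures="0" errors="0"></testsuite>';

const trend = [summarize(1, build1), summarize(2, build2)];
console.log(trend);
```

Keeping one summary point per build is what makes a historical graph possible: the raw XML from each build is reduced to a few numbers that can be plotted against the build number.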

4 Case study: gitlab-ci

gitlab-ci is a CI server integrated into GitLab.

Project build status is integrated into the version control activity reports.
- Example
  
  Click on Green checkmarks and Red X’s to see successful and failed builds.
Setup is generally easier if your project is already hosted on GitLab.

4.1 gitlab-ci setup

Projects must activate gitlab-ci by designating a runner, generally on a remote machine under the developer’s control.
- You can have multiple runners. For example, you could have a Linux runner, a Windows 10 runner, and a macOS runner.
Then add a file .gitlab-ci.yml to the repository root directory.

This is a YAML script that gets run after each new commit.

The script can limit which branches it applies to.

4.1.1 An example of `.gitlab-ci.yml`

    stages:
      - build
      - test
      - deploy


    build-job:
      tags:
       - e-3208
      stage: build
      script:
       - eval $(ssh-agent -s -t 600)
       - ssh-add <(echo "$REPORTS_SSH_KEY")
       - cd codeCompCommon
       - ./gradlew build deployReports
      only:
       - master
      artifacts:
       paths:
         - codeCompCommon/build/libs/codeCompCommon-1.3.jar
         - README.md

tags: identifies the runner to be selected.

script: gives the build commands to be run on the runner machine.

only: limits these runs to checkins on the “master” branch.

artifacts: lists files that will be kept after the run and made available for downloading from the GitLab project page.

4.1.2 The `script` dissected

      script:
       - eval $(ssh-agent -s -t 600)                         ➀
       - ssh-add <(echo "$REPORTS_SSH_KEY")                  ➁
       - cd codeCompCommon                                   ➂
       - ./gradlew build deployReports                       ➃

1. This line launches an ssh key agent. The -t 600 option limits this agent to a maximum of 600 seconds before it shuts itself down.
2. We know the ssh-add command as a way to add private ssh keys to the agent.
  In this case, the text of the (passphrase-free) private key is being supplied by GitLab as a secret project variable REPORTS_SSH_KEY.

  These are created as part of the project settings and visible only to project “masters”.

  They provide a useful way to add private keys and other secret credentials without putting them into the repository files where anyone might be able to download them.
3. cd into the subproject directory containing the code and the Gradle files.
4. Run the build with targets build and deployReports.
  - We’ll look at deployReports shortly.
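GitLab makes project variables such as REPORTS_SSH_KEY available to the job as environment variables. A build helper that depends on such a secret might guard against its absence along these lines. This is only a sketch: the function `requireSecret` and the stand-in environment object are hypothetical and not part of the build shown above.

```javascript
// Look up a secret that the CI system is expected to supply as an
// environment variable. Fails loudly if it is missing, and never
// prints the secret itself.
function requireSecret(env, name) {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error("Missing CI variable: " + name);
  }
  return value;
}

// Demonstration with a stand-in environment; a real job would pass
// process.env instead of this made-up object.
const fakeEnv = { REPORTS_SSH_KEY: "not-a-real-key" };
const key = requireSecret(fakeEnv, "REPORTS_SSH_KEY");
console.log("secret found, length " + key.length);
```

Reporting only that the secret exists, rather than echoing it, keeps the credential out of build logs that other project members can read.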

5 gitlab-ci vs Jenkins

Reporting
- Jenkins provides fancier reporting options. It composes a nice-looking project summary page.
- Such activities must be scripted as part of the build to work in gitlab-ci.
  - Discussed later.
Flexibility
- Jenkins has a definite Java bias.
- gitlab-ci can run any language you can script a build for.
Setup
- Jenkins setup can be confusing.
- gitlab-ci setup is easier, but requires a properly set up remote runner.
Runners:
- Jenkins allows remote runners, which are easily shared among projects.
- gitlab-ci requires remote runners, often requiring each project to set up their own.

6 Case Study: Enhanced reporting

Goal: add project website pages with reports from analysis tools and trend information (historical graphs) similar to those offered by Jenkins.

We will build on the example of our JBake-generated website.
An example of a page we would like to generate is this PMD report page.

6.1 Generating Graphs

Highcharts is a Javascript package that can generate plots from data captured in CSV (Comma-Separated-Values) format.

For example, the plot on this page is generated from a file pmd.csv that looks like:

pmd,Violations
2019-03-23T14:30,22.0
2019-03-23T14:33,22.0
2019-03-23T15:21,22.0
2019-03-23T15:45,22.0
2019-03-24T21:59,22.0
2019-03-31T19:48,22.0
2019-03-31T20:26,22.0
2019-04-03T20:42,11.0

Each line (after the headers) represents one data point in the chart.
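To make the file layout concrete, here is a minimal sketch of reading such a file into a header and a list of data points. Highcharts does its own CSV parsing through its data module; the function `parseCsv` below is a hypothetical stand-in written only to show the structure of the data, not Highcharts code.

```javascript
// Split CSV text like the pmd.csv sample into header names and data points.
function parseCsv(text) {
  const lines = text.split("\n").map(function (line) {
    return line.trim();
  }).filter(function (line) {
    return line.length > 0;
  });
  const headers = lines[0].split(",");
  const points = lines.slice(1).map(function (line) {
    const fields = line.split(",");
    return { x: fields[0], y: Number(fields[1]) };
  });
  return { xName: headers[0], seriesName: headers[1], points: points };
}

// Two lines taken from the sample file above, plus its header.
const sample = [
  "pmd,Violations",
  "2019-03-31T20:26,22.0",
  "2019-04-03T20:42,11.0",
].join("\n");

const parsed = parseCsv(sample);
console.log(parsed.seriesName);     // Violations
console.log(parsed.points.length);  // 2
console.log(parsed.points[1].y);    // 11
```

The second header field becomes the series name, and every later line contributes one (x, y) pair.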

Highcharts requires a bit of Javascript to inject a chart into an HTML div element.

We load the Highcharts code in our website header

header.ftl.listing

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8"/>
    <title><#if (content.title)??><#escape x as x?xml>${content.title}</#escape><#else>codecentric</#if></title>
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta name="description" content="">
    <meta name="author" content="">
    <meta name="keywords" content="">
    <meta name="generator" content="JBake">

    <link href="<#if (content.rootpath)??>${content.rootpath}<#else></#if>css/base.css" rel="stylesheet">
    <link href="<#if (content.rootpath)??>${content.rootpath}<#else></#if>css/projectReports.css" rel="stylesheet">
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.0/jquery.min.js" type="text/javascript"></script>
    <script src="https://code.highcharts.com/highcharts.js"></script>
    <script src="https://code.highcharts.com/modules/data.js"></script>
    <script src="<#if (content.rootpath)??>${content.rootpath}<#else></#if>js/projectReports.js"></script>


  </head>
  <body>
    <div id="mainBody">

This includes my own Javascript file to make it easier to share the code among several pages.

projectReports.js.listing

/*
 * Register a div in the webpage as a HighCharts (http://www.highcharts.com/) chart
 * portraying a series of data in a CSV file
 */
 
 
/* register1
 *    For CSV files containing a single series of data. Column A contains the
 *    Build numbers used as x values, column B the y values.
 *
 *  @param graphName  The id of the div to hold this chart.
 *  @param csvURL     URL to the .csv file
 *  @param title      Title for this chart.
 *  @param yAxisTitle Title for the Y axis. (Series title is taken from
 *                      row 1 of the CSV file).
 */

function register1(graphName, csvURL, title, yAxisTitle) {
    var divName = "#" + graphName;
    // Once the page is ready, fetch the CSV and draw the chart in the div.
    $(document).ready(function() {
        $.get(csvURL, function(csv) {
            $(divName).highcharts({
                chart: {
                    type: 'area'
                },
                data: {
                    csv: csv
                },
                title: {
                    text: title
                },
                yAxis: {
                    title: {
                        text: yAxisTitle
                    }
                },
                xAxis: {
                    title: {
                        text: "Build #"
                    }
                },
                plotOptions: {
                    series: {
                        stacking: 'normal'
                    }
                },
                series: [{
                    color: "#0000cc"
                }]
            });
        });
    });
}



/* register2
 *    For CSV files containing two series of data. Column A contains the
 *    Build numbers used as x values, columns B and C the y values.
 *
 *  @param graphName  The id of the div to hold this chart.
 *  @param csvURL     URL to the .csv file
 *  @param title      Title for this chart.
 *  @param yAxisTitle Title for the Y axis. (Series titles are taken from
 *                      row 1 of the CSV file).
 */

function register2(graphName, csvURL, title, yAxisTitle) {
    var divName = "#" + graphName;
    // Once the page is ready, fetch the CSV and draw the chart in the div.
    $(document).ready(function() {
        $.get(csvURL, function(csv) {
            $(divName).highcharts({
                chart: {
                    type: 'area'
                },
                data: {
                    csv: csv
                },
                title: {
                    text: title
                },
                yAxis: {
                    title: {
                        text: yAxisTitle
                    }
                },
                xAxis: {
                    title: {
                        text: "Build #"
                    }
                },
                plotOptions: {
                    series: {
                        stacking: 'normal'
                    }
                },
                // One color per series: column B, then column C.
                series: [{
                    color: "#009933"
                }, {
                    color: "#cc0000"
                }]
            });
        });
    });
}

Which makes for a reasonably straightforward content page:
pmd.html.listing
```
title=CodeCompCommon PMD Report
type=page
status=published
~~~~~~
</p>
<div class=reportGraphs>
   <div id="theGraph" class="graph">PMD</div>             ➀
</div>

<iframe class="docFrame" src="pmd/main.html"> </iframe>   ➁

<script type="text/javascript">
    register1("theGraph", "pmd.csv", "PMD", "Warnings");  ➂
</script>

<p>
```
1. This is the div that will be replaced by the chart.
2. The body of the report.
  This is very similar to the way we loaded our Javadoc and other reports earlier.
3. This calls my Javascript function, which in turn calls the Highcharts functions, to schedule replacement of the above div by a chart derived from the data in pmd.csv.

6.2 Generating the Data

Where does the data for the plots come from?

Individual data points are extracted from the reports generated by PMD and other tools.

Each analysis tool requires some custom coding to get the data point.
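As an example of what such custom coding might involve, the sketch below derives a single data point from a PMD report by counting its violation entries. It assumes PMD's XML report format, in which each finding appears as a `violation` element; the function name and the trimmed sample report are made up for illustration, and the actual plugin may extract its numbers differently.

```javascript
// Count the <violation> elements in the text of a PMD XML report.
function countViolations(pmdXml) {
  const matches = pmdXml.match(/<violation\b/g);
  return matches ? matches.length : 0;
}

// A made-up, heavily trimmed report with two findings.
const report = [
  '<pmd>',
  '  <file name="Demo.java">',
  '    <violation beginline="3" rule="UnusedPrivateField">Avoid unused private fields</violation>',
  '    <violation beginline="9" rule="EmptyCatchBlock">Avoid empty catch blocks</violation>',
  '  </file>',
  '</pmd>',
].join("\n");

console.log(countViolations(report));         // 2
console.log(countViolations("<pmd></pmd>"));  // 0
```

Each tool needs its own version of this step because every report format records its findings differently.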

Those data points are accumulated into a .csv file for the report by

Downloading the old CSV file (if there is one) from the project website.
Adding the new data point as a new line at the end of the CSV file.
When the website is uploaded, it will include the new, one-line-longer, CSV file.
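The accumulation steps above might look roughly like the following. This is a sketch under stated assumptions, not the plugin's code: the function `appendDataPoint` and its parameters are hypothetical, and the download step is replaced by passing in the old file's text (or null when no earlier file exists).

```javascript
// Append one data point to the text of an existing CSV file.
// oldCsv is the previously published file's text, or null if there is none.
function appendDataPoint(oldCsv, headers, timestamp, value) {
  let csv = oldCsv;
  if (csv === null || csv.trim() === "") {
    // No earlier file: start a new one with just the header line.
    csv = headers.join(",") + "\n";
  }
  if (!csv.endsWith("\n")) {
    csv += "\n";
  }
  return csv + timestamp + "," + value.toFixed(1) + "\n";
}

// First build: no earlier file exists, so the header line is created.
const first = appendDataPoint(null, ["pmd", "Violations"], "2019-03-31T20:26", 22);
// Next build: the old file grows by exactly one line.
const second = appendDataPoint(first, ["pmd", "Violations"], "2019-04-03T20:42", 11);
console.log(second);
```

Because the old file is fetched from the published website rather than kept in the repository, the history survives even when each build starts from a fresh checkout.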

These steps are carried out by my own Report Accumulator Gradle plugin.

6.3 Report Accumulator

Back to build.gradle to load another plugin:

build.gradle.listing

buildscript {
	repositories {
		⋮
        ivy { // for report-accumulator
            url 'https://secweb.cs.odu.edu/~zeil/ivyrepo'     ➀
        }
	}
	
	dependencies {
		⋮
	    classpath 'edu.odu.cs.zeil:report_accumulator:1.2'
		⋮
    }
}

⋮


// Reporting


import edu.odu.cs.zeil.report_accumulator.ReportStats     ➁
import edu.odu.cs.zeil.report_accumulator.ReportsDeploy


task collectStats (type: ReportStats, dependsOn: ['build','reports']) {  ➂
    description "Collect statistics from various reports & analysis tools"
    reportsURL = 'https://www.cs.odu.edu/~zeil/gitlab/' + project.name + '/reports'
}


task site (dependsOn: ['copyBake', 'copyJDocs', 'collectStats']){ ➃
    description "Build the project website (in build/reports)"
    group "reporting"
}



task deployReports (type: ReportsDeploy, dependsOn: 'site') {  ➄
    description 'Deploy website to remote server'
    group 'reporting'
    deployDestination = 'rsync://zeil@atria.cs.odu.edu:codeCompCommon/reports/'
}

1. The usual steps to include a plugin.
2. Import the new task types created by the plugin.
3. A ReportStats task
  - scans the build/reports directory for reports from various tools,
  - extracts a new data point where it can,
  - downloads an existing CSV file for that report from the reportsURL,
  - adds the new data point to the end of the CSV file.
4. A slight tweak to our earlier site target to make sure that the ReportStats task is performed.
5. A ReportsDeploy task uploads the build/reports directory to the deployDestination URL.
  - Assumes that any required ssh keys are already in an agent.

Continuous deployment publishes snapshots of deliverables as changes are checked in.

A variation of the rather common “daily build” practice seen on many projects.

Some Maven repositories (including our own Artifactory instance) provide separate “snapshot” repositories for this purpose.

The idea of “artifacts”, seen earlier, is one way to carry this out.

GitHub and later versions of GitLab now use the artifacts mechanism to deploy entire project websites, making it unnecessary to work with a separate web server.
Some organizations actually wire up build light indicators to provide a highly visible indicator of the status of the latest integration build.
- Some then point a webcam at the light and broadcast their status.
- Others opt for publishing a software analog of such lights on their project website.