Continuous Integration
Steven J Zeil
Abstract
In continuous integration, the practices of version control, automated building, automated configuration, and automated testing are combined so that, as changes are checked in to the version control repository, the system is automatically rebuilt and tested, reports are generated, and the results are posted to a project website.
1 Big Builds
Think of everything we have started to put into our automated builds:
- fetching and setup of 3rd party libraries
- static analysis
- compilation
- unit testing
- documentation generation
- static analysis reports
- packaging of artifacts
- deployment/publication of artifacts
- updating of project website
and, coming up, we will want to expand our testing to include
- integration testing
- test coverage reporting
- system testing
There’s a danger of the builds becoming so unwieldy and slow that programmers will start to look for ways to circumvent steps.
Do We Need to do All of Those Steps, All of the Time?
One possible breakdown:
Every build | Occasional |
---|---|
fetching and setup of 3rd party libraries | documentation generation |
static analysis | static analysis reports |
compilation | deployment/publication of artifacts |
unit testing | updating of project website |
packaging of artifacts | integration testing |
 | test coverage reporting |
 | system testing |
This should give someone actively working on a specific module/story the information they need, while deferring some of the more time-consuming build activities.
How do we divide these steps in the build?
- Even the “occasional” activities may be done many times over the history of a project.
- So we want to keep them automated, both for ease of performing them and to ensure they are performed consistently each time.
- With make/ant/maven, we can have different targets/goals for the frequent and the occasional cases.
  - But we have to remember to use the proper targets at the right time.
  - Maybe not a big deal…
- But there’s an opportunity here to do something much more interesting…
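Before moving on, here is a small illustration of the frequent/occasional split discussed above. It is only a sketch in Python, not a real make/ant/maven file: the target names and their dependencies are hypothetical, chosen to mirror the table, and the dependency graph is assumed to have no cycles.

```python
# Hypothetical target table: each target lists the targets it depends on.
TARGETS = {
    "setup": [],
    "analysis": ["setup"],
    "compile": ["analysis"],
    "unitTest": ["compile"],
    "package": ["unitTest"],
    "quick": ["package"],              # the "every build" case
    "docs": ["compile"],
    "integrationTest": ["package"],
    "deploy": ["integrationTest", "docs"],
    "full": ["quick", "deploy"],       # the "occasional" case
}


def plan(target, targets=TARGETS):
    """Return the steps needed for `target`, dependencies first, no repeats."""
    ordered = []

    def visit(name):
        if name in ordered:
            return                     # already scheduled
        for dep in targets[name]:
            visit(dep)
        ordered.append(name)

    visit(target)
    return ordered
```

Calling plan("quick") schedules only the every-build steps, while plan("full") adds the slower ones. The developer still has to remember which of the two to ask for.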
2 Continuous Integration
When we combine
- Automated testing (unit, integration, system, and regression)
- Centralized version control
- or distributed VC with a central “official” repository
- Automated builds, capable of running tests, running analysis tools, and publishing the results on a project web site
we can rebuild and retest automatically as developers check in changes.
2.1 Key Ideas
Our project should have the following characteristics:
- Version control with a clearly identified main branch or set of main development branches.
- Automated build is set up as usual.
- Developers commit frequently (maybe many times per day)
  - Commits to “private” branches (or local copies of a distributed repository) are ignored.
  - Every commit of a tracked branch to the main repository is built on a separate server
  - The build includes all integration-related tasks (for early detection of integration problems).
  - Can also include more time-consuming reporting tasks.
- Testing is done, ideally, in a clone of the production environment(s)
  - May differ from development environments
  - Probably not checked frequently under normal practice
  - Can use multiple remote machine “runners” to provide varying target operating systems and environments.
- Make the results highly visible
2.1.1 Advantages
- Integration problems caught early and fixed fast
- avoids “integration hell”
- Immediate testing of all changes
- Emphasis on frequent check-ins encourages modularity
- Visible code quality metrics motivate developers.
2.1.2 Disadvantages
- Initial effort required for setup
- Level of sophistication required of the team to put building, configuration management, testing, and reporting into an automated build
2.2 Continuous Integration Systems
A CI system consists of a server/manager and one or more runners…
2.2.1 The Continuous Integration Server
A continuous integration server is a network-accessible machine that
- Can be told of development projects under way, including
  - location & access info to version control (VC) repository
  - which branch(es) to watch
  - how to build the project
  - what reports are produced by the build
- Monitors, in some fashion, the VC repository for commits
- Notifies one or more runners when a commit (to a monitored branch) takes place.
2.2.2 Continuous Integration Runners
A CI runner (a.k.a. a node or slave processor) is a process on a machine that
- has the necessary compilers and other tools for building a project.
- is managed by the CI server.
When notified by the server, the runner
- Checks out a designated branch of a project from its version control system.
- Runs the build.
- Publishes reports on the results of the build(s).
Runners are usually separate machines from the CI server.
A CI project may launch several different runners, each with a different configuration environment (e.g., different operating systems) to test the build under multiple configurations.
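The division of labor between server and runners can be sketched in a few lines of Python. This is a toy model only, not how Jenkins or gitlab-ci is implemented: the class names, method names, and report strings are all hypothetical, and the runner merely records what a real one would check out, build, and publish.

```python
from dataclasses import dataclass, field


@dataclass
class Runner:
    """A build machine with a fixed environment (e.g., an operating system)."""
    name: str
    environment: str
    reports: list = field(default_factory=list)

    def run(self, project, branch, commit_id):
        # A real runner would check out the branch, run the build, and
        # publish the resulting reports. Here we only record the request.
        report = f"{project}@{branch}#{commit_id} built on {self.environment}"
        self.reports.append(report)
        return report


@dataclass
class CIServer:
    """Knows which branches to watch and which runners to notify."""
    watched: dict = field(default_factory=dict)   # project -> set of branches
    runners: list = field(default_factory=list)

    def register(self, project, branches):
        self.watched[project] = set(branches)

    def on_commit(self, project, branch, commit_id):
        """Called whenever the VC repository reports a new commit."""
        if branch not in self.watched.get(project, set()):
            return []          # private or unmonitored branches are ignored
        return [r.run(project, branch, commit_id) for r in self.runners]
```

With a Linux runner and a Windows runner attached, one commit to a watched branch produces two builds, one per environment, while a commit to any other branch produces none.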
3 Case study: Jenkins
Jenkins is a popular CI server.
The CS Dept runs its own Jenkins server
- An example of a Jenkins project
3.1 Projects on Jenkins
When you set up a project on Jenkins you must supply:
- Basic project info:
  - name and description,
  - public/private,
  - who can access.
- Version control:
  - What kind of version control is used,
  - URL and access info to check out a copy of the project.
- Build management:
  - What build manager is used,
  - where the build file can be found within the project directories
  - what target/goal to use with the build
    (I usually add a special “jenkins” target to my Ant build.xml files.)
- Which of Jenkins’s nodes can be used for the build.
- Reporting
  - What reports Jenkins should publish.
  - Where in your project directories the raw data for these reports can be found.
3.1.1 Jenkins and Project Reports
- Many report-generating programs (e.g., JUnit, FindBugs) have separate “collection” and “reporting” stages.
  - Typically the collection step writes raw data out in an XML format.
  - Normally, you then run a separate task to reformat that XML into HTML or some other readable format.
- Jenkins, however, has its own formatting functions for many common reports.
  - Among other things, these often add “historical” or “trend” reporting on how the collected data has varied over a period of time.
4 Case study: gitlab-ci
gitlab-ci is a CI server integrated into GitLab.
- Project build status is integrated into the version control activity reports.
  - Click on green checkmarks and red X’s to see successful and failed builds.
- Setup is generally easier if your project is already hosted on GitLab.
4.1 gitlab-ci setup
- Projects must activate gitlab-ci by designating a runner, generally on a remote machine under the developer’s control.
  - You can have multiple runners. For example, you could have a Linux runner, a Windows 10 runner, and a macOS runner.
- Then add a file .gitlab-ci.yml to the repository root directory.
  This is a YAML script that gets run after each new commit.
  The script can limit which branches it applies to.
4.1.1 An example of .gitlab-ci.yml
stages:
  - build
  - test
  - deploy

build-job:
  tags:
    - e-3208
  stage: build
  script:
    - eval $(ssh-agent -s -t 600)
    - ssh-add <(echo "$REPORTS_SSH_KEY")
    - cd codeCompCommon
    - ./gradlew build deployReports
  only:
    - master
  artifacts:
    paths:
      - codeCompCommon/build/libs/codeCompCommon-1.3.jar
      - README.md
- tags: identifies the runner to be selected.
- script: gives the build commands to be run on the runner machine.
- only: limits these runs to checkins on the “master” branch.
- artifacts: lists files that will be kept after the run and made available for downloading from the GitLab project page.
4.1.2 The script dissected
script:
  - eval $(ssh-agent -s -t 600) ➀
  - ssh-add <(echo "$REPORTS_SSH_KEY") ➁
  - cd codeCompCommon ➂
  - ./gradlew build deployReports ➃
➀ This line launches an ssh key agent. The -t 600 option gives keys added to the agent a default lifetime of 600 seconds, after which the agent discards them.
➁ We know the ssh-add command as a way to add private ssh keys to the agent. In this case, the text of the (passphrase-free) private key is being supplied by GitLab as a secret project variable REPORTS_SSH_KEY.
  - These variables are created as part of the project settings and are visible only to project “masters”.
  - They provide a useful way to add private keys and other secret credentials without putting them into the repository files where anyone might be able to download them.
➂ cd into the subproject directory containing the code and the Gradle files.
➃ Run the build with targets build and deployReports.
  - We’ll look at deployReports shortly.
5 gitlab-ci vs Jenkins
- Reporting
  - Jenkins provides fancier reporting options. It composes a nice-looking project summary page.
  - Such activities must be scripted as part of the build to work in gitlab-ci.
    - Discussed later.
- Flexibility
  - Jenkins has a definite Java bias.
  - gitlab-ci can run any language you can script a build for.
- Setup
  - Jenkins setup can be confusing.
  - gitlab-ci setup is easier, but requires a properly set up remote runner.
- Runners
  - Jenkins allows remote runners, which are easily shared among projects.
  - gitlab-ci requires remote runners, often requiring each project to set up its own.
6 Case Study: Enhanced reporting
Goal: add project website pages with reports from analysis tools and trend information (historical graphs) similar to those offered by Jenkins.
- We will build on the example of our JBake-generated website.
- An example of a page we would like to generate is this PMD report page.
6.1 Generating Graphs
Highcharts is a JavaScript package that can generate plots from data captured in CSV (comma-separated values) format.
- For example, the plot on this page is generated from a file pmd.csv that looks like:
pmd,Violations
2019-03-23T14:30,22.0
2019-03-23T14:33,22.0
2019-03-23T15:21,22.0
2019-03-23T15:45,22.0
2019-03-24T21:59,22.0
2019-03-31T19:48,22.0
2019-03-31T20:26,22.0
2019-04-03T20:42,11.0
Each line (after the headers) represents one data point in the chart.
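To make the file layout concrete, here is a minimal Python sketch that reads this kind of CSV text into a list of data points. On the actual web page this parsing happens in the browser, not in Python; the function name read_points and the shortened SAMPLE text are made up for illustration.

```python
import csv
import io
from datetime import datetime

# A shortened copy of the pmd.csv contents shown above.
SAMPLE = """pmd,Violations
2019-03-23T14:30,22.0
2019-03-31T20:26,22.0
2019-04-03T20:42,11.0
"""


def read_points(csv_text):
    """Return (series_name, [(timestamp, value), ...]) from CSV text."""
    # Skip any blank lines so a trailing newline cannot produce an empty row.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    points = [(datetime.strptime(when, "%Y-%m-%dT%H:%M"), float(value))
              for when, value in data]
    return header[1], points
```

The first row supplies the series name ("Violations"), and every later row becomes one (time, value) pair on the plot.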
Highcharts requires a bit of JavaScript to inject a chart into an HTML div element.
- We load the Highcharts code in our website header
header.ftl
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8"/>
    <title><#if (content.title)??><#escape x as x?xml>${content.title}</#escape><#else>codecentric</#if></title>
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta name="description" content="">
    <meta name="author" content="">
    <meta name="keywords" content="">
    <meta name="generator" content="JBake">
    <link href="<#if (content.rootpath)??>${content.rootpath}<#else></#if>css/base.css" rel="stylesheet">
    <link href="<#if (content.rootpath)??>${content.rootpath}<#else></#if>css/projectReports.css" rel="stylesheet">
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.0/jquery.min.js" type="text/javascript"></script>
    <script src="https://code.highcharts.com/highcharts.js"></script>
    <script src="https://code.highcharts.com/modules/data.js"></script>
    <script src="<#if (content.rootpath)??>${content.rootpath}<#else></#if>js/projectReports.js"></script>
  </head>
  <body>
    <div id="mainBody">
- This includes my own JavaScript file to make it easier to share the code among several pages.
projectReports.js
/*
 * Register a div in the webpage as a HighCharts (http://www.highcharts.com/) chart
 * portraying a series of data in a CSV file
 */

/* register1
 * For CSV files containing a single series of data. Column A contains the
 * Build numbers used as x values, column B the y values.
 *
 * @param graphName The id of the div to hold this chart.
 * @param csvURL URL to the .csv file
 * @param title Title for this chart.
 * @param yAxistitle Title for the Y axis. (Series title is taken from
 *        row 1 of the CSV file).
 */
function register1(graphName, csvURL, title, yAxisTitle) {
    var divName = "#" + graphName;
    $(document).ready(function() {
        $.get(csvURL, function(csv) {
            $(divName).highcharts({
                chart: { type: 'area' },
                data: { csv: csv },
                title: { text: title },
                yAxis: { title: { text: yAxisTitle } },
                xAxis: { title: { text: "Build #" } },
                plotOptions: { series: { stacking: 'normal' } },
                series: [{ color: "#0000cc" } ]
            });
        });
    });
}

/* register2
 * For CSV files containing two series of data. Column A contains the
 * Build numbers used as x values, columns B and C the y values.
 *
 * @param graphName The id of the div to hold this chart.
 * @param csvURL URL to the .csv file
 * @param title Title for this chart.
 * @param yAxistitle Title for the Y axis. (Series title is taken from
 *        row 1 of the CSV file).
 */
function register2(graphName, csvURL, title, yAxisTitle) {
    var divName = "#" + graphName;
    $(document).ready(function() {
        $.get(csvURL, function(csv) {
            $(divName).highcharts({
                chart: { type: 'area' },
                data: { csv: csv },
                title: { text: title },
                yAxis: { title: { text: yAxisTitle } },
                xAxis: { title: { text: "Build #" } },
                plotOptions: { series: { stacking: 'normal' } },
                series: [{ color: "#009933" }, { color: "#cc0000" } ]
            });
        });
    });
}
- Which makes for a reasonably straightforward content page:
pmd.html
title=CodeCompCommon PMD Report
type=page
status=published
~~~~~~
</p>
<div class=reportGraphs>
  <div id="theGraph" class="graph">PMD</div> ➀
</div>
<iframe class="docFrame" src="pmd/main.html"> </iframe> ➁
<script type="text/javascript">
  register1("theGraph", "pmd.csv", "PMD", "Warnings"); ➂
</script>
<p>
➀ This is the div that will be replaced by the chart.
➁ The body of the report. This is very similar to the way we loaded our Javadoc and other reports earlier.
➂ This calls my JavaScript function, which in turn calls the Highcharts functions, to schedule replacement of the above div by a chart derived from the data in pmd.csv.
6.2 Generating the Data
Where does the data for the plots come from?
- Individual data points are extracted from the reports generated by PMD and other tools.
  - Each analysis tool requires some custom coding to get the data point.
- Those data points are accumulated into a .csv file for the report by
  - Downloading the old CSV file (if there is one) from the project website.
  - Adding the new data point as a new line at the end of the CSV file.
- When the website is uploaded, it will include the new, one-line-longer, CSV file.
- These steps are carried out by my own Report Accumulator Gradle plugin.
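The plugin's own code is not shown in these notes. As a rough illustration of the accumulation step only, the following Python sketch appends one data point to the text of a previously published CSV file. The function name accumulate and its parameters are hypothetical, and the download and upload steps are left out so that the example has no network dependency.

```python
def accumulate(old_csv, header, timestamp, value):
    """Return CSV text with one more data line than `old_csv`.

    old_csv is the text of the previously published file, or None (or an
    empty string) if no file exists yet, in which case the header line is
    written first.
    """
    lines = old_csv.splitlines() if old_csv else [header]
    lines.append(f"{timestamp},{value}")
    return "\n".join(lines) + "\n"
```

On the very first build there is no old file, so the result holds just the header and one data line. Every later build returns the old contents plus one new line.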
6.3 Report Accumulator
Back to build.gradle to load another plugin:
buildscript {
    repositories {
        ⋮
        ivy { // for report-accumulator
            url 'https://secweb.cs.odu.edu/~zeil/ivyrepo' ➀
        }
    }
    dependencies {
        ⋮
        classpath 'edu.odu.cs.zeil:report_accumulator:1.2'
        ⋮
    }
}
⋮
// Reporting
import edu.odu.cs.zeil.report_accumulator.ReportStats ➁
import edu.odu.cs.zeil.report_accumulator.ReportsDeploy

task collectStats (type: ReportStats, dependsOn: ['build','reports']) { ➂
    description "Collect statistics from various reports & analysis tools"
    reportsURL = 'https://www.cs.odu.edu/~zeil/gitlab/' + project.name + '/reports'
}

task site (dependsOn: ['copyBake', 'copyJDocs', 'collectStats']){ ➃
    description "Build the project website (in build/reports)"
    group "reporting"
}

task deployReports (type: ReportsDeploy, dependsOn: 'site') { ➄
    description 'Deploy website to remote server'
    group 'reporting'
    deployDestination = 'rsync://zeil@atria.cs.odu.edu:codeCompCommon/reports/'
}
➀ The usual steps to include a plugin.
➁ Import the new task types created by the plugin.
➂ A ReportStats task
  - scans the build/reports directory for reports from various tools,
  - extracts a new data point where it can,
  - downloads an existing CSV file for that report from the reportsURL,
  - adds the new data point to the end of the CSV file.
➃ A slight tweak to our earlier site target to make sure that the ReportStats task is performed.
➄ A ReportsDeploy task uploads the build/reports directory to the deployDestination URL.
  - Assumes that any required ssh keys are already in an agent.
7 Related ideas
- Continuous deployment publishes snapshots of deliverables as changes are checked in.
  - A variation of the rather common “daily build” practice seen on many projects.
  - Some Maven repositories (including our own Artifactory instance) provide separate “snapshot” repositories for this purpose.
  - The idea of “artifacts”, seen earlier, is one way to carry this out.
  - GitHub and later versions of GitLab now use the artifacts mechanism to deploy entire project websites, making it unnecessary to work with a separate web server.
- Some organizations actually wire up build light indicators to provide a highly visible indicator of the status of the latest integration build.
  - Some then point a webcam at the light and broadcast their status.
  - Others opt for publishing a software analog of such lights on their project website.