Deploying Websites

Steven J Zeil

Last modified: Jan 7, 2022

Abstract

We’ve looked at how to use automated tools to generate project websites.

In this lesson we will look at how to get those deployed to a web server.

1 Approaches

1.1 via SSH

In many cases, we have SSH/SFTP access to a web server, which is configured so that files and directories copied into a certain location will be served and mapped onto URLs. For example, on the CS Linux network, files stored in /home/yourName/secure_html/whatever are served at the URLs https://www.cs.odu.edu/~yourName/whatever.

So, if I wanted to update the file served at https://www.cs.odu.edu/~zeil/officehours/index.html, I could fetch a local copy of the file,

scp zeil@linux.cs.odu.edu:/home/zeil/secure_html/officehours/index.html .

or

wget https://www.cs.odu.edu/~zeil/officehours/index.html

edit it, and then update the copy on the server like this:

scp index.html zeil@linux.cs.odu.edu:/home/zeil/secure_html/officehours/

That’s fine for working with one file or two. But if I have a website with many files (e.g., this course website) and I have updated several of them, I’d rather not risk forgetting to upload one or two of the changed files. Most command-line versions of scp have an option for recursive copy of directories, e.g.,

scp -r cs350/website/* zeil@linux.cs.odu.edu:/home/zeil/secure_html/cs350/

But a better choice, in many cases, is rsync. rsync is a program specifically designed for copying large directory trees in circumstances where only selected files are likely to have changed.

When rsync is given a source directory and a destination directory, it computes a hash for each file and, for large files, for portions of those files. If the hashes match, those files (or blocks of large files) are presumed to be identical. rsync then proceeds to transfer only the files that have changed. So, if I have rsync available, I am much more likely to do

rsync -auzv -e ssh cs350/website/ zeil@linux.cs.odu.edu:/home/zeil/secure_html/cs350/

The “-e ssh” part of that command tells rsync to do its communications with the remote machine via SSH, the default being an rsync-specific protocol that requires a dedicated rsync server to be running on the remote machine.

The biggest limitation to using rsync is that it is not available on all machines. It can be easily installed on Linux and macOS machines. Native Windows ports of it have been, in my opinion, unreliable. On Windows, it is best run in the Windows Subsystem for Linux or the Cygwin Unix emulator, but that introduces the complication of mapping paths between the Unix and Windows file systems.
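
For example, a website kept in a hypothetical Windows folder C:\Users\yourName\cs350\website would have to be referred to by a translated path. From the Windows Subsystem for Linux (assuming the default /mnt drive mapping):

rsync -auzv -e ssh /mnt/c/Users/yourName/cs350/website/ yourName@linux.cs.odu.edu:/home/yourName/secure_html/cs350/

From Cygwin (assuming the default /cygdrive prefix):

rsync -auzv -e ssh /cygdrive/c/Users/yourName/cs350/website/ yourName@linux.cs.odu.edu:/home/yourName/secure_html/cs350/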

1.2 via git branches

GitHub provides a web server (called GitHub Pages) for projects hosted on it. A project hosted at https://github.com/owner/project will have web pages hosted at https://owner.github.io/project/.

But GitHub has adopted a very different approach to deployment. When you activate GitHub Pages for your project, you specify a specific git branch to manage your website content. By default, this branch is named gh-pages.

For example, I have a page at https://sjzeil.github.io/CoWeM/userReference/Directory/outline/index.html. That means that, in my project repository, I have a file in my gh-pages branch at userReference/Directory/outline/index.html (relative to the root of my project). Now, the “normal” content of my project repository has, as you might guess, a settings.gradle and a build.gradle file in the root directory, as well as a src directory that holds a lot of sub-directories containing the project’s source code. But you won’t find any of that in the gh-pages branch.

This makes the gh-pages branch different from other examples of branching that we have looked at, in that it does not mirror the structure of main at all. This means that setting up the gh-pages branch for the first time requires deleting the entire project contents after creating the branch, something that, honestly, feels profoundly uncomfortable even though it is perfectly safe.
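
For example, a first-time setup might look roughly like this (a sketch only, assuming a clean working tree and a default branch named main):

git checkout -b gh-pages                    # create the new branch
git rm -rf .                                # delete the entire (tracked) project contents on this branch
git commit -m "Initialize gh-pages branch"
git push origin gh-pages
git checkout main                           # return to the normal project content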

How would we deploy a website that we have built in, say, build/jbake?

One possibility would be

  1. Check out the gh-pages branch. (I’ll assume that the entire build/ directory is listed in .gitignore, so it will be unaffected by this operation.)
  2. Delete all files except the ones under build/.
  3. Copy all files from build/jbake to the project root.
  4. Commit those changes (to the gh-pages branch).
  5. Check out the main branch to get back into a state where we can resume working.

Now, although this seems plausible, I wouldn’t do it this way. First, it strikes me as a risky, fragile approach. If I start off with any changed files that have not been committed, I risk losing them (or having the gh-pages checkout fail). Second, step 2 deletes, among other things, my local copy of the build.gradle file. We can get it back from the main branch, but if something goes wrong with any of the earlier steps, this would seem to complicate diagnosis and debugging.

So, I’m inclined instead to leave my main branch files in place, and construct the gh-pages files in a separate location.

  1. Check out gh-pages into a second working directory in build/gh-pages/.

    Modern versions of git can maintain multiple working directories for a single local repository, so long as each is checked out to a different branch. This is accomplished via the git worktree command. (A sketch of the whole procedure appears after this list.)

  2. Delete all files inside build/gh-pages/.

  3. Copy all files from build/jbake to the build/gh-pages/.
  4. Commit those changes (to the gh-pages branch).
  5. Release and delete the working directory in build/gh-pages/.
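
Assuming, as before, that the baked website is in build/jbake and that build/ is git-ignored, that procedure might look roughly like this from the command line (a sketch, not a polished script):

git worktree add build/gh-pages gh-pages    # 1: a second working directory, on the gh-pages branch
rm -rf build/gh-pages/*                     # 2: clear out the old website content
cp -r build/jbake/. build/gh-pages/         # 3: copy in the freshly baked site
git -C build/gh-pages add -A                # 4: commit on the gh-pages branch
git -C build/gh-pages commit -m "Update website"
git worktree remove build/gh-pages          # 5: release and delete the extra working directory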

After either of the procedures, the website would be updated the next time we push our commits to GitHub. Now, when we push, gh-pages will not be our current working branch. But we can force the push to include commits from all branches with

git push --all

2 Automating Website Deployment with Gradle

Suppose that we have set up Gradle to build our website in build/jbake/ using a task named bake.

2.1 An SSH solution

The Gradle plugin org.hidetake:gradle-ssh-plugin allows you to open SSH sessions to a remote machine from within a Gradle build, executing remote commands and transferring files.

A plausible set of Gradle steps:

  1. Create a .zip file of the entire constructed website
  2. Use scp to upload the zip file to the remote server.
  3. Use ssh to issue an unzip command on the remote server.
  4. If necessary, use ssh to issue chmod commands on the unzipped content.

build.gradle

plugins {
  id 'org.hidetake.ssh' version '2.9.0'
}

task zipWebsite (type: Zip, dependsOn: 'bake') {   ➀
    archiveFileName = 'website.zip'
    destinationDirectory = file('build')
    from 'build/jbake'
}

remotes {
  webServer {
    host = 'IP address'
    user = 'userName'
    identity = file('ssh-private-key')  ➁
  }
}

task deploy (dependsOn: 'zipWebsite') {
  doLast {
    ssh.run {
      session(remotes.webServer) {
       put from: 'build/website.zip', into: 'websitePath' ➂
       execute 'unzip websitePath/website.zip -d websitePath'  ➃
      }
    }
  }
}
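
Once the placeholder values (the host, user name, private key file, and websitePath) have been filled in, the whole bake, zip, upload, and unzip chain would be run with something like

gradle deploy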

2.2 An rsync Solution

The Java library rsync4j-all provides a Java interface to rsync:


build.gradle

buildscript {
    /*...*/

    dependencies {
        ⋮
        classpath "com.github.fracpete:rsync4j-all:3.1.2-15"
    }
}

import com.github.fracpete.rsync4j.RSync;  ➀
import com.github.fracpete.processoutput4j.output.ConsoleOutputProcessOutput;

task deployWebsite (dependsOn: "bake") {
    doLast {
        def sourceDir = "build/jbake/";
        def destURL = "destination";  ➁
        RSync rsync = new RSync()
                .source(sourceDir)
                .destination(destURL)
                .recursive(true)
                .archive(true)
                .delete(true)
                .verbose(true)
                .rsh("ssh -o IdentitiesOnly=yes");  ➂
        ConsoleOutputProcessOutput output
                = new ConsoleOutputProcessOutput();
        output.monitor(rsync.builder());
    }
}
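
As with the command-line rsync example in section 1.1, the destination placeholder is an ordinary rsync destination of the form user@host:path. A hypothetical value along the lines of the earlier examples would be

yourName@linux.cs.odu.edu:/home/yourName/secure_html/cs350/

and the whole bake-and-deploy sequence would then be run with gradle deployWebsite.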

2.3 A GitHub Solution

GitHub’s “Pages” service hosts websites by allowing the repository owner to specify a branch that will hold the website contents. The static (unchanging) components of a website could simply be constructed and committed “manually”, but dynamic contents that depend upon the build will necessarily involve some git manipulation.

In this section we look at the standard approach that GitHub intended - using one or more branches for the main project content and a separate “special” branch for the website.

This approach is limited, however, to public repositories or to private repositories in paid accounts. Although paid accounts at GitHub are not terribly expensive, some students might prefer to stick to the free ones. So in a later section, we will modify this approach to work with free accounts.

2.3.1 A Portable Approach using a Git Plugin

The Gradle plugin org.xbib.gradle.plugin.git provides a portable interface to a Java library implementation of git.

Unfortunately, the version of git provided does not support multiple working directories (worktrees) for a repository. So I need to modify the approach I described earlier.

  1. Clone the repository in our project root directory into build/gh-pages/ and check out the gh-pages branch in that clone.

    Note that because this is a clone of a local repository, the new repository will have the local repository in the project root as its origin. (The project root repository, presumably, has the remote GitHub repository as its origin). So pushes and pulls to the build/gh-pages/ repository will go to the project root repository, not directly to GitHub.

  2. Delete all files inside build/gh-pages/.

  3. Copy all files from build/jbake to the build/gh-pages/.

    If your own website contents are in a different directory, or are split among multiple directories, you might need to add some additional copy steps.

  4. Commit those changes (to the gh-pages branch).

  5. Push that commit in the build/gh-pages/ repository to the project root repository.

build.gradle

plugins {
    ⋮
    id "org.xbib.gradle.plugin.git" version "2.0.0"
    ⋮
}


    ⋮



////////  Website publication on GitHub pages ///////////////////


task clonePages() {                                             ➀
    doLast {
        mkdir 'build/gh-pages'
        def thisRepo = rootProject.projectDir.toString()
        def pagesDir = "$buildDir/gh-pages"
        project.delete {
            delete pagesDir
        }
        def grgit = git.clone {
            dir = pagesDir
            uri = 'file:' + thisRepo
            bare = false
            refToCheckout = 'gh-pages'
        }
        grgit.checkout {
            branch = 'gh-pages'
        }
        grgit.close()
    }
}

task copyReports (dependsOn: ['bake', 'clonePages']) {  ➁
    doLast {
        ant.copy (todir: 'build/gh-pages') {
            fileset(dir: 'build/jbake')
        }
    }
}

task updateGHPages (dependsOn: 'copyReports') {                     ➂
    group = "Reporting"
    description = 'Copies the generated website to the gh-pages branch in preparation for a future push to GitHub'
    doLast {
        def pagesDir = "$buildDir/gh-pages"
        def grgit = git.open {
            dir = pagesDir + "/.git"
        }
        grgit.add (update: false, patterns: ['.'])
        grgit.add (update: true, patterns: ['.'])
        grgit.commit {
            message = "Updating web pages"
        }
        grgit.push {}
        grgit.close()
    }
}

At the end of this process, the updated website has not yet been sent to GitHub. It has, however, been committed within our original copy of the local repository and simply awaits a later git push origin gh-pages or git push --all command to send the new pages to GitHub.

2.3.2 A Less-Portable Approach using Native Git Commands

I have to admit that I find the git plugin to be a bit tedious to work with. If we know that the only machines that we will be running on will have a native version of git, we can simplify the above by letting Gradle invoke git directly. This is accomplished via the Gradle exec command, which can run a native OS command in a specified working directory:

////////  Website publication on GitHub pages ///////////////////


task clonePages() {
    doLast {
        def thisRepo = rootProject.projectDir.toString()
        def pagesDir = "$buildDir/gh-pages"
        project.delete {
            delete pagesDir
        }
        mkdir 'build/gh-pages'
        exec {
            workingDir = 'build/gh-pages'
            commandLine = ['git', 'clone', 'file:' + thisRepo, '.']
        }
        exec {
            workingDir = 'build/gh-pages'
            commandLine = ['git', 'checkout', 'gh-pages']
        }
    }
}

task copyReports (dependsOn: ['bake', 'clonePages']) {
    doLast {
        ant.copy (todir: 'build/gh-pages') {
            fileset(dir: 'build/jbake')
        }
    }
}

task updateGHPages (dependsOn: 'copyReports') {
    group = "Reporting"
    description = 'Copies the generated website to the gh-pages branch in preparation for a future push to GitHub'
    doLast {
        def pagesDir = "$buildDir/gh-pages"
        exec {
            workingDir = 'build/gh-pages'
            commandLine = ['git', 'add', '.']
        }
        exec {
            workingDir = 'build/gh-pages'
            commandLine = ['git', 'commit', '-m', 'Updating web pages']
        }
        exec {
            workingDir = 'build/gh-pages'
            commandLine = ['git', 'push']
        }
    }
}

The exec blocks allow us to give a command as an array of string values. You can see that the steps performed are largely the same as in the plugin-based solution.

I find the resulting tasks to be much easier to read. In general, though, I advise caution in using exec because it is potentially less portable.

2.4 Case Study: a GitHub Solution for Free Accounts

If you are working with a private repository from a free account, then GitHub Pages will not be available to you.

A workaround is to create a second, public, repository whose only purpose is to host the website. Because this second repository will not have anything in it but the web content (which was always going to be public anyway), there’s no great loss of security in making this second repository public.

The second repository can also be simpler. It doesn’t need multiple branches to separate the project code from the website, because no project code will be stored there. So we can tell GitHub to use the main branch as the source of the website.

The gradle code for this modified approach is quite similar to the previous approach:

////////  Website publication on GitHub pages ///////////////////


def websiteRepo='git@github.com:sjzeil/pages-sandbox.git'    ➀

task clearPages(type: Delete) {
    delete 'build/gh-pages'
}

task clonePages(dependsOn: ['clearPages']) {                ➁
    doLast {
        exec {
            workingDir = '.'
            commandLine = ['git', 'clone', websiteRepo, 'build/gh-pages']
        }
    }
}


task copyWebsite (dependsOn: ['reports', 'clonePages']) {   ➂
    doLast {
        ant.copy (todir: 'build/gh-pages') {
            fileset(dir: 'build/reports')
        }
        ant.copy (todir: 'build/gh-pages') {
            fileset(dir: 'build/docs')
        }
        ant.copy (todir: 'build/gh-pages') {
            fileset(dir: 'src/main/html')
        }
    }
}



task updateGHPages (dependsOn: 'copyWebsite') {
    group = "Reporting"
    description = 'Copies the website content to the separate GitHub Pages repository and pushes it to GitHub'
    doLast {
        def pagesDir = "$buildDir/gh-pages"
        exec {
            workingDir = 'build/gh-pages'                                ➃
            commandLine = ['git', 'add', '.']
        }
        exec {
            workingDir = 'build/gh-pages'
            commandLine = ['git', 'commit', '-m', 'Updating web pages']   ➄
        }
        exec {
            workingDir = 'build/gh-pages'
            commandLine = ['git', 'push']
        }
    }
}

Unlike our previous examples, this action deploys the website content immediately. We do not have to issue a separate push afterwards.