Code syntax highlighting for static websites (2022/12/17)

2022-12-17 18:00:00 +07:00 by Mark Smith

It’s the first blog post on the new site!

For the past few weeks I’ve been finishing off the migration to the new website, which now houses the blog, linkblog and podcast, plus a bunch of independent one-off pages. It’s been slowly taking shape, but yesterday I got syntax highlighting working, and given that I mostly write about tech, JavaScript and the web, it started to feel like a proper website.

The site is still very minimal, which I like because I want the focus to be on reading, but I do have aspirations to improve on the design in the coming months or years. This site, for instance, has a nice, simple, elegant and readable style. That’s vaguely the sort of thing I’m aiming towards.

In any case, most of the essentials are there, so what better way to celebrate than with a blog post describing how I did it? Maybe it will help others out there on their programming journey.

A quick note about my site. It’s built using a static generator that I wrote, which generates the static HTML files that make up the site. If you have a WordPress site or use one of the popular static site generators, then there will likely be some sort of syntax highlighting plugin available for you to use. On the other hand, if you have something custom or you are curious how syntax highlighting works in general, then you might enjoy this post.

The way syntax highlighting of code works:

1. Add your code to a block surrounded by <pre> and <code> tags in your static HTML file
2. Add special syntax highlighting CSS classes throughout the block of code
3. Add the syntax highlighting JavaScript library to your page
4. Add CSS rules to your page that use the classes peppered through the HTML to colorise the code on page load
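
To make the first two steps concrete, here is a toy sketch of what "adding classes throughout the code" means. It is only an illustration: the keyword list, the toy-keyword class name and the function names are all made up for this example, and real libraries such as highlight.js use full language grammars rather than a keyword list.

```javascript
// Toy illustration only: not how highlight.js works internally.
const KEYWORDS = ['function', 'return', 'const', 'let', 'if', 'else'];

// Escape characters that have special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Step 2: wrap each known keyword in a <span> carrying a CSS class.
function toyHighlight(code) {
  const pattern = new RegExp(`\\b(${KEYWORDS.join('|')})\\b`, 'g');
  return escapeHtml(code).replace(
    pattern,
    '<span class="toy-keyword">$1</span>'
  );
}

// Step 1: wrap the result in <pre><code> so whitespace is preserved.
function toCodeBlock(code) {
  return `<pre><code>${toyHighlight(code)}</code></pre>`;
}

console.log(toCodeBlock('function hw() { return 1; }'));
```

A CSS rule such as .toy-keyword { color: purple; } would then colour those spans, which is the job the theme stylesheet does in step 4.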

Once you have it working with the default setup, it’s really straightforward to swap out those CSS rules for a different theme. There are loads of themes available which work for most popular programming languages.
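
As a rough sketch of how small a theme swap is, the helper below builds the stylesheet URL from a theme name. The URL pattern is the one used in the layout later in this post; the helper name is hypothetical, and any theme name other than default is an assumption you should check against the themes actually published for your version.

```javascript
// Hypothetical helper: builds the CDN URL for a highlight.js theme stylesheet.
// The URL pattern follows the <link> used in the layout in this post.
function themeStylesheetUrl(theme, version = '11.7.0') {
  return `https://cdnjs.cloudflare.com/ajax/libs/highlight.js/${version}/styles/${theme}.min.css`;
}

// Build the <link> tag that goes in the page <head>.
function themeLinkTag(theme) {
  return `<link rel="stylesheet" href="${themeStylesheetUrl(theme)}">`;
}

console.log(themeLinkTag('default'));
```

Changing the theme then means changing one string, while the classes in the HTML stay exactly the same.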

I used two libraries. The first is marked, which renders the HTML from the source markdown files. The second is highlight.js, which is the syntax highlighting library. There are other options, which might be more appropriate depending on what features you are after, but I’m going to focus on these since that’s what I got working on my site. I think they all work in a somewhat similar way. The code that glues all this together is written in JavaScript and runs in a standard Node.js runtime.

As I was saying, marked reads your markdown files and renders them, outputting HTML files. It also has hooks for syntax highlighting. What this means is that with a bit of additional configuration, marked will add in the syntax highlighting classes to your HTML (step 2 listed above) when it renders the markdown files.

I put together a minimal example that I’ll run through here, but it’s also available in this GitHub repo. It is meant to run on Unix/Linux type systems. You will likely need to make some modifications to have it run on Windows. I got some great help on the marked repo issues page after posting a question.

The place to start is the markdown file that will be transformed into HTML.

index.md

# minimal-syntax-highlighting

A code block:

```js
function hw() {
  console.log('Hello World'); 
}
hw();
```

That's all folks!

There’s a code block in there delimited by the triple backticks. Some people refer to them as code fences. When the file is rendered, the renderer wraps their contents in the <pre><code></code></pre> tags, which help to keep newlines, spaces and indentation in your code intact. The other way you sometimes see to add code blocks is to indent all your code by four spaces, however that doesn’t work with the syntax highlighting.

Using backticks enables you to specify the programming language, which can be used by the highlighting library. In this example I have specified JavaScript by adding js after the opening set of backticks. The list of available languages is on the highlight.js website.

You will need to add a package.json which will be used by the node runtime to install the libraries:

package.json

{
  "name": "minimal-syntax-highlighting",
  "version": "0.0.1",
  "description": "Minimal example demonstrating syntax highlighting",
  "main": "app.js",
  "scripts": {
    "build": "mkdir -p dist && node app.js"
  },
  "author": "Mark Smith",
  "dependencies": {
    "fs-extra": "^9.1.0",
    "marked": "^2.0.3",
    "highlight.js": "^11.7.0"
  }
}

The most important part is the npm build script which creates the target directory then runs the main app file. Also note the listed libraries which will be installed when you use npm to install the project libraries.

Then the main app file which ties everything together:

app.js

#!/usr/bin/env node

const fse = require('fs-extra');
const { readFile } = require('node:fs/promises');
const path = require('path');
const marked = require('marked');
const hljs = require('highlight.js');

console.log('starting...');

const templatePath = path.join(
  process.cwd(), 
  'index.md'
);

const outputPath = path.join(
  process.cwd(),
  'dist', 
  'index.html'
);

console.log(`templatePath: [${templatePath}]`);
console.log(`outputPath: [${outputPath}]`);

marked.setOptions({
  renderer: new marked.Renderer(),
  highlight: function(code, lang) {
    const language = hljs.getLanguage(lang) ? lang : 'plaintext';
    return hljs.highlight(code, { language }).value;
  },
  langPrefix: 'hljs language-', // highlight.js css expects a top-level 'hljs' class.
  pedantic: false,
  gfm: true,
  breaks: false,
  sanitize: false,
  smartypants: false,
  xhtml: false
});

(async () => {
  try {
    const contents = await readFile(templatePath, { encoding: 'utf8' });
    const renderedContent = marked.parse(contents);
    const html = `
      <!DOCTYPE html>
      <html lang="en">
      <head>
        <title>minimal-syntax-highlighting</title>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width">
        <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
      </head>
      <body>
        ${renderedContent}
        <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
      </body>
      </html>
    `;
    await fse.outputFile(outputPath, html);
    console.log('done');
  }
  catch(e) {
    console.error(e.message);
    console.error(e.stack);
    process.exit(1);
  }
})();

The first line tells the shell which runtime to use to execute the program. In this case it’s node.

The marked configuration is done using the setOptions function, which expects an object of configuration options. The most important key is ‘highlight’, the function that adds the required CSS classes. It checks whether the highlighting library knows the language named on the code fence, falling back to plaintext if it doesn’t, then calls the library’s highlight function. Different highlighting libraries have slightly different highlight functions.
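
The fallback logic in that highlight function can be sketched on its own, without the library. Here knownLanguages is a made-up stand-in for whatever set of grammars the highlighting library has registered, and resolveLanguage is a hypothetical name, not part of marked or highlight.js.

```javascript
// Stand-in for the grammars a highlighting library has registered (assumption).
const knownLanguages = new Set(['js', 'javascript', 'python', 'plaintext']);

// Return the requested language if it is known, otherwise fall back to
// 'plaintext' so an unknown or missing fence language never breaks the build.
function resolveLanguage(lang) {
  const name = (lang || '').toLowerCase();
  return knownLanguages.has(name) ? name : 'plaintext';
}

console.log(resolveLanguage('js'));
console.log(resolveLanguage('klingon'));
console.log(resolveLanguage(undefined));
```

The point of the guard is that a typo in a fence, or a fence with no language at all, still produces a readable, uncoloured code block.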

The other option that’s important is the langPrefix option which is a string that will be added to the CSS classes throughout the code. Again this is going to be different for different highlighting libraries.
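
Here is a minimal sketch of how a prefix like that ends up in the rendered HTML. The helper name codeClassAttribute is hypothetical and this is only an approximation of what a markdown renderer does, but it shows why the prefix 'hljs language-' produces both the top-level hljs class and a language-js class.

```javascript
// Hypothetical helper approximating how a renderer combines langPrefix with
// the language taken from the code fence.
function codeClassAttribute(infoString, langPrefix = 'language-') {
  // Only the first word after the backticks names the language.
  const lang = (infoString || '').trim().split(/\s+/)[0];
  return lang ? ` class="${langPrefix}${lang}"` : '';
}

// With the prefix used in app.js, the opening tags carry two classes.
console.log(`<pre><code${codeClassAttribute('js', 'hljs language-')}>`);

// With no language on the fence, no class attribute is added at all.
console.log(`<pre><code${codeClassAttribute('')}>`);
```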

Most of the important code is in an async IIFE so we can use await for asynchronous calls. We read the template into a variable, render the template using marked, insert the rendered HTML into the layout using a template literal, and finally write the file to disk.
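
If the async IIFE pattern is new to you, here is the same read, render, layout, write flow reduced to standard library calls. It is a sketch rather than the real build: fakeRender is a made-up stand-in for marked.parse, and the file is written to a temporary directory instead of dist.

```javascript
const { mkdtemp, readFile, writeFile } = require('node:fs/promises');
const os = require('node:os');
const path = require('node:path');

// Made-up stand-in for marked.parse(): wraps each non-empty line in <p>.
function fakeRender(markdown) {
  return markdown
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => `<p>${line}</p>`)
    .join('\n');
}

// Insert rendered content into a page layout using a template literal.
function applyLayout(title, body) {
  return `<!DOCTYPE html>
<html lang="en">
<head><title>${title}</title></head>
<body>
${body}
</body>
</html>`;
}

// Read the source file, render it, wrap it in the layout, write it out.
async function build(sourcePath, outputPath) {
  const contents = await readFile(sourcePath, { encoding: 'utf8' });
  const html = applyLayout('demo', fakeRender(contents));
  await writeFile(outputPath, html);
  return html;
}

// The async IIFE: an anonymous async function that is called immediately,
// so that await can be used in a CommonJS script.
(async () => {
  try {
    const dir = await mkdtemp(path.join(os.tmpdir(), 'build-demo-'));
    const source = path.join(dir, 'index.md');
    await writeFile(source, 'Hello\n\nWorld\n');
    console.log(await build(source, path.join(dir, 'index.html')));
  } catch (e) {
    console.error(e.message);
    process.exitCode = 1;
  }
})();
```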

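Notice that the layout has both a <link> in the head, which loads the library’s CSS rules, and a <script> at the very bottom of the body, which loads the highlighting library in the browser. Because marked has already added the highlighting classes at build time, it is the CSS rules that colorise the code when the page loads.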

You can run this in a terminal by executing:

npm install
npm run build

That will output a file in the following location:

dist/index.html

If everything worked as expected, when you open that file in a web browser, it will contain a code block which will have all the syntax highlighting, making it much nicer to read. You will need to be connected to the internet since the js and css files are hosted on a public CDN.

Here’s an example deployment on Netlify.

There’s also a GitHub Actions configuration in the repo, which is triggered by pushes to branches whose names match the patterns listed in the workflow. The workflow clones the repo, configures the Node.js runtime, installs project dependencies, builds the HTML, then deploys the file to Netlify. You will need to add the capitalised environment variables in the deploy step to your GitHub repo’s secrets. You should be able to get the required info from your Netlify account settings.

.github/workflows/buildAndDeploy.yml

name: Build And Deploy
on:
  push:
    branches: 
      - main
      - fix-*
      - feature-*
  workflow_dispatch:
    inputs:
      git_ref:
        description: Git Ref (Optional)    
        required: false
      deploy_enabled:
        description: Deploy after build? (yes|no)
        default: 'yes'
        required: true

env:
  deploy_enabled: ${{ github.event.inputs.deploy_enabled || 'yes' }}
  git_ref: ${{ github.event.inputs.git_ref }}
 
jobs:
  build_and_deploy:
    name: Build And Deploy
    runs-on: ubuntu-latest
    env:
      nodejs_version: 16.14.2
      
    steps:
      - name: Clone Deploy Repository (Latest)
        uses: actions/checkout@v3
        if: ${{ env.git_ref == '' }}
 
      - name: Clone Deploy Repository (Custom Ref)
        uses: actions/checkout@v3
        if: ${{ env.git_ref != '' }}
        with:
          ref: ${{ env.git_ref }}
 
      - name: Setup NodeJS
        uses: actions/setup-node@v3
        with:
          node-version: ${{ env.nodejs_version }}
 
      - name: Install modules
        run: |
          npm install
 
      - name: Build Site
        run: |
          npm run build
        
      - name: Run tree
        run: |
          tree -ap dist
          
      - name: Print output to console
        run: |
          ls -la dist/index.html
          cat dist/index.html
 
      - name: Deploy using Netlify CLI
        if: ${{ env.deploy_enabled == 'yes' }}
        uses: netlify/actions/cli@master
        env:
          NETLIFY_AUTH_TOKEN: ${{ secrets.NETLIFY_AUTH_TOKEN }}
          NETLIFY_SITE_ID: ${{ secrets.NETLIFY_SITE_ID }}
        with:
          args: deploy --dir=./dist --prod

A couple of things to watch out for when integrating this into bigger projects, both of which happened to me simultaneously, causing many days of tedious and difficult debugging.

My static file generator was running all output through an HTML prettifier library, which basically makes the output look nice by standardising the indentation and using consistent line breaks. The prettifier removed all the newlines in the code blocks, so in the final output, each code block was a single really long line. I added a way to specify not to use the prettifier to fix that issue.
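
To show the failure mode, here is a toy sketch. toyPrettify is a deliberately crude stand-in and not the prettifier I was using, and the prettify option is a hypothetical name for the kind of opt-out switch described above.

```javascript
// Toy "prettifier" (assumption): collapses every run of whitespace,
// newlines included, into a single space.
function toyPrettify(html) {
  return html.replace(/\s+/g, ' ').trim();
}

// Hypothetical per-page switch for skipping the prettifier.
function finalise(html, options = { prettify: true }) {
  return options.prettify ? toyPrettify(html) : html;
}

const rendered = [
  '<pre><code>function hw() {',
  "  console.log('Hello World');",
  '}</code></pre>',
].join('\n');

// Prettified: the whole code block ends up on one long line.
console.log(finalise(rendered));

// Opted out: the newlines and indentation survive.
console.log(finalise(rendered, { prettify: false }));
```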

The static site generator also runs all the renders through an EJS render so that I can have EJS syntax in the markdown files. I was using a render option called rmWhitespace, which removes excess whitespace that sometimes gets added by the EJS render process. The problem, though, is that it was also removing the indentation in the code blocks. So it was all rendering fine, except every line was squashed up against the left margin. I set the default for this option to false for markdown files to fix the issue.
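
The effect is easy to reproduce with a rough approximation. stripLineWhitespace below is a made-up function that trims every line; it is not EJS code, only a sketch of what stripping leading whitespace does to a code block.

```javascript
// Rough approximation (assumption) of a whitespace-removing render option:
// trims leading and trailing whitespace from every line.
function stripLineWhitespace(html) {
  return html
    .split('\n')
    .map((line) => line.trim())
    .join('\n');
}

const codeBlock = [
  '<pre><code>function hw() {',
  "  console.log('Hello World');",
  '}</code></pre>',
].join('\n');

// The line breaks survive, but the indentation inside the block is gone.
console.log(stripLineWhitespace(codeBlock));
```

Whitespace inside <pre> is significant, so any step that treats it as safe to remove needs to be switched off for pages that contain code.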

That’s pretty much it for this tutorial. If you’ve made it through to here then you should now be able to implement syntax highlighting on your website, and you’ll be aware of some things that can trip you up when integrating it into a bigger project. Best of luck with your projects.