Architecting an Updated CyberSym Blogs Site

Written by Bruce R. Copeland on June 20, 2019

Tags: aws, cms, cybersym blogs, gatsby, nodejs, react, s3, wordpress

Some of you will have noticed that the CyberSym Blogs have a new look/feel. The changes are actually far more than skin deep. For a long while I’ve been dissatisfied with Joomla as a blog and content management system. The extensive PHP codebase used for Joomla has proved increasingly clunky. There have been frequent security problems. The mechanism for updating versions on a site is cumbersome. And maddeningly, Joomla has often made choices designed to attract new users but problematic for existing users.

WordPress and Drupal (Joomla’s main competitors) suffer from many of the same problems (cumbersome upgrades, bloated PHP codebases, and frequent security problems). Over the years I’ve tested a number of other content management systems based on Java, Python, or JavaScript, but all have proved to be seriously lacking.

Recently there has been a lot of interest in headless CMS. These headless systems emphasize backend content management and leave the frontend presentation of content to other developers. As part of this trend both WordPress and Drupal have developed APIs which allow those platforms to function like a headless CMS.

Also recently there has been a lot of renewed interest in static websites, which load and respond much more rapidly than most dynamic sites. Gatsby is a static site generator based on React. React is a highly popular JavaScript library for building user interfaces, and Gatsby runs its build on Node.js. In React, just about everything on a site is a component (a header, a top menu, a side widget, a page template, etc.). Component files are JSX: a blend of HTML-like markup, JavaScript, and sometimes inline CSS. Gatsby looked like a promising way to migrate existing blogs in Joomla, WordPress, or Drupal to something much leaner. Gatsby has source plugins that pull WordPress or Drupal content into a Gatsby static site, but nothing for Joomla. Fortunately WordPress has a plugin that will import a Joomla site into WordPress (primarily a database migration). So the strategy was to import the Joomla blog content into WordPress and then use headless WordPress as the content source for a Gatsby static site.

Set Up an Instance with WordPress, NodeJS, Gatsby, etc

Over the years, I’ve installed WordPress for several company clients. It’s pretty straightforward. Useful links describing various steps are:

https://www.digitalocean.com/community/tutorials/how-to-install-wordpress-on-centos-7
https://www.rosehosting.com/blog/install-wordpress-on-a-centos-7-vps/
https://www.sitepoint.com/wordpress-headless-cms/

WordPress needs a LAMP stack on a recent distribution of Ubuntu, CentOS, or Red Hat. I chose to use a Vagrant/Ansible script to instantiate a CentOS 7.5 instance, install the LAMP components, install the GNOME desktop, configure MariaDB/MySQL, download and install WordPress, download and install Visual Studio Code, install Node.js, and install various NPM packages, including Gatsby and its relevant plugins. (The Ansible script will also be important later for setting up an AWS instance to maintain the site going forward.) Recent WordPress versions install with the REST API needed for a headless CMS already enabled. It is, however, a good idea to create and install a headless theme. Mine is:

index.php:

<?php
// Headless theme: nothing is rendered here; front-end visitors are redirected.
?>
<script type="text/javascript">
  window.location = 'http://localhost';
</script>

functions.php:

<?php
?>

style.css:

/*
Theme Name: Emptyhead Headless WordPress
Theme URI: https://cybersym.com/emptyhead
Author: cybersym
Author URI: http://cybersym.com
Description: Empty (of course)
*/

All three files are bundled inside an emptyhead folder under wp-content/themes.
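
Before pointing Gatsby at WordPress, it can help to confirm that the REST API actually answers. The sketch below is an illustration, not part of the original setup: it assumes WordPress is served at http://localhost with pretty permalinks enabled (with plain permalinks the same route is reachable as ?rest_route=/wp/v2/posts), and that a fetch implementation is available. The function names are hypothetical.

```javascript
// Sketch: check that the WordPress REST API returns posts.
// Assumes WordPress at the site root and pretty permalinks (/wp-json/...).

// Build the standard WordPress REST URL for the posts collection.
function postsUrl(baseUrl, perPage = 5) {
  const url = new URL("/wp-json/wp/v2/posts", baseUrl);
  url.searchParams.set("per_page", String(perPage));
  return url.toString();
}

// Fetch one page of posts and return their rendered titles.
// fetchImpl is injectable so the logic can be exercised without a server.
async function listPostTitles(baseUrl, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(postsUrl(baseUrl));
  if (!response.ok) {
    throw new Error(`REST API request failed with status ${response.status}`);
  }
  const posts = await response.json();
  // Each post object carries its title as { rendered: "..." }.
  return posts.map(post => post.title.rendered);
}

// Usage on the VM (Node 18+ or a browser console):
// listPostTitles("http://localhost").then(titles => console.log(titles));
```

If this returns the imported article titles, the headless side is ready for a Gatsby source plugin.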

Migrate the Joomla CyberSym Blogs to WordPress

The Joomla to WordPress migration is a one-time thing. I therefore excluded those steps from the ansible script and performed them manually:

I first created a joomla_blogs database in the mysql client; then I imported that database from an earlier sql dump:

[~/projects]$ mysql -u dbuser -p joomla_blogs < cstblogs/joomla_blogs_2019-02-12.sql

Next I logged into the WordPress Admin interface (http://localhost/wp-login.php), navigated to the Plugins tab, selected Add New, and then installed and activated the FG Joomla to WordPress plugin. I then switched to the Import menu on the WordPress Tools tab and chose the run importer link under Joomla (FG). This brought up a configuration page for the import. It needs the Joomla database parameters (Hostname, Database, Username, Password, and Joomla Table Prefix) and the URL of the Joomla site, so the import process can pull any media associated with the Joomla content. There are also some other choices for what to import, etc. Finally I started the import, which took a minute or so to complete. Afterwards I logged into the mysql client to verify that the WordPress database (originally empty) contained the desired content from Joomla.

Create a Gatsby Project to Pull Content from a Headless WordPress CMS

Most Gatsby documentation, tutorials, etc. focus on the use of Markdown as the content for a Gatsby static blog site. I found a few useful articles which discuss WordPress HTML content as the starting point:

https://www.creativebloq.com/how-to/use-wordpress-as-a-headless-cms
https://www.iamtimsmith.com/blog/how-to-build-a-blog-with-wordpress-and-gatsby-part-1
https://indigotree.co.uk/how-use-wordpress-headless-cms/

I am a strong proponent of agile development, so I like to get things working in a minimal way as a proof of concept and worry later about refinements. Little effort was required to get an index page showing titles and excerpts for all the blog articles pulled from WordPress. Basically I configured the gatsby-source-wordpress plugin with appropriate details about paths for my articles and commented out a few things in the src/pages/index.js file of my Gatsby project.
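
The source plugin configuration is not shown in the original, so here is a minimal sketch of what it might look like in gatsby-config.js. The option names (baseUrl, protocol, hostingWPCOM, useACF, includedRoutes) are those documented for gatsby-source-wordpress v3, the version current in 2019; the values are assumptions for a self-hosted WordPress on the local VM.

```javascript
// gatsby-config.js (sketch): pull posts from the local headless WordPress.
// Values are assumptions; option names follow gatsby-source-wordpress v3.
const config = {
  plugins: [
    {
      resolve: "gatsby-source-wordpress",
      options: {
        baseUrl: "localhost", // host (and optional path), no protocol
        protocol: "http",
        hostingWPCOM: false, // self-hosted, not WordPress.com
        useACF: false, // no Advanced Custom Fields in this setup
        // Only fetch the REST routes the blog pages query
        includedRoutes: [
          "**/posts",
          "**/categories",
          "**/tags",
          "**/users", // needed for author { name }
          "**/media", // needed for images
        ],
      },
    },
  ],
};

module.exports = config;
```

Restricting includedRoutes keeps build-time fetching small; posts, categories, tags, and users cover the fields used by the blog index queries.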

Design/Implement New Site Styling

Most of the desired new site styling was developed previously as part of various (unsatisfactory) tests of different content management systems. All that was needed here was to install a few of those earlier LESS files in a src/styles folder and import some appropriate Google Fonts using gatsby-plugin-prefetch-google-fonts. Here is the config I use for this plugin in gatsby-config.js:

{
  resolve: `gatsby-plugin-prefetch-google-fonts`,
  options: {
    fonts: [
      {
        family: 'Roboto',
        variants: ['400', '700']
      },
      {
        family: 'Open Sans',
        variants: ['400', '700']
      },
      {
        family: 'Roboto Mono',
        variants: ['400', '700']
      },
      {
        family: 'Oxygen Mono',
        variants: ['400', '700']
      },
    ]
  },
},

Develop a Page Template for the Gatsby Site Flexible Enough to Support a Multi-blog

This is where the real work began. CyberSym Blogs is actually a multi-blog site with (currently) three different blogs. Each blog has its own category, root URL, styling, images, and content. Fortunately the blog content from Joomla/WordPress was already divided into categories that map to the different blogs. A few articles have multiple categories if those articles are relevant on more than one blog.

Step One was to build a TemplateWrapper JSX component that accepts the category as part of its input and uses it to apply the correct styling. The TemplateWrapper code is:

import React from "react";
import Link from "gatsby-link";
import SEO from "./seo.js";
import Helmet from "react-helmet"
import Header from "./header.js";
import Sidebar from "./sidebar.js";
import Footer from "./footer.js";
import TagCloud from "./tag-cloud.js";
import SocialShare from "./social-share";
import "../styles/breakpoints.less";
import "../styles/colors.less";
import "../styles/background-colors.less";
import "../styles/blog-typography.less";
import "../styles/flex.less";
import "../styles/blogcontent.less";

const TemplateWrapper = content => {
  const { textAggregate, category, share } = content;
  const realContent = (share && share.realContent === true);

  return (
  <div className="wrapper">
    <SEO share={share} category={category} />
    <Helmet>
      <link rel="stylesheet" href="/styles/prism.css" />
    </Helmet>  
    <div className="flex-wrapper">
      <Header headerImages = {content.headerImages} category = {category} />
        <div className="flex-mid-tier">
          <div className="flex-content cst-blog-content">
            <div style={{textAlign: "center", display: "block"}}>
              {realContent ? <SocialShare orientation="row" url={share.url} 
              title={share.title} excerpt={share.excerpt} /> : ""}
            </div>
            {content.children}
          </div>
          <div className="flex-sidebar">
            <Sidebar
              title="CyberSym Blogs"
              content={
                <ul className="menu" style={{marginBottom: "0"}}>
                  <li className="li"><Link className="a" to="/" >
                    <span>CyberSym Blogs Home</span></Link></li>
                  <li className="li"><Link className="a" to="/science-society/">
                    <span>Science &amp; Society</span></Link></li>
                  <li className="li"><Link className="a" to="/tech-intersection/">
                    <span>Technology Intersection</span></Link></li>
                  <li className="li"><Link className="a" to="/ultrarunning/">
                    <span>Ultrarunning Edge</span></Link></li>
                  <li className="li"><a href="http://cybersym.com" className="a">
                    <span>CyberSym Technologies Home</span></a></li>
                </ul>
              }
            />
            { category && category === "ultrarunning" ?
            <Sidebar
              title="Special Topics"
              content={
                <ul className="menu" style={{marginBottom: "0"}}>
                  <li className="li"><Link className="a" 
                    to="/ultrarunning/ultrarunning-biochem">
                    {<span>Biochemical Strategies for Ultrarunning</span>}</Link>
                  </li>
                </ul>
              }
            />
            : "" }
            { textAggregate && textAggregate !== "" ? 
              <TagCloud textAggregate = {textAggregate} /> : "" }
            { realContent ? <SocialShare orientation="column" url={share.url} 
              title={share.title} excerpt={share.excerpt} /> : ""}
          </div>
        </div>
      <Footer category={category} />
    </div>
    <script src="/utils/prism.js"></script>
  </div>
  )
};

export default TemplateWrapper;

Notice that ternary operators are used in several places to dictate whether certain elements appear on a page or not (TagCloud, Special Topics menu, etc.). Analogous approaches are used in other components like Header, Footer, etc. Also notice how all the blog content is handled here in one very concise line {content.children}.

Gatsby uses GraphQL to query for content, so Step Two involved setting up three separate “blog index” pages, one named for each blog category. The page for Technology Intersection looks like:

import React from "react";
import h2p from 'html2plaintext';
import Link from "gatsby-link";
import { graphql } from 'gatsby';
import buildCategoryPath from "../utils/category-path.js";
import TemplateWrapper from "../components/template-wrapper";

export default function techInterIndex({ data }) {
  let { edges: posts } = data.allWordpressPost;
  const headerImages = data.headerImages;
  const category = "tech-intersection";
  let counter = 0;
  const site = "https://blogs.cybersym.com";
  const share = {
    url: site + "/" + category,
    title: "Technology Intersection Blog",
    excerpt: "The Technology Intersection blog explores all things technology — "
    + "especially computer and information technology, biotechnology, communications, "
    + "energy, transportation, nanotechnology, and robotics, to name a few.",
    realContent: true
  };
  // Concatenate the plain text of every post for the tag cloud
  const textAggregate = posts
    .map(post => h2p(post.node.content))
    .join(" ");

  return (
    <div className="palette--tech-site">
    <TemplateWrapper headerImages = {headerImages} textAggregate = {textAggregate}
      category={category} share={share}
    >
      {posts
        .filter(post => post.node.title.length > 0)
        .map( ({ node: post }) => {
          counter = 0;
          return (
            <div className="blog-post-preview" key={post.id}>
              <h2>
                <Link to={buildCategoryPath(category, post.link)} 
                  dangerouslySetInnerHTML={{__html: post.title}}/>
              </h2>
              <h4>By {post.author.name} on {post.date}</h4>
              <h4>Tags:   
              {
                post.tags &&
                post.tags
                .map( tag => (
                  <span key={tag.id}>
                    { counter++ > 0 ? `, ` : ' '}
                    {tag.name}
                  </span>
              ))}
              </h4>  
              <div dangerouslySetInnerHTML={{__html: post.excerpt}} />
              <div className="read-more"><Link
                to={buildCategoryPath(category, post.link)}>Read More</Link></div>
            </div>
          );
        })}
    </TemplateWrapper>
    </div>
  );
}

export const pageQuery = graphql`
  query techInterIndexQuery {
    headerImages: allImageSharp(
      filter: {
        fluid: { originalName: { regex: "/tech-intersection/" } }
      }  
    ) {
      edges {
        node {
          id
          fluid(maxWidth: 1600, maxHeight: 350) {
            ...GatsbyImageSharpFluid
          }
        }
      }
    }
    allWordpressPost(
      filter: {
        categories: {
          elemMatch: {
            name: {
              eq: "tech-intersection"
            }
          }
        }
      },
      sort: {
        order: DESC,
        fields: [date]
      },
      limit: 20,
    ) {
      edges {
          node {
            id
            title
            categories {
              id
              name
            }
            tags {
              id
              name
            }
            author {
              name
            }
            date(formatString: "MMMM DD, YYYY")
            excerpt
            content
            link
          }
        }
      }
    }  
`;

The various blog index pages are largely identical, differing primarily in their category, which is used to select appropriate styling and also to restrict the category in the graphQL query for blog article content. It is undoubtedly possible to avoid some of the duplication, but the effort probably is not justified.

Gatsby employs a gatsby-node.js file to build all the article pages for different blog articles. Step Three therefore necessitated using the category for each article to build a page path appropriate for that blog category in gatsby-node.js.

Implement Improved Image Handling

One of Gatsby’s great strengths is its use of the sharp image plugins (gatsby-plugin-sharp and gatsby-transformer-sharp) to provide extremely fast and attractive site image handling. It was fairly straightforward to get this working for header images. Gatsby uses GraphQL to query for appropriate images. Here that is done with a regex filter on the image file name (see the headerImages query above), so the main trick was to name the header images in a way that makes a regex work conveniently.

At this point the project went a bit sideways. Many of us embed images inline in our blog content (and blog excerpts). This means the sharp processing of such images needs to be done at the time gatsby-node.js is processing blog articles into blog article pages. For Markdown, there are existing Gatsby plugins that carry out this processing. But what about WordPress content? It turns out Tyler Barnes (with help from Alex Stukh) has produced a beta version of a gatsby-wordpress-inline-images plugin to do something similar for WordPress content. I installed the plugin and gave it a test. It did produce optimized inline images, but those images were very large, and only the main content (not the excerpts) was processed for my articles. Images in my articles are typically 50% of the page width and usually side by side, and I need similar image processing for my article excerpts. There was only one real solution: it was time to hack the plugin. I was able to use the same code that processes blog content to also process excerpts, and I was able to configure the wrapper elements for images to be 48% of the content width. I have not yet merged these changes back to the plugin on GitHub because I want to do some more testing. I hope to finish this in the next few weeks.

Design/Implement a Workable Tag Cloud Component for Blog Index/Article Pages

Even static sites need some dynamic content to help maintain user interest. Tag (word) clouds are article eye candy that also encapsulate meaningful information about articles or blog content. Since the layout and color of tag cloud elements are random, the tag cloud offers something which can appear dynamic. Tag clouds require word counts for all the words in an article or on an index page. Collecting and computing this data can be a bit involved:

  1. Convert (all) articles from HTML to plain text
  2. Aggregate the plain text for multiple articles
  3. Compute word counts for all the text using some dictionary exclusions and some kind of acronym handling strategy
  4. Build styling for the words to reflect their importance.
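
The site's own utils/word-count.js (countWords and formatTags) is not shown, so here is a hedged sketch of steps 2 through 4, assuming step 1 has already produced plain text (the index page uses html2plaintext for that). The stop-word list, the rule that all-capital tokens such as AWS or S3 are kept as acronyms, and the 12 to 32 px size range are illustrative assumptions.

```javascript
// Sketch of steps 2-4. Stop words, acronym rule, and pixel range are
// assumptions, not the site's actual word-count.js.
const STOP_WORDS = new Set([
  "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "from",
  "has", "have", "in", "is", "it", "its", "of", "on", "or", "that", "the",
  "this", "to", "was", "were", "with",
]);

// Step 2: aggregate the plain text of several articles.
function aggregate(texts) {
  return texts.join(" ");
}

// Step 3: count words. All-capital tokens (AWS, CMS, S3) are treated as
// acronyms and kept as-is; other words are folded to lower case.
function countWords(text, minLength = 3) {
  const counts = new Map();
  for (const token of text.match(/[A-Za-z][A-Za-z0-9'-]*/g) || []) {
    const isAcronym = token.length > 1 && token === token.toUpperCase();
    const word = isAcronym ? token : token.toLowerCase();
    if (STOP_WORDS.has(word.toLowerCase())) continue;
    if (!isAcronym && word.length < minLength) continue;
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Step 4: scale font sizes linearly with the count, most frequent first.
function tagSizes(counts, minPx = 12, maxPx = 32) {
  const values = [...counts.values()];
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  const tags = [];
  for (const [word, count] of counts) {
    const t = hi === lo ? 1 : (count - lo) / (hi - lo);
    tags.push({ word, count, fontSize: Math.round(minPx + t * (maxPx - minPx)) });
  }
  return tags.sort((a, b) => b.count - a.count);
}
```

For example, aggregating "AWS hosts the blog. The blog uses Gatsby." with "Gatsby builds the blog on AWS with S3." gives blog a count of 3 (32 px) and both AWS and gatsby a count of 2 (22 px).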

The original plan was to compute word counts initially for articles and pages, store that data, and then simply randomize the information (word order and color) each time the page gets refreshed. An agile approach meant initially getting the entire process to work each time the page is refreshed, then worrying later about a more efficient refresh. As it turns out, the full computation of word counts is reasonably performant (100 ms out of a total page load time of 1500 ms, even for an index page with 25 large articles), so refresh optimizations may not be necessary.
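
The randomizing half of that plan (new word order and colors on every refresh) can be sketched with a Fisher-Yates shuffle plus a random palette pick. The palette and the tag object shape ({ word, fontSize }) are assumptions for illustration. Because the output is random, it should only be rendered once the component is running in the browser.

```javascript
// Sketch: randomize tag order and color. PALETTE and the tag shape are
// assumptions, not the site's actual styling.
const PALETTE = ["#1b4f72", "#117a65", "#b9770e", "#922b21", "#6c3483"];

// Fisher-Yates shuffle on a copy. random() defaults to Math.random but can
// be swapped for a seeded generator to get repeatable output.
function shuffle(items, random = Math.random) {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)); // 0 <= j <= i
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Shuffle the tags, then give each one a randomly chosen palette color.
function decorateTags(tags, random = Math.random) {
  return shuffle(tags, random).map(tag => ({
    ...tag,
    color: PALETTE[Math.floor(random() * PALETTE.length)],
  }));
}
```

Shuffling a copy leaves the counted data untouched, so only this cheap step would need to run on each refresh if the counts were ever cached.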

There was one quirk having to do with frontend vs. backend rendering. The tag cloud is simply a series of spans showing the significant words in the text content for a page. The order and color of the spans are random. The text size for each span is proportional to the count for that particular word. Weird things happen when the cloud is rendered first on the backend and then again on the frontend. The order of the words in the spans and their colors get changed on the frontend, but the text sizes for the existing spans from backend rendering do not change. This gives rise to a tag cloud which appears to emphasize the wrong words. This is consistent with how React hydration works: React expects the first client render to match the server-rendered HTML, and while it patches differences in text content, it does not guarantee that mismatched attributes such as inline styles get patched. It can be avoided by simply not rendering the tag cloud on the backend. The necessary code for the tag cloud is:

import React from "react";
import { countWords, formatTags } from "../utils/word-count.js";
import '../styles/flex.less';

class TagCloud extends React.Component {
  constructor ( props ) {
    super(props);
    this.state = { isClient: false };
  }

  componentDidMount() {
    // componentDidMount runs only in the browser, never during server rendering
    this.setState({ isClient: true });
  }

  render() {
    return (
    <div className="flex-sidebar-content">
      {
        this.state.isClient ?
          <div
            className="tag-cloud"
          >
            {formatTags(countWords(this.props.textAggregate))}
          </div>
        : ""
      }
    </div>
    )
  }
}

export default TagCloud;

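Notice that the isClient state field is false by default. It is only set to true when the component mounts, which happens only in the browser (the frontend). The cloud renders only when isClient is true, so the server-rendered HTML never contains tag cloud spans for React to reconcile.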

Deploy and Host All of This in AWS

Deployment of Gatsby projects is straightforward for most hosting environments. Several different articles describe what needs to be done to host a Gatsby project using AWS S3 and possibly AWS CloudFront and Route 53:

https://itnext.io/static-website-over-https-with-s3-cloudfront-gatsby-continuously-delivered-b2b33bb7fa29
https://medium.com/@kyle.galbraith/how-to-host-a-website-on-s3-without-getting-lost-in-the-sea-e2b82aa6cd38
https://www.gatsbyjs.org/docs/deploying-to-s3-cloudfront/

AWS S3 supports HTTP access to static web sites. If you want HTTPS (a VERY good idea) and cached, compressed content, then you also need to configure CloudFront and possibly get a certificate through AWS Certificate Manager (CloudFront only uses ACM certificates issued in the us-east-1 region). AWS Route 53 is a good option for DNS and/or domain registration, or you can use an existing hosting platform. There is good AWS documentation for most of these tasks. Note, however, that the blog articles mentioned above are occasionally out of date with respect to the current AWS console. You should set up an IAM user with appropriate access keys, and then run aws configure (from the AWS CLI) in a bash shell where your project is located. If you install gatsby-plugin-s3 and add its deploy script to package.json, then building and deploying your project to S3 is as simple as:

[~/projects]$ npm run build && npm run deploy
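
For reference, a minimal gatsby-plugin-s3 setup might look like the following sketch. bucketName is the option the plugin's README documents, and protocol/hostname are the options it describes for sites served through CloudFront; the bucket and host names are placeholders, not the real ones. The npm run deploy step above assumes a package.json script entry along the lines of "deploy": "gatsby-plugin-s3 deploy".

```javascript
// gatsby-config.js fragment (sketch). Bucket and host names are placeholders.
const config = {
  plugins: [
    {
      resolve: "gatsby-plugin-s3",
      options: {
        bucketName: "example-blogs-bucket", // hypothetical bucket name
        // When CloudFront serves the site over HTTPS, tell the plugin the
        // public protocol/host so generated redirects point there.
        protocol: "https",
        hostname: "blogs.example.com", // hypothetical host
      },
    },
  ],
};

module.exports = config;
```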

There are now a lot of web hosting platforms around which support web sites but do not offer any integrated email for a domain. This can be a BIG problem for a small company. We’ve used InMotionHosting for years to host our company sites, other related sites, and email. When I was ready to deploy these projects, I elected to continue using the InMotionHosting email service. This simply meant adding an MX record in my Route 53 DNS configuration to point mail back to InMotionHosting. I can always revisit this arrangement at some point in the future.

Implement Secure Contact Form Email Handling

Real web sites need some kind of contact form that does not expose unnecessary email contact information and does not function as a spam portal. A contact-form is also another way to make a static site somewhat more dynamic. Three AWS services make this pretty easy to achieve. An AWS API Gateway can be created as the target for a POST request from the site contact form. That API Gateway is then configured to relay the POST request to an AWS Lambda. The lambda then uses AWS Simple Email Service to send an email message to the contact email address, populated with the relevant information in the POST request from the contact form. The contact email address is therefore available only in the AWS lambda. It is a good practice to also include on the contact form some invisible field which is empty by default. If a roving bot fills in the invisible field, the lambda function can then log the request, but not send the email. I like to hide my “bot trap” field using CSS display: none rather than setting the INPUT field hidden property to true. In order to keep browsers happy, the API Gateway also needs to be enabled to allow CORS. You can use all the API Gateway defaults, including a * wildcard for the Access-Control-Allow-Origin header value. (Remember CORS is not actually protecting your site; it’s protecting the browser which your visitors use to visit your site.)
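
The Lambda itself is not shown in the original, so here is a hedged sketch of the honeypot logic. Assumptions: API Gateway uses Lambda proxy integration (the form arrives as a JSON string in event.body, and with proxy integration the function's own response must carry the CORS header), the hidden field is named website, and the addresses come from hypothetical CONTACT_TO and CONTACT_FROM environment variables. Sending is injected so the logic can be tested; the request object follows the Amazon SES SendEmail parameter shape.

```javascript
// Sketch of a contact-form Lambda with a honeypot field. Field names and
// environment variables are hypothetical. In Lambda, sendEmail could wrap
// the AWS SDK v2, for example:
//   const AWS = require("aws-sdk");
//   const ses = new AWS.SES();
//   const sendEmail = params => ses.sendEmail(params).promise();

const CORS_HEADERS = { "Access-Control-Allow-Origin": "*" };

function respond(statusCode, message) {
  return { statusCode, headers: CORS_HEADERS, body: JSON.stringify({ message }) };
}

// Build an SES SendEmail request from the submitted form fields.
function buildEmailParams(form, toAddress, fromAddress) {
  const name = form.name || "anonymous";
  return {
    Source: fromAddress, // must be an SES-verified sender
    Destination: { ToAddresses: [toAddress] },
    ReplyToAddresses: [form.email],
    Message: {
      Subject: { Data: `Contact form: ${name}` },
      Body: { Text: { Data: `From: ${name} <${form.email}>\n\n${form.message}` } },
    },
  };
}

function makeHandler({
  sendEmail,
  toAddress = process.env.CONTACT_TO,
  fromAddress = process.env.CONTACT_FROM,
  log = console.log,
}) {
  return async event => {
    let form;
    try {
      form = JSON.parse(event.body || "{}") || {};
    } catch (err) {
      return respond(400, "Malformed request");
    }
    // Bot trap: the field is hidden with CSS, so any value means a bot.
    if (form.website) {
      log("Honeypot field filled; email suppressed");
      return respond(200, "Thanks!"); // reveal nothing to the bot
    }
    if (!form.email || !form.message) {
      return respond(400, "Email and message are required");
    }
    await sendEmail(buildEmailParams(form, toAddress, fromAddress));
    return respond(200, "Thanks!");
  };
}
```

Returning a normal success response to a trapped bot is a deliberate choice: the request is logged, but the bot gets no signal that its submission was discarded.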

Set Up an Appropriate Mechanism for Writing/Editing Blog Articles (Preferably in Markdown)

The approach described so far worked well for initial deployment, but I wanted to author/edit, build, and deploy from an AWS EC2 instance. This would provide excellent security and reasonable convenience at minimal cost. The EC2 instance needs to be much like the local virtual machine used for previous phases of this project. A slightly modified version of the original Ansible script was therefore used to provision the EC2 instance with LAMP, the GNOME desktop, local WordPress, Visual Studio Code, Gatsby, and the two Gatsby projects. A few steps had to be carried out manually after the script completed: the final graphical install of WordPress, import of the WordPress MySQL database used in previous phases of the project, and some NPM updates of Gatsby and React packages. The resulting instance can stay in the stopped state on AWS most of the time and be started whenever there is a need to modify any of the Gatsby project code, author/edit an article, or build and deploy to S3. An AWS t2.medium instance is robust enough to do anything associated with this project. However, the browser and Visual Studio Code displays rely on X11 forwarding, which lags badly over a t2.medium network connection. A trick to speed up X11 forwarding is to enable compression and use a simpler ssh cipher. This should not pose any serious security risks for this particular situation, although arcfour and blowfish-cbc are legacy ciphers that newer OpenSSH releases no longer support, so the trick depends on the ssh versions at both ends. The resulting command line for the ssh connection looks like:

[~/projects]$ ssh -A -XC4c arcfour,blowfish-cbc centos@18.237.112.228

(where of course the IP changes each time the EC2 instance is restarted). The resulting display lag is acceptable. Another alternative would be to set up a proxy on the AWS EC2 instance.

Much as I like Markdown, it is not the most mature technology in the world. Most IDE and word processor environments offer a good experience with Markdown. The previews (rendered HTML) look great and tend to be reasonably uniform across different environments, and the syntax highlighting of code blocks is wonderful. You might easily assume everything with Markdown is hunky-dory. You would be wrong! Markdown converters vary wildly. The main content always gets rendered to HTML; after that all bets are off. Sometimes the title gets rendered to HTML; sometimes not. Sometimes there is support for an article excerpt; sometimes not. Sometimes the article excerpt is converted to HTML; sometimes not. My blog index pages are always a serial progression of article excerpts, and I commonly use styling in my titles. Needless to say, Markdown is rather useless to me if the entire Markdown (including an excerpt) does not get converted to HTML. Ironically, Gatsby is one of the worst offenders in this regard. The main CyberSym Technologies site (cybersym.com) uses Gatsby with Markdown. When I was setting up that site, several times I had to use React Showdown to instantiate a Markdown converter in JavaScript code because a title or excerpt was not automatically converted to HTML by Gatsby.

The situation with Markdown code syntax highlighting is even more frustrating. Those wonderful previews you see in an IDE rely on the editor's own, largely proprietary styling. It is unlikely you can use the underlying HTML for anything. Some Markdown converters include highlighting; some do not. Moreover there are different highlighters. HighlightJS is by far the most common, but it does a crummy job with JSX and also makes strange assumptions about how you use <pre> and <code> tags. PrismJS is much better, but it is harder to find examples of how to use Prism. In the end I wrote my own gulpfile.js based on markdown-it and Prism to convert Markdown articles to HTML from within the Visual Studio Code IDE. My gulpfile is:

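var gulp = require('gulp');
var markdown = require('gulp-markdown-it');
var prism = require('prismjs');
var loadLanguages = require('prismjs/components/');

// Prism bundles only a few grammars by default; load the ones my articles use
loadLanguages(['php', 'jsx', 'bash', 'json', 'groovy', 'java',
  'graphql', 'cpp', 'csharp', 'ini', 'sql', 'python', 'less', 'yaml']);

// Convert every Markdown file (outside node_modules) to HTML beside its source
gulp.task('markdown', function() {
  return gulp.src(['**/*.md', '!node_modules/**'])
    .pipe(markdown({
      options: {
        html: true, // allow raw HTML inside the Markdown

        // markdown-it calls this for each fenced code block
        highlight: function (str, lang) {
          var prismLang = lang ? prism.languages[lang] : null;
          if (prismLang) {
            try {
              // Prism expects the language name as the third argument
              return prism.highlight(str, prismLang, lang);
            } catch (e) {
              // fall through to markdown-it's default escaping
            }
          }

          // An empty string tells markdown-it to escape the code itself
          return '';
        }
      }
    }))
    .pipe(gulp.dest(function(file) {
      return file.base;
    }));
});

// Re-run the conversion whenever a Markdown file changes
gulp.task('default', function() {
  return gulp.watch(['**/*.md', '!node_modules/**'], gulp.series('markdown'));
});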

You can see the corresponding prism.css and prism.js lines in the TemplateWrapper code earlier in this article. Those load Prism in the client browser, where prism.css supplies the colors for the token spans that Prism generated during the gulp conversion.

With these different pieces in place, the workflow for authoring a new article goes something like this:

  1. Draft the article in Markdown on a local machine using VS Code, Atom, Google Docs, etc.
  2. Convert the article to HTML in VS Code (with gulp)
  3. Copy the HTML into local WordPress in a browser on the AWS EC2 deploy instance
  4. Run gatsby develop or gatsby build on the EC2 deploy instance and examine the formatted article in an EC2 deploy browser
  5. Run npm run build && npm run deploy on the EC2 deploy instance to deploy the updated site

This procedure works well. It was used to author this article!

Concluding Thoughts

I am pretty happy with the way this project has turned out. It fixes many of the problems with the previous Joomla CMS. But Gatsby and Markdown are far from perfect. Hopefully they will both grow in productive directions.