Web page visual history

Myles and I worked on a set of microservices that track visual changes to web pages. We used it as a demo to showcase the launch of Node.js on the App Engine standard environment at Google I/O 2018.

Why?

The use case is to automatically watch websites for visual changes. I needed such a tool to track the latest Radiohead concert dates. It can also be used to catch visual regressions on landing pages, for example.

How?

Every 5 minutes, the task-scheduler microservice runs and queries the Cloud Datastore database for webpages to screenshot. For each webpage, it sends a message to a Pub/Sub topic. Pub/Sub then pushes each message to the screenshot microservice. The screenshot service receives the message as a regular HTTP POST request, takes a screenshot of the requested webpage, and stores the result in a Cloud Storage bucket. The creation of a new file in the bucket triggers a Cloud Function (image-diff). The image-diff function compares the new image with a reference image from the references folder. If a difference is found, it stores the image in a keyframe folder and updates the reference image.
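Two steps of this pipeline are easy to sketch in a few lines: decoding the Pub/Sub push request, and deciding whether a new screenshot becomes a keyframe. The code below is only an illustration, not the demo's actual code. Pub/Sub push delivery does wrap the published payload as base64 in `message.data`, but the `{ url }` payload shape, the function names, the key layout, and the `Map` standing in for the Cloud Storage bucket are all assumptions. The comparison here is byte-for-byte, whereas a real image diff would compare decoded pixels, and treating a missing reference as a change is also an assumption.

```javascript
// Decode the JSON body of a Pub/Sub push request.
// Pub/Sub delivers the published payload base64-encoded in message.data.
// The { url } payload shape is an assumption made for this sketch.
function parsePushBody(body) {
  if (!body || !body.message || typeof body.message.data !== 'string') {
    throw new Error('Not a Pub/Sub push body');
  }
  const json = Buffer.from(body.message.data, 'base64').toString('utf8');
  return JSON.parse(json);
}

// Decide whether a new screenshot is a keyframe and record it if so.
// `bucket` is a Map standing in for the Cloud Storage bucket (hypothetical);
// the keys mimic a references folder and a keyframe folder.
function recordScreenshot(bucket, pageId, timestamp, image) {
  const referenceKey = `references/${pageId}.png`;
  const reference = bucket.get(referenceKey);

  // Byte-for-byte comparison of two Buffers; a missing reference
  // counts as a change so the first screenshot becomes a keyframe.
  const changed = !reference || !reference.equals(image);

  if (changed) {
    bucket.set(`keyframes/${pageId}/${timestamp}.png`, image);
    bucket.set(referenceKey, image);
  }
  return changed;
}
```

Calling `recordScreenshot` twice with identical image bytes stores a keyframe only the first time; a later call with different bytes stores a second keyframe and replaces the reference.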

Architecture of the web page visual history demo

The frontend service is a web frontend for browsing the data. Its main page lists the currently tracked webpages, and clicking on a webpage shows all the saved keyframes for it. Users can start tracking a new webpage by entering its URL, which adds a new entity to the Cloud Datastore database.
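Adding a page comes down to validating the submitted URL and building an entity to save. Here is a minimal sketch of that step, assuming a hypothetical `Webpage` kind with `url` and `createdAt` properties; the demo's actual schema may differ, and the call that saves the entity to Cloud Datastore is left out.

```javascript
// Build the entity for a newly tracked webpage from user input.
// The kind name and property names are assumptions for illustration.
function buildWebpageEntity(input, now = new Date()) {
  // The WHATWG URL constructor throws a TypeError on an invalid URL.
  const url = new URL(input.trim());

  // Only pages reachable over HTTP(S) can be screenshotted.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error('Only http and https URLs can be tracked');
  }

  // url.href is the normalized form, e.g. with a trailing slash
  // added to a bare origin.
  return { kind: 'Webpage', data: { url: url.href, createdAt: now } };
}
```

Storing the normalized `href` rather than the raw input means that `https://example.com` and `https://example.com/` end up as the same tracked URL.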

To recap:

Give it a try

Find the code on GitHub and watch me demo it on stage: