Web page visual history
Myles and I worked on a set of microservices that track visual changes to web pages. We used them as a demo to showcase the launch of Node.js on the App Engine standard environment at Google I/O 2018.
Why?
The use case is to automatically watch websites for visual changes. I needed such a tool to track the latest Radiohead concert dates. It can also be used, for example, to detect visual regressions on landing pages.
How?
Every 5 minutes, the task-scheduler microservice runs and queries the Cloud Datastore database for webpages to screenshot. For each webpage, it sends a message to a Pub/Sub topic. A push subscription on this topic delivers these messages to the screenshot microservice.

The screenshot service receives each message as a regular HTTP POST request, takes a screenshot of the requested webpage, and stores the result in a Cloud Storage bucket. The creation of a new file in the bucket triggers a Cloud Function (image-diff).

The image-diff function compares the new image with a reference image from the references folder. If a difference is found, it stores the image in a keyframe folder and updates the reference image.
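The hand-off between task-scheduler and screenshot comes down to a small message contract. The TypeScript sketch below, using only Node.js built-ins, shows one plausible version: the scheduler turns each webpage into a JSON payload, and the screenshot service decodes the JSON body that a Pub/Sub push subscription POSTs to it. The url field, the function names and the subscription name are assumptions for illustration, not the project's actual code. The envelope shape (a base64-encoded data field inside message) follows the standard Pub/Sub push format.

```typescript
// Sketch of the task-scheduler -> Pub/Sub -> screenshot message contract.
// Assumption: the payload is JSON of the form {"url": "..."}; the field name is hypothetical.

interface Webpage {
  url: string;
}

// Body that a Pub/Sub push subscription POSTs to the subscribed endpoint.
interface PushEnvelope {
  message: {
    data: string; // payload, base64-encoded by Pub/Sub
    messageId: string;
    attributes?: Record<string, string>;
  };
  subscription: string;
}

// task-scheduler side: one payload per webpage returned by the Datastore query.
// Each Buffer would then be handed to a Pub/Sub client for publishing (not shown).
function buildMessages(pages: Webpage[]): Buffer[] {
  return pages.map((page) => Buffer.from(JSON.stringify({ url: page.url }), "utf8"));
}

// screenshot side: recover the webpage from the POST body of a push request.
function parsePushBody(body: PushEnvelope): Webpage {
  const json = Buffer.from(body.message.data, "base64").toString("utf8");
  const payload = JSON.parse(json) as { url?: unknown } | null;
  const url = payload?.url;
  if (typeof url !== "string") {
    throw new Error(`message ${body.message.messageId} has no url field`);
  }
  return { url };
}

// Simulate one delivery: Pub/Sub base64-encodes the published bytes into message.data.
const [published] = buildMessages([{ url: "https://example.com/" }]);
const delivered: PushEnvelope = {
  message: { data: published.toString("base64"), messageId: "1" },
  subscription: "projects/my-project/subscriptions/screenshot-push", // hypothetical name
};
console.log(parsePushBody(delivered).url); // prints "https://example.com/"
```

In the screenshot service, parsePushBody would be called from the POST route handler. Returning a 2xx status acknowledges the message, while an error status makes Pub/Sub redeliver it later, so a failed screenshot is retried without extra bookkeeping.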
The frontend service is a web frontend for browsing the data. Its main page lists the currently tracked webpages; clicking a webpage shows all the saved keyframes for it. Users can start tracking a new webpage by entering its URL, which adds a new entity to the Cloud Datastore database.
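Adding a webpage from the frontend amounts to validating the typed URL and building an entity. A minimal sketch, assuming a hypothetical entity with url and createdAt properties (the project's actual schema may differ) and assuming that a bare hostname should default to https:

```typescript
// Hypothetical shape of a tracked-webpage entity; the real schema may differ.
interface WebpageEntity {
  url: string;
  createdAt: Date;
}

// Turn the text a user typed into a new entity, rejecting anything that is not http(s).
function newWebpageEntity(input: string, now: Date = new Date()): WebpageEntity {
  const trimmed = input.trim();
  // Assumption: a bare hostname such as "example.com" means https://example.com.
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;
  const parsed = new URL(withScheme); // throws a TypeError on malformed input
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${parsed.protocol}`);
  }
  // href is the normalized form, e.g. a trailing "/" is added to an empty path.
  return { url: parsed.href, createdAt: now };
}

console.log(newWebpageEntity("example.com").url); // prints "https://example.com/"
```

Normalizing through the URL class means two spellings of the same page (with or without the trailing slash) end up as the same stored string. With the @google-cloud/datastore client, persisting the entity would then be a datastore.save({ key, data }) call with a key such as datastore.key("Webpage"); the kind name Webpage is an assumption.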
To recap:
frontend: A frontend to visualize which websites are tracked and see their screenshots.
task-scheduler: Every 5 minutes, looks for screenshots to take and schedules them as tasks.
screenshot: Takes a screenshot of the given URL and stores it in Cloud Storage.
image-diff: Compares an image with its reference image in Cloud Storage.
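The comparison done by image-diff can be reduced to counting the pixels that differ between the new screenshot and its reference. The sketch below assumes both PNGs have already been downloaded from Cloud Storage and decoded into RGBA byte arrays (decoding is left out). It uses a hypothetical layout for the references and keyframe folders, references/<pageId>.png and keyframes/<pageId>/<fileName>. Treating a missing reference or a size change as a change is a design choice of this sketch, not necessarily the project's behavior.

```typescript
// A decoded screenshot: width x height pixels, 4 bytes (R, G, B, A) per pixel.
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8Array;
}

// Fraction of pixels in which at least one channel differs by more than `tolerance`.
function changedPixelRatio(a: RgbaImage, b: RgbaImage, tolerance = 0): number {
  if (a.width !== b.width || a.height !== b.height) {
    return 1; // a size change counts as a full change
  }
  const pixels = a.width * a.height;
  if (pixels === 0) {
    return 0;
  }
  let changed = 0;
  for (let i = 0; i < pixels * 4; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a.data[i + c] - b.data[i + c]) > tolerance) {
        changed++;
        break; // count each pixel at most once
      }
    }
  }
  return changed / pixels;
}

interface DiffResult {
  isKeyframe: boolean;
  keyframePath?: string; // where the new screenshot should be copied
  referencePath?: string; // reference image to overwrite with the new screenshot
}

// Decide what image-diff should do with a freshly uploaded screenshot.
// Folder layout is hypothetical: references/<pageId>.png, keyframes/<pageId>/<fileName>.
function diffScreenshot(
  pageId: string,
  fileName: string,
  shot: RgbaImage,
  reference: RgbaImage | null,
  threshold = 0, // 0 means "any changed pixel"; a larger value would tolerate small noise
): DiffResult {
  // No reference yet (first screenshot of the page): treat it as the first keyframe.
  const ratio = reference === null ? 1 : changedPixelRatio(shot, reference);
  if (ratio <= threshold) {
    return { isKeyframe: false };
  }
  return {
    isKeyframe: true,
    keyframePath: `keyframes/${pageId}/${fileName}`,
    referencePath: `references/${pageId}.png`,
  };
}
```

When isKeyframe is set, the Cloud Function would copy the new file to both returned paths; when nothing changed, it can simply return. Keeping the decision as a pure function like this makes it testable without touching Cloud Storage.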
Give it a try
Find the code on GitHub and watch me demo it on stage: