At a quick glance, the difference is not obvious to a human, so how does the machine perform? It turns out it does pretty well. Check the results in this gallery:
(also find the album on imgur)
In almost every set, one tile is completely wrong, but the rest at least land in the right category. Overall, I am really surprised by how well it performs.
Technically, it is built entirely in the browser. There is no server-side component, except for what’s behind the API of course:
- Images are loaded from presets or via the browser’s File API.
- Each tile is drawn into its own image and encoded as base64.
- All the tiles are sent in a single request to the Google Cloud Vision API, asking for label detection results (this is what matters to us here, even though the API can do much more, like face detection, OCR, landmark detection…).
- Only the label with the highest score is kept from the results and printed back into the main canvas.
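
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code behind the demo: the helper names (`stripDataUrlPrefix`, `buildAnnotateRequest`, `topLabel`) are hypothetical, and the tiles are assumed to arrive as data URLs such as the ones `canvas.toDataURL()` produces. Only the request and response fields used here are modeled.

```typescript
// Only the response fields this sketch needs.
interface LabelAnnotation {
  description: string;
  score: number;
}

interface AnnotateImageResponse {
  labelAnnotations?: LabelAnnotation[];
}

interface AnnotateRequestBody {
  requests: {
    image: { content: string };
    features: { type: "LABEL_DETECTION"; maxResults: number }[];
  }[];
}

// A data URL looks like "data:image/png;base64,<payload>".
// The API expects the bare base64 payload, so drop everything up to the comma.
function stripDataUrlPrefix(dataUrl: string): string {
  const comma = dataUrl.indexOf(",");
  return dataUrl.indexOf("data:") === 0 && comma !== -1
    ? dataUrl.slice(comma + 1)
    : dataUrl;
}

// Build one batch body: one entry per tile, each asking for label detection.
function buildAnnotateRequest(
  tiles: string[],
  maxResults: number = 5
): AnnotateRequestBody {
  return {
    requests: tiles.map((tile) => ({
      image: { content: stripDataUrlPrefix(tile) },
      features: [{ type: "LABEL_DETECTION" as const, maxResults }],
    })),
  };
}

// Keep only the highest-scoring label for a tile, or null if none came back.
function topLabel(response: AnnotateImageResponse): LabelAnnotation | null {
  const labels = response.labelAnnotations || [];
  if (labels.length === 0) return null;
  return labels.reduce((best, l) => (l.score > best.score ? l : best));
}
```

The body returned by `buildAnnotateRequest` would be serialized with `JSON.stringify` and sent as a POST to `https://vision.googleapis.com/v1/images:annotate` with an API key. The `responses` array in the reply lines up with the tiles in the order they were sent, so `topLabel` can be applied to each entry before drawing the text onto the canvas.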