For my diving project, I will need a small fish database.
- I don’t want an exhaustive database (the most common fishes will be enough).
- I don’t want to replicate an existing database: if one exists, I should pull data from it.
- If my database is built collaboratively, contributing to it should be easy.
- I’m a Wikipedia enthusiast, and I want people to contribute to Wikipedia.
- My website is internationalized; however, I want each fish to be shared between languages.
So I think a great system would be:
- Is the fish in my database? If yes, use it.
- If not, look for its Wikipedia page. If it exists, create the fish in my database from that Wikipedia URL, then go back to the first step.
- If the Wikipedia page doesn’t exist, create it, then go back to the second step.
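The three steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Fish` record, the in-memory `database` dictionary and the `find_wikipedia_url` callback are hypothetical names, and the third step (writing the missing Wikipedia page) stays a manual task, so the function simply returns `None` in that case.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Fish:
    # Hypothetical local record: only the name and the Wikipedia URL are stored.
    name: str
    wikipedia_url: str


def get_or_create_fish(
    name: str,
    database: Dict[str, Fish],
    find_wikipedia_url: Callable[[str], Optional[str]],
) -> Optional[Fish]:
    # Step 1: the fish is already in the local database, so use it.
    if name in database:
        return database[name]

    # Step 2: look for its Wikipedia page and create the local record from the URL.
    url = find_wikipedia_url(name)
    if url is not None:
        database[name] = Fish(name=name, wikipedia_url=url)
        return database[name]

    # Step 3: no page exists yet. Someone has to write it on Wikipedia first,
    # so return None and let the caller retry once the page is there.
    return None


# Illustrative data only: a fake "Wikipedia" made of one known page.
known_pages = {"Clownfish": "https://en.wikipedia.org/wiki/Clownfish"}
local_db: Dict[str, Fish] = {}
clownfish = get_or_create_fish("Clownfish", local_db, known_pages.get)
```

Passing the lookup as a callback keeps the sketch independent of how the Wikipedia page is actually found, which is still an open question at this point.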
Let’s analyse this system:
- It enriches Wikipedia.
- It does not duplicate information.
- A lot of data already exists.
- Drawback: Wikipedia pages are free text and do not provide an easily callable API for entity properties (but fish pages contain a Taxobox, which would make parsing easier).
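To get a feel for how a Taxobox could be parsed, here is a rough sketch. It assumes the box appears in the page source as a `{{Taxobox ... }}` template with one `| key = value` field per line and a closing `}}` on its own line; real pages may use other layouts or nested templates, which this does not handle. The function name `parse_taxobox` and the sample text are made up for the example.

```python
import re
from typing import Dict

# Matches one template field written as "| key = value" on its own line.
FIELD = re.compile(r"^\s*\|\s*(\w+)\s*=\s*(.*?)\s*$")


def parse_taxobox(wikitext: str) -> Dict[str, str]:
    """Collect the key/value fields of the first Taxobox found in the text."""
    fields: Dict[str, str] = {}
    inside = False
    for line in wikitext.splitlines():
        stripped = line.strip()
        if not inside:
            # Skip everything until the template opens.
            if stripped.lower().startswith("{{taxobox"):
                inside = True
            continue
        if stripped == "}}":
            # Assumed end of the template: a closing "}}" alone on a line.
            break
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


# Illustrative sample, not copied from an actual Wikipedia page.
SAMPLE = """Intro text before the box.
{{Taxobox
| name = Ocellaris clownfish
| genus = Amphiprion
| species = A. ocellaris
}}
Some article text."""

taxobox = parse_taxobox(SAMPLE)
```

With this sample, `taxobox["genus"]` holds the string `Amphiprion`, and the text outside the template is ignored.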
So far, I have to investigate:
- Wikipedia: how do I get the page content?
- Freebase: is it overkill?
- DBpedia: does it have an API? Does it cover fishes?
- fishbase.org: what is the license? Is there an API?
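For the first question, one possible route is the MediaWiki web API that Wikipedia exposes at `/w/api.php`, which can return the raw wikitext of a page as JSON. The sketch below assumes the English Wikipedia endpoint and the `formatversion=2` response layout. The helper names and the `User-Agent` string are hypothetical, and nothing is fetched until `fetch_wikitext` is called.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def build_query_url(title: str) -> str:
    """Build a query asking for the latest revision content of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return API + "?" + urlencode(params)


def extract_wikitext(response: dict) -> Optional[str]:
    """Pull the wikitext out of a decoded response, or None if the page is missing."""
    pages = response.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing"):
        return None
    revisions = pages[0].get("revisions", [])
    if not revisions:
        return None
    return revisions[0]["slots"]["main"]["content"]


def fetch_wikitext(title: str) -> Optional[str]:
    """Perform the HTTP request (needs network access)."""
    request = Request(
        build_query_url(title),
        headers={"User-Agent": "fish-db-sketch/0.1"},  # hypothetical client name
    )
    with urlopen(request) as reply:
        return extract_wikitext(json.load(reply))
```

A call such as `fetch_wikitext("Clownfish")` would then return the page source as a string, or `None` when the page does not exist, which is the case where the page has to be created first.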
And if I use a third-party website (such as Freebase or DBpedia), one problem is that I would have to wait for their database to be updated.