
Make Me a Hanzi

Free, open-source Chinese character data


Make Me a Hanzi annotation tool

Make Me a Hanzi Demo

New: No more cut-off strokes (thanks to @chanind)!

Make Me a Hanzi provides dictionary and graphical data for over 9000 of the most common simplified and traditional Chinese characters. Among other things, this data includes stroke-order vector graphics for all of these characters. You can see the project output at the demo site, where you can look up characters by drawing them. You can also download the data for use in your own site or app.

See the project site for general information and updates on the project.

Make Me a Hanzi data is split into two data files, dictionary.txt and graphics.txt, because the sources that the files are derived from have different licenses. In addition, we provide an experimental tarball of animated SVGs, svgs.tar.gz, which is licensed the same way as graphics.txt. See the Sources section and the COPYING file for more information.

Sources

This project would not have been possible without the generosity of Arphic Technology, a Taiwanese font forge that released their work under a permissive license in 1999.

In addition, I would like to thank Gábor Ugray for his thoughtful advice on the project and for verifying stroke data for most of the traditional characters in the two data sets. Gábor maintains Zydeo, a free and open-source Chinese dictionary.

Format

Both dictionary.txt and graphics.txt are '\n'-separated lists of lines, where each line is a JSON object. They differ in which keys are present, but the common key, 'character', can be used to join the two data sets. You can also rely on the fact that the two files will always come in the same order.
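As a rough illustration of the format described above, here is a minimal Python sketch that reads both files and joins them on the 'character' key. The helper names `load_json_lines` and `join_on_character` are hypothetical, and the file paths assume both files sit in the current working directory. The only key the sketch relies on is 'character'; it treats every other key as opaque.

```python
import json
from pathlib import Path


def load_json_lines(path):
    """Parse a file holding one JSON object per line, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def join_on_character(dictionary_rows, graphics_rows):
    """Pair rows from the two data sets using the shared 'character' key."""
    graphics_by_char = {row["character"]: row for row in graphics_rows}
    joined = {}
    for row in dictionary_rows:
        char = row["character"]
        # Keep only characters that appear in both data sets.
        if char in graphics_by_char:
            joined[char] = (row, graphics_by_char[char])
    return joined


if __name__ == "__main__":
    # Assumption: both files were downloaded into the working directory.
    dictionary_path = Path("dictionary.txt")
    graphics_path = Path("graphics.txt")
    if dictionary_path.exists() and graphics_path.exists():
        joined = join_on_character(
            load_json_lines(dictionary_path),
            load_json_lines(graphics_path),
        )
        print(f"Joined {len(joined)} characters")
```

Joining through a lookup table rather than zipping the two files line by line means the sketch does not depend on the shared ordering, though that ordering would allow a simpler streaming join if memory is a concern.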

dictionary.txt keys:

graphics.txt keys:

TODOs and Future Work

There are quite a few clients using the Make Me a Hanzi data. Many of them have had to do additional preprocessing of it for their use case. If you think this data might be useful to you, please feel free to contact me by email. I may be able to give tips or suggest algorithms for making use of it.