Vocabulary Extractor
Introduction
Vocabulary Extractor is a program that splits text into individual words and summarizes information about each unique word. The results are presented as a tab-delimited table, so they can be copied and pasted directly into a spreadsheet program such as Excel.
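As a rough illustration of the idea, the sketch below counts unique words and prints them as tab-delimited rows. It is not the program's own code: the function names are hypothetical, and the regular-expression split is an assumption that only suits space-separated text. The actual program may segment words differently, for example by dictionary lookup.

```python
import re
from collections import Counter

def summarize_words(text):
    """Return (word, count) rows, most frequent first, ties in alphabetical order."""
    # Assumption: a word is a run of letters or digits. This is a simplification
    # and not necessarily how Vocabulary Extractor segments text.
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def to_tab_delimited(rows, header=("word", "count")):
    """Join a header and rows into tab-delimited text ready to paste into a spreadsheet."""
    lines = ["\t".join(header)]
    lines += ["\t".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)

rows = summarize_words("The cat saw the other cat.")
table = to_tab_delimited(rows)
print(table)
```

Pasting the printed text into a spreadsheet places each word and its count in separate columns, because spreadsheet programs treat tabs as column separators.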
The program can be extended in three ways: dictionaries, extra columns, and filtered words. Dictionaries can be changed by adding files to certain directories. The distribution includes copies of CC-CEDICT and VNEDICT, but alternative dictionaries can be used as replacements or in combination with them.
The word summary produced by text analysis can be extended by adding extra word data files, whose contents are incorporated into the output as extra columns.
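The extra-column idea can be sketched as a lookup that appends one value per word. The file layout assumed here (one `word<TAB>value` pair per line) and the function names are hypothetical; the Help document describes the format the program actually expects.

```python
def load_word_data(lines):
    """Parse 'word<TAB>value' lines into a dict (assumed, hypothetical format)."""
    data = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        word, value = line.split("\t", 1)
        data[word] = value
    return data

def add_column(rows, word_data, missing=""):
    """Append the looked-up value to each row; words without data get `missing`."""
    return [tuple(row) + (word_data.get(row[0], missing),) for row in rows]

# Example data: a made-up "level" column for two words.
levels = load_word_data(["你好\t1\n", "学习\t1\n"])
extended = add_column([("你好", 3), ("电脑", 1)], levels)
```

Words that do not appear in the data file keep an empty cell, so every row still has the same number of columns.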
To filter words out of the output (for example, to eliminate words already learned), add word lists; any word that matches an entry in a list is removed from the output.
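Filtering against a word list amounts to a set-membership test. The sketch below assumes a list with one word per line; that layout and the function names are illustrative only, not the program's documented behavior.

```python
def load_word_list(lines):
    """Collect non-empty lines into a set of words (assumed: one word per line)."""
    return {line.strip() for line in lines if line.strip()}

def filter_known(rows, known_words):
    """Keep only rows whose first cell (the word) is not in the known-word set."""
    return [row for row in rows if row[0] not in known_words]

known = load_word_list(["你好\n", "\n", "学习\n"])
remaining = filter_known([("你好", 3), ("电脑", 1)], known)
```

Using a set keeps each lookup fast even when the list of learned words grows large.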
Download
Source code
This project is hosted on GitHub, and the source tree can be cloned using Git tools.
Windows
Current version: Vocabulary_Extractor_0.9.0-Windows.zip (2026-04-30)
Linux
On Linux, the program can be run as a Python 3 script. See the Linux instructions, which should work reliably on modern Linux distributions with Python 3 installed.
Help
See the Help document for more details.
