Vocabulary Extractor

Paste any Japanese text and extract every unique word with its hiragana reading, dictionary form, and part of speech. Built for sentence mining, JLPT prep, and turning native content into a study-ready vocabulary list.

Paste a Japanese passage above and click Extract Vocabulary, or pick one of the example texts. The extractor returns a deduplicated table of every content word with its reading and dictionary form.

How the Vocabulary Extractor Works

Morphological Tokenization

The tool runs your text through kuromoji, an open-source Japanese morphological analyser that is also used for Japanese text search in Apache Lucene and Elasticsearch. It segments running text into individual tokens, tags each with its part of speech, and resolves inflected forms (食べました, 食べて, 食べない) back to a single base form (食べる).

Smart Deduplication

Words are deduplicated by base form, so 行きます and 行った count as one entry (行く) rather than two. The frequency column shows how many times each lemma appeared in the input, which is a useful signal for what to study first.
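
As a rough illustration of deduplicating by base form, the sketch below counts lemmas over a hand-written token list. The field names (`surface`, `base`) and the sample tokens are assumptions made for this example; they are not the tool's actual data structures or kuromoji's real output.

```python
from collections import Counter

# Hypothetical tokens, shaped like what a morphological analyser might return.
# Auxiliary tokens such as ます and た are left out to keep the example short.
tokens = [
    {"surface": "行き", "base": "行く"},    # from 行きます
    {"surface": "行っ", "base": "行く"},    # from 行った
    {"surface": "食べ", "base": "食べる"},  # from 食べました
]

def count_lemmas(tokens):
    """Count how often each base form occurs, ignoring inflection."""
    return Counter(t["base"] for t in tokens)

counts = count_lemmas(tokens)
# Most frequent lemmas first, as (lemma, count) pairs.
ranked = counts.most_common()
```

Keying on the base form rather than the surface form is what collapses the two inflections of 行く into a single row with a count of two.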

Particle Filtering

Particles (は, を, が, に, で …), punctuation, and symbols are filtered out automatically. The table contains only content words — nouns, verbs, adjectives, and adverbs — so your sentence-mining export stays clean and high-signal.
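
One way to picture the filter is an allow-list of content-word categories. This is a minimal sketch under assumptions: the English part-of-speech labels and the `pos` field are hypothetical, and a real analyser such as kuromoji with an IPADIC dictionary reports Japanese tags (for example 名詞 or 助詞) instead.

```python
# Hypothetical part-of-speech labels chosen for readability.
CONTENT_POS = {"noun", "verb", "adjective", "adverb"}

def content_words(tokens):
    """Keep only tokens whose part of speech is on the allow-list."""
    return [t for t in tokens if t["pos"] in CONTENT_POS]

# Hand-written tokens for 猫が魚を食べる。
tokens = [
    {"surface": "猫", "pos": "noun"},
    {"surface": "が", "pos": "particle"},
    {"surface": "魚", "pos": "noun"},
    {"surface": "を", "pos": "particle"},
    {"surface": "食べる", "pos": "verb"},
    {"surface": "。", "pos": "symbol"},
]

kept = [t["surface"] for t in content_words(tokens)]
```

An allow-list drops any tag it does not recognise, which errs on the side of a cleaner table; a deny-list of particles and symbols would be the alternative if you would rather keep unknown categories.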

Frequently Asked Questions

What is vocabulary extraction and who is it for?

Vocabulary extraction is the process of taking a Japanese sentence or paragraph and pulling out every unique word — together with its reading, dictionary form (lemma), and part of speech — so you can study the words instead of re-reading the same text. It is the core technique behind sentence mining, the study method made popular by polyglots and JLPT learners who build their vocabulary list from native material rather than from a generic textbook word list. Paste a sentence from a manga panel, a news headline, a YouTube subtitle, or a textbook reading, and the extractor returns a clean study-ready table.

How does the tokenizer work — is it the same engine as IMEs?

Not exactly. The extractor uses kuromoji, an open-source Japanese morphological analyser that is widely used for search indexing (for example in Apache Lucene and Elasticsearch) and in many dictionary and learning apps. IMEs solve a related segmentation problem but typically ship their own conversion engines. Kuromoji segments the input into tokens and attaches to each one a reading in katakana (which the tool converts to hiragana for readability), a base form (lemma), and a part-of-speech tag. The engine handles inflected verbs (食べました → 食べる), conjugated adjectives (寒かった → 寒い), and compound forms accurately for everyday modern Japanese. Rare, archaic, or brand-new slang may still produce a less precise lemma; the table marks those entries so you can verify them in the dictionary before adding them to your study list.
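
The katakana-to-hiragana step mentioned above can be sketched with plain code-point arithmetic. This is an illustrative version, not necessarily how the tool does it: in Unicode, the katakana ァ to ヶ (U+30A1 to U+30F6) sit exactly 0x60 above their hiragana counterparts, so shifting that range is enough for ordinary readings. The function name is hypothetical.

```python
def katakana_to_hiragana(text):
    """Convert katakana in `text` to hiragana, leaving other characters alone."""
    out = []
    for ch in text:
        code = ord(ch)
        # ァ (U+30A1) .. ヶ (U+30F6) map onto ぁ (U+3041) .. ゖ (U+3096).
        if 0x30A1 <= code <= 0x30F6:
            out.append(chr(code - 0x60))
        else:
            # The long-vowel mark ー, kanji, and punctuation pass through unchanged.
            out.append(ch)
    return "".join(out)
```

For example, `katakana_to_hiragana("タベル")` gives `たべる`, while the long-vowel mark in `コーヒー` is preserved, giving `こーひー`.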

What is sentence mining and how do I use this tool for it?

Sentence mining is a study technique where you collect i+1 sentences (sentences where you understand everything except one word) and turn each unknown word into a flashcard. The classic workflow is: read native content, find a sentence with one new word, extract that word into Anki with its reading and meaning, then drill it. This extractor speeds up the extraction step: paste the sentence, scan the table, identify the word you do not know, and export the row to your study list. Doing this consistently with 5–10 sentences a day is an efficient way to progress from N3 toward N1, because every word you study is one you have already encountered in context.

Can I export the vocabulary list to Anki or a spreadsheet?

Yes. Click the "Copy CSV" button and the entire table is copied to your clipboard with the columns word,reading,base,pos. Paste it into a spreadsheet (Excel, Google Sheets, Numbers), or save it as a .csv file and import it into Anki via File → Import. From there you can add your own English meanings, example sentences, or audio. The CSV format also pastes cleanly into Notion, Obsidian, and most flashcard apps that accept tabular imports. Tip: combine this with the Japanese Dictionary tool to look up English meanings for any row that needs more context.
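
If you want to post-process or regenerate the export yourself, the column layout is easy to reproduce. The sketch below is an assumption-laden illustration: the row dictionaries, the English `pos` value, and the presence of a header row are all hypothetical, since the page only states the column order word,reading,base,pos.

```python
import csv
import io

COLUMNS = ["word", "reading", "base", "pos"]

def rows_to_csv(rows):
    """Serialise vocabulary rows to CSV text in word,reading,base,pos order."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)  # header row is an assumption, drop it if unwanted
    for row in rows:
        writer.writerow([row[name] for name in COLUMNS])
    return buf.getvalue()

rows = [
    {"word": "食べました", "reading": "たべました", "base": "食べる", "pos": "verb"},
]
csv_text = rows_to_csv(rows)
```

Using the `csv` module rather than joining strings by hand means any field that happens to contain a comma or a quote is escaped correctly.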

Does the extractor show JLPT level for each word?

Not directly in the extractor table — JLPT level data is shown in the Japanese Dictionary and Vocabulary Explorer tools (both linked below) where each word has a dedicated entry with its JLPT band, example sentences, and related vocabulary. The extractor focuses on speed: it identifies the word, gives you the reading and lemma so you can search the dictionary, and lets you triage 20–50 words from a passage in seconds. To check JLPT level for a specific word, click into the Japanese Dictionary tool and search by the base form returned here.

Why are particles and punctuation skipped from the output?

Particles like は, を, が, に, で are grammar function words rather than vocabulary you "learn" the way you learn nouns or verbs. Including them in a sentence-mining export would flood your study list with the same 10 particles in every sentence and dilute the signal. The extractor filters them automatically, along with punctuation and symbols, so the table contains only content words — nouns, verbs, adjectives, adverbs, and proper nouns. If you want to study particle usage specifically, use the Particle Quiz tool linked below instead.

How long can the input text be?

The tool accepts up to about 2,000 characters per run — roughly a long paragraph or a short news article. Inside that limit the tokenizer runs in well under a second on a typical broadband connection. For longer documents (chapters, full articles), split the text into 1–2 paragraph chunks and run the extractor several times — this also produces more manageable study batches of 20–40 unique words rather than 200+ at once. The character counter under the textarea shows you exactly how much of the budget you have used.
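
For longer documents, the splitting can be done mechanically. The following is a minimal sketch, assuming sentences end in 。, ！, or ？ and that the limit is 2,000 characters; the function name and the greedy packing strategy are choices made for this example rather than anything the tool provides.

```python
import re

def split_into_chunks(text, limit=2000):
    """Greedily pack whole sentences into chunks of at most `limit` characters."""
    # Split after sentence-ending punctuation, keeping the punctuation attached.
    sentences = [s for s in re.split(r"(?<=[。！？])", text) if s]
    chunks = []
    current = ""
    for s in sentences:
        # Start a new chunk if this sentence would overflow the current one.
        if current and len(current) + len(s) > limit:
            chunks.append(current)
            current = ""
        # A single sentence longer than the limit is split at the limit.
        while len(s) > limit:
            chunks.append(s[:limit])
            s = s[limit:]
        current += s
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk fits in one run of the extractor, and joining the chunks back together reproduces the original text. Use a smaller `limit` if you prefer the 1–2 paragraph batches suggested above.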

Build your N3 vocabulary the smart way

Combine the extractor with a structured JLPT N3 vocabulary list. Review every word with example sentences, reading drills, and spaced-repetition-ready exports — all free.

Open the N3 Vocabulary List