Evaluate text difficulty by tracking character counts and HSK proficiency levels. Identify unique characters, view frequency rankings, and generate pinyin pronunciations. This tool helps students monitor their progress, teachers assess reading materials, and content creators align their writing with specific learner levels.
To begin, paste your Chinese text into the input field and click "Count Characters." The tool will provide a detailed breakdown of character distribution, HSK levels, and frequency. All processing occurs locally in your browser for total privacy.
You can type or paste any Chinese text into the analysis area. The tool supports both simplified and traditional characters and can handle mixed-language content (Chinese and English). Use the "Clear Text" button to remove the sample content before adding your own.
HSK 1–2 (Foundation): Includes approximately 300 characters that typically account for 60–70% of everyday text. Common examples include 我 (wǒ, I), 你 (nǐ, you), and 是 (shì, to be). Texts primarily composed of these characters are ideal for beginners.
HSK 3–4 (Intermediate): Covers roughly 1,200 characters. At this level, learners can navigate daily life and workplace communications. These texts are generally suitable for students with one to two years of study, allowing them to read news articles with limited dictionary assistance.
HSK 5–6 (Advanced): In the pre-2021 HSK lists, these levels span 2,500 to 5,000 words, written with roughly 1,700 to 2,700 distinct characters. These levels enable the reading of native-level materials, including literature and academic papers. A high concentration of HSK 5–6 characters indicates sophisticated content that requires advanced proficiency.
Unclassified Characters: This category includes rare characters, specific names, and specialized terminology not found in standard HSK vocabulary. A high "unclassified" count may suggest the text is highly technical or uses classical Chinese elements.
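As a rough illustration of how the level breakdown above could be computed, the Python sketch below tallies characters by HSK band. The HSK_BAND table, the band labels, and the function names are assumptions made for this example, not the tool's actual code. The table holds only the three sample characters mentioned above, whereas a real analyzer would load complete level lists.

```python
from collections import Counter

# Hypothetical, heavily truncated lookup table (illustration only).
# A real analyzer would load full HSK character lists here.
HSK_BAND = {"我": "HSK 1-2", "你": "HSK 1-2", "是": "HSK 1-2"}


def is_hanzi(ch: str) -> bool:
    """True for characters in the main CJK ideograph blocks (a simplification)."""
    return "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"


def hsk_distribution(text: str) -> Counter:
    """Count Chinese characters per band; unknown characters are 'Unclassified'."""
    counts = Counter()
    for ch in text:
        if is_hanzi(ch):
            counts[HSK_BAND.get(ch, "Unclassified")] += 1
    return counts


print(hsk_distribution("我是猫, you know"))
```

Latin letters, spaces, and punctuation are skipped, so only Hanzi contribute to the distribution.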
The tool generates a list of the 20 most frequent characters in your text. While Chinese contains over 50,000 characters, the top 500 account for approximately 75% of most written material, and the top 1,000 cover about 89%. Focusing on high-frequency characters is therefore one of the most efficient ways to improve reading comprehension.
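A frequency list like the one described above can be approximated in a few lines of Python. This is a minimal sketch, assuming that only Hanzi are ranked; the function names are invented for the example and do not reflect the tool's implementation.

```python
from collections import Counter


def is_hanzi(ch: str) -> bool:
    """True for characters in the main CJK ideograph blocks (a simplification)."""
    return "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"


def top_characters(text: str, n: int = 20) -> list[tuple[str, int]]:
    """Return the n most frequent Chinese characters with their counts."""
    return Counter(ch for ch in text if is_hanzi(ch)).most_common(n)


for char, count in top_characters("我爱你,你爱我,我们都爱学习", n=3):
    print(char, count)
```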
You can use the frequency list to identify the theme of a text. For example, technical articles often show a high frequency for characters like 电 (diàn, electricity) or 机 (jī, machine), while business content emphasizes 公 (gōng, public) and 司 (sī, company).
Additionally, comparing the frequency list against your own knowledge helps determine if a text is appropriate for your level. If you do not recognize many of the top 20 characters, the material may be too difficult. If you recognize the characters but still struggle, the challenge likely lies in grammar or specific word combinations.
Diversity measures the ratio of unique characters to total characters. For example, a 30% score indicates 30 unique characters for every 100 characters of text. Children's books usually have 20–30% diversity because they use repetition to aid learning. News articles typically range from 35% to 45%, while academic papers often exceed 50% due to precise, varied vocabulary. Because this ratio tends to fall as a text gets longer, it is most meaningful when comparing texts of similar length.
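The diversity ratio described above is simple to compute. The sketch below assumes the ratio is taken over Chinese characters only (whether the tool also counts letters and punctuation is not stated), and the function names are hypothetical.

```python
def is_hanzi(ch: str) -> bool:
    """True for characters in the main CJK ideograph blocks (a simplification)."""
    return "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"


def diversity(text: str) -> float:
    """Unique Chinese characters divided by total Chinese characters (0.0 if none)."""
    hanzi = [ch for ch in text if is_hanzi(ch)]
    return len(set(hanzi)) / len(hanzi) if hanzi else 0.0


# Four characters in total, two of them unique.
print(f"{diversity('我我你你'):.0%}")
```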
This grid displays every unique character found in your text alongside its pinyin pronunciation. This provides a faster way to assess vocabulary than reading the entire text. Note that each character displays only its most common pronunciation. For characters with multiple readings (duōyīnzì), the correct reading depends on context, so the pinyin shown may differ from the reading actually used in your text.
Advanced learners can also use the grid to spot radicals and components. For instance, characters sharing the water radical (氵), such as 河 (hé, river) and 海 (hǎi, sea), often appear together in related topics.
Text Selection: Before starting a new book or article, paste it here. If more than 70% of the characters fall within your current HSK level, the text is likely manageable. This helps prevent the frustration of choosing overly difficult material.
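The 70% rule of thumb above can be checked with a small coverage calculation. In this sketch, the `known` set stands in for the characters at or below your HSK level. The set contents, function names, and the way the threshold is applied are assumptions for illustration.

```python
def is_hanzi(ch: str) -> bool:
    """True for characters in the main CJK ideograph blocks (a simplification)."""
    return "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"


def coverage(text: str, known: set[str]) -> float:
    """Share of the text's Chinese characters (counting repeats) found in `known`."""
    hanzi = [ch for ch in text if is_hanzi(ch)]
    if not hanzi:
        return 0.0
    return sum(ch in known for ch in hanzi) / len(hanzi)


def is_manageable(text: str, known: set[str], threshold: float = 0.70) -> bool:
    """Apply the 'more than 70%' guideline to a text."""
    return coverage(text, known) > threshold


known_chars = {"我", "是", "学"}  # hypothetical learner vocabulary
print(coverage("我是学生", known_chars), is_manageable("我是学生", known_chars))
```

Coverage here counts every occurrence, so a frequent known character raises the score more than a rare one does.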
Progress Tracking: Analyze the same text at different stages of your studies. As your proficiency grows, the percentage of characters you recognize will increase, and the HSK distribution will shift toward your current level.
Vocabulary Lists: You can identify unfamiliar characters and export them to flashcard apps like Anki or Pleco. Prioritizing high-frequency characters ensures you are studying the characters most likely to appear in future readings.
Teacher Resources: Educators can use the tool to ensure classroom materials match student proficiency. It also allows teachers to identify potentially difficult characters and pre-teach them before the lesson.
Chinese vs. Non-Chinese: Modern texts often mix Hanzi with Latin letters and numbers. A high non-Chinese count usually indicates technical writing, English brand names, or mixed-language content.
Punctuation: The tool tracks both Chinese punctuation (,。!?) and standard Western marks (,.!?). Comparing these can sometimes indicate if a text is an original Chinese composition or a translation.
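One way to separate the categories described above is to classify each character by its Unicode code point. This is only a sketch: the category names and the `classify` function are invented for the example, and treating the CJK Symbols and Fullwidth Forms blocks as "Chinese punctuation" is a simplification (curly quotation marks, for instance, fall outside those blocks).

```python
import unicodedata
from collections import Counter


def is_hanzi(ch: str) -> bool:
    """True for characters in the main CJK ideograph blocks (a simplification)."""
    return "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"


def classify(ch: str) -> str:
    """Assign a single character to a coarse category."""
    if is_hanzi(ch):
        return "hanzi"
    if unicodedata.category(ch).startswith("P"):  # any Unicode punctuation class
        if "\u3000" <= ch <= "\u303f" or "\uff00" <= ch <= "\uffef":
            return "chinese_punct"
        return "western_punct"
    if ch.isascii() and ch.isalpha():
        return "latin"
    if ch.isdigit():
        return "digit"
    return "other"  # spaces, symbols, anything else


sample = "我用Python3\uff0c好\uff01ok."  # \uff0c and \uff01 are fullwidth , and !
print(Counter(classify(ch) for ch in sample))
```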
Simplified vs. Traditional: The counter recognizes both sets. However, it treats the simplified and traditional versions of the same character as separate entries (e.g., 国 and 國 are counted individually).
Character vs. Word Counting: This tool focuses on individual characters, not words. A two-character word like 学习 (xuéxí, to study) is counted as two characters. Because character recognition is the foundation of Chinese literacy, tracking individual Hanzi is a vital metric for learners.
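The last two points can be seen directly in Python, where each Hanzi is a separate code point. This short sketch illustrates the counting behavior described above and is not the tool's own code.

```python
from collections import Counter

# Simplified 国 and traditional 國 are distinct code points,
# so they are tallied as separate entries.
counts = Counter("国國国")
print(counts["国"], counts["國"])

# A two-character word contributes two characters to the total.
print(len("学习"))
```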