Enhancing Japanese Study with ChatGPT – Part I
I am an exchange student at the University of Tokyo from Germany.
Since I grew up bilingually, I can speak and understand Japanese.
However, I have lived in Germany for the majority of my life, so my language knowledge has mostly been limited to everyday conversations at home, and kanji up to around 5th grade level.
So I decided to sign up for some Japanese lessons at the engineering language school of the University of Tokyo, which I can really recommend.
Learning kanji takes time
There are around 2000 commonly used kanji in Japanese.
My preferred method of studying kanji and vocabulary (in any language, for that matter) is to read texts that I personally find interesting: novels, articles about machine learning, papers …
However, constantly pausing to look up words disrupts the flow. It would be useful to have a vocabulary table to study before reading the text.
This is where AI comes in handy: wouldn’t it be cool to have a table created just for you, for your particular level of knowledge and for this particular text?
With ChatGPT, this becomes possible.
All experiments in this article were performed using GPT-4 on ChatGPT Plus.
Extracting difficult vocabulary
The most straightforward way of getting a list of potentially unknown words from a text would be an instruction like this:
From the text below, extract all general academic words as a list. [Text from paper]
However, this yields only a fraction of the academic vocabulary in the text. In its Prompt Engineering Guide, OpenAI recommends employing multi-step instructions with a clear chain of thought to give the model “time to think” and get better results.
This is because the model reasons only while it writes its output, similar to how you would decompose a large task into multiple subtasks (a, b, c) and write down your intermediate answers.
Below is the refined prompt:
[Text from paper]
Can you make a list of commonly used terms that I should know for writing a paper, regardless of the field? E.g. words like 前節, 考察, 結論, 構築, 軽微, 考慮.
Use the following chain of thought:
Sentence: 大規模言語モデル (LLM) の発展とともに、分野や言語に特化した言語モデルの構築の必要性が議論されてきている。
The words that are used in higher education and research are: 発展, 特化する, 構築する, 必要性, 議論
Write down this chain of thought for every single sentence in the paper.
Finally produce a table with 3 columns: the word, furigana and German translation
You can have a look at the complete answer here.
Although this method is more time-consuming, it significantly improves the quality of the answer.
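If you prefer to assemble the refined prompt in a script rather than typing it, the idea can be sketched as follows. This is a minimal sketch, not part of the original workflow: the function name `build_extraction_prompt` and its `translation_language` parameter are hypothetical, while the instruction wording is taken from the prompt above.

```python
# Worked example that shows the model the expected chain of thought.
EXAMPLE_SENTENCE = (
    "大規模言語モデル (LLM) の発展とともに、分野や言語に特化した"
    "言語モデルの構築の必要性が議論されてきている。"
)
EXAMPLE_WORDS = ["発展", "特化する", "構築する", "必要性", "議論"]


def build_extraction_prompt(text, translation_language="German"):
    """Return the chain-of-thought prompt with the source text placed first.

    Hypothetical helper: the name and parameters are assumptions for
    illustration; only the instruction wording comes from the article.
    """
    instructions = (
        "Can you make a list of commonly used terms that I should know for "
        "writing a paper, regardless of the field? "
        "E.g. words like 前節, 考察, 結論, 構築, 軽微, 考慮.\n"
        "Use the following chain of thought:\n"
        f"Sentence: {EXAMPLE_SENTENCE}\n"
        "The words that are used in higher education and research are: "
        f"{', '.join(EXAMPLE_WORDS)}\n"
        "Write down this chain of thought for every single sentence in the paper.\n"
        "Finally produce a table with 3 columns: the word, furigana and "
        f"{translation_language} translation"
    )
    return f"{text.strip()}\n\n{instructions}"


if __name__ == "__main__":
    print(build_extraction_prompt("ここに論文の本文を貼り付ける。"))
```

The printed string can be pasted into ChatGPT as a single message; swapping `translation_language` adapts the table to another native language.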
Now, instead of manually typing this long instruction for each text, one might consider creating a custom GPT.
Here, I suggest an alternative approach, which gives you more control over the instructions that are executed: system text replacements.
Commonly used for composing e-mails more quickly, text replacements can also store your custom prompts.
In the screenshot below, I assign the shortcut `/extract` to be expanded to the lengthy instruction.
This feature can be found in the macOS keyboard settings.
On Windows, an additional text expander browser extension is required, since there is currently no system-wide text replacement option.
While this is already a helpful tool when dealing with unfamiliar texts, it would be useful to have a worksheet as a PDF to study on your tablet.
Copying the table over to Word and re-applying the layout is tedious.
Instead, we can copy the output as Markdown code with the copy button in ChatGPT, store it in a file named `my-vocab.md` and utilize pandoc.
Pandoc is a tool for converting documents between various markup formats, like Markdown, HTML, LaTeX or PDF.
I am writing this very article in Markdown inside my IDE because of its simplicity and readability.
Pandoc converts my file to HTML and even applies the appropriate CSS styles for me.
Similarly, we can convert ChatGPT’s output to a PDF worksheet with predefined styling:
$ pandoc my-vocab.md -o my-vocab.pdf --template=japanese-template.tex
The LaTeX template can be found here.
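When several vocabulary files pile up, the conversion can be scripted. The following is a minimal sketch that only wraps the pandoc call shown above (`-o` and `--template`); the function names `pandoc_command` and `convert_to_pdf` are hypothetical, and it assumes pandoc and a LaTeX installation are available on your machine.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(markdown_path, template_path):
    """Build the pandoc argument list for one Markdown file (hypothetical helper)."""
    source = Path(markdown_path)
    target = source.with_suffix(".pdf")  # my-vocab.md -> my-vocab.pdf
    return ["pandoc", str(source), "-o", str(target), f"--template={template_path}"]


def convert_to_pdf(markdown_path, template_path):
    """Run pandoc and return the path of the generated PDF."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on the PATH")
    command = pandoc_command(markdown_path, template_path)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return Path(command[3])


if __name__ == "__main__":
    # Convert every Markdown file in the current folder with the same template.
    for markdown_file in sorted(Path(".").glob("*.md")):
        print(convert_to_pdf(markdown_file, "japanese-template.tex"))
```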
You might also consider appending furigana.
This can be achieved through pandoc-filter-furigana.
Here is an example instruction to add a fourth column with common usages.
Put the hiragana reading in brackets with a leading dash (-…). In other words: create a markdown link for the word in 使い方.
Here is an example output:
| Word | Furigana | German | 使い方 |
|------|----------|--------|--------|
| 蔵 | くら | Speicher | [蔵書](-ぞうしょ) |
| 皆 | みな | Alle | [皆様](-みなさま) |
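To illustrate what the leading-dash convention makes possible, here is a small sketch that turns such links into HTML `<ruby>` annotations. It is not the implementation of pandoc-filter-furigana, just an assumption-laden illustration of the idea with a regular expression; the function name `furigana_links_to_ruby` is hypothetical.

```python
import re

# Matches [word](-reading): a Markdown link whose target starts with a dash.
FURIGANA_LINK = re.compile(r"\[([^\]]+)\]\(-([^)]+)\)")


def furigana_links_to_ruby(markdown):
    """Replace [漢字](-かんじ) links with <ruby> markup; other links stay untouched."""
    return FURIGANA_LINK.sub(r"<ruby>\1<rt>\2</rt></ruby>", markdown)


if __name__ == "__main__":
    row = "| 蔵 | くら | Speicher | [蔵書](-ぞうしょ) |"
    print(furigana_links_to_ruby(row))
```

Because an ordinary URL never starts with a dash, regular links in the same document are left as they are.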
Extracting vocabulary from a video
I have always loved watching Japanese TV.
It makes me feel more connected with society and everyday life here.
As a child, I watched「科学大好き土曜塾」every Saturday.
But as I grew up, the shows I watched became more adult-oriented, and their vocabulary became more difficult.
This is what got me thinking about creating vocabulary tables for videos.
With OpenAI's recent open-source release of Whisper, it is possible to transcribe speech in many languages, including Japanese.
There is also a more efficient implementation based on C++ called whisper.cpp.
For macOS users, integrating this tool with an Automator shortcut makes it possible to create transcripts from Japanese video files right from a button in the Finder.
Subsequently, a study sheet can be created using the procedure above.
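As an alternative to the Automator route, the transcription step can be sketched in Python. This sketch assumes the `openai-whisper` package (the original Python release, not whisper.cpp) and ffmpeg are installed; the function names and the `small` model choice are assumptions for illustration.

```python
def format_segments(segments):
    """Turn Whisper-style segments into '[mm:ss] text' lines."""
    lines = []
    for segment in segments:
        minutes, seconds = divmod(int(segment["start"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {segment['text'].strip()}")
    return "\n".join(lines)


def transcribe_japanese(media_path, model_name="small"):
    """Transcribe a Japanese audio or video file into a timestamped transcript.

    Hypothetical helper; needs `pip install openai-whisper` and ffmpeg.
    """
    import whisper  # imported lazily so format_segments works without the package

    model = whisper.load_model(model_name)
    result = model.transcribe(str(media_path), language="ja")
    return format_segments(result["segments"])


if __name__ == "__main__":
    # Demo with hand-written segments; call transcribe_japanese() on a real file.
    demo = [{"start": 0.0, "text": " こんにちは"}, {"start": 65.4, "text": "科学の話です"}]
    print(format_segments(demo))
```

The timestamps make it easy to jump back to the passage in the video where an unknown word occurred.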
Creating practice texts
Especially for beginners, interesting articles might include too many unknown words.
In this case, you can have ChatGPT rewrite them in easier Japanese, so that they are comprehensible to you.
This way, you can still read interesting articles and papers of your choice and gradually build up your vocabulary.
This is somewhat similar to the concept of synthetic datasets for training LLMs.
Maybe you have a preferred style of writing that you encountered in a Japanese book or your practice material in language class.
Pass in an example text and have ChatGPT rewrite your practice text in the same style.
Another useful feature is to have ChatGPT write a text using specific words, so that you can practice newly learned vocabulary.
For example, you might have a vocabulary list to study for class.
ChatGPT can compose a text about your topic of interest that includes your words.
You can then practice your vocabulary in context, linked to an interesting topic that you understand well, which helps memorization.
Below is an example text about natural language processing, with the provided vocabulary printed in bold.
See here for the full conversation.
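A generated practice text is only useful if it really contains the words you asked for. The following minimal sketch checks this and marks the study words in Markdown bold; the function name `bold_vocabulary` and the sample sentence are hypothetical and not taken from the conversation linked above.

```python
import re


def bold_vocabulary(text, words):
    """Bold each study word found in `text`; return (marked_text, missing_words)."""
    missing = [word for word in words if word not in text]
    # Longest words first, so the regex prefers 構築する over 構築 at the same spot.
    found = sorted((w for w in set(words) if w in text), key=len, reverse=True)
    if not found:
        return text, missing
    pattern = re.compile("|".join(re.escape(word) for word in found))
    marked = pattern.sub(lambda match: f"**{match.group(0)}**", text)
    return marked, missing


if __name__ == "__main__":
    sample = "言語モデルの構築には大量のデータの考慮が必要だ。"
    marked, missing = bold_vocabulary(sample, ["構築", "考慮", "結論"])
    print(marked)
    print("Missing:", missing)
```

If `missing` is not empty, you can ask ChatGPT to revise the text so that it includes the remaining words.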
Exam preparation
Revising all your vocabulary before an exam can seem daunting.
ChatGPT can be of great help by creating test problems.
Just write detailed instructions to replicate the exam problems.
Write a text using the above words (not necessarily in this order). Instead of writing the kanji words, make some underscores and put the hiragana spelling in brackets directly after the underscores.
Example: 今日の弁当はご飯と梅干しだけの _______ (てぬき) 弁当だ。
An example utilizing this prompt inside a custom GPT can be found here.
Subsequently, you can ask ChatGPT to output the solutions.
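The fill-in-the-blank format requested in the prompt can also be produced locally from a text you already have. The following is a minimal sketch under the assumption that you supply the readings yourself; the function name `make_cloze` and the word-to-reading dictionary are hypothetical.

```python
import re


def make_cloze(text, readings, blank="_______"):
    """Replace each known kanji word with a blank followed by its hiragana reading.

    `readings` maps kanji words to hiragana. Returns (cloze_text, solutions),
    with the solutions in the order in which the blanks appear.
    """
    found = sorted((w for w in readings if w in text), key=len, reverse=True)
    if not found:
        return text, []
    pattern = re.compile("|".join(re.escape(word) for word in found))
    solutions = []

    def hide(match):
        word = match.group(0)
        solutions.append(word)
        return f" {blank} ({readings[word]}) "

    return pattern.sub(hide, text), solutions


if __name__ == "__main__":
    sentence = "今日の弁当はご飯と梅干しだけの手抜き弁当だ。"
    cloze, solutions = make_cloze(sentence, {"手抜き": "てぬき"})
    print(cloze)
    print("Solutions:", solutions)
```

Keeping the solutions as a separate list mirrors the two-step approach of first generating the test and then asking for the answers.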
Conclusion
In this article, we introduced several ways to enhance your Japanese studies with ChatGPT: custom-made kanji tables to supplement learning with texts and videos, tailored practice texts, and exam preparation.
In the upcoming Part II, we will take a look at how to optimally build upon your previous knowledge, and address some limitations that ChatGPT faces with the Japanese language.
[date] 2024-02-01
[update] 2024-03-06
東京大学 羽室開(Kai Hamuro Eberl)