Keep Giving Up on English Vocabulary? Learn from Duolingo 🦉 and Let the Words Remind You

Memorizing vocabulary is unavoidable when learning English. From primary school to graduate school, and even in some jobs, you need to remember words.
But grinding through vocabulary books or rote-memorizing on a phone is extremely inefficient. Now that LLMs are so popular, why not use them?
After all, the ‘L’ in LLM stands for Language. LLMs may lack precision in some rigorous tasks, but languages like English are their forte. If technology can solve a problem, don’t trouble yourself!
This article shows how to combine the Eudic Dictionary API, an LLM, and a Telegram bot into an English vocabulary learning assistant that makes memorizing words less painful.
Fair warning: you might find it annoying after a few days, since the inspiration comes from the Duolingo app 🦉.

Project Goal: Liberating Attention

Instead of forcing yourself to sit down and memorize words, let the program bring the words to you. It should generate example sentences and articles from your word list so that you understand and remember the words better. In short, make memorizing words easier and more efficient.

System Architecture

Flowchart

  • Eudic Dictionary API: Eudic works for both looking up words and memorizing them, which is why I chose it as the vocabulary source.
  • LLM (e.g., Groq): A large language model can generate example sentences, dialogues, and even articles from a list of words, helping you understand how they are used. It can also generate articles that follow IELTS writing standards to help you practice English writing. At the time of writing, Groq offers an easily accessible free LLM API, where you can choose the Mixtral 8x7b model.
  • Telegram Bot: Compared with WeChat, QQ, etc., the Telegram platform is more open and has a powerful API with rich features, which makes it a good fit for a reminder tool.
  • Format Conversion: Telegram supports only a restricted Markdown dialect, and the Telegraph writing platform accepts only a limited set of HTML tags, so the generated content has to be converted before it displays correctly.
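
To make the Telegram limitation in the last bullet concrete: in MarkdownV2 mode, the Bot API expects a fixed set of punctuation characters to be backslash-escaped whenever they are not part of a formatting entity, which raw LLM output rarely respects. The sketch below only illustrates that rule on plain text. `escape_markdown_v2` is a hypothetical helper, not part of this project, which uses a library to convert full Markdown instead.

```python
import re

# Characters the Telegram Bot API requires to be backslash-escaped in
# MarkdownV2 text when they are not part of a formatting entity.
_MDV2_SPECIAL = r"_*[]()~`>#+-=|{}.!"


def escape_markdown_v2(text: str) -> str:
    """Backslash-escape every MarkdownV2 special character (plain text only)."""
    return re.sub(f"([{re.escape(_MDV2_SPECIAL)}])", r"\\\1", text)


if __name__ == "__main__":
    print(escape_markdown_v2("Score: 7.5 (IELTS) - well done!"))
```

Even an ordinary sentence with a period or parentheses needs escaping, which is why sending LLM output unchanged tends to fail.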

Implementation Method

  1. Get Vocabulary List
    Interface documentation: https://my.eudic.net/OpenAPI/doc_api_study

    Interface: get all vocabulary notebooks

     Request method: GET
     Request address: https://api.frdic.com/api/open/v1/studylist/category
     Description: returns information about your vocabulary notebooks.
     Each notebook's ID is a unique identifier that is used when calling the other interfaces.
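
A minimal sketch of calling the endpoint above with only the standard library. Several details are assumptions to verify against the interface documentation: that the token goes unchanged into the `Authorization` header, that a `language` query parameter selects the notebook language, and that the response is JSON shaped like `{"data": [{"id": ..., "name": ...}]}`. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

CATEGORY_URL = "https://api.frdic.com/api/open/v1/studylist/category"


def build_category_request(token: str, language: str = "en") -> urllib.request.Request:
    """Build the GET request that lists vocabulary notebooks.

    Assumption: the token is sent as-is in the Authorization header and the
    notebook language is selected with a `language` query parameter.
    """
    query = urllib.parse.urlencode({"language": language})
    return urllib.request.Request(
        f"{CATEGORY_URL}?{query}",
        headers={"Authorization": token},
        method="GET",
    )


def parse_notebooks(body: str) -> dict[str, str]:
    """Map notebook ID -> name, assuming a {"data": [{"id", "name"}]} payload."""
    payload = json.loads(body)
    return {str(item["id"]): item["name"] for item in payload.get("data", [])}


def fetch_notebooks(token: str) -> dict[str, str]:
    """Call the API (needs network access and a valid token)."""
    with urllib.request.urlopen(build_category_request(token), timeout=10) as resp:
        return parse_notebooks(resp.read().decode("utf-8"))
```

Keeping request building and response parsing separate from the network call makes both easy to check without a real token.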
    
  2. Generate Example Sentences and Articles

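    import os

    from groq import AsyncGroq

    # Assumed setup (not shown in the original snippet): an async Groq client
    # whose API key comes from the GROQ_API_KEY environment variable.
    client = AsyncGroq(api_key=os.environ.get("GROQ_API_KEY"))

    async def gen_chat_completion(sys_prompt: str, prompt: str) -> str:
        """Send a system prompt plus a user prompt and return the reply text."""
        # ... LLM call code (e.g., Groq)...
        chat_completion = await client.chat.completions.create(
            messages=[
                {"role": "system", "content": sys_prompt},
                {"role": "user", "content": prompt},
            ],
            # ... other parameter settings ...
        )
        return chat_completion.choices[0].message.content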
    

    Use the following sys_prompt to let the LLM explain words:

    You are a professional English teacher.
                     Given an English word, 
                     output their definitions in both English and Chinese, 
                     and synonyms and antonyms.
                     Don't omit any word from the list.
                     Must explain all meanings of the word.
                     Additionally, include all tenses and voices for each word.
                     Then provide collocations and example sentences.
    
                     Format: Markdown.
    
                     Example:
    
                     **desert**
    
                     *Definition*:
    
                     - 🇬🇧: 
                         - n. arid land with little or no vegetation
                         - v. leave someone who needs or counts on you; leave in the lurch
                         - v. desert (a cause, a country or an army), often in order to join the opposing cause, country, or army
                         - adj. located in a dismal or remote area; desolate
                     - 🇨🇳: 
                         - n. 沙漠,荒漠;荒凉的地方;应得的赏罚
                         - v. 离弃,舍弃(某地);抛弃,遗弃(某人);背弃,放弃;擅离,开小差;突然丧失
                         - adj. 无人居住的,荒凉的;像沙漠的
    
    
                     *Synonyms*: 
    
                     - vt. yield , quit
                     - vi. run out on, walk out on
                     - n. sands, wold
                     - adj. arid, desolate, lonely, uninhabited
    
                     *Antonyms*:
                     - adj. inhabited, populated
    
    
                     *Collocation*: 
                     desert island n. 荒岛; sahara desert 撒哈拉大沙漠
    
                     *Sentence*:
    
                     - 🇬🇧: The heat in the desert was extreme.
                     - 🇨🇳: 沙漠中极其炎热
    
                     -----------------             
    

    Use the following sys_prompt to guide the LLM to generate articles meeting IELTS writing standards:

    You are a professional English writer.
    Given some English words, use all of them to compose an essay of no more than 280 words that follows the IELTS criteria. (Do not omit any word from the list.)
    Choose the topic from:
    (Art, Business & Money, Communication & Personality, Crime & Punishment, Education, Environment, Family & Children, Food & Diet, Government, Health, Housing, Buildings & Urban Planning, Language, Leisure, Media & Advertising, Reading, Society, Space Exploration, Sport & Exercise, Technology, Tourism and Travel, Transport, Work).
    
    Essay types:
    1.  Opinion Essays 
    2.  Discussion Essays
    3.  Problem Solution Essays
    4.  Advantages & Disadvantages Essays
    5.  Double Question Essays
    
    Criteria:
    --------
    Task achievement
    ...
    
    Coherence and cohesion
    ...
    
    Lexical resource
    ...
    
    Grammatical range and accuracy
    ...
    
    --------
    
    Format: Markdown.
    Include title and content.
    Highlight the given words in the essay using HTML syntax.
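
Since the prompt insists that no word be omitted, it can be worth checking the generated essay before sending it. The helper below is a rough sketch of such a check, not something taken from the project: `find_missing_words` is a hypothetical name, and the matching rule (case-insensitive, anchored at the start of a token) is an assumption chosen to tolerate regular inflections.

```python
import re


def find_missing_words(essay: str, words: list[str]) -> list[str]:
    """Return the words that never appear in the essay.

    A word counts as present when it occurs at the start of a token,
    ignoring case, so "deserted" satisfies "desert". Irregular inflections
    (e.g. "went" for "go") are reported as missing.
    """
    missing = []
    for word in words:
        pattern = r"\b" + re.escape(word)
        if not re.search(pattern, essay, flags=re.IGNORECASE):
            missing.append(word)
    return missing


if __name__ == "__main__":
    essay = "Tourism can <b>devastate</b> fragile <b>deserts</b>."
    print(find_missing_words(essay, ["desert", "devastate", "arid"]))
```

If the returned list is not empty, one option is to ask the LLM again with the missing words called out explicitly.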
    
  3. Format Conversion

    • Convert LLM-generated Markdown text to a format supported by Telegram, for example using the telegramify_markdown library.
    • Convert Markdown text to HTML format supported by Telegraph, and perform necessary tag filtering and adjustments.

    import datetime
    import logging

    import mistune
    import telegramify_markdown
    from telegraph.aio import Telegraph  # async client, assumed from the awaited calls below

    # Inside the scheduled job callback: ask the LLM to explain the word,
    # then send the explanation as a MarkdownV2 reply.
    llm_explain = await gen_chat_completion(
        sys_message_explanation, f'word: "{word}"'
    )
    await context.bot.send_message(
        job.chat_id,
        telegramify_markdown.convert(llm_explain),
        parse_mode="MarkdownV2",
        reply_to_message_id=msg.message_id,
    )

    async def write_to_telegraph(html: str) -> str:
        """Publish the HTML as a Telegraph page and return its URL."""
        telegraph = Telegraph()
        if not telegraph.get_access_token():
            logging.info(await telegraph.create_account(short_name="anonymous"))

        response = await telegraph.create_page(
            # datetime.UTC requires Python 3.11 or later.
            f"Glossary {datetime.datetime.now(datetime.UTC)}",
            html_content=html,
        )
        logging.info(response)
        return response["url"]

    # Back in the job callback: render the essay Markdown to HTML,
    # publish it on Telegraph, and send the page link.
    article_url = await write_to_telegraph(
        mistune.create_markdown(
            escape=False,
            hard_wrap=True,
            plugins=["strikethrough", "footnotes", "table", "speedup"],
        )(
            llm_response,
        )
    )
    await context.bot.send_message(job.chat_id, article_url)
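
The tag filtering mentioned in this step is not shown in the snippets, so here is one possible sketch using only the standard library. The allowed-tag set follows the node definition in the Telegraph API documentation. Mapping other heading levels onto `h3`/`h4`, keeping only `href` and `src` attributes, and the names `TelegraphSanitizer` and `sanitize_for_telegraph` are assumptions made for illustration, not the project's actual code.

```python
from html import escape
from html.parser import HTMLParser

# Tags accepted in Telegraph page content (per its API's Node definition).
ALLOWED_TAGS = {
    "a", "aside", "b", "blockquote", "br", "code", "em", "figcaption",
    "figure", "h3", "h4", "hr", "i", "iframe", "img", "li", "ol", "p",
    "pre", "s", "strong", "u", "ul", "video",
}
# Telegraph has no h1/h2/h5/h6, so map them onto the two levels it has.
HEADING_MAP = {"h1": "h3", "h2": "h4", "h5": "h4", "h6": "h4"}
VOID_TAGS = {"br", "hr", "img"}


class TelegraphSanitizer(HTMLParser):
    """Keep allowed tags (href/src attributes only) and unwrap everything else."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        tag = HEADING_MAP.get(tag, tag)
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text still arrives via handle_data
        kept = "".join(
            f' {name}="{escape(value or "")}"'
            for name, value in attrs
            if name in ("href", "src")
        )
        self.parts.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        tag = HEADING_MAP.get(tag, tag)
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text so characters like "&" and "<" stay valid HTML.
        self.parts.append(escape(data, quote=False))


def sanitize_for_telegraph(html: str) -> str:
    """Return HTML restricted to tags Telegraph accepts."""
    parser = TelegraphSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

With a helper like this, the mistune output could be passed through `sanitize_for_telegraph` before it reaches `write_to_telegraph`, so that an unexpected `<span>` or `<h1>` from the LLM does not cause the page creation to fail.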
    
  4. Edge TTS Reading

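    import os

    import edge_tts

    # The voice can be overridden with the EDGE_TTS_VOICE environment variable.
    VOICE = os.getenv("EDGE_TTS_VOICE", "en-US-AvaMultilingualNeural")

    async def gen_tts_audio(text: str, filename: str) -> None:
        """Synthesize `text` with Edge TTS and save the audio to `filename`."""
        communicate = edge_tts.Communicate(text, VOICE)
        await communicate.save(filename)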
    
  5. Scheduled Telegram Messages

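    import datetime
    import random

    import pytz

    # Inside a command handler: schedule one reminder per hour from 08:xx to
    # 22:xx, each at a random minute between :15 and :55 (Asia/Shanghai time).
    times = [(x, random.randint(15, 55)) for x in range(8, 23)]
    for h, m in times:
        context.job_queue.run_daily(
            daily_message,
            datetime.time(hour=h, minute=m, tzinfo=pytz.timezone("Asia/Shanghai")),
            name=f"{h}:{m} job",
            chat_id=update.message.chat_id,
        )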
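
To see what this schedule looks like without running a bot, the time generation can be pulled into a small pure function. This is a sketch under stated assumptions: `build_reminder_times` is a hypothetical helper, and a fixed UTC+8 offset is swapped in for `pytz.timezone("Asia/Shanghai")` to keep the example standard-library only (Asia/Shanghai currently observes no daylight saving time).

```python
import datetime
import random
from typing import Optional

# Fixed UTC+8 offset standing in for the pytz "Asia/Shanghai" zone.
SHANGHAI = datetime.timezone(datetime.timedelta(hours=8), "Asia/Shanghai")


def build_reminder_times(
    first_hour: int = 8,
    last_hour: int = 22,
    rng: Optional[random.Random] = None,
) -> list[datetime.time]:
    """One reminder per hour, at a random minute between :15 and :55."""
    rng = rng or random.Random()
    return [
        datetime.time(hour=hour, minute=rng.randint(15, 55), tzinfo=SHANGHAI)
        for hour in range(first_hour, last_hour + 1)
    ]


if __name__ == "__main__":
    # A seeded generator makes the printed schedule reproducible.
    for t in build_reminder_times(rng=random.Random(0)):
        print(t.strftime("%H:%M"))
```

The random minute keeps the reminders from arriving exactly on the hour every day, while the 08 to 22 range avoids messages in the middle of the night.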
    

Future Plans

  • More Vocabulary Sources: for example Shanbay, Youdao, or local mdict files.
  • More Powerful LLMs: integrating GPT-4 or Claude Opus level models.
  • More Reminder Methods: for example generating iCal files that can be added to a calendar.

In short, let’s liberate ourselves from the boring traditional way of memorizing words sequentially. In the AI era, make full use of LLMs and rethink the habit of vocabulary memorization.

References

https://github.com/zhoukuncheng/llm-vocabulary-reminder
https://my.eudic.net/OpenAPI/doc_api_study
https://console.groq.com/docs/quickstart