Always stuck on "abandon" when memorizing English words? Borrow a trick from Duolingo 🦉 and let the words come to you

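Memorizing vocabulary is unavoidable when learning English: it follows you from primary school through graduate school, and even some jobs require it.
But grinding through a vocabulary book, or rigidly drilling words in a phone app, is painfully inefficient. With LLMs so popular right now, why not put them to work?
After all, the middle L in LLM stands for Language. LLMs may fall short on tasks that demand rigor, but languages such as English are their strong suit. If technology can solve the problem, don't trouble yourself!
This post shows how to combine the Eudic dictionary (欧路词典) API, an LLM, and a Telegram bot into an English vocabulary assistant that takes the pain out of memorizing words.
Of course, you may also find it annoying after a few days; the inspiration did come from the Duolingo app 🦉, after all.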

Project goal: free up your attention

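Instead of forcing yourself to memorize words, let a program do the prompting, and have it generate example sentences and essays from each word so that you understand and remember it better. In short, the goal is to make memorizing words easier and more efficient.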

System architecture

Workflow

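  • Eudic dictionary API: Eudic handles both word lookup and vocabulary review, so it serves as the source of the word list.
  • LLM (e.g. Groq): a large language model can generate example sentences, dialogues, and even essays from a word, which helps you understand how the word is used. It can also write essays that follow the IELTS writing assessment criteria, which helps with English writing practice. At the time of writing, Groq is an LLM API that is easy to access for free; you can choose the Mixtral 8x7b model.
  • Telegram bot: compared with WeChat, QQ, and similar platforms, Telegram is more open, with a powerful API and rich features, which makes it the best fit as a reminder tool.
  • Format conversion: Telegram messages restrict which Markdown is accepted, and the Telegraph writing platform restricts which HTML tags are accepted, so the content has to be converted before it displays correctly.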

Implementation

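  1. Fetch the word list. API documentation: https://my.eudic.net/OpenAPI/doc_api_study

     Get all wordbooks
     API description:
    
     Method: GET
     URL: https://api.frdic.com/api/open/v1/studylist/category
     This endpoint returns information about your wordbooks.
     A wordbook's ID is its unique identifier and is used when calling the other endpoints.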
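As a rough illustration of the endpoint above, here is a minimal sketch using only the Python standard library. The function names are hypothetical. The `Authorization` header carrying the token, the `language` query parameter, and the `{"data": [{"id": ..., "name": ...}]}` response shape are assumptions that should be checked against the linked API documentation before use.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.frdic.com/api/open/v1"


def build_category_request(token: str, language: str = "en") -> urllib.request.Request:
    """Build the GET request for the wordbook (category) list."""
    # Assumption: the endpoint takes a `language` query parameter.
    query = urllib.parse.urlencode({"language": language})
    return urllib.request.Request(
        f"{API_BASE}/studylist/category?{query}",
        # Assumption: the API token is sent verbatim in the Authorization header.
        headers={"Authorization": token},
        method="GET",
    )


def parse_categories(body: str) -> dict[str, str]:
    """Map wordbook ID -> name, assuming a {"data": [{"id", "name"}, ...]} payload."""
    payload = json.loads(body)
    return {item["id"]: item["name"] for item in payload.get("data", [])}


def fetch_categories(token: str) -> dict[str, str]:
    """Send the request; needs network access and a valid token."""
    with urllib.request.urlopen(build_category_request(token), timeout=10) as resp:
        return parse_categories(resp.read().decode("utf-8"))
```

Building the request and parsing the response are kept separate from the network call so that each part can be checked without a live token. The IDs returned by `fetch_categories` are what the other endpoints expect.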
    
  2. Generate example sentences and essays

    async def gen_chat_completion(sys_prompt: str, prompt: str) -> str:
        # ... LLM client setup (e.g. Groq) ...
        chat_completion = await client.chat.completions.create(
            messages=[
                {"role": "system", "content": sys_prompt},
                {"role": "user", "content": prompt},
            ],
            # ... other parameters ...
        )
        return chat_completion.choices[0].message.content
    

    Use the following sys_prompt to have the LLM explain a word:

    You are a professional English teacher.
                     Given an English word, 
                     output their definitions in both English and Chinese, 
                     and synonyms and antonyms.
                     Don't omit any word from the list.
                     Must explain all meanings of the word.
                     Additionally, include all tenses and voices for each word.
                     Then provide collocations and example sentences.
    
                     Format: Markdown.
    
                     Example:
    
                     **desert**
    
                     *Definition*:
    
                     - 🇬🇧: 
                         - n. arid land with little or no vegetation
                         - v. leave someone who needs or counts on you; leave in the lurch
                         - v. desert (a cause, a country or an army), often in order to join the opposing cause, country, or army
                         - adj. located in a dismal or remote area; desolate
                     - 🇨🇳: 
                         - n. 沙漠,荒漠;荒凉的地方;应得的赏罚
                         - v. 离弃,舍弃(某地);抛弃,遗弃(某人);背弃,放弃;擅离,开小差;突然丧失
                         - adj. 无人居住的,荒凉的;像沙漠的
    
    
                     *Synonyms*: 
    
                     - vt. yield , quit
                     - vi. run out on, walk out on
                     - n. sands, wold
                     - adj. arid, desolate, lonely, uninhabited
    
                     *Antonyms*:
                     - adj. inhabited, populated
    
    
                     *Collocation*: 
                     desert island n. 荒岛; sahara desert 撒哈拉大沙漠
    
                     *Sentence*:
    
                     - 🇬🇧: The heat in the desert was extreme.
                     - 🇨🇳: 沙漠中极其炎热
    
                     -----------------             
    

    Use the following sys_prompt to guide the LLM toward an essay that meets the IELTS writing criteria:

    You are a professional English writer.
    Given some English words, Utilize all of the provided English words to compose an essay that does not exceed 280 words. (Do not omit any word from the list). 
    and following IELTS criteria.
    The topic is in 
    (Art, Business & Money, Communication & Personality, Crime & Punishment, Education, Environment, Family & Children, Food & Diet, Government, Health, Housing, Buildings & Urban Planning, Language, Leisure, Media & Advertising, Reading, Society, Space Exploration, Sport & Exercise, Technology, Tourism and Travel, Transport, Work).
    
    Essay types:
    1.  Opinion Essays 
    2.  Discussion Essays
    3.  Problem Solution Essays
    4.  Advantages & Disadvantages Essays
    5.  Double Question Essays
    
    Criteria:
    --------
    Task achievement
    ...
    
    Coherence and cohesion
    ...
    
    Lexical resource
    ...
    
    Grammatical range and accuracy
    ...
    
    --------
    
    Format: Markdown.
    Include title and content.
    Highlight these given words in the Essay use HTML syntax.
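An LLM does not always obey length limits or use every word it is given, so it can be worth checking the essay before sending it. The sketch below is not part of the original project: `check_essay` is a hypothetical helper, and its prefix match is only a rough way of accepting inflected forms such as "deserted" for "desert".

```python
import re


def check_essay(essay: str, words: list[str], max_words: int = 280) -> dict:
    """Report the word count and any target words missing from the essay."""
    # Drop HTML tags (e.g. <b>word</b> highlights) before counting.
    plain = re.sub(r"<[^>]+>", " ", essay)
    tokens = re.findall(r"[A-Za-z']+", plain.lower())
    # Prefix match so inflected forms still count for the base word.
    missing = [
        w for w in words
        if not any(t.startswith(w.lower()) for t in tokens)
    ]
    return {
        "word_count": len(tokens),
        "within_limit": len(tokens) <= max_words,
        "missing": missing,
    }
```

If `missing` is non-empty or `within_limit` is false, one option is to call the LLM again with the same prompt.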
    
  3. Format conversion

    • Convert the Markdown produced by the LLM into a format Telegram accepts, for example with the telegramify_markdown library.
    • Convert the Markdown into HTML that Telegraph accepts, filtering and adjusting tags where necessary.
     # llm explain
     llm_explain = await gen_chat_completion(
         sys_message_explanation, f'word: "{word}"'
     )
     await context.bot.send_message(
         job.chat_id,
         telegramify_markdown.convert(llm_explain),
         parse_mode="MarkdownV2",
         reply_to_message_id=msg.message_id,
     )
    
    async def write_to_telegraph(html: str) -> str:
        telegraph = Telegraph()
        if not telegraph.get_access_token():
            logging.info(await telegraph.create_account(short_name="anonymous"))

        response = await telegraph.create_page(
            f"Glossary {datetime.datetime.now(datetime.UTC)}",
            html_content=html,
        )
        logging.info(response)
        return response["url"]
    
    
     article_url = await write_to_telegraph(
         mistune.create_markdown(
             escape=False,
             hard_wrap=True,
             plugins=["strikethrough", "footnotes", "table", "speedup"],
         )(
             llm_response,
         )
     )
     await context.bot.send_message(job.chat_id, article_url)
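The tag filtering mentioned above could look roughly like the sketch below, which uses only the standard library. It is not necessarily how the project does it: `TelegraphSanitizer`, `sanitize_for_telegraph`, and the `RENAME` mapping (for example, turning `h1` into `h3`) are illustrative assumptions. The allowed tag list and the `href`/`src` attribute restriction follow the Telegraph API documentation.

```python
from html import escape
from html.parser import HTMLParser

# Tags accepted in Telegraph page content, per the Telegraph API docs.
ALLOWED = {
    "a", "aside", "b", "blockquote", "br", "code", "em", "figcaption",
    "figure", "h3", "h4", "hr", "i", "iframe", "img", "li", "ol", "p",
    "pre", "s", "strong", "u", "ul", "video",
}
# Assumed mapping for tags Telegraph lacks (an illustrative choice).
RENAME = {"h1": "h3", "h2": "h3", "h5": "h4", "h6": "h4", "del": "s"}
VOID = {"br", "hr", "img"}  # elements that take no closing tag


class TelegraphSanitizer(HTMLParser):
    """Re-emit HTML, keeping only tags and attributes Telegraph accepts."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        tag = RENAME.get(tag, tag)
        if tag not in ALLOWED:
            return  # drop the tag itself but keep its inner text
        # Telegraph only keeps the href and src attributes.
        kept = "".join(
            f' {k}="{escape(v or "")}"' for k, v in attrs if k in ("href", "src")
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        tag = RENAME.get(tag, tag)
        if tag in ALLOWED and tag not in VOID:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def sanitize_for_telegraph(html: str) -> str:
    parser = TelegraphSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Under these assumptions, the HTML produced by mistune would be passed through `sanitize_for_telegraph` before `write_to_telegraph`. Unsupported structures such as tables are flattened to plain text rather than rejected.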
    
  4. Read aloud with Edge TTS

    VOICE = os.getenv("EDGE_TTS_VOICE", "en-US-AvaMultilingualNeural")
    
    async def gen_tts_audio(text: str, filename: str) -> None:
        """Synthesize text to speech with Edge TTS and save the audio to filename."""
        communicate = edge_tts.Communicate(text, VOICE)
        await communicate.save(filename)
    
  5. Send Telegram messages on a schedule

        times = [(x, random.randint(15, 55)) for x in range(8, 23)]
        for h, m in times:
            context.job_queue.run_daily(
                daily_message,
                datetime.time(hour=h, minute=m, tzinfo=pytz.timezone("Asia/Shanghai")),
                name=f"{h}:{m} job",
                chat_id=update.message.chat_id,
            )
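The snippet above registers one job per hour from 08:00 through 22:00, each at a random minute between 15 and 55, so reminders do not arrive on a predictable beat. The standalone sketch below isolates just that time-generation step so it can be inspected without a bot. `build_reminder_times` is a hypothetical helper, and the fixed UTC+8 offset is an assumption standing in for `pytz.timezone("Asia/Shanghai")`, which is equivalent in practice because that zone has no daylight saving time.

```python
import datetime
import random
from typing import Optional

# Assumption: a fixed UTC+8 offset stands in for "Asia/Shanghai" (no DST there).
CST = datetime.timezone(datetime.timedelta(hours=8), name="Asia/Shanghai")


def build_reminder_times(
    first_hour: int = 8,
    last_hour: int = 22,
    rng: Optional[random.Random] = None,
) -> list[datetime.time]:
    """Return one reminder per hour, each at a random minute from :15 to :55."""
    rng = rng or random.Random()
    return [
        datetime.time(hour=h, minute=rng.randint(15, 55), tzinfo=CST)
        for h in range(first_hour, last_hour + 1)
    ]


if __name__ == "__main__":
    for t in build_reminder_times():
        print(t.isoformat(timespec="minutes"))
```

Each returned `datetime.time` could then be passed to `run_daily` in place of the inline construction. Passing a seeded `random.Random` makes the schedule reproducible for testing.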
    

Future plans

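  • More wordbook sources: e.g. Shanbay (扇贝), Youdao (有道), or local mdict files.
  • Stronger LLMs: plug in a GPT-4 or Claude Opus class model.
  • More reminder channels: e.g. generate an iCal file that can be added to a calendar.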

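In short, let's break free from the tedium of grinding through words in order. In the AI era, it is worth making full use of LLMs and rethinking how we memorize vocabulary.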

References

https://github.com/zhoukuncheng/llm-vocabulary-reminder
https://my.eudic.net/OpenAPI/doc_api_study
https://console.groq.com/docs/quickstart