AI医生赛博算命:一款新型AI可预测人一生中患病风险 | 经济学人
作者:微信文章
提示:
本篇原文约928词
文后附有单词讲解、思维导图和阅读练习
概要
想知道你未来可能得什么病吗?科学家们开发了一款人工智能模型,能够通过分析你的健康数据,预测你未来患上疾病的风险。虽然离实际应用还有距离,但其仍为预测疾病带来了可能,还可能揭示疾病之间隐藏的秘密联系。
相关阅读与写作热门话题:人工智能、医疗与健康、AI替代
01 原文及翻译
A new AI model can forecast a person’s risk of diseases across their life
一款新型人工智能可预测人一生中患病风险
英文文章选自经济学人 20250920期科技板块
Delphi-2M can predict which of more than 1,000 conditions a person might face next
Delphi-2M模型能够预测一个人接下来可能面临1000多种疾病中的哪一种
图源:经济学人官网/Francesco Cicolella
MUCH OF THE art of medicine involves working out, through detailed questioning and physical examination, which disease a given patient has contracted. Far harder, but no less desirable, would be identifying which diseases a patient might develop in the future. This is what the team behind a new artificial-intelligence (AI) model, details of which were published in Nature on September 17th, claims to do.
医学的艺术很大程度上在于通过细致的问诊和体格检查,判断出某位病人究竟患上了哪种疾病。而一项难度更大但同样至关重要的任务,是识别出病人未来可能患上哪些疾病。这正是一个新人工智能模型背后的团队所宣称能够做到的,该模型的细节已于9月17日发表在《自然》(Nature)上。
Though the model, named Delphi-2M, is not yet ready for deployment in hospitals, its creators hope it could one day allow doctors to predict if their patients are likely to get one of more than 1,000 different conditions, including Alzheimer’s disease, cancer and heart attacks, which all affect many millions every year. In addition to helping flag patients who are at high risk, it might also help health authorities allocate budgets for disease areas that may need extra funds in the future.
该模型名为Delphi-2M,虽然尚未准备好在医院部署,但其开发者希望,有朝一日它能让医生预测患者是否可能罹患1000多种不同疾病中的某一种,包括阿尔茨海默病(Alzheimer’s disease)、癌症和心脏病发作,这些疾病每年都影响数以百万计的人。除了帮助识别出高风险患者外,该模型还可能协助卫生部门为未来可能需要额外资金的疾病领域分配预算。
The model was developed by teams at the European Molecular Biology Laboratory (EMBL) in Cambridge and the German Cancer Research Centre in Heidelberg. It takes inspiration from large language models (LLMs)—such as GPT-5, which powers ChatGPT—that are capable of producing fluent prose. LLMs are trained to spot patterns in enormous amounts of text scraped from the internet, which allows them to select the word most likely to come next in any given sentence. Delphi-2M’s creators reasoned that an AI model fed on large amounts of human-health data could have similar predictive power.
该模型由位于剑桥的欧洲分子生物学实验室(European Molecular Biology Laboratory, EMBL)和位于海德堡的德国癌症研究中心(German Cancer Research Centre)的团队共同开发。其灵感来自大语言模型(LLMs),例如驱动 ChatGPT 的 GPT-5,这类模型能够流畅生成文本。大语言模型通过从互联网抓取的海量文本中识别模式,从而预测句子中下一个最可能出现的词。Delphi-2M的开发者们推断,若将大量人类健康数据输入人工智能模型,也可能获得类似的预测能力。
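The "most likely next word" idea in the paragraph above can be shown on a toy scale. The sketch below is not Delphi-2M's method: it swaps the neural network for simple counts of which event follows which, and the diagnosis labels, histories and function names are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy histories: each is an ordered list of diagnosis labels.
histories = [
    ["hypertension", "diabetes", "heart_attack"],
    ["hypertension", "heart_attack"],
    ["hypertension", "diabetes", "kidney_disease"],
    ["asthma", "pneumonia"],
]


def fit_bigrams(sequences):
    """Count how often each event is immediately followed by each other event."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts


def next_event_probs(counts, current):
    """Turn the follow-up counts for `current` into probabilities."""
    total = sum(counts[current].values())
    if total == 0:
        return {}  # this event was never followed by anything in the data
    return {event: n / total for event, n in counts[current].items()}


model = fit_bigrams(histories)
probs = next_event_probs(model, "hypertension")
most_likely = max(probs, key=probs.get)
```

Here "hypertension" is followed by "diabetes" in two of three histories, so "diabetes" gets the highest probability. An LLM does the same kind of thing with words, using a far richer model of context than the single previous item.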
In many respects, the design of established LLMs was well-suited to the task. One major tweak that was needed, however, was to teach such a model to account for the time that had passed between events in a patient’s life. In written text, consecutive words immediately follow one another; the same is not true for diagnoses in a patient’s history. High blood pressure following a positive pregnancy test, for example, requires different interpretations depending on whether the two are separated by weeks—in which cases the pregnancy can be affected—or years.
在许多方面,现有大语言模型的设计非常适合这项任务。然而,一个关键的调整是,需要教会模型考虑病人生活中各项事件之间的时间跨度。在书面文本中,连续的词语是紧密相连的;但病患史中的诊断记录却并非如此。例如,妊娠试验呈阳性后出现高血压,其解读会因两者相隔数周还是数年而大相径庭,若相隔数周,可能会影响妊娠。
This adjustment was performed by swapping out the part of an LLM that encodes a word’s position for one encoding a person’s age. (It wasn’t without mishaps: in an early version of the model new diagnoses were sometimes predicted after a person had died.)
为了实现这一调整,研究人员将大语言模型中编码单词位置的部分替换为编码个人年龄的部分。(这个过程并非一帆风顺:在模型的早期版本中,有时竟会预测出某人在去世后又得了新的疾病。)
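To make the "position versus age" swap concrete, here is a minimal sketch. It assumes the sinusoidal position encoding used in the original Transformer design; the article does not say which functional form Delphi-2M uses, and the patient history, dimensions and helper names below are hypothetical.

```python
import math


def sinusoidal_encoding(value, dim=8, base=10000.0):
    """Encode a scalar as sin/cos pairs at geometrically spaced frequencies."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    encoding = []
    for i in range(dim // 2):
        freq = 1.0 / (base ** (2 * i / dim))
        encoding.append(math.sin(value * freq))
        encoding.append(math.cos(value * freq))
    return encoding


def distance(a, b):
    """Euclidean distance between two encodings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical history: (age in years at the event, event label).
history = [
    (30.0, "positive_pregnancy_test"),
    (30.1, "hypertension"),
    (45.0, "hypertension"),
]

# Text-style: encode the index in the sequence (0, 1, 2).
by_position = [sinusoidal_encoding(i) for i, _ in enumerate(history)]
# Age-style (assumed): encode the age at which each event happened.
by_age = [sinusoidal_encoding(age) for age, _ in history]

pos_gap_first = distance(by_position[0], by_position[1])
pos_gap_second = distance(by_position[1], by_position[2])
age_gap_weeks = distance(by_age[0], by_age[1])
age_gap_years = distance(by_age[1], by_age[2])
```

With index-based encoding the two gaps look identical to the model, because consecutive events are always one step apart. With age-based encoding the events a few weeks apart sit much closer together than the events 15 years apart, which is the distinction the pregnancy example calls for.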
Delphi-2M was then trained on data from 400,000 people from UK Biobank, a database that contains arguably the world’s most complete human biological data set. The model was given the timing and sequence of ICD-10 codes, the international medical shorthand doctors use to register officially recognised diagnoses, representing the 1,256 different diseases that appeared in the Biobank data set. The model was subsequently validated on data from the remaining 100,000 people in the Biobank before being tested further on Danish health records, which are famously long-running and thorough. In this case, the team used data from 1.9m Danes going back to 1978, ensuring a much more diverse and representative sample than the UK Biobank could provide.
随后,Delphi-2M利用来自英国生物样本库(UK Biobank)40万人的数据进行训练,该数据库堪称拥有世界上最完整的人类生物数据集。模型接收了ICD-10编码的时间和序列信息,ICD-10是医生用来登记官方认可诊断的国际通用医学编码,这些编码对应生物样本库数据集中出现的1256种不同疾病。模型随后利用该样本库中剩余10万人的数据进行了验证,之后又在以长期、详尽著称的丹麦健康记录上进行了进一步测试。在这次测试中,团队使用了自1978年以来190万丹麦人的数据,确保了样本比英国生物样本库更多样化、更具代表性。
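The 400,000 / 100,000 division described above is a split by person, so that nobody's records appear on both sides. The sketch below shows one common way to do that. The random shuffle, the record layout and the function name are assumptions, since the article only says the model was validated on "the remaining 100,000 people".

```python
import random

# Hypothetical records: person id -> time-ordered (age in years, ICD-10 code).
records = {
    "p1": [(52.3, "I10"), (58.9, "E11")],  # hypertension, type 2 diabetes
    "p2": [(61.0, "I21")],                 # acute myocardial infarction
    "p3": [(8.5, "J45")],                  # asthma
    "p4": [(70.2, "A41")],                 # sepsis
    "p5": [(47.1, "I10"), (66.4, "I21")],
}


def split_by_person(person_ids, train_fraction=0.8, seed=0):
    """Shuffle people (not individual diagnoses) and cut the list once."""
    ids = sorted(set(person_ids))
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return set(ids[:cut]), set(ids[cut:])


train_ids, validation_ids = split_by_person(records.keys())
train = {pid: records[pid] for pid in train_ids}
validation = {pid: records[pid] for pid in validation_ids}
```

The 0.8 fraction mirrors the article's 400,000-to-100,000 ratio. Testing on Danish records afterwards goes a step further, because that data comes from a different population and health system altogether.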
To judge the model’s performance, researchers measured its AUC (short for “area under the curve”, referencing a region in a probability chart), in which a value of 1 would mean perfect predictions and 0.5 would be no better than random. For predictions of diagnoses within five years of a previous one, on average Delphi-2M performed at a value of 0.76 on British data, with a small drop to 0.67 for the Danish data. Events that would often follow a specific previous one—death following sepsis, say—were correctly predicted more often, whereas those caused by more random, external factors, such as picking up a virus, were harder to predict. Unsurprisingly, the model’s accuracy also dropped a little over time: when forecasting ten years into the future, it scored 0.7 on average.
为了评估模型的性能,研究人员测量了其AUC值(“曲线下面积”的缩写,指概率图中的一个区域),AUC值为1表示预测完全准确,0.5则表示与随机猜测无异。在预测某个诊断发生后五年内的其他诊断时,Delphi-2M在英国数据上的平均AUC值为0.76,在丹麦数据上则略降至0.67。那些通常会紧随某一特定事件发生的情况的预测准确率更高,例如罹患败血症后死亡。而由更随机的外部因素导致的事件则更难预测,例如感染病毒。不出所料,模型的准确性也随时间推移而略有下降:在预测未来十年的疾病时,其平均AUC值为0.7。
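The AUC figures quoted above have a handy interpretation: AUC is the probability that a randomly chosen person who did get the diagnosis was given a higher risk score than a randomly chosen person who did not. The sketch below computes it that way by comparing every such pair. The risk scores and outcomes are invented for illustration and are not from the study.

```python
def auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks and whether the diagnosis later occurred (1 = yes).
risk = [0.9, 0.8, 0.4, 0.3, 0.2]
occurred = [1, 0, 1, 0, 0]
example_auc = auc(risk, occurred)
```

In this toy case five of the six positive-negative pairs are ordered correctly, giving an AUC of about 0.83. A model that assigned the same score to everyone would land on 0.5, the "no better than random" baseline mentioned in the article.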
Real-world applications remain far off for now. Delphi-2M will first need to go through a much more rigorous trial period giving clinicians the opportunity to explore if it leads to better outcomes for their patients. That process could take many years. The Delphi-2M team is also working on updating the model to enable it to take in more sophisticated data than chronological lists of diagnoses. As the UK Biobank also contains medical images and genome sequences, adding this data to the model might further improve its accuracy.
目前来看,该模型距离实际应用仍很遥远。Delphi-2M首先需要经历一段更为严格的试验期,让临床医生有机会验证它是否能改善患者的治疗结果。这个过程可能需要多年时间。Delphi-2M团队也正在努力更新模型,使其能够处理比按时间顺序排列的诊断列表更复杂的数据。鉴于英国生物样本库还包含医学影像和基因组序列,将这些数据纳入模型或许能进一步提高其准确性。
As impressive as Delphi-2M appears, it is not the only artificial health forecaster in town. For instance, an AI model called Foresight, originally developed at King’s College London in 2024, also uses patients’ medical histories to predict future health events. (A larger version of the project was paused in June following concerns that NHS England had not sought the proper approvals when it gave the Foresight team access to the data.) The ETHOS model being developed at Harvard University also has similar aims.
尽管Delphi-2M的表现令人印象深刻,但它并非唯一的人工智能健康预测工具。例如,一款名为Foresight的人工智能模型,最初于2024年在伦敦国王学院(King’s College London)开发,同样利用患者的病史来预测未来的健康事件。该项目的一个更大型版本已于六月暂停,原因在于有人担忧英格兰国家医疗服务体系(NHS England)在向Foresight团队提供数据访问权限时未获得适当批准。哈佛大学(Harvard University)正在开发的ETHOS模型也有类似的目标。
Although patients will have to wait to feel the direct benefits of Delphi-2M, even the preliminary version of the model already offers a potential treasure trove for biologists. Its style of prediction reveals which conditions cluster together, which may in turn suggest previously unexplored relationships between diseases. Future, beefier AI models, could take that work even further. The possibilities are exciting, says Ewan Birney, a geneticist at EMBL. “I’m like a kid in a candy shop.” ■
尽管患者还需要等待才能直接感受到Delphi-2M带来的好处,但即便是模型的初步版本,也已为生物学家们提供了一个潜在的宝库。其预测方式揭示了哪些疾病倾向于聚集出现,这反过来又可能暗示了以往未被探索过的疾病间的关联。未来更强大的人工智能模型有望将这项工作推向更远。欧洲分子生物学实验室的遗传学家伊万·伯尼(Ewan Birney)表示,这些可能性令人振奋。“我就像一个进了糖果店的孩子。” ■
02 阅读理解练习题
What is the primary innovation of the Delphi-2M model compared to standard Large Language Models (LLMs) as described in the text?
(A) It was trained on a much larger and more diverse dataset from multiple countries.
(B) It incorporates a mechanism to account for the variable time intervals between life events.
(C) It can predict diseases with perfect accuracy (an AUC score of 1).
(D) It is the only AI model capable of forecasting future health events.
答案在文末哦,先来复习一下单词吧。
03 重点单词&词组
1. forecast /ˈfɔː(r)kɑːst/ v. 预测;预报
英文释义: to predict or estimate a future event or trend.
考纲: IELTS
文中用法: A new AI model can forecast a person’s risk of diseases across their life
例句:
Experts forecast a significant increase in oil prices next year. 专家预测明年油价将大幅上涨。 It is difficult to forecast the outcome of the election with any certainty. 很难准确地预测选举结果。
2. contract /kənˈtrækt/ v. 感染(疾病);(肌肉)收缩;签订合同
英文释义: to catch or become ill with a disease.
考纲: IELTS, 专八
文中用法: ...which disease a given patient has contracted.
例句:
He contracted malaria while traveling in a tropical region. 他在热带地区旅行时感染了疟疾。 The company contracted with a supplier for the necessary raw materials. 公司与一家供应商签订了必要原材料的合同。
3. deployment /dɪˈplɔɪmənt/ n. 部署;配置;展开
英文释义: the action of bringing resources into effective action.
考纲: IELTS
文中用法: Though the model, named Delphi-2M, is not yet ready for deployment in hospitals...
例句:
The deployment of troops to the border was a precautionary measure. 向边境部署军队是一项预防措施。 The successful deployment of the new software system improved the company's efficiency. 新软件系统的成功部署提高了公司的效率。
4. allocate /ˈæləkeɪt/ v. 分配;分派;拨给
英文释义: to distribute (resources or duties) for a particular purpose.
考纲: GRE, IELTS, 专八
文中用法: ...it might also help health authorities allocate budgets for disease areas...
例句:
The government decided to allocate more funds to education and healthcare. 政府决定向教育和医疗领域拨配更多资金。 The manager will allocate tasks to each member of the team. 经理将为团队的每位成员分配任务。
5. prose /prəʊz/ n. 散文;白话文
英文释义: written or spoken language in its ordinary form, without metrical structure.
文中用法: ...that are capable of producing fluent prose.
例句:
Her prose is known for its clarity and elegance. 她的散文以清晰和优雅著称。 The book is written in simple, straightforward prose. 这本书是用简单直白的散文体写成的。
6. consecutive /kənˈsekjətɪv/ adj. 连续的;连贯的
英文释义: following each other continuously.
考纲: GRE, IELTS
文中用法: In written text, consecutive words immediately follow one another...
例句:
The team has won five consecutive games. 该队已连续赢得五场比赛。 He was absent from work for three consecutive days due to illness. 他因病连续三天没有上班。
7. mishap /ˈmɪshæp/ n. 不幸事故;小灾难
英文释义: an unlucky accident.
文中用法: It wasn’t without mishaps: in an early version of the model new diagnoses were sometimes predicted after a person had died.
例句:
The entire journey was completed without mishap. 整个旅程顺利完成,没有发生任何意外。 We had a slight mishap with the car, but no one was hurt. 我们的车出了点小事故,但没有人受伤。
8. validate /ˈvælɪdeɪt/ v. 证实;验证;使生效
英文释义: to check or prove that something is true, accurate, or effective.
考纲: GRE, IELTS, 专八
文中用法: The model was subsequently validated on data from the remaining 100,000 people...
例句:
Scientists conducted several experiments to validate the new theory. 科学家们进行了多次实验来验证这一新理论。 You need to validate your ticket at the machine before boarding the train. 在上火车前,你需要在机器上验证你的车票。
9. rigorous /ˈrɪɡərəs/ adj. 严格的;严谨的;严酷的
英文释义: extremely thorough, exhaustive, or accurate.
考纲: GRE, IELTS, 专八
文中用法: Delphi-2M will first need to go through a much more rigorous trial period...
例句:
The research was praised for its rigorous methodology. 这项研究因其严谨的方法论而受到赞扬。 All new pilots must undergo a rigorous training program. 所有新飞行员都必须接受严格的培训计划。
10. preliminary /prɪˈlɪmɪnəri/ adj. 初步的;预备的
英文释义: preceding or done in preparation for something fuller or more important.
考纲: GRE, IELTS, 专八
文中用法: ...even the preliminary version of the model already offers a potential treasure trove for biologists.
例句:
The preliminary results of the study are very promising. 研究的初步结果非常鼓舞人心。 After a few preliminary remarks, the chairman introduced the main speaker. 在几句开场白之后,主席介绍了主讲人。
11. trove /trəʊv/ n. 宝库;珍藏
英文释义: a store of valuable or delightful things.
文中用法: ...a potential treasure trove for biologists.
例句:
The museum's archives are a treasure trove of historical documents. 博物馆的档案室是历史文件的宝库。 The old bookshop was a trove of rare and forgotten titles. 这家旧书店是稀有和被遗忘书籍的宝库。
12. cluster /ˈklʌstə(r)/ v. 聚集;成群
英文释义: to form or gather together in a small, close group.
文中用法: Its style of prediction reveals which conditions cluster together...
例句:
A group of reporters cluster around the celebrity. 一群记者簇拥在名人周围。 The stars cluster together in distant galaxies. 恒星在遥远的星系中聚集在一起。
重点词组
1. take inspiration from 从…获得灵感
文中用法: It takes inspiration from large language models (LLMs)...
例句:
The architect took inspiration from nature when designing the building. 建筑师在设计这座建筑时从大自然中获得了灵感。 Many young artists take inspiration from the works of the great masters. 许多年轻艺术家从大师们的作品中汲取灵感。
2. far off 遥远的;久远的
文中用法: Real-world applications remain far off for now.
例句:
The day when we can travel to other galaxies seems very far off. 我们可以前往其他星系的日子似乎还很遥远。 Retirement is still far off for me, so I haven't thought much about it. 退休对我来说还很遥远,所以我还没有过多考虑。
3. go through 经历;经受
文中用法: Delphi-2M will first need to go through a much more rigorous trial period...
例句:
She had to go through a difficult time after losing her job. 她失业后不得不经历一段艰难的时期。 All applications must go through a formal approval process. 所有的申请都必须经过正式的审批程序。
04 思维导图
05 练习题答案
正确答案:(B)
解析:
(A) 选项错误。模型仅在英国生物样本库的数据上训练,丹麦数据只用于后续测试,并非“在多个国家的数据集上训练”;而且文章强调的创新在于对模型内部结构的调整,而不是数据集的规模或多样性。
(B) 选项正确。文章明确指出,与处理连续单词的大语言模型不同,医疗诊断事件之间的时间间隔是变化的。Delphi-2M的一个重大调整就是教会模型考虑病人生活中各项事件之间的时间跨度,这是通过将编码单词位置的部分替换为编码个人年龄的部分来实现的。
(C) 选项错误。文章提到模型的AUC值为0.76和0.67,这表明其预测能力优于随机猜测,但远未达到1所代表的精准预测。
(D) 选项错误。文章在倒数第二段明确提到了其他类似的模型,如Foresight和ETHOS,说明Delphi-2M并非该领域唯一的模型。