Charlotta Lindvall, MD, PhD
Palliative Medicine, Dana-Farber
The latest AI tools are extracting key patient‑centered details from clinical notes, helping clinicians better understand and respond to patient needs.
At Dana-Farber Cancer Institute, Charlotta Lindvall, MD, PhD, and her team are transforming how clinicians understand and respond to patient experiences, especially pain, symptoms, and goals of care, by applying artificial intelligence (AI), natural language processing (NLP), and large language models (LLMs) to clinical records. These records have traditionally held a wealth of information that is hard to retrieve and apply in any meaningful way. Tools such as ClinicalRegex, one of several NLP systems developed by her lab and now widely adopted across research groups and healthcare systems, are helping unlock the insights buried in unstructured text, making it possible to study and improve care in ways that were previously out of reach.
Addressing a Pressing Gap in Clinical Care
Trained in both medicine and computational science, Lindvall began her research career in cancer genomics. But about a decade ago, she noticed a critical gap in her clinical practice. “So much important data—symptoms, goals of care, treatment preferences—was embedded in text,” she explains. “Unlike lab values or medications, this information wasn’t easily extractable from the electronic health record.”
Motivated by this challenge, she pursued additional training in NLP at the Massachusetts Institute of Technology and went on to launch a lab dedicated to building tools that make clinical text data accessible and actionable. ClinicalRegex was one of the lab’s earliest breakthroughs, developed to extract patient-centered information from the thousands of clinical notes generated each day. By replacing labor-intensive manual review with scalable automation, the tool allows clinicians and researchers to uncover meaningful insights that otherwise remain buried.
“Looking for insights in medical records is like finding a needle in a haystack,” she says. “Manual chart review is incredibly time-consuming and error-prone. Our tool makes it scalable.”
From Research to Real-World Impact
ClinicalRegex was first put to the test in a Dana-Farber clinical trial focused on improving documentation of goals‑of‑care conversations for patients with advanced cancer. By scanning thousands of notes, the tool identified cases where these discussions had not been recorded, prompting oncologists with timely reminders and monthly feedback reports. The approach proved highly effective, and it has since been implemented across all Dana-Farber sites to help ensure that patient values and care preferences are consistently captured.
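To make the idea of scanning notes for missing documentation concrete, here is a minimal Python sketch. It is not ClinicalRegex's actual code or vocabulary: the keyword list, the `patients_missing_goc` helper, and the sample notes are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical keyword patterns for illustration only;
# this is NOT the vocabulary used by ClinicalRegex.
GOC_PATTERN = re.compile(
    r"\b(goals of care|code status|advance directive|health care proxy|hospice)\b",
    re.IGNORECASE,
)


def patients_missing_goc(notes):
    """Return sorted patient IDs for which no note matches any pattern.

    `notes` is an iterable of (patient_id, note_text) pairs.
    """
    documented = defaultdict(bool)
    for patient_id, text in notes:
        # Register every patient, then mark those with at least one match.
        found = bool(GOC_PATTERN.search(text))
        documented[patient_id] = documented[patient_id] or found
    return sorted(pid for pid, found in documented.items() if not found)


# Invented sample notes (no real patient data).
notes = [
    ("pt-001", "Discussed goals of care with patient and daughter."),
    ("pt-002", "Cycle 3 of chemotherapy tolerated well. Mild nausea."),
    ("pt-002", "Follow-up visit. Pain controlled on current regimen."),
    ("pt-003", "Patient confirmed Code Status: DNR/DNI."),
]

missing = patients_missing_goc(notes)
print(missing)  # ['pt-002']
```

In a setting like the trial described above, a list such as `missing` is the kind of output that could feed reminders or feedback reports, with a clinician reviewing the flagged charts before anything is acted on.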
As evidence of its utility grew, ClinicalRegex was increasingly adopted outside Dana-Farber. The platform now supports projects across more than eight NIH‑funded clinical trials and is used by approximately 20 healthcare systems in the U.S. and Europe. Institutions rely on it to measure care quality, identify documentation gaps, and monitor key patient‑provider conversations—tasks that are difficult to track through structured data alone.
Building the Next Generation of AI Tools
Building on the success of ClinicalRegex, Lindvall’s team is now advancing more sophisticated AI systems powered by large language models (LLMs). Unlike keyword-based approaches, LLMs can interpret language in context, allowing them to capture nuance and meaning in complex clinical text. “These models don’t need to be programmed to look for specific words,” Lindvall explains. “They learn patterns from data and can understand language in a much more nuanced way.”
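A toy Python example shows why context matters. The bare keyword search below is a deliberately simple stand-in, not any tool from Lindvall's lab, and the sentences are invented. It flags a negated mention of pain and misses a paraphrase, the two kinds of error that context-aware models are meant to reduce.

```python
import re

# A deliberately naive matcher: looks only for the literal word "pain".
PAIN_PATTERN = re.compile(r"\bpain\b", re.IGNORECASE)


def keyword_flags_pain(sentence):
    """Return True if the sentence contains the word 'pain'."""
    return bool(PAIN_PATTERN.search(sentence))


# Invented sentences for illustration.
sentences = [
    "Patient reports severe pain in the lower back.",      # true mention
    "Patient denies pain at this time.",                   # negated mention
    "She has been hurting more since the last infusion.",  # paraphrase
]

flags = [keyword_flags_pain(s) for s in sentences]
print(flags)  # [True, True, False]
```

The second sentence is a false positive and the third a false negative. Rule-based systems can add negation handling and synonym lists to narrow this gap, but each new case has to be anticipated and written by hand.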
To ensure these technologies are clinically safe, the lab develops secure, locally run models that do not require sending sensitive patient data to the cloud. One example is BioClinical ModernBERT, a next‑generation, domain‑adapted encoder transformer developed by Lindvall’s lab to accurately interpret clinical language—an area where general‑purpose AI tools often fall short. First released in 2025, the model was pretrained on the largest biomedical and clinical corpus to date—approximately 53.5 billion tokens—and incorporates long‑context processing with faster, more efficient performance, outperforming prior clinical encoders across multiple downstream tasks.
Since its release, BioClinical ModernBERT has been widely adopted by the research community and has been downloaded more than one million times from the open‑source platform Hugging Face, underscoring the demand for high‑performance, clinically grounded language models. These advances are now informing tools designed to directly support clinicians in caring for patients.
Supporting Clinicians, Enhancing Patient Care
One of the most promising uses of these new models is helping clinicians stay current on their patients’ experiences. “Patients with cancer often see multiple providers—oncologists, radiation therapists, social workers, infusion nurses,” Lindvall points out. “Large language models can synthesize all that documentation and provide a concise summary, helping clinicians understand the full picture.”
Improved coordination not only enhances care delivery but also significantly impacts the patient experience. “If everyone is on the same page, it leads to better conversations and less stress for patients,” she says. “They feel heard and understood.”
Recognizing the Power of AI Tools in Patient Care
Recent research from Lindvall’s team highlights how these AI tools can benefit the patient experience. For example, a 2024 study published in the Journal of Pain and Symptom Management demonstrated that NLP can accurately extract symptom information from narrative clinical notes, strengthening efforts to improve symptom surveillance for patients with advanced cancer.
Building on this earlier work, a study published in Neuro‑Oncology in September 2025 demonstrated that large language models can accurately and scalably track patient symptoms directly from electronic health record notes in patients with central nervous system cancers undergoing therapy—highlighting a new, automated approach to symptom surveillance that could improve real‑time clinical monitoring and responsiveness.
Further, a 2025 study in JAMA Network Open showed that NLP‑assisted chart review can reliably identify documentation of goals‑of‑care conversations among older adults with advanced cancer, offering a scalable method for tracking this critical quality measure. Most recently, a 2026 study in JCO Oncology Clinical Practice co‑authored by Lindvall found that large language models can identify goals‑of‑care conversations at scale and generate accurate, clinically meaningful summaries with minimal hallucination—pointing toward promising real‑time applications in clinical care.
However, Lindvall stresses that with any AI tool there is a need for thoughtful implementation. “We still need a human in the loop,” she says. “Understanding how AI identifies data and ensuring it aligns with clinical judgment is essential. Just because a tool is sophisticated doesn’t mean it’s the right one for every task.”
Looking Ahead
As Lindvall’s lab continues to push the boundaries of clinical AI, from flexible tools like ClinicalRegex to advanced, secure LLMs, this work is charting a future where critical patient-centered information becomes easier to find, interpret, and act on. The next phase of this research aims to make AI-driven summaries and insights a seamless part of clinical workflows, bringing clinicians closer to the full story of each patient’s experience and enabling more responsive, patient-centered cancer care.