“Unorthodox” RL model, described in the paper for the first time, weighs potential negative consequences of actions (doses) against an outcome (tumor reduction)
Research conducted by the Massachusetts Institute of Technology (MIT) and presented at the 2018 Machine Learning for Healthcare conference hosted by Stanford University has revealed the extent to which machine learning (ML) can be used to sharply reduce chemotherapy doses while retaining the same tumour-shrinking potential.
In simulated trials of 50 patients with glioblastoma, the most aggressive form of brain cancer, the ML model designed treatment cycles that cut the potency of nearly all doses to a quarter or half. In many cases it skipped doses altogether, scheduling administrations only twice a year instead of monthly.
The model offers a major improvement over the conventional “eye-balling” method of administering doses, observing how patients respond, and adjusting accordingly, said Professor Nicholas J. Schork of the J. Craig Venter Institute in a release.
Powered by a “Self-Learning” Technique
The paper outlines a novel ML “self-learning” technique that starts from treatment regimens currently in use and iteratively adjusts the doses to optimise them.
The researchers’ model uses a technique called reinforcement learning (RL), a method inspired by behavioral psychology in which a model learns to favor behavior that leads to a desired outcome. The researchers, however, used an “unorthodox” RL model, described in the paper for the first time, that weighs the potential negative consequences of actions (doses) against an outcome (tumor reduction).
Chemotherapy Doses: A “Perfect Balance”
As MIT puts it: “Traditional RL models work toward a single outcome, such as winning a game, and take any and all actions that maximize that outcome. On the other hand, the researchers’ model, at each action, has flexibility to find a dose that doesn’t necessarily solely maximize tumor reduction, but that strikes a perfect balance between maximum tumor reduction and low toxicity.”
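The trade-off described above can be illustrated with a toy reward that subtracts a dose-proportional penalty from the tumor shrinkage. This is a minimal sketch, not the paper's formulation: the saturating dose-response curve, the dose levels, and the names `shrinkage`, `reward`, `best_dose`, and `toxicity_weight` are all assumptions made for illustration.

```python
import math

# Candidate doses as fractions of a full dose (illustrative values).
DOSES = (0.0, 0.25, 0.5, 1.0)


def shrinkage(dose: float) -> float:
    """Hypothetical dose-response: more drug helps, with diminishing returns."""
    return 1.0 - math.exp(-3.0 * dose)


def reward(dose: float, toxicity_weight: float) -> float:
    """Tumor shrinkage minus a penalty that grows with the dose given."""
    return shrinkage(dose) - toxicity_weight * dose


def best_dose(toxicity_weight: float) -> float:
    """Pick the candidate dose with the highest penalized reward."""
    return max(DOSES, key=lambda d: reward(d, toxicity_weight))


if __name__ == "__main__":
    for w in (0.0, 0.5, 1.5, 3.0):
        print(f"toxicity_weight={w}: best dose = {best_dose(w)}")
```

With no penalty this toy always picks the full dose. As the assumed penalty weight rises, the preferred dose drops to a half, then a quarter, and finally to skipping the dose entirely, which mirrors the kind of behavior the article reports.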
The MIT researchers trained the model to make these calculations using data from 50 simulated patients.
For each patient, the model conducted about 20,000 trial-and-error test runs. Once training was complete, the model learned parameters for optimal regimens. “When given new patients, the model used those parameters to formulate new regimens based on various constraints the researchers provided”, the authors note.
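The trial-and-error training described here can be sketched with tabular Q-learning, a standard RL method used as a stand-in because the article does not specify the paper's algorithm. Everything in this sketch is hypothetical: the six discretized tumor sizes, the deterministic effect of each dose, the penalty weight, and the function names `next_size`, `train`, and `regimen`.

```python
import random

DOSES = (0.0, 0.25, 0.5, 1.0)   # fraction of a full dose (illustrative)
EFFECT = {0.0: +1, 0.25: 0, 0.5: -1, 1.0: -2}  # assumed change in tumor size
SIZES = 6                        # discretized tumor size: 0 (gone) .. 5 (largest)
MONTHS = 12                      # length of one simulated treatment course


def next_size(size: int, dose: float) -> int:
    """Toy simulated patient: apply the assumed dose effect, clamped to range."""
    return min(max(size + EFFECT[dose], 0), SIZES - 1)


def train(episodes=20_000, toxicity_weight=0.5, alpha=0.1, gamma=0.9,
          epsilon=0.2, seed=0):
    """Learn a table of action values by repeated trial-and-error runs."""
    rng = random.Random(seed)
    q = [[0.0] * len(DOSES) for _ in range(SIZES)]
    for _ in range(episodes):
        size = rng.randrange(1, SIZES)
        for _ in range(MONTHS):
            if rng.random() < epsilon:   # explore: try a random dose
                a = rng.randrange(len(DOSES))
            else:                        # exploit: best dose found so far
                a = max(range(len(DOSES)), key=lambda i: q[size][i])
            new = next_size(size, DOSES[a])
            # Reward: shrinkage achieved minus a penalty for the dose given.
            r = (size - new) - toxicity_weight * DOSES[a]
            q[size][a] += alpha * (r + gamma * max(q[new]) - q[size][a])
            size = new
    return q


def regimen(q, size: int, months: int = MONTHS) -> list:
    """Use the learned values to lay out a dosing plan for a new patient."""
    plan = []
    for _ in range(months):
        a = max(range(len(DOSES)), key=lambda i: q[size][i])
        plan.append(DOSES[a])
        size = next_size(size, DOSES[a])
    return plan
```

Calling `regimen(train(), 4)` would produce a twelve-month plan for a hypothetical patient starting at size 4. The learned table `q` plays the role of the "parameters" in the quote above, and raising `toxicity_weight` plays the role of a constraint that pushes the plan toward lower or skipped doses.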
These types of results bear witness to the power of human/machine cooperation, enabling medical professionals to administer healthcare based on data provided by human-taught machines. As Professor Schork puts it: “[Humans don’t] have the in-depth perception that a machine looking at tons of data has, so the human process is slow, tedious, and inexact”.