Using Text Analytics Technology to Validate Clinical Data


Natural Language Processing

BeaconCure developed Natural Language Processing (NLP) tools for the pharma industry. NLP is a field of artificial intelligence and computer science concerned with programming computers to process, understand, and work with natural language. Using these processing capabilities, NLP applications can address a wide range of challenges, from text validation to automatic translation and sentiment analysis.

Many of the tasks involved in developing new drugs can benefit from NLP solutions. During all stages of the clinical trial process, information is collected, summarized, and reported, either internally or to regulatory authorities, and this sensitive data must at all times be up to date and faithful to the underlying clinical information. The process requires constant, intricate validation, testing, and analysis of information in order to act effectively in real time. For example, before submission to the FDA, the clinical data summarized in Clinical Summary Reports must be crosschecked thoroughly against the raw clinical data.

What can we do?

- Convert clinical data, text reports, and other documents into a smart, structured database capable of supporting NLP and data science analyses.

- Consolidate and convert information from multiple formats into a unified representation.

- Extract targeted information from a mass of documents (text mining).

- Match documents in the database against one another and apply various operations to calculate or validate values. For example, find the source of an information entity and validate its value by crosschecking the source data.

- Find patterns and trends in data to explain the outcome of experiments or decisions. For example, use documented information about a clinical trial site to see which factors are correlated with successful recruiting.

- Validate applications to regulators before they are submitted.
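
As a rough illustration of the crosschecking idea in the list above, the sketch below recomputes summary statistics from raw records and flags any reported value that disagrees. It is not BeaconCure's implementation: the record layout, the metric names (`n`, `mean_age`), the `recompute` and `crosscheck` helpers, and all the numbers are hypothetical and chosen only for the example.

```python
from statistics import mean

# Hypothetical raw records, one row per patient (illustrative values only).
raw_records = [
    {"patient_id": "P001", "arm": "treatment", "age": 54},
    {"patient_id": "P002", "arm": "treatment", "age": 61},
    {"patient_id": "P003", "arm": "placebo", "age": 47},
    {"patient_id": "P004", "arm": "placebo", "age": 58},
]

# Hypothetical values as they might be extracted from a summary report.
reported = {
    ("treatment", "n"): 2,
    ("treatment", "mean_age"): 57.5,
    ("placebo", "n"): 2,
    ("placebo", "mean_age"): 53.5,  # deliberately inconsistent with the raw data
}


def recompute(records, arm, metric):
    """Recompute one summary metric for one study arm from the raw records."""
    ages = [r["age"] for r in records if r["arm"] == arm]
    if metric == "n":
        return len(ages)
    if metric == "mean_age":
        return mean(ages)
    raise ValueError(f"unknown metric: {metric}")


def crosscheck(records, reported_values, tolerance=0.05):
    """Return (arm, metric, reported, recomputed) for every mismatch."""
    mismatches = []
    for (arm, metric), value in reported_values.items():
        actual = recompute(records, arm, metric)
        if abs(actual - value) > tolerance:
            mismatches.append((arm, metric, value, actual))
    return mismatches


mismatches = crosscheck(raw_records, reported)
for arm, metric, value, actual in mismatches:
    print(f"{arm}/{metric}: report says {value}, raw data gives {actual}")
```

In practice the hard part is the step this sketch skips: locating each reported value in a free-text document and tracing it to the right source table, which is where text mining comes in.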


Adam Russell, an anthropologist and program manager at the Department of Defense’s mad-science division Darpa, laughs at the suggestion that he is trying to build a real, live, bullshit detector. But he doesn’t really seem to think it’s funny. The quite serious call for proposals Russell just sent out on Darpa stationery asks people (anyone! Even you!) for ways to determine what findings from the social and behavioral sciences are actually, you know, true. Or in his construction: “credible.”


Article: Mark Zuckerberg, Priscilla Chan Donate $10M to Advance Health Using Big Data

Gift Supports Atul Butte, Who Leads the University’s Institute for Computational Health Sciences

Atul Butte, MD, PhD, who is helping to guide UC San Francisco into a new era of computational science and medicine, has been named the Priscilla Chan and Mark Zuckerberg Distinguished Professor at UCSF.

Butte, who leads UCSF’s Institute for Computational Health Sciences (ICHS), believes that the “trillion points of data” already in the public sphere hold keys to everything from repurposing FDA-approved drugs for new diseases to finding better ways to deliver health care and keep patients safe.


Researching the researchers

BeaconCure CEO: When the algorithm is finished, customers can decide whether we are offering them real value.

The big data revolution, which has changed various industries, is also reaching the world of higher education. Startup BeaconCure was founded 18 months ago on the basis of combining information technologies with bioinformatics. High tech veterans CEO Yoran Bar and CPO Ilan Carmeli, who previously worked in advertising and educational technologies, founded BeaconCure. One investor in the company is Noam Stern-Perry, a shareholder in algorithmic trading company Final. The company has three goals: to help researchers easily find articles and research grant proposals relevant to their fields, to distinguish between articles that are good science and those that are bad science, and in the future to develop a "scientist" algorithm that can devise a research hypothesis and examine it by itself.

"We started out with Noam's blessing," Bar says. "We decided to check whether there was a way with a business model that could help research into rare diseases. We started to investigate how companies and investment funds in the rare diseases segment operate, but we gradually realized that this was not our field. This segment requires very large and rich funds and a lot of expertise."

How can the knowledge accumulated in advertising optimization be converted for the medical sector? Bar and Carmeli persisted, and decided to develop a search engine that would mine information from medical research. "Today, access to life sciences research information is considered either impossible or, if possible, very expensive. Even if there is enough money to buy access to all the articles, and all the existing search engines are used, such as Thomson Reuters or Google Scholar, you are likely to get either 5,000 results that cannot be prioritized, or nothing. At the same time, there are many researchers working simultaneously in the same area who do not meet. No one knows where to get funding or what the state of his competitors is. Even the commercialization companies at universities do not always know what they have in their own backyard," Bar says.

He adds, "Researchers spend half their time looking for money. They look at newsletters every day in which most of the grants are not relevant to them."

Meanwhile, a new problem is arising - the reproducibility problem, also referred to as the crisis of reproducibility. This is a polite way of saying that many trials produce good results only in the laboratory of the original researcher who conducted them, not in the real world. "A researcher at a university or a small biotech company wants to know whether other trials support their research hypotheses. They want to know which trials were good and reproducible," Bar explains.

Detecting plagiarism and theft

The choice between reproducible trials and bad trials is now a matter of widespread discussion that has grabbed the attention, for example, of young billionaire John Arnold, who made his money at Enron, but whose name was not linked to the scandal that brought the company down. Arnold and his wife, Laura, founded a fund to support technologies designed to make science more accurate and less biased. For example, they support a system in which researchers can register all of their hypotheses and trials, so that it will be more difficult to twist, cook, or conceal the results later. Mark Zuckerberg also expressed interest in the matter through a fund that he founded together with his wife, Priscilla Chan, for combating rare diseases.

BeaconCure's system combines these two capabilities: it locates information for academic and industrial researchers about research, grants, and relevant competitors. The system also includes an algorithm designed to detect suspicious studies, in contrast to legitimate ones. "The features that are easy to check are who the researcher is, what he published previously, whether he worked alone or with a team, what materials he used and whether it is logical to use these materials, and how much time it took him. It is sometimes possible to detect plagiarism, theft, or something illogical. On the other hand, we use advertising models to discover that research X by researcher A in effect yielded results pointing in the same direction as research Y by researcher B, and it is therefore possible that both of them are going in the right direction, even if what they are saying is completely new to science, or contradicts previous assumptions."

As of now, the accepted way in the scientific world of judging the quality of an article is the number of citations it receives in the bibliographies of articles published afterwards. The concept is that just as every scientific field is based on peer review, the importance and general quality of an article can be determined by the accumulated knowledge of other researchers in the same field. If they sense that a certain researcher is unreliable, or that his conclusions are very unreasonable, he will not be cited. In Bar and Carmeli's opinion, however, this is inadequate. "Wisdom of the crowd is an elusive double-edged sword," says Bar. "Articles are sometimes mentioned in order to dispute them, and there can be an unjustified herd effect with respect to a given article." BeaconCure's system takes into account the wisdom of the crowd concerning an article in assessing its quality and relevance for a searching researcher, but does not use it as the only factor.

In the company's vision, the system will try at some stage to independently generate scientific insights. "If we see that when someone is looking for molecule B in the context of cancer, and constantly retrieves articles pertaining to molecule C, we might realize that molecule C is also important in this area," Bar explains. "It could also be that the system will find a physical similarity between certain molecules on which two researchers are simultaneously working, and that putting them in touch will lead them to new insights about their operational mechanism. It's a whole world."
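
The co-retrieval idea Bar describes can be pictured with a small sketch. This is only one assumed reading of the quote, not the company's algorithm: the search log format, the `co_retrieved` helper, the `min_share` threshold, and the molecule names are all hypothetical.

```python
from collections import Counter

# Hypothetical search log: each entry records the molecule searched for,
# the disease context, and the molecules mentioned in each retrieved article.
search_log = [
    {"query": "molecule B", "context": "cancer",
     "articles": [{"molecule B", "molecule C"}, {"molecule C"}, {"molecule B", "molecule D"}]},
    {"query": "molecule B", "context": "cancer",
     "articles": [{"molecule C"}, {"molecule B"}]},
    {"query": "molecule B", "context": "diabetes",
     "articles": [{"molecule D"}]},
]


def co_retrieved(log, query, context, min_share=0.5):
    """List other molecules that appear in at least `min_share` of the
    articles retrieved for `query` in `context`, most frequent first."""
    counts = Counter()
    total_articles = 0
    for entry in log:
        if entry["query"] != query or entry["context"] != context:
            continue
        for mentions in entry["articles"]:
            total_articles += 1
            # Count every molecule in the article except the one searched for.
            counts.update(m for m in mentions if m != query)
    if total_articles == 0:
        return []
    return [
        (molecule, count / total_articles)
        for molecule, count in counts.most_common()
        if count / total_articles >= min_share
    ]


candidates = co_retrieved(search_log, "molecule B", "cancer")
print(candidates)
```

With this toy log, molecule C shows up in three of the five articles retrieved for molecule B in the cancer context and is surfaced as a candidate, while molecule D, which appears only once, is not.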

It is slightly similar to what Compugen Ltd. (Nasdaq: CGEN; TASE: CGEN) did, but Compugen was unable to develop on the basis of a model of providing service to researchers, and decided to apply its knowledge to developing products independently.

"Two experts who used to work for Compugen, and who are working for us, warned us about this process, and we have not yet decided whether we will offer our customers only service, or whether we will also demand royalties from the final product in cases in which we made a major contribution. In the case of royalties, the sky is the limit in potential revenue," Bar declares.

"We will get back to the customer with a solution"

Bar: "It is important to realize that our system is not a fully automatic system installed with the customer. We sit with the customer, understand his needs, and create the search that is fed into our engine, which is built on a unique algorithm, but also on clinical data, some of which can be found only with us. We will get back to the customer with a solution, not with many search results. Say that the technology commercialization company at one of the universities asked us to search its university for all the studies in a given segment before they are published, and for all the companies around the world working in the same field that might be interested in this research. We didn't give them search results; we gave them answers."

"Globes": You are relatively diversified in the type of solutions and customers.

"We're in the development process, and have not yet decided in what exactly we will specialize. Each of the spheres we have described - searching for competitors, scientific insights, and even searching for grants - is an entire world in itself. We know that we will deal with all of these areas, because this breadth is essential in order to make our algorithm really good. In which of these channels will we run the fastest and the deepest? We do not know that yet." As of now, the company has run 20 projects, and according to Bar, the customers are satisfied. "As soon as the algorithm is finished, the customers will finally be able to judge whether or not we are offering them real value. Until then, we will be very cautious."

Published by Globes [online], Israel Business News, on June 27, 2017

© Copyright of Globes Publisher Itonut (1983) Ltd. 2017