Tuesday, January 24 • 4:30pm - 6:00pm
Poster: Using Vector Representations of Relationships from Biomedical Literature to Identify Drug/Side-Effect Relationships

We describe the use of a distributed vector framework with supervised machine learning models for pharmacovigilance. We represent the predicate-based pathways that might explain the relationship between a drug and a side-effect as an abstract vector, and label each such vector as indicating either a positive or a negative relationship between the drug and the side-effect. Using a manually curated reference set, we find encouraging results, with some limitations. Although cross-validated AUC and F1 scores are excellent for logistic regression (LogReg) and support vector machine (SVM) algorithms, learning curves and support vectors indicate some degree of model overfitting, likely a result of the relatively small training set and the high dimensionality of the feature space. Even so, simple models such as k-nearest neighbors achieved reasonable performance, supporting the main finding that these vector representations provide a meaningful basis for classification in this context. This study justifies further research using larger reference sets for training and/or testing to mitigate the limitations of our model. To this end, we propose further validation on SIDER, a larger but less robust reference set. Additionally, the methods employed are generalizable, and extending the work to other problem domains may yield further insights.
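
As a rough illustration of the kind of evaluation described above, the following sketch classifies labeled high-dimensional vectors with k-nearest neighbors and reports a cross-validated F1 score. It is not the authors' pipeline: the vectors are synthetic stand-ins for drug/side-effect pathway vectors, and every name and parameter (`make_synthetic_pairs`, `cross_validated_f1`, the dimensionality, the class separation, the choice of cosine similarity) is an assumption made for the example.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(train, query, k=5):
    """Majority vote among the k training vectors most similar to query."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = sum(label for _, label in ranked[:k])
    return 1 if 2 * votes > k else 0


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def make_synthetic_pairs(n_per_class=40, dim=200, seed=0):
    """Hypothetical data: noisy vectors shifted along one shared direction.

    Label 1 stands for a positive drug/side-effect relationship, 0 for a
    negative one. Real pathway vectors would replace this function.
    """
    rng = random.Random(seed)
    direction = [rng.gauss(0, 1) for _ in range(dim)]
    data = []
    for label in (0, 1):
        sign = 1.0 if label else -1.0
        for _ in range(n_per_class):
            vec = [sign * 0.3 * d + rng.gauss(0, 1) for d in direction]
            data.append((vec, label))
    rng.shuffle(data)
    return data


def cross_validated_f1(data, n_folds=5, k=5):
    """Pool predictions over n_folds held-out folds, then compute F1."""
    y_true, y_pred = [], []
    for fold in range(n_folds):
        test = data[fold::n_folds]
        train = [item for i, item in enumerate(data) if i % n_folds != fold]
        for vec, label in test:
            y_true.append(label)
            y_pred.append(knn_predict(train, vec, k))
    return f1_score(y_true, y_pred)


data = make_synthetic_pairs()
score = cross_validated_f1(data)
print(f"5-fold cross-validated F1 (kNN, synthetic data): {score:.3f}")
```

With only a few dozen examples per class and hundreds of dimensions, a setup like this mirrors the small-sample, high-dimensional regime the abstract identifies as a likely source of overfitting, which is why held-out folds rather than training-set scores are used here.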

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402
