
Tuesday, January 24
 

7:30am

Registration and Breakfast
Your ticket to the ML@RICE Workshop includes breakfast.

Tuesday January 24, 2017 7:30am - 8:45am
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

8:45am

Welcome

Speakers

Richard Baraniuk

Victor E Cameron Professor of Electrical and Computer Engineering, Rice University
Richard G. Baraniuk is the Victor E. Cameron Professor of Electrical and Computer Engineering at Rice University and the founder and director of OpenStax. In 1999, Dr. Baraniuk launched Connexions (now OpenStax CNX), one of the world’s first and today one of the world’s largest... Read More →

Jan E. Odegard

Executive Director, Ken Kennedy Institute for Information Technology and Associate Vice President, Research Computing &, Rice University
Dr. Odegard joined Rice University in 2002, and has over 15 years of experience supporting and enabling research in computing, big-data and information technology. As Executive Director of the Ken Kennedy Institute, Dr. Odegard co-leads the institute’s mission to engage the research... Read More →


Tuesday January 24, 2017 8:45am - 9:00am
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

9:00am

Managing Business Risk under Uncertainty: Development Planning in the Oil and Gas Industry
PRESENTATION NOT AVAILABLE

Speakers

Thomas Halsey

Chief Computational Scientist, ExxonMobil Upstream Research Company
Thomas C. Halsey is Chief Computational Scientist at ExxonMobil. Since joining ExxonMobil in 1994, he has worked in a variety of research, management, and staff positions in New Jersey and Texas. Previously, he was on the faculty of the University of Chicago. He received a Ph.D... Read More →


Tuesday January 24, 2017 9:00am - 9:25am
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

9:25am

Machine Learning Opportunities and Challenges in Upstream Oil and Gas Industry

PRESENTATION NOT AVAILABLE

The upstream oil and gas industry is ready for disruption by emerging technology. Big Data and Machine Learning are two forces driving the transformation of the industry. In this talk I will discuss the opportunities and challenges in transforming the upstream oil and gas industry, and Halliburton's leadership framework for taming the upstream industry's Big Data.

 


Speakers

Satyam Priyadarshy

Technical Fellow and Chief Data Scientist, Halliburton
Dr. Satyam Priyadarshy is a globally recognized leader in the fields of data science, big data, analytics, and emerging technologies. He is the Technical Fellow and Chief Data Scientist at Halliburton. He has published over 35 papers and articles including an expert opinion in magazines... Read More →


Tuesday January 24, 2017 9:25am - 9:50am
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

9:50am

Learning Integrated Networks from Big Biomedical Data

WATCH THE PRESENTATION

Projects such as the Cancer Genome Atlas have used multiple types of genomic technologies to profile hundreds of tumor samples.  Integrating these vast and diverse types of genetic data is critical for understanding the molecular basis of cancer and for developing personalized therapies.  In this talk, we present a network-based approach that integrates diverse multi-modal data by learning a network linking different types of genomic biomarkers.  


Speakers

Genevera Allen

Assistant Professor of Statistics and Electrical and Computer Engineering, Rice University
Genevera Allen is the Dobelman Family Junior Chair and an Assistant Professor of Statistics and Electrical and Computer Engineering at Rice University. She is also a member of the Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital and Baylor College... Read More →


Tuesday January 24, 2017 9:50am - 10:15am
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

10:15am

Break and Networking
Tuesday January 24, 2017 10:15am - 10:35am
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

10:40am

A Probabilistic Framework for Deep Learning: With Applications to Semi-supervised Learning for Visual Recognition

Speakers

Ankit Patel

Assistant Professor, Baylor College of Medicine, Department of Neuroscience; Professor of Electrical and Computer Engineering, Rice University
Ankit B. Patel is currently an Assistant Professor at the Baylor College of Medicine in the Dept. of Neuroscience, and at Rice University in the Dept. of Electrical and Computer Engineering. Ankit is broadly interested in the intersection between machine learning and computational... Read More →


Tuesday January 24, 2017 10:40am - 11:05am
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

11:05am

Deep Learning for Legal Technology

WATCH THE PRESENTATION

Machine learning has been widely applied to legal problems for over two decades, most recently through an iterative learning process called predictive coding. At DISCO we have developed the first commercially available predictive coding based on deep learning for text. In this talk, I will describe DISCO's approach to predictive coding, including the neural architecture and the methods that make it work fast and effectively in a production environment.


Speakers

Alan J. Lockett

Data Scientist, DISCO, Inc.
Dr. Alan J. Lockett leads machine learning efforts at DISCO, Inc., a growing startup in the legal technology space. His current research focuses on text classification and clustering using deep neural networks. Prior to DISCO, he studied in Switzerland as an NSF postdoctoral fellow... Read More →


Tuesday January 24, 2017 11:05am - 11:30am
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

11:30am

Hashing Algorithms for Large-Scale Machine Learning

Speakers

Anshumali Shrivastava

Assistant Professor of Computer Science, Electrical and Computer Engineering, and Statistics, Rice University
Anshumali Shrivastava is an Assistant Professor in the Department of Computer Science at Rice University with joint appointments in Statistics and ECE department. His broad research interests include large scale machine learning, randomized algorithms for big data systems and... Read More →


Tuesday January 24, 2017 11:30am - 11:55am
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

11:55am

Lunch
Your ticket to the ML@RICE Workshop includes a boxed lunch!

Tuesday January 24, 2017 11:55am - 1:15pm
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

1:15pm

Computer Vision Algorithms Inspired by Compressed Sensing

WATCH THE PRESENTATION

Estimating the three-dimensional structure of the world is a vital task in self-driving cars and robotics.  While it can be done using technologies such as LIDAR, these may be prohibitively expensive for many applications.  Computer vision offers the possibility of structure estimation from very cheap optical cameras.  In this talk, we will discuss a mathematical location recovery problem that arises in computer vision.  We will see that ideas from the field of compressed sensing have motivated recent new approaches to this problem.  


Speakers

Paul Hand

Assistant Professor of Computational and Applied Mathematics, Rice University
Paul Hand is an Assistant Professor of Computational and Applied Mathematics at Rice University. His research includes the design and analysis of algorithms for signal recovery problems arising from imaging and vision. He received his B.S. in Applied and Computational Mathematics... Read More →


Tuesday January 24, 2017 1:15pm - 1:40pm
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

1:40pm

Machine Learning for Critical Care Environments

Speakers

Craig Rusin

Chief Technology Officer, Baylor College of Medicine
Dr. Craig Rusin is the Chief Technology Officer at Medical Informatics Corporation.  Craig is an engineer and professor whose groundbreaking work in medical research led to the creation of the grid-computing platform that MIC uses to make sense of patient data in order to improve... Read More →


Tuesday January 24, 2017 1:40pm - 2:05pm
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

2:05pm

Break and Networking
Tuesday January 24, 2017 2:05pm - 2:30pm
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

2:30pm

A Wish List for Using Machine Learning to Reduce Patient Misdiagnosis
WATCH THE PRESENTATION

Our research shows that at least 1 in 20 US adults is misdiagnosed annually in the outpatient setting, and about half of these misdiagnoses are potentially harmful to patients. This is a hard problem to solve because of the complexity of patient care, variation in definitions of “normal”, and unique features of the data streams used in clinical care. Several disciplines will need to work together to make progress. This presentation will provide a short glimpse of both the technical and non-technical challenges, and of the opportunities for machine learning innovations to impact this area and enable better patient care and outcomes.

Speakers

Hardeep Singh

Chief, Michael E. DeBakey VA Medical Center; Director, VA Center of Inquiry to Improve Outpatient Safety through Effective Electronic Communication; Associate Professor, Department of Medicine, Baylor College of Medicine
Hardeep Singh, M.D., M.P.H. is a general internist and Chief of Health Policy, Quality & Informatics Program, Center for Innovations in Quality, Effectiveness and Safety based at the Michael E. DeBakey Veterans Affairs (VA) Medical Center and Baylor College of Medicine, Houston. He... Read More →


Tuesday January 24, 2017 2:30pm - 2:55pm
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

2:55pm

Machine Learning and Signal Processing

Speakers

Richard Baraniuk

Victor E Cameron Professor of Electrical and Computer Engineering, Rice University
Richard G. Baraniuk is the Victor E. Cameron Professor of Electrical and Computer Engineering at Rice University and the founder and director of OpenStax. In 1999, Dr. Baraniuk launched Connexions (now OpenStax CNX), one of the world’s first and today one of the world’s largest... Read More →


Tuesday January 24, 2017 2:55pm - 3:20pm
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

3:15pm

Panel Introduction
Speakers

Moshe Y. Vardi

George Distinguished Service Professor in Computational Engineering, Rice University
Moshe Y. Vardi is the George Distinguished Service Professor in Computational Engineering and Director of the Ken Kennedy Institute for Information Technology at Rice University. He is the recipient of three IBM Outstanding Innovation Awards, the ACM SIGACT Goedel Prize, the ACM... Read More →


Tuesday January 24, 2017 3:15pm - 3:20pm
TBA

3:20pm

Closing Keynote Speaker and Panel Moderator
Speakers

Alfred Spector

Chief Technology Officer and Head of Engineering, Two Sigma
Alfred Spector is Chief Technology Officer and Head of Engineering at Two Sigma, a firm dedicated to using information to optimize diverse economic challenges. Prior to joining Two Sigma, Dr. Spector spent nearly eight years as Vice President of Research and Special Initiatives, at... Read More →


Tuesday January 24, 2017 3:20pm - 4:30pm
BioScience Research Collaborative Building 6500 Main Street, Houston, TX 77030-1402

4:30pm

Deep Learning Approaches to Structured Signal Recovery
Poster Presentations

Tuesday January 24, 2017 4:30pm - 6:00pm
TBA

4:30pm

Networking Reception and Posters
Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Dirichlet Process Multiple Graphical Models
Recently, attention in the statistical literature has turned towards joint estimation of graphical models. These methods utilize the joint structure to inform estimation across groups and improve group-level inference. However, the inferential ability on the shared structures is generally lacking. We propose an extension of a recently developed approach which permits formal inference on the shared structures between graphs, and illustrate its properties through theoretical results and a simulation study.

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Identify Major Topics Among Negative Information towards Human Papillomavirus Vaccination on Twitter Using Support Vector Machines and Biterm Topic Model
HPV vaccine refusal is a serious public health issue. Negative information on Twitter has been found to influence potential consumers' vaccination behaviours. In this work, we present a hybrid machine learning approach to identify major topics among the negative Twitter information on HPV vaccines.
A Support Vector Machine (SVM) model was first applied to identify tweets containing negative information, and a Biterm Topic Model (BTM) was then used to explore major topics among the negative tweets. 319,612 English tweets containing HPV vaccine related keywords were collected during the study period (11/03/2015 - 11/02/2016). The SVM model identified 133,506 tweets that contain negative information. BTM on the negative tweets generated 15 sets of tweet tokens.
A subsequent manual review of those token sets and their associated tweets identified 9 major topics: “pediatricians warnings”, “general adverse reactions”, “ovaries harm”, “death cases”, “studies evidence”, “necessity issues”, “scandal and fraud”, “lawsuits” and “protest and complains”.
Finally, a word cloud was used to visualize those topics.
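
The topic-modelling stage above works on biterms, the unordered word pairs that co-occur within a single short text. The following is a minimal sketch of that preprocessing step only. The tokenizer, the stop-word list, and the sample tweets are illustrative assumptions, not the poster's data or code, and repeated words within a tweet are treated as one for simplicity.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "to", "and", "is", "on"}


def tokenize(tweet):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,!?#@\"'").lower() for w in tweet.split())
    return [w for w in words if w and w not in STOP_WORDS]


def biterms(tokens):
    """Return every unordered pair of distinct tokens in one short text."""
    unique = sorted(set(tokens))
    return list(combinations(unique, 2))


def count_biterms(tweets):
    """Count biterms over a corpus; a topic model would be fit to these pairs."""
    counts = Counter()
    for tweet in tweets:
        counts.update(biterms(tokenize(tweet)))
    return counts


# Made-up example tweets (not from the study corpus).
sample = ["Vaccine lawsuit filed today", "New lawsuit on the vaccine"]
counts = count_biterms(sample)
```

In this toy corpus the pair `("lawsuit", "vaccine")` appears in both tweets, which is the kind of repeated co-occurrence a biterm model uses to group words into a topic.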

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: SOLVD: Smartphone- and OnLine-usage-based eValuation for Depression
Depression is one of the most common mental disorders and carries a significant emotional and financial burden for modern society. To ensure successful prevention and treatment, early diagnosis and continuous monitoring of one’s depression state are critical. In recent years, the smartphone has shown potential as a wearable device for tracking and managing mental health conditions, yet very few studies have considered clinically depressed patients or included the ground truth of depression for comparison.

In this project, we developed the Smartphone- and OnLine-usage-based eValuation for Depression (SOLVD), a new tool for continuous tracking of patients’ depression state. We built the SOLVD App and cloud platform for data collection, analysis and sharing with physicians. We also conducted a 1-year clinical trial of 25 depression patients. Three types of data were collected via the App and bi-weekly clinical visits: 1. Smartphone sensor and usage data, including accelerometer, GPS, steps, screen status, call log, text messages, and apps; 2. Self-reported mood and activity level; 3. Psychometric data including PHQ-9, HamiltonD and HamiltonA.

The results showed that the adherence rate to the daily self-reported mood input was about 82%, and the attendance rate for clinic visits was 95%. The correlation between self-reported mood level and PHQ-9 score was 0.73 in the moderate/severe group and 0.36 in the normal/mild group. The passive phone sensor and usage data, including the number of steps, the quantity of text messages and the amount of time spent messaging, also correlated with clinical assessments. For example, when depression worsened, the number of calls and text messages dropped. We also trained an SVM classifier, which achieved around 80% accuracy in categorizing the patients into mild/moderate/severe groups. The preliminary findings indicate that the SOLVD app could be a reliable approach to tracking moderate-to-severe depression.

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Essential nonlinear properties in neural decoding
To decode task-relevant information from sensory observations, the brain must eliminate nuisance variables that affect those observations. For natural tasks, this generally requires nonlinear computation. Here we contribute new concepts and methods to characterize behaviorally relevant nonlinear computation downstream of recorded neurons. Linear decoding weights can be inferred from correlations between neurons and behavior. However, these weights do not adequately describe the neural code when, due to nuisance variation, mean neural responses are poorly tuned to the task while higher-order statistics of neural responses are well tuned. The task-relevant stimulus information can then be extracted only by nonlinear operations. For example, detecting an object boundary in an image requires contrast invariance: an edge appears when the foreground object is darker or lighter than the background, yet any linear function will exhibit opposite responses in these two cases. We generalize past weight-inference methods to determine the brain's nonlinear neural computations from joint higher-order statistics of neural activity and behavioral choices in perceptual tasks. This method is based on a new statistical measure we call nonlinear choice correlation, defined as the correlation coefficient between behavioral choices and nonlinear functions of measured neural responses. Importantly, the exact neural transformations may not be uniquely identifiable, since many neural nonlinearities can generate the same behavioral output. This is expected when sensory signals are expanded into a larger cortical response space, creating a redundant code. We exploit this redundancy to define a new concept of equivalence classes for neural transformations. We then demonstrate how to quantify essential properties of these equivalence classes, and provide simulations that show how these properties can be extracted using neural data from behaving animals.
Finally, we explain the functional importance of these nonlinearities in specific perceptual tasks.
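
The edge-detection example in this abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' method: a single simulated "neuron" responds with the sign of the contrast polarity when an edge is present, the behavioral choice is taken to equal edge presence, and the nonlinearity is a simple square. All names and parameter values are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n_trials=2000, noise_sd=0.2, seed=0):
    """Toy trials: response = edge * polarity + noise; choice = edge presence."""
    rng = random.Random(seed)
    responses, choices = [], []
    for _ in range(n_trials):
        edge = rng.choice([0, 1])        # task-relevant variable
        polarity = rng.choice([-1, 1])   # nuisance: darker vs lighter foreground
        responses.append(edge * polarity + rng.gauss(0.0, noise_sd))
        choices.append(edge)             # assumed ideal observer
    return responses, choices


responses, choices = simulate()

# Linear choice correlation: polarity flips the response sign, so it is near zero.
linear_cc = pearson(choices, responses)

# Nonlinear (here quadratic) choice correlation: squaring removes the nuisance.
quadratic_cc = pearson(choices, [r * r for r in responses])
```

Under these assumptions the mean response carries almost no information about the choice, while the squared response is strongly correlated with it, which is the situation the abstract describes as "higher-order statistics ... are well tuned".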

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: PPGSecure: Biometrics Presentation Attack Detection Using Photoplethysmograms
Authentication of users by exploiting face images or videos as a biometric is quite commonplace and becoming more widespread due to the advances made in face recognition technologies. While face recognition has made rapid advances in its performance, such face-based authentication systems remain vulnerable to biometric presentation attacks, such as the presentation of a video or photograph on a display device, the presentation of a printed photograph, or the presentation of a face mask resembling the user to be authenticated. In this paper, we present PPGSecure, a novel methodology that relies on camera-based physiology measurements to detect and thwart such biometric presentation attacks. PPGSecure uses the photoplethysmogram (PPG), which is an estimate of vital signs from the small color changes in the video observed due to minor pulsatile variations in the volume of blood flowing to the face. We demonstrate that the temporal frequency spectra of the estimated PPG signal for real live individuals are distinctly different from those of presentation attacks, and we exploit these differences to detect presentation attacks. We demonstrate that PPGSecure achieves significantly better performance than existing state-of-the-art methods on photograph, image and video based presentation attacks.
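
To make the idea of a "temporal frequency spectrum" of a PPG signal concrete, here is a minimal sketch that computes a power spectrum with a plain discrete Fourier transform and picks the dominant frequency in a plausible heart-rate band. The band limits, the frame rate, and the synthetic signal are assumptions for illustration. The abstract does not specify how PPGSecure extracts or classifies its spectra, so this is not that pipeline.

```python
import cmath
import math


def power_spectrum(signal, fps):
    """Return (frequency_hz, power) pairs from a direct DFT of the signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC component
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append((k * fps / n, abs(coeff) ** 2))
    return spectrum


def dominant_frequency(signal, fps, band=(0.7, 4.0)):
    """Frequency with the most power inside an assumed heart-rate band (Hz)."""
    in_band = [(power, freq) for freq, power in power_spectrum(signal, fps)
               if band[0] <= freq <= band[1]]
    return max(in_band)[1]


# Synthetic stand-in for a per-frame mean skin-color trace: a 1.2 Hz pulse
# (72 beats per minute) plus slow 0.2 Hz drift, 10 seconds at 30 frames/s.
FPS = 30.0
live_trace = [0.5 * math.sin(2 * math.pi * 1.2 * t / FPS)
              + 0.1 * math.sin(2 * math.pi * 0.2 * t / FPS)
              for t in range(300)]

pulse_hz = dominant_frequency(live_trace, FPS)
```

A trace from a live face would be expected to show a clear peak like this one, whereas a printed photograph has no pulsatile component. How the spectra are actually compared is left to the paper.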

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: A Probabilistic Framework for Deep Learning
We develop a probabilistic framework for deep learning based on the Deep Rendering Mixture Model (DRMM), a new generative probabilistic model that explicitly captures variations in data due to latent task nuisance variables. We demonstrate that max-sum inference in the DRMM yields an algorithm that exactly reproduces the operations in deep convolutional neural networks (DCNs), providing a first principles derivation. Our framework provides new insights into the successes and shortcomings of DCNs as well as a principled route to their improvement. DRMM training via the Expectation-Maximization (EM) algorithm is a powerful alternative to DCN back-propagation, and initial training results are promising. Classification based on the DRMM and other variants outperforms DCNs in supervised digit classification, training 2-3x faster while achieving similar accuracy. Moreover, the DRMM is applicable to semi-supervised and unsupervised learning tasks, achieving results that are state-of-the-art in several categories on the MNIST and SVHN benchmarks and comparable to state-of-the-art on the CIFAR10 benchmark.

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Dealbreaker: A Nonlinear Latent Variable Model for Educational Data
Statistical models of student responses on assessment questions, such as those in homework and exams, enable educators and computer-based personalized learning systems to gain insights into students’ knowledge using machine learning. Popular student-response models, including the Rasch model and item response theory models, represent the probability of a student answering a question correctly using an affine function of latent factors. While such models can accurately predict student responses, their ability to interpret the underlying knowledge structure (which is certainly nonlinear) is limited. In response, we develop a new, nonlinear latent variable model that we call the dealbreaker model, in which a student’s success probability is determined by their weakest concept mastery. We develop efficient parameter inference algorithms for this model using novel methods for nonconvex optimization. We show that the dealbreaker model achieves comparable or better prediction performance than affine models on real-world educational datasets. We further demonstrate that the parameters learned by the dealbreaker model are interpretable: they provide key insights into which concepts are critical (i.e., the “dealbreaker”) to answering a question correctly. We conclude by reporting preliminary results for a movie-rating dataset, which illustrate the broader applicability of the dealbreaker model.
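
The contrast between an affine response model and a "weakest concept" model can be sketched in a few lines. This is an illustration under assumptions rather than the poster's exact parameterization: the logistic link, the per-question concept thresholds, and all numeric values below are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic link, assumed here for both models."""
    return 1.0 / (1.0 + math.exp(-z))


def affine_success_prob(mastery, weights, difficulty):
    """Affine (Rasch/IRT-style) model: weighted sum of masteries minus difficulty."""
    z = sum(w * m for w, m in zip(weights, mastery)) - difficulty
    return sigmoid(z)


def concept_margins(mastery, thresholds):
    """How far each concept mastery sits above the question's requirement."""
    return [m - t for m, t in zip(mastery, thresholds)]


def dealbreaker_success_prob(mastery, thresholds):
    """Min-based model: only the smallest margin determines success."""
    return sigmoid(min(concept_margins(mastery, thresholds)))


def dealbreaker_concept(mastery, thresholds):
    """Index of the concept that limits this student on this question."""
    margins = concept_margins(mastery, thresholds)
    return margins.index(min(margins))


# A hypothetical student who is strong on two concepts and weak on the third.
mastery = [3.0, 3.0, -2.0]
thresholds = [0.0, 0.0, 0.0]

p_affine = affine_success_prob(mastery, weights=[1.0, 1.0, 1.0], difficulty=0.0)
p_dealbreaker = dealbreaker_success_prob(mastery, thresholds)
limiting = dealbreaker_concept(mastery, thresholds)
```

In the affine model the two strong concepts compensate for the weak one, so the predicted success probability stays high. In the min-based model the weak concept alone drives the prediction down, and `limiting` names it, which is the interpretability property the abstract emphasizes.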

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Deep Learning Approaches to Structured Signal Recovery
The promise of compressive sensing (CS) has been offset by two significant challenges. First, real-world data is not exactly sparse in a fixed basis. Second, current high-performance recovery algorithms are slow to converge, which limits CS to either non-real-time applications or scenarios where massive back-end computing is available. We attack both of these challenges head-on by developing new signal recovery frameworks using deep learning techniques.

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Genomic Region Detection via Spatial Convex Clustering
Several modern genomic technologies, such as DNA-Methylation arrays, measure spatially registered probes that number in the hundreds of thousands across multiple chromosomes. The measured probes are by themselves less interesting scientifically; instead scientists seek to discover biologically interpretable genomic regions comprised of contiguous groups of probes which may act as biomarkers of disease or serve as a dimension-reducing pre-processing step for downstream analyses. In this paper, we introduce an unsupervised feature learning technique which maps technological units (probes) to biological units (genomic regions) that are common across all subjects. We use ideas from fusion penalties and convex clustering to introduce a method for Spatial Convex Clustering, or SpaCC. Our method is specifically tailored to detecting multi-subject regions of methylation, but we also test our approach on the well-studied problem of detecting segments of copy number variation. We formulate our method as a convex optimization problem, develop a massively parallelizable algorithm to find its solution, and introduce automated approaches for handling missing values and determining tuning parameters. Through simulation studies based on real methylation and copy number variation data, we show that SpaCC exhibits significant performance gains relative to existing methods. Finally, we illustrate SpaCC's advantages as a pre-processing technique that reduces large-scale genomics data into a smaller number of genomic regions through several cancer epigenetics case studies on subtype discovery, network estimation, and epigenetic-wide association.
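
As a rough illustration of how a fusion penalty turns probes into regions, the sketch below evaluates a convex-clustering-style objective over spatially adjacent probes and reads regions off a fitted solution. The specific objective (squared-error fit plus a weighted Euclidean penalty on differences between neighbouring probes) is an assumption based on the general convex clustering literature, not a statement of SpaCC's exact formulation, and the data are made up.

```python
import math


def fused_objective(X, U, lam, weights=None):
    """0.5 * sum_j ||X[j] - U[j]||^2 + lam * sum_j w_j * ||U[j] - U[j+1]||.

    X[j] and U[j] hold the observed and fitted values of probe j across subjects.
    """
    if weights is None:
        weights = [1.0] * (len(U) - 1)
    fit = 0.5 * sum(math.dist(x, u) ** 2 for x, u in zip(X, U))
    fusion = sum(w * math.dist(U[j], U[j + 1]) for j, w in enumerate(weights))
    return fit + lam * fusion


def regions(U, tol=1e-8):
    """Group contiguous probes whose fitted vectors have been fused together."""
    groups = [[0]]
    for j in range(1, len(U)):
        if math.dist(U[j], U[j - 1]) <= tol:
            groups[-1].append(j)
        else:
            groups.append([j])
    return groups


# Four probes measured on two subjects (hypothetical values).
X = [[1.0, 2.0], [1.1, 1.9], [5.0, 6.0], [5.2, 5.8]]
# A candidate solution in which neighbouring probes share a fitted vector.
U = [[1.05, 1.95], [1.05, 1.95], [5.1, 5.9], [5.1, 5.9]]

found = regions(U)
value = fused_objective(X, U, lam=1.0)
```

Here `found` groups probes 0-1 and 2-3 into two regions. An actual solver would search over `U` to minimize the objective, with `lam` controlling how aggressively neighbours are fused.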

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Measuring User Cognitive Engagement in the Wild Via Camera
Accurate measuring of cognitive engagement helps users better manage their cognitive resources when performing tasks. In the past, physiological features such as pupillary response and body movements have been widely used by researchers to characterize user cognitive engagement levels. However, existing feature-based solutions are either overly dependent on intrusive devices or only robust to highly controlled lab settings. In this work, we present Engagementometer, a low-cost cognitive engagement prediction framework that is built upon user-contributed video data in the wild. Engagementometer leverages gaming videos recorded by off-the-shelf webcams as data input, and is capable of extracting user physiological features such as blink rate and head motion from those videos full of motion artifacts. Engagementometer then maps extracted features to user engagement levels and produces regression models for engagement prediction. To validate our approach, we first conduct EEG-based benchmarking experiments to demonstrate that the engagement prediction model developed by blink rate and head motion can be generalized across multiple users. After that, we carry out extensive trials using user contributed data in the wild to verify the overall performance of our prediction model.

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Methods and Applications for Mixed Graphical Models
"Mixed data" comprising a large number of heterogeneous variables (e.g. count, binary, continuous, skewed continuous, among others) is prevalent in varied areas such as imaging genetics, national security, social networking, Internet advertising, and our particular motivation: high-throughput integrative genomics. There have been limited efforts at statistically modeling such mixed data jointly. Recently, new Mixed Markov Random Field (MRF) distributions, or graphical models, were proposed that assume each node-conditional distribution arises from a different exponential family model. These yield joint densities, which can directly parameterize dependencies over mixed variables. Fitting these models to perform mixed graph selection entails estimating penalized generalized linear models with mixed covariates. This task, however, poses many challenges due to differences in the scale and potential signal interference between mixed covariates. In this poster, we introduce this novel class of MRFs, study model estimation challenges theoretically and empirically, and propose a new iterative block estimation strategy. Our methods are applied to infer a gene regulatory network that integrates methylation, small RNA expression, and gene expression data to fully understand regulatory relationships in ovarian cancer.

Speakers

Genevera Allen

Assistant Professor of Statistics and Electrical and Computer Engineering, Rice University
Genevera Allen is the Dobelman Family Junior Chair and an Assistant Professor of Statistics and Electrical and Computer Engineering at Rice University. She is also a member of the Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital and Baylor College... Read More →


Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: News Analytics for forecasting Price of Crude Oil
News plays a key role in financial markets. As news sources, frequency and volume continue to grow, it is becoming increasingly difficult for analysts and traders to analyze every news item that is published daily. As markets react rapidly to news, effective models that incorporate news data are highly sought after. This is not only useful for trading and fund management, but also for risk control. Major news events can have a significant impact on the market and investor sentiment, resulting in rapid changes to market price and value of traded commodities. A solution that can significantly reduce the time spent to gather timely trading insights from daily news would be of great value for traders, asset managers, hedge fund managers, market research analysts as well as retail investors.
As news is primarily unstructured textual data, it is hard to analyze with traditional computer models. However, with recent advances in NLP and machine learning technology, we can automate news gathering, filtering, and analysis to generate quantitative sentiment scores from textual narratives. The solution should help address queries such as the following:
1. Is today’s news likely to have an upward or a downward impact on the price of crude oil?
2. Which news is likely to have the most impact on the price of crude oil?
3. What has been the impact of similar news in the past?
The most commonly traded commodity worldwide is crude oil and its various derivatives. Hence we decided to build an application to analyze news for crude oil. We have used historical news and market price data from publicly available sources to develop and tune our algorithms, including machine learning models. The solution was developed using an open source stack with Python and related ecosystem tools. Interactive data visualizations were developed using Flask, AngularJS and Highcharts.

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Predicting Resilience to Alzheimer’s Disease
Alzheimer’s disease (AD) is a debilitating neurodegenerative disease that cannot be cured. Yet, one third of the population shows no cognitive decline during their lifetime despite AD-related brain pathology at autopsy. This resilience phenomenon is known as cognitive reserve (CR). While little is known about the underlying genetic cause of CR, such work could provide key insights into risk factors and potential treatments for AD. In this work, we draw on four different types of genetics data to identify which of them are useful in predicting both a patient’s cognition before death and the amount of CR. We also seek to determine the amount of additional predictive power the genetics data provides over standard clinical measures. Our work consists of building a data analysis pipeline and linear and non-linear predictive algorithms to these ends. The main contribution of our work is the identification of two sets of genetics data, RNAseq and MicroRNA, which improve the prediction of global cognition before death over the clinical measures by 10% and 15.8% respectively. Additionally, we show that MicroRNA can improve prediction of CR by 3.2% over clinical predictors.

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Sparse Factor Analysis for Learning and Content Analytics
We develop a new model and algorithms for machine learning-based learning analytics, which estimate a learner’s knowledge of the concepts underlying a domain, and content analytics, which estimate the relationships among a collection of questions and those concepts. Our model represents the probability that a learner provides the correct response to a question in terms of three factors: their understanding of a set of underlying concepts, the concepts involved in each question, and each question’s intrinsic difficulty. We estimate these factors given the graded responses to a collection of questions. The underlying estimation problem is ill-posed in general, especially when only a subset of the questions are answered.
The key observation that enables a well-posed solution is the fact that typical educational domains of interest involve only a small number of key concepts. Leveraging this observation, we develop a bi-convex maximum-likelihood solution to the resulting SPARse Factor Analysis (SPARFA) problem. We also propose SPARFA-Tag and SPARFA-Top, two extensions to SPARFA that incorporate instructor-defined tags on questions and question text to facilitate the interpretability of the estimated factors.
Additionally, we propose SPARFA-Trace, a new framework for time-varying learning and content analytics. We develop a novel message passing-based, blind, approximate Kalman filtering and smoothing algorithm for SPARFA that jointly traces student concept knowledge evolution over time, analyzes student concept knowledge state transitions (induced by studying learning resources, such as textbook sections, lecture videos, etc., or the forgetting effect), and estimates the content organization and difficulty of the questions in assessments. These quantities are estimated solely from binary-valued (correct/incorrect) graded student response data and the specific actions each student performs (e.g., answering a question or studying a learning resource) at each time instant.
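The three-factor response model described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the logistic link, the function name `p_correct`, the sign convention (a larger offset means an easier question), and all numeric values are assumptions made for the example.

```python
import math


def sigmoid(z):
    """Logistic link (an assumed choice; the abstract does not name the link)."""
    return 1.0 / (1.0 + math.exp(-z))


def p_correct(concept_knowledge, question_loadings, intrinsic_offset):
    """Probability that a learner answers a question correctly.

    concept_knowledge: the learner's knowledge of each underlying concept.
    question_loadings: how strongly the question involves each concept
                       (mostly zeros when only a few concepts matter).
    intrinsic_offset:  the question's intrinsic difficulty term; here a larger
                       value means an easier question (assumed convention).
    """
    affinity = sum(w * c for w, c in zip(question_loadings, concept_knowledge))
    return sigmoid(affinity + intrinsic_offset)


# Hypothetical learner and question over three concepts.
learner = [1.2, -0.4, 0.3]
question = [0.8, 0.0, 0.0]  # involves only the first concept
offset = -0.5               # a somewhat hard question

p = p_correct(learner, question, offset)
p_weaker = p_correct([0.2, -0.4, 0.3], question, offset)
print(round(p, 3), round(p_weaker, 3))
```

Because the question loads only on the first concept, changing the learner's knowledge of the other two concepts leaves the predicted probability unchanged, which is the kind of interpretability the sparsity assumption is meant to provide.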

Speakers

Richard Baraniuk

Victor E Cameron Professor of Electrical and Computer Engineering, Rice University
Richard G. Baraniuk is the Victor E. Cameron Professor of Electrical and Computer Engineering at Rice University and the founder and director of OpenStax. In 1999, Dr. Baraniuk launched Connexions (now OpenStax CNX), one of the world’s first and today one of the world’s largest...


Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Sub-linear Privacy-preserving Search on Sensitive Dataset
Privacy-preserving near-neighbor search (PP-NNS) is a well-studied problem in the literature. The overwhelming growth in the size of current datasets and the lack of any truly secure server in the online world render the existing solutions impractical, either because of their high computational requirements or because of unrealistic assumptions that potentially compromise privacy. PP-NNS with multiple (semi-honest) data owners and query time sub-linear in the number of users has been proposed as an open research direction. We provide the first such algorithm, which has a sub-linear query time and the ability to handle semi-honest (honest but curious) parties. Our algorithm can further manage the situation where a large chunk of the server information is compromised.

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Universal Microbial Diagnostics using Random DNA Probes
Early identification of pathogens is essential for limiting development of therapy-resistant pathogens and mitigating infectious disease outbreaks. Most bacterial detection schemes use target-specific probes to differentiate pathogen species, creating time and cost inefficiencies in identifying newly discovered organisms. We present a novel universal microbial diagnostics (UMD) platform to screen for microbial organisms in an infectious sample, using a small number of random DNA probes that are agnostic to the target DNA sequences. Our platform leverages the theory of sparse signal recovery (compressive sensing) to identify the composition of a microbial sample that potentially contains novel or mutant species. We validated the UMD platform in vitro using five random probes to recover 11 pathogenic bacteria. We further demonstrated in silico that UMD can be generalized to screen for common human pathogens in different taxonomy levels. UMD’s unorthodox sensing approach opens the door to more efficient and universal molecular diagnostics.
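The idea of recovering a sample's composition from a few non-specific probes can be illustrated with a toy sketch. It is not the UMD platform's algorithm: it assumes a noiseless sample containing exactly one species, a hypothetical random probe-binding matrix, and a single greedy matching step (the first step of matching pursuit) in place of a full sparse-recovery solver. The sizes 5 and 11 echo the in vitro experiment in the abstract; everything else is invented for illustration.

```python
import math
import random


def make_probe_matrix(num_probes, num_species, seed=0):
    """Hypothetical binding levels of each random probe to each species."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(num_species)] for _ in range(num_probes)]


def identify_single_species(probe_matrix, readings):
    """Return the index of the species whose binding signature best matches.

    Scores each species by the normalized correlation between its column of
    the probe matrix and the observed probe readings.
    """
    num_probes = len(probe_matrix)
    num_species = len(probe_matrix[0])
    best_index, best_score = None, -1.0
    for j in range(num_species):
        column = [probe_matrix[i][j] for i in range(num_probes)]
        norm = math.sqrt(sum(v * v for v in column))
        score = abs(sum(c * y for c, y in zip(column, readings))) / norm
        if score > best_score:
            best_index, best_score = j, score
    return best_index


A = make_probe_matrix(num_probes=5, num_species=11)

# Simulate a sample containing only species 7 at an arbitrary concentration.
true_species = 7
concentration = 2.5
y = [concentration * A[i][true_species] for i in range(5)]

recovered = identify_single_species(A, y)
print(recovered == true_species)
```

With fewer probes than candidate species the linear system is underdetermined, so recovery relies on the sample being sparse. Handling several species at once, or measurement noise, would need an iterative or convex sparse solver rather than this single matching step.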

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Urban Data Platform
A key activity of the Kinder Institute for Urban Research over the next three years is the development of an Urban Data Platform (UDP) for research related to the Greater Houston Area. The UDP will host research-ready data and the statistical tools to analyze the data. Wherever possible, data will be geocoded. With location and time serving as primary indicators, the UDP will facilitate integration across different data sets, allowing for stronger, higher-quality research investigations about Houston and the surrounding area. The UDP will offer a series of short courses on key statistical methodologies for urban analytics. Initial courses will include one covering spatial statistics and spatial disease mapping and a second on statistical machine learning strategies for analyzing data on the urban environment.

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Using Domain Knowledge to Construct Causal Models from Clinical Observational Data
Causal discovery methods seek to ascertain causal attribution from observational data. Although their use has been established in cancer and epidemiological research, surprisingly little work has been done with such methods in the area of the detection of causal drug/adverse drug event (ADE) relationships in clinical observational data derived from Electronic Health Records (EHR). Since these data were originally created for other purposes, they are inaccurate and incomplete. We reason that by integrating constraints from domain knowledge, causal methods may compensate for issues that limit the accuracy of purely statistical approaches. To evaluate this hypothesis, we used a publicly available reference data set with 4 ADEs and 399 drug-ADE pairs. Mining the literature, we identified covariates that fell within the orbit of the respective drug-ADE pairs using discovery patterns (relationship constraints based on normalized predicates). We calculated baseline scores using standard disproportionality metrics, with and without the identified covariates. Where drug pharmacological class strongly indicated a causal association with the ADE (e.g., Ibuprofen ∈ NSAIDs -> gastrointestinal bleeding), directed edges were included as prior knowledge. We then constructed causal graphs from the clinical data. We attained significant predictive improvements of ~0.05-0.3 AUC over traditional statistical methods, with ~0.7-0.9 overall AUC.
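For readers unfamiliar with disproportionality metrics, the sketch below computes two common ones from a 2x2 contingency table: the reporting odds ratio (ROR) and the proportional reporting ratio (PRR). The abstract does not say which metrics were used, so the choice of these two is an assumption, and the counts are hypothetical.

```python
def reporting_odds_ratio(a, b, c, d):
    """ROR = (a / b) / (c / d) for a 2x2 drug/event table.

    a: records with the drug and the adverse event
    b: records with the drug, without the event
    c: records without the drug, with the event
    d: records with neither
    """
    return (a * d) / (b * c)


def proportional_reporting_ratio(a, b, c, d):
    """PRR = event rate among the exposed / event rate among the unexposed."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts, for illustration only.
a, b, c, d = 20, 80, 10, 190
ror = reporting_odds_ratio(a, b, c, d)
prr = proportional_reporting_ratio(a, b, c, d)
print(ror, prr)
```

Values well above 1 indicate that the event appears disproportionately often alongside the drug. Such scores measure association only, which is the gap the causal graph approach in this abstract aims to address.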


Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Using Vector Representations of Relationships from Biomedical Literature to Identify Drug/Side-Effect Relationships
We describe the use of a distributed vector framework with supervised machine learning models in the context of pharmacovigilance. We represent the predicate-based pathways that might explain the relationship between a drug and a side-effect as an abstract vector, and label each vector pathway as indicating either a positive or a negative relationship between the drug and the side-effect. With this approach we find encouraging results on a manually curated reference set, with some limitations. While AUC and F1 cross-validation performance are excellent for logistic regression (LogReg) and support vector machine (SVM) algorithms, learning curves and support vectors indicate some degree of model over-fitting, likely a result of the relatively small size of the training set and the high dimensionality of the feature space. Even so, reasonable performance was obtained with simple models, such as k-nearest neighbors, supporting the main finding that these vector representations provide a meaningful basis for classification in this context. This study justifies further research using larger reference sets for training and/or testing to mitigate the limitations of our model. To this end, we propose further validation on SIDER, a less robust but larger reference set. Additionally, the methods employed are generalizable, and extending the work to additional problem domains may yield additional insights and success.

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Poster: Within Group Variable Selection through the Exclusive Lasso
Many data sets consist of variables with an inherent group structure. The problem of group selection has been well studied, but in this paper, we seek to do the opposite: our goal is to select at least one variable from each group in the context of predictive regression modeling. This problem is NP-hard, but we propose the tightest convex relaxation: a composite penalty that is a combination of the ℓ1 and ℓ2 norms. Our so-called Exclusive Lasso method performs structured variable selection by ensuring that at least one variable is selected from each group. We study our method’s statistical properties and develop computationally scalable algorithms for fitting the Exclusive Lasso. We study the effectiveness of our method via simulations as well as using NMR spectroscopy data. Here, we use the Exclusive Lasso to select the appropriate chemical shift from a dictionary of possible chemical shifts for each molecule in the biological sample.
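A composite penalty of this kind can be sketched as below. The form used here, half the sum over groups of the squared within-group ℓ1 norm, is how the exclusive lasso penalty is commonly written; the abstract does not spell out the formula, so treat it as an assumption. The function name, the grouping, and the coefficient vectors are hypothetical.

```python
def exclusive_lasso_penalty(coefficients, groups):
    """Half the sum over groups of the squared l1 norm within each group.

    coefficients: list of regression coefficients.
    groups: list of index lists, one per group, partitioning the coefficients.
    """
    total = 0.0
    for group in groups:
        l1_norm = sum(abs(coefficients[i]) for i in group)
        total += l1_norm ** 2
    return 0.5 * total


groups = [[0, 1], [2, 3]]

# Both vectors have the same overall l1 norm (2.0), distributed differently.
concentrated = [1.0, 1.0, 0.0, 0.0]  # both nonzeros fall in the first group
spread = [1.0, 0.0, 1.0, 0.0]        # one nonzero in each group

penalty_concentrated = exclusive_lasso_penalty(concentrated, groups)
penalty_spread = exclusive_lasso_penalty(spread, groups)
print(penalty_concentrated, penalty_spread)
```

The penalty is smaller when the nonzero coefficients are spread across groups than when they sit in the same group, so variables compete within a group while every group can stay represented.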

Speakers

Genevera Allen

Assistant Professor of Statistics and Electrical and Computer Engineering, Rice University
Genevera Allen is the Dobelman Family Junior Chair and an Assistant Professor of Statistics and Electrical and Computer Engineering at Rice University. She is also a member of the Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital and Baylor College...


Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402

4:30pm

Reception and Poster Session
Your ticket to the ML@RICE Workshop includes admission to the Reception and Poster Session. Appetizers and drinks will be served.

Tuesday January 24, 2017 4:30pm - 6:00pm
BioScience Research Collaborative Event Hall 6500 Main Street, Houston, TX 77030-1402