Answering Questions of Health and Medicine Using Internet Data
Answering questions of health and medicine frequently necessitates the collection of data from large cohorts during real-world interactions. This is costly and, in many cases, extremely challenging due to the nature of these interactions and the difficulty of getting people to report on them. Work in recent years has shown that data generated by people as they browse the Web, including queries submitted to Internet search engines, social media postings, and even mere browsing histories, can be used to learn about people’s activities in both the virtual and the physical worlds. Therefore, these data could potentially serve as a cheap alternative to real-world data.
In this talk I will show that specific types of Internet data are less influenced by reporting biases, and are thus a low-cost alternative for extracting medical information from very large populations. I will discuss areas where Internet data are especially advantageous for addressing questions of health and medicine, and how these data can be coupled with other information in a privacy-preserving manner to broaden the range of questions we can answer. I will illustrate with several recent examples, such as post-market drug surveillance, the discovery of a link between media portrayal of underweight models and the development of anorexia, and the prediction of mood disorder events.