Suggested Citation: "4 Capacity: Digital Data and Information Systems." National Academies of Sciences, Engineering, and Medicine. 2023. Effective Health Communication Within the Current Information Environment and the Role of the Federal Government: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/27210.

4

Capacity: Digital Data and Information Systems

Data and information systems provide vital supports for effective federal health communication, said Maimuna Majumder, Assistant Professor in the Computational Health Informatics Program at Boston Children’s Hospital and Harvard Medical School and Planning Committee Member. Workshop panel presentations and discussions explored these systems with a special focus on interpretable data systems that are anticipatory of and adaptable to the public’s concerns, both during health emergencies and within the broader context of day-to-day well-being and longevity. Panelists and participants discussed the data infrastructure needed to better understand the health communication ecosystem. Presentations and discussion also addressed challenges and ethical considerations related to collecting and using these data.

DATA INFRASTRUCTURE FOR UNDERSTANDING THE HEALTH COMMUNICATION ECOSYSTEM

The way people get information regarding health and medicine is consequential to their health, said David Lazer, Professor of Political Science and Computer and Information Science at Northeastern University. The good news, he said, is that the internet was built to instrument human behavior, making research possible that was impossible in the 20th-century information ecosystem. However, most data are inaccessible to independent researchers. Data describing how and where people get health information on the internet are an “incredible scientific opportunity,” but accessing these data is increasingly difficult. The fundamental question, Lazer said, is which data are necessary to understand the health information ecosystem of the 21st century.

Lazer described types of data that would be useful for understanding the nature and impacts of the health communication ecosystem, and he assigned a letter grade representing their current accessibility (Box 4-1).

Lazer acknowledged that collecting these types of data carries important ethical challenges. Observing people’s online behavior is comparable to accessing genetic and health data, he said. Inferences about individuals could potentially be made from such data and information about people the individuals are connected to could also “spill over.” However, these issues are not new and need not preclude this type of research; lessons can be drawn from other studies that handle sensitive electronic records, said Lazer.

Drawing on his own work developing the National Internet Observatory, with the objective of recruiting a set of volunteers who allow observation of their online experiences, Lazer explained some of the safeguards in place to protect volunteers’ identities and the privacy of the data that they share. Participants volunteer to send data from their mobile devices and desktops to a data warehouse. The data are first processed on participants’ devices to eliminate as many identifiers as possible. The data are ordered into distributed analysis clusters, which enables researchers to send queries and get results. Access is tiered by data sensitivity, and technical barriers to data extraction are in place. Researchers are required to undergo training about the use of the data and the ethical issues involved, and they must sign a data-use agreement and consent to monitoring. Layered upon these protections, the program has a robust consent process for participants that borrows from the learnings of All of Us.1
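The safeguards described above (on-device removal of identifiers, followed by access tiered by data sensitivity) can be illustrated with a minimal sketch. This is not the National Internet Observatory's implementation; the field names, the regular expressions, the salted-hash pseudonym, and the three tiers are all assumptions chosen only to make the idea concrete.

```python
import hashlib
import re

# Hypothetical patterns for direct identifiers; a real system would need far more.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def scrub_on_device(record: dict, salt: str) -> dict:
    """Strip direct identifiers before a record leaves the participant's device."""
    text = EMAIL_RE.sub("[EMAIL]", record["text"])
    text = PHONE_RE.sub("[PHONE]", text)
    # Replace the user id with a salted hash so records can be linked
    # to each other without exposing the original id.
    digest = hashlib.sha256((salt + record["user_id"]).encode()).hexdigest()
    return {
        "participant": digest[:12],
        "url_domain": record["url_domain"],
        "text": text,
    }


# Hypothetical sensitivity tiers: higher tiers may read more sensitive fields.
TIER_FIELDS = {
    1: {"url_domain"},
    2: {"url_domain", "participant"},
    3: {"url_domain", "participant", "text"},
}


def query(records: list, researcher_tier: int, fields: set) -> list:
    """Return only the requested fields, refusing any above the researcher's tier."""
    denied = fields - TIER_FIELDS[researcher_tier]
    if denied:
        raise PermissionError(f"tier {researcher_tier} cannot read {sorted(denied)}")
    return [{f: r[f] for f in fields} for r in records]


if __name__ == "__main__":
    raw = {
        "user_id": "u-17",
        "url_domain": "example.org",
        "text": "mail me at jane.doe@example.org or 617-555-0100",
    }
    clean = scrub_on_device(raw, salt="demo-salt")
    print(query([clean], researcher_tier=1, fields={"url_domain"}))
```

In this sketch researchers never receive raw records; they submit a query and get back only the fields their tier permits, which mirrors the query-and-results model Lazer described.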

This research infrastructure, said Lazer, is one example of a potential solution to the need to better understand how and where people are getting health information. Creating a shared infrastructure addresses some of the research challenges including the large, fixed costs of data collection. This model is highly usable for health-related information consumption and may also be useful for other types of information.

MEDIA CLOUD

Trying to examine the health communication environment is like putting together pieces of a puzzle, said Rahul Bhargava, Assistant Professor of Journalism and Art + Design at Northeastern University and Co-Principal Investigator of Media Cloud. There are private social platforms (e.g., Messenger), public social platforms (e.g., Facebook, Twitter), and broadcast platforms (e.g., newspapers, television news). Information flows among these platforms, and data about what people are sharing and saying in each sphere are difficult to access. Although there are isolated examples of studies and programs to bring data together, said Bhargava, a public data infrastructure is needed to understand how message amplification happens across these three types of channels in the online environment. People in every field, from computational social science to politics to social media analysis, need to work on their “piece of the puzzle” and communicate with each other to avoid duplication of efforts. Researchers and others who need data would greatly benefit from the ability to access data without having to negotiate this access on their own.

___________________

1 https://allofus.nih.gov/

Bhargava shared details about the work he and his colleagues have done to develop Media Cloud. Media Cloud is a set of technologies that can be defined in four ways, he said. First, it is a comprehensive database of global online news; the database includes nearly two billion stories from the last 10 years, from sources across the globe. The global aspect is important, he said, because other databases (e.g., Google News, LexisNexis) may not include the breadth of international sources. Second, Media Cloud is a set of online analysis tools and methods. Technologies embed principles, goals, methods, and philosophies, said Bhargava, and are designed for a specific type of end user. Media Cloud is designed to be “less intimidating” than other tools, to facilitate use by users who are not experts in media analysis. Media Cloud’s search tool allows users to investigate media attention, look at coverage in multiple languages, and see top words in coverage and the narratives that people may be reading. Third, Media Cloud is an interdisciplinary team of technologists and researchers; and fourth, it is a cross-sector research service. Bhargava invited workshop participants to try Media Cloud2 themselves. Bhargava said that while Media Cloud does not “give you answers,” it helps “find the space to be able to say what is happening” in one piece of the puzzle.

Moving forward, Bhargava noted the importance of finding funding and support for this infrastructure; in addition, connecting it to other “pieces of the puzzle” will allow users to better understand how, where, and when health messages are being communicated and amplified.

___________________

2 search.mediacloud.org

ETHICAL CONSIDERATIONS

The health data ecosystem has expanded rapidly in recent years, said Agata Ferretti, Researcher at the Health Ethics & Policy Lab at ETH Zurich. Medical big data and other data sources, combined with powerful analytics tools and new stakeholders, make new types of research possible. Drawing on her work related to the ethics surrounding health information data, Ferretti provided panelists with insights, in a European context, that could be applied to the use of data and information systems to understand people’s experiences in the health information ecosystem. Even without medical records, noted Ferretti, it may be possible to infer a person’s health information based on location data, credit card purchase data, and social media data. Nontraditional stakeholders, such as big technology companies, provide the infrastructure and hold much of the data. Real-time big data allow scientifically sound and reliable technologies to be built, said Ferretti, but at the same time introduce challenging ethical questions.

Ethical concerns about privacy of health data and other personal information may prevent the adoption of some technologies. For example, the developers of a Swiss contact-tracing tool for COVID-19 expected it to be widely adopted, given its high level of privacy protection; however, fewer than 25 percent of the population adopted the tool.

Ferretti stressed that privacy is not the only ethical issue related to digital health data, although it receives the bulk of the attention. She shared four major unaddressed ethical issues associated with technologies that use health data, like contact tracing applications (Box 4-2).

Based on these issues, Ferretti laid out a path toward ethical digital health. First, there is a need to develop technically robust, privacy-preserving tools; however, she emphasized that addressing privacy is insufficient to ameliorate ethical concerns. Second, representativeness of datasets and scientific efficacy need to be ensured by tackling issues of accessibility at the social and cultural levels, and by investing in digital infrastructures, digital interoperability, digital literacy, and digital health training. Third, to increase the adoption of technologies and fight misinformation, stakeholders need to provide clear, transparent, and reliable communication about data and their uses, and inform the public about any involvement of private partners in the development and deployment of technologies. Fourth, public trust and social license are critical aspects of ethical digital health; technology developers need to engage with the public and integrate people’s perspectives into technology development and data governance. Finally, to ensure fair benefit distribution among stakeholders, ethical oversight and accountability mechanisms need to be strengthened, including monitoring for conflicts of interest arising from public-private collaborations. Much effort has gone into highlighting the importance of using data, said Ferretti, but the uses of data and the accountability mechanisms in place are less clear to the public.

DISCUSSION

During the discussion period, panelists and participants discussed (a) underutilized data types; (b) ensuring credibility, transparency, and effectiveness; (c) priorities for investing in infrastructure and data development; and (d) equity and representation in systems and platforms.

Underutilized Data Types

Majumder began the discussion by asking panelists to identify at least one data type that they believe is underdiscussed or underutilized in health communication, and the progress that could be achieved with those data. Bhargava responded that radio is an understudied area. Radio is used in many public health messaging campaigns, particularly campaigns directed at specific populations or geographic areas. Bhargava said that, as a consumer, he listens to radio stations that he feels kinship with and that match his identity. As researchers “we have only scratched the surface” of understanding the narratives that flow across radio, he said. Ferretti said that one underappreciated area is the importance of new social media platforms. Young people who use social media change platforms very quickly; for example, Facebook was popular ten years ago but is now barely used by young people. Data from new platforms are very valuable, but the platforms may not be familiar to researchers. Majumder added that studies have shown that young people are using TikTok as their primary form of news, including health information. Lazer said that private messaging apps (e.g., WhatsApp) would be an incredibly rich source of data about the spread of information, but that users of private messaging apps have a “deep expectation of privacy” that makes ethical data access challenging. Majumder noted that chain mail messages on private messaging apps are a common source of misinformation, and there may be a way to distinguish such messages from personal messages for data collection.

Ensuring Credibility, Transparency, and Effectiveness

Majumder asked panelists to speak to ways to ensure credibility, transparency, and effectiveness when communicating about health on any platform. Ferretti noted that research on young people indicates that they have unique priorities for information sources; they put less emphasis on privacy and reliability of information, and more emphasis on personalized information and engaging tools. Research is needed to understand the perspectives and priorities of end users so that the most relevant tools and information can be provided, she said. Lazer agreed with this analysis and said that whether a message is “credible” depends on what credibility means to the person receiving the message. For example, does the person put more trust in an expert, their own doctor, a media source, or their peers? A person’s thoughts about credibility are also likely to influence their behavior; Lazer said that self-report data indicate that those who rely on Facebook for COVID-19 information are less likely to be vaccinated. Each of these research questions is important to address and it is currently difficult to understand the connections between cognition and behavior, he noted. One major challenge for credibility and transparency, said Bhargava, is that most online communities are controlled by profit-driven companies. Each community has its own norms and rules, and individuals largely lack control over these communities. This is problematic, said Bhargava, and collaborative design of community platforms may be one way to address the issue.

Priorities for Investing in Infrastructure and Data Development

Given the need for developing new infrastructure and data-collection methods, panelists identified several priorities for investments. Lazer noted two major areas he believes are important to prioritize to support research on the health communication ecosystem. First, policy change is needed to encourage platforms’ transparency and data sharing while protecting privacy. Second, it is necessary to develop shared infrastructures that mitigate the massive, fixed costs of data collection; make data available for analytic access; and address privacy and other ethical issues. Lazer emphasized the need for legal and ethical data access from platforms, as this work is essential for “societies to work in the 21st century.” The General Data Protection Regulation (GDPR) in the European Union, said Ferretti, has brought beneficial changes and promoted collaboration. Although companies and researchers are required to justify their data usage, the GDPR has provided standards to formalize collaboration and data-sharing processes. Moving forward, she said, infrastructure investments are needed to close the digital divide, not only between countries but also between populations within countries. It is critical to close this gap and then to effectively engage with the public about new technologies and their potential uses. Bhargava encouraged investment in social media alternatives. Currently, most social media platforms reflect libertarian values and a “more speech is better speech” attitude. Platforms can work in other ways, and those alternatives need to be supported and developed. In addition, he said, public funding is needed to support data infrastructure. Much of the work in this area has been privately funded and, Bhargava noted, it is time for government funders to “get on the train.”

One participant noted that during the COVID-19 pandemic, the work on vaccine communication was “incredibly intensive,” particularly in terms of trying to communicate in real time. Bhargava and Lazer noted that both personnel and data infrastructure are critical for collecting and analyzing data. Lazer emphasized the importance of creating actionable data on a timeline, and he said that public infrastructure is needed to collect and disseminate information to inform public discourse. He suggested that people working in academia could do more to support the provision of actionable information in a timely way but reiterated that, ultimately, investments in public data infrastructure are needed to enable the rapid turnaround of public results to inform policy discourse.

Equity and Representation in Systems and Platforms

Social media platforms do not always reflect all communities and can thus overemphasize certain narratives, observed one participant. Given this inequality, she asked panelists how they are “baking in equity” into their systems and platforms. Part of the long-term solution for the National Internet Observatory, said Lazer, is finding ways to make the research infrastructure broadly available to a wide set of researchers. The situation is not “if you build it, they will come,” he said; bridges to the infrastructure need to be built and training needs to be available and accessible. In terms of equity in the data, Lazer said that the National Internet Observatory is currently building the capacity to conduct focused data collection for specific communities. Bhargava shared two generic approaches for improving equity. In terms of data collection, partnering with people already working in a target population ensures inclusion of diverse data; for example, working with a partner that has a large database of Black-owned and -operated media sources. Bhargava noted that some populations are harder to target for data collection; for example, media from Indigenous populations in the United States are often published and distributed in PDF form, which makes extraction more difficult. A second approach involves privileging important, equity-focused projects. For example, he said, Media Cloud is working with a group of activists in the Americas to automate the pipeline of data on gender-based killings.

A workshop participant asked about the representativeness of data gleaned from the internet, particularly whether it makes sense to rely on these data or whether other sources of community-level data are needed. Bhargava acknowledged this issue, noting that certain approaches can provide data on smaller geographic areas or communities. Media Cloud can geolocate articles and run a query asking for articles about vaccine hesitancy and South Florida, for example. Another way to capture community-level data is to look at media sources popular in the community, for example, the Hindi Star Times for the Indian population in Boston, or the various Brazilian news sources for Boston’s large Brazilian population. Tapping into these sources, said Bhargava, requires both a database of global news sources and insight into the community and what they read. He noted that his own knowledge of community news sources can come from personal interactions, such as a WhatsApp group or a parent community meeting. Surveys that ask about media consumption and internet usage are another way to learn about community news sources; for example, Bhargava learned about an active subreddit for his own community. While each approach to community-level data provides only one piece of the puzzle, wise use of multiple approaches can offer valuable insight, he said. Lazer shared that it would be challenging for the National Internet Observatory to get representative community-level data because of the sheer number of participants needed to represent individual communities.
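The kind of geolocated query Bhargava mentioned (articles about vaccine hesitancy that are also tied to South Florida) can be sketched as a simple filter. This is an illustration only and does not use or represent Media Cloud's actual tools or API; the `Article` record, its `places` tags (assumed to come from an upstream geolocation step), and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    title: str
    source: str
    places: frozenset  # hypothetical place tags from an upstream geolocation step


def search(articles: list, phrase: str, place: str) -> list:
    """Return articles whose title contains the phrase and that are tagged with the place."""
    phrase = phrase.lower()
    return [a for a in articles if phrase in a.title.lower() and place in a.places]


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    corpus = [
        Article("Clinics respond to vaccine hesitancy", "Local Daily",
                frozenset({"South Florida"})),
        Article("Vaccine hesitancy rises in rural counties", "State Wire",
                frozenset({"Montana"})),
        Article("Hurricane season outlook", "Local Daily",
                frozenset({"South Florida"})),
    ]
    for article in search(corpus, "vaccine hesitancy", "South Florida"):
        print(article.source, "-", article.title)
```

Requiring both the topic phrase and the place tag is what narrows a global news collection to a single community, which is the point of the example in the discussion.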

INSIGHTS FROM DATA AND INFORMATION SYSTEMS BREAKOUT SESSIONS

On day two of the workshop, two small groups participated in separate facilitated discussions to generate ideas for capacity building related to data and information systems to support health communication. Each group considered the same questions:

  1. What resources do we have inside or outside of government to address challenges to building needed data and information systems?
  2. What are examples of successful efforts that we could learn from?
  3. What resources are most needed to make progress on this priority/challenge?
  4. Who else should be involved in addressing this challenge/priority?

Appendix D provides a summary of the ideas generated through these discussions, as reported by session facilitators David Scales and William Hallman.
