
4

Considerations for Program Evaluation

The next session of the workshop focused on program evaluation. The session opened with two presentations that established a conceptual basis for broader dialogue in the subsequent moderated panel discussion. Following the panel discussion, the final portion of the session was dedicated to audience Q&A.

This chapter highlights practical strategies and real-world lessons for evaluating suicide prevention programs at the community level. Speakers emphasized the importance of aligning evaluation design with program goals, engaging communities in meaningful ways, and using evaluation data to drive learning, accountability, and improvement.

LESSONS LEARNED AND EXAMPLES FROM THE FIELD

Two speakers shared insights from their long-standing roles in evaluating suicide prevention programs. The first presentation provided an overview of key concepts and decision points in program evaluation design, while the second drew on nearly two decades of experience with a national youth suicide prevention initiative to illustrate lessons learned in practice.

From Program Evaluation to Comprehensive, Community-Based Suicide Prevention Evaluation: Lessons Learned from the Field

Kristen Quinlan (Education Development Center, National Action Alliance for Suicide Prevention) opened her presentation by highlighting her dual roles: providing training and technical assistance to states and
communities on evaluation and serving as a representative of the National Action Alliance for Suicide Prevention. In both capacities, she emphasized the importance of integrating evaluation into broader, coordinated, and public health–oriented approaches to suicide prevention.

Quinlan began by acknowledging the major frameworks currently guiding community-based suicide prevention efforts, including the National Strategy for Preventing Veteran Suicide 2018–2028, the 2024 National Strategy for Suicide Prevention, and the accompanying Federal Action Plan. These frameworks share a commitment to upstream and downstream prevention and to comprehensive, coordinated approaches. She emphasized that these same principles should inform how program evaluation is approached.

Quinlan framed logic models as familiar and useful tools for evaluation, referencing Standley’s earlier presentation. A typical logic model progresses from inputs and activities to outputs and then short-, intermediate-, and long-term outcomes. In suicide prevention, long-term outcomes often refer to sustained behavior change, infrastructure, and policy development. However, Quinlan noted that many programs are asked to show impact, especially reductions in suicide rates, without regard for the scale or scope of their contribution to the overall prevention effort.

She cautioned against treating evaluation in isolation, particularly for community-based efforts, which do not occur in a vacuum. Rather, these programs often work in synergy with broader community and state-level inputs and assets. “When we evaluate what we are doing,” she said, “we can’t tell the story in a vacuum either.” She stressed the importance of articulating a program’s unique contribution while avoiding the temptation to overstate its impact.

For example, a program focused on raising awareness about 988 may play a vital role in a community’s overall suicide prevention strategy, but it is not responsible for outcomes related to treatment or crisis intervention. Rather than claim full credit for broader impacts, programs should clearly communicate what they were responsible for. “We want to be real about what we can do,” she explained, “and what we’re on the hook for.”

Quinlan also addressed the limitations of impact measurement at the community level. Suicide is a relatively rare event, and in smaller populations, even minor changes in the number of deaths can lead to dramatic fluctuations in rates. These “small Ns,” she noted, can make it difficult to demonstrate stable trends over time and may lead to premature dismissal of effective programs based on incomplete or misleading data. Instead, she argued, the field must move beyond a narrow emphasis on impact and work toward telling a more complete story of progress.
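
To make the "small Ns" issue concrete, the short sketch below illustrates the arithmetic Quinlan described using hypothetical counts (these figures were not presented at the workshop, and Python is used here only for illustration): a single additional death barely moves the rate in a large county but shifts it by a third in a small one.

```python
# Hypothetical illustration of the "small N" problem: how one additional death
# affects an annual suicide rate (per 100,000) in small vs. large populations.

def rate_per_100k(deaths: int, population: int) -> float:
    """Annual suicide rate expressed per 100,000 residents."""
    return deaths / population * 100_000

# (population, baseline deaths) -- hypothetical values chosen so both start at 15.00 per 100,000
scenarios = {
    "small county (pop. 20,000)": (20_000, 3),
    "large county (pop. 2,000,000)": (2_000_000, 300),
}

for name, (population, baseline_deaths) in scenarios.items():
    before = rate_per_100k(baseline_deaths, population)
    after = rate_per_100k(baseline_deaths + 1, population)  # one additional death
    pct_change = (after - before) / before * 100
    print(f"{name}: {before:.2f} -> {after:.2f} per 100,000 ({pct_change:+.1f}%)")

# Approximate output:
#   small county (pop. 20,000): 15.00 -> 20.00 per 100,000 (+33.3%)
#   large county (pop. 2,000,000): 15.00 -> 15.05 per 100,000 (+0.3%)
```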

To support this shift, Quinlan introduced a “nested evaluation model” (see Figure 4-1), in which individual program evaluations are situated within a broader framework of shared state and/or community inputs and goals. She explained that using common logic models and shared metrics allows diverse programs to contribute to a collective evaluation strategy that is better able to assess short-, intermediate-, and long-term outcomes.

FIGURE 4-1 Nested evaluation model for suicide prevention.
SOURCE: Presentation by Kristen Quinlan on April 29, 2025.

Quinlan offered the Colorado National Collaborative as a real-world example of this approach. The initiative was sparked by a meeting of national, state, and local suicide prevention leaders who asked what might happen if high-risk counties implemented upstream and downstream prevention activities simultaneously, rather than through uncoordinated one-off efforts.

Colorado was selected for its strong infrastructure and existing Office of Suicide Prevention. Six high-risk counties with diverse geographic, demographic, and socioeconomic profiles were chosen. Each county agreed to focus on six core suicide prevention priorities but was given the freedom to pursue these goals in ways that made sense for its community. While this variation presented challenges for evaluation, it reflected the need for local flexibility. As long as counties could document their outputs and address shared metrics related to the six pillars—such as connectedness and economic support—the effort could be evaluated through a unified, albeit adaptable, framework. Quinlan added that the Colorado initiative was in its fifth year of funding, and while full evaluation results were still pending, the project exemplified how aligned implementation and evaluation strategies can support community-based prevention at scale.

Looking ahead, she previewed an initiative led by the National Action Alliance for Suicide Prevention to develop a national monitoring and evaluation strategy aligned with the National Strategy for Suicide Prevention. Still in its early stages, the effort aims to help states, tribes, jurisdictions, and communities better align their evaluation efforts with national priorities. The goal is to enable communities to “plug into” shared templates and logic models to clearly articulate their contributions—even if those contributions address only a portion of the national strategy. Through common core metrics and shared frameworks, the evaluation conversation can shift away from the binary of “strategy or impact,” toward a more nuanced understanding of short-, intermediate-, and long-term progress.

In the final portion of her talk, Quinlan focused on the importance of engaging communities in the evaluation process. As an evaluation trainer, she emphasized the goal of building local capacity so that evaluation efforts can be sustained after external support ends. This requires a participatory approach, in which community stakeholders are meaningfully involved in every stage of the evaluation.

Quinlan shared several principles for participatory evaluation, which incorporates invested parties as leaders in all aspects of the evaluation (see Figure 4-2), including the use of common language, minimizing community burden, encouraging peer learning, and promoting shared decision-making. She called attention to the issue of data ownership and feedback, noting that communities are often tasked with providing data without receiving sufficient access to or benefit from the results. As an example, she described a stakeholder mapping exercise conducted in Colorado to examine who collects, owns, and uses data. The exercise revealed a pattern in which certain actors bore most of the data collection burden while receiving little in return. “If we are going to collect it,” she concluded, “we need to be sure we are feeding the information back.”

FIGURE 4-2 Principles of participatory evaluation in community-based suicide prevention.
SOURCE: Presentation by Kristen Quinlan on April 29, 2025.

Multi-Site Community-Based Suicide Prevention Program Evaluation: An Example from the Field

Christine Walrath (ICF) delivered a comprehensive presentation on the evaluation of the Garrett Lee Smith (GLS) Suicide Prevention Program, detailing its evolution, challenges, and insights from nearly two decades of implementation. She began by setting the context for the GLS program, which funds state, tribal, and campus grantees to implement suicide prevention efforts for youth and young adults. Over time, the evaluation strategy
has transitioned from focusing primarily on implementation to examining outcomes more rigorously.

Walrath noted that evaluation efforts began at the program’s launch; evaluation and data-driven decision making were among the requirements of the Garrett Lee Smith Memorial Act. Because the act required grantees to participate in evaluation, the effort included both local and national evaluation elements. These requirements created an environment and culture in which evaluation was embedded in program decision making, and that culture has continued throughout the life of the grant programs, locally and nationally. Over time, the approach evolved to include targeted studies of the program’s effectiveness, more sophisticated statistical modeling, and dissemination of findings in academic publications and summary briefs.

A key theme of Walrath’s presentation was the inherent complexity in evaluating a large federal program that funds diverse interventions across a range of sites. These differences span geographic contexts, population demographics, implementation capacity, and intervention models. To respond to this complexity, the evaluation team adopted a mixed-methods, multilevel approach, including both quantitative and qualitative data collection and analysis. This approach allowed them to understand not only what outcomes were achieved, but how and under what conditions they were achieved.

She emphasized the importance of using data to tell a compelling story—one that speaks to both statistical outcomes and human experiences. For example, she presented findings showing statistically significant declines in suicide attempts among youth served by GLS grantees compared to control groups and increases in knowledge and awareness among gatekeepers trained by the program. She also referenced qualitative insights that provided richer understanding of how interventions were experienced by participants.

One key lesson from the GLS evaluation, Walrath noted, was the value of designing evaluation methods that are flexible enough to adapt to evolving program needs while maintaining scientific rigor. This required building strong relationships with grantees and technical assistance providers, so that data collection efforts were feasible and aligned with real-world implementation. It also meant carefully balancing the need for standardization (to allow for cross-site comparisons) with responsiveness to the unique features of each grantee site.

Walrath shared examples of how findings from the evaluation have informed both policy and practice. For instance, the results helped make the case for sustained and increased funding for suicide prevention programs and shaped the development of new tools and resources to support grantees. She also highlighted the use of dashboards and other visual tools to communicate evaluation findings to diverse audiences, including policymakers, program staff, and community members.

In closing, Walrath reflected on the broader implications of the GLS evaluation for suicide prevention and public health more generally. She stressed the need for continued investment in evaluation infrastructure, capacity-building among grantees, and the use of data not just for accountability, but for learning and continuous improvement. “Evaluation,” she noted, “should not be an afterthought. It should be built into the program from the start.”

PANEL DISCUSSION AND AUDIENCE Q&A

Following the two presentations on program evaluation, the presenters were joined by individuals who had given the presentations summarized in Chapter 2: Mary Cwik (Johns Hopkins University), Novalene Alsenay Goklish (Johns Hopkins University), Brandi Jancaitis (Virginia Department of Veterans Services), Richard McKeon (Substance Abuse and Mental Health Services Administration [SAMHSA]), and David Rozek (University of Texas Health Science Center at San Antonio), along with Tanha Patel (Centers for Disease Control and Prevention Foundation), for a moderated panel discussion to reflect on practical strategies, challenges, and lessons learned related to evaluating suicide prevention programs and communicating results to diverse audiences. Diana Clarke (American Psychiatric Association; member, workshop planning committee) served as the moderator for this discussion. Panelists responded to guiding questions on the following topics: evolution of program design and administration based on evaluation findings, best practices for balancing evaluation across multiple levels, using intermediate indicators to track progress toward suicide prevention goals, and grantee involvement in evaluation and building and sustaining capacity over time.

Evolution of Program Design and Administration Based on Evaluation Findings

Clarke began the discussion by asking panel members to reflect on how program design and administration have evolved over time based on evaluation findings and lessons learned. Quinlan emphasized the growing role of participatory action approaches and the need to invest in community-based infrastructure for data collection. She shared an example of a county-level youth program that initially relied on school-based surveys but faced objections from parents. In response, the team pivoted to key informant interviews and focus groups with youth. When those results were shared, community members asked how to obtain a more representative sample, prompting further adaptation. Quinlan noted that this cycle of feedback and adjustment exemplifies the value of participatory approaches in ensuring
that communities have the information that they need to grow and sustain their prevention efforts.

Patel described an initiative supporting veteran-serving organizations that evolved significantly over seven years of implementation. Early in the program, technical assistance providers discovered that many of the participating organizations had limited staff and capacity for evaluation. Tools were developed to support them, but each year those tools had to be adapted based on where organizations were in their readiness. Eventually, a toolkit was created to support organizations beyond the initial grantee group, incorporating input from participants in the program’s final two years. Patel underscored the importance of engaging participants to ensure that the tools were relevant, understandable, and aligned with their actual needs, emphasizing that building this capacity takes time and sustained investment.

Cwik shared that her team’s evaluation findings also prompted a significant shift in their programming approach. Initially, they had focused on adapting existing programs for use in tribal communities, but evaluations showed that while these programs were often strong in some areas, they were not always a good cultural fit. In response, the team shifted toward creating innovative programs from the ground up with guidance from tribal Elders to ensure cultural alignment and community ownership.

McKeon described several ways in which data collected through program evaluation influenced efforts to refine and adjust funding priorities under the GLS grant program. One area of focus, he noted, was determining where grantees were directing their suicide prevention efforts. Evaluation data revealed that the vast majority of grantees were implementing school-based suicide prevention activities. While this focus was valuable—given that schools provide a universal access point for reaching youth—McKeon noted that other high-risk populations were receiving less attention. “We were getting occasional really good grantee work on juvenile justice or foster care,” he said, “but not as much as I would have liked.” Although some projects addressed youth in mental health settings, there were very few focused on youth involved in the juvenile justice or child welfare systems—despite their elevated suicide risk. McKeon referenced a study from Utah that found 60% of youth who died by suicide had previous involvement in the juvenile justice system. The question became whether similar patterns existed in other states, and what more could be done to encourage targeted interventions for these populations.

To support a potential shift, McKeon’s team asked the Suicide Prevention Resource Center to conduct an analysis of suicide prevention efforts within juvenile justice and foster care systems. The hope was to generate guidance and recommendations that could spur increased engagement by states and tribes with these high-risk populations. While some grantees
responded and pursued work in these areas, McKeon acknowledged that uptake was limited, in part due to constraints in statutory language governing the grant program. Legal and policy restrictions also created barriers to directly funding suicide prevention in some justice settings, such as jails. He noted there was a little more flexibility with detention centers, but it was mostly limited to training staff, not providing services. Despite the limitations, McKeon emphasized that this was a data-driven effort to evolve the program in response to identified gaps. The evaluation revealed where needs existed, and while the changes were only partially successful, they reflected an intentional effort to align funding priorities with the goal of reaching higher-risk youth.

Clarke summarized a key theme across the examples: the need for both programs and funders to evolve together. Meaningful change is most likely when community capacity-building is coupled with program flexibility and responsiveness to evaluation.

Best Practices for Balancing Evaluation Across Multiple Levels

Clarke next asked panelists to identify best practices for balancing evaluation across multiple levels—individual grantees, clusters of similar grantees, and overarching program efforts. Walrath emphasized the importance of planning to create data collection strategies that serve multiple purposes. Rather than thinking of it as a matter of balance, she suggested that evaluation designers should identify essential indicators upfront that can be useful at local, cluster, and national levels. With this kind of strategic forethought, programs can avoid duplicative data collection and ensure that data gathered serve a range of evaluation needs.

Jancaitis offered lessons from her team’s work in Virginia, noting that they are still actively learning how to categorize grantees in ways that allow for meaningful aggregation and storytelling. She described grantees ranging from organizations providing direct peer counseling to others focused on gatekeeper training, emphasizing the need to develop tailored subsets of measures for different types of interventions. Jancaitis acknowledged that when programs move quickly to get funding into communities, as was the case for her team, data strategies may be less developed initially. However, she stressed the value of learning alongside grantees and adapting over time—particularly when grantees highlight that certain data points are not useful or suggest better ways to frame findings. For example, when talking to legislators, reframing impact in terms of household effects rather than individual veterans allowed for broader, more meaningful storytelling.

Jancaitis also shared a challenge they encountered in using clinical language, such as the term “safety plan.” While common in provider settings, she noted that this language sometimes triggered concern among veterans,
who associated it with hospitalization and loss of rights. This made engagement difficult. She stressed the importance of tailoring terminology to the cultural and lived experience of the community being served.

Rozek underscored the importance of transparency and communication around evaluation metrics. He noted that when grantees are asked to report data without understanding its purpose, it can feel arbitrary or burdensome. To address this, his team works collaboratively with grantees to explain why specific data elements are requested and whether they are feasible to collect. If a data point would require an excessive time investment, Rozek said, they reconsider whether it is truly essential. He emphasized that data should be actionable for both the funder and the grantee, and that ongoing conversations are key to ensuring mutual understanding and relevance.

Patel added a perspective from her previous role working on National Institutes of Health–funded multi-site programs. She described how funder guidance on performance measures, combined with opportunities for evaluators to connect in peer groups, helped foster shared learning and collaboration. These informal workgroups allowed evaluators to explore common interests—such as implementation or outcome evaluations—and build a community of practice that extended beyond the life of the grant. Patel emphasized that these evaluator-to-evaluator connections strengthened the evaluation process and helped align efforts across sites.

Clarke closed the discussion by affirming the importance of participatory evaluation approaches. She emphasized that evaluation is most effective when it is not imposed on organizations but co-created with them—an insight echoed throughout the conversation.

Using Intermediate Indicators to Track Progress Toward Suicide Prevention Goals

Clarke invited panelists to discuss best practices for identifying and using intermediate indicators in program evaluation. Noting that many presentations had touched on this theme, she asked how programs should approach the development and use of such indicators. Cwik responded by emphasizing the importance of grounding evaluation efforts in a logic model or theory of change. These models, she explained, help programs articulate what they expect their interventions to impact and how those effects might ultimately reduce suicide risk. “Sometimes you have to take a guess,” she acknowledged, adding that the path to suicide prevention is not always linear or obvious. Cwik offered the New Hope program as an example. While the ultimate goal is to reduce suicide and suicide deaths, hypothesized intermediate outcomes include reduced depressive symptoms, decreased suicidal ideation, and increased reasons for hope. Identifying the most appropriate indicators may require returning to relevant scientific
literature, consulting with clinicians and community members, and conducting small pilot tests that track multiple potential outcomes to see which ones are most responsive to the intervention.

Quinlan added that early engagement with community stakeholders and other invested parties is essential. These groups can help clarify what they care about, what they want to learn from the evaluation, and where they expect to see change. “There’s nothing more disappointing than to build an evaluation not built on what the community wants to know,” she said. Asking the right questions from the outset and grounding evaluation in what the community values can help identify meaningful intermediate outcomes.

Walrath emphasized that logic models and theories of change should not be viewed as static documents. Rather, they should be dynamic—revisited and refined over time as programs learn more about their effects. Programs may initially identify a “best guess” about short-, intermediate-, and long-term outcomes, but these assumptions should be adjusted as new evidence becomes available.

Quinlan added that traditional logic model formats do not always resonate with all communities. Linear models, she noted, may not reflect how certain cultural groups conceptualize change. She encouraged funders and evaluators to consider alternative approaches to logic modeling, such as using visual metaphors like canoes or circles. These adaptations can better align with community values and worldviews while still providing a roadmap for evaluation. Returning to the broader theme of community engagement and inclusive evaluation, Quinlan described a past SAMHSA program called Service to Science. The initiative paired evaluators with grassroots organizations to help translate their work into the language of funders and evaluation. She stated that there is a lot of great practice-to-evidence work occurring, but these efforts often do not speak the language of the funder. Service to Science helped bridge this gap by providing training and support in evaluation while also encouraging funders to expand their definitions of acceptable evidence. Rather than prioritizing only randomized controlled trials, Quinlan said, the program encouraged exploration of other methodologically sound approaches that better captured different ways of knowing.

Clarke commented that evaluators should also consider how they define positive versus negative outcomes. Drawing from her experience as a clinical researcher, she cautioned that metrics traditionally viewed as negative—such as increased hospitalizations—may, in some contexts, reflect positive developments. For example, if a suicide prevention program led to more individuals seeking care at hospitals or emergency departments, that may indicate greater willingness to reach out for help rather than an increase in crisis events. The perception of outcomes can shape interpretation of program impact, she noted. She encouraged evaluators to reflect
critically on how success is defined and measured, particularly in areas as complex as suicide prevention.

Grantee Involvement in Evaluation and Building and Sustaining Capacity Over Time

The next question posed by Clarke was “What role should grantees play in the evaluation and what support do they need to do it well and to sustain it over time?” Rozek stressed the importance of understanding the data that grantees are already collecting. In one instance, he noted, the funder team was prepared to introduce a standard set of evaluation metrics when a simple question—“What else are you already collecting?”—revealed that the grantee had already amassed a rich set of tailored data. “Some of these organizations have amazing, very rich data,” Rozek said, “and sometimes even better measures than what we ask from common data elements because they are tailored to the program.” In some cases, programs had already conducted psychometric assessments or developed custom outcome measures to meet the needs of their communities. Rozek emphasized that if evaluation is approached as a true collaboration—where funders are willing to listen, share reasoning behind requests, and negotiate when appropriate—it can be mutually beneficial. Importantly, he added, grantees often possess valuable insights about what works and why, but those insights may be overlooked unless explicitly invited into the conversation.

Jancaitis echoed the need for funders to approach evaluation as a true collaboration and to be responsive to grantee feedback. She offered examples from her own work in which grantees explained why their sample sizes were small or why additional time was needed to build trust and reach vulnerable populations. In one case, a peer support grantee shared that it could take six months before someone was even willing to talk. Funders, she emphasized, should consider the human realities behind the data and buffer grantees from excessive burden whenever possible. She noted that flexibility in implementation requirements—such as when and how screening is conducted—can help align evaluation with real-world program delivery.

Within the GLS program, Walrath shared, grantees were supported by a team that included a grant project officer, a technical assistance provider, and an evaluation support person. This wraparound support helped grantees integrate evaluation into their existing workflows, regardless of the specific setting or strategy being implemented. She referred to these support roles as “program partners” who could adapt evaluation approaches to fit the grantee’s context and make implementation as seamless as possible.

Patel highlighted two critical needs for supporting grantee participation in evaluation: funding and time. Without dedicated evaluation resources
and time to plan and execute evaluation activities across the program life cycle, she cautioned, efforts are likely to be limited to basic reporting. “To really get to the stories of what’s meaningful,” she said, “it will take time and money.” Clarke then asked for specifics on the type of training provided to build grantee capacity to lead evaluation, in response to which Patel added that her program is fundamentally designed to provide training, technical assistance, and capacity-building support for evaluation. Organizations participate in a structured cohort for approximately 8 to 12 months, during which they receive both group-based instruction and individualized technical assistance. “We don’t expect our grantees to come in with evaluation knowledge,” Patel said. “If they know the bare bones—‘Hey, I need to do evaluation; my funder is asking me to do this’—that’s where we get them.” Participants are introduced to core concepts of evaluation and guided through the process of building an evaluation plan for one suicide prevention program. The plan is structured so that it can be replicated across other programs within the organization, with the goal of establishing lasting internal capacity for evaluation.

Quinlan added that in some cases, the first step is simply making the case for why evaluation is needed at all. She recalled a conversation with a community pastor who declined to engage in evaluation training, saying, “I’m busy putting out fires.” In such cases, she said, it is essential to show how evaluation can help sustain the program and communicate its impact to funders, stakeholders, and the broader community. “To be able to lay the foundation that it’s even needed is sometimes where you need to start,” she noted.

These reflections underscored that building evaluation capacity is not only a technical challenge, but also a relational and motivational one. For grantees to lead evaluation effectively, they must be given not only tools and training, but also a clear understanding of the value and purpose of evaluation within their specific context.

Audience Q&A

The session concluded with a robust audience discussion, both in-person and online, that explored issues of data infrastructure, culturally responsive evaluation, and the importance of clear and inclusive assessment practices, particularly in work with tribal and veteran communities.

Exploring Federated Data Systems and Common Data Elements

A member of the virtual audience asked about the potential to develop federated learning models or data sharing platforms that preserve individual privacy while enabling linkage across multiple programs and organizations.

Walrath shared that an ongoing evaluation of the 988 and behavioral health crisis continuum initiative includes a conceptual exploration of cross-linked data environments at the state level. Though still aspirational, she noted that the evaluation is looking at ways to connect data across crisis response systems and ultimately link to the National Death Index. Crosby added that some preliminary work has occurred in states using the National Violent Death Reporting System to link death data with correctional or other state-level administrative records. These efforts remain limited to the state level, he noted, as personally identifiable information typically falls under state authority. Rozek shared an example from Face the Fight, which is partnering with the Institute for Veterans and Military Families at Syracuse University. This grantee is working with coalition partners to gather social determinants of health data across veteran-serving organizations using common data elements. Rozek noted that the long-term goal is to establish data sharing agreements—while protecting privacy—that would enable pooled, de-identified data to be analyzed at scale. However, he emphasized that funder support is essential for standardizing data collection and building a shared repository of usable, accessible information.

Clarke pointed out that research studies funded by the National Institute of Mental Health and other federal agencies already require data deposition into repositories, making those data available for cross-study analysis. She reiterated the importance of common data elements for meaningful comparison across studies and highlighted the challenges of cross-walking between different tools without a shared standard—particularly when large samples are needed. McKeon underscored that data linkage, in addition to common data elements, is critical. He referenced the Utah youth suicide study as a powerful example, where public records were used to trace contact across youth-serving systems. The study revealed that 60 percent of youth who died by suicide in Utah had prior involvement with the juvenile justice system—higher than contact with the mental health system. That kind of trajectory mapping, he said, offers insight into pathways for intervention. While Utah has since replicated and expanded the study, McKeon noted that few others have followed suit. He added that SAMHSA’s evaluation of behavioral crisis services, including the 988 platform, is one of the most ambitious of its kind, but navigating the balance between standardization and customization remains a persistent challenge.

Incorporating Cultural Values into Evaluation and Data Sharing with Tribal Communities

Another member of the virtual audience asked how evaluation tools used in suicide prevention incorporate culturally specific values, beliefs, and protective factors unique to tribal communities, and what culturally
appropriate methods of data sharing have proven most effective in fostering trust, reciprocity, and inter-tribal learning. Cwik shared that her team routinely adapts standardized measures based on tribal community input. Community members are invited to review survey tools for face validity, language clarity, and cultural fit. Edits may be made to existing questions or new questions added to reflect local priorities. In addition to modifying standardized tools, the team incorporates qualitative approaches to surface risk and protective factors not captured by traditional instruments. These might include measures of cultural connectedness, experiences of historical trauma, or other culturally salient constructs.

Goklish expanded on Cwik’s response by explaining how evaluation tools are reviewed through a community advisory board to ensure cultural appropriateness. Community members, including those not typically involved in research, offer feedback to ensure language is respectful, relevant, and unlikely to offend. Goklish emphasized the importance of staff comfort as well—if frontline staff are uneasy administering an evaluation, it compromises both the data quality and the trust between programs and participants. In some cases, tools must be translated into Apache, requiring close attention to nuance, meaning, and cultural context to avoid miscommunication.

Clarke followed up with a question directed to Jancaitis about tailoring assessment tools for the military and veteran populations. Jancaitis stressed the need for alignment in how military and veteran status is defined and captured. Different programs use different definitions—for example, VA-eligible veterans versus anyone who has served in the military with any discharge status—and this can create challenges for both service delivery and cross-program referrals. She urged that data systems include not just information on service members and veterans, but also their families. “They are a key path to that service member or veteran that may not have otherwise engaged,” she said. Capturing data on partners, loved ones, and children is essential to serving the full ecosystem of support that surrounds veterans.

Clarke closed the session by thanking the presenters and panelists for the wide-ranging discussion and emphasized the need to continue conversations around data infrastructure, cultural responsiveness, and community-informed evaluation approaches.

Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.

This page intentionally left blank.

Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 67
Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 68
Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 69
Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 70
Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 71
Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 72
Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 73
Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 74
Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 75
Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 76
Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 77
Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 78
Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 79
Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 80
Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 81
Suggested Citation: "4 Considerations for Program Evaluation." National Academies of Sciences, Engineering, and Medicine. 2025. Development, Implementation, and Evaluation of Community-Based Suicide Prevention Grants Programs: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29215.
Page 82
Next Chapter: 5 Communicating Program Results
Subscribe to Email from the National Academies
Keep up with all of the activities, publications, and events by subscribing to free updates by email.