After years of critique that deliberative theory was divorced from practice, scholars and practitioners have begun to apply theories of public deliberation to the development and evaluation of real public engagement processes (Gastil 2018; Nabatchi et al. 2012; Neblo 2015). This has necessitated creating a definition of deliberation that can be operationalized when collecting data about events (Black, Burkhalter, et al. 2011). Though contextual needs and constraints shape what deliberation looks like in practice, most scholars have coalesced around a shared description of deliberative processes. Namely, for a public engagement process to count as deliberative it should prompt participants to critically analyze relevant information, arguments, and values (Burkhalter et al. 2002; Steenbergen et al. 2003) and engage in an egalitarian discussion that demonstrates mutual respect and consideration of different perspectives (Benhabib 1996; Gutmann & Thompson 1996; Mansbridge 1983).

Deliberation, however, is about more than a normative model of open-ended discussion. Those organizing deliberative events or institutions aim to achieve specified outcomes, such as more nuanced opinions, better decisions, and increased public engagement (Chambers 2003; Goodin & Dryzek 2006; Kuyper 2018; Niemeyer & Dryzek 2007). For an event to be deliberative, therefore, participants must move beyond the sharing of evidence and opinions and towards the formation of an informed judgment that takes into account multiple options and perspectives. Sometimes this means the post-forum recording of a considered private judgment (Fishkin 2018), but other times it means rendering verdicts, making decisions, or at least arriving at concrete recommendations or shared judgments (Crosby & Nethercutt 2005; Grönlund et al. 2014; Hendriks 2005).

With this working conception of deliberation broadly shared (e.g., Black, Welser, et al. 2011; Karpowitz & Raphael 2014; Nabatchi 2012; Neblo 2015), scholars have begun to assess the strength of the theorized link from deliberative inputs to outputs (Farrar et al. 2010; Gastil et al. 2017). Too often, however, such research lacks repeated iterations of a deliberative design, a sufficiently large sample of participants, or subjective measures of participants’ individual experiences. Without comparative data, scholars have difficulty testing the impact of contextual variables, such as duration and level of ideological diversity among participants. Rarer still are datasets that permit inspection of consequential deliberation, in which there exist real political or policy stakes for the process (C. Johnson & Gastil 2015).

Our study provides an opportunity to meet each of these criteria by examining data aggregated from fifteen instances of a deliberative institution, the Citizens’ Initiative Review (CIR). Since its inception in 2010, a team of researchers has been asking CIR participants to assess their deliberative experiences and their subjective process satisfaction, information gains, and opinion change. Using multiple iterations of the same deliberative process, we examine how different CIR contexts shape participants’ experience of deliberation, as well as the relationship between participants’ perceptions of deliberative quality and process outcomes.

Research Setting

We answer these questions in the context of the CIR. Each CIR gathers between 20 and 24 participants to deliberate for between four and five days on a state or local ballot measure. In the United States, these ballot measures allow the electorate to vote for or against proposed legislation. Initiatives are ballot measures that have been proposed by citizens or civic groups, though special interests often utilize them to promote the prerogatives of businesses rather than citizens (Gerber 1999). Referenda are placed on the ballot by members of the legislature. Many states and thousands of municipalities in the US place such measures on their ballots to allow citizens to vote for everything from tax increases to constitutional amendments (Matsusaka 2004). Unfortunately, direct democratic elections commonly feature issues about which voters have limited knowledge and substantial misunderstanding (Broder 2001; Burnett 2019; Ellis 2002; Gastil et al. 2001).

Though the details of the CIR process have evolved since its inception in 2010, the basic features have remained the same. During CIR deliberations, citizen panelists are provided with information about the measure, including the text of the measure, estimates of its financial impact, and arguments for and against the measure. They also hear from advocates both in favor of and in opposition to the measure as well as experts who can speak to the measure’s details or potential effects. Panelists engage in facilitated small and large group discussions to distill those presentations in search of the most important information and arguments pertinent to the measure. At the end of the review, the panelists write a statement that includes ‘Key Findings’ (the most important facts about the measure) and ‘Arguments in Favor’ and ‘Arguments in Opposition.’ This statement is disseminated through official state voter guides, media, and/or publicity campaigns so that voters can discover and use this information when filling out their ballots.

The CIR was first developed in Oregon as a one-time demonstration in 2010 and was then institutionalized in 2011 by the state legislature as a permanent part of Oregon’s elections. Unofficial statewide CIRs have also been held in Arizona, California, Colorado, and Massachusetts, and local CIR pilot tests have been held in Phoenix (Arizona), Portland (Oregon), and Jackson County (Oregon). Participants have studied measures related to taxes, primary elections, marijuana legalization, genetically modified foods, mandatory minimum sentencing, hospital staffing, rent control, and other policy topics.

Evaluating Face-to-Face Deliberative Events

Deliberative public events like the CIR have proliferated in this century (Abdullah & Rahman 2015; Fishkin 2009; Gastil & Levine 2005; Grönlund et al. 2014; Nabatchi et al. 2012; Neblo 2015; Setälä & Smith 2018). What was once theorized as an ideal way of reaching decisions (Barber 1984; Bohman 1996; Chambers 2003; Habermas 1998) has been realized through the development of varied process designs in settings ranging from local contexts (such as individual workplaces, neighborhoods, and schools) to global ones, such as the World Wide Views forums that link participants across continents to discuss transnational policy problems (Herriman et al. 2011).

As more public officials and democratic reformers came to champion this type of engagement, deliberative processes have gained legitimacy and power (Fagotto & Fung 2009; Fung & Wright 2003; Hartz-Karp & Briand 2009). Deliberation has been used by citizens and governments across the world to draft policy, propose laws, set budgets, and initiate constitutional reform (Farrell & Suiter 2019; Gilman 2016; Warren & Pearse 2008). In the context of direct democracy, one especially useful deliberative intervention is to convene a ‘minipublic’—a body of randomly selected citizens gathered to study and assess a public issue (Grönlund et al. 2014; Setälä & Smith 2018). Deliberative Polling and Citizens’ Assemblies have been used to develop ballot measures (Fishkin et al. 2015; Warren & Pearse 2008). The process studied herein takes a different path, asking panelists to study an already developed ballot initiative and write an assessment for lay voters who may otherwise have trouble finding well-reasoned arguments and reliable information (Burnett 2019). In the present age of disinformation, the CIR may be equally critical as a means of countering deliberate deceptions disseminated via social media during elections (Schia & Gjesvik 2020).

Such institutionalization could improve opportunities for informed and effective citizen engagement (Warren & Gastil 2015), but scholars must maintain a critical eye when evaluating whether the deliberative experience leads to its assumed outcomes (Spada & Ryan 2017). Similarly, because deliberation is resource intensive, those promulgating its expansion should identify the contexts in which it can be most effective.

The Deliberative Experience

To advance our understanding of the deliberative process, we begin by identifying its two key components. At a minimum, deliberation entails rigorous analysis of information and policy alternatives, along with an inclusive and respectful discussion process (Burkhalter et al. 2002).

Analytic rigor and democratic relations

The analytic portion of deliberation requires participants to examine pertinent evidence (Gouran & Hirokawa 1996). Such evidence may be established facts but can also include narratives and personal experiences related to the policy or decision in question (Black 2008; Polletta & Lee 2006). Another requirement asks that participants consider tradeoffs or weigh the pros and cons of implementing any decision (Barber 1984; Fishkin 1991; Gouran & Hirokawa 1996; Mathews 1994). This means that they should consider the benefits of potential solutions and the consequences of implementation and seek to uncover any unintended consequences that might arise as a result of a specific decision. Finally, participants should consider the relevant values underlying arguments. This requires considering what goals might be reached through particular decisions and acknowledging that multiple, competing values are often at play (Anderson 1993; Benhabib 1996).

Equally important to the deliberative process is inclusive and respectful discussion (Gutmann & Thompson 1996; Mansbridge 1983). Inclusivity has two primary components: external and internal. The external component requires events to seek a diversity of participants so that stakeholders or traditionally marginalized individuals are not excluded from the discussion and decision making (Young 2002). Internal inclusivity occurs once participants have been assembled. This requires not simply equal speaking opportunities among participants but also fair and full consideration regardless of demographic characteristics or different ways of speaking (Benhabib 1996; Gutmann & Thompson 1996; Mansbridge 1983). Finally, a deliberative discussion seeks to provide information in a way that is accessible to all participants so that each discussant has an opportunity to both consider the information and weigh in on the discussion (Gastil 1993; Gutmann & Thompson 2004).

Objective and subjective experience

These criteria can be considered from both an objective and subjective perspective. Objectively, scholars can attempt to measure the degree to which participants have investigated an issue or shown respect to one another by counting the times participants ask questions or interpreting transcripts of an event. Subjectively, we can look to participants’ experiences as measures of whether these goals have been achieved.

For questions of democratic quality, subjective experiences may be a more useful measure. Participants may be better at measuring the presence of respect and mutual consideration, for example, than experts because they are the ones who experience and engage in those activities (Black 2012; Gastil et al. 2012). Though measures like turn-taking can assess the level of equality, they may not be the best judge of equity. Some participants may need to speak more to lend voice to minority experiences. Conversely, some participants may feel less comfortable talking in group settings. Though experts can lend a more objective evaluation framework that provides consistency across participants, if deliberation is an inherently subjective process, then participant experiences provide a critical measure of whether deliberation occurred (Neblo 2007). Indeed, in one study attempting to connect deliberative quality and outcomes, participant ratings of deliberative quality were linked to convergence on policy attitudes whereas expert coding of deliberative quality was not (Gastil et al. 2008).

Still, perceptions of inclusion likely reflect one’s individual perspective. For instance, a study found that people of color were more positive in their ratings of deliberative quality than were white participants, because participants of color more readily juxtaposed the opportunity to have their voices heard with contexts outside of the deliberative setting in which those opportunities are scarcer (Abdel-Monem et al. 2010). This discrepancy indicates that experience matters, and subjective experiences may rely on different standards than objective measures of process quality.

Findings in small group literature also encourage caution in conflating experience with theoretically derived measures of quality, particularly in relation to the analytic components of the process. This body of literature suggests that participants may not be able to judge the analytic quality of a conversation because they may not have the information available to make such a judgment. Unfortunately, participants in small group discussions sometimes withhold pertinent information and rely too heavily on information already known to the whole group. As a result, groups can make flawed judgments when their shared information supports a bad choice (Lu et al. 2012; Stasser & Titus 2003).

Group decision-making scholarship, however, typically takes place in laboratory settings with tight interactional constraints, severe time limits, and minimally motivated participants. By contrast, deliberative events are designed specifically to encourage information sharing among participants through measures such as ensuring a diversity of participants, training them in deliberative practices, and providing them with evidence to reference during discussion. Even researchers working in the ‘hidden profile’ research paradigm, which showed information processing biases in small groups, have recognized that real-world groups may lie outside the scope of such theories (Sohrab et al. 2015), precisely because deliberative discussion can help participants overcome self-defeating tendencies (Myers 2018).

For the same reason, participants in deliberative events may be better equipped to judge the quality of their discussion than those who are in non-deliberative contexts. Without more research on comparisons of objective and subjective assessment, however, subjective measures of analytic rigor might be understood as a useful but incomplete measure of discussion quality (Gastil 2013).

Three Outputs: Knowledge Gains, Opinion Change, and Satisfaction

Advocates of deliberative democracy have highlighted a number of potential benefits of deliberation. Deliberation has the potential to shift policy opinions, encourage consensus decision making, and increase democratic legitimacy (Fishkin 2018; Gastil 2018; Goodin & Dryzek 2006; Grönlund et al. 2014; Neblo et al. 2018). Additionally, well-structured deliberation can change the cognitions and actions of participants. Engagement in a deliberative process has been shown to increase participants’ sense of political efficacy, policy knowledge, and civic engagement (Fishkin 2009; Gastil 2004; Jacobs et al. 2009; Knobloch & Gastil 2012; Morrell 2005; Nabatchi 2010).

Though all such changes are laudable, the success of a deliberative event often depends on achieving two interrelated goals—knowledge gains and opinion change (Farrar et al. 2010; Gastil et al. 2017; Niemeyer & Dryzek 2007). One of the first metrics used to empirically test deliberative events was whether participants learned relevant policy information through their experience (Fishkin & Luskin 1999; Gastil & Dillard 1999). Such research has continued and now shows ample evidence that deliberation can lead to policy-specific knowledge gains (Barabas 2004; Fishkin 2018; Gastil 2006; Richards 2018). This reflects the core premise of deliberative democracy that, all other things being equal, a pluralistic and deliberative process should yield better decisions (Landemore 2013).

Even so, the ability of deliberation to generate higher quality judgments is contested (Pincock 2012). Some argue that deliberation can lead to opinions that are more consistent with available knowledge or underlying values (Barabas 2004; Fishkin 2018; Gastil et al. 2018). Others are wary that social pressure, rather than knowledge gains or perspective taking, may determine shifts in opinion (Karpowitz & Mendelberg 2014; Sanders 1997). Those who take the latter stance, however, are often studying events that may not actually be thoroughly deliberative (Pincock 2012). Connecting broad variations in the quality of deliberation to the likelihood of opinion change among participants may shed light on this debate, even if it cannot discern the more fine-grained mechanisms whereby deliberation shapes opinion.

Finally, in addition to knowledge gains and opinion change, participants’ subjective experience also provides a measure of process quality, with participants’ self-reported satisfaction providing one important indicator of success (Abdullah & Rahman 2015; Foels et al. 2000; Gastil et al. 2012; Hickerson & Gastil 2008). Participants who are satisfied with their experience of deliberation are more likely to see the host of attitudinal and behavioral changes mentioned above (Gastil, Deess, et al. 2010). Thus, participants’ satisfaction can be an important metric for assessing the success of a deliberative event.


Having described key features of deliberation and three intended outputs, we examine the relationships between these process and outcome variables. Simply put, when participants subjectively experience deliberation as a rigorous and respectful process, do they become more likely to report process satisfaction, knowledge gains, and shifts in their opinions?

Main Hypothesis

Consistent with the preceding literature review, our principal hypothesis predicts that CIR panelists’ assessments of both the analytic rigor and democratic quality of their deliberative event will be positively associated with our three focal outcomes. These include participants’ process satisfaction, their sense of having learned enough to reach a good decision on the ballot measure, and their degree of reported individual opinion change.

H1: CIR panelists’ assessments of the review’s analytic rigor and democratic quality will be positively associated with participants’ (a) satisfaction with the process, (b) their sense of having learned enough to reach a good decision, and (c) their level of opinion change.

Contextual Predictors

This first hypothesis generalizes across all instances of the CIR, but with over a dozen different iterations of this process in hand, our data permit us to advance hypotheses about how different CIR deliberations and outcomes link back to variations in the Review’s design and setting. Though deliberative theorists often talk about the importance of institutional context (G. F. Johnson 2009), the nature of a discussion issue and its framing (Gastil, Bacci, et al. 2010; Lee 2014), or the particular deliberative design being employed (Carman et al. 2015; Himmelroos 2017), the effects of these differences have not been empirically tested in a comparative manner. The present research context makes it possible to look at a handful of such variations across deliberative forums, including differences in the political context, event duration, and political division.

Duration of deliberation

In the case of the CIR, one straightforward variation concerns the Review’s duration. Although the question of whether adequate time is given for deliberation should be a fundamental piece of any process evaluation (J. Abelson et al. 2003; Coote & Lenaghan 1997), studies rarely address this directly. Advocates of deliberation often claim that extensive time is needed to engage in substantive deliberation. Greater time can allow participants to delve deeply into an issue and provide them the space to gather information, hear from witnesses, and collectively scrutinize evidence (Coote & Lenaghan 1997). Time also affords participants the opportunity to develop mutual respect and understanding (Renn et al. 1993).

The length of time required for participation in such events, however, can place a considerable burden on everyday citizens. This could prevent otherwise willing community members from engaging in deliberative processes and ultimately result in less inclusivity (J. Abelson et al. 2003; Barnes 1999; Coote & Lenaghan 1997; Dienel & Renn 1995; French & Laver 2009). Though one study of trial jurors found that quality of the deliberations, rather than length of time, was the deciding factor in whether jury experience led to increased voting (Gastil, Deess, et al. 2010), some citizens’ jury participants have lamented that four days was not enough time to adequately grapple with the issue in question (Barnes 1999). Also, shorter deliberative processes do not appear to provide the same civic motivation that more rigorous processes engender (Myers et al. 2018). If duration does matter, such tradeoffs may be worth the costs, but if it has no influence on participant experience and process outcomes, shorter processes may be able to generate more inclusivity at a lower cost.

The first four iterations of the CIR (2010–2012) all lasted five days, but from 2014 to 2018, the Reviews all lasted three and a half days (hereafter labeled as ‘four days’). Concerns about controlling the cost of the CIR’s implementation prompted this foreshortening, but it went against the traditional model of Citizens’ Juries that the CIR aimed to reproduce (Crosby & Nethercutt 2005). We hypothesize that the participants’ subjective ratings of deliberative quality declined as a result of this abbreviation of the CIR process. Moreover, because time gives participants a greater opportunity to weigh arguments and evidence and engage in democratic discussion, we predict that participants will be more likely to change their opinion if they took part in the longer process.

H2a: Relative to shorter ones, longer CIRs will lead to higher participant assessments of the process’ analytic rigor and democratic quality and will be more likely to result in knowledge gains and opinion change.

Degree of authority

Part of what sets the Oregon CIR apart from so many other minipublics is its authority—its ability to put its findings in the official state pamphlet mailed to every Oregon voter (Warren & Gastil 2015). From 2014 through 2018, however, unofficial CIRs have been held in a county (Jackson County, Oregon), two municipalities (Phoenix, Arizona and Portland, Oregon), and four states (Arizona, California, Colorado, and Massachusetts). Lacking formal authorization from the government, these CIRs have no reliable method for sharing their findings with voters. By contrast, the exercise of real political power in the Oregon CIR raises the stakes for deliberation (Levine et al. 2005), which means that deliberation is more consequential. When participants are aware of this state authority, they may be more likely to take the task of deliberation seriously, enacting the ground rules meant to assure the process’ analytic rigor and democratic quality. Similarly, they may be more likely to keep an open mind, and thereby learn more information and reconsider their initial judgments.

H2b: Empowered CIRs, compared to unofficial ones, will lead to higher participant assessments of the process’ analytic rigor and democratic quality and will be more likely to result in knowledge gains and opinion change.

Ideological divergence

The degree of ideological divergence within a CIR panel may also influence the likelihood of achieving desired outcomes. Deliberative processes are designed to bring competing perspectives into conversation with one another. Though deliberation may at times be used to allow homogenous groups to engage in preference identification, more often designers seek to engage individuals from across the political spectrum so that participants may learn about competing perspectives and evaluate a range of potential solutions (Gutmann & Thompson 1996; Mansbridge 1983; Young 2002). In this sense, the presence of ideological divergence among panelists is a necessary requirement for knowledge gains and opinion change.

The presence of high degrees of difference, however, may have adverse effects on the quality of deliberation if participants polarize in opposition to one another (Mendelberg & Karpowitz 2007; Mendelberg & Oleske 2000; Sanders 1997). The CIR design offers an opportunity to explore the effects of political division on process outcomes. One criterion used to select participants is partisan affiliation, resulting in a participant pool reflective of local political divisions. Even so, because CIRs have been conducted in different locations and at different points in time, and because simple partisan affiliation does not provide an indication of ideological strength, the level of ideological division among participants varies across reviews.

H2c: Compared to relatively homogenous CIRs, those that have more ideological divergence among their citizen panelists will produce lower participant assessments of the process’ analytic rigor and democratic quality, but they will be more likely to result in knowledge gains and opinion change.

Context as Moderating the Influence of Participant Experiences on Outcomes

Our hypotheses for the contextual variables concern not only their association with deliberative process perceptions but also the association between those perceptions and subjective outcomes. These amount to contextual qualifications of H1, which predicts that positive experiences of analytic rigor and democratic quality will lead to the intended deliberative outcomes.

Because time is theorized as essential for the development of both the analytic and democratic aspects of deliberation (Coote & Lenaghan 1997; Dienel & Renn 1995), we predict that a shorter CIR duration will place greater stress on deliberative processes. In the foreshortened instances of the CIR, which last four days instead of five, favorable outcomes will depend to a greater extent on the levels of analytic rigor and democratic discussion achieved.

H3a: The duration of a CIR will moderate the relationship between participant assessments of process quality and deliberative outcomes, such that shorter processes will show stronger relationships between participants’ assessment of the process quality and their likelihood of experiencing satisfaction, knowledge gains, and opinion change.

Similar effects may occur in relation to empowerment. If participants take their duty to voters seriously, state-authorized instances of this process should create stronger ties between participants’ ratings of the deliberative process and its outcomes. In the context of amplified authority, CIR panelists may think more critically about whether the process met the deliberative criteria when rating their satisfaction with its performance. Likewise, we predict that panelists’ readiness to learn new information or alter their opinions will hinge on deliberative process quality more in the higher-stakes official CIRs versus the pilot tests thereof.

H3b: The presence or absence of legislative authorization for a CIR will moderate the relationship between participant assessments of process quality and deliberative outcomes, such that empowered processes will show stronger relationships between participant assessments of deliberative quality and satisfaction, knowledge gains, and opinion change.

Finally, high levels of disagreement, especially if those become personal, make maintaining respectful democratic relations among members all the more important (Hwang et al. 2018; Zhang 2012). Substantive disagreement also could heighten the impact of analytic rigor, which would become more crucial as a means of analyzing conflicting information and perspectives and generating attitude change (Burgess et al. 2008; Caluwaerts & Deschouwer 2014; also see Esterling et al. 2015). Thus, we anticipate that CIR panelists participating in events populated with more ideologically diverse participants will show stronger positive relationships between deliberation perceptions and outcomes.

H3c: The level of ideological divergence among CIR panelists will moderate the relationship between participant assessments of process quality and deliberative outcomes, such that more diverse panels will yield stronger relationships between participant ratings of deliberative quality and satisfaction, knowledge gains, and opinion change.
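One conventional way to express the moderation hypotheses H3a through H3c is as an interaction term in a regression model. The equation below is only an illustrative sketch: the symbols and the linear form are assumptions introduced here, not a specification taken from the study.

```latex
% Illustrative moderation model (all symbols are assumptions, not the authors' notation)
% Y_{ij}: outcome (satisfaction, learned enough, or opinion change) for panelist i in panel j
% Q_{ij}: panelist i's rating of process quality (analytic rigor or democratic quality)
% C_j:    contextual variable for panel j (duration, authorization, or ideological divergence)
Y_{ij} = \beta_0 + \beta_1 Q_{ij} + \beta_2 C_j + \beta_3 \,(Q_{ij} \times C_j) + \varepsilon_{ij}
```

Under this sketch, H1 corresponds to a positive $\beta_1$, and each H3 hypothesis corresponds to a nonzero $\beta_3$ whose expected sign depends on how $C_j$ is coded. For example, if $C_j = 1$ marks the shorter (four-day) Reviews, H3a would predict $\beta_3 > 0$.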


To test the relationship between deliberative experiences and outputs, researchers surveyed participants in fifteen CIRs between 2010 and 2018. At the end of each day’s discussions, participants at each review took a brief survey that asked them to rate the democratic quality of the discussion. The participants took an additional, and slightly longer, survey at the end of the review that asked them to reflect on the entirety of the CIR process to assess its analytic rigor and their overarching satisfaction with the process. With very few exceptions, every participant completed every survey on every day, for an N of 318 (with a survey response rate above 98 percent).

Deliberative Experience Measures

In determining how well the CIR embodied a deliberative process, we relied on an operationalization adapted from Gastil (2008) and used previously in other studies (Black, Welser, et al. 2011; Gastil 2013). This approach defines analytic rigor as establishing a solid information base, analyzing all available options, identifying pertinent values, and weighing the pros and cons of an issue. Democratic discussion is defined as providing equal opportunity to participate and creating an atmosphere of mutual comprehension, consideration, and respect.

Because participants were asked to deliberate about measures that often necessitated the discussion of complex scientific or economic evidence, the research team chose to assess analytic rigor at the end of the review, rather than on a daily basis. The assumption here was that although participants might at times feel they did not have all of the necessary information at the end of each day, by the end of the review they would be better able to assess whether they had been provided the information needed to thoroughly understand the evidence, arguments, and relevant values. Participants were, however, asked to rate the democratic quality of the discussion at the end of each day. Here, the research team believed it was important to assess whether participants felt they were being respected and included in the conversation throughout the entirety of the process and that they could follow the discussion even if they didn’t have all of the information needed to reach their final decision until the end.

To assess the CIR’s analytic rigor, the end-of-review survey asked participants how well the process performed in ‘weighing the most important arguments and evidence’ in favor of/opposing the measure and in ‘consideration of the values and deeper concerns motivating’ those in favor of/opposing the measure on a scale from ‘very poor’ (1) to ‘excellent’ (5). These four items were then combined into a single scale assessing the CIR’s Analytic Rigor (α = 0.89, M = 4.21, SD = 0.72).
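As a rough illustration of how a four-item scale and its reliability coefficient can be computed, the sketch below calculates Cronbach's alpha and a per-respondent scale score. The ratings are invented for demonstration, and averaging the items is an assumption, since the text says only that the items were "combined."

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    items = list(zip(*rows))                       # regroup scores by item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])   # variance of summed scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def scale_score(row):
    """Assumed combination rule: the mean of the four item ratings."""
    return mean(row)

# Hypothetical 1-5 ratings on four analytic-rigor items (not study data).
ratings = [
    [5, 4, 5, 4],
    [4, 4, 4, 3],
    [3, 3, 2, 3],
    [5, 5, 5, 5],
    [2, 3, 2, 2],
]

alpha = cronbach_alpha(ratings)
scores = [scale_score(r) for r in ratings]
print(f"alpha = {alpha:.2f}, first respondent's score = {scores[0]}")
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is commonly reported alongside a composite's mean and standard deviation.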

Participants rated the democratic quality of the discussion at the end of each day of the CIR by responding to how often they engaged in the following activities on a scale from ‘never’ (1) to ‘almost always’ (5): carefully considered ‘views different from your own’ when expressed by ‘experts or other CIR participants,’ felt that ‘other participants treated you with respect today,’ or had ‘trouble understanding or following the discussion’ (reverse coded). Participants were also asked whether they had ‘sufficient opportunity to express [their] views today’ on a scale from ‘definitely no’ (1) to ‘definitely yes’ (5). Participants’ individual scores on each question were averaged across days, and then those four average scores were combined into a scale assessing the CIR’s Democratic Quality (α = 0.66, M = 4.41, SD = 0.36).
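The two-step aggregation described above (average each item across days, then combine the four item averages) can be sketched as follows. The item keys are hypothetical labels, the ratings are invented, and taking the mean of the four item averages is an assumed combination rule. The sketch also assumes that every item was answered on every day.

```python
from statistics import mean

# Hypothetical key for the negatively worded item that must be reverse coded.
REVERSED = {"trouble_following"}

def reverse_code(score, low=1, high=5):
    """Flip a rating on a low..high scale (1 becomes 5, 2 becomes 4, and so on)."""
    return low + high - score

def democratic_quality(daily):
    """daily: list of dicts, one per day, mapping item name to a 1-5 rating."""
    item_means = []
    for item in daily[0]:
        values = [
            reverse_code(day[item]) if item in REVERSED else day[item]
            for day in daily
        ]
        item_means.append(mean(values))   # step 1: average each item across days
    return mean(item_means)               # step 2: combine the item averages

# Invented ratings for one participant over two days (not study data).
days = [
    {"considered_views": 4, "respected": 5, "trouble_following": 2, "opportunity": 5},
    {"considered_views": 5, "respected": 5, "trouble_following": 1, "opportunity": 4},
]

score = democratic_quality(days)
print(score)
```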

Deliberative Output Measures

The three focal output variables in this study were satisfaction with the deliberative process, the perception of gaining sufficient knowledge to make a sound judgment, and the perception of having changed one’s opinion on the policy issue under discussion.

Process satisfaction

On their end-of-review survey, participants rated their ‘overall satisfaction with the CIR process’ on a scale from ‘very dissatisfied’ (1) to ‘very satisfied’ (5). This was used as the Satisfaction measure, M = 4.49, SD = 0.80.

Subjective knowledge gain

To measure whether participants believed they had gained an adequate amount of knowledge during the review, the survey asked whether they believed ‘that [they] learned enough this week to make an informed decision?’ Responses on this Learned Enough measure, given on a scale from ‘definitely no’ (1) to ‘definitely yes’ (5), were very high on average (M = 4.65, SD = 0.70).

Subjective opinion change

Opinion change was assessed by asking participants about their opinions prior to the review and at the end of the review on a scale from ‘strongly support’ (1) to ‘strongly oppose’ (5). Because the CIR event organizers did not want to encourage participants to reach an opinion before deliberation occurred, for all but the 2014 reviews, researchers were required to measure both pre- and post-review opinions during the end-of-review survey. The first question asked, ‘Before you participated in the CIR, what was your position on this measure?’ The second then asked, ‘At the end of the CIR process, what is your position now on this measure?’ The absolute value of the difference between their pre- and post-CIR positions on that five-point scale was calculated to determine their degree of Opinion Change (M = 1.21, SD = 0.88).1
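The calculation described above can be sketched in a few lines. The function name and the example responses are hypothetical; the point is only that the measure discards direction, so a move from support to opposition counts the same as the reverse, and scores range from 0 to 4.

```python
def opinion_change(pre, post):
    """Absolute shift between two positions on the 1-5 support/oppose scale."""
    for value in (pre, post):
        if not 1 <= value <= 5:
            raise ValueError("positions must lie on the 1-5 scale")
    return abs(post - pre)


# Hypothetical panelists: (reported pre-CIR position, post-CIR position).
responses = [(1, 1), (2, 4), (5, 3), (3, 2)]
changes = [opinion_change(pre, post) for pre, post in responses]
mean_change = sum(changes) / len(changes)
print(changes, mean_change)
```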

Contextual Variations

For the fifteen CIR panels, Table 1 describes the key features of each panel, including its official authorization (or pilot test status), its duration, and the level of dispersion of participants’ left-right ideological identities. These variations result in relatively balanced splits that divide the CIR panelists into four-day CIRs (n = 222) versus five-day CIRs (n = 96) and unofficial CIR pilot tests (n = 162) versus official Oregon CIRs (n = 156). As for Ideological Divergence, this continuous variable (M = 1.55, SD = 0.18) was created by measuring the SD of ideology within each CIR panel, using participants’ self-identification on a scale from ‘extremely liberal’ (1) to ‘extremely conservative’ (7). The CIRs conducted in 2012 did not include the ideology variable, so their Ideological Divergence scores were estimated based on panelists’ party membership.2

Table 1

Full List of Fifteen Citizens’ Initiative Review Panels and Their Contextual Features, 2010–2018.

| Year | Ballot measure | CIR type | Duration | Ideological Divergence |
| --- | --- | --- | --- | --- |
| 2010 | Mandatory Sentencing | Oregon CIR | 5 days | 1.70 |
| 2010 | Marijuana Dispensaries | Oregon CIR | 5 days | 1.75 |
| 2012 | Corporate Tax Reform | Oregon CIR | 5 days | 1.48* |
| 2012 | Private Casino Authorization | Oregon CIR | 5 days | 1.41* |
| 2014 | GMO Seed Ban | Jackson County Pilot | 4 days | 1.78 |
| 2014 | Top-Two Primary | Oregon CIR | 4 days | 1.41 |
| 2014 | GMO Labeling Requirement | Oregon CIR | 4 days | 1.66 |
| 2014 | GMO Labeling Requirement | Colorado Pilot | 4 days | 1.78 |
| 2014 | Pension Reform | Phoenix Municipal Pilot | 4 days | 1.17 |
| 2016 | Marijuana Legalization | Arizona General Pilot | 4 days | 1.74 |
| 2016 | Gross Receipts Tax Increase | Oregon CIR | 4 days | 1.60 |
| 2016 | Marijuana Legalization | Mass. General Pilot | 4 days | 1.36 |
| 2018 | Purchase Bonds for Affordable Housing | Portland metro area | 4 days | 1.46 |
| 2018 | Hospital Regulation of Nursing Work Shifts | Mass. General Pilot | 4 days | 1.53 |
| 2018 | State Law Enabling Local Rent Controls | California General Pilot | 4 days | 1.38 |
  • * Ideological SD estimated based on distribution of partisanship.
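For readers who want to see the Ideological Divergence measure in concrete terms, the sketch below computes a within-panel standard deviation for one hypothetical panel and then summarizes the fifteen panel-level values listed in Table 1. The example panel and the function name are illustrative assumptions. The source does not say whether a sample or population SD was used, or whether the reported M and SD weight panels by size, so the unweighted figures here should only be expected to land close to the reported values.

```python
from statistics import mean, pstdev, stdev


def ideological_divergence(ideology_scores):
    """SD of panelists' 1-7 ideology self-placements within one panel.

    Uses the sample SD; the paper does not specify which form was used.
    """
    return stdev(ideology_scores)


# Hypothetical panel of eight respondents (1 = extremely liberal,
# 7 = extremely conservative).
example_panel = [1, 2, 3, 4, 4, 5, 6, 7]
example_divergence = ideological_divergence(example_panel)

# Panel-level Ideological Divergence values as listed in Table 1.
table1_divergence = [
    1.70, 1.75, 1.48, 1.41, 1.78, 1.41, 1.66, 1.78,
    1.17, 1.74, 1.60, 1.36, 1.46, 1.53, 1.38,
]
panel_mean = mean(table1_divergence)
panel_sd = pstdev(table1_divergence)  # unweighted, population form
print(example_divergence, round(panel_mean, 2), round(panel_sd, 2))
```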

Control Variables

We also included a range of political and demographic variables as statistical controls. These included: political party identity (38% Democrat, 25% Republican, 31% nonpartisan, and 7% other party); annual income (median = $40,000–60,000); age (M = 50.59, SD = 17.40); education (median/mode [43%] = ‘Some college/technical school’); self-reported gender (53% female, 47% male); and ethnicity (71% white, 10% Hispanic/Latino, 2% Native American, 5% African-American, 3% Asian American, 8% more than one ethnicity, and 3% other). All of these proportions approximate the actual demographics of the adult populations collected via US Census and state election data for the states or locations in which the CIRs were conducted.

Because of inconsistent use of some of these demographics across the CIRs, along with reluctance to answer particular questions, three of these variables had excessive levels of missing data (age, income, education, ethnicity). Their inclusion would have reduced the effective sample size from 314 to 222, and multiple imputation would have limited utility given the distinctness of these variables. Their inclusion did not change the findings, so we dropped them from further analysis.3

Statistical Analysis

The nature of our sample made multi-level modeling impractical owing to the small sample size at the group level of analysis (i.e., fifteen CIR processes). In terms of statistical power (Cohen 1988), the repetition of a small-scale deliberative process over the years yielded a sufficiently large sample of individual participants but an insufficient number of separate deliberative events. To acknowledge the non-independence of individual panelists nested within each of those fifteen CIRs, however, we utilized cluster-robust standard errors (Cameron & Miller 2015; Esarey & Menger 2019; Liang & Zeger 1986). After making this adjustment, we used general linear regression models to test each hypothesis,4 including the use of interaction terms to test the contextual moderating effects in the third hypothesis.5
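To make the idea of cluster-robust standard errors more concrete, the following is a minimal sketch of a sandwich estimator for a regression with a single predictor, written in plain Python. It is not the SPSS Complex Samples procedure described in Note 4. The data, the function names, and the small-sample correction factor are assumptions chosen for illustration only.

```python
from collections import defaultdict
from math import sqrt


def matmul2(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def ols_cluster_robust(x, y, cluster):
    """Fit y = intercept + slope * x and return the coefficients together
    with cluster-robust standard errors (sandwich estimator)."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))

    # "Bread": inverse of X'X for the design matrix [1, x].
    det = n * sxx - sx * sx
    bread = [[sxx / det, -sx / det], [-sx / det, n / det]]
    intercept = bread[0][0] * sy + bread[0][1] * sxy
    slope = bread[1][0] * sy + bread[1][1] * sxy

    # Sum the score contributions X_g' u_g within each cluster.
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, g in zip(x, y, cluster):
        resid = yi - intercept - slope * xi
        scores[g][0] += resid
        scores[g][1] += resid * xi

    # "Meat": sum over clusters of the outer product of those scores.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]

    # Assumed small-sample correction (one common convention, not from the paper).
    groups, k = len(scores), 2
    correction = (groups / (groups - 1)) * ((n - 1) / (n - k))

    cov = matmul2(matmul2(bread, meat), bread)
    std_errors = tuple(sqrt(correction * cov[i][i]) for i in range(2))
    return (intercept, slope), std_errors


# Hypothetical data: six panelists nested in three panels.
rigor = [1, 2, 3, 4, 5, 6]
satisfaction = [1, 3, 2, 5, 4, 6]
panel = ["a", "a", "b", "b", "c", "c"]
(coef_intercept, coef_slope), (se_intercept, se_slope) = ols_cluster_robust(
    rigor, satisfaction, panel)
print(coef_intercept, coef_slope, se_intercept, se_slope)
```

Because residuals are summed within each panel before being squared, the estimator allows panelists in the same review to have correlated errors. The coefficient estimates themselves are the ordinary least squares values; only the standard errors change.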


H1: General Hypothesis

H1 predicted that the two measures of deliberative experience—analytic rigor and democratic discussion—would each predict three outcomes commonly associated with deliberative events—participant satisfaction, knowledge gains, and opinion change. Table 2 shows the results of the three corresponding regression equations, which include both process variables and control variables as predictors for each outcome measure.6

Table 2

Predictors of Process Satisfaction, Learning Enough, and Opinion Change (Absolute Value).

| Predictor | Satisfaction B (SE) | Learned Enough B (SE) | Opinion Change B (SE) |
| --- | --- | --- | --- |
| Analytic Rigor | 0.48 (0.08)** | 0.31 (0.06)** | –0.04 (0.11) |
| Democratic Discussion | 0.53 (0.10)** | 0.45 (0.17)** | 0.21 (0.13) |
| Republican (1 = GOP) | –0.16 (0.18) | 0.04 (0.11) | 0.03 (0.11) |
| Democrat (1 = Dem) | 0.10 (0.08) | 0.06 (0.09) | 0.06 (0.10) |
| Gender (1 = Female) | 0.10 (0.06) | –0.04 (0.06) | 0.01 (0.08) |
| Adj. R2 | 0.38** | 0.23** | 0.01 |
| N panelists | 313 | 315 | 310 |
| N Citizen Initiative Reviews | 15 | 15 | 15 |
  • Note: Figures are unstandardized regression parameter estimates (with cluster-robust standard errors in parentheses), one-tailed (directional) * p < 0.05, ** p < 0.01.

For Satisfaction, analysis showed significant independent associations for both Analytic Rigor (B = 0.50) and Democratic Discussion (B = 0.53). Analytic Rigor (B = 0.32) and Democratic Discussion (B = 0.44) also predicted panelists’ sense that they had learned enough to make an informed decision. Neither of these process variables, however, predicted Opinion Change. Democratic Discussion had a non-significant association in the predicted direction but also considerable variance in this statistical relationship (B = 0.20, p = 0.07).7

H2: Effect of Context on Experience and Outcomes

To test whether context influenced both perceptions of deliberation and process outcomes in the predicted directions, we conducted separate regression equations for each of these predictors, paired with the same control variables as in the preceding analyses. Table 3 shows the key results from each of these five equations, with context serving as independent variables and the process quality and outcomes acting as dependent variables.8

Table 3

Contextual Variables as Predictors of Analytic Rigor, Democratic Discussion, Process Satisfaction, Learning Enough, and Opinion Change (Absolute Value).

| Predictor | Analytic Rigor B (SE) | Democratic Discussion B (SE) | Satisfaction B (SE) | Learned Enough B (SE) | Opinion Change B (SE) |
| --- | --- | --- | --- | --- | --- |
| Duration (5 days = 1) | 0.10 (0.10) | 0.11 (0.07) | 0.18 (0.13) | 0.26 (0.07)** | 0.31 (0.11)** |
| Empowered (Oregon CIR = 1) | –0.14 (0.13) | –0.02 (0.07) | –0.06 (0.12) | 0.15 (0.10) | 0.28 (0.14)* |
| Ideological Divergence | –0.29 (0.30) | –0.05 (0.22) | 0.01 (0.32) | 0.27 (0.20) | 0.31 (0.42) |
  • Note: Figures are unstandardized regression parameter estimates (with cluster-robust standard errors in parentheses), one-tailed (directional) * p < 0.05, ** p < 0.01. Control variables were partisanship and gender, which had coefficients comparable to those shown in Table 2.

Looking first at the duration of the CIR (H2a), the length of the review did not correspond to Analytic Rigor, though it had a nearly significant association with Democratic Discussion ratings (B = 0.11, p = 0.08), which were higher for the five-day CIRs (M = 4.49) than for four-day reviews (M = 4.38), t = 2.58, p = 0.005. Duration was associated with Learned Enough ratings (B = 0.26), with the longer processes yielding higher average scores (M = 4.83) than did the shorter ones (M = 4.57). There was also more change in opinion reported during the five-day CIRs (M = 1.42) compared to the four-day processes (M = 1.12), B = 0.31.

Turning to official authorization of the CIR, the results showed associations with two outcome measures. Participants in state-authorized Oregon CIRs reported a greater sense of having learned enough about the issue (M = 4.73) than did those in pilot processes (M = 4.58), though this did not reach significance, B = 0.15, p = 0.07. Empowered processes were also more likely to lead to opinion change (M = 1.35) than were pilot processes (M = 1.09), B = 0.28.

On average, the CIR panels’ Ideological Divergence was associated with differences in panelists’ process ratings and various outcomes, but the standard errors of these coefficients were unusually large. Parameter estimates for Ideological Divergence were high and in the predicted directions for Analytic Rigor (B = –0.29), Learned Enough (B = 0.27), and Opinion Change (B = 0.31), but the error terms of these regression coefficients were substantial as well. As a result, the only near-significant effect for this contextual variable was Learned Enough, p = 0.08.9

H3: Contextual Moderators of Main Associations

Our third set of hypotheses predicted that each of these contextual variables would moderate the associations between the process and outcome measures. To test for moderation, we began with the same regression models used to test the first hypothesis but added in a contextual variable and its interactions with the two process measures, Analytic Rigor and Democratic Discussion.10 Significant interaction terms indicated that the relationship between a process and outcome measure was moderated by a contextual variable. Moreover, we hypothesized negative interaction terms, meaning that the process-outcome relationship was stronger for shorter CIRs, pilot tests versus official CIRs, and panels with less ideological diversity.

We ran nine separate regressions (one for each pairing of contextual and outcome variables). Table 4 shows the four equations that resulted in statistically significant interactions. All four showed interactions for Analytic Rigor, but not Democratic Discussion.

Table 4

Significant Interactions between Contextual Variables and Democratic Process Measures Predicting Outcome Variables.

| Predictors | Duration (4 days = 0, 5 days = 1): Satisfaction B (SE) | Duration (4 days = 0, 5 days = 1): Learned Enough B (SE) | Duration (4 days = 0, 5 days = 1): Opinion Change B (SE) | Ideological Divergence: Learned Enough B (SE) |
| --- | --- | --- | --- | --- |
| Intercept | 1.49 (0.53)* | 2.20 (0.61)** | 0.18 (1.11) | –1.56 (7.32) |
| Contextual variable | 1.59 (0.69)* | 0.75 (1.18) | 0.56 (1.37) | 1.88 (4.41) |
| Analytic Rigor | 1.57 (0.60)* | 1.32 (0.56)* | –2.17 (0.77)* | 1.63 (0.47)** |
| Dem. Discussion | 0.90 (0.84) | –0.12 (1.27) | 1.41 (1.14) | –0.32 (1.70) |
| Republican (1 = GOP) | –0.15 (0.18) | 0.04 (0.11) | 0.03 (0.10) | 0.03 (0.12) |
| Democrat (1 = Dem) | 0.12 (0.07) | 0.07 (0.09) | 0.06 (0.10) | 0.07 (0.10) |
| Gender (1 = Female) | 0.11 (0.06) | 0.03 (0.07) | < 0.01 (0.08) | –0.03 (0.07) |
| Context × Analytic Rigor | –0.25 (0.13)* | –0.24 (0.13)* | 0.49 (0.17)** | –0.82 (0.30)** |
| Context × Dem. Discuss. | –0.10 (0.19) | 0.10 (0.26) | –0.28 (0.25) | 0.48 (1.01) |
| Adj. R2 | 0.39** | 0.26** | 0.06 | 0.26** |
| N panelists | 313 | 315 | 310 | 315 |
| N CIRs | 15 | 15 | 15 | 15 |
  • Note: Figures are unstandardized regression parameter estimates (with cluster-robust standard errors in parentheses), one-tailed (directional) * p < 0.05, ** p < 0.01.

Consistent with Hypothesis 3a, the four-day CIR processes had stronger process-outcome associations for Satisfaction (B = –0.25) and Learned Enough (B = –0.24). To illustrate these interactions, linear regressions showed higher coefficients for Analytic Rigor predicting Satisfaction in four-day CIRs (B = 0.54) versus five-day CIRs (B = 0.29), with a similar difference for Learned Enough (B = 0.38 vs. B = 0.14). The equation for Opinion Change, however, had an unanticipated result, with the process-outcome link being stronger for the five-day CIR (B = 0.49).11 Expressed as linear regression coefficients, this was the difference between a negative association in four-day CIRs (B = –0.19) versus five-day (B = 0.32).
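The link between an interaction coefficient and the subgroup slopes can be checked with simple arithmetic. In a model of the form outcome = b0 + b1·Rigor + b2·Context + b3·(Rigor × Context), the slope for Rigor at a given level of Context is b1 + b3·Context, so b3 is the change in that slope per one-unit change in Context. The sketch below assumes the two durations differ by one unit on the contextual variable and compares the subgroup slopes quoted in the text with the Context × Analytic Rigor estimates in Table 4. Separate subgroup regressions need not reproduce the pooled interaction term exactly, so small gaps are expected. The variable names are illustrative.

```python
# Analytic Rigor slopes from the subgroup regressions reported in the text:
# (four-day CIRs, five-day CIRs).
subgroup_slopes = {
    "Satisfaction": (0.54, 0.29),
    "Learned Enough": (0.38, 0.14),
    "Opinion Change": (-0.19, 0.32),
}

# Context x Analytic Rigor coefficients from the Duration columns of Table 4.
table4_interactions = {
    "Satisfaction": -0.25,
    "Learned Enough": -0.24,
    "Opinion Change": 0.49,
}


def slope_difference(four_day, five_day):
    """Change in the Analytic Rigor slope when moving from four to five days."""
    return round(five_day - four_day, 2)


differences = {
    outcome: slope_difference(*slopes)
    for outcome, slopes in subgroup_slopes.items()
}

for outcome, diff in differences.items():
    gap = round(abs(diff - table4_interactions[outcome]), 2)
    print(f"{outcome}: slope difference {diff}, Table 4 {table4_interactions[outcome]}, gap {gap}")
```

A negative difference means the process-outcome link is weaker in the longer reviews, which is the pattern Hypothesis 3a predicted; the positive difference for Opinion Change is the unanticipated reversal.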

The only other statistically significant association was consistent with Hypothesis 3c: Learned Enough had a significant interaction with Ideological Divergence (B = –0.82). A median split on this contextual variable showed that the linear regression coefficient for Analytic Rigor was relatively high (B = 0.45) for low-diversity CIRs versus those with more ideological divergence (B = 0.23).


This paper asked three primary questions. First, do participants’ experiences of analytic rigor and democratic discussion predict process satisfaction, a sense of learning, and opinion change? Second, does context influence how participants assess either the deliberative quality of the event or its outcomes? Third, are the relationships between participants’ experiences of deliberative quality and process outcomes moderated by contextual variables?

Summary of Findings

Findings indicate that it matters whether participants believe deliberation occurred during their sessions. Higher participant assessments of analytic rigor and democratic discussion were associated with both participant satisfaction and participants’ belief that they had learned enough to reach a good decision. Neither measure of deliberative experience, however, was associated with variations in opinion change. In sum, we found a clear relationship between the ideal definition of deliberation and two of its expected outputs—satisfaction and knowledge gain. This aligns with previous theory and research. Deliberation is designed to foster more informed decisions (Niemeyer & Dryzek 2007), and a plethora of research shows that participation can lead to knowledge gains (Barabas 2004; Fishkin 2018; Gastil 2006; Richards 2018). Both analytic rigor and democratic discussion may be essential to produce participant satisfaction and participants’ confidence that they have learned enough to reach a good decision.

Our second set of hypotheses sought to understand the influence of contextual variables. Hypothesis 2a predicted that more time spent deliberating would lead to higher participant assessments of the CIR process and a greater sense of learning and opinion change. Duration proved unrelated to process assessments, but longer CIR processes yielded a stronger sense of having learned enough and a greater frequency of changing one’s opinion. Hypothesis 2b predicted that the Oregon CIRs authorized by government would lead to more favorable process assessments and outcomes relative to CIR pilot projects. Once again, learning and opinion change were the only results consistent with hypotheses, though the former result fell just short of the conventional threshold for significance. Ideological diversity had process and outcome associations in the predicted directions, but high standard errors rendered all of these findings non-significant.

Our third set of hypotheses predicted that these same three contextual variables would moderate the strength of the process-outcome relationship. A moderation effect was clearest in regard to the four- versus five-day CIR duration. In the shorter processes, analytic rigor was more strongly associated with process satisfaction and the sense of having learned enough to make an informed decision. In the shorter CIR processes, however, scores on analytic rigor had a significant negative relationship with opinion change. One interpretation of the latter finding is that citizen bodies like the CIR may have a turning point for opinion change between the fourth and fifth day. Without that extra day, greater rigor can produce rapid learning but a modest resistance to opinion change if the process feels rushed. Unfortunately, this poses a dilemma for practitioners, who recognize that high-quality deliberation can be expensive to arrange and burdensome for citizen participants (Barnes 1999; French & Laver 2009). Researchers should continue to explore exactly how much rigorous deliberation is required to produce desired outcomes, thereby ensuring that any added cost is worth the marginal benefit.

The other significant contextual moderator showed that CIR panels with low ideological diversity had a stronger relationship between analytic rigor and the sense of having learned enough. Consistent with predictions, this result suggested that a more rigorous deliberative process can help make up for low ideological diversity when it comes to building panelists’ confidence that they learned enough to make an informed decision about the ballot measure under discussion.

An indirect implication of the moderation findings concerns the distinction between analytic rigor and democratic process quality. The fact that the former variable was the only one moderated by contextual factors is one more validation of the difference between these two process measures. This conceptual and methodological note has special significance for deliberation scholars, who continue to seek a robust approach to measuring process quality (Black, Burkhalter, et al. 2011). Such assessments should at least make the distinction between the depth of problem and solution analysis versus the relational dynamic among the participants (Gastil & Black 2007).

Finally, we note the relative weakness of the demographic control variables in our analysis, including those we dropped from analysis to avoid significant data loss (see Footnote 3). The pattern we see here is consistent with other studies that have found relatively small or non-existent demographic variations in deliberative experiences (e.g., Hickerson & Gastil 2008; Siu 2009; Sumaktoyo et al. 2016). Given the importance of potential inequalities in deliberative events (e.g., Mendelberg & Karpowitz 2007; Young 2002), it is noteworthy when these variations fail to emerge in such analyses.

Study Limitations

Though pooling survey results across fifteen deliberative events provided a rare glimpse at cross-event patterns, limitations in this study warrant caution when generalizing from our results. Fifteen is a small number when considering the CIR as a unit of analysis, and this made multilevel modeling impossible. As detailed in the Methods section (and in Sumaktoyo et al. 2016), the use of cluster-robust standard errors might offer some reassurance to those who wish to account for the interdependence of the participants within a single event, but it is a poor substitute for modeling effects at different levels of analysis.

With a sufficient number of event-level cases, one could begin to tease apart the effects of contextual variables, such as duration, official authorization, and ideological diversity. Not only did this study analyze those variables separately, but the use of just three contextual variables obscured other group-level differences and interrelationships among them. For example, the four- versus five-day duration variable overlapped with time, since the only five-day CIRs were the first four events, held in 2010–2012. Likewise, authorization covaried with geography: The only officially authorized CIRs all happened in Oregon, with all but two of the pilot tests occurring elsewhere.

Others may wish to improve on this study’s measure of deliberative quality, which relied on participant assessments of the process. Though we believe subjective experiences are valid measures of process quality (Knobloch et al. 2013), utilizing more objective measures of process quality could help validate, or raise doubts about, those assumptions. A related problem was that CIR participants rated all aspects of their process highly, resulting in low variance for some variables, particularly democratic discussion.

Similarly, our measures of opinion change and knowledge gain relied on participants’ subjective sense of those changes. Pre- and post-deliberation measures would lend more validity to these tests. This is particularly important in the case of opinion change. When participants rated their prior opinion after they deliberated, they were more likely to say that they had previously been undecided, resulting in lower levels of opinion change. These discrepancies may obscure actual opinion change that did take place and subsequently may have reduced our ability to find opinion change or sort out the relationships between opinion change and other relevant variables.


If deliberation hopes to achieve its basic goals, then participant experience matters. Processes that participants rated as more analytically rigorous and democratic produced higher levels of satisfaction and feelings of having the knowledge necessary to make a good decision. Such ideal deliberation, however, did not necessarily result in opinion change.

Context also matters. Longer processes increased participants’ confidence in their issue-specific knowledge and led to greater levels of opinion change. Shorter processes were more reliant on analytic rigor to produce panelist satisfaction and learning, though the reverse was true for opinion change. Empowered processes were more likely to yield opinion change. Finally, processes with relatively low ideological division among the panelists were most reliant on analytic rigor to produce a sense of having learned enough.

A broader view of these findings can provide lessons for both practitioners and scholars. If those who promote deliberation do so with the goal of achieving better democracy, then meeting the ideals of deliberation is necessary. Simply calling something deliberative engagement does not make it so. For such interactions to make a difference for participants, the process needs to foster the careful weighing of information, arguments, and values under conditions that engender equality, respect, and consideration of diverse perspectives. This is important to remember as the CIR model gets adapted in other countries, with recent pilot tests occurring in Korsholm (Finland) and Sion (Switzerland), but it applies equally well to the wider array of deliberative designs being developed in Ireland (Farrell & Suiter 2019), Belgium (Reuchamps 2020), and elsewhere.

Looking across all the findings from this study, the basis of deliberative theory appears sound, though the details remain hazy. Across contexts, positive participant experiences of analytic rigor and democratic quality led to satisfaction and knowledge gains. Though scholars have long made this claim, seldom have outcomes been empirically connected to the presence or absence of deliberative components. This paper attempted to fill that gap. Its findings bolster claims that deliberation itself is what leads to the broader goals of a more informed, engaged, and legitimate democracy. Equally, however, the paper complicates questions for deliberative proponents and highlights the need to continue to test the theory’s basic presumptions rather than assuming that theoretical arguments will be realized in practice. These results indicate that opinion change may be more elusive and context-dependent than previously theorized. Researchers must continue to explore what makes participants change their minds during deliberative events and debate whether opinion change should be considered a measure of deliberative success.


  1. In 2014, participants provided their opinion in a pre-survey conducted before the review and reported only their post-event opinion on the end-of-review survey. To test whether this discrepancy influenced reported levels of opinion change, a t-test was performed comparing the absolute value of opinion change for participants in 2014 versus the other years. That test did find a significant difference between the two groups, with those who reported their opinions in a pre-survey showing higher levels of change than those who reported their pre-review opinions retrospectively in the post-CIR survey (t = –2.60, p = 0.01). We discuss this issue in the conclusion as a limitation of our study.
  2. The 2010 and 2014 Oregon CIRs were combined to create a distribution of ideology by partisanship among Oregon panelists. CIR panelists were assigned ideology scores within party to match this overall distribution, then ideology SDs were calculated for both of the 2012 CIRs. This simplified form of missing data imputation for Ideological Divergence was used because this dataset had no other significant data loss.
  3. We checked the robustness of the regression analyses reported below to make certain that there was no difference in main results depending on the inclusion or exclusion of these three demographic control variables. There were no such differences, as only one of these dropped demographic variables had a significant association with a dependent variable in regression: Education was associated with Learned Enough (B = 0.08, SE = 0.03, p = 0.011) and opinion change (B = 0.15, SE = 0.07, p = 0.038). The minor effects of these controls are consistent with another study using clustered standard errors (Sumaktoyo et al. 2016).
  4. We conducted these analyses in SPSS using the Complex Samples analysis commands. This procedure began with the Prepare for Analysis (CSPLAN syntax keyword) that specified the CIRs as the grouping variable and the size of each CIR panel, then we proceeded to regression (CSGLM syntax keyword). A previous analysis had used conventional regression without clustering and produced approximately the same results as reported herein.
  5. A previous analysis produced similar results with a different approach, conducting separate regressions for each contextual variable, with partisanship dispersion split at the median, then comparing regression coefficients using a q-test (Cohen 1988).
  6. Because the measures of satisfaction and knowledge gain had potential ceiling effects (i.e., the modal response in both cases being the highest value of a 1–5 scale), we reexamined the effects of deliberative process variables using Tobit regression (McBee, 2010) in SPSS via the R statistical package extensions. The coefficients were similar in every case, without any changes in their level of statistical significance.
  7. We used one-tailed p values throughout our analyses to reflect the directional nature of each hypothesis. On this choice of a significance threshold, see R. P. Abelson (1995: 64–67).
  8. Because partisanship was used to assign ideological dispersion scores for two of the CIRs, alternative analyses were run for this contextual variable dropping partisanship as a control. The results were nearly identical.
  9. Dropping the two CIR cases with imputed values (see Note 2) caused no change in results.
  10. Each interaction term was calculated by multiplying a contextual variable by a process variable.
  11. Because this result ran contrary to predictions, the stricter two-tailed threshold was applied, p = 0.012.

Funding Information and Acknowledgements

This project has been supported by the Kettering Foundation, The Democracy Fund, Colorado State University, the Pennsylvania State University, the University of Washington, and the National Science Foundation Directorate for Social, Behavioral and Economic Sciences: Decision, Risk and Management Sciences, NSF (Awards 0961774 and 1357276/1357444). Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of these foundations or universities. For assistance with the design and execution of this study, we wish to thank Robert Richards, David Brinker, Michael Broghammer, and Jessica Kropczynski at the Pennsylvania State University, Justin Reedy, Chris Anderson, and Cheryl Maiorca at the University of Oklahoma, Kacey Bull, Kalie McMonagle, Krystina O’Neal, and Cramer McGinty at Colorado State University, Laura Black at Ohio University, Ekaterina Lukianova at Saint-Petersburg State University, Genevieve Fuji Johnson at Simon Fraser University, A. Lee Hannah at Wright State University, Soo-Hye Han at Kansas State University, Michael E. Morrell at the University of Connecticut, Leah Sprain at the University of Colorado – Boulder, and Stephanie Bor at the University of Colorado – Denver. We’d also like to thank the Morrison Institute in Phoenix, Arizona, Engaged Public in Denver, Colorado, Tufts University’s Tisch College of Civic Life, and State Representative Jonathan Hecht and his staff for giving us access to the Citizens’ Initiative Reviews and their participants. Finally, we extend a special thank you to the entire staff, past and present, at Healthy Democracy, particularly Tyrone Reitman, Elliot Shuford, Robin Teater, Jessie Conover, and Linn Davis, who helped make this work possible over a span of ten years.

Competing Interests

The authors have no competing interests to declare.


1 Abdel-Monem, T., Bingham, S., Marincic, J., & Tomkins, A. (2010). Deliberation and diversity: Perceptions of small group discussions by race and ethnicity. Small Group Research, 41(6), 746–776. DOI:

2 Abdullah, N. N., & Rahman, M. F. A. (2015). The use of deliberative democracy in public policy making process. Public Policy and Administration Research, 5(3), 221–229. DOI:

3 Abelson, J., Forest, P.-G., Eyles, J., Smith, P., Martin, E., & Gauvin, F.-P. (2003). Deliberations about deliberative methods: Issues in the design and evaluation of public participation processes. Social Science & Medicine, 57, 239–251. DOI:

4 Abelson, R. P. (1995). Statistics as principled argument. LEA.

5 Anderson, C. W. (1993). Recommending a scheme of reason: Political theory, policy science, and democracy. Policy Sciences, 26(3), 215–227. DOI:

6 Barabas, J. (2004). How deliberation affects policy opinions. American Political Science Review, 98, 687–701. DOI:

7 Barber, B. R. (1984). Strong democracy: Participatory politics for a new age. University of California Press.

8 Barnes, M. (1999). Building a deliberative democracy: An evaluation of two citizens’ juries. London: Institute for Public Policy Research.

9 Benhabib, S. (1996). Toward a deliberative model of democratic legitimacy. In S. Benhabib (Ed.), Democracy and difference: Contesting the boundaries of the political (pp. 67–94). Princeton University Press. DOI:

10 Black, L. W. (2008). Deliberation, storytelling, and dialogic moments. Communication Theory, 18(1), 93–116. DOI:

11 Black, L. W. (2012). How people communicate during deliberative events. In T. Nabatchi, J. Gastil, M. Weiksner, & M. Leighninger (Eds.), Democracy in motion: Evaluating the practice and impact of deliberative civic engagement (pp. 59–81). Oxford University Press. DOI:

12 Black, L. W., Burkhalter, S., Gastil, J., & Stromer-Galley, J. (2011). Methods for analyzing and measuring group deliberation. In E. P. Bucy & R. L. Holbert (Eds.), Sourcebook for political communication research: Methods, measures, and analytical techniques (pp. 323–345). Routledge. DOI:

13 Black, L. W., Welser, H. T., Cosley, D., & DeGroot, J. M. (2011). Self-governance through group discussion in Wikipedia: Measuring deliberation in online groups. Small Group Research, 42(5), 595–634. DOI:

14 Bohman, J. F. (1996). Public deliberation: Pluralism, complexity, and democracy. Cambridge, MA: MIT Press.

15 Broder, D. S. (2001). Democracy derailed: Initiative campaigns and the power of money. Boston: Mariner Books.

16 Burgess, M., O’Doherty, K., & Secko, D. (2008). Biobanking in British Columbia: Discussions of the future of personalized medicine through deliberative public engagement. Personalized Medicine, 5, 285–296. DOI:

17 Burkhalter, S., Gastil, J., & Kelshaw, T. (2002). A conceptual definition and theoretical model of public deliberation in small face-to-face groups. Communication Theory, 12(4), 398–422. DOI:

18 Burnett, C. M. (2019). Information and direct democracy: What voters learn about ballot measures and how it affects their votes. Electoral Studies, 57, 223–244. DOI:

19 Caluwaerts, D., & Deschouwer, K. (2014). Building bridges across political divides: Experiments on deliberative democracy in deeply divided Belgium. European Political Science Review, 6(3), 427–450. DOI:

20 Cameron, A. C., & Miller, D. L. (2015). A practitioner’s guide to cluster-robust inference. Journal of Human Resources, 50(2), 317–372. DOI:

21 Carman, K. L., Mallery, C., Maurer, M., Wang, G., Garfinkel, S., Yang, M., et al. (2015). Effectiveness of public deliberation methods for gathering input on issues in healthcare: Results from a randomized trial. Social Science & Medicine, 133, 11–20. DOI:

22 Chambers, S. (2003). Deliberative democratic theory. Annual Review of Political Science, 6, 307–326. DOI:

23 Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

24 Coote, A., & Lenaghan, J. (1997). Citizens’ juries: Theory into practice. Institute for Public Policy Research.

25 Crosby, N., & Nethercutt, D. (2005). Citizens Juries: Creating a trustworthy voice of the people. In J. Gastil & P. Levine (Eds.), The deliberative democracy handbook: Strategies for effective civic engagement in the twenty-first century (pp. 111–119). Jossey-Bass.

26 Dienel, P. C., & Renn, O. (1995). Planning cells: A gate to “fractal” mediation. In O. Renn, T. Webler, & P. Wiedemann (Eds.), Fairness and competence in citizen participation: Evaluating models for environmental discourse (pp. 117–140). Dordrecht: Springer Netherlands. DOI:

27 Ellis, R. J. (2002). Democratic delusions: The initiative process in America. Lawrence, KS: University Press of Kansas.

28 Esarey, J., & Menger, A. (2019). Practical and effective approaches to dealing with clustered data. Political Science Research and Methods, 7(3), 541–559. DOI:

29 Esterling, K. M., Fung, A., & Lee, T. (2015). How much disagreement is good for democratic deliberation? Political Communication, 32(4), 529–551. DOI:

30 Fagotto, E., & Fung, A. (2009). Sustaining public engagement: Embedded deliberation in local communities. East Hartford, CT: Everyday Democracy.

31 Farrar, C., Fishkin, J. S., Green, D. P., List, C., Luskin, R. C., & Levy Paluck, E. (2010). Disaggregating deliberation’s effects: An experiment within a deliberative poll. British Journal of Political Science, 40(2), 333–347. DOI:

32 Farrell, D. M., & Suiter, J. (2019). Reimagining democracy: Lessons in deliberative democracy from the Irish front line. Cornell University Press. DOI:

33 Fishkin, J. S. (1991). Democracy and deliberation: New directions for democratic reform. Yale University Press.

34 Fishkin, J. S. (2009). When the people speak: Deliberative democracy and public consultation. Oxford University Press.

35 Fishkin, J. S. (2018). Democracy when the people are thinking: Revitalizing our politics through public deliberation. New York: Oxford University Press. DOI:

36 Fishkin, J. S., Kousser, T., Luskin, R. C., & Siu, A. (2015). Deliberative agenda setting: Piloting reform of direct democracy in California. Perspectives on Politics, 13(4), 1030–1042. DOI:

37 Fishkin, J. S., & Luskin, R. C. (1999). Bringing deliberation to the democratic dialogue: The NIC and beyond. In M. McCombs, A. Reynolds, et al. (Eds.), The poll with a human face: The National Issues Convention experiment in political communication (pp. 3–38). Mahwah, NJ: Lawrence Erlbaum.

38 Foels, R., Driskell, J. E., Mullen, B., & Salas, E. (2000). The effects of democratic leadership on group member satisfaction: An integration. Small Group Research, 31(6), 676–701. DOI:

39 French, D., & Laver, M. (2009). Participation bias, durable opinion shifts and sabotage through withdrawal in citizens’ juries. Political Studies, 57(2), 422–450. DOI:

40 Fung, A., & Wright, E. O. (2003). Deepening democracy: Institutional innovations in empowered participatory governance. Verso.

41 Gastil, J. (1993). Democracy in small groups: Participation, decision-making, and communication. Philadelphia, PA: New Society Publishers.

42 Gastil, J. (2004). Adult civic education through the National Issues Forums: Developing democratic habits and dispositions through public deliberation. Adult Education Quarterly, 54, 308–328. DOI:

43 Gastil, J. (2006). How balanced discussion shapes knowledge, public perceptions, and attitudes: A case study of deliberation on the Los Alamos National Laboratory. Journal of Public Deliberation, 2, 1–39. DOI:

44 Gastil, J. (2008). Political communication and deliberation. Sage. DOI:

45 Gastil, J. (2013). What counts as deliberation? Comparing participant and observer ratings. In L. Carson, J. Gastil, J. Hartz-Karp, & R. Lubensky (Eds.), The Australian Citizens’ Parliament and the future of deliberative democracy (pp. 95–107). Pennsylvania State University Press. DOI:

46 Gastil, J. (2018). The lessons and limitations of experiments in democratic deliberation. Annual Review of Law and Social Science, 14. DOI:

47 Gastil, J., Bacci, C., & Dollinger, M. (2010). Is deliberation neutral? Patterns of attitude change during “The Deliberative Polls.” Journal of Public Deliberation, 6(2). DOI:

48 Gastil, J., & Black, L. W. (2007). Public deliberation as the organizing principle of political communication research. Journal of Public Deliberation, 4(1). DOI:

49 Gastil, J., Black, L. W., & Moscovitz, K. (2008). Ideology, attitude change, and deliberation in small face-to-face groups. Political Communication, 25(1), 23–46. DOI:

50 Gastil, J., Deess, E. P., Weiser, P. J., & Simmons, C. (2010). The jury and democracy: How jury deliberation promotes civic engagement and political participation. Oxford University Press.

51 Gastil, J., & Dillard, J. P. (1999). Increasing political sophistication through public deliberation. Political Communication, 16, 3–23. DOI:

52 Gastil, J., Knobloch, K., & Kelly, M. (2012). Evaluating deliberative public events and projects. In T. Nabatchi, J. Gastil, M. Weiksner, & M. Leighninger (Eds.), Democracy in motion: Evaluating the practice and impact of deliberative civic engagement (pp. 205–229). Oxford University Press. DOI:

53 Gastil, J., Knobloch, K. R., Reedy, J., Henkels, M., & Cramer, K. (2017). Assessing the electoral impact of the 2010 Oregon Citizens’ Initiative Review. American Politics Research. DOI:

54 Gastil, J., & Levine, P. (Eds.). (2005). The deliberative democracy handbook: Strategies for effective civic engagement in the twenty-first century. San Francisco, CA: Jossey-Bass.

55 Gastil, J., Richards, R., Ryan, M., & Smith, G. (2017). Testing assumptions in deliberative democratic design: A preliminary assessment of the efficacy of the Participedia Data Archive as an analytic tool. Journal of Public Deliberation, 13(2). DOI:

56 Gastil, J., Smith, M. A., & Simmons, C. (2001). There’s more than one way to legislate: An integration of representative, direct, and deliberative approaches to democratic governance. University of Colorado Law Review, 72, 1005–1028.

57 Gerber, E. R. (1999). The populist paradox: Interest group influence and the promise of direct legislation. Princeton University Press.

58 Gilman, H. R. (2016). Democracy reinvented: Participatory budgeting and civic innovation in America. Washington, DC: Brookings Institution Press.

59 Goodin, R. E., & Dryzek, J. S. (2006). Deliberative impacts: The macro-political uptake of mini-publics. Politics & Society, 34(2), 219–244. DOI:

60 Gouran, D. S., & Hirokawa, R. Y. (1996). Functional theory and communication in decision-making and problem-solving groups: An expanded view. In R. Y. Hirokawa & M. S. Poole (Eds.), Communication and group decision making (2nd ed., pp. 55–80). Beverly Hills: Sage. DOI:

61 Grönlund, K., Bächtiger, A., & Setälä, M. (Eds.). (2014). Deliberative mini-publics: Involving citizens in the democratic process. Colchester, UK: ECPR Press.

62 Gutmann, A., & Thompson, D. (2004). Why deliberative democracy? Princeton University Press. DOI:

63 Gutmann, A., & Thompson, D. F. (1996). Democracy and disagreement. Cambridge, MA: Harvard University Press.

64 Habermas, J. (1998). Between facts and norms: Contributions to a discourse theory of law and democracy (W. Rehg, Trans.). MIT Press.

65 Hartz-Karp, J., & Briand, M. K. (2009). Institutionalizing deliberative democracy. Journal of Public Affairs, 9, 125–141. DOI:

66 Hendriks, C. M. (2005). Consensus conferences and planning cells: Lay citizen deliberations. In J. Gastil & P. Levine (Eds.), The deliberative democracy handbook: Strategies for effective civic engagement in the twenty-first century (pp. 80–110). Jossey-Bass.

67 Herriman, J., Atherton, A., & Vecellio, L. (2011). The Australian experience of world wide views on global warming: The first global deliberation process. Journal of Public Deliberation, 7(1). DOI:

68 Hickerson, A., & Gastil, J. (2008). Assessing the difference critique of deliberation: Gender, emotion, and the jury experience. Communication Theory, 18, 281–303. DOI:

69 Himmelroos, S. (2017). Discourse quality in deliberative citizen forums: A comparison of four deliberative mini-publics. Journal of Public Deliberation, 13(1). DOI:

70 Hwang, H., Kim, Y., & Kim, Y. (2018). Influence of discussion incivility on deliberation: An examination of the mediating role of moral indignation. Communication Research, 45, 213–240. DOI:

71 Jacobs, L. R., Cook, F. L., & Delli Carpini, M. X. (2009). Talking together: Public deliberation and political participation in America. Chicago: University of Chicago Press. DOI:

72 Johnson, C., & Gastil, J. (2015). Variations of institutional design for empowered deliberation. Journal of Public Deliberation, 11(1). DOI:

73 Johnson, G. F. (2009). Deliberative democratic practices in Canada: An analysis of institutional empowerment in three cases. Canadian Journal of Political Science, 42, 679–703. DOI:

74 Karpowitz, C. F., & Mendelberg, T. (2014). The silent sex: Gender, deliberation, and institutions. Princeton University Press. DOI:

75 Karpowitz, C. F., & Raphael, C. (2014). Deliberation, democracy, and civic forums: Improving equality and publicity. Cambridge University Press. DOI:

76 Knobloch, K. R., & Gastil, J. (2012, November). Civic (re)socialization: The educative effects of deliberative participation. Ninety-Eighth Annual Convention of the National Communication Association, Orlando, FL.

77 Knobloch, K. R., Gastil, J., Reedy, J., & Cramer Walsh, K. (2013). Did they deliberate? Applying an evaluative model of democratic deliberation to the Oregon Citizens’ Initiative Review. Journal of Applied Communication Research, 41(2), 105–125. DOI:

78 Kuyper, J. W. (2018). The instrumental value of deliberative democracy – or, do we have good reasons to be deliberative democrats? Journal of Public Deliberation, 14(1).

79 Landemore, H. (2013). Democratic reason: Politics, collective intelligence, and the rule of the many. Princeton University Press. DOI:

80 Lee, C. W. (2014). Do-it-yourself democracy: The rise of the public engagement industry. Oxford University Press. DOI:

81 Levine, P., Fung, A., & Gastil, J. (2005). Future directions for public deliberation. In J. Gastil & P. Levine (Eds.), The deliberative democracy handbook: Strategies for effective civic engagement in the twenty-first century (pp. 271–288). San Francisco, CA: Jossey-Bass.

82 Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73(1), 13–22. DOI:

83 Lu, L., Yuan, Y. C., & McLeod, P. L. (2012). Twenty-five years of hidden profiles in group decision making: A meta-analysis. Personality and Social Psychology Review, 16(1), 54–75. DOI:

84 Mansbridge, J. J. (1983). Beyond adversary democracy. Chicago: University of Chicago Press.

85 Mathews, D. (1994). Politics for people: Finding a responsible public voice. Chicago: University of Illinois Press.

86 Matsusaka, J. G. (2004). For the many or the few: The initiative, public policy, and American democracy. University of Chicago Press. DOI:

87 McBee, M. (2010). Modeling outcomes with floor or ceiling effects: An introduction to the Tobit model. Gifted Child Quarterly, 54(4), 314–320. DOI:

88 Mendelberg, T., & Karpowitz, C. (2007). How people deliberate about justice: Groups, gender, and decision rules. In Deliberation, participation and democracy: Can the people govern? (pp. 101–129). Houndmills, Basingstoke, Hampshire. DOI:

89 Mendelberg, T., & Oleske, J. (2000). Race and public deliberation. Political Communication, 17, 169–191. DOI:

90 Morrell, M. E. (2005). Deliberation, democratic decision making and internal political efficacy. Political Behavior, 27, 49–69. DOI:

91 Myers, C. D. (2018). Political deliberation, interest conflict, and the common knowledge effect. Journal of Public Deliberation, 14(1). DOI:

92 Myers, C. D., Gordon, H. G., Kim, H. M., Rowe, Z., & Goold, S. D. (2018). Does group deliberation mobilize? The effect of public deliberation on willingness to participate in politics. Political Behavior. DOI:

93 Nabatchi, T. (2010). Deliberative democracy and citizenship: In search of the efficacy effect. Journal of Public Deliberation, 6(2). DOI:

94 Nabatchi, T. (2012). An introduction to deliberative civic engagement. In T. Nabatchi, J. Gastil, M. Weiksner, & M. Leighninger (Eds.), Democracy in motion: Evaluating the practice and impact of deliberative civic engagement (pp. 3–17). Oxford University Press. DOI:

95 Nabatchi, T., Gastil, J., Weiksner, M., & Leighninger, M. (Eds.). (2012). Democracy in motion: Evaluating the practice and impact of deliberative civic engagement. Oxford University Press. DOI:

96 Neblo, M. A. (2007). Family disputes: Diversity in defining and measuring deliberation. Swiss Political Science Review, 13(4), 527–557. DOI:

97 Neblo, M. A. (2015). Deliberative democracy between theory and practice. Cambridge University Press. DOI:

98 Neblo, M. A., Esterling, K. M., & Lazer, D. M. (2018). Politics with the people: Building a directly representative democracy. Cambridge University Press. DOI:

99 Niemeyer, S., & Dryzek, J. S. (2007). The ends of deliberation: Meta-consensus and inter-subjective rationality as ideal outcomes. Swiss Political Science Review, 13(4), 497–526. DOI:

100 Pincock, H. (2012). Does deliberation make better citizens? In T. Nabatchi, J. Gastil, M. Weiksner, & M. Leighninger (Eds.), Democracy in motion: Evaluating the practice and impact of deliberative civic engagement (pp. 135–162). Oxford University Press. DOI:

101 Polletta, F., & Lee, J. (2006). Is telling stories good for democracy? Rhetoric in public deliberation after 9/11. American Sociological Review, 71, 699–723. DOI:

102 Renn, O., Webler, T., Rakel, H., Dienel, P., & Johnson, B. (1993). Public participation in decision making: A three-step procedure. Policy Sciences, 26, 189–214. DOI:

103 Reuchamps, M. (2020, January 17). Belgium’s experiment in permanent forms of deliberative democracy. ConstitutionNet.

104 Richards, R. C., Jr. (2018). Making policy information relevant to citizens: A model of deliberative mini-publics, applied to the Citizens’ Initiative Review. Policy & Politics, 46(3), 445–465. DOI:

105 Sanders, L. M. (1997). Against deliberation. Political Theory, 25, 347–376. DOI:

106 Schia, N. N., & Gjesvik, L. (2020). Hacking democracy: Managing influence campaigns and disinformation in the digital age. Journal of Cyber Policy, 5(3), 413–428. DOI:

107 Setälä, M., & Smith, G. (2018). Mini-publics and deliberative democracy. In A. Bächtiger, J. S. Dryzek, J. Mansbridge, & M. E. Warren (Eds.), The Oxford handbook of deliberative democracy (pp. 300–314). Oxford University Press.

108 Siu, A. (2009). Look who’s talking: Deliberation and social influence (SSRN Scholarly Paper ID 1468078). Social Science Research Network. DOI:

109 Sohrab, S. G., Waller, M. J., & Kaplan, S. (2015). Exploring the hidden-profile paradigm: A literature review and analysis. Small Group Research, 46(5), 489–535. DOI:

110 Spada, P., & Ryan, M. (2017). The failure to examine failures in democratic innovations. PS: Political Science & Politics, 50(3), 772–778. DOI:

111 Stasser, G., & Titus, W. (2003). Hidden profiles: A brief history. Psychological Inquiry, 14, 304–313. DOI:

112 Steenbergen, M. R., Bächtiger, A., Spörndli, M., & Steiner, J. (2003). Measuring political deliberation: A discourse quality index. Comparative European Politics, 1(1), 21–48. DOI:

113 Sumaktoyo, N. G., Nickerson, D. W., & Keane, M. J. (2016). Discussion group composition and deliberation experience. Journal of Experimental Political Science, 3(2), 164–173. DOI:

114 Warren, M. E., & Gastil, J. (2015). Can deliberative minipublics address the cognitive challenges of democratic citizenship? Journal of Politics, 77(2), 562–574. DOI:

115 Warren, M. E., & Pearse, H. (Eds.). (2008). Designing deliberative democracy: The British Columbia Citizens’ Assembly. Cambridge University Press. DOI:

116 Young, I. M. (2002). Inclusion and democracy. Oxford University Press. DOI:

117 Zhang, W. (2012). Perceived procedural fairness in deliberation: Predictors and effects. Communication Research, 42(3), 345–364. DOI: