Mini-Public Replication: Emotions and Deliberation in the Citizens’ Initiative Review Redux

Scholars have increasingly urged researchers to evaluate prior findings through replication studies that can help test, refine, and extend claims made in previous research. We agree that this is an important aspect of social science that deliberative scholarship has underutilized. To help fill this lacuna, we test our previous findings from an analysis of data from Citizens’ Initiative Reviews (CIRs) in 2016 by replicating our methodology on data from CIRs in 2018. We set out to determine if the results we discovered earlier and developed into the Deliberative Procedures Frame theory appeared again in the 2018 CIRs. We find several repeating patterns, including consistent levels of enthusiasm, slow-rising happiness, and the relationship between happiness on the final day and participants’ evaluations of deliberative quality, and these indicate that our theory remains a viable explanation for emotions in mini-public deliberation. While we discover some differences between the two sets of data (the most common mid-level reported emotion was anger in 2016 and sympathy in 2018), we remain confident that many of the claims identified in our previous analysis remain correct. Based on this replication, we clarify that what we call the Deliberative Procedures Frame enables the identification of the times during deliberation when participants are most likely to experience emotions such as anger, happiness, and sympathy, and thus, those moments that are probably the most crucial to ensuring quality deliberation in mini-publics. Anger and sympathy are most likely to occur during the middle periods of deliberation in which participants interact intensely with outside experts, advocates, and their fellow participants, while happiness is most likely to arise at the end of deliberation when participants successfully complete the process.


Introduction
We seek in this paper to advance the empirical study of deliberative democracy. Our particular focus is on an important but still under-researched theme in the burgeoning literature on public deliberation: the role of emotions in deliberation. There is a rich and growing field of study looking at the role of emotions in politics (Brader & Marcus 2013; Demertzis 2014; Marcus 2003). Yet the research on how emotions play out in deliberation and specifically how they influence the course of deliberation remains a relatively new area of study. While much early deliberative theory focused on a rationalized process of argumentation, following many criticisms, most scholars now recognize that emotions are very important to deliberation. Emotions can not only affect the process of deliberation, but are also likely an integral component of what it means to deliberate. Based on our empirically oriented studies, we have written previously on the relationship between emotion and deliberation (Johnson, Morrell, & Black 2019; Johnson, Black, & Knobloch 2017). This paper represents a further development of our work, and by extension, of the field of deliberative democracy. Specifically, in this paper, we test the findings from our analysis of data from 2016 by replicating our methodology on data from 2018. We derived our data for both years from very similarly structured mini-publics: Citizens' Initiative Reviews (CIRs). Given the relative paucity of research on emotions in deliberation in 2016, one of our main goals for the initial study was to describe what emotions participants reported feeling and whether there were any patterns in such reports across the four days of deliberation. As described below, we chose the emotions to study (anger, anxiety, enthusiasm, happiness, sadness, and sympathy) based upon previous theoretical and empirical work that scholars had not yet applied to deliberative mini-publics.
Relying upon our descriptive data, we were able to engage in theory building by constructing what we called the Deliberative Procedures Frame (Johnson, Morrell, & Black 2019), which posits that mini-public deliberative procedures are the best explanation for the emotions participants report feeling as well as their variation across time. We developed this frame based on our initial analysis of the 2016 CIRs, and it enables us to examine how particular moments in the deliberative process mediate emotions. In addition, it allows us to see how procedures deepen the collective bond and shared deliberative purpose of participants and how emotions, when properly mediated, might deepen and advance collective deliberation. Based on this previous analysis, we contend that the activities and tasks of the deliberative group, as well as the behaviors of participants and relationships among them, are all important in shaping the experience and role of emotions, but that the procedures are particularly influential in mediating emotions toward productive deliberation. We also were able to engage in a preliminary analysis that examined whether these emotions affected three self-reported measures of deliberative quality, giving us purchase on some of the theoretical and empirical arguments put forward by Affective Intelligence Theory (MacKuen et al. 2010; Marcus 2002; Marcus, Neuman, & MacKuen 2000; Wolak & Marcus 2007) and those who put empathy at the heart of deliberation (Fleckenstein 2007; Grönlund, Herne, & Setälä 2017; Krause 2008; Morrell 2010; Muradova 2020; Muradova 2021). For this replication study, we set out to determine if the patterns we discovered earlier arose again in 2018.
Although it appeared after we collected both sets of data, Michael Neblo's (2020) elaboration of twelve roles that emotion can play in deliberation provides a map by which we can clarify the relational aspect of our work, which most closely relates to what Neblo identifies as Role #7: Enabling Conditions, that is, the role that emotions play as part of 'the basic means by which we can engage in reciprocal role-taking during deliberation' (2020: 925). Yet what we also highlight is that emotions are more complex than even Neblo emphasizes because of the interplay between emotions and deliberation itself; emotions can contribute to deliberation, but deliberative procedures can also trigger emotions.
In the following, we provide a brief discussion of the value of replication studies and describe our methodology for this replication study. We then present our findings and discuss the implications for our Deliberative Procedures Frame, in particular, and for the empirical study of deliberation and emotions more generally. The results of our replication cause us to reflect on our initial conclusion that the Deliberative Procedures Frame is the most helpful way to understand expressions of anger, anxiety, enthusiasm, happiness, and sympathy, and the role these emotions play in deepening and advancing collective deliberation. We find many of the same results across the two sets of data, including consistent levels of enthusiasm, slow-rising happiness, and the relationships between certain emotions on the final day and participants' evaluations of deliberative quality, and these indicate that our Deliberative Procedures Frame remains a viable explanation for emotions in mini-public deliberation. We also remain confident that the sources of anger and frustration identified in our previous analysis remain correct. What is new is our recognition that the Deliberative Procedures Frame provides us a theory about the likely source of participants' emotions in deliberation but does not establish a causal relationship between procedures and participants' emotional patterns. Instead, it identifies the points in time during deliberation when participants are most likely to experience emotions such as anger, sympathy, and happiness, and thus, those moments that are probably the most crucial to ensuring quality deliberation in mini-publics.

Deliberation Replication
One way to advance deliberative theory and research is to test prior claims through replication studies. If done well, replication research can validate or challenge findings from previous research (Amir & Sharon 1990; Rosenthal 1990; Smith 1970). Additionally, replication has the potential to improve the transparency of research processes, clarify the scope conditions of claims made by prior research, and highlight opportunities for additional research.
Scholars in many social science disciplines have advocated for replication as a way of refining and advancing scholarly knowledge in their fields. Psychology, a discipline that largely relies on experimental studies, faced a 'replication crisis' when it became clear that scholars were unable to adequately replicate many key research findings (Shrout & Rodgers 2018). To address this crisis, Shrout and Rodgers argue, researchers should adopt 'open science conventions of preregistration and full disclosure' about research methods (2018: 487) and findings of replication studies 'should be reported regardless of whether the new result was statistically significant' (2018: 500). 1 Replication is also useful for fields that embrace a more multi-method approach to knowledge building, such as political science (Dunning 2016), criminology (Pridemore, Makel, & Plucker 2018), and business (Hubbard & Vetter 1996). Political scientists, for example, have argued that replication is part of what is needed for cumulative learning and the advancement of scholarly knowledge. They argue that replication is made possible through transparency about research design (including sample, measures, methods, and findings) and through open science processes such as making data available to other researchers (Dunning 2016; Elman, Kapiszewski, & Lupia 2018; King 1995; Liberman 2010; Lupia & Elman 2014).
Over the years, scholars of deliberative democracy have published a strong body of work that has many key findings (Curato et al. 2017). These findings have developed from a large body of interdisciplinary and multi-method research, and we argue that scholars could potentially test, clarify, and extend some of these claims through replication research. Deliberative scholars are well poised to do this research in partnership with practitioners who lead a series of events that follow the same deliberative design, such as Citizens' Initiative Reviews (see Knobloch et al. 2013). Good replication requires that we have a clear idea of the forms of previous studies, and since our particular focus is on emotions in deliberation, this is the area to which we now turn.

Emotions and Deliberation
We can classify previous empirical research on the interactions between deliberation and emotions across three dimensions: 1) whether the emotions are general or discrete, 2) whether the deliberation is diffuse or concentrated, and 3) whether deliberation is structured or unstructured. We conceptualize emotion as a self-expression of an affective, occurrent mental state with a specific intentional object (Ben-Ze'ev 2010), which we believe is appropriate given how several deliberative theorists refer to the concept (e.g., Bandes 2013; Dowding 2018; Hall 2005; Hoggett & Thompson 2002; Thompson & Hoggett 2001). General emotions refer to broad categories of affect, including positive and negative feelings; discrete emotions name specific affective states. Diffuse deliberation includes practices of thinking, information seeking, and discussion by citizens in the broader polity over an extended period, including internal reflective deliberation (Goodin 2003); concentrated deliberation encompasses limited periods of citizen deliberation that usually focus on one topic or a limited number of issues. Structured deliberation occurs under specific procedures and often involves facilitation; unstructured deliberation occurs in a more free-wheeling setting with only limited specified procedures or facilitation. Scholarship exists that covers several combinations of these three factors.
The most developed research in the field has looked at the effects of discrete emotions on diffuse, unstructured deliberation: the Affective Intelligence Theory developed by George E. Marcus and his colleagues W. Russell Neuman and Michael MacKuen (2000). Guided by this theory, Marcus, MacKuen, Jennifer Wolak, and Luke Keele provide evidence across several studies that anxiety increases the likelihood of citizens engaging in deliberation, enthusiasm may increase participation but not deliberation, and loathing, anger, and aversion lead citizens to resist new information and deliberation (MacKuen et al. 2010; Marcus 2002; Wolak & Marcus 2007). Colleen McClain (2009) experimentally tested the implications of Affective Intelligence Theory on concentrated, unstructured deliberation by having subjects engage in a computer-mediated exchange of ideas and arguments with a reactive, scripted computer program over a short period of time. She found that anxiety increased information seeking and that anger had no effects; contrary to previous research, enthusiasm also increased information seeking, although she admits her analysis may have suffered from a multicollinearity problem (2009: 61-63). Still, the discrepancies between her findings and those of Marcus and his colleagues may also arise from the differences between the contexts of diffuse and concentrated deliberation. In contrast with McClain, Nuri Kim (2016), who also studied concentrated, unstructured deliberation, provides evidence that anger increases information seeking. Kim reports the effects of experimentally manipulated anger and information on participants' recall of arguments and information after a one-hour deliberation about a contentious issue in small groups of three to four participants without formal moderation.
She finds that those in the anger condition 'collected more arguments for other side opinions and gained more knowledge after the deliberations,' although for subjects 'with a lot of information about the issue, being angry decreased the acquisition of same-side arguments' (2016: 18). One possible explanation for these differences is that Kim utilized a fictitious issue about which participants had no prior knowledge (land use on the university's campus), while McClain had subjects deliberate about human embryonic stem cell research, an issue about which many likely had both information and already formed opinions and affective references. Suiter et al. (2020) examine discrete emotions (compassion and sympathy), but their study looked at the effects of reading about a deliberative mini-public on non-participants. Thus, their finding that 'being exposed to balanced information from both sides of the issue (Pro-Con statements)' (Suiter et al. 2020: 265) can increase compassion and sympathy towards others best represents a study of discrete emotions in diffuse, unstructured deliberation, the same category as the original Affective Intelligence research. Beyond Affective Intelligence, Pawel and Antoni Sobkowicz investigated general emotions, positive and negative, in the diffuse, unstructured arena of online forums (2012), while several other researchers have examined general emotions in concentrated, structured deliberation: the presence of emotions in juries (Hickerson & Gastil 2008) and emotionally laden discourse (Martin 2012) and biographical affect (Komporozos-Athanasiou & Thompson 2015) in a deliberative patient forum.
Unlike the research we have reviewed thus far, we investigate the effects of discrete emotions in mini-publics, which involve concentrated, structured deliberation. When we first set out to do so during the 2016 CIRs, there was no research of which we were aware in this area. In the meantime, however, Nicole Saam (2018) published her study that qualitatively analyzed data from 150 interviewees who were involved in a variety of face-to-face, extended deliberative forums. Beyond our methodological differences, which will become apparent shortly, her research differs from ours in the discrete emotions measured: she focuses on hope, disappointment, and shame, while we examine anger, anxiety, enthusiasm, happiness, sadness, and sympathy. Based upon her data, she inductively hypothesizes that disappointment and shame promote exit rather than deliberation and reinforce inequalities in deliberation since higher status individuals can overcome this effect due to their higher emotional capital, but hope induces participation rather than exit and strengthens everyone's voice irrespective of emotional capital (2018: 767-770). Despite our differences, in the end, we believe our findings are consistent with Saam's. Our Deliberative Procedures Frame also highlights how the deliberative process itself can evoke emotions in participants. What Saam's research and ours also make clear is that studying emotions in deliberation is even more complicated than it appears at first glance. Not only must investigators be cognizant of the nature of the deliberation they are studying and decide which discrete emotions to measure, they must also consider the target or cause of those emotions.
While our quantitative data does not allow us to probe this important question, we can determine if the patterns we discovered earlier in the 2016 CIRs, which supported the Deliberative Procedures Frame, appeared again in the 2018 CIRs, and unlike Saam's snowball, selective sample, across the two years, we have data for every participant in six mini-publics across four days of deliberation.

Deliberative Procedures Replication Study: Data and Methods
The Citizens' Initiative Reviews (CIRs) provide a unique opportunity for replication because they allow us to use the exact same methods and analytical strategies on two sets of data collected in identical ways. In our previous work we reported results from three CIRs conducted in 2016 (Johnson, Morrell, & Black 2019); here we engage in the same exact analysis for three CIRs from 2018. CIRs bring together stratified random samples of 18 to 24 citizens from a polity 'to fairly and thoroughly evaluate ballot measures' over a four-day deliberative process in order to 'give voters information they can trust' by producing 'a statement that contains key facts, the best reasons to vote for the measure, and the best reasons to vote against the measure' (Healthy Democracy, n.d.). The organizers of the CIRs then distribute this statement 'as widely as possible' to allow voters to 'read and consider the statement when they cast their ballot' (Healthy Democracy, n.d.). Healthy Democracy facilitated each of the six CIRs utilizing the same basic four-day process. Prior to the CIR they provided panelists with information on the proposals. Day 1 involved an orientation to the CIR process and the measure citizens were to discuss. Day 2 comprised the presentations by panels of advocates and experts, including question-and-answer sessions. On Day 3, the participants worked on producing claims for their final statement. Day 4 encompassed final revisions on the claims, final votes, and the closing of the session, including participants reflecting on their experiences. In all CIRs, following the close of each day, participants completed surveys that are the source of our analysis.
The survey data we presented previously came from CIRs held in 2016 involving 22 participants in Arizona (Gastil, Reedy, et al. 2017) and 20 participants in Massachusetts (Gastil, Knobloch, et al. 2016) discussing proposals concerning the legalization of marijuana, and 20 participants in Oregon evaluating a proposal for the state's corporate income tax. The data that we use in this replication come from three CIRs held in 2018, each involving 20 participants, on propositions regarding affordable housing in Portland, Oregon; patient safety and hospital transportation in Massachusetts; and rent-controlled housing in California (Gastil et al. 2019).
In both years, the survey asked participants at the end of each day which of six discrete emotions they had felt: anger, anxiety, enthusiasm, happiness, sadness, and sympathy. 2 We only had limited space on the survey, so we chose the emotions that previous theoretical and empirical work indicated were the most important for deliberation. We included anger, anxiety, and enthusiasm due to the findings from Affective Intelligence Theory we discussed earlier. Several scholars have argued that empathy plays an important role in deliberation (Fleckenstein 2007; Grönlund, Herne, & Setälä 2017; Krause 2008; Muradova 2020; Muradova 2021), but, as one of us has argued (Morrell 2010), empathy is not itself an emotion, and thus, we chose to ask about sympathy, which we argue equates to the concern that results from the empathic process. Finally, previous research has highlighted the importance of positive and negative valence emotions in deliberation (e.g., Sobkowicz & Sobkowicz 2012), so we included happiness and sadness to capture these valences. The resulting survey question was the following:

During today's CIR sessions, which of these emotional reactions did you experience, if any? Circle ALL THAT APPLY: Anger, Anxiety, Enthusiasm, Happiness, Sadness, Sympathy

Results from Six Citizens' Initiative Reviews
For each of the results sections below, we first highlight our findings from our previous analysis on the data from 2016 and then give the results for the exact same analysis on the 2018 data. In the first section, we present the frequencies of participants' self-reports for the six emotions for each of the four days of deliberation, and then use McNemar's test to compare the differences across emotions within each day and across the four days for each emotion. In the next section, we report on tests of the relationships among the six emotions and three measures of deliberative quality for each of the four days using Somers' D.

Expressions of emotion
With reference to the 2016 CIRs, we found that across the four days, on average, the emotions participants reported feeling from most to least common were enthusiasm (71.4%), happiness (36.3%), anxiety (31.0%), sympathy (22.6%), anger (16.9%), and sadness (9.3%) (see Figure 1). 3 The data from the 2018 CIRs mirrors the relative positions of enthusiasm (61.7%) and happiness (30.8%), but there are some differences in the remaining emotions. Although anxiety (25.4%) and sympathy (29.2%) are again the third and fourth most common emotions, their ranking switches in the new data; the same is true for anger (6.3%) and sadness (8.8%) (see Figure 2).
To determine which differences between reports of emotions were significant, we compared results across two axes: 1) for each of the four days we examined the differences among the six emotions (15 comparisons per day across four days for a total of 60 comparisons), and 2) for each emotion, we examined the differences across the four days (six comparisons across six emotions for a total of 36 comparisons). This resulted in analyzing 96 different comparisons for both 2016 and 2018, a total of 192 comparisons. The most appropriate statistic to compare these differences, given that we have paired dichotomous nominal data, is McNemar's test.
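The comparison counts and the test itself can be illustrated with a short sketch. It is not the code used for the analysis: the text does not say whether an exact or a chi-square version of McNemar's test was applied, so the exact binomial form is assumed here, and the function names and the toy 0/1 responses are hypothetical.

```python
from itertools import combinations
from math import comb

EMOTIONS = ["anger", "anxiety", "enthusiasm", "happiness", "sadness", "sympathy"]
DAYS = [1, 2, 3, 4]


def count_comparisons():
    """Count the pairwise comparisons made within one year of data."""
    # Every pair of emotions, compared on each of the four days: 15 * 4.
    within_day = len(list(combinations(EMOTIONS, 2))) * len(DAYS)
    # Every pair of days, compared for each of the six emotions: 6 * 6.
    across_days = len(list(combinations(DAYS, 2))) * len(EMOTIONS)
    return within_day, across_days, within_day + across_days


def mcnemar_exact(first, second):
    """Two-sided exact McNemar test for two paired 0/1 sequences.

    Only discordant pairs matter: b counts participants reporting the
    first item but not the second, c counts the reverse. Under the null
    hypothesis, b follows a Binomial(b + c, 0.5) distribution.
    """
    if len(first) != len(second):
        raise ValueError("paired sequences must have the same length")
    b = sum(1 for f, s in zip(first, second) if f == 1 and s == 0)
    c = sum(1 for f, s in zip(first, second) if f == 0 and s == 1)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, so no evidence of a difference
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical responses from eight participants on one day (1 = reported).
enthusiasm = [1, 1, 1, 1, 1, 0, 1, 0]
sadness = [0, 0, 0, 0, 0, 0, 1, 0]

print(count_comparisons())
print(mcnemar_exact(enthusiasm, sadness))
```

Because the test uses only the discordant pairs, two emotions or two days with identical overall percentages can still differ in significance against a third, depending on how the same participants answered both items.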
The emotion participants reported feeling most frequently in both years was enthusiasm. The pattern each day was nearly identical in both studies: participants were statistically significantly more likely to report feeling enthusiasm than all other emotions on all four days (p < 0.001 for all comparisons except for the 2016 and 2018 Day 2 and 2018 Day 3 comparisons with anxiety, where p < 0.01), apart from the last day, when the differences between enthusiasm and happiness were not statistically significant in either year. Not only was it the most common emotion, the substantive gap between enthusiasm and most of the other emotions was high across both years. There were two exceptions to this consistent pattern. In 2016, there were no significant differences across the four days in reported enthusiasm, while in 2018, enthusiasm was statistically significantly higher on Day 1 (71.7%) than on Days 2 (55.0%, p < 0.01) and 3 (55.0%, p < 0.05); despite this difference, enthusiasm was still the most reported emotion on those days in 2018. The other difference in the data was that on Day 2 in 2018, although reported enthusiasm (55.0%) was still higher than reported sympathy (48.3%), this difference was not statistically significant. While these differences between the two studies do exist, given our small sample sizes and the very possible idiosyncratic nature of mini-publics, we believe that the patterns across the data provide good evidence to conclude that our replication confirmed that enthusiasm was the dominant emotion participants reported feeling throughout the CIRs. We admit, however, that, while we do not have the data to identify the cause, something in the 2018 CIRs limited the enthusiasm participants expressed on Days 2 and 3, and as we will discuss below, also affected reported feelings of sympathy on Day 2.
We find similar confirmation of our previous findings regarding happiness. In both years, this was the second most reported emotion on average across the four days, primarily due to a significant increase on Day 4. From Day 2 to Day 3, reports of happiness jumped 14.5 percentage points (p < 0.05) in 2016 and 18.4 percentage points (p < 0.01) in 2018, yet it was on Day 4 in both years that reports of happiness spiked, with the 66.1% in 2016 and the 60.0% in 2018 statistically significantly different from the three previous days at the p < 0.001 level for all comparisons. Thus, our replication confirms our previous finding that happiness among participants began to rise on Day 3 and peaked on Day 4 of the CIRs. It is not surprising, then, that in both years, happiness was higher than all the other emotions other than enthusiasm on that final day (p < 0.001 for all comparisons). In contrast, the only other times happiness was significantly different from any other emotion were for sadness in 2016 on Days 1 (p < 0.01), 2 (p < 0.05), and 3 (p < 0.01), and in 2018 on Day 3 (p < 0.001); for anger, it was higher in 2016 on Day 1 (p < 0.05) and in 2018 on Days 1 (p < 0.05) and 3 (p < 0.001). These patterns are only slightly consistent, but they do not significantly undermine our general finding from our previous study: happiness, while somewhat present throughout the CIR, was most prominent at the end of the deliberative process.
In the 2016 data, anxiety was the third most common reported emotion, averaging 31.0% across the four days, while in 2018, it was the fourth most common, averaging 25.4%. The only statistically significant difference across the four days in 2016 was that Day 2 was 14.5% higher than Day 4 (p < 0.05); there were no significant cross-time differences in 2018. Compared with other emotions, in both years anxiety was significantly higher than anger on Days 1 (2016 p < 0.05, 2018 p < 0.01) and 2 (p < 0.01 both years), and significantly higher than sadness on Days 1 (p < 0.01 both years), 2 (2016 p < 0.001, 2018 p < 0.05), and 3 (2016 p < 0.01 and 2018 p < 0.001). The significant findings about anxiety unique to each study are that in 2016 it was higher than sympathy on Day 2 (p < 0.05), and in 2018 it was higher than anger on Days 3 and 4 (p < 0.01 for both comparisons). We concluded previously that the patterns we detected regarding anxiety were not as strong as those for enthusiasm or happiness, although we thought the data suggested that anxiety peaked on Day 2 and reached its nadir on the final day.
Our new data provides more evidence that across the four days of deliberation anxiety was generally higher than sadness and anger, but it does not support the pattern we tentatively saw regarding the movement of anxiety across those days. Thus, we now conclude that it is unlikely that average levels of anxiety change significantly across time during mini-public deliberations such as the CIR.
Another difference we can see between the two studies is regarding sympathy. In the 2016 CIRs, between 17.7% (Day 2) and 27.4% (Day 1) of participants reported feeling sympathy, with an average of 22.6%; there were no statistically significant differences between the days, and sympathy was only significantly higher than anger (p < 0.01) and sadness (p < 0.001) on Day 1. In contrast, in 2018, between 13.3% (Day 4) and 48.3% (Day 2) of participants reported feeling sympathy, with an average of 29.2%; this meant that sympathy was higher on average than anxiety in 2018. Sympathy was also statistically significantly higher on Day 2 than on Days 3 and 4 (p < 0.001 for both comparisons), and higher on Day 1 than Day 4 (p < 0.01) as well. In relation to the other emotions, in 2018, sympathy was significantly higher than anger and sadness on Days 1 (p < 0.001 for both comparisons), 2 (p < 0.001 for both comparisons), and 3 (p < 0.05 for both comparisons), and higher than anxiety (p < 0.01) and happiness (p < 0.001) on Day 2. It appears that participants felt sympathy more often in the 2018 CIRs, especially on Day 2, and while we made no strong conclusions regarding sympathy in our previous work, comparing the two sets of data reveals that sympathy is likely an emotion that can vary significantly across mini-publics.
The least reported emotions in both years, on average, were sadness and anger. In 2016, anger was more common than sadness, while in 2018 the reverse was true, but when we look at the comparisons, we see a striking consistency. With one exception, neither anger nor sadness showed significant temporal variations, and across all four days in both sets of data, neither was ever significantly higher than any other emotion. The exception to this consistency occurred on Day 3 in 2016. We argued that Day 3 included a 'spike in anger' to 29.0%, which was statistically significantly higher than the 11.3% on Day 1 (p < 0.01) and 12.9% on Day 2 (p < 0.05), and borderline statistically significantly higher than the 14.5% on Day 4 (p = 0.06). Anger was also statistically significantly higher than sadness on Day 3 (p < 0.01), the only time either emotion was higher than another one. In 2018, however, we see no such 'spike' on Day 3. Reports of anger were consistently low across all four days, and in fact, setting aside sadness, anger was significantly lower than every emotion on every day except for sympathy on Day 4 (see above for exact p-values). While the 2018 data supports our previous claims that sadness was uncommon and anger never dominant during mini-public deliberations in the CIRs, it undermines our previous contention that anger rose to a peak during Day 3.

Emotions and deliberative quality
Having established the patterns in emotional expression across the four days of the CIR, we also wanted to examine the possible effects of the emotions on the deliberation itself. To do so, we investigated the relationships among participants' emotions and their evaluations of the deliberation. To measure these evaluations, we used three questions from the daily surveys:

When experts or other CIR participants expressed views different from your own today, how often did you CONSIDER CAREFULLY what they had to say? [CONSIDER OTHER VIEWS] (Never, Rarely, Occasionally, Often, Almost Always)
The first and second items aim at measuring participants' views on whether their fellow participants enacted the deliberative values of openness and mutual respect, while the third evaluates participants' own self-reported enactment of those same values. In this case, because the analysis involved relationships between dichotomous variables (emotions felt or not felt) and ordinal variables (Likert scale evaluations), we utilized Somers' D in our analysis (Newson 2002: 51-52).
Somers' D provides results that can treat associations as either symmetrical or going in one direction or the other, although the approximate statistical significance is the same for all three values. In what follows we justify presenting Somers' D values that are directional based upon our theoretical arguments and previous empirical research, but we acknowledge that it is possible that these relationships are symmetrical or go in directions opposite of the ones we propose. Since our data comes from end-of-the-day surveys that included all the measures we analyze, we cannot establish with certainty the directionality of the relationships, although we believe there are good reasons for reporting them as we do. We want to stress, however, that the associations are all statistically significant regardless of directionality.
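To make the directional reading concrete, the following minimal sketch computes Somers' D by counting concordant and discordant pairs. It is an illustration rather than the routine used for the analysis; the function name `somers_d`, the argument order, and the toy data are assumptions introduced here.

```python
from itertools import combinations


def somers_d(x, y):
    """Directional Somers' D(Y|X): y is treated as dependent on x.

    Pairs tied on x are dropped entirely. Pairs tied on y only stay in
    the denominator, which is what distinguishes the directional value
    from the value obtained with the roles of x and y swapped.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_on_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue  # tied on the independent variable: excluded
        if y1 == y2:
            tied_on_y_only += 1
        elif (x1 < x2) == (y1 < y2):
            concordant += 1
        else:
            discordant += 1
    denominator = concordant + discordant + tied_on_y_only
    if denominator == 0:
        raise ValueError("x has no variation, so D(Y|X) is undefined")
    return (concordant - discordant) / denominator


# Hypothetical data: a 1-5 evaluation item and a 0/1 emotion report.
evaluation = [3, 4, 5, 5, 4, 2]
felt_happiness = [0, 0, 1, 1, 1, 0]

# Emotion treated as dependent on the evaluation, then the reverse.
print(somers_d(evaluation, felt_happiness))
print(somers_d(felt_happiness, evaluation))
```

Swapping the arguments changes only which ties enter the denominator, so the two directional values share a numerator and a sign and differ only in magnitude.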
Given our theory that deliberative procedures are the best explanation for the patterns of reported emotions among participants, we present Somers' D values assuming that OPPORTUNITY and RESPECT-the two items that tapped into participants' perceptions of how others treated them during the deliberative process-would likely affect their emotions. More specifically, we posited that participating with others who they perceived as following deliberative norms would be associated with the positive emotions of enthusiasm, happiness, and sympathy, while a negative experience would be associated with the negative emotions of anger, anxiety, and sadness.
In contrast, when assessing participants' self-identified willingness to follow deliberative norms-CONSIDER OTHER VIEWS-we present the Somers' D values assuming that people's emotions affected this measure because the findings of Affective Intelligence and related research show that emotions affect people's willingness to deliberate. The evidence for the specific relationships here is more complicated. Affective Intelligence research (MacKuen et al. 2010; Marcus 2002; Wolak & Marcus 2007) provides evidence that anxiety, which researchers often classify as a negative emotion, likely leads to a more deliberative stance, while enthusiasm and anger do not. McClain (2009) agrees with these findings regarding anxiety and anger but argues that enthusiasm increases information seeking in a concentrated deliberative setting. Kim (2016), in contrast to both, provides evidence that anger leads participants in deliberation to gain more knowledge and collect more arguments from the other side, but also that highly informed angry participants acquired fewer arguments for their own side. Morrell (2010) and Muradova (2020, 2021) find positive relationships between empathy-which we indirectly measure through sympathy-and deliberative norms such as reciprocity and reflective judgment. Despite the mixed evidence, on balance, we believe that the best evidence indicates the associations between participants' willingness to carefully consider other views and anxiety, enthusiasm, and sympathy are likely positive, while the association with anger is likely negative. In the absence of previous evidence, we presumed that this measure would likely have a positive association with the positive emotion of happiness and a negative association with the negative emotion of sadness. Our analysis involved comparing six emotions to three evaluations across four days for both sets of CIRs, resulting in 144 comparisons (72 in each year).
In our previously reported results from the 2016 data, only three of the 72 comparisons were statistically significant at even the p < 0.05 level, and of these, we were not very confident in one of the results. Contrary to our expectations, the negative emotion of sadness had a positive relationship with participants indicating that they carefully considered views that differed from their own on Day 2 (d = 0.48, p = 0.03). Our lack of confidence in the result arose from the fact that only four participants out of 62 (6.5%) reported feeling sadness on that day. We believe our 2018 data justifies our initial hesitancy to draw strong conclusions on this score because it indicates that sadness demonstrated no significant relationships with any of the three deliberative quality items across any of the four days.
We were also cautious, although more confident, in one of our two other findings, which were related. On Day 4 in 2016, happiness had a significant, positive relationship (d = 0.45, p < 0.001), and anger a significant, negative relationship (d = -0.22, p < 0.05), with participants' beliefs that they had sufficient opportunities to express their views. When we looked more closely at the crosstabulations, we saw that 34 of the 41 (82.9%) participants who reported that they definitely had opportunities to speak on Day 4 also reported feeling happiness, while only three (7.3%) reported feeling anger. In contrast, only six respondents (9.7%) reported probably or definitely not having sufficient opportunities to express their views, and of these, three reported feeling anger (50.0%), but two reported feeling happiness (33.3%). Thus, we concluded, the significant differences we found on Day 4 were primarily due to the differences among those who reported they definitely had an opportunity to express themselves, with significantly more of these participants indicating they had felt happiness than those who did the same for anger.
In the 2018 data, we found nearly exactly the same relationships between participants' perceptions of their having an opportunity to speak on Day 4 and emotions. Happiness had a significant, positive relationship (d = 0.39, p < 0.01), and anger a nearly significant, moderately negative relationship (d = -0.21, p = 0.050), 4 with OPPORTUNITY. The crosstabulations revealed that 31 of the 44 participants (70.5%) who indicated that they definitely had a sufficient opportunity to express their views reported feeling happiness, while only one (2.3%) reported feeling anger. Only two (3.3%) participants indicated that they probably or definitely did not have a sufficient opportunity to express their views, and of these, neither reported feeling happiness and one (50%) reported feeling anger. Even if the conclusion is rather intuitive, given our small sample sizes and the idiosyncratic possibilities that abound in deliberative mini-publics, the consistency in these findings across the two sets of data is remarkable. While we were cautious in our interpretations before, we are now more confident that participants who believe they have had a good chance to express their views are much more likely to report feeling happiness at the end of deliberative mini-publics.
While these findings were pleasantly supportive of our previous work, there were other differences that arose in the new data. In 2016, beyond the three findings at the p < 0.05 significance level, we discussed six others that fell between 0.05 and 0.10, and yet, given their statistical weakness, we were only willing to claim that the data was suggestive. Since we were the first to engage in this kind of empirical analysis of emotions in deliberative mini-publics, we decided that reporting such marginal results was appropriate. In the 2018 data, however, we discovered many more statistically significant relationships (p < 0.05) among the comparisons, only one of which conforms to our previous tentative findings. Many of the new findings in the 2018 data support our general hypotheses. Enthusiasm had positive relationships with RESPECT on Day 3 (d = 0.30, p < 0.05), and with CONSIDER OTHER VIEWS (d = 0.37, p < 0.01) and OPPORTUNITY (d = 0.45, p < 0.01) on Day 4. Happiness had positive relationships with CONSIDER OTHER VIEWS (d = 0.45, p < 0.05) and RESPECT (d = 0.17, p < 0.01) on Day 2. Finally, sympathy had positive relationships with OPPORTUNITY (d = 0.17, p < 0.01) and CONSIDER OTHER VIEWS (d = 0.44, p < 0.05) on Day 4. While all these relationships are in the expected direction, none of our even marginally significant findings from our previous study match these new findings.
Several of the statistically significant results in the new data run counter to our hypotheses. Anger had a positive relationship with RESPECT (d = 0.09, p < 0.05) on Day 1, but as with our finding in 2016 about sadness, only four participants indicated they felt anger on Day 1, and all of these indicated that their fellow participants 'almost always' treated them with respect. Also counter to the hypothesis we drew from the work of Marcus et al., on Day 2 anxiety had a significant relationship with CONSIDER OTHER VIEWS, but it was negative (d = -0.34, p < 0.05) rather than positive. Again, however, even the marginally significant results from 2016 do not parallel these.
The one marginal finding from our 2016 data that our 2018 data bolsters is that happiness had a positive relationship with CONSIDER OTHER VIEWS (2016: d = 0.22, p = 0.08; 2018: d = 0.49, p < 0.001) on Day 4. Although this finding was only marginally statistically and moderately substantively significant in 2016, the relationship exhibited greater strength and significance in 2018. While still cautious, we are more confident than before that people who express feeling happiness on the final day of the CIR are more likely to report carefully considering others' views that are different from their own. If we consider, as reported above, that in both years participants indicated feeling more happiness if they believed they had a good chance to express their views during the final day, as well as our new findings that on Day 4 enthusiasm and sympathy had positive relationships with participants' perceptions of the opportunities to express their views and their willingness to consider others' views carefully, we believe that there is a likely pattern across the data. While we must remain careful about the strength of this claim, we believe it is likely that the final day is a unique moment where the culmination of the process in the deliberative mini-public leads to a connection between participants feeling certain positive emotions-enthusiasm, happiness, and sympathy-and their evaluations of their own and others' behavior during the final day's deliberations.

Deliberation, Replication, Emotions: A Discussion
Replication can be an important tool in buttressing or weakening our confidence in previous results. The CIRs provide a unique opportunity for engaging in replication because, despite inevitable variations, the differences across CIRs are much smaller than one would find comparing different types of mini-publics. Our examination of data from a new round of CIRs strengthens several of our previous findings. We are much more confident that the most common emotion mini-public participants experience is enthusiasm, that happiness is most pronounced after the final day of deliberation, and that sadness is a relatively uncommon emotion throughout. It is possible that some of these findings, especially the consistently high levels of enthusiasm, are the result of socially desirable responses, although there are several considerations that mitigate this potential issue. Members of the research team, not the CIR staff, distributed and collected all surveys; we assured participants that their responses were anonymous; and we believe the variations we see in the other emotions, especially happiness, as well as measures we do not report here, make it more likely that these responses reflect most participants' actual experiences. We also found further support for our earlier conclusion that participants who feel that they have had sufficient opportunities to express their views on Day 4 are much more likely to report feeling happiness, and better evidence to support our tentative conclusion that those who feel happiness on that day are more likely to indicate that they carefully considered views different from their own. Given the small sample sizes in our data and the inevitably idiosyncratic nature of mini-publics and their participants, we believe that replicating these findings is no small feat.
There were, however, differences in our new data that have caused us to reflect more on our previous claims.
Although anxiety and sympathy were again the third and fourth most common emotions in both studies, there were some important differences. Anxiety was the third most common emotion in 2016, while sympathy ranked third in the new data. Previously, we detected what we called a 'rise' in anxiety on Day 2, while sympathy did not vary significantly across the four days. In 2018, however, we did not see the same variation in anxiety, but instead, saw a rise in sympathy on Day 2 to a level that was significantly higher than on Days 3 and 4. Thus, not only did these two emotions exchange places in the hierarchy of commonality, they give the impression of exchanging temporal patterns as well. Of course, there is no reasonable explanation for this seeming relationship; it is merely, we would speculate, a random artifact. This leads us to conclude that anxiety and sympathy are emotions that are fairly common during mini-public deliberation, and we would encourage future researchers to continue to measure them, but the exact roles they play may vary across mini-publics.
Another difference in our new data relates to anger. In our previous work (Johnson, Morrell, & Black 2019), we noted that, on average, the only emotion participants reported less often than anger (16.9%) was sadness (9.3%), but in 2018, we see another reversal, with sadness (8.8%) slightly more prevalent on average than anger (6.3%). Even more important, however, is that we perceived a 'spike' in anger on Day 3 (29.0%) in 2016, and while we acknowledged that this was still a smaller percentage of participants than several other emotions on several other days, given the centrality of anger to the Affective Intelligence Theory, we decided to explore the role of anger using a qualitative analysis of observer notes. In 2018, however, there was no 'spike' in anger on Day 3 or at any other time; it thus made little sense to repeat our qualitative analysis, at least in the same form, on this new data. Yet despite not finding a spike in anger again, we would argue that given the findings across the two studies related to anger, including that it might have effects on evaluations of deliberation, as well as its central role in Affective Intelligence Theory, researchers should continue to measure anger in their studies of mini-publics. We would not argue the same regarding sadness given its very low prevalence and lack of significant effects. In its stead, researchers could draw from Saam's (2018) work for other possible negative valence emotions that may be important, such as disappointment and shame.
These new sets of revelations require that we reflect upon our conclusion regarding the Deliberative Procedures Frame as the best way to help us understand panelists' expressions of emotions such as anger, anxiety, enthusiasm, happiness, and sympathy, and the role these emotions might play in deepening and advancing collective deliberation. One of the advantages of our data, however, is that by collecting surveys at the end of each day, we can detect patterns. The fact that we find several of these repeating across the two sets of data-consistent levels of enthusiasm, the slow rise in happiness to a peak on the final day, and the relationships between several emotions on the final day and participants' evaluations of deliberative quality-indicates that the Deliberative Procedures Frame is still probably a viable theory for helping us explain emotions in mini-public deliberation. We also remain confident that the sources of anger and frustration we identified in our previous analysis-including those that focused on the deliberative procedures themselves-remain correct. What our replication clarifies, however, is that the Deliberative Procedures Frame provides us a theory about the likely source of participants' emotions in deliberation; it does not establish an inevitable causal relationship between procedures and participants' emotional patterns. 5 Instead, it identifies the times during deliberation when emotions that are important to deliberative quality-such as happiness, anger, and sympathy-will most likely arise; thus, it points us to the periods in the deliberative process that are likely crucial to maintaining deliberative quality in mini-publics.
Our data point to the middle periods of deliberation-during which participants are hearing from and questioning experts and advocates and engaging in the process of drafting, editing, and ranking the claims they want to include in their statement-as the most probable moments for the activation of emotions such as sympathy and anger in the most participants. In contrast, the end of the process is the most likely time for the activation of happiness. Our data, unfortunately, cannot reveal why this is the case, but based upon our observations of the process, as well as the qualitative analysis we engaged in for our previous work, we would conjecture that it is the interactive nature of the middle moments of deliberation that hold the key to understanding their potential for the activation of positive or negative emotions, while the sense of accomplishment in the group after successfully addressing the issue under discussion and crafting the CIR statement likely explains why happiness arises at the end of the process.
Our findings also speak to several concerns scholars have regarding emotions and deliberative democracy. First, they provide evidence indicating that the process of deliberation most likely plays an important role in the emotions participants are likely to feel during deliberative mini-publics (see, e.g., Fischer 2010) and that these effects can vary across time. Thus, researchers should aim to observe such effects throughout deliberation, if possible, rather than just relying upon pre- and post-deliberation measures; otherwise, they may miss moments of emotion among participants that subside by the end, such as the increases among participants we observed in anger in 2016 and sympathy in 2018. Our robust findings in both years, indicating that there is likely a positive relationship between participants perceiving that they had opportunities to express their views on the final day and reporting feeling happiness, are not surprising, but they still point to how important it is that those running deliberative mini-publics ensure that everyone has a chance to express themselves.
In our less robust findings-those that only appeared in 2018-our results go against Affective Intelligence Theory's claims. First, we found very little indication that the key emotions the theory identifies-enthusiasm, anxiety, and anger-had any effects on our measures of deliberative quality. When there were relationships, such as enthusiasm having a positive relationship on Day 4, and anxiety a negative relationship on Day 2, with participants' self-reported tendency to consider others' views, they ran counter to the theory. We would also note that we found no support for Kim's (2016) counter-argument that anger can improve deliberation. However, since the relationships for which we did find evidence were not consistent across all four days and did not appear in 2016, and given our small sample sizes, we are not in a position to claim that the evidence is persuasive enough to question the theory, although it does raise the possibility that there could be key differences between emotions' effects in the diffuse, unstructured deliberation of the broader political system-the target of Affective Intelligence Theory-and in concentrated, structured mini-publics.
Our data also point to one of the key difficulties in this area of research: those designing and running mini-publics not surprisingly aim to have successful deliberations, and in doing so, often engage in strategies to mitigate the effects of emotions such as anxiety and anger. Experiments such as Kim's (2016) and McClain's (2009) can attempt to manipulate these emotions, but there are likely limits on how these emotions might appear for those who wish to study mini-public deliberation outside the laboratory. As our evidence indicates, enthusiasm may begin and remain common, while anger and anxiety may or may not arise. Similarly, one result that conforms with previous theories from scholars such as Krause (2008), Morrell (2010), and Muradova (2020, 2021), who argue for the importance of empathy in deliberation, is that sympathy had a positive relationship with self-reports of considering the views of others on Day 4, but again, this relationship was only significant on one day and did not appear in 2016. Such evidence is simply not enough to persuade us fully to confirm these theories. Variations in design, as well as the variability in participants and the issues under consideration, make studying mini-public deliberation complex and difficult (see, e.g., OECD 2020). That we were able to replicate many of our previous findings is an important achievement, we believe, and the confounding data only pushes us, and others we hope, to continue to engage in research on the vital topic of the role of emotions and process in deliberative democracy. In doing so, our experience suggests several areas that will be important in future research. One key area that needs addressing, and that only increases the complexity of research in this area, is to determine the target of the emotions participants report feeling.
One of us is part of a research group that has begun engaging with this problem (Morrell, Spada, & Smith 2020), but if we want to understand more fully the role of emotions in deliberation, it is vital that we get a handle on how the sources of the emotions vary. As we suggested in our previous research, the targets of the various emotions participants feel could include the issue under discussion, the outside experts or advocates, their fellow participants, the facilitators, or the deliberative procedures themselves. We would also suggest measuring additional emotions that might further our understanding in this area. Given our own previous findings, and following Saam's (2018) work, measuring frustration with the deliberative process, and isolating this from anger, is likely vital; it could also be helpful to try to measure disappointment, hope, and shame. It is also important for researchers to consider how we measure emotions in deliberation. There are many ways of doing so other than surveys of participants, which depend upon participants accurately remembering and reporting their experiences, and which we can only administer post-deliberation. By addressing these various concerns, we may be able to analyze the relationships among emotions and their effects on deliberative quality more clearly. It is only this kind of data that will allow us to be confident in both our Deliberative Procedures Frame and our understanding of the role of these important emotions in mini-public deliberation.
Notes
1. We agree with Shrout and Rodgers that preregistration is important but must also admit that we did not preregister either the 2016 or 2018 studies. They were part of a large collaborative effort including scholars in disciplines where preregistration is still not the norm.
2. We acknowledge that there are many ways to measure emotions in deliberation, and our choice of self-reported surveys has its limitations. We are only able to administer them at the end of the day and rely upon participants both remembering what they felt and being willing to report their experiences. We also used a dichotomous variable (felt/not felt), so we were unable to capture variations in the strength of the felt emotion. These are real limitations, but there were both practical and theoretical reasons for our choice. Practically, as part of a larger research project, we had only limited space on the survey and thought this approach would work well enough to allow us to do the tests in which we were interested. Theoretically, we were really interested in participants' own perceptions of what occurred during deliberation. Although we think our reasons justified our choices, scholars could certainly gain even more insight using other measures, something we discuss below.
3. While we recognize that mini-publics can often have frustratingly difficult variations across cases, we chose to aggregate the data from all three mini-publics in each year to overcome issues of small sample size that are endemic to much of this type of research. This allows us to be more confident that any effects that we find are not due to the idiosyncrasies of a particular mini-public, and although it is also a conservative approach that might lead to more Type II errors, and thus miss effects that are present, we think it is the best given our small sample sizes. To further ensure this approach was valid, however, we computed Goodman and Kruskal's lambda for contingency tables of the three CIRs each year by each emotion on each day. The only comparison that was significant at the p < 0.05 level was for anxiety on Day 1 in 2016, with 50% of participants reporting feeling it in MA, 30% in OR, and 14% in AZ, and its substantive significance was small (0.12). Given this is the only significant result across all these comparisons across both years, we feel confident that our approach is appropriate.
4. To allow a full assessment and comparison with previous results, we report here the exact approximate significance rather than the traditional p < 0.05 formulation.
5. We appreciate the suggestion of one anonymous reviewer that one possible explanation for the differences we find could be the different topics discussed in 2016 (legalization of marijuana and corporate income tax) and 2018 (affordable housing, patient safety and hospital transportation, and rent-controlled housing). We were able to compare differences within each set of three CIRs and found no differences, but comparing differences across years would move us beyond the replication study we report here.
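The pooling check described in note 3 relies on Goodman and Kruskal's lambda for a CIR-by-emotion contingency table. A minimal Python sketch of the asymmetric statistic follows. The counts are hypothetical and are not the study's data, the function names are chosen only for this example, and the note does not state which direction of lambda was reported, so both are shown. No significance test is included.

```python
def gk_lambda(table):
    """Asymmetric Goodman-Kruskal lambda: predict the column variable from the row variable.

    `table` is a list of rows of counts. Lambda is the proportional reduction in
    prediction errors when the row category is known.
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    best_without = max(col_totals)              # correct guesses using the overall mode only
    best_with = sum(max(row) for row in table)  # correct guesses using each row's mode
    if n == best_without:
        raise ValueError("lambda is undefined when all cases fall in one column")
    return (best_with - best_without) / (n - best_without)


def transpose(table):
    """Swap rows and columns so the other variable becomes the one predicted."""
    return [list(col) for col in zip(*table)]


# Hypothetical counts: rows are three CIR sites, columns are [felt, not felt].
counts = [
    [12, 8],
    [7, 13],
    [4, 16],
]

lambda_emotion_from_site = gk_lambda(counts)
lambda_site_from_emotion = gk_lambda(transpose(counts))
print(lambda_emotion_from_site, lambda_site_from_emotion)
```

A value near zero in either direction means that knowing one variable barely improves prediction of the other, which is the pattern that would support pooling the three CIRs within a year.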