Evaluating Public Deliberation: Including the Audience Perspective

I argue that in evaluating public deliberation, the basic criterion should be how deliberating citizens’ need for usable input is met, rather than how the debaters embody Habermasian consensus-oriented ideals, and I question assessment of “deliberative quality” on that basis, such as the “Discourse Quality Index.” Studies of public deliberation should instead build on an Aristotelian notion of deliberation, on Rawls’s idea of “reasonable disagreement” and on the deliberating audience’s needs. To explore these, we need real-time studies of audience reception of public deliberation. I place the studies I call for in a typology of studies, present a study with novel methodological features and discuss its implications for criteria for public deliberation.

people may deliberate (bouleuein) about is actions they may decide to undertake: 'Clearly counsel can only be given on matters about which people can deliberate; matters, namely, that ultimately depend on ourselves, and which we have it in our power to set going' (1995, Rhetoric 1359a). Accordingly, I consider it essential that deliberation centrally concerns decisions about future action (i.e., about what to do). It is worth noting that 'deliberative' in Greek is symbouleutikon (i.e., discourse in which we talk with each other (sym-) about what it is our will (boulē) to do); hence, it is by definition focused on the future, as Aristotle points out (1995, Rhetoric 1358b). Deliberative debate is a subcategory of the category of all deliberative discourse and also of all debates. Another main genre of debate is what we may call epistemic debate. Here the overarching issue on which debaters diverge is not what to do, but what is the case in regard to some issue. The distinction between deliberative and epistemic debate is not dichotomous but allows for intermediary and mixed types. While deliberative debates have future action as their overarching type of issue, they regularly involve epistemic issues, for example concerning current laws or past events. However, discourse on such issues will typically function as part of the arguments (premises) debaters offer to support their policies regarding the overarching issue.
There are other main genres of debate. Forensic debates in court cases tend to combine elements of deliberative and epistemic debate. They resemble epistemic debates insofar as debaters will argue about past facts and current laws, but they resemble deliberative debates because there will also be discussion of future action (e.g., what prison sentence to give).
It is meaningful to distinguish between three different forms or venues of deliberation: public, interpersonal, and intrapersonal ('inner') deliberation (Kock 2018).
In public deliberative debates, debaters address not just each other but also a (mainly silent) audience, either a live audience, as in town hall debates, or a mass mediated one, as in televised debates; both kinds may be involved simultaneously. Typical instances are televised debates between presidential candidates, or between party leaders pending a general election (as in the study presented below). Public debates may also unfold, wholly or partly, in writing, for example in the opinion pages of a newspaper or across several media. New digital technologies, such as videoconferencing and many-to-many social media, have created numerous intermediary forms of mediated debates. In all these forms, traditional or digital, the binary distinction between debaters and audience members may be porous, as when a town hall debate includes comments and questions from the audience. However, it remains possible to distinguish clearly between open debates, which non-participants may witness, and debates without such a possibility (closed debates). The institutions of a democracy usually stipulate the use of both public and closed debates; advantages and drawbacks of these two forms are discussed in Chambers (2004).
Interpersonal debates are closed in that they have no audience, only participants. Of the three forms, interpersonal debates have been most emphasized and studied by scholars interested in deliberation. Consensus-centered standards of deliberation, as advocated by Habermas (1984, 1991, 1996) and some deliberative theorists (e.g., Bessette 1994), seem to build on a near-exclusive consideration of interpersonal debate. Its first paradigm was (some of) Plato's Socratic dialogues; in these, consensus often emerges as Socrates' interlocutors adopt his views. Important initiatives like the deliberative polls organized and described by James Fishkin (e.g., 2011) or the 'minipublics' discussed by Goodin and Dryzek (2006) are interpersonal dialogues. Hauser's work on 'vernacular rhetoric' (1999) illuminated citizens' quotidian interpersonal exchanges. The early thinking of Habermas, which has informed much work on deliberative democracy, held that consensus was the ideal (even if counterfactual) outcome of proper deliberative debate. In argumentation theory, the school of Pragma-dialectics (van Eemeren & Grootendorst 2004 and many other writings) has similarly theorized that 'critical discussion' abiding by certain rules will result in the issue separating the discussants being 'resolved.' There have since been objections to consensus-based theories among deliberative democrats as well as rhetoricians (e.g., Dryzek 2000; Ivie 2002; Jezierska 2019). The same is true in argumentation theory (e.g., Kock 2007, 2009, 2018); Habermas (2001: 43) himself has recognized that 'in the case of controversial existential questions arising from different world views' it is 'reasonable to expect continuing disagreement.' Regardless of whether interpersonal debate could or should posit eventual consensus as an ideal, I hold, as argued below, that in public deliberative debate, consensus between debaters is neither a likely nor a desirable goal.
A third form of deliberation is intrapersonal (i.e., an individual's own deliberative reflections), without interlocutors or audiences; Goodin (2000) speaks of 'deliberation within.' In The New Rhetoric, Perelman and Olbrechts-Tyteca (1969: 30) emphasize situations where an individual 'deliberates or gives himself reasons for his actions.' As citizens in democracies are decision-makers in their capacity as voters, it seems self-evidently desirable that individual citizens engage in intrapersonal deliberation before deciding. I will argue that the most meaningful function of public deliberative debates is to aid citizens in this; hence, consensus in a public debate may actually be undesirable, because it can be said to preempt citizens' own decisions. Chambers (2009), finding that interpersonal deliberation has overly dominated deliberation scholars' work, has asked (rightly, I suggest) whether deliberative democracy has 'abandoned mass democracy.' She proposes that those wanting to enhance deliberation in society should do more to study and improve public debates, for the sake of mass democracy, mediated or otherwise.
Finally, the concept of reception is central in this paper. It is understood in a broader sense than concepts often used as dependent variables in audience-including debate studies that look at 'persuasive effect,' 'learning,' 'perceived issue salience,' or 'vote preference.' Audience members' reception of a debate includes all these factors and others as well. All these parameters tend to cast audience members as passive objects of influence exerted, and effects achieved, by the debates they have witnessed, but reception may also include processes that cast audience members as independent agents who, more or less consciously, seek to use and assess the input they receive for purposes of their own. Reception analysis, following Schrøder's (2015: 3) authoritative account, originated in media research as an attempt to supplement a 'positivistic, behavioral approach that concentrated on the quantitative measurement of immediate and direct media effects.' Reception analysts, following a tradition begun by Stuart Hall (1973), have tended to emphasize audiences' (sometimes oppositional) meaning-making; in this paper, the umbrella notion for the data types that reception analysis gleans is even broader than meaning-making and notably includes audience members' evaluations of, and affective responses to, what they witness. As will be seen, such data may suggest what audience members expect and want, and what they do not want, from public deliberative debates.

Public Debate for Audiences
As indicated, I argue that public debates (deliberative or otherwise) should not be considered as interpersonal dialogues between the participants but rather as events staged in front of audiences and for their sakes (or primarily for their sakes). Hence, any discussion of quality standards for such debates must necessarily include the debate-audience nexus in all its aspects. It will not suffice to only consider what the participants may see as the main goal of having the debate or as their individual aims in participating. The argumentation scholar Douglas Walton (1989) has usefully proposed that we distinguish between 'dialogue types'; however, it is inadequate to postulate, as he does regarding so-called 'Persuasion dialogue,' that the main goal of a public deliberative debate is to '[r]esolve or clarify an issue' and that participants aim to '[p]ersuade the others,' nor to say, as he does for 'Deliberation,' that the main goal is to '[d]ecide the best course of action,' and that participants' aims are to '[c]oordinate goals and actions.' Do we, for example, as citizens watching a presidential debate in the US, want the two candidates to resolve between them the issue of who is the best candidate with the best policies? Hardly, and we certainly do not expect it to happen. The aims with which audience members witness a debate do not necessarily coincide with those of the debaters. Instead, we must attend to the aims a public debate might serve for audience members (i.e., what uses and benefits there might be for them in witnessing such a debate).
Based on these reflections, I argue that public deliberative debate, rather than aiming to have issues resolved and agreement found between the debaters, should primarily serve the function of providing citizens usable input for their intrapersonal deliberations on which future actions to support in their capacity as judges, as Aristotle has it (1358b). Public deliberative debate should not be theorized or evaluated as if it were purely interpersonal but according to standards reflecting the fact that the audience, not the debaters, are to judge. The key criterion of quality in public deliberation should not be what it does for the debaters but what it does for the audience.
For deliberative democrats, a defining feature of proper deliberative dialogue is that 'preference change' may result from argumentation offered by the two sides (e.g., Dryzek 2000). However, in successful public deliberative debates, it is not primarily in the debaters that preference change should be expected or encouraged. Rather, it is in the audience. Or, for audience members who do not already have preferences, preference formation might, just as importantly, be seen as something to desire and expect.
Also, it seems empirically plausible that public debate is not as such conducive to preference change in debaters; when disagreeing arguers debate in public, they seem less likely to concede ground to their adversaries than they would be in closed debate. Chambers (2004: 394) has argued that while public debate will tend to enhance the appeal to public reason (because public debaters cannot expect to persuade by appealing to purely private reasons that only their own group will share), an undesirable tradeoff is that public debate tends to lower the quality of public reason: debaters will tend towards shallow ('plebiscitory') reasoning, 'wanting to please the largest number of people or wanting to appear firm and decisive in the public's eye.' This discourages consensus or compromise, because these require debaters to concede ground and risk appearing less firm. Chambers (2004: 394) quotes Gutmann and Thompson (1996: 115) as saying that because the sessions of the 1787 Constitutional Convention in Philadelphia were secret, 'members could speak candidly, change their positions, and accept compromises without constantly worrying about what the public and the press might say.' Hence, even if it could be argued theoretically that the proper function of public debates is to induce or enhance consensus or compromise, as early-Habermasian deliberation theory would suggest, the fact seems to remain that public debates are ill suited for it.
We may also start, as it were, from the other end, asking, 'What, if anything, could public debates be well suited for?' This question is not asked (let alone answered) as often as one might wish. Plenary debates in parliamentary democracies are typically public and carefully recorded and disseminated on all available platforms, but even in a firmly established parliamentary democracy like Denmark, for example, it is a remarkable fact that nowhere in the constitution, nor in constitutional law or in-house rules, does one find formulations stating the intended purposes or functions of these debates (Andersen 2012). However, if our point of departure is the Aristotelian assumption that the citizens attending a public debate are to be judges, then it follows that an obvious function of such debates might be to equip those citizens in the best possible way to act in that capacity. They, not the debaters, nor any moderator or presiding official, are to judge; in presidential debates, for example, they must judge in the polling booth who is the best candidate with the best policies.
A particular affordance of debates compared to monological forms (such as speeches, rallies, ads) is that debaters may be required not just to present their positions, but also to offer relevant arguments for them and to answer critical questions and counterarguments. In monological forms none of this can be required of speakers. It is often assumed that citizens, to choose which candidate to vote for, must know all candidates' positions, and that providing that knowledge is the main function and quality criterion for public debates; but such a criterion, while appropriate, is inadequate, precisely because it neglects to mention citizens' need to also hear arguments, counterarguments, and answers. The inadequate criterion fits the assumption that citizens have pre-formed and fixed views (preferences) beforehand (i.e., this criterion belongs to what Dryzek (2000) calls an 'aggregative' nondeliberative conception of democracy), as in Rational Choice theory. It has no place for preference formation or preference change in citizens, and deliberative democrats will deem it inadequate. Thus, public debates may afford something that monological forms lack.
To conclude, if we want to determine criteria for public debates based on what function they might best serve, one natural criterion is to equip citizens as well as possible to act as judges in a democracy, and this would involve not just providing knowledge of candidates' positions, but also of their arguments for them and their answers, if any, to critical questions and counterarguments. In contrast, assigning to public debates the function of having the debaters undergo preference change and thereby resolve disagreements by compromise or consensus would seem, first, pointless in the sense that it would co-opt the role of the audience, which again would make the very publicness of such debates well-nigh meaningless; secondly, assigning such a function to public debates would require these debates to do something they are particularly ill suited for; thirdly, public debates would then not be given a democratic function that they are, in principle, particularly well-suited for.
Additional functions of public debates might also be asserted. Aristotle names epideictic as the third major genre of rhetoric besides deliberative and forensic rhetoric; epideictic means something to do with show and performance, and giving audiences a stimulating and exciting show is per se a legitimate function of, for example, presidential debates. This function is not to be scoffed at by deliberative democrats, because excitement might boost interest, arousal, and eventually voter turnout. But if a debate does that, then by the same token more citizens are to be judges, and the function of equipping them well in that regard accordingly acquires more urgency.

Debate Studies
If, as argued above, quality in public deliberative debates should primarily be theorized and evaluated with regard to their functions for audiences, then I will argue that there is a need for empirical studies where real audiences generate data on their reception of the debates they watch. While holding the primary, theoretically based function of public debates constant, we might, if we conduct such studies, learn more about how characteristics of actual debates may help or hinder that function, as well as other, additional functions.
I will argue for a specific type of empirical study as having particular relevance in this regard. It is a type not strongly represented in the rich debate literature: one where audiences can freely tally their reception of a debate.
I will place the type of study I advocate in a typology of existing debate studies, citing representative examples.
First, we may distinguish between exclusively content-oriented (textual) studies (i.e., studies that consider debates without involving audience-related data) and studies that do consider the debate-audience nexus.
Content-oriented studies may focus on debate rhetoric (what the debaters say and do) or on the organization and format of the debates, or both. Content-oriented research focusing on debate rhetoric may be represented by the 'functional' analyses by William Benoit and associates that categorize debaters' rhetoric as either acclaims (self-praise), attacks (criticisms of the opponent), or defenses (responses to attacks) (e.g., Benoit and Harthcock 1999). Most such analyses only mention the audience cursorily. They mainly put their theoretical apparatus to descriptive use, bypassing normative issues (e.g., what kinds of debate content have deliberative quality).
A content-oriented study focusing primarily on format and with a clear normative approach is Auer (1962), which dismissed the 1960 presidential debates as 'counterfeit.' A comprehensive, mainly content-oriented debate study, addressing matters of format and content with a normative orientation, is Jamieson and Birdsell (1988).
A strand in recent textual debate scholarship attempts systematic normative assessment of debaters' rhetorical practices in terms of discourse quality. Thus, a discourse quality index (DQI) was proposed by Steenbergen and colleagues (2003: 22) as a 'measurement instrument of deliberative quality.' While applauding the normative intent underlying DQI assessment, I find its normative yardstick problematic, partly because of what I see as an over-reliance on early-Habermasian ideals, which tend to consider debates as purely interpersonal exchanges, disregarding the debate-audience nexus.
Some textual debate studies do make normative comments on how debates may hypothetically impinge on audiences. For example, Davidson and colleagues (2017: 188), who adopted DQI assessment in analyzing party leaders' televised debates and compared these with parliamentary debates, state, 'a well-reasoned justification for a policy position may enable the electorate to better reflect on the value of that position, which should enhance the quality of democratic decision-making.' This statement precisely supports the view that public deliberative debates should be evaluated by their capacity to help citizens' deliberation, thus supporting calls, as made in the present paper, for studies integrating audience reception data.
Among data gleaned in audience-involving studies we may distinguish between data relating mainly to effects and data relating mainly to reception.
Effect data tend to see the audience as objects of the impact of a message (e.g., a debate). Reception data, as noted, view audiences as reflective individuals who observe and evaluate all aspects of a debate, including debaters' rhetorical practices, debate format, and the moderator's role.
Most audience-involving studies expressly consider effect; fewer tally reception. Frequent dependent variables in effect-oriented audience-involving studies are 1) learning occurring in debate audiences (about issues, candidates, etc.) and 2) persuasion, such as how debates affect the audience's positions on issues, their vote preferences, and so forth.
An example of effect studies, Benoit, Hansen, and Verser (2003) present a broad-spectrum meta-analysis of effects on citizens of watching US presidential debates, including effects on learning, perceived issue salience, and vote preference; no reflections on deliberative quality or other normative considerations are offered.
Mutz (2015) has studied effects of incivility in televised debates. Holding all other factors than civility versus incivility constant by using scripted debates, she finds, among other things, that 'incivility breeds distrust' in citizens (Mutz 2015: 194); in a nuanced, normatively oriented discussion of her findings, she argues that a documented negative effect of incivility on trust in politics and politicians is somewhat offset by a tendency of incivility to increase audience arousal and interest. Benoit and Smythe (2003: 111) call for a synthesis of classical (effect-oriented) persuasion research and rhetorical (content-oriented) analysis, arguing that 'we must understand how auditors process and react to rhetorical discourse to have a complete understanding of the rhetorical process.' An effect study that does this, connecting rhetorical features to persuasive effects and involving normative considerations, is Jørgensen and colleagues (1998). They analyzed 37 televised issue-oriented, town hall-format debates, statistically correlating properties of the debaters' rhetoric with their apparent persuasive effects (as reflected in polls taken in the live audiences immediately before and after each debate); from this data set the authors derived hypotheses concerning rhetorical practices and other features that typically characterize winning debaters (i.e., those gaining more votes than their opponents). Normative considerations, drawn from rhetorical theory and argumentation studies, were adduced, suggesting that winners' rhetorical performance accorded rather well with prevalent scholarly views of fair and reasonable debate practices.
Another little-known effect-oriented study from Denmark (Lantz 2013) presents empirically tested normative hypotheses about the best role for moderators in public debates. The criterion was that if an audience watches a debate, then votes on the issue, and is later offered more input on that issue, then the less preference change is caused by this later input, the better is the deliberative quality of the initial debate deemed to be (because it arguably gave the audience more of the input they felt they needed). Results showed that the best role for the moderator is to ask debaters questions raised by the audience and then press energetically for answers.
As for the difference between effect-oriented studies and reception-oriented studies, to which we now turn, it should be recognized that the difference between effect data and reception data is not a clear-cut dichotomy. Some studies involving the debate-audience nexus produce data of both kinds as well as data belonging somewhere on a continuum between them.
Studies rich in reception-related data tend to be more qualitative (i.e., nuanced and varied) but to yield less easily quantifiable responses than mainly effect-oriented studies.
Qualitative data relating to audience reception of a debate may either be gleaned after the debate or during the debate (real-time or 'concurrent' data); both kinds may be gleaned in the same study.
Qualitative reception data gleaned after a debate may, for example, come from focus groups who talk freely about debates they have watched. Livingstone and Lunt (2002) is a comprehensive study involving focus groups that have watched televised debates with participating studio audiences.
Reception-oriented data collected from informants in real time (i.e., while they watch a debate) are presented by Reinemann and Maurer (2005: 781), who used continuous response measurement (CRM): informants continuously record their personal impression of the debates with a seven-point dial device; this data was correlated with postdebate audience verdicts on the debaters.
Coleman and colleagues (2018: 5) also present a real-time reception-oriented study. They express reservations about quantitative audience studies that only record audience response on a positive-negative scale, including studies that use a turn dial to allow for degrees. They argue that the reasons why people express the preferences they do remain unknown and unexamined. More broadly, the problem with current technologies is that they fail to reflect the complicated relationship that always exists in acts of reading, viewing, and decoding between the text, social reality, and viewers' thoughts and experiences.
Instead, Coleman and colleagues (2018: 2) argue for 'a conceptual shift in real-time studies from measuring the preferences of audience members to capturing their sense of whether their capabilities are advanced through media use.' Their study involves real-time continuous feedback by allowing informants to continually choose between several re-formulated response statements.
Like Coleman and colleagues (2018: 6), I argue for studies producing nuanced, qualitative data. I agree that a simply negative response, for example, has little informativity; as they say, it might reflect the fact a viewer does not support a particular leader and his or her ideas. But it might also be because a viewer is frustrated with the debate in general, feels excluded from the discussion, believes she is misunderstood and misrecognized, or lacks the information she needs to understand specific claims.
Other reasons too might underlie a simply negative response. Likewise, simply positive responses may mean anything from 'I agree' over 'Well said' to 'This is entertaining.' Against the background of this cursory, exemplified typology of debate studies, I argue that especially if we want to develop or apply deliberative quality standards for public debate, we should do studies that consider the debate-audience nexus, not just debates themselves. Further, they should not just produce effect data but data reflecting reception. The rhetorician Jens Kjeldsen (2016) has made a similar, well-argued call. Such studies should glean data near the qualitative end of a quantitative/qualitative continuum, where the latter category reflects in free, nuanced form what happens in the minds of individuals. Such studies may increase researchers' opportunities to explore how audiences understand, evaluate, and use debates and how they believe debates meet their wants and needs. Studies should, explicitly or implicitly, involve or facilitate normative considerations centered on deliberative quality. All this might be the case if they allow informants either to choose between several nuanced responses or to produce free responses. Coleman and colleagues (2018) did the former; the study I will report did the latter. Like classic 'protocol analysis' (Ericsson & Simon 1984), it used real-time, free verbal responses, but, as an unusual feature, responses in written form.

A Qualitative Reception Study
Below, I present a qualitative study of a group of young individuals who watched a televised debate and produced free, written, real-time (concurrent) responses to it. Their responses, collated with the debate content that triggered them, constitute data that may offer advice on how to turn into practice the theoretically based criteria advanced above for quality in public deliberative debate.
The debate in question was televised by the public service broadcaster TV2 before the June 2015 general election in Denmark. A radio station had asked a class of 23 students at a folk high school to watch the debate; their average age was 20.6; thus, most were potential first-time voters (the voting age is 18). I was asked to comment on the debate and on the students' responses.
The debate was titled 'The Summit Meeting' (TV2 2015). The debaters were Ms. Helle Thorning-Schmidt (HTS), then prime minister, leader of the Social Democrats, and Mr. Lars Løkke Rasmussen (LLR), former prime minister, leader of The Left: The Liberal Party of Denmark. The moderator was TV2 host Cecilie Beck (CB).
I organized for audience members to tally their responses using a self-devised variety of protocol analysis, a methodology developed by Ericsson and Simon (1984). Classic protocol analysis uses audio-recorded think-aloud protocols where individual informants talk freely while performing some cognitive activity. My informants watched the debate together; instead of thinking aloud, they jotted down their free responses in real time, using prepared response sheets, marking time codes for all entries. They were also allowed to draw smileys like ☺ or ☹ for faster responses. In a receptive activity like this, think-aloud would cause informants to disturb, distract, and influence each other, compromising the tallying of their responses. Concurrent written protocols, instead of after-the-fact methods like questionnaires, focus groups, or interviews, offer some of the advantages of thinking aloud without the drawbacks. They may open a window on processes in informants' minds while they watch, not as they try to recall them afterwards, and they facilitate fine-grained analysis of informants' responses and collation of them with what triggered them, including the rhetorical practices of the debaters. This method has not, to my knowledge, been used by other political communication scholars, except in one Norwegian study (Vatnøy, Kjeldsen & Andersen 2020), which states that it adopted the method from my conference presentation of the study reported here.
Such a design admittedly produces no generalizable quantitative data but may offer observations of the kinds that qualitative studies are well suited to make, identify phenomena that may have received little notice, and generate hypotheses that quantitative studies may pick up and test. Thus, Karpf and colleagues (2015: 1890) see 'qualitative methods as a necessary part of the empirical and theory-building enterprise of political communication research.' Figure 1 below shows a filled-in response sheet (from a female student of 21). At the top she has (voluntarily) stated her first name, age, and gender. Before the debate, informants were invited to state their voting intention in the upcoming election (all 10 parties running in the election were named), indicating how certain they were about that intention. Response sheets have three columns: Time, Comment, and Smiley. At the bottom, informants are invited to state their voting intention after the debate and freely offer their overall impressions.
I transcribed the entire debate, entering all the informants' responses and smileys, including time codes, aligned with those chunks of the debate that they were simultaneous with.
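The collation step described here can be illustrated computationally. The following is a minimal sketch in Python, offered only as an illustration and not as a description of how the transcript was actually prepared. The class `Segment`, the functions `to_seconds` and `align`, the segment boundaries, the speaker assignments, and the informant label are all hypothetical; only the quoted comment and its time code (20:35) come from the study.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One chunk of the debate transcript (hypothetical structure)."""
    start: int                 # seconds from the start of the debate
    speaker: str               # e.g. "HTS", "LLR", or "CB"
    text: str
    responses: list = field(default_factory=list)


def to_seconds(timecode: str) -> int:
    """Convert an 'MM:SS' time code, as jotted on a response sheet, to seconds."""
    minutes, seconds = timecode.split(":")
    return int(minutes) * 60 + int(seconds)


def align(segments, responses):
    """Attach each (timecode, informant, comment) entry to the transcript
    segment that was under way at that time code."""
    ordered = sorted(segments, key=lambda s: s.start)
    starts = [s.start for s in ordered]
    for timecode, informant, comment in responses:
        # Index of the last segment starting at or before the time code.
        i = bisect_right(starts, to_seconds(timecode)) - 1
        ordered[max(i, 0)].responses.append((timecode, informant, comment))
    return ordered


# Placeholder segments: boundaries and speakers are invented for illustration.
segments = [
    Segment(to_seconds("20:00"), "HTS", "<transcript chunk>"),
    Segment(to_seconds("20:30"), "LLR", "<transcript chunk>"),
    Segment(to_seconds("21:10"), "CB", "<transcript chunk>"),
]
responses = [
    ("20:35", "informant-A",
     "Sometimes one wonders whether Helle and Lars live in the same country."),
]

aligned = align(segments, responses)
for seg in aligned:
    print(seg.speaker, seg.start, len(seg.responses))
```

A sorted list of segment start times with a binary search keeps the lookup simple; any scheme that maps a time code to the chunk under way at that moment would serve equally well.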
In the subsequent analysis, the themes under which the informants' comments are categorized were inductively generated as clusters of mutually similar comments gradually formed; pre-existent theoretical constructs were only used as themes if they effortlessly fit observations (as in themes #1 and #5-see below). This reflects the exploratory nature of the study. As will be clear, the themes are not entirely mutually exclusive, and only the largest clusters of responses are presented as themes. The aim has not been to establish these themes as separate constructs but to convey main aspects of the audience's reception of the debate.

Findings
In 1 hour, the 23 informants made 327 written, time-coded response entries (not counting the smileys that accompanied many of them); that is a little over 14 entries for each informant, or on average 1 every 4 minutes. A few entries contained distinct responses on separate aspects of the debate. Many entries were 30-40 words long; the longest was 79 words. This shows that the informants attended closely to the debate, continually reflecting on it. It also testifies to the viability of real-time reception analysis with written protocols.
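The averages just given can be checked with a few lines of arithmetic. The sketch below uses only the figures reported above (23 informants, 327 entries, a 60-minute debate); the variable names are chosen for this illustration.

```python
# Figures reported in the text.
informants = 23
entries = 327
debate_minutes = 60

# Average number of written entries per informant.
entries_per_informant = entries / informants

# Average interval, in minutes, between one informant's entries.
minutes_between_entries = debate_minutes / entries_per_informant

print(f"{entries_per_informant:.1f} entries per informant")    # 14.2
print(f"one entry every {minutes_between_entries:.1f} minutes")  # 4.2
```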
Significantly, most responses (275) referred to the debaters' rhetorical practices, including both content and verbal form; only 33 expressed agreement or disagreement with the debaters' policies or political positions in other ways. Twenty-two responses concerned aspects of the debate format and the moderator's performance. The informants had simply been asked to jot down any comments; therefore, because their responses focused so strongly on the debaters' rhetorical practices, these practices must have been salient in the informants' minds, while agreeing or disagreeing with the debaters must have been less important.
Also significantly, most responses and smileys by far expressed disapproval of some of the debaters' rhetorical practices. Exact numbers cannot be determined, as many responses may be read either as neutral or as evaluative, for example: 'Sometimes one wonders whether Helle and Lars live in the same country.' This response occurs at time code 20:35 in the sheet printed above (Figure 1); it may be a purely neutral observation, but in the context, it is perhaps meant as criticism, because nearly all this informant's other entries decry the debaters' constant needling of each other. Criticism is often explicit, for example: 'Stop talking about what happened 4 years ago Helle T. K.' The smileys too mostly suggest disapproval, with 80 ☹ smileys against 28 ☺ (74% against 26%); many happy smileys, judging by the verbal responses they accompany, concern the entertainment value of the debate as such rather than what debaters specifically said. Thirty-seven smileys (such as 😐) express indifference or must otherwise be placed in a residual category; some are self-devised by informants and hard to interpret, but many may be read as 'not-too-pleased.' In the sheet above we find nuances such as smileys with a slanting mouth or an S-shaped mouth; another has a question mark for nose and mouth. Other variations include 'SUK!!' [Danish for 'SIGH!!'] in the smiley column. In verbal responses, as in smileys, the informants recorded a wide range of utterance types; had they only been able to express agree/disagree, their intentions (as Coleman and colleagues (2018) emphasize) would have been unclear. For example, many disapproval responses might then have been read as disapprovals of a debater's politics, but the free, nuanced responses show that by far the most frequent objects of disapproval were not the debaters' politics but some of their rhetorical practices. Oddly, few debate studies illuminate how audiences like debaters' rhetoric as such.
Mutz (2015: 80), for example, using a battery of quantitative data, detects a 'negative response to incivility rather than disagreement,' but even Mutz's work mainly looks at the effects of incivility, rather than informants' reception of it.
Before the debate, 20 out of the 23 informants stated their voting intention in the upcoming election on a 5-point scale (indicating degree of certainty). Of these 20, 14 stated it again after the debate. No party changes occurred; one informant became more certain she would vote for the party she had named before the debate (not one represented by the debaters). Another initially named three parties, but afterwards only one (also not represented by the debaters). This suggests that whatever minimal preference change the debaters' rhetoric may have effected was counter-persuasive (i.e., apt to repel, not attract voters). Remarkably, among these informants, two top politicians' rhetorical efforts to draw potential voters towards themselves and their parties were at best useless.2
In what follows, I identify and discuss the themes most often instantiated in the responses. They are all themes of disapproval. (Numbers at responses indicate which informants made them.)

Disapproval Theme #1: Debaters are Uncivil. Mutz (2015: 6) describes 'incivility' as 'violations of norms for interpersonal interaction, the type of behavior that would be impolite in face-to-face contexts.' This theme is expressed 73 times by 20 of the 23 informants. The term mudslinging [mudderkastning] occurs 12 times and bickering [mundhuggeri] 9 times.

Disapproval Theme #2: Debaters Talk about Their Opponent's Policies When They Should Talk about Their Own. Nineteen informants mention this theme 47 times, mostly with disapproval. The following typical episode begins with the moderator reading a question from a viewer:
CB 'What will you do to make it attractive both for individuals and for companies to be environmentally conscious?'
HTS Well, for one thing I am really glad that the Liberals, after not believing in climate change for many years, are now in a place where they do believe in it. But it's also a fact that they had to be forced to go there. We're glad that we have a deal, but the Liberals are constantly signaling that they want to go back on that deal.
No fewer than 13 informants disapprove of this tirade and berate Thorning-Schmidt for immediately 'dumping on' Løkke, while dodging the question. Theme #2 also occurs repeatedly and emphatically in positive form: 'we want to hear your policies, not how good you are at slamming each other,' 'Helle … talks more about the Liberals' "mistakes" than about her own objectives and key issues,' 'Come on, say what you wanna do instead of smearing each other.'

Disapproval Theme #3: Debaters Talk about the Past, Not the Future. This theme occurs in 41 comments by 17 informants. While remarks representing the two previous themes come in clusters, remarks expressing this theme come scattered across the debate. Here follows a chunk of the debate along with some of the responses it drew:
CB (reading audience question): ' … what will you do to avoid the skewed development where all the jobs go to the cities?'
HTS I think, first, we should be glad that four years ago, we were in a situation where we had lost 100,000 jobs, the EU had reprimanded us, we were falling back in regard to competitive power and productivity.
Eight informants responded to the prompt 'After the debate - what do you think? Write freely.' Nearly all these responses were also strongly negative, reiterating the same themes as the time-coded entries. The remark 'I am no wiser' recurred.

What They Want. Nearly all the disapproving responses instantiate the above themes. There are also approving responses, albeit far fewer; they suggest what kinds of rhetoric these young citizens might have welcomed.
Tellingly, most approving comments concern the moderator. Approval for the debaters is scarce. However, some examples are:
2 concrete proposal - that we understand
5 (on Rasmussen): gives clear answer, argues for his own case ☺
6 Helle talks about working together with the Liberals ☺
8 Løkke acknowledges that his own policies need straightening up
15 At last we get some of the Liberals' visions - great ☺.
17 Helle T's argument on growth was good, well executed and intelligent, I disagree with it, but good anyway
17 Again Lars Løkke argues better than Helle
Arguably, the most important takeaway from the responses, whether of approval or disapproval, is this: the informants want deliberative discourse in the original Aristotelian sense but fail to get it. They want the main subject matter to be debaters' divergent proposals for future action, with argumentation for and against these proposals. They want concrete policy statements from the debaters, not debaters' mutual attacks on each other for past failures, but besides policy statements they also want good arguments for these policies, and they want debaters to answer critical questions and counterarguments. Thus, they largely seem to endorse the theoretically based criterion presented above. This allows us to use their responses to better understand how a public debate might best meet our criterion (which this debate largely failed to do).
These informants do not want a debate that tilts towards the epideictic: whenever debaters slide into negative epideictic (mutual blame, sarcasm, needling remarks), many disapprove. Disapproval themes #1 and #2 both attest to this and can be said to represent negations of what the informants mainly want: they prefer civility to incivility, and they indicate repeatedly that debaters should not 'dump on' their opponents' policies but present their own. However, there are also responses decrying self-praise (Benoit's 'acclaims'). Informants likewise react against quasi-forensic debate: they dislike accusatory attitudes in both debaters and are impatient when they talk about the past; disapproval theme #3 shows that they want proposals about future action (i.e., about things that 'ultimately depend on ourselves, and which we have it in our power to set going' (Aristotle 1995, Rhetoric, 1359a)). They expect the debaters to advance their concrete, divergent proposals and to 'argue for their own case,' with no expectation that consensus ensues. In Mutz's (2015) words, what puts informants off is 'incivility rather than disagreement.' This does not keep informants from approving when debaters acknowledge that their proposals have weaknesses or when they suggest that they may collaborate across the political divide, but they react against debaters' straw man portrayals of each other and expect debaters to answer critical questions, not dodge them.
The young citizens in the present study focus on two top politicians' rhetorical practices because they largely disapprove of them. Their assessment only rarely concerns the politicians' policies, and the persuasive effect of these two political leaders' rhetoric on our informants seems to be nil or negative. Hardly any preference change occurs, and only of the negative kind, which tallies with their predominantly negative assessment of the debaters' rhetoric. This debate does not cause the informants to pay much attention to the debaters' agendas or their arguments for them, mainly because the debate gives them too little in that regard, being dominated by the other types of content that the debaters provide: incivility against each other, talk about the past, an overdose of talk about the opponent's agenda. Information of the kind that the informants want yields to mutual interruptions and evasive answers. Given the way this debate unfolds, it is far from fulfilling our theoretical criterion, at least in the eyes of these informants. Because they, as we saw, seem to be broadly in accord with that criterion, we may use the qualitative data reflecting their reception as input on how a debate may be conducted in order to live up to our criterion in practice, or rather, on how it should not be conducted.

Takeaways: Hypotheses and Implications
As for deliberative debate practice, a study like this suggests that public deliberative debates, in order to best serve their primary desirable function (that is, to be useful input for citizens' deliberations), should discourage the rhetorical practices causing audience disapproval in this study and promote the practices our informants call for but find lacking.
This would imply that in formatting and moderating debates, organizers and moderators might have these rules of engagement in mind: Debaters should be disciplined to behave civilly, allow their opponents to talk and not slide into bickering, mudslinging, sarcasm, or interruption. Debaters should be pushed to focus primarily on their own policies and proposals for the future and not talk about the past; more specifically, they should not spend disproportionate amounts of time blaming opponents for their past performance. Moderators should inculcate these rules not just before debates, but also when needed during debates. They should ask debaters to speak concretely and not only state their policies but also argue for them. They should press debaters to answer critical questions and counterarguments. They should interfere with debaters' use of straw men. They should encourage debaters to concede opponents' possible merits and acknowledge weaknesses of their own. They should ask debaters to recognize points of agreement and openings for collaboration or compromise, without expecting consensus.
Besides yielding input that may be of use regarding how public debates should be conducted in order to best fulfil their democratic function in practice, this qualitative study might also be useful in the way Karpf and colleagues (2015) suggest: as part of the empirical, theory-building enterprise of political communication research, because it makes visible a type of debate reception among citizens that has not yet received much notice in debate studies.
These young people are impatient with politicians' finger-pointing and bickering about the past and expect politicians to propose concrete future action. That is not surprising. They have longer futures than the politicians who want their votes. We cannot guess how prevalent their alienated and disdainful attitude to the top politicians' rhetorical practices is in the population, but the present study can at least tell us that such an attitude exists. Conceivably, many other young citizens might feel the same way as the informants in the present study about the kind of political communication typically coming to them. In fact, it is thinkable that many citizens in all age groups feel that much political communication and debate is more likely to repel than to engage them. More studies, deploying quantitative and qualitative methods, might illuminate these questions.
In the debate reported above I was dismayed to see a country's two top politicians displaying rhetorical practices that alienated a group of enlightenment-seeking young citizens. The good news is that these millennials, members of a European generation of young people often said to be unusually passive politically, mustered an attitude that was both critical and constructive. Their reception of the debate did not suggest apathy but alienation from the way these politicians spoke. These informants may be seen as 'assertive citizens' (a notion analyzed from many angles in Dalton and Welzel (2014)): their critical attitude to what they saw expressed a wish for a better, more deliberative democracy.
Another consideration to take away is this: as noted, the studies by Mutz (2015) raised a caveat worth applying to the study presented here. In Mutz's (2015: 199) research, people found her uncivil debates 'far more lively and entertaining than the civil ones.' She therefore asks 'how such programs can eliminate incivility, yet maintain their audiences' (Mutz 2015: 210) and suggests several ideas to that end. As noted above, parallel functions of such debates might be to attract larger audiences by being lively and entertaining and thereby perhaps increase voter engagement and turnout.
A response to this might be that the present study has concerned itself with deliberative value, not entertainment value. It is important to note, as Mutz (2015) does, that these two kinds of value are not identical and do not necessarily coincide, but neither are they necessarily contradictory. It is certainly relevant to ask whether public debates might be deliberatively valuable and entertaining at the same time, for example by offering something other than incivility to entertain and arouse. Also, it is not a given that entertainment is the only kind of value that may attract audiences. These are issues for further research and experimentation.
A final takeaway from this paper is methodological: a protocol analysis design as employed here, involving real-time written protocols, is clearly viable and might show a path for future reception-oriented research, supplementing studies that are purely textual, mainly effect-oriented, or mainly quantitative. Generally, the study suggests that it might be profitable in future to attend more closely to citizens' reception of what they are offered by way of political communication.

Notes
1 The Left is in fact a conservative-liberal party somewhat to the right of the median in Danish politics; its name has a historical explanation which, for brevity, is omitted here. In the translated transcriptions below I will use the term 'the Liberals.'
2 One might speculate that the debaters, being equally effective, equalized each other's vote-drawing effect. But then we should have seen both draw votes from informants backing other parties. Nothing like that happened.
3 In 2011, Thorning-Schmidt's opponent, Løkke Rasmussen, was prime minister; his party, the Liberals, had then been in government since 2001. This explains why the informant sees Thorning-Schmidt's remark as finger-pointing.