
Bos, Foster & Matheson (eds): Proceedings of the sixth workshop on the semantics and pragmatics of dialogue (EDILOG 2002), 4-6 September 2002, Edinburgh, UK, Pages 13-20.

Referential form and word duration in video-mediated and face-to-face dialogues

Anne H. Anderson

Barbara Howarth

Multimedia Communications Group, Department of Psychology, 58 Hillhead Street, University of Glasgow, Glasgow, G12 8QB

Multimedia Communications Group, Department of Psychology, 58 Hillhead Street, University of Glasgow, Glasgow, G12 8QB

Abstract

It is widely believed that speakers adapt their speech to meet the comprehension needs of their listener. Yet recent work in face-to-face communication has shown that word articulation and the design of referring expressions are insensitive to certain aspects of listener knowledge (Bard et al., 2000; Bard & Aylett, 2001). Word articulation and referential form were investigated in two studies. Study 1 examined the duration of words forming the names of landmarks on a map in video-mediated dialogues. Study 2 explored the impact of cognitive load (as illustrated by time pressure) on word duration and referential form in both face-to-face and video-mediated communicative contexts. It was found that second mentions of words were articulated more quickly than first mentions regardless of which interlocutor introduced the names of the landmarks (study one). Study two showed that speakers responded to time pressure by shortening the names of landmarks on a map but only when the task was video-mediated. Speakers did not, however, resist the pressure to shorten second mentions of words relative to first mentions under time pressure. The implications of these findings are discussed in terms of the Dual Process Model of speech processing in spoken dialogue proposed by Bard et al. (2000).

1 Introduction

Spoken dialogue presents a particularly challenging area of research since speakers must produce utterances in real-time, interact with another person, and cope with the demands of the specific task. Nowadays the onslaught of computer-based technologies, such as videoconferencing systems, provides novel communicative contexts in which a dialogue may take place. It is possible that communicating in such a setting places additional demands on the speaker. This raises the question of whether spoken output will be affected by the communicative context in which a dialogue takes place.

2.1 Variation in spoken output in face-to-face communication

In face-to-face communication, referring expressions have been shown to vary in terms of the way expressions referring to entities within a discourse are chosen and in terms of the way words forming referring expressions are articulated. Studies have demonstrated that the articulatory quality of words varies with factors such as sentence context (Hunnicutt, 1985; Lieberman, 1963) and repeated mention (Fowler & Housum, 1987; Robertson & Kirsner, 2000). Fowler and Housum (1987), for example, demonstrated that repeated, or second, mentions of word tokens were articulated more quickly and less clearly than introductory mentions of words. Mere repetition was not enough to induce the effect. Rather, both words had to refer to the same entity (Fowler, 1988).

With respect to referential form, it has been shown that speakers vary the descriptive content of referring expressions depending on factors such as assumed listener knowledge (Fussell & Krauss, 1992; Isaacs & Clark, 1987), the typicality of an object (Dell & Brown, 1991), and the cognitive load associated with a given task (Horton & Keysar, 1996; Rossnagel, 2000). Generally speaking, referring expressions tend to be shorter and less explicit if the speaker can assume that the listener is familiar with the object of conversation, that the object being referred to is typical given the context, or if the cognitive load associated with the task is increased.

Cognitive load can be thought of as the “mental energy” required to process a given amount of information (Sweller, 1988) and can be increased by imposing a time pressure or increasing the difficulty of a task. Rossnagel (2000), for instance, showed that speakers used shorter descriptions of component parts of a model, such as a technical term alone (rather than a term plus a supplementary description), when the difficulty of the task was increased.

Traditionally, differences observed in articulatory quality and referential form have been interpreted in terms of the belief that speakers adapt their speech in order to produce utterances that will be comprehensible to the listener (Clark & Clark, 1977). The finding that speakers varied referring expressions depending on what the listener could be assumed to know (Fussell & Krauss, 1992; Isaacs & Clark, 1987) has been taken as evidence in support of this claim. The use of shorter, less explicit expressions under conditions of high cognitive load has also been interpreted in terms of adaptation to the listener. Rossnagel (2000) suggests that choosing a referential expression must involve making a complex assessment of how readily a listener will be able to interpret a given referring expression. The use of less explicit referring expressions under conditions of high cognitive load is taken as an indication that the speaker has sacrificed adjustment to the listener’s perspective.

The view that speakers adapt their speech to the comprehension needs of the listener has also been applied to the articulatory shortening of second mentions of words relative to first. Fowler and Housum (1987) showed that listeners could make use of differences in articulatory quality to distinguish old, or given, information from “new” information. Given information has been described as that “which the speaker assumes is already in the addressee’s consciousness” (Chafe, 1974). It has been argued that if a word is partially specified by the Given status of the entity to which it refers, the speaker may assume that the listener would not require as clear a signal. Thus words referring to given entities (by virtue of previous mention, for instance) are articulated more quickly and less clearly than words referring to new entities.

Several researchers have questioned the position that variation in spoken output is tailored to the comprehension needs of the listener. Dell and Brown (1991), for example, reasoned that certain within-clause structures could be adequately accounted for in terms of generic language processing mechanisms. Horton and Keysar (1996) suggest that certain stages of speech production, such as initial planning, may be dependent on the speaker’s own knowledge and that models of what the listener can be assumed to know are implicated in the later stages as part of a correction mechanism. Bard et al. (2000) showed that speakers failed to mitigate the attenuation of repeated mentions of words in cases where the entity being referred to could not be taken as given by the listener. If speakers adapted their speech to the listener, one would expect the speaker to resist the pressure to attenuate repeated mentions of words in such cases.

2.2 The Dual Process Model of speech production in spoken dialogue

In order to account for their results, Bard et al. (2000) proposed a Dual Process Model of speech production. The model is based on that proposed by Dell and Brown (1991; Brown & Dell, 1987) and holds that two basic types of processes underpin speech production in spoken dialogue. On the one hand, articulatory effects, such as the attenuated articulation of repeated mentions of words relative to introductory mentions of words, are attributed to priming processes which are fast, automatic and dependent on the sole experience of the speaker. Priming can be triggered by given status and result in faster articulation and reduced articulatory detail. The claim that givenness per se triggers priming would account for the observation that introductory mentions of words that can be taken as given by virtue of physical presence tend to be articulated more quickly and less clearly than mentions referring to new entities (Bard & Anderson, 1994). This would also explain why second mentions of words were attenuated relative to first mentions, regardless of whether both mentions were uttered by the same speaker or by different speakers. Since priming is triggered by given status, it is unimportant which interlocutor introduces an item into the dialogue.

In contrast to priming processes, complex processes, such as decision making and maintaining models of the listener, are slower and are subject to the demands on the speaker’s time and attention. If the demands on the speaker’s attention are high, or time is short, there may not be sufficient time to assimilate feedback from the listener or interpret what the listener can be assumed to know. These types of processes offer an account of why a speaker’s control of word articulation appears to be insensitive to these aspects of listener knowledge (Bard et al., 2000; Bard & Aylett, 2001). The processes involved in interpreting feedback or drawing inferences about what the listener knows are deemed too slow to precede articulatory production on a word-to-word basis. Consequently, priming is not mitigated in these circumstances.

The Dual Process Model predicts that articulatory control and the design of referring expressions may function differently in spoken dialogue. In support of this claim, Bard and Aylett (2001) showed that both word articulation and referential form were insensitive to subtle aspects of listener knowledge such as indications about what the listener could or could not see. However, referential form was more sensitive than word duration to gross aspects of listener knowledge, such as listener identity, and to aspects of speaker knowledge, such as what the speaker could or could not see. Thus the design of referring expressions did not appear to be based on the same criteria as word articulation. More specifically, articulation must be designed without regard to costly information such as listener knowledge. The design of referential form, on the other hand, may be sensitive to such knowledge depending on the demands on the speaker and the time available.

3 Video-mediated and face-to-face communication

Much of the research in video-mediated communication has tended to focus on aspects of communication such as dialogue length, turn-taking and interrupting. Thus it is not clear how speakers will articulate words or choose referring expressions in a video-mediated context compared with a face-to-face context. There are, however, certain features associated with video-mediated communication which suggest that speakers may behave differently in this context. For example, it has been suggested that participants may view the technology as novel (Doherty-Sneddon et al., 1997), while Clark (1996) suggests that certain conversational settings may restrict the language used if the participants are not co-present. Social presence theorists suggest that the remoteness of the communicative situation may impose a sense of social distance, or lack of salience of the other person and their common surroundings (Short, Williams, & Christie, 1976).

The aim of the paper is to explore the influence of communicative context (video-mediated and face-to-face) and cognitive load on speech production in spoken dialogue.

4. Studies of word duration and referential form in video-mediated and face-to-face dialogues

An implication of the Dual Process Model (Bard et al., 2000) is that the formulation of referring expressions does not appear to be based on the same criteria as the control of word articulation. The model suggests that if time is short, there may be insufficient time to access and respond to information relating to the communicative context. Thus one might expect the referring expressions to be more sensitive to communicative context under time pressure than word articulation. In order to answer these questions we conducted two studies which examined word duration and referential form in video-mediated and face-to-face dialogues.

4.1 Study One: word duration in video-mediated dialogues

4.1.1 Method

Design

In order to address the question of whether video-mediated communication would function in the same way as face-to-face communication, we conducted an initial study to examine the duration of words forming referring expressions in video-mediated dialogues. If the articulation of repeated mentions of words is unaffected by communicative context, we would expect the same pattern of results observed in face-to-face communication, namely, that repeated mentions of word tokens would be shorter in duration than introductory mentions regardless of which speaker introduced the entity into the discourse (Bard et al., 2000). To test this, we compared first and second mentions of words forming the names of landmarks on a map in same- and different-speaker-repetition conditions. In the same-speaker condition, both first and second mentions were uttered by the same speaker. In the different-speaker condition, first and second mentions were uttered by different speakers.
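The same- and different-speaker conditions described above can be sketched in code. The following is a minimal illustration, assuming a chronological list of word mentions annotated with the landmark referred to and the speaker role; the `Mention` type, the `pair_mentions` function and the example data (including the landmark name "old mill") are hypothetical and are not taken from the study's materials.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    landmark: str  # entity the word refers to
    word: str      # word token forming part of the landmark name
    speaker: str   # hypothetical role label, e.g. "giver" or "follower"


def pair_mentions(mentions):
    """Pair the first and second mention of each (landmark, word).

    `mentions` is assumed to be in chronological order. Each pair is
    labelled "same" or "different" according to whether both mentions
    were uttered by the same speaker. Third and later mentions are ignored.
    """
    first_seen = {}
    paired = set()
    pairs = []
    for m in mentions:
        key = (m.landmark, m.word)
        if key not in first_seen:
            first_seen[key] = m
        elif key not in paired:
            first = first_seen[key]
            condition = "same" if first.speaker == m.speaker else "different"
            pairs.append((first, m, condition))
            paired.add(key)
    return pairs


# Invented example data, for illustration only.
example = [
    Mention("tourist spot", "spot", "giver"),
    Mention("old mill", "mill", "giver"),
    Mention("tourist spot", "spot", "follower"),
    Mention("old mill", "mill", "giver"),
    Mention("old mill", "mill", "follower"),  # third mention, ignored
]
for first, second, condition in pair_mentions(example):
    print(first.word, condition)
```

Keying on both landmark and word reflects the requirement, noted in the introduction, that both tokens refer to the same entity rather than merely repeat the same word form.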

Materials

The materials for this study were drawn from 48 dialogues recorded as part of a previous study (Anderson et al., 1999) in which two participants performed a collaborative problem-solving task (the Map Task; Brown, Anderson, Yule, & Shillcock, 1984) in a video-mediated communicative context. The 48 participants were students of the University of Glasgow and the University of Nottingham. They performed two versions of the Map Task, changing roles as Instruction Giver and Instruction Follower. The participants were based at Nottingham and Glasgow and the connection between the two cities was made via the SuperJANET ATM network.

Procedure

Speaker-per-channel recordings of the dialogues were made onto a computer at a sampling rate of 16 kHz. The duration of 649 pairs of words forming part of the same landmark name on the map was measured in milliseconds using speech analysis software. The beginning and end of each word were determined by examining waveform and spectral representations of the words using the Syntrillium waveform editor Cool Edit.

4.1.2 Results

A two-way ANOVA was carried out on the data with Mention treated as a within subjects variable and Speaker Repetition treated as a between subjects variable. Table 1 shows mean word duration for repeated mentions of word tokens uttered by the same and different speakers.

Table 1: Mean word duration (in milliseconds) for introductory and repeated mentions of lexical items uttered by the same and different speakers

Speaker repetition    1st mention    2nd mention
Same                  378            342
Different             364            330

There was a main effect of mention [F1(1,78) = 22.98, p < 0.01; F2(1,647) = 88.942, p < 0.01] with no effect of speaker-repetition and no interaction. Second mentions of word tokens were articulated more quickly than first mentions regardless of whether repeated mentions of word tokens were uttered by the same speaker or by a different speaker.
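The size of the shortening can be read directly off the cell means in Table 1. The short sketch below computes the absolute and proportional reduction for each condition; it uses only the four reported means, so the figures describe differences between cell means rather than per-pair differences, and the `reduction` helper is an illustrative name.

```python
# Mean word durations in milliseconds, as reported in Table 1:
# condition -> (first mention, second mention)
table1 = {
    "same": (378, 342),
    "different": (364, 330),
}


def reduction(first_ms, second_ms):
    """Return (absolute reduction in ms, reduction as % of first mention)."""
    diff = first_ms - second_ms
    return diff, 100.0 * diff / first_ms


for condition, (first_ms, second_ms) in table1.items():
    diff, pct = reduction(first_ms, second_ms)
    print(f"{condition}: {diff} ms shorter ({pct:.1f}%)")
# prints:
# same: 36 ms shorter (9.5%)
# different: 34 ms shorter (9.3%)
```

The near-identical reductions in the two conditions are in line with the reported absence of a mention by speaker-repetition interaction.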

4.1.3 Discussion

The pattern of results observed in study one was similar to that observed by Bard et al. (2000). This suggests that, in terms of articulatory quality at least, video-mediated communication functions in the same way as face-to-face communication despite any sense of novelty or social distance that might be associated with this medium.

4.2 Study 2: Word duration and referential form in video-mediated and face-to-face dialogues

4.2.1 Method

Design

The purpose of Study Two was to explore the effect of cognitive load and communicative context on word duration and referential form through a comparison of face-to-face and video-mediated dialogues. Cognitive load was manipulated by imposing a time pressure. Thus a timed condition, where a three-minute time limit was imposed on the task, was compared with an un-timed condition, in which participants performed the task in their own time. The design was counterbalanced for task order and map, but task role was randomly assigned.

If a three-minute time limit was sufficient to put participants under pressure, timed dialogues should be shorter in duration, contain fewer words and lead to less accurate task performance. Hence, dialogue length (in terms of words and duration) and task accuracy were also examined.

The Dual Process Model predicts that the formulation of referring expressions should be more sensitive to demands on the speaker’s time and attention than word articulation. It was expected, therefore, that increased cognitive load would have a greater influence on referential form than word duration. Factors associated with video-mediated communication, such as social distance and novelty, may restrict communication and this may place extra demands on the speaker. Consequently, any effect of cognitive load would be expected to be more pronounced in video-mediated dialogues than in face-to-face dialogues.

Participants

The participants in this study were all undergraduate students of the University of Glasgow and were native speakers of English.

Task

Two electronic versions of the Map Task (Brown et al., 1984) were created using Adobe Photoshop. Printed versions of the maps were used in the face-to-face conditions.

Procedure

Participants were assigned to either a face-to-face or a video-mediated setting and performed the Map Task in timed and un-timed conditions. In the timed condition, participants were interrupted at one-minute intervals and told that there were two minutes to go, that there was one minute to go, and that the time was up. In the un-timed condition participants completed the task in their own time. Tape recordings of the dialogues were made and, of the 82 dialogues recorded, 64 were retained for analysis. These were transcribed and recorded onto a computer at a sampling rate of 16 kHz.

Technical set-up

In the video-mediated condition, participants were located in different rooms and communicated via a desktop videoconferencing system on which the Map Task and a video-window of the other participant were displayed. The audio and visual signals were delivered in synchronisation. Full duplex audio was used and the video image was refreshed at a rate of 25 frames per second. In the face-to-face condition, audio recordings of the dialogues were made using the same equipment used to record the video-mediated dialogues. Participants sat opposite each other and were unable to see each other’s maps.

4.2.2 Dependent variables

Dialogue length and task accuracy

A measure of task accuracy was obtained by comparing the original route on the map with that reproduced by the Instruction Follower. The area between the two routes provides a measure of the degree to which the Instruction Followers deviate from the route. The duration of the dialogues was measured in seconds and the number of words (including interjections) was counted.
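One way to compute an area between two routes is sketched below. It is an illustration only: the paper does not say how the area was calculated, so the representation of routes as lists of (x, y) points, the function name and the use of the shoelace formula are all assumptions. The sketch also assumes that the two routes share their start and end points and do not cross; where routes cross, signed areas partly cancel and the result underestimates the deviation.

```python
def area_between_routes(original, drawn):
    """Area enclosed between two routes given as lists of (x, y) points.

    The routes are joined into one closed polygon (the original route
    followed by the drawn route in reverse) and the shoelace formula is
    applied. Assumes shared endpoints and non-crossing routes.
    """
    polygon = list(original) + list(reversed(drawn))
    twice_area = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Invented coordinates in arbitrary map units, for illustration only.
original_route = [(0, 0), (4, 0)]
drawn_route = [(0, 0), (0, 2), (4, 2), (4, 0)]
print(area_between_routes(original_route, drawn_route))  # prints 8.0
```

A drawn route that matches the original exactly gives an area of zero, and larger values indicate greater deviation.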

Referential Form

First and second references to landmarks were coded according to the scheme in Table 2 where “0” indicates no shortening from the name printed on the map and “2” indicates the greatest degree of shortening.

Table 2: Shortening scale for referring expressions

Code    Referential form      Example
0       full landmark name    Do you have a popular tourist spot?
1       truncated name        I have a tourist spot, yes
2       pronoun               Ok go right the way round it

There were 774 first-references to landmarks and 631 second-references to landmarks; these were analysed separately. Although functionally distinct, the coding scheme employed here is conceptually similar to that used by Bard and Aylett (2001).
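The three-point scale in Table 2 can be expressed as a simple rule-based coder. The sketch below is hypothetical: it does not describe how references were actually coded in the study, the pronoun list is an assumption, and the landmark name "popular tourist spot" is inferred from the Table 2 examples. The input is assumed to be the referring expression alone, not the whole utterance.

```python
import re

# Assumed set of pronominal forms; not taken from the paper.
PRONOUNS = {"it", "that", "this", "there", "one"}


def tokenize(text):
    """Lower-case a string and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def contains_sequence(words, sequence):
    """True if `sequence` occurs as a contiguous run inside `words`."""
    n = len(sequence)
    return any(words[i:i + n] == sequence for i in range(len(words) - n + 1))


def code_reference(expression, landmark_name):
    """Return 0 (full name), 1 (truncated name), 2 (pronoun) or None."""
    expr = tokenize(expression)
    name = tokenize(landmark_name)
    if contains_sequence(expr, name):
        return 0  # no shortening from the name printed on the map
    if any(word in name for word in expr):
        return 1  # some, but not all, of the name is used
    if expr and all(word in PRONOUNS for word in expr):
        return 2  # greatest degree of shortening
    return None   # expression falls outside the scale


name = "popular tourist spot"
for expression in ["a popular tourist spot", "a tourist spot", "it"]:
    print(expression, "->", code_reference(expression, name))
```

Higher scores indicate more shortening, which matches the direction used in the analyses of referential form.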


Durational measurements

Duration measurements (in milliseconds) of first and second mentions of words forming the names of landmarks on the map were made from waveform and spectral displays using the Syntrillium waveform editor, Cool Edit.
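Given word boundaries marked on a waveform, a duration in milliseconds follows from the sampling rate. The sketch below assumes that boundaries are available as sample indices, which is an assumption about the workflow rather than a description of Cool Edit; the 16 kHz rate is the one reported for the recordings, and the function name is illustrative.

```python
SAMPLE_RATE_HZ = 16_000  # sampling rate reported for the recordings


def duration_ms(start_sample, end_sample, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a pair of word-boundary sample indices to milliseconds."""
    if end_sample < start_sample:
        raise ValueError("end_sample must not precede start_sample")
    return 1000.0 * (end_sample - start_sample) / sample_rate_hz


# Invented boundary positions, for illustration only.
print(duration_ms(8_000, 13_760))  # prints 360.0
```

At 16 kHz one sample corresponds to 0.0625 ms, so the temporal resolution of the recording is far finer than the differences of tens of milliseconds at issue here.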

4.2.3 Results

Cognitive load response

Two-way mixed ANOVAs were carried out on the dialogue data (N = 32). As expected, timed dialogues were significantly shorter in duration [F(1,30) = 29.25, p < 0.001] and contained fewer words [F(1,30) = 26.62, p < 0.001] than un-timed dialogues. Furthermore, route deviation was significantly greater in the timed dialogues than in the un-timed dialogues [F(1,30) = 10.53, p < 0.01]. These results indicated that subjects did, in fact, respond to increased cognitive load. Under time pressure, dialogues were found to be significantly shorter (in words and duration) and produce less accurate performance.

Referential Form

A two-way, independent measures ANOVA was carried out on second-reference referential forms (N = 631). There was no effect of cognitive load or communicative context, but there was an interaction approaching significance [F(1,627) = 3.18, p = 0.07]. Analysis of simple effects showed an effect of cognitive load in the video-mediated group [F(1,627) = 5.47, p = 0.02] but no effect of cognitive load in the face-to-face group (F < 1). Referential coding scores were significantly higher in the timed condition than the un-timed condition but only for the video-mediated group. These results suggest that increased cognitive load influenced the formulation of referring expressions in video-mediated communication but not face-to-face communication. Time pressure led to an increased tendency to use more shortened referring expressions (such as pronouns), but only when the task was video-mediated.

A separate, two-way ANOVA of first-reference referential forms (N = 774) showed an overall effect of cognitive load [F(1,770) = 12.94, p < 0.001], no main effect of communicative context and no significant interaction. Analysis of simple effects revealed that referential shortening scores were significantly higher in the video-mediated, timed condition than in the video-mediated, un-timed condition [F(1,770) = 11.24, p < 0.01]. There was no significant difference, however, between timed and un-timed referential coding scores in the face-to-face dialogues. Thus, as with second references to landmark names, time pressure led to an increased tendency to use more shortened referring expressions (such as pronouns) but only when the task was video-mediated.

Word duration

A three-way ANOVA was carried out on the data with Communicative Context as a between-subjects variable, and Cognitive Load and Mention as within-subjects variables. There were 24 speakers in total with 12 speakers contributing to each cell of the design. As expected, there was a main effect of mention [F(1,22) = 20.47, p < 0.001]. Second mentions of word tokens were shorter in duration than first mentions. Interestingly, there was a main effect of communicative context [F(1,22) = 6.11, p = 0.02]. Word tokens in the video-mediated group were articulated more slowly than word tokens in the face-to-face group. There was no effect of cognitive load. Nor were there any significant interactions.

4.2.4 Discussion

Although the speakers articulated words more slowly overall when communicating in a video-mediated context, the communicative context did not lead speakers to resist the pressure to articulate second mentions of words more quickly than first mentions.

5 Conclusion

The purpose of this paper was to explore the effect of communicative context and cognitive load on two aspects of speech production in spoken dialogue, namely, word duration and referential form. Taken together, the results reported here suggest that communicative context does influence speech production in spoken dialogue, but that word duration and referential form are affected in different ways. First, video-mediated communication seems to function in the same way as face-to-face communication, insofar as speakers reliably reduce the duration of second mentions of word tokens relative to first, in spite of any sense of social distance or novelty that might be associated with this medium. Second, cognitive load, as illustrated by time pressure, did not offset the pressure to reduce repeated mentions of words in either a face-to-face or video-mediated context.

Spoken language did differ between contexts, however, in terms of the influence of cognitive load on referential form and in terms of word articulation in general. First, speakers articulated words more slowly overall in a video-mediated context. Second, when under time pressure, speakers tended to use a greater number of short referring expressions (such as pronouns) rather than the name that was given on the map. This tendency only occurred when the task was video-mediated. No evidence was found to indicate an effect of cognitive load in face-to-face communication.

The results of this study are broadly consistent with the Dual Process Model (Bard et al., 2000). The shortening of second mentions of word tokens relative to first mentions can be accounted for in terms of priming processes. Since priming processes are fast and automatic, they are deemed too fast for higher level factors, such as a response to the communicative context, to make their influence felt or to precede every attempt at articulation. The design of referential form, on the other hand, is thought to be governed by more complex processes which are planned in longer, somewhat slower cycles. This allows time for communicative context to take effect.

Why would cognitive load appear to influence referential form in a video-mediated communicative context while no influence of cognitive load was found in a face-to-face communicative setting? Although it is not clear how the slower articulation of words in a video-mediated context would be accounted for within the framework of the Dual Process Model, Blokland and Anderson (1998) suggest that if communication is not smooth, then speakers may compensate by hyperarticulation. Following the same line of reasoning, it could be the case that speakers make a conscious decision to articulate words more clearly if they experience some kind of difficulty communicating in a video-mediated context. Possibly, the technology itself imposes some kind of communicative stress. Following the rationale of Bard et al. (2000), this could place extra demands on the speech production system to the extent that certain processes involved in naming a landmark may not be run. Overall, however, these studies provide evidence that word articulation and the formulation of referring expressions are influenced by different aspects of dialogue. The Dual Process Model seems to give a plausible explanation of the data reported here.

6 Acknowledgement

This work was supported by a University of Glasgow Studentship.

7 References

Anderson, A. H., Mullin, J., Katsavras, E., Brundell, P., McEwan, R., Grattan, E., & O’Malley, C. (1999). Multimediating multiparty interactions. Paper presented at the Proceedings of INTERACT 99. IFIP.

Bard, E. G., & Anderson, A. H. (1994). The unintelligibility of speech to children: Effects of referent availability. Journal of Child Language, 21, 623-648.

Bard, E. G., Anderson, A. H., Sotillo, C., Aylett, M., Doherty-Sneddon, G., & Newlands, A. (2000). Controlling the intelligibility of referring expressions in dialogue. Journal of Memory and Language, 42(1), 1-22.

Bard, E. G., & Aylett, M. (2001). Referential form, word duration, and modeling the listener in spoken dialogue. Paper presented at the Proceedings of the Twenty Third Annual Conference of the Cognitive Science Society.

Blokland, A., & Anderson, A. (1998). Effect of low frame-rate video on intelligibility of speech. Speech Communication, 26, 97-103.

Brown, G., Anderson, A. H., Yule, G., & Shillcock, R. (1984). Teaching Talk. Cambridge: Cambridge University Press.

Brown, P., & Dell, G. (1987). Adapting production to comprehension - the explicit mention of instruments. Cognitive Psychology, 19, 441-472.

Chafe, W. (1974). Language and consciousness. Language, 50, 111-133.

Clark, H. H. (1996). Using Language: Cambridge University Press.

Clark, H. H., & Clark, E. V. (1977). Psychology and Language: An Introduction to Psycholinguistics. New York: Harcourt Brace Jovanovich.

Dell, G. S., & Brown, P. M. (1991). Mechanisms for listener-adaptation in language production: Limiting the role of the ‘model of the listener’. In D. J. Napoli & J. A. Kegel (Eds.), Bridges between psychology and linguistics. Hillsdale NJ: Erlbaum.

Doherty-Sneddon, G., Anderson, A. H., O’Malley, C., Langton, S., Garrod, S., & Bruce, V. (1997). Face-to-face and video-mediated communication: A comparison of dialogue structure and task performance. Journal of Experimental Psychology: Applied, 3(2), 105-125.

Fowler, C. (1988). Differential shortening of repeated content words produced in various communicative contexts. Language and Speech, 28, 47-56.

Fowler, C., & Housum, J. (1987). Talkers signalling ‘new’ and ‘old’ words in speech, and listeners’ perception and use of the distinction. Journal of Memory and Language, 26, 489-504.

Fussell, S. R., & Krauss, R. M. (1992). Coordination of knowledge in communication: effects of speakers’ assumptions about what others know. Journal of Personality and Social Psychology, 62, 378-391.

Horton, W. S., & Keysar, B. (1996). When do speakers take into account common ground? Cognition, 59, 91-117.

Hunnicutt, S. (1985). Intelligibility versus redundancy - conditions of dependency. Language and Speech, 28(1), 47-56.

Isaacs, E. A., & Clark, H. H. (1987). References in conversation between experts and novices. Journal of Experimental Psychology: General, 116, 26-37.

Lieberman, P. (1963). Some effects of the semantic and grammatical context on the production and perception of speech. Language and Speech, 6, 172-187.

Robertson, C., & Kirsner, K. (2000). Indirect memory measures in spontaneous discourse in normal and amnesic subjects. Language and Cognitive Processes, 15(2), 203-222.

Rossnagel, C. (2000). Cognitive load and perspective-taking: applying the automatic-controlled distinction to verbal communication. European Journal of Social Psychology, 30, 429-445.

Short, J., Williams, E., & Christie, B. (1976). The Social Psychology of Telecommunications: Wiley.

Sweller, J. (1988). Cognitive load during problem-solving: Effects on learning. Cognitive Science, 12, 257-285.
