SUNSHINE

PART 1

INTRODUCTION

1.1 BACKGROUND OF STUDY

One of the compulsory subjects for Bachelor of English Language and Literature (BENL) students at International Islamic University Malaysia is Computer Applications in Language Studies (COMPAPP). The subject comprises a number of topics, one of which is Computer-Mediated Communication (CMC). This topic promotes the use of technology, especially computers, as a medium for communicating and spreading knowledge alongside conventional means of communication.

The online forum is one way of communicating with people around the globe. This type of asynchronous communication is still a good way of communicating today. Even though many social networking sites are available on the Internet, the forum is still a choice among netizens. Users share almost everything related to daily life, for instance food, clothes, places to travel and much more.

Hence, when users share and chat about almost everything, they inevitably use different types of speech acts to convey their statements and meanings. The choice of these speech acts is influenced by contextual cues in the conversation. Thus, this study aims to identify the elements of speech acts that are frequently used in an online forum and to understand the contextual cues that influence the choice of speech acts employed by the participants.

1.2 STATEMENT OF THE PROBLEM

Making a statement may be the paradigmatic use of language, but there are all sorts of other things we can do with words. We can make requests, ask questions, give orders, make promises, give thanks, offer apologies, and so on. Moreover, almost any speech act is really the performance of several acts at once, distinguished by different aspects of the speaker's intention: there is the act of saying something, what one does in saying it, such as requesting or promising, and how one is trying to affect one's audience.

Searle’s classification distinguishes five elements of speech acts: representatives, directives, commissives, expressives and declarations. This study identifies which of those elements are frequently used in an online forum. The speech acts are identified on the online forum Quora. Quora comprises a variety of topics, and the chosen topic is online language exchange. A corpus of 1137 words was extracted from a conversation on the forum and analyzed according to the elements of speech acts listed above.

The difficulties encountered by students who use online forums in the language learning process prompted this study. Once the analysis is done, it should help students understand the contextual cues that influence the choice of speech acts employed by the participants in the online forum. Observing the participants’ statements and the way they express them should reveal the features that distinguish one speech act from another.

1.3 RESEARCH OBJECTIVES

The study aims to achieve the following objectives:

1. To identify the elements of speech acts which are frequently used in an online forum.

2. To understand the contextual cues that influence the choice of speech acts employed by the participants in an online forum.

1.4 RESEARCH QUESTIONS

1. What are the elements of speech acts which are frequently used in an online forum?

2. What are the contextual cues that influence the choice of speech acts employed by the participants in an online forum?

1.5 DESCRIPTION OF FRAMEWORK

The framework chosen for this study is speech act theory. Speech acts are utterances that perform actions, for instance apologizing, suggesting and threatening. This research follows Searle’s classification, which distinguishes five elements of speech acts. The first is assertives (also called representatives), speech acts that state what the speaker believes to be true. For example, “sharedtalk offers chat room which is good for entry learners to start chatting”. This shows that the speaker believes that the site, sharedtalk, is good for learners who are starting to chat. However, the statement may not be true, as it is only the speaker’s opinion. The second is directives, speech acts that the speaker uses to get another person to do something. For instance, “If you want to check out more language learning and my journey, peek over at my little blog called Lingualism”. Here the speaker suggests that the reader look at his blog for more detail. The third is commissives, speech acts that commit the speaker to a future action. For example, “I am going to Germany…” shows that the speaker intended to go to Germany. The fourth is expressives, speech acts that show the speaker’s emotions. For example, “I also remember my brother telling me of various gifts he got from his Chinese tandem. I was a bit envious :)”. This portrays the speaker’s envy towards his brother. The last is declarations, speech acts whose utterance brings about a change in the world. This type of speech act was not found in our corpus.

1.6 VALUE OF CORPUS

Since the corpus is taken from an online forum, Quora, it helps in gaining and sharing knowledge about online language exchange, because the forum involves people from around the globe. This connection can in turn enhance language learning among the users. For instance, when an American native speaker of English communicates with a non-native speaker from Korea, the Korean need not only learn English but may also share his or her own mother tongue with the American.

Besides that, communicating through an online forum allows language sites to be put to use. The forum can itself serve as a language site for netizens who speak different languages and dialects. The corpus shows how people from different parts of the world use language and how languages are exchanged through conversation. From the conversations, we can learn and value words from different languages. Hence, the corpus raises awareness of learning and exchanging languages with others through the Internet.

1.7 SIGNIFICANCE OF STUDY

This study will help students understand the contextual cues that influence the choice of speech acts employed by the participants in the online forum, and act on its findings. This understanding will in turn support their learning process. The study focuses on the use of speech acts on the online forum Quora, covering representatives, directives, commissives, expressives and declarations, and it is also of interest to linguists working with digital technologies and to curriculum planners and designers. Furthermore, the study is relevant to specialists in teacher training, including those responsible for training in computer-mediated communication (CMC), since it emphasizes that trained teachers can use digital technologies more effectively through online language exchange.

1.8 METHODOLOGY

1. Qualitative

This method involves describing and explaining the topic, namely the elements of speech acts in an online forum. Since each element carries its own meaning, the analysis is done to understand the contextual cues that influence the choice of speech acts employed by the participants in the online forum.

2. Content-based Analysis

The data for the study are actual instances of written messages collected from a public online discussion forum, Quora. The analysis is a content-based analysis of a corpus of 1137 words chosen from Quora. This forum website discusses issues pertaining to every aspect of life, including food, travel, learning and much more. The general topic of the corpus is language learning. Texts were purposively selected from the forum to answer the research questions. During coding and tagging, each utterance, whether a single word, a phrase, a sentence or a paragraph, was tagged according to the language function it performed, such as expressing an opinion, asking a question or making a suggestion. The starting point for analyzing the data was to categorize the text-based utterances according to Searle’s (1976) Speech Acts taxonomy in order to explore the interactive language function of the messages.
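The coding and tagging step described above can be illustrated with a short Python sketch. This is only an illustration, not the procedure actually used in the study, where the tagging was done by hand: the names SpeechAct, TaggedUtterance and tally are hypothetical, and the four sample utterances are quotations from the corpus with the tags assigned to them in this report.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class SpeechAct(Enum):
    """Searle's five categories (hypothetical helper type for illustration)."""
    REPRESENTATIVE = "representative"  # also called assertive
    DIRECTIVE = "directive"
    COMMISSIVE = "commissive"
    EXPRESSIVE = "expressive"
    DECLARATION = "declaration"


@dataclass(frozen=True)
class TaggedUtterance:
    """One utterance (word, phrase, sentence or paragraph) with its manual tag."""
    text: str
    act: SpeechAct


def tally(utterances):
    """Count utterances per category, keeping categories that never occur as 0."""
    counts = Counter(u.act for u in utterances)
    return {act: counts.get(act, 0) for act in SpeechAct}


# Four utterances quoted from the corpus, tagged as in this report.
sample = [
    TaggedUtterance("it's a great introduction to online language exchange",
                    SpeechAct.REPRESENTATIVE),
    TaggedUtterance("peek over at my little blog called Lingualism",
                    SpeechAct.DIRECTIVE),
    TaggedUtterance("probably will do a review!", SpeechAct.COMMISSIVE),
    TaggedUtterance("Yay!", SpeechAct.EXPRESSIVE),
]

counts = tally(sample)
for act, n in counts.items():
    print(f"{act.value}: {n}")
```

Keeping the zero counts in the result makes an absent category, such as declarations, visible in the summary rather than silently missing.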

PART 2

LITERATURE REVIEW 1

Title of research

· Language Function and Knowledge Construction in Online Discussion Board Forums.

Authors

· Alice Shanthi, Lee Kean Wah, Denis Lajium, and Xavier Thayalan.

URL

· https://www.academia.edu/12785713/Language_Function_and_Knowledge_Construction_in_Online_Discussion_Board_Forums

Purpose of study

· To determine how participants in asynchronous discussion board forum use language to share and elicit information, knowledge and experience unique to a Malaysian setting.

Significance of the study

· This study will aid educators and academicians in the pedagogical aspect in using discussion boards in the teaching and learning process.

Research questions

· What types of language functions are mostly used while communicating online through discussion board forums?

· Which phases of knowledge construction are evident in discussion board forum postings/messages?

Methodology

· Qualitative research

· Corpus – data for the study was the actual instances of written messages collected from a public online discussion board forum set in Malaysia.

· Framework: Searle’s (1976) Speech Acts (Assertive, Directive, Commissive, Expressive, and Declaration)

· Method

1. The text-based utterances are categorised according to Searle’s (1976) Speech Acts taxonomy to explore the interactive language function of the messages.

2. Based on these categories, the data was recoded.

3. Then, the data was tagged again to study the gradual process of co-construction of knowledge according to descriptors indicated by Gunawardena, Lowe and Anderson’s (1997) Interaction Analysis Model (IAM).

Findings

· Assertive speech acts were most frequently present in the online interaction followed by directives, expressive, and commissive. No declarative speech acts were found in the corpus.

· The data used for this study show evidence of the different phases of knowledge construction: sharing and comparing opinion (44%), the discovery and exploration of inconsistency among ideas (33.9%), negotiation of meaning and co-construction of knowledge (14.5%), testing and modification of proposed synthesis (4.4%), and agreement statement (3.2%), therefore proving that new knowledge is indeed constructed and shared in the online forums.

Discussion

· Assertive speech acts are utterances in which the speaker merely states his/her mind; they are often described as acts that express the speaker’s belief and attention.

· Directive speech acts were used especially by those who have better knowledge of the subject matter to provide members who needed information with helpful instructions either to overcome their problem or new knowledge for better understanding of the subject-matter at hand.

· Expressive speech acts were used not only to inform other members of their personal opinions, but they also give a glimpse of their emotional state.

· Members who used commissive speech acts revealed their future plans.

· As for the phases of knowledge, for phase I, it is a natural process that members shared and exchanged their experiences, which helped and guided the forum members to a better understanding of the subject matter as they shared a common interest.

· For phase II, when members experienced conflict and inconsistency in ideas, they had to negotiate meaning, making it possible for higher levels of knowledge construction to happen.

· For phase III, forum activity has enabled some members to try to achieve greater understanding of the knowledge constructed.

· For phase IV and V, the level of knowledge construction shows evidence of accommodation of new knowledge (or its synthesis) on the part of the participants.

LITERATURE REVIEW 2

Title of research

· An Analysis of Expressive Speech Acts in Online Task-Oriented Interaction by University Students

Author

· Marta Carretero, Carmen Maiz-Arevalo and M. Angeles Martinez

URL

· http://ac.els-cdn.com/S1877042815013609/1-s2.0-S1877042815013609-main.pdf?_tid=d4575b06-b584-11e6-879e-00000aab0f01&acdnat=1480349458_741084863e3abbb71ad94f3e98bb36f0

Purpose of Study

· To study whether Expressives are equally frequent across the three sub-corpora, and whether they are similarly distributed in terms of sub-types such as Apologies, Thankings, Compliments, and so forth.

· To identify which contextual variables have a bearing on the choices made by participants.

Research Questions

· Are Expressives equally frequent across the three sub-corpora, and are they similarly distributed in terms of sub-types such as Apologies, Thankings, Compliments, and so forth?

· If this is not the case, which are the contextual variables with a bearing on the choices made by participants?

Significance of Study

· The study analyses the relative frequency of occurrence of different subtypes of Expressives across the three subcorpora.

· It shows that certain contextual variables have a strong bearing on the Expressives employed by each group.

Methodology

· Quantitative

· Corpus:

83 university students belonging to one of the following groups: 64 undergraduate students taking an optional course on English Discourse, 9 undergraduate students from an evening group taking an obligatory course on Pragmatics and 10 post-graduate students following the Master’s Seminar on English Linguistics.

Each group of participants was subdivided into smaller groups of three or four students, randomly created by Virtual Campus itself.

· Procedures

Framework

Speech Acts (focusing on Expressive)

Method

1. These smaller groups had to carry out one or two collective assignments.

2. They were asked to do these collaborative exercises online, by means of an e-forum.

3. Once the activity was over, participants gave their written consent.

Findings

· The analysis uncovers two main similarities:

1. A predominance of other-oriented over self-oriented speech acts

2. The high degree of conventionalization found in the most recurrent subtypes: Compliments, Greetings, Thankings, and Apologies.

· The analysis also showed remarkable differences in terms of frequency of use, concrete linguistic realizations of individual subtypes, and the use of typographic marks.

LITERATURE REVIEW 3

Title of research

· Classifying Sentences as Speech Acts in Message Board Posts.

Author

· Ashequl Qadir and Ellen Riloff

Journal of publication

· Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

URL

· https://www.cs.utah.edu/~riloff/pdfs/emnlp11-speechacts.pdf

Purpose of study

· To distinguish between expository sentences and speech act sentences in message board posts.

· To classify speech act sentences into four types: Commissives, Directives, Expressives, and Representatives.

Statement of Problem

· The text genres (weblogs and social media sites) offer new challenges for natural language processing (NLP)

Research Questions

· How to distinguish between expository sentences and speech act sentences in message board posts?

· How to classify speech act sentences into four types: Commissives, Directives, Expressives, and Representatives?

Significance of Study

· Information extraction systems could benefit from filtering speech act sentences so that facts are only extracted from the expository text.

· Identifying Directive sentences could be used to summarize the questions being asked in a forum over a period of time.

· Representative sentences could be extracted to highlight the conclusions and beliefs of domain experts in response to a question.

Methodology

· Quantitative

· Corpus:

Randomly selected 150 Veterinary Information Network (VIN) message board threads from this collection on the three topics: cardiology, endocrinology, and feline internal medicine.

· Framework:

Searle’s Speech Acts (Commissives, Directives, Expressives, and Representatives)

· Procedures:

1. The authors did basic cleaning, removing html tags and tokenizing numbers.

2. Two human annotators were told to assign one or more speech act classes to each sentence.

3. For the first experiment, the authors created a speech act filtering classifier to distinguish sentences that contain one or more speech acts from sentences that do not contain any speech acts.

4. The next set of experiments focused on labelling sentences with the four specific speech act classes: Commissive, Directive, Expressive, and Representative.

Findings

· The authors achieved good results for speech act filtering and the identification of Directive and Expressive speech act sentences.

· They found that Representative and Commissive speech acts are much more difficult to identify, although the performance of their Commissive classifier substantially improved with the addition of lexical, syntactic, and semantic features.

PART 3

No	Elements of Speech Acts	Frequency
1	Representatives	10
2	Directives	7
3	Commissives	3
4	Expressives	5
5	Declarations	0
Total		25

Summary of Speech Acts
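As a small worked example, the frequencies in the table above can be converted into shares of the 25 tagged utterances and ranked. The Python sketch below uses only the counts reported in the table; the variable names are illustrative, and the percentages are derived here rather than reported in the original analysis.

```python
# Frequencies copied from the summary table above.
frequencies = {
    "Representatives": 10,
    "Directives": 7,
    "Commissives": 3,
    "Expressives": 5,
    "Declarations": 0,
}

total = sum(frequencies.values())

# Share of each category as a percentage of all tagged utterances.
shares = {act: 100 * n / total for act, n in frequencies.items()}

# Categories ordered from most to least frequent.
ranked = sorted(frequencies, key=frequencies.get, reverse=True)

for act in ranked:
    print(f"{act:<16}{frequencies[act]:>3}{shares[act]:>7.1f}%")
```

On these counts, representatives account for 40% of the tagged utterances, directives 28%, expressives 20%, commissives 12% and declarations 0%.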

FINDINGS

Based on the summary table of the data from the corpus, assertive (representative) speech acts are the type used most often by the users in the discussion, occurring ten times from the beginning to the end of the discussion. For instance, in the chosen corpus, one user said “Still, it’s a great tool to use no matter what level you are at” and another stated “The last product that I have used has been the app HelloTalk, which has probably been the most productive and most used tool I have used so far”. In these two examples, the users share their opinions about websites that they have been using.

The second most frequent speech acts in the online forum are directives, which occur seven times during the discussion. The users use directive speech acts when they aim to give an instruction or command to other users. For example, a user said “If you want to check out more language learning and my journey, peek over at my little blog called Lingualism”. With this statement, the user invites the other users to visit his blog.

Expressive speech acts also appear fairly frequently, with 5 occurrences throughout the corpus. By using expressive speech acts, the participants express their feelings and emotions in the online forum. For example, one of the participants says “thanks so much” to express gratitude for the suggestions made by another participant. Another participant expresses his emotion by saying “I was a bit envious” to show his envy towards his brother.

Commissive speech acts also appear in the online forum, with 3 occurrences throughout the corpus. By using commissive speech acts, the participants show their intention to do something in the future. For instance, one of the participants said “I am going to Germany” to tell of his plan to travel to Germany.

The last type of speech act, declarations, does not occur in the online forum. This shows that no utterance in the corpus brings about a change in the world.

In conclusion, the most frequent speech acts are assertives, in which the participants share their thoughts and opinions. They are followed by directives, in which the participants express what they want others to do. The participants also express their emotions and feelings through expressives. Finally, they show what they intend to do in the future through commissives.

DISCUSSION

Discussion on Research Question 1: What are the elements of speech acts which are frequently used in online forum?

The elements of speech acts most frequently used in the online forum are assertives. Assertives are the most used because the online forum is a place where people leave their opinions and suggestions. An assertive is a speech act that states what the speaker believes to be the case or not. Thus, most of the people in the conversation give opinions about their experiences of doing language exchanges online, while some suggest the best language sites for learning English. Directives are also frequently used in this conversation; they are the speech acts that speakers use to get someone to do something. As can be seen from the corpus, a directive is used when a speaker wants other people to give opinions or suggestions on which language sites are better for learning the language. This is because many language sites are available today, but some of them might not be useful to learners.

Besides that, expressives are also frequently used in this conversation. An expressive is a speech act that states what the speaker feels. When reading the comments and feedback from others, participants tend to express how they feel about the suggestions and opinions given in the online forum. Commissives, which commit the speaker to some future action, are also used in this conversation. This is because, after reading the suggestions and opinions of other people, participants become interested in trying the language sites suggested. Thus, some of the people in this conversation state in their replies what they intend to do in the future. Declarations do not occur in this conversation, as nothing is being changed: the participants are simply sharing their experiences of online language exchange.

Discussion on Research Question 2: What are the contextual cues that influence the choice of speech acts employed by the participants in online forum?

First, for the assertives, the example in the corpus comes from a participant who responded to the question about the experience of doing language exchange online. He explained his experience as well as the sites that he used for exchanging languages. The assertive element can be seen when he said “It’s a great introduction to online language exchange”. This shows that he believed the site, sharedtalk.com, to be a great one, and he voiced his opinion by saying so. The contextual cue that triggers this speech act is his experience of language exchange and of the site itself: the participant is eager to share his experience of using online sites to exchange languages.

Besides that, for the directives, the example comes from a participant who introduced herself and wanted to befriend the others. This can be seen when she said “I want to be your friend”. The contextual cue involved in this speech act is the previous conversation among the other participants, which triggered her to say so. She may have wished to join the active conversation among the participants; its liveliness may have led her to start a conversation and request their friendship.

Apart from that, the commissives can be found in the corpus when a participant said “probably will do a review!” This shows that the participant intends to do the review, although the word “probably” suggests that he is not fully committed. The contextual cue is his update about a new website that he found interesting. He may not have wanted to elaborate on the new website yet, but only to tell the other participants of its existence.

Lastly, the expressives can be clearly seen when another participant said “Yay!”, which shows his happiness and joy. He was sharing his experience with a language exchange partner. The contextual cue here is his experience of travelling on a group ticket at a large discount, in a group arranged for him by his German partner. This led him to say “Yay!” to show that he was happy and delighted.

Relating speech acts and contextual cues to the online forum, users of an online forum have different experiences and encounter different situations. Their experiences trigger and influence them to use particular speech acts in their utterances. In other words, the discussions that take place in the online forum are shaped by what the users have experienced before, and different people experience different things. Hence, the online forum is a medium for them to exchange opinions, ideas and thoughts.

PART 5

Raw Data

What is your experience of doing language exchange online?

Any positive experience and challenges, would be great to share.

JD Davidson, I like language(s).

In short, online language exchange has left me with tons of experience, albeit much of it repetitive, but quite a bit deep as well. It's also left me with plenty acquaintances and a very small number of people I talk to regularly, which has been pretty great actually.

I first started using sharedtalk.com. Run by Rosetta Stone and slightly outdated, it's a great introduction to online language exchange. Using it made me a great conversationalist and gave me the ability to not only filter what forms of a language I wanted, but connect with learners my same age. I've been exchanging with one guy since September 2013, and it all started with a small conversation in which he made me laugh on sharedtalk, but the site is also kind of a hit or miss. The good thing is, you have a lot to choose from.

After sharedtalk I proceeded to Italki.com . The site really offered great tools such as peer editing and asking open questions, and great assets such as the community itself to help me out. Though, I did not get into it very much and did not end up making any good exchange partners. Still, it's a great tool to use no matter what level you're at, just be prepared for an excess of notifications depending on who you are, but none that are "off topic."

After Italki I then moved to Interpals. I discovered the youth community to have a much greater presence here, but of course that also brought a few problems. It's perfectly possible to find a great partner. There are great tools to use to narrow down to exactly what it is you want. A problem is that some users are not very serious about their endeavors or are quite "off topic." Never the less, I had many good mid-length running conversations there before I phased out a little. I might come back at some point, though.

After searching for a short while, I finally found a good way to practice in high quantity verbally. Verbling.com was the answer and it's a great tool even if you use it free. It may take a little bit to get over some beginner anxiety, but it's overall great. The only possible complaint about this is that it's possible in big groups for maybe 2 or 3 people to do all the talking, but that's practically bound to happen no matter what. It's just how groups work and you can specifically regulate the group size or even do a one on one anyway. No matter what, even just listening, you'll get a lot better with not only speaking anxiety, but verbal fluency as well.

The last product that I've used has been the app HelloTalk, which has probably been the most productive and most used tool I've used so far. I've called it a "gift from the language gods." It really does offer a great amount of useful tools. On HelloTalk, once can narrow users down even by city. You can also block users, which can prove useful. Content is regulated, so spam is nonexistent. You have verbal and textual tools ranging from voice-mail like chatting and modifiable speech synthesizers, to special windows to show textual corrections and the ability to translate any message sent immediately in the window just by tapping it. I've developed some great connections here and had some great conversations that would have been slightly more difficult otherwise.

Without all these tools, there is no way I'd be capable of speaking probably to even half the abilities I have now. Depending on how you use them, you can really, really excel or feel like you're not moving much at all. If you want to check out more language learning and my journey, peek over at my little blog called Lingualism.

Update:
New website I've found GoSpeaky.com, really great probably will do a review!

Mikhail Kotykhov

Excellent personal reflection and tips, JD, thanks so much for sharing.

I am also curious to learn more about your experience of using purely commercial sites like Verbling for free, even decided to ask a question on Quora about that.

Richard Yang

very informative

Lucy Mouna

How are you,

My name is miss Lucy and i want to be your friend.and so that i we tell you about me please contact me (lucy_mouna@yahoo.in) ok

Mikhail Kotykhov, Had enough of studying. It never worked. Learn by doing.

For me, the biggest challenge has been to find users that actually do start language exchange (meaning start talking on Skype).

Many users I have connected to are very keen on exchanging Skype IDs and e-mails, but after that they disappear and never actually start talking.

I was wondering if there any solution to this problem.

Is there an effective way for learners of English to overcome the false "fear of speaking English"?

Peter Liu, many year experience of language exchange

each platform has advantage and disadvantage. sharedtalk offers chat room which is good for entry learners to start chatting, and live voice chat for advanced learners for 1 to 1 conversations. Italki offers tools for writing essays and getting corrected, and kind of forum for users for mutual help. but live chat is not available there.
TALKEER is what i want to recommend here. TALKEER.com

offers live voice and video chat, essay writing and correction, and messaging, and personal profile and album. Tutoring and learning functions go paralel with socializings.

Dasha Marmalukova, Content marketing manager @ Createl.la

Now I’m using Bilingua: Your Language Exchange & Learning Companion, it helps me to find French native speakers with whom i can conversate on the topics i’m interested in.

Josef, Founder of a language exchange site GetTandem.com.

I only have a good experience with tandems. Lucky me!

I give you one that just crossed my mind. I mentioned my German tandem partner that I am going to Germany to visit a friend and that I will need to buy a train ticket. Not only did he suggested to use shared rides (mitfahrgelegenheit) to save money, he immediately found me a group, called the guy and arranged everything so when I found myself on the platform someone just approached me to take me on a group ticket and I went for 5 instead of 20 euros! Yay!

And things like this just happen. I also remember my brother telling me of various gifts he got from his Chinese tandem. I was a bit envious :).

Sun Shine, Majoring in English Language and Literature

Hello. I have never experienced any language exchange online before but one of my friends has suggested to use WeSpeke. It is a language site that enables the user to communicate with the English speakers across the globe. I have seen her communicating with English speaker and yes, it is quite interesting. It is just like Skype, but if you want to try something new, I would recommend WeSpeke to you. :)

Analyzed Data

REFERENCES


Qadir, A., & Riloff, E. (2011). Classifying Sentences as Speech Acts in Message Board Posts. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Retrieved from https://www.cs.utah.edu/~riloff/pdfs/emnlp11-speechacts.pdf

Carretero, M., Maíz-Arévalo, C., & Martínez, M. Á. (2015). An Analysis of Expressive Speech Acts in Online Task-oriented Interaction by University Students. Procedia - Social and Behavioral Sciences, 173, 186-190. Retrieved from http://ac.els-cdn.com/S1877042815013609/1-s2.0-S1877042815013609-main.pdf?_tid=6f9aa796-b59c-11e6-97ee-00000aacb362&acdnat=1480359597_c5a56d1a1237fa7ef6752cac09b4ea93

Shanthi, A., Wah, L. K., Lajium, D., & Thayalan, X. (2015). Language Function and Knowledge Construction in Online Discussion Board Forums. Frontiers of Language and Teaching, 6. Retrieved from https://www.academia.edu/12785713/Language_Function_and_Knowledge_Construction_in_Online_Discussion_Board_Forums

Yule, G. (1998). Pragmatics. Oxford: Oxford Univ. Press.

Article 1

Frontiers of Language and Teaching …...………………………. Volume 6 (2015)

Language Function and Knowledge Construction in Online Discussion Board Forums

Alice Shanthi Universiti Teknologi MARA

Lee Kean Wah Universiti Malaysia Sabah

Denis Lajium Universiti Malaysia Sabah

Xavier Thayalan Universiti Teknologi MARA

Corresponding Author’s Email: alice_shanthi@yahoo.com.my

Abstract

One form of online communication categorised as asynchronous is the online discussion board forum. This paper presents the findings of a study on the interactive language functions and the phases of knowledge construction in asynchronous computer-mediated discourse (CMD) in an online discussion board forum in Malaysia. Data for the study were collected using purposive non-random data samples motivated by theme. It was found that the members of the online discussion forum used more assertive speech acts such as explaining, giving suggestions, agreeing, supporting, and answering queries while interacting online. The second part of the study revealed that the most common phase of knowledge construction was the act of sharing and comparing opinions, as well as the discovery and exploration of dissonance or inconsistency among ideas, concepts or statements.

Keywords: Computer Mediated Communication, Computer Mediated Discourse, Asynchronous Computer-Mediated Discourse, Online interaction

Introduction

The advent of the Internet has revolutionized the manner by which people communicate virtually. The growth of Internet service providers and the increasing number of virtual communication tools are testimonies to the popularity of virtual communication. It is said that people are attracted to communicate virtually because it reduces the constraints of time and distance when people share information, experience and feelings with one another (van Varik & van Oostendorp, 2013). Additionally, Internet communication, especially asynchronous communication, does not require people to reply to messages instantly but allows them to make a more thought-out reply or answer. This allows people who are engaged in asynchronous communication to pull in resources, knowledge and expertise in an online discussion forum. As such, asynchronous communication helps people to engage in fruitful discussions, such as overcoming problems through the knowledge acquired from the online discussion forums.

Discussion board forums, newsgroups and email are some of the most common asynchronous modes of virtual communication. According to Annand (2011), the discussion board forum is an interactive channel which allows users to be active and engage in two-way communication. Furthermore, it is an inexpensive way of information seeking for increasing efficiency and productivity (Miller, 2009). Thus it is a good tool for generating dialogue between and among users, and for soliciting feedback from others. Hence, the focus of this study is discussion board forums that allow people to read and exchange comments while expressing views on a particular subject. Some of the popular discussion board forums in Malaysia are Lowyat.NET, mudah.my, cari.com.my and Webportal Malaysia. By studying the different discourse functions, the researchers aim to determine how participants in asynchronous discussion board forums use language to share and elicit information, knowledge and experience unique to a Malaysian setting. Two general research questions are addressed in this study: 1. What types of language function are mostly used in discussion board forums? 2. Which phases of knowledge construction are mostly evident in discussion board forums?

Literature Review

There are basically two modes of CMC: asynchronous and synchronous. Online communication that allows for a delay between message and response, meaning the people interacting need not be online at the same time, is regarded as asynchronous, whereas communication that occurs in real time, meaning the people interacting must be online at the same time, is termed synchronous communication. The discussion board forum is one of the most common types of asynchronous CMC; it enables multiple users to engage in discussion with each other and to read and exchange comments beyond real time. It has empowered people from diverse backgrounds to meet and engage in online discussion (Herring, 2004; Paolillo, 2011). In discussion board forums people share information and experiences, thus creating a space where knowledge can be constructed and shared (Thanasingam, Kit, & Soong, 2007). The information shared in online discussion board forums, unlike other forms of online discussion such as chat rooms or online conferencing, is stored in the form of messages in the archives of the forums, arranged according to topic. Since past messages in the form of written text remain on the site alongside current messages and are arranged according to topic, it is easier for the researcher to choose data according to the criteria set (Byrne et al., 2013). This makes discourse analysis of participants’ text-based transcripts an effective technique for researchers to get a better understanding of the participants’ cognitive processes and of the phases that depict the quality of knowledge constructed and shared online (Gunawardena, Lowe, & Anderson, 1997; Wang, 2005; Akayoglu & Altun, 2009). Prior studies conducted locally in academic settings found the asynchronous mode of CMC to be of benefit to students.
Kaur and Mohamed Amin (2010) investigated the effectiveness of asynchronous computer-mediated communication in emails using a quantitative method and semi-structured interviews, and found that asynchronous computer-mediated communication has the potential to aid learners in taking charge of their own learning. In another study, Thayalan and Shanthi (2011) investigated the social presence experienced by undergraduates in online forums for distance learning and found that interactivity in the discussion boards served to maintain contact among the students. The study also found that students actively read messages posted by others but posted a limited number of messages, thus limiting the amount of information shared online. Other studies conducted overseas, based on analysis of transcripts of online interaction in online communities, suggest that in addition to the content of the discussion, the interactive strategies used (denoted by the use of speech acts) play an important role in determining active participation in online interaction (Pena-Shaff & Nicholls, 2004; Schallert et al., 2009; Means, Toyama, Murphy, Bakia, & Jones, 2009). Regarding studies of CMC and its use for knowledge construction, Fitzpatrick (2010) cited several studies that show favourable results (Aviv, Erlich, Ravid, & Geva, 2003; Hawkey, 2003; Hiltz, Coppola, Rotter, & Turoff, 2000; Curtis & Lawson, 2001; McConnell, 2000; Thomas, 2002). These studies found evidence of higher-order thinking and knowledge building through collaborative learning in web forums or online discussion boards. However, Paulus and Phipps (2008) found that students engaged in an asynchronous discussion board as part of their course fulfilment did not go beyond surface-level discussion, and so questioned whether deep, meaningful discussions are even possible in asynchronous learning environments. Another study unfavourable to asynchronous CMC was conducted by Lester and Paulus (2011), who found that the online interaction lacked “quality”. They stated that the lack of “quality talk” is a notable problem because knowledge can be constructed and shared in online learning only when members of a virtual community actively give and comment on each other’s ideas. Problems with getting good participation in online discussion board forums were also encountered by Griffith (2009), who examined computer-mediated communication discussions in educational environments for evidence of learning and found that attempts to use a voluntary asynchronous discussion forum among student members resulted in little to no participation.
In short, besides the physical aspects of discussion board forums, such as the number of participants, speed of internet and access to computers, the way language is used to perform actions (speech acts) plays an important role in getting participants to continue posting messages, and thus to continue sharing, eliciting and exchanging information. This study looks into the discourse function of the messages themselves to gain a better understanding of the participants’ use of language to communicate and construct knowledge online.

Method

Data for the study were actual instances of written messages collected from a public online discussion board forum set in Malaysia. This particular forum website discusses issues pertaining to everyday Malaysian life. Texts were purposively selected from the forum to answer the research questions being investigated. According to Herring (2004), qualitative analysis of text-based Computer Mediated Discourse (CMD) is usually based on individual themes as the unit of analysis, rather than on physical linguistic units (e.g., word, sentence, or paragraph). Therefore, during the process of coding and tagging, utterances made up of a single word, a phrase, a sentence, or a paragraph were tagged according to the language function they were performing, such as to express an opinion, to question, to make a suggestion and so on. The starting point for analysing the data was to categorise the text-based utterances according to Searle’s (1976) Speech Acts taxonomy to explore the interactive language function of the messages. Based on these categories, the data were recoded and tagged again to study the gradual process of co-construction of knowledge according to descriptors indicated by Gunawardena, Lowe and Anderson’s (1997) Interaction Analysis Model (IAM). Messages from the three online discussion board forums, which comprise different interest groups (IG), were analysed as stated in Table 1.


Table 1: Data set selected for the study

Interest group                          Topic                                       Messages  No. of words
Fast and Furious                        Proton Saga FLX Very High fuel Consumption        92          4145
Finance, Business and Investment House  Geneva Malaysia V2                               130          4786
Computer Technical Support              Folding@Malaysia needs your help!                 62          3099
Total                                                                                    284         11530

Findings and Discussion

Research Question 1: What types of language functions are mostly used while communicating online through discussion board forums?

Messages from the three different interest groups were coded and analysed to study the function of language used to communicate online, following Searle’s (1976) categories of speech act analysis: assertives, directives, commissives, expressives and declaratives. This yielded a total of 492 speech acts (refer to Table 2). In total, almost half of the language functions used by members of the different interest groups were assertive (47.6%), roughly 32% were directive, about 17% were expressive, and just over four per cent (4.3%) were commissive. No declarative acts were found in this sample.

Table 2: Functions of utterances according to speech acts

Types of speech acts      IG1   IG2   IG3   TOTAL       %
Assertive                  83   103    48     234    47.6
Directive                  61    69    26     156    31.7
Commissive                  7     9     5      21     4.3
Expressive                 28    41    12      81    16.5
Declaration                 0     0     0       0       0

IG – interest group. IG1 – Fast and Furious; IG2 – Finance, Business and Investment House; IG3 – Computer Technical Support
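The percentage column of Table 2 follows from dividing each category total by the 492 speech acts identified. A minimal Python sketch of that arithmetic is given here for readers who want to check the figures; the function name `percentage_shares` is illustrative only, since the article does not say how its percentages were computed.

```python
# Category totals copied from the TOTAL column of Table 2.
counts = {
    "Assertive": 234,
    "Directive": 156,
    "Commissive": 21,
    "Expressive": 81,
    "Declaration": 0,
}


def percentage_shares(counts):
    """Return each category's share of the grand total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = percentage_shares(counts)
for name, share in shares.items():
    # Print one aligned row per speech-act type: name, count, percentage.
    print(f"{name:<12}{counts[name]:>5}{share:>7.1f}%")
```

The same function can be applied to the per-group columns (IG1, IG2, IG3) to compare how each interest group distributes its speech acts.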

The study found that, when members communicated online in discussion board forums, the function or purpose of the language used was mostly assertive in nature. Assertives are primarily used to share information with other members of the group by explaining, describing, stating opinions, reflecting, disputing, making predictions and so on. They are mainly statements that are neither true nor false, accurate nor inaccurate (Searle, 1976); rather, they are utterances in which the speaker merely states his/her mind. An assertive act is often described as an act that expresses the speaker’s belief and intention.


The study also found that directive speech acts play an important role in virtual community members’ discourse. The directive speech act most commonly used in the forum was questioning. This action was used by those seeking information or help in order to elicit direct responses. As directive speech acts focus on getting the receiver to do something (Searle, 1976), this study found that, besides questioning, directives such as suggesting, requesting or asking, inviting, insisting and so on were used by members. These actions were used especially by those who had better knowledge of the subject matter to provide members who needed information with helpful instructions, either to overcome their problem or to gain new knowledge for a better understanding of the subject matter at hand. Expressive speech acts were also relatively frequent in the discussion board messages, comprising 16.5% of the speech acts. Through the display of emotions and feelings (e.g., " haha i can't feed my car 97 fuel, i even have problem feeding myself every month", "ai yo yo.... this poor guy!", “STOP MILKING SYMPATHY AND ACCEPT YOU LOSS QUIETLY!!!!!!!!”), participants not only inform other members of their personal opinions, but also give a glimpse of their emotional state (e.g., inspired, happy, sad, angry, stressed). Next, by posting commissive-based messages, members performed acts such as promising, refusing, offering and/or volunteering to help other members in the discussion board forum. Members of the forum revealed their future plans, mostly based on the new information/knowledge gathered from the discussion (e.g., "ok i will change to lighter oil for my next service"). In conclusion, Searle’s speech act taxonomy has provided this study with important insight into how messages from discussion boards were built linguistically. This study found that in the process of discussion the members mainly used assertive speech acts to share information. 
They also asked questions in order to get information and at the same time get other members to respond to them with their personal experiences and knowledge so that the other members in the virtual community can share their knowledge and experience.
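The coding procedure behind this research question amounts to attaching one of Searle's five categories to each utterance and then counting the tags. The sketch below illustrates that bookkeeping in Python under stated assumptions: the tags are assigned by hand (the article describes no automated classifier), the three sample utterances and their categories are the ones quoted in this section, and the names `SpeechAct` and `tally` are hypothetical.

```python
from collections import Counter
from enum import Enum


class SpeechAct(Enum):
    """Searle's (1976) five speech act categories."""
    ASSERTIVE = "assertive"
    DIRECTIVE = "directive"
    COMMISSIVE = "commissive"
    EXPRESSIVE = "expressive"
    DECLARATION = "declaration"


# Hand-tagged utterances: (text, category). The tags mirror how the
# article itself classifies these quoted forum messages.
tagged = [
    ("haha i can't feed my car 97 fuel, i even have problem feeding myself every month",
     SpeechAct.EXPRESSIVE),
    ("ai yo yo.... this poor guy!", SpeechAct.EXPRESSIVE),
    ("ok i will change to lighter oil for my next service", SpeechAct.COMMISSIVE),
]


def tally(tagged_utterances):
    """Count utterances per category, reporting zero for unused categories."""
    found = Counter(tag for _, tag in tagged_utterances)
    return {act: found.get(act, 0) for act in SpeechAct}


for act, n in tally(tagged).items():
    print(f"{act.value:<12}{n}")
```

Reporting zero counts explicitly matters here because the study notes that one category (declarations) did not occur in the sample at all.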

Research Question 2: Which phases of knowledge construction are evident in discussion board forum postings/messages?

The data selected to answer research question two are the same as those selected to answer research question one (refer to Table 1). As shown in Table 3, 109 (44%) comments were categorized as sharing and comparing of opinions (Phase I level); 84 (33.9%) as stating disagreements and asking and answering questions (Phase II level); 36 (14.5%) as displaying negotiation of meaning and co-construction of knowledge (Phase III level); 11 (4.4%) messages showed evidence that participants’ perceptions had changed as a result of the interaction in the discussion board (Phase IV level); and finally 8 (3.2%) messages showed evidence of accommodation of new knowledge (or its synthesis) on the part of the participants of the discussion board forums (Phase V level).


Table 3: Phases of knowledge construction in online discussion board forums

Phase     IG1   IG2   IG3   TOTAL       %
Ph I       29    59    21     109    44.0
Ph II      34    36    14      84    33.9
Ph III     13    17     6      36    14.5
Ph IV       6     2     0      11     4.4
Ph V        6     4     1       8     3.2
Total      88   118    37     248

Ph I – Sharing and comparing of opinion
Ph II – The discovery and exploration of dissonance or inconsistency among ideas, concepts or statements
Ph III – Negotiation of meaning/co-construction of knowledge
Ph IV – Testing and modification of proposed synthesis or co-construction
Ph V – Agreement statement (application of newly constructed meaning)

IG1 – Fast and Furious; IG2 – Finance, Business and Investment House; IG3 – Computer Technical Support

The findings signify that the most common activity for constructing and sharing knowledge was exchanging ideas, opinions and experiences (44%). As most members shared a common background/interest, it seems natural that they shared and exchanged their experiences, resources and/or information, which helped and guided the forum members to a better understanding of the subject matter they were discussing, and in the process they constructed and shared new knowledge. Next, 33.9% of the comments posted in the discussion forums were clarification comments (Phase II). When members experienced conflict and inconsistency in ideas, they had to negotiate meaning, making it possible for higher levels of knowledge construction to happen. In fact, in IG1 there were more Phase II than Phase I comments, suggesting that this group was constructing new knowledge by asking and answering questions to clarify the source and extent of disagreement. This suggests that the online forum has been effective in engaging members of the interest group to critically review their peers’ feedback on the subject matter being discussed. Members also at times counter-argued and sometimes criticised or provoked reactions; these actions raised the opportunities for further discussion and exchange of ideas. Phase III comments, though small in number (14.5%), suggest that the forum activity has enabled some members to try to achieve greater understanding of the knowledge constructed. Through exercising higher mental functions such as negotiating or clarifying (Phase II), they have tried to process and construct more accurate feedback on the subject matter (Phase III). The findings on levels of knowledge constructed also suggest that discussion forums promote the construction of critical feedback. These findings support Thanasingam, Kit, and Soong's (2007) claim that tools such as discussion forums facilitate knowledge construction through collaboration.
Similarly low percentages of Phase IV and Phase V knowledge construction were also detected in the messages taken from the discussion board forums. There were 11 (4.4%) comments at Phase IV and 8 (3.2%) comments at Phase V. These messages show evidence of accommodation of new knowledge (or its synthesis) on the part of the participants.


Conclusion

With regard to the interactive language functions of the language used in the discussion board forums, it was observed that assertive speech acts were most frequently present in the online interaction, followed by directives. From this it can be concluded, with respect to the first research question, that in constructing and sharing knowledge the members of the virtual community mostly used assertive acts such as explaining, giving suggestions or opinions, agreeing, reporting or stating, supporting, concluding, complaining (an indirect expression of dissatisfaction) and answering queries. Second, they also used directive speech acts, such as to question, to ask, to advise, and/or to instruct other members of the virtual community, in order to construct and share new knowledge. This study also showed that the forums used as data show evidence of the different phases of knowledge construction, indicating that knowledge is indeed constructed and shared in the online forums. The findings of this study will aid educators and academicians in the pedagogical use of discussion boards in the teaching and learning process. It is hoped that the findings of this study can be extended to the learning environment because over the years the use of internet technology in the classroom has gained popularity, as can be seen in the rapid growth in research into computer-mediated discourse (Fitzpatrick & Donnelly, 2010).

References

Akayoglu, S., & Altun, A. (2009). The Functions of Negotiation of Meaning in Text-Based CMC. In Handbook of Research on ELearning Methodologies for Language Acquisition (pp. 291–306). IGI Global. Retrieved from http://www.mendeley.com/research/functions-negotiation-meaning-text-based-cmc/

Annand, D. (2011). Social Presence within the Community of Inquiry Framework. The International Review of Research in Open and Distance Learning, 12(5), 40–56.

Byrne, C. L., Nei, D. S., Barrett, J. D., Hughes, M. G., Davis, J. L., Griffith, J. A., … Mumford, M. D. (2013). Online Ideology: A Comparison of Website Communication and Media Use. Journal of Computer-Mediated Communication, 18(2), 25–39. doi:10.1111/jcc4.12003

Fitzpatrick, N., & Donnelly, R. (2010). Do You See What I Mean? Computer-Mediated Discourse Analysis (pp. 0–17). IGI Global. doi:10.4018/978-1-61520-879-1.ch004

Herring, S. C. (2004). Content Analysis for New Media: Rethinking the Paradigm (pp. 47–66). Bloomington.

Lester, J. N., & Paulus, T. M. (2011). Accountability and public displays of knowing in an undergraduate computer-mediated communication context. Discourse Studies, 13(6), 671–686. doi:10.1177/1461445611421361

Means, B., Toyama, Y., Murphy, R., Bakia, M., & Jones, K. (2009). Evaluation of Evidence-Based Practices in Online Learning. Structure, 15(20), 94. Retrieved from http://newrepo.alt.ac.uk/629/

Paolillo, J. C. (2011). “Conversational” Codeswitching on Usenet and Internet Relay Chat. Language@Internet, 8, article 3.

Paulus, T. M., & Phipps, G. (2008). Approaches to case analyses in synchronous and asynchronous environments. Journal of Computer-Mediated Communication, 13(2), 459–484. doi:10.1111/j.1083-6101.2008.00405.x

Pena-Shaff, J. B., & Nicholls, C. (2004). Analyzing student interactions and meaning construction in computer bulletin board discussions. Computers & Education, 42(3), 243–265. doi:10.1016/j.compedu.2003.08.003

Schallert, D. L., Chiang, Y. V., Park, Y., Jordan, M. E., Lee, H., Janne Cheng, A.-C., … Song, K. (2009). Being polite while fulfilling different discourse functions in online classroom discussions. Computers & Education, 53(3), 713–725. doi:10.1016/j.compedu.2009.04.009

Searle, J. R. (1976). A classification of illocutionary acts. In Language in society (Vol. 5, pp. 1–23). Cambridge University Press.

Thanasingam, S., Kit, S., & Soong, A. (2007). Interaction patterns and knowledge construction using synchronous discussion forums and video to develop oral skills, 1002–1008.

Thayalan, X., & Shanthi, A. (2011). Qualitative Assessment of Social Presence in Online Forums. In IEEE Colloquium on Humanities, Science and Engineering research (pp. 438–440).

Van Varik, F. J. M., & van Oostendorp, H. (2013). Enhancing Online Community Activity: Development and validation of the CA framework. Journal of Computer-Mediated Communication, n/a–n/a. doi:10.1111/jcc4.12020

Wang, H. (2005). A Qualitative Exploration of the Social Interaction in an Online Learning Community Haidong Wang, 1, 79–88.

Article 2

An analysis of expressive speech acts in online task-oriented interaction by university students

Marta Carretero*, Carmen Maíz-Arévalo, M. Ángeles Martínez

Universidad Complutense de Madrid, Facultad de Filología, Madrid, 28040, Spain

Abstract

This study explores the use of Expressive speech acts in a corpus of online interaction involving three groups of university students in the area of English Linguistics. The analysis focuses on the relative frequency of occurrence of different subtypes of Expressives across the three subcorpora. The influence of certain contextual variables such as multiculturality, age, linguistic proficiency and group size seems to have a strong bearing on the Expressives employed by each group.

Peer-review under responsibility of Universidad Pablo de Olavide.

Keywords: expressive speech acts; online collaborative writing; multiculturality.

1. Online collaborative writing

Online collaborative writing is the term used to refer to the computer-mediated joint production of a text by two or more authors with shared ownership of the product (Storch, 2011). The use of online collaboration for pedagogical purposes is connected to collaborative learning theories (Dillenbourg, 1999), in turn deeply linked to socio-cultural and interactionist views of the learning process (Piaget, 1928; Vygotsky, 1978). Among the many benefits of collaborative learning we could mention stronger learner motivation and improved social dynamics (Neumann & Hood, 2009, p. 383), as well as higher involvement (Cole, 2009) and enhanced learner autonomy and control over the learning process (Blake, 2011, p. 25; Leeming & Danino, 2012, p. 54). Blended learning environments, now frequent in higher education settings using a virtual campus, are those that combine face-to-face and computer-mediated interaction. From the point of view of discourse organization, online written interaction differs from face-to-face communication in several main respects. One is related to the asynchronous nature of computer-mediated communication (Herring et al., 2013). Secondly, online interaction cannot rely on many of the multimodal resources used in face-to-face settings, such as eye-to-eye contact, prosodic features, gestures, or body language (Herring et al., 2013), and this endows the ongoing written production with a strong dependence on linguistic organization, particularly when the conveyance of emotion is concerned. Finally, many computer modes (wikis, e-forums, or blogs) imply the permanent recording of the interaction in the form of a history log which allows privileged access by analysts to the complete transcription of the linguistic production of the participants.

* Corresponding author. Tel.: +3491-394-53-83. Fax: +3491-394-57-62.

E-mail address: mcarrete@ucm.es

(http://creativecommons.org/licenses/by-nc-nd/4.0/).

Marta Carretero et al. / Procedia - Social and Behavioral Sciences 173 (2015) 186–190

These specific features of online communication are particularly relevant to the present study, which focuses on the online written collaboration of three groups of undergraduate and post-graduate university students interacting in pedagogical e-forums for the subjects Discourse and Text (D&T), Pragmatics (Pr), and Seminar on English Linguistics (SL), at the Complutense University of Madrid, Spain. Although the language used is strongly task-oriented, the analysis of the e-forum logs and their resulting three written sub-corpora reveals a high presence of Expressives that seem to perform the communicative function of making up for the absence of face-to-face resources, in terms of smoothing transactional and task-oriented communication, and building rapport among participants. The research questions are the following:

a) Are Expressives equally frequent across the three sub-corpora, and are they similarly distributed in terms of

sub-types such as Apologies, Thankings, Compliments, and so forth?

b) If this is not the case, which are the contextual variables with a bearing on the choices made by participants?

2. Expressive speech acts

Expressives are one of the basic speech act types proposed in Searle’s (1976) seminal classification, together with Representatives, Directives, Commissives and Declaratives. Searle gives Apologizing, Congratulating and Thanking as examples of Expressives. A preliminary study of the data uncovered the need for the scope of Expressives to be enlarged, since many speech acts were considered intuitively as expressive but did not fit into any of Searle’s types.

Hence, other references were consulted: Bach & Harnish (1979), Thomas (1995), Verschueren (1999), and especially Weigand (2010), who proposes a speech act classification based on the notions of belief and desire. We adopted as a criterial feature the concern with desire, or the predominance of desire over belief. The resulting corpus-driven taxonomy included Expressives of two general types: self-centred, pertaining to the speaker / writer’s feelings; and other-centred, focusing on the addressee’s feelings. Self-centred Expressives include:

Likings, which express positive emotional reactions (1); Concerns, which express worries (2); and Wishes, which claim that the truth of the proposition should (or should not) be the case (3):

1. I really like the classification. (SL)

2. I cannot recognize PCIs nor GCIs... It is difficult to see them... the easiest are the presuppositions xD (Pr)

3. I wanted to answer to the last part of question two and question three but I really cannot think any longer. (Pr)

Other-oriented Expressives include Apologies, Compliments and Thankings, which correspond to Searle’s expressives mentioned above, as well as other subtypes: Reassurings, which aim at comforting the addressee by diminishing his/her feeling of guilt (4); and Reproaches, which may be seen as the negative counterpart of Compliments (5):

4. Don't worry because everything is finished and sent (D&T)

5. I feel like I'm having pretty much of a monologue here… (D&T)

Finally, our scope of Expressives also includes speech acts of other kinds that focus on the speaker/writer’s emotional involvement by linguistic or typographical means, concretely interjections such as oh, exclamation marks, emphatic do, accumulation of evaluative expressions, repetition of a letter or of a question mark, capitalization, and the use of emoticons (Yus, 2011). Utterances containing any of these marks have also been considered as Expressives, in addition to the subtypes described above. In the corpus, occurrences were found of reinforced Greetings (6), Assertions (7), Directives (8) and Commissives (9). Due to its importance in the three subcorpora, the category of Agreement was split from Assertions at large and conferred the status of an individual subtype (10).

6. Hello everybody! (D&T)

7. I have finished my part! (D&T)

8. Suggestions would be very welcome!! (Pr)

9. I'm going to try to post my ideas tomorrow! (D&T)

10. I agree with everything you've said :D (SL)
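Several of the typographical markers listed above (interjections such as oh, exclamation marks, repeated letters or question marks, capitalization, emoticons) can be spotted mechanically. The Python sketch below is a rough heuristic offered only as an illustration; it is not the authors' procedure, which the paper does not describe as automated. The name `find_markers`, the regular expressions, and the short emoticon list are assumptions. Emphatic do and accumulation of evaluative expressions are left out because they need grammatical analysis, and the capitalization rule will also fire on ordinary acronyms.

```python
import re

# One illustrative pattern per marker type; all patterns are assumptions.
MARKERS = {
    "interjection": re.compile(r"\boh\b", re.IGNORECASE),
    "exclamation": re.compile(r"!"),
    # The same letter three or more times in a row, e.g. "sooo".
    "repeated_letter": re.compile(r"([A-Za-z])\1{2,}"),
    "repeated_question_mark": re.compile(r"\?{2,}"),
    # A whole word of three or more capitals (also matches acronyms).
    "capitalization": re.compile(r"\b[A-Z]{3,}\b"),
    # A small sample of emoticons such as :) :-( :D ;) xD
    "emoticon": re.compile(r"(?<!\w)(?::-?[)(DP]|;-?\)|[xX]D)(?!\w)"),
}


def find_markers(utterance):
    """Return the names of all marker types found in the utterance."""
    return [name for name, pattern in MARKERS.items() if pattern.search(utterance)]


# Utterances taken from the numbered examples in this section.
examples = [
    "I really like the classification.",
    "Hello everybody!",
    "Suggestions would be very welcome!!",
    "I agree with everything you've said :D",
]
for text in examples:
    print(find_markers(text), "<-", text)
```

A detector of this kind could at most pre-select candidate utterances; deciding whether an utterance counts as a reinforced Greeting, Assertion, Directive or Commissive still requires a human coder.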

3. Methodology

The data used in the study consist of a 79,699-word corpus, made up of three subcorpora containing the e-forum written interaction of 83 university students belonging to one of the following groups: 64 undergraduate students taking an optional course on English Discourse and Text (Subcorpus D&T: 40,226 words) (Martínez, 2014); 9 undergraduate students from an evening group taking an obligatory course on Pragmatics (Subcorpus Pr: 14,119 words) (Carretero, 2014); and 10 post-graduate students following the Master’s Seminar on English Linguistics (Subcorpus SL: 25,354 words) (Maíz-Arévalo, 2014). Each group of participants was subdivided into smaller groups of three or four students, randomly created by Virtual Campus itself. Each of these smaller groups had to carry out one or two collective assignments. However, they were specifically asked not to do these collaborative exercises in the traditional face-to-face way but online, by means of an e-forum where they could negotiate and discuss for one week before producing a final written report. None of the participants was informed a priori of their participation in this research project, in order to avoid unnaturally biased interactions. However, once the activity was over, participants gave their written consent. In any case, pseudonyms were used to preserve their identity.

The unbalanced number and age of participants could not be controlled for the present research but implied two interesting variables to take into account when analysing the results. A third variable was the participants’ level of English. Although quite advanced in general terms, the undergraduate students’ level ranged from B2 to C1 according to the Common European Framework of Reference (2001), whilst the postgraduates’ linguistic proficiency ranged between C1 and C2. A fourth major difference was the high degree of interculturality present in the Master group, which included Russian, Korean, Arabic, Polish and Spanish students, as opposed to the undergraduate groups, consisting mostly of Spaniards.

4. Data analysis

Table 1 presents the Expressive subtypes found in the corpus, accompanied by the number of tokens (N), together with the corresponding percentages across the three subcorpora. The analysis uncovers two main similarities. The first is a predominance of other-oriented over self-oriented speech acts. This tendency may well be due to the students’ concern with ensuring good rapport, rather than focusing on their own feelings. Another reason might be the blended nature of the learning context. These other-focused Expressives are enhanced in the data by the use of typographic signs like exclamation marks or emoticons, as in the Thanking in (11) or the Apology in (12):

(11) Thanks, Anat for offering to put the analysis in the final document! - (SL)

(12) Hi, sorry for being this late, I've been having problems with my internet connection at home - (Pr)

Marta Carretero et al. / Procedia - Social and Behavioral Sciences 173 (2015) 186–190

Table 1. Cross-comparative view of results.

Speech acts    Corpus Pr %       Corpus SL %       Corpus D&T %
Apology        25.56 (N=34)      10.72 (N=25)      10.90 (N=52)
Compliment     16.54 (N=22)      21.00 (N=49)      14.89 (N=71)
Greeting       9.02 (N=12)       13.73 (N=32)      16.14 (N=77)
Wish           6.77 (N=9)        3.43 (N=8)        17.20 (N=82)
Thanking       6.77 (N=9)        18.88 (N=44)      19.91 (N=95)
Liking         0.00 (N=0)        4.29 (N=10)       0.42 (N=2)
Concern        10.53 (N=14)      1.71 (N=4)        2.93 (N=14)
Reproach       4.51 (N=6)        0.85 (N=2)        5.66 (N=27)
Directive      7.52 (N=10)       13.30 (N=31)      4.40 (N=21)
Agreement      4.51 (N=6)        3.00 (N=7)        3.14 (N=15)
Assertion      3.01 (N=4)        6.86 (N=16)       1.44 (N=7)
Commissive     2.26 (N=3)        1.71 (N=4)        2.10 (N=10)
Reassuring     3.01 (N=4)        0.42 (N=1)        0.63 (N=3)
TOTAL          100 (N=133)       100 (N=233)       100 (N=476)
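Each percentage in Table 1 is the subtype's token count divided by the total number of Expressives in that subcorpus. As a small worked check in Python, using three Subcorpus Pr rows copied from the table (the function and variable names are illustrative, not part of the study):

```python
# Token counts for three subtypes in Subcorpus Pr, copied from Table 1.
pr_counts = {"Apology": 34, "Compliment": 22, "Greeting": 12}
PR_TOTAL = 133  # total Expressives in Subcorpus Pr (TOTAL row of Table 1)


def share(count, total):
    """Percentage of `total` represented by `count`, rounded to two decimals."""
    return round(100 * count / total, 2)


pr_shares = {subtype: share(n, PR_TOTAL) for subtype, n in pr_counts.items()}
print(pr_shares)
```

Because the three subcorpora differ greatly in size (133, 233 and 476 tokens), comparisons across them rest on these percentages rather than on the raw counts.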

The second similarity lies in the high degree of conventionalization found in the most recurrent subtypes: Compliments often contain adjectives such as excellent, fine, good, interesting or perfect; Greetings, hello, hi or hey; Thankings, thank you or thanks; and Apologies, sorry. This conventionalization may be accounted for by the formulaic nature of these expressions as well as by the priority given to the performance of the task, which called for quick and effective other-centred rapport building, as in example (13):

(13) Hey guys! (D&T)

In spite of these similarities, the three subcorpora differ in some respects. For instance, expressions of Concern are far more frequent in Subcorpus Pr than in the other two, as can be observed in Table 1. Subcorpus SL, on the other hand, displays a remarkably higher presence of Compliments, Directives, and Assertions, while ranking below average in Wish, Concern, and Reproach. Finally, Subcorpus D&T presents above-average percentages in Greetings and Wishes, and is also high in Thanks and Reproaches, but has a relatively lower presence of Directives, Assertions, and Agreements typographically marked as Expressives. The reasons for these differences seem to be related to the four contextual variables mentioned in Section 3: general group size, age, linguistic proficiency and cultural homogeneity.

The larger size of the morning group (64 students) in contrast to the rather small evening groups (9 and 10 students, respectively) could have accounted for the higher number of Reproaches in Subcorpus D&T. In this group, face-threat might have been perceived with lower intensity, given the lack of real face-to-face contact, as in (14):

(14) I hope that the other two participants of the group say something, if not… I think we must talk to T[eacher] (D&T)

Age may have a bearing on the differences between the two undergraduate subgroups: the evening students in Subcorpus Pr often produce Concerns and Apologies. By contrast, their morning D&T younger counterparts seem to favour “wishful thinking”, hence the high frequency of conventionalized Wishes, as in (15):

(15) I hope you can give me an idea and do it together (D&T)

As for linguistic proficiency, it appears to be connected with the different percentage of Compliments across the three subcorpora: they rank slightly higher in the master Subcorpus SL (21%), and gradually decrease along the proficiency scale, with 16.54% in Subcorpus Pr and 14.89% in Subcorpus D&T.

Finally, interculturality seems to influence the preference for typographic signs like emoticons in Subcorpus SL, produced by students belonging to very different nationalities, who resorted to typographic signs to build rapport, as in example (16):

(16) For question 2, I tried to summarize before the table. It seems logical to put words before the table. ;-) (SL)

In addition, this multicultural group seems particularly fond of Thanking in its formulaic realization. It could be argued that these Master’s students modelled their thanking on British English conventions, since they were well aware that English was being used as a lingua franca.

5. Conclusions

The analysis carried out in this paper covered expressive speech acts in a corpus consisting of e-forum history logs produced by three groups of students in English linguistics. The study revealed two common features: predominance of other-oriented over self-oriented Expressives and a high degree of conventionalization in the linguistic realization of the four most frequent subtypes (Thankings, Apologies, Greetings and Compliments). The analysis also showed remarkable differences in terms of frequency of use, concrete linguistic realizations of individual subtypes, and the use of typographic marks. These differences may be accounted for by the influence of contextual variables, namely group size, age, linguistic proficiency and cultural homogeneity.

References

Bach, K., & Harnish, R. M. (1979). Linguistic communication and speech acts. Cambridge, USA and London, UK: MIT Press.

Blake, R. J. (2011). Current trends in online language learning. Annual Review of Applied Linguistics, 31, 19-35.

Carretero, M. (2014). The Virtual Campus in the teaching of English pragmatics in BA’s last year: a description of two e-forum-based activities. In M. Ángeles Martínez (coord.) The use of institutional e-forums for online collaborative writing activities in the field of discourse analysis and English linguistics. Universidad Complutense de Madrid: Proyectos de Innovación y Mejora de la Calidad Docente (PIMCD2012-2013) (pp. 18-29). Madrid: Editorial del Economista.

Cole, M. (2009). Using wiki technology to support student engagement: Lessons from the trenches. Computers and Education, 52, 141-146.

Dillenbourg, P. (1999). Collaborative-learning: Cognitive and computational approaches. Oxford: Elsevier.

Herring, S. C., Stein, D., & Virtanen, T. (2013). Handbook of pragmatics of computer-mediated communication. Berlin: Mouton.

Leeming, D. E., & Danino, N. (2012). Breaking barriers: A case study of culture and Facebook usage. Journal of Modern Languages and International Studies, 1(1), 52-64.

Maíz-Arévalo, C. (2014). Using e-forums to develop intercultural pragmatic awareness. In M. Ángeles Martínez (coord.) The use of institutional e-forums for online collaborative writing activities in the field of discourse analysis and English linguistics. Universidad Complutense de Madrid: Proyectos de Innovación y Mejora de la Calidad Docente (PIMCD2012-2013) (pp. 43-53). Madrid: Editorial del Economista.

Martínez, M. Á. (2014). Teaching discourse analysis through online collaborative writing in a tertiary educational setting: a focus on assessment. In M. Ángeles Martínez (coord.) The use of institutional e-forums for online collaborative writing activities in the field of discourse analysis and English linguistics. Universidad Complutense de Madrid: Proyectos de Innovación y Mejora de la Calidad Docente (PIMCD2012-2013) (pp. 62-78). Madrid: Editorial del Economista.

Neumann, D. L., & Hood, M. (2009). The effects of using a wiki on student engagement and learning of report writing skills in a university statistics course. Australasian Journal of Educational Technology, 25(5), 382-398.

Piaget, J. (1928). The language and thought of the child. New York: Harcourt.

Searle, J. R. (1976). A classification of illocutionary acts. Language in Society, 5(1), 1-23.

Storch, N. (2011). Collaborative writing in L2 contexts: Processes, outcomes, and future directions. Annual Review of Applied Linguistics, 31(1), 275-288.

Thomas, J. (1995). Meaning in interaction: An introduction to pragmatics. London and New York: Longman.

Verschueren, J. (1999). Understanding pragmatics (vol. 31). London: Arnold.

Vygotsky, L. S. (1978). Mind and society: The development of higher mental processes. Cambridge, Ma: Harvard University Press.

Weigand, E. (2010). Dialogue: The mixed game (vol. 10). Philadelphia: John Benjamins Publishing.

Yus, F. (2011). Cyberpragmatics: Internet-mediated communication in context. Philadelphia: John Benjamins Publishing.

Article 3

Classifying Sentences as Speech Acts in Message Board Posts

Ashequl Qadir and Ellen Riloff
School of Computing, University of Utah, Salt Lake City, UT 84112
{asheq,riloff}@cs.utah.edu

In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP-2011).

Abstract

This research studies the text genre of message board forums, which contain a mixture of expository sentences that present factual information and conversational sentences that include communicative acts between the writer and readers. Our goal is to create sentence classifiers that can identify whether a sentence contains a speech act, and can recognize sentences containing four different speech act classes: Commissives, Directives, Expressives, and Representatives. We conduct experiments using a wide variety of features, including lexical and syntactic features, speech act word lists from external resources, and domain-specific semantic class features. We evaluate our results on a collection of message board posts in the domain of veterinary medicine.

1 Introduction

In the 1990s, the natural language processing community shifted much of its attention to corpus-based learning techniques. Since then, most of the text corpora that have been annotated and studied are collections of expository text (e.g., news articles, scientific literature, etc.). The intent of expository text is to present or explain information to the reader. In recent years, there has been a growing interest in text genres that originate from Web sources, such as weblogs and social media sites (e.g., tweets). These text genres offer new challenges for NLP, such as the need to handle informal and loosely grammatical text, but they also pose new opportunities to study discourse and pragmatic phenomena that are fundamentally different in these genres.

Message boards are common on the WWW as a forum where people ask questions and post comments to members of a community. They are typically devoted to a specific topic or domain, such as finance, genealogy, or Alzheimer’s disease. Some message boards offer the opportunity to pose questions to domain experts, while other communities are open to anyone who has an interest in the topic.

From a natural language processing perspective, message board posts are an interesting hybrid text genre because they consist of both expository text and conversational text. Most obviously, the conversations appear as a thread, where different people respond to each other’s questions in a sequence of posts. Studying the conversational threads, however, is not the focus of this paper. Our research addresses the issue of conversational pragmatics within individual message board posts.

Most message board posts contain both expository sentences as well as speech acts. The person posting a message (the “writer”) often engages in speech acts with the readers. The writer may explicitly greet the readers (“Hi everyone!”), request help from the readers (“Anyone have a suggestion?”), or commit to a future action (“I promise I will report back soon.”). But most posts contain factual information as well, such as general knowledge or personal history describing a situation, experience, or predicament.

Our research goals are twofold: (1) to distinguish between expository sentences and speech act sentences in message board posts, and (2) to classify speech act sentences into four types: Commissives, Directives, Expressives, and Representatives, following Searle’s original taxonomy (Searle, 1976). Speech act classification could be useful for many applications. Information extraction systems could benefit from filtering speech act sentences (e.g., promises and questions) so that facts are only extracted from the expository text. Identifying Directive sentences could be used to summarize the questions being asked in a forum over a period of time. Representative sentences could be extracted to highlight the conclusions and beliefs of domain experts in response to a question.

In this paper, we present sentence classifiers that can identify speech act sentences and classify them as Commissive, Directive, Expressive, and Representative. First, we explain how each speech act class is manifested in message board posts, which can be different from how they occur in spoken dialogue. Second, we train classifiers to identify speech act sentences using a variety of lexical, syntactic, and semantic features. Finally, we evaluate our system on a collection of message board posts in the domain of veterinary medicine.

2 Related Work

There has been relatively little work on applying speech act theory to written text genres, and most of the previous work has focused on email classification. Cohen et al. (2004) introduced the notion of “email speech acts” defined as specific verb-noun pairs following a pre-designed ontology. They approached the problem as a document classification task. Goldstein and Sabin (2006) adopted this notion of email acts (Cohen et al., 2004) but focused on verb lexicons to classify them. Carvalho and Cohen (2005) presented a classification scheme using a dependency network, capturing the sequential correlations with the context emails using transition probabilities from or to a target email. Carvalho and Cohen (2006) later employed N-gram sequence features to determine which N-grams are meaningfully related to different email speech acts with a goal towards improving their earlier email classification based on the writer’s intention.

Lampert et al. (2006) performed speech act classification in email messages following a verbal response modes (VRM) speech act taxonomy. They also provided a comparison of VRM taxonomy with Searle’s taxonomy (Searle, 1976) of speech act classes. They evaluated several machine learning algorithms using syntactic, morphological, and lexical features. Mildinhall and Noyes (2008) presented a stochastic speech act model based on verbal response modes (VRM) to classify email intentions.

Some research has considered speech act classes in other means of online conversations. Twitchell and Jr. (2004) and Twitchell et al. (2004) employed speech act profiling by plotting potential dialogue categories in a radar graph to classify conversations in instant messages and chat rooms. Nastri et al. (2006) performed an empirical analysis of speech acts in the away messages of instant messenger services to achieve a better understanding of the communication goals of such services. Ravi and Kim (2007) employed speech act profiling in online threaded discussions to determine message roles and to identify threads with questions, answers, and unanswered questions. They designed their own speech act categories based on their analysis of student interactions in discussion threads.

The work most closely related to ours is the research of Jeong et al. (2009) on semi-supervised speech act recognition in both emails and forums. Like our work, their research also classifies individual sentences, as opposed to entire documents. However, they trained their classifier on spoken telephone (SWBD-DAMSL corpus) and meeting (MRDA corpus) conversations and mapped the labelled dialog act classes of these corpora to 12 dialog act classes that they found suitable for email and forum text genres. These dialog act classes (addressed as speech acts by them) are somewhat different from Searle’s original speech act classes. They also used substantially different types of features than we do, focusing primarily on syntactic subtree structures.

3 Classifying Speech Acts in Message Board Posts

3.1 Speech Act Class Definitions

Searle’s (Searle, 1976) early research on speech acts was seminal work in natural language processing that opened up a new way of thinking about conversational dialogue and communication. Our goal was to try and use Searle’s original speech act definitions and categories as the basis for our work to the greatest extent possible, allowing for some interpretation as warranted by the WWW message board text genre.

For the purposes of defining and evaluating our work, we created detailed annotation guidelines for four of Searle’s speech act classes that commonly occur in message board posts: Commissives, Directives, Expressives, and Representatives. We omitted the fifth of Searle’s original speech act classes, Declarations, because we virtually never saw declarative speech acts in our data set.[1] The data set used in our study is a collection of message board posts in the domain of veterinary medicine. We designed our definitions and guidelines to reflect language use in the text genre of message board posts, trying to be as domain-independent as possible so that these definitions should also apply to message board texts representing other topics. However, we give examples from the veterinary domain to illustrate how these speech act classes are manifested in our data set.

[1] Searle defines Declarative speech acts as statements that bring about a change in status or condition to an object by virtue of the statement itself. For example, a statement declaring war or a statement that someone is fired.

Commissives: A Commissive speech act occurs when the speaker commits to a future course of action. In conversation, common Commissive speech acts are promises and threats. In message boards, these types of Commissives are relatively rare. However, we found many statements where the main purpose was to confirm to the readers that the writer would perform some action in the future. For example, a doctor may write “I plan to do surgery on this patient tomorrow” or “I will post the test results when I get them later today”. We viewed such statements as implicit commitments to the reader about intended actions. We also considered decisions not to take an action as Commissive speech acts (e.g., “I will not do surgery on this cat because it would be too risky.”). However, statements indicating that an action will not occur because of circumstances beyond the writer’s control were considered to be factual statements and not speech acts (e.g., “I cannot do an ultrasound because my machine is broken.”).

Directives: A Directive speech act occurs when the speaker expects the listener to do something as a response. For example, the speaker may ask a question, make a request, or issue an invitation. Directive speech acts are common in message board posts, especially in the initial post of each thread when the writer explicitly requests help or advice regarding a specific topic. Many Directive sentences are posed as questions, so they are easy to identify by the presence of a question mark. However, the language in message board forums is informal and often ungrammatical, so many Directives are posed as a question but do not end in a question mark (e.g., “What do you think.”). Furthermore, many Directive speech acts are not stated as a question but as a request for assistance. For example, a doctor may write “I need your opinion on what drug to give this patient.” Finally, some sentences that end in question marks are rhetorical in nature and do not represent a Directive speech act, such as “Can you believe that?”.

Expressives: An Expressive speech act occurs in conversation when a speaker expresses his or her psychological state to the listener. Typical cases are when the speaker thanks, apologizes, or welcomes the listener. Expressive speech acts are common in message boards because writers often greet readers at the beginning of a post (“Hi everyone!”) or express gratitude for help from the readers (“I really appreciate the suggestions.”). We also found Expressive speech acts in a variety of other contexts, such as apologies.

Representatives: According to Searle, a Representative speech act commits the speaker to the truth of an expressed proposition. It represents the speaker’s belief of something that can be evaluated to be true or false. These types of speech acts were less common in our data set, but some cases did exist. In the veterinary domain, we considered sentences to be a Representative speech act when a doctor explicitly confirmed a diagnosis or expressed their suspicion or hypothesis about the presence (or absence) of a disease or symptom. For example, if a doctor writes that “I suspect the patient has pancreatitis.” then this represents the doctor’s own proposition/belief about what the disease might be.

Many sentences in our data set are stated as fact but could be reasonably inferred to be speech acts. For example, suppose a doctor writes “The cat has pancreatitis.”. It would be reasonable to infer that the doctor writing the post diagnosed the cat with pancreatitis. And in many cases, that is true. However, we saw many posts where that inference would have been wrong. For example, the following sentence might say “The cat was diagnosed by a previous vet but brought to me due to new complications” or “The cat was diagnosed with it 8 years ago as a kitten in the animal shelter”. Consequently, we were very conservative in labelling sentences as Representative speech acts. Any sentence presented as fact was not considered to be a speech act. A sentence was only labelled as a Representative speech act if the writer explicitly expressed his belief.

3.2 Features for Speech Act Classification

To create speech act classifiers, we designed a variety of lexical, syntactic, and semantic features. We tried to capture linguistic properties associated with speech act expressions as well as discourse properties associated with individual sentences and the message board post as a whole. We also incorporated speech act word lists that were acquired from external resources, and used two types of semantic features to represent semantic entities associated with the veterinary domain. Except for the semantic features, all of our features are domain-independent so should be able to recognize speech act sentences across different domains. We experimented with domain-specific semantic features to test our hypothesis that Commissive speech acts can be associated with domain-specific semantic entities.

For the purposes of analysis, we partition the feature set into three groups: Lexical and Syntactic (LexSyn) Features, Speech Act Clue Features, and Semantic Features. Unless otherwise noted, all of the features had binary values indicating the presence or absence of that feature.

3.2.1 Lexical and Syntactic Features

We designed a variety of features to capture lexical and syntactic properties of words and sentences. We describe the feature set below, with the features categorized based on the type of information that they capture.

Unigrams: We created bag-of-word features representing each unigram in the training set. Numbers were replaced with a special # token.

Personal Pronouns: We defined three features to look for the presence of a 1st person pronoun, 2nd person pronoun, and 3rd person pronoun. We included the subjective, objective, and possessive form of each pronoun (e.g., he, him, and his).

Tense: Speech acts such as Commissives can be related to tense. We created three features to identify verb phrases that occur in the past, present, or future tense. To recognize tense, we followed the rules defined by Allen (1995).

Tense + Person: We created four features that require the presence of a first person subjective pronoun (I, we) within a two word window on the left of a verb phrase matching one of four tense representations: past, present, future, and present progressive (a subset of the more general present tense representation).

Modals: One feature indicates whether the sentence contains a modal (may, must, shall, will, might, should, would, could).

Infinitive VP: One feature looks for an infinitive verb phrase (‘to’ followed by a verb) that is preceded by a first person pronoun (I, we) within a three word window on the left. This feature tries to capture common Commissive expressions (e.g., “I definitely plan to do the test tomorrow.”).

Plan Phrases: Commissives are often expressed as a plan, so we created a feature that recognizes four types of plan expressions: “I am going to”, “I am planning to”, “I plan to”, and “My plan is to”.

Sentence contains Early Punctuation: One feature checks for the following punctuation marks within the first three tokens of the sentence: , : ! This feature was designed to recognize greetings, such as “Hi,” or “Hiya everyone !”.

Sentence begins with Modal/Verb: One feature checks if a sentence begins with a modal or verb. The intuition is to capture interrogative and imperative sentences, since they are likely to be Directives.

Sentence begins with WH Question: One feature checks if a sentence begins with a WH question word (Who, When, Where, What, Which, What, How).

Neighboring Question: One feature checks whether the following sentence contains a question mark ‘?’. We observed that in message boards, Directives often occur in clusters.

Sentence Position: Four binary features represent the relative position of the sentence in the post. One feature indicates whether it is the first sentence, one feature indicates whether it is the last sentence, one feature indicates whether it is the second to last sentence, and one feature indicates whether the sentence occurs in the bottom 25% of the message. The motivation for these features is that Expressives often occur at the beginning and end of the post, and Directives tend to occur toward the end.

Number of Verbs: One feature represents the number of verbs in the sentence using four possible values: 0, 1, 2, >2. Some speech act classes (e.g., Expressives) may occur with no verbs, and rarely occur in long, complex sentences.
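Several of the features above need no part-of-speech tagger and can be sketched directly. The following Python fragment is an illustration under stated assumptions, not the authors' implementation: the tokenizer, the function names, and the reading of "bottom 25%" as a sentence index at or beyond three quarters of the post are all hypothetical choices. The tense, verb-count and infinitive features are omitted because they depend on part-of-speech tags.

```python
MODALS = {"may", "must", "shall", "will", "might", "should", "would", "could"}
WH_WORDS = {"who", "when", "where", "what", "which", "how"}
PLAN_PHRASES = ("i am going to", "i am planning to", "i plan to", "my plan is to")


def tokenize(sentence):
    """Crude tokenizer (an assumption): split , : ! ? . off from words."""
    for mark in ",:!?.":
        sentence = sentence.replace(mark, f" {mark} ")
    return sentence.lower().split()


def sentence_features(post, index):
    """Binary features for post[index]; `post` is a list of sentence strings."""
    tokens = tokenize(post[index])
    padded = " " + " ".join(tokens) + " "  # padding gives whole-word matching
    next_sentence = post[index + 1] if index + 1 < len(post) else ""
    return {
        "modal": any(t in MODALS for t in tokens),
        "plan_phrase": any(f" {p} " in padded for p in PLAN_PHRASES),
        "early_punct": any(t in {",", ":", "!"} for t in tokens[:3]),
        "starts_with_wh": bool(tokens) and tokens[0] in WH_WORDS,
        "neighbor_question": "?" in next_sentence,
        "first_sentence": index == 0,
        "last_sentence": index == len(post) - 1,
        # Assumed reading of "bottom 25% of the message".
        "bottom_quarter": index >= 0.75 * len(post),
    }


post = ["Hi, everyone!", "I plan to do surgery tomorrow.",
        "What do you think.", "Any suggestions?"]
for i in range(len(post)):
    print(i, [name for name, on in sentence_features(post, i).items() if on])
```

Each dictionary could then be merged with the unigram features to form the binary feature vector for one sentence.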

3.2.2 Speech Act Word Clues

We collected speech act word lists (mostly verbs) from two external sources. In Searle’s original paper (Searle, 1976), he listed words that he considered to be indicative of speech acts. We discarded a few that we considered to be overly general, and we added a few additional words. We also collected a list of speech act verbs published in (Wierzbicka, 1987). The details for these speech act clue lists are given below. Our system recognized all derivations of these words.

Searle Keywords: We created one feature for each speech act class. The Representative keywords were: (hypothesize, insist, boast, complain, conclude, deduce, diagnose, and claim). We discarded 3 words from Searle’s list (suggest, call, believe) and added 2 new words, assume and suspect. The Directive keywords were: (ask, order, command, request, beg, plead, pray, entreat, invite, permit, advise, dare, defy, challenge). We added the word please. The Expressives keywords were: (thank, apologize, congratulate, condole, deplore, welcome). We added the words appreciate and sorry. Searle did not provide any hint on possible indicator words for Commissives, so we manually defined five likely Commissive keywords: (plan, commit, promise, tomorrow, later).
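The four keyword features can be sketched as set lookups. In the Python fragment below the keyword lists are taken from the paragraph above, while the `inflections` helper is a hypothetical stand-in for the paper's recognition of "all derivations": it generates only a few regular inflected forms, so it would miss derived forms such as apology or diagnosis.

```python
SEARLE_KEYWORDS = {
    "REP": ["hypothesize", "insist", "boast", "complain", "conclude",
            "deduce", "diagnose", "claim", "assume", "suspect"],
    "DIR": ["ask", "order", "command", "request", "beg", "plead", "pray",
            "entreat", "invite", "permit", "advise", "dare", "defy",
            "challenge", "please"],
    "EXP": ["thank", "apologize", "congratulate", "condole", "deplore",
            "welcome", "appreciate", "sorry"],
    "COM": ["plan", "commit", "promise", "tomorrow", "later"],
}


def inflections(word):
    """A few regular inflections of `word` (rough approximation, may overgenerate)."""
    forms = {word, word + "s", word + "ed", word + "ing",
             word + word[-1] + "ed", word + word[-1] + "ing"}  # plan -> planning
    if word.endswith("e"):
        forms |= {word + "d", word[:-1] + "ing"}               # advise -> advising
    if word.endswith("y"):
        forms |= {word[:-1] + "ies", word[:-1] + "ied"}        # defy -> defied
    return forms


KEYWORD_FORMS = {
    label: set().union(*(inflections(w) for w in words))
    for label, words in SEARLE_KEYWORDS.items()
}


def searle_features(sentence):
    """One binary feature per speech act class: does any keyword form occur?"""
    tokens = {t.strip(".,!?:;\"'()").lower() for t in sentence.split()}
    return {label: bool(tokens & forms) for label, forms in KEYWORD_FORMS.items()}


print(searle_features("I suspect the patient has pancreatitis."))
print(searle_features("I am planning to repeat the test."))
```

A keyword hit is only a clue for the classifier rather than a decision: a word such as later can fire the Commissive feature in a purely factual sentence.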

Wierzbicka Verbs: We created one feature that included 228 speech act verbs listed in the book “English speech act verbs: a semantic dictionary” (Wierzbicka, 1987).[2]

[2] openlibrary.org/b/OL2413134M/English_speech_act_verbs

3.2.3 Semantic Features

All of the previous features are domain-independent and should be useful for identifying speech act sentences across many domains. However, we hypothesized that semantic entities may correlate with speech acts within a particular domain. For example, consider medical domains. Representative speech acts may involve diagnoses and hypotheses regarding diseases and symptoms. Similarly, Commissive speech acts may reveal a doctor’s plan or intention regarding the administration of drugs or tests. Thus, it may be beneficial for a classifier to know whether a sentence contains certain semantic entities. We experimented with two different sources of semantic information.

Semantic Lexicon: Basilisk (Thelen and Riloff, 2002) is a bootstrapping algorithm that has been used to induce semantic lexicons for terrorist events (Thelen and Riloff, 2002), biomedical concepts (McIntosh, 2010), and subjective/objective nouns for opinion analysis (Riloff et al., 2003). We ran Basilisk over our collection of 15,383 veterinary message board posts to create a semantic lexicon for veterinary medicine. As input, Basilisk requires seed words for each semantic category. To obtain seeds, we parsed the corpus using a noun phrase chunker, sorted the head nouns by frequency, and manually identified the 20 most frequent nouns belonging to four semantic categories: DISEASE/SYMPTOM, DRUG, TEST, and TREATMENT.

However, the induced TREATMENT lexicon was of relatively poor quality so we did not use it. The DISEASE/SYMPTOM lexicon appeared to be of good quality, but it did not improve the performance of our speech act classifiers. We suspect that this is due to the fact that diseases were not distinguished from symptoms in our lexicon.[3] Representative speech acts are typically associated with disease diagnoses and hypotheses, rather than individual symptoms. In the end, we only used the DRUG and TEST semantic lexicon in our classifiers. We used all 1000 terms in the DRUG lexicon, but only used the top 200 TEST words because the quality of the lexicon seemed questionable after that point.

[3] We induced a single lexicon for diseases and symptoms because it is difficult to draw a clear line between them semantically. A veterinary consultant explained to us that the same term (e.g., diabetes) may be considered a symptom in one context if it is secondary to another condition (e.g., pancreatitis) but a disease in a different context if it is the primary diagnosis.

Semantic Tags: We also used bootstrapped contextual semantic taggers (Huang and Riloff, 2010) that had been previously trained for the domain of veterinary medicine. These taggers assign semantic class labels to noun phrase instances based on the surrounding context in a sentence. The taggers were trained on 4,629 veterinary message board posts using 10 seed words for each semantic category (see (Huang and Riloff, 2010) for details). To ensure good precision, only tags that have a confidence value ≥ 1.0 were used. Our speech act classifiers used the tags associated with two semantic categories: DRUG and TEST.

3.3 Classification

To create our classifiers, we used the Weka (Hall et al., 2009) machine learning toolkit. We used Support Vector Machines (SVMs) with a polynomial kernel and the default settings supplied by Weka. Because a sentence can include multiple speech acts, we created a set of binary classifiers, one for each of the four speech act classes. All four classifiers were applied to each sentence, so a sentence could be assigned multiple speech act classes.
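The one-classifier-per-class setup can be sketched independently of any learning toolkit. The snippet below only shows how multi-label gold annotations would be split into four binary targets and how four binary outputs would be merged back into a label set. It does not reproduce the Weka SVM training itself, and the function names are hypothetical.

```python
from typing import Dict, List, Set

CLASSES = ["COM", "DIR", "EXP", "REP"]


def to_binary_targets(gold: List[Set[str]]) -> Dict[str, List[int]]:
    """Build one 0/1 target vector per class.

    A sentence with no speech act (an empty set) is 0 in all four vectors.
    """
    return {c: [1 if c in labels else 0 for labels in gold] for c in CLASSES}


def combine_predictions(per_class: Dict[str, List[int]]) -> List[Set[str]]:
    """Merge the four binary outputs back into one label set per sentence."""
    n_sentences = len(next(iter(per_class.values())))
    return [
        {c for c in CLASSES if per_class[c][i] == 1}
        for i in range(n_sentences)
    ]
```

Because each class is decided independently, a sentence can end up with zero, one, or several labels, which is the behaviour the section describes.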

4 Evaluation

4.1 Data Set

Our data set consists of message board posts from the Veterinary Information Network (VIN), which is a web site (www.vin.com) for professionals in veterinary medicine. Among other things, VIN hosts message board forums where veterinarians and other veterinary professionals can discuss issues and pose questions to each other. Over half of the small animal veterinarians in the U.S. and Canada use the VIN web site.

We obtained 15,383 VIN message board threads representing three topics: cardiology, endocrinology, and feline internal medicine. We did basic cleaning, removing HTML tags and tokenizing numbers. We then applied the Stanford part-of-speech tagger (Toutanova et al., 2003) to each sentence to obtain part-of-speech tags for the words. For our experiments, we randomly selected 150 message board threads from this collection. Since the goal of our work was to study speech acts in sentences, and not the conversational dialogue between different writers, we used only the initial post of each thread. These 150 message board posts contained a total of 1,956 sentences, with an average of 13.04 sentences per post. In the next section, we explain how we manually annotated each sentence in our data set to create gold standard speech act labels.

4.2 Gold Standard Annotations

To create training and evaluation data for our research, we asked two human annotators to manually label sentences in our message board posts. Identifying speech acts is not always obvious, even to people, so we gave them detailed annotation guidelines describing the four speech act classes discussed in Section 3.1. Then we gave them the same set of 50 message board posts from our collection to annotate independently. Each annotator was told to assign one or more speech act classes to each sentence (COM, DIR, EXP, REP), or to label the sentence as having no speech acts (NONE). The vast majority of sentences had either no speech acts or at most one speech act, but a small number of sentences contained multiple types of speech acts.

We measured the inter-annotator agreement of the two human judges using the kappa (κ) score (Carletta, 1996). However, kappa agreement scores are only applicable to labelling schemes where each instance receives a single label. Therefore we computed kappa agreement in two different ways to look at the results from two different perspectives. In the first scheme, we discarded the small number of sentences that had multiple speech act labels and computed kappa on the rest.[4] This produced a kappa score of .95, suggesting extremely high agreement. However, over 70% of the sentences in our data set have no speech act at all, so NONE was by far the most common label. Consequently, this agreement score does not necessarily reflect how consistently the judges agreed on the four speech act classes.

[4] Of the 594 sentences in these 50 posts, only 22 sentences contained multiple speech act classes.

In the second scheme, we computed kappa for each speech act category independently. For each category C, the judges were considered to be in agreement if both of them assigned category C to the sentence or if neither of the judges assigned category C to the sentence. Table 1 shows the κ agreement scores using this approach.

Speech Act        Kappa (κ) score
Expressive        .97
Directive         .94
Commissive        .81
Representative    .77

Table 1: Inter-annotator (κ) agreement
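The per-category agreement scheme can be sketched as follows. This sketch uses Cohen's formulation of kappa for two annotators, which is an assumption: the text cites Carletta (1996) but does not spell out the exact variant. The function names are hypothetical.

```python
from typing import List, Set


def cohen_kappa(a: List[int], b: List[int]) -> float:
    """Cohen's kappa for two annotators' binary (0/1) decisions."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    rate_a, rate_b = sum(a) / n, sum(b) / n
    # Chance agreement: both say 1, or both say 0.
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def category_kappa(ann1: List[Set[str]], ann2: List[Set[str]], category: str) -> float:
    """Agreement on one category C: each sentence either has C or does not."""
    has_c_1 = [int(category in labels) for labels in ann1]
    has_c_2 = [int(category in labels) for labels in ann2]
    return cohen_kappa(has_c_1, has_c_2)
```

Reducing each category to a has-C / lacks-C decision is what lets a single-label statistic be applied to sentences that may carry several speech act labels.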

Inter-annotator agreement was very high for both the Expressive and Directive classes. Agreement was lower for the Commissive and Representative classes, but still relatively good, so we felt comfortable that we had high-quality annotations.

To create our final data set, the two judges adjudicated their disagreements on this set of 50 posts. We then asked each annotator to label an additional (different) set of 50 posts each. All together, this gave us a gold standard data set consisting of 150 annotated message board posts. Table 2 shows the distribution of speech act labels in our data set. 71% of the sentences did not include any speech acts. These were usually expository sentences containing factual information. 29% of the sentences included one or more speech acts, so nearly 1/3 of the sentences were conversational in nature. Directive and Expressive speech acts are by far the most common, with nearly 26% of all sentences containing one of these speech acts. Commissive and Representative speech acts are less common, each occurring in less than 3% of the sentences.[5]
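The percentages reported in Table 2 follow directly from the label counts and the 1,956 sentences in the data set. The short check below recomputes them; the variable names are illustrative only.

```python
TOTAL_SENTENCES = 1956  # sentences in the 150 annotated posts

# Label counts as reported in Table 2.
counts = {
    "None": 1397,
    "Directive": 311,
    "Expressive": 194,
    "Representative": 57,
    "Commissive": 51,
}


def share(count: int, total: int = TOTAL_SENTENCES) -> float:
    """Percentage of sentences carrying a label, rounded to two decimals."""
    return round(100.0 * count / total, 2)


distribution = {label: share(n) for label, n in counts.items()}

# The label counts sum to more than the number of sentences because a
# sentence with several speech acts is counted once per label.
extra_labels = sum(counts.values()) - TOTAL_SENTENCES
```

The surplus in `extra_labels` is why the percentages add up to slightly more than 100%.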

Speech Act        # sentences   distribution
None              1397          71.42%
Directive         311           15.90%
Expressive        194           9.92%
Representative    57            2.91%
Commissive        51            2.61%

Table 2: Speech act class distribution in our data set.

[5] These numbers do not add up to 100% because some sentences contain multiple speech acts.

4.3 Experimental Results

4.3.1 Speech Act Filtering

For our first experiment, we created a speech act filtering classifier to distinguish sentences that contain one or more speech acts from sentences that do not contain any speech acts. Sentences labelled as having one or more speech acts were positive instances, and sentences labelled as NONE were negative instances. Speech act filtering could be useful for many applications, such as information extraction systems that only seek to extract facts. For example, information may be posed as a question (in a Directive) rather than a fact, information may be mentioned as part of a future plan (in a Commissive) that has not actually happened yet, or information may be stated as a hypothesis or suspicion (in a Representative) rather than as a fact.

We performed 10-fold cross validation on our set of 150 annotated message board posts. Initially, we used all of the features defined in Section 3.2. However, during the course of our research we discovered that only a small subset of the lexical and syntactic features seemed to be useful, and that removing the unnecessary features improved performance. So we created a subset of minimal lexsyn features, which will be described in Section 4.3.2. For speech act filtering, we used the minimal lexsyn features plus the speech act clues and semantic features.[6]

Class             P     R     F
Speech Act        .86   .83   .84
No Speech Act     .93   .95   .94

Table 3: Precision, Recall, F-measure for speech act filtering.

Table 3 shows the performance for speech act filtering with respect to Precision (P), Recall (R), and F-measure score (F).[7] The classifier performed well, recognizing 83% of the speech act sentences with 86% precision, and 95% of the expository (no speech act) sentences with 93% precision.

[6] This is the same feature set used to produce the results for row E of Table 4.
[7] We computed an F1 score with equal weighting of precision and recall.

                          Commissives       Directives        Expressives       Representatives
     Features             P     R     F     P     R     F     P     R     F     P     R     F
Baselines
     Com baseline         .45   .08   .14   -     -     -     -     -     -     -     -     -
     Dir baseline         -     -     -     .97   .73   .83   -     -     -     -     -     -
     Exp baseline 1       -     -     -     -     -     -     .58   .18   .28   -     -     -
     Exp baseline 2       -     -     -     -     -     -     .97   .86   .91   -     -     -
     Rep baseline         -     -     -     -     -     -     -     -     -     1.0   .05   .10
Classifiers
U    Unigram              .45   .20   .27   .87   .84   .85   .97   .88   .92   .32   .12   .18
A    U+all lexsyn         .52   .33   .40   .87   .84   .86*  .98   .88   .92   .30   .14   .19
B    U+minimal lexsyn     .59   .33   .42   .87   .85   .86*  .98   .88   .92   .32   .14   .20
C    B+speechActClues     .57   .31   .41   .86   .84   .85   .97   .91   .94*  .33   .16   .21*
D    C+semTest            .64   .35   .46   .87   .84   .85   .97   .91   .94*  .33   .16   .21*
E    D+semDrug            .63   .39   .48*  .86   .84   .85   .97   .91   .94*  .32   .16   .21*

Table 4: Precision, Recall, F-measure for four speech act classes. The highest F score for each category is marked with an asterisk.
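For reference, the F1 score with equal weighting of precision and recall is their harmonic mean. The small helper below is a sketch, not the authors' evaluation code. It reproduces the F column of Table 3 from the reported P and R values; recomputing from already rounded inputs can occasionally differ from a published figure in the last digit.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (equal weighting)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows of Table 3 as (precision, recall).
speech_act_f = round(f1(0.86, 0.83), 2)      # "Speech Act" row
no_speech_act_f = round(f1(0.93, 0.95), 2)   # "No Speech Act" row
```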

4.3.2 Speech Act Categorization

BASELINES

Our next set of experiments focused on labelling sentences with the four specific speech act classes: Commissive, Directive, Expressive, and Representative. To assess the difficulty of identifying each speech act category, we created several simple baselines using our intuitions about each category.

For Commissives, we created a heuristic to capture the most obvious cases of future tense (because Commissive speech acts represent a writer’s commitment toward a future course of action). For example, the phrases ‘I will’ and ‘I shall’ were hypothesized by Cohen et al. (2004) to be useful bigram clues for Commissives. This baseline looks for future tense verb phrases with a 1st person pronoun within one or two words preceding the verb phrase. The Com baseline row of Table 4 shows the results for this heuristic, which obtained 8% recall with 45% precision. The heuristic applied to only 9 sentences in our test set, 4 of which contained a Commissive speech act.

Directive speech acts are often questions, so we created a baseline system that labels all sentences containing a question mark as a Directive. The Dir baseline row of Table 4 shows that 97% of sentences with a question mark were indeed Directives.[8] But only 73% of the Directive sentences contained a question mark. The remaining 27% of Directives did not contain a question mark and generally fell into two categories. Some sentences asked a question but the writer ended the sentence with a period (e.g., “Has anyone seen this before.”). And many Directives were expressed as requests rather than questions (e.g., “Let me know if anyone has a suggestion.”).

For Expressives, we implemented two baselines. Exp baseline 1 simply looks for an exclamation mark, but this heuristic did not work well (18% recall with 58% precision) because exclamation marks were often used for general emphasis (e.g., “The owner is frustrated with cleaning up urine!”). Exp baseline 2 looks for the presence of four common expressive words (appreciate, hi, hello, thank), including morphological variations of appreciate and thank. This baseline produced very good results, 86% recall with 97% precision. Evidently, a small set of common expressions accounts for most of the Expressive speech acts in our corpus. However, the word “hi” did produce some false hits because it was used as a shorthand for “high”, usually when reporting test results (e.g., “hi calcium”).

[8] 235 sentences contained a question mark, and 227 of them were Directives.

Finally, as a baseline for the Representative class we simply looked for the words diagnose(d) and suspect(ed). The Rep baseline row of Table 4 shows that this heuristic achieved 100% precision, but only 5% recall (matching 3 of the 57 Representative sentences in our test set).
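The baselines described above are simple enough to sketch with regular expressions. The sketch below makes two assumptions that go beyond the text: the `\w*` suffixes are one possible way to cover "morphological variations", and the Commissive pattern is a crude word-level stand-in, since the heuristic in the text relies on part-of-speech tags to find future tense verb phrases. Exp baseline 1 (the exclamation mark test) is left out, and all names here are hypothetical.

```python
import re
from typing import Callable, Dict, Set

# Exp baseline 2: appreciate, hi, hello, thank (plus assumed suffix matching).
EXPRESSIVE_RE = re.compile(r"\b(appreciat\w*|thank\w*|hi|hello)\b", re.IGNORECASE)
# Rep baseline: diagnose(d) and suspect(ed).
REPRESENTATIVE_RE = re.compile(r"\b(diagnosed?|suspect(ed)?)\b", re.IGNORECASE)
# Assumed approximation of the Com baseline: a 1st person pronoun, at most
# one intervening word, then "will" or "shall". No POS tags are used here.
COMMISSIVE_RE = re.compile(r"\b(I|we)\s+(\w+\s+)?(will|shall)\b", re.IGNORECASE)

BASELINES: Dict[str, Callable[[str], bool]] = {
    "COM": lambda s: bool(COMMISSIVE_RE.search(s)),
    "DIR": lambda s: "?" in s,  # Dir baseline: any question mark
    "EXP": lambda s: bool(EXPRESSIVE_RE.search(s)),
    "REP": lambda s: bool(REPRESENTATIVE_RE.search(s)),
}


def baseline_labels(sentence: str) -> Set[str]:
    """Return every class whose baseline rule fires on the sentence."""
    return {label for label, rule in BASELINES.items() if rule(sentence)}
```

On the examples quoted in the text, this sketch labels "Has anyone seen this before." with no class at all, and it tags "hi calcium" as Expressive, which mirrors the recall gap and the false hits reported for the baselines.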

CLASSIFIER RESULTS

The bottom portion of Table 4 shows the results for our classifiers. As we explained in Section 3.3, we created one classifier for each speech act category, and all four classifiers were applied to each sentence. So a sentence could receive anywhere from 0 to 4 speech act labels, indicating how many different types of speech acts appeared in the sentence. We trained and evaluated each classifier using 10-fold cross-validation on our gold standard data set.

The Unigram (U) row shows the performance of classifiers that use only unigram features. For Directives, we see a 2% F-score improvement over the baseline, which reflects a recall gain of 11% but a corresponding precision loss of 10%. The unigrams are clearly helpful in identifying many Directive sentences that do not end in a question mark, but at some cost to precision. For Expressives, the unigram classifier achieves an F score of 92%, identifying slightly more Expressive sentences than the baseline with the same level of precision. For Commissives and Representatives, the unigram classifiers performed substantially better than their corresponding baseline systems, but performance is still relatively weak.

Row A (U+all lexsyn) in Table 4 shows the results using unigram features plus all of the lexical and syntactic features described in Section 3.2.1. The lexical and syntactic features dramatically improve performance on Commissives, increasing F score from 27% to 40%, and they produce a 2% recall gain for Representatives but with a corresponding loss of precision.

However, we observed that only a few of the lexical and syntactic features had much impact on performance. We experimented with different subsets of the features and obtained even better performance when using just 10 of them, which we will refer to as the minimal lexsyn features. The minimal lexsyn feature set consists of the 4 Tense+Person features, the Early Punctuation feature, the Sentence begins with Modal/Verb feature, and the 4 Sentence Position features. Row B shows the results using unigram features plus only these minimal lexsyn features. Precision improves for Commissives by an additional 7% and for Representatives by 2% when using only these lexical and syntactic features. Consequently, we use the minimal lexsyn features for the rest of our experiments.

Row C shows the results of adding the speech act clue words (see Section 3.2.2) to the feature set used in Row B. The speech act clue words produced an additional recall gain of 3% for Expressives and 2% for Representatives, although performance on Commissives dropped 2% in both recall and precision.

Rows D and E show the results of adding the semantic features. We added one semantic category at a time to measure their impact separately. Row D adds two semantic features for the TEST category, one from the Basilisk lexicon and one from the semantic tagger. The TEST semantic features produced an F-score gain of 5% for Commissives, improving recall by 4% and precision by 7%. Row E adds two semantic features for the DRUG category. The DRUG features produced an additional F-score gain of 2% for Commissives, improving recall by 4% with a slight drop in precision.
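The 10-fold cross-validation protocol used for these experiments can be sketched as below. The text does not say whether folds were drawn over sentences or posts, whether they were stratified, or which random seed was used, so the shuffling and the `seed` parameter are assumptions, and `k_fold_indices` is a hypothetical helper rather than the Weka routine.

```python
import random
from typing import List, Tuple


def k_fold_indices(
    n_items: int, k: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits; each item is tested exactly once."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [order[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits
```

With the 1,956 sentences of the gold standard set, `k_fold_indices(1956)` yields ten test folds of 195 or 196 sentences each, and every classifier is trained on the remaining nine folds before being scored on the held-out one.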

4.4 Analysis

Together, the TEST and DRUG semantic features dramatically improved the classifier’s ability to recognize Commissive speech acts, increasing its F score from 41% to 48%. This result demonstrates that in the domain of veterinary medicine, some types of semantic entities are associated with speech acts. Our intuition behind this result is that commitments are usually related to future actions. In veterinary medicine, TESTS and DRUGS are associated with actions performed by doctors. Doctors help their patients by prescribing or administering drugs and by conducting tests. So these semantic entities may serve as a proxy to implicitly represent actions that the doctor has done or may do. In future work, explicitly recognizing actions and events may be a worthwhile avenue to further improve results.

We achieved good success at identifying both Directives and Expressives, although simple heuristics also perform well on these categories. We showed that training a Directive classifier can help to identify Directive sentences that do not end with a question mark, although at the cost of some precision.

The Commissive speech act class benefitted the most from the rich feature set. Unigrams are clearly not sufficient to identify Commissive sentences. Many different types of clues seem to be important for recognizing these sentences. The improvements obtained from adding semantic features also suggest that domain-specific semantics can be useful for recognizing some speech acts. However, there is still ample room for improvement, illustrating that speech act classification is a challenging problem.

Representative speech acts were by far the most difficult to recognize. We believe that there are several reasons for their low performance. First, Representatives were sparse in the data set, occurring in only 2.91% of the sentences. Consequently, the classifier had relatively few positive training instances. Second, Representatives had the lowest inter-annotator agreement, indicating that human judges had difficulty recognizing these speech acts too. The judges often disagreed about whether a hypothesis or suspicion was the writer’s own belief or whether it was stated as a fact reflecting general medical knowledge. The message board text genre is especially challenging in this regard because the writer is often presumed to be expressing his/her beliefs even when the writer does not explicitly say so. Finally, our semantic features could not distinguish between diseases and symptoms. Access to a resource that can reliably identify disease terms could potentially improve performance in this domain.

5 Conclusions

Our goal was to identify speech act sentences in message board posts and to classify the sentences with respect to four categories in Searle’s (1976) speech act taxonomy. We achieved good results for speech act filtering and the identification of Directive and Expressive speech act sentences. We found that Representative and Commissive speech acts are much more difficult to identify, although the performance of our Commissive classifier substantially improved with the addition of lexical, syntactic, and semantic features. Except for the semantic class information, our feature set is domain-independent and could be used to recognize speech act sentences in message boards for any domain. Furthermore, our features only rely on part-of-speech tags and do not require parsing, which is of practical importance for text genres such as message boards that are littered with ungrammatical text, typos, and shorthand notations.

In future work, we believe that segmenting sentences into clauses may help to train classifiers more precisely. Ultimately, we would like to identify the speech act expressions themselves because some sentences contain speech acts as well as factual information. Extracting the speech act expressions and clauses from message boards and similar text genres could provide better tracking of questions and answers in web forums and be used for summarization.

6 Acknowledgments

We gratefully acknowledge that this research was supported in part by the National Science Foundation under grant IIS-1018314. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the view of the U.S. government.

References

James Allen. 1995. Natural language understanding (2nd ed.). Benjamin-Cummings Publishing Co., Inc., Redwood City, CA, USA.

Jean Carletta. 1996. Assessing agreement on classification tasks: the kappa statistic. Comput. Linguist., 22:249–254, June.

Vitor R. Carvalho and William W. Cohen. 2005. On the collective classification of email “speech acts”. In SIGIR ’05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 345–352, New York, NY, USA. ACM Press.

Vitor R. Carvalho and William W. Cohen. 2006. Improving “email speech acts” analysis via n-gram selection. In Proceedings of the HLT-NAACL 2006 Workshop on Analyzing Conversations in Text and Speech, ACTS ’09, pages 35–41, Stroudsburg, PA, USA. Association for Computational Linguistics.

William W. Cohen, Vitor R. Carvalho, and Tom M. Mitchell. 2004. Learning to classify email into “speech acts”. In EMNLP, pages 309–316. ACL.

Jade Goldstein and Roberta Evans Sabin. 2006. Using speech acts to categorize email and identify email genres. In Proceedings of the 39th Annual Hawaii International Conference on System Sciences - Volume 03, pages 50.2–, Washington, DC, USA. IEEE Computer Society.

Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. 2009. The weka data mining software: an update. SIGKDD Explor. Newsl., 11:10–18, November.

Ruihong Huang and Ellen Riloff. 2010. Inducing domain-specific semantic class taggers from (almost) nothing. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 275–285, Stroudsburg, PA, USA. Association for Computational Linguistics.

Minwoo Jeong, Chin-Yew Lin, and Gary Geunbae Lee. 2009. Semi-supervised speech act recognition in emails and forums. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3, EMNLP ’09, pages 1250–1259, Stroudsburg, PA, USA. Association for Computational Linguistics.

Andrew Lampert, Robert Dale, and Cecile Paris. 2006. Classifying speech acts using verbal response modes. In Proceedings of the 2006 Australasian Language Technology Workshop (ALTW2006), pages 34–41. Sydney Australia : ALTA.

Tara McIntosh. 2010. Unsupervised discovery of negative categories in lexicon bootstrapping. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP ’10, pages 356–365, Stroudsburg, PA, USA. Association for Computational Linguistics.

John Mildinhall and Jan Noyes. 2008. Toward a stochastic speech act model of email behavior. In CEAS.

Jacqueline Nastri, Jorge Pena, and Jeffrey T. Hancock. 2006. The construction of away messages: A speech act analysis. J. Computer-Mediated Communication, pages 1025–1045.

Sujith Ravi and Jihie Kim. 2007. Profiling student interactions in threaded discussions with speech act classifiers. In Proceeding of the 2007 conference on Artificial Intelligence in Education: Building Technology Rich Learning Contexts That Work, pages 357–364, Amsterdam, The Netherlands. IOS Press.

Ellen Riloff, Janyce Wiebe, and Theresa Wilson. 2003. Learning subjective nouns using extraction pattern bootstrapping. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4, CONLL ’03, pages 25–32, Stroudsburg, PA, USA. Association for Computational Linguistics.

John R. Searle. 1976. A classification of illocutionary acts. Language in Society, 5(1):pp. 1–23.

Michael Thelen and Ellen Riloff. 2002. A bootstrapping method for learning semantic lexicons using extraction pattern contexts. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10, EMNLP ’02, pages 214–221, Stroudsburg, PA, USA. Association for Computational Linguistics.

Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL ’03, pages 173–180, Stroudsburg, PA, USA. Association for Computational Linguistics.

Douglas P. Twitchell and Jay F. Nunamaker Jr. 2004. Speech act profiling: a probabilistic method for analyzing persistent conversations and their participants. In System Sciences, 2004. Proceedings of the 37th Annual Hawaii International Conference on, pages 1–10, January.

Douglas P. Twitchell, Mark Adkins, Jay F. Nunamaker Jr., and Judee K. Burgoon. 2004. Using speech act theory to model conversations for automated classification and retrieval. In Proceedings of the International Working Conference Language Action Perspective Communication Modelling (LAP 2004), pages 121–130.

A. Wierzbicka. 1987. English speech act verbs: a semantic dictionary. Academic Press, Sydney, Orlando.
