Abstract
This study combines a corpus-based approach with intuition-based judgements to develop a list of multiword combinations for research publication in academic journals. To obtain a representative sample, a corpus comprising the four internal sections of 120 Applied Linguistics research articles indexed in the TCI (Thai Citation Index) database was systematically compiled and investigated. A corpus-based approach was used to identify n-grams which occur frequently in the corpus. First, a list of 49 content-based strings, likely to be the most useful for pedagogic purposes, was derived. Based on their grammatical and semantic relationships, 3-grams were further investigated. Because frequency alone does not guarantee that a multiword sequence has the pragmatic functionality which makes it pedagogically useful, five EAP instructors were invited to select the useful multiword combinations from the list of identified n-grams. The result is a final list of 289 phraseological patterns. The list provides an additional evidence-based, corpus-informed instructional resource to support English teachers in lesson planning and in materials design and development, particularly for advanced language courses which target scholarly writing.
1 Introduction
Composing a research article can be a formidable task for a graduate student or novice writer, since writing for publication requires background knowledge of the genre and its associated textual features (Feak and Swales 2011; Hyland 2004). An article writer must be aware of the generic macro-structures, micro-structures, and linguistic features of articles in the academic genre (Hyland 2008a, Hyland 2008b; Samraj 2005; Swales 1990, Swales 2004). In terms of micro-structures, vocabulary is one of the challenges L2 and EFL learners face when reading and writing academic discourse (Shaw 1991). In Thailand, novice scholars and graduate students feel increasing pressure to publish their research in prestigious journals, especially those indexed in the TCI (Thai Citation Index) database. Most universities require graduate students to publish at least one research article in a peer-reviewed journal in order to graduate. To help these writers publish successfully, extra attention must be paid to vocabulary in an English for Academic Purposes (EAP) context, particularly the specialist words which frequently occur in research articles.
Having knowledge of vocabulary is seen as an important part of learning a language. It influences the reading and writing proficiency of the learner (Nation 2001) and is strongly linked with linguistic ability and academic achievement (Jacobs 2008; Gardner and Davies 2014). Since acquiring vocabulary has a crucial role in language learning, researchers have paid attention to the academic vocabulary used in the literature in order to help language learners to achieve their academic goals. Several academic word lists have been examined in different academic genres and specific disciplines, e.g. Tangpijaikul’s (2014) frequent word lists in economics and business; Watson Todd’s (2017) opaque words list in engineering; Coxhead’s (2000), Gardner and Davies’s (2014), and Simpson-Vlach and Ellis’s (2010) frequently-used words lists in general English; Yang’s (2015) Nursing Academic Word List (NAWL); Brezina and Gablasova’s (2015) general vocabulary list representing current language use; and Lei and Liu’s (2016) list of frequent words required for the medical learner. Such academic word lists have been developed from various subject-specific aspects; however, the main objective of vocabulary lists is generally to meet the needs of language learners and build decision-making tools for EAP and ESP instructors in regard to teaching, learning, and material and curriculum design.
Some academic word lists, e.g. the AWL by Coxhead (2000) and the General Service List (GSL) by West (1953), have been criticized because they consist of word families that are frequent in English generally. Nation (2001) stated that more significant words exist than appear in the lists, but since they occur less often in the data analyzed, they do not show up in the lists. Academic vocabulary lists in the literature have been investigated via various methods and were created from various corpora across academic disciplines (Gardner and Davies 2014). Hyland and Tse (2005) questioned how useful a general academic vocabulary list such as the AWL by Coxhead (2000) can be, since vocabulary can vary across disciplines in terms of frequency, range, collocation, and meaning. Furthermore, a number of the vocabulary lists (such as the AWL and GSL) present individual vocabulary items in different frequency orders, reflecting the principles on which the lists were created as well as their intended utility and the researchers’ purposes. In addition, the majority of academic word lists in past research include only single words and word families, e.g. benefit, beneficial, beneficiary, beneficiaries, benefited, and benefits, which can occur in a variety of contexts (Brezina and Gablasova 2015); such lists therefore have some limitations.
Aside from single-word vocabulary lists, multiword combinations are said to help as they offer a “pre-packaging of information or of the structures used to present information” (Reppen 2004: 83), which assists the writer by lowering the processing load. Knowledge of such combinations can therefore support language learning and teaching in academic writing and in related areas of academic communication (e.g. Biber and Barbieri 2007; Conrad and Biber 2004; Cortes 2004, Cortes 2006, Cortes 2013; Hyland 2008a, Hyland 2008b; Li and Schmitt 2009; Martinez and Schmitt 2012). There is a general consensus that such knowledge plays a facilitating role in the learning and use of a language, as it underpins fluent linguistic production, especially in spoken language (Pawley and Syder 1983) and academic texts (e.g. Biber et al. 1999; Conrad and Biber 2004; Cortes 2013; Hyland 2008a, Hyland 2008b; Nesi and Basturkmen 2006). Thus, it is both interesting and important to develop new lists of academic multiword combinations which are drawn from a specific discipline and derived from methods that go beyond frequency and range as the sole selection criteria.
Given the need of Thai graduate students and novice writers to publish research articles in English, the principal objective of this research is to demonstrate the use of newer methods to develop a list of multiword combinations covering essential and useful items for publication in Applied Linguistics research. We expect that discipline-specific word lists of this kind will help improve graduate students’ academic skills in reading and writing English research articles.
2 Related literature
Academic word lists
To assist language learners in developing their knowledge of vocabulary, West (1953), a leading pioneer in the field, developed the General Service List of 2000 word families chosen from a 5-million word corpus. Although West in developing the GSL used various criteria, including frequency, ease of learning and covering useful ideas, as well as stylistic level and emotional neutrality (West 1953: ix–x), the list has been criticized in regard to age and the number of words included. Gilner and Morales (2008) also question the possibility of expanding the GSL given the combination of objective and subjective criteria on which the original word list was based. Some words in the GSL, however, may occur less frequently in specific other fields (Xue and Nation 1984). Therefore, Xue and Nation (1984) decided to create the University Word List (UWL) by adding high-frequency, non-overlapping words to Lynn’s (1973) and Ghadessy’s (1979) word lists. The UWL consists of 737 base words and is stated to be a useful and complete resource for language students.
Coxhead (2000) argued that the UWL did not have consistent selection principles and was derived from small corpora, failing to cover a broad variety of topics. She created the Academic Word List (AWL), which contains 570 word families drawn from a corpus of 3.5 million words covering four disciplinary areas – natural science, law, commerce, and arts. Each word family in the AWL must occur at least ten times in each of the four disciplinary areas, in at least 15 of the 28 subject areas, and at least 100 times in the entire corpus. The AWL is divided into 10 sublists on the basis of frequency of occurrence. Coxhead and Nation (2001) stated that, because of its high coverage of academic texts, both textbooks and research articles, the AWL is a convenient learning tool for L2 learners in academic study. The AWL has had a large impact on academic writing, vocabulary instruction, and testing (Simpson-Vlach and Ellis 2010). Moreover, most words in the list are Latinate, which is useful to language learners in general, though learners with a Romance language as their L1 might find them particularly easy to remember (Coxhead and Byrd 2007).
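The three selection thresholds described above can be expressed as a simple filter. The following is a minimal sketch rather than Coxhead's actual procedure: the function name, the input dictionaries and the toy counts are hypothetical, and treating every threshold as inclusive is an assumption.

```python
def meets_awl_style_criteria(discipline_counts, subject_area_counts,
                             min_per_discipline=10, min_areas=15, min_total=100):
    """Check one word family against the three thresholds described in the text.

    discipline_counts: occurrences per disciplinary area (hypothetical input)
    subject_area_counts: occurrences per subject area, 28 areas expected
    """
    # Criterion 1: the family must be reasonably frequent in every discipline.
    in_every_discipline = all(c >= min_per_discipline
                              for c in discipline_counts.values())
    # Criterion 2: the family must be attested in enough subject areas.
    areas_covered = sum(1 for c in subject_area_counts.values() if c > 0)
    # Criterion 3: the family must reach a minimum overall frequency.
    total = sum(discipline_counts.values())
    return in_every_discipline and areas_covered >= min_areas and total >= min_total


# Toy counts for illustration only (not taken from the AWL data).
disciplines = {"arts": 30, "commerce": 40, "law": 20, "natural science": 25}
areas = {f"area{i}": (4 if i < 16 else 0) for i in range(28)}
accepted = meets_awl_style_criteria(disciplines, areas)
rejected = meets_awl_style_criteria({**disciplines, "law": 5}, areas)
```

In the toy data the first family passes all three checks, while the second fails because it is too rare in one discipline, which illustrates why a word that is frequent overall can still be excluded.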
Some issues with the construction of the AWL, however, have been identified. The central issue concerns the usage and meaning of words (Vongpumivitch et al. 2009). In addition, the AWL focuses on a general EAP context, so its contribution is limited with respect to the terms specific to a learner’s occupation or field of study. Martinez and Schmitt (2012) have argued that an academic word list need not be created solely from frequently used words, since frequency alone leads to an overabundance of items of undifferentiated value and “does not necessarily imply either psycholinguistically salient sequence or pedagogical relevance; common sequences of common words, such as ‘and of the,’ are expected to occur frequently” (Simpson-Vlach and Ellis, 2010: 490). Meanwhile, Nation (2001) stated that words featured in English academic writing fall into four categories: frequently used words, academic words, technical words, and low-frequency words. Given that technical words occur frequently in particular subject areas but are uncommon elsewhere, learners may not always be familiar with technical words from outside their own field (Thurstun and Candlin 1998; Nation 2001). Additionally, Nation (2001) asserted that although low-frequency words occur rarely in a given corpus, they may form the largest group of words in any field and can include proper names and technical words from other subject areas.
Given such criticisms, researchers have doubted the pedagogic usefulness of academic word lists in the literature. Recent research has tried to combine frequency with newer methods to identify specific words in academic discourse. For instance, Simpson-Vlach and Ellis (2010) created an Academic Formulas List (AFL) by combining measures of mutual information (MI) and frequency to investigate target corpora of academic discourse: MICASE, BNC, Hyland’s (2004) research article corpus, and selected BNC files. From a qualitative perspective, twenty experienced instructors were asked to rate whether each phrase identified was an expression, a formulaic expression, or a phrase. A correlation analysis of the quantitative statistical measures and the qualitative judgement data was carried out to ensure the validity and reliability of the instructor insights. Liu (2012) produced a list of the 228 most frequently used multiword constructions (MWCs), which covered various fixed or semi-fixed expressions. The list was meant for general academic writing and was drawn from the academic sub-corpora of the Corpus of Contemporary American English and the British National Corpus (BNC). Each MWC that ends with an article (a/the) or another incomplete NP (e.g. one of the) is represented with the ending “det+NP”. Brezina and Gablasova (2015) argued that the word lists in the literature were compiled using different approaches and from corpora of different sizes. More importantly, they might not reflect current language use. Brezina and Gablasova therefore developed the New General Service List (new-GSL) from four corpora: LOB, BNC, BE06, and EnTenTen12, using a purely quantitative approach and a lemma principle. Based on the average reduced frequency, which takes account of frequency, dispersion, and distribution, of the top 3,000 words in the four corpora, Brezina and Gablasova initially generated a stable vocabulary core of 2,122 items.
With the aim of creating a list of current words, these lexical items were then combined with new items frequently occurring in BE06 and EnTenTen12, the corpora which represent current language use. The finalized list consists of 2,494 items as the lexical core (2,116 base words and 378 current vocabulary items). The study also provides evidence of changes in the general vocabulary of English.
Regarding collocations, Durrant (2009) produced a list of 1,000 academic collocations from the written texts of five faculties: Life Sciences, Science and Engineering, Social-Psychological, Social-Administrative, and Arts and Humanities. Using WordSmith Tools (Scott, 2004), the collocations were compared with their total frequencies in the 85-million word BNC corpus. Such collocations had to reach a minimum mutual information score of four in all five subject groupings. Martinez and Schmitt (2012) combined qualitative criteria and frequency in choosing phrasal expressions and individual words. The BNC was chosen as the source corpus, and WordSmith Tools was employed to search for any 2–4-word strings which were repeated in the corpus a minimum of five times. Additionally, a series of “Auxiliary criteria” and “Core criteria” (Martinez and Schmitt 2012: 308–310) were applied to help justify intuitions about what might be formulaic when choosing multiword expressions for inclusion in the list. A random sampling technique was then used to examine the derived multiword lexical items line by line for phraseological polysemy. The resulting PHRASE List consisted of 505 multiword items, claiming to be “useful for pedagogic materials including more multiword items, such as textbooks, graded readers, and language tests” (Martinez and Schmitt 2012: 316). Yet the benefits of the PHRASE List, like those of the lists of Durrant (2009) and Simpson-Vlach and Ellis (2010), remain questionable, primarily because the functions of the multiword items are not given, which can make the lists difficult to use at first, especially for learners of lower proficiency. This demonstrates that qualitative investigation and functional patterns, as well as quantitative information, need to be considered in constructing academic word lists.
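For readers unfamiliar with the mutual information (MI) score mentioned above, the following is a minimal sketch of the common pointwise formulation for a two-word combination, MI = log2(observed / expected). The function name and the toy counts are hypothetical, and the cited studies may have computed MI with a different variant or tool.

```python
import math


def mutual_information(pair_freq, freq_x, freq_y, corpus_size):
    """Pointwise MI (base 2) of a two-word combination: log2(observed / expected)."""
    if min(pair_freq, freq_x, freq_y, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    # Expected co-occurrence frequency if the two words were independent.
    expected = freq_x * freq_y / corpus_size
    return math.log2(pair_freq / expected)


# Toy counts for illustration only (not from any of the cited corpora).
score = mutual_information(pair_freq=8, freq_x=16, freq_y=32, corpus_size=1024)
passes = score >= 4  # a threshold of four, as in the collocation criterion described above
```

A high score means the two words co-occur far more often than chance would predict, which is why MI is used alongside raw frequency: it filters out frequent but uninformative sequences of very common words.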
Multiword combinations
Because formulaic language is so varied, scholars have defined it differently and used different terms in phraseology research. For example, Altenberg (1998) used the term “recurrent word-combinations” when investigating word patterns which occur in spoken English. The term “lexical bundles” has been used in several research papers (e.g. Biber et al. 1999; Biber and Barbieri 2007; Biber et al. 2004; Conrad and Biber 2004; Chen and Baker 2010; Hyland, 2008a, Hyland 2008b). Schmitt (2004) more often used the term “formulaic sequences”, while “phraseology” and “phraseological patterns” have been used by Charles (2006) and Granger and Meunier (2008) to refer to sets of recurring word combinations. Additionally, the terms “lexical clusters” (Hyland 2008a), “phrasicon” (De Cock et al. 1998), and “n-grams” (Stubbs 2007) refer to multiword sequences. Amongst these terms, Erman and Warren (2000: 31) stated that multiword combinations denote “combinations of at least two words favoured by native speakers in preference to an alternative combination which could have been equivalent had there been no conventionalization”. This is similar to the account given by Biber et al. (1999), who noted that fixed expressions and idiomatic phrases have a fixed meaning which is understood by speakers of the language, but that these cannot be counted as lexical bundles because lexical bundles are a distinct category and are semantically transparent. Identifying phraseological patterns is a quantitative activity, based only upon frequency and distribution criteria computed by software (Biber 2006). Similarly, Cortes (2004) stated that the basic technique used in identifying lexical bundles is word frequency counting, whilst concordance lines, lexico-grammatical profiles, and keyword analysis are applied to multiword combinations after they have been identified, to pinpoint their functions in context and where they occur in the text.
Wray (2002) also argued that it has become increasingly important for language teachers, researchers, and testers to understand “formulaic sequences”, multiword units which are stored in and retrieved from memory as single lexical units. Likewise, learning and using formulaic language may help language learners at different levels of proficiency to build fluency and automaticity (Ding 2007; Wood 2006).
Research on phraseology has increased in popularity with its focus on language teaching and learning (e.g. Appel and Wood 2016; Cortes 2004, Cortes 2006; Li and Schmitt 2009; Peters and Pauwels 2015). The research demonstrated that some multiword units or lexical bundles occur frequently in research article corpora. Meanwhile, the investigation of lexical items in the work of students has drawn researchers’ attention to the differences in phraseological patterns between L1 and L2 (Bychkovska and Lee 2017; Pan et al. 2016; Ruan 2016) and between professional and novice writers (Cortes 2006; Peters and Pauwels 2015). For instance, Cortes (2004) compared the function and frequency of lexical clusters in the writings of professional authors and students writing in biology and history. The study confirmed that acquiring and using lexical bundles does not appear to be a natural process. This corroborates Jones and Haywood’s (2004) work which shows that, subsequent to a 10-week instruction period focussing on producing lexical bundles, university students discovered that knowledge of multiword combinations may be technically helpful to express complex ideas, to structure the various writing stages and to attain the required level of formality. Pan et al.’s (2016) study revealed that L2 written texts contain a greater number of lexical bundle types than L1 texts. The structural patterns of the lexical bundles found in these texts are also distinctive.
Additionally, as demonstrated in Li and Schmitt’s (2009) study, the development of students’ repertoires of formulaic sequences over the course is relatively slow, even though these students have majored in language. Although the holistic storage of formulaic sequences has caused controversy (e.g. Siyanova-Chanturia 2015; Durrant and Siyanova-Chanturia 2015), some of the phraseological research (Durrant 2017; Liu 2012; Martinez and Schmitt 2012; Simpson-Vlach and Ellis 2010) is more complicated since researchers have attempted to create academic word lists that are used in various registers, such as basic conversation, reading and writing (Nation 2001), university textbooks and academic journals (Coxhead 2000), medical texts (Wang et al. 2008), academic writing across disciplines (Durrant 2014, Durrant 2017; Liu 2012), and engineering (Watson Todd 2017). These researchers state that each of the subjects has its own arguments, preferred forms, meanings and syntactical patterns (Martinez et al. 2009) and lexical items in the lists may be caused by the shaping of the disciplines, text selection, and “the particular ways of representing experience” (Yang 2015: 30). Such research, however, gives an insight into a variety of ways to create a pedagogically useful list, allowing us to see the importance and application of corpus-based analysis. Accordingly, the various sizes and types of corpora, as well as the different approaches were considered for this investigation.
Thus, to support Thai novice writers and graduate students in enhancing their opportunities for scholarly publication, particularly in journals in the TCI database, a list of multiword combinations and their meanings may give them a head start on academic research writing tasks. Given the significance of discipline-specific vocabulary, the main objective of this study is to create a list of multiword combinations useful for writing for publication, which might support fluent language production and especially help novice writers and graduate students to draft their own research articles effectively. It is thought that learning multiword combinations contributes to communicative competence and enables writers to acquire the particular rhetorical practices of the texts which they are required to produce (Hyland 2008b). To achieve this goal, this study examined how language is pragmatically expressed in academic articles by identifying multiword combinations and the associated pragmatic functions typically found in Applied Linguistics research articles.
3 Method
Corpus compilation
The study’s corpus was carefully compiled from 120 research articles published in nine journals indexed in the TCI database, in which Thai graduate students and researchers are encouraged to publish their research work. Based on the annual Thai Journal Impact Factors (T-JIF) and the database’s journal quality evaluation results, all the journals classified in tier 1, which are also included in the ASEAN Citation Index (ACI) database (Svasti and Asavisanu 2007), were chosen. To control for changes in the discipline over time and to enhance the coherence and validity of the results, the journal samples were restricted to the years 2013–2016. With regard to corpus size, Bowker and Pearson (2002: 45) highlight that “there are no hard and fast rules that can be followed to determine the ideal size of a corpus”. A corpus of 120 Applied Linguistics research articles was therefore considered appropriate, since it is manageable, suits the study’s objectives and analysis, and can yield useful data and in-depth information. Some factors, e.g. the style of writing and the peer review and copy-editing processes, were not taken into account in the present study. The study focuses on the four internal sections of the articles (introduction, methods, results, and discussion, or IMRD); all other parts of each text, including tables, figures, notes, abstracts, references, and appendices, were excluded from the analysis. These systematic procedures for corpus compilation yielded approximately 429,438 running words representing the language used in research articles in the discipline of Applied Linguistics.
Again, although the entirety of this specialized corpus may appear relatively small in size, compared to previous studies in the literature, we argue that smaller corpora, as specialized ones, are more suitable than large multi-million corpora to identify the connections between linguistic patterning and specialized contexts of language use (Koester 2010). To this end, we were able to gather in-depth information through quantitative and qualitative methods, especially the occurrence of frequent patterns and linguistic items in context.
Data processing and measures for word selection
To obtain frequency statistics for word sequences in the corpus, n-grams were generated using SketchEngine (SkE) software (Kilgarriff et al. 2004). We first cleaned all texts by removing non-textual content. The edited files were then saved separately according to the IMRD sections. Initially, the word list option was used to generate two-, three- and four-word n-grams, which are referred to here as high-frequency formulaic expressions in the corpus. Several issues need to be considered when identifying multiword units based only on frequency of occurrence. First, since n-grams are defined by their frequency of occurrence, the frequency cut-offs are arbitrary (Hyland 2008b). In this study, the frequency threshold was set so that each reported n-gram occurred a minimum of eight times in the entire corpus. This cut-off point was determined by the total number of words and by the aim of this research, which is to examine the use of multiword combinations in the corpus. Second, to compare n-grams across the article sections, frequencies were normalized following the formula suggested by Biber et al. (1999); given the length of the texts, frequencies were normed to occurrences per 1,000 words. Third, as a range criterion, we checked all of the generated n-grams to ensure that each occurred in at least five files in the corpus, that is, in at least five different articles. This was necessary to guard against idiosyncratic expressions used by individual writers.
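The frequency, range and norming steps described above can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the SketchEngine procedure used in the study: the tokenizer, the function names and the toy corpus are hypothetical, and the sketch lets n-grams cross sentence boundaries because punctuation is discarded.

```python
import re
from collections import Counter, defaultdict

MIN_FREQ = 8    # minimum occurrences in the whole corpus (threshold from the text)
MIN_RANGE = 5   # minimum number of files in which the n-gram must occur


def tokenize(text):
    # Hypothetical tokenizer: lowercase words, keeping internal hyphens/apostrophes.
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def select_ngrams(files, n=3, min_freq=MIN_FREQ, min_range=MIN_RANGE):
    """files maps a file name to its text; returns {ngram: (freq, range, per_1000)}."""
    freq = Counter()
    file_range = defaultdict(set)
    total_tokens = 0
    for name, text in files.items():
        tokens = tokenize(text)
        total_tokens += len(tokens)
        for gram in ngrams(tokens, n):
            freq[gram] += 1
            file_range[gram].add(name)
    selected = {}
    for gram, count in freq.items():
        spread = len(file_range[gram])
        if count >= min_freq and spread >= min_range:
            # Normalized frequency: occurrences per 1,000 running words.
            per_1000 = count * 1000 / total_tokens
            selected[" ".join(gram)] = (count, spread, per_1000)
    return selected


# Toy corpus for illustration only: five short "articles".
toy = {f"article{i}.txt": "In order to test this. In order to test that."
       for i in range(5)}
result = select_ngrams(toy)
```

In the toy corpus, "in order to" occurs ten times across all five files and is retained, whereas "to test this" occurs only five times and is dropped by the frequency cut-off. Applying the range check after counting means that a sequence repeated many times by a single writer cannot enter the list.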
It is claimed in the literature that four-word bundles are more phrasal in nature (Biber and Barbieri 2007; Biber et al. 2004; Chen and Baker 2010; Cortes 2004, Cortes 2006; Grabowski 2015; Hyland 2008a, Hyland 2008b). In our analysis, the two-word n-grams generated mostly appeared grammatically incomplete, so that they cannot be understood without a following noun or noun phrase (e.g. of the, in the, to the). Simpson-Vlach and Ellis (2010: 493) state that such incomplete bundles are “neither terribly functional nor pedagogically compelling”. Meanwhile, most four-word n-grams (e.g. simple past tense form, intrinsic motivation of English) were found to be content-based lexical items relating to particular subject matter, reflecting an artefact of the writing content. In terms of teachability, they may have fewer implications for the wider context and register in which they are written than n-grams which are grammatically and pragmatically complete units. The three-word n-grams seem to be of greater interest than the others: some constitute complete syntactic units which stand as independent meaningful phrases, including grammatical items expressing semantic relations (e.g. in order to, as well as), and are not content-based. Even though most of them do not represent complete structural units (e.g. the use of, the results of), they remain “important building blocks in discourse” (Biber and Barbieri 2007: 270). As part of the qualitative process, we extracted content-based strings or noun groups (e.g. language learning strategies, teaching and learning) from the list, as they may be useful in reflecting the topic or content about which the author is writing (see Appendix). Applying this qualitative criterion, we arrived at a list of 476 potential n-grams, which is still rather long for pedagogic purposes.
We then applied further selection criteria by working through the list item by item using a concordancer, searching for “plausibly formulaic” multiword strings (Wray 2009: 41) which realize pragmatic functions or meanings. To ensure the reliability, utility and teachability of the list, five English instructors experienced in EAP, all with publications in peer-reviewed journals, were invited to choose the items which appear to be pedagogically useful for reading and writing articles. Specifically, they were asked to rate each phraseological pattern according to whether, in their opinion, it was worth learning and teaching with an eye to writing for research publication. Each potential three-word n-gram chosen by a minimum of three instructors was included in the final list. The chosen multiword combinations were then examined in context to investigate how article writers use them and what meanings they convey.
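The agreement rule applied to the instructors' choices, inclusion when at least three of the five instructors selected an item, amounts to a simple vote count. The sketch below is illustrative only: the function name and the example selections are hypothetical and do not reproduce the instructors' actual ratings.

```python
from collections import Counter


def majority_selected(selections, min_votes=3):
    """selections: one collection of chosen n-grams per instructor."""
    votes = Counter()
    for chosen in selections:
        votes.update(set(chosen))  # each instructor counts at most once per item
    return sorted(item for item, n in votes.items() if n >= min_votes)


# Hypothetical choices by five instructors (illustration only).
instructor_choices = [
    {"in order to", "as well as", "of the student"},
    {"in order to", "as well as"},
    {"in order to", "the use of"},
    {"as well as", "the use of"},
    {"in order to", "of the student"},
]
final_items = majority_selected(instructor_choices)
```

Here "in order to" (four votes) and "as well as" (three votes) are retained, while "the use of" and "of the student" (two votes each) fall below the threshold.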
4 Findings
Using frequency together with intuition-based judgements from five EAP instructors, we first generated a list of content word items for anyone interested in how complex noun groups are used in Applied Linguistics articles (see Appendix). Second, a list of 289 functional lexical strings which appear to be pedagogically useful was created. The 289-item list is easier to manage for pedagogic purposes than the 476-item list, but for effective teaching it is also vital to explore ways of grouping the target items. The items were therefore further investigated and categorized according to functional type by looking at them in context and consulting concordance lines. To guide the analysis, Biber et al.’s (2004), Hyland’s (2008b) and Durrant’s (2015) functional classifications were used. All 289 multiword combinations are grouped into four functional categories: research-oriented, text-oriented, stance-oriented and engagement, and other functions. However, it should be noted that this list is not intended to be a definitive interpretation of the functional types of multiword combinations, as several of them were found to have multiple functions, appearing in several sections and contextual environments. Rather, the functions assigned indicate the most salient function each item fulfils in an academic context, particularly in the writing of research articles.
Research-oriented functions
Location, procedure, quantification, description, intangible framing attributes
Location
at the beginning | in the target | the present study
At this stage | in this group | this study is
from this study | In this study
in a text | the beginning of
in the study | the current study
Procedure
an analysis of | is used to | to determine the
an investigation of | of data collection | to find out
analysis of the | of each interviewee | to identify the
are expected to | the data were | to investigate the
as a means | the participants in | to participate in
by means of | the participants were | to retain the
can be used | the process of | to use a
data were analyzed | the questionnaire was | to use the
Data were collected | the students were | use of the
in order to | The subjects were | used as a
interview questions were | the use of | used in the
investigation of the | to answer the | used to analyze
is obtained for | to be a | used to determine
is used as | to complete the | was carried out
Quantification
a corpus of | frequency of the | the degree of
a lot of | is one of | the frequency of
a number of | large number of | the level of
a part of | majority of the | the majority of
A total of | most of the | the number of
a variety of | number of the | the percentage of
all of the | one of the | the proportion of
as a part | out of the | the scores of
each of the | some of the
Description
criteria based on | participants in the | the pattern of
meaning of the | the meaning of | the study of
Intangible framing attributes
a sense of | reliability of the | the form of
an important role | schematic knowledge of | the importance of
aware of the | the acquisition of | the influence of
development of the | the characteristics of | the kind of
good level of | the concept of | the medium of
knowledge of the | the development of | the nature of
mean value of | the effectiveness of | the role of
pattern of the | the effects of | the strategy of
the success of | this statement is | validity of the
the type of | understanding of the
Text-oriented functions
Structuring signals, transitional signals, resultative signals, framing signals
Structuring signal
above table showed | be seen in | seen from the
according to the | below illustrates the | seen in Table
are presented in | focus on the | shown in Table
are shown in | focused on the | The above table
As can be | found in the | the case of
As shown in | illustrates the results | was based on
based on the | in the following
be seen from | presented in Table
Transition signal |
||
as a result |
in agreement with |
On the other |
as well as |
in line with |
On the whole |
In addition to |
In other words |
such as the |
Resultative signal |
||
a result of |
have shown that |
revealed that the |
agreed that the |
indicate that the |
show that the |
be able to |
is consistent with |
stated that the |
be seen that |
is similar to |
study demonstrated that |
by the participants |
it was found |
study found that |
consistent with that |
not be able |
suggests that the |
data from the |
of the findings |
the data from |
differed from the |
of the participants |
the findings from |
employed by the |
of the questionnaire |
the findings of |
finding is also |
of the respondents |
the participants had |
findings of the |
of the student |
the respondents had |
findings show that |
of this study |
the result of |
followed by the |
participants were able |
The results from |
found that the |
point out that |
The results show |
found to be |
result shows that |
This finding is |
found to exist |
results from the |
This indicates that |
given by the |
results of the |
This means that |
This suggests that |
was found that |
were reported to |
to the participants |
was found to |
|
used by a |
were consistent with |
Framing signal |
||
As for the |
in the process |
the context of |
for further research |
in the use |
the other hand |
from the context |
part of the |
the part of |
in relation to |
purpose of the |
the purpose of |
in terms of |
terms of the |
with regard to |
in the future |
the basis of |
Stance-oriented and engagement functions
Stance feature |
||
are likely to |
highly related to |
need to be |
are more likely |
is important to |
related to the |
be aware of |
is possible that |
seems to be |
be concluded that |
is suggested that |
should be conducted |
be related to |
it can be |
should be noted |
be said that |
it could be |
similar to that |
because of the |
It is also |
so that they |
can also be |
it is necessary |
students should be |
can be seen |
it is possible |
there is a |
compared to the |
It must be |
there is no |
compared to those |
It should be |
This can be |
considered as a |
it would be |
This is because |
considered to be |
likely to be |
to be able |
contribute to the |
might not be |
to be aware |
due to the |
more likely to |
was able to |
Engagement feature |
||
be noted that |
Other functions |
||
developed by the |
is not only |
This is in |
exist at the |
research has been |
was divided into |
fact that the |
study aims to |
was employed to |
identified according to |
the fact that |
was identified according |
is in line |
the sense that |
was used to |
From a pedagogical viewpoint, a functional analysis of the word combinations is essential to understanding their value as teaching items. However, utterances can vary widely according to use, interpretation, socio-cultural factors, social conventions, etc. Since there is no context-free correspondence between structural patterns and pragmatic functions, we argue that each of the phraseological units in the list can express more than one pragmatic function. Consequently, we concentrated on a small selection of lexical phrases and used concordance lines to examine how they depend on context and topic and how they are used in the text in terms of salient pragmatic functions. The results of the investigation and their descriptions are as follows:
Research-oriented functions help writers structure their research activities and experiences. This is the largest category and includes items which refer to research location or place, procedure, quantification, description and topic of the research, and intangible framing attributes.
(1) Analysis of variance (ANOVA) was used to determine the comparability of groups at the beginning of the study. [M 26]
(2) A corpus-driven approach was thereby applied to an analysis of Jane Austen’s six major novels in order to see how well this method works with literary texts. [M 26]
(3) The most cited strategy used is rereading (Table 2, #17) which is a very basic and traditional strategy although some of the participants stated that they reread with a purpose, they only reread and focused on the important part. [D 25]
(4) Moreover, this result supported the study of Kim and Petraki (2009, p. 72) stating that the recognition of L1 importance declined from the advanced group, and increased in the intermediate, and the beginning respectively. [R 26]
(5) To investigate possible ways to encourage the development of Thai learners’ speaking skills, this study aims to research their attitudes and motivation in learning to speak English. [I 1]
As can be seen, the beginning of in (1) is an example of the location sub-category, which marks location or spatial reference points in the text. The cluster in order to in (2) belongs to the procedure sub-category and indicates the objective of the approach used for analysis in the study. Quantification features, such as some of the in (3), refer to quantity, here the number of participants. This group relates mostly to the number of samples or participants involved in research activities, data, researchers, and related research. In the description sub-category, the string the study of in (4) describes the physical properties of the research with which the writer compares the findings. Intangible framing attributes, such as the development of in (5), refer to abstract properties of learners, such as speaking abilities and their development.
Text-oriented functions deal with text meaning and its organization. Transition signals, structuring signals, resultative signals and framing signals of the text are included in this category.
(6) An ability to hold a conversation during flights in English is just as important as listening skills as well as service functions in the role of cabin crew. [D 1]
(7) The lists of keywords, semantic fields and grammatical categories in JA are presented in Tables 1–3 below, starting from the item with the highest degree of keyness. [R 26]
(8) The interview revealed that the high vocabulary subjects seemed to have positive attitudes towards English while the low vocabulary subjects seemed to have negative attitudes towards the language. [D 26]
(9) The purpose of this study was to determine how typically developing children and children with autism construe their experience of the world around and inside them in producing their narratives. [I 26]
In (6), the transition signal as well as marks a relationship of addition between elements. This sub-category includes phrases showing relationships of addition, contrast, or equivalence between elements, called discourse markers in Biber et al.’s (2004) classification. In (7), the structuring signal are presented in denotes parts of the text and directs the reader to visuals and/or particular sections, here Tables 1–3. Resultative signals, such as revealed that the in (8), refer to causative or inferential relationships between elements. The framing signal the purpose of in (9) is used to state the study objective, showing what research was conducted.
Stance-oriented and engagement functions express epistemic judgements, attitudes, evaluations, and degrees of commitment regarding the claims which are being made. The findings for this category corroborate Simpson-Vlach and Ellis’s (2010) statement that the formulae are associated with knowledge claims, expressions of certainty or uncertainty, beliefs, thoughts, or claims made by others. Hyland (2008a, 2008b) and Biber et al. (2004) state that stance-oriented functions also express degrees of mitigation, tentativeness and possibility in claims. This category includes two functional subgroups: stance-oriented functions and engagement functions. In the present list, however, the engagement subgroup contains only one item.
(10) English language learners are likely to use the language with people from various language and cultural backgrounds. [I 26]
(11) It should be noted that even though the respondents strongly aspired for native-like pronunciation, they were aware that native-like pronunciation is not the only requirement for successful communication. [R 1]
Stance-oriented functions, such as are likely to in (10), reflect the evaluative nature of the writing. With this expression, the writer expresses his or her interpretations of and attitudes towards statements in terms of possibility. In (11), the string be noted that, grouped under the engagement functions, signals the importance of the statement and draws potential readers into an active role. In this context, Hyland (2001: 552) points out that the exchange between writer and reader is established when readers are considered as “real players in the discourse rather than merely as implied observers of the discussion”.
Other functions cover meanings which vary widely with the particular context, including interpretation, socio-cultural factors and social conventions.
(12) This finding is in line with Sarani and Kafipour (2008), who reported that L2 learners did not use dictionaries appropriately. [D 26]
(13) These samples were divided into two groups – 20 good readers and 20 poor readers – based on their grades in 4 previous reading courses. [M 1]
The n-gram is in line (with) in (12) is commonly used when writers compare their research results with previous research. In (13), the string were divided into describes how the study’s participants were grouped under the research method used in an experiment. Essentially, the functions and meanings of the multiword combinations in this category depend on the contexts in which they are used.
5 Discussion and conclusion
This study examined Applied Linguistics research articles using frequently recurring three-word sequences and psychological judgements. The aim of the research was to create a pedagogically useful list of multiword items and to provide their semantic and pragmatic functions to aid the task of writing research manuscripts for publication. Using a corpus-based, qualitative approach and the opinions of five EAP instructors, we generated a list of 289 three-word combinations for teaching research article writing in English. We assessed how much phraseology contributes to article writing by investigating the pragmatic functions of the lexical clusters included in the list. A combination of qualitative and quantitative approaches in list development has its advantages. Objective and subjective criteria offer complementary perspectives, and validating the quantitative analysis qualitatively is also crucial, offering a powerful way to understand texts. Meanwhile, the inclusion of instructor insights serves as a further selection criterion which can maximize the pedagogical usefulness of the list. Taking all of these aspects together (quantitative frequency, qualitative judgements about which phrases are meaningful, and input from experts in the field on which phrases are useful) provides a thorough perspective on textual analysis and yields specific, in-depth information. These methodological choices ensure that the word list is developed in a transparent and reliable way, contains items which are frequently used, and can potentially be useful. From the top-down perspective, the quantitative, corpus-based investigation shows that the article corpus contains the greatest range of content words, but these are not all included in the final list, since pedagogic purpose is also a principal objective of the list.
In the qualitative approach, based on context dependency, some multiword sequences seem to be multi-functional, appearing across several sections. For example, ‘according to the’ could appear in the Methods, Results and Discussion sections, while ‘some of them’ is found in every section across the text. This bottom-up perspective identifies the functional types in terms of context and occurrence in the text. The finding concurs with Simpson-Vlach and Ellis’s (2010) conclusion that currency and frequency alone cannot assure the functional utility which makes an item teachable and pedagogically valuable. In this regard, the list of content-word strings is useful and meaningful for researchers who are interested in the use of the English language in research literature and in how complex noun phrases are used in published articles. We also argue that semantic and pragmatic criteria are more meaningful than those based on frequency alone, and that this combination of research methods is therefore important in developing a list of functional strings.
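The observation that a string can occur in several article sections can be checked mechanically. The following is a minimal sketch, not the procedure used in this study: it assumes each section is available as plain text, and the function names and toy sentences are invented for illustration.

```python
from collections import defaultdict


def trigrams(text):
    """Return lowercase word 3-grams from a plain-text string."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def section_range(sections):
    """Map each 3-gram to the set of section labels in which it occurs."""
    found = defaultdict(set)
    for label, text in sections.items():
        for gram in trigrams(text):
            found[gram].add(label)
    return found


# Toy data: invented sentences, not drawn from the study corpus.
toy_sections = {
    "Methods": "items were grouped according to the criteria",
    "Results": "scores differed according to the level of study",
    "Discussion": "this varies according to the context",
}

ranges = section_range(toy_sections)
print(sorted(ranges["according to the"]))
```

A string whose set contains several section labels is a candidate for the multi-functionality described above; the functions themselves still have to be judged in context.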
Since the phraseological units selected for the present study are syntactically complete units, their characteristics are distinct from those of lexical bundles in previous research (Biber et al. 2004; Cortes 2004, Cortes 2007, Cortes 2013). One potential reason is that Biber et al.’s (2004) and Hyland’s (2008a, 2008b) classifications are derived from the analysis of very large corpora covering various disciplines and registers. Meanwhile, Coxhead (2000) and Gardner and Davies (2014) focused on word families in the academic lexis. In addition, the current study focuses exclusively on three-word phraseological patterns rather than bundles of four or more words (e.g. Biber et al. 2004; Conrad and Biber 2004; Cortes 2004; Hyland 2008a, Hyland 2008b). The pragmatic functions of the lexical items found in context are distinctive, reflecting topic-specific language use in Applied Linguistics research articles. Given that the aim of this research was to help novice writers and students to draft their research manuscripts effectively, examining a specific corpus from a single discipline is likely to be beneficial since it yields more specific functional characteristics (Durrant 2017).
As far as the pedagogical purpose is concerned, novice writers and students should be aware of types of lexical items and their relation to information structure and/or discourse function. Csomay (2013) suggested that students often do not consider that multiword items and grammatical patterns can indicate a change in text type within discourse. The list and pragmatic functions suggested in this study can serve as a basis for proficient academic writing. Thus, when designing a course on academic writing for publication, instructors could make full use of the list and integrate the descriptions from this study into their instruction. Hyland (2008b) suggests that writers are expected to follow the linguistic rules of the language and to meet the intended readers’ expectations by using the lexical clusters typical of the discourse in question. Students and novice article writers should therefore know the use and pragmatic functions of the multiword combinations that apply in a given section when preparing their research manuscript. Instructors might draw on this study’s findings and list of multiword items by implementing activities which feature different lexical cluster types, with an emphasis on fostering students’ expressive skills and on how to use the clusters for communicative purposes. Moreover, to improve the usefulness of the list, instructors may describe patterns of use or structural “frames” rather than solely teaching the multiword combinations. For example, ‘to answer the’ is likely to be followed by ‘first, second, third question or research question’, and indicating this would make the list much more useful. This may develop the students’ proficiency and experience in using contextually appropriate words while writing academic texts (Pan et al. 2016). Instructors might also draw their students’ attention to the words in the list and encourage their use in assignment writing.
This supports Wood’s (2015) notion that knowledge of formulaic sequences can be used with sensitivity. For example, formulaic sequences can be integrated into language pedagogy through form-focused lessons, instruction, and specific types of activities, such as searching corpora for concordances of sequences, or replacing single words with sequences. Introducing different lexical clusters which perform various pragmatic functions, and raising student awareness of the importance of this language phenomenon in academic contexts, will help students draft articles which meet the standards current in academic and/or research communities (Coxhead and Byrd 2007; Martinez and Schmitt 2012).
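The concordance-based activities and structural “frames” mentioned above can be prototyped in a few lines. The sketch below is illustrative only: the tokenizer is deliberately crude, the helper names `kwic` and `next_words` are hypothetical, and the toy text is invented rather than taken from the corpus.

```python
import re
from collections import Counter


def kwic(text, phrase, width=3):
    """Return (left, phrase, right) context tuples for each hit of phrase."""
    tokens = re.findall(r"[a-z0-9'-]+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, phrase, right))
    return lines


def next_words(text, phrase):
    """Count the word that immediately follows each hit of phrase."""
    hits = kwic(text, phrase, width=1)
    return Counter(right for _, _, right in hits if right)


# Toy text: invented sentences, not drawn from the study corpus.
toy_text = (
    "A survey was used to answer the first research question. "
    "Interviews were used to answer the second research question. "
    "Both were needed to answer the research questions."
)

for left, node, right in kwic(toy_text, "to answer the"):
    print(f"{left:>20} | {node} | {right}")
print(next_words(toy_text, "to answer the"))
```

Counting the right-hand neighbours gives a rough picture of the slot that follows a string, which is the kind of information a frame-based presentation of the list would need.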
At a more advanced level, the comprehensive approach to selecting pedagogically useful phraseological patterns in the current study is a starting point for setting vocabulary goals for advanced language courses, especially in guiding graduate students in independent study. Hyland and Tse (2009) suggested that a good method to prepare students for studying is not to search for universally appropriate teaching items. However, regarding a genre-based approach to teaching, we would argue that instructors can take advantage of selecting phraseological units to create academic word lists which fit a specific classroom setting, context and pedagogic purpose. They could explain to the students that the selected phraseological patterns are some of the important linguistic features they might encounter in academic settings, especially when writing research articles. In particular, the selection criteria can help instructors select texts and develop learning activities which promote student sensitivity to the importance of lexico-grammatical features and phraseological patterns that occur frequently in the text.
However, caution is required in applying the findings and the list to pedagogy, as the corpus of this study stems from a single discipline. The results and the list should be considered illustrative only: they relate solely to Applied Linguistics research articles published in English and indexed in the TCI database, not to English articles in other disciplines, which might not be included in the TCI. Regarding the selection criteria, the functional n-grams and content-based strings chosen for the pedagogically useful list are not intended to represent all the phraseology used in English-language articles in this field. There might be phraseological patterns, pragmatic functions and complex noun groups which are not present in the current study. Yet the findings are considered meaningful for those intending to publish their research in journals, especially those included in the TCI database. Other multiword combinations might not be conclusive and are not included in the pedagogically useful list. Additionally, the methodology and scope of this study should be considered. Firstly, the number of articles analyzed is relatively small and specific. A larger corpus might yield more items and give a more general picture of the multiword combinations used in articles, making the findings easier to generalize. Secondly, 3-grams were found to be more useful in this corpus than bigrams and longer n-grams. However, the 3-grams generated here might be an effect of the corpus size. We acknowledge that longer sequences might be useful, as they can help reveal semantic and pragmatic functions from the context in which the strings are used. Therefore, the bigger the corpus, the more interesting expressions a longer n-gram calculation can reveal.
It is also acknowledged that the list is only raw material which will need further work to show how useful it is to EAP writers. One could envisage a process in which article writers are given the list together with the database and then search for specific contexts using a concordancer. An alternative approach would be to apply other established statistical criteria, such as MI-score and formula teaching worth (Simpson-Vlach and Ellis 2010), together with careful selection by EAP instructors, to support the identification of multiword units useful for pedagogic purposes and to obtain a more refined list (Salazar 2011). Additionally, it is crucial to encourage and teach students to consult other reliable resources when they encounter multiword items and experience difficulty in reading and writing. Despite the scope for future research, the descriptive results here remain crucial for EAP instructors developing instructional materials for teaching writing for scholarly publication. This might help graduate students and novice writers with the preparation of manuscripts for publication.
Acknowledgements
We would like to thank the anonymous reviewers for their constructive comments and suggestions.
References
Altenberg, Bengt. 1998. On the phraseology of spoken English: The evidence of recurrent word-combinations. In Anthony P. Cowie (ed.), Phraseology: theory, analysis and applications, 101–122. Oxford: Oxford University Press.Search in Google Scholar
Appel, Randy & David C. Wood. 2016. Recurrent word combinations in EAP test-taker writing: Differences between high and low proficiency levels. Language Assessment Quarterly 13(1). 55–71.10.1080/15434303.2015.1126718Search in Google Scholar
Biber, Douglas. 2006. University language: A corpus-based study of spoken and written registers. Amsterdam: John Benjamins.10.1075/scl.23Search in Google Scholar
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999. Longman grammar of spoken and written English. Harlow: Pearson Education.Search in Google Scholar
Biber, Douglas, Susan Conrad & Viviana Cortes. 2004. If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics 25(3). 371–405.10.1093/applin/25.3.371Search in Google Scholar
Biber, Douglas & Federica Barbieri. 2007. Lexical bundles in university spoken and written registers. English for Specific Purposes 26(3). 263–286.10.1016/j.esp.2006.08.003Search in Google Scholar
Bowker, Lynne & Jennifer Pearson. 2002. Working with specialized language: A practical guide to using corpora. London: Routledge.10.4324/9780203469255Search in Google Scholar
Brezina, Vaclav & Dana Gablasova. 2015. Is there a core general vocabulary? Introducing the New General Service List. Applied Linguistics 36(1). 1–22.10.1093/applin/amt018Search in Google Scholar
Bychkovska, Tetyana & Joseph J. Lee. 2017. At the same time: Lexical bundles in L1 and L2 university student argumentative writing. Journal of English for Academic Purposes 30. 38–52.10.1016/j.jeap.2017.10.008Search in Google Scholar
Charles, Maggie. 2006. Phraseological patterns in reporting clauses used in citation: A corpus-based study of theses in two disciplines. English for Specific Purposes 25(3). 310–331.10.1016/j.esp.2005.05.003Search in Google Scholar
Chen, Yu-Hua & Paul Baker. 2010. Lexical bundles in L1 and L2 academic writing. Language Learning & Technology 14(2). 30–49.Search in Google Scholar
Conrad, Susan & Douglas Biber. 2004. The frequency and use of lexical bundles in conversation and academic prose. Lexicographica 20. 56–71.10.1515/9783484604674.56Search in Google Scholar
Cortes, Viviana. 2004. Lexical bundles in published and student disciplinary writing: Examples from history and biology. English for Specific Purposes 23(4). 397–423.10.1016/j.esp.2003.12.001Search in Google Scholar
Cortes, Viviana. 2006. Teaching lexical bundles in the disciplines: An example from a writing intensive history class. Linguistics and Education 17(4). 391–406.10.1016/j.linged.2007.02.001Search in Google Scholar
Cortes, Viviana. 2013. The purpose of this study is to: Connecting lexical bundles and moves in research article introductions. Journal of English for Academic Purposes 12(1). 33–43.10.1016/j.jeap.2012.11.002Search in Google Scholar
Coxhead, Averil. 2000. A new academic word list. TESOL Quarterly 34(2). 213–238.10.2307/3587951Search in Google Scholar
Coxhead, Averil & Paul Nation. 2001. The specialised vocabulary of English for academic purposes. In John Flowerdew & Matthew Peacock (eds.), Research perspectives on English for Academic Purposes, 252–267. Cambridge: Cambridge University Press.10.1017/CBO9781139524766.020Search in Google Scholar
Coxhead, Averil & Pat Byrd. 2007. Preparing writing teachers to teach the vocabulary and grammar of academic prose. Journal of Second Language Writing 16(3). 129–147.10.1016/j.jslw.2007.07.002Search in Google Scholar
Csomay, Eniko. 2013. Lexical bundles in discourse structure: A corpus-based study of classroom discourse. Applied Linguistics 34(3). 369–388.10.1093/applin/ams045Search in Google Scholar
De Cock, Sylvie, Sylviane Granger, Geoffrey Leech & Tony McEnery. 1998. An automated approach to the phrasicon of EFL learners. In Sylviane Granger (ed.), Learner English on computer, 67–79. London: Longman.10.4324/9781315841342-5Search in Google Scholar
Ding, Yanren. 2007. Text memorization and imitation: The practice of successful Chinese learners of English. System 35(2). 271–280.10.1016/j.system.2006.12.005Search in Google Scholar
Durrant, Philip. 2009. Investigating the viability of a collocation list for students of English for academic purposes. English for Specific Purposes 23(3). 157–169.10.1016/j.esp.2009.02.002Search in Google Scholar
Durrant, Philip. 2014. Discipline and level-specificity in university students’ written vocabulary. Applied Linguistics 35(3). 328–356.10.1093/applin/amt016Search in Google Scholar
Durrant, Philip. 2017. Lexical bundles and disciplinary variation in university students’ writing: Mapping the territories. Applied Linguistics 38(2). 165–193.10.1093/applin/amv011Search in Google Scholar
Durrant, Philip & Anna Siyanova-Chanturia. 2015. Learner corpora and psycholinguistic research. In Sylviane Granger, Gaëtanelle Gilquin & Fanny Meunier (eds.), Cambridge handbook of learner corpus research, 57–78. Cambridge: Cambridge University Press.10.1017/CBO9781139649414.004Search in Google Scholar
Erman, Britt & Beatrice Warren. 2000. The idiom principle and the open choice principle. Text 20(1). 29–62.10.1515/text.1.2000.20.1.29Search in Google Scholar
Feak, Christine, & John Swales. 2011. Academic writing for graduate students: Essential tasks and skills. Ann Arbor: University of Michigan.10.3998/mpub.5042773Search in Google Scholar
Gardner, Dee & Mark Davies. 2014. A new academic vocabulary list. Applied Linguistics 35(3). 305–327.10.1093/applin/amt015Search in Google Scholar
Ghadessy, Mohsen. 1979. Frequency counts, word lists, and materials preparation: A new approach. English Teaching Forum 17(1). 24–27.Search in Google Scholar
Gilner, Leah & Frank Morales. 2008. Corpus-based frequency profiling: Migration to a word list based on the British National Corpus. The Buckingham Journal of Language and Linguistics 1. 41–58.10.5750/bjll.v1i0.3Search in Google Scholar
Grabowski, Lukasz. 2015. Keywords and lexical bundles within English pharmaceutical discourse: A corpus-driven description. English for Specific Purposes 38. 23–33.10.1016/j.esp.2014.10.004Search in Google Scholar
Granger, Sylviane & Fanny Meunier. 2008. Phraseology: an interdisciplinary perspective. Amsterdam: John Benjamins.10.1075/z.139Search in Google Scholar
Hyland, Ken. 2001. Disciplinary discourses: Social interactions in academic writing. London: Longman.Search in Google Scholar
Hyland, Ken. 2004. Disciplinary discourse: Social interactions in academic writing. Ann Arbor: University of Michigan Press.Search in Google Scholar
Hyland, Ken. 2008a. Academic clusters: text patterning in published and postgraduate writing. International Journal of Applied Linguistics 18(1). 41–62.10.1111/j.1473-4192.2008.00178.xSearch in Google Scholar
Hyland, Ken. 2008b. As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes 27(1). 4–21.10.1016/j.esp.2007.06.001Search in Google Scholar
Hyland, Ken & Polly Tse. 2005. Hooking the reader: A corpus study of evaluative that in abstracts. English for Specific Purposes 24(2). 123–139.10.1016/j.esp.2004.02.002Search in Google Scholar
Jacobs, Vicki. 2008. Adolescent literacy: Putting the crisis in context. Harvard Educational Review 78(1). 7–39.10.17763/haer.78.1.c577751kq7803857Search in Google Scholar
Jones, Martha & Sandra Haywood. 2004. Facilitating the acquisition of formulaic sequences. In Norbert Schmitt (ed.), Formulaic sequences, 269–300. Philadelphia: John Benjamins.10.1075/lllt.9.14jonSearch in Google Scholar
Kilgarriff, Adam, Pavel Rychly, Pavel Smrz & David Tugwell. 2004. The Sketch Engine. In Geoffrey Williams & Sandra Vessier (eds.), Proceedings of the eleventh EURALEX international congress. Université de Bretagne-Sud, 105 –116. Lorient: EURALEX.Search in Google Scholar
Koester, Almut. 2010. Building small specialised corpora. In Anne O’Keeffe & Michael McCarthy (eds.), The Routledge handbook of corpus linguistics, 66–79. London: Routledge.10.4324/9780203856949-6Search in Google Scholar
Lei, Lei & Dilin Liu. 2016. A new medical academic word list: A corpus-based study with enhanced methodology. Journal of English for Academic Purposes 22. 42–53.10.1016/j.jeap.2016.01.008Search in Google Scholar
Li, Jie & Norbert Schmitt. 2009. The acquisition of lexical phrases in academic writing: A longitudinal case study. Journal of Second Language Writing 18(2). 85–102.10.1016/j.jslw.2009.02.001Search in Google Scholar
Liu, Dilin. 2012. The most frequently-used multi-word constructions in academic written English: A multi-corpus study. English for Specific Purposes 31(1). 25–35.10.1016/j.esp.2011.07.002Search in Google Scholar
Lynn, Robert W. 1973. Preparing word lists: a suggested method. RELC Journal 4(1). 25–32.10.1177/003368827300400103Search in Google Scholar
Martinez, Iliana A., Silvia C. Beck & Carolina B. Panza. 2009. Academic vocabulary in agriculture research articles: A corpus-based study. English for Specific Purposes 28(3). 183–198.10.1016/j.esp.2009.04.003Search in Google Scholar
Martinez, Ron & Norbert Schmitt. 2012. A Phrasal Expressions List. Applied Linguistics 33(3). 299–320.10.1093/applin/ams010Search in Google Scholar
Nation, Ian S. P. 2001. Learning vocabulary in another language. Cambridge: Cambridge University Press.10.1017/CBO9781139524759Search in Google Scholar
Nesi, Hilary & Helen Basturkmen. 2006. Lexical bundles and discourse signalling in academic lectures. International Journal of Corpus Linguistics 11(3). 147–168.10.1075/ijcl.11.3.04nesSearch in Google Scholar
Pan, Fan, Randi Reppen & Douglas Biber. 2016. Comparing patterns of L1 versus L2 English academic professionals: Lexical bundles in Telecommunications research journals. Journal of English for Academic Purposes 21. 60–71.10.1016/j.jeap.2015.11.003Search in Google Scholar
Pawley, Andrew & Frances Hodgetts Syder. 1983. Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In Jack C. Richards & Richard W. Schmidt (eds.), Language and communication, 191–230. London: Longman.Search in Google Scholar
Peters, Elke & Paul Pauwels. 2015. Learning academic formulaic sequences. Journal of English for Academic Purposes 20. 28–39.10.1016/j.jeap.2015.04.002Search in Google Scholar
Reppen, Randi. 2004. Academic language: An exploration of university classroom and textbook language. In Ulla Connor & Thomas A. Upton (eds.), Discourse in the professions: Perspectives from corpus linguistics, 65–86. Amsterdam: John Benjamins.10.1075/scl.16.04repSearch in Google Scholar
Ruan, Zhoulin. 2016. Lexical bundles in Chinese undergraduate academic writing at an English Medium university. RELC Journal 48(3). 327–340. https://doi.org/10.1177/0033688216631218
Salazar, Danica. 2011. Lexical bundles in scientific English: A corpus-based study of native and non-native writing. Barcelona: University of Barcelona dissertation.
Samraj, Betty. 2005. An exploration of a genre set: Research article abstracts and introductions in two disciplines. English for Specific Purposes 24(2). 141–156. https://doi.org/10.1016/j.esp.2002.10.001
Schmitt, Norbert. 2004. Formulaic sequences: Acquisition, processing, and use. Amsterdam: John Benjamins. https://doi.org/10.1075/lllt.9
Scott, Mike. 2004. WordSmith Tools Version 4.0. Oxford: Oxford University Press.
Shaw, Philip. 1991. Science research students’ composing process. English for Specific Purposes 10(3). 189–206. https://doi.org/10.1016/0889-4906(91)90024-Q
Simpson-Vlach, Rita & Nick C. Ellis. 2010. An academic formulas list: New methods in phraseology research. Applied Linguistics 31(4). 487–512. https://doi.org/10.1093/applin/amp058
Siyanova-Chanturia, Anna. 2015. On the ‘holistic’ nature of formulaic language. Corpus Linguistics and Linguistic Theory 11(2). 285–301. https://doi.org/10.1515/cllt-2014-0016
Stubbs, Michael. 2007. An example of frequent English phraseology: Distributions, structures and functions. In Roberta Facchinetti (ed.), Corpus Linguistics 25 years on, 89–105. Amsterdam: Rodopi. https://doi.org/10.1163/9789401204347_007
Svasti, Jisnuson & Ruchareka Asavisanu. 2007. Aspects of quality in journals: A consideration of the journals published in Thailand. ScienceAsia 33(2). 137–143. https://doi.org/10.2306/scienceasia1513-1874.2007.33.137
Swales, John M. 1990. Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.
Swales, John M. 2004. Research genres: Explorations and applications. Cambridge: Cambridge University Press.
Tangpijaikul, Montri. 2014. Preparing business vocabulary for the ESP classroom. RELC Journal 45(1). 51–65. https://doi.org/10.1177/0033688214522641
Thurston, Jennifer & Christopher N. Candlin. 1998. Concordancing and the teaching of the vocabulary of academic English. English for Specific Purposes 17(3). 267–280. https://doi.org/10.1016/S0889-4906(97)00013-6
Vongpumivitch, Viphavee, Ju-yu Huang & Yu-chia Chung. 2009. Frequency analysis of the words in the Academic Word List (AWL) and non-AWL content words in applied linguistics papers. English for Specific Purposes 28(1). 33–41. https://doi.org/10.1016/j.esp.2008.08.003
Wang, Jing, Shao-lan Liang & Guang-Chun Ge. 2008. Establishment of a medical academic wordlist. English for Specific Purposes 27(4). 442–458. https://doi.org/10.1016/j.esp.2008.05.003
Watson Todd, Richard. 2017. An opaque engineering word list: Which words should a teacher focus on? English for Specific Purposes 45. 31–39. https://doi.org/10.1016/j.esp.2016.08.003
West, Michael. 1953. A general service list of English words. London: Longman.
Wood, David. 2006. Uses and functions of formulaic sequences in second language speech: An exploration of the foundations of fluency. Canadian Modern Language Review 63(1). 13–33. https://doi.org/10.3138/cmlr.63.1.13
Wood, David. 2015. Fundamentals of formulaic language. London: Bloomsbury Academic.
Wray, Alison. 2002. Formulaic language and the lexicon. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511519772
Wray, Alison. 2009. Identifying formulaic language: Persistent challenges and new opportunities. In Roberta L. Corrigan, Edith A. Moravcsik, Hamid Ouali & Kathleen M. Wheatley (eds.), Formulaic language. Vol. 1: Distribution and historical change, 27–51. Amsterdam: John Benjamins. https://doi.org/10.1075/tsl.82.02ide
Xue, Guoyi & Ian S. P. Nation. 1984. A University Word List. Language Learning and Communication 3(2). 215–229.
Yang, Ming-Nuan. 2015. A nursing academic word list. English for Specific Purposes 37(1). 27–38. https://doi.org/10.1016/j.esp.2014.05.003
Appendix
List of content-based n-grams generated from the corpus of 120 research articles
vocabulary learning strategy/strategies
multiple choice options
foreign language anxiety
general English proficiency
English language learning
high proficiency group
child/children with autism
low listening ability
Jane Austen’s novels
paper-pencil peer feedback
typically developing children
positive politeness strategies
reading for pleasure
codes corrective feedback
foreign language classroom
language learning strategies
English language teaching
negative politeness strategies
consonant segmental phonemes
frequently used strategies
listening ability group
perceived communication mobility
written corrective feedback
no significant difference
low vocabulary subjects
students’ writing ability
low proficiency group
teaching cultural content
language classroom anxiety
vocabulary size test
high vocabulary subjects
English major students
newly learned words
information gap tasks
academic listening comprehension
overall mean score
high listening ability
proves writing approach
significant linguistic features
vocabulary learning problems
©2020 Walter de Gruyter GmbH, Berlin/Boston