Introduction

Higher and vocational education face a pressing need to innovate. Rapidly changing labour markets require the acquisition of skills and knowledge that are often absent from education programs (World Economic Forum, 2016). Moreover, there is an ongoing debate on safeguarding the quality of education with regard to improving instructional methods, assessments, and their impact on teachers’ competences (Organization for Economic Cooperation and Development [OECD], 2009, 2018). Today’s innovations in education concern, for example, the integration of twenty-first-century skills or soft skills such as communication, collaboration and flexibility in curricula, to support students’ entrepreneurial capacities (Schleicher, 2012; Sias et al., 2017). In addition, new classroom technologies that foster new ways of teaching and learning are being implemented (Bourgonjon et al., 2013; Eteokleous, 2008).

However, whether innovations in education succeed strongly depends on how teachers put them into practice. Many innovations in education do not result in the desired changes. Next to institutional factors, how teachers cope with innovations is of utmost importance for understanding the success or failure of innovations (Hasanefendic et al., 2017). Teachers play a crucial role in innovations (George & Sabapathy, 2011; Koeslag-Kreunen, Van der Klink, Van den Bossche & Gijselaers, 2018). As Sherry (2002) states: “Individuals must be the focus if change is to be facilitated. Institutions will not change until their members change” (p. 214). Innovations require that teachers develop new behaviour, but often, even after a considerable period of time, teachers abandon the newly acquired behaviour and fall back on comfortable old routines (Verloop et al., 2001). To safeguard the success of innovations, it is crucial to encourage and enhance teachers’ innovative behaviour (Thurlings, Evers & Vermeulen, 2015).

Innovative work behaviour (IWB) is an emerging concept, originally defined by Janssen (2000) as individual behaviour that leads to the initiation, presentation and realization of new ideas, products or procedures within the workplace, team or organization. Since then, this definition has been used and further elaborated by many researchers. IWB has also gained attention in the education context. For example, Messmann (2012) defines IWB as a multi-stage iterative process consisting of four phases: 1) Opportunity Exploration, which entails paying attention to trends and opportunities for innovation and recognizing problems; 2) Idea Generation, i.e. generating novel and useful ideas for products, services or processes; 3) Idea Promotion, which includes seeking sponsorship for the ideas among colleagues and supervisors and applying for funding and facilitation; and finally 4) Idea Realization, which involves the creation and implementation of a prototype or model in the workgroup or organization. The phases are partly dependent, but do not necessarily follow a fixed order, resulting in a complex, dynamic, non-linear model of IWB (Messmann & Mulder, 2012). For instance, when promoting new ideas, teachers might identify new opportunities for innovation.

Although the IWB concept has been further developed, some gaps still require attention.

Firstly, if IWB is compared with current theoretical insights on innovations, a conceptual gap can be identified. Innovation theories emphasize the need to stabilize the innovation (West & Farr, 1989). This stabilization phase is sometimes also labeled as “transfer” or “diffusion” (Kanter, 1988), or “continuation” or “institutionalization” (Fullan, 2007). Stabilization is crucial for enhancing the sustainability of innovations: it extends the implementation phase by anchoring the innovation in the organizational system. The integration of the innovation in the organization increases its long-term success and continuity (Fullan, 2007; West & Farr, 1989). However, thus far sustainability has not been included in IWB conceptualizations and operationalizations.

Secondly, the few studies identified that have developed and empirically tested IWB measurement instruments have encountered challenges (see Table 1). The initial IWB measures consist of only a few items, representing a single IWB dimension. These measures do not reflect the complex and multidimensional nature of IWB. Other studies have explored multiple measures for distinct IWB dimensions, but often fail to test the validity of these separate dimensions (Krause, 2004; Dorenbosch et al., 2005) or report limited evidence on the construct validity of the subdimensions because the four dimensions are under-represented (e.g. De Jong & Den Hartog, 2010). As Messick (1995) points out, construct under-representation threatens validity. The multiple IWB measures developed by Messmann and Mulder (2012) for the education context suffer from similar deficiencies. Their work supports the existence of multiple IWB dimensions; however, they could not yet empirically substantiate an Idea Realization dimension.

Table 1 Findings from literature: Authors, number of items and dimensions, quality of measurements, number of respondents and domains

In sum, IWB conceptualizations and operationalizations need further attention. Existing measures lack empirical evidence of construct validity and, moreover, do not include a sustainability dimension. Our study aims to develop and validate a multi-dimensional IWB instrument to measure teachers’ IWB. Based on a thorough and comprehensive conceptualization of IWB, we first adapted and extended previously used instruments and developed items for a sustainability dimension. Second, we tested the construct validity of this newly developed multi-dimensional IWB instrument.

Theoretical Framework

Kanter (1988) developed the foundation for IWB as it is known today. She suggested that the innovation process in organizations could best be understood by dividing it into the tasks that individuals engage in during the innovation process. The following four tasks were identified: Idea Generation, Coalition Building, Idea Realization and Transfer/Diffusion. Since Kanter, many authors have addressed the role of the individual in innovation, often referred to as innovative behaviour (e.g. Scott & Bruce, 1994; Woodman et al., 1993). Janssen (2000) proposed the term Innovative Work Behaviour for the ‘intentional creation, introduction and application of new ideas within a work role, group or organization in order to benefit role performance, the group or the organization’ (p. 288). Janssen’s proposal was widely adopted (e.g. De Jong & Den Hartog, 2010; Tuominen & Toivonen, 2011; for an overview of studies in the educational context, see Thurlings et al., 2015). If we compare this elaboration with the phases of innovation proposed by Kanter, we see that the transfer/diffusion phase is not included in Janssen’s concept.

Previous Dimensions of IWB

Many scholars have discussed the multi-dimensionality of individual innovation and, later on, IWB. In 1990, Farr and Ford proposed a two-stage model that included a creative and an implementation part. In the same vein, Krause (2004) and Dorenbosch et al. (2005) argued for a two-stage model comprising creativity-oriented and implementation-oriented work behaviour. In addition to two-stage models, multi-stage models have been proposed. In the three-stage model, the dimension Idea Promotion was added to the Idea Generation and Idea Realization dimensions (e.g. Janssen, 2000; Scott & Bruce, 1994, 1998). In the four-stage model, scholars included the Opportunity Exploration (OE) phase, which precedes Idea Generation. In the OE phase, problems and needs in one’s work context are recognized and change opportunities are created (e.g. Janssen et al., 1997; Kleysen & Street, 2001). In a first five-stage model, Kleysen and Street (2001) added the Formative Investigation stage, comparable to the testing phase described by Krause (2004). This stage entails formulating new ideas and solutions, testing them, and evaluating the outcomes (Kleysen & Street, 2001). Messmann and Mulder (2012) added a fifth stage to the Opportunity Exploration, Idea Generation, Idea Promotion and Idea Realization stages: the Reflection stage. This stage encompasses “assessing the progress of innovation development, evaluating activities and outcomes based on criteria for success, examining one’s personal advancement during innovation development, and improving action strategies for future situations” (p. 46).

Idea Sustainability as an Additional Dimension of IWB

The multi-stage models of IWB outlined above are, notably, comparable with the innovation models described in the innovation literature. Both the innovation model of West and Farr (1989) and that of Fullan (2007) for the educational context propose corresponding phases. If we compare the two models, it can be observed that the first two phases of West and Farr’s model (recognition and initiation) are both included in the initiation phase of Fullan’s model. This initiation phase consists of change initiation, where performance gaps are recognized and new solutions or ideas are generated. Both models identify an implementation phase, during which the group executes the innovation for the first time and effects of the implementation become observable in the workplace. West and Farr completed the innovation cycle with a stabilization phase, or, as Fullan named it, a continuation phase. In this phase, the change has to become embedded in the organizational system. Hence, the continuation phase is an extension of the implementation phase, anchoring the innovation in the organization. Both models present a more encompassing definition of innovation, which is important given that in educational settings there is a tendency to ignore the conditions (such as time) required for the continuation or stabilization phase. In this context, both West and Farr and Fullan stress the importance of a stabilization or continuation phase to sustain an innovative idea.

The prevailing conceptualizations of IWB include dimensions that refer to the first three phases of the innovation cycle (see Fig. 1). Remarkably, neither the stabilization nor the continuation phase is reflected in any of the IWB conceptualizations. Despite Kanter’s (1988) emphasis on the diffusion phase, this stage has not been included in more recent conceptualizations of IWB, such as the work of Messmann and Mulder (2011), who argue that diffusion is a process separate from innovation, involving different contextual characteristics, other persons and distinct resources. However, as Van de Ven (1986) puts it: “An invention or creative idea does not become an innovation until it is implemented or institutionalized” (p. 604). Accordingly, Van de Ven (1986) and Johnson et al. (2004) argue that it is necessary to focus on the implementation of innovations in both the short and the long term. It follows that those responsible for the implementation of an innovation must continue to work on its sustainability, with the further continuation of the newly implemented idea as a primary goal. This sustainability phase includes disseminating innovative ideas into the deeper structure of the organization through institutionalization (Gannaway et al., 2013; Reay et al., 2013). Particular steps have to be taken to strengthen the infrastructure that is necessary to sustain an innovative idea (Johnson et al., 2004). The literature specifies the following features of sustainability: improving and optimizing the innovation, such as updating and continuous regeneration to avoid implementation dips (Coffey & Horner, 2012; Fullan, 2002; Loh et al., 2013); embedding the innovation deeply in the organizational system through capacity building to secure adequate resources (Loh et al., 2013; Fullan, 2007); disseminating the innovation on a larger scale, for example by planning for project growth and broader application of an innovative idea (Loh et al., 2013); and, finally, visualizing the benefits of the innovation for stakeholders by stimulating community participation and communicating a longer-term vision and outcomes (Loh et al., 2013). In sum, we propose the following definition of IWB:

Innovative Work Behaviour is a multi-stage iterative process in which employee behaviour targets the exploration of opportunities, idea generation, idea promotion, idea realization and the sustainable implementation of these ideas, processes, products or procedures within a role, a group or an organization, whereby the ideas are (relatively) new and intended to benefit the relevant unit of adoption.

Fig. 1 The innovation cycle (West & Farr, 1989)

Previous Measures of IWB Excluding the Idea Sustainability Dimension

Alongside the discussions that have taken place at a conceptual level, efforts have been made to measure IWB (see Table 1). In 1994, Scott and Bruce developed the first instrument to measure IWB (‘individual innovation’). Although they define innovative behaviour as encompassing three dimensions (Idea Generation, Coalition Building and Idea Realization), their six-item instrument is a one-dimensional operationalization. Starting from a similar conceptualization (Idea Generation, Idea Promotion, Idea Realization), Janssen (2000) was the first researcher to attempt a multi-dimensional scale (nine items), using both self-ratings and others’ ratings of IWB. Due to high correlations between the subscales, he inferred that the set of items represents a single scale with high reliability for the total instrument. However, both separate measures are still applied frequently, although there is insufficient information on the validity of the instruments (Thurlings et al., 2015; Yidong & Xinxin, 2013). Another attempt to develop a multi-dimensional scale is reported by Kleysen and Street (2001), who identified the dimensions Opportunity Exploration, Generativity, Formative Investigation, Championing and Application. Their instrument has been tested in a variety of organizations, but they were not able to provide evidence for their five-dimensional model. Studies that did not execute validity tests have reported a two-dimensional (Krause, 2004; Dorenbosch et al., 2005) or four-dimensional (De Jong & Den Hartog, 2010) measure, although the latter article provides limited evidence for its construct validity. The most recent measures are provided by Messmann and Mulder (2012), who developed measures for five dimensions, namely Opportunity Exploration, Idea Generation, Idea Promotion, Idea Realization and Reflection. However, the authors have not yet succeeded in identifying the Idea Realization dimension in their validation sample.

Table 1 illustrates the most recent attempts to develop measures for multiple IWB dimensions. However, only a few studies also attempt to validate these measures. Only two of the four studies that hypothesize multiple IWB dimensions provide empirical evidence that (partly) supports the hypothesized dimensions. The remaining two studies do not go beyond reporting internal consistency measures in terms of Cronbach’s alpha. Internal consistency is a prerequisite for validity but does not guarantee validity (Boyle, 1991). Moreover, in contrast to the original theory on IWB, the four studies fail to conceptualize and operationalize a sustainability dimension. Therefore, this study aims to develop and validate a multi-dimensional instrument for measuring IWB consisting of five dimensions, namely Opportunity Exploration, Idea Generation, Idea Promotion, Idea Realization and Idea Sustainability.

Method

Procedure and Participants

Data on the Innovative Work Behaviour scale were collected from four different vocational education institutes in the south of the Netherlands. In meetings with the authors, their board members indicated that educational innovation is of paramount importance in their institutes. All four institutes have taken initiatives to put innovation high on their agenda. Each board member gave permission to approach the Deans of the various departments in their institutes. A number of Deans responded and allowed access to the team leaders of the various education teams. These team leaders were approached and informed by the researcher about the subject of the study. Teacher teams were informed by their team leaders and asked to participate in the study on a voluntary basis by filling out an online questionnaire. In total, 440 teachers completed the survey. More than half of the sample was female (58.8%). On average, participants had 15.5 years of work experience in the vocational sector, and the average age was 48.34 years. Respondents worked in various domains (e.g. economics, ICT, healthcare, hospitality and education).

Scale Construction

Based on the literature review, we hypothesized that Innovative Work Behaviour consists of five dimensions: Opportunity Exploration (OE), Idea Generation (IG), Idea Promotion (IP), Idea Realization (IR) and Idea Sustainability (IS). The scale for measuring IWB was therefore constructed to cover these five dimensions.

The scale for measuring IWB consisted of 32 traditional IWB items (measuring OE, IG, IP and IR), based on two questionnaires developed by Messmann and Mulder (2012, 2014), complemented by 20 newly developed, literature-based items for OE, IG, IP, IR and the sustainability dimension. All items were subjected to reflective discussions between the researchers, keeping in mind the original theoretical framework on IWB and the broader conceptualization of educational innovations discussed above. The pool of 52 items was intended to cover the five dimensions. Where needed, items were slightly adapted and reformulated to better attune them to the target group of vocational teachers. This was the case for two OE items (e.g. exchanging thoughts on recent developments or problems at work with one’s colleagues), three IG items (e.g. discussing personal suggestions for improvements with one’s colleagues), one IP item (e.g. providing insight into the step-by-step transformation of the new idea into practice) and two IR items (e.g. drawing up possible operational strategies for future and comparable situations). Finally, two new IR items were added to better discriminate between the dimensions Idea Promotion and Idea Realization (e.g. supporting colleagues with the application of a developed idea). For the development and customization of the items, we followed the steps for scale construction proposed by Spector (1992). Content validity was assessed by verifying whether the literature-based operationalizations genuinely represented the intended content domain for each construct (Boyle, 1991; Curda, 1997; Linacre, 2006; Trochim, 2002).

Following Spector’s (1992) steps for scale construction once again, twenty new items were developed as operationalizations of the sustainability dimension. First, the following search terms were used in the preceding literature review: sustainability, durability, transfer, adoption, diffusion, continuation of innovations, dissemination of innovations, institutionalization, maintaining innovations and scalability of innovations, in both British and American spelling. Second, the results of the literature search (definitions of sustainability and its features) were presented to experts (N = 4) in the field of educational development (a professor of corporate learning in business and economics, a professor of educational sciences, and two senior researchers in educational psychology and learning and development). Collectively, the most relevant definition and features of sustainability were determined, namely improving and optimizing the innovation, disseminating the innovation on a larger scale, embedding the innovation deeply in the organizational system, and visualizing the benefits of the innovation for stakeholders. Third, 20 items were developed in accordance with these features. The constructed items were presented to the same expert group to judge their relevance and clarity. Fourth and finally, a pilot study was conducted in which the questionnaire with all five dimensions was presented to a group of teachers who were active members of an educational research community for continuous professional development in an institute for vocational education in the Netherlands (N = 10). The participants in the pilot study matched the target group of vocational teachers. Feedback on the items was processed, and in a second session the questionnaire was scrutinized for clarity, layout and the degree to which the items subjectively appeared to measure the constructs (face validity). The pilot study resulted in minor adaptations to the wording of the items and the layout of the questionnaire. Furthermore, the pilot study revealed that the respondents used only a restricted number of answer options: the person response patterns deviated from the intended six-point scales (1 = does not apply, 6 = fully applies). The rating scales were not adapted, due to the small size of the pilot group, but this observation led to the choice of the Rasch rating scale model (Rasch, 1960) for analysing the scale structures of the five intended dimensions. All scales are introduced by the sentence: “To what extent do the following work activities apply to you?”. The Appendix lists all 52 IWB items.

The questionnaire also included items measuring personal background variables, such as gender, age, level of education, job tenure, working hours and job position. All these characteristics have been shown to be significantly related to IWB (e.g. Baer et al., 2003; Janssen, 2005; Messmann & Mulder, 2014; Pieterse et al., 2010).

Data Analysis

Preliminary Analysis

The analysis was conducted in two stages. The preliminary analysis involved scrutinizing the IWB scales with the Rasch rating scale model (Rasch, 1960) to create invariant interval measures, to evaluate the construct validity of the IWB dimensions, and to shed more light on the restricted person response patterns observed in the pilot group. The IWB survey (including its new Sustainability scale) employs six-point Likert scales. Likert scales consist of ordered (ordinal) raw scores. A six-point Likert scale consists of six ordered categories, where the second category of a scale item represents more of the attribute than the first, the third more than the second, and so on. However, it is unknown whether the psychological distances between the categories are equal (e.g. is the distance from strongly agree to agree equal to the distance between agree and disagree?). Moreover, it is unknown whether respondents can genuinely distinguish six substantive differences in a scale item. The fact that the response format provides six response options for each item does not guarantee that all six categories are genuinely used by participants. A Rasch model can diagnose how many categories are distinguished by persons in items and across sets of items. Subsequently, the Rasch model can transform ordered raw scores into equal measurement units (interval level) if the data fit the model. Like a ruler or a thermometer, a Rasch measure is composed of equal, interval measurement units, which guarantees invariant measurements across samples. Rasch models also allow researchers to examine person and item measures simultaneously on the same interval scale.

In our study, scales and items were developed based on previous research and content-related literature. Rasch analyses were used as confirmatory tests of the extent to which the scales had been successfully developed according to prior measurement criteria (Ludlow et al., 2008).

For the present study, the Rasch Rating Scale Model was used (Rasch, 1960). The Rasch analyses were executed in WINSTEPS 4.0, which uses joint maximum likelihood estimation (JMLE; Linacre, 2017).

A Rasch model transforms ordered qualitative observations into additive measures. Rasch models are logit-linear models. The polytomous Rasch rating scale model applied in the current study uses the following additive transformation (Linacre, 2017, p. 34):

$$ \log \left( \frac{P_{nij}}{P_{ni(j-1)}} \right) = B_n - D_i - F_j $$

where P_nij is the probability that person n encountering item i is observed in category j; B_n is the ability of person n; D_i is the difficulty measure of item i, the point where the highest and lowest categories of the item are equally probable; and F_j is the calibration measure of category j relative to category j−1, the point where categories j−1 and j are equally probable relative to the measure of the item. No constraints are placed on the possible values of F_j.
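To make the model concrete, the following minimal Python sketch computes the category probabilities for a single person–item encounter from a person ability B_n, an item difficulty D_i and the category thresholds F_j. The numeric values are purely illustrative and are not estimates from our data; WINSTEPS estimates these parameters itself.

```python
import numpy as np

def category_probabilities(b_n, d_i, thresholds):
    """Category probabilities under the Rasch rating scale model.

    b_n        : person ability (logits)
    d_i        : item difficulty (logits)
    thresholds : F_j calibrations of categories 2..m relative to the
                 preceding category (logits)
    Returns probabilities for categories 1..m.
    """
    # The log-numerator for category j is the cumulative sum of
    # (B_n - D_i - F_k) up to k = j; category 1 is the reference (0).
    steps = b_n - d_i - np.asarray(thresholds, dtype=float)
    log_numerators = np.concatenate(([0.0], np.cumsum(steps)))
    numerators = np.exp(log_numerators)
    return numerators / numerators.sum()

# Illustrative values: five thresholds for a six-category (1-6) scale.
probs = category_probabilities(b_n=0.5, d_i=-0.3,
                               thresholds=[-2.0, -1.0, 0.0, 1.0, 2.0])
print(probs.round(3))  # probabilities of categories 1..6, summing to 1
```

Taking the log-odds of two adjacent categories, log(P[j]/P[j−1]), recovers B_n − D_i − F_j, which is exactly the transformation in the formula above.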

To scrutinize the IWB scales, six requirements were investigated (e.g. Wolfe & Smith, 2007): rating scale effectiveness, dimensionality, reliability, item measure quality, person measure quality and item hierarchy.

Confirmatory Factor Analysis

The second phase of the analyses consisted of confirmatory factor analysis (CFA), using the results of the Rasch analysis, conducted in AMOS version 25 (Arbuckle, 2006), to analyse the structural relationships between the items and their latent variables and the intercorrelations between the latent variables. To assess model fit, a variety of fit indices were used in order to illuminate different aspects of goodness of fit (Lomax & Schumacker, 2004). As recommended by Schermelleh-Engel et al. (2003), we used the chi-square test (χ²), a likelihood ratio test statistic that evaluates whether the specified factor loadings, factor variances, factor covariances and error variances are valid. The chi-square/df ratio (χ²/df) provides information on model parsimony; a rule of thumb is that χ²/df values between 1 and 3 are indicative of parsimonious models (Tabachnick et al., 2007). The Tucker-Lewis Index (TLI) informs us of the adequacy of a nested model; a TLI > .90 represents a good fit (Byrne, 2016). The Comparative Fit Index (CFI; Bentler, 1990) also compares the specified model with the baseline model; a CFI > .95 represents an excellent fit (Byrne, 2016; Hu & Bentler, 1999). Finally, the Root Mean Square Error of Approximation (RMSEA; Hair et al., 2010) evaluates model fit while taking the complexity of the model into account; RMSEA values < .05 indicate good fit (Hair et al., 2010).
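AMOS reports these indices directly; purely as an illustration of how they relate to the chi-square statistics, the sketch below applies the standard textbook formulas. The baseline (independence model) values and sample size are hypothetical placeholders, not output from our model, and conventions differ slightly between packages (e.g. N versus N − 1 in the RMSEA denominator).

```python
import math

def fit_indices(chi2, df, chi2_base, df_base, n):
    """Approximate CFI, TLI and RMSEA from chi-square statistics.

    chi2, df           : target model chi-square and degrees of freedom
    chi2_base, df_base : baseline (independence) model values
    n                  : sample size
    """
    # CFI: reduction in non-centrality relative to the baseline model.
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, 0.0)
    cfi = 1.0 - d_model / max(d_base, d_model, 1e-12)

    # TLI (non-normed fit index): compares chi2/df ratios of both models.
    tli = ((chi2_base / df_base) - (chi2 / df)) / ((chi2_base / df_base) - 1.0)

    # RMSEA: misfit per degree of freedom, adjusted for sample size.
    rmsea = math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
    return cfi, tli, rmsea

# Hypothetical baseline values for illustration only.
print(fit_indices(chi2=1658, df=864, chi2_base=16000, df_base=990, n=440))
```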

Results

First, the process of scrutinizing the IWB scales with the Rasch model, based on the six criteria, is described; this is followed by the confirmatory factor analysis in AMOS 25.

Rasch Analyses

Rating scale effectiveness indicates how well each scale is functioning and how well each of the IWB scales fits the Rasch model. As described previously, a Rasch model allows us to diagnose how many categories are distinguished by persons in sets of items. Subsequently, if a (recategorized) scale fits the Rasch model, the ordered raw scores can be transformed into a scale that consists of equal measurement units (interval level). These Rasch interval measures are invariant across samples and allow researchers to examine person measures and item measures simultaneously on the same interval scale. Rating scale effectiveness was evaluated by analysing each scale’s average item infit and outfit mean square fit statistics. In survey development, researchers mostly focus on item construction. Boone et al. (2014) recommended that for items, the outfit mean square values (MNSQs) should be inspected. Outfit MNSQs are sensitive to outliers in the data; infit MNSQs are sensitive to unexpected behaviour close to the item’s difficulty (or the person’s endorsability) level. For attitudinal data, outfit MNSQs as well as infit MNSQs should be ≥ .50 and ≤ 1.5.

Mean square values ≥ 1.5 indicate noise in the data; values ≤ .50 suggest dependency. Table 2 shows that the average item outfit mean squares are within the range of .92–1.21, which demonstrates that the scale measures meet the requirements of the Rasch model.
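For readers less familiar with these diagnostics, the sketch below shows how infit and outfit mean squares are conventionally computed from observed responses, model-expected scores and model variances. WINSTEPS produces these statistics itself; the arrays here are placeholder values for illustration, not data from our study.

```python
import numpy as np

def item_fit(observed, expected, variance):
    """Infit and outfit mean square statistics for a single item.

    observed : responses of all persons to the item
    expected : model-expected scores for each person-item encounter
    variance : model variance of each response under the Rasch model
    """
    residuals = observed - expected
    z_squared = residuals ** 2 / variance             # squared standardized residuals
    outfit = z_squared.mean()                         # unweighted mean: sensitive to outliers
    infit = (residuals ** 2).sum() / variance.sum()   # information-weighted: sensitive near the item's level
    return infit, outfit

# Placeholder values; in practice these come from the estimated Rasch model.
obs = np.array([4, 5, 3, 6, 5, 4], dtype=float)
exp = np.array([4.2, 4.8, 3.5, 5.1, 4.9, 4.0])
var = np.array([0.9, 0.8, 1.0, 0.7, 0.8, 0.9])
print(item_fit(obs, exp, var))  # values near 1.0 indicate good fit
```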

Table 2 Rating scale effectiveness

As described previously, the Rasch model allows us to scrutinize item measures and person measures simultaneously on the same interval scale. Person measure quality is assessed by evaluating each scale’s average person measure (the person mean) and each scale’s person infit and outfit mean square values (Table 2). A scale’s person mean reflects the degree to which the scale is tuned to the target group. The person means, recalibrated into Likert scores, vary from 3.80 (SD = 0.75) to 5.07 (SD = 0.60). High person means are indicative of scales that are too easy for the target group to endorse (e.g. the creativity constructs OE, IG and IP). In addition, the person standard deviation should be taken into account to diagnose how persons are dispersed along each scale’s latent variable. Table 2 demonstrates that the easiest scale to endorse, IP, also shows the smallest spread along the underlying variable. Similar to the item fit requirements, the average person infit and outfit MNSQs should be ≥ .50 and ≤ 1.5 (Linacre & Wright, 1994).

Dimensionality is assessed by principal components analysis of the standardized residuals of each (restructured) scale. This analysis identifies how much variance is explained by the person and the item measures. Eigenvalues > 2 indicate that two or more items might (also) measure something different from the latent variable. This can then be verified by a statistical comparison of the person measures of both possible dimensions. Disattenuated correlations (correlations corrected for measurement error) between the possible dimensions of r ≥ .70, combined with at most 5% (maximum 10%) of the person measures falling outside the 95% confidence boundaries, are strong indicators that a construct is one-dimensional (Linacre, 1998). If more than the expected 5% (maximum 10%) of the person responses fall outside the confidence boundaries, the likelihood increases that more than one dimension is measured. Two scales turned out to be two-dimensional: both Idea Realization and Idea Sustainability split into two subscales (see Table 2).
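The disattenuation used in this check divides the observed correlation between the person measures of the two candidate dimensions by the square root of the product of their reliabilities. A minimal sketch, with illustrative numbers rather than values from our data:

```python
def disattenuated_correlation(r_observed, rel_a, rel_b):
    """Correlation between two sets of person measures, corrected for
    measurement error (attenuation), given their reliabilities."""
    return r_observed / (rel_a * rel_b) ** 0.5

# Illustration: an observed correlation of .60 between the person measures of
# two candidate sub-dimensions with reliabilities of .85 and .80.
print(round(disattenuated_correlation(0.60, 0.85, 0.80), 2))  # ~0.73, above the .70 guideline
```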

Person reliability and person strata indicate the extent to which scores are reproducible and/or the extent to which a scale can distinguish endorsability differences between persons (see Table 3).

Table 3 Reliability and Strata measures (N = 440)

Person reliability is expressed as Cronbach’s alpha and refers to the internal consistency of a scale. Cronbach’s alphas vary from .84 to .94, indicating good to excellent internal consistency for each scale. Person strata represent how many different endorsability levels between persons can be distinguished that cannot be attributed to measurement error. Person strata should be at least 2.0. All person strata meet this criterion except for Idea Sustainability External Dissemination (1.89).

Item reliability and item strata indicate the extent to which scores are reproducible and/or the extent to which a scale can distinguish difficulty differences across items. Rasch item reliability indicates the reproducibility of the item difficulty order. Rasch item reliabilities range from .76 to .99, indicating good to almost perfect item reliability. Item strata represent how many distinct difficulty levels between items can be distinguished that cannot be attributed to measurement error. Item strata should be at least 2.0. Item strata vary from 2.68 to 15.37; all item strata meet this criterion (Table 3).
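Strata are derived from the Rasch separation index. A small sketch, assuming the commonly used formulas G = sqrt(R / (1 − R)) and strata = (4G + 1) / 3, which are not spelled out in the text above; the reliability value is illustrative:

```python
import math

def separation(reliability):
    """Rasch separation index G derived from a reliability coefficient R."""
    return math.sqrt(reliability / (1.0 - reliability))

def strata(reliability):
    """Number of statistically distinct levels: (4G + 1) / 3."""
    return (4.0 * separation(reliability) + 1.0) / 3.0

# Illustration: a reliability of .80 gives a separation of 2.0 and 3 strata,
# i.e. three statistically distinct levels along the latent variable.
print(separation(0.80), strata(0.80))
```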

Item measure quality is evaluated by examining the item hierarchy, the extent to which the items vary in difficulty, the size of the standard errors, and the degree to which the items fit the expectations of the Rasch model (see Table 4). Item hierarchy implies that items should be ordered from bottom to top, from easiest to endorse to most difficult to endorse, and that items rank-order themselves in a manner consistent with the theory. The item hierarchy maps of all subdimensions were investigated and showed the expected hierarchical ordering (see Table 4).

Table 4 Item measure quality, as demonstrated by item measure hierarchical order, item measurement errors, and the outfit mean square statistics

Item difficulty calibrations range from −0.30 to 0.40 logits (OE) up to −1.69 to 1.08 logits (IRLBC), indicating a weak to good distribution of items along their latent variables (Table 4). Standard errors are small, ranging from 0.08 to 0.12 logits. Outfit mean square values, which are sensitive to outliers, are the most important quality indicators for items (Boone et al., 2014). Linacre and Wright (1994) suggest that outfit MNSQs as well as infit MNSQs for attitudinal data should preferably be ≥ 0.5 and ≤ 1.5. Only items ISV4 and ISV3 (dimension Idea Sustainability External Dissemination) show some noise in their outfit mean square statistics (1.89 and 1.67, respectively).

Eight of the 52 items were deleted because they did not meet the requirements of the Rasch rating scale model, neither in the original scale structure nor in the restructured one (see Appendix).

Person Background Variables and the IWB-Dimensions

As described in the scale construction section, our questionnaire also encompassed personal background variables, for which previous research has reported significant relationships with IWB. The Rasch model has not only provided valuable information on how the IWB items should be restructured in order to produce invariant person and item measures for all hypothesized IWB scales (see Table 2), but also allows us to use these measures to get an impression of the relationships between person background characteristics and the distinct IWB dimensions. Only the significant relationships are described below.

Significant relations between person background characteristics and IWB differ across the distinct dimensions. Age correlates significantly and negatively with Idea Sustainability External Dissemination, r = −.134, p < .01. Significant positive correlations between being a woman and the creative IWB dimensions are found: OE, r = .135, p < .01, IG, r = .128, p < .01, and IP, r = .100, p < .05, supplemented by a significant positive correlation between being a woman and IR Learning-Based Communication, r = .128, p < .01. A higher previous education level is significantly and positively related to all Idea Realization and Idea Sustainability dimensions: IRCBI, r = .126, p < .01, IRLBC, r = .129, p < .01, ISED, r = .217, p < .01, and ISIE, r = .180, p < .01. Work hours ratio is positively and significantly related to IP, r = .105, p < .05, IRCBI, r = .133, p < .01, and both Idea Sustainability dimensions (r = .138, p < .01 for ISED and r = .176, p < .01 for ISIE). Finally, being employed in vocational education (college level), as compared to a university of applied sciences, is significantly and negatively correlated with IP, r = −.150, p < .01, IRCBI, r = −.178, p < .01, IRLBC, r = −.229, p < .01, ISED, r = −.286, p < .01, and ISIE, r = −.268, p < .01.

CFA

Most researchers are more familiar with confirmatory factor analysis (CFA) than with the Rasch model. Hence, to demonstrate construct validity, the extent to which a set of measured items actually reflects the theoretical latent construct, we tested the relationships between the items and their latent variables (i.e. the path estimates linking the underlying variable to its indicator variables) in CFA. In CFA, higher loadings indicate that the indicators (items) are strongly related to the latent variable, and significant loadings support construct validity. A rule of thumb suggests that loadings should be at least .5 and ideally .7 or higher; lower loadings suggest that an item is a candidate for deletion from the model (Hair et al., 2010). Table 5 shows that all loadings fall between .59 and .90, which indicates good construct validity. Item mean scores and standard deviations are also displayed in Table 5.

Table 5 CFA results Items, means, standard deviations, and factor loadings of the Innovative work behaviour Instrument

Furthermore, we tested the seven-factor model containing the following dimensions: Opportunity Exploration, Idea Generation, Idea Promotion, Idea Realization (both Criterion-Based Implementation and Learning-Based Communication) and Idea Sustainability (both Internal Embedding and External Dissemination). This model showed a good fit (χ² = 1658, df = 864, χ²/df = 1.92, CFI = .95, TLI = .94, RMSEA = .046).

General Discussion

The urgent need for innovation, as well as the problems associated with the rapid pace of innovations, has put the concept of IWB on the educational research agenda (Thurlings et al., 2015). However, the lack of cross-validated measurement instruments that conceptually fit the literature on innovation (cycles) jeopardizes our understanding of why innovations in educational institutions fail or succeed. Our review of the IWB measurement research has shown that over the last three decades several IWB measures have been developed, mainly based on the work of Scott and Bruce (1994). However, these studies, conducted in a variety of settings, show mixed psychometric results as well as differences in the number of dimensions, due to a lack of content validity verification and construct validity testing (Table 1; Boyle, 1991; Messick, 1995).

Therefore, the purpose of this study was to develop and validate an instrument to measure the innovative work behaviour of teachers in vocational education, covering all aspects of IWB, including sustainability, a dimension that was neglected in previous studies. We hypothesized that IWB consists of five dimensions, namely OE, IG, IP, IR and IS. Preliminary Rasch analyses and CFA have confirmed our hypothesis that IWB consists of these five main dimensions. Additional Rasch dimensionality analyses have differentiated both the realization and the sustainability dimension into two sub-dimensions (i.e. Idea Realization Criterion-Based Implementation and Learning-Based Communication, and Idea Sustainability External Dissemination and Internal Embedding). The differentiation of the IR dimension into an IR Criterion-Based Implementation and an IR Learning-Based Communication dimension is in line with the study of Messmann (2012), who distinguished physical and cognitive activities, considering reflection a crucial phase in the innovative process. In our research, both physical and cognitive activities are represented in the two sub-dimensions. Also, scale structures have been adjusted to improve scale effectiveness and to guarantee that the measures are invariant across samples. In the Rasch model, we have meticulously scrutinized the six requirements proposed by Wolfe and Smith (2007): rating scale effectiveness, dimensionality, reliability, item measure quality, person measure quality and item hierarchy.

The model fit of the available ordinal scores on the Likert scales was examined with the Rasch model. Items that did not meet the requirements of the Rasch model were removed and, where necessary, scale structures were adjusted to improve the scales’ measurement quality and to assure invariance of measurement. The scale structures developed and tested with the Rasch model were subsequently tested with CFA. This seven-factor model (containing OE, IG, IP, IR with two sub-dimensions and IS with two sub-dimensions) showed a good fit.

With respect to the newly developed sustainability dimension, the literature-based main features of sustainability are represented in both sub-dimensions. The items reflecting improving and optimizing the innovation, such as updating and continuous regeneration (Coffey & Horner, 2012; Fullan, 2002; Loh et al., 2013), and embedding the innovation deeply in the organizational system (Loh et al., 2013; Fullan, 2007) are clustered in the sub-dimension Internal Embedding. In practice, we observe that internal embedding (the implementation) is more successful when there are meetings with teachers before and after the implementation. Such meetings make it possible to discuss the training needs of the teachers involved and to involve them in any adjustments they deem necessary for the implementation to succeed. This approach reduces the likelihood that teachers fall back on their former behaviour (Wolbers et al., 2017) and emphasizes the importance of investing time after the realization phase.

Items reflecting disseminating the innovation on a larger scale, such as planning for project growth (Loh et al., 2013), and visualizing the benefits of the innovation for stakeholders (Loh et al., 2013) are clustered in the sub-dimension External Dissemination. By adding the sustainability dimension to the current conceptualization of IWB, we align closely with the innovation cycles of West and Farr (1989) and Fullan (2007), who emphasized the importance of a stabilization or continuation phase as a vital stage in completing an innovation process. The emphasis on this sustainability phase can help schools to firmly anchor innovative ideas in their organizations, preventing time and energy from being wasted on unfinished innovations.

The newly developed and validated IWB scale may help researchers to empirically examine this phenomenon more accurately, making it possible to gain more knowledge about its antecedents and consequences.

Limitations and Future Research

This study has several limitations that should be addressed in future research and questionnaire validity testing. Firstly, even though we used data from a heterogeneous sample including teachers from different professional institutes, the results confirm the five-dimensional structure of IWB in the vocational education sector only. Hence, future studies could validate our instrument further by utilizing the measures and operationalizations in samples from professional and general college-level and/or university-level education. Secondly, our study only included Dutch participants. The obtained Rasch measures are invariant, but we have not yet explored internationally composed samples. The English version provided in this paper (see Appendix and Table 5) will allow other researchers to use these validated measures in other countries. Thirdly, we have only reported self-ratings on the IWB measures. Future research could also investigate the perceptions of, for example, the direct manager; however, that would require a revision of the questionnaire and thus another validation. Finally, the newly developed, theory-based instrument, with its new sustainability dimension, needs to be developed further. Although it proved to be a reliable and valid instrument, items can be added to better represent experiences with the innovation cycle in practice. Such additional items will enhance the robustness of the operationalization of the theoretical concepts of IWB.

By providing a reliable and valid interval-level scale to measure innovative work behaviour, we hope to stimulate and encourage other researchers in the field to engage in research aimed at understanding why innovations fail or succeed. Collecting qualitative data (e.g. focus groups or interviews with teachers) could help to deepen our understanding of the quantitative findings. A focus on the sustainability phase may help to prevent superficial or incomplete implementations. Although a great deal of literature has addressed the importance of the role of individuals in making innovations happen, many questions remain unanswered about how to support individuals in taking up this role and how to manage sustainable innovations. The availability of the IWB scales, supplemented with a sustainability dimension, is a first step towards the further development of a model that identifies the antecedents of IWB.