Development of a short scale to measure sustainable product involvement

Received: April 23, 2021 Accepted: June 16, 2021
DOI: 10.22458/rna.v12i1.3503
Wilson Rojas, Advertising Professor/Researcher²
Ana L. Zamora, Director of the School of Advertising²
Clifford E. Young, Marketing Professor Emeritus¹
¹University of Colorado – Denver, 1475 Lawrence St., Room 4101, Denver, CO 80202, US, 303-315-8297, francisco.conejo@ucdenver.edu, https://orcid.org/0000-0001-8489-7107; Cliff.Young@ucdenver.edu, https://orcid.org/0000-0002-8893-3694
²Latin American University of Science and Technology, San José, Costa Rica, wrojas@ulacit.ac.cr, https://orcid.org/0000-0002-3986-7061; azamora@ulacit.ac.cr, https://orcid.org/0000-0002-3920-9659

This study develops a short, general scale to measure sustainable product involvement. This is done in a Costa Rican context, via a relatively large sample, demographically similar to the national population. The study also evaluates the viability of the C-OAR-SE scaling technique for this purpose. A five-item instrument is developed, its reliability and validity psychometrically confirmed. The scale addresses the levels and types of involvement that consumers might have. It suits not only academic researchers, but also practitioners in different areas. We conclude that C-OAR-SE is a viable technique. It complements traditional psychometric methods well and merits consideration by researchers in the different fields of business.

INTRODUCTION
As consumers worldwide become more ecologically minded, it becomes vital to understand their behavior with respect to sustainable products (Dahlstrom & Crosno, 2018). A key aspect of this behavior is product involvement. The latter generally refers to the interest that consumers place upon certain goods, services, or categories thereof (Solomon, 2020). This interest then affects their behavior. It impacts how consumers search for, perceive, and process information; and which products they consider, prefer, and purchase (Michaelidou & Dibb, 2008).
However, and despite its importance, research into sustainable product involvement remains sparse. Notably absent are instruments specifically developed to measure involvement towards this category.
One might think that the scales measuring personal sustainability would serve this purpose. An example might be the widely-used New Ecological Paradigm scale, see e.g. Dunlap (2008). Yet sustainability scales present a significant limitation. They tend to focus on individuals' sustainability knowledge and attitudes. While they do address the antecedents of sustainable consumption, sustainability scales ignore the resultant behaviors. The latter are arguably more accurate involvement indicators given the attitude-behavior gap so oft encountered within sustainability research: Whereas a great majority of consumers claim grave concern for the environment, few actually follow through with concrete action (e.g., Carrington, Neville, & Whitwell, 2010; Peattie, 2010). Personal sustainability scales are thus inadequate to gauge sustainable product involvement.
One might also think that extant involvement scales apply to the sustainable product category. An example might be Zaichkowsky's (1985) seminal Product Involvement Inventory. However, these scales tend to be rather general. They do not consider the peculiarities of specific categories. This issue was experienced by e.g. Alvarez-Milán (2018) while trying to measure involvement towards social causes. Moreover, these scales tend to conceive involvement in terms of personal importance. This focus makes them attitudinal measures, susceptible to the aforementioned attitude-behavior gap. Extant involvement instruments are thus also suboptimal to gauge sustainable products.
Without denying its cognitive-affective origins, this study uses broad behaviors to operationalize involvement towards sustainable products. A valid and reliable involvement scale is developed, its robustness psychometrically confirmed. The scale is purposely general. Before specific involvement aspects are evaluated, the overall inclination towards sustainable products must be ascertained. The scale is also intentionally short. Its scant five items make it quick and easy to administer. This allows incorporating the scale into studies where other involvement measures might not be viable due to their length.
Two characteristics distinguish the scale offered. First, that its development follows Rossiter's (2002, 2011, 2016) C-OAR-SE technique. The latter is rationalist instead of empirical. It seeks to maximize content validity, not reliability. Doing so ensures that items represent the construct adequately and efficiently. Initially developed for Marketing research, C-OAR-SE has started to be applied across the social sciences. Within sustainability research, it has been used by e.g. De Carvalho and colleagues (2015, 2016). However, the approach fundamentally breaks with how measures are conventionally developed, via psychometric techniques, e.g. Churchill (1979). C-OAR-SE remains controversial, which is why it needs empirical testing, as done here.
Second, that the scale was developed in an unusual setting, Costa Rica, via a relatively large sample, demographically similar to the national population. Sustainability research stems mainly from the US and Europe. While useful, its insights do not necessarily reflect other locales. Moreover, as developing countries grow in global economic importance, it becomes necessary to understand how their consumers are evolving (Bangsa & Schlegelmilch, 2020). Costa Rica is ideal for this. Having one of Latin America's highest human development levels (UNDP, 2020), the country indicates where the region's consumption might be headed. Costa Rica is also widely recognized for its conservation efforts (HAC, 2021; UNEP, 2019), making this research setting topically relevant.
The authors hope that the scale developed adds to consumer behavior research. This, not only for academic purposes. Understanding the penchant for sustainable products will also help the public and private sectors improve their respective efforts. Beyond the scale's practical value, the authors also hope to make a methodological contribution. It is important that students, professionals, and researchers gain awareness, and ideally discuss the different scale development options available, beyond conventional psychometric methods. Doing so is essential so that research in the different business administration fields advances towards new directions.

CONCEPTUAL BACKGROUND
Product involvement derives from the psychological notion of ego-involvement. The latter emerged in the early 20th century and referred to the interest that an individual might develop towards a given object. Said object may come to play an important role in the person's life. In extreme cases, it becomes an essential part of the individual's identity, see e.g. Allport (1943) or Sherif and Cantril (1947). Ego-involvement was initially a conceptual notion. It then started to be used experimentally. Its application came to relate involvement to various other constructs, contributing to psychology's theoretical corpus (Iverson & Reuder, 1956).
Post-war marketing was characterized by a socio-psychological interest (Conejo & Wooliscroft, 2015). This drew the ego-involvement notion into marketing, albeit directed towards products. Pioneering its application was Krugman (1965), who suggested tailoring product messages to audience involvement levels. Doing so enhanced message effectiveness. Involvement research continued, gained momentum in the mid-1980s, and grew thereafter. This generated a diverse body of literature, which related involvement to various marketing aspects. Today, the involvement notion is a staple in consumer behavior textbooks, e.g., Schiffman and Wisenblit (2019), and is even mentioned in basic marketing texts, e.g., Kotler and Armstrong (2016).
In marketing, product involvement generally refers to the interest that an individual has in a given product or category (Solomon, 2020). This prominence derives from factors that are personal, like interests or values; or situational, such as occasions or needs. The interest might also be the product of external stimuli like peer pressure, reference/membership groups, or marketing communications. Regardless of origin, a higher involvement results in complex purchase behaviors. These comprise more information gathering and processing, as well as more complex and extensive purchase processes. Involvement thereby becomes an important segmentation variable for marketers (Michaelidou & Dibb, 2008).
However, involvement is not a dichotomous variable. It instead spans an intensity continuum, with consumers involved with a product to different degrees. At one end of the spectrum, individuals show little interest in the product. Their behavior toward it is habitual and lacking effort. At the other end of the spectrum, consumers are passionate about the product. This penchant activates an intense motivational state, which then drives behavior concerning the product (Solomon, 2020). In the particular case of sustainable products, Atkinson and Rosenthal (2014) found involvement to moderate product perceptions, trust, and purchase intentions.
High involvement products tend to be more expensive and durable. They also tend to be products that allow consumers to express their identity or that are prone to social evaluation. The heightened involvement derives from the personal or social risk tied to making the wrong choice (Solomon, 2020). However, Antil (1984) clarifies that products are not involving per se. Their involvement instead derives from the personal or social meanings that individuals give products. This semiotic attribution is consistent with more current consumption paradigms: Products are consumed not only for what they are or do, but increasingly, for what they mean (Conejo & Wooliscroft, 2015).
Noteworthy is the distinction between temporal and enduring involvement. Temporal involvement refers to a heightened, albeit short-lived, interest. It occurs when individuals focus more on the surrounding consumption context than on the product itself. An example would be when someone looks for a product to gift or address an emergency. Though once the situation is resolved, involvement levels diminish. Conversely, enduring involvement refers to a heightened interest over an extended time. It occurs when individuals focus more on the product than on the surrounding consumption context. This type of involvement is more stable, irrespective of situation (Richins & Bloch, 1986). The present study adopts an enduring involvement notion towards sustainable products.

Product Involvement Measures
Involvement has been assessed in various ways over the years. Early approaches include ranking products based on importance, e.g., Sheth and Venkatesan (1968), or rating their relative importance, e.g., Hupfer and Gardner (1971). Involvement then became measured via rated statements. However, many of these statements were either single items or multiple ad-hoc ones. Their reliability and validity were either unreported or low in the best of cases.
Given this situation, Zaichkowsky (1985) developed the seminal Product Involvement Inventory (PII). This unidimensional measure comprises 20 bipolar items, all psychometrically robust. Until then, involvement research had progressed somewhat haphazardly. But the PII formalized this field, applied in numerous studies since its publication.
Other involvement measures have since been developed. These are often multidimensional. However, Mittal's (1995) or Michaelidou and Dibb's (2008) reviews highlight these instruments' lack of consensus. Their number of dimensions differs. The nature of dimensions also varies. Given this disagreement, and instruments' marginal impact, the PII may still be considered the measurement standard within the field. Its use continues to this day. According to Google Scholar (2021), the article presenting this instrument has accrued about 8,900 citations, almost 200 of these in 2021 alone. In the particular case of sustainable products, the scale was used by e.g. Rahman (2018).
However, and despite its widespread use, the PII presents limitations. On the one hand, it conceives involvement in terms of personal importance. Being an attitudinal measure, similar to the sustainability scales mentioned earlier, it becomes susceptible to the attitude-behavior gap. This makes the PII suboptimal to assess sustainable products.
On the other hand, the PII is highly redundant. It evaluates the importance of a product via a series of near-synonyms like important, of interest, relevant, fundamental, matters to me, or means a lot to me, among others. This redundancy results in an unnecessarily long instrument. Zaichkowsky (1994) streamlined the PII from 20 to 10 items. However, and despite this reduction, the instrument remained redundant. Subsequent studies thus use even more condensed PII versions. E.g., Russell-Bennett, McColl-Kennedy, and Coote (2007), Kim, Jeon, and Hyun (2012), and Rahman (2018) use just four of its items to measure involvement.
These limitations indicate the need for new involvement scales: Ones explicitly developed for the sustainable product category; ones that are not attitudinal but behaviorally-based; and ones short enough to be conveniently applied.
Hence the present scale development effort.

METHODOLOGY
Traditional scale development is empirical-statistical. It begins by generating an initial item set. After collecting data from large samples that reflect the population of interest, items are statistically reduced to those most reliable. This is done via techniques like factor analysis. The remaining items are finally verified/optimized through different validation procedures (De Vellis, 2012).
The present study breaks with psychometric tradition. It uses Rossiter's (2002, 2011, 2016) C-OAR-SE scaling procedure. The latter is rationalist instead of empirical. But what makes this technique particularly controversial is that initial items are reduced before collecting data. The reduction is furthermore done via experts, not consumer samples. Finally, the reduction strives to maximize content validity, not item reliability (Rossiter, 2002).
C-OAR-SE follows six steps: 1) Construct Definition, 2) Object Classification, 3) Attribute Classification, 4) Rater Identification, 5) Scale Formation, and 6) Enumeration and Reporting. Each of these steps is described below, as applied to developing a short and general sustainable product involvement scale.

1) Construct Definition
C-OAR-SE requires that the construct's focal object, attribute, and raters be precisely defined. Conceiving constructs in general terms, frequent within psychometric scaling, not only leads to divergent interpretations. It also muddles operationalization, reducing scale validity (Rossiter, 2002). A precise definition is particularly important in the case of involvement. The construct has been approached from a variety of perspectives. This has resulted in various definitions, some even conflicting (Michaelidou & Dibb, 2008).
In C-OAR-SE, a group of experts specifies the construct's object, attribute, and raters (Rossiter, 2002). Two focus groups/workshops were thus conducted. Each lasted 90 minutes and comprised six Costa Rican social science academics. The participation of academics, not practitioners or consumers, allowed discussions to address the more theoretical/abstract definitional aspects. Participants were recruited via convenience/snowball sampling: Invited academics were asked about other colleagues that might participate. This was done until having enough participants for both focus groups. The latter followed generally accepted guidelines, e.g. those of Stewart and Shamdasani (2015).
Two weeks before sessions, participants received Rossiter's (2002) C-OAR-SE paper to acquaint themselves with the technique. Sessions were conducted via Zoom, one each weekend, to reduce inconvenience. Sessions began with brief overviews of the present project, the C-OAR-SE technique, and the product involvement construct. Participants then used Rossiter's (2002, p. 321) evaluation form to individually specify the construct's object, attributes, and raters. Participants later discussed and reconciled their specifications until reaching certain informational saturation, reflected by a relative consensus.

2) Object Classification
C-OAR-SE requires that the definitional object be specified and classified. Doing so determines the type of items subsequently developed. There are three types of objects: Concrete-singular ones are unique or highly homogenous. Abstract-collective objects are heterogeneous but still group into an overarching category. Abstract-formed objects are also heterogeneous but do not group into an overarching category (Rossiter, 2002).
Focus group participants agreed that the definitional object was sustainable products. They mostly classified this object (92% agreement) as abstract-collective: While sustainable products were diverse, they still grouped into a broad overarching category characterized by environmental preservation. Participants added that this broad classification also prevented the scale from being excessively specific. It thereby captured the different notions that respondents might have of this product category. This observation is consistent with Witkowski (2010), who notes that research sometimes neglects the individual meanings that ordinary people give to their consumption. It is also compatible with Dolan (2002), who indicates that sustainable consumption also comprises a socio-cultural context.

3) Attribute Classification
C-OAR-SE requires that the object attribute be specified and classified. Doing so further determines the types of items subsequently developed. Attributes come in three types: Concrete ones refer to a single, evident characteristic. Formed attributes refer to an abstract and multifaceted characteristic. Together, their different aspects make up the overarching characteristic. Eliciting attributes also refer to an abstract characteristic. But the latter is manifested by the mental and physical consequences it generates (Rossiter, 2002).
Participants agreed that the definitional attribute referred to involvement. However, they did not agree as to whether it was formed or eliciting. Upon discussion, and without denying involvement's multifaceted nature, participants preferred to classify it (75% agreement) as eliciting. They considered involvement as an internal disposition. Specifically, as the prevalence of sustainable products in people's lives, manifested cognitively, emotionally, and physically.

4) Rater Identification
C-OAR-SE requires that those who rate the object attribute be specified. Doing so is necessary as evaluations vary according to whose perspective they capture. There are three types of raters: Individuals are used when the attribute is an internal personal difference. Groups comprise homogenous individuals, say managers. They evaluate external attributes based on their perceptions. Experts comprise more homogenous and qualified individuals. They also assess external attributes, but based on technical criteria (Rossiter, 2002).
Participants indicated (83% agreement) that raters ought to be individuals. They justified this by noting that raters would be disclosing their personal propensity towards sustainable products.
Based on focus group results, the present study defines sustainable product involvement as the prevalence (attribute) of sustainable products (object) within individuals' (raters) everyday life (enduring notion), as manifested by mental or physical behaviors (operationalization).

5) Scale Formation
Scale items should reflect the object and attribute types defined. Focus groups classified the object as abstract-collective. Only one item part therefore needed to represent the sustainable product category. However, focus groups categorized the attribute as eliciting. Multiple items thus needed to capture the construct's different manifestations (Rossiter, 2002).
Focus groups acknowledged that involvement elicited cognitive, emotional, and behavioral responses. However, participants also noted that to develop a short, general scale, involvement's multifaceted nature needed to be ignored. The construct would be better operationalized through behavioral items only. Doing so would make the resulting instrument less abstract, and thereby, easier to apply, understand, and interpret. Such an operationalization is consistent with researchers for decades using behaviors as involvement indicators (Zaichkowsky, 1985). It is also compatible with eliciting attributes best operationalized via physical and mental activities (Rossiter, 2002). Given the above, involvement-related behaviors were compiled from the literature. The search resulted, after eliminating redundancies, in 19 preliminary items. These were reduced to those most relevant as follows.
Items sourced from the literature were often formatted as personal behaviors. However, sustainable products are socially-desirable. Their consumption, especially when public, generates approval (Atkinson & Rosenthal, 2014;Naderi & Strutton, 2015). Asking respondents about their sustainable behaviors would have thus likely compromised the data.
To lessen this possibility, items were reformatted per Schwartz, Melech, Lehmann, Burgess, Harris, and Owens (2001): Instead of asking respondents about their sustainable behaviors (e.g., I currently buy many sustainable products), items asked respondents how similar they were to a hypothetical person engaged in those behaviors (The person currently buys many sustainable products: Nothing similar to me - Totally similar to me). Formatting items in terms of others' behavior reduced social desirability. Yet it still made it possible to infer respondents' sustainable product involvement levels.
C-OAR-SE centers on content validity to select scale items. A panel of expert judges determines the latter (Rossiter, 2002). This study used Lawshe's (1975) well-known content validity measures to assess how well items reflected sustainable product involvement. In them, the degree to which expert ratings coincide reflects items' content validity. If ratings strongly agree, there is little basis to refute the consensus. The use of experts also prevents higher authorities from challenging results. (It is acknowledged that this process remains somewhat subjective. Expert ratings are still the product of human judgment, which remains fallible.) The type of experts used to rate items depends on the construct operationalization. Abstract items require a greater inferential leap as to how well they reflect the construct. Experts with deep/broad knowledge, say academics, become necessary. Concrete items require a lesser inferential leap. Sound judgments may be obtained from professionals familiar with the construct (Lawshe, 1975). Since involvement was operationalized through behaviors, and these are fairly concrete, a panel of 36 Costa Rican marketing professionals was used to evaluate potential items. Like with focus groups, these professionals were recruited via convenience/snowball sampling.
Nevertheless, panelists were still prequalified. They first needed to be experienced, having worked in marketing for at least ten years. They secondly had to be familiar with the involvement construct, having applied it on the job. Unlike the academics that participated in the focus groups, the use of experienced professionals in this phase made it possible to capture a more practical involvement notion, geared towards developing a short, general, and easily applicable scale.
Each panelist received an evaluation form. The latter asked how well the 19 proposed behaviors reflected sustainable product involvement. Raters then indicated whether behaviors were a) essential, b) useful, but not essential, or c) unnecessary. The evaluation form also allowed panelists to add any behaviors not mentioned. None were added, suggesting comprehensiveness.
The Content Validity Ratio (CVR) of each item was calculated. This ratio derives from the number of essential attributions vis-à-vis total attributions: $CVR = (n_e - N/2)/(N/2)$, where $n_e$ is the number of panelists considering the item essential, and $N$ is the total number of panelists. CVR values range from -1 to +1. Content validity emerges with positive values, when over half the panelists consider an item essential. Validity increases as more panelists deem an item essential.
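The CVR computation described above can be sketched in a few lines of Python. The function name and the example vote counts are illustrative assumptions; only the panel size of 36 comes from the study.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: (n_e - N/2) / (N/2)."""
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must lie between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


N = 36  # panel size reported in the study

# Hypothetical vote counts, chosen only to show the range of the ratio
print(content_validity_ratio(36, N))            # every panelist says "essential" -> 1.0
print(content_validity_ratio(18, N))            # exactly half say "essential"    -> 0.0
print(round(content_validity_ratio(33, N), 3))  # 33 of 36 say "essential"        -> 0.833
```

The ratio is simply a rescaled proportion: it equals twice the share of "essential" votes minus one, which is why it is positive only when more than half the panel agrees.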
The level of agreement between raters determined whether items were rejected or retained. Hardesty and Bearden (2004) indicate that most studies deem a 75% agreement as the minimum for item retention. Since this study intended to develop a short scale, the minimum agreement was increased to 90%. Items with CVRs thereunder were excluded from the scale. Table 1, below, shows the results of this validation procedure.
*In Costa Rica, as elsewhere, sustainable products are known as ecological/green products.
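As a worked illustration of how an agreement cutoff relates to the CVR, assume (this is an interpretation, not stated in the text) that the 90% agreement refers to the share of panelists rating an item essential. With the 36 panelists used here, the arithmetic would run as follows:

```latex
\begin{align*}
\mathrm{CVR} &= \frac{n_e - N/2}{N/2} = 2\,\frac{n_e}{N} - 1 \\
\frac{n_e}{N} \ge 0.90 &\;\Rightarrow\; \mathrm{CVR} \ge 0.80 \\
N = 36 &\;\Rightarrow\; n_e \ge \lceil 32.4 \rceil = 33,
  \quad \mathrm{CVR} \ge \frac{33 - 18}{18} \approx 0.833
\end{align*}
```

Under that assumption, an item would need at least 33 of the 36 "essential" votes to be retained.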
Five items qualified for the involvement scale. This number might seem insufficient to measure the construct adequately. However, Rossiter (2002) notes how psychometric canons have accustomed researchers to unnecessarily long instruments. They generally consider that more items necessarily increase reliability and validity. Yet this does not always occur. Sometimes, a higher number of items actually decreases scale validity. Several researchers thus recommend using concise instruments: Among others, Peterson's (1994) meta-analysis shows how Alphas do not systematically increase after three scale items. Burisch (1997) shows how just four items suffice to measure constructs effectively. Rossiter (2002) indicates that five items are generally enough to measure eliciting attributes, as is the case here.
Moreover, in Lawshe's (1975) framework the number of items retained is irrelevant. The goal is to identify items with the highest content validity to thereby maximize scale validity. For academic purposes, the number of items might be higher. This allows the theoretical domain to be represented in more detail. But for practical purposes, fewer items suffice. With this in mind, and the goal of developing a short, general scale, the five items retained were deemed sufficient.
The five items' Content Validity Index (CVI) was calculated. The latter refers to the extent to which items in aggregate, i.e. the scale, represent the construct (Lawshe, 1975).
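A minimal sketch of the CVI follows, assuming the usual Lawshe-style computation in which the index is the mean CVR of the retained items. The vote counts below are hypothetical and are not the study's results; the helper names are likewise illustrative.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: (n_e - N/2) / (N/2)."""
    half = n_panelists / 2
    return (n_essential - half) / half


def content_validity_index(essential_counts, n_panelists):
    """Mean CVR across the retained items (assumed definition of the CVI)."""
    ratios = [content_validity_ratio(n, n_panelists) for n in essential_counts]
    return sum(ratios) / len(ratios)


# Hypothetical "essential" votes for five retained items, out of 36 panelists
hypothetical_counts = [36, 35, 34, 33, 33]
cvi = content_validity_index(hypothetical_counts, 36)
print(round(cvi, 3))
```

Because each retained item already cleared the retention cutoff, the index can never fall below that cutoff; it summarizes how far above it the scale sits as a whole.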

Data Collection
A survey was conducted to assess the scale further. The five involvement items were mixed with 20 personal value items from Sandy, Gosling, Schwartz, and Koelkebeck (2017). Doing so masked the survey's intent. A third of the items were presented in negative form. Doing so reduced the acquiescence bias and helped detect response patterns. Items were also order inverted, and two questionnaire versions were used. Doing so reduced response biases derived from the item order. Responses were anonymous. However, general demographic questions were asked. The survey was purposely brief, able to be completed in about five minutes.
Involvement and value items were formatted alike. Both had respondents indicate their similarity to the persons portrayed. This further masked the survey's intent. Modified Likert responses provided the data. Based on Miller (1956), six answer options were offered: 1-nothing, 2-a little, 3-somewhat, 4-quite, 5-very, and 6-totally similar. These categories provided detail yet kept cognitive response loads low. Numerical-verbal labels enhanced responses (Windschitl & Wells, 1996). An even number of response options forced committed answers. The lack of a neutral option produced less ambiguous responses, reducing the level of error in the data (Suchman, 1950).
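The text does not detail how negatively worded items were handled before analysis. A common practice, assumed here purely for illustration, is to reverse-score them so that higher values always indicate greater similarity. The item names below are hypothetical.

```python
SCALE_MIN, SCALE_MAX = 1, 6  # six response options: 1-nothing ... 6-totally similar


def reverse_score(response):
    """Mirror a response around the scale midpoint (1<->6, 2<->5, 3<->4)."""
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError("response outside the 1-6 scale")
    return SCALE_MIN + SCALE_MAX - response


def recode(responses, negative_items):
    """Reverse-score only the items flagged as negatively worded."""
    return {
        item: reverse_score(value) if item in negative_items else value
        for item, value in responses.items()
    }


# Hypothetical respondent; "item_b" is assumed to be negatively worded
answers = {"item_a": 5, "item_b": 2, "item_c": 6}
recoded = recode(answers, negative_items={"item_b"})
print(recoded)
```

With an even number of options there is no midpoint category, so every response changes value when reversed.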
Sustainability research sometimes uses self-selected samples. However, doing so skews results. It reflects more environmentally-inclined consumers (McDonald, Oates, Young, & Hwang, 2006). On the other hand, involvement scales sometimes derive from student samples, most notably that of Zaichkowsky (1985). However, the literature warns against using student samples. These differ demo-psychographically from the general population, and therefore, are usually inadequate to conduct research (James & Sonner, 2001;Peterson, 2001). Because of this, and to give this study more credence, a sample that demographically approximated the Costa Rican adult population was used.
A brief pretest with 20 consumers ensured that items were understood and easily answered. The primary data collection followed a snowball sampling approach, per Cleveland, Laroche, and Papadopoulos (2009): As part of a class project, students from four marketing courses at a private Costa Rican university surveyed adults of predetermined ages, genders, education levels, and social classes. The use of quotas ensured that the national demographic characteristics were approximated.
Surveys were paper-based, given respondents' socioeconomic diversity. Of particular concern were differences in online access. Upon finishing their surveys, respondents referred students to further potential participants, who were then contacted. This was done until students filled their assigned quotas. Students followed an administration protocol. The latter had been explained and practiced beforehand in class. Student data was complemented with data collected by the researchers, who also followed a snowball/quota procedure.

Analysis
To verify the internal consistency of eliciting scales, as here the case, Rossiter (2002) recommends calculating coefficient Beta. The latter is the minimum value of a split-halves analysis, and therefore, a more conservative indicator than Alpha, see Revelle (1979) or John and Roedder (1981). A Beta of at least .700 is needed to infer internal consistency (Rossiter, 2002). To calculate it via SPSS v26, Guttman's (1945) Lambda coefficient was used. Specifically, Lambda-4, which is the lower bound reliability of all splits. Its value for the five items was .839, suggesting robust internal consistency.
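The idea of taking the minimum over split halves can be sketched as follows. This is not the SPSS procedure used in the study: it is a brute-force illustration that enumerates every split of the items, computes Guttman's split-half coefficient for each, and keeps the lowest. The response matrix is hypothetical.

```python
from itertools import combinations
from statistics import pvariance


def split_half(rows, half_a):
    """Guttman split-half coefficient for one split: 2 * (1 - (var_A + var_B) / var_total)."""
    k = len(rows[0])
    half_b = [j for j in range(k) if j not in half_a]
    score_a = [sum(r[j] for j in half_a) for r in rows]
    score_b = [sum(r[j] for j in half_b) for r in rows]
    total = [a + b for a, b in zip(score_a, score_b)]
    return 2 * (1 - (pvariance(score_a) + pvariance(score_b)) / pvariance(total))


def all_split_halves(rows):
    """Coefficients for every split into k//2 items versus the rest.

    For an even number of items each split is visited twice, which does not
    affect the minimum.
    """
    k = len(rows[0])
    return [split_half(rows, set(c)) for c in combinations(range(k), k // 2)]


def worst_split_half(rows):
    """Lowest split-half coefficient across all enumerated splits."""
    return min(all_split_halves(rows))


# Hypothetical responses: 8 respondents x 5 items on a 1-6 scale
rows = [
    [1, 2, 1, 2, 1],
    [2, 2, 3, 2, 3],
    [3, 3, 3, 4, 2],
    [4, 4, 3, 4, 5],
    [5, 4, 5, 5, 4],
    [6, 5, 6, 6, 5],
    [2, 1, 2, 1, 2],
    [5, 6, 5, 5, 6],
]
print(round(worst_split_half(rows), 3))
```

Full enumeration is only practical for short scales; five items yield just ten distinct 2-versus-3 splits.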
To assess items in more conventional terms, Alphas were calculated. Unlike Beta, Alpha is the average of all split halves. It is thus a more optimistic indicator. Alphas of at least .800 are needed for eliciting scales (Rossiter, 2002). Its value for the five items was .895. This confirms internal consistency, especially given the low item number (Nunnally & Bernstein, 1994). Rossiter (2002) suggests using Alphas to delete items with low item-total correlations. The scale's internal consistency might thereby be improved. The five items were assessed. Table 3, below, summarizes their characteristics. Item 5 showed a slightly low item-total correlation of .682. However, it was retained. Its deletion would have otherwise reduced the scale's mean, variance, and Alpha below initial values.
Research sometimes considers involvement as multidimensional. However, the present objective was to develop a short, general scale. The construct was thus approached from a broad perspective, assumed to be unidimensional.
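The Alpha and item-level checks discussed above can be illustrated with a short sketch. It computes Cronbach's Alpha, each item's corrected item-total correlation (the item against the sum of the remaining items), and Alpha if the item were deleted. The data are hypothetical and the function names are illustrative, not taken from any statistics package.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(rows):
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(rows[0])
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_diagnostics(rows):
    """Per item: (corrected item-total correlation, Alpha if item deleted)."""
    k = len(rows[0])
    results = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [[v for i, v in enumerate(r) if i != j] for r in rows]
        rest_total = [sum(r) for r in rest]
        results.append((pearson(item, rest_total), cronbach_alpha(rest)))
    return results


# Hypothetical responses: 8 respondents x 5 items on a 1-6 scale
rows = [
    [1, 2, 1, 2, 1],
    [2, 2, 3, 2, 3],
    [3, 3, 3, 4, 2],
    [4, 4, 3, 4, 5],
    [5, 4, 5, 5, 4],
    [6, 5, 6, 6, 5],
    [2, 1, 2, 1, 2],
    [5, 6, 5, 5, 6],
]
alpha = cronbach_alpha(rows)
for number, (r_it, alpha_without) in enumerate(item_diagnostics(rows), start=1):
    print(number, round(r_it, 3), round(alpha_without, 3))
```

An item is a deletion candidate only when removing it raises Alpha; an item whose removal lowers Alpha, as reported for Item 5, is contributing to the scale's consistency despite a lower correlation.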
Exploratory Factor Analyses (EFA) suffice to test unidimensionality (Gorsuch, 1997). However, psychometric scaling usually adds Confirmatory Factor Analyses (CFA) to verify/fine-tune exploratory results (Ahmetoglu, Leutner, & Chamorro-Premuzic, 2011). EFA/CFA on the five involvement items were therefore conducted. Doing so methodologically validated results through a different scaling technique, one more amenable to traditional psychometric practice.
EFA extraction was done via principal components. The number of components was not pre-established. Rotation was also dispensed with to obtain non-optimized results. Eigenvalues (EVs) > 1 determined the number of components to retain. Only Component 1, worth 3.531 EVs and 70.611% variance, met the criterion. Other components were well thereunder. The scree plot's drop-off (Component 2 = 0.528 EVs) confirmed retaining only Component 1. Table 4, below, shows how the five involvement items loaded onto Component 1. Loads range from .794 to .872 and average .839. These values suggest a well-defined single underlying construct. The compact load range further indicates content homogeneity (Tabachnick & Fidell, 2013). Strong sample adequacy, KMO = .843, indicates robust modeling.
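The eigenvalue criterion used above can be sketched as follows. Because the study's inter-item correlation matrix is not reported here, the example uses a hypothetical matrix in which every pair of items correlates at .60; the function names and the Jacobi rotation routine are likewise our own illustrative choices, not the SPSS procedure. The share of variance is simply the eigenvalue divided by the number of items, which is how the reported 3.531 over five items yields roughly 70.6%.

```python
import math


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) element.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for i in range(n):  # rotate columns p and q
                    aip, aiq = a[i][p], a[i][q]
                    a[i][p], a[i][q] = c * aip - s * aiq, s * aip + c * aiq
                for i in range(n):  # rotate rows p and q
                    api, aqi = a[p][i], a[q][i]
                    a[p][i], a[q][i] = c * api - s * aqi, s * api + c * aqi
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_summary(corr):
    """Eigenvalues, variance shares, and number of components with eigenvalue > 1."""
    eigenvalues = symmetric_eigenvalues(corr)
    k = len(corr)
    explained = [ev / k for ev in eigenvalues]
    retained = sum(1 for ev in eigenvalues if ev > 1.0)
    return eigenvalues, explained, retained


# Hypothetical correlation matrix: five items, all pairs correlated at .60.
HYPOTHETICAL_CORR = [[1.0 if i == j else 0.6 for j in range(5)] for i in range(5)]

evs, explained, retained = kaiser_summary(HYPOTHETICAL_CORR)
for number, (ev, share) in enumerate(zip(evs, explained), start=1):
    print(f"Component {number}: eigenvalue = {ev:.3f}, variance = {share:.1%}")
print(f"components retained (eigenvalue > 1): {retained}")
```

In this hypothetical case only the first component exceeds the eigenvalue threshold, mirroring the single-component pattern reported above.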
To verify results, CFAs using maximum likelihood estimation were conducted. The ideal solution was again a single factor. Factors should be statistically independent, indicated by item loads of at least .700 (Hair, Black, Babin, & Anderson, 2009). Table 4, below, also shows the standardized regression weights for the five items. Ranging from .704 to .860, and averaging .793, all loads comply with the criterion. Different fit indicators support the above results (see Hair et al. (2009) for specifics): GFI = .945, AGFI = .834, Chi-square = 160.222, DF = 5, and probability level = 0.000. These values confirm a clearly defined unidimensional involvement construct. The Scale Formation step led to items of strong content validity. However, how well the items related to an external criterion remained to be ascertained. The literature indicates that product involvement is co-determined by personal values, e.g., Zaichkowsky (1985). The five involvement items were thus related to Universalism. Data on this and other values had been collected with those on involvement, allowing correlations.
Universalism derives from the collective need to survive and thrive. Its ideal is a life where the welfare of humanity is enhanced. This includes preserving nature and its scarce resources (Schwartz et al., 2001), precisely what sustainable products strive to do.
Universalism is part of Schwartz's (1992) value framework. Unlike others used in marketing, say Rokeach's (1973) or Kahle's (1983), Schwartz's framework is cross-cultural. It derives from samples from 20 different countries, some from Latin America. The latter makes it especially suited for present purposes. Schwartz obtained a near-universal structure comprising ten fundamental life-guiding values. What varied between nations/individuals were differences in degree, not kind. Given its versatility, Schwartz's framework has become one of the most widely used (Datler, Jagodzinski, & Schmidt, 2013; Parks-Leduc, Feldman, & Bardi, 2015).
Universalism was operationalized via two items from Schwartz et al.'s (2001) Portrait Values Questionnaire (PVQ). While developing their brief PVQ version, Sandy et al. (2017) found the two items to effectively encapsulate the Universalism dimension (Alpha = .765, retest reliability = .810, external validities similar to the original instrument; present Alpha = .739). Table 5, below, shows how the five involvement items/scale average correlate with the two Universalism items/dimensional average (N = 1,036; all correlations are two-tailed and significant at the 0.01 level). Universalism items were translated from English into Spanish and back-translated as recommended by, e.g., Brislin (1970) or Cha, Kim, and Erlen (2007). Despite slight adjustments to their Spanish versions, item content was deemed equivalent.
A moderate, albeit significant relationship emerged between both instrument averages, r = .481, p < 0.01. As people's Universalism increases, so does their sustainable product involvement. These results are consistent with what would be expected from the literature. The above notwithstanding, involvement is a complex phenomenon. It derives from multiple variables, situational moderators, and individual characteristics (Atkinson & Rosenthal, 2014). By no means is it suggested that personal values, much less Universalism alone, drive sustainable product involvement. Nonetheless, the correlation obtained externally validates, at least preliminarily, the proposed involvement items/scale.
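As a rough check on the reported correlation, the standard t test for a Pearson coefficient can be applied to the figures given above (r = .481, N = 1,036). The function name below is our own, and the snippet is offered only as a back-of-the-envelope sketch, not as the procedure the authors ran.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r, n = 0.481, 1036  # values as reported in the text
t_value = correlation_t_statistic(r, n)
print(f"t = {t_value:.2f} on {n - 2} degrees of freedom")
print(f"shared variance (r squared) = {r ** 2:.3f}")
```

The resulting t of roughly 17.6 lies far beyond the two-tailed critical value of about 2.58 at the 0.01 level. At the same time, the squared correlation of about .23 shows that Universalism shares less than a quarter of its variance with involvement, in line with the "moderate" reading above.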
Given the positive and significant external relation, coupled with the robust reliability and dimensionality values obtained, we conclude that the five C-OAR-SE items developed comprise a reasonable measure of sustainable product involvement.

6) Enumeration
The final C-OAR-SE step involves explaining how items produce a total scale score. Doing so is necessary given the different object and attribute types possible (Rossiter, 2002).
Total scale scores may derive from various summed, averaged, or weighted procedures (Diamantopoulos & Winklhofer, 2001). However, enumeration was kept simple to develop a practical instrument. As mentioned in the external validation section, the five item scores were averaged to produce an overall score. Item and total scores would then be based on the same six-point scale, facilitating application, analysis, and interpretation.
However, a six-point scale is admittedly awkward. Rossiter (2002) suggests that in cases like these, scores be transformed to a 0 to 10 scale. The 0 value corresponds to the absolute construct absence and the scale's psychological null-point. The 10 value corresponds to the maximum possible score. This 0-10 range is also intuitive for users as people have grown accustomed to evaluating objects in deciles. Rescaling responses into a common, natural format improves how results are analyzed, interpreted, and compared. An intuitive design is particularly important for non-technical scale users.
Answers were thus rescaled from a 1-6 to a 0-10 format. Doing so is mathematically valid. Data matrices may be transformed via additive or multiplicative constants, provided that items' relative intensity is respected (Guttman, 1950b). The first answer option was anchored at 0 = nothing similar to me. The other five answer options were spaced in two-point increments (2 = a little, 4 = somewhat, 6 = quite, 8 = very, and 10 = totally similar to me). Using six answer options instead of eleven kept them cognitively manageable, facilitating responses; kept the number of answer options even, forcing committed responses; and increased inter-option spacing, distinguishing options better.
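The rescaling just described amounts to subtracting one and doubling. The sketch below shows the mapping and illustrates why such a linear transformation leaves correlations with other variables unchanged. The answer and criterion vectors are hypothetical, and the function names are ours.

```python
from statistics import mean

# Answer labels on the 0-10 format, as listed in the text.
LABELS = {0: "nothing", 2: "a little", 4: "somewhat",
          6: "quite", 8: "very", 10: "totally"}


def rescale(answer):
    """Map a 1-6 answer onto the 0-10 format in two-point steps."""
    if answer not in range(1, 7):
        raise ValueError("answer must be an integer from 1 to 6")
    return 2 * (answer - 1)


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


original = [1, 3, 4, 6, 2, 5]    # hypothetical 1-6 answers to one item
criterion = [2, 3, 5, 6, 1, 4]   # hypothetical external scores
rescaled = [rescale(a) for a in original]

print([f"{value} = {LABELS[value]}" for value in rescaled])
print(f"r before = {pearson(original, criterion):.6f}")
print(f"r after  = {pearson(rescaled, criterion):.6f}")
```

Because only an additive and a multiplicative constant are involved, the two correlations coincide up to floating-point rounding.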
In hindsight, this new answer format should have been implemented during the item development stage, before collecting data. This post-hoc improvement was deemed to not significantly impact results. However, the rescaled data was nevertheless reanalyzed to ascertain this. The response range and scale average naturally varied. These went from 1-6 to 0-10, and from 2.693/6 to 4.465/10, respectively. But the factorial item loadings and external correlations were nearly identical to those initially obtained. Given its multiple advantages, we recommend implementing the 0-10 answer format instead of the original 1-6 one. Table 6, below, shows the Sustainable Product Involvement Scale.

DISCUSSION
This study set out to develop a short, general measure of sustainable product involvement. It also set out to evaluate whether Rossiter's (2002, 2011, 2016) C-OAR-SE scaling framework could serve this purpose. Both goals were achieved. The Sustainable Product Involvement Scale (SPIS) developed shows robust validity and reliability. It thus suits academic research in which general product involvement measures suffice. Moreover, the scale's mere five items make it quick to administer. Not only does this allow researchers to easily incorporate it into studies where longer involvement measures might not be viable; it also makes the scale convenient for respondents, enhancing participation and completion rates. Finally, the scale's intuitive format makes it easily understood by respondents. This increases data quality, adding to studies' outcomes and later theoretical development.
Notable is the SPIS's behavioral focus. Sustainability scales developed to date emphasize environmental knowledge and attitudes. These instruments thereby shed light on sustainable consumption drivers. However, they tend to ignore resultant behaviors, the latter more accurate involvement indicators. Without being overly specific, the scale developed is behaviorally focused. It thereby resolves the attitude-behavior gap so oft encountered within sustainability research.
However, the SPIS is not only geared towards academics. It also serves practitioners in different areas. As an individual difference, involvement impacts a series of consumer behavior aspects. It influences how information is perceived, sought, and processed; how decisions are made; and which products are considered, preferred, and purchased (Michaelidou & Dibb, 2008). Practitioners may therefore use the SPIS to segment consumers. Doing so would allow them to tailor marketing efforts according to the preferences of specific targets, improving effectiveness.
On the one hand, the SPIS identifies different involvement levels. Consumers could thereby be segmented in terms of low (0-3), medium (4-7), and high (8-10) average scores. All else equal, low-involvement consumers might be educated on the importance of adopting sustainable products. This would be a mid to long-term process. But it would lay the foundation for more sustainable future lifestyles. Mid-involvement consumers could be encouraged to adopt a larger quantity and variety of sustainable products. This would be a short to mid-term process. High-involvement consumers, who already use sustainable products, could be commended for the latter. Doing so would reinforce and ideally increase their sustainable consumption.
On the other hand, the SPIS identifies different involvement types. Consumers could thus be segmented based on the salience of certain items. Some consumers might be characterized by, e.g., planning to buy sustainable products in the future (preponderance of Item 4). These consumers would need to be nudged so that they convert intentions into actions. Other consumers might be characterized by recommending the purchase of sustainable products to friends and family (preponderance of Item 2). These consumers could be leveraged as opinion leaders/influencers to increase the quantity and variety of their acquaintances' sustainable consumption.
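The two segmentation approaches above, by level and by type, could be operationalized along the following lines. This is a sketch under stated assumptions: the respondent data are hypothetical, the function names are ours, and the treatment of fractional averages is our own choice, since the bands are stated only for whole numbers. Here an average below 4 counts as low, below 8 as medium, and anything else as high.

```python
from statistics import mean


def involvement_level(score):
    """Assign an average 0-10 score to a band (cut-offs for fractions are assumed)."""
    if not 0 <= score <= 10:
        raise ValueError("score must lie between 0 and 10")
    if score < 4:
        return "low"
    if score < 8:
        return "medium"
    return "high"


def salient_items(item_scores):
    """1-based numbers of the item(s) with the highest score."""
    top = max(item_scores)
    return [i + 1 for i, s in enumerate(item_scores) if s == top]


def profile(item_scores):
    """Average score, involvement level, and most salient item(s) for one respondent."""
    average = mean(item_scores)
    return {
        "average": average,
        "level": involvement_level(average),
        "salient_items": salient_items(item_scores),
    }


respondent = [4, 2, 2, 8, 2]  # hypothetical answers to Items 1-5 on the 0-10 format
print(profile(respondent))
```

In this hypothetical case the respondent's overall involvement is low, yet Item 4 (planning future purchases) stands out, which would place the person among those to be nudged from intention to action.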
To illustrate these two aspects, Figure 1, below, shows some preliminary insights as to the levels and types of sustainable product involvement in Costa Rica. The left bar chart has respondents on the vertical axis and involvement levels on the horizontal one. Noteworthy is respondents' relatively weak involvement. Nearly 87% of respondents show moderate to low levels, and only 13% high ones. This relatively low interest in sustainable products is somewhat surprising given Costa Rica's reputation for sustainability (UNEP, 2019); the country's brand long based on its environmental richness (Florek & Conejo, 2007).
The right radar chart has five axes emanating from its center. Each corresponds to an involvement item/type. Two aspects stand out. First, and consistent with the bar chart, involvement levels across types are relatively low. These range from 2.830 to 6.436, averaging just 4.465/10. Second, higher values correspond to the more passive items/behaviors (thinking and planning), not the more active ones (purchase and advocacy). This pattern would be consistent with the relatively low overall involvement levels. It is also consistent with the attitude-behavior gap so commonly seen within sustainability research.

Figure 1. Sustainable Product Involvement Levels and Types
Note: The axes of the radar chart originally reached 10 but were cut so as to save space.
Notably, the SPIS would not only benefit the for-profit sector. Its ability to distinguish involvement levels and types would also help the public and non-profit sectors to understand the populations they serve better. The insights derived would aid towards superior policies/efforts, and help steer consumption towards more sustainable patterns. As more entities embrace the environmental imperative, it becomes essential for them to understand the consumer behavior tied to sustainable products. The scale developed can help in this regard.
This study also set out to evaluate the effectiveness of Rossiter's (2002, 2011, 2016) C-OAR-SE scaling method. Empirical verification becomes necessary as the technique fundamentally breaks with scaling orthodoxy, making it quite controversial. Diamantopoulos (2005), Salzberger, Sarstedt, and Diamantopoulos (2016), and others offer detailed discussions on C-OAR-SE's virtues and limitations. In the interest of space, their arguments will not be repeated here. However, and independently of these arguments, the presently developed scale proved psychometrically robust. This indicates that despite its major philosophical and methodological differences, C-OAR-SE is indeed able to produce scales that are as valid and reliable as those obtained through conventional psychometric means. We therefore conclude that C-OAR-SE is a viable scale development technique, to be considered by researchers in the different fields of business.

LIMITATIONS AND FUTURE RESEARCH
Despite encouraging results, this study is not without limitations. A first one pertains to operationalization. The latter focused on behaviors, as these are more effective involvement indicators. However, the literature has long recognized involvement as multidimensional, e.g., Laurent and Kapferer (1985). It comprises not only behaviors, but also cognitive and emotional aspects. Future research should thus develop short scales for each of these areas. Together, these various scales would allow consumer involvement to be studied more comprehensively. A second limitation refers to the scale's scope. This study's objective was to develop a short instrument to assess consumers' general sustainable product inclination. Doing so required a strict validity criterion. However, this excluded certain behaviors from the scale, reducing its coverage. Examples would be those referring to information search. Future research might thus develop more comprehensive scales with more items. A broader behavioral range would not only extend the construct's coverage, but also allow more precise diagnostics.
A third limitation pertains to the object. The scale developed refers to sustainable products generally. However, said category is very diverse. It spans from organic foods, through fair trade textiles, to non-carbon emitting vehicles, among others. Moreover, each of these sub-categories comprises different sectors. Because of this diversity, the developed items might need to be adapted to the specific types of sustainable products evaluated. Future research should thus test the scale within and across sustainable product categories.
A fourth limitation refers to respondents. The sample used was reasonably large and demographically similar to the national population. Future research should thus use the SPIS to assess the involvement of Costa Ricans as a whole, and that of its different sub-populations. However, the sample remains limited to a single country, present results determined by local socioeconomic conditions. Future research should therefore test the scale in other national contexts. Studies might start in other Latin American countries to identify regional differences and commonalities. Studies might then extend to progressively different cultural contexts, say Anglo America, Europe, and Asia.
A fifth limitation is methodological. C-OAR-SE scaling results were psychometrically confirmed. To this effect, a study with quota/snowball sampling was conducted. This produced a sample that is larger and more diverse than what is frequently found within academic studies, augmenting the credibility of confirmatory results. However, it is acknowledged that from a purist perspective, the statistical techniques used for the psychometric confirmation (factor analysis, regression, and correlation) presuppose the existence of random/probabilistic samples. This being the case, future research should confirm present results using this other type of sampling.
A final limitation is also methodological. The final item set was obtained through the C-OAR-SE technique. However, the items might have differed had Classical Test Theory canons been followed. Future efforts might therefore develop pure psychometric scales for comparative purposes. Moreover, while the scale developed quantifies involvement, it does not explain how or why sustainable products involve consumers. These aspects are equally important to understand. Future research should thus approach sustainable product involvement qualitatively to understand it better. Finally, sustainable product involvement seems to span an intensity range. With data possibly following cumulative response patterns, involvement would be a candidate for other non-traditional scaling approaches, like those of Rasch or Guttman, see, e.g., Conejo et al. (2017; respectively.

CLOSING THOUGHTS
C-OAR-SE scaling might seem radical, even wrong, from a Classical Test Theory perspective. However, the psychometric approach also has limitations. Among others, the pursuit of high Alphas may lead to scales that are redundant, impractical, and even dubious from a content validity perspective. Hence this alternative scaling approach (Rossiter, 2002). Rossiter (2011) claims that his C-OAR-SE technique produces measures superior to those psychometrically derived. He even calls for traditional scaling methods to be discontinued. The present authors advocate a more moderate position. While C-OAR-SE may indeed produce robust measures, the psychometric approach's tremendous value is also acknowledged. By no means should the latter be abandoned.
That said, blind adherence to the psychometric dogma also constrains research. As fields develop, they grow increasingly diverse and complex (Most, Conejo, & Cunningham, 2018). Yet excessive reliance on a limited set of techniques restricts inquiry to questions amenable to these techniques. Findings thereby become less valid, reliable, and generalizable (Davis, Golicic, Boerstler, Choi, & Oh, 2013). With scaling limited to a single major approach, the richness of business disciplines becomes harder to capture, theoretical development thereby thwarted.
We thus call for more methodological diversity within scale development. As this paper evidenced, C-OAR-SE is a valuable complement to traditional psychometric techniques, which together might produce more robust measures. C-OAR-SE is a useful addition to the methodological repertoire, to be considered by researchers in the different business fields.