
Drawing Generalized Causal Inferences Based on Meta-Analysis

Georg E. Matt

INTRODUCTION

Research syntheses are increasingly used to inform decisionmakers about the effects of a particular policy or of different policy options. For instance, do substance abuse prevention programs in junior high schools reduce drug use in high school? Are random drug tests more effective than drug education programs in reducing drug use? Are social influence prevention programs more effective with boys than with girls?

In the language of Cook and Campbell (1979) and others, these questions involve causal relationships of two kinds: bivariate causal relationships and causal moderator relationships. In bivariate causal relationships, one is examining whether deliberately manipulating one entity (e.g., introducing a prevention program) will lead to variability in another entity (e.g., onset of drug use). In causal moderator relationships, one is interested in identifying variables that modify the magnitude or sign of a causal relationship (e.g., in the presence of peer counselors, prevention programs are more effective than in their absence).

Meta-analyses seek to draw conclusions about populations, classes, or universes of variables. This is different from primary studies in which, for instance, researchers examine the causal effects of a particular drug education curriculum in a particular school with students in a particular grade. Instead, meta-analyses seek to draw conclusions regarding a universe of persons (e.g., students in grades 4 to 12), a universe of interventions (e.g., substance abuse prevention programs), a universe of outcomes (e.g., drug use), a universe of settings (e.g., schools), and a universe of times (e.g., the 1980s). Thus, meta-analyses are concerned with generalized causal relationships. This chapter deals with specific threats to the validity of meta-analyses examining generalized bivariate causal and causal moderator relationships.

As Campbell originally coined the term, "validity threats" refer to situations and issues in research practice that may lead to erroneous conclusions about a causal relationship. However, unlike the validity threats identified by Campbell and Stanley (1963) and Cook and Campbell (1979), this chapter is not concerned with validity threats in primary studies. Because research synthesis relies on the evidence generated from many different studies, the issue is the total bias across studies rather than bias in a single primary study. Thus, the validity threats discussed in this chapter refer to issues in conducting a research synthesis that may lead to erroneous conclusions about a generalized causal relationship.

Drawing generalized causal inferences in meta-analysis involves three major steps. First, research synthesists need to establish that there is an association between the class of interventions and the class of outcomes. In other words, there has to be evidence that the intervention effect across studies is reliably different from zero. Second, research synthesists have to defend the argument that the relationship examined across studies is causal. Phrased differently, they have to rule out that factors other than the treatments as implemented were responsible for the observed change in the outcomes. Third, given the specific instances of interventions, outcomes, persons, settings, and times included in a review, research synthesists have to clarify the universes of interventions, outcomes, populations, settings, and times about which one can draw inferences. The following paragraphs discuss validity threats that research synthesists may encounter at each of these three steps of generalized causal inference. The research reviews by Bangert-Drowns (1988), Hansen (1992), and Tobler (1986, 1992) are used to provide examples of validity threats and to indicate ways of coping with them.

THREATS TO INFERENCES ABOUT THE EXISTENCE OF A RELATIONSHIP: IS THERE AN ASSOCIATION BETWEEN TREATMENT AND OUTCOME CLASSES?

The first group of validity threats deals with issues that may lead a research synthesist to draw erroneous conclusions about the existence of a relationship between a class of independent variables (i.e., interventions) and a class of dependent variables (i.e., outcomes). In the language of statistical hypothesis testing, these threats may lead to type 1 or type 2 errors because of deficiencies in either the primary studies or the meta-analytic review process. Because research syntheses are concerned with generalized relationships, a single threat in a single study is not likely to jeopardize meta-analytic conclusions in any meaningful way. More critical is whether the same source of bias operates across all or most of the studies being reviewed and whether different sources of bias fail to cancel each other out across studies. This may then lead to a predominant direction of bias, inflating or deflating estimates of a relationship. See table 1 for a list of threats to valid inferences about the existence of a relationship in a meta-analysis.

TABLE 1. Threats to inferences about the existence of a relationship between treatment and outcome classes.

(1) Unreliability in primary studies
(2) Restriction of range in primary studies
(3) Missing effect sizes in primary studies
(4) Unreliability of codings in meta-analyses
(5) Capitalizing on chance in meta-analyses
(6) Bias in transforming effect sizes
(7) Lack of statistical independence among effect sizes
(8) Failure to weight study-level effect sizes proportional to their precision
(9) Underjustified use of fixed- or random-effects models
(10) Lack of statistical power

Unreliability in Primary Studies

Unreliability in implementing or measuring variables contributes random error to the within-group variability of a primary study, thereby attenuating effect size estimates not only within such a study but also when studies are aggregated meta-analytically.

In the context of drug prevention programs, reliability issues include the measurement of outcome variables such as drug knowledge, attitudes toward drugs, and actual drug use, as well as the fidelity with which prevention programs were implemented. To deal with this issue, correction formulas have been suggested to adjust effect estimates and their standard errors (Hunter and Schmidt 1990; Rosenthal 1984). However, Tobler (1986) found that program implementation and the reliability of outcome measures are often poorly documented in primary studies, making comprehensive attempts to correct for attenuation unfeasible. Nevertheless, attenuation corrections are sometimes useful to make the degree of attenuation constant across studies and to better understand the magnitude of effects if interventions were consistently implemented and outcomes measured without error.
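To make the attenuation correction concrete, here is a minimal sketch in Python of the classical disattenuation formula in the spirit of Hunter and Schmidt (1990): an observed correlation is divided by the square root of the product of the two reliabilities. The function name and numerical values are illustrative assumptions, not taken from this chapter.

```python
import math

def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Disattenuate an observed correlation for unreliability in both
    variables (classical correction; cf. Hunter and Schmidt 1990).

    r_xy: observed correlation between treatment exposure and outcome
    r_xx: reliability of the independent-variable measurement
    r_yy: reliability of the outcome measurement
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# Example: observed r = 0.20, outcome reliability 0.70, and an (assumed)
# perfectly measured treatment status:
print(correct_for_attenuation(0.20, 1.0, 0.70))  # ~0.239
```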

Restriction of Range in Primary Studies

When the range of an outcome measure is restricted in a primary study, all correlation coefficients involving this measure are attenuated. Range restrictions may influence other effect size measures differently. For instance, the selection of homogeneous subgroups, blocking, and matching reduce both within-group variability and range. Everything else being equal, this decreases the denominator of the effect size estimate, thereby increasing the magnitude of effect sizes. When such design characteristics operate, Kulik and Kulik (1986) refer to the resulting effect sizes as operative rather than interpretable. Aggregating such operative effect sizes may yield a predominant bias across studies.

In research syntheses of prevention programs, restricted ranges can occur if primary studies involve extreme groups or homogeneous subgroups from a larger population. Effect estimates based on these studies may overestimate program effects in populations with larger variances. Correction formulas can be applied to adjust effect size estimates (Hunter and Schmidt 1990) if valid estimates of population variances are available.
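As an illustration of such an adjustment, the following sketch applies the standard correction for direct range restriction on a correlation (Thorndike's Case II, as presented in Hunter and Schmidt 1990). The function name and standard deviations are hypothetical, and the correction is only as trustworthy as the population variance estimate it is fed.

```python
import math

def correct_range_restriction(r: float, sd_pop: float, sd_sample: float) -> float:
    """Thorndike Case II correction of a correlation for direct range
    restriction; sd_pop is the (estimated) unrestricted population SD,
    sd_sample the restricted SD observed in the study."""
    u = sd_pop / sd_sample  # > 1 when the study's range is restricted
    return (r * u) / math.sqrt(1.0 + r**2 * (u**2 - 1.0))

# A correlation of .25 observed in a homogeneous subgroup (SD = 6)
# drawn from a population with SD = 10:
print(correct_range_restriction(0.25, sd_pop=10.0, sd_sample=6.0))  # ~0.40
```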

Missing Effect Sizes in Primary Studies

Researchers sometimes provide an incomplete report of findings because of page limitations in journals, the particular emphasis of a research paper, unexpected results, or poor measurement. This reporting practice may bias effect estimates in meta-analyses if researchers in primary studies fail to report, for instance, statistically nonsignificant findings or statistically significant findings in an unexpected direction.

Selective reporting in primary studies is a pervasive issue in many meta-analyses. To prevent possible biases, it is always desirable to code the most complete documents and to contact study authors to obtain information not available in research reports (Premack and Hunter 1988; Shadish 1992). If this strategy is not feasible, there is a need to consider imputation strategies (Little and Rubin 1987; Rubin 1987) and to explore how missing effect sizes may have influenced effect estimates in a meta-analysis.
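Short of the model-based imputation approaches of Little and Rubin (1987), a crude but transparent first step is a bounding sensitivity analysis: compare the pooled estimate from the reported effects with one in which unreported "nonsignificant" effects are imputed as zero. The study values below are hypothetical.

```python
import statistics

# Hypothetical study-level effect sizes; None marks outcomes described
# only as "not significant," with no statistics from which to compute d.
observed = [0.42, 0.31, None, 0.55, None, 0.18, 0.47]

complete_case = [d for d in observed if d is not None]
imputed_zero = [d if d is not None else 0.0 for d in observed]

print(statistics.mean(complete_case))  # 0.386: ignores missing effects
print(statistics.mean(imputed_zero))   # 0.276: worst-case-style bound
```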

Unreliability of Codings in Meta-Analyses

All the data synthesized in a meta-analysis are collected through a coding process susceptible to human error. Thus, meta-analyses contribute sources of unreliability in addition to those in primary studies. Unreliability in the coding process adds error variation to the observations, increasing estimates of standard error and attenuating correlations among effect size estimates and study characteristics. Strategies for controlling and reducing error in codings include comprehensive coder training, pilot testing, and reliability assessments (Cooper 1989).
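Such reliability assessments are commonly summarized as chance-corrected agreement between coders. The sketch below computes Cohen's kappa for two coders classifying the same set of studies; the coder data and program-type labels are hypothetical.

```python
def cohens_kappa(codes_a: list, codes_b: list) -> float:
    """Cohen's kappa: chance-corrected agreement between two coders
    assigning categorical codes to the same studies."""
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    categories = set(codes_a) | set(codes_b)
    expected = sum(
        (codes_a.count(c) / n) * (codes_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1.0 - expected)

coder1 = ["social", "affective", "social", "knowledge", "social"]
coder2 = ["social", "affective", "knowledge", "knowledge", "social"]
print(cohens_kappa(coder1, coder2))  # ~0.69
```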

Capitalizing on Chance in Meta-Analyses

There are three major ways in which meta-analyses may capitalize on chance. First, a publication bias may exist such that studies with statistically significant findings in support of a study's hypotheses are more likely to be submitted for publication. If this is the case, the studies published in the behavioral and social sciences are likely to be a biased sample of all the studies actually carried out (Greenwald 1975; Rosenthal 1979). A second way meta-analysts may capitalize on chance is in extracting effect sizes within studies. Research reports frequently present more than one estimate, especially when there are multiple outcome measures, multiple treatment and control groups, and multiple delayed time points for assessment. Not all of these effect estimates may be relevant for a particular topic, and some relevant estimates may be more important than others. Meta-analysts must then decide which effect estimates should be included in the meta-analysis. Bias may occur when selected effect estimates are just as substantively relevant as those not selected but differ in average effect size (Matt 1989). A third way that meta-analysts may capitalize on chance is by conducting a large number of statistical tests without adequately controlling for type 1 error.
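The publication-bias concern is often quantified with Rosenthal's (1979) fail-safe N: the number of null-result studies that would have to sit in file drawers for the combined one-tailed significance level to drop to alpha. A minimal sketch with hypothetical Z values:

```python
def fail_safe_n(z_values: list, z_alpha: float = 1.645) -> float:
    """Rosenthal's (1979) file-drawer N. Under Stouffer combination,
    the pooled Z is sum(Z) / sqrt(k + X) once X mean-zero studies are
    added; solving pooled Z = z_alpha for X gives the formula below."""
    k = len(z_values)
    return (sum(z_values) / z_alpha) ** 2 - k

# Standard normal deviates from k = 6 hypothetical published studies:
print(fail_safe_n([2.1, 1.4, 2.8, 0.9, 1.7, 2.3]))  # ~40 hidden nulls
```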

Bias in Transforming Effect Sizes

Meta-analyses require that findings from primary studies be transformed into a common metric such as a correlation coefficient, a standardized mean difference, or a standard normal deviate. Because studies differ in the type of quantitative information they provide about intervention effects, transformation rules were developed to derive common effect size estimates from many different metrics. Bias results if some types of transformation lead to systematically different estimates of average effect size or standard error when compared to others. For instance, this is likely to be the case when primary studies fail to report exact probability levels and truncated levels (e.g., p < 0.05) have to be used to estimate an effect size.
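The transformations to a common metric typically follow textbook conversion rules. The sketch below shows two of them, a t-to-d and an r-to-d conversion, and illustrates the conservative reconstruction that a truncated "p < 0.05" report forces; the sample sizes and values are hypothetical.

```python
import math

def d_from_t(t: float, n1: int, n2: int) -> float:
    """Standardized mean difference from an independent-groups t statistic."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)

def d_from_r(r: float) -> float:
    """Standardized mean difference from a correlation (equal-n approximation)."""
    return 2.0 * r / math.sqrt(1.0 - r**2)

# A study reporting only "p < .05" (two-tailed) with n1 = n2 = 30 must be
# treated as if p = .05 exactly, i.e., t is about 2.0, understating d:
print(d_from_t(2.0, 30, 30))  # ~0.52
print(d_from_r(0.25))         # ~0.52
```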

Lack of Statistical Independence Among Effect Sizes

Hedges (1990) states that there are at least four reasons why effect size estimates entering into a meta-analysis may lack statistical independence: (a) different effect size estimates may be calculated on the same respondents using different measures; (b) effect sizes may be calculated by comparing different interventions to a single control group, or different control groups to a single intervention group; (c) different samples may be used in the same study to calculate an effect estimate for each sample; and (d) a series of studies may be conducted by the same research team, resulting in nonindependent results. A predominant bias may occur if stochastic dependencies among effect sizes influence average effect estimates and their precision (Hedges and Olkin 1985). The simplest approaches for dealing with dependencies involve analyzing only one of the possible correlated effects or an average effect for each study. However, these approaches fail to take into account information concerning the differences between nonindependent effect sizes, and multivariate analyses or hierarchical linear models may be called for (Bryk and Raudenbush 1992; Raudenbush et al. 1988; Rosenthal and Rubin 1986).
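The simplest remedy named above, a single averaged effect per study, takes only a few lines; note how the averaging deliberately discards the within-study differences that multivariate models would preserve. The study names and values are hypothetical.

```python
from statistics import mean

# Several effect sizes per study (e.g., multiple outcome measures taken
# on the same respondents), hence statistically dependent estimates:
effects_by_study = {
    "study_1": [0.40, 0.55, 0.35],
    "study_2": [0.10],
    "study_3": [0.62, 0.48],
}

# One effect size per study, so the unit of analysis is the independent
# study rather than the correlated estimate:
study_level = {s: mean(es) for s, es in effects_by_study.items()}
print(study_level)
```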

Failure To Weight Study-Level Effect Sizes Proportional to Their Precision

Even if one obtains unbiased effect estimates within a study, simply averaging them may yield biased average effect estimates and incorrect sampling errors if the effect sizes from different studies vary in precision (i.e., have different standard errors) (Shadish 1992). Similarly, t tests, analyses of variance (ANOVAs), and regression analyses may provide incorrect results unless weighted estimation procedures are used (e.g., weighted least squares).
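A minimal sketch of inverse-variance weighting under a fixed-effect model: each study's effect size is weighted by the reciprocal of its sampling variance, and the standard error of the pooled estimate follows from the summed weights. The effect sizes and variances are hypothetical.

```python
import math

def weighted_mean_effect(effects, variances):
    """Fixed-effect inverse-variance pooling: weight each study by its
    precision (1 / sampling variance)."""
    weights = [1.0 / v for v in variances]
    mean_es = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return mean_es, se

# Three d values with their sampling variances; the precise first study
# dominates the pooled estimate:
print(weighted_mean_effect([0.30, 0.55, 0.12], [0.02, 0.10, 0.04]))
```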


Underjustified Use of Fixed- or Random-Effects Models

For the statistical analysis of effect sizes, Hedges and Olkin (1985) distinguish between postulating a model with fixed or random effects. In its simplest form, the fixed-effects model assumes that all studies (e.g., social influence programs) have a common but unknown effect size and that estimates of this population value differ only as a result of sampling variability. In the fixed-effects model, analysts are interested in estimating the unknown population effect size and its standard error. In the random-effects model, each treatment is assumed to have its own unique underlying effect and to be sampled from a universe of related but distinct treatments. Under the random-effects model, the effects of a sample of treatments are best represented as a distribution of true effects rather than as a point estimate.

There is no simple indicator of which model is correct. However, two factors should be considered in deciding whether to assume a fixed- or a random-effects model. The first concerns assumptions about the processes generating an effect. For instance, in the context of drug prevention programs, are all the prevention programs labeled "social influence" identical, and are they standardized and administered consistently in all studies? Are the processes by which social influence programs affect drug use the same across all studies? If the answer to these questions is "no" or "probably no," a random-effects model is indicated. The second factor to consider is the heterogeneity of the observed effect sizes. A homogeneity test can be conducted to determine whether the observed variance exceeds what is expected from sampling error alone. If the homogeneity hypothesis is rejected, the analyst may want to consider the possibility of a random-effects model. Alternatively, if one has reason to insist on a fixed-effects model, the search would begin for the variables responsible for the increased variability.
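The following sketch puts the two considerations together: the homogeneity statistic Q, referred to a chi-square distribution with k - 1 degrees of freedom, and a method-of-moments (DerSimonian-Laird-type) estimate of the between-study variance tau^2, which then feeds random-effects weights. All inputs are hypothetical.

```python
d = [0.25, 0.60, 0.10, 0.45, 0.80]   # study effect sizes
v = [0.04, 0.06, 0.03, 0.05, 0.08]   # their sampling variances

w = [1.0 / vi for vi in v]
fixed = sum(wi * di for wi, di in zip(w, d)) / sum(w)

# Homogeneity statistic: chi-square with k - 1 df under the hypothesis
# of one common true effect.
Q = sum(wi * (di - fixed) ** 2 for wi, di in zip(w, d))
k = len(d)

# Method-of-moments between-study variance, truncated at zero.
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - (k - 1)) / c)

# Random-effects weights add tau^2 to each study's sampling variance.
w_re = [1.0 / (vi + tau2) for vi in v]
random_mean = sum(wi * di for wi, di in zip(w_re, d)) / sum(w_re)

print(f"Q = {Q:.2f} (df = {k - 1}), tau^2 = {tau2:.3f}")
print(f"fixed = {fixed:.3f}, random = {random_mean:.3f}")
```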

Lack of Statistical Power

When compared to statistical analyses in primary studies, statistical power will typically be much higher in meta-analyses, particularly when meta-analysts are only interested in estimating the average effect of a broad class of interventions. However, as the meta-analyses on drug prevention programs show (Bangert-Drowns 1988; Tobler 1986, 1992), research synthesists are frequently interested in examining effect sizes for subclasses of treatments and outcomes, different types of settings, and different subpopulations. These subanalyses often rely on a much smaller number of studies than the overall analyses and result in a large number of statistical tests. The meta-analyst then has to decide which tradeoff to make between type 1 and type 2 error or, in other words, between the number of statistical tests and the statistical power of these tests.
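One way to make the tradeoff concrete is a Bonferroni-style calculation: holding the familywise type 1 error rate fixed, the per-test alpha, and with it the power of each test, shrinks as the number of subgroup tests grows. The numbers below are purely illustrative.

```python
# Familywise alpha of .05 split across m subgroup tests:
for m in (1, 5, 10, 20):
    print(f"{m:>2} tests -> Bonferroni per-test alpha = {0.05 / m:.4f}")
```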

THREATS TO INFERENCES ABOUT CAUSATION: ARE THERE ANY NONCAUSAL REASONS FOR THE ASSOCIATION?

Whenever a reliable association between independent and dependent variables is presumed to be causal, some additional threats need to be considered. Note again that inferences about the possible causal nature of a treatment-outcome relationship are not necessarily jeopardized by deficiencies in primary studies. A plausible threat arises only if the deficiencies within each study combine across studies to create a predominant direction of bias. In the following, two aspects are considered: bivariate causal relationships and causal moderator relationships. Table 2 gives a brief summary of the threats. See Matt and Cook (1993) for a discussion of threats to causal mediating relationships.

TABLE 2. Threats to inferences about causation.

(1) Failure to assign at random
(2) Deficiencies in the implementation of treatment contrasts
(3) Confounding levels of the moderator with substantively irrelevant study characteristics

Failure To Assign at Random

If experimental units (e.g., students, classrooms, schools) are not assigned to treatment conditions at random, a variety of third-variable explanations can jeopardize causal inference in primary studies. The failure to assign at random jeopardizes meta-analytic conclusions if it results in a predominant bias across primary studies.

For research studies of school-based substance abuse prevention programs, Hansen (1992) argues that selection biases are potential threats in quasi-experimental designs comparing groups that inherently differ in expected drug use. In some studies, higher levels of initial risk for substance abuse may be a precondition for entry into a prevention program. Moreover, Hansen's (1992) research suggests that selection biases may be more likely in some program groups (e.g., alternatives) than in others (e.g., affective education). However, despite the potential for selection biases, Tobler's (1986) meta-analysis found little evidence of a predominant bias when comparing randomized trials and quasi-experimental studies.

Deficiencies in the Implementation of Treatment Contrasts

Outside of controlled laboratories, random assignment is often difficult to implement; and even if successfully implemented, it does not ensure that comparability between groups is maintained beyond the initial assignment. Even the most carefully designed randomized experiments and quasi-experiments are not immune to implementation problems such as differential attrition and diffusion of treatments. If the reviewed studies share deficiencies of implementation, a predominant bias may result when studies are combined. However, in trying to examine the implementation of prevention programs more closely, Tobler (1986) found that primary reports often failed to provide the relevant information.

Hansen (1992) points out another type of implementation issue: studies of school-based prevention programs often involve small numbers of experimental units (i.e., schools), thus jeopardizing the equivalence of control and treatment groups even if experimental units are randomly assigned. While this may threaten the internal validity of a primary study, one would not expect such nonequivalence necessarily to yield a predominant bias when studies are combined in a meta-analysis.

Confounding Levels of a Moderator Variable With Substantively Irrelevant Study Characteristics

Moderator variables condition causal relationships by specifying how an outcome is related to different variants of an intervention, to different classes of outcomes, and to different types of settings and populations. All moderator variables imply a statistical interaction and identify those factors that lead to differently sized cause-effect relationships. Moderators can change the magnitude or the sign of a causal effect, as when Tobler (1986) concluded that peer programs are more effective in reducing drug use than other adolescent drug prevention programs. Threats to valid inference about the causal moderating role of a variable may arise if substantively irrelevant factors are differentially associated with each level or category of the moderator variable under analysis. If the moderator variable (e.g., information/knowledge versus social influence programs) is confounded with characteristics of the design, setting, or population (e.g., urban versus rural schools), differences in the size or direction of a treatment effect brought about by the moderator cannot be distinguished from differential effects brought about by the potentially confounding variable.

Meta-analysts attempt to deal with confounding issues through statistical modeling (e.g., Tobler 1986, 1992) and through the use of within-study comparisons (e.g., Shapiro and Shapiro 1982). Within-study comparisons are particularly useful because they do not require making assumptions regarding the nature of the confounding. For instance, if the moderating role of prevention program types A and B is at stake, a meta-analysis could be conducted of all the studies with internal comparisons of prevention programs A and B.

THREATS TO GENERALIZED INFERENCES

Research syntheses promise to generate findings that are more generalizable than those of single studies. Following Cronbach (1982) and Campbell and Stanley (1963), generalizations may involve universes of persons, treatments, outcomes, settings, and times. With respect to research syntheses, Cook (1990) distinguishes three separate though interrelated types of generalized inferences. The first concerns generalized inferences about classes of persons, treatments, outcomes, settings, and times from which the reviewed studies were sampled. These are the generalizations that meta-analysts like to make; for instance, the effects of goal-setting programs (the treatment class) on drug use (the outcome class) among 8- to 12-year-olds (the target population) in public schools (the target setting class) during the 1980s (the target time).

The second type of generalized inference concerns generalizations across universes. Here, the issue is probing the robustness of a relationship across different populations of persons, different classes of interventions, different categories of settings, different outcome classes, and different time periods. When a relationship is not robust, the analyst seeks to specify the contingencies on which its appearance depends. At issue here are moderator variables, and of particular importance are moderator variables that specify the conditions under which a program has no effect or negative effects.

The third type of generalized inference concerns the generalizability of findings beyond the universes of persons, treatments, outcomes, and settings for which data are available. For example, can the effects of comprehensive prevention programs on the onset of drug use observed in school settings be generalized to church, YMCA, and prison settings? Are the effects of social influence programs observed during the 1970s and 1980s generalizable to programs to be implemented during the 1990s? In each of these examples, the issue is how one can justify inferences to novel universes of persons, treatments, outcomes, settings, and times on the basis of findings in other universes.

Generalizing on the basis of samples is most warranted when formal statistical sampling procedures have been used to draw the particular instances studied; that is, when a sampling frame has been designed and instances have been selected with known probability. However, in meta-analyses the instances of persons, treatments, outcomes, settings, and times rarely if ever constitute probability samples from whatever universes were specified in the guiding research question. Nevertheless, Cook (1990) argues that generalized inferences about persons, treatments, outcomes, and settings can be tentatively justified even in the absence of random sampling. Cook discusses several principles for justifying generalized inferences in meta-analyses; two of these are elaborated below. The first requires making a case for the proximal similarity of the sample and population (Campbell 1986). This requires identifying the prototypical, identity-inferring elements (Rosch 1978) of the target classes of persons, settings, causes, and effects and then examining whether they are adequately represented in the sample of studies entering a meta-analysis. In addition to the prototypical elements making a study relevant to a target universe, each individual study's setting, population, measures, and treatments are likely to have unique components that are not part of the target classes. It is crucial that these irrelevancies be made heterogeneous in the sample of studies entering a meta-analysis to avoid confounding prototypical and irrelevant characteristics (Campbell and Fiske 1959).

The second principle for generalizing when random selection cannot be assumed is empirical interpolation and extrapolation. Simply put, the more regularly intervention effects occur across different levels of an independent variable (e.g., length of intervention, type of counselor, type of school), the more tenable is the assumption that a causal effect can be extrapolated to not-yet-studied but related levels (e.g., shorter or longer interventions, different types of schools and counselors). The more dissimilar the yet-unstudied levels are from the levels for which intervention effects have been examined, the more difficult interpolations and extrapolations are to justify. The wider and more diverse the conditions under which the intervention effects follow a predictable pattern, the more justified are generalizations to yet-unstudied levels. Table 3 lists threats related to the different types of generalized inference desired in meta-analyses.


TABLE 3. Threats to generalized inferences.

(1) Unknown sampling probabilities associated with the set of persons, settings, treatments, outcomes, and times entering a meta-analysis
(2) Underrepresentation of prototypical attributes
(3) Failure to test for heterogeneity in effect sizes
(4) Lack of statistical power for studying disaggregated groups
(5) Restricted heterogeneity of substantively irrelevant aspects
(6) Confounding of subclasses with substantively irrelevant study characteristics
(7) Restricted heterogeneity of classes of populations, treatments, outcomes, settings, and times

Unknown Sampling Probabilities Associated With the Set of Persons, Settings, Treatments, Outcomes, and Times Entering a Meta-Analysis

One can rarely assume that the instances of persons, treatments, outcomes, settings, and times represented in a meta-analysis were randomly selected from the populations of persons, settings, treatments, and outcomes to which generalization is desired. Even if there are random samples at the individual study level, it is rare that the studies entering into a meta-analysis constitute a formally representative sample of all such possible study-specific populations. The samples entering primary studies are chosen for proximal similarity and convenience rather than for reasons of formal sampling theory, and the studies containing these samples have an unknown relationship to all the studies that have been completed and that might be done on a particular topic. To tentatively justify generalized inferences in the absence of random sampling, the meta-analyst may follow the principles suggested by Cook (1990).

Underrepresentation of Prototypical Attributes

To demonstrate proximal similarity between a sample and its referent universe requires matching theoretically derived prototypical elements of the universe with the elements of the studies at hand. For substance abuse prevention programs, the question is whether the samples of students, prevention programs, settings, outcomes, and times examined in the reviewed studies represent the core attributes of the populations to which one is interested in generalizing. For instance, Hansen (1992) identified a group of school-based programs and labeled them "social influence programs." Hansen explicates that their "... primary purpose is to teach students about peer pressure and other social pressures and develop skills to resist these pressures" (p. 415). Thus, an intervention that teaches students about peer pressures but fails to include the development of skills to resist those pressures would not constitute a social influence program, and a meta-analysis of such interventions would not allow generalized inferences to the target population of social influence programs. In a similar vein, program success could be explicated in terms of long-term abstinence from using illegal substances. A meta-analysis in which the majority of studies examine short-term effects, alcohol and tobacco use, the onset of drug use, and attitudes toward drugs would make questionable generalized inferences to the target population of outcomes (i.e., long-term abstinence from illegal substances).

Failure To Test for Heterogeneity in Effect Sizes

A statistical test for homogeneity has been developed (Hedges 1982; Rosenthal and Rubin 1982) that assesses whether the variability in effect estimates exceeds that expected from sampling error alone. Homogeneity tests play an important role in examining the robustness of a relationship and in initiating the search for factors that might moderate the relationship. If the homogeneity hypothesis is rejected, the implication is that subclasses of studies exist that differ in effect size. The failure to test for heterogeneity may result in lumping manifestly different subclasses of persons, treatments, outcomes, settings, or times into one category (i.e., the apples-and-oranges problem). The heterogeneity test indicates when studies yield such different results that average effect sizes need to be disaggregated by blocking on study characteristics that might explain the mean differences in effect size. Homogeneity tests also protect against searching for moderator variables when effects are robust.
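Disaggregation by blocking on a study characteristic is commonly carried out with the decomposition Q_total = Q_within + Q_between: a large between-groups component suggests that the blocking variable accounts for mean differences in effect size. A minimal sketch with hypothetical program-type groups:

```python
def q_statistic(d, v):
    """Homogeneity Q for one group of effect sizes with variances v."""
    w = [1.0 / vi for vi in v]
    mean = sum(wi * di for wi, di in zip(w, d)) / sum(w)
    return sum(wi * (di - mean) ** 2 for wi, di in zip(w, d))

# Hypothetical effect sizes blocked by program type:
groups = {
    "peer": ([0.55, 0.62, 0.48], [0.05, 0.06, 0.04]),
    "knowledge": ([0.12, 0.20, 0.05], [0.05, 0.04, 0.06]),
}

q_total = q_statistic(
    [d for ds, _ in groups.values() for d in ds],
    [v for _, vs in groups.values() for v in vs],
)
q_within = sum(q_statistic(ds, vs) for ds, vs in groups.values())
q_between = q_total - q_within  # variability explained by the moderator
print(q_total, q_within, q_between)
```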

Lack of Statistical Power for Studying Disaggregated Groups

If there is evidence that effect sizes are moderated by substantive variables of interest, then aggregated classes of treatments, outcomes, persons, or settings can be disaggregated to examine the conditions under which an effect changes in sign or magnitude. Such subgroup analyses rely on a smaller number of studies than main-effect analyses and may involve additional statistical tests, thus lowering the statistical power for the subanalyses in question. Large samples mitigate this problem, as do statistical tests adjusted to take into account the number of tests made. Even more useful are analyses based on aggregating within-study estimates of the consequences of particular moderator variables.

Restricted Heterogeneity of Substantively Irrelevant Characteristics


Even if prototypical attributes of a universe are represented in the reviewed studies, a threat arises if a meta-analysis cannot demonstrate that the generalized inference holds across substantively irrelevant characteristics. For instance, if the reviewed studies on social influence programs were conducted by just one research team, relied on voluntary participation by students, depended on teachers and principals being highly motivated, or were all conducted in metropolitan areas of California, the threat would arise that all conclusions about the general effectiveness of social influence programs are confounded with substantively irrelevant aspects of the research context. To give an even more concrete example, if school-based programs were explicated to involve programs administered and implemented in school during grades 4 to 12, it is irrelevant whether the schools are in urban or rural settings, parochial or nonparochial schools, military schools, or elite academic schools. To generalize to school-based programs in the abstract requires being able to show that relationships are not limited to one or a few of these contexts, say, urban or Catholic schools.

The wider the range and the larger the number of substantively irrelevant aspects across which a finding is robust, and the better moderating influences are understood, the stronger the belief that the finding will also hold under the influence of not-yet-examined contextual irrelevancies. Limited heterogeneity in substantively irrelevant variables will also impede the transfer of findings to new universes because it hinders the ability to demonstrate the robustness of a causal relationship across substantive irrelevancies of design, implementation, or measurement method. Tobler (1986) addresses this issue in examining whether program effects are robust regardless of substantively irrelevant characteristics of research design.

Confounding of Subclasses With Substantively Irrelevant Study Characteristics

Even if substantively irrelevant aspects are heterogeneous across studies, the possibility arises that subclasses of treatments, outcomes, settings, persons, or times are confounded with substantively irrelevant characteristics of studies. This situation arose in a meta-analysis of psychotherapy outcomes: differences in treatment effects were observed across different types of psychotherapy, but psychotherapy types were confounded with such substantively irrelevant research design features as the way psychotherapy outcomes were assessed (Wittmann and Matt 1986). This confounding impedes the ability to identify treatment type as a characteristic that moderates intervention effects.


Restricted Heterogeneity in Classes of Populations, Treatments, Outcomes, Settings, and Times

Generalizations across universes and generalizations to novel universes are facilitated if intervention effects can be studied for a large number and a wide range of persons, treatments, outcomes, settings, and times. This is the single most important potential strength of research syntheses over individual studies. For instance, a generalization to a novel universe of time is required if the question is whether school-based drug prevention programs developed and studied during the 1970s and 1980s can be expected to have similar effects in the 1990s. The confidence in such a generalization would be increased if one could demonstrate that the intervention effects were robust throughout the 1970s and 1980s, across different school settings, across different drugs, across different outcome measures, for students from different backgrounds, and so forth. The more robust the findings, and the more heterogeneous the populations, settings, treatments, outcomes, and times in which they were observed, the greater the belief that similar findings will be observed beyond the populations studied.

SUMMARY AND CONCLUSIONS

Meta-analyses of drug prevention programs address questions regarding the causal relationship between prevention efforts and substance abuse. Unlike primary studies of substance abuse prevention programs, meta-analyses involve generalized causal inferences. At issue are causal effects involving classes or universes of students, prevention programs, outcomes, settings, and times. This chapter presented threats to drawing such generalized inferences regarding bivariate causal and causal moderator relationships. The first group of threats concerns issues that could lead to erroneous conclusions regarding the existence of a relationship between a class of interventions and a class of outcomes. The second group concerns issues that may lead to erroneous conclusions regarding the causal nature of the relationship. Note that in all these instances, deficiencies in primary studies do not necessarily jeopardize the generalized inferences of a meta-analysis; in theory, such deficiencies may cancel each other out. A plausible threat arises only if deficiencies combine across studies to create a predominant bias. The third group of threats concerns issues that may lead to erroneous conclusions about the universes of persons, treatments, settings, outcomes, and times.


All validity threats are empirical products; they are the result of theories of method and the practice of research. Consequently, no list of validity threats is definitive. Threats are expected to change as theories of method are improved and more is learned about the practice of research synthesis. All threats are potential; the existence of a threat by itself does not make it a plausible alternative explanation to a causal claim. Research synthesists have to use the empirical evidence, logic, common sense, and any background information available to determine whether a potential threat indeed provides a plausible alternative explanation.

REFERENCES

Bangert-Drowns, R.L. The effects of school-based substance abuse education: A meta-analysis. J Drug Educ 18:243-264, 1988.

Bryk, A.S., and Raudenbush, S.W. Hierarchical Linear Models: Applications and Data Analysis Methods. Newbury Park, CA: Sage, 1992.

Campbell, D.T. Relabeling internal and external validity for applied social scientists. In: Trochim, W.M.K., ed. Advances in Quasi-Experimental Design and Analysis. San Francisco: Jossey-Bass, 1986.

Campbell, D.T., and Fiske, D.W. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol Bull 56:81-105, 1959.

Campbell, D.T., and Stanley, J.C. Experimental and quasi-experimental designs for research on teaching. In: Gage, N.L., ed. Handbook of Research on Teaching. Chicago: Rand McNally, 1963.

Cook, T.D. The generalization of causal connections: Multiple theories in search of clear practice. In: Sechrest, L.; Perrin, E.; and Bunker, J., eds. Research Methodology: Strengthening Causal Interpretations of Nonexperimental Data. DHHS Publication No. (PHS) 90-3454. Washington, DC: U.S. Department of Health and Human Services, 1990.

Cook, T.D., and Campbell, D.T. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Boston: Houghton Mifflin, 1979.

Cooper, H.M. Integrating Research: A Guide for Literature Reviews. Newbury Park, CA: Sage, 1989.

Cronbach, L.J. Designing Evaluations of Educational and Social Programs. San Francisco: Jossey-Bass, 1982.

Greenwald, A.G. Consequences of prejudice against the null hypothesis. Psychol Bull 82:1-20, 1975.

Hansen, W.B. School-based substance abuse prevention: A review of the state of the art in curriculum, 1980-1990. Health Educ Res Theory Pract 7:403-430, 1992.

Hedges, L.V. Estimation of effect sizes from a series of independent experiments. Psychol Bull 92:490-499, 1982.

Hedges, L.V. Directions for future methodology. In: Wachter, K.W., and Straf, M.L., eds. The Future of Meta-Analysis. New York: Russell Sage Foundation, 1990.

Hedges, L.V., and Olkin, I. Statistical Methods for Meta-Analysis. Orlando, FL: Academic Press, 1985.

Hunter, J.E., and Schmidt, F.L. Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Newbury Park, CA: Sage, 1990.

Kulik, J.A., and Kulik, C.-L.C. "Operative and Interpretable Effect Sizes in Meta-Analysis." Paper presented at the annual meeting of the American Educational Research Association, San Francisco, April 16-20, 1986.

Little, R.J.A., and Rubin, D.B. Statistical Analysis with Missing Data. New York: Wiley, 1987.

Matt, G.E. Decision rules for selecting effect sizes in meta-analysis: A review and reanalysis of psychotherapy outcome studies. Psychol Bull 105:106-115, 1989.

Matt, G.E., and Cook, T.D. Threats to the validity of research syntheses. In: Cooper, H., and Hedges, L.V., eds. The Handbook of Research Synthesis. New York: Russell Sage Foundation, 1993.

Premack, S.L., and Hunter, J.E. Individual unionization decisions. Psychol Bull 103:223-234, 1988.

Raudenbush, S.W.; Becker, B.J.; and Kalaian, H. Modeling multivariate effect sizes. Psychol Bull 103:111-120, 1988.

Rosch, E. Principles of categorization. In: Rosch, E., and Lloyd, B.B., eds. Cognition and Categorization. Hillsdale, NJ: Erlbaum, 1978.

Rosenthal, R. The "file drawer problem" and tolerance for null results. Psychol Bull 86:638-641, 1979.

Rosenthal, R. Meta-Analytic Procedures for Social Research. Beverly Hills, CA: Sage, 1984.

Rosenthal, R., and Rubin, D.B. Comparing effect sizes of independent studies. Psychol Bull 92:500-504, 1982.

Rosenthal, R., and Rubin, D.B. Meta-analytic procedures for combining studies with multiple effect sizes. Psychol Bull 99:400-406, 1986.

Rubin, D.B. Multiple Imputation for Nonresponse in Surveys. New York: Wiley, 1987.

Shadish, W.R., Jr. Do family and marital therapies change what people do? A meta-analysis of behavioral outcomes. In: Cook, T.D.; Cooper, H.M.; Cordray, D.S.; Hartmann, H.; Hedges, L.V.; Light, R.J.; Louis, T.A.; and Mosteller, F., eds. Meta-Analysis for Explanation: A Casebook. New York: Russell Sage Foundation, 1992.

Shapiro, D.A., and Shapiro, D. Meta-analysis of comparative therapy outcome research: A replication and refinement. Psychol Bull 92:581-604, 1982.

Tobler, N. Meta-analysis of 143 adolescent drug prevention programs: Quantitative outcome results of program participants compared to a control or comparison group. J Drug Issues 16:537-567, 1986.

Tobler, N.S. Drug prevention programs can work: Research findings. J Addict Dis 11:1-27, 1992.

Wittmann, W.W., and Matt, G.E. Meta-Analyse als Integration von Forschungsergebnissen am Beispiel deutschsprachiger Arbeiten zur Effektivität von Psychotherapie [Integration of German-language psychotherapy outcome studies through meta-analysis]. Psychologische Rundschau 37:20-40, 1986.

AUTHOR

Georg E. Matt, Ph.D.

Department of Psychology
San Diego State University
San Diego, CA 92182-4611


