Boletim informativo NEPS
Universidade do Minho. Instituto de Ciências Sociais. Núcleo de Estudos de População e Sociedade
2001-11-01
Calculating and reporting effect sizes on scientific papers (1): p < 0.05 limitations in the analysis of mean differences of two groups 
Other titles
Calcular e apresentar tamanhos do efeito em trabalhos científicos (1): As limitações do p < 0,05 na análise de diferenças de médias de dois grupos
Date
Description
The Portuguese Journal of Behavioral and Social Research requires authors to follow the recommendations of the Publication Manual of the American Psychological Association (APA, 2010) when presenting statistical information. One of the APA recommendations is that effect sizes be reported alongside levels of statistical significance. Since the p-values produced by statistical tests do not indicate the magnitude or importance of a difference, effect sizes (ES) should also be reported. ES give meaning to statistical tests; emphasize the power of statistical tests; reduce the risk of interpreting mere sampling variation as a real relationship; can increase the reporting of “non-significant” results; and allow knowledge from several studies to be accumulated through meta-analysis. Thus, the objectives of this paper are to present the limits of the significance level; describe the rationale for reporting the ES of statistical tests used to analyze differences between two groups; present the formulas for calculating ES directly, with examples from our own previous studies; show how to calculate confidence intervals; provide conversion formulas for literature reviews; indicate how to interpret ES; and show that, although interpretable, the meaning (a small, medium, or large effect on an arbitrary metric) can be imprecise, so interpretation should be made in the context of the research area and of real-world variables.
Subject
Source
Language
Relation
Acion, L., Peterson, J. J., Temple, S., & Arndt, S. (2006). Probabilistic index: An intuitive non-parametric approach to measuring the size of treatment effects. Statistics in Medicine, 25(4), 591–602. doi:10.1002/sim.2256
Aguinis, H., Werner, S., Abbott, J. L., Angert, C., Park, J. H., & Kohlhausen, D. (2010). Customer-centric science: Reporting significant research results with rigor, relevance, and practical impact in mind. Organizational Research Methods, 13(3), 515–539.
Aickin, M. (2004). Bayes without priors. Journal of Clinical Epidemiology, 57(1), 4–13. doi:10.1016/S0895-4356(03)00251-8
American Psychological Association. (APA) (2010). Publication Manual of the American Psychological Association (6th ed.). Washington, DC: APA.
Andersen, M. B., McCullagh, P., & Wilson, G. J. (2007). But what do the numbers really tell us? Arbitrary metrics and effect size reporting in sport psychology research. Journal of Sport & Exercise Psychology, 29(5), 664–672. doi:10.1123/jsep.29.5.664
Baguley, T. (2009). Standardized or simple effect size: What should be reported? British Journal of Psychology, 100(3), 603–617. doi:10.1348/000712608x377117
Berben, L., Sereika, S. M., & Engberg, S. (2012). Effect size estimation: Methods and examples. International Journal of Nursing Studies, 49(8), 1039–1047. doi:10.1016/j.ijnurstu.2012.01.015
Bezeau, S., & Graves, R. (2001). Statistical power and effect sizes of clinical neuropsychology research. Journal of Clinical and Experimental Neuropsychology, 23(3), 399–406. doi:10.1076/jcen.23.3.399.1181
Blanton, H., & Jaccard, J. (2006). Arbitrary metrics in psychology. The American Psychologist, 61(1), 27–41. doi:10.1037/0003-066X.61.1.27
Blanton, H., & Jaccard, J. (2006). Arbitrary metrics redux. The American Psychologist, 61(1), 62–71. doi:10.1037/0003-066X.61.1.62
Borenstein, M. (2009). Effect sizes for continuous data. In H. Cooper, L. V. Hodges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (pp. 221–235). New York, NY: Russell Sage Foundation.
Breaugh, J. A. (2003). Effect size estimation: Factors to consider and mistakes to avoid. Journal of Management, 29(1), 79–97. doi:10.1016/s0149-2063(02)00221-0
Caperos, J. M., & Pardo, A. (2013). Consistency errors in p-values reported in Spanish psychology journals. Psicothema, 25(3), 408–414. doi:10.7334/psicothema2012.207
Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48(3), 378–399. doi:10.17763/haer.48.3.t490261645281841
Chow, S. L. (1988). Significance test or effect size? Psychological Bulletin, 103(1), 105–110. Retrieved from http://psych.colorado.edu/~willcutt/pdfs/Chow_1988.pdf
Coe, R. (2002). It's the effect size, stupid: What effect size is and why it is important. Paper presented at the Annual Conference of the British Educational Research Association, University of Exeter, England, Education-line. Retrieved from http://www.cem.org/attachments/ebe/ESguide.pdf
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65(3), 145–153. doi:10.1037/h0045186
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159. doi:10.1037/0033-2909.112.1.155
Cohen, J. (1992). Statistical power analysis. Current Directions in Psychological Science, 1(3), 98–101. doi:10.2307/20182143
Cohen, J. (1994). The earth is round (p < .05). The American Psychologist, 49(12), 997–1003. doi:10.1037/0003-066X.49.12.997
Conn, V. S., Chan, K. C., & Cooper, P. S. (2014). The problem with p. Western Journal of Nursing Research, 36(3), 291–293. doi:10.1177/0193945913492495
Cook, R. J., & Sackett, D. L. (1995). The number needed to treat: A clinically useful measure of treatment effect. BMJ, 310(6977), 452–454. doi:10.1136/bmj.310.6977.452
Cooper, H., Hedges, L. V., & Valentine, J. C. (2009). The handbook of research synthesis and meta-analysis (2nd ed.). New York, NY: Russell Sage Foundation.
Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge.
Dunlap, W. P. (1994). Generalizing the common language effect size indicator to bivariate normal correlations. Psychological Bulletin, 116(3), 509–511. doi:10.1037/0033-2909.116.3.509
Durlak, J. A. (2009). How to select, calculate, and interpret effect sizes. Journal of Pediatric Psychology, 34(9), 917–928. doi:10.1093/jpepsy/jsp004
Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge: Cambridge University Press.
Embretson, S. E. (2006). The continued search for nonarbitrary metrics in psychology. The American Psychologist, 61(1), 50–55. doi:10.1037/0003-066X.61.1.50
Ferguson, C. J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40(5), 532–538. doi:10.1037/a0015808
Fern, E. F., & Monroe, K. B. (1996). Effect-size estimates: Issues and problems in interpretation. Journal of Consumer Research, 23(2), 89–105. doi:10.2307/2489707
Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.
Fisher, R. A. (1959). Statistical methods and scientific inference (2nd ed.). Edinburgh: Oliver and Boyd.
Furukawa, T. A., & Leucht, S. (2011). How to obtain NNT from Cohen's d: Comparison of two methods. PLoS ONE, 6(4), e19070, 1–5. doi:10.1371/journal.pone.0019070
Giere, R. N. (1972). The significance test controversy. British Journal for the Philosophy of Science, 23(2), 170–181. doi:10.2307/686441
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8. doi:10.3102/0013189X005010003
Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage.
Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum Associates Publishers.
Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational and Behavioral Statistics, 6(2), 107–128. doi:10.3102/10769986006002107
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Hentschke, H., & Stüttgen, M. C. (2011). Computation of measures of effect size for neuroscience data sets. The European Journal of Neuroscience, 34(12), 1887–1894. doi:10.1111/j.1460-9568.2011.07902.x
Huberty, C. J. (1993). Historical origins of statistical testing practices: The treatment of Fisher versus Neyman-Pearson views in textbooks. The Journal of Experimental Education, 61(4), 317–333. doi:10.2307/20152384
Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59(1), 12–19. doi:10.1037/0022-006X.59.1.12
Kazdin, A. E. (2006). Arbitrary metrics: Implications for identifying evidence-based treatments. The American Psychologist, 61(1), 42–49. doi:10.1037/0003-066X.61.1.42
Killeen, P. R. (2005). An alternative to null-hypothesis significance tests. Psychological Science, 16(5), 345–353. doi:10.2307/40064228
Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56(5), 746–759. doi:10.1177/0013164496056005002
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research (2nd ed.). Washington, DC: American Psychological Association.
Kraemer, H. C., & Kupfer, D. J. (2006). Size of treatment effects and their importance to clinical research and practice. Biological Psychiatry, 59(11), 990–996. doi:10.1016/j.biopsych.2005.09.014
Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis based on the correlation between effect size and sample size. PLoS ONE, 9(9), e105825, 1–8. doi:10.1371/journal.pone.0105825
Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4(863), 1–12. doi:10.3389/fpsyg.2013.00863
Lee, M. D., & Wagenmakers, E.-J. (2005). Bayesian statistical inference in psychology: Comment on Trafimow (2003). Psychological Review, 112(3), 662–668. doi:10.1037/0033-295X.112.3.662
Lemos, L., Espirito-Santo, H., Silva, G. F., Costa, M., Cardoso, D., Vicente, F., Martins, S., Vigário, V., Rodrigues, F., Neves, C. S., Pascoal, V., Pinto, A. L., & Moitinho, S. (2014). The impact of a Neuropsychological Rehabilitation Group Program (NRGP) on cognitive and emotional functioning in institutionalized elderly. Poster presented at the 22nd European Congress of Psychiatry, Munich. Retrieved from https://www.researchgate.net/publication/264979017_EPA-1657_-_The_impact_of_a_neuropsychological_rehabilitation_group_program_NRGP_on_cognitive_and_emotional_functioning_in_institutionalized_elderly
Lenth, R. V. (2006–2014). Java applets for power and sample size. Retrieved from http://homepage.stat.uiowa.edu/~rlenth/Power/
Liesbeth, W. A., Prins, J. B., Vernooij-Dassen, M. J. F. J., Wijnen, H. H., Olde Rikkert, M. G. M., & Kessels, R. P. C. (2011). Group therapy for patients with mild cognitive impairment and their significant others: Results of a waiting-list controlled trial. Gerontology, 57(5), 444–454. doi:10.1159/000315933
Lipsey, M. W., Puzio, K., Yun, C., Hebert, M. A., Steinka-Fry, K., Cole, M. W., Roberts, M., Anthony, K. S., & Busick, M. D. (2012). Translating the statistical representation of the effects of education interventions into more readily interpretable forms. Washington, DC: National Center for Special Education Research, Institute of Education Sciences. Retrieved from https://ies.ed.gov/ncser/pubs/20133000/pdf/20133000.pdf
Loftus, G. R. (1991). On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology, 36(2), 102–105. doi:10.1037/029395
Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 161–171. doi:10.1111/1467-8721.ep11512376
McCartney, K., & Rosenthal, R. (2000). Effect size, practical importance, and social policy for children. Child Development, 71(1), 173–180. doi:10.1111/1467-8624.00131
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361–365. doi:10.1037/0033-2909.111.2.361
McMillan, J. H., & Foley, J. (2011). Reporting and discussing effect size: Still the road less traveled. Practical Assessment, Research & Evaluation, 16(14), 1–12. Retrieved from http://pareonline.net/pdf/v16n14.pdf
Morrison, D. E., & Henkel, R. E. (Eds.). (1970). The significance test controversy: A reader. Chicago: Aldine.
Nakagawa, S., & Cuthill, I. C. (2007). Effect size, confidence interval and statistical significance: A practical guide for biologists. Biological Reviews, 82(4), 591–605. doi:10.1111/j.1469-185X.2007.00027.x
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241–301. doi:10.1037//1082-989X.5.2.241
Nunnally, J. C. (1978). Psychometric theory. New York, NY: McGraw-Hill.
Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology, 25(3), 241–286. doi:10.1006/ceps.2000.1040
Orwin, R. G. (1983). A fail-safe N for effect size in meta-analysis. Journal of Educational Statistics, 8(2), 157–159. doi:10.2307/1164923
Paiva, A. C., Cunha, M., Xavier, A. M., Marques, M., Simões, S., & Espirito-Santo, H. (2013). Exploratory study of risk-taking and self-harm behaviours in adolescents: Prevalence, characteristics and its relationship to attachment styles. European Psychiatry, 28(Suppl. 1), 1. doi:10.1016/S0924-9338(13)76530-1
Pearson, K. (1900). Mathematical contributions to the theory of evolution. VII. On the correlation of characters not quantitatively measurable. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 195, 1–47. doi:10.1098/rsta.1900.0022
Reiser, B., & Faraggi, D. (1999). Confidence intervals for the overlapping coefficient: The normal equal variance case. Journal of the Royal Statistical Society: Series D (The Statistician), 48(3), 413–418. doi:10.1111/1467-9884.00199
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638–641. doi:10.1037/0033-2909.86.3.638
Rosenthal, R. (1983). Assessing the statistical and social importance of the effects of psychotherapy. Journal of Consulting and Clinical Psychology, 51, 4–13. doi:10.1037/0022-006X.51.1.4
Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 231–244). New York, NY: Russell Sage.
Rosenthal, J. A. (1996). Qualitative descriptors of strength of association and effect size. Journal of Social Service Research, 21(4), 37–59. doi:10.1300/J079v21n04_02
Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. The American Psychologist, 44(10), 1276–1284. doi:10.1037/0003-066X.44.10.1276
Rosnow, R. L., Rosenthal, R., & Rubin, D. B. (2000). Contrasts and correlations in effect-size estimation. Psychological Science, 11(6), 446–453. doi:10.1111/1467-9280.00287
Salsburg, D. (2002). The lady tasting tea. New York, NY: Macmillan.
Sanabria, F., & Killeen, P. R. (2007). Better statistics for better decisions: Rejecting null hypotheses statistical tests in favor of replication statistics. Psychology in the Schools, 44(5), 471–481. doi:10.1002/pits.20239
Schatz, P., Jay, K. A., McComb, J., & McLaughlin, J. R. (2005). Misuse of statistical tests in Archives of Clinical Neuropsychology publications. Archives of Clinical Neuropsychology, 20(8), 1053–1059. doi:10.1016/j.acn.2005.06.006
Schmidt, F. L., & Hunter, J. E. (2004). Methods of meta-analysis. Thousand Oaks, CA: SAGE Publications.
Schneider, A. L., & Darcy, R. E. (1984). Policy implications of using significance tests in evaluation research. Evaluation Review, 8(4), 573–582. doi:10.1177/0193841X8400800407
Schünemann, H. J., Oxman, A. D., Vist, G. E., Higgins, J. P. T., Deeks, J. J., Glasziou, P., & Guyatt, G. H. (2008). Interpreting results and drawing conclusions. In J. P. T. Higgins & S. Green (Eds.), Cochrane Handbook for Systematic Reviews of Interventions: Cochrane Book Series (pp. 1–29). The Cochrane Collaboration.
Sechrest, L., McKnight, P., & McKnight, K. (1996). Calibration of measures in psychotherapy outcome studies. American Psychologist, 51, 1065–1071. doi:10.1037/0003-066X.51.10.1065
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105(2), 309–316. doi:10.1037/0033-2909.105.2.309
Snyder, P., & Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. The Journal of Experimental Education, 61(4), 334–349.
Sun, S., Pan, W., & Wang, L. L. (2010). A comprehensive review of effect size reporting and interpreting practices in academic journals in education and psychology. Journal of Educational Psychology, 102(4), 989–1004. doi:10.1037/a0019507
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston: Pearson.
Rights
Link to record
http://retrievo.uab.pt/record?id=oai:ojs.revista.ismt.pt:article/14
