
Primary Research Paper On Reaction Time

Over the past two decades, research conducted online via the Internet has become increasingly frequent. Today, Web-based research is common across the whole range of social and behavioral sciences. This trend is not surprising, given the well-documented advantages of Web-based research, especially the possibility to recruit large, heterogeneous (and more representative) samples in less time and with lower costs than in traditional lab- or paper/pencil-based research (for overviews, see Birnbaum, 2004; Gosling, Vazire, Srivastava, & John, 2004; Kraut et al., 2004; Reips & Birnbaum, 2011; Skitka & Sargis, 2006).

Beyond these well-documented advantages, a growing body of literature has confirmed that data obtained and results found in Web-based studies are generally comparable to those generated by traditional lab- or paper/pencil-based research—for example, in research on personality (Chuah, Drasgow, & Roberts, 2006; Cronk & West, 2002; Lang, John, Lüdtke, Schupp, & Wagner, 2011), ability (Ihme et al., 2009), or perception and cognition (Corley & Scheepers, 2002; Germine et al., 2012; Linnman, Carlbring, Ahman, Andersson, & Andersson, 2006; Reimers & Stewart, 2007). Nonetheless, “researchers doing Web-based experiments can encounter skepticism from reviewers and editors” (Germine et al., 2012, p. 848), especially “skepticism about the accuracy of response time measures recorded in a Web browser online” (Reimers & Stewart, 2015, p. 310).

Specifically, although there is no conclusive evidence to show that Web-based measurement of response latencies is inherently problematic, Birnbaum (2004) noted that “brief time intervals . . . are thought to be less precisely . . . measured via the Web” (p. 824, emphasis added). That is, as comments from editors and/or reviewers regularly reveal, “the measurement of response time in a Web experiment is perceived to be problematic” (Brand & Bradley, 2012, p. 350, emphasis added). Essentially, there is a persistent preconception that “the Internet may not be optimal for research that is dependent on detecting . . . small differences in response time” (Skitka & Sargis, 2006, p. 547). Thus, although several well-established reaction time effects have been replicated in Web-based research (e.g., Crump, McDonnell, & Gureckis, 2013; Keller, Gunasekharan, Mayo, & Corley, 2009; Simcox & Fiez, 2014), skepticism remains widespread—for at least three reasons.

First, one of the core arguments fueling skepticism about Web-based response time measurement lies in the inherent and indubitable increase in technical and situational variance as compared to the lab (Reips, 2002). Unlike in the lab, Web-based data necessarily stem from many different computers, displays, input devices, operating systems, and Web browsers. In light of existing evidence that different input devices (mice/keyboards) or variations in the number of parallel processes (e.g., other applications running) will indeed affect reaction time measurement (Plant, Hammond, & Whitehouse, 2003; Plant & Turner, 2009), technical variation may increase unexplained error variance. In addition, Web-based research comes with less control over aspects of the situation (e.g., the lighting, participant’s viewing position, time of day, or distractions), which may further increase error variance. On the other hand, simulations have shown that the effects of increased error variance are unlikely to offset the advantages in terms of statistical power and more precise effect estimates due to larger sample sizes (Brand & Bradley, 2012).

A second type of concern is with the software and technologies used. That is, some technologies that are suitable for reaction time measurement—such as JAVA applets (Hecht, Oesker, Kaiser, Civelek, & Stecker, 1999) or Adobe Flash (Linnman et al., 2006; Reimers & Stewart, 2007, 2015)—require special software or plugins that may not be available to all potential participants, and more problematically yet, their availability may vary systematically with the characteristics of the users, thus creating potential confounds (Reips & Krantz, 2010). Also, some technologies have been shown to provide inaccurate timing if no countermeasures are taken (Eichstaedt, 2001). The most widely applicable technology (in terms of availability on client machines) offering millisecond resolution is JavaScript (de Leeuw, 2015; Reips & Krantz, 2010). So far, investigations using nonhuman response systems have demonstrated that JavaScript provides adequately accurate timing under most conditions (Reimers & Stewart, 2015), and a recent experiment with human response data has confirmed this conclusion (de Leeuw & Motz, 2015).

Third, and most importantly, it must be acknowledged that most empirical comparisons of Web- versus lab-based reaction time effects (and indeed other effects) suffer an unfortunate methodological drawback: Typically, the results obtained in different, independent samples are compared. For example, Corley and Scheepers (2002) compared their priming results obtained in a Web-based sample to the lab-based data from a previous, independent study. On the basis of high consistency across the two studies, they concluded that Web-based research is “valid.” Similarly, McGraw, Tew, and Williams (2000) compared the results of several Web-based paradigms to well-established effects previously found in the lab and concluded that Web-based data can be trusted, given that they reliably mirror said established effects. More recent investigations have similarly based their conclusions on cross-sample comparisons (e.g., Germine et al., 2012; Linnman et al., 2006). In what has arguably been the largest set of studies to date, Crump et al. (2013) replicated an impressive series of established effects—including Stroop, flanker, Posner cueing, attentional blink, and subliminal priming—on the Web using Amazon Mechanical Turk (see also Simcox & Fiez, 2014). Although insightful and indeed encouraging, the trouble with all of these comparisons is that, strictly speaking, there is no control over possible confounds. Since participants were not randomly assigned to lab- versus Web-based data collection, the comparisons remain inconclusive and cannot be tested statistically. Stated simply, “[f]ailure to find a difference tells us nothing unless we are sure that the samples compared really do not differ on the constructs of interest . . . ,” implying that one must “[r]andomly assign participants to Web versus Lab condition when performing such comparisons” (Reips, Buchanan, Krantz, & McGrawn, in press, MS p. 8).

The most notable recent exception has been the experiment by de Leeuw and Motz (2015), who manipulated within subjects whether a visual search task was performed using JavaScript versus MATLAB’s Psychophysics Toolbox (for a similar experiment comparing Adobe Flash—in the lab and on the Web—to a program written in C, see Reimers & Stewart, 2007). Thus, by assessing real human responses and systematically manipulating the underlying technology, their comparison allows for conclusions about the equivalence of the technologies in practice—that is, the extent to which actual empirical effects will be found with comparable reliability and precision. Indeed, they found no substantial differences between the software packages and concluded that JavaScript thus “offers suitable sensitivity for the measurement of response time differences between conditions in common psychophysical research.”

However, despite these promising results, the experiment by de Leeuw and Motz (2015) is limited to comparisons of software/technology within the lab. That is, their setup did not include a fully Web-based condition, and thus cannot address the concern above regarding technical and situational variance. Aiming to extend their work, the experiment reported in what follows was designed to further tease apart the potential effects of different sources of variation or error. Most importantly, the goal was to test whether Web-based reaction time measurement is offset by the mere technical and situational variance that is usually absent in the lab (see the first point above). At the same time, it is vital to separate such a potential effect from the error that may be inherent in the technologies and software used (see the second point above). Although the latter concern per se is alleviated by the findings of de Leeuw and Motz, it seemed prudent to provide another test, using a different experimental design, other software for comparison, and a different type of task.


Design, procedure, and participants

For the present purpose, the well-known word frequency effect in lexical decisions—that frequent words are detected faster as words (over nonwords or pseudowords) than less frequent words (Gordon, 1983; Rubenstein, Garfield, & Millikan, 1970)—was chosen. This effect is robust and reliable, but nonetheless is typically only 150–200 ms in size. At the same time, it is a genuine within-subjects effect that is particularly useful here, since it allows for substantial statistical power: Testing whether the word frequency effect is equivalent across the between-subjects conditions of interest (see below) corresponds to an F test of a within–between interaction that in turn requires only a moderate sample size, even for relatively small interaction effects (Faul, Erdfelder, Lang, & Buchner, 2007).

To perform the comparisons of interest outlined above, the present experiment comprised three (between-subjects) conditions: First, the lexical decision task was implemented in the lab, using standard software for psychological experimenting, namely E-Prime (Schneider, Eschman, & Zuccolotto, 2002). This condition (termed “lab/E-Prime” in what follows) can be considered the benchmark or baseline. The second and third conditions implemented the same lexical decision task for the Web browser using a “low-tech” solution (Reips & Krantz, 2010). Specifically, the task was written in HTML (with PHP controlling the task flow and handling HTML forms), and reaction time measurement was implemented via a simple JavaScript using an event-handler function for the “keydown” event. The essence of the code used to achieve the reaction time measurements can be found in the Appendix. Importantly, the second condition was run in the exact same lab as the first, and is therefore referred to as “lab/browser.” The only difference from the “lab/E-Prime” condition was thus the technology used for reaction time measurement (E-Prime vs. Web browser with HTML/JavaScript), whereas all other aspects (same lab, computers, etc.) were equivalent. The third condition, by contrast, was a genuine Web-based condition in which the HTML/JavaScript version of the task was completed by participants on whatever computer (in whichever place) they desired. This “Web/browser” condition is thus fully equivalent to the second, except for the place (lab vs. Web) and the differences in technical and situational variation it comes with.
In summary, the design allows for an in-depth analysis, not only of whether the lab and Web differ but—if so—also dissecting two aspects: Differences due to software and technology can be tested by comparing “lab/E-Prime” with “lab/browser,” whereas differences due to variation (i.e., technical and situational heterogeneity) can be tested by comparing “lab/browser” with “Web/browser.” Note that this includes differences due to the presence versus absence of an experimenter (Ollesch, Heineken, & Schulte, 2006): The two lab conditions were equivalent in terms of experimenter presence (the same experimenters ran all lab-based sessions, to which they were randomly assigned), whereas the Web condition did not involve an experimenter (but possibly other unknown individuals).

The lexical decision task required participants to judge—as speedily and accurately as possible—whether or not six-letter strings represented words, by pressing one of two keys. As materials, a total of 200 German six-letter nouns (half high and half low in word frequency) together with 200 matched pseudowords (created by replacing one letter of each word) were used, taken from a previous psycholinguistic experiment (Albrecht & Vorberg, 2010, Exp. 2).1 For each participant, 140 items were randomly selected and shown one at a time in random order (with a 1,000-ms intertrial interval); on exactly half of the trials (70 in total) a word was displayed, whereas a pseudoword was displayed on the remaining half of trials. The entire experiment (including informed consent and demographics, instructions, the lexical decision task, and debriefing) lasted about 10 min, on average.
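The per-participant trial sampling just described can be sketched as follows. This is a minimal illustration only, not the original experiment code; the item objects and the `shuffle` helper are my own (hypothetical) constructions, with placeholder strings standing in for the actual German materials:

```javascript
// Fisher–Yates shuffle (in place); returns the array for convenience.
function shuffle(arr) {
  for (var i = arr.length - 1; i > 0; i--) {
    var j = Math.floor(Math.random() * (i + 1));
    var tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
  }
  return arr;
}

// Hypothetical item pools: 200 words and 200 matched pseudowords
// (placeholders here; the real materials were six-letter German nouns).
var words = [];
var pseudowords = [];
for (var k = 0; k < 200; k++) {
  words.push({ text: "word" + k, isWord: true });
  pseudowords.push({ text: "pseu" + k, isWord: false });
}

// Per participant: draw exactly 70 words and 70 pseudowords at random,
// then present all 140 trials in a fresh random order.
var trials = shuffle(
  shuffle(words).slice(0, 70).concat(shuffle(pseudowords).slice(0, 70))
);
```

By sampling 70 items from each pool separately (rather than 140 from the combined pool), the word/pseudoword split is guaranteed to be exactly half, as in the design described above.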

A total of 67 participants (35 male, 32 female, between 18 and 32 years of age; M = 21 years, SD = 2.3 years) were recruited from a local participant pool. All were invited via e-mail and registered for the experiment online. The online registration system randomly assigned participants to the three between-subjects conditions outlined above, with the constraint that participants were assigned to the Web/browser condition with a higher probability, so as to counteract the potentially higher drop-out rate (although, ultimately, no drop-outs occurred in any of the conditions). Consequently, there were n = 28 participants in the Web/browser condition, n = 20 in the lab/browser condition, and n = 19 in the lab/E-Prime condition. Participants in the Web/browser condition completed the experiment online at a place and time of their choosing, within one week of having registered. The remaining participants signed up for a lab session within the same week. All participants were paid a flat fee of €2.00 (approximately USD 2.75 at the time).


Reaction times from the lexical decision task served as the dependent variable (the complete raw data are available as supplementary material). To reduce the influence of outliers, the first five trials of each participant were disregarded, as well as all trials in which the reaction time was more than 2.5 standard deviations above or below the individual mean reaction time (2.7 % of trials).2 Descriptives characterizing the reaction time distributions in each of the three experimental conditions are summarized in Table 1. As can be seen, there was a trend toward shorter reaction times in the lab/E-Prime condition, which is in line with previous findings that JavaScript produces slightly longer times both in an automated response system (Neath, Earle, Hallett, & Surprenant, 2011) and in human data (de Leeuw & Motz, 2015). At the same time, the smallest degree of variability was observed in the lab/browser condition, and the largest in the Web/browser condition, implying that variance is not primarily due to software or technology, but rather is caused by situational and technical variation (which is greater on the Web than in the lab).
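The trimming rule described above (discard the first five trials, then discard trials more than 2.5 standard deviations from the individual mean) can be expressed compactly. The sketch below assumes one array of raw reaction times per participant; the function name is mine, not from the original analysis scripts:

```javascript
// Trim one participant's reaction times: drop the first five trials,
// then drop trials deviating more than 2.5 SD from that participant's
// mean RT (computed on the remaining trials).
function trimReactionTimes(rts) {
  var kept = rts.slice(5); // discard the first five trials
  var n = kept.length;
  var mean = kept.reduce(function (s, x) { return s + x; }, 0) / n;
  var sd = Math.sqrt(
    kept.reduce(function (s, x) { return s + (x - mean) * (x - mean); }, 0) /
      (n - 1)
  );
  return kept.filter(function (x) { return Math.abs(x - mean) <= 2.5 * sd; });
}
```

Note that with this rule the cutoff is relative to each individual's own distribution, so slower participants are not penalized by a global threshold.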
Table 1

Descriptives of reaction time distributions in the raw data (excluding the first five trials and outliers as described in the main text)

All statistical comparisons were based on individual median reaction times (and double checked with individual mean log-transformed reaction times, which yielded equivalent results). Participants’ overall accuracy was high (M = 94 %, SE = 0.5 %), and the mean of their median reaction time across all trials (M = 958 ms, SE = 30 ms) was in the range typical for this type of task (cf. Rubenstein et al., 1970). Across all (between-subjects) conditions, responses were made more speedily to words (M = 770 ms, SE = 15 ms) than to pseudowords (M = 968 ms, SE = 33 ms), t(66) = 8.2, p < .001, Cohen’s d = 0.99. More importantly, high-frequency words were more speedily accepted as words (M = 697 ms, SE = 12 ms) than were low-frequency words (M = 878 ms, SE = 23 ms), thus mirroring the primary effect of interest, t(66) = 12.4, p < .001, Cohen’s d = 1.52.
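A minimal sketch of the summary step underlying these comparisons, i.e., reducing each participant's trials to median reaction times and a difference score; the function names are my own, for illustration only:

```javascript
// Median of an array of reaction times (does not modify the input).
function median(xs) {
  var s = xs.slice().sort(function (a, b) { return a - b; });
  var m = Math.floor(s.length / 2);
  return s.length % 2 ? s[m] : (s[m - 1] + s[m]) / 2;
}

// Word frequency effect for one participant: the difference between
// the median RT for low-frequency and for high-frequency words.
function frequencyEffect(lowFreqRTs, highFreqRTs) {
  return median(lowFreqRTs) - median(highFreqRTs);
}
```

One such difference score per participant is what enters the between-condition tests reported next.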

To test the main question of interest, the word frequency effect was considered depending on the between-subjects condition (lab/E-Prime vs. lab/browser vs. Web/browser). The effects (the mean difference between participants’ median reaction times for high- vs. low-frequency words) per condition are reported in Table 2. As can be seen, the effect was substantial in all three conditions, albeit somewhat larger in the two browser-based conditions. To test the full pattern, a mixed analysis of variance was conducted on participants’ median reaction times for low- versus high-frequency words (repeated measures factor), with Condition as a between-subjects factor. As expected, the word frequency effect was clearly replicated [F(1, 64) = 150, p < .001, Cohen’s f = 1.5]. By contrast, no main effect of condition emerged [F(2, 64) = 1.1, p = .34, Cohen’s f = 0.19], showing that the descriptive trend in the raw reaction time distributions was not statistically reliable. Most importantly, there was no interaction between word frequency and condition [F(2, 64) = 0.49, p = .62, Cohen’s f = 0.12], confirming that the word frequency effects were essentially comparable in magnitude across all three conditions.
Table 2

Word frequency effect separated by experimental (between-subjects) condition

To rule out that the lack of statistical support for the interaction was due to insufficient power, a criterion power analysis was computed (Faul et al., 2007). The analysis revealed a critical F value of 2.2 (and, thus, a Type I error of α = .12) to detect the observed effect (f = 0.12), with a power of 1 – β = .95 (and thus a Type II error probability of .05), given the present sample size and correlation among the repeated measures (Spearman’s ρ = .85 across all conditions). Clearly, the observed F value is well below this critical value, implying that the null hypothesis can be accepted within a conventional level of statistical error.3

Although the analyses above did not yield any indication of noteworthy differences between the lab- and Web-based reaction time measurements, more specific analyses using Helmert contrasts were conducted to compare the effect of software and technology (E-Prime vs. browser/JavaScript) with the effect of situational and technical variation (lab vs. Web—within the browser/JavaScript conditions). Regressing the individual difference in median reaction times between high- and low-frequency words on the correspondingly coded contrasts revealed that the effect of software and technology (E-Prime vs. browser/JavaScript) was small and statistically nonsignificant (β = .12, p = .35), despite the descriptive tendency for a larger word frequency effect in the browser-based conditions (see above). Within the browser-based conditions, absolutely no evidence emerged (β = .02, p = .87) for an effect of lab versus Web (i.e., of technical and/or situational variation).


Despite a growing body of evidence suggesting that Web-based data will yield results that are comparable to those obtained with more traditional methods (Germine et al., 2012; Gosling et al., 2004; Reips & Birnbaum, 2011), skepticism is still commonplace, especially concerning Web-based measurement of reaction times (Reimers & Stewart, 2015; Simcox & Fiez, 2014). Such reservations are fueled by (i) the indubitably increased technical and situational variance on the Web and (ii) limits in terms of software and technologies. Most importantly, (iii) there has been a lack of direct experimental comparisons between lab and Web—that is, comparisons based on random assignment (Reips et al., in press).

One of the few experimental investigations into the comparability of software packages was recently conducted by de Leeuw and Motz (2015), who demonstrated that JavaScript was largely equivalent, in terms of reaction time measurement, to the Psychophysics Toolbox. However, since their experiment only compared technologies within the lab, it seemed vital to extend their approach to comparisons of Web versus lab and of technologies, thus teasing apart the potential effects of different sources of variation or error. Consequently, the present experiment was designed to critically test whether a classic reaction time effect—the word frequency effect in lexical decisions (Rubenstein et al., 1970)—can be uncovered as reliably on the Web as in the lab, on the basis of full random assignment to the different conditions. Most importantly, I tested three conditions to allow for a more fine-grained analysis of potential differences: The first was lab-based and relied on the widely used E-Prime software (“lab/E-Prime”). The second was also lab-based, but implemented the task in HTML with a simple JavaScript for reaction time measurement (“lab/browser”). The third used the same technological implementation (HTML with JavaScript), but was conducted on the Web (“Web/browser”). Thereby, the effects of software and technology (lab/E-Prime vs. lab/browser) and of situational and technical variation (lab/browser vs. Web/browser) can be teased apart.

The results showed that the effect in question (the word frequency effect in reaction times) was typical in size (170–200 ms), statistically significant, and large (in terms of standardized effect size) in all conditions. Indeed, there was no indication of an interaction between word frequency (within subjects) and condition (between subjects), which confirms that the effects were equivalent across conditions. This finding was statistically confirmed by a criterion power analysis (Faul et al., 2007). Interestingly, if anything, the browser-based conditions produced the larger word frequency effect, although this was a mere descriptive trend, without strong statistical support. Nonetheless, it does imply that reaction time measurement using a browser and HTML/JavaScript is certainly no less appropriate than commonly used software such as E-Prime. This can be considered a conceptual replication of the results of de Leeuw and Motz (2015), using a different experimental design, software for comparison, and paradigm (and effect of interest). In addition, the comparison within the browser-based conditions further revealed that the increase in technical and situational variance inherent in the Web had practically no effect at all. This finding is well-aligned with previous work concluding that technical variation is little cause for worry (Brand & Bradley, 2012), but the first to demonstrate this using human response data and based on experimental manipulation (i.e., all else—including the underlying sample—being equal).

Note that, exactly because the design chosen herein compared different settings for the same population, it cannot provide an estimate of how much noisier Web studies will be due to sample differences in general. This, however, has been addressed by the many studies that have replicated lab-based effects with typical Web samples (e.g., Crump et al., 2013; Germine et al., 2012; Linnman et al., 2006). Thus, the present approach is complementary to the latter and to investigations of whether Web technologies can be considered adequately precise using automated response systems (Reimers & Stewart, 2015): It estimates the effects of technical and situational variation in human response data (holding any sample differences equal). Arguably, the best possible assessment of whether and when Web studies are adequate alternatives to classical lab experiments will come from considering the results of all of these approaches in combination. Note, also, that conclusions from one single task or paradigm need not generalize. Some confidence should come from the fact that the present investigation replicates the results of de Leeuw and Motz (2015) in a different paradigm, but nonetheless, more experiments using still other tasks will be needed.

Overall, the present findings confirm previous research demonstrating the comparability of lab- and Web-based reaction time measurements (Corley & Scheepers, 2002; Crump et al., 2013; Germine et al., 2012; Linnman et al., 2006; McGraw et al., 2000; Reimers & Stewart, 2015; Simcox & Fiez, 2014), in this case using a simple, “low-tech” solution that can be applied without requiring additional software or plugins beyond a browser (Reips & Krantz, 2010). At the same time, due to reliance on random assignment, the present comparison complements the typical cross-study comparisons (Reips et al., in press) and goes beyond prior experiments (de Leeuw & Motz, 2015) by teasing apart different potential sources of error. In conclusion, the still commonplace skepticism whenever data—and even reaction time data requiring sufficient accuracy to uncover an effect less than 200 ms in size—are collected via the Web is no longer appropriate. Importantly, neither the prior investigations nor the present results discredit classical lab-based approaches in any way; rather, they demonstrate that Web/browser-based methods are a viable alternative that should not be treated with general a priori skepticism or suspicion.


Author Note

I thank Martin Brandt for countless discussions and helpful suggestions, Theresa Strobel for preparing the lexical decision task in E-Prime, and a substantial number of anonymous reviewers (mostly of other manuscripts) for sometimes forceful reminders that this type of research is still sorely needed.

Supplementary material

Appendix. JavaScript and HTML code (simplified extracts) used for reaction time measurement

<script type="text/javascript">

  // Set when the stimulus is displayed; read by the handler below.
  var starttime = (new Date()).getTime();

  // Handler for the "keydown" event: compute the response latency and
  // store it in the hidden form field before the form is submitted.
  function response(e) {
    var stoptime = (new Date()).getTime();
    var latency = stoptime - starttime;
    document.forms["form"].elements["delay"].value = latency;
    return true;
  }

</script>

<body onkeydown="return response(event)">

  <form name="form" method="POST" … >

    <input type="hidden" name="delay" value="0">

  </form>
</body>


  1. Albrecht, T., & Vorberg, D. (2010). Long-lasting effects of briefly flashed words and pseudowords in ultrarapid serial visual presentation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 1339–1345. doi:10.1037/a0019999

  2. Birnbaum, M. H. (2004). Human research and data collection via the Internet. Annual Review of Psychology, 55, 803–832. doi:10.1146/annurev.psych.55.090902.141601

  3. Brand, A., & Bradley, M. T. (2012). Assessing the effects of technical variance on the statistical outcomes of web experiments measuring response times. Social Science Computer Review, 30, 350–357. doi:10.1177/0894439311415604

  4. Chuah, S. C., Drasgow, F., & Roberts, B. W. (2006). Personality assessment: Does the medium matter? No. Journal of Research in Personality, 40, 359–376. doi:10.1016/j.jrp.2005.01.006

  5. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

  6. Corley, M., & Scheepers, C. (2002). Syntactic priming in English sentence production: Categorical and latency evidence from an Internet-based study. Psychonomic Bulletin & Review, 9, 126–131. doi:10.3758/bf03196267

  7. Cronk, B. C., & West, J. L. (2002). Personality research on the Internet: A comparison of Web-based and traditional instruments in take-home and in-class settings. Behavior Research Methods, Instruments, & Computers, 34, 177–180.

  8. Crump, M. J. C., McDonnell, J. V., & Gureckis, T. M. (2013). Evaluating Amazon's Mechanical Turk as a tool for experimental behavioral research. PLoS ONE, 8, e57410. doi:10.1371/journal.pone.0057410

  9. de Leeuw, J. R. (2015). jsPsych: A JavaScript library for creating behavioral experiments in a Web browser. Behavior Research Methods, 47, 1–12. doi:10.3758/s13428-014-0458-y

  10. de Leeuw, J. R., & Motz, B. A. (2015). Psychophysics in a Web browser? Comparing response times collected with JavaScript and Psychophysics Toolbox in a visual search task. Behavior Research Methods. doi:10.3758/s13428-015-0567-2

  11. Eichstaedt, J. (2001). An inaccurate-timing filter for reaction time measurement by JAVA applets implementing Internet-based experiments. Behavior Research Methods, Instruments, & Computers, 33, 179–186.

  12. Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191. doi:10.3758/BF03193146

  13. Germine, L., Nakayama, K., Duchaine, B. C., Chabris, C. F., Chatterjee, G., & Wilmer, J. B. (2012). Is the Web as good as the lab? Comparable performance from Web and lab in cognitive/perceptual experiments. Psychonomic Bulletin & Review, 19, 847–857. doi:10.3758/s13423-012-0296-9

  14. Gordon, B. (1983). Lexical access and lexical decision: Mechanisms of frequency sensitivity. Journal of Verbal Learning and Verbal Behavior, 22, 24–44. doi:10.1016/S0022-5371(83)80004-8

  15. Gosling, S. D., Vazire, S., Srivastava, S., & John, O. P. (2004). Should we trust web-based studies? A comparative analysis of six preconceptions about internet questionnaires. American Psychologist, 59, 93–104.

  16. Hecht, H., Oesker, M., Kaiser, A., Civelek, H., & Stecker, T. (1999). A perception experiment with time-critical graphics animation on the World-Wide Web. Behavior Research Methods, Instruments, & Computers, 31, 439–445. doi:10.3758/bf03200724

  17. Ihme, J. M., Lemke, F., Lieder, K., Martin, F., Muller, J. C., & Schmidt, S. (2009). Comparison of ability tests administered online and in the laboratory. Behavior Research Methods, 41, 1183–1189. doi:10.3758/BRM.41.4.1183

  18. Keller, F., Gunasekharan, S., Mayo, N., & Corley, M. (2009). Timing accuracy of Web experiments: A case study using the WebExp software package. Behavior Research Methods, 41, 1–12. doi:10.3758/BRM.41.1.12

  19. Kraut, R., Olson, J., Banaji, M., Bruckman, A., Cohen, J., & Couper, M. (2004). Psychological research online: Report of Board of Scientific Affairs' Advisory Group on the Conduct of Research on the Internet. American Psychologist, 59, 105–117. doi:10.1037/0003-066x.59.2.105

  20. Lang, F. R., John, D., Lüdtke, O., Schupp, J., & Wagner, G. G. (2011). Short assessment of the Big Five: Robust across survey methods except telephone interviewing. Behavior Research Methods, 43, 548–567. doi:10.3758/s13428-011-0066-z

  21. Linnman, C., Carlbring, P., Ahman, A., Andersson, H., & Andersson, G. (2006). The Stroop effect on the internet. Computers in Human Behavior, 22, 448–455. doi:10.1016/j.chb.2004.09.010

  22. McGraw, K. O., Tew, M. D., & Williams, J. E. (2000). The integrity of Web-delivered experiments: Can you trust the data? Psychological Science, 11, 502–506. doi:10.1111/1467-9280.00296

  23. Neath, I., Earle, A., Hallett, D., & Surprenant, A. M. (2011). Response time accuracy in Apple Macintosh computers. Behavior Research Methods, 43, 353–362. doi:10.3758/s13428-011-0069-9

  24. Ollesch, H., Heineken, E., & Schulte, F. P. (2006). Physical or virtual presence of the experimenter: Psychological online-experiments in different settings. International Journal of Internet Science, 1, 71–81.

  25. Plant, R. R., Hammond, N., & Whitehouse, T. (2003). How choice of mouse may affect response timing in psychological studies. Behavior Research Methods, Instruments, & Computers, 35, 276–284. doi:10.3758/bf03202553

  26. Plant, R. R., & Turner, G. (2009). Millisecond precision psychological research in a world of commodity computers: New hardware, new problems? Behavior Research Methods, 41, 598–614. doi:10.3758/BRM.41.3.598

  27. Ratcliff, R. (1979). Group reaction time distributions and an analysis of distribution statistics. Psychological Bulletin, 86, 446–461. doi:10.1037/0033-2909.86.3.446

  28. Reimers, S., & Stewart, N. (2007). Adobe Flash as a medium for online experimentation: A test of reaction time measurement capabilities. Behavior Research Methods, 39, 365–370. doi:10.3758/bf03193004

  29. Reimers, S., & Stewart, N. (2015). Presentation and response timing accuracy in Adobe Flash and HTML5/JavaScript Web experiments. Behavior Research Methods, 47, 309–327. doi:10.3758/s13428-014-0471-1

  30. Reips, U.-D. (2002). Internet-based psychological experimenting: Five dos and five don'ts. Social Science Computer Review, 20, 241–249. doi:10.1177/08939302020003002

  31. Reips, U.-D., & Birnbaum, M. H. (2011). Behavioral research and data collection via the internet. In R. W. Proctor & K.-P. L. Vu (Eds.), The handbook of human factors in Web design (2nd ed., pp. 563–585). Mahwah, NJ: Erlbaum.

  32. Reips, U.-D., Buchanan, T., Krantz, J. H., & McGrawn, K. (in press). Methodological challenges in the use of the Internet for scientific research: Ten solutions and recommendations. Studia Psychologica. http://www.uni-konstanz.de/iscience/reips/pubs/papers/StudiaPsy_final.pdf

  33. Reips, U.-D., & Krantz, J. H. (2010). Conducting true experiments on the Web. In S. D. Gosling & J. A. Johnson (Eds.), Advanced methods for conducting online behavioral research (pp. 193–216). Washington, DC: American Psychological Association.

  34. Rubenstein, H., Garfield, L., & Millikan, J. A. (1970). Homographic entries in the internal lexicon. Journal of Verbal Learning and Verbal Behavior, 9, 487–494. doi:10.1016/S0022-5371(70)80091-3

Citation: Ricotti L, Rigosa J, Niosi A, Menciassi A (2013) Analysis of Balance, Rapidity, Force and Reaction Times of Soccer Players at Different Levels of Competition. PLoS ONE 8(10): e77264. https://doi.org/10.1371/journal.pone.0077264

Editor: Bard Ermentrout, University of Pittsburgh, United States of America

Received: April 25, 2013; Accepted: August 30, 2013; Published: October 10, 2013

Copyright: © 2013 Ricotti et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The authors have no support or funding to report.

Competing interests: The authors have declared that no competing interests exist.


This paper reports soccer players’ performance in terms of maximum vertical jump height (to determine maximal leg strength), contact time (to assess acyclic rapidity, or quickness), static balance, dynamic balance, and visual and acoustic reaction times. The analysis covered athletes playing in all ten categories (four professional and six non-professional) of the Italian soccer championship, with at least fifteen athletes analyzed per category. A control group of subjects who had never played soccer or practiced other sports was also included. The study further aims to demonstrate that a subset of the above characteristics can discriminate top-level athletes among those with the same training frequency.

Soccer is the most popular team sport worldwide [1], with more than 250 million active players [2]. In general, the formation of a mature athlete entails the expression of a series of athletic characteristics in a proper and timely manner. Children and adolescents, in fact, undergo a maturation process that is not linear but is characterized by “developmental spikes,” which affect their ability to learn specific motor skills at certain ages [3].

Multidimensional performance analysis has recently emerged as an effective tool for identifying talented athletes. Elferink-Gemser and colleagues identified anthropometric, technical, tactical and physiological characteristics that could predict future elite hockey players [4]. More recently, Rikberg and Raudsepp measured anthropometric, physical, technical and cognitive characteristics of junior volleyball players, with the aim of discriminating their overall ability [5]. Despite the wide scientific interest in this field and the large number of studies performed, the literature lacks a detailed multiparametric study reporting data on force, rapidity, static and dynamic balance, and reaction times of soccer players at different levels of competition.

Besides technical and tactical skills, which are of primary importance in soccer, physical characteristics are crucial for discriminating talented from non-talented soccer players. Endurance and, in part, force are much more affected by training frequency and quality than other characteristics such as rapidity, balance and reaction times [3].

During a game, professional soccer players perform about 50 turns, comprising sustained forceful contractions to maintain balance and control of the ball against defensive pressure. Hence, force and power expression is an important characteristic of high-level soccer players. Power is, in turn, heavily dependent on maximal leg strength [6].

Acyclic rapidity (or quickness) is the ability to perform a single (non-repeated) movement in the shortest possible time, and it is a crucial skill in soccer. The analysis of contact times is an effective means of evaluating athletes’ acyclic rapidity, and it was recently used to quantify the performance of professional soccer players during lateral plyometric exercises [7].

Coordinative abilities (dexterity) rely on movement control and regulation processes. They are of crucial importance in many sports, including soccer, as they allow athletes to control their motor actions with ease and to learn complex movements relatively quickly. One of the main components of coordinative abilities is balance. Postural control (or balance) can be defined statically as the ability to maintain a base of support with minimal movement (thus minimizing body sway), and dynamically as the ability to perform a task while maintaining a stable position [8]. Balance is influenced by a number of factors, such as sensory information (from the somatosensory, visual and vestibular systems), joint range of motion and strength [9,10,11], and it underlies the correct execution of complex sport tasks.

Static and dynamic balance performances are often assessed through center of pressure (COP) recordings obtained with sensorized platforms. Although the COP differs from the center of mass (COM), it has been demonstrated that the COM trajectory can be computed from the COP trajectory [12], which justifies using COP measurements (relatively easy to obtain) to assess body sway [13]. The relationship between balance ability and athletic performance was recently reviewed in depth by Hrysomallis, who summarized the insights achieved over the last two decades on postural control in relation to athletes’ performance in various sports [14]. Comparisons of static and dynamic balance between athletes of different sports have found that dancers show better static balance than soccer players [15,16], while gymnasts and soccer players do not differ in either static or dynamic balance and both show superior postural control compared with basketball players [17].

Paillard and Noe analyzed the importance of visual information for soccer players according to their level of competition. They found that professional players are less dependent on vision to control their posture than non-professional athletes, suggesting that professional players can devote vision to processing the information emanating from the match [18]. Similar findings were recently reported by Ben Moussa and colleagues, who compared the contribution of vision to postural maintenance in professional and amateur soccer players [19].

Reaction times depend on motor nerve conduction velocity and are commonly divided into auditory reaction times (ART) and visual reaction times (VRT). ART has been shown to be less important than VRT in soccer, which is essentially a visual game [20,21].

The findings reported in this paper clarify which characteristics are best suited to discriminate high-level from lower-level soccer players, even among athletes with the same training frequency. Furthermore, a principal component analysis identifies clusters of players with similar performances, making it possible to summarize their characteristics with only two parameters that account for a significant percentage of the data variance.

Materials and Methods


Ten groups of male soccer players (at least 15 subjects per group) were involved in the study. Each group represented a different category of the Italian soccer championship, from the highest to the lowest level. A control group of 15 subjects without any soccer or other sports experience was also included. Overall, the subjects had an age of 23.3 ± 4.9 years, a height of 179.0 ± 5.7 cm and a weight of 74.7 ± 7.8 kg. Athletes in professional and non-professional categories obviously differed in training frequency, while athletes in the four professional categories all trained with the same frequency (Table 1).

| Group label | Category (Italian name) | Level | Weekly training frequency | No. of athletes analyzed | Age (years) | Height (cm) | Weight (kg) |
|---|---|---|---|---|---|---|---|
| A | Serie A | Professional | 5-7 | 15 | 26.2 ± 3.7 | 181.9 ± 6.9 | 79.4 ± 4.7 |
| B | Serie B | Professional | 5-7 | 16 | 23.4 ± 5.1 | 182.6 ± 2.4 | 78.1 ± 4.0 |
| C | Lega Pro - 1a Divisione | Professional | 5-7 | 15 | 21.4 ± 1.7 | 182.7 ± 3.5 | 78.9 ± 4.4 |
| D | Lega Pro - 2a Divisione | Professional | 5-7 | 17 | 25.3 ± 4.1 | 180.6 ± 5.5 | 77.8 ± 7.9 |
| E | Serie D | Non professional | 4-5 | 17 | 19.9 ± 3.7 | 180.0 ± 4.2 | 72.5 ± 6.6 |
| F | Eccellenza | Non professional | 4 | 23 | 22.3 ± 5.8 | 178.3 ± 4.7 | 74.1 ± 7.5 |
| G | Promozione | Non professional | 3 | 18 | 21.2 ± 3.1 | 177.3 ± 5.4 | 70.4 ± 4.5 |
| H | Prima Categoria | Non professional | 3 | 16 | 22.6 ± 4.4 | 174.4 ± 6.0 | 68.5 ± 5.8 |
| I | Seconda Categoria | Non professional | 2 | 17 | 24.8 ± 5.5 | 174.8 ± 7.1 | 71.1 ± 9.4 |
| L | Terza Categoria | Non professional | 2 | 16 | 24.4 ± 4.7 | 176.9 ± 4.6 | 73.5 ± 6.1 |
| X | Control group | - | - | 15 | 27.3 ± 5.2 | 177.1 ± 4.4 | 73.7 ± 8.8 |
| TOT | | | | 185 | 23.3 ± 4.9 | 179.0 ± 5.7 | 74.7 ± 7.8 |

Table 1. Group labels and number of subjects involved in the study.

A brief interview was carried out before the experiments. To be included in the study, subjects had to be free from injury and not recovering from ankle, knee, hip or other known injuries. Goalkeepers were excluded, as were subjects who had practiced dance, judo or other martial arts for more than six months in their life. Experiments were conducted at the beginning of the competitive season. All subjects signed an informed consent form as required by the Declaration of Helsinki. The study was approved by the local ethics committee of Scuola Superiore Sant’Anna.

Tests and instruments

First, the anthropometric data of each athlete were recorded (Figure 1a). Weight was measured with a standard digital scale (Seca, max 200 kg), and height with a wall-mounted stadiometer (Siber Hegner). Athletes’ force was then assessed by means of a vertical jump test (Figure 1b), with a wall-mounted graduated tape used to record maximum vertical jump height. Both static and dynamic balance (Figure 1c,d) were assessed with a force platform (WinPosture, Imago snc) that recorded the displacements of the centre of foot pressure (COP) with 1.56 sensors/cm2, operating in “postural acquisition mode” at 100 Hz. The same platform, in “dynamic acquisition mode” at 150 Hz, was used to record contact times during the rapidity tests (Figure 1e). Finally, visual and acoustic reaction times were recorded with a personal computer (PC) running dedicated software (Reaction Times, freely available on the net; Figure 1f).

Figure 1. Overall view of the tests carried out on the subjects involved in the study.

a) measurement of anthropometric values by means of dedicated tools; b) assessment of maximum vertical jump height; c,d) static and dynamic balance tests by means of a force platform; e) assessment of subject’s rapidity by means of contact time measurements; f) assessment of visual and acoustic reaction times by means of a dedicated software.



The tests were conducted in a quiet room free from external distractions and at approximately the same hour, to avoid discrepancies in subjects’ performance (especially concerning balance) due to time-of-day effects [22].

After the interview, which served to identify and select the subjects for the study, the soccer players’ age, height and weight were recorded, followed by a brief warm-up (5 min of running). Before the vertical jump tests, the total body length of each subject was measured by asking him to touch the graduated tape with both hands at the highest point possible, without raising the heels from the floor; this value was registered as Lt (total length). During the vertical jump tests, subjects started from a standing position and performed a crouching action immediately followed by a jump for maximal height. Each subject performed the test three times, with two minutes of rest for complete recovery between jumps. The hands were left free to move while jumping, and the athlete was asked to touch the graduated tape at the maximum height he could reach. The highest value obtained was recorded as Hj (height reached with the jump). The athlete’s force performance was quantified as follows:

ΔL = Hj − Lt    (1)

Vertical jump tests based on three repetitions per athlete have been shown to be an effective means of measuring bilateral leg force, discriminating between individuals of different performance levels, and detecting training-induced changes in performance [23].
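As a concrete illustration, the force score can be computed in a few lines. This is a minimal sketch in Python; the function name, units and example values are ours, not the paper's:

```python
def force_performance(lt_cm, jump_heights_cm):
    """Force score for one athlete: the best of the three jump trials
    (Hj) minus the standing reach / total length (Lt)."""
    hj = max(jump_heights_cm)  # highest point reached over the three jumps
    return hj - lt_cm          # Delta L, in cm
```

For example, a hypothetical athlete with Lt = 230 cm whose best jump reaches 281 cm scores ΔL = 51 cm.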

Static balance tests consisted of unipedal standing postures on both the dominant and the non-dominant leg. The dominant leg was identified before the test as the leg the subject preferentially used to kick the ball, and was tested first. The subject was asked to stand on the force platform with the standing foot at its centre, looking at a fixed visual target on the wall at a distance of 3 m, to raise the non-dominant leg and keep it flexed 90° at the knee, and to hold this static position, with both hands on his hips, for the entire duration of the test (20 s, during which COP displacements were recorded) (Movie S1). The subject then rested for 2 min and repeated the test, this time raising the dominant leg. A static balance test was repeated whenever the raised foot touched the surface or the subject moved his hands away from his hips during the test. Static balance performance was quantified by two parameters: COP length (the “travelling distance,” in mm, of the COP displacement during the 20 s test) and COP area (the area of the confidence ellipse enclosing 95% of the COP points during the 20 s test).
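The two static balance parameters can be sketched as follows (Python/NumPy). The platform's exact ellipse-fitting procedure is not documented in the paper, so the common covariance-based 95% prediction ellipse is assumed here:

```python
import numpy as np

def cop_length_mm(xy):
    """COP path length: sum of the distances between consecutive
    samples of the (N, 2) trajectory xy, in mm."""
    steps = np.diff(xy, axis=0)
    return float(np.hypot(steps[:, 0], steps[:, 1]).sum())

def cop_area_mm2(xy, chi2_95=5.991):
    """Area of the ellipse expected to enclose 95% of the COP points,
    from the trajectory's covariance: area = pi * chi2 * sqrt(det(cov)),
    where chi2_95 is the 0.95 quantile of a chi-squared with 2 df."""
    cov = np.cov(xy[:, 0], xy[:, 1])
    return float(np.pi * chi2_95 * np.sqrt(np.linalg.det(cov)))
```

At the 100 Hz acquisition rate used here, a 20 s static trial would yield an xy array of 2,000 samples.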

Dynamic balance tests were also performed on both the dominant and the non-dominant leg. In this case the subject stood on the force platform with the foot axes parallel to the main axis of the platform and a distance of 25 cm between the feet.

The COP was then recorded for 20 s after a small jump (~20 cm) landing on one foot only. Once landed, the subject was asked to regain equilibrium as quickly as possible and to stabilize in the unipedal stance, again keeping his hands on his hips and looking at a fixed visual target on the wall at a distance of 3 m (Movie S1). A rest of 2 minutes was allowed between the two tests. A dynamic balance test was repeated if, after landing, the raised foot touched the surface or the subject moved his hands away from his hips. Dynamic balance performance was quantified by two parameters: the COP length 3 s and 10 s after the subject’s landing.

COP-based postural sway measurements, used in this study to quantify both static and dynamic balance, have been shown to be highly reliable in previous studies [24],[25],[26].

To assess rapidity, the subject stood laterally with respect to the force platform, set to the “dynamic” recording mode, and was asked to perform a small jump onto it. Once landed, the subject had to jump off the platform again as soon as possible, so as to minimize the contact time on the sensorized surface (Movie S2). Each subject performed the test three times, with 30 s of rest for complete recovery between jumps; the hands were left free to move while jumping. The smallest contact time obtained was recorded as the subject’s rapidity performance. Contact time measurements are considered an effective and reliable means of assessing both cyclic and acyclic rapidity, as reported in the literature [3],[27].
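A sketch of how a contact time could be extracted from the platform's 150 Hz trace. The 20 N contact threshold is our illustrative assumption; the instrument's actual detection logic is not described in the paper:

```python
def contact_time_s(force_samples, fs_hz=150.0, threshold_n=20.0):
    """Duration of the first contact phase: length of the first
    contiguous run of samples whose vertical force exceeds the
    threshold, divided by the sampling rate."""
    n_contact = 0
    started = False
    for f in force_samples:
        if f > threshold_n:
            started = True
            n_contact += 1
        elif started:
            break  # the foot has left the platform
    return n_contact / fs_hz
```

The smallest of the three per-subject values would then be kept, as in the protocol above.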

For both visual and acoustic reaction time tests, the subject sat on a chair in front of a PC running dedicated software, with the dominant hand holding a mouse, ready to click. The visual test consisted of a series of six visual stimuli appearing at random intervals on the PC screen; the subject was asked to click as quickly as possible upon seeing each stimulus. The acoustic test consisted of a series of six acoustic stimuli generated by the software at random intervals; the subject was asked to click as quickly as possible upon hearing each stimulus. For both tests, 5 consecutive trials were performed, with short rest periods between them. In each trial, the first two attempts (not recorded) served to familiarize the subject with the procedure and to reach the highest attention level (the first values were often much higher than the following ones, probably due to a drop in attention after the rest periods). Therefore, only the four final values of each trial (20 values in total for each test) were averaged, and this mean represented the subject’s visual/acoustic reaction time. This procedure and this number of trials and repetitions have been shown to be sufficiently reliable for determining subjects’ reaction times [28].
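The scoring rule of this protocol (discard the first two of the six responses in each of the five trials, then average the remaining 20 values) can be sketched as:

```python
def mean_reaction_time_ms(trials):
    """trials: five lists of six reaction times (ms). The first two
    responses of each trial are familiarization attempts and are
    discarded; the score is the mean of the 20 remaining values."""
    kept = [rt for trial in trials for rt in trial[2:]]
    return sum(kept) / len(kept)
```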

Statistical analyses

The statistical analyses were based on the null hypothesis that there was no significant difference between the values measured, in the different tests, for the soccer players and the control subjects involved in the study. A one-way ANOVA, in which each group represented a different category, was used to assess whether the null hypothesis could be rejected.
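The F statistic of a one-way ANOVA is the between-group mean square over the within-group mean square; a stdlib-only Python sketch (converting F to a p value via the F distribution is omitted here):

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over a list of samples."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total sample size
    grand = sum(x for g in groups for x in g) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```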

For each test, a series of pairwise t-tests was then performed, comparing each category with all the others. The results were plotted as a matrix of colored squares (each color corresponding to a specific p value), which facilitated the identification of clusters of athletes with similar performance. The significance level was set at 0.05.

Since considering a set of statistical inferences simultaneously makes type I errors (incorrect rejections of correct null hypotheses) more likely [29], post-hoc multiple comparisons of the one-way ANOVA(s) were also considered. As a consequence, stronger evidence is required for a phenomenon to be deemed "significant". The Bonferroni correction is considered the most conservative method to control the family-wise error rate (the probability of making false discoveries) in a multiple comparisons problem. Briefly, with m groups, a multiple comparisons problem requires determining m confidence intervals (CIs) with an overall confidence level of 1 − α, where α is the significance level. The Bonferroni correction adjusts the confidence level of each individual CI to:

1 − α/m

Based on the above considerations, we completed our statistical analysis by reporting, for each test, the average value and the Bonferroni-corrected 95% CI of each category.
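A minimal sketch of a Bonferroni-corrected interval for one group mean (stdlib Python; a normal quantile is used in place of Student's t to stay dependency-free, which slightly narrows the interval for small samples):

```python
from statistics import NormalDist, mean, stdev

def bonferroni_ci(sample, m, alpha=0.05):
    """CI for one group mean when m simultaneous intervals are
    reported: each is built at level 1 - alpha/m, so the family-wise
    coverage is at least 1 - alpha."""
    z = NormalDist().inv_cdf(1 - (alpha / m) / 2)   # two-sided quantile
    half = z * stdev(sample) / len(sample) ** 0.5
    center = mean(sample)
    return center - half, center + half
```

Note that each individual interval widens as m grows, which is exactly the "stronger evidence" requirement described above.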

For further meta-analyses of the obtained results we also report an effect size, namely Cohen’s coefficient f2, defined as:

f2 = R2 / (1 − R2)

where R2 is the squared multiple correlation.
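Assuming the standard definition f2 = R2 / (1 − R2), this is a one-line helper (function name ours):

```python
def cohens_f2(r_squared):
    """Cohen's f2 effect size from the squared multiple correlation:
    f2 = R2 / (1 - R2)."""
    return r_squared / (1.0 - r_squared)
```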

Owing to the considerably different variances of the collected data, we opted for a non-parametric ANOVA coupled with a resampling method, specifically repeated random sub-sampling validation. For each test, we performed 100 non-parametric ANOVAs on sub-groups of 7 subjects randomly selected within each experimental group, and then reported the distribution of the resulting p-values for each test.
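The paper does not name the non-parametric test; assuming the standard Kruskal-Wallis rank test, the resampling scheme can be sketched as follows (tie correction omitted for brevity):

```python
import random

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (rank-based one-way ANOVA);
    ties are not corrected for in this sketch."""
    pooled = sorted((x, gi) for gi, g in enumerate(groups) for x in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    for rank, (_, gi) in enumerate(pooled, start=1):
        rank_sums[gi] += rank
    return 12.0 / (n * (n + 1)) * sum(
        rs ** 2 / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)

def subsampled_h(groups, n_sub=7, n_rep=100, seed=0):
    """Repeated random sub-sampling: n_rep replicates, each drawing
    n_sub subjects per group and computing H."""
    rng = random.Random(seed)
    return [kruskal_wallis_h([rng.sample(g, n_sub) for g in groups])
            for _ in range(n_rep)]
```

Each H value would then be converted to a p value via its approximate chi-squared distribution with K − 1 degrees of freedom.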

For the non-parametric ANOVAs coupled with the resampling method, we calculated the corresponding Cohen’s coefficients. In this case we considered the expression of f2 for a balanced design (equal sample sizes across groups), namely:

f2 = SS / (K σ2), with SS = Σi (μi − μ)2

where SS is the sum of squares of the group means about the grand mean μ, μi is the population mean within the ith of the K groups, and σ is the common population standard deviation within each group. f2 was considered “small” if around 0.02, “medium” if around 0.15 and “large” if around 0.35.
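Assuming the standard balanced-design form, f2 = Σi (μi − μ)2 / (K σ2), the computation is straightforward (names ours):

```python
def cohens_f2_balanced(group_means, sigma):
    """Cohen's f2 for a balanced one-way design: variance of the K
    group means about the grand mean, over the common within-group
    variance sigma^2."""
    k = len(group_means)
    grand = sum(group_means) / k
    ss = sum((mu - grand) ** 2 for mu in group_means)
    return (ss / k) / sigma ** 2
```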

Finally, we performed a principal component analysis (PCA), converting a subset of the observations into linearly uncorrelated variables (the principal components, PCs). Specifically, the observations included in the PCA were those referring to force, static balance (COP length values for both the dominant and non-dominant limbs, and COP area values for the non-dominant limb only), dynamic balance (COP length values 10 s after landing, non-dominant limb only) and rapidity tests. This choice was driven by the experimental results (described in the next section), which highlighted significant differences between athletes with the same training frequency (even if playing in different categories) only for these parameters. PCA is known to be a powerful instrument for data reduction, useful when a large amount of data can be approximated by a moderately complex model structure [30]. In our case, PCA allowed us to investigate the topological distribution of the subjects on the plane identified by the first two principal components (accounting for ~70% of the data variance), with the aim of scattering all the subjects and identifying distinct classes of athletes, grouped according to their performance in the different tests. In this way, their performance could be summarized by only two parameters (the PCs), which were linear combinations of the test outcomes. All data analyses were performed in MATLAB (MathWorks Inc., MA), using both existing and ad hoc-developed routines.
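The PCA step can be sketched with an SVD on the standardized feature matrix (NumPy; this mirrors the standard approach, not the authors' exact MATLAB routines):

```python
import numpy as np

def first_two_pcs(data):
    """data: (subjects x features). Standardize each column, take the
    SVD, and return the scores on the first two principal components
    together with the fraction of variance they explain."""
    x = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :2] * s[:2]                      # PC1 and PC2 scores
    explained = float((s[:2] ** 2).sum() / (s ** 2).sum())
    return scores, explained
```

Plotting the two score columns against each other produces the kind of subject scatter described above.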

Experimental data and MATLAB codes used for the described analyses are available as on-line supporting files (Files S1, S2, S3, S4 and S5).

Results and Discussion

Force performance

Figure 2a reports the ΔL values, calculated as in (1), for soccer players in categories A to L and for the control group (X). The ANOVA results, reported in Figure 2b, indicate that the groups have significantly different force performances (the p value, highlighted in red, is much smaller than 0.01), a conclusion further supported by Cohen’s f2 effect size, which is much larger than 0.35. Figure 2c shows a graphical representation of the p values of the pairwise statistical comparisons between groups, while Figure 2d plots the ΔL average values ± 95% CI for the different groups, calculated with the Bonferroni correction.

Figure 2. Force performance results.

a) ΔL values for the different analyzed groups (average value ± standard deviation), from A to X. *=p<0.05, **=p<0.01; b) results of the analysis of variance (ANOVA) and effect size (Cohen’s f2) calculation; c) matrix reporting the p values for the pairwise t-tests between the different groups; d) multiple comparison plot (average values ± 95% CI, with Bonferroni adjustment).


Isokinetic strength and anaerobic power have been analyzed in past years for elite, sub-elite and amateur soccer players, revealing that professional players differ from amateurs in knee flexor muscle strength [31]. Maximal isometric force, force-time curve characteristics and vertical jump were also measured in young soccer players at different competition levels, with elite athletes expressing significantly higher strength characteristics than their sub-elite and recreational counterparts [32]. More recent studies compared strength-related parameters in young or adult soccer players at different levels of competition, focusing on maximal strength [33], full squat power output [34], jumping ability [35] and even specific muscle characteristics measured by tensiomyography [36].

In our case, ΔL values decrease almost linearly from category A to category L, without defining specific clusters of athletes with similar force performance. In addition, control subjects show force performances significantly lower than those of soccer players in the L category (p<0.05). These results confirm previous findings that soccer players at different levels of competition differ in force performance. However, they also show that athletes with the same training frequency (groups C and D, but also groups G and H) differ in force performance. Force generally depends strongly on training frequency [3,31], but it is also partly related to an athlete’s intrinsic factors, such as muscle fibre composition and neuromuscular control [37,38]. Such training-unrelated factors would explain the significant differences in force performance that we found between soccer players with the same training frequency.

Static balance performance

COP length values for the different categories in the static balance tests on the dominant limb, and the corresponding statistical analyses, are reported in Figure 3. Two macro-groups of soccer players with similar performance can be identified: the former comprising athletes in categories A to C, the latter athletes in categories D to L. Control subjects differ significantly (p<0.01) from athletes in the L category. The ANOVA reveals significant differences between the groups, and Cohen’s f2 is considerably high. Similar results were obtained for COP length values in the static balance tests on the non-dominant limb (Figure 4).

Figure 3. Results of static balance tests, evaluated by means of COP length values, on dominant limb.

a) COP length values for the different analyzed groups (average value ± standard deviation), from A to X. *=p<0.05, **=p<0.01; b) results of the analysis of variance (ANOVA) and effect size (Cohen’s f2) calculation; c) matrix reporting the p values for the pairwise t-tests between the different groups; d) multiple comparison plot (average values ± 95% CI, with Bonferroni adjustment).


Figure 4. Results of static balance tests, evaluated by means of COP length values, on non dominant limb.

a) COP length values for the different analyzed groups (average value ± standard deviation), from A to X. *=p<0.05, **=p<0.01; b) results of the analysis of variance (ANOVA) and effect size (Cohen’s f2) calculation; c) matrix reporting the p values for the pairwise t-tests between the different groups; d) multiple comparison plot (average values ± 95% CI, with Bonferroni adjustment).


COP area values reveal no significant differences between soccer players (groups A to L) for the dominant limb (Figure 5), while control subjects differ significantly (p<0.01) from soccer players in group L. For the non-dominant limb (Figure 6), significant differences between soccer players clearly identify two separate macro-groups; furthermore, control subjects differ significantly (p<0.05) from soccer players in group L. With COP area values, the macro-group of athletes with high static balance performance shrinks further compared with COP length values, comprising only athletes in the A and B categories. The ANOVA outcomes reveal that COP area values differ statistically for both the dominant and non-dominant limbs, though the difference is much larger for the non-dominant limb. Cohen’s f2 effect sizes also differ between limbs, being ~0.26 (medium) for the dominant limb and ~0.63 (large) for the non-dominant one.

Figure 5. Results of static balance tests, evaluated by means of COP area values, on dominant limb.

a) COP area values for the different analyzed groups (average value ± standard deviation), from A to X. **=p<0.01; b) results of the analysis of variance (ANOVA) and effect size (Cohen’s f2) calculation; c) matrix reporting the p values for the pairwise t-tests between the different groups; d) multiple comparison plot (average values ± 95% CI, with Bonferroni adjustment).


Figure 6. Results of static balance tests, evaluated by means of COP area values, on non dominant limb.

a) COP area values for the different analyzed groups (average value ± standard deviation), from A to X. *=p<0.05; b) results of the analysis of variance (ANOVA) and effect size (Cohen’s f2) calculation; c) matrix reporting the p values for the pairwise t-tests between the different groups; d) multiple comparison plot (average values ± 95% CI, with Bonferroni adjustment).


Balance has been used effectively as a predictor of injury risk [39,40], and proprioceptive training programs have been used to prevent lower limb injuries in many sports [41,42,43,44]. Furthermore, the reciprocal influence of balance and sport performance has recently been investigated. In soccer, several studies reported that soccer training strongly influences balance abilities, especially in unilateral stance [45,46,47,48,49,50,51]. Conversely, it is also clear that intense balance training can improve some aspects of soccer performance, especially at early ages [52,53,54]. Recently, Paillard and colleagues analyzed the postural performance and strategy of soccer players at different levels of competition, finding that national players have better unipedal static balance than regional players [55].

Our results confirm the general insights already reported in the literature, showing that high-level (professional) athletes have better static balance than non-professional ones [19,55]. In addition, we identified significant differences in static balance performance between professional athletes with the same training frequency: COP length values for both limbs differed significantly between groups C and D, while COP area values for the non-dominant limb differed significantly between groups B and C. These training-unrelated differences can be ascribed to athletes’ intrinsic abilities, such as greater sensitivity, a higher number of sensory receptors, better integration of information at the central nervous system level, or more efficient afferent information at the vestibular or visual level.

Neither COP length nor COP area values differed significantly between the dominant and non-dominant limb within the different categories, with the exception of categories E and I, which showed significantly smaller COP area values for the non-dominant limb (Table 2). These differences are probably due to the preferential use of the non-dominant limb, in soccer, to balance the body during most technical movements (e.g., kicking and passing). However, this tendency is not confirmed for all the categories involved in the study.
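For readers who wish to reproduce these posturographic parameters, both quantities can be computed directly from the COP coordinate series. The sketch below, in Python with NumPy, uses invented data; the path-length definition is standard, while the sway-area formula assumes a 95% confidence-ellipse definition, which the present text does not specify.

```python
import numpy as np

def cop_length(x, y):
    """Total COP path length: summed point-to-point Euclidean distances."""
    return float(np.sum(np.hypot(np.diff(x), np.diff(y))))

def cop_area(x, y):
    """Sway area, here approximated by the 95% confidence ellipse of the COP cloud."""
    eigvals = np.linalg.eigvalsh(np.cov(x, y))
    # chi-square quantile for 2 dof at the 95% level is ~5.991
    return float(np.pi * 5.991 * np.sqrt(np.prod(eigvals)))

rng = np.random.default_rng(0)
x, y = rng.normal(0, 2.0, 500), rng.normal(0, 1.0, 500)  # hypothetical COP trace (mm)
print(cop_length(x, y), cop_area(x, y))
```

Both metrics grow with postural sway, which is why smaller values indicate better balance throughout this section.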

Category subjected to comparison | Static balance, COP length (limb showing smaller values; p value) | Static balance, COP area (limb showing smaller values; p value) | Dynamic balance, COP length at 3 s (limb showing smaller values; p value) | Dynamic balance, COP length at 10 s (limb showing smaller values; p value)

Table 2. Comparison of COP length and COP area values (related to both static and dynamic balance tests) between dominant and non dominant (ND) limbs for athletes playing in the same categories or control subjects.

Dynamic balance performance

Dynamic balance performance was evaluated by means of COP length values at 3 and 10 s, respectively, after landing on one foot from a jump task. These values provide information on the ability to recover a stable stance at different time-points. Figure 7 shows the results obtained for the dominant limb. COP length values at 3 and 10 s differ significantly between groups (p<0.01 in both cases). However, as evidenced by Figure 7d, this difference is mainly due to the control subjects, whose dynamic balance performance is markedly worse than that of the soccer players. In fact, if we perform the ANOVA excluding the control group X, we find no significant differences between groups A to L (p = 0.47 and 0.76 for COP length values after 3 s and 10 s, respectively).
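The comparison with and without the control group can be sketched with SciPy's one-way ANOVA; all group values and sample sizes below are invented for illustration, not taken from the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical COP length samples (mm), 12 athletes per group; X is the control group.
groups = {g: rng.normal(300, 30, 12) for g in "ABCDEFGHIL"}
groups["X"] = rng.normal(450, 40, 12)  # controls sway markedly more

f_all, p_all = stats.f_oneway(*groups.values())  # all groups, X included
f_players, p_players = stats.f_oneway(*(v for g, v in groups.items() if g != "X"))
print(f"with controls: p = {p_all:.3g}; players only: p = {p_players:.3g}")
```

With a strongly deviating control group, the full ANOVA is highly significant while the players-only ANOVA need not be, mirroring the pattern reported above.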

Figure 7. Results of the dynamic balance tests, evaluated by means of COP length values calculated 3 and 10 s after jump landing, on the dominant limb.

a) COP length values for the different analyzed groups (average value ± standard deviation), from A to X and for the different time-points. *=p<0.05; b) results of the analysis of variance (ANOVA) and effect size (Cohen’s f2) calculation for the different time-points; c) matrices reporting the p values for the coupled t-tests between the different groups, for the different time-points; d) multiple comparison plots (average values ± 95% CI, with Bonferroni adjustment) for the different time-points.


The same parameters, calculated for the non-dominant limb, show different trends. The results, reported in Figure 8, show that dynamic COP length values differ significantly between the groups at both time-points. These differences are maintained (even if reduced) when the ANOVA is performed excluding the control group X: p values remain smaller than 0.01. Each parameter clearly identifies two macro-groups of athletes, almost corresponding to the division between professional and non-professional soccer players. This conclusion is partly mitigated by the statistical analysis based on the Bonferroni correction (Figure 8d), which does not confirm the differences in athletes’ dynamic balance performance on the non-dominant limb at the 3 s time-point. However, significant differences can still be found between athletes concerning COP length values 10 s after jump landing. In terms of effect size, Cohen’s f2 values calculated for dynamic balance performance are larger for the non-dominant limb, since significant differences are found not only between soccer players and control subjects (as for the dominant limb), but also between professional and non-professional soccer players.
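The effect size used throughout, Cohen's f², can be obtained for a one-way design from the ANOVA sums of squares as η² / (1 − η²). A minimal sketch on synthetic data (all values hypothetical):

```python
import numpy as np

def cohens_f2(groups):
    """Cohen's f^2 for a one-way design: eta^2 / (1 - eta^2)."""
    all_vals = np.concatenate(groups)
    grand_mean = all_vals.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = float(np.sum((all_vals - grand_mean) ** 2))
    eta2 = ss_between / ss_total
    return eta2 / (1.0 - eta2)

rng = np.random.default_rng(2)
weak = [rng.normal(m, 10, 20) for m in (100, 101, 102)]    # small group separation
strong = [rng.normal(m, 10, 20) for m in (100, 115, 130)]  # large group separation
print(cohens_f2(weak), cohens_f2(strong))
```

By Cohen's conventional benchmarks, f² ≈ 0.02 is small, 0.15 medium, and 0.35 large, which is the scale the paper's "medium" and "large" labels refer to.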

Figure 8. Results of the dynamic balance tests, evaluated by means of COP length values calculated 3 and 10 s after jump landing, on the non-dominant limb.

a) COP length values for the different analyzed groups (average value ± standard deviation), from A to X and for the different time-points. *=p<0.05, **=p<0.01; b) results of the analysis of variance (ANOVA) and effect size (Cohen’s f2) calculation for the different time-points; c) matrices reporting the p values for the coupled t-tests between the different groups, for the different time-points; d) multiple comparison plots (average values ± 95% CI, with Bonferroni adjustment) for the different time-points.


The reasons for these differences in dynamic balance performance between professional and non-professional players can be found in the different strategies used to process postural information: it has been demonstrated that non-professional athletes rely more on short-loop information (proprioceptive myotatic and plantar cutaneous), while professional ones rely more on long-loop (vestibular) information [55].

As with the COP length and COP area values of the static balance tests, the dynamic balance results do not differ significantly between the dominant and non-dominant limb (the only significant difference concerns COP length values at 10 s for athletes playing in category H; see Table 2).
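The within-subject limb comparisons summarized in Table 2 amount to paired tests, since both limbs belong to the same athlete. A minimal sketch with SciPy's paired t-test on hypothetical per-athlete values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical per-athlete COP length (mm): each index pairs the two limbs of one athlete.
dominant = rng.normal(300, 25, 15)
non_dominant = dominant + rng.normal(-8, 10, 15)  # ND limb slightly steadier on average

t_stat, p_value = stats.ttest_rel(dominant, non_dominant)  # paired test
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

A paired design is preferable here to an independent-samples test because it removes between-athlete variability from the limb comparison.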

Rapidity performance

Figure 9 reports the results obtained for the subjects’ rapidity. Contact time values differ significantly between the groups (p < 0.01), with a large Cohen’s f2 (~1.32). Two macro-groups of soccer players can be identified, the former constituted by athletes playing in categories A, B and C, the latter by athletes playing in categories D to L. In addition, control subjects differ significantly (p<0.05) from athletes playing in category L.

Figure 9. Results of the rapidity tests, evaluated by assessment of contact time values.

a) contact time values for the different analyzed groups (average value ± standard deviation), from A to X. *=p<0.05; b) results of the analysis of variance (ANOVA) and effect size (Cohen’s f2) calculation; c) matrix reporting the p values for the coupled t-tests between the different groups; d) multiple comparison plot (average values ± 95% CI, with Bonferroni adjustment).


These results confirm the insights of recent studies [56,57] and highlight that only high-level athletes show significantly shorter contact times. Interestingly, we also found that some professional players differ strongly in terms of rapidity: the athletes of groups A, B and C show significantly smaller contact times than those of group D, highlighting that intrinsic factors (e.g., a higher nerve conduction velocity or a different muscle fibre composition) distinguish top-level players among those with the same training frequency.

Reaction times performance

Figure 10 and Figure 11 report the visual and acoustic reaction times, respectively. ANOVA results for both visual (Figure 10b) and acoustic (Figure 11b) reaction times highlight significant differences between the groups (p<0.01), with Cohen’s f2 values of ~0.20 (medium) and ~0.41 (large), respectively. However, the single comparisons between groups (Figure 10c,d and Figure 11c,d) show that no specific macro-groups can be identified with respect to reaction times. These findings contrast with recent reports [20,21], probably because of the small number of subjects analyzed in those studies.
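The pairwise comparisons with Bonferroni adjustment used throughout the figures can be sketched as follows: the significance threshold is divided by the number of tests performed. Group labels and reaction-time values below are hypothetical.

```python
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(4)
groups = {g: rng.normal(250, 30, 12) for g in "ABCD"}  # hypothetical reaction times (ms)

pairs = list(combinations(groups, 2))
alpha_adj = 0.05 / len(pairs)  # Bonferroni: split the alpha across all pairwise tests
for a, b in pairs:
    _, p = stats.ttest_ind(groups[a], groups[b])
    print(f"{a} vs {b}: p = {p:.3f}, significant = {p < alpha_adj}")
```

With many groups the adjusted threshold becomes very strict, which is one reason a significant omnibus ANOVA (as here) can coexist with no significant pairwise differences.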

Figure 10. Results of the visual reaction times tests.

a) reaction time values for the different analyzed groups (average value ± standard deviation), from A to X; b) results of the analysis of variance (ANOVA) and effect size (Cohen’s f2) calculation; c) matrix reporting the p values for the coupled t-tests between the different groups; d) multiple comparison plot (average values ± 95% CI, with Bonferroni adjustment).


Figure 11. Results of the acoustic reaction times tests.

a) reaction time values for the different analyzed groups (average value ± standard deviation), from A to X; b) results of the analysis of variance (ANOVA) and effect size (Cohen’s f2) calculation; c) matrix reporting the p values for the coupled t-tests between the different groups; d) multiple comparison plot (average values ± 95% CI, with Bonferroni adjustment).


Non-parametric ANOVAs coupled with resampling method

Figure 12
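A non-parametric, resampling-based one-way ANOVA of the kind this section announces can be sketched as a permutation test: the observed F statistic is compared against F values obtained after repeatedly shuffling group labels, so no normality assumption is needed. The data below are synthetic, and the details of the paper's own procedure are not restated here.

```python
import numpy as np
from scipy import stats

def permutation_anova(groups, n_perm=2000, seed=0):
    """One-way ANOVA p value from a label-permutation null distribution,
    avoiding the normality assumption of the parametric F test."""
    rng = np.random.default_rng(seed)
    f_obs, _ = stats.f_oneway(*groups)
    pooled = np.concatenate(groups)
    split_at = np.cumsum([len(g) for g in groups])[:-1]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # resample group labels
        f_perm, _ = stats.f_oneway(*np.split(pooled, split_at))
        if f_perm >= f_obs:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one correction

rng = np.random.default_rng(5)
a, b = rng.normal(0.0, 1.0, 20), rng.normal(2.0, 1.0, 20)  # synthetic, clearly separated
print(permutation_anova([a, b]))
```

The add-one correction keeps the reported p value strictly positive, which is standard practice for permutation tests with a finite number of resamples.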
