
The Stroop Color and Word Test

Affiliations.

  • 1 "Rita Levi Montalcini" Department of Neuroscience, University of Turin, Turin, Italy.
  • 2 IRCCS Istituto Auxologico Italiano, Ospedale San Giuseppe, Piancavallo, Italy.
  • 3 CiMeC Center for the Mind/Brain Sciences, University of Trento, Rovereto, Italy.
  • PMID: 28446889
  • PMCID: PMC5388755
  • DOI: 10.3389/fpsyg.2017.00557

The Stroop Color and Word Test (SCWT) is a neuropsychological test extensively used to assess the ability to inhibit cognitive interference, which occurs when the processing of one stimulus feature impedes the simultaneous processing of a second stimulus attribute. This phenomenon is known as the Stroop effect. The aim of the present work is to verify the theoretical adequacy of the various scoring methods used to measure the Stroop effect. We present a systematic review of studies that have provided normative data for the SCWT. We searched both electronic databases (i.e., PubMed, Scopus, Google Scholar) and citations. Our findings show that, while several scoring methods have been reported in the literature, none of the reviewed methods enables us to fully assess the Stroop effect. Furthermore, we discuss several normative scoring methods used in Italy, as reported in the literature. We argue for an alternative scoring method that takes into consideration both the speed and the accuracy of the response. Finally, we underline the importance of assessing performance in all Stroop test conditions (word reading, color naming, named color-word).

Keywords: executive functions; inhibition; neuropsychological assessment; Stroop Color and Word Test; systematic review.



Construct Validity of the Stroop Color-Word Test: Influence of Speed of Visual Search, Verbal Fluency, Working Memory, Cognitive Flexibility, and Conflict Monitoring


José A Periáñez, Genny Lubrini, Ana García-Gutiérrez, Marcos Ríos-Lago, Construct Validity of the Stroop Color-Word Test: Influence of Speed of Visual Search, Verbal Fluency, Working Memory, Cognitive Flexibility, and Conflict Monitoring, Archives of Clinical Neuropsychology, Volume 36, Issue 1, February 2021, Pages 99–111, https://doi.org/10.1093/arclin/acaa034


Eighty-five years after the description of the Stroop interference effect, there is still a lack of consensus regarding the cognitive constructs underlying scores from standardized versions of the test. The present work aimed to clarify the cognitive mechanisms underlying direct scores (word-reading, color-naming, and color-word) and derived scores (interference, difference, ratio, and relative scores) from Golden’s standardized version of the test.

After a comprehensive review of the literature, five cognitive processes were selected for analysis: speed of visual search, phonemic verbal fluency, working memory, cognitive flexibility, and conflict monitoring. These constructs were operationalized by scoring five cognitive tasks (WAIS-IV Digit Symbol, phonemic verbal fluency [letter A], WAIS-IV Digit Span, TMT B-A, and reaction times to the incongruent condition of a computerized Stroop task, respectively). Eighty-three healthy individuals (mean age = 25.2 years) participated in the study. Correlation and regression analyses were used to clarify the contribution of the five cognitive processes to the prediction of Stroop scores.

Data analyses revealed that Stroop word-reading reflected speed of visual search. Stroop color-naming reflected working memory and speed of visual search. Stroop color-word reflected working memory, conflict monitoring, and speed of visual search. Whereas the interference score was predicted by both conflict monitoring and working memory, the ratio score (color-word divided by color-naming) was predicted by conflict monitoring alone.

The present results will help neuropsychologists to interpret altered patient scores in terms of a failure of the cognitive mechanisms detailed here, benefitting from the solid background of preceding experimental work.

The Stroop color-word test is considered one of the gold standards of attentional measures and is one of the most widely used instruments in clinical and experimental neuropsychological settings (Strauss, Sherman, & Spreen, 2006). Among the different standardized versions, the test proposed by Golden (1978) is one of the most extensive, owing to its relatively large number of specific norms for individuals from different sociodemographic conditions and cultures (e.g., Lubrini et al., 2014; Strauss et al., 2006). This version features a three-page test booklet. On the first page, the words “red,” “green,” and “blue” are printed in black ink and repeated randomly in columns (henceforth Stroop word-reading or SWR). On the second page, the item “XXXX” appears repeatedly in columns, printed in red, green, or blue ink (henceforth Stroop color-naming or SCN). On the third page (referred to as the interference page, henceforth Stroop color-word or SCW), the words “red,” “green,” and “blue” are printed in red, green, or blue ink, but the words and the colors in which they are printed never match. The subject must look at each page and move down the columns, reading words or naming the ink colors as quickly as possible.

The test yields three direct scores, based on the number of items completed on each of the three stimulus sheets in 45 s. In addition, four derived scores can be calculated: interference (SCW − [(SWR*SCN)/(SWR + SCN)]; Golden & Freshwater, 2002), difference (SCN − SCW), ratio (SCW/SCN), and relative ([(SCN − SCW)/SCN]*100) (Lansbergen, Kenemans, & van Engeland, 2007; Scarpina & Tagini, 2017; Sisco, Slonena, Okun, Bowers, & Price, 2016).

However, current handbooks of neuropsychological assessment still assert that scores measured in core Stroop test conditions, such as SCW, have only marginal/acceptable reliability and should not be used as the basis of diagnostic decisions without supplementation by other data (Strauss et al., 2006). Clarifying this point is a central concern when using the Stroop test in experimental and clinical settings. In particular, it is crucial that the clinician has robust validation evidence regarding which cognitive operations underlie the scores provided by standardized neuropsychological measures. Whereas most prior studies agree that the Stroop test has a complex and multifactorial structure comprising several cognitive mechanisms, there is a lack of consensus about their exact nature and about their relative contribution to Stroop test scores. Table 1 presents an overview of 15 studies that have provided information useful to clarify the processes underlying Golden’s version of the Stroop test (Golden, 1978).
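As an illustration only, the four derived scores quoted in the introduction can be computed from the three direct scores with a few lines of Python. This is a minimal sketch: the class name `StroopDirectScores`, the function name `derived_scores`, and the example values are hypothetical and do not come from the article, and the sketch works on raw item counts rather than the age-corrected or T scores that the published norms may require.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StroopDirectScores:
    """Items completed in 45 s on each page (hypothetical container)."""
    swr: float  # Stroop word-reading
    scn: float  # Stroop color-naming
    scw: float  # Stroop color-word


def derived_scores(s: StroopDirectScores) -> dict:
    """Apply the four derived-score formulas as written in the text."""
    # Predicted color-word score from word-reading and color-naming.
    predicted_cw = (s.swr * s.scn) / (s.swr + s.scn)
    return {
        "interference": s.scw - predicted_cw,       # SCW - [(SWR*SCN)/(SWR+SCN)]
        "difference": s.scn - s.scw,                # SCN - SCW
        "ratio": s.scw / s.scn,                     # SCW / SCN
        "relative": (s.scn - s.scw) / s.scn * 100,  # [(SCN - SCW)/SCN] * 100
    }


# Made-up raw scores, chosen only to keep the arithmetic easy to follow.
example = StroopDirectScores(swr=100, scn=75, scw=45)
scores = derived_scores(example)
print(scores)
```

With these made-up inputs the predicted color-word score is 7500/175, so the interference score is about 2.14, the difference score is 30, the ratio score is 0.6, and the relative score is 40.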

Table 1. Overview of studies of relevance to Stroop construct validity

| Authors (year) | Sample (n) | Stroop scores | Other cognitive scores | Statistical analyses | Results | Conclusions and implications |
| --- | --- | --- | --- | --- | --- | --- |
| Shum et al. (1990) | Healthy young and middle-age controls, and traumatic brain injury patients (n=170) | DS | Letter Cancellation; Serial Subtraction; Digit Span; DigSym; TMT-A, TMT-B; SDMT; Knox Cube | PCA | IS: loaded .59 in a Sustained Selective Processing factor (together with Serial Subtraction scores), and .38 in a Visuo-motor Scanning factor (together with DigSym, Letter Cancellation, SDMT, TMT-A, and TMT-B). | Not related to Stroop test validity. |
| Cox et al. (1997) | Parents of children with learning disabilities (n=306) | IS (T score) | WCST (Failures to maintain set, Pers. Resp., Categories); Test of variables of attention, TOVA (Commission errors); Word Fluency (Semantic, Phonetic); SCN, SWR, SCW; Selective Reminding Instructions; Rey Complex Figure (Copy Organization) | Correlation | IS: correlates with TOVA Commission errors (r > -.32), SWR (-.37), SCW (.61), and Rey Copy Organization (.34) only in subjects with high reading automaticity. | IS reflects the ability to inhibit an automatic response pattern. |
| Lanham et al. (1999) | Traumatic brain injury (n=622) | SWR, SCN, SCW, IS | Visual processing; Verbal Learning and Memory; Attention; WCST (Concept formation); Verbal Fluency; Language; BNT (Paraphasic Errors) | Correlation; PCA | SWR, SCN, and SCW loaded .62, .60 and .61 on a “Verbal Fluency” factor. IS: no significant relationship with severity of injury measures. | SWR, SCN, and SCW reflect Verbal Fluency. IS has a doubtful clinical interest. |
| Spikman et al. (2001) | Healthy young and old controls (n=60) (traumatic brain injury n=60 not analysed here) | SCW | TMT-A, TMT-B; RT Distraction Task; PASAT-5; RT Dual Task; 15 Words Test (LOC score); MCST (PERSREL score) | PCA | Healthy controls: SCW loaded .78 in a “Control” or Memory-driven Action component (together with PASAT-5, TMT-B, LOC score, and PERSREL), and .45 in a “Speed” or Stimulus-driven Reaction component (together with RT Dual task, and RT Distraction task). | Not related to Stroop test validity. |
| Bondi et al. (2002) | Healthy controls (n=51) (probable Alzheimer's disease n=59, not analysed here) | SWR, SCN, SCW | BNT; COWAT (Letter Fluency); TMT-A, TMT-B; WCST (Cat., Pers. Err.); WAIS-R (Digit Span, DigSym, Vocabulary); WISC (Block design); WMS (delayed and immediate recall) | PCA | Healthy controls: SWR and SCN both loaded .70 in a “speed of visual processing” factor (together with TMT-A and TMT-B, DigSym, WISC block design). SCW: loaded .55 in a “semantic knowledge and verbal processing speed” factor (together with the BNT, COWAT, Digit Span, and WAIS–R vocabulary). | SWR and SCN reflect information processing speed. SCW reflects verbal processing speed. Results also revealed a different factor structure for healthy participants and AD patients. |
| Ríos et al. (2004) | Traumatic brain injury patients (n=29); healthy controls (n=30) | SCN, SWR, SCW, IS | WCST (Pers. Err., Pers. Resp., Incorrect Resp., Correct Resp., Non-Pers. Err.); TMT-A, TMT-B, B:A | PCA | SWR and SCN loaded .91 and .87 in a “speed factor” (together with TMT-A, TMT-B, and SCW). SCW: loaded .72 on “speed” and .58 on the “interference control” factor (together with IS). IS: loaded .98 on “interference control” factor (together with SCW). | SWR and SCN tap on ‘speed of processing’. SCW taps on ‘speed of processing’ and ‘interference control’. IS provides an indicator of ‘interference control’. |
| Chaytor et al. (2006) | Neurological adult patients (n=46) | SCW, IS | WCST (% Pers. Err.); TMT-A, TMT-B, B-A; COWAT | Correlations | SCW: correlates with TMT-B, B-A, IS, COWAT, WCST % Pers. Err. (r > .35). IS: only correlates with SCW (.36). | Not related to Stroop validity. |
|  | Children with autism spectrum disorders (n=18); healthy controls (23 biological siblings of children with autism spectrum disorder, and 25 non-sibling controls) | SWR, SCN, SCW | Stroop computerized task (RTs and errors in neutral, inhibitory trials); Flanker task (RTs and errors in neutral, inhibitory trials); Go-no go task (RTs and errors in go trials and errors in no go trials) | Group differences | Patients and controls showed no differences in any Stroop score, either computerized or paper and pencil. Patients and controls differed in RTs from the inhibitory trials of the Flanker task. Patients and controls differed in errors from the go trials of the go-no go task. | Stroop, Flanker, and Go-no go tasks are measuring different aspects of inhibitory control. |
| Protopapas et al. (2007) | Healthy children (n=156) | SCN, SCW, DS | Reading skills (10 scores); Raven’s SPM; WISC-III (Arithmetic, Digit Span) | Correlation, PCA | SCN: correlates r>.39 with time in reading skills (pseudoword, word, and text reading). SCW: correlates r>.32 with time and errors in reading skills (pseudoword, word, and text reading, and with word and text spelling), and r= -.34 with Digit Span. IS (SCW-SCN): correlates r>.30 with time and errors in reading skills (pseudoword, word, and text reading, and with word and text spelling). SCN and IS (SCW-SCN) loaded .51 and -.47 in a single “reading speed” factor (together with pseudowords, words, and text read times, and text read errors). | There is a direct link between reading skills and Stroop interference, beyond the effects of executive functioning. |
| Sánchez-Cubillo et al. (2009) | Healthy controls (n=41) | SCW | DigSym; Finger Tapping; DFor, DBack; WCST computer version (RT Switch Cost); TMT-A, TMT-B, B-A, B:A; Log B:A | Correlation; Regression | SCW: correlates with DigSym (.48), DBack (.34), TMT-A (.34), TMT-B (.38), B-A (.31). | Not related to Stroop validity. |
| Heflin et al. (2011) | Patients with mild cognitive impairment or dementia (n=112) | SCN, SCW | NPI Disinhibition | Correlation; Regression | SCN: correlates -.26 with NPI Disinhibition. SCW: correlates -.22 with NPI Disinhibition. SCN predicts NPI Disinhibition (r =.07), after controlling for SCW, Mini Mental State Examination, and age. Neither SCW correct responses nor SCW errors predict NPI Disinhibition (r =.01), after controlling for colour naming, MMSE, and age. | SCW showed no association with behavioural disinhibition. Indeed, SCN showed a stronger relationship with behavioural disinhibition than did SCW. |
| Adrover-Roig et al. (2012) | Healthy middle-age and elderly controls (n=122) | SWR, SCW | TMT-A, TMT-B, TMT B/A; DFor, DBack; Rey’s Complex Figure; BNT; COWAT; Semantic Fluency; Brixton (errors); WCST modified version (4 scores); DigSym | Latent variable analysis (LISREL) | SWR: loaded .43 on “Speed” latent factor (together with DigSym and TMT-A). SCW: loaded .69 on “Working Memory” latent factor. | Not related to Stroop validity. |
| Llinàs-Reglà et al. (2015) | Healthy controls (N=1923) | SCW | Symbol Digit; Finger Tapping (MacQuarrie Test for Mechanical Ability); DFor, DBack; TMT (Log A, Log B, Log Dif., Log Ratio); Verbal Fluency (phonemic and semantic) | Correlations; Factorial analysis; Regression | SCW: correlated with Symbol Digit (.58), Log TMTB (-.54), Finger Tapping (.51), Log TMTA (-.46), Log TMT Dif (-.44), verbal fluency-semantic (.39), DigBack (.38), verbal fluency-phonemic (.35), DigFor (.32), Log TMT Ratio (-.16). SCW loaded > .72 in speed-related factors of the four factorial analyses performed. Log TMTB, Log TMT Dif, and Log TMT Ratio were predicted by SCW. | Not related to Stroop validity. |
| Sisco et al. (2016) | Patients with Parkinson disease (N=58) and non-Parkinson disease age-matched peers (N=68) | IS, DS, RelS, RatS, Residualized | Digit Symbol; Symbol Search; TMTA, TMTB, ratio; COWAT; WCST (Cat.) | Correlations | IS: did not correlate with any other cognitive score (r < .21, in all cases). DS: correlated with Digit Symbol (.34), Symbol Search (.25), and TMTA (.20). RelS: correlated with TMT ratio (.27). RatS: correlated with TMT ratio (.39) and WCST Cat. (-.27). Residualized: TMT ratio (.27). | IS did not correlate with any measures of processing speed or executive function. DS correlated significantly with standardized processing speed but not executive function measures. RelS, RatS, and Residualized scores correlated with executive function but not processing speed measures. |
| Kluttz & Golden (2016) | Patients with neurological disorder (50%), mood disorder (40%), and not specified (10%; n=648) | IS (T score) | Time (T score) and errors in TMT-A and TMT-B; WCST (Trials, Pers. Err., Cat., Trial to complete 1st cat.; % conceptual level; Correct Resp.; Non-Pers. Err., Total Err., Failures to Maintain Set, and Learning to Learn) | Regression | IS: only WCST total err. (β= ± .24) minimally predicted IS. | IS is related to inhibiting task irrelevant information. |

Note: TMT = Trail Making Test, WAIS-R = Wechsler Adult Intelligence Scale-Revised, WMS = Wechsler Memory Scale, WISC = Wechsler Intelligence Scale for Children, DigSym = WAIS-III Digit Symbol, FingT = Finger tapping, DFor = WAIS-III Digit Forward, DBack = WAIS-III Digit Backward, SCN = Stroop color-naming, SWR = Stroop word-reading, SCW = Stroop color-word, IS = Stroop interference score (SCW − [(SWR*SCN)/(SWR + SCN)]), DS = Stroop difference score (SCW − SCN), RelS = Stroop relative score ([(SCN − SCW)/SCN]*100), RatS = Stroop ratio score (SCW/SCN), BNT = Boston Naming Test, MMSE = Mini Mental State Examination, WCST = Wisconsin Card Sorting Test, MCST = Modified Card Sorting Test, COWAT = Controlled Oral Word Association Test, CVLT = California Verbal Learning Test, PASAT = Paced Auditory Serial Addition Test, CPT = Continuous Performance Test, SDMT = Symbol Digit Modalities Test, NPI = Neuropsychiatric Inventory, and PCA = Principal components analysis.

SWR and SCN have largely been related to verbal fluency and speed of processing. Beyond basic language skills such as verbal fluency (Lanham, Vanderploeg, & Curtiss, 1999) or reading skills (Protopapas, Archonti, & Skaloumbakas, 2007), the vertical arrangement of SWR and SCN stimuli in Golden’s version (100 words and 100 colored “XXXX” items, each organized in five columns of 20) appears to involve visual scanning abilities. In this regard, the association of both SWR and SCN with neuropsychological tests that have analogous visual scanning demands, such as TMT-A, TMT-B, Digit Symbol (WISC-III), or block design (WISC-III), supports the interpretation of SWR and SCN as useful measures of “speed of visual processing” (Adrover-Roig, Sesé, Barceló, & Palmer, 2012; Bondi et al., 2002; Ríos, Periáñez, & Muñoz-Céspedes, 2004). However, it must be noted that only four of the 15 reviewed studies focused on the SWR score, with SCN and SCW attracting the most interest.

The psychological constructs related to the SCW score included cognitive control ( Spikman, Kiers, Deelman, & van Zomeren, 2001 ), phonemic and semantic verbal fluency (COWAT; Bondi et al., 2002 ; Chaytor, Schmitter-Edgecombe, & Burr, 2006 ; Lanham et al., 1999 ; Llinàs-Reglà et al., 2015 ), processing speed (Symbol Digit, TMT-A, TMT-B, Finger tapping; Chaytor et al., 2006 ; Llinàs-Reglà et al., 2015 ; Ríos et al., 2004 ; Sánchez-Cubillo et al., 2009 ; Spikman et al., 2001 ), interference control ( Ríos et al., 2004 ), behavioral disinhibition (NPI-Disinhibition, Heflin et al., 2011 ), cognitive flexibility (WCST-Perseverative errors, TMT-B, B-A, Chaytor et al., 2006 ; Sánchez-Cubillo et al., 2009 ), or working memory (Digit Forwards, Digit Backwards; Adrover-Roig et al., 2012 ; Llinàs-Reglà et al., 2015 ; Protopapas et al., 2007 ; Sánchez-Cubillo et al., 2009 ). However, some of the aforementioned associations are open to question, given that many of these neuropsychological measures require more than a single cognitive ability. For example, the association between SCW and factors like cognitive flexibility, based on its correlation with the TMT-B score ( Chaytor et al., 2006 ), disappeared after controlling for visual search and perceptual speed factors in a multiple regression analysis, which raises questions regarding the involvement of this mechanism in SCW performance ( Sánchez-Cubillo et al., 2009 ). Also, the association between SCW and verbal fluency, as measured by FAS, may be more related to shared executive control abilities such as working memory or shifting/updating than to linguistic skills, as revealed by correlations between FAS and other tasks involving working memory ( Aita et al., 2018 ).

The psychological constructs related to derived test scores are much less clear. For instance, two studies associated Golden’s interference score with response inhibition (commission errors in TOVA, WCST total errors; Cox et al., 1997; Kluttz & Golden, 2016) and one more with executive functions (Sisco et al., 2016). Also, the difference score has been related to sustained selective processing (Serial Subtraction task), visuomotor scanning (Digit Symbol, Letter Cancelation, SDMT, TMT-A, TMT-B; Shum, McFarland, Bain, & Humphreys, 1990), and reading skills (Protopapas et al., 2007). Only the study by Sisco and colleagues (2016) reported data about a relationship of the ratio and relative scores with other executive function measures. In particular, whereas the Stroop ratio score correlated with both the TMT B:A ratio score (TMT-B divided by TMT-A) and WCST completed categories, the relative score correlated only with the TMT B:A ratio score. However, given the relatively scant available data, it is difficult to draw any robust conclusion for validity purposes.

In addition to the analysis of prior neuropsychological evidence, the study of the association between the Stroop pencil-and-paper test and computerized versions may help to clarify the underlying cognitive processes for at least two reasons. First, computerized versions reduce many of the cognitive demands associated with the standard test. For instance, Stroop stimuli are presented one by one in the center of a screen, minimizing both visual search demands and the potential interference from flanking words when targets are arranged in columns. Also, responses in computerized tasks are usually made by pressing a button, which reduces certain verbal demands. In spite of this, the interference effect is still present in computerized versions (MacLeod & MacDonald, 2000). Second, computerized Stroop tasks have accumulated a large body of behavioral, physiological, and neuroimaging data supporting an association between the Stroop reaction time (RT) interference effect and a more specific executive control mechanism, that is, conflict monitoring, or the operation of a system that detects the occurrence of conflicts in information processing and then passes information on to centers responsible for control (e.g., Botvinick, Braver, Barch, Carter, & Cohen, 2001; Botvinick, Cohen, & Carter, 2004). However, as mentioned previously, only a few studies have analyzed the association between computerized Stroop tasks and neuropsychological data from pencil-and-paper versions, and they have provided contradictory results (Kindt, Bierman, & Brosschot, 1996; Penner et al., 2012). This is a relevant issue given the theoretical and applied interest of building a bridge between these two bodies of evidence.

The present study aimed to improve the existing knowledge about the construct validity of Stroop test scores (Golden, 1978). Specifically, there are discrepancies regarding the involvement of some of these cognitive abilities in Stroop scores. There is also a lack of consensus regarding the terminology used to refer to cognitive constructs. In addition, there has been little effort to connect theoretical advances from the experimental literature with existing neuropsychological data about the standardized version of the Stroop task. At least some of these problems can be attributed to the widespread use of bivariate correlational methodologies in validation studies, which makes it difficult to identify the relative contribution of different cognitive processes to the relationship observed between two scores. Alternatively, the use of multiple regression provides specific data regarding the joint and unique contribution of the different predictors to the variance of a given criterion score. In an attempt to overcome these limitations, in the present work, five well-known cognitive variables previously associated with Stroop test performance (i.e., processing speed, verbal fluency, working memory, cognitive flexibility, and conflict monitoring) were introduced in a multiple regression model as predictors of direct and derived scores from Golden’s version of the Stroop test.

Participants

A sample of 83 healthy adults (mean ± SE age = 25.2 ± 1.1; range = 18–64; mean ± SE years of education = 13.6 ± .3; range = 8–24; 61 females) took part in this study. Participants were recruited as volunteers from university courses, university staff, and health care centers. Each participant provided a self-reported history of medical and psychiatric problems. Exclusion criteria were a history of neurological disease, psychiatric illness, head injury, stroke, substance abuse (excluding nicotine), learning disabilities, or any other difficulty that could interfere with testing. All participants had normal or corrected-to-normal vision.

Instruments and Procedure

A neuropsychological examination was conducted by experienced psychologists in one session that included a brief interview, standardized neuropsychological testing, and computerized testing. This study was completed in compliance with institutional research standards for human research and in accordance with the Declaration of Helsinki.

Stroop test: The Spanish adaptation of the Stroop test (Golden, 1994) was used. The number of correct responses in 45 s in the word-reading (SWR), color-naming (SCN), and color-word (SCW) conditions was recorded. The examiner indicated the errors, and participants were asked to correct them before continuing. Four derived scores were also calculated. First, the interference score proposed by Golden and Freshwater (2002) represents the difference between the SCW score and SCW’, where SCW’ equals (SWR*SCN)/(SWR + SCN). Second, a difference score represents the difference between the SCN and SCW conditions (SCN − SCW). Third, a ratio score was calculated by dividing the SCW score by the SCN score. Lastly, a relative score was calculated according to the formula [(SCN − SCW)/SCN]*100 (see a review of Stroop interference derived scores in Lansbergen et al., 2007, and Sisco et al., 2016). Evidence of comparability between the Stroop test performance of English and Spanish monolinguals has been shown in preceding crosslinguistic studies, which revealed no differences in the scores between samples (Rosselli et al., 2002b).
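As a minimal sketch of the four derived scores defined above, the following Python function computes them from the three direct scores. The class and function names are illustrative assumptions and are not part of the original procedure. The example input uses the sample means reported in Table 2, so the result describes a hypothetical participant scoring exactly at those means, not the mean of the individual derived scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StroopDerived:
    """Derived Stroop scores (names are illustrative, not from the source)."""
    interference: float  # IS   = SCW - (SWR*SCN)/(SWR + SCN)
    difference: float    # DS   = SCN - SCW
    ratio: float         # RatS = SCW / SCN
    relative: float      # RelS = ((SCN - SCW) / SCN) * 100


def derived_scores(swr: float, scn: float, scw: float) -> StroopDerived:
    """Compute the four derived scores from the direct scores.

    swr, scn, scw are the numbers of correct responses in 45 s in the
    word-reading, color-naming, and color-word conditions.
    """
    if scn <= 0 or swr + scn <= 0:
        raise ValueError("SCN and SWR + SCN must be positive")
    predicted_scw = (swr * scn) / (swr + scn)  # SCW' in the text
    return StroopDerived(
        interference=scw - predicted_scw,
        difference=scn - scw,
        ratio=scw / scn,
        relative=(scn - scw) / scn * 100,
    )


# Hypothetical participant scoring at the sample means of Table 2.
example = derived_scores(swr=112.8, scn=77.2, scw=50.1)
print(example)
```

Because the interference score is a nonlinear function of SWR and SCN, applying the formula to mean direct scores need not reproduce the mean interference score of the sample.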

Digit Symbol subtest (WAIS-IV): Speed of visual search was assessed using the Digit Symbol subtest from the Spanish adaptation of the WAIS-IV ( Wechsler, 2012 ). This score has shown the highest load in the processing speed factor as described in the WAIS-IV construct validity data ( Wechsler, 2012 ). The score in the test (number of symbols correctly encoded in 2 min) was considered the variable for analyses.

Digit Span (WAIS-IV): The Digit Span subtest from the Spanish adaptation of the WAIS-IV (Wechsler, 2012) was used to assess working memory. This test was selected because it has shown the highest load in the working memory factor as described in the WAIS-IV construct validity data (Wechsler, 2012). Validation studies have shown that it loaded together with other verbal working memory tasks on working memory latent variables (Kane et al., 2004). The Digit Span score (the sum of the number of correctly recalled items in both Digit Forward and Digit Backward tasks) was recorded as the variable for analysis.

Trail Making Test difference score (TMT B-A): Cognitive flexibility was assessed using the TMT B-A difference score (TMT-B minus TMT-A). Parts A and B of the Trail Making Test were administered to participants according to the guidelines presented by Strauss and colleagues (2006). Prior TMT validation studies have suggested that, compared with alternative derived scores such as the TMT B:A ratio score (TMT-B divided by TMT-A), the TMT B-A difference score represents a relatively pure measure of cognitive flexibility/task-switching abilities on the basis of its association with the behavioral switch cost as measured in a computerized WCST-like paradigm (Sánchez-Cubillo et al., 2009).

Phonemic verbal fluency (letter A): According to the guidelines presented by Strauss and colleagues (2006), participants were asked to produce as many words beginning with the letter “A” as quickly as possible during a period of 1 min, excluding proper names and the same word with a different ending. The number of admissible different words produced in the selected phonemic category was the score considered as the variable for analysis (Fluency A), given that the letter A loads more strongly on verbal fluency latent variables than alternative letters (Whiteside et al., 2016). Evidence of comparability between English and Spanish monolinguals in the phonemic verbal fluency (letter A) score has been shown in crosslinguistic studies, which revealed no differences between samples (Rosselli et al., 2002a).

Computerized Stroop task: This task, inspired by experimental paradigms used in previous behavioral and brain activation studies (Mead et al., 2002; Penner et al., 2012; Swick & Jovanovic, 2002), was used to measure conflict monitoring (Botvinick et al., 2001; Botvinick et al., 2004). Participants were instructed to name the ink color of the word appearing in the center of a computer screen as quickly and accurately as possible. Given that the Stroop interference effect is affected both by the proportion of congruent and incongruent trials (being smaller when incongruent trials are frequent) and by the duration of the interstimulus interval (disappearing at very short intervals; De Jong, Berendsen, & Cools, 1999), both parameters were adjusted to better capture the interference effect. Accordingly, the task consisted of 144 trials from three randomly intermixed experimental conditions (48 trials each): congruent color-words, incongruent color-words, and color-neutral words. In the congruent condition, three color-words (red, green, and blue) appeared in a congruent ink color. In the incongruent condition, the same three color-words appeared in an incongruent ink color (red, green, or blue). Lastly, in the color-neutral condition, two Spanish non-color-words (the Spanish words for “glasses” and “table”) appeared in one of three ink colors (red, green, or blue). Responses were made by pressing one of three buttons on a computer keyboard with the index finger of the dominant hand, in an array corresponding to the three possible ink colors (red, green, and blue). The stimulus duration was 150 ms. A long interstimulus interval of 2400 ms was established. A practice block of 18 trials (6 in each condition) was administered to the participants before the task. Given the already described problems when selecting control conditions to estimate the interference effect by means of subtraction methods (either when using neutral or congruent conditions; Regan, 1978; MacLeod, 1991), RTs in the incongruent condition (Incongruent RT) were used as the dependent measure for analysis. The use of this variable in a regression context provides a better measure of interference effects, because commonalities between the different predictors can be estimated and controlled (see Data Analyses).

Data Analyses

A set of exploratory correlation analyses helped to describe the pattern of relationships within Stroop test variables and those between Stroop test variables and the remaining cognitive scores. Stepwise multiple regression analyses were performed for each Stroop test score using the five selected cognitive scores as potential predictors (i.e., Digit Symbol, Digit Span, Fluency A, TMT B-A, and Incongruent RT). In order to estimate and control the potential effect of age and education on the relationships between predictors and criterion scores, these two variables were also introduced in the regression models together with the five cognitive scores. Standard criteria were applied for a predictor variable to be included (probability of F < .05) or excluded (probability of F > .1) from each model. Normality, linearity of residuals, absence of multicollinearity, and independence of errors were assessed prior to all regression analyses. A priori contrasts were used in all statistical comparisons with an uncorrected significance level of p < .05, given that our variable selection derived from a review of studies that had already demonstrated a relationship between scores. The SPSS v.22.0 statistical software package was used to perform analyses. G*Power statistical software (Faul, Erdfelder, Buchner, & Lang, 2009) was used to estimate both the effect sizes (f²) and the statistical power of the analyses.

Descriptive statistics of all scores are shown in Table 2 .

Descriptive statistics

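Stroop scores: SWR, SCN, SCW, IS, DS, RatS, RelS. Other cognitive measures: DigSym, Fluency A, Digit Span, TMT B-A, IncongRT.

| | SWR | SCN | SCW | IS | DS | RatS | RelS | DigSym | Fluency A | Digit Span | TMT B-A | IncongRT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N | 83 | 83 | 83 | 83 | 83 | 83 | 83 | 83 | 83 | 83 | 83 | 83 |
| Mean | 112.8 | 77.2 | 50.1 | 4.4 | 27.1 | .65 | 35.1 | 89.2 | 13 | 16 | 25.9 | 513.4 |
| SE | 1.6 | 1.1 | 1.3 | 1.1 | 1.1 | .01 | 1.4 | 1.5 | .5 | 3 | 1.7 | 12.6 |
| Min-Max | 85–155 | 53–99 | 21–80 | −19–30.6 | 1–48 | .3–.9 | 1.3–68 | 51–126 | 5–28 | 7–24 | −19–93 | 27–943 |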

Note: SWR = Stroop word-reading; SCN = Stroop color-naming; SCW = Stroop color-word; IS (Stroop interference score = SCW − [(SWR*SCN)/(SWR + SCN)]); DS (Stroop difference score = SCN − SCW); RatS (Stroop ratio score = SCW/SCN); RelS (Stroop relative score = [(SCN − SCW)/SCN]*100); DigSym = WAIS-IV Digit Symbol; Fluency A (phonemic verbal fluency task with the letter A); Digit Span = WAIS-IV Digit Forward + Digit Backward; TMT B-A (TMT difference score = TMT-B − TMT-A); IncongRT (reaction time in the incongruent condition of a computerized Stroop task); and SE (standard error of measurement).

Exploratory Correlation Analyses

Pearson intercorrelation coefficients between Stroop scores and other cognitive measures are shown in Table 3. All direct scores correlated significantly with each other. Whereas all derived scores had significant correlations with each other and with the SCW condition, only the difference score was also related to the SCN score (see Table 3). With regard to the remaining cognitive scores, SWR correlated with Digit Symbol. The SCN score correlated with Digit Symbol, Digit Span, and Incongruent RT. SCW correlated with Digit Symbol, Digit Span, Incongruent RT, and marginally (p = .056) with the Fluency A score. The interference score correlated significantly with Digit Symbol, Digit Span, and Incongruent RT, whereas the ratio and relative scores only correlated with Incongruent RT. The difference score showed no relation with any of the considered cognitive variables. The linear dependency between the ratio and relative scores led us to exclude the latter from subsequent analyses.
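The linear dependency between the ratio and relative scores follows directly from their definitions, since RelS = 100 × (1 − RatS). A minimal sketch, using hypothetical (SCN, SCW) pairs that are not data from this study and an illustrative helper named pearson_r, shows that their Pearson correlation is −1 up to floating-point error:

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical (SCN, SCW) pairs, for illustration only.
pairs = [(77, 50), (85, 62), (60, 30), (92, 41), (70, 55)]

ratio = [scw / scn for scn, scw in pairs]                  # RatS = SCW / SCN
relative = [(scn - scw) / scn * 100 for scn, scw in pairs]  # RelS

r = pearson_r(ratio, relative)
print(f"r(RatS, RelS) = {r:.6f}")
```

Any predictor therefore relates to the two scores with coefficients of equal size and opposite sign, so keeping both in the analyses would add no information.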

Correlation matrix

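| | SWR | SCN | SCW | IS | DS | RatS | RelS | DigSym | Fluency A | Digit Span | TMT B-A | IncongRT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SWR | 1 | | | | | | | | | | | |
| SCN | .55 | 1 | | | | | | | | | | |
| SCW | .38 | .55 | 1 | | | | | | | | | |
| IS | .01 | .16 | .90 | 1 | | | | | | | | |
| DS | .10 | .35 | −.59 | −.85 | 1 | | | | | | | |
| RatS | .10 | .01 | .83 | .96 | −.93 | 1 | | | | | | |
| RelS | −.10 | −.01 | −.83 | −.96 | .93 | −1 | 1 | | | | | |
| DigSym | .22 | .27 | .31 | .22 | −.09 | .22 | −.22 | 1 | | | | |
| Fluency A | .11 | .15 | .21 | .17 | −.09 | .17 | −.17 | .22 | 1 | | | |
| Digit Span | .10 | .28 | .31 | .24 | −.07 | .20 | .20 | .01 | .28 | 1 | | |
| TMT B-A | −.16 | −.08 | −.13 | −.08 | .07 | −.10 | .10 | −.15 | −.13 | −.12 | 1 | |
| IncongRT | −.11 | −.26 | −.33 | −.27 | .12 | −.25 | .25 | −.33 | −.34 | −.08 | .23 | 1 |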

Note: ** p < 0.01; * p < 0.05; (*) p < 0.06 (two-tailed); SWR = Stroop word-reading; SCN = Stroop color-naming; SCW = Stroop color-word; IS (Stroop interference score = SCW − [(SWR*SCN)/(SWR + SCN)]); DS (Stroop difference score = SCN − SCW); RatS (Stroop ratio score = SCW/SCN); RelS (Stroop relative score = [(SCN − SCW)/SCN]*100); DigSym = WAIS-IV Digit Symbol; Fluency A (phonemic verbal fluency task with the letter A); Digit Span = WAIS-IV Digit Forward + Digit Backward; TMT B-A (TMT difference score = TMT-B − TMT-A); and IncongRT (reaction time in the incongruent condition of a computerized Stroop task).

Regression Analyses

The assumptions of regression were met for all the analyses performed. Visual inspection of histograms and partial regression plots revealed that normality and linearity of residuals were met in all cases. There was no multicollinearity between the independent variables in any of the regression analyses (variance inflation factor, VIF < 1.13 in all cases). The analyses also revealed that errors of prediction were independent of each other in all regression models (Durbin-Watson values between 1.5 and 2.1 in all cases).
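As an illustration of the two diagnostics reported above, the sketch below computes a variance inflation factor and the Durbin-Watson statistic in plain Python. It assumes the conventional definitions: VIF = 1/(1 − R²), where R² comes from regressing one predictor on the remaining predictors, and Durbin-Watson as the sum of squared successive residual differences divided by the residual sum of squares. The function names and the residual values are hypothetical and are not the SPSS output used in the study.

```python
def vif(r_squared_j: float) -> float:
    """Variance inflation factor for predictor j.

    r_squared_j is the R^2 obtained when predictor j is regressed on
    all the other predictors in the model.
    """
    if not 0.0 <= r_squared_j < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)


def durbin_watson(residuals: list[float]) -> float:
    """Durbin-Watson statistic; values near 2 suggest independent errors."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals must not all be zero")
    return numerator / denominator


# Hypothetical residuals, for illustration only.
example_residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]
dw = durbin_watson(example_residuals)
print(f"VIF for R^2 = 0.10: {vif(0.10):.3f}")
print(f"Durbin-Watson: {dw:.3f}")
```

The Durbin-Watson statistic ranges from 0 to 4, and a VIF of 1 indicates a predictor that is uncorrelated with the others.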

The multiple regression performed on the SWR score as the criterion was significant (R² = .049; p < .045; effect size f² = .05; power = .25). Digit Symbol was the variable entering the model, with a significant contribution of 4.9% to the prediction of SWR (see Table 4, top panel). Age and education showed nonsignificant contributions to the prediction of SWR, with 3.2% and 1.4% of the variance, respectively (ps > .107).
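The effect sizes reported with each model can be related to R² through Cohen's f². A minimal sketch, assuming the conventional definition f² = R²/(1 − R²) (the text does not state the formula that was applied, and the function name is illustrative), is:

```python
def cohens_f2(r_squared: float) -> float:
    """Cohen's f^2 effect size for a regression model with the given R^2."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return r_squared / (1.0 - r_squared)


# R^2 value reported for the SWR model in the text.
f2_swr = cohens_f2(0.049)
print(f"f2 = {f2_swr:.3f}")
```

With R² = .049 this definition gives a value of approximately .05.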

Results of multiple regression analysis on Stroop direct and derived scores

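| Criterion | Predictor | β | t | p | Partial | Part |
|---|---|---|---|---|---|---|
| SWR | DigSym* | .221 | 2.04 | .045 | .049 | .049 |
| | Fluency A | .067 | 0.60 | .550 | .004 | |
| | Digit Span | .106 | 0.98 | .331 | .012 | |
| | TMT B-A | −.129 | −1.18 | .242 | .017 | |
| | IncongRT | −.047 | −0.40 | .688 | .002 | |
| SCN | DigSym* | .270 | 2.63 | .010 | .080 | .073 |
| | Fluency A | .011 | 0.10 | .921 | .000 | |
| | Digit Span* | .280 | 2.71 | .008 | .085 | .078 |
| | TMT B-A | −.003 | −0.03 | .979 | .000 | |
| | IncongRT | −.167 | −1.54 | .128 | .029 | |
| SCW | DigSym* | .223 | 2.25 | .027 | .060 | .048 |
| | Fluency A | −.001 | −0.01 | .993 | .000 | |
| | Digit Span* | .289 | 2.94 | .004 | .099 | .083 |
| | TMT B-A | −.002 | −0.02 | .982 | .000 | |
| | IncongRT* | −.234 | −2.25 | .027 | .060 | .048 |
| IS | DigSym | .145 | 1.31 | .192 | .021 | |
| | Fluency A | .026 | 0.23 | .822 | .001 | |
| | Digit Span* | .219 | 2.09 | .040 | .052 | .048 |
| | TMT B-A | .004 | 0.04 | .967 | .000 | |
| | IncongRT* | −.257 | −2.45 | .017 | .070 | .066 |
| RatS | DigSym | .151 | 1.33 | .188 | .022 | |
| | Fluency A | .095 | 0.83 | .407 | .009 | |
| | Digit Span | .185 | 1.73 | .087 | .036 | |
| | TMT B-A | −.049 | −0.44 | .662 | .002 | |
| | IncongRT* | −.245 | −2.27 | .026 | .060 | .060 |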

Note: Age and years of education were entered in the regression models as covariates but are not displayed in Table 4 (see Results); SWR = Stroop word-reading; SCN = Stroop color-naming; SCW = Stroop color-word; IS (Stroop interference score = SCW − [(SWR*SCN)/(SWR + SCN)]); RatS (Stroop ratio score = SCW/SCN); DigSym (WAIS-IV Digit Symbol); Digit Span (WAIS-IV Digit Forward + Digit Backward); Fluency A (phonemic verbal fluency task with the letter A); TMT B-A (TMT difference score = TMT-B − TMT-A); and IncongRT (reaction time in the incongruent condition of a computerized Stroop task).

* = variables showing significant contributions to the regression model.

The multiple regression performed on the SCN score as the criterion was significant (R² = .132; p < .01; effect size f² = .18; power = .78). Digit Symbol and Digit Span were the variables entering the model, accounting for 8% and 8.5% of the variance of SCN, respectively (see Table 4, second panel). Age and education showed nonsignificant contributions to the prediction of SCN, with 0.3% and 0.2% of the variance, respectively (ps > .635).

The multiple regression performed on the SCW score as the criterion was significant (R² = .24; p < .027; effect size f² = .32; power = .97). Digit Symbol, Digit Span, and Incongruent RT were the variables entering the model, accounting for 6%, 9.9%, and 6% of the variance of SCW, respectively (see Table 4, third panel). Age and education showed nonsignificant contributions to the prediction of SCW, with 0.05% and 0.003% of the variance, respectively (ps > .830).

The multiple regression performed on the Stroop interference score as the criterion was significant (R² = .123; p < .04; effect size f² = .14; power = .65). Digit Span and Incongruent RT were the variables entering the model, accounting for 5.2% and 7% of the variance of the Stroop interference score, respectively (see Table 4, fourth panel). Age and education showed nonsignificant contributions to the prediction of Stroop interference, with 2.3% and 0.2% of the variance, respectively (ps > .169).

Lastly, the multiple regression performed on the Stroop ratio score as the criterion was significant (R² = .06; p < .026; effect size f² = .06; power = .30). Incongruent RT was the variable entering the model, accounting for 6% of the variance of the Stroop ratio score (see Table 4, bottom panel). Age and education showed nonsignificant contributions to the prediction of the Stroop ratio score, with 2% and 0.1% of the variance, respectively (ps > .205).

The aim of this study was to clarify which cognitive mechanisms underlie Stroop standardized test scores ( Golden, 1978 ). A sample of 83 healthy individuals was assessed by means of a battery of neuropsychological tests and a computerized task that, according to a comprehensive review of the literature, had previously demonstrated a relationship with Stroop test performance.

A series of exploratory Pearson product–moment correlations confirmed the relationship between Stroop direct scores, supporting the general assumption of common underlying cognitive mechanisms. As shown in Table 3 , results also confirmed our a priori assumption about a relationship between Stroop scores and most cognitive measures selected for the analyses. A series of multiple regression analyses using Stroop direct (SWR, SCN, SCW) and derived scores (interference and ratio) as the criteria were performed to assess the predictive value of the five selected cognitive scores measuring speed of visual search, phonemic verbal fluency, working memory, cognitive flexibility, and conflict monitoring. In the following section, the results will be discussed in relation to preceding findings.

Stroop Test Direct Scores

In summary, multiple regression analysis of SWR suggested that this score was predicted by speed of visual search, as measured by the WAIS-IV Digit Symbol score (accounting for 4.9% of the variance). Likewise, multiple regression analysis performed on SCN suggested that this score was predicted by working memory, as measured by the WAIS-IV Digit Span score, and by speed of visual search, as measured by the WAIS-IV Digit Symbol score (together accounting for 16.5% of the variance). Lastly, multiple regression analysis performed on SCW suggested that working memory, conflict monitoring, and speed of visual search were the main contributing variables (as measured by Digit Span, Incongruent RT, and Digit Symbol, respectively), accounting for 21.9% of the variance. The implications of these results and their association with preceding literature are discussed subsequently.

The association between all the Stroop test direct scores and processing speed is consistent with preceding studies in healthy controls as well as clinical populations (e.g., Adrover-Roig et al., 2012; Bondi et al., 2002; Llinàs-Reglà et al., 2015; Ríos et al., 2004). Also, the association between working memory and both SCN and SCW test conditions has been replicated in a number of studies (Adrover-Roig et al., 2012; Bondi et al., 2002; Llinàs-Reglà et al., 2015; Sánchez-Cubillo et al., 2009). Even computational proposals of the Stroop task have modeled the way task demand specifications must provide an input to the system in order to perform the appropriate task (i.e., “respond to color” or “respond to word”; Cohen, Dunbar, & McClelland, 1990). In addition, the association between SCW and both working memory and conflict monitoring is compatible with the results described by Kane and Engle (2003). In their five experiments, they showed that individual differences in working memory predicted Stroop performance and suggested that interference was jointly determined by working memory (i.e., goal maintenance) and executive control mechanisms (i.e., competition resolution). As detailed earlier in the introduction to this article, the interpretation of conflict monitoring as one key executive control mechanism underlying the Stroop interference effect relies on a considerable amount of preceding behavioral, electrophysiological, and neuroimaging evidence. Consequently, the fact that computerized Stroop RTs in the interference condition (Incongruent RT) were a reliable predictor of the SCW score here could be taken as evidence of a plausible involvement of conflict monitoring mechanisms in the score of the standardized test. Recently, attempts have been made to refine the type of conflict underlying Stroop performance. For instance, Kalanthroff, Goldfarb, and Henik (2013) showed that both informational conflict (the conflict between contradictory information that arises from the irrelevant word meaning and the relevant word color) and task conflict (the conflict between two tasks: the relevant color-naming task and the irrelevant word-reading task) appear in the Stroop task. However, only the informational conflict seems to determine the interference effect (Kalanthroff et al., 2013). In any case, the possibility of conflict monitoring being a significant mechanism underlying Stroop performance does not rule out the involvement of complementary cognitive mechanisms (MacLeod & MacDonald, 2000).

The present analyses did not replicate the association between SCW and cognitive flexibility (as measured by the TMT B-A difference score; e.g., Chaytor et al., 2006; Llinàs-Reglà et al., 2015; Sánchez-Cubillo et al., 2009). In light of the present results, this association may owe more to common speed and working memory factors than to cognitive flexibility. This latter explanation is in line with previous data. For instance, factorial studies revealed that the SCW score loaded on factors that appear to represent processing speed (e.g., Digit Symbol, TMT A, FAS) rather than executive functions as measured by tests such as WCST and TMT B (Bondi et al., 2002; Boone, Ponton, Gorsuch, Gonzalez, & Miller, 1998). Also, Sánchez-Cubillo and colleagues (2009) showed a correlation between SCW and TMT B-A (r = −.31), which disappeared when processing speed and working memory were entered in their multiple regression analyses. Taken together, the present correlation and regression results suggest that the association between TMT B-A and SCW may rely more on common working memory processes than on cognitive flexibility, which would make only a secondary contribution to both scores.

The results revealed a marginally significant correlation between phonemic verbal fluency (i.e., Fluency A) and SCW condition, but not with SWR or SCN conditions. Moreover, when a multiple regression was performed on SCW, the influence of phonemic verbal fluency disappeared in the presence of the remaining variables. Contradictory evidence exists in the literature about the relation between Stroop test scores and verbal fluency with both positive ( Chaytor et al., 2006 ; Lanham et al., 1999 ; Llinàs-Reglà et al., 2015 ) and negative findings ( Adrover-Roig et al., 2012 ; Sisco et al., 2016 ). The present results support the view that verbal fluency does not seem to be a key variable to account for Stroop test performance. Even if apparently surprising, fluency tasks require participants to select freely among a number of potential responses that would be minimized or even absent during Stroop test conditions ( Botvinick et al., 2001 ).

Stroop Test-Derived Scores

First, regarding Golden’s interference score, only 6 of the 15 reviewed studies considered this score for analysis. Importantly, only the investigations by Cox and colleagues (1997) and by Kluttz and Golden (2016) associated the interference score with a specific cognitive operation, that is, response inhibition, on the basis of its relationship with commission errors in the TOVA and WCST total errors, respectively. However, the lack of association between Stroop interference effects and different inhibition tasks like the go/no go task (Christ, Holt, White, & Green, 2007), the negative priming task (Vitkovitch, Bishop, Dancey, & Richards, 2002), or the Hayling test (Cipolotti et al., 2016) also suggests that inhibition may be an inadequate construct to account for the abovementioned effects. In fact, both stop-signal tasks (like the TOVA) and task-switching paradigms (like the WCST) have been related to various coordinated cognitive operations (e.g., Lange et al., 2016; Verbruggen & Logan, 2008), making it difficult to clarify which of them may determine the association with Stroop test scores. The present data, showing an association between Stroop interference and both conflict monitoring and working memory, as measured by more specific tasks, may provide a parsimonious account for both present and previous validation data given the involvement of conflict monitoring and working memory in both the WCST (Lange et al., 2016; Periáñez et al., 2004) and the stop-signal task (Verbruggen & Logan, 2008). Second, regarding the ratio score, only one reviewed study (Sisco et al., 2016) considered it for analysis, reporting a significant association with “executive function,” as measured by both the TMT B:A ratio score and WCST completed categories. The present data are compatible with such a general interpretation and clarify that conflict monitoring, and not working memory or cognitive flexibility (which did not enter the regression model as significant predictors), was the key executive mechanism involved in the Stroop ratio score. Lastly, it is important to note that whereas the difference score was associated with sustained selective processing (Serial Subtraction task), visuomotor scanning speed (Digit Symbol, Letter Cancelation, SDMT, TMT-A, TMT-B; Shum et al., 1990; Sisco et al., 2016), and reading skills (Protopapas et al., 2007) in three previous works, our analyses were unable to find any association with the cognitive scores being considered. It should be noted that the cognitive variables used here differed from those used in preceding investigations. However, the fact that the difference score was the only derived index showing no relationship with any non-Stroop cognitive score may support the idea of Lansbergen and colleagues (2007) that the difference score suffers from computational problems that make it inadequate for clinical purposes.

In summary, the present data suggest that SWR requires speed of visual search, SCN reflects working memory and speed, and SCW reflects working memory, conflict monitoring, and speed. The results also suggest that the derived interference indexes analyzed (i.e., interference and ratio scores) minimize the contribution of speed of visual search. In this regard, whereas Golden’s interference score reflected both working memory and conflict monitoring, the ratio score proved to be a marker of conflict monitoring. However, two potential limitations should be highlighted. First, it is important to note that the amount of variance accounted for by certain predictors, although statistically significant, was modest in some cases. Analyzing the possible reasons for these effects may help clarify whether this represents a potential limit to the interpretative utility of the current data. One explanation could be that cognitive abilities other than those considered here might also play a role in Stroop test performance. This possibility, although plausible, was minimized in the present work by means of a detailed analysis of the preceding literature. A complementary possibility is that the different cognitive demands modulating Stroop performance may operate not as mere additive factors (i.e., Donders’ fallacy of “pure insertion”; Jensen, 2006) but as interacting factors that increase execution time beyond the time taken by each individual operation alone. In fact, multiple regression can estimate the association between a group of predictors and a given criterion, but it cannot estimate how the cognitive processes underlying those predictors interact in the cognitive system when operating together (unless another variable measuring this interaction is introduced in the model).
If this latter possibility is true, the variables accounting for modest amounts of variance should not be neglected when interpreting Stroop scores. In fact, the apparently small effects could reflect “the tip of the iceberg” of a more robust association between predictors and outcomes that current regression models cannot capture. Although testing this latter hypothesis exceeds the goals and methods described here, future studies investigating the validity of a complex tool like the Stroop test should consider this possibility. In any case, caution must be taken when using the present validation data in applied contexts, especially with those Stroop scores for which the portion of accounted variance was modest. Second, it should be noted that the number of female participants was higher than the number of males. This, however, should not be considered an important bias in the present work, because sex has shown no significant influence on Stroop test scores in at least three different Spanish samples from a preceding normative study (Lubrini et al., 2014). Despite these limitations, the present results on the validity of the Stroop test fill an important gap in the literature (Strauss et al., 2006) and will help researchers and clinicians to interpret altered patient scores in terms of a failure of the cognitive mechanisms detailed here, benefitting from the solid background provided by preceding experimental work on Stroop tasks.

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Adrover-Roig, D., Sesé, A., Barceló, F., & Palmer, A. (2012). A latent variable approach to executive control in healthy ageing. Brain and Cognition, 78, 284–299. doi: 10.1016/j.bandc.2012.01.005.

Aita, S. L., Beach, J. D., Taylor, S. E., Borgogna, N. C., Harrell, M. N., & Hill, B. D. (2018). Executive, language, or both? An examination of the construct validity of verbal fluency measures. Applied Neuropsychology: Adult, 7, 1–11. doi: 10.1080/23279095.2018.1439830.

Bondi, M. W., Serody, A. B., Chan, A. S., Eberson-Schumate, S. C., Delis, D. C., Hansen, L. A., et al. (2002). Cognitive and neuropathologic correlates of Stroop color-word test performance in Alzheimer’s disease. Neuropsychology, 16, 335–343. doi: 10.1037/0894-4105.16.3.335.

Boone, K. B., Ponton, M. O., Gorsuch, R. L., Gonzalez, J. J., & Miller, B. L. (1998). Factor analysis of four measures of prefrontal lobe functioning. Archives of Clinical Neuropsychology, 13, 585–595. doi: 10.1093/arclin/13.7.585.

Botvinick, M. M., Braver, T. S., Barch, D. M., Carter, C. S., & Cohen, J. D. (2001). Conflict monitoring and cognitive control. Psychological Review, 108, 624–652. doi: 10.1037/0033-295X.108.3.624.

Botvinick, M. M., Cohen, J. D., & Carter, C. S. (2004). Conflict monitoring and anterior cingulate cortex: An update. Trends in Cognitive Sciences, 8, 539–546. doi: 10.1016/j.tics.2004.10.003.

Chaytor, N., Schmitter-Edgecombe, M., & Burr, R. (2006). Improving the ecological validity of executive functioning assessment. Archives of Clinical Neuropsychology, 21, 217–227. doi: 10.1016/j.acn.2005.12.002.

Christ, S. E., Holt, D. D., White, D. A., & Green, L. (2007). Inhibitory control in children with autism spectrum disorder. Journal of Autism and Developmental Disorders, 37, 1155–1165. doi: 10.1007/s10803-006-0259-y.

Cipolotti, L., Spanò, B., Healy, C., Tudor-Sfetea, C., Chan, E., White, M., et al. (2016). Inhibition processes are dissociable and lateralized in human prefrontal cortex. Neuropsychologia, 93, 1–12. doi: 10.1016/j.neuropsychologia.2016.09.018.

Cohen, J. D., Dunbar, K., & McClelland, J. L. (1990). On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review, 97, 332–361. doi: 10.1037/0033-295X.97.3.332.

Cox, C. S., Chee, E., Chase, G. A., Baumgardner, T. L., Schuerholz, L. J., Reader, M. J., et al. (1997). Reading proficiency affects the construct validity of the Stroop test interference score. The Clinical Neuropsychologist, 11, 105–110. doi: 10.1080/13854049708407039.

De Jong, R., Berendsen, E., & Cools, R. (1999). Goal neglect and inhibitory limitations: Dissociable causes of interference effects in conflict situations. Acta Psychologica, 101, 379–394. doi: 10.1016/S0001-6918(99)00012-8.

Faul, F., Erdfelder, E., Buchner, A., & Lang, A. G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41, 1149–1160. doi: 10.3758/BRM.41.4.1149.

Golden, C. J. (1978). Stroop Color and Word Test: A manual for clinical and experimental uses. Chicago, IL: Stoelting Co.

Golden, C. J. (1994). Stroop: Test de colores y palabras. Madrid, Spain: TEA Ediciones.

Golden, C. J., & Freshwater, S. M. (2002). Stroop Color and Word Test: Revised examiner’s manual. Wood Dale, IL: Stoelting Co.

Heflin, L. H., Laluz, V., Jang, J., Ketelle, R., Miller, B. L., & Kramer, J. H. (2011). Let's inhibit our excitement: The relationships between Stroop, behavioral disinhibition, and the frontal lobes. Neuropsychology, 25, 655–665. doi: 10.1037/a0023863.

Jensen, A. R. (2006). Clocking the mind: Mental chronometry and individual differences. Oxford: Elsevier.

Kalanthroff, E., Goldfarb, L., & Henik, A. (2013). Evidence for interaction between the stop signal and the Stroop task conflict. Journal of Experimental Psychology: Human Perception and Performance, 39, 579–592. doi: 10.1037/a0027429.

Kane, M. J., & Engle, R. W. (2003). Working-memory capacity and the control of attention: The contributions of goal neglect, response competition, and task set to Stroop interference. Journal of Experimental Psychology: General, 132, 47–70. doi: 10.1037/0096-3445.132.1.47.

Kane, M. J., Hambrick, D. Z., Tuholski, S. W., Wilhelm, O., Payne, T. W., & Engle, R. W. (2004). The generality of working memory capacity: A latent-variable approach to verbal and visuospatial memory span and reasoning. Journal of Experimental Psychology: General, 133, 189–217. doi: 10.1037/0096-3445.133.2.189.

Kindt, M., Bierman, D., & Brosschot, J. F. (1996). Stroop versus Stroop: Comparison of a card format and a single-trial format of the standard color-word Stroop task and the emotional Stroop task. Personality and Individual Differences, 21, 653–661. doi: 10.1016/0191-8869(96)00133-X.

Kluttz, A., & Golden, J. C. (2016). Assessing executive function measures-the predictors of Stroop interference T-scores. The Clinical Neuropsychologist, 30, 781–782. doi: 10.1080/13854046.2016.1194480.

Lange, F., Kröger, B., Steinke, A., Seer, C., Dengler, R., & Kopp, B. (2016). Decomposing card-sorting performance: Effects of working memory load and age-related changes. Neuropsychology, 30(5), 579–590. doi: 10.1037/neu0000271.

Lanham, R. A., Vanderploeg, R. D., & Curtiss, G. (1999). The lack of construct validity of the Stroop color and word test with traumatic brain injury. Archives of Clinical Neuropsychology, 14, 782–783. doi: 10.1093/arclin/14.8.782.

Lansbergen, M. M., Kenemans, J. L., & van Engeland, H. (2007). Stroop interference and attention-deficit/hyperactivity disorder: A review and meta-analysis. Neuropsychology, 21, 251–262. doi: 10.1037/0894-4105.21.2.251.

Llinàs-Reglà, J., Vilalta-Franch, J., López-Pousa, S., Calvó-Perxas, L., Rodas, D. T., & Garre-Olmo, J. (2015). The Trail Making Test: Association with other neuropsychological measures and normative values for adults aged 55 years and older from a Spanish-speaking population-based sample. Assessment, 24, 1–14. doi: 10.1177/1073191115602552.

Lubrini, G., Periáñez, J. A., Ríos-Lago, M., Viejo-Sobera, R., Ayesa-Arriola, R., Sánchez-Cubillo, I., et al. (2014). Clinical Spanish norms of the Stroop test for traumatic brain injury and schizophrenia. Spanish Journal of Psychology, 17, E96. doi: 10.1017/sjp.2014.90.

MacLeod, C. M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109, 163–203. doi: 10.1037/0033-2909.109.2.163.

MacLeod, C. M., & MacDonald, P. A. (2000). Interdimensional interference in the Stroop effect: Uncovering the cognitive and neural anatomy of attention. Trends in Cognitive Sciences, 4, 383–391. doi: 10.1016/S1364-6613(00)01530-8.

Mead, L. A., Mayer, A. R., Bobholz, J. A., Woodley, S. J., Cunningham, J. M., Hammeke, T. A., et al. (2002). Neural basis of the Stroop interference task: Response competition or selective attention? Journal of the International Neuropsychological Society, 8, 735–742. doi: 10.1017/S1355617702860015.

Penner, I. K., Kobel, M., Stöcklin, M., Weber, P., Opwis, K., & Calabrese, P. (2012). The Stroop task: Comparison between the original paradigm and computerized versions in children and adults. The Clinical Neuropsychologist, 26, 1142–1153. doi: 10.1080/13854046.2012.713513.

Periáñez, J. A., Maestú, F., Barceló, F., Fernández, A., Amo, C., & Ortiz Alonso, T. (2004). Spatiotemporal brain dynamics during preparatory set shifting: MEG evidence. NeuroImage, 21, 687–695. doi: 10.1016/j.neuroimage.2003.10.008.

Protopapas, A., Archonti, A., & Skaloumbakas, C. (2007). Reading ability is negatively related to Stroop interference. Cognitive Psychology, 54, 251–282. doi: 10.1016/j.cogpsych.2006.07.003.

Regan, J. E. (1978). Involuntary automatic processing in color-naming tasks. Perception and Psychophysics, 24, 130–136. doi: 10.3758/BF03199539.

Ríos, M., Periáñez, J. A., & Muñoz-Céspedes, J. M. (2004). Attentional control and slowness of information processing after severe traumatic brain injury. Brain Injury, 18, 257–272. doi: 10.1080/02699050310001617442.

Rosselli, M., Ardila, A., Salvatierra, J., Marquez, M., Matos, L., & Weekes, V. A. (2002a). A cross-linguistic comparison of verbal fluency tests. International Journal of Neuroscience, 112, 759–776. doi: 10.1080/00207450290025752.

Rosselli, M., Ardila, A., Santisi, M. N., Arecco, M. R., Salvatierra, J., Conde, A., et al. (2002b). Stroop effect in Spanish-English bilinguals. Journal of the International Neuropsychological Society, 8, 819–827. doi: 10.1017/S1355617702860106.

Sánchez-Cubillo, I., Periáñez, J. A., Adrover-Roig, D., Rodríguez-Sánchez, J. M., Ríos-Lago, M., Tirapu, J., et al. (2009). Construct validity of the Trail Making Test: Role of task-switching, working memory, inhibition/interference control, and visuomotor abilities. Journal of the International Neuropsychological Society, 15, 438–450. doi: 10.1017/S1355617709090626.

Scarpina, F., & Tagini, S. (2017). The Stroop color and word test. Frontiers in Psychology, 8, 557. doi: 10.3389/fpsyg.2017.00557.

Shum, D. H., McFarland, K., Bain, J. D., & Humphreys, M. S. (1990). Effects of closed-head injury on attentional processes: An information-processing stage analysis. Journal of Clinical and Experimental Neuropsychology, 12, 247–264. doi: 10.1080/01688639008400971.

Sisco, S. M., Slonena, E., Okun, M. S., Bowers, D., & Price, C. C. (2016). Parkinson's disease and the Stroop color word test: Processing speed and interference algorithms. The Clinical Neuropsychologist, 30, 1104–1117. doi: 10.1080/13854046.2016.1188989.

Spikman, J. M., Kiers, H. A., Deelman, B. G., & van Zomeren, A. H. (2001). Construct validity of concepts of attention in healthy controls and patients with CHI. Brain and Cognition, 47, 446–460. doi: 10.1006/brcg.2001.1320.

Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A compendium of neuropsychological tests: Administration, norms, and commentary (3rd ed.). New York, NY: Oxford University Press.

Swick, D., & Jovanovic, J. (2002). Anterior cingulate cortex and the Stroop task: Neuropsychological evidence for topographic specificity. Neuropsychologia, 40, 1240–1253. doi: 10.1016/S0028-3932(01)00226-3.

Verbruggen, F., & Logan, G. D. (2008). Response inhibition in the stop-signal paradigm. Trends in Cognitive Sciences, 12, 418–424. doi: 10.1016/j.tics.2008.07.005.

Vitkovitch, M., Bishop, S., Dancey, C., & Richards, A. (2002). Stroop interference and negative priming in patients with multiple sclerosis. Neuropsychologia, 40, 1570–1576. doi: 10.1016/S0028-3932(02)00022-2.

Wechsler, D. (2012). Escala de inteligencia de Wechsler para adultos – IV. Madrid: Pearson.

Whiteside, D. M., Kealey, T., Semla, M., Luu, H., Rice, L., Basso, M. R., et al. (2016). Verbal fluency: Language or executive function measure? Applied Neuropsychology: Adult, 23, 29–34. doi: 10.1080/23279095.2015.1004574.

  • short-term memory
  • mental processes
  • reaction time
  • executive functioning
  • verbal fluency
  • naming function
  • construct validity
  • stroop test
  • visual search
  • data analysis

  • Online ISSN 1873-5843
  • Copyright © 2024 Oxford University Press

ORIGINAL RESEARCH article

Use of stroop test for sports psychology study: cross-over design research.

Shinji Takahashi*

  • 1 Faculty of Liberal Arts, Tohoku Gakuin University, Sendai, Japan
  • 2 School of Psychology, The University of Queensland, St Lucia, QLD, Australia

Background: In sports psychology research, the Stroop test and its derivations are commonly used to investigate the benefits of exercise on cognitive function. The measures of the Stroop test and the computed interference often have different intraclass correlation coefficients (ICC). However, the ICC has not been reported for cross-over designs involving multiple variances associated with individual differences.

Objective: We investigated the ICC of the Stroop neutral and incongruent tests and interference (neutral test minus incongruent test), and of the reverse-Stroop task, using the linear mixed model.

Methods: Forty-eight young adults participated in a cross-over design experiment composed of 2 factors: exercise mode (walking, resistance exercise, badminton, and seated rest as control) and time (pre- and post-tests). Before and after each intervention, participants completed the Stroop neutral and incongruent tests and the reverse-Stroop neutral and incongruent tests. We analyzed each test performance and interference score and calculated the ICC using the linear mixed model.

Results: The linear mixed model found a significant interaction of exercise mode and time for both the Stroop and reverse-Stroop tasks, suggesting that exercise mode influences the effect of acute exercise on inhibitory function. On the other hand, there was no significant effect of exercise mode for both the Stroop and reverse-Stroop interference. The results also revealed that calculating both the Stroop and reverse-Stroop interference resulted in smaller ICCs than the ICCs of the neutral and incongruent tests for both the Stroop and reverse-Stroop tasks.

Conclusion: The Stroop and reverse-Stroop interferences are known as valid measures of the inhibitory function for cross-sectional research design. However, to understand the benefits of acute exercise on inhibitory function comprehensively by cross-over design, comparing the incongruent test with the neutral test also seems superior because these tests have high reliability and statistical power.

Introduction

Several studies have demonstrated that exercise has beneficial effects on brain structure and cognitive function (Colcombe et al., 2006; Pedersen et al., 2009). For example, regular exercise can increase brain volume in older people (Colcombe et al., 2006). To elucidate how exercise affects the structure and function of the brain, researchers have investigated the intensity, duration, and mode of exercise (Lambourne and Tomporowski, 2010; Voss et al., 2011; Chang et al., 2012). The Stroop task (Stroop, 1935), which measures inhibitory function, is extensively applied in this research (Etnier and Chang, 2009). The Stroop task is commonly composed of a neutral test, a congruent test, and an incongruent test. In the neutral and congruent tests, individuals are required to name the color of irrelevant letters (e.g., XXXX), a color patch, or the corresponding color word (e.g., “Red” printed in red ink). In the incongruent test, individuals suppress reading the word and respond to the color of the ink, which does not match the color name (e.g., “Red” printed in blue ink). Typically, the incongruent test yields a longer response time than both the neutral and congruent tests. This delay in responding in the incongruent test is called the “Stroop effect,” and it is associated with activation in brain regions (e.g., prefrontal cortex, anterior cingulate cortex) involved in executive control (Ruff et al., 2001; Zysset et al., 2001; Song and Hakoda, 2015).

The reverse-Stroop task is a derivation of the Stroop task that is also employed to measure inhibitory function. During the reverse-Stroop task, individuals are asked to respond to the word while ignoring the color of the text, rather than identifying the color and ignoring the word. Although the reverse-Stroop task is thought to measure inhibitory function as the Stroop task does, there is evidence that the brain regions associated with the reverse-Stroop task differ from those associated with the Stroop task (Ruff et al., 2001; Song and Hakoda, 2015). The reverse-Stroop task has been used to investigate how acute exercise influences executive function (Tsukamoto et al., 2016a, b). These studies obtained large effect sizes with relatively small samples, suggesting that the reverse-Stroop task is sensitive to the effect of exercise.

Although the Stroop and reverse-Stroop tasks are used to assess inhibitory function, there is still debate about the method of measurement (Scarpina and Tagini, 2017). Scarpina and Tagini (2017) systematically reviewed studies that used the Stroop task and suggested that researchers should report not only test performance (e.g., reaction time or the number of correct responses) but also the Stroop interference, which is defined as the difference between the neutral/congruent test and the incongruent test. The neutral and congruent tests, which do not involve cognitive conflict, are categorized as tests of information processing (Chang et al., 2012). Given that the incongruent test might be affected by information processing constraints, the interference, which partials out the contribution of information processing, seems to be a better index than the incongruent test alone. Indeed, several studies have reported that the Stroop interference is associated with specific brain structures, cortical activation, and psychological arousal (Takeuchi et al., 2012; Byun et al., 2014; Song and Hakoda, 2015), suggesting that the interference is a valid and useful measure of inhibitory function.

On the other hand, incongruent test performance may be a better measure of inhibitory function than interference in complex experimental research designs. This is because the intraclass correlation coefficient (ICC) of incongruent performance could be higher than that of interference (Siegrist, 1997; Strauss et al., 2005; Hedge et al., 2018). Specifically, in cross-over or mixed designs (Barnhart et al., 2007; Nakagawa and Schielzeth, 2010), a higher ICC enhances statistical power. Although a number of previous studies have investigated the reliability of the Stroop task using the ICC (Franzen et al., 1987; Kozora et al., 2004; Strauss et al., 2005; Wallman et al., 2005; Portaccio et al., 2010; Mohammadirad et al., 2012; Register-Mihalik et al., 2012; Bajaj et al., 2015; Martínez-Loredo et al., 2017), the formats of the Stroop task and the assessment of interference varied, and the method for estimating the ICC has not yet been standardized (Parsons et al., 2019). Therefore, the ICC of the Stroop task and its interference does not seem to have been adequately examined.

Previous studies involving test-retest designs revealed that each test of the Stroop task showed a higher ICC than the Stroop interference (Siegrist, 1997; Strauss et al., 2005; Hedge et al., 2018). The ICC is defined as the ratio of the variance between participants to the sum of the between-participant and residual variances (Shrout and Fleiss, 1979). Hedge et al. (2018) also explained that calculating interference did not affect the residual variance but reduced the variance associated with individual differences. In experimental research, the effect of exercise on inhibitory function may be masked by a low ICC and low statistical power. Therefore, Stroop incongruent test performance might be better suited to experimental research than the Stroop interference.
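The ICC definition above can be made concrete with a minimal sketch. The function name `icc` and the variance values are hypothetical and are chosen only to illustrate the point attributed to Hedge et al. (2018): if the residual variance stays the same while the between-participant variance shrinks, the ICC falls.

```python
def icc(var_between: float, var_residual: float) -> float:
    """ICC as between-participant variance over between-participant plus residual variance."""
    total = var_between + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_between / total


# Hypothetical variance components (illustrative numbers, not data from any study).
# A single test: large individual differences relative to residual noise.
icc_single_test = icc(var_between=60.0, var_residual=20.0)

# An interference score: same residual variance, less between-participant variance.
icc_interference = icc(var_between=10.0, var_residual=20.0)

print(f"single test ICC:  {icc_single_test:.2f}")
print(f"interference ICC: {icc_interference:.2f}")
```

With these assumed numbers the single test has the higher ICC, which is the pattern the test-retest studies cited above describe.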

If calculating the interference selectively reduces the variance between participants, the ICC of the Stroop interference might decrease more substantially in a cross-over design. Test-retest studies of the Stroop task involve only two observations per participant. Cross-over designs, on the other hand, involve at least four measures per participant (e.g., experimental condition and control condition × pre-test and post-test). Given that the positive impact of exercise on inhibitory function is small to medium (Lambourne and Tomporowski, 2010; Voss et al., 2011), cross-over designs need to enhance statistical power by using measurements with a high ICC. However, to the best of our knowledge, no previous investigations have reported the ICC of Stroop test performance and interference for cross-over designs. Therefore, we investigated the ICC of the Stroop task in a cross-over design examining the effect of exercise on inhibitory function.

One of the reasons why the ICC has not been reported in cross-over design research concerns statistical analysis. The ICC is commonly calculated from the output of a one- or two-way analysis of variance (ANOVA) in which one factor is participants. The ANOVA uses the method of moments to estimate variance components. This method cannot directly separate the variance between participants from the residual variance, even in a simple test-retest design in which the between-participant variance and the residual variance are the only random effects. Instead, it estimates the between-participant variance by subtracting the residual variance from the total random-effects variance (the sum of the between-participant and residual variances) (Shrout and Fleiss, 1979). Consequently, this method can yield a negative ICC when the sum of the variance components for individual differences is smaller than the residual variance, and a negative ICC has no meaningful interpretation. This disadvantage makes it difficult to apply ANOVA to cross-over designs, in which there are multiple variances associated with individual differences.
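The subtraction problem described above can be illustrated with a short sketch. It assumes the one-way random-effects form, ICC(1,1), from Shrout and Fleiss (1979); the text does not state which ANOVA-based form is meant, so this choice, the function name `anova_icc`, and both small data sets are illustrative assumptions.

```python
from statistics import mean


def anova_icc(scores):
    """One-way random-effects ICC(1,1) from ANOVA mean squares (method of moments).

    scores: list of per-participant lists, each holding k repeated measurements.
    Returns (icc, between-participant variance estimate).
    """
    n = len(scores)        # number of participants
    k = len(scores[0])     # measurements per participant
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 participants, each with the same number (>= 2) of measurements")
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    # Between-participant and within-participant (residual) mean squares
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all scores are identical; the ICC is undefined")
    # The between-participant variance is obtained by subtraction, so it can be negative
    var_between = (ms_between - ms_within) / k
    return (ms_between - ms_within) / denominator, var_between


# Hypothetical scores: three participants measured twice.
consistent = [[10, 11], [20, 21], [30, 29]]   # stable individual differences
noisy = [[10, 20], [20, 10], [12, 18]]        # identical participant means, large residual

icc_consistent, _ = anova_icc(consistent)
icc_noisy, var_between_noisy = anova_icc(noisy)
print(f"consistent ICC: {icc_consistent:.3f}")
print(f"noisy ICC: {icc_noisy:.3f}, between-participant variance: {var_between_noisy:.2f}")
```

In the second data set the participant means are identical, so the between-participant mean square is smaller than the residual mean square, and both the variance estimate and the ICC come out negative.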

To calculate the ICC in a cross-over design, Nakagawa and Schielzeth (2010) and Hedge et al. (2018) suggest using the linear mixed model (LMM), also known as a multilevel model or a hierarchical linear model. The LMM, unlike ANOVA, estimates each parameter using maximum likelihood (ML) or restricted maximum likelihood (REML), computing multiple variances associated with individual differences separately from the residual variance. Brouwer et al. (2012) and Demetrashvili et al. (2016) demonstrated that the ICC can be calculated using the LMM even in complicated research designs that have multiple variances associated with individual differences. We aimed to calculate the ICC for the Stroop task in a cross-over design investigating the effect of acute exercise on inhibitory function and to consider the influence of the ICCs on detecting the effect of acute exercise. We also calculated the ICC of the reverse-Stroop task. As described above, although the reverse-Stroop task is a useful measurement, no previous studies have reported the ICC for reverse-Stroop tasks.
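As a rough illustration of what "multiple variances associated with individual differences" can look like, the following sketch writes out one possible LMM for a mode × time cross-over design. The random-effects structure (a participant intercept plus participant-by-mode and participant-by-time terms) and all symbols are assumptions made here for illustration; the text above does not specify the model.

```latex
% Illustrative model only: i = participant, j = exercise mode, k = time.
% The random-effects structure is an assumption, not the authors' stated model.
\begin{align}
  y_{ijk} &= \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk}
             + u_i + v_{ij} + w_{ik} + \varepsilon_{ijk}, \\
  u_i &\sim \mathcal{N}(0, \sigma_u^2), \quad
  v_{ij} \sim \mathcal{N}(0, \sigma_v^2), \quad
  w_{ik} \sim \mathcal{N}(0, \sigma_w^2), \quad
  \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma_\varepsilon^2), \\
  \mathrm{ICC} &= \frac{\sigma_u^2 + \sigma_v^2 + \sigma_w^2}
                       {\sigma_u^2 + \sigma_v^2 + \sigma_w^2 + \sigma_\varepsilon^2}.
\end{align}
```

Because ML and REML estimate each variance component directly and, in most software, constrain it to be non-negative, an ICC built this way stays between 0 and 1, unlike the subtraction-based ANOVA estimate.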

We expected that the individual tests would show higher ICCs than the interferences for both the Stroop and reverse-Stroop tasks, and that the tests with higher ICCs would be more likely than the interferences to reveal the effects of exercise. In this study, we analyzed a dataset from a 4 × 2 cross-over design: exercise mode with 4 levels (walking, resistance exercise, badminton, and seated rest as a control condition) × time with 2 levels (pre- and post-exercise).

Materials and Methods

Participants.

The sample size was calculated using a power analysis for a one-way repeated-measures ANOVA with a partial eta squared (ηp²) of 0.05, power (1−β) of 0.95, expected ICC of 0.50, and α of 0.05. This analysis indicated that a sample size of 43 was adequate. Participants were undergraduate students from Tohoku Gakuin University who volunteered to participate in the study. A total of 48 healthy participants (25 men, 23 women) were included in the final analysis. All participants were determined to be free of any cardiopulmonary or metabolic disease and of visual disorders. The participants were asked to refrain from alcohol use and strenuous physical activity for 24 h before each experiment, and from smoking and from food or caffeine consumption for 2 h preceding the experiments. Written informed consent was obtained from all participants before the first experiment. The Human Subjects Committee of Tohoku Gakuin University approved the study protocol. Table 1 shows the characteristics of the participants.

Table 1. Characteristics of participants (Mean ± SE).

Participants were required to visit the sports physiology laboratory in the gymnasium on five different days (average interval, 4.5 ± 1.6 days). During the first visit, each participant received a brief introduction to the study and provided informed consent. Their height and weight were measured using a stadiometer and a digital scale, respectively. Next, a Stroop/reverse-Stroop color-word test (Hakoda and Sasaki, 1990) was administered to familiarize participants with the test. A fitness assessment was then conducted that measured the 10-repetition maximum (RM) of 3 resistance exercises (chest press, seated row, and leg press) and aerobic fitness (peak oxygen uptake: V̇O2peak).

Day 2–5 Experimental Sessions

Laboratory visits 2 to 5 were experimental sessions. Participants completed 4 interventions (walking, resistance exercise, badminton, and seated rest). To minimize the learning effect on the Stroop/reverse-Stroop test, the order of the experimental sessions was counterbalanced. We then confirmed that there was no bias between order and exercise mode [χ²(9) = 2.3, p = 0.985]. After arriving at the laboratory, participants rested on a comfortable chair for 10 min and were then fitted with a heart rate (HR) monitor (Model RS800cx; Polar Electro Oy, Kempele, Finland). Before and after each intervention, participants lay on a bed for 5 min to lower their HR and then completed the Stroop and reverse-Stroop tests. HR was monitored throughout each experimental session, and oxygen uptake (V̇O2) was measured with a portable indirect calorimetry system (MetaMax-3B; Cortex, Leipzig, Germany) during each intervention for 10 min. HR and V̇O2 were averaged over the last 7 min.
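The counterbalancing check reported above has 9 degrees of freedom, which is what a 4 × 4 table of session position by exercise mode would give, since (4 − 1) × (4 − 1) = 9. The sketch below assumes that layout; the counts are hypothetical and do not reproduce the reported statistic, and the function name is illustrative.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts for 48 participants (assumed layout, not the study's data):
# rows = session position (1st to 4th), columns = walking, resistance, badminton, rest.
counts = [
    [13, 11, 12, 12],
    [11, 13, 12, 12],
    [12, 12, 13, 11],
    [12, 12, 11, 13],
]
stat, df = chi_square_independence(counts)
print(f"chi-square({df}) = {stat:.2f}")
```

A statistic that is small relative to its degrees of freedom, as in the value reported above, indicates that each exercise mode was spread roughly evenly across session positions.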

During the walking condition, participants walked briskly on a motor-driven treadmill (O2road, Takei Sci. Instruments Co., Niigata, Japan). The speed of brisk walking was set at 6.0 km·h⁻¹. Participants were instructed to walk at a brisk but comfortable pace. However, none changed their speed, and all participants completed the brisk walking at the initial speed. During the resistance exercise, participants performed at least two sets of 10 repetitions at 10-RM for three exercises (chest press, seated row, and leg press) using a series of machines (Life Fitness Pro2 series models, Life Fitness, IL) in the gym adjacent to the laboratory. Participants were given a 30 s rest between each set and exercise. During the badminton condition, participants played a singles game in the arena adjacent to the laboratory against one of three experimenters who had experience in badminton instruction. The investigators played at a level of proficiency that matched the participant’s level and also gave the participants advice for improvement during the games. During the game, scores were not recorded and “victory or defeat” was not determined. During the control intervention, participants were seated on a comfortable chair with their smartphones and were instructed to spend the time operating their smartphones as normal.

Physical Fitness Assessment

Participants performed a graded exercise test on the motor-driven treadmill. The initial speed was set at 7.2–9.6 km·h⁻¹ according to the estimated physical fitness level of each participant. Each stage lasted 2 min, and the speed was increased by 1.2 km·h⁻¹ per stage until volitional exhaustion occurred. V̇O₂ was measured throughout the test (MetaMax-3B), and the average of the final 30 s was defined as the V̇O₂peak. HR was monitored throughout the test, and the rating of perceived exertion (RPE) was taken at the end of each stage.

To determine the load of the resistance exercise, the 10-RM for chest press, seated row, and leg press was measured using the weight stack machines. After warm-up trials, and following the advice of an instructor, participants performed 10 repetitions of each of the three exercises at an initial load selected according to their perceived capacity. After a 3 min rest, participants performed 10 more repetitions at a load that they adjusted based on their perception of the previous set. Participants selected the load of the resistance exercise from whichever of the two sets was closest to the 10-RM.

Stroop and Reverse-Stroop Task

The Stroop/reverse-Stroop test is a pencil and paper exercise that requires manual matching rather than oral naming of items. It consists of four tests arranged in the following order. First is the reverse-Stroop neutral test. Here, a color name (e.g., red) in black ink is in the leftmost column and five different color patches (red, blue, yellow, green, and black) are placed in the right-side columns. Participants are asked to check the patch corresponding to the color name. Second is the reverse-Stroop incongruent test. Here, a color name (e.g., red) is written in colored ink (e.g., blue) in the leftmost column and five different color patches are in the right-side columns. Participants are instructed to check the patch corresponding to the color name in the leftmost column. Third is the Stroop neutral test. Here, a color patch (e.g., red) is in the leftmost column and five different color names in black ink are in the right-side columns. Participants are asked to check the color name corresponding to the color patch in the leftmost column. Fourth is the Stroop incongruent test, in which a color name (e.g., red) written in colored ink (e.g., blue) is in the leftmost column and five color names in black ink are in the right-side columns. Participants are instructed to check the word corresponding to the ink color of the word in the leftmost column. Each test consists of 100 items, and the materials are printed on A3-size paper. Each test is preceded by practice trials (10 items in 10 s). In each test, participants were instructed to check as many correct items as possible in 60 s. We measured the number of correct responses in each test and then calculated the Stroop and reverse-Stroop interferences by subtracting the number of correct responses in the incongruent test from those in the neutral test.
Hakoda and Sasaki (1990) recommended the interference ratio [(incongruent test score − neutral test score)/neutral test score] for cross-sectional studies of inhibitory function, because the difference between the neutral and incongruent test scores varies with the neutral test score. However, we employed the interference (incongruent test score − neutral test score) for two reasons. One reason is that the interference and the interference ratio are substantially equivalent in a well-controlled longitudinal study that compares changes in inhibitory function over time. In practice, we confirmed that the correlation coefficients between the interference ratio and the interference, computed separately for each exercise mode and time (pre- and post-test), were extremely high (reverse-Stroop task: r ≥ 0.937; Stroop task: r ≥ 0.978). The other reason is that several previous reliability studies used the interference (Strauss et al., 2005; Hedge et al., 2018; Parsons et al., 2019). Therefore, we consider that the interference can provide more relevant information than the interference ratio.
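The two indices can be written out as a short sketch. The function names and the example scores are hypothetical, and the sign convention (incongruent minus neutral) follows the parenthetical expressions above; the opposite convention simply flips the sign.

```python
def interference(neutral_correct, incongruent_correct):
    """Difference score: incongruent minus neutral number of correct responses."""
    return incongruent_correct - neutral_correct


def interference_ratio(neutral_correct, incongruent_correct):
    """Difference score scaled by the neutral-test score."""
    if neutral_correct == 0:
        raise ValueError("neutral score must be non-zero")
    return (incongruent_correct - neutral_correct) / neutral_correct


# Hypothetical scores (correct responses in 60 s), not taken from the study.
neutral, incongruent = 60, 48
print(interference(neutral, incongruent))        # difference in items
print(interference_ratio(neutral, incongruent))  # proportion of neutral score
```

Because the ratio only rescales each difference by that participant's neutral score, the two indices rank participants very similarly whenever neutral scores vary little, which is consistent with the high correlations reported above.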

Statistical Analysis

All measurements are described as group mean ± standard error. Statistical analyses were conducted using IBM SPSS 25 (SPSS Inc., Chicago, IL, United States). To examine the exercise intensity of each intervention, %V̇O₂peak and %HRmax were compared by the LMM with exercise mode as a fixed effect and participant as a random effect. A significant main effect of exercise mode was followed up with the Bonferroni method.

To calculate the ICC of performance on each Stroop and reverse-Stroop test and of the interferences across all interventions, the following LMM was used:

$$y_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + b_i + (b\alpha)_{ij} + (b\beta)_{ik} + e_{ijk} \tag{1}$$

where $y_{ijk}$ is the number of correct responses in each test, or the Stroop or reverse-Stroop interference, of participant $i = 1, \ldots, I$ observed in exercise mode $j = 1, \ldots, J$ at time point $k = 1, \ldots, K$; $\mu$ is the grand mean; $\alpha_j$ is the fixed effect of exercise mode; $\beta_k$ is the fixed effect of time; $(\alpha\beta)_{jk}$ is the fixed effect of the interaction of exercise mode and time; $b_i \sim N(0, \sigma_p^2)$ is the random effect of participant; $(b\alpha)_{ij} \sim N(0, \sigma_{pm}^2)$ is the random interaction of participant and exercise mode; $(b\beta)_{ik} \sim N(0, \sigma_{pt}^2)$ is the random interaction of participant and time; and $e_{ijk} \sim N(0, \sigma_e^2)$ is the residual. The parameters were estimated by REML. The structure of the random effects was assumed to be variance components. Following Brouwer et al. (2012) and Demetrashvili et al. (2016), the ICC was calculated by the following equation:

$$\mathrm{ICC} = \frac{\sigma_p^2 + \sigma_{pm}^2 + \sigma_{pt}^2}{\sigma_p^2 + \sigma_{pm}^2 + \sigma_{pt}^2 + \sigma_e^2} \tag{2}$$

In Equation 2, the numerator is the sum of the random effects concerned with individual differences, and the denominator is the sum of the random effects and the residual variance. If individual performance is consistent throughout the whole experiment, the ICC should be high. We then calculated a 95% confidence interval of the ICC using the F-approach of Demetrashvili et al. (2016). Based on Shrout (1998), we assessed ICCs as follows: “substantial” is 0.81–1.00; “moderate” is 0.61–0.80; “fair” is 0.40–0.60; “slight” is 0.10–0.40; “virtually none” is 0.0–0.10. To investigate the fixed effects, if the interaction (exercise mode × time) was significant in the LMM, a further LMM with exercise mode as a fixed effect and participant as a random effect, followed by the Bonferroni method, was run separately for the pre-test and the post-test.
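As a minimal sketch of the calculation described above, the ICC can be computed directly from estimated variance components and then labelled with the Shrout (1998) bands. The function names and the variance values are hypothetical. Because the quoted bands share their edges at 0.10 and 0.40, the sketch assigns those exact values to the lower band, which is our assumption rather than something stated in the text.

```python
def icc_from_variances(var_participant, var_participant_mode,
                       var_participant_time, var_residual):
    """ICC: participant-related variance over total random variance."""
    between = var_participant + var_participant_mode + var_participant_time
    total = between + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between / total


def shrout_label(icc):
    """Map an ICC to the descriptive bands quoted from Shrout (1998).

    Assumption: values exactly on a shared edge fall into the lower band.
    """
    if icc > 0.80:
        return "substantial"
    if icc > 0.60:
        return "moderate"
    if icc > 0.40:
        return "fair"
    if icc > 0.10:
        return "slight"
    return "virtually none"


# Hypothetical variance components, not the study's estimates.
icc = icc_from_variances(30.0, 5.0, 0.0, 7.0)
print(round(icc, 3), shrout_label(icc))
```

Setting `var_participant_time` to zero corresponds to dropping a redundant participant × time component from the model.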

Intensity of Interventions

Table 2 presents the intensity of each intervention. The results of the LMM for %V̇O₂peak and %HRpeak revealed significant main effects [F(3, 141) ≥ 276.2, p < 0.001]. Badminton showed significantly higher %V̇O₂peak and %HRpeak than the other interventions (p < 0.001, Cohen’s d ≥ 1.59). The seated rest showed significantly lower %V̇O₂peak and %HRpeak than the other interventions (p < 0.001, Cohen’s d ≥ 3.53). Differences in %V̇O₂peak and %HRpeak between the walking and resistance exercise were not significant (p ≥ 0.056, Cohen’s d ≤ 0.438).

Table 2. Intensities of each intervention (Mean ± SE).

Fixed Effects on Cognitive Performances

Table 3 presents each test performance and interference across exercise mode and time. The LMM showed significant interactions for the reverse-Stroop neutral test [F(3, 141) = 3.9, p = 0.010] and the Stroop incongruent test [F(3, 188) = 5.5, p = 0.001]. The post hoc analysis indicated that, while there were no main effects of exercise mode at pre-test for either the reverse-Stroop neutral or the Stroop incongruent test [F(3, 141) < 0.3, p > 0.814], there were significant main effects of exercise mode at post-test for both tests [F(3, 141) > 3.2, p ≤ 0.026]. Badminton significantly enhanced performance on the reverse-Stroop neutral test (p = 0.018, Cohen’s d = 0.378) and the Stroop incongruent test (p = 0.006, Cohen’s d = 0.369) relative to control. For the reverse-Stroop incongruent and Stroop neutral tests, although there were no significant interactions [F(3, 188) < 2.0, p ≥ 0.111] or main effects of exercise mode [F(3, 141) < 1.7, p ≥ 0.161], the main effects of time were significant [F(3, 188) > 22.3, p < 0.001]. For the Stroop and reverse-Stroop interferences, the main effects of exercise mode [F(3, 141) ≤ 0.9, p ≥ 0.425] and time [F(1, 47) ≤ 2.0, p ≥ 0.162] and the interactions [F(3, 141) ≤ 2.4, p ≥ 0.067] were not significant.

Table 3. Each test of the Stroop and reverse-Stroop tasks (Mean ± SE) across exercise modes and time.

Random Effects on Cognitive Performances

When the LMMs were fitted for the Stroop and reverse-Stroop tasks, the variance of the random interaction of participant and time gradually shifted to the random effect of participant. Finally, the variance of the random interaction of participant and time was estimated as 0.0, indicating that this covariance parameter was redundant. Yamazaki et al. (2018) reported that individuals with lower performance before exercise tend to show greater increases in performance after exercise. This result implies that there might be multicollinearity between the random effect of participant and the random interaction of participant and time, and this multicollinearity might make the random interaction redundant. Therefore, we removed the redundant parameter from the models.

Figure 1 shows each random effect and the residual across each test condition. For the Stroop and reverse-Stroop tasks, while the residual did not differ across any of the indices, the random effects in the interferences were much smaller than those in the neutral and incongruent tests. Table 4 shows the ICC for each test and interference. The ICCs of all tests were at least “moderate” (ICC ≥ 0.745). Notably, the reverse-Stroop neutral test, Stroop neutral test, and Stroop incongruent test showed “substantial” ICCs (ICC ≥ 0.833). On the other hand, the ICCs of both the reverse-Stroop interference (ICC = 0.392) and the Stroop interference (ICC = 0.362) were “slight.”

Figure 1. Comparison of the variances concerned with participants’ individuality and the residual variance for the Stroop task (A) and the reverse-Stroop task (B).

Table 4. The intraclass correlation coefficients (ICCs) and 95% confidence intervals (95% CI) for each test and interference.

This study investigated the ICCs of the Stroop and reverse-Stroop tasks in a cross-over research design. The main finding was that the Stroop tests and the Stroop interference yielded different results. There was a significant interaction of exercise mode and time for the Stroop incongruent test, while the LMM did not reveal a significant interaction for the Stroop neutral test. The post hoc analysis for the incongruent test revealed that badminton selectively enhanced incongruent test performance compared with the control, suggesting that the effects of acute exercise on inhibitory function are influenced by exercise mode. The finding that badminton, a high-intensity, open-skill exercise, improves cognitive function more than light-intensity, closed-skill exercise agrees with the results of systematic reviews (Chang et al., 2012; Gu et al., 2019). There were also large random effects associated with participants compared with the residual variance for the Stroop tests. The large random effects and small residual yielded “substantial” ICCs throughout the whole experimental procedure, suggesting that the Stroop tests are highly reliable measures for cross-over design research.

In contrast to the Stroop tests, the LMM did not reveal fixed effects of exercise mode on inhibitory function for the Stroop interference. The Stroop interference also showed a much lower ICC than either of the Stroop tests. These results suggest that calculating the interference might attenuate the individual differences that form the numerator of the ICC, resulting in low reliability and statistical power. Given these results, for cross-over designs investigating how acute exercise benefits inhibitory function, analyzing performance on the Stroop neutral/congruent and incongruent tests separately and comparing their changes might be a better approach than calculating and analyzing the Stroop interference. The Stroop interference is known as a valid measure of inhibitory function in cross-sectional studies (Takeuchi et al., 2012; Byun et al., 2014; Song and Hakoda, 2015; Fagundo et al., 2016; Scarpina and Tagini, 2017). However, because of the possibility of low reliability and statistical power, employing the Stroop interference as a dependent variable could reduce the likelihood of detecting the effects of exercise in a cross-over design study.

The reverse-Stroop tests showed different fixed-effect results from the Stroop tests. While the LMM found a significant interaction of exercise mode and time for the neutral test, there was no significant interaction for the incongruent test. We also did not find significant effects of exercise mode, time, or their interaction for the reverse-Stroop interference. These results suggest that there is no effect of acute exercise on inhibitory function as measured by the reverse-Stroop task. We had expected that the reverse-Stroop task would be more sensitive to an effect of acute exercise because previous studies (Tsukamoto et al., 2016a, b) showed that the reverse-Stroop incongruent test and the reverse-Stroop interference were significantly enhanced by acute exercise. The different measurement methods used in the previous studies and the present study may have caused the different results. The previous studies (Tsukamoto et al., 2016a, b), which employed small sample sizes (N = 12 and N = 10, respectively), measured the reverse-Stroop neutral and incongruent tests with a computerized test. They found large significant effects of acute exercise on the reverse-Stroop interference ratio. Although effect sizes (e.g., Cohen’s d or partial η²) were not reported in the previous studies, considering their small sample sizes, we expected that the reverse-Stroop tests would be more sensitive to the effect of acute exercise. However, in spite of the relatively large sample size (N = 48), the LMM unexpectedly did not reveal any effects of exercise on the reverse-Stroop tests measured by a pencil and paper method in the present study. Given that the effect of exercise on the Stroop tests in the present study is similar to that reported in the systematic reviews (Chang et al., 2012; Gu et al., 2019), the difference between a computerized test and a pencil and paper test might be a critical factor in the reverse-Stroop task.

Although the LMM showed differences in fixed effects between the Stroop and reverse-Stroop tests, the random effects and ICCs for the reverse-Stroop tests were similar to those for the Stroop tests. The neutral and incongruent tests of the reverse-Stroop task showed larger random effects concerned with individual differences relative to the residuals, resulting in at least “moderate” ICCs. The results suggest that the two reverse-Stroop tests are as reliable as the Stroop tests. The changes in random effects from each test to the interference for the reverse-Stroop task were also similar to those for the Stroop task. For the reverse-Stroop interference, the random effects concerned with individual differences decreased vastly compared with those of the neutral and incongruent tests, whereas the residuals differed little between each test and the interference. This discrepancy between the changes in the random effects and in the residual seems to be the leading cause of the low reliability of the interferences in the cross-over design.
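The pattern described above, in which individual-difference variance shrinks in a difference score while the residual does not, can be illustrated with a simplified two-component calculation. This is only a sketch under stated assumptions, not the LMM used in the study: all participant-related variance is collapsed into a single between-person component, and every numerical value, the two correlation parameters, and the function name are hypothetical.

```python
import math


def difference_score_icc(var_between_neutral, var_between_incongruent,
                         r_between, var_resid_neutral,
                         var_resid_incongruent, r_resid):
    """ICC of (incongruent - neutral) from the components of the two tests."""
    # Between-person variance of a difference: shared individual
    # differences cancel when the two tests correlate highly.
    between = (var_between_neutral + var_between_incongruent
               - 2 * r_between * math.sqrt(var_between_neutral
                                           * var_between_incongruent))
    # Residual variance of the difference, allowing correlated residuals.
    resid = (var_resid_neutral + var_resid_incongruent
             - 2 * r_resid * math.sqrt(var_resid_neutral
                                       * var_resid_incongruent))
    return between / (between + resid)


# Hypothetical values chosen only to illustrate the mechanism.
icc_single_test = 40.0 / (40.0 + 8.0)
icc_difference = difference_score_icc(40.0, 40.0, 0.9, 8.0, 8.0, 0.4)
print(round(icc_single_test, 3), round(icc_difference, 3))
```

When participants who score high on the neutral test also score high on the incongruent test, most of the between-person variance cancels in the subtraction, so the ICC of the difference falls well below that of either test even though the residual changes little.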

The comparison of each variance across tests and interferences revealed that the main reason for the reduced ICC of the interferences was the reduction of the random effects concerned with individual differences. These results strongly support our hypothesis that the Stroop and reverse-Stroop tests show higher ICCs than the interferences. Given the small to moderate effect of exercise on cognitive function (Lambourne and Tomporowski, 2010; Voss et al., 2011), experimental studies investigating how exercise benefits inhibitory function that employ the low-reliability Stroop and reverse-Stroop interferences as a dependent variable might mask the significance of the effect of acute exercise. The Stroop and reverse-Stroop incongruent tests appear to be affected by both inhibitory function and information processing. Therefore, the interference, which partials out the influence of information processing by subtracting the neutral/congruent test from the incongruent test, might be a reasonable method of assessment. Indeed, a substantial number of cross-sectional studies have employed the interference to investigate its association with brain structure or behavioral measurements (Takeuchi et al., 2012; Fagundo et al., 2016; Peven et al., 2018). However, several experimental studies that detected a selective effect of interventions on inhibitory function used the incongruent test as the dependent variable (Ferris et al., 2007; Nouchi et al., 2013; Ishihara et al., 2017). The results of this study might explain why the previous experimental studies used the Stroop or reverse-Stroop incongruent test but not the interference. It seems that an interference with a “slight” ICC is not sensitive to the impact of exercise or of any other factor (i.e., time or learning effect).
Given the at least “moderate” ICCs of the neutral and incongruent tests for the Stroop and reverse-Stroop tasks, analyzing the neutral and incongruent tests separately and comparing the outputs of the analyses for both Stroop tasks might also be a better approach to understanding comprehensively how acute exercise affects inhibitory function.

One notable difference between the present study and previous research is the measurement method. We used a paper and pencil matching test to measure performance on the Stroop and reverse-Stroop tasks, showing that calculating the interference for these tasks decreases the ICC and might mask the fixed effects in cross-over design research. These results and our interpretation correspond to most of the previous studies that measured the Stroop and reverse-Stroop tasks in their experiments. Other studies, however, detected fixed effects by analyzing the Stroop interference (Hyodo et al., 2012; Byun et al., 2014) and the reverse-Stroop interference (Tsukamoto et al., 2016a, b). In particular, the difference in measurement methods might selectively influence performance on reverse-Stroop tasks. As described above, we had expected that reverse-Stroop tasks would be sensitive to exercise based on previous studies (Tsukamoto et al., 2016a, b) showing that reverse-Stroop performance measured by a computerized test is extremely sensitive to exercise. However, we did not find any effects of exercise on the reverse-Stroop tests in the present study. This inconsistency between the present study and previous studies might be due to differences between a computerized test and a pencil and paper test. Because fewer studies have used reverse-Stroop tasks than Stroop tasks, we could not fully interpret this inconsistency. Other measurement methods, such as a computerized test or an oral test, might change the influence of calculating the interference on the ICC. Further studies are needed to clarify the interaction between test methods and types of cognitive function.

In conclusion, performance on each neutral and incongruent test of the Stroop and reverse-Stroop tasks has a high ICC, while calculating the interference decreases the ICC in cross-over design research. We have shown that the cause of this decrease is the reduction of the variances associated with individual differences. The interferences for the Stroop and reverse-Stroop tasks are valid indices of inhibitory function. However, to investigate the effect of exercise on inhibitory function with adequate statistical power in cross-over design research, researchers should also pay attention to incongruent test performance on the Stroop and reverse-Stroop tasks.

Data Availability Statement

The original contributions presented in the study are included in the article/ Supplementary Material , further inquiries can be directed to the corresponding author/s.

Ethics Statement

The studies involving human participants were reviewed and approved by the Human Subjects Committee of Tohoku Gakuin University. The patients/participants provided their written informed consent to participate in this study.

Author Contributions

ST: conception of this research, data collection, analysis and interpretation, and writing original draft. PG: supervision and review and editing. Both authors contributed to the article and approved the submitted version.

This study was a part of the research project of “Influence of types of acute exercise on physical, mental state, and cognitive function” supported by the Japan Society for the Promotion of Science (Grant number JP 15K01563).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We are grateful to all participants and the two badminton instructors and a resistance exercise trainer. We also thank Dr. Keita Kamijo for providing valuable comments. The results of this study are presented without any fabrication, falsification, or inappropriate data manipulation.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2020.614038/full#supplementary-material

Bajaj, J. S., Heuman, D. M., Sterling, R. K., Sanyal, A. J., Siddiqui, M., Matherly, S., et al. (2015). Validation of EncephalApp, smartphone-based Stroop test, for the diagnosis of covert hepatic encephalopathy. Clin. Gastroenterol. Hepatol. 13, 1828–1835. doi: 10.1016/j.cgh.2014.05.011

Barnhart, H. X., Haber, M. J., and Lin, L. I. (2007). An overview on assessing agreement with continuous measurements. J. Biopharm. Stat. 17, 529–569. doi: 10.1080/10543400701376480

Brouwer, C. L., Steenbakkers, R. J., van den Heuvel, E., Duppen, J. C., Navran, A., Bijl, H. P., et al. (2012). 3D variation in delineation of head and neck organs at risk. Radiat. Oncol. 7:32.

Byun, K., Hyodo, K., Suwabe, K., Ochi, G., Sakairi, Y., Kato, M., et al. (2014). Positive effect of acute mild exercise on executive function via arousal-related prefrontal activations: an fNIRS study. NeuroImage 98, 336–345. doi: 10.1016/j.neuroimage.2014.04.067

Chang, Y. K., Labban, J. D., Gapin, J. I., and Etnier, J. L. (2012). The effects of acute exercise on cognitive performance: a meta-analysis. Brain Res. 1453, 87–101. doi: 10.1016/j.brainres.2012.02.068

Colcombe, S. J., Erickson, K. I., Scalf, P. E., Kim, J. S., Prakash, R., McAuley, E., et al. (2006). Aerobic exercise training increases brain volume in aging humans. J. Gerontol. A Biol. Sci. Med. Sci. 61, 1166–1170. doi: 10.1093/gerona/61.11.1166

Demetrashvili, N., Wit, E. C., and van den Heuvel, E. R. (2016). Confidence intervals for intraclass correlation coefficients in variance components models. Stat. Methods Med. Res. 25, 2359–2376. doi: 10.1177/0962280214522787

Etnier, J. L., and Chang, Y. K. (2009). The effect of physical activity on executive function: a brief commentary on definitions, measurement issues, and the current state of the literature. J. Sport Exerc. Psychol. 31, 469–483. doi: 10.1123/jsep.31.4.469

Fagundo, A. B., Jiménez-Murcia, S., Giner-Bartolomé, C., Agüera, Z., Sauchelli, S., Pardo, M., et al. (2016). Modulation of irisin and physical activity on executive functions in obesity and morbid obesity. Sci. Rep. 6:30820.

Ferris, L. T., Williams, J. S., and Shen, C. L. (2007). The effect of acute exercise on serum brain-derived neurotrophic factor levels and cognitive function. Med. Sci. Sports Exerc. 39, 728–734. doi: 10.1249/mss.0b013e31802f04c7

Franzen, M. D., Tishelman, A. C., Sharp, B. H., and Friedman, A. G. (1987). An investigation of the test-retest reliability of the stroop colorword test across two intervals. Arch. Clin. Neuropsychol. 2, 265–272. doi: 10.1016/0887-6177(87)90014-x

Gu, Q., Zou, L., Loprinzi, P. D., Quan, M., and Huang, T. (2019). Effects of open versus closed skill exercise on cognitive function: a systematic review. Front. Psychol. 10:1707. doi: 10.3389/fpsyg.2019.01707

Hakoda, Y., and Sasaki, M. (1990). Group version of the Stroop and reverse-Stroop test: the effects of reaction mode, order and practice. Kyoikushinrigakukenkyu (Educ. Psychol. Res.) 38, 389–394. doi: 10.5926/jjep1953.38.4_389

Hedge, C., Powell, G., and Sumner, P. (2018). The reliability paradox: why robust cognitive tasks do not produce reliable individual differences. Behav. Res. Methods 50, 1166–1186. doi: 10.3758/s13428-017-0935-1

Hyodo, K., Dan, I., Suwabe, K., Kyutoku, Y., Yamada, Y., Akahori, M., et al. (2012). Acute moderate exercise enhances compensatory brain activation in older adults. Neurobiol. Aging 33, 2621–2632. doi: 10.1016/j.neurobiolaging.2011.12.022

Ishihara, T., Sugasawa, S., Matsuda, Y., and Mizuno, M. (2017). The beneficial effects of game-based exercise using age-appropriate tennis lessons on the executive functions of 6–12-year-old children. Neurosci. Lett. 642, 97–101. doi: 10.1016/j.neulet.2017.01.057

Kozora, E., Ellison, M. C., and West, S. (2004). Reliability and validity of the proposed American College of Rheumatology neuropsychological battery for systemic lupus erythematosus. Arthritis Care Res. 51, 810–818. doi: 10.1002/art.20692

Lambourne, K., and Tomporowski, P. (2010). The effect of exercise-induced arousal on cognitive task performance: a meta-regression analysis. Brain Res. 1341, 12–24. doi: 10.1016/j.brainres.2010.03.091

Martínez-Loredo, V., Fernández-Hermida, J. R., Carballo, J. L., and Fernández-Artamendi, S. (2017). Long-term reliability and stability of behavioral measures among adolescents: the Delay Discounting and Stroop tasks. J. Adolesc. 58, 33–39. doi: 10.1016/j.adolescence.2017.05.003

Mohammadirad, S., Salavati, M., Takamjani, I. E., Akhbari, B., Sherafat, S., Mazaheri, M., et al. (2012). Intra and intersession reliability of a postural control protocol in athletes with and without anterior cruciate ligament reconstruction: a dual−task paradigm. Int. J. Sports Phys. Ther. 7:627.

Nakagawa, S., and Schielzeth, H. (2010). Repeatability for Gaussian and non−Gaussian data: a practical guide for biologists. Biol. Rev. 85, 935–956.

Nouchi, R., Taki, Y., Takeuchi, H., Hashizume, H., Nozawa, T., Kambara, T., et al. (2013). Brain training game boosts executive functions, working memory and processing speed in the young adults: a randomized controlled trial. PLoS One 8:e55518. doi: 10.1371/journal.pone.0055518

Parsons, S., Kruijt, A.-W., and Fox, E. (2019). Psychological science needs a standard practice of reporting the reliability of cognitive-behavioral measurements. Adv. Methods and Pract. Psychol. Sci. 2, 378–395. doi: 10.1177/2515245919879695

Pedersen, B. K., Pedersen, M., Krabbe, K. S., Bruunsgaard, H., Matthews, V. B., and Febbraio, M. A. (2009). Role of exercise-induced brain-derived neurotrophic factor production in the regulation of energy homeostasis in mammals. Exp. Physiol. 94, 1153–1160. doi: 10.1113/expphysiol.2009.048561

Peven, J. C., Grove, G. A., Jakicic, J. M., Alessi, M. G., and Erickson, K. I. (2018). Associations between short and long bouts of physical activity with executive function in older adults. J. Cogn. Enhanc. 2, 137–145. doi: 10.1007/s41465-018-0080-5

Portaccio, E., Goretti, B., Zipoli, V., Iudice, A., Pina, D. D., Malentacchi, G. M., et al. (2010). Reliability, practice effects, and change indices for Rao’s Brief Repeatable Battery. Mult. Scler. J. 16, 611–617. doi: 10.1177/1352458510362818

Register-Mihalik, J. K., Kontos, D. L., Guskiewicz, K. M., Mihalik, J. P., Conder, R., and Shields, E. W. (2012). Age-related differences and reliability on computerized and paper-and-pencil neurocognitive assessment batteries. J. Athl. Train. 47, 297–305. doi: 10.4085/1062-6050-47.3.13

Ruff, C. C., Woodward, T. S., Laurens, K. R., and Liddle, P. F. (2001). The role of the anterior cingulate cortex in conflict processing: evidence from reverse stroop interference. Neuroimage 14, 1150–1158. doi: 10.1006/nimg.2001.0893

Scarpina, F., and Tagini, S. (2017). The stroop color and word test. Front. Psychol. 8:557. doi: 10.3389/fpsyg.2017.00557

Shrout, P. E. (1998). Measurement reliability and agreement in psychiatry. Stat. Methods Med. Res. 7, 301–317. doi: 10.1177/096228029800700306

Shrout, P. E., and Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychol. Bull. 86:420. doi: 10.1037/0033-2909.86.2.420

Siegrist, M. (1997). Test-retest reliability of different versions of the Stroop test. J. Psychol. 131, 299–306. doi: 10.1080/00223989709603516

Song, Y., and Hakoda, Y. (2015). An fMRI study of the functional mechanisms of Stroop/reverse-Stroop effects. Behav. Brain Res. 290, 187–196. doi: 10.1016/j.bbr.2015.04.047

Strauss, G. P., Allen, D. N., Jorgensen, M. L., and Cramer, S. L. (2005). Test-retest reliability of standard and emotional stroop tasks: an investigation of color-word and picture-word versions. Assessment 12, 330–337. doi: 10.1177/1073191105276375

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. J. Exp. Psychol. 18, 643–662. doi: 10.1037/h0054651

Takeuchi, H., Taki, Y., Sassa, Y., Hashizume, H., Sekiguchi, A., Nagase, T., et al. (2012). Regional gray and white matter volume associated with Stroop interference: evidence from voxel-based morphometry. NeuroImage 59, 2899–2907. doi: 10.1016/j.neuroimage.2011.09.064

Tsukamoto, H., Suga, T., Takenaka, S., Tanaka, D., Takeuchi, T., Hamaoka, T., et al. (2016a). Greater impact of acute high-intensity interval exercise on post-exercise executive function compared to moderate-intensity continuous exercise. Physiol. Behav. 155, 224–230. doi: 10.1016/j.physbeh.2015.12.021

Tsukamoto, H., Suga, T., Takenaka, S., Tanaka, D., Takeuchi, T., Hamaoka, T., et al. (2016b). Repeated high-intensity interval exercise shortens the positive effect on executive function during post-exercise recovery in healthy young males. Physiol. Behav. 160, 26–34. doi: 10.1016/j.physbeh.2016.03.029

Voss, M. W., Nagamatsu, L. S., Liu-Ambrose, T., and Kramer, A. F. (2011). Exercise, brain, and cognition across the life span. J. Appl. Physiol. 111, 1505–1513. doi: 10.1152/japplphysiol.00210.2011

Wallman, K. E., Morton, A. R., Goodman, C., and Grove, R. (2005). Reliability of physiological, psychological, and cognitive variables in chronic fatigue syndrome. Res. Sports Med. 13, 231–241. doi: 10.1080/15438620500222562

Yamazaki, Y., Sato, D., Yamashiro, K., Tsubaki, A., Takehara, N., Uetake, Y., et al. (2018). Inter-individual differences in working memory improvement after acute mild and moderate aerobic exercise. PLoS One 13:e0210053. doi: 10.1371/journal.pone.0210053

Zysset, S., Muller, K., Lohmann, G., and von Cramon, D. Y. (2001). Color-word matching stroop task: separating interference and response conflict. Neuroimage 13, 29–36. doi: 10.1006/nimg.2000.0665

Keywords: inhibitory function, random effect, individuality, experimental design, statistical power

Citation: Takahashi S and Grove PM (2020) Use of Stroop Test for Sports Psychology Study: Cross-Over Design Research. Front. Psychol. 11:614038. doi: 10.3389/fpsyg.2020.614038

Received: 05 October 2020; Accepted: 18 November 2020; Published: 07 December 2020.

Copyright © 2020 Takahashi and Grove. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Shinji Takahashi, [email protected]

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Stroop Color and Word Test, Children’s Version

  • pp 2403–2404

  • Lisa Moran
  • Keith Owen Yeates

Description

The Stroop Color and Word Test, Children’s Version (2003), is designed to measure the ability to inhibit a prepotent reading response in order to engage a naming response. According to the manual, when used with children, the test can also provide information regarding the development and dominance of the reading system. This version of the Stroop paradigm uses three cards with 100 items each. On the first card, the child is asked to read a list of color words (e.g., red and green) printed in black ink. The second card contains columns of nonword stimuli (XXXX) printed in different colors and the child is asked to name the color of each stimulus. On the final card, color words are printed in colors different from the word (e.g., blue printed in green ink) and the child is required to name the color rather than read the word. In each part, the child is given 45 s to read or name as many items as possible.

The manual suggests the test can be administered in group format,...

References and Readings

Cattell, J. M. (1886). The time it takes to see and name objects. Mind, 11 , 63–65.

Golden, C. J. (1978). Stroop Color and Word Test: A manual for clinical and experimental uses . Wood Dale, IL: Stoelting Co.

Golden, Z., & Golden, C. J. (2002). Patterns of performance on the Stroop Color and Word Test in children with learning, attentional, and psychiatric disabilities. Psychology in the Schools, 39 (5), 489–495.

Golden, C. J., Freshwater, S. M., & Golden, Z. (2003). Stroop Color and Word Test, Children’s version for Ages 5–14: A manual for clinical and experimental uses . Wood Dale, IL: Stoelting Co.

Homack, S., & Riccio, C. A. (2004). A meta-analysis of the sensitivity and specificity of the Stroop Color and Word Test with children. Archives of Clinical Neuropsychology, 19 , 725–743.

Macleod, C. M. (1991). Half a century of research on the Stroop effect – an integrative review. Psychological Bulletin, 109 (2), 163–203.

Neyens, L. G. J., & Aldenkamp, A. P. (1996). Stability of cognitive measures in children of average ability. Child Neuropsychology, 2 , 161–170.

Author information

Authors and affiliations.

Department of Psychology, Nationwide Children's Hospital, 700 Children's Drive, 43205, Columbus, OH, USA

Keith Owen Yeates

Editor information

Editors and affiliations.

Physical Medicine and Rehabilitation, and Professor of Neurosurgery, and Psychiatry Virginia Commonwealth University – Medical Center Department of Physical Medicine and Rehabilitation, VCU, 980542, Richmond, Virginia, 23298-0542, USA

Jeffrey S. Kreutzer

Kessler Foundation Research Center, 1199 Pleasant Valley Way, West Orange, NJ, 07052, USA

John DeLuca

Professor of Physical Medicine and Rehabilitation, and Neurology and Neuroscience, University of Medicine and Dentistry of New Jersey – New Jersey Medical School, New Jersey, USA

Independent Practice, 564 M.O.B. East, 100 E. Lancaster Ave., Wynnewood, PA, 19096, USA

Bruce Caplan

Copyright information

© 2011 Springer Science+Business Media, LLC

About this entry

Cite this entry.

Moran, L., Yeates, K.O. (2011). Stroop Color and Word Test, Children’s Version. In: Kreutzer, J.S., DeLuca, J., Caplan, B. (eds) Encyclopedia of Clinical Neuropsychology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-79948-3_1597

DOI : https://doi.org/10.1007/978-0-387-79948-3_1597

Publisher Name : Springer, New York, NY

Print ISBN : 978-0-387-79947-6

Online ISBN : 978-0-387-79948-3

eBook Packages : Behavioral Science Reference Module Humanities and Social Sciences Reference Module Business, Economics and Social Sciences

  • Front Psychol

Stroop effects from newly learned color words: effects of memory consolidation and episodic context

Sebastian Geukes

1 Institut für Psychologie, Westfälische Wilhelms-Universität Münster, Münster, Germany

M. Gareth Gaskell

2 Department of Psychology, University of York, York, UK

Pienie Zwitserlood

The Stroop task is an excellent tool to test whether reading a word automatically activates its associated meaning, and it has been widely used in mono- and bilingual contexts. Despite its ubiquity, the task has not yet been employed to test the automaticity of recently established word-concept links in novel-word-learning studies, under strict experimental control of learning and testing conditions. In three experiments, we thus paired novel words with native language (German) color words via lexical association and subsequently tested these words in a manual version of the Stroop task. Two crucial findings emerged: When novel-word Stroop trials appeared intermixed among native-word trials, the novel-word Stroop effect was observed immediately after the learning phase. If no native color words were present in a Stroop block, the novel-word Stroop effect only emerged 24 h later. These results suggest that the automatic availability of a novel word's meaning depends on supportive context from the learning episode and/or on sufficient time for memory consolidation. We discuss how these results can be reconciled with the complementary learning systems account of word learning.

Introduction

Learning a foreign language after childhood entails the acquisition of the rules of grammar of the novel language, knowledge that may arguably become explicit with practice, as well as learning a tremendous number of new words, as labels for concepts that have been acquired and mapped onto native words during first-language acquisition. Words as labels for concepts constitute explicit knowledge, and in the course of learning a new language, the human mental lexicon, which stores word knowledge, may double in size. There are intriguing questions as to when and how newly learned words are connected to the conceptual-semantic knowledge they refer to. Quite a few proposals have been offered for this aspect of second-language acquisition (e.g., Kroll and Stewart, 1994 ; Dijkstra and van Heuven, 2002 ). One problem that hampers the study of foreign-language vocabulary acquisition is that it mainly takes place in situations that do not provide adequate control over the input, the learning context, and many other potentially confounding variables that influence learning success.

For these reasons, researchers are increasingly turning to studying foreign-language learning with what are called novel-word learning paradigms. Common to these paradigms is that the learning input and method, stimulus materials, and external influences can all be kept under much stricter experimental control than in natural learning or in classroom situations. For example, entirely novel words are used instead of existing foreign words, to make sure that there is no overlap between the novel and native language word-forms. While learning with this approach may be ecologically less valid, many confounding influences can be excluded, allowing for clearer conclusions. The experimental manipulation of the learning process further makes it easier to relate the observed effects to the actual learning experience.

In recent novel-word learning studies, words and meanings were associated with rather different methods, such as presenting novel words together with definitions (e.g., Clay et al., 2007 ; Tamminen and Gaskell, 2013 ), associating novel words and their concepts by means of pictures (e.g., Yu and Smith, 2007 ; Dobel et al., 2010 ), and presenting novel words at the end of meaning-constraining sentences (Mestres-Missé et al., 2007 ; Borovsky et al., 2010 , 2012 ). Common to these methods is that the word-concept links are established within a rich semantic context and with a salient focus on word meanings. Likewise, tests of these novel links also take place in contexts in which semantic processing is a major element of the task.

To test whether effective word-to-concept links have been established, different speeded and non-speeded tasks have been employed, such as object naming (Breitenstein et al., 2005), translation matching (Dobel et al., 2010), or semantic priming (Dobel et al., 2010; Tamminen and Gaskell, 2013). Results from these studies have shown that such links are indeed established and that these links are also evident when interacting with stimuli that were not presented during learning. However, both the more explicit, non-speeded tasks and the semantic priming paradigm are known to be susceptible to strategic manipulations (e.g., Neely, 1991), leaving it unclear how automatic the activation of the novel word's meaning actually is. Results from the Stroop task (Stroop, 1935; MacLeod, 1991), in contrast, are known to be much more robust against such manipulations. This makes the task a good one to test for at least some components of automaticity in the access process for word meanings 1 (Moors and De Houwer, 2006). The Stroop task thus promises to be an excellent extension to previous studies, because it allows us to assess whether reading a novel word automatically activates its meaning. Surprisingly, there seems to be only one word-learning study (Altarriba and Mathis, 1997) that made use of the Stroop task to test newly learned links between unfamiliar words and color concepts, and even the results from this study offer only limited conclusions with regard to the automaticity of semantic activation in novel words (see below).

The main focus of the present study is thus (1) to link novel words with familiar concepts within a semantically poor learning context, without an explicit focus on semantic processing and (2) to assess whether this learning nevertheless results in stable links between novel words and their meaning, to the extent that this meaning is automatically activated when merely reading the novel word. Because consolidation effects have been observed in several recent word-learning studies (e.g., Dumay and Gaskell, 2007 , 2012 ; Davis et al., 2008 ), a further aim of this study is to test whether the establishment and availability of such semantic links in any way depends on an opportunity for memory consolidation.

During learning, novel words were directly paired with L1 (German) color words in a statistical association procedure adapted from Breitenstein and Knecht ( 2002 ). In our version of the paradigm, pairs of novel words and native color words are presented—some pairs representing correct matches, some not—such that correct word-word links can only be derived over time, on the basis of co-occurrence frequencies. Importantly, participants are merely instructed to decide whether the novel and native word of the current pair match or not (by pressing one of two buttons)—no semantic processing of the word stimuli is required (but not explicitly prevented). This simple instruction and the fact that no explicit feedback is given make it a procedure of low cognitive demand (e.g., see Clay et al., 2007 , for a more explicit procedure, and Kachergis et al., 2013 , for an interactive approach). Given that the words are not paired directly with a perceptual representation of their to-be-learned color concepts (e.g., a color patch, a color-related object), any connection between the novel word and the color concept can only be drawn indirectly, via the native color word. The amount of exposure can be easily quantified and manipulated, because novel and native words are associated in a systematic fashion. Likewise, learning progress can be continuously monitored, because a matching judgment is required in every trial. This paradigm has been successfully employed in a number of studies to associate novel words with pictures of existing concepts, using both spoken novel words (Breitenstein et al., 2005 , 2007 ; Yu and Smith, 2007 ; Dobel et al., 2010 ; Liuzzi et al., 2010 ; Freundlieb et al., 2012 ) and written novel words (Laeger et al., 2014 ). To our knowledge, our implementation is the first to associate word-word pairs instead of word-picture pairs based on this statistical procedure.

In the typical modern version of the Stroop task, participants name (or indicate by button press) the ink color of a presented word. This response is slowed down if the word's meaning is incompatible with the ink color (e.g., ink color is red , but word is BLUE). Thus, the word's meaning interferes with task performance, although the task does not require any processing of the word's meaning. Apparently, reading the word activates the conceptual representation associated with that word. This ability of the Stroop task to reveal automatic semantic activation in such an indirect way promises to be an excellent test for whether, how fast, and how strongly, novel words are linked to their assigned color concepts.
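As a minimal sketch of how the congruency effect described above is commonly quantified, the following Python snippet takes the difference between the mean response times of incongruent and congruent trials. The trial records, field layout, and response times are invented for illustration and are not data from this study; excluding error trials is an assumption based on common practice, not a step reported here.

```python
from statistics import mean

# Hypothetical trial records: (condition, response time in ms, correct?)
trials = [
    ("congruent", 540, True),
    ("congruent", 560, True),
    ("incongruent", 610, True),
    ("incongruent", 650, True),
    ("incongruent", 900, False),  # error trial, excluded below (assumption)
]

def stroop_effect(trials):
    """Return mean RT(incongruent) minus mean RT(congruent), correct trials only."""
    rts = {"congruent": [], "incongruent": []}
    for condition, rt_ms, correct in trials:
        if correct:
            rts[condition].append(rt_ms)
    return mean(rts["incongruent"]) - mean(rts["congruent"])

effect_ms = stroop_effect(trials)
print(effect_ms)
```

A positive value means that responses were slower when word meaning and ink color conflicted, which is the interference pattern the task is designed to reveal.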

Many studies have investigated how color words from a second language (L2) compare to L1 color words in the Stroop task, with the typical result that a substantial, but smaller interference effect is found in L2 compared to L1 color words (e.g., Preston and Lambert, 1969 ; Chen and Ho, 1986 ; Sumiya and Healy, 2004 ; earlier work reviewed in MacLeod, 1991 ). Even if the L2 is well-established and participants report equal levels of competence in both languages, the effect is larger in color words from the language that is dominant in everyday use (Altarriba and Mathis, 1997 ). Stroop effects of comparable size in the speaker's two languages are only found when both usage and competence are equally high (e.g., Mägiste, 1984 ).

As mentioned earlier, we are aware of only one published experiment in which a group of participants learned the set of novel color words immediately before the Stroop test (Experiment 2 in Altarriba and Mathis, 1997 ). In this experiment, monolingual English-speaking participants were trained with a set of four Spanish color words and subsequently further familiarized with these words in a series of quizzes. The quizzes involved rehearsing the new lexical link (e.g., matching Spanish to English color words) as well as the new semantic link (e.g., matching the Spanish words to color patches or to compatible objects: amarillo [yellow] goes with the school bus). These Spanish words, along with the English translations, were then entered into a Stroop task, in which the ink color had to be named using English color terms. In the English trials, naming latencies between congruently and incongruently colored words differed by 112 ms. Importantly, there was a similar but smaller difference in the Spanish trials (52 ms), indicating that the incompatibility between the word meaning and the verbal response slowed down color naming even with these newly learned words as distractors (Altarriba and Mathis, 1997 ).

These results are remarkable as they show that, even after a short learning session, the newly learned L2 words have already been learned well enough to interfere with a task that does not explicitly require processing of word meaning. However, some features of this study hinder a full assessment of the power of the underlying semantic learning mechanisms. First of all, the color words of both languages have some phonological and orthographic overlap (e.g., red and rojo share the initial r; yellow and amarillo share llo) that may have artificially increased the L2 effect (cf. Sumiya and Healy, 2004). Second, given that the experiment was performed in the United States, it is also likely that participants, even though monolingual speakers of English, had some familiarity with the Spanish color words. Finally, the experiment required English color words as responses, which were the same color words that were repeatedly presented with the Spanish words during learning. One could argue that the observed interference was not between the novel words and their meanings, but between the novel words and the required English responses, as these links had been intensely rehearsed during learning (cf. the analogous discussion in the semantic priming literature on the differentiation between genuine semantic priming and priming by association, e.g., Lucas, 2000; Tamminen and Gaskell, 2013).

Hence, in our experiment, the stimuli and parameters of the Stroop task are chosen in such a way that these alternative explanations can be excluded. First, pseudowords instead of existing words are used to serve as to-be-learned color words. This is done to avoid any phonological/orthographic overlap between the L1 and the new color words, and to exclude that participants are familiar with any of the new words. Furthermore, the response format during the Stroop task is changed from verbal responses (color naming) to manual responses (color-matching): Participants indicate the ink color of the presented color-word stimulus by pressing one of four colored buttons. As the buttons are only present during the Stroop task, participants cannot learn any word-response associations beforehand. Consequently, a congruency effect in the Stroop task cannot be explained by a word-response association stemming from the learning phase.

The manual response format offers a further advantage over color naming. Although covert naming cannot be excluded (see e.g., Lupyan, 2012 ), lexical access is not even necessary to perform the Stroop task. Participants can simply rely on matching the presented ink color to the color of the corresponding button for correct responses. Consequently, this task should make it easier to ignore the presented word and its meaning. Indeed, in the native-language Stroop literature, the manual Stroop effect is usually substantially reduced relative to the verbal Stroop effect (about half the size, MacLeod, 2005 ). Moreover, Sharma and McKenna ( 1998 ) showed that, in contrast to verbal responses, there is no interference component in the manual response format that can be attributed to the word status of control items (that is, they found that a manual color-matching response to an XXXX letter string is as fast as a manual response to a color-unrelated existing word such as CHIEF). This in turn suggests that the manual response format more clearly captures the semantic component of the Stroop effect. In sum, the manual response format offers a stronger test for semantic learning of the novel color words.

Taken together, the adaptations we introduced to the original learning and testing paradigm provide a strong test of the power of semantic novel-word learning and of the automaticity of the resulting memory traces.

A further aim of our study was to test whether the establishment and availability of such semantic links depends on an opportunity for memory consolidation. In most studies that used our variant of a statistical learning procedure, learning took place over a number of consecutive days, and the crucial test of semantic integration was performed after the learning phase had been completed (e.g., Breitenstein et al., 2007 ; Dobel et al., 2010 ; Liuzzi et al., 2010 ; Freundlieb et al., 2012 ). With such designs, there is ample opportunity for consolidation, and it is not known whether effects obtained after 4 or 5 days of learning would also be present immediately after learning. However, in several other word-learning studies that used more targeted paradigms, clear effects of memory consolidation on word learning were found (e.g., Gaskell and Dumay, 2003 ; Bowers et al., 2005 ; Clay et al., 2007 ; Dumay and Gaskell, 2007 ; Tamminen et al., 2010 ; Tamminen and Gaskell, 2013 ; Bakker et al., 2014 ; but see Coutanche and Thompson-Schill, 2014 ; Kapnoula et al., 2015 ). It was further shown that, while consolidation may also happen during time awake (Walker, 2005 ; Lindsay and Gaskell, 2013 ), consolidation of novel words is strongest during sleep (Dumay and Gaskell, 2007 ; Henderson et al., 2012 ). There is also evidence that these consolidation effects are directly related to electrophysiological patterns of brain activity during sleep, such as sleep spindles and slow-wave activity (Tamminen et al., 2010 , 2013 ).

Davis and Gaskell ( 2009 ) offered an explanation of the word-learning data based on the more general theory of Complementary Learning Systems (CLS; McClelland et al., 1995 ). According to their account, word learning is based on two separate neural systems, namely a fast-learning but temporary memory system involving the medial temporal lobe (particularly the hippocampus), and a slower-learning but longer-lasting neocortical memory system. Novel lexical entries are thought to rely initially on hippocampal mediation, with this reliance diminishing only some time after initial encoding, by means of interleaving novel and existing memories (possibly via hippocampal memory replay: Rasch and Born, 2008 ). Thus, novel lexical entries are thought to fully interact with existing neocortical memories only after they have been consolidated, avoiding the danger of catastrophic interference (McCloskey and Cohen, 1989 ).

Many of the studies that focus on consolidation included learning of novel word-forms and tested whether and when the novel words showed lexical competition effects with existing neighbors. Only a few looked at how acquiring the meaning of a novel word might be influenced by consolidation (e.g., Clay et al., 2007 ; Tamminen and Gaskell, 2013 ). The latter studies also showed evidence for consolidation, but the results were less clear than in the lexical-competition studies. Thus, further research is certainly warranted to identify necessary conditions for consolidation effects in semantic word learning.

Here, as the exact mechanism of any consolidation effects was not a focus of our study, a simple method for testing consolidation was selected: the set of novel words was split in half, and the two resulting sets of words were tested at different delays after learning. With this design, we are able to capture basic effects of consolidation, but not specific effects of sleep.

In the following, results from three experiments are reported. In all experiments, participants could associate novel words with L1 color words, by means of the above-described word-word pairing procedure. These words were then entered into a Stroop task, during which participants were instructed to press the button that corresponded in color to the ink color of the presented word. Novel words were presented either in their congruent (“learned”) or in an incongruent ink color. To capture the potential influence of memory consolidation, different subsets of the learned words were tested either immediately after learning and/or a day later.

Experiment 1 assessed whether newly learned color words would show any Stroop effects at all, immediately or a day after learning. To obtain a direct quantitative comparison of the effect sizes in the native and in the novel words, the novel words were intermixed with (L1) German color words. In Experiment 2, novel words were again tested alongside their German counterparts, but after a much shorter learning phase, and control trials were added to assess facilitation and inhibition components of the Stroop effect. Experiment 3 returned to the design of Experiment 1 and tested whether removing the German color word trials from the Stroop blocks affected the basic novel-word Stroop effect. To assess consolidation effects in more detail, this third experiment also included a second group of participants who received their first Stroop block only on the second day, 24 h after learning.

Experiment 1

Experiment 1 was designed to test whether novel color words are sufficiently integrated into lexico-semantic memory to produce Stroop congruency effects within 24 h of learning. In a brief learning session, novel words were associated with native color words and subsequently tested in a manual Stroop task. To assess potential effects of memory consolidation, half of the novel color words were tested immediately after learning, the other half 24 h later.

Materials and methods

Experiment 1 was divided into two sessions, spaced approximately 24 h apart (see Figure 1C for an overview). Session 1 consisted of two parts: (a) statistical learning of 10 novel words, each paired with a German color term, with both novel and German words printed in black (learning phase); (b) a manual Stroop task with a subset of four novel color words and their German translations as stimuli (Stroop 1). Session 2 consisted of a manual Stroop task with a different subset of four novel color words and their German translations as stimuli (Stroop 2). In the manual Stroop tasks, participants had to press one of four colored buttons that matched the ink color of the novel or German word on the screen. To minimize effects of verbal short-term memory, a crossword puzzle separated the learning and test phases on Day 1.

Figure 1. Overview of Experiment 1. (A) Statistical learning principle: While match and mismatch trials appear equally often, some novel words are paired frequently with a particular native language color word (illustrated here for the pair of alep and blau [blue]). (B) Stroop task: Example stimuli for the four conditions. (C) The order of tasks.

Participants

Twenty-four native speakers of German, most of them students, took part in the experiment (21 female; age range: 19 to 28 years, M = 21.25, SD = 2.33). Participants reported having no color vision deficiency and had normal or corrected-to-normal visual acuity. They gave their written consent and received course credit or 9 €. All experiments reported here complied with the ethical standards formulated by the Ethics Committee of the Psychology department, University of Münster.

Four focal colors (red, green, blue, yellow) and four subordinate colors (violet, orange, pink, brown) were selected, as well as black and white. Except for the latter two, which were included merely to increase the size of the learning set, all colors were used as “ink” colors in the Stroop task. Two different subsets of four colors were used for the Stroop tasks on Day 1 and on Day 2. The two subsets were composed so as to keep the four colors within a set sufficiently discriminable (Set 1: red-yellow-violet-brown, Set 2: green-blue-pink-orange). The two subsets were identical for all participants, but the assignment of the subsets to the two Stroop sessions was counterbalanced between participants.

The 10 corresponding German color words were: rot (red), gelb (yellow), blau (blue), grün (green), lila (violet), orange (orange), pink (pink), braun (brown), schwarz (black), and weiß (white). These were used for novel word to color word associations during the learning phase and, except for the latter two, as word stimuli during the Stroop blocks.

Twenty-five nonwords (e.g., alep, fupo, lopek) from an existing corpus (Breitenstein and Knecht, 2002 ) served as novel words in the learning and the Stroop tasks. They are 4–5 letters long and do not elicit any particular lexical associations, as rated by an independent sample (see Breitenstein and Knecht, 2002 , for details on word generation and selection criteria). The nonwords are easily pronounceable for native German speakers. Because of their common bi-syllabic structure and simple vowel-consonant alternations, they can be classified as stemming from a common vocabulary of an unknown language. Ten of these nonwords were selected to serve as to-be-learned color names, from which three different sets of novel word to color word assignments were constructed (see Supplementary Materials ). We made sure that there was no phonological or graphemic onset or offset overlap between selected nonwords and their corresponding German color names within each list. The remaining 15 of the 25 nonwords served as fillers during statistical learning. For practical purposes, we will henceforth use the generic term Language to differentiate the sets of German and novel words.

Experimental procedure

The experiment was conducted using DMDX software (Forster and Forster, 2003 ) running on a Windows PC. Stimuli were presented at an eye-to-screen distance of about 60 cm on a 17″ LCD monitor running at 120 Hz. Stimuli appeared on a gray background (RGB values: 210-210-210). Words appeared in lower case Arial Bold font, subtending a maximal visual angle of about 3.5° horizontally and 1° vertically. Responses were recorded using a standard Windows keyboard connected via a USB port.
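The display parameters above can be translated into physical sizes and refresh cycles with some basic geometry. The sketch below is illustrative only: the function names are hypothetical, the sizes are derived from the reported visual angles and the approximate 60 cm viewing distance rather than measured, and the 200 ms value is the fixation duration used in the learning phase.

```python
import math

def size_from_visual_angle(angle_deg, distance_cm):
    """Physical extent (cm) that subtends angle_deg at a viewing distance of distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

def frames_for_duration(duration_ms, refresh_hz):
    """Number of refresh cycles closest to the requested duration."""
    return round(duration_ms * refresh_hz / 1000)

width_cm = size_from_visual_angle(3.5, 60)       # about 3.7 cm wide
height_cm = size_from_visual_angle(1.0, 60)      # about 1.0 cm high
fixation_frames = frames_for_duration(200, 120)  # 24 refresh cycles at 120 Hz

print(round(width_cm, 2), round(height_cm, 2), fixation_frames)
```

At 120 Hz one refresh cycle lasts roughly 8.3 ms, so all durations reported for this setup correspond to whole numbers of cycles.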

Learning procedure

The learning paradigm was adapted from the statistical learning procedure described by Breitenstein and Knecht (2002). During the learning phase, pairs of words were presented on a computer screen. Each pair consisted of a novel word and a German color word. On each trial, a fixation cross appeared centrally for 200 ms, followed by one of the novel words in black font, just above the center. After 250 ms, a German color word was added to the display, just below the center, and also in black. The two-word display remained on the screen for 1500 ms. From the onset of the second word, participants had a 1800 ms time window to decide whether the two words belonged together or not, pressing the right shift key to indicate that the words belonged together, or the left shift key to indicate that they did not. Within the learning block, matching and mismatching word pairs appeared equally often (cf. Figure 1A).

Participants were informed beforehand that it was initially impossible to tell whether a pair matched or not, but that during the course of the learning phase, the more frequent co-occurrence of some word-word pairs would help discriminate matching from mismatching pairs. No trial feedback was given unless the participant failed to respond in time, in which case the words “Zu langsam!” (= too slow) were presented at the bottom of the screen for 600 ms. After the button press or the time-out feedback, the next trial started after a random delay between 100 and 400 ms.

The statistical learning principle was implemented in the following manner (see also Table 1): During the learning phase, each German color word was presented 24 times with its to-be-associated novel word (match trials), and once with each of the remaining 24 novel words (mismatch trials). Of the 24 novel words from the mismatch trials, nine were the other novel words of the learning set (i.e., novel words that were to take on the meaning of a different color). The remaining 15 mismatch words were novel words that appeared in mismatch trials only and were not systematically associated with any particular meaning. Thus, over the course of the learning phase, participants could identify the matching word-word pairs only by exploiting the frequency of couplings.

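Table 1. Frequencies of word pairings during statistical learning of Experiment 1.

German word   Novel word (to-be-learned color names 1–10)           15 fillers
               1    2    3    4    5    6    7    8    9   10
rot           24    1    1    1    1    1    1    1    1    1       15 × 1
gelb           1   24    1    1    1    1    1    1    1    1       15 × 1
blau           1    1   24    1    1    1    1    1    1    1       15 × 1
grün           1    1    1   24    1    1    1    1    1    1       15 × 1
lila           1    1    1    1   24    1    1    1    1    1       15 × 1
orange         1    1    1    1    1   24    1    1    1    1       15 × 1
pink           1    1    1    1    1    1   24    1    1    1       15 × 1
braun          1    1    1    1    1    1    1   24    1    1       15 × 1
schwarz        1    1    1    1    1    1    1    1   24    1       15 × 1
weiß           1    1    1    1    1    1    1    1    1   24       15 × 1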

Numbers indicate how often each German color word (left column) and a novel word (top row) were presented together during learning. The right column represents the 15 additional pseudowords that were included to obtain an equal number of match and mismatch trials. Each of these additional 15 pseudowords appeared once with every German color word. The assignments of novel words to color words were swapped between participants (there were three different versions).

The learning phase consisted of 480 trials and lasted about 22 min. It was subdivided into 4 blocks of 120 trials each, separated by three 30-s breaks. Trials were presented in different random order for each participant, with the constraint that each 120-trial block contained 6 match and 6 non-match trials for each of the German color words. After the learning phase on Day 1, participants filled out the crossword puzzle (duration approx. 5 min.), after which the Stroop task of Day 1 followed.
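The pairing frequencies and block constraint described above can be sketched as a trial-list generator. This is an illustrative reconstruction, not the authors' DMDX script: the labels `novel1`…`novel10` and `filler1`…`filler15` are placeholders (the real nonwords and assignments are in the Supplementary Materials), and distributing each color word's mismatch partners randomly over the four blocks is an assumption.

```python
import random
from collections import Counter

COLOR_WORDS = ["rot", "gelb", "blau", "grün", "lila",
               "orange", "pink", "braun", "schwarz", "weiß"]
NOVEL = [f"novel{i + 1}" for i in range(10)]      # placeholder labels
FILLERS = [f"filler{i + 1}" for i in range(15)]   # placeholder labels

def build_learning_blocks(seed=0, n_blocks=4):
    """Return n_blocks shuffled lists of (novel_word, color_word, is_match) trials."""
    rng = random.Random(seed)
    blocks = [[] for _ in range(n_blocks)]
    per_block = 24 // n_blocks  # 6 match and 6 mismatch trials per color word and block
    for i, color in enumerate(COLOR_WORDS):
        # 24 mismatch partners: the 9 other to-be-learned words plus the 15 fillers.
        partners = [w for j, w in enumerate(NOVEL) if j != i] + FILLERS
        rng.shuffle(partners)
        for b in range(n_blocks):
            blocks[b] += [(NOVEL[i], color, True)] * per_block
            chunk = partners[b * per_block:(b + 1) * per_block]
            blocks[b] += [(w, color, False) for w in chunk]
    for block in blocks:
        rng.shuffle(block)  # random order within each block
    return blocks

blocks = build_learning_blocks()
pair_counts = Counter((novel, color) for block in blocks for novel, color, _ in block)
```

With this construction, `pair_counts` reproduces the frequencies in Table 1: 24 for each matching pair and 1 for every other pairing, for a total of 480 trials in four blocks of 120.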

Stroop task

Immediately after the crossword puzzle and again at the beginning of the second day's experimental session, participants took part in a Stroop block. The Stroop tasks of Day 1 and Day 2 were identical except that different sets of four colors were used on each day, along with the corresponding German and the learned novel color words.

In the Stroop task, words were presented one at a time: either a German color word or a novel color word. These words were printed in one of the four ink colors assigned to that session, yielding congruent and incongruent combinations of ink color and word meaning (see Figure 1B). Each trial began with the presentation of a fixation cross that stayed on the screen for 200 ms and was followed by a word presented centrally for 150 ms. Participants were to indicate the ink color of the word by pressing the correspondingly colored response button as quickly as possible. Four buttons of the PC keyboard were used (“y”, “x”, “,” and “.” on the German layout), marked by correspondingly colored stickers. Participants were to use their left and right middle and index fingers to indicate the ink color the word had been presented in, ignoring the word's meaning. Color-to-button assignments were switched between participants. Participants were given 1800 ms to respond. Feedback was given on the screen for all responses (Richtig! = correct, Falsch! = incorrect, Zu langsam! = too slow). A blank screen (random duration between 850 and 1150 ms) concluded each trial.

For the Stroop task, we selected only one incongruent ink color for each German or novel color word: e.g., we presented gike either in red (congruent) or in yellow (incongruent), but not in violet or brown, the other ink colors that appeared during the same block (see Table 2). The reason for this deviation from the classic Stroop design is the following: In a typical native-language four-color Stroop task (e.g., with the colors red, green, blue, yellow), each color word is presented three times as often in the congruent version (red printed in red) as in each of the three possible incongruent versions (red printed in green, blue, or yellow), such that congruent and incongruent trials occur equally often. However, if we had presented the novel-word Stroop trials according to this scheme, participants would have had an additional opportunity to learn the correct novel-word-to-color couplings (because, e.g., gike, meaning red, is more often presented in red than in any of the other colors). Moreover, such a presentation scheme would also have provided the opportunity for direct word-response association (e.g., gike = second button from left), which would be a severe confound in a manual Stroop task. Schmidt et al. (2007) present evidence for such associative learning within the Stroop task. By presenting the color words in just one incongruent version, we eliminated any opportunity to learn the correct word-color or word-response pairs within the Stroop task. Crucially, this excludes the possibility that subsequent performance differences between congruent and incongruent Stroop trials might be due to or influenced by learning effects during the Stroop task itself. The German color words were presented in the same incongruent color as the corresponding novel color words.

Table 2. Overview of color-word stimuli in the Stroop task (table image not reproduced).

The Stroop stimuli can be reconstructed as combinations of ink color (top) and novel or German color word (left). The numbers represent the repetitions of the color-word stimulus during one Stroop block. Green numbers represent congruent stimuli, red numbers represent incongruent stimuli. Half of the participants received the color subsets in reverse order, i.e., the green-blue-pink-orange subset on Day 1 and the other set on Day 2.

Each of the session's four German and four novel color words was shown 30 times in its congruent and 30 times in its incongruent ink color, yielding 480 trials, which were presented randomly in 4 blocks of 120 trials, separated by breaks of 30 s. The Stroop task lasted about 26 min.
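The Stroop design described above (one congruent and one fixed incongruent ink per word, 30 repetitions each) can be sketched as follows. This is a hedged illustration only: the incongruent-ink mapping is hypothetical except for red/yellow, which follows the gike example in the text, and the word labels are placeholders. The cyclic mapping is an assumption chosen so that every ink color occurs equally often.

```python
import random
from collections import Counter

# Word meaning -> its single incongruent ink. Only red -> yellow is taken from
# the text; the other three assignments are hypothetical.
INCONGRUENT_INK = {"red": "yellow", "yellow": "violet",
                   "violet": "brown", "brown": "red"}

def build_stroop_blocks(seed=0, reps=30, n_blocks=4):
    """Return n_blocks lists of (word, ink, condition) trials in random order."""
    rng = random.Random(seed)
    trials = []
    for language in ("German", "novel"):
        for meaning, wrong_ink in INCONGRUENT_INK.items():
            word = f"{language}:{meaning}"  # placeholder for the actual word form
            trials += [(word, meaning, "congruent")] * reps
            trials += [(word, wrong_ink, "incongruent")] * reps
    rng.shuffle(trials)
    size = len(trials) // n_blocks
    return [trials[i * size:(i + 1) * size] for i in range(n_blocks)]

stroop_blocks = build_stroop_blocks()
ink_counts = Counter(ink for block in stroop_blocks for _, ink, _ in block)
inks_per_word = {}
for block in stroop_blocks:
    for word, ink, _ in block:
        inks_per_word.setdefault(word, set()).add(ink)
```

Because every word appears equally often in exactly two inks, the ink color of a given word cannot be predicted better than chance between those two options, which is the property the design relies on.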

On the second day, 24 ± 2 h after the first session, participants returned to the laboratory to repeat the Stroop task. This second Stroop task included the remaining set of four colors and their corresponding German and novel color words. All other details were identical to the Stroop task on Day 1.

Learning phase

To assess learning success, the percentage of correct responses was calculated for the final block from the learning phase (last 60 trials). Participants reached an average level of 95.1% [SD = 4.4] correct decisions (chance level = 50%; see Supplementary Material for learning curves for all three experiments).

For reaction time (RT) analysis of the Stroop data, the first 40 trials of each day's Stroop block, error trials, as well as the slowest and fastest 5% of each condition's remaining responses were discarded before calculating mean RTs. On both days and in both stimulus languages, responses to incongruent trials were slower than those to congruent trials, but the effect was larger in the German trials (Figure 2).
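The RT preprocessing steps listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' analysis code: the trial format (dicts with `condition`, `rt`, `correct`) is hypothetical, and the number of trials trimmed at each tail is rounded down, a detail the text does not specify.

```python
from collections import defaultdict
from statistics import mean

def trimmed_condition_means(trials, n_warmup=40, trim_percent=5):
    """Mean RT per condition after dropping warm-up trials, errors, and extreme RTs.

    trials: list of dicts in presentation order with keys
            'condition', 'rt' (ms), and 'correct' (bool).
    """
    by_condition = defaultdict(list)
    for trial in trials[n_warmup:]:          # discard the first 40 trials
        if trial["correct"]:                 # discard error trials
            by_condition[trial["condition"]].append(trial["rt"])
    means = {}
    for condition, rts in by_condition.items():
        rts = sorted(rts)
        k = len(rts) * trim_percent // 100   # assumption: round down
        means[condition] = mean(rts[k:len(rts) - k])  # drop fastest/slowest 5%
    return means
```

The function would be called once per participant and day, so that trimming is applied to each condition's own RT distribution rather than to the pooled data.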

Figure 2. Mean response times in the Stroop task of Experiment 1. Error bars here and in the following graphs indicate within-participant standard errors of the mean (Loftus and Masson, 1994; Cousineau, 2005; Morey, 2008).
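Within-participant standard errors of the kind cited in the caption are commonly computed by removing between-participant variability before taking the standard error. The sketch below follows the normalization described by Cousineau (2005) with the correction factor proposed by Morey (2008). Whether the authors used exactly this variant is an assumption, and the function name and data layout are hypothetical.

```python
import math
from statistics import mean, stdev

def within_participant_sem(data):
    """Within-participant SEM per condition.

    data: list of rows, one per participant, each holding one mean RT per condition.
    """
    n_participants = len(data)
    n_conditions = len(data[0])
    grand_mean = mean(value for row in data for value in row)
    # Cousineau normalization: remove each participant's overall level.
    normalized = [[value - mean(row) + grand_mean for value in row] for row in data]
    # Morey correction for the bias introduced by the normalization.
    correction = math.sqrt(n_conditions / (n_conditions - 1))
    return [
        stdev([row[j] for row in normalized]) / math.sqrt(n_participants) * correction
        for j in range(n_conditions)
    ]
```

If participants differ only in overall speed but show the same condition differences, the resulting error bars shrink to zero, which is why such bars reflect the reliability of the within-participant effect rather than between-participant spread.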

A repeated-measures analysis of variance (ANOVA) with the factors Language (German/Novel), Congruency (Congruent/Incongruent), and Day (Day 1/Day 2) was calculated to confirm these observations. There were main effects of Language (responses to German words were slower than those to novel words), F(1, 23) = 28.49, p < 0.001, ηp² = 0.55, and of Congruency (responses to congruent stimuli were faster than those to incongruent stimuli), F(1, 23) = 95.80, p < 0.001, ηp² = 0.81. The main effect of Day narrowly missed significance, F(1, 23) = 3.87, p = 0.061, ηp² = 0.14. As indicated by a significant Congruency by Language interaction, F(1, 23) = 65.12, p < 0.001, ηp² = 0.74, the congruency effect was larger for German color words (mean congruency effect over both days: 73 ms) than for novel color words (mean effect: 20 ms). The remaining two-way interactions did not reach significance (Fs ≤ 1.31, ps ≥ 0.264).
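The partial eta squared values reported alongside each F statistic can be recovered from F and its degrees of freedom via the standard identity partial η² = (F · df_effect) / (F · df_effect + df_error). The short sketch below applies it to the values in the preceding paragraph; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)

# F values from the overall ANOVA of Experiment 1, all with df = (1, 23).
reported = {"Language": 28.49, "Congruency": 95.80,
            "Day": 3.87, "Congruency x Language": 65.12}
effect_sizes = {name: round(partial_eta_squared(f, 1, 23), 2)
                for name, f in reported.items()}
```

Rounded to two decimals, these come out as 0.55, 0.81, 0.14, and 0.74, matching the reported effect sizes.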

To add statistical backing to the visual impression that congruency effects were present at both time points in both stimulus languages, we calculated separate repeated-measures ANOVAs for the German and novel word mean RTs, each including Congruency and Day as factors. The resulting pattern of effects was identical for both languages. The only significant effect in both cases was the main effect of Congruency: German words, F(1, 23) = 129.40, p < 0.001, ηp² = 0.85; novel words, F(1, 23) = 14.97, p < 0.001, ηp² = 0.39. The main effect of Day was marginally significant in both languages: German words, F(1, 23) = 4.01, p = 0.057, ηp² = 0.15; novel words, F(1, 23) = 3.17, p = 0.09, ηp² = 0.12. The interaction effect was not significant in either of the languages: German words, F(1, 23) = 2.01, p = 0.170; novel words, F(1, 23) = 0.56, p = 0.46. Thus, in both stimulus languages, the congruency effect was present on both days and did not change significantly between days.

Although Congruency did not reliably interact with Day in either stimulus language, there was a three-way interaction of Language by Congruency by Day in the overall ANOVA, F(1, 23) = 5.69, p = 0.026, ηp² = 0.20. This is explained by the fact that the change of the congruency effect from Day 1 to Day 2 goes in opposite directions in the two languages: There is a decrease of the congruency effect in the German words from Day 1 to Day 2 (from 82 to 65 ms), and an increase of the effect in the novel words (from 15 to 24 ms). Although these changes themselves are not significant (see interaction effects in within-language ANOVAs), the three-way interaction is.

Errors showed a pattern similar to that of the RTs. A repeated-measures ANOVA with the factors Language, Congruency, and Day on the arcsine-transformed percent error rates revealed significant main effects of Language, F(1, 23) = 15.48, p < 0.001, ηp² = 0.40, and Congruency, F(1, 23) = 12.55, p = 0.002, ηp² = 0.35. Neither the main effect of Day nor any of the interactions reached significance (all Fs < 2.29, all ps > 0.143). Averaged over the two sessions, the mean percent error rates were (SDs in brackets): German congruent, 5.71 [3.80], incongruent, 7.84 [5.04], novel congruent, 4.91 [3.76], incongruent, 6.48 [4.03].
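Error proportions are bounded and tend to have variances that depend on their mean, which is why they are often transformed before an ANOVA. The sketch below shows the common arcsine-square-root form. The text does not state which variant of the arcsine transform was applied, so this form is an assumption, and the per-participant values are made up for illustration.

```python
import math

def arcsine_transform(percent_error):
    """Arcsine-square-root transform of an error rate given in percent (0-100)."""
    proportion = percent_error / 100
    return math.asin(math.sqrt(proportion))

# Hypothetical per-participant error rates (percent) for one condition.
example_rates = [2.5, 5.0, 7.5, 10.0]
transformed = [arcsine_transform(rate) for rate in example_rates]
```

The transform maps 0% to 0 and 100% to π/2 and stretches the scale near the boundaries, where raw proportions are most compressed.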

Experiment 1 was designed to test whether novel words that have recently been associated with native color words via lexical association are already able to produce a congruency effect in the Stroop paradigm. The response-time findings show that this is indeed the case: Immediately after learning as well as 24 h later, novel color words generated sizable congruency effects. Given that learning in this experiment consisted of a word-word-association procedure that neither required nor encouraged deep semantic processing of the novel words, the presence of a Stroop effect seems notable. The fact that we see the effect immediately after learning suggests that, under these conditions, consolidation is not necessary for the effect to emerge.

We further found that the change of the congruency effect between the two sessions was not identical in the two stimulus languages: The congruency effect in the German words decreased by 17 ms on the second day compared to the first day's Stroop session, while in the novel words the effect increased by 9 ms. Thus, in both languages, congruency effects are present on both days, but the significantly contrasting pattern of overnight changes in the Stroop effects, signaled by the three-way interaction, points to the possibility that, during the 24 h interval, the two classes of words are processed in a qualitatively different way. Experiment 3 will address the question of time and consolidation effects more directly.

The learning run in this first experiment, although based on a relatively shallow learning task, contained a large number of trials per word and thus resulted in a classification performance that approached ceiling levels. It is therefore unclear whether the novel word congruency effect crucially depends on such a large number of learning trials or whether a significant reduction of the trial number will lead to a qualitatively similar result.

Furthermore, so far the Stroop sessions only contained congruent and incongruent trials but no neutral control stimuli, rendering it impossible to clearly identify the effect as facilitatory, inhibitory, or a mix of both. In native-language Stroop, these two main components (facilitation and inhibition) can indeed be distinguished (e.g., Redding and Gerjets, 1977 ). They are respectively defined as the difference in response times between neutral control stimuli and congruent stimuli (facilitation) or between neutral control stimuli and incongruent stimuli (inhibition). While the relative proportions of the components may vary depending on the properties of the neutral stimuli (e.g., Sharma and McKenna, 1998 ), the interference component is typically substantially larger than the facilitation component (MacLeod, 1991 ). If the novel word effect were closely linked to the native words effect, then it should at least be similarly divisible into an inhibitory and a facilitatory component.
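The two components defined above can be written out directly. The following sketch uses hypothetical mean RTs purely for illustration (they are not data from this study), and the function name is an assumption.

```python
def decompose_stroop(congruent_ms, neutral_ms, incongruent_ms):
    """Split the congruency effect into facilitation and inhibition (all in ms)."""
    facilitation = neutral_ms - congruent_ms      # speed-up relative to neutral
    inhibition = incongruent_ms - neutral_ms      # slow-down relative to neutral
    return {
        "facilitation": facilitation,
        "inhibition": inhibition,
        "congruency_effect": incongruent_ms - congruent_ms,
    }

# Hypothetical condition means in ms.
components = decompose_stroop(congruent_ms=580, neutral_ms=600, incongruent_ms=640)
```

By construction, facilitation and inhibition sum to the overall congruency effect, so without a neutral condition only their sum can be observed.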

In Experiment 2, we addressed both the question of learning intensity and the question of whether the novel word congruency effect is composed of facilitation, inhibition, or both.

Experiment 2

The design of Experiment 2 closely followed that of Experiment 1, but it contained two changes. First, we lowered the number of learning trials per novel word to one third of that from the previous experiment, to test whether the congruency effect in the novel words is obtained even if the classification performance at the end of learning is significantly reduced. Second, to isolate facilitation and inhibition effects, we introduced neutral control stimuli into the experiment, namely names of non-color-related objects.

Because these control stimuli were supposed to serve as a baseline for the respective stimulus language's color words, we introduced control items for both languages: for German color words, a set of not color-related German object names (e.g., Mappe [folder]); for novel color words, a further set of novel words that were to become translations of the German object names. The latter were learned in the same manner as the novel color words. Thus, German and novel color words had their own corresponding lexical baselines (the respective object names). The experiment also included a set of non-lexical control stimuli (strings consisting of upper- and lower case X-letters), but because responses to these stimuli did not differ from those to the other (lexical) control items, we will only briefly report the results from this condition.

Participants were 41 native speakers of German, most of them students (29 female; age range: 19–42 years, M = 24.06, SD = 4.68). They reported no color vision deficiency and had normal or corrected-to-normal visual acuity. Participants gave their written consent and received course credit or 9 €. One participant's data were incomplete and thus discarded from further analysis.

For this experiment, the same four focal and four subordinate colors as in Experiment 1 were used, omitting black and white from the learning set. For the lexical control condition, we selected eight names of objects that can appear in different colors but are not associated with one color in particular (as rated in an independent sample, e.g., Mappe [folder], Eimer [bucket], see Supplementary Material ). We also included a non-lexical control condition that consisted of eight different strings made up of upper and lower case X letters (length 3–7 letters, e.g., XxxXX, XxX).

Laboratory and apparatus were the same as in Experiment 1, as was the overall procedure (cf. Figure 1C).

The eight object names were added to the eight colors, to form a set of 16 concepts for which pseudowords had to be learned. Sixteen pseudowords from the vocabulary described in Experiment 1 served as associates for the colors and object names.

The number of learning trials per color word or object name was reduced to only a third of the previous version. That is, eight match and eight mismatch trials were now presented per concept during learning (instead of the former 24 each). The learning principle remained identical: For the match trials, each native color word or object name was presented eight times with its assigned novel word. For the mismatch trials, each concept was presented once with each of eight other novel words. These were now all taken from the set of the remaining 15 pseudowords that were to attain a meaning (i.e., no additional filler pseudowords were used). To avoid stimulus-specific effects, four different assignments of novel words to native color words or object names were created.

In the learning phase, a total of 256 trials were presented (16 novel words × 16 trials [8 match, 8 mismatch]). These were shown in four blocks of 64 trials, with three 30-s breaks in-between. Trials were presented randomly with the constraint that, per novel word, two match and two mismatch trials were presented per 64 trial block. Trial timing and instructions were identical to those of Experiment 1.
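One way to satisfy the constraints of this reduced design (eight match and eight mismatch trials per concept, and two match plus two mismatch trials per novel word in every 64-trial block) is a rotating pairing scheme. The sketch below is only one possible construction: the rotation rule is an assumption, since the text does not say how mismatch partners were chosen, and concepts and novel words are represented by index.

```python
import random

N_CONCEPTS = 16  # 8 color words + 8 object names, each with one assigned novel word

def build_exp2_learning_blocks(seed=0, n_blocks=4):
    """Return blocks of (concept, novel_word, is_match) trials, all as indices."""
    rng = random.Random(seed)
    blocks = []
    for b in range(n_blocks):
        block = []
        for concept in range(N_CONCEPTS):
            # Two match trials per concept (and thus per novel word) in each block.
            block += [(concept, concept, True)] * 2
            # Two mismatch trials: partners rotate so that, across the four blocks,
            # each concept meets eight different other novel words exactly once.
            for offset in (2 * b + 1, 2 * b + 2):
                block.append((concept, (concept + offset) % N_CONCEPTS, False))
        rng.shuffle(block)
        blocks.append(block)
    return blocks

exp2_blocks = build_exp2_learning_blocks()
```

Because the offsets rotate uniformly, every novel word also appears in exactly two mismatch trials per block, so match and mismatch presentations stay balanced for concepts and novel words alike.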

The Stroop tasks of Day 1 and 2 closely followed the design from Experiment 1, with the following changes: Apart from German color word and novel color word trials, the Stroop task now also included lexical control trials consisting of the German and the learned novel object names, as well as non-lexical control trials made up of letter X strings. As in Experiment 1, the eight colors were split into two sets, such that four different colors were tested on Day 1 and on Day 2. Likewise, the sets of eight object names and eight strings of the letter X were split into two subsets, tested on either the first or the second day.

The German and novel color words were shown in their natural congruent and in one assigned incongruent version, just as in Experiment 1. Each control stimulus was also shown in only two color versions, to assure that the colors it was presented in were equally predictable as those of the color words. There were altogether 20 different strings in a day's Stroop block (4 German color words, 4 novel color words, 4 German object names, 4 novel object names, 4 letter X strings), each presented in two variants of ink color. The resulting 40 stimuli were each presented 15 times. Thus, over the whole block, 600 trials were presented, in a random order and with breaks after every 120 trials.

To assess learning performance, the percentage of correct responses was calculated for the final block from the learning phase (last 32 trials). Separate values were calculated for the two stimulus types, novel color word and novel object name. Participants reached similar levels of correct decisions for the color words (M = 79.0% [SD = 12.15]) and for the object names (77.0% [13.39]), substantially lower than the final performance in Experiment 1 (95.1% [SD = 4.4]).

Mean RTs in the Stroop task were calculated as in Experiment 1. For the German words, the expected pattern was observed: a large difference in mean RTs between the congruent and incongruent conditions, with the lexical control condition situated in between, somewhat closer to the congruent condition than to the incongruent condition. Novel words showed a smaller effect but a similar pattern: congruent trials yielded faster responses than incongruent trials, and lexical control items were situated between the congruent and incongruent conditions (see Figure 3). Mean RTs from the non-lexical control items (the strings of the letter X; not shown in Figure 3) were practically identical to those from the lexical control items (in line with Sharma and McKenna, 1998) and are therefore not analyzed further (Day 1: M = 599 ms [SD = 71], Day 2: 589 [65]).

Figure 3. Mean response times in the Stroop task of Experiment 2. The respective lexical control stimuli (blue lines) consist of German object names (left) and newly learned object names (right). Because the non-lexical control condition (strings of the letter X) yielded RTs that were practically identical to those of the lexical control conditions, it is not shown here.

To substantiate these observations, we calculated a repeated-measures ANOVA with the factors Language (German/Novel), Day (Day 1/Day 2), and Congruency (Congruent/Incongruent/Neutral). Greenhouse-Geisser corrected p-values are reported where warranted by violations of the sphericity assumption. We observed a main effect of Congruency, F(2, 78) = 55.46, p < 0.001, ηp² = 0.58, a marginal main effect of Language, F(1, 39) = 3.36, p = 0.075, ηp² = 0.08, and a significant Congruency by Language interaction, F(2, 78) = 28.48, p < 0.001, ηp² = 0.42. None of the remaining effects were significant (all Fs < 2.35, all ps > 0.134). As in Experiment 1, we calculated separate follow-up ANOVAs for the two stimulus languages, each incorporating Day and Congruency as factors, which confirmed that, within both stimulus languages, the only significant effect was the main effect of Congruency: German words, F(2, 78) = 61.43, p < 0.001, ηp² = 0.61; novel words, F(2, 78) = 6.53, p < 0.001, ηp² = 0.14. Additionally, there was a marginal main effect of Day in the German words, F(1, 39) = 2.92, p = 0.095, ηp² = 0.07. No other effects were significant, Fs < 1.49, ps > 0.229.

Finally, to isolate facilitation and inhibition components of the congruency effect, we calculated separate F-contrasts for these effects in each language. Because, in the overall ANOVA, there was no significant interaction involving the factor Day, we aggregated the RTs across the two sessions to increase statistical power. For the German words, there were significant effects of facilitation and inhibition: difference neutral − congruent, 13.9 ms, F(1, 78) = 10.45, p = 0.002, ηp² = 0.12; difference incongruent − neutral, 32.5 ms, F(1, 78) = 112.41, p < 0.001, ηp² = 0.59. For the novel words, there was no significant effect of facilitation, but a significant inhibition effect: difference neutral − congruent, 4.5 ms, F(1, 78) = 2.22, p = 0.140; difference incongruent − neutral, 6.3 ms, F(1, 78) = 10.84, p = 0.001, ηp² = 0.12.

Error rates were similar across all conditions. A repeated-measures ANOVA with factors Language, Day , and Congruency on arcsine-transformed error rates showed no significant effects (all F s < 0.60, all p s > 0.443). Averaged across the three congruency conditions and the two days, the percent error rates and standard deviations were similar for the two languages (German: 6.59 [3.18], novel: 6.69 [3.29]).

In Experiment 2, we investigated whether newly learned color words lead to a Stroop effect after a much-shortened learning phase. We also included control stimuli to test whether the novel word Stroop effect is driven by inhibition, facilitation, or a combination of both. Despite a significantly shortened learning phase and the inclusion of control trials, the novel-word Stroop effect was still present, and notably so in the Stroop block immediately after learning. As predicted, the novel word effect was smaller than in Experiment 1 (averaged over the two sessions: 20 ms in Experiment 1 vs. 11 ms in Experiment 2). The fact that the effect size in the German trials was also reduced significantly (from 74 ms in Experiment 1 to 47 ms in Experiment 2) lends some support to the idea that not only the less intense learning but also the inclusion of control trials, and thus the decrease of the proportion of congruent trials (Bugg and Crump, 2012 ), may have contributed to the reduction in the novel-word effect.

As in our German trials and in the Stroop literature, response times to novel lexical control words were in between responses to congruent and incongruent novel color words. The overall congruency effect was small (11 ms), rendering differentiation between facilitation and inhibition components difficult. Nevertheless, the contrast between the control and incongruent condition shows a significant difference, indicating that the overall effect contains an interference component.

In contrast to Experiment 1, Experiment 2 showed no indication that the time of test (immediately after learning vs. 24 h later) had an impact on how the novel and German stimuli were processed in the Stroop task. Both experiments demonstrated that recently learned novel color words lead to significant Stroop effects immediately after they have been learned. Note that the novel words were always tested in blocks that also contained the German words they had been associated with during learning. It is easily conceivable that the presence of the German color words in the Stroop blocks helped activate the links between novel words and color concepts. The Complementary Learning Systems (CLS) account of word learning provides a framework for effects of context on learning and consolidation (Davis and Gaskell, 2009). Before consolidation, novel words are thought to exist only as episodic and context-dependent memory traces. Only after consolidation, that is, after successful transfer of the learned contents from the medio-temporal to the neocortical memory system, do novel lexical traces become independent of the specific learning context.

Evidence on this hypothesis is still sparse, but a relevant study was presented by Tamminen et al. ( 2012 ). They taught participants a set of novel, meaning-conveying affixes (e.g., - nule ) by pairing the affix with an existing word stem (e.g., buildnule ) and accompanying it with a definition of the composite meaning (e.g., “ buildnule —someone who is able to build furniture at a remarkable speed”). These affixes showed an immediate advantage in a speeded shadowing task, but only when presented in their trained context (e.g., buildnule ). This advantage in shadowing performance generalized to untrained word stems and thus to novel contexts (e.g., sailnule ) only after consolidation. In a non-speeded classification task, however, generalization effects emerged already immediately after training. Thus, their study supports the hypothesis that context-independence of novel lexical items requires memory consolidation, particularly so when the task used to test the novel lexical items requires rapid online processing (Tamminen et al., 2012 ).

More detailed knowledge about moderating factors is certainly desirable. Deleting the L1 (German-words) context from the Stroop blocks is thus a useful change in design relative to Experiments 1 and 2. Because our experiment is indeed based on a speeded task, we reasoned that the presence of the German words (and thus of additional learning context) in Experiments 1 and 2 may have masked a more pronounced effect of memory consolidation on Stroop performance. We thus wanted to assess whether novel words activate their semantic concepts on their own, without support from the native-language color words. To do so, it is necessary to test novel words in Stroop blocks that do not provide any learning context. This is the key element of Experiment 3.

Experiment 3

In this experiment, we returned to the design of Experiment 1 with more extensive learning and without control trials. Crucially, novel color words were now tested in the Stroop blocks in isolation, without German color word trials. The stimuli that initially linked novel words and color concepts were thus no longer available during the Stroop test. Apart from the novel words themselves, no further stimuli from the learning context were available. Following the CLS prediction, we hypothesized that, in the absence of further learning context, the Stroop effect in Experiment 3 would only show on the second day, after memory consolidation had taken place.

If we were indeed to find such a pattern, one could still argue that, in the absence of the German words, participants may need more time to familiarize themselves with the task and that a novel word effect might thus appear only after a sufficient number of trials, possibly coinciding with the transition between the two blocks. Therefore, to differentiate between such a practice effect and an effect of memory consolidation, we added a second group of participants that also did two Stroop blocks, but did both of them on Day 2, 24 h after the learning phase. If this second group showed a Stroop effect only in their second Stroop block, a practice effect would have to underlie the hypothesized Group 1 pattern. If, however, Group 2 showed the effect already in their first Stroop block, then this difference from the Group 1 pattern would have to be a consequence of the passage of time, which provides an opportunity for memory consolidation.

At the end of the experiment, both groups received a Stroop block with four of the German color words to allow for a numerical comparison of native and novel word effects.

Participants were 44 native speakers of German, most of them students (34 female; age range: 19–46, M = 24.72, SD = 6.55). They reported no color vision deficiency and had normal or corrected-to-normal visual acuity. Participants gave their written consent and received course credit or 9 €. Participants were randomly assigned to the two groups, such that there were 22 participants in each group. The data from three participants had to be discarded because they were either incomplete (2 participants) or because of excessive error rates in the Stroop task (>33% errors, 1 participant). The resulting group sizes used for statistical analysis were 20 (Group 1) and 21 participants (Group 2).

Experiment 3 incorporated the same materials and procedures as Experiment 1, except for the following changes: First, from the 480 trials in each of the two Stroop blocks, we removed the 240 trials that contained German color words. The remaining 240 novel color word trials of each Stroop block were presented as in Experiment 1, that is, in random order and with a short break after 120 trials. Second, the time points at which the Stroop blocks were presented were manipulated between groups. Group 1 was subjected to the same temporal procedure as in Experiment 1, that is, with one Stroop block shortly after learning, and one Stroop block about 24 h later. For Group 2, however, only the learning phase was presented on Day 1. The Stroop blocks for Group 2 were both presented on Day 2 (cf. Figure 4).

Figure 4. Time course of tasks for the two groups in Experiment 3.

Both groups received a block with 240 German color word Stroop trials after the second novel word Stroop block. The four colors for the German Stroop block were those that were used for the second novel word Stroop block, such that the color-to-button assignments of the second block remained valid for the German Stroop block.

Learning performance was assessed as in Experiment 1, separately for the two groups. Participants from Group 1 reached an average of 90.2% [3.06] correct decisions in the final block, those from Group 2 reached 94.5% [2.39]. This difference was not statistically significant, Welch's t-test: t(25.79) = 1.35, p = 0.188.

Mean RTs were calculated as before. RTs in the final German Stroop block showed the expected effect: Group 1 had a congruency effect of 77 ms (incongruent 654 ms [60], congruent 577 [20]), Group 2 had an effect of 59 ms (incongruent 640 [53], congruent 581 [21]). Because the German words block was merely included to compare the size of native and novel effects, the data were not analyzed statistically.
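
As a worked example of the arithmetic, the congruency effect is the mean RT in the incongruent condition minus the mean RT in the congruent condition. The sketch below uses only the condition means reported above; the function and variable names are illustrative assumptions.

```python
def congruency_effect(incongruent_ms: float, congruent_ms: float) -> float:
    """Stroop congruency effect: incongruent minus congruent mean RT (ms)."""
    return incongruent_ms - congruent_ms


# Condition means (incongruent, congruent) from the final German Stroop block.
german_block = {"Group 1": (654, 577), "Group 2": (640, 581)}

effects = {group: congruency_effect(*means) for group, means in german_block.items()}
print(effects)  # {'Group 1': 77, 'Group 2': 59}
```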

Mean RTs from the novel-word Stroop blocks showed that only Group 2, tested 24 h after learning, had a congruency difference in their Block 1 (of 26 ms). Group 1, who performed their first Stroop block immediately after learning, showed no such effect (congruency difference = 2 ms). Nevertheless, in the second Stroop block, which both groups performed on the second day, both groups showed a clear Stroop effect, which was furthermore identically sized (14 ms; see Figure 5).
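
The pattern described above amounts to a difference of differences: the change in the congruency effect from Block 1 to Block 2 is compared between the two groups. The sketch below computes this descriptive contrast from the reported congruency differences only. It is an illustration of the logic, not a significance test, and the names are hypothetical.

```python
# Reported congruency differences (ms) in the novel-word Stroop blocks.
novel_effects_ms = {
    "Group 1": {"Block 1": 2, "Block 2": 14},
    "Group 2": {"Block 1": 26, "Block 2": 14},
}


def block_change(group: str) -> int:
    """Change in the congruency effect from Block 1 to Block 2 for one group."""
    blocks = novel_effects_ms[group]
    return blocks["Block 2"] - blocks["Block 1"]


change_g1 = block_change("Group 1")  # effect grows by 12 ms
change_g2 = block_change("Group 2")  # effect shrinks by 12 ms
contrast = change_g1 - change_g2     # difference of differences
print(change_g1, change_g2, contrast)  # 12 -12 24
```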

Figure 5. Mean response times to novel color words in the Stroop blocks of Experiment 3. In contrast to the previous two experiments, only novel words were included in these Stroop blocks. The left panel shows data from Group 1, which did their first Stroop block immediately after learning and the second block 24 h later. The right panel shows data from Group 2, which did both Stroop blocks on Day 2.

These observations were confirmed in a mixed repeated-measures ANOVA on the novel word mean RTs that included the within-participant factors Congruency (Congruent/Incongruent) and Stroop Block (Block 1/2) and the between-participants factor Group (Group 1/Group 2). The only significant main effect was that of Congruency, F(1, 39) = 28.79, p < 0.001, ηp² = 0.42. The Group by Congruency interaction was also significant, F(1, 39) = 4.36, p = 0.043, ηp² = 0.10, and so was the Group by Stroop Block interaction, F(1, 39) = 5.54, p = 0.024, ηp² = 0.12. Crucially, there was a three-way interaction of Group by Congruency by Stroop Block, F(1, 39) = 9.13, p = 0.004, ηp² = 0.19, indicating that the development of the Congruency effect over the two Stroop Blocks differed between Groups. All other effects were non-significant (Fs ≤ 1.49, ps ≥ 0.230).
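
For readers who want to check the effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom with the standard identity F·df_effect / (F·df_effect + df_error). The sketch below applies it to the four significant effects reported above. The helper name is an assumption, and nothing here reproduces the authors' own software.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F value, reported partial eta squared); all effects have df = (1, 39).
reported = {
    "Congruency": (28.79, 0.42),
    "Group x Congruency": (4.36, 0.10),
    "Group x Stroop Block": (5.54, 0.12),
    "Group x Congruency x Stroop Block": (9.13, 0.19),
}

for name, (f_value, eta_reported) in reported.items():
    eta_computed = partial_eta_squared(f_value, 1, 39)
    print(f"{name}: computed {eta_computed:.2f}, reported {eta_reported:.2f}")
```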

To further explore the three-way interaction, we calculated separate ANOVAs for the two Stroop blocks, both including the between-participants factor Group and the within-participants factor Congruency. In the first Stroop block, the main effect of Group was not significant, F(1, 39) = 0.40, p = 0.531. But the main effect of Congruency and the Congruency by Group interaction were significant, F(1, 39) = 17.54, p < 0.001, ηp² = 0.31, and F(1, 39) = 13.41, p < 0.001, ηp² = 0.26, respectively. The latter result confirms that the two groups clearly differed in their response patterns in the first block. With the data from the second Stroop block, only the main effect of Congruency reached significance, F(1, 39) = 17.12, p < 0.001, ηp² = 0.31 (the other two effects: Fs ≤ 0.32, ps ≥ 0.575), suggesting no difference in the response patterns for the two groups.

Because final learning performances of the two groups differed numerically (although not statistically), we wanted to make sure that this difference did not affect the pattern of the Stroop RTs. We therefore recalculated the Stroop RT analysis in the following manner: We removed the data for the five best-performing learners of Group 1 and the five worst-performing learners of Group 2, such that we obtained closely matching learning curves and final discrimination performances between the two groups. We then recalculated the main ANOVA that included all three experimental factors. The pattern of results did not change, with the effect size of the critical three-way interaction actually increasing: F(1, 29) = 10.41, p = 0.003, ηp² = 0.21.
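
The matching procedure can be sketched as a simple trimming step. The learning scores below are invented placeholders, and `drop_extremes` is a hypothetical helper. Only the group sizes (20 and 21), the number of removed participants (five per group), and the resulting error degrees of freedom (29) come from the text.

```python
def drop_extremes(scores: dict, n: int, drop: str) -> dict:
    """Remove the n best or n worst learners from a {participant: score} mapping."""
    ranked = sorted(scores, key=scores.get)  # ascending by learning score
    if drop == "best":
        removed = set(ranked[len(ranked) - n:])
    elif drop == "worst":
        removed = set(ranked[:n])
    else:
        raise ValueError("drop must be 'best' or 'worst'")
    return {pid: score for pid, score in scores.items() if pid not in removed}


# Placeholder final-block learning scores (percent correct); NOT the real data.
group1 = {f"g1_{i:02d}": 80 + i for i in range(20)}  # 20 participants
group2 = {f"g2_{i:02d}": 79 + i for i in range(21)}  # 21 participants

matched1 = drop_extremes(group1, 5, "best")
matched2 = drop_extremes(group2, 5, "worst")

# Error df for a design with two between-participants groups: N - 2.
df_error = len(matched1) + len(matched2) - 2
print(len(matched1), len(matched2), df_error)  # 15 16 29
```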

There were again few differences between congruency conditions in the error rates. In the first Stroop block, error rates seemed to mirror the result from the RTs: There was no congruency effect in Group 1's first Stroop block (% errors congruent M = 8.50 [SD = 5.88], incongruent 7.67 [5.33]), but there was one in Group 2 (congruent 4.84 [3.38], incongruent 7.54 [3.82]). In the second Stroop block, neither group showed congruency differences in the error rates (Group 1: congruent 6.88 [6.39], incongruent 6.42 [4.95], Group 2: congruent 4.25 [3.04], incongruent 4.92 [3.40]). A repeated-measures ANOVA on the arcsine-transformed error rates with factors Congruency, Block, and Group showed a main effect of Block, F(1, 39) = 16.59, p < 0.001, ηp² = 0.30, indicating an overall reduction of errors in the second block. There was also a Congruency by Group interaction, F(1, 39) = 5.47, p = 0.002, ηp² = 0.12, reflecting that Group 2 made more errors in the incongruent condition. This interaction effect seems to be particularly driven by Group 2's large congruency difference in Block 1, but the three-way interaction of Congruency, Block, and Group narrowly missed significance, F(1, 39) = 3.59, p = 0.065, ηp² = 0.08.
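
The arcsine transformation mentioned above is commonly computed as the arcsine of the square root of a proportion. The text does not state which variant was used, so the formula below is an assumption. In the actual analysis the transformation would be applied to each participant's error rate; here it is applied to the reported Block 1 group means purely for illustration.

```python
import math


def arcsine_sqrt(proportion: float) -> float:
    """Arcsine square-root transform of a proportion in [0, 1] (assumed variant)."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


# Reported mean error percentages in the first Stroop block.
block1_error_pct = {
    ("Group 1", "congruent"): 8.50,
    ("Group 1", "incongruent"): 7.67,
    ("Group 2", "congruent"): 4.84,
    ("Group 2", "incongruent"): 7.54,
}

for (group, condition), pct in block1_error_pct.items():
    transformed = arcsine_sqrt(pct / 100)
    print(f"{group}, {condition}: {pct:.2f}% -> {transformed:.3f} rad")
```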

In the third experiment, we tested whether the novel-word Stroop effect depended on the presence of German words during the Stroop test. We therefore removed the German words from these blocks and otherwise repeated Experiment 1. We added a between-participants manipulation to differentiate predicted consolidation effects from effects of mere temporal order. One group of participants performed the first Stroop block immediately after learning and the second Stroop block about 24 h later. A second group of participants had no Stroop block immediately after learning, but rather performed both blocks on the second day.

Results show that in Group 1, the Stroop effect was not present in the block that was administered immediately after learning but only on the second day. In Group 2, with both Stroop blocks on the second day, the effect was present already in the first Stroop block. These results lead to two important conclusions: First, the novel word effect can be observed even when no German color words are included in the Stroop blocks. Second, the differing results between the two groups indicate that, in the absence of the native-language words, the effect arises only after a period that allows memory consolidation.

General discussion

In three experiments, we tested the semantic links of novel color words that had been associated with color concepts through lexical association with native language (German) color words. To assess which conditions are necessary for semantic learning, the learning and test phases were realized such that they minimized semantic processing. In Experiment 1, novel words were associated with native-language color words until almost perfect discrimination performance. They were then entered into the Stroop task together with German color words. We observed substantial novel word Stroop effects both immediately after learning and 1 day later. A significant three-way interaction indicated that the reduction of the effect from Day 1 to Day 2 in the German words contrasted significantly with a simultaneous increase of the effect in the novel words, thus suggesting an influence of memory consolidation. In Experiment 2, learning intensity was considerably reduced and neutral control stimuli were added to the Stroop blocks. We again observed substantial Stroop-congruency effects directly after learning and 24 h later. A detailed analysis including the control condition showed that interference made up a significant portion of the novel-word Stroop effect. In Experiment 3, we repeated Experiment 1, but crucially removed the German words from the Stroop task, so that the novel words were now tested without any L1 context from the learning phase. The novel-word Stroop effect was now not observed immediately after learning, but only 24 h later. Results from a second group that had a different time course of Stroop blocks showed that the delayed emergence of the effect in Group 1 is not due to a simple build-up or training effect from one block to the next. Rather, it must be related to the temporal distance between learning and test—that is, to memory consolidation.

It should be stressed again that semantic processing, though still possible and likely given the explicit setting, was not necessary for correct task performance, either during learning (lexical association) or in the timed memory test (color-matching Stroop task). Given the potential of shallow association, it is surprising that a congruency effect emerged at all. This is broadly consistent with a number of recent studies (Breitenstein et al., 2007; Clay et al., 2007; Mestres-Missé et al., 2007; Borovsky et al., 2010, 2012; Dobel et al., 2010; Tamminen and Gaskell, 2013) in showing a rapid and effective link-up of novel words with an assigned concept, in our case even despite an intentionally shallow learning experience and an impoverished semantic context. While in almost all earlier studies, novel words were associated with concepts either via pictures or in semantically elaborate contexts (e.g., with definitions or in sentence contexts), here, meanings were introduced merely via lexical association with an L1 word (see also Experiment 3 in Duyck and Brysbaert, 2004). The emergence of Stroop effects in this situation shows that, even when potential meanings for novel words can only be indirectly derived via the L1 word, these novel words may nevertheless activate their associated meanings early on.

Note that the novel-word Stroop effect obtained in our study is not so much based on priming but rather on interference (cf. Experiment 2). This fits with results from the native-language Stroop literature. To our knowledge, there are hardly any studies that show a semantic interference effect for recently learned words. Clay et al. (2007) used a picture-word interference paradigm that generally produces interference of semantic relatedness between pictures and distractor words (e.g., picture of a cat that has to be named, distractor word “dog”), and they found a similar interference effect in newly learned words. This and our current result support the conclusion that semantic novel-word effects are not constrained to facilitation and priming paradigms, but generalize also to semantic interference paradigms such as picture-word interference and Stroop. Whereas priming is often considered to have automatic as well as controlled components (Neely, 1991), this Stroop interference effect strongly suggests that reading a novel word co-activates its recently learned meaning in an automatic fashion, even when the semantic context is highly impoverished (only 4 colors in a test block) and when a meaning for novel words is not needed to fulfill the task.

Perhaps the most interesting aspect of our results is that the opportunity for consolidation affected Stroop performance, and that this consolidation effect was further moderated by context, that is, by the presence of German color word trials in the Stroop task. This impact of memory consolidation on the integration of novel words fits with data from word-learning studies on the learning of word forms only (e.g., Gaskell and Dumay, 2003; Dumay and Gaskell, 2007, 2012; Bakker et al., 2014) or on the acquisition of form and meaning (Clay et al., 2007; Tamminen et al., 2012; Coutanche and Thompson-Schill, 2014). The data from Experiment 3 in particular demonstrate that memory consolidation is relevant for associating novel words with meaning, not only for integrating novel word forms into lexical networks. This is consistent with the CLS account of word-learning (Davis and Gaskell, 2009).

While some evidence for consolidation effects was found in Experiment 1, the clearest evidence came from Experiment 3, which provided no German words in the Stroop task. Stroop effects prior to a consolidation period were observed only when the German color words were present in the Stroop blocks (Experiments 1 and 2). Given that novel and German words are paired during learning, the German words in the Stroop test provide contextual information from the learning phase. This context seems critical for the emergence of immediate Stroop effects. It is not yet known how such contextual cues from learning may facilitate access to the recently learned associations. We suggest three possibilities of contextual support: First, the German Stroop trials may provide a general reminder of the learning situation as a whole and thus facilitate episodic retrieval (cf. Cairney et al., 2011). Second, they may help activate the general semantic field of color, which in turn may facilitate access to the specific meanings. Third, they may provide the specific opportunity to re-process the critical stimuli by which the novel words had been linked to the semantic concepts, thereby facilitating a reactivation of the crucial links (cf. Tamminen et al., 2012). Taking into account that immediate effects of newly learned words are observed in semantically rich learning situations (Mestres-Missé et al., 2007; Freundlieb et al., 2012) and in our first two experiments, the latter explanation, with a retrieval of the specific memory traces including semantic cues, seems to fit all of the observed results. Clearly, these alternative explanations for an interaction between memory consolidation and learning context cannot be differentiated on the basis of the current data, and thus should be targeted in future studies.

Finally, how can our immediate but context-dependent effect be reconciled with what is known about neural correlates of learning and retrieval? Figure 6 illustrates how learning context may moderate effects of memory consolidation in semantic word learning. We assume that the employed training regime results in an immediate hippocampal association between the German (L1) word and its novel counterpart. This novel association means that the L1 word provides a mediating link in memory between the novel (L2) word and the color semantics. So, even prior to an opportunity for consolidation (provided by sleep, in our case), Stroop effects can be obtained, as long as the L1 word is present in the Stroop task as a contextual cue that “primes” or temporarily strengthens this indirect association. In fact, there is direct evidence for the involvement of the hippocampus during associative learning of the type implemented here: Breitenstein et al. (2005) used event-related fMRI while participants learned novel words in the scanner. Correlated amplitude changes between the hippocampus and neocortical regions were observed, in line with the overall evidence for the importance of the hippocampus in the formation of arbitrary associations in memory (e.g., McClelland et al., 1995; Davachi and Wagner, 2002; Kesner, 2013).

Figure 6. Memory consolidation of novel color words learned via lexical association. Before learning, L1 color words and their corresponding color concepts are linked via stable, long-established connections. During learning, L1 words and novel L2 words are paired, mediated via hippocampal activation (dashed red lines). Immediately after learning, L2 words are still linked to their corresponding color concepts via the L1 words. If any direct cortical links exist at all, they are still very weak (dotted black lines). Thus, novel color words best activate their corresponding color concepts when the L1 color words are present in the testing context. If the L1 words are present (Variant A: solid L1 box), L2 words activate their color concepts via the hippocampal link. If there is no L1 context (Variant B: light gray L1 box), there is also no priming of the episodic link between the L2 words and their corresponding concepts, and therefore insufficient conceptual activation. After full consolidation, L2 words have stable cortical links to their L1 counterparts, and to their corresponding color concepts. Therefore, regardless of the presence of L1 words, the L2 words automatically co-activate their corresponding color concepts. (Illustration inspired by Frankland and Bontempi, 2005).

After a period suitable for consolidation, a qualitatively different memory trace seems to be involved in the Stroop effect. There is no longer any need for contextual “priming” from the L1 words. Instead, the novel words operate just as would be expected for words from an established second language, showing clear Stroop effects independent of the L1 context. Potentially, a stronger direct link has now emerged between the new word and the semantics of the word, which means that contextual priming is no longer necessary for swift and obligatory access to the meaning. This is coherent with a systems-consolidation account of the new word memory in which sleep-associated consolidation reduces the dependence on hippocampal mediation and increases the strength of a direct neocortical link between the new word and its meaning (McClelland et al., 1995; Davis and Gaskell, 2009; Takashima et al., 2009). This 24-h change may just be the start of the process, but may still be sufficient to allow context-independent Stroop effects to emerge. Given that semantic access for the L2 words is independent of the L1 words already 24 h after learning, our results stand in contrast to models that assume a prolonged dependence of L2 words on L1 mediation for semantic access (e.g., Kroll and Stewart, 1994).

Putting these results together, the data suggest that although some markers of automaticity in the perception of words are evident soon after learning, the access to meaning becomes more automatic after an opportunity for consolidation (see also Coutanche and Thompson-Schill, 2014; Takashima et al., 2014). Moors and De Houwer (2006) discuss the notion of automaticity with reference to a set of overlapping features. Automatic processes will tend to be ones that are unintentional, uncontrollable, goal independent, autonomous, stimulus-driven, unconscious, efficient and fast. However, these properties may not all co-occur, and it is feasible to think of automaticity as a graded phenomenon. Such a characterization fits well with the current results. Soon after learning, the new words can be processed in a way that is partly automatic. As long as there is sufficient contextual priming, then the new meaning of the novel words is unintentionally and uncontrollably accessed, leading to inhibition of the desired response (indicating the ink color of the word). However, after consolidation there is no longer a contextual requirement, and the meaning of the novel word can be thought of as accessible independently or autonomously (and possibly more efficiently).

These results are also in line with another study that looked at the effects of consolidation on markers of automaticity. Tham, Lindsay, and Gaskell (submitted) used two different effects that have been given as evidence of automaticity: the semantic distance effect and the semantic congruity effect. The authors found that newly learned words would show some hallmarks of automatic processing a few minutes after learning (particularly the semantic distance effect), but that sleep, and particularly slow wave sleep and spindle activity, were associated with the emergence of the semantic congruity effect, which is thought to be a sterner test of automaticity.

In sum, our results stress that careful experimental manipulations are necessary to fully capture the intricate learning and memory processes involved in the acquisition of novel meaningful words. The brain recruits multiple resources to immediately associate newly learned material with well-established knowledge. The context in which learning takes place, and the particular aspects that the learning situation provides or focuses upon, are important for the immediacy of effects that indicate the integration of newly learned words. A stable and strong integration in existing semantic networks, diagnosed by automatic effects in suitable tasks, seems to require consolidation, to become less dependent on contextual cues from the learning situation.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We would like to thank Britta Radenz, Dario Zaremba, and Julian Wonner for their assistance in data collection and Dirk Vorberg for his valuable comments on the manuscript. Publication of this open access article was supported by Deutsche Forschungsgemeinschaft and the Open Access Publication Fund of the University of Münster.

1 There is in fact a lively debate about whether word reading in the Stroop task is in itself fully automatic or whether it can be blocked under extreme experimental circumstances (probably yes; e.g., Besner, 2001). However, the relevant questions here are whether word reading can be avoided through the participant's intention alone (probably not; see, e.g., Experiment 7, Brown et al., 2002), and whether, once a word has been read, a participant can avoid processing its semantic content (most probably not; e.g., Marcel, 1983; Dehaene et al., 1998; see also Augustinova and Ferrand, 2014).

Supplementary material

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2015.00278/abstract

  • Altarriba J., Mathis K. M. (1997). Conceptual and lexical development in second language acquisition . J. Mem. Lang . 36 , 550–568 10.1006/jmla.1997.2493 [ CrossRef ] [ Google Scholar ]
  • Augustinova M., Ferrand L. (2014). Automaticity of word reading: evidence from the semantic Stroop paradigm . Curr. Dir. Psychol. Sci . 23 , 343–348 10.1177/0963721414540169 [ CrossRef ] [ Google Scholar ]
  • Bakker I., Takashima A., van Hell J. G., Janzen G., McQueen J. M. (2014). Competition from unseen or unheard novel words: lexical consolidation across modalities . J. Mem. Lang . 73 , 116–130 10.1016/j.jml.2014.03.002 [ CrossRef ] [ Google Scholar ]
  • Besner D. (2001). The myth of ballistic processing: evidence from Stroop's paradigm . Psychon. Bull. Rev . 8 , 324–330. 10.3758/BF03196168 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Borovsky A., Elman J. L., Kutas M. (2012). Once is enough: N400 indexes semantic integration of novel word meanings from a single exposure in context . Lang. Learn. Dev . 8 , 278–302. 10.1080/15475441.2011.614893 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Borovsky A., Kutas M., Elman J. (2010). Learning to use words: event-related potentials index single-shot contextual word learning . Cognition 116 , 289–296. 10.1016/j.cognition.2010.05.004 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bowers J., Davis C., Hanley D. (2005). Interfering neighbours: the impact of novel word learning on the identification of visually similar words . Cognition 97 , B45–B54. 10.1016/j.cognition.2005.02.002 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Breitenstein C., Jansen A., Deppe M., Foerster A., Sommer J., Wolbers T., et al.. (2005). Hippocampus activity differentiates good from poor learners of a novel lexicon . Neuroimage 25 , 958–968. 10.1016/j.neuroimage.2004.12.019 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Breitenstein C., Knecht S. (2002). Development and validation of a language learning model for behavioral and functional-imaging studies . J. Neurosci. Methods 114 , 173–179. 10.1016/S0165-0270(01)00525-8 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Breitenstein C., Zwitserlood P., de Vries M., Feldhues C., Knecht S., Dobel C. (2007). Five days versus a lifetime: intense associative vocabulary training generates lexically integrated words . Restor. Neurol. Neurosci . 25 , 493–500. [ PubMed ] [ Google Scholar ]
  • Brown T. L., Joneleit K., Robinson C. S., Brown C. R. (2002). Automaticity in reading and the Stroop task: testing the limits of involuntary word processing . Am. J. Psychol . 115 , 515–543. 10.2307/1423526 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bugg J. M., Crump M. J. C. (2012). In support of a distinction between voluntary and stimulus-driven control: a review of the literature on proportion congruent effects . Front. Cogn . 3 : 367 . 10.3389/fpsyg.2012.00367 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Cairney S. A., Durrant S. J., Musgrove H., Lewis P. A. (2011). Sleep and environmental context: interactive effects for memory . Exp. Brain Res . 214 , 83–92. 10.1007/s00221-011-2808-7 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chen H., Ho C. (1986). Development of Stroop interference in Chinese-English bilinguals . J. Exp. Psychol. Learn. Mem. Cogn . 12 , 397–401. [ Google Scholar ]
  • Clay F., Bowers J. S., Davis C. J., Hanley D. A. (2007). Teaching adults new words: the role of practice and consolidation . J. Exp. Psychol. Learn. Mem. Cogn . 33 , 970–976. 10.1037/0278-7393.33.5.970 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Cousineau D. (2005). Confidence intervals in within-subject designs: a simpler solution to Loftus and Masson's method . Tutor. Quant. Methods Psychol . 1 , 42–45. [ Google Scholar ]
  • Coutanche M. N., Thompson-Schill S. L. (2014). Fast mapping rapidly integrates information into existing memory networks . J. Exp. Psychol. Gen . 143 , 2296–2303. 10.1037/xge0000020 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Davachi L., Wagner A. D. (2002). Hippocampal contributions to episodic encoding: insights from relational and item-based learning . J. Neurophysiol . 88 , 982–990. 10.1152/jn.00046.2002 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Davis M. H., Di Betta A. M., Macdonald M. J. E., Gaskell M. G. (2008). Learning and consolidation of novel spoken words . J. Cogn. Neurosci . 21 , 803–820. 10.1162/jocn.2009.21059 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Davis M. H., Gaskell M. G. (2009). A complementary systems account of word learning: neural and behavioural evidence . Philos. Trans. R. Soc. B Biol. Sci . 364 , 3773–3800. 10.1098/rstb.2009.0111 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dehaene S., Naccache L., Le Clec' H. G., Koechlin E., Mueller M., Dehaene-Lambertz G., et al.. (1998). Imaging unconscious semantic priming . Nature 395 , 597–600. 10.1038/26967 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dijkstra T., van Heuven W. (2002). The architecture of the bilingual word recognition system: from identification to decision . Biling. Lang. Cogn . 5 , 175–197. 10.1017/S1366728902003012 [ CrossRef ] [ Google Scholar ]
  • Dobel C., Junghöfer M., Breitenstein C., Klauke B., Knecht S., Pantev C., et al.. (2010). New names for known things: on the association of novel word forms with existing semantic information . J. Cogn. Neurosci . 22 , 1251–1261. 10.1162/jocn.2009.21297 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dumay N., Gaskell M. G. (2007). Sleep-associated changes in the mental representation of spoken words . Psychol. Sci . 18 , 35–39. 10.1111/j.1467-9280.2007.01845.x [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dumay N., Gaskell M. G. (2012). Overnight lexical consolidation revealed by speech segmentation . Cognition 123 , 119–132. 10.1016/j.cognition.2011.12.009 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Duyck W., Brysbaert M. (2004). Forward and backward number translation requires conceptual mediation in both balanced and unbalanced bilinguals . J. Exp. Psychol. Hum. Percept. Perform . 30 , 889–906. 10.1037/0096-1523.30.5.889 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Forster J. C., Forster K. I. (2003). DMDX: a Windows display program with millisecond accuracy . Behav. Res. Methods Instrum. Comput . 35 , 116–124. 10.3758/BF03195503 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Frankland P. W., Bontempi B. (2005). The organization of recent and remote memories . Nat. Rev. Neurosci . 6 , 119–130. 10.1038/nrn1607 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Freundlieb N., Ridder V., Dobel C., Enriquez-Geppert S., Baumgaertner A., Zwitserlood P., et al.. (2012). Associative vocabulary learning: development and testing of two paradigms for the (re-) acquisition of action- and object-related words . PLoS ONE 7 :e37033. 10.1371/journal.pone.0037033 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Gaskell M. G., Dumay N. (2003). Lexical competition and the acquisition of novel words . Cognition 89 , 105–132. 10.1016/S0010-0277(03)00070-2 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Henderson L. M., Weighall A. R., Brown H., Gareth Gaskell M. (2012). Consolidation of vocabulary is associated with sleep in children . Dev. Sci . 15 , 674–687. 10.1111/j.1467-7687.2012.01172.x [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kachergis G., Yu C., Shiffrin R. M. (2013). Actively learning object names across ambiguous situations . Top. Cogn. Sci . 5 , 200–213. 10.1111/tops.12008 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kapnoula E. C., Packard S., Gupta P., McMurray B. (2015). Immediate lexical integration of novel word forms . Cognition 134 , 85–99. 10.1016/j.cognition.2014.09.007 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kesner R. P. (2013). A process analysis of the CA3 subregion of the hippocampus . Front. Cell. Neurosci . 7 : 78 . 10.3389/fncel.2013.00078 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kroll J. F., Stewart E. (1994). Category interference in translation and picture naming: evidence for asymmetric connections between bilingual memory representations . J. Mem. Lang . 33 , 149–174. 10.1006/jmla.1994.1008 [ CrossRef ] [ Google Scholar ]
  • Laeger I., Keuper K., Heitmann C., Kugel H., Dobel C., Eden A., et al.. (2014). Have we met before? Neural correlates of emotional learning in women with social phobia . J. Psychiatry Neurosci . 39 , E14–E23. 10.1503/jpn.130091 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Lindsay S., Gaskell M. G. (2013). Lexical integration of novel words without sleep . J. Exp. Psychol. Learn. Mem. Cogn . 39 , 608–622. 10.1037/a0029243 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Liuzzi G., Freundlieb N., Ridder V., Hoppe J., Heise K., Zimerman M., et al.. (2010). The involvement of the left motor cortex in learning of a novel action word lexicon . Curr. Biol . 20 , 1745–1751. 10.1016/j.cub.2010.08.034 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Loftus G. R., Masson M. E. J. (1994). Using confidence intervals in within-subject designs . Psychon. Bull. Rev . 1 , 476–490. 10.3758/BF03210951 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Lucas M. (2000). Semantic priming without association: a meta-analytic review. Psychon. Bull. Rev. 7, 618–630. doi: 10.3758/BF03212999
  • Lupyan G. (2012). Linguistically modulated perception and cognition: the label-feedback hypothesis. Front. Psychol. 3:54. doi: 10.3389/fpsyg.2012.00054
  • MacLeod C. M. (1991). Half a century of research on the Stroop effect: an integrative review. Psychol. Bull. 109, 163–203. doi: 10.1037/0033-2909.109.2.163
  • MacLeod C. M. (2005). The Stroop task in cognitive research, in Cognitive Methods and their Application to Clinical Research, eds Wenzel A., Rubin D. C. (Washington, DC: American Psychological Association), 17–40.
  • Mägiste E. (1984). Stroop tasks and dichotic translation: the development of interference patterns in bilinguals. J. Exp. Psychol. Learn. Mem. Cogn. 10, 304–315.
  • Marcel A. J. (1983). Conscious and unconscious perception: experiments on visual masking and word recognition. Cognit. Psychol. 15, 197–237. doi: 10.1016/0010-0285(83)90009-9
  • McClelland J. L., McNaughton B. L., O'Reilly R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychol. Rev. 102, 419–457. doi: 10.1037/0033-295X.102.3.419
  • McCloskey M., Cohen N. J. (1989). Catastrophic interference in connectionist networks: the sequential learning problem, in Psychology of Learning and Motivation, ed Gordon H. B. (New York, NY: Academic Press), 109–165.
  • Mestres-Missé A., Rodríguez-Fornells A., Münte T. (2007). Watching the brain during meaning acquisition. Cereb. Cortex 17, 1858–1866. doi: 10.1093/cercor/bhl094
  • Moors A., De Houwer J. (2006). Automaticity: a theoretical and conceptual analysis. Psychol. Bull. 132, 297–326. doi: 10.1037/0033-2909.132.2.297
  • Morey R. D. (2008). Confidence intervals from normalized data: a correction to Cousineau (2005). Tutor. Quant. Methods Psychol. 4, 61–64.
  • Neely J. (1991). Semantic priming effects in visual word recognition: a selective review of current findings, in Basic Processes in Reading: Visual Word Recognition, eds Besner D., Humphreys G. W. (Hillsdale, NJ: Erlbaum), 264–336.
  • Preston M., Lambert W. (1969). Interlingual interference in a bilingual version of the Stroop color-word task. J. Verbal Learn. Verbal Behav. 8, 295–301. doi: 10.1016/S0022-5371(69)80079-4
  • Rasch B., Born J. (2008). Reactivation and consolidation of memory during sleep. Curr. Dir. Psychol. Sci. 17, 188–192. doi: 10.1111/j.1467-8721.2008.00572.x
  • Redding G. M., Gerjets D. A. (1977). Stroop effect: interference and facilitation with verbal and manual responses. Percept. Mot. Skills 45, 11–17. doi: 10.2466/pms.1977.45.1.11
  • Schmidt J. R., Crump M. J. C., Cheesman J., Besner D. (2007). Contingency learning without awareness: evidence for implicit control. Conscious. Cogn. 16, 421–435. doi: 10.1016/j.concog.2006.06.010
  • Sharma D., McKenna F. P. (1998). Differential components of the manual and vocal Stroop tasks. Mem. Cognit. 26, 1033–1040.
  • Stroop J. R. (1935). Studies of interference in serial verbal reactions. J. Exp. Psychol. 18, 643–662. doi: 10.1037/h0054651
  • Sumiya H., Healy A. (2004). Phonology in the bilingual Stroop effect. Mem. Cognit. 32, 752–758. doi: 10.3758/BF03195865
  • Takashima A., Bakker I., van Hell J. G., Janzen G., McQueen J. M. (2014). Richness of information about novel words influences how episodic and semantic memory networks interact during lexicalization. Neuroimage 84, 265–278. doi: 10.1016/j.neuroimage.2013.08.023
  • Takashima A., Nieuwenhuis I. L. C., Jensen O., Talamini L. M., Rijpkema M., Fernández G. (2009). Shift from hippocampal to neocortical centered retrieval network with consolidation. J. Neurosci. 29, 10087–10093. doi: 10.1523/JNEUROSCI.0799-09.2009
  • Tamminen J., Davis M. H., Merkx M., Rastle K. (2012). The role of memory consolidation in generalisation of new linguistic information. Cognition 125, 107–112. doi: 10.1016/j.cognition.2012.06.014
  • Tamminen J., Gaskell M. G. (2013). Novel word integration in the mental lexicon: evidence from unmasked and masked semantic priming. Q. J. Exp. Psychol. 66, 1001–1025. doi: 10.1080/17470218.2012.724694
  • Tamminen J., Payne J. D., Stickgold R., Wamsley E. J., Gaskell M. G. (2010). Sleep spindle activity is associated with the integration of new memories and existing knowledge. J. Neurosci. 30, 14356–14360. doi: 10.1523/jneurosci.3028-10.2010
  • Tamminen J., Ralph M. A. L., Lewis P. A. (2013). The role of sleep spindles and slow-wave activity in integrating new information in semantic memory. J. Neurosci. 33, 15376–15381. doi: 10.1523/jneurosci.5093-12.2013
  • Walker M. P. (2005). A refined model of sleep and the time course of memory formation. Behav. Brain Sci. 28, 51–64. doi: 10.1017/S0140525X05000026
  • Yu C., Smith L. B. (2007). Rapid word learning under uncertainty via cross-situational statistics. Psychol. Sci. 18, 414–420. doi: 10.1111/j.1467-9280.2007.01915.x

COMMENTS

  1. The Stroop Color and Word Test

    Introduction. The Stroop Color and Word Test (SCWT) is a neuropsychological test extensively used for both experimental and clinical purposes. It assesses the ability to inhibit cognitive interference, which occurs when the processing of a stimulus feature affects the simultaneous processing of another attribute of the same stimulus (Stroop, 1935). In the most common version of the SCWT, which ...

  3. Use of Stroop Test for Sports Psychology Study: Cross-Over Design Research

    Background: In sports psychology research, the Stroop test and its derivations are commonly used to investigate the benefits of exercise on cognitive function. The measures of the Stroop test and the computed interference often have different interclass correlation coefficients (ICC). ... The Stroop/reverse-Stroop test is a pencil and paper ...

  4. (PDF) The Stroop Color and Word Test

    The Stroop Color and Word Test (SCWT) is a neuropsychological test extensively used to assess the ability to inhibit cognitive interference that occurs when the processing of a specific ...

  5. The loci of Stroop effects: a critical review of methods and evidence

    The Stroop task is also ubiquitously used in basic and applied research—as indicated by the fact that the original paper (Stroop, 1935) is ... (2020), the extra sensitivity of the Stroop test (stemming from the ability to detect and rate each of these components separately) would provide clinical practitioners with invaluable information ...

  6. The Stroop Color and Word Test

    The Stroop Color and Word Test (SCWT) is a neuropsychological test extensively used to assess the ability to inhibit cognitive interference that occurs when the processing of a specific stimulus feature impedes the simultaneous processing of a second stimulus attribute, well-known as the Stroop Effect. The aim of the present work is to verify ...

  7. Construct Validity of the Stroop Color-Word Test: Influence of Speed of

    Stroop test: The Spanish adaptation of the Stroop test (Golden, 1994) was used. The number of correct responses in 45 s in the word-reading (SWR), color-naming (SCN), and color-word (SCW) conditions was recorded. The examiner indicated the errors, and participants were asked to correct them before continuing.

  10. The Stroop legacy: A cautionary tale on methodological ...

    The Stroop task is a seminal paradigm in experimental psychology, so much that various variants of the classical color-word version have been proposed. Here we offer a methodological review of them to emphasize the importance of designing methodologically rigorous Stroop tasks. This is not an end by itself, but it is fundamental to achieve adequate measurement validity, which is currently ...

  11. The Stroop Color-Word Test:

    1. A Z score is usually calculated as (observed score - mean score) / SD (so without reversing the positive/negative sign), because in general a higher/lower observed score compared to the mean score signifies a better/worse performance than expected, respectively. With regard to the Stroop test scores, however, a higher score means a worse performance.

  12. (PDF) Stroop test

    Abstract. I am currently researching and conducting experiments on the Stroop test and other related tests for my thesis. In this presentation, I have made an effort to provide a thorough ...

  13. Half a century of research on the Stroop effect: An integrative review

    The literature on interference in the Stroop Color and Word Test, covering over 50 yrs and some 400 studies, is organized and reviewed. In so doing, a set of 18 reliable empirical findings is isolated that must be captured by any successful theory of the Stroop effect. Existing theoretical positions are summarized and evaluated in view of this critical evidence and the 2 major candidate ...

  14. The Stroop Task: Comparison Between the Original Paradigm and

    the Stroop results can be attributed to the translation of a paper-pencil test to computerized versions. On the other hand the presentation of stimuli as card version

  16. What Stroop tasks can tell us about selective attention from childhood

    A rich body of research concerns causes of Stroop effects plus applications of Stroop. However, several questions remain. We included assessment of errors with children and adults (N = 316), who sat either a task wherein each block employed only trials of one type (unmixed task) or where every block comprised of a mix of the congruent, neutral, and incongruent trials.

  17. The stroop test and its relationship to academic performance and

    The test developed by Stroop some 70 years ago is used, among other purposes, as an indicator of attention disorder and general mood fluctuations. The present research attempted to determine whether a correlation existed between the Stroop Test, student ability as defined by a standardised IQ test, and general classroom behaviour.

  18. Use of Stroop Test for Sports Psychology Study: Cross-Over Design Research

    1 Faculty of Liberal Arts, Tohoku Gakuin University, Sendai, Japan; 2 School of Psychology, The University of Queensland, St Lucia, QLD, Australia; Background: In sports psychology research, the Stroop test and its derivations are commonly used to investigate the benefits of exercise on cognitive function. The measures of the Stroop test and the computed interference often have different ...

  19. PDF The Stroop Effect

    Variants of the Classic Stroop Task. In essence, Stroop's paradigm provides a template for studying interference, and investigators have often mined that template to create Stroop-like tasks suited to their particular research purposes. Figure 2 illustrates some of the many alternate versions in the literature. The best known is the picture-word

  20. Stroop Color and Word Test, Children's Version

    The authors of the test manual present little psychometric data regarding the use of the test in samples of children. The manual indicates that the test-retest reliability of the Stroop paradigm is fairly robust across test versions (nearly all rs > 0.8). Using a Dutch sample, Neyens and Aldenkamp examined the test-retest reliability for children ages 4-12 and reported similar results (r ...

  21. (PDF) Replicating the Stroop Effect

    A replication study based on J. Ridley Stroop's original 1935 experiment titled "Studies of Interference in Serial Verbal Reactions".

  22. A Study of the Interference in Selective Attention on the Stroop Test

    Stroop (1935b) developed a test to elicit this sort of interference. The test includes three sets of stimuli: (1) the W card, which consists of color-words to be read aloud by Ss; (2) the C card, consisting of color patches which are to be named; (3) the CW card, which consists of color-words printed ...

  23. Stroop effects from newly learned color words: effects of memory

    Stroop effects prior to a consolidation period were observed only when the German color words were present in the Stroop blocks (Experiments 1 and 2). Given that novel and German words are paired during learning, the German words in the Stroop test provide contextual information from the learning phase.

  24. The mechanism for the specificity of gaze direction: Inhibiting

    The experiment combined the spatial Stroop paradigm to examine the effect of background location on the perception of arrow or gaze direction in the vertical dimension by manipulating the congruence between the target direction and background location, and to validate a possible cognitive mechanism for gaze direction specificity - inhibiting background location.