The design and evaluation of a critical appraisal tool for qualitative and quantitative health research
Other Publication, ResearchOnline@JCU
Abstract
Objective: To design and evaluate a critical appraisal tool (CAT) that can assess the research methods used in a broad range of qualitative and quantitative health research papers; has the depth to fully assess these research papers; has an appropriate scoring system; and has validity and reliability data available to evaluate the scores obtained by the tool. Critical appraisal is defined here as the impartial assessment of one or more research papers to determine their strengths, weaknesses and benefits.
Study design and setting: The study used a sequential mixed methods research design, in which data collected in one phase informed the design and focus of the next. Data collection took place between July 2008 and September 2010 at James Cook University, Australia. The study had two sections: collection and synthesis of secondary data; and planning, collection and analysis of primary data. The study began with an exploration of the divide between qualitative and quantitative research. This showed that the divide is more an historical distinction than a current one; as such, there are no theoretical impediments to a single CAT for both qualitative and quantitative research. The scope of research methods was examined next through the use of mind maps, so that the design of a CAT could be situated within an overall understanding of research methods. A critical review of how CATs are designed was the final part of the secondary data analysis. This review of 45 papers informed the design of the proposed CAT, which was based on empirical evidence and the nature of research methods rather than on subjective or biased assessments of what a CAT could include. The first part of the primary data collection was an exploratory study of the validity of the scores obtained by the proposed CAT: a random selection of 60 health research papers was analysed using the proposed CAT and five alternative CATs.
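In the validity study, each paper receives a score from the proposed CAT and from each alternative CAT, and the results summarise the association between these scores with Kendall's tau rank correlation. The sketch below computes Kendall's tau-b, the tie-corrected variant; the abstract does not say which variant was used, so that choice is an assumption, as are the function name `kendall_tau_b` and the illustrative scores. It computes the coefficient only, not the p-value used to judge significance.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length lists of scores.

    Numerator: concordant minus discordant pairs.
    Denominator: geometric mean of the number of pairs not tied on x
    and the number of pairs not tied on y (the tie correction).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2)")
    numerator = 0  # concordant pairs add +1, discordant pairs add -1
    untied_x = 0   # pairs whose x scores differ
    untied_y = 0   # pairs whose y scores differ
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        sx = (xi > xj) - (xi < xj)  # sign of the x difference: -1, 0 or 1
        sy = (yi > yj) - (yi < yj)  # sign of the y difference
        numerator += sx * sy
        untied_x += sx != 0
        untied_y += sy != 0
    denominator = sqrt(untied_x * untied_y)
    if denominator == 0:
        raise ValueError("tau-b is undefined when one list is constant")
    return numerator / denominator


# Hypothetical total scores for five papers under two tools
proposed = [60, 72, 55, 80, 68]
alternative = [58, 70, 60, 85, 66]
print(round(kendall_tau_b(proposed, alternative), 2))  # → 0.8
```

Because tau depends only on how the scores are ordered, it can compare tools whose scoring scales differ (for example, a percentage total against a checklist count), a property that makes rank correlations convenient when comparing CATs.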
Next was an exploratory study of reliability, in which five raters each used the proposed CAT to appraise 24 randomly selected research papers. The final part tested whether using a CAT is an improvement over using no CAT to appraise research papers, because there is little empirical evidence on this point. Ten raters were randomly assigned to two groups and appraised a random selection of five health research papers; one group used the proposed CAT, while the other group did not use any CAT.
Results: Critical review – Explanations of how a CAT was designed and guidelines on how to use the CAT were available in five (11%) of the 45 papers evaluated. Thirty-eight CATs (84%) reported little or no validity evaluation, and 33 CATs (73%) had no reliability testing. The questions and statements that made up each CAT were coded into a proposed CAT with eight categories, 22 items and 98 item descriptors, such that each category and item was distinct from every other.
Validity – In all research designs, the proposed CAT had significant (p < 0.05, 2-tailed) weak to moderate positive Kendall's tau correlations with the alternative CATs (0.33 ≤ τ ≤ 0.55), except in the Preamble category. There were significant moderate to strong positive correlations in true experimental (0.68 ≤ τ ≤ 0.70); quasi-experimental (0.70 ≤ τ ≤ 1.00); descriptive, exploratory or observational (0.72 ≤ τ ≤ 1.00); qualitative (0.74 ≤ τ ≤ 0.81); and systematic review (0.62 ≤ τ ≤ 0.82) research designs. There were no significant correlations in single system research designs.
Reliability – Using the proposed CAT, the intraclass correlation coefficient (ICC) across all research papers was 0.83 for consistency and 0.74 for absolute agreement. The generalizability (G) study showed that the paper effect accounted for the majority of score variance (53–70%) in each research design, with small to moderate rater effects or paper × rater interaction effects (0–27%).
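The reliability results rest on two analyses of a papers × raters score matrix: ICCs for consistency and absolute agreement, and a G study that splits score variance into paper, rater and residual (paper × rater interaction confounded with error) components. The sketch below derives both from the mean squares of a two-way ANOVA with one score per cell. It assumes a fully crossed design with no missing scores and uses the single-measure forms commonly labelled ICC(C,1) and ICC(A,1); the abstract does not state which ICC form was reported. The function names and scores are hypothetical, and negative variance estimates are set to zero, a common convention.

```python
def two_way_anova(scores):
    """Mean squares for a papers x raters matrix with one score per cell.

    scores[i][j] is the score rater j gave paper i; every cell must be filled.
    Returns (n_papers, n_raters, ms_papers, ms_raters, ms_residual).
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete matrix with at least 2 papers and 2 raters")
    grand = sum(sum(row) for row in scores) / (n * k)
    paper_means = [sum(row) / k for row in scores]
    rater_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_papers = k * sum((m - grand) ** 2 for m in paper_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    # Residual: variation left after removing paper and rater main effects
    ss_resid = sum(
        (scores[i][j] - paper_means[i] - rater_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    return (n, k,
            ss_papers / (n - 1),
            ss_raters / (k - 1),
            ss_resid / ((n - 1) * (k - 1)))


def icc_consistency(scores):
    """Single-measure consistency ICC, often written ICC(C,1)."""
    n, k, msp, msr, mse = two_way_anova(scores)
    return (msp - mse) / (msp + (k - 1) * mse)


def icc_absolute_agreement(scores):
    """Single-measure absolute-agreement ICC, often written ICC(A,1).

    Unlike consistency, it also penalises systematic differences between raters.
    """
    n, k, msp, msr, mse = two_way_anova(scores)
    return (msp - mse) / (msp + (k - 1) * mse + k * (msr - mse) / n)


def g_study_percentages(scores):
    """Percentage of total score variance for each variance component."""
    n, k, msp, msr, mse = two_way_anova(scores)
    components = {
        "paper": max((msp - mse) / k, 0.0),
        "rater": max((msr - mse) / n, 0.0),
        "paper x rater, error": mse,
    }
    total = sum(components.values())
    return {name: 100 * v / total for name, v in components.items()}


# Illustrative scores only (4 papers x 3 raters), not data from the study
scores = [[12, 14, 13],
          [18, 17, 19],
          [9, 11, 10],
          [15, 15, 17]]
print(round(icc_consistency(scores), 2))
print(round(icc_absolute_agreement(scores), 2))
print({name: round(p, 1) for name, p in g_study_percentages(scores).items()})

# For these scores, adding 2 points to rater 3 leaves consistency unchanged
# but lowers absolute agreement, because the raters now differ systematically
shifted = [[a, b, c + 2] for a, b, c in scores]
print(round(icc_consistency(shifted), 2), round(icc_absolute_agreement(shifted), 2))
```

The shifted example shows why the two ICCs can differ, as they do for the proposed CAT (0.83 versus 0.74): a rater who is consistently harsher or more lenient does not reduce consistency, but does reduce absolute agreement.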
Compare CAT with no CAT – The ICC for absolute agreement was 0.76 for the group not using a CAT and 0.88 for the proposed CAT group. A G study showed that, in the group not using a CAT, 24% of total score variance was attributable to either the rater or paper × rater interactions, whereas in the proposed CAT group this figure was 12%. Analysis of covariance (ANCOVA) showed significant effects in the group not using a CAT for subject matter knowledge (F(1,18) = 7.03, p < 0.05, 1-tailed, partial η² = 0.28) and rater (F(4,18) = 4.57, p < 0.05, 1-tailed, partial η² = 0.50).
Discussion: Critical review – Many CATs have been developed from a subjective view of research quality rather than from evidence about which elements should or should not be included in a critical appraisal of research. When choosing a CAT, researchers should: (1) take into account the context of the appraisal; (2) determine whether the CAT was developed using the best evidence available; (3) ensure that the validity of the scores obtained from the CAT can be verified; and (4) analyse the scores obtained from the CAT for reliability.
Validity – The proposed CAT exhibited a good degree of validity based on the theory on which it was built, the collection of empirical evidence, and the stated context for its use. Therefore, inferences made from the scores obtained using the proposed CAT should reflect the value of the papers appraised.
Reliability – Given the assessment of validity and the reliability scores obtained, the proposed CAT appears to be a viable tool that can be used across a wide range of research designs and appraisal situations. Any variability in the scores obtained using the proposed CAT can be explained by the diverse subject matter of the papers and participants' unfamiliarity with some research designs.
Difficulties with subject matter and research designs are less likely in normal use of the proposed CAT, where raters are more familiar with the subject matter and research designs involved.
Compare CAT with no CAT – The proposed CAT was more reliable than not using a CAT when appraising research papers. In the group not using a CAT there were significant effects for rater and subject matter knowledge; in the proposed CAT group, the rater effect was almost eliminated and there was no subject matter knowledge effect. There was no research design knowledge effect in either group.
Conclusion: A CAT was designed and evaluated that met the aim and objectives of the study. The proposed CAT can be used across a broad range of qualitative and quantitative health research; has the depth to fully assess research papers; has an appropriate scoring system; and has validity and reliability data available. Further research could extend the proposed CAT to determine whether it is useful in criterion-referencing health research and general research. Furthermore, the proposed CAT can be applied to the increasing use of mixed and multiple research methods, and used to assess, understand and communicate this research knowledge.
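For the ANCOVA in the comparison of CAT with no CAT, partial η² can be recovered from each F statistic and its degrees of freedom. Since partial η² = SS_effect / (SS_effect + SS_error) and F = (SS_effect / df_effect) / (SS_error / df_error), it follows that partial η² = F·df_effect / (F·df_effect + df_error). The short check below applies this identity to the reported F values using a hypothetical helper, `partial_eta_squared`; it reproduces the reported 0.28 and 0.50 to two decimal places.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


# F statistics reported for the group not using a CAT
print(round(partial_eta_squared(7.03, 1, 18), 2))  # subject matter knowledge → 0.28
print(round(partial_eta_squared(4.57, 4, 18), 2))  # rater → 0.5
```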
Pages Count
333
DOI
10.25903/cwz6-vj50