Are Video Recordings Reliable for Assessing Surgical Performance? A Prospective Reliability Study Using Generalizability Theory

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Are Video Recordings Reliable for Assessing Surgical Performance? A Prospective Reliability Study Using Generalizability Theory. / Frithioff, Andreas; Frendø, Martin; Foghsgaard, Søren; Sørensen, Mads Sølvsten; Andersen, Steven Arild Wuyts.

In: Simulation in Healthcare: Journal of the Society for Simulation in Healthcare, Vol. 18, No. 4, 2023, pp. 219-225.


Harvard

Frithioff, A, Frendø, M, Foghsgaard, S, Sørensen, MS & Andersen, SAW 2023, 'Are Video Recordings Reliable for Assessing Surgical Performance? A Prospective Reliability Study Using Generalizability Theory', Simulation in Healthcare: Journal of the Society for Simulation in Healthcare, vol. 18, no. 4, pp. 219-225. https://doi.org/10.1097/SIH.0000000000000672

APA

Frithioff, A., Frendø, M., Foghsgaard, S., Sørensen, M. S., & Andersen, S. A. W. (2023). Are Video Recordings Reliable for Assessing Surgical Performance? A Prospective Reliability Study Using Generalizability Theory. Simulation in Healthcare: Journal of the Society for Simulation in Healthcare, 18(4), 219-225. https://doi.org/10.1097/SIH.0000000000000672

Vancouver

Frithioff A, Frendø M, Foghsgaard S, Sørensen MS, Andersen SAW. Are Video Recordings Reliable for Assessing Surgical Performance? A Prospective Reliability Study Using Generalizability Theory. Simulation in Healthcare: Journal of the Society for Simulation in Healthcare. 2023;18(4):219-225. https://doi.org/10.1097/SIH.0000000000000672

Author

Frithioff, Andreas ; Frendø, Martin ; Foghsgaard, Søren ; Sørensen, Mads Sølvsten ; Andersen, Steven Arild Wuyts. / Are Video Recordings Reliable for Assessing Surgical Performance? A Prospective Reliability Study Using Generalizability Theory. In: Simulation in Healthcare: Journal of the Society for Simulation in Healthcare. 2023; Vol. 18, No. 4, pp. 219-225.

BibTeX

@article{2cf9f3b1e8964134af05d0d39f965dd3,
title = "Are Video Recordings Reliable for Assessing Surgical Performance? A Prospective Reliability Study Using Generalizability Theory",
abstract = "INTRODUCTION: Reliability is pivotal in surgical skills assessment. Video-based assessment can be used for objective assessment without physical presence of assessors. However, its reliability for surgical assessments remains largely unexplored. In this study, we evaluated the reliability of video-based versus physical assessments of novices' surgical performances on human cadavers and 3D-printed models, an emerging simulation modality. METHODS: Eighteen otorhinolaryngology residents performed 2 to 3 mastoidectomies on a 3D-printed model and 1 procedure on a human cadaver. Performances were rated by 3 experts evaluating the final surgical result using a well-known assessment tool. Performances were rated both hands-on/physically and by video recordings. Interrater reliability and intrarater reliability were explored using κ statistics, and the optimal number of raters and performances required in either assessment modality was determined using generalizability theory. RESULTS: Interrater reliability was moderate, with a mean κ score of 0.58 (range, 0.53-0.62) for video-based assessment and 0.60 (range, 0.55-0.69) for physical assessment. Video-based and physical assessments were equally reliable (G coefficient 0.85 vs 0.80 for 3D-printed models and 0.86 vs 0.87 for cadaver dissections). The interaction between rater and assessment modality contributed 8.1% to 9.1% of the estimated variance. For the 3D-printed models, 2 raters evaluating 2 video-recorded performances or 3 raters physically assessing 2 performances yielded sufficient reliability for high-stakes assessment (G coefficient >0.8). CONCLUSIONS: Video-based and physical assessments were equally reliable. Some raters were affected by changing from physical to video-based assessment; consequently, assessment should be either physical or video based, not a combination.",
author = "Andreas Frithioff and Martin Frend{\o} and S{\o}ren Foghsgaard and S{\o}rensen, {Mads S{\o}lvsten} and Andersen, {Steven Arild Wuyts}",
note = "Copyright {\textcopyright} 2022 Society for Simulation in Healthcare.",
year = "2023",
doi = "10.1097/SIH.0000000000000672",
language = "English",
volume = "18",
pages = "219--225",
journal = "Simulation in Healthcare",
issn = "1559-2332",
publisher = "Lippincott Williams & Wilkins",
number = "4",
}

RIS

TY - JOUR

T1 - Are Video Recordings Reliable for Assessing Surgical Performance?

T2 - A Prospective Reliability Study Using Generalizability Theory

AU - Frithioff, Andreas

AU - Frendø, Martin

AU - Foghsgaard, Søren

AU - Sørensen, Mads Sølvsten

AU - Andersen, Steven Arild Wuyts

N1 - Copyright © 2022 Society for Simulation in Healthcare.

PY - 2023

Y1 - 2023

N2 - INTRODUCTION: Reliability is pivotal in surgical skills assessment. Video-based assessment can be used for objective assessment without physical presence of assessors. However, its reliability for surgical assessments remains largely unexplored. In this study, we evaluated the reliability of video-based versus physical assessments of novices' surgical performances on human cadavers and 3D-printed models, an emerging simulation modality. METHODS: Eighteen otorhinolaryngology residents performed 2 to 3 mastoidectomies on a 3D-printed model and 1 procedure on a human cadaver. Performances were rated by 3 experts evaluating the final surgical result using a well-known assessment tool. Performances were rated both hands-on/physically and by video recordings. Interrater reliability and intrarater reliability were explored using κ statistics, and the optimal number of raters and performances required in either assessment modality was determined using generalizability theory. RESULTS: Interrater reliability was moderate, with a mean κ score of 0.58 (range, 0.53-0.62) for video-based assessment and 0.60 (range, 0.55-0.69) for physical assessment. Video-based and physical assessments were equally reliable (G coefficient 0.85 vs 0.80 for 3D-printed models and 0.86 vs 0.87 for cadaver dissections). The interaction between rater and assessment modality contributed 8.1% to 9.1% of the estimated variance. For the 3D-printed models, 2 raters evaluating 2 video-recorded performances or 3 raters physically assessing 2 performances yielded sufficient reliability for high-stakes assessment (G coefficient >0.8). CONCLUSIONS: Video-based and physical assessments were equally reliable. Some raters were affected by changing from physical to video-based assessment; consequently, assessment should be either physical or video based, not a combination.

AB - INTRODUCTION: Reliability is pivotal in surgical skills assessment. Video-based assessment can be used for objective assessment without physical presence of assessors. However, its reliability for surgical assessments remains largely unexplored. In this study, we evaluated the reliability of video-based versus physical assessments of novices' surgical performances on human cadavers and 3D-printed models, an emerging simulation modality. METHODS: Eighteen otorhinolaryngology residents performed 2 to 3 mastoidectomies on a 3D-printed model and 1 procedure on a human cadaver. Performances were rated by 3 experts evaluating the final surgical result using a well-known assessment tool. Performances were rated both hands-on/physically and by video recordings. Interrater reliability and intrarater reliability were explored using κ statistics, and the optimal number of raters and performances required in either assessment modality was determined using generalizability theory. RESULTS: Interrater reliability was moderate, with a mean κ score of 0.58 (range, 0.53-0.62) for video-based assessment and 0.60 (range, 0.55-0.69) for physical assessment. Video-based and physical assessments were equally reliable (G coefficient 0.85 vs 0.80 for 3D-printed models and 0.86 vs 0.87 for cadaver dissections). The interaction between rater and assessment modality contributed 8.1% to 9.1% of the estimated variance. For the 3D-printed models, 2 raters evaluating 2 video-recorded performances or 3 raters physically assessing 2 performances yielded sufficient reliability for high-stakes assessment (G coefficient >0.8). CONCLUSIONS: Video-based and physical assessments were equally reliable. Some raters were affected by changing from physical to video-based assessment; consequently, assessment should be either physical or video based, not a combination.

U2 - 10.1097/SIH.0000000000000672

DO - 10.1097/SIH.0000000000000672

M3 - Journal article

C2 - 36260767

VL - 18

SP - 219

EP - 225

JO - Simulation in Healthcare

JF - Simulation in Healthcare

SN - 1559-2332

IS - 4

ER -

ID: 344978617
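
The abstract above reports G coefficients from a generalizability analysis. As a minimal, hypothetical sketch of how such a coefficient relates variance components to the number of raters (this is not the authors' actual analysis, and the variance values below are invented for illustration only):

```python
# Minimal sketch: relative G coefficient for a simple persons x raters design.
# A real generalizability analysis would estimate the variance components from
# the rating data (e.g., via ANOVA or a mixed-effects model) and include the
# additional facets used in the study (performances, assessment modality).

def g_coefficient(var_person: float, var_person_rater: float, n_raters: int) -> float:
    """Relative G coefficient: person variance over (person + relative error) variance."""
    relative_error = var_person_rater / n_raters  # error shrinks as raters are averaged
    return var_person / (var_person + relative_error)

# Hypothetical variance components for illustration
var_person = 1.20        # variance attributable to the trainees being assessed
var_person_rater = 0.45  # person-by-rater interaction (plus residual) variance

for n in (1, 2, 3):
    print(f"{n} rater(s): G = {g_coefficient(var_person, var_person_rater, n):.2f}")
```

Averaging over more raters reduces the relative error term, which is the mechanism behind the paper's finding that 2-3 raters suffice to reach G > 0.8 for high-stakes assessment.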