Abstract and Introduction
Abstract
Objective. To design, implement, and assess a rubric to evaluate student presentations in a capstone doctor of pharmacy (PharmD) course.
Design. A 20-item rubric was designed and used to evaluate student presentations in a capstone fourth-year course in 2007–2008; it was then revised and expanded to 25 items and used to evaluate presentations for the same course in 2008–2009. Two faculty members evaluated each presentation.
Assessment. The Many-Facets Rasch Model (MFRM) was used to determine the rubric's reliability, quantify the contribution of evaluator harshness/leniency to scoring, and assess grading validity by comparing the current grading method with a criterion-referenced grading scheme. In 2007–2008, rubric reliability was 0.98, with a separation of 7.1 and 4 rating scale categories. In 2008–2009, MFRM analysis suggested that 2 of 98 grades be adjusted to eliminate evaluator leniency, while a further criterion-referenced MFRM analysis suggested that 10 of 98 grades be adjusted.
Conclusion. The evaluation rubric was reliable and evaluator leniency appeared minimal. However, a criterion-referenced re-analysis suggested a need for further revisions to the rubric and evaluation process.
Introduction
Evaluations are important in the process of teaching and learning. In health professions education, performance-based evaluations are identified as having "an emphasis on testing complex, 'higher-order' knowledge and skills in the real-world context in which they are actually used." Objective structured clinical examinations (OSCEs) are a common, notable example. On Miller's pyramid, a framework used in medical education for measuring learner outcomes, "knows" is placed at the base, followed by "knows how," then "shows how," and finally, "does" at the top. Based on Miller's pyramid, evaluation formats that use multiple-choice testing focus on "knows," while an OSCE focuses on "shows how." Just as performance evaluations remain highly valued in medical education, authentic task evaluations in pharmacy education may be better indicators of future pharmacist performance. Accordingly, much attention in medical education has focused on improving the reliability of high-stakes evaluations. Regardless of educational discipline, high-stakes performance-based evaluations should meet educational standards for reliability and validity.
PharmD students at the University of Toledo College of Pharmacy (UTCP) were required to complete a course on presentations during their final year of pharmacy school and then give a presentation that served as both a capstone experience and a performance-based evaluation for the course. Pharmacists attending the presentations were given Accreditation Council for Pharmacy Education (ACPE)-approved continuing education credits. An evaluation rubric for grading the presentations was designed to allow multiple faculty evaluators to objectively score student performances in the domains of presentation delivery and content. Given the pass/fail grading procedure used in advanced pharmacy practice experiences, passing this presentation-based course and subsequently graduating from pharmacy school were contingent upon this high-stakes evaluation. As a result, the reliability and validity of the rubric used and the evaluation process needed to be closely scrutinized.
Each year, about 100 students completed presentations and at least 40 faculty members served as evaluators. With the use of multiple evaluators, a question of evaluator leniency often arose (ie, whether evaluators applied the same criteria when evaluating performances or whether some graded more leniently or more harshly than others). At UTCP, opinions among some faculty evaluators and many PharmD students implied that evaluator leniency in judging the students' presentations significantly affected specific students' grades and ultimately their graduation from pharmacy school. While it was plausible that evaluator leniency was occurring, the magnitude of the effect was unknown. Thus, this study was initiated partly to address this concern over grading consistency and scoring variability among evaluators.
Because both students' presentation style and content were deemed important, each item of the rubric was weighted the same across delivery and content. However, because there were more categories related to delivery than to content, an additional faculty concern was that a student could conceivably pass the course by delivering an effective presentation despite poor content, as the hypothetical calculation below illustrates.
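To make this concern concrete, consider a hypothetical calculation. The 17/8 split between delivery and content items, the 4-point rating scale applied to every item, and the 70% passing threshold used below are assumptions made only for illustration; they are not figures reported for the actual rubric.

% Hypothetical illustration only: item split, rating scale, and passing threshold are assumed.
\[
\text{score} = \frac{\overbrace{17 \times 4}^{\text{delivery items, maximum rating}} + \overbrace{8 \times 1}^{\text{content items, minimum rating}}}{25 \times 4} = \frac{76}{100} = 76\%,
\]

so under these assumptions a student earning the maximum rating on every delivery item and the minimum rating on every content item would still clear a 70% passing threshold despite uniformly poor content.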
The objectives for this investigation were: (1) to describe and optimize the reliability of the evaluation rubric used in this high-stakes evaluation; (2) to identify the contribution and significance of evaluator leniency to evaluation reliability; and (3) to assess the validity of this evaluation rubric within a criterion-referenced grading paradigm focused on both presentation delivery and content.
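As background for the analyses described under Assessment, the many-facet Rasch model can be sketched in its standard log-odds form; the notation below is the conventional one for this model and is provided only as a reference, not as the specific parameterization reported in this study.

% Standard many-facet Rasch model (sketch); notation is generic, not study-specific.
\[
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k,
\]

where \(P_{nijk}\) is the probability that student \(n\) receives a rating in category \(k\) on rubric item \(i\) from evaluator \(j\); \(P_{nij(k-1)}\) is the probability of a rating in the adjacent lower category; \(B_n\) is the student's ability; \(D_i\) is the item's difficulty; \(C_j\) is the evaluator's severity (harshness/leniency); and \(F_k\) is the difficulty of being rated in category \(k\) rather than category \(k-1\). Modeling the evaluator facet \(C_j\) separately is what allows the analysis to quantify, and adjust grades for, evaluator leniency.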