#558 Top-box Scoring Methodology Does Not Overly Skew Interpretation of Patient Satisfaction Scores

Value and Outcomes in Spine Surgery

Poster Presented by: K. Nicholson


K. Nicholson (1)
B. Woods (2)
G. Schroeder (2)
D.G. Anderson (2)
C. Kepler (2)
M. Kurd (2)
J. Rihn (2)
A. Vaccaro (2)
A. Hilibrand (2)
K. Radcliff (2)

(1) Rothman Institute, Philadelphia, PA, United States
(2) Rothman Institute, Thomas Jefferson University, Philadelphia, PA, United States


Introduction: Top-box scoring is a common tool for reporting patient satisfaction. It is not clear whether this methodology oversimplifies the analysis of patient satisfaction. It is further unclear whether patients equally weigh their interactions with the appointment scheduler, the front desk, the nurse/medical assistant, and their physician when rating their overall satisfaction with the clinic.

Methods: This was a retrospective review of a consecutive series of patients who completed a satisfaction questionnaire at a large orthopaedic spine practice between January 2014 and January 2016. Patient satisfaction was assessed with a 64-question survey. Three questions asked patients to rate their overall service, their overall experience, and their likelihood to recommend (LTR) the practice. Patients were also asked to rate their experience with the scheduling staff, front desk staff, nurse/medical assistant (MA), and physician. Each question recorded responses on a 1-5 Likert scale. Binomial logistic regressions, applied to top-box scores, determined the relative contribution of the scheduling staff, front desk staff, nurse/MA, and physician ratings to overall satisfaction. Ordered logistic regressions evaluated the same model using the full 1-5 Likert scale rating for all questions.
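The top-box step described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes that a rating of 5 corresponds to "excellent," and the function names and the example ratings are hypothetical.

```python
def top_box(rating, top=5):
    """Return 1 if a 1-5 Likert rating is in the top box, otherwise 0.

    Assumption: 5 is the "excellent" (top-box) response.
    """
    if not 1 <= rating <= 5:
        raise ValueError("rating must be on the 1-5 Likert scale")
    return 1 if rating == top else 0


def top_box_share(ratings, top=5):
    """Fraction of responses that fall in the top box."""
    flags = [top_box(r, top) for r in ratings]
    return sum(flags) / len(flags)


# Made-up ratings for illustration only; not data from the study.
example_ratings = [5, 4, 5, 3, 1, 5, 4, 5]
example_share = top_box_share(example_ratings)
```

Under this coding, a rating of 4 and a rating of 1 both map to 0, which is the simplification the abstract sets out to test. The resulting 0/1 values would serve as the variables in a binomial logistic regression, whereas the ordered logistic regression would use the original 1-5 ratings.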

Results: A total of 2,423 patients completed the satisfaction questionnaire. Patients were generally satisfied with their service (58% excellent) and experience (59% excellent), and 65% were likely to recommend the practice. Satisfaction scores were mostly “excellent” for the scheduler (50%), front desk (55%), nurse/MA (55%), and physician (66%). Across all categories, the top-box rating was the most common response, and the number of responses fell with each lower grade, such that fewer than 2% of patients reported “poor” service or experience and fewer than 2% indicated they would strongly not recommend the practice. The scheduler, front desk staff, nurse/MA, and physician ratings all contributed significantly (p < 0.001) to a patient's overall service, experience, and likelihood to recommend scores. The log odds ratio for physician scores was generally 2-3 times higher than that of any other category, whether evaluated using top-box scores or the full Likert scale. For conciseness, only results for likelihood to recommend are reported here. In the binomial logistic regression on top-box scores, the odds ratio for the physician was 5.368 (95% CI [4.244-6.800]), which is greater than the odds ratios for the scheduler (2.220 [1.708-2.884]), front desk (2.002 [1.531-2.613]), and nurse/MA (2.217 [1.689-2.906]). Similarly, in the ordered logistic regression model, the odds ratios were 3.753 [3.295-4.288] (physician), 1.557 [1.358-1.787] (scheduler), 1.587 [1.371-1.835] (front desk), and 1.819 [1.562-2.126] (nurse/MA).
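The "2-3 times higher" statement refers to log odds ratios, that is, the natural logarithms of the odds ratios listed above. A short sketch of that arithmetic, using only the point estimates reported in this abstract (the dictionary layout and function name are hypothetical):

```python
import math

# Odds ratios for likelihood to recommend, as reported in the Results.
odds_ratios = {
    "top_box": {"physician": 5.368, "scheduler": 2.220,
                "front_desk": 2.002, "nurse_ma": 2.217},
    "ordered": {"physician": 3.753, "scheduler": 1.557,
                "front_desk": 1.587, "nurse_ma": 1.819},
}


def physician_log_or_multiples(model):
    """Ratio of the physician log odds ratio to each other category's."""
    physician = math.log(model["physician"])
    return {
        role: physician / math.log(value)
        for role, value in model.items()
        if role != "physician"
    }


multiples = {name: physician_log_or_multiples(model)
             for name, model in odds_ratios.items()}

for name, by_role in multiples.items():
    for role, ratio in by_role.items():
        print(f"{name:8s} physician vs {role:10s}: {ratio:.2f}x")
```

With these point estimates, every multiple falls between 2 and 3 in both models, which matches the statement in the Results. The comparison ignores the confidence intervals, so it only illustrates the relative size of the estimates.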

Conclusion: The patient's perception of the interaction with their physician contributes the most to their overall rating of a practice and their likelihood to recommend it. Patients weigh their interactions with the scheduler, front desk, and nurse or medical assistant staff approximately equally. Under the top-box scoring methodology, only patients who report “excellent” are considered to have had a satisfactory experience; this groups patients who report “very good” with those who report “poor.” Despite this simplification, the method does not appear to distort the conclusions drawn when identifying which interactions influence patient satisfaction. Because the data are so heavily skewed, with more than 50% of responses in the top box and less than 2% in the bottom box, statistical analyses using top-box scoring are likely more robust.