Supplementary material

Manuscript title: A wall-mounted stopwatch as a cognitive aid for propofol titration during sedation endoscopy: a retrospective quality improvement evaluation

Target journal: BMJ Open Quality (Quality Improvement Report, SQUIRE 2.0)

Introduction

This supplement reports detailed methods and results for the seven sensitivity analyses referenced in the main manuscript. The primary analytic cohort (n = 1,371) and the primary linear regression of weight-adjusted propofol dose (WAPD) on period × sex (adjusted for age, body weight, height, and procedure time; heteroskedasticity-consistent HC3 robust standard errors) are reported in the main text. This supplement extends the primary specification with alternative cohort definitions, alternative covariate sets, an alternative inferential approach (median quantile regression), and an alternative time-domain model (interrupted time series). All analyses use the same 2019 single-operator data, extracted from the de-identified clinical record described in the main Methods.

For brevity, sensitivity analyses are labelled §S1–§S7 throughout. A summary table of point estimates and 95% confidence intervals is provided in §S8. Software and code availability are described in §S9.

§S1. Sex-stratified propensity-score-matched cohort

Rationale

Two patient-level imbalances were detected in the primary cohort (Table 1, main text): mean height was modestly greater in the intervention period (165.1 vs 163.9 cm; p = 0.020) and median procedure time was shorter post-stopwatch (74 vs 78 seconds; p < 0.001). In addition, monthly case-mix variation in age, height, and procedure time was statistically detectable (Kruskal–Wallis p < 10⁻⁷ for age and procedure time), reflecting Korean health-screening seasonality. Although adjusted regression already accounts for these covariates linearly, propensity-score matching (PSM) provides a robustness check that does not rely on the linearity assumption.

Methods

Sex-stratified 1:1 nearest-neighbour propensity-score matching without replacement was performed within each sex separately. The propensity score was estimated by logistic regression of period (intervention vs baseline) on age, weight, height, and procedure time. Matching used a calliper of 0.2 standard deviations of the logit propensity score, with strict 1:1 matching to ensure balanced cohort size. Post-matching balance was verified by standardised mean differences (target |SMD| < 0.10 across all matched covariates). The primary linear regression specification was then applied to the matched cohort.

Software: MatchIt (R v4.4); regression in Python statsmodels 0.14.6 with HC3 robust SEs.

Results

Matched cohort: n = 1,052 (526 baseline, 526 intervention; 289 female pairs, 237 male pairs after sex-stratified matching).

Post-matching balance: all matched covariates (age, weight, height, procedure time) achieved |SMD| < 0.05.
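
As a sketch of the balance check, the standardised mean difference for one covariate can be computed as below. The source does not state which standardisation was used, so the pooled-SD denominator here is an assumption, and the height values are hypothetical.

```python
import math
import statistics


def standardised_mean_difference(intervention, baseline):
    """SMD with a pooled-SD denominator (assumed convention)."""
    pooled_sd = math.sqrt(
        (statistics.variance(intervention) + statistics.variance(baseline)) / 2.0
    )
    return (statistics.fmean(intervention) - statistics.fmean(baseline)) / pooled_sd


def is_balanced(smd_values, threshold=0.10):
    """True when every covariate meets the |SMD| < threshold target."""
    return all(abs(v) < threshold for v in smd_values)


# Hypothetical heights (cm), for illustration only.
smd_height = standardised_mean_difference([164, 166, 168], [163, 165, 167])
print(round(smd_height, 3), is_balanced([smd_height]))
```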

Adjusted period × sex interaction (β₃):

Quantity	n	Estimate (mg/kg)	95% CI	p
β₃ (period × sex)	1,052	−0.105	[−0.160, −0.050]	< 0.001
Within-female adjusted change	578	−0.063	[−0.097, −0.029]	< 0.001
Within-male adjusted change	474	+0.041	[−0.002, +0.084]	0.064

Figure reference

Supplementary Figure S1 — Density convergence in the PSM cohort. Female and male WAPD distributions (kernel density) before and after stopwatch installation, restricted to the matched cohort. Pre-installation densities are visibly separated (female right-shifted); post-installation densities almost completely overlap, providing a distributional-level confirmation of the sex-gap reduction beyond the median.

Interpretation

Sex-stratified PSM reproduces the primary finding (β₃ = −0.105 vs primary β₃ = −0.097) in a cohort balanced on age, weight, height, and procedure time, which argues against residual case-mix imbalance as the sole explanation. The within-male period change falls just short of conventional significance (p = 0.064) in the matched cohort, consistent with the main-text Mann–Whitney within-male p = 0.095 in the full cohort. As discussed in the main text Limitations, PSM can address case-mix variation but cannot address the pre-existing temporal trend (§S7).

§S2. Linear regression omitting procedure time

Rationale

Procedure time enters the primary model as a linear covariate (main text Methods) because it differed modestly between periods (median 74 vs 78 seconds; p < 0.001, Table 1, main text). Procedure time can be conceived not only as a confounder but as a partial mediator: the intervention may have shortened decision latency and thereby reduced procedure time, in which case adjusting for it could absorb part of the intervention effect. This sensitivity analysis removes procedure time from the right-hand side to test the magnitude of any such mediation.

Methods

The primary specification (WAPD on period + sex + period:sex + age + body weight + height + procedure time) was modified to remove procedure time: WAPD on period + sex + period:sex + age + body weight + height (HC3 robust SEs). Software: Python statsmodels 0.14.6.
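
To make the "HC3 robust SEs" wording concrete, the sketch below fits ordinary least squares with a period × sex interaction and computes HC3 standard errors directly in NumPy. It is an illustration on simulated data, not the authors' script: the sample size, coefficients, and noise level are hypothetical, and only age is included as a covariate to keep the example short.

```python
import numpy as np


def ols_hc3(X, y):
    """OLS coefficients with HC3 heteroskedasticity-consistent SEs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage h_ii = x_i' (X'X)^-1 x_i for each observation.
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    # HC3 weights each squared residual by 1 / (1 - h_ii)^2.
    omega = (resid / (1.0 - leverage)) ** 2
    cov = xtx_inv @ (X.T * omega) @ X @ xtx_inv
    return beta, np.sqrt(np.diag(cov))


# Simulated data with hypothetical effect sizes, for illustration only.
rng = np.random.default_rng(0)
n = 400
period = rng.integers(0, 2, n)
female = rng.integers(0, 2, n)
age = rng.normal(50, 10, n)
wapd = (1.0 + 0.04 * period + 0.10 * female
        - 0.10 * period * female + rng.normal(0, 0.1, n))

# Columns: intercept, period, sex, period:sex, age (procedure time omitted).
X = np.column_stack([np.ones(n), period, female, period * female, age])
beta, se = ols_hc3(X, wapd)
print("interaction estimate:", beta[3], "HC3 SE:", se[3])
```

In statsmodels the same covariance estimator is requested by passing cov_type="HC3" to the fit method of an OLS model; dropping procedure time corresponds to leaving that column out of the design matrix.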

Results

Adjusted period × sex interaction (β₃):

Quantity	Estimate (mg/kg)	95% CI	p
β₃ (period × sex)	−0.095	[−0.145, −0.046]	< 0.001

Interpretation

Removing procedure time leaves the period × sex interaction essentially unchanged from the primary specification (β₃ = −0.095 vs primary −0.097). The 4-second median procedure-time difference is not a material mediator of the marginal sex-gap shift.

§S3. Linear regression with BMI substituted for weight and height

Rationale

The primary specification adjusts for body weight and height as two separate linear covariates. BMI is a derived single index that combines the two (kg/m²) and is commonly used in propofol pharmacokinetic discussions. Substituting BMI for the two anthropometric variables tests whether the result depends on the functional form of body-size adjustment.

Methods

The primary specification was modified to replace weight and height with BMI as a single linear covariate: WAPD on period + sex + period:sex + age + BMI + procedure time (HC3 robust SEs). Software: Python statsmodels 0.14.6.

Results

Adjusted period × sex interaction (β₃):

Quantity	Estimate (mg/kg)	95% CI	p
β₃ (period × sex)	−0.095	[−0.146, −0.044]	< 0.001

Interpretation

The estimate is essentially identical to the primary specification (β₃ = −0.095 vs primary −0.097). The sex-gap shift is not driven by the specific functional form of body-size adjustment.

§S4. Linear regression without weight (avoiding double-adjustment)

Rationale

WAPD is defined as total propofol dose divided by body weight, so weight enters the outcome variable. Including weight as a covariate (directly, or implicitly via BMI) constitutes partial double-adjustment, which can bias regression estimates of period or interaction terms. This sensitivity analysis removes weight entirely from the right-hand side.
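
The overlap is easy to see when the two definitions are written out. The sketch below uses hypothetical patient values; the function names are illustrative and not taken from the authors' scripts.

```python
def wapd(total_propofol_mg, weight_kg):
    """Weight-adjusted propofol dose in mg/kg: weight is the denominator."""
    return total_propofol_mg / weight_kg


def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2: weight is the numerator."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


# Hypothetical patient: the same weight feeds both the outcome and the covariate.
print(wapd(70, 60), bmi(60, 165))
```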

Methods

The primary specification was modified to remove all weight components: WAPD on period + sex + period:sex + age + height (HC3 robust SEs). BMI was not included to avoid reintroducing weight via the BMI numerator. Software: Python statsmodels 0.14.6.

Results

Adjusted period × sex interaction (β₃):

Quantity	Estimate (mg/kg)	95% CI	p
β₃ (period × sex)	−0.085	[−0.140, −0.029]	0.003

Interpretation

Removing weight attenuates the magnitude of β₃ from −0.097 to −0.085 (12% reduction) but the direction and significance are preserved. The double-adjustment artefact in the primary specification, if present, is small and does not change the qualitative conclusion.

§S5. January-2019-excluded sensitivity

Rationale

January 2019 was the operator’s first month after the unit’s transition from midazolam–propofol combination sedation to propofol monotherapy. Twenty-five January cases (15 female, 10 male) reflect early-protocol learning-curve dosing rather than stable propofol-monotherapy practice. The operator’s narrative anchor (a near-miss case of unexpectedly deep sedation) occurred during this transition month. This sensitivity analysis tests whether the result depends on early-transition cases.

Methods

The primary regression specification was applied to the cohort with January 2019 cases excluded (n = 1,346: 501 baseline, 845 intervention). The within-sex Mann–Whitney rank-sum test of WAPD between periods was also recomputed.

Results

Adjusted period × sex interaction (β₃):

Quantity	Estimate (mg/kg)	95% CI	p
β₃ (period × sex)	−0.093	[−0.143, −0.042]	< 0.001

Within-sex Mann–Whitney p:

Sex	January-included (full cohort)	January-excluded
Female	0.001	0.001
Male	0.095	0.032

Interpretation

January exclusion does not change the direction or significance of β₃ (−0.093 vs primary −0.097). The within-male Mann–Whitney p shifts from borderline (0.095) to significant (0.032), consistent with the bidirectional dosing-realignment framing once the operator’s protocol-transition month is removed. We retain the full cohort as primary to avoid the appearance of post-hoc selection (see main text Methods).

§S6. Median quantile regression

Rationale

Linear regression estimates change in mean WAPD. Median regression at τ = 0.5 estimates change in the median WAPD, which is robust to right-tail outliers. The dose-grid distributional framing (main text Results · Dose-grid shifts) is naturally aligned with median-based inference because the protocol-induced 10-mg grid produces a discrete, multimodal distribution where the median is more interpretable than the mean.
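
A small numerical sketch shows why the median is less sensitive to the right tail. Median regression minimises the check (pinball) loss at τ = 0.5, which for a single sample is minimised at the sample median. The WAPD values below are hypothetical.

```python
import statistics


def pinball_loss(values, q, tau=0.5):
    """Check loss minimised by quantile regression at quantile level tau."""
    return sum(tau * (y - q) if y >= q else (tau - 1.0) * (y - q) for y in values)


# Hypothetical WAPD values (mg/kg) with one right-tail outlier.
wapd_values = [0.8, 0.9, 1.0, 1.1, 3.0]
mean_wapd = statistics.fmean(wapd_values)
median_wapd = statistics.median(wapd_values)

# The outlier pulls the mean upward; the median stays at the centre of the data.
print(mean_wapd, median_wapd)
print(pinball_loss(wapd_values, median_wapd), pinball_loss(wapd_values, mean_wapd))
```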

Methods

Quantile regression at τ = 0.5 was performed with the same right-hand side as the primary specification (period + sex + period:sex + age + body weight + height + procedure time). Software: Python statsmodels.regression.quantile_regression.

Results

Adjusted period × sex interaction (β₃) at τ = 0.5:

Quantity	Estimate (mg/kg)	95% CI	p
β₃ (period × sex)	−0.125	[−0.171, −0.079]	< 0.001

Interpretation

Median quantile regression estimates a slightly larger sex-gap reduction than the primary mean-based specification (−0.125 vs −0.097). The larger magnitude is consistent with right-tail compression in the female distribution visible in the dose-grid figure (main text Figure 2): mean-based regression averages over the right tail, whereas median regression isolates the central-tendency shift. Direction and significance are preserved.

§S7. Interrupted time-series analysis

Rationale

The monthly run chart (main text Figure 3) revealed a pre-existing downward trend in absolute WAPD levels in both sexes during the pre-intervention months. A simple before-after comparison folds this pre-trend, and any sex difference in it, into the period and period × sex estimates. Interrupted time-series (ITS) modelling explicitly separates the pre-existing slope from a step-shift at the intervention boundary, providing the most conservative test of an intervention-attributable change.

Methods

Patient-level segmented regression of WAPD on month index, covering the January–May 2019 pre-period and July–October 2019 post-period (June 2019 was the installation month and contributed no analysed cases). The model includes sex-specific pre-period slopes, sex-specific level shifts at the July 2019 boundary, and sex-specific post-period slopes:

WAPD ~ month_index + after + post_month_index + female + female:month_index + female:after + female:post_month_index

with HC3 robust standard errors. The female:after term gives the female-versus-male difference in level shift at the boundary and is the ITS analogue of the primary β₃. Pre-period slopes by sex are also reported as simple patient-level linear regressions of WAPD on month index restricted to January–May 2019 (consistent with the main-text Figure 3 description); these coincide with the slopes implied by the segmented model (month_index for males; month_index + female:month_index for females). Software: Python statsmodels 0.14.6.
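
One possible coding of the segmented-regression terms is sketched below. The exact coding is not given in this supplement, so the choices here are assumptions: month_index is the calendar month number, after is 1 from July onward, and post_month_index counts months since July so that the after coefficient is the level shift evaluated at the July boundary. The function name its_design_row is hypothetical.

```python
def its_design_row(month, female):
    """Design-matrix terms for one patient (month = calendar month of 2019)."""
    if month == 6:
        raise ValueError("June 2019 (installation month) has no analysed cases")
    if not (1 <= month <= 5 or 7 <= month <= 10):
        raise ValueError("month outside the January-May / July-October window")
    after = 1 if month >= 7 else 0
    month_index = month                     # assumed coding: January = 1
    post_month_index = (month - 7) * after  # 0 in July, 0 throughout the pre-period
    f = 1 if female else 0
    return {
        "month_index": month_index,
        "after": after,
        "post_month_index": post_month_index,
        "female": f,
        "female:month_index": f * month_index,
        "female:after": f * after,
        "female:post_month_index": f * post_month_index,
    }


# A female patient in September: two months after the July boundary.
print(its_design_row(9, female=True))
```

Under this coding the male level shift is the coefficient on after, and the female level shift is the sum of the after and female:after coefficients.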

Results

Pre-period slopes (patient-level, January–May 2019):

Quantity	n	Estimate (mg/kg/month)	95% CI	p
Pre-intervention slope (Male)	237	−0.038	[−0.063, −0.012]	0.004
Pre-intervention slope (Female)	289	−0.060	[−0.083, −0.036]	< 10⁻⁶

Level shifts at intervention boundary (July 2019):

Quantity	Estimate (mg/kg)	95% CI	p
Level shift (Male)	+0.094	[+0.016, +0.171]	0.017
Female-vs-male difference in level shift (female:after)	−0.039	[−0.145, +0.066]	0.47

Pre-period female–male gap trajectory:

The gap itself was not significantly trending pre-intervention: regression of the monthly female–male median gap on month index (January–May, five monthly observations) gives slope = +0.011 mg/kg per month (p = 0.51). With only five observations this test has limited power, but the estimate gives no indication that the level shift in the gap at the July 2019 boundary is a continuation of pre-existing gap shrinkage.

Figure reference

Supplementary Figure S2 — ITS model fit. Monthly median WAPD by sex (Female red; Male blue) overlaid with patient-level segmented-regression fitted lines; sex-specific pre-period downward slopes (January–May) are shown as dashed lines, the July 2019 boundary as a vertical reference, and the post-period observed monthly medians relative to the projected pre-trend.

Interpretation

Under ITS adjustment, the female-versus-male level shift attenuates from the primary β₃ = −0.097 (p < 0.001) to −0.039 with a confidence interval that includes zero (p = 0.47). This is the principal limitation of the headline period × sex interaction, addressed openly in the main text Limitations §2. However, the dose-grid and BMI-substructure findings (main text Results · Dose-grid shifts and BMI substructure) depend on within-period distributional patterns rather than on attributing the level shift to the intervention alone, and are therefore not subject to this concern in the same way. The pre-period gap stability (slope p = 0.51) further indicates that the gap-level shift at July 2019 is not a continuation of a pre-existing gap trend.

§S8. Sensitivity analysis summary

All seven sensitivity analyses preserve the negative direction of the period × sex interaction. Six of seven remain p < 0.01; the ITS analysis (§S7) attenuates to a confidence interval that includes zero, as discussed in main text Limitations §2.

§	Analysis	n	β₃ (mg/kg)	95% CI	p
Primary	OLS, full cohort (period + sex + period:sex + age + weight + height + procedure time; HC3)	1,371	−0.097	[−0.147, −0.048]	< 0.001
§S1	Sex-stratified PSM	1,052	−0.105	[−0.160, −0.050]	< 0.001
§S2	Omit procedure time	1,371	−0.095	[−0.145, −0.046]	< 0.001
§S3	BMI substituted for weight + height	1,371	−0.095	[−0.146, −0.044]	< 0.001
§S4	Without weight (no double-adjustment)	1,371	−0.085	[−0.140, −0.029]	0.003
§S5	January-2019-excluded	1,346	−0.093	[−0.143, −0.042]	< 0.001
§S6	Median quantile regression (τ = 0.5)	1,371	−0.125	[−0.171, −0.079]	< 0.001
§S7	Interrupted time-series (level shift)	1,371	−0.039	[−0.145, +0.066]	0.47
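
The summary statements can be checked against the table with a short script. The estimates and confidence limits below are copied from the table above; the attenuation measure (1 − estimate / primary) is one illustrative way to express the change relative to the primary β₃ and is not taken from the authors' code.

```python
PRIMARY_BETA3 = -0.097

# (estimate, lower 95% limit, upper 95% limit) in mg/kg, from the table above.
SENSITIVITY = {
    "S1": (-0.105, -0.160, -0.050),
    "S2": (-0.095, -0.145, -0.046),
    "S3": (-0.095, -0.146, -0.044),
    "S4": (-0.085, -0.140, -0.029),
    "S5": (-0.093, -0.143, -0.042),
    "S6": (-0.125, -0.171, -0.079),
    "S7": (-0.039, -0.145, 0.066),
}


def summarise(results, primary):
    """Direction, CI-excludes-zero flag, and attenuation relative to primary."""
    summary = {}
    for label, (est, lower, upper) in results.items():
        summary[label] = {
            "negative": est < 0,
            "ci_excludes_zero": upper < 0 or lower > 0,
            # Positive = closer to zero than primary; negative = larger in magnitude.
            "attenuation_pct": 100.0 * (1.0 - est / primary),
        }
    return summary


summary = summarise(SENSITIVITY, PRIMARY_BETA3)
n_negative = sum(row["negative"] for row in summary.values())
n_excluding_zero = sum(row["ci_excludes_zero"] for row in summary.values())
print(n_negative, n_excluding_zero, round(summary["S4"]["attenuation_pct"]))
```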

§S8b. Supplementary Figure S3 — installation photograph

Supplementary Figure S3 — Wall-mounted stopwatch in situ. Original photograph of the installation depicted as a line-drawing illustration in main-text Figure 1. The endoscopist wears a surgical mask; identifying features have been further attenuated by Gaussian face-region blur. Provided for readers who wish to view the original clinical setting; the line-drawing version (Figure 1) serves as the primary visual reference.

§S9. Software and code availability

Statistical analyses were conducted in Python 3.12 (pandas 2.x, statsmodels 0.14.6, numpy, scipy) and R 4.4 (MatchIt for propensity-score matching). Hartigan’s dip test for multimodality was computed via the diptest R package. All analytic scripts (01_clean_data.py through 18_floor_uncertainty.py) and de-identified intermediate datasets are available from the corresponding author on reasonable request, subject to IRB approval (P01-202102-11-001). The final analytic dataset (cleaned_primary_with_date.csv, n = 1,371 with month/year of procedure) is sufficient to reproduce all results reported in this supplement.

End of supplement.