Branch 1 Manuscript — Draft v3 (2026-05-18)

Target: Endoscopy / Gastrointestinal Endoscopy (clinical track A)

Status: draft v3; manuscript review reflected; numbers verified against analysis outputs E–I, mediation (I2), age (I3).


Title

Why is colonoscopy harder in women? A quantitative morphological explanation from CT colonography

Short title: Colon shape complexity and colonoscopy difficulty

Author: Yong Bae Kim, MD

ORCID: 0000-0002-9933-438X

Affiliation: Department of Endoscopy, Korea Association of Health Promotion (KAHP) Jeju Branch, 111 Yeonbuk-ro, Jeju-si 63136, Republic of Korea

Corresponding author: Yong Bae Kim, Department of Endoscopy, Korea Association of Health Promotion (KAHP) Jeju Branch, 111 Yeonbuk-ro, Jeju-si 63136, Republic of Korea. Email: kcchgs@gmail.com


Abstract

Background and aims. Difficult colonoscopy causes incomplete examination, patient discomfort, and increased sedation, yet no widely adopted objective pre-procedural index of colonic shape complexity exists. Previous attempts to classify colon shape have assumed discrete morphological types without testing whether such types exist. We aimed to quantify colon shape complexity from CT colonography (CTC) on a continuous scale and to relate it to endoscopist-rated difficulty and to the known sex difference in colonoscopy difficulty.

Methods. We analysed colon centerlines from 374 supine CTC examinations (ACRIN 6664). We tested whether colon shape forms discrete clusters using multiple clustering and cluster-tendency tests. We defined the Global Tortuosity Index (GTI), a scale-free measure of how densely the colon is packed in space, and described the shape continuum with three archetypes. GTI and archetypes were correlated with an expert difficulty score (DIFFI, 1–5) and with sex, and a formal mediation analysis (sex → GTI → difficulty) with bootstrap confidence intervals was performed.

Results. Colon shape showed no discrete clusters (Gap statistic optimal K = 1; all shape axes unimodal by dip test); it is a continuous spectrum. GTI correlated with expert difficulty (Spearman ρ = 0.63, p = 9×10⁻⁴²). Women had significantly higher GTI than men (12.1 vs 10.9, p = 3×10⁻¹¹), a higher mean share of the “meandering” archetype (mean A3 share 0.23 vs 0.08, p = 3×10⁻¹⁶), and higher difficulty scores (3.19 vs 2.86, p = 6×10⁻⁵). In a formal mediation analysis the effect of sex on difficulty was fully mediated by colon shape: the indirect path through GTI accounted for essentially all of the total effect (indirect effect 0.36, 95% CI 0.26–0.48, bootstrap p < 0.001), whereas the direct effect of sex was not significant (−0.03, 95% CI −0.17 to 0.11).

Conclusions. Colon shape is a morphological continuum, not a set of discrete types. A single scale-free index (GTI) computed from CTC predicts endoscopist-rated difficulty and provides a morphological explanation for the sex gap in colonoscopy difficulty: women’s colons are, on average, geometrically more complex.


1. Introduction

Difficult colonoscopy — prolonged or incomplete intubation, looping, and patient pain — remains a clinically important problem, contributing to failed caecal intubation, higher sedation requirements, and missed lesions. Identifying difficulty before the procedure would allow appropriate allocation of endoscopist experience, adjunct devices, or alternative preparation. Existing pre-procedural predictors (age, body-mass index, prior abdominal/pelvic surgery, constipation) are indirect and inconsistent [1], because a major determinant of difficulty — the three-dimensional shape of the colon itself — is not directly measured in routine practice.

CT colonography (CTC) contains the full 3D geometry of the colon, and several groups have attempted to characterise it. Tortuosity “typing” has assigned the transverse colon to discrete shape categories (e.g. Boomerang, Gamma, M, Omega, U, V, W) using image-based AI [2]. Anatomical classifications of the redundant colon and dolichocolon distinguish normal from abnormal configurations [3]. Quantitative studies have measured colonic length, diameter, volume, tortuosity, and flexure counts, and have related these to incomplete colonoscopy [4–6]. However, to our knowledge, every classification scheme to date assumes that discrete morphological types exist; none has tested this assumption, and most examine a single colonic segment or a small sample.

We took the opposite starting point. Using 374 CTC examinations we first asked whether colon shape genuinely separates into discrete types. Finding that it does not, we describe the shape continuum with a single interpretable index and relate it to clinical difficulty, including the long-recognised but mechanistically unexplained finding that colonoscopy is, on average, harder in women.


2. Methods

2.1 Study sample and centerline extraction

We analysed 374 supine CTC examinations from the publicly available ACRIN 6664 cohort [7]. For each case the colonic lumen was manually segmented from the CTC volume in 3D Slicer, producing a surface model of the colon. The centerline — the 3D curve running along the lumen from rectum to caecum — was extracted from this surface model with a geodesic level-set centroid algorithm: a geodesic distance field was computed over the mesh between its two most distant points (rectal and caecal tips), the mesh was divided into 300 equidistant geodesic level-sets, and the centroid of the surface vertices in each level-set was taken as a centerline point. The resulting sequence was smoothed and resampled to a uniform 1000-point curve, and then to a curvature-aware 50-point representation for analysis; robustness of the shape descriptors to this resampling density was confirmed (50- vs 200-point; Supplement S2). Coordinate and scale consistency were verified as a standard quality-control step (Supplement S1).
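The level-set centroid step and the uniform resampling described above can be sketched as follows. This is a minimal illustration, not the study pipeline: it assumes the geodesic distance of every mesh vertex from the rectal tip has already been computed (the array `geo_dist` is hypothetical), it omits the smoothing step, and it does not attempt the curvature-aware 50-point resampling. The function names are illustrative.

```python
import numpy as np

def levelset_centroids(vertices, geo_dist, n_levels=300):
    """Centroid of the mesh vertices in each of n_levels equal-width bands
    of geodesic distance (assumed precomputed; one value per vertex)."""
    edges = np.linspace(geo_dist.min(), geo_dist.max(), n_levels + 1)
    # Band index of every vertex; the farthest vertex is folded into the last band.
    band = np.clip(np.searchsorted(edges, geo_dist, side="right") - 1,
                   0, n_levels - 1)
    # Empty bands are skipped rather than interpolated (a simplification).
    pts = [vertices[band == k].mean(axis=0)
           for k in range(n_levels) if np.any(band == k)]
    return np.asarray(pts)

def resample_uniform(points, n=1000):
    """Resample a polyline to n points equally spaced in arc length."""
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])      # cumulative arc length
    target = np.linspace(0.0, s[-1], n)
    return np.column_stack([np.interp(target, s, points[:, d])
                            for d in range(points.shape[1])])
```

On a synthetic straight tube (vertices on a unit circle swept along the z axis, with z used as the distance field) the centroids fall on the tube axis, which is a convenient sanity check before applying the same functions to a real mesh.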

2.2 The Global Tortuosity Index (GTI)

Quantifying “how complex” a colon is requires a single, stable, and size-independent number. The classical tortuosity — path length divided by the straight-line distance between the two endpoints (arc/chord) — is unsatisfactory: it depends only on the two endpoints, so when the rectum and caecum happen to lie close together the denominator collapses and the index becomes unstable, while the shape of the curve between the endpoints is ignored.

We therefore define the Global Tortuosity Index (GTI):

GTI = L / R_g

where L is the total centerline length and R_g is the radius of gyration — the root-mean-square distance of all centerline points from their centroid.

R_g measures how widely the curve is spread in space, using the full distribution of points rather than only the endpoints, and is therefore stable. GTI — length divided by spatial spread — measures how densely a curve of given length is packed into space.

The intuition (Figure 1): take a single thread of fixed length. Stretched out straight, it has a large R_g and a low GTI. Crumpled into a tight ball, it has a small R_g and a high GTI. A straight colon has a low GTI; a colon repeatedly folded within a confined abdominal space has a high GTI.

GTI is conceptually related to the “compactness” used in robotic-colonoscopy morphometry (bounding-box volume / centerline length), but R_g is invariant to rotation and translation and uses the whole point cloud, making GTI more stable and coordinate-independent. Being a dimensionless ratio of two lengths, GTI is unchanged by uniform scaling of the colon; it is therefore insensitive to overall patient body size and isolates shape complexity alone.
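A minimal sketch of the GTI computation, assuming the centerline is supplied as an (N, 3) array of points that are roughly evenly spaced along the curve (R_g computed from unevenly spaced points would weight some regions more than others). The function names are illustrative, and the classical arc/chord ratio is included only for comparison.

```python
import numpy as np

def centerline_length(points):
    """Total polyline length L."""
    return float(np.linalg.norm(np.diff(points, axis=0), axis=1).sum())

def radius_of_gyration(points):
    """R_g: root-mean-square distance of the points from their centroid."""
    centred = points - points.mean(axis=0)
    return float(np.sqrt((centred ** 2).sum(axis=1).mean()))

def gti(points):
    """Global Tortuosity Index, GTI = L / R_g (dimensionless)."""
    return centerline_length(points) / radius_of_gyration(points)

def arc_chord(points):
    """Classical tortuosity: path length / end-to-end distance.
    Becomes unstable when the two endpoints lie close together."""
    return centerline_length(points) / float(np.linalg.norm(points[-1] - points[0]))

# Illustrative curves (synthetic, not patient data).
t = np.linspace(0.0, 1.0, 1001)
straight = np.column_stack([t, np.zeros_like(t), np.zeros_like(t)])
u = np.linspace(0.0, 6.0 * np.pi, 1001)
coil = np.column_stack([np.cos(u), np.sin(u), 0.1 * u])

print(gti(straight), gti(coil))            # the coil is the more densely packed curve
print(gti(coil), gti(250.0 * coil))        # unchanged by uniform scaling
```

For a straight segment sampled uniformly, R_g tends to L/√12, so the GTI of a straight line approaches √12 ≈ 3.46; this is the lower reference value against which folded curves can be read.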

2.3 Test for discrete types

Before describing the continuum we tested whether colon shape forms discrete clusters, using K-means, hierarchical, spectral, HDBSCAN and Gaussian-mixture clustering, together with cluster-tendency tests (Hopkins statistic, the dip test for unimodality [8], the Gap statistic [9], and bootstrap stability). Gaussian-null reference tests (null-model silhouette comparison) were also examined; however, as confirmed on synthetic curved manifolds, such tests systematically misclassify a continuous curved manifold as clustered (Supplement S3). They were therefore not used for the final determination, which rested on the Gap statistic and the dip test.
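For readers unfamiliar with the Gap statistic [9], the following is a compact sketch of the procedure, assuming a uniform reference distribution over the bounding box of the data and a small hand-written k-means so that the example depends only on NumPy. The parameter values (`k_max`, `n_ref`, `n_init`, `n_iter`) are illustrative choices and are not those used in this study.

```python
import numpy as np

def kmeans_inertia(X, k, rng, n_init=5, n_iter=30):
    """Lowest within-cluster sum of squares over a few random restarts
    of Lloyd's algorithm (X must be a float array)."""
    best = np.inf
    for _ in range(n_init):
        centres = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):      # keep the old centre if a cluster empties
                    centres[j] = X[labels == j].mean(axis=0)
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        best = min(best, float(d2.min(axis=1).sum()))
    return best

def gap_statistic(X, k_max=4, n_ref=20, seed=0):
    """Return (estimated K, gap values for K = 1..k_max)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, sks = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(kmeans_inertia(X, k, rng))
        # Reference dispersion: uniform samples over the data's bounding box.
        ref = np.array([np.log(kmeans_inertia(rng.uniform(lo, hi, size=X.shape), k, rng))
                        for _ in range(n_ref)])
        gaps.append(ref.mean() - log_w)
        sks.append(ref.std() * np.sqrt(1.0 + 1.0 / n_ref))
    gaps, sks = np.array(gaps), np.array(sks)
    # Smallest K with Gap(K) >= Gap(K+1) - s(K+1).
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - sks[k]:
            return k, gaps
    return k_max, gaps
```

An estimate of K = 1 means that no partition improves on the reference dispersion by more than its sampling error, which is the sense in which the text reports an absence of discrete clusters.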

2.4 Shape archetypes

To describe the shape continuum in interpretable terms, we applied archetypal analysis [10], representing each colon as a convex combination of a small number of extreme “archetypes”. Input features were restricted to those that are both scale-free and robust to centerline point sampling. The number of archetypes was fixed by reproducibility (agreement of archetype positions across multiple random initialisations); three archetypes were selected on this basis.
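The phrase “convex combination of archetypes” can be made concrete with a short sketch. It covers only one half of archetypal analysis: given archetypes that are already fixed, it finds the non-negative weights summing to one that best reconstruct a feature vector, using projected gradient descent. Fitting the archetypes themselves alternates this step with an analogous update and is not shown. The function names, iteration count and toy numbers are assumptions for illustration.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def mixture_weights(x, archetypes, n_iter=2000):
    """Weights w minimising ||x - w @ archetypes||^2 on the simplex.
    archetypes: (k, d) array, one archetype per row; x: (d,) feature vector."""
    k = archetypes.shape[0]
    w = np.full(k, 1.0 / k)
    gram = archetypes @ archetypes.T
    step = 1.0 / (2.0 * np.linalg.norm(gram, 2))     # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = 2.0 * (w @ archetypes - x) @ archetypes.T
        w = project_to_simplex(w - step * grad)
    return w

# Toy example with three archetypes in a two-dimensional feature space.
A = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
x = np.array([0.5, 0.3, 0.2]) @ A                    # a known mixture
w = mixture_weights(x, A)
dominant = int(np.argmax(w))                          # index of the dominant archetype
```

The weight vector is what the text calls the archetype mixture (for example the A3 share), and the largest weight identifies the dominant archetype used in Table 1.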

2.5 Difficulty score (DIFFI)

Procedural difficulty was rated as DIFFI, an ordinal 1–5 score (1 = easiest, 5 = most difficult) assigned from a three-dimensional surface rendering of the segmented colon — the same model from which the centerline was derived — by a single endoscopist with experience of more than 10,000 colonoscopies. The rater was blinded to GTI, to all patient information including sex, and to the scores given in previous sets. To strengthen intra-rater reliability, the entire case set was rated three times: cases were presented in a different random order in each set, each set was completed within a single day, and consecutive sets were separated by at least one week. The third (final) set was adopted as the DIFFI value. Cases whose rating was stable across the three sets were retained directly; cases showing large shifts between sets were re-reviewed and adjudicated. The rationale was that the first two passes establish a consistent internal difficulty schema, which the final pass then applies, thereby improving reliability. DIFFI is nonetheless a single-rater, image-based surrogate for true procedural difficulty; this is addressed in the Discussion.
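The bookkeeping behind the three-set protocol (a fresh random presentation order per set, then identification of cases whose score shifted between sets) could be organised as in the sketch below. The threshold `max_shift` is a hypothetical parameter: the text does not state what size of shift triggered re-review, so the value used here is purely illustrative.

```python
import numpy as np

def presentation_orders(n_cases, n_sets=3, seed=0):
    """One independent random presentation order per rating set."""
    rng = np.random.default_rng(seed)
    return [rng.permutation(n_cases) for _ in range(n_sets)]

def flag_unstable(ratings, max_shift=1):
    """Indices of cases whose score range across sets exceeds max_shift.
    ratings: (n_cases, n_sets) array of ordinal 1-5 scores.
    max_shift is an assumed threshold, not taken from the study."""
    ratings = np.asarray(ratings)
    spread = ratings.max(axis=1) - ratings.min(axis=1)
    return np.nonzero(spread > max_shift)[0]

# Toy ratings for three cases over three sets (synthetic).
toy = [[3, 3, 3],
       [2, 4, 3],
       [1, 2, 2]]
to_review = flag_unstable(toy)      # only the second case shifts by more than one point
```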

2.6 Statistics

GTI, archetype mixtures and sex were related to DIFFI by Spearman correlation, Mann–Whitney U test, Kruskal–Wallis test, and a standardised multivariable linear model (DIFFI ~ GTI + sex + age). The hypothesis that the sex difference in difficulty operates through colon shape was tested by a formal mediation analysis (exposure = sex, mediator = GTI, outcome = DIFFI): the indirect effect (a × b) was estimated with 5000-sample bootstrap percentile confidence intervals and corroborated by the Sobel test, with DIFFI treated as continuous for this analysis. Age (recorded in the dataset for 322 cases) was assessed as a potential confounder and included as a covariate in the multivariable and mediation models. Of the 374 cases with complete shape data, one (CTC-245) was excluded for a missing DIFFI score, yielding n = 373 cases for all analyses involving difficulty.
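The mediation model can be sketched with ordinary least squares and a percentile bootstrap. This is an illustration under simplifying assumptions rather than the analysis code: sex is coded 0/1, DIFFI is treated as continuous (as stated above), the age covariate and the Sobel test are omitted, and the function names are hypothetical.

```python
import numpy as np

def ols(y, *cols):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def mediation_paths(sex, gti, diffi):
    """Path a (sex -> GTI), path b (GTI -> DIFFI given sex),
    direct effect c' and total effect c."""
    a = ols(gti, sex)[1]
    _, direct, b = ols(diffi, sex, gti)
    total = ols(diffi, sex)[1]
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}

def bootstrap_indirect(sex, gti, diffi, n_boot=5000, seed=0):
    """Percentile 95% CI for the indirect effect a*b, resampling cases."""
    rng = np.random.default_rng(seed)
    n = len(sex)
    draws = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)          # sample cases with replacement
        draws[i] = mediation_paths(sex[idx], gti[idx], diffi[idx])["indirect"]
    return np.percentile(draws, [2.5, 97.5])
```

For linear models fitted by least squares on the same cases, the total effect equals the direct effect plus the indirect effect exactly, which gives a quick internal check on any reported decomposition.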


3. Results

3.1 Colon shape is a continuum, not discrete types

No clustering produced meaningful structure (silhouette ≤ 0.14 across all K). The reliable cluster-tendency tests agreed: the Gap statistic gave an optimal K = 1, and every principal shape axis was unimodal by the dip test (p > 0.4). Gaussian-null silhouette tests did appear to indicate structure, but this is a known artifact of applying a Gaussian reference to a smooth curved manifold (Supplement S3) and does not reflect genuine clusters. Colon shape is therefore a continuous morphological spectrum. Existing discrete “type” schemes correspond to arbitrary partitions of this continuum. Consequently, difficulty should be predicted from a continuous shape value, not from a categorical type label. Archetypes are best read as the extremes of the continuum: a large fraction of colons (about half the cohort) are mixtures rather than pure types.

3.2 GTI predicts endoscopist-rated difficulty

GTI was strongly associated with expert-rated difficulty: Spearman ρ = +0.63 (p = 9×10⁻⁴²). GTI is a fully automated, deterministic geometric quantity, whereas DIFFI is an independent holistic judgment by an experienced endoscopist blinded to GTI; their agreement supports the concurrent validity of GTI, in that an automated index largely reproduces expert assessment of difficulty. Had the two been unrelated, the premise that GTI captures clinically meaningful complexity would not hold; the strong correlation is consistent with that premise. Difficulty also rose monotonically across the three shape archetypes (mean DIFFI 2.67 for smooth, 3.21 for high-winding, 3.68 for meandering; Kruskal–Wallis p = 3×10⁻¹⁴).
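Because DIFFI takes only five values, the rank correlation has to handle many ties. A minimal sketch of Spearman's ρ with average ranks is given here, written with NumPy only; the function names are illustrative, and a statistics library would normally be used in practice.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```

Since only ranks enter the calculation, any monotone transformation of GTI leaves ρ unchanged, which suits an ordinal outcome such as DIFFI.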

3.3 The sex gap in difficulty is mediated by colon shape

Women had significantly more complex colons than men: higher GTI (12.1 vs 10.9, p = 3×10⁻¹¹) and a markedly higher mean share of the meandering archetype (mean A3 share 0.23 vs 0.08, p = 3×10⁻¹⁶); correspondingly, the meandering archetype was the dominant archetype in 17% of women but only 5% of men (Table 1). Expert difficulty was higher in women (DIFFI 3.19 vs 2.86, p = 6×10⁻⁵).

A formal mediation analysis (sex → GTI → DIFFI) indicated that colon shape carries the sex effect. The indirect effect through GTI was 0.36 (95% bootstrap CI 0.26–0.48; bootstrap p < 0.001; Sobel z = 6.4, p = 2×10⁻¹⁰) and accounted for essentially the entire total effect of sex on difficulty (0.33, the sum of the indirect and direct effects), while the direct effect of sex once GTI was known was not significant (−0.03, 95% CI −0.17 to 0.11; p = 0.68). The sex difference in colonoscopy difficulty is thus statistically fully mediated by colon shape: the path is sex → colonic geometry (GTI) → difficulty. This is consistent with prior quantitative reports that women have more tortuous and more compact colons [11], and supplies the geometric mechanism.

Age did not account for these findings. In the 322 cases with a recorded age, age was not associated with GTI (Spearman ρ = 0.10, p = 0.09) or with difficulty (ρ = 0.05, p = 0.42), and did not differ between the sexes (p = 0.50). The GTI–difficulty association was essentially unchanged after adjustment for age (within this age subset, partial ρ = 0.59 vs unadjusted ρ = 0.64; cf. full-cohort unadjusted ρ = 0.63 above). In a standardised multivariable model (DIFFI ~ GTI + sex + age), GTI remained the only significant predictor (β = +0.64, p < 0.001), while sex (β = −0.05) and age (β = −0.02) were both non-significant. The mediation likewise held after age adjustment (age-adjusted indirect effect 0.39, 95% CI 0.27–0.51; direct effect of sex −0.08, 95% CI −0.23 to 0.08). Colon shape complexity, not age, carries the sex difference in difficulty.
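A standardised multivariable model of the form DIFFI ~ GTI + sex + age can be sketched as below. The assumptions are that every variable, including binary sex, is z-scored (one common convention; the text does not say which was used) and that cases with a missing age are dropped, mirroring the 322-case age subset. The function names are illustrative.

```python
import numpy as np

def zscore(x):
    """Centre to mean 0 and scale to unit sample standard deviation."""
    return (x - x.mean()) / x.std(ddof=1)

def standardised_betas(y, predictors):
    """Standardised regression coefficients on complete cases.
    predictors: dict mapping a name to a 1-D array (NaN marks a missing value)."""
    cols = [np.asarray(v, dtype=float) for v in predictors.values()]
    y = np.asarray(y, dtype=float)
    keep = ~np.isnan(np.column_stack([y, *cols])).any(axis=1)   # complete cases only
    X = np.column_stack([zscore(c[keep]) for c in cols])
    # No intercept is needed: all variables have mean zero after z-scoring.
    beta, *_ = np.linalg.lstsq(X, zscore(y[keep]), rcond=None)
    return dict(zip(predictors.keys(), beta))
```

A call such as `standardised_betas(diffi, {"gti": gti, "sex": sex, "age": age})` returns one coefficient per predictor on a common scale, so their magnitudes can be compared directly.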


4. Discussion

We show, in a large CTC cohort, that colon shape is a continuous morphological spectrum and that a single scale-free index of that spectrum — GTI — predicts endoscopist-rated procedural difficulty. Crucially, GTI explains the sex gap: women’s colons are geometrically more complex, and once GTI is known, neither sex nor age adds anything further. The long-standing clinical observation that colonoscopy is harder in women — usually attributed vaguely to pelvic anatomy or a longer colon — is here given a concrete, measurable mechanism.

These findings have two practical implications. First, in patients who have already undergone CTC, GTI can be computed at no additional cost or radiation and could serve as an objective pre-procedural difficulty index, informing endoscopist assignment, device choice, or sedation planning. Second, because shape is continuous, difficulty-prediction models should use continuous regression on GTI rather than discrete shape “types”; the typing schemes in the literature partition a continuum and discard information.

Limitations. The cohort is a single, retrospective, publicly available dataset (ACRIN 6664), and only supine acquisitions were analysed; colonic shape changes with position. Moreover, the colon imaged at CTC (cleansed and gas-distended, with the patient lying still) is not in the same configuration as the colon during colonoscopy, which is manipulated by the instrument; GTI therefore characterises a related but not identical geometry, and its link to the procedure must ultimately be tested against colonoscopy itself. Age was recorded for 322 of 373 cases and, as supplied, included one implausibly low value (recorded age 11); because age was unrelated to GTI, difficulty, and sex, this did not affect the conclusions. DIFFI is a subjective, image-based surrogate for procedural difficulty rather than a measured outcome such as caecal intubation time or failure, and was assigned by a single rater, so inter-rater reliability could not be assessed; intra-rater consistency was instead reinforced by the three-set, re-adjudicated protocol. The ρ = 0.63 association, though strong for a subjective scale, should be confirmed prospectively against objective procedural endpoints. Results depend on the quality of centerline extraction. Finally, although the mediation pattern was statistically robust (significant indirect effect, non-significant direct effect), the design is cross-sectional; the causal ordering sex → geometry → difficulty, while biologically plausible, cannot be proven from these data alone.

Future work will validate GTI prospectively against measured intubation time and caecal intubation rate, and extend the analysis to prone and decubitus acquisitions. A companion morphological study will further decompose the sex difference in GTI into its anatomical determinants — pelvic geometry, abdominal cavity volume, and visceral adipose distribution, all extractable from the same CT examination — to identify where the additional tortuosity in women’s colons resides.


5. Conclusion

Colon shape is a continuum, not a set of discrete types. A single scale-free index, GTI, computed from CT colonography predicts endoscopist-rated colonoscopy difficulty and explains the sex difference in difficulty as a consequence of colonic geometry.


6. Declarations

Ethics approval. This study analysed the publicly available, fully de-identified ACRIN 6664 dataset. Secondary analysis of such data does not constitute human-subjects research requiring institutional review board review and was therefore exempt from ethics approval.

Data availability. The ACRIN 6664 CT colonography images are publicly available from the Cancer Imaging Archive (TCIA). Derived data (colon centerlines, GTI values, archetype mixtures) and the analysis code are available from the corresponding author on reasonable request.

Funding. This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Conflicts of interest. The author declares no conflicts of interest.


Tables

Table 1. Cohort characteristics, overall and by sex.

Characteristic               All (n = 373)    Female (n = 175)    Male (n = 198)    p*
Age, years†                  57.7 ± 7.6       57.4 ± 6.7          57.9 ± 8.3        0.50
GTI                          11.5 ± 1.6       12.1 ± 1.7          10.9 ± 1.4        3×10⁻¹¹
DIFFI (1–5)                  3.02 ± 0.84      3.19 ± 0.78         2.86 ± 0.86       6×10⁻⁵
Dominant archetype, n (%)                                                           6×10⁻⁴‡
    A1, smooth               167 (45)         71 (41)             96 (48)
    A2, high-winding         168 (45)         75 (43)             93 (47)
    A3, meandering           38 (10)          29 (17)             9 (5)

Values are mean ± SD unless stated otherwise. * Mann–Whitney U test (age, GTI, DIFFI). † Age recorded for 322 cases (156 female, 166 male). ‡ χ² test for the distribution of the dominant archetype.
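The χ² test for the dominant-archetype distribution can be reproduced directly from the counts in Table 1. The sketch below uses only the standard library and relies on the fact that, for 2 degrees of freedom, the χ² upper-tail probability is exactly exp(−χ²/2); the function name is illustrative.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for row, rt in zip(table, row_tot):
        for obs, ct in zip(row, col_tot):
            expected = rt * ct / n
            chi2 += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Dominant-archetype counts from Table 1: rows = female, male; columns = A1, A2, A3.
counts = [[71, 75, 29],
          [96, 93, 9]]
chi2, dof = chi_square_independence(counts)
p_value = math.exp(-chi2 / 2.0)     # exact upper-tail probability when dof == 2
print(chi2, dof, p_value)
```

With these counts the statistic is about 14.8 on 2 degrees of freedom, giving p ≈ 6×10⁻⁴, in line with the value shown in the table.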


Figures

Figure 1. The Global Tortuosity Index (GTI). GTI = centerline length / radius of gyration (Rg). Rg, the root-mean-square distance of all centerline points from their centroid, is shown as the dashed circle. A colon extended in space has a large Rg and low GTI (left); a colon densely folded within a confined space has a small Rg and high GTI (right). GTI is dimensionless and therefore independent of patient body size.

Figure 2. Colon shape is a continuous spectrum, not discrete types. (a) Shape space (first two principal components); colour = GTI, showing a smooth gradient with no separated clusters. (b) GTI distribution is unimodal. (c) Clustering quality (silhouette) is low (≤ 0.14) at every K, and the Gap statistic gives an optimal K = 1.

Figure 3. Three shape archetypes: the extremes of the continuum. A1 smooth (low total curvature), A2 high-winding (large total turning), A3 meandering (high tortuosity, no loop). Bold = archetype exemplar, grey = dominant cases. About half of the colons in the cohort are mixtures of these extremes rather than pure types.

Figure 4. GTI predicts colonoscopy difficulty and explains the sex gap. (a) GTI vs expert difficulty (DIFFI), Spearman ρ = 0.63. (b) Women have higher GTI than men. (c) Difficulty rises across archetypes A1→A2→A3. (d) Formal mediation analysis: the effect of sex on difficulty is fully mediated by colon shape — indirect effect 0.36 (95% CI 0.26–0.48); the direct effect of sex is −0.03 (not significant).


References

  1. Anderson JC, Gonzalez JD, Messina CR, Pollack BJ. Factors predictive of difficult colonoscopy. Gastrointest Endosc. 2001;54(5):558-562.
  2. Sasani H, Ozkan M, Simsek MA, Sasani M. Morphometric analysis and tortuosity typing of the large intestine segments on computed tomography colonography with artificial intelligence. Colomb Med (Cali). 2024;55(2):e2005944. doi:10.25100/cm.v55i2.5944.
  3. Raahave D. Dolichocolon revisited: an inborn anatomic variant with redundancies causing constipation and volvulus. World J Gastrointest Surg. 2018;10(2):6-12. doi:10.4240/wjgs.v10.i2.6.
  4. Weber CN, Lev-Toaff AS, Levine MS, Zafar HM. Detailed quantitative assessment of colonic morphology at CT colonography using novel software: a feasibility and reproducibility study. Med Biol Eng Comput. 2017;55:507-515. doi:10.1007/s11517-016-1529-2.
  5. Alazmani A, Hood A, Jayne D, Neville A, Culmer P. Quantitative assessment of colorectal morphology: implications for robotic colonoscopy. Med Eng Phys. 2016;38(2):148-154. doi:10.1016/j.medengphy.2015.11.018.
  6. Hanson ME, Pickhardt PJ, Kim DH, Pfau PR. Anatomic factors predictive of incomplete colonoscopy based on findings at CT colonography. AJR Am J Roentgenol. 2007;189(4):774-779. doi:10.2214/AJR.07.2048.
  7. Johnson CD, Chen MH, Toledano AY, et al. Accuracy of CT colonography for detection of large adenomas and cancers. N Engl J Med. 2008;359(12):1207-1217. doi:10.1056/NEJMoa0800996. [ACRIN 6664 trial — source cohort]
  8. Hartigan JA, Hartigan PM. The dip test of unimodality. Ann Stat. 1985;13(1):70-84.
  9. Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Series B Stat Methodol. 2001;63(2):411-423.
  10. Cutler A, Breiman L. Archetypal analysis. Technometrics. 1994;36(4):338-347.
  11. Weber CN, Poff JA, Lev-Toaff AS, Levine MS, Zafar HM. Differences between genders in colorectal morphology on CT colonography using a quantitative approach: a pilot study. Clin Imaging. 2017;46:65-70. doi:10.1016/j.clinimag.2017.07.006.

Drafting notes (for work in progress; delete from the final version)

v2 revision (2026-05-18)

v3.1 review reflected (2026-05-23)