
Proof of Concept / Pre-Launch Commercial Strategy

ML-Based KOL/HCP Segmentation

A K-means clustering engine trained on synthetic prescriber data to segment HCPs in the TGCT (tenosynovial giant cell tumor) market. This proof of concept demonstrates how a pre-commercial biotech can build actionable physician segmentation when it enters a rare oncology market with no proprietary prescribing data and faces an established incumbent (Turalio, pexidartinib). The segmentation draws on publicly available signals: claims data, publication databases, trial registries, and conference attendance.

K-Means Clustering · PCA Visualization · Silhouette Analysis · Synthetic Claims Data · Channel Allocation · Pre-Launch Strategy · Feature Engineering · Interactive Profiler

Disclaimer: This is a proof-of-concept demonstration using entirely synthetic data. All HCP profiles are algorithmically generated and do not represent real physicians. The TGCT market framing (Turalio as incumbent, hypothetical CSF1R inhibitor challenger) is used for illustrative purposes only. This tool is NOT intended for commercial use and should NOT inform actual targeting or engagement decisions.
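Because every HCP profile here is algorithmically generated, it may help to see what such a generator could look like. The sketch below is an assumption, not the page's actual generator: the eight feature names (two per data source), the four archetypes, and their mean values are all hypothetical, and noise is drawn as clipped Gaussians around each archetype mean.

```python
import random

# Hypothetical feature set: two features per pre-launch data source.
# The page specifies 8 features from 4 sources but does not name them.
FEATURES = [
    "tgct_patients", "systemic_tx_claims",   # claims data
    "tgct_publications", "citations",        # publication databases
    "trials_as_pi", "trial_sites",           # trial registries
    "congress_talks", "congress_visits",     # conference attendance
]

# Hypothetical archetypes: mean of each feature for HCPs of that type.
ARCHETYPES = {
    "academic_kol":      [12, 4, 25, 300, 3, 2, 6, 8],
    "community_treater": [20, 9, 1, 10, 0, 0, 0, 2],
    "emerging_expert":   [8, 3, 10, 80, 1, 1, 2, 5],
    "low_engagement":    [2, 0.5, 0, 2, 0, 0, 0, 1],
}


def generate_hcps(n, seed=0):
    """Return n synthetic HCPs as (hcp_id, archetype, feature_vector)."""
    rng = random.Random(seed)
    names = list(ARCHETYPES)
    hcps = []
    for i in range(n):
        archetype = rng.choice(names)
        # Gaussian noise around each archetype mean, clipped at zero because
        # counts cannot be negative (values are left as floats for simplicity).
        row = [max(0.0, rng.gauss(mu, 0.25 * mu + 0.5))
               for mu in ARCHETYPES[archetype]]
        hcps.append((f"HCP-{i:04d}", archetype, row))
    return hcps
```

Keeping the archetype label next to each row is a convenience that only synthetic data offers: it provides a ground truth against which the recovered clusters can be checked, which real pre-launch data never supplies.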


Technical Architecture

K-Means Clustering

The engine runs Lloyd's algorithm with K-means++ initialization for stable centroid seeding and typically converges within 15 to 25 iterations. Silhouette analysis scores cluster separation for each k from 2 to 8.
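The loop described above (K-means++ seeding, Lloyd iterations, then a silhouette score per k) can be sketched in pure Python as follows. This is a minimal illustration, not the engine's code: the function names, the `n_init` restart count, and the 100-iteration cap are assumptions, and picking the k with the highest mean silhouette is one common heuristic rather than a documented rule of this tool.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: the first centroid is uniform; each later one is
    sampled with probability proportional to its squared distance from the
    nearest centroid chosen so far."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:  # every point already sits on a centroid
            centroids.append(list(rng.choice(points)))
        else:
            centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centroids


def lloyd(points, k, rng, max_iter=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps until
    assignments stop changing. Returns (labels, centroids, inertia)."""
    centroids = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


def fit(points, k, n_init=5, seed=0):
    """Run several K-means++ restarts and keep the lowest-inertia solution."""
    rng = random.Random(seed)
    return min((lloyd(points, k, rng) for _ in range(n_init)), key=lambda r: r[2])


def silhouette(points, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b): a is the mean distance to
    the point's own cluster, b the smallest mean distance to another cluster."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:  # singleton, or only one cluster
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: three well-separated 2-D Gaussian blobs.
    centers = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0)]
    data = [[cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)]
            for cx, cy in centers for _ in range(30)]
    scores = {k: silhouette(data, fit(data, k)[0]) for k in range(2, 9)}
    best_k = max(scores, key=scores.get)
    print(best_k, round(scores[best_k], 3))
```

Restarting from several seeds guards against the occasional poor K-means++ draw. In a production setting, scikit-learn's `KMeans` (with `init="k-means++"` and `n_init`) and `sklearn.metrics.silhouette_score` cover the same ground.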

Feature Engineering

An 8-dimensional feature space is constructed from the 4 pre-launch data sources. Z-score standardization puts every feature on a common scale, so no single feature dominates the Euclidean distances that K-means minimizes. A PCA projection to 2D allows visual inspection of cluster separation.
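The standardization and projection steps above can be illustrated with a dependency-free sketch. It assumes population standard deviation for the z-scores and uses power iteration with deflation to find the top two principal components; both choices are assumptions made here to avoid a NumPy dependency, not details stated by the page.

```python
import math
import random


def zscore(rows):
    """Standardize each column to mean 0 and (population) standard deviation 1,
    so no feature dominates Euclidean distances because of its units."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = []
    for c, m in zip(cols, means):
        sd = math.sqrt(sum((x - m) ** 2 for x in c) / n)
        sds.append(sd if sd > 0 else 1.0)  # constant columns become all zeros
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def mat_vec(mat, v):
    return [sum(a * b for a, b in zip(row, v)) for row in mat]


def top_eigenpair(mat, iters=500, seed=0):
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in mat]
    s = math.sqrt(sum(x * x for x in v))
    v = [x / s for x in v]  # start from a random unit vector
    for _ in range(iters):
        w = mat_vec(mat, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # matrix annihilates v: eigenvalue 0
            break
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, mat_vec(mat, v)))  # Rayleigh quotient
    return lam, v


def pca_2d(rows):
    """Project rows onto the first two principal components of the z-scored
    data. Returns (coords, explained_variance_ratios)."""
    z = zscore(rows)
    n, d = len(z), len(z[0])
    # Sample covariance of the (already centered) standardized data.
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)]
           for i in range(d)]
    lam1, v1 = top_eigenpair(cov)
    # Deflation: remove the first component, then find the next one.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)]
                for i in range(d)]
    lam2, v2 = top_eigenpair(deflated, seed=1)
    total = sum(cov[i][i] for i in range(d))  # total variance = trace
    coords = [(sum(a * b for a, b in zip(r, v1)), sum(a * b for a, b in zip(r, v2)))
              for r in z]
    return coords, (lam1 / total, lam2 / total)
```

The explained-variance ratios indicate how faithfully the 2D picture reflects the full 8-dimensional space; if they are low, apparent overlap in the plot may not mean overlap in feature space. `numpy.linalg.eigh` or scikit-learn's `PCA` would be the usual choice outside a sketch like this.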

Channel Optimization

Segment membership drives omnichannel budget allocation across 6 engagement channels. This step maps the unsupervised clustering output to actionable media planning, connecting data science to commercial execution.
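One simple way to turn segment membership into a channel budget is sketched below. Everything specific here is hypothetical: the page states there are six channels but does not name them, and the segment names, per-segment value weights, and channel mixes are invented for illustration. The rule assumed is that each segment receives budget in proportion to (segment size × segment weight), which is then split across channels by that segment's mix.

```python
from collections import Counter

# Hypothetical names for the six engagement channels.
CHANNELS = ["field_rep", "msl", "congress", "email", "webinar", "programmatic"]

# Hypothetical channel mix per segment; each row sums to 1.0.
CHANNEL_MIX = {
    "academic_kol":      [0.10, 0.45, 0.30, 0.05, 0.10, 0.00],
    "community_treater": [0.50, 0.05, 0.05, 0.15, 0.10, 0.15],
    "emerging_expert":   [0.25, 0.20, 0.15, 0.15, 0.15, 0.10],
    "low_engagement":    [0.00, 0.00, 0.00, 0.40, 0.20, 0.40],
}
assert all(abs(sum(mix) - 1.0) < 1e-9 for mix in CHANNEL_MIX.values())

# Hypothetical relative value of reaching one HCP in each segment.
SEGMENT_WEIGHT = {"academic_kol": 5.0, "community_treater": 2.0,
                  "emerging_expert": 3.0, "low_engagement": 0.5}


def allocate(segment_labels, total_budget):
    """Split total_budget across segments in proportion to
    (segment size x segment weight), then across channels by CHANNEL_MIX.
    Returns {channel: amount}."""
    sizes = Counter(segment_labels)
    scores = {seg: n * SEGMENT_WEIGHT[seg] for seg, n in sizes.items()}
    total_score = sum(scores.values())
    budget = dict.fromkeys(CHANNELS, 0.0)
    for seg, score in scores.items():
        seg_budget = total_budget * score / total_score
        for channel, share in zip(CHANNELS, CHANNEL_MIX[seg]):
            budget[channel] += seg_budget * share
    return budget
```

K-means returns numeric cluster IDs, so in this sketch it is assumed that an analyst first names each cluster (for example, by inspecting its centroid) before passing the labels to `allocate`.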

References

  • Lloyd SP. Least squares quantization in PCM. IEEE Trans Inform Theory. 1982;28(2):129-137.
  • Arthur D, Vassilvitskii S. K-means++: The advantages of careful seeding. Proc SODA. 2007;1027-1035.
  • Rousseeuw PJ. Silhouettes: A graphical aid to interpretation of cluster analysis. J Comput Appl Math. 1987;20:53-65.
  • Hotelling H. Analysis of a complex of statistical variables into principal components. J Educ Psychol. 1933;24(6):417-441.
  • Tap WD, et al. Pexidartinib versus placebo for advanced tenosynovial giant cell tumour (ENLIVEN). Lancet. 2019;394(10197):478-487.
  • Campbell JD, et al. HCP segmentation for rare disease commercialization. J Med Mark. 2021;21(2):89-101.