Description
In the section on typographic units there is a discussion on extended grapheme clusters, using स्कूल as an example. The text says:
There are two syllables in this word: SA+VIRAMA+KA+UU and LA. Note, however, that there are three Unicode grapheme clusters here: SA+VIRAMA, KA+UU and LA.
Styling is done on the basis of the whole orthographic syllable, not the first character, nor even the first grapheme.
Unicode 15.1 added a new rule, GB9c, to UAX #29 specifically for some Indic scripts:
The GB9c rule only applies to extended grapheme clusters:
Do not break within certain combinations with Indic_Conjunct_Break (InCB)=Linker.
So the following characters:
Character properties
┌──────┬──────┬────────────────────────┬────────────┬────────────┬─────┬──────┬────┐
│ char │ cp │ name │ script │ block │ cat │ bidi │ cc │
├──────┼──────┼────────────────────────┼────────────┼────────────┼─────┼──────┼────┤
│ ् │ 094D │ DEVANAGARI SIGN VIRAMA │ Devanagari │ Devanagari │ Mn │ NSM │ 9 │
│ ্ │ 09CD │ BENGALI SIGN VIRAMA │ Bengali │ Bengali │ Mn │ NSM │ 9 │
│ ્ │ 0ACD │ GUJARATI SIGN VIRAMA │ Gujarati │ Gujarati │ Mn │ NSM │ 9 │
│ ୍ │ 0B4D │ ORIYA SIGN VIRAMA │ Oriya │ Oriya │ Mn │ NSM │ 9 │
│ ్ │ 0C4D │ TELUGU SIGN VIRAMA │ Telugu │ Telugu │ Mn │ NSM │ 9 │
│ ് │ 0D4D │ MALAYALAM SIGN VIRAMA │ Malayalam │ Malayalam │ Mn │ NSM │ 9 │
└──────┴──────┴────────────────────────┴────────────┴────────────┴─────┴──────┴────┘
String: [\p{InCB=Linker}]
can now keep a following consonant (InCB=Consonant) in the same extended grapheme cluster as the consonant that precedes them.
So स्कूल will be three extended grapheme clusters (['स्', 'कू', 'ल'] – SA+VIRAMA, KA+UU and LA) in Unicode 15.0 and prior versions, and two extended grapheme clusters (['स्कू', 'ल'] – SA+VIRAMA+KA+UU and LA) in Unicode 15.1 onwards.
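The difference can be reproduced with a deliberately simplified segmenter. This is only a sketch, not the UAX #29 algorithm: the `segment` function and its `gb9c` flag are hypothetical names, the consonant test covers only the Devanagari range U+0915..U+0939 (an assumption that is enough for this example), any combining mark is attached to the preceding cluster as a rough stand-in for GB9/GB9a, and InCB=Extend is approximated as "non-zero combining class or ZWJ".

```python
import unicodedata

# The six InCB=Linker characters listed in the table above.
LINKERS = {0x094D, 0x09CD, 0x0ACD, 0x0B4D, 0x0C4D, 0x0D4D}


def is_consonant(ch):
    # Assumption: only the main Devanagari consonant range is covered here.
    return 0x0915 <= ord(ch) <= 0x0939


def is_mark(ch):
    # Rough stand-in for GB9/GB9a: treat every combining mark as extending.
    return unicodedata.category(ch).startswith("M")


def ends_with_linked_consonant(cluster):
    # Walk back over trailing Extend-like characters (approximated as
    # non-zero combining class or ZWJ), noting whether a linker occurs,
    # then require a consonant immediately before that run.
    seen_linker = False
    i = len(cluster) - 1
    while i >= 0 and (unicodedata.combining(cluster[i]) != 0 or cluster[i] == "\u200d"):
        if ord(cluster[i]) in LINKERS:
            seen_linker = True
        i -= 1
    return seen_linker and i >= 0 and is_consonant(cluster[i])


def segment(text, gb9c):
    clusters = []
    for ch in text:
        if clusters and is_mark(ch):
            clusters[-1] += ch          # mark attaches to the previous cluster
        elif clusters and gb9c and is_consonant(ch) and ends_with_linked_consonant(clusters[-1]):
            clusters[-1] += ch          # GB9c-style: no break after the linker
        else:
            clusters.append(ch)         # otherwise start a new cluster
    return clusters


WORD = "\u0938\u094d\u0915\u0942\u0932"  # SA, VIRAMA, KA, UU, LA

print(len(segment(WORD, gb9c=False)))  # 3 clusters (pre-15.1 behaviour)
print(len(segment(WORD, gb9c=True)))   # 2 clusters (15.1 behaviour)
```

With `gb9c=False` the sketch yields SA+VIRAMA, KA+UU and LA; with `gb9c=True` it yields SA+VIRAMA+KA+UU and LA, matching the two results described above.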
So the effect of segmentation at the extended grapheme cluster level will depend on the version of Unicode the toolchain is using at the point of segmentation.
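As a rough illustration of that version dependence, a toolchain could compare the Unicode version of its character data against 15.1. The helper name `has_gb9c` is hypothetical. Note also that Python's `unicodedata.unidata_version` describes only the standard library's character database; a segmentation library may bundle its own Unicode data, so its version would need to be checked separately.

```python
import unicodedata


def has_gb9c(unicode_version):
    # GB9c was introduced in Unicode 15.1, so compare (major, minor).
    major, minor = (int(part) for part in unicode_version.split(".")[:2])
    return (major, minor) >= (15, 1)


# Version of the character database shipped with this Python build.
runtime_version = unicodedata.unidata_version
print(runtime_version, has_gb9c(runtime_version))
```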