Unicode collation algorithm

The Unicode collation algorithm (UCA) is an algorithm defined in Unicode Technical Standard #10 that provides a customizable method for producing binary keys from strings of text in any writing system and language representable in Unicode. These keys can then be compared efficiently, byte by byte, to collate or sort the strings according to the rules of the language, with options for ignoring case, accents, etc.^[1]
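
The idea of a binary key that sorts correctly under plain byte comparison can be illustrated with a minimal sketch. The table `TOY_TABLE`, its weights, and the `strength` parameter are assumptions made up for this example; they are not taken from the standard or from any library, and real implementations use 16-bit weights and far more elaborate rules.

```python
import unicodedata

# Hypothetical toy collation table: character -> (primary, secondary, tertiary).
# The weights are invented for illustration and are NOT taken from the DUCET.
# Real UCA weights are 16-bit; single bytes keep the sketch short.
TOY_TABLE = {
    "a": (0x10, 0x20, 0x02), "A": (0x10, 0x20, 0x08),
    "b": (0x11, 0x20, 0x02), "B": (0x11, 0x20, 0x08),
    "e": (0x12, 0x20, 0x02), "E": (0x12, 0x20, 0x08),
    "\u0301": (0x00, 0x24, 0x02),  # combining acute accent, no primary weight
}

def sort_key(text, strength=3):
    """Build a binary key; strength=1 ignores accents and case, 2 ignores case."""
    # Decompose so an accented letter becomes base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    elements = [TOY_TABLE[ch] for ch in decomposed]
    key = bytearray()
    for level in range(strength):
        if level > 0:
            key.append(0x00)  # level separator, lower than any real weight
        # Append all non-zero weights of this level, in string order.
        key.extend(w[level] for w in elements if w[level] != 0)
    return bytes(key)

# Keys are plain bytes, so ordinary byte-wise comparison does the sorting.
words = ["b", "A", "\u00e9", "a", "e"]
ordered = sorted(words, key=sort_key)
```

Because all primary weights come first in the key, a difference in base letters outweighs any accent or case difference later in the string, and lowering `strength` simply drops the later levels from the key.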

Unicode Technical Standard #10 also specifies the Default Unicode Collation Element Table (DUCET), a data file that defines a default collation ordering. The DUCET can be customized for different languages.^[1]^[2] Some such customizations can be found in the Unicode Common Locale Data Repository (CLDR).^[3]
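
How a default table can be customized for a language can be sketched as follows. The table contents and the helper names `default_table`, `tailored`, and `sort_key` are hypothetical and greatly simplified; they stand in for the DUCET and for a language-specific customization, and do not reproduce the actual data or rule syntax. The example uses the Swedish convention of sorting "ä" after "z".

```python
def default_table():
    """Toy default table: character -> (primary, secondary). Invented weights."""
    table = {ch: (i + 1, 1) for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}
    # By default, treat a-umlaut as an accented variant of "a":
    # same primary weight, higher secondary weight.
    table["\u00e4"] = (table["a"][0], 2)
    return table

def tailored(table, overrides):
    """Return a copy of the table with language-specific entries replaced."""
    new_table = dict(table)
    new_table.update(overrides)
    return new_table

def sort_key(text, table):
    """Two-level key: all primary weights, a zero separator, then secondaries."""
    elements = [table[ch] for ch in text]
    return bytes(p for p, _ in elements) + b"\x00" + bytes(s for _, s in elements)

DEFAULT = default_table()
# Swedish-style customization: a-umlaut becomes its own letter after "z".
SWEDISH = tailored(DEFAULT, {"\u00e4": (DEFAULT["z"][0] + 1, 1)})

words = ["zebra", "\u00e4ra", "arm"]
default_order = sorted(words, key=lambda w: sort_key(w, DEFAULT))
swedish_order = sorted(words, key=lambda w: sort_key(w, SWEDISH))
```

Under the toy default the word starting with "ä" sorts among the "a" words, while under the customized table it moves to the end of the list; the sorting code itself is unchanged, only the table differs.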

An open-source implementation of the UCA is included with the International Components for Unicode (ICU).^[4]^[5] ICU supports tailoring, and the collation tailorings from CLDR are included in ICU.^[6]^[2]

References

[1] Whistler, Ken; Scherer, Markus; Davis, Mark (2022-08-26). "UTS #10: Unicode Collation Algorithm". Unicode. Retrieved 2023-08-16.
[2] Hosken, Martin (2021-09-23). Unicode Sort Tailoring: Tutorial (PDF) (1.3 ed.). SIL Writing Systems Technology. pp. 2–3. Retrieved 2023-08-16.
[3] "CLDR Releases/Downloads". Unicode CLDR. Retrieved 2023-08-16.
[4] "ICU - International Components for Unicode". Unicode. Retrieved 2023-08-16.
[5] "Collations". SyBooks Online. Retrieved 2023-08-16.
[6] "Customization". ICU Documentation. Retrieved 2023-08-16.

External links

Unicode Collation Algorithm: Unicode Technical Standard #10
Mimer SQL Unicode Collation Charts

Tools

ICU Locale Explorer: an online demonstration of the Unicode Collation Algorithm using International Components for Unicode (not working as of 2023-08-16).
An ICU collation demo (not working as of 2023-08-16).
msort: a sort program that provides an unusual level of flexibility in defining collations and extracting keys.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
