| Language(s) | Ukrainian, Russian, Bulgarian | 
|---|---|
| Classification | 8-bit KOI, extended ASCII | 
| Extends | KOI8-B | 
| Based on | KOI8-R | 
| Other related encoding(s) | KOI8-RU, KOI8-F | 
KOI8-U (RFC 2319) is an 8-bit character encoding designed to cover Ukrainian, which uses a Cyrillic alphabet. It is based on KOI8-R, which covers Russian and Bulgarian, but replaces eight box-drawing characters with four Ukrainian letters, Ґ, Є, І and Ї, each in upper and lower case.
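The eight replaced positions can be found mechanically. The following is a minimal sketch, assuming Python 3, whose standard library ships `koi8_r` and `koi8_u` codecs; it decodes every byte value with both codecs and reports where they disagree. The function name `koi8_r_vs_u` is hypothetical.

```python
def koi8_r_vs_u() -> list[tuple[int, str, str]]:
    """Return (byte, KOI8-R char, KOI8-U char) for every byte that decodes differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        r_char = raw.decode("koi8_r")  # every byte is defined in both codecs
        u_char = raw.decode("koi8_u")
        if r_char != u_char:
            diffs.append((value, r_char, u_char))
    return diffs


if __name__ == "__main__":
    for value, r_char, u_char in koi8_r_vs_u():
        print(f"0x{value:02X}: {r_char} (U+{ord(r_char):04X}) -> {u_char} (U+{ord(u_char):04X})")
```

On a standard CPython build this should report eight bytes, each pairing a KOI8-R box-drawing character with one of the Ukrainian letters.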
KOI8-RU is closely related, but adds Ў for Belarusian. In both, the letter allocations match those in KOI8-E, except for Ґ, which KOI8-E lacks but KOI8-F adds.
In Microsoft Windows, KOI8-U is assigned code page number 21866. IBM assigns it code page/CCSID 1168.[1][2][3]
KOI8 remains much more commonly used than ISO 8859-5, which never caught on. Another common Cyrillic character encoding is Windows-1251. Both KOI8 and Windows-1251 may eventually give way to Unicode.
KOI8 stands for Kod Obmena Informatsiey, 8 bit (Russian: Код Обмена Информацией, 8 бит) which means "Code for Information Exchange, 8 bit".
In the KOI8 character sets, the Russian Cyrillic letters are arranged in a pseudo-Roman order rather than in the natural Cyrillic alphabetical order used by ISO 8859-5. Although this may seem unnatural, it has a useful property: if the eighth bit is stripped, the text can still be read (or at least deciphered) as a case-reversed transliteration on an ordinary ASCII terminal. For instance, "Русский Текст" in KOI8-U becomes rUSSKIJ tEKST ("Russian Text") when the eighth bit is stripped.
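The stripping effect is easy to reproduce. A minimal sketch, assuming Python 3 and its built-in `koi8_u` codec; `strip_eighth_bit` is a hypothetical helper name, not part of any library:

```python
def strip_eighth_bit(text: str) -> str:
    """Encode text as KOI8-U, clear bit 7 of every byte, and read the result as ASCII."""
    encoded = text.encode("koi8_u")
    seven_bit = bytes(b & 0x7F for b in encoded)  # drop the high bit, as a 7-bit channel would
    return seven_bit.decode("ascii")


print(strip_eighth_bit("Русский Текст"))  # rUSSKIJ tEKST
```

The case reversal follows from the layout: lowercase Cyrillic letters sit at 0xC0 to 0xDF, so clearing bit 7 moves them to 0x40 to 0x5F (mostly uppercase Latin), while uppercase Cyrillic at 0xE0 to 0xFF lands on 0x60 to 0x7F (mostly lowercase Latin).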
Character set
The following table shows the KOI8-U encoding.[1][4] Each character is shown with its equivalent Unicode code point.
| KOI8-U | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | | | | | | | | | | | | | | | | |
| 1x | | | | | | | | | | | | | | | | |
| 2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
| 8x | ─ 2500 | │ 2502 | ┌ 250C | ┐ 2510 | └ 2514 | ┘ 2518 | ├ 251C | ┤ 2524 | ┬ 252C | ┴ 2534 | ┼ 253C | ▀ 2580 | ▄ 2584 | █ 2588 | ▌ 258C | ▐ 2590 |
| 9x | ░ 2591 | ▒ 2592 | ▓ 2593 | ⌠ 2320 | ■ 25A0 | ∙ 2219 | √ 221A | ≈ 2248 | ≤ 2264 | ≥ 2265 | NBSP 00A0 | ⌡ 2321 | ° 00B0 | ² 00B2 | · 00B7 | ÷ 00F7 |
| Ax | ═ 2550 | ║ 2551 | ╒ 2552 | ё 0451 | є 0454 | ╔ 2554 | і 0456 | ї 0457 | ╗ 2557 | ╘ 2558 | ╙ 2559 | ╚ 255A | ╛ 255B | ґ 0491 | ╝ 255D | ╞ 255E |
| Bx | ╟ 255F | ╠ 2560 | ╡ 2561 | Ё 0401 | Є 0404 | ╣ 2563 | І 0406 | Ї 0407 | ╦ 2566 | ╧ 2567 | ╨ 2568 | ╩ 2569 | ╪ 256A | Ґ 0490 | ╬ 256C | © 00A9 |
| Cx | ю 044E | а 0430 | б 0431 | ц 0446 | д 0434 | е 0435 | ф 0444 | г 0433 | х 0445 | и 0438 | й 0439 | к 043A | л 043B | м 043C | н 043D | о 043E |
| Dx | п 043F | я 044F | р 0440 | с 0441 | т 0442 | у 0443 | ж 0436 | в 0432 | ь 044C | ы 044B | з 0437 | ш 0448 | э 044D | щ 0449 | ч 0447 | ъ 044A |
| Ex | Ю 042E | А 0410 | Б 0411 | Ц 0426 | Д 0414 | Е 0415 | Ф 0424 | Г 0413 | Х 0425 | И 0418 | Й 0419 | К 041A | Л 041B | М 041C | Н 041D | О 041E |
| Fx | П 041F | Я 042F | Р 0420 | С 0421 | Т 0422 | У 0423 | Ж 0416 | В 0412 | Ь 042C | Ы 042B | З 0417 | Ш 0428 | Э 042D | Щ 0429 | Ч 0427 | Ъ 042A |
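The code chart also shows a regular case layout: each lowercase letter in rows Cx and Dx has its capital exactly 0x20 higher in rows Ex and Fx, while the lowercase ё, є, і, ї and ґ in row Ax have their capitals 0x10 higher in row Bx. A small sketch, assuming Python 3's `koi8_u` codec, checks this against the codec rather than by eye; `case_offset` is a hypothetical helper.

```python
from typing import Optional


def case_offset(lower_byte: int) -> Optional[int]:
    """Return (upper byte - lower byte) for a lowercase KOI8-U letter, or None otherwise."""
    lower = bytes([lower_byte]).decode("koi8_u")
    if not lower.islower():
        return None  # uppercase letter, digit, symbol, box drawing, or control
    upper = lower.upper()  # every lowercase letter in KOI8-U has an encodable capital
    return upper.encode("koi8_u")[0] - lower_byte


if __name__ == "__main__":
    for value in range(0x80, 0x100):
        offset = case_offset(value)
        if offset is not None:
            print(f"0x{value:02X} -> 0x{value + offset:02X} (offset 0x{offset:02X})")
```

Because the Cyrillic case pairs differ by a single bit (0x20 or 0x10), case conversion over KOI8-U bytes can be done with a bit operation once a byte is known to be a letter.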
Although RFC 2319 maps character 0x95 to U+2219 (∙), some implementations map it to U+2022 (•) instead, to match the bullet character in Windows-1251.
Some references have a typo and incorrectly state that character 0xB4 is U+0403, rather than the correct U+0404. This typo is present in Appendix A of RFC 2319 (but the table in the main text of the RFC gives the correct mapping).
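Both notes matter when exchanging text with other software. The sketch below, assuming Python 3, checks that the standard `koi8_u` codec follows the main RFC 2319 table for 0xB4 and 0x95, and registers a custom encode error handler (the name `koi8u-bullet` and the function name are hypothetical choices) that writes U+2022 BULLET as 0x95, so text using the Windows-1251-style bullet still encodes.

```python
import codecs

# Decoding should follow the main RFC 2319 table, not the Appendix A typo.
assert bytes([0xB4]).decode("koi8_u") == "\u0404"  # Є, not U+0403 (Ѓ)
assert bytes([0x95]).decode("koi8_u") == "\u2219"  # BULLET OPERATOR


def bullet_as_bullet_operator(err: UnicodeError):
    """Encode error handler: substitute U+2219 for each U+2022 in the failing span."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    span = err.object[err.start:err.end]
    if set(span) != {"\u2022"}:
        raise err  # some other unencodable character: fail as usual
    # Returning a str lets the codec encode U+2219 itself (to 0x95).
    return "\u2219" * len(span), err.end


codecs.register_error("koi8u-bullet", bullet_as_bullet_operator)

encoded = "\u2022 пункт".encode("koi8_u", errors="koi8u-bullet")
print(encoded.hex(" "))
```

Returning a replacement string rather than raw bytes keeps the handler independent of byte values; the decoded result reads back as U+2219, which is the RFC's choice.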
References
- [1] "SBCS code page information - CPGID: 01168 / Name: Ukrainian KOI8-U". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. IBM. C-H 3-3220-050. Archived from the original on 2017-02-18. Retrieved 2017-02-18.
- [2] "CCSID information document; CCSID 1168; KOI8-U". IBM. Archived from the original on 2017-02-18. Retrieved 2017-02-18.
- [3] International Components for Unicode (ICU), ibm-1168_P100-2002.ucm, 2002-12-03.
- [4] Verdy, Philippe; Richter, Helmut (2016-01-04) [2008-10-13]. "KOI8-U.TXT". 2.0. Retrieved 2016-12-09.
 
Further reading
- Flohr, Guido (2016) [2006]. "Locale::RecodeData::KOI8_U - Conversion routines for KOI8-U". CPAN libintl-perl. 1.1. Archived from the original on 2017-01-15. Retrieved 2017-01-15.
- RFC 2319
- "KOI8-U (RFC 2319)". Kermit. Columbia University. Retrieved 2020-06-24.
- Leisher, Mark (2008) [1999-12-20]. "KOI8-U Belorussian/Ukrainian Cyrillic to Unicode 2.1 mapping table - Based on RFC 2319". Department of Mathematical Sciences, New Mexico State University. Archived from the original on 2017-02-19. Retrieved 2017-02-19.
- Kornai, Andras; Birnbaum, David J.; da Cruz, Frank; Davis, Bur; Fowler, George; Paine, Richard B.; Paperno, Slava; Simonsen, Keld J.; Thobe, Glenn E.; Vulis, Dimitri; van Wingen, Johan W. (1993-03-13). "CYRILLIC ENCODING FAQ Version 1.3". 1.3. Archived from the original on 2017-02-18. Retrieved 2020-06-24.
 
External links
- Czyborra, Roman (1998-11-30) [1998-05-25]. "The Cyrillic Charset Soup". Archived from the original on 2016-12-03. Retrieved 2016-12-03.
- Hohlov, Yu. E. "Cyrillic Information Representation in Electronic Form - Character Set (Code Page) Tables". Archived from the original on 2016-12-05. Retrieved 2016-12-05.
- Nechayev, Valentin (2013) [2001]. "Review of 8-bit Cyrillic encodings universe". Archived from the original on 2016-12-05. Retrieved 2016-12-05.
- https://web.archive.org/web/20050206230944/http://www.net.ua/KOI8-U/