Character Sets and Encodings
Contents
- What character sets do you use? [6]
- What is the difference between UCS-2 and UTF-16? [25]
- What is the GSM 03.38 Character Set? [7]
- What is the Modified Latin-9 Character Set? [14]
What character sets do you use? [6]
Almost all mobile handsets support two character sets for SMS messages: the GSM 03.38 character set and Unicode UCS-2 (with code points appropriate to the locale).
To cover the GSM 03.38 character set, you can submit messages either in Modified Latin-9 or directly in GSM 03.38.
For more information on submitting messages to the SMS Gateway via HTTP, please look here.
What is the difference between UCS-2 and UTF-16? [25]
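UCS-2 and UTF-16 are identical for code points U+0000 to U+FFFF (the Basic Multilingual Plane). The difference is that UCS-2 is a fixed-width encoding in which every character takes exactly 16 bits, whereas UTF-16 encodes characters above U+FFFF (most emoji, for example) as a surrogate pair of two 16-bit units. When a message is in UCS-2, it is therefore safe to step forwards and backwards 16 bits at a time when parsing it in your code.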
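The difference can be illustrated with a minimal Python sketch. The function names `encode_ucs2` and `utf16_code_units` are hypothetical, and big-endian byte order is an assumption made for the example rather than something this page specifies.

```python
def encode_ucs2(text: str) -> bytes:
    """Encode text as big-endian UCS-2 (byte order is an assumption).

    Characters above U+FFFF have no UCS-2 form, so they are rejected
    instead of being silently turned into UTF-16 surrogate pairs.
    """
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):04X} cannot be represented in UCS-2")
    # Inside the Basic Multilingual Plane, UTF-16 and UCS-2 produce the same bytes.
    return text.encode("utf-16-be")


def utf16_code_units(text: str) -> int:
    """Return how many 16-bit code units the text occupies in UTF-16."""
    return len(text.encode("utf-16-be")) // 2


print(encode_ucs2("Ωmega").hex())  # 03a9006d006500670061
print(utf16_code_units("Ω"))       # 1
print(utf16_code_units("😀"))       # 2 (a surrogate pair)
```

Because `encode_ucs2` refuses anything outside the Basic Multilingual Plane, every character in its output occupies exactly two bytes, which is what makes fixed 16-bit stepping safe.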
What is the GSM 03.38 Character Set? [7]
The GSM 03.38 Character Set

| | 0× | 1× | 2× | 3× | 4× | 5× | 6× | 7× |
---|---|---|---|---|---|---|---|---|
×0 | @ | Δ | SP | 0 | ¡ | P | ¿ | p |
×1 | £ | _ | ! | 1 | A | Q | a | q |
×2 | $ | Φ | " | 2 | B | R | b | r |
×3 | ¥ | Γ | # | 3 | C | S | c | s |
×4 | è | Λ | ¤ | 4 | D | T | d | t |
×5 | é | Ω | % | 5 | E | U | e | u |
×6 | ù | Π | & | 6 | F | V | f | v |
×7 | ì | Ψ | ' | 7 | G | W | g | w |
×8 | ò | Σ | ( | 8 | H | X | h | x |
×9 | Ç | Θ | ) | 9 | I | Y | i | y |
×A | LF | Ξ | * | : | J | Z | j | z |
×B | Ø | ESC | + | ; | K | Ä | k | ä |
×C | ø | Æ | , | < | L | Ö | l | ö |
×D | CR | æ | - | = | M | Ñ | m | ñ |
×E | Å | ß | . | > | N | Ü | n | ü |
×F | å | É | / | ? | O | § | o | à |
GSM 03.38 Escaped Characters

Character | Escape Sequence | Hex |
---|---|---|
€ | ESC e | 1B 65 |
FF | ESC LF | 1B 0A |
[ | ESC < | 1B 3C |
\ | ESC / | 1B 2F |
] | ESC > | 1B 3E |
^ | ESC Λ | 1B 14 |
{ | ESC ( | 1B 28 |
\| | ESC ¡ | 1B 40 |
} | ESC ) | 1B 29 |
~ | ESC = | 1B 3D |
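
The two tables above can be combined into a small encoder. The following Python sketch is illustrative only: the names `GSM_BASIC`, `GSM_EXTENDED` and `encode_gsm` are hypothetical, and it returns one byte per septet (unpacked) rather than the 7-bit packed form used on the air interface.

```python
# Basic table: the string index of each character is its GSM 03.38 code.
# Index 0x1B holds a placeholder for ESC, which is not a character itself.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)

ESC = 0x1B

# Escaped characters: character -> code that follows ESC.
GSM_EXTENDED = {
    "€": 0x65, "\f": 0x0A, "[": 0x3C, "\\": 0x2F, "]": 0x3E,
    "^": 0x14, "{": 0x28, "|": 0x40, "}": 0x29, "~": 0x3D,
}


def encode_gsm(text: str) -> bytes:
    """Encode text as unpacked GSM 03.38 septets, one byte per septet.

    Raises ValueError for characters outside both tables.
    """
    out = bytearray()
    for ch in text:
        if ch in GSM_EXTENDED:
            # Escaped characters cost two septets: ESC plus the code.
            out += bytes([ESC, GSM_EXTENDED[ch]])
            continue
        code = GSM_BASIC.find(ch)
        if code < 0 or code == ESC:
            raise ValueError(f"{ch!r} is not in the GSM 03.38 character set")
        out.append(code)
    return bytes(out)


print(encode_gsm("{€5}").hex())  # 1b281b65351b29
```

Note that the four-character message in the example occupies seven septets, because each escaped character counts twice towards the message length.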
What is the Modified Latin-9 Character Set? [14]
This character set is based heavily on ISO-8859-15 (Latin-9). However, to fully cover the GSM character set, it adds some Greek letters between 0x80 and 0x8A and replaces the character at 0xA8 (š in Latin-9) with ¤.
The characters not present in the GSM character set are shown on a grey background.
Modified Latin-9

| | 0× | 1× | 2× | 3× | 4× | 5× | 6× | 7× | 8× | 9× | A× | B× | C× | D× | E× | F× |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
×0 | | | SP | 0 | @ | P | ` | p | Δ | | NBSP | ° | À | Ð | à | ð |
×1 | | | ! | 1 | A | Q | a | q | | | ¡ | ± | Á | Ñ | á | ñ |
×2 | | | " | 2 | B | R | b | r | Φ | | ¢ | ² | Â | Ò | â | ò |
×3 | | | # | 3 | C | S | c | s | Γ | | £ | ³ | Ã | Ó | ã | ó |
×4 | | | $ | 4 | D | T | d | t | Λ | | € | Ž | Ä | Ô | ä | ô |
×5 | | | % | 5 | E | U | e | u | Ω | | ¥ | µ | Å | Õ | å | õ |
×6 | | | & | 6 | F | V | f | v | Π | | Š | ¶ | Æ | Ö | æ | ö |
×7 | | | ' | 7 | G | W | g | w | Ψ | | § | · | Ç | × | ç | ÷ |
×8 | | | ( | 8 | H | X | h | x | Σ | | ¤ | ž | È | Ø | è | ø |
×9 | | | ) | 9 | I | Y | i | y | Θ | | © | ¹ | É | Ù | é | ù |
×A | LF | | * | : | J | Z | j | z | Ξ | | ª | º | Ê | Ú | ê | ú |
×B | | | + | ; | K | [ | k | { | | | « | » | Ë | Û | ë | û |
×C | FF | | , | < | L | \ | l | \| | | | ¬ | Œ | Ì | Ü | ì | ü |
×D | CR | | - | = | M | ] | m | } | | | SHY | œ | Í | Ý | í | ý |
×E | | | . | > | N | ^ | n | ~ | | | ® | Ÿ | Î | Þ | î | þ |
×F | | | / | ? | O | _ | o | | | | ¯ | ¿ | Ï | ß | ï | ÿ |
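
Since Modified Latin-9 differs from ISO-8859-15 in only a few positions, one way to handle it is to patch Python's standard `iso8859_15` codec. The sketch below is an illustration under assumptions: the Greek letters are taken to sit at 0x80 to 0x8A in the order of the GSM table's 1× column with 0x81 unassigned, and the names `OVERRIDES`, `decode_modified_latin9` and `encode_modified_latin9` are hypothetical.

```python
# Positions where Modified Latin-9 is assumed to differ from ISO-8859-15.
OVERRIDES = {
    0x80: "Δ", 0x82: "Φ", 0x83: "Γ", 0x84: "Λ", 0x85: "Ω", 0x86: "Π",
    0x87: "Ψ", 0x88: "Σ", 0x89: "Θ", 0x8A: "Ξ",
    0xA8: "¤",  # ISO-8859-15 has "š" at this position
}
_REVERSE = {ch: code for code, ch in OVERRIDES.items()}


def decode_modified_latin9(data: bytes) -> str:
    """Decode bytes, preferring the overrides over plain ISO-8859-15."""
    return "".join(
        OVERRIDES.get(b) or bytes([b]).decode("iso8859_15") for b in data
    )


def encode_modified_latin9(text: str) -> bytes:
    """Encode text, rejecting characters whose Latin-9 byte was reassigned."""
    out = bytearray()
    for ch in text:
        if ch in _REVERSE:
            out.append(_REVERSE[ch])
            continue
        # Raises UnicodeEncodeError if ISO-8859-15 cannot represent ch.
        byte = ch.encode("iso8859_15")[0]
        if byte in OVERRIDES:
            # For example "š": its Latin-9 byte 0xA8 now means "¤".
            raise ValueError(f"{ch!r} has no Modified Latin-9 encoding")
        out.append(byte)
    return bytes(out)


print(decode_modified_latin9(bytes([0x80, 0x41, 0xA4, 0xA8])))  # ΔA€¤
print(encode_modified_latin9("Ωmega €5").hex())                 # 856d65676120a435
```

Under these assumptions, € keeps its Latin-9 position at 0xA4, while š becomes unrepresentable because its byte is taken by ¤.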