Character Sets and Encodings
Contents
- What character sets do you use? [6]
- What is the difference between UCS-2 and UTF-16? [25]
- What is the GSM 03.38 Character Set? [7]
- What is the Modified Latin-9 Character Set? [14]
What character sets do you use? [6]
Almost all mobile handsets support two character sets for SMS messages: the GSM 03.38 character set and Unicode UCS-2 (with code points appropriate to the locale).
To cover the GSM 03.38 character set, you can submit messages either in Modified Latin-9 or directly in GSM 03.38.
For more information on submitting messages to the SMS Gateway via HTTP, please look here.
What is the difference between UCS-2 and UTF-16? [25]
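UCS-2 and UTF-16 are identical for characters in the Basic Multilingual Plane (U+0000 to U+FFFF). UCS-2 is fixed-width: every character takes exactly 16 bits, so it is safe to step forwards or backwards 16 bits at a time when parsing a message in your code. UTF-16 additionally encodes characters above U+FFFF as a pair of 16-bit units (a surrogate pair), which UCS-2 cannot represent.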
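As a rough illustration of the fixed-width point, the following Python sketch checks whether a string can be represented in UCS-2 and, if so, encodes it. The function names are hypothetical, and big-endian byte order is an assumption made for the example rather than something this page specifies.

```python
def utf16_units(text: str) -> int:
    """Count the 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-be")) // 2


def fits_ucs2(text: str) -> bool:
    """True if every character fits in one 16-bit unit (no surrogates needed)."""
    return all(
        ord(ch) <= 0xFFFF and not 0xD800 <= ord(ch) <= 0xDFFF
        for ch in text
    )


def encode_ucs2(text: str) -> bytes:
    """Encode text as UCS-2 (assumed big-endian), rejecting non-BMP characters."""
    if not fits_ucs2(text):
        raise ValueError("text contains characters outside the UCS-2 range")
    # For BMP-only text, UTF-16 and UCS-2 produce identical bytes.
    return text.encode("utf-16-be")
```

For example, `encode_ucs2("Hi")` returns `b"\x00H\x00i"` (two bytes per character), whereas an emoji such as U+1F600 needs two UTF-16 units and is rejected by `encode_ucs2`.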
What is the GSM 03.38 Character Set? [7]
The GSM 03.38 Character Set

| | 0× | 1× | 2× | 3× | 4× | 5× | 6× | 7× |
|---|---|---|---|---|---|---|---|---|
| ×0 | @ | Δ | SP | 0 | ¡ | P | ¿ | p |
| ×1 | £ | _ | ! | 1 | A | Q | a | q |
| ×2 | $ | Φ | " | 2 | B | R | b | r |
| ×3 | ¥ | Γ | # | 3 | C | S | c | s |
| ×4 | è | Λ | ¤ | 4 | D | T | d | t |
| ×5 | é | Ω | % | 5 | E | U | e | u |
| ×6 | ù | Π | & | 6 | F | V | f | v |
| ×7 | ì | Ψ | ' | 7 | G | W | g | w |
| ×8 | ò | Σ | ( | 8 | H | X | h | x |
| ×9 | Ç | Θ | ) | 9 | I | Y | i | y |
| ×A | LF | Ξ | * | : | J | Z | j | z |
| ×B | Ø | ESC | + | ; | K | Ä | k | ä |
| ×C | ø | Æ | , | < | L | Ö | l | ö |
| ×D | CR | æ | - | = | M | Ñ | m | ñ |
| ×E | Å | ß | . | > | N | Ü | n | ü |
| ×F | å | É | / | ? | O | § | o | à |
GSM 03.38 Escaped Characters

| Character | Escape Sequence | Hex |
|---|---|---|
| € | ESC e | 1B 65 |
| FF | ESC LF | 1B 0A |
| [ | ESC < | 1B 3C |
| \ | ESC / | 1B 2F |
| ] | ESC > | 1B 3E |
| ^ | ESC Λ | 1B 14 |
| { | ESC ( | 1B 28 |
| \| | ESC ¡ | 1B 40 |
| } | ESC ) | 1B 29 |
| ~ | ESC = | 1B 3D |
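The two tables above can be turned into a small encoder. The following Python sketch is illustrative only: `GSM_BASIC`, `GSM_ESCAPED` and `gsm_encode` are hypothetical names, and the output is one GSM code per byte (not packed into 7-bit septets), which is an assumption of this example.

```python
# Basic table, in code order 0x00..0x7F (one 16-character string per column).
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)

# Escaped characters: each is sent as ESC (0x1B) followed by this code.
GSM_ESCAPED = {
    "€": 0x65, "\f": 0x0A, "[": 0x3C, "\\": 0x2F, "]": 0x3E,
    "^": 0x14, "{": 0x28, "|": 0x40, "}": 0x29, "~": 0x3D,
}

# Reverse lookup for the basic table; ESC itself is not a printable character.
GSM_CODES = {ch: i for i, ch in enumerate(GSM_BASIC) if ch != "\x1b"}


def gsm_encode(text: str) -> bytes:
    """Encode text as unpacked GSM 03.38 codes, one code per byte."""
    out = bytearray()
    for ch in text:
        if ch in GSM_ESCAPED:
            out += bytes([0x1B, GSM_ESCAPED[ch]])
        elif ch in GSM_CODES:
            out.append(GSM_CODES[ch])
        else:
            raise ValueError(f"{ch!r} is not in the GSM 03.38 character set")
    return bytes(out)
```

Note that an escaped character occupies two codes, so `gsm_encode("€")` returns `b"\x1be"` (hex 1B 65). A `ValueError` from this function is one way to detect that a message would need to be sent as UCS-2 instead.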
What is the Modified Latin-9 Character Set? [14]
This character set is based heavily on ISO-8859-15 (Latin-9). However, to fully cover the GSM character set, it adds some Greek letters between 0x80 and 0x8A and has a different character at 0xA8 (¤ in place of Latin-9's š).
The characters not present in the GSM character set are shown on a grey background.
Modified Latin-9

| | 0× | 1× | 2× | 3× | 4× | 5× | 6× | 7× | 8× | 9× | A× | B× | C× | D× | E× | F× |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ×0 | | | SP | 0 | @ | P | ` | p | Δ | | NBSP | ° | À | Ð | à | ð |
| ×1 | | | ! | 1 | A | Q | a | q | | | ¡ | ± | Á | Ñ | á | ñ |
| ×2 | | | " | 2 | B | R | b | r | Φ | | ¢ | ² | Â | Ò | â | ò |
| ×3 | | | # | 3 | C | S | c | s | Γ | | £ | ³ | Ã | Ó | ã | ó |
| ×4 | | | $ | 4 | D | T | d | t | Λ | | € | Ž | Ä | Ô | ä | ô |
| ×5 | | | % | 5 | E | U | e | u | Ω | | ¥ | µ | Å | Õ | å | õ |
| ×6 | | | & | 6 | F | V | f | v | Π | | Š | ¶ | Æ | Ö | æ | ö |
| ×7 | | | ' | 7 | G | W | g | w | Ψ | | § | · | Ç | × | ç | ÷ |
| ×8 | | | ( | 8 | H | X | h | x | Σ | | ¤ | ž | È | Ø | è | ø |
| ×9 | | | ) | 9 | I | Y | i | y | Θ | | © | ¹ | É | Ù | é | ù |
| ×A | LF | | * | : | J | Z | j | z | Ξ | | ª | º | Ê | Ú | ê | ú |
| ×B | | | + | ; | K | [ | k | { | | | « | » | Ë | Û | ë | û |
| ×C | FF | | , | < | L | \ | l | \| | | | ¬ | Œ | Ì | Ü | ì | ü |
| ×D | CR | | - | = | M | ] | m | } | | | SHY | œ | Í | Ý | í | ý |
| ×E | | | . | > | N | ^ | n | ~ | | | ® | Ÿ | Î | Þ | î | þ |
| ×F | | | / | ? | O | _ | o | | | | ¯ | ¿ | Ï | ß | ï | ÿ |
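Because Modified Latin-9 differs from ISO-8859-15 only in the Greek letters at 0x80 to 0x8A and the character at 0xA8, a converter can be sketched as standard Latin-9 plus a small override table. The Python below is a minimal illustration with hypothetical names; it assumes Python's built-in `iso8859_15` codec for everything else, and it does not treat the positions left empty in the table specially (on decoding they fall through to whatever Latin-9 defines there).

```python
# Positions where Modified Latin-9 differs from ISO-8859-15 (Latin-9).
OVERRIDES = {
    0x80: "Δ", 0x82: "Φ", 0x83: "Γ", 0x84: "Λ", 0x85: "Ω",
    0x86: "Π", 0x87: "Ψ", 0x88: "Σ", 0x89: "Θ", 0x8A: "Ξ",
    0xA8: "¤",
}
REVERSE = {ch: code for code, ch in OVERRIDES.items()}


def decode_modified_latin9(data: bytes) -> str:
    """Decode bytes as Modified Latin-9: overrides first, then Latin-9."""
    return "".join(
        OVERRIDES[b] if b in OVERRIDES else bytes([b]).decode("iso8859_15")
        for b in data
    )


def encode_modified_latin9(text: str) -> bytes:
    """Encode text as Modified Latin-9, rejecting unrepresentable characters."""
    out = bytearray()
    for ch in text:
        if ch in REVERSE:
            out.append(REVERSE[ch])
            continue
        # Raises UnicodeEncodeError (a ValueError) if ch is not in Latin-9.
        code = ch.encode("iso8859_15")[0]
        if code in OVERRIDES:
            # Latin-9 would place ch at a position that now means something else.
            raise ValueError(f"{ch!r} is not in Modified Latin-9")
        out.append(code)
    return bytes(out)
```

With these definitions, `decode_modified_latin9(bytes([0x80, 0xA8, 0xA4]))` gives `"Δ¤€"`, while `encode_modified_latin9("š")` raises `ValueError` because 0xA8 no longer holds š.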

