SMS Gateway: Character Sets and Encodings

What character sets do you use?

Almost all mobile handsets support two character sets for SMS messages: the GSM 03.38 character set and Unicode UCS-2 (using the code points appropriate to the locale).

To use the full GSM 03.38 character set, you can submit messages either in Modified Latin-9 (which covers every GSM character) or directly in GSM 03.38.

For more information on submitting messages to the SMS Gateway via HTTP, please look here.

What is the difference between UCS-2 and UTF-16?

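UCS-2 and UTF-16 encode the characters U+0000 to U+FFFF (the Basic Multilingual Plane) identically. The difference is that UTF-16 can also represent characters above U+FFFF by combining two 16-bit code units (a surrogate pair), which UCS-2 cannot. Every UCS-2 character therefore takes exactly 16 bits, so when parsing a message in your code it is safe to step forward or backward 16 bits per character.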
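A minimal Python sketch of that parsing rule, assuming the payload arrives as raw big-endian UCS-2 bytes (the byte order and the function name `ucs2_chars` are assumptions for illustration, not part of the gateway's interface). It steps through the payload two bytes at a time and rejects surrogate code units, which can only appear in UTF-16:

```python
def ucs2_chars(payload: bytes) -> list:
    """Split a UCS-2 payload into characters, stepping 16 bits (2 bytes) at a time.

    Assumes big-endian byte order; adjust if your payloads are little-endian.
    """
    if len(payload) % 2 != 0:
        raise ValueError("UCS-2 payload length must be a multiple of 2 bytes")
    chars = []
    for offset in range(0, len(payload), 2):
        unit = int.from_bytes(payload[offset:offset + 2], "big")
        if 0xD800 <= unit <= 0xDFFF:
            # Surrogates pair up to encode characters above U+FFFF in UTF-16;
            # UCS-2 has no such characters, so the payload is not pure UCS-2.
            raise ValueError(f"surrogate 0x{unit:04X} at byte {offset}: UTF-16, not UCS-2")
        chars.append(chr(unit))
    return chars
```

Because every character is exactly one 16-bit unit, the character count is simply `len(payload) // 2`; with UTF-16 that shortcut no longer holds.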

What is the GSM 03.38 Character Set?

The GSM 03.38 Character Set
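     0    1    2    3    4    5    6    7
0    @    Δ    SP   0    ¡    P    ¿    p
1    £    _    !    1    A    Q    a    q
2    $    Φ    "    2    B    R    b    r
3    ¥    Γ    #    3    C    S    c    s
4    è    Λ    ¤    4    D    T    d    t
5    é    Ω    %    5    E    U    e    u
6    ù    Π    &    6    F    V    f    v
7    ì    Ψ    '    7    G    W    g    w
8    ò    Σ    (    8    H    X    h    x
9    Ç    Θ    )    9    I    Y    i    y
A    LF   Ξ    *    :    J    Z    j    z
B    Ø    ESC  +    ;    K    Ä    k    ä
C    ø    Æ    ,    <    L    Ö    l    ö
D    CR   æ    -    =    M    Ñ    m    ñ
E    Å    ß    .    >    N    Ü    n    ü
F    å    É    /    ?    O    §    o    à

Columns give the high hex digit of a character's code and rows the low digit, so Δ is 0x10 and ¤ is 0x24. SP is the space, LF line feed, CR carriage return, and ESC (0x1B) the escape code used for the escaped characters below.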
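The table can be turned into a lookup structure. The Python sketch below is a hypothetical helper (the names `GSM_BASIC`, `GSM_CODE`, `gsm_code` and `gsm_char` are illustrative) that stores the 128 characters in code order, so a character's position in the string is its 7-bit code:

```python
# GSM 03.38 default alphabet, one string whose index is the character's code.
# 0x1B (ESC) is only a placeholder; it introduces the escaped characters.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"      # 0x00-0x0F (column 0)
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"     # 0x10-0x1F (column 1)
    " !\"#¤%&'()*+,-./"       # 0x20-0x2F
    "0123456789:;<=>?"        # 0x30-0x3F
    "¡ABCDEFGHIJKLMNO"        # 0x40-0x4F
    "PQRSTUVWXYZÄÖÑÜ§"        # 0x50-0x5F
    "¿abcdefghijklmno"        # 0x60-0x6F
    "pqrstuvwxyzäöñüà"        # 0x70-0x7F
)
assert len(GSM_BASIC) == 128

# Reverse lookup (character -> code). ESC is left out because it is not a
# printable character in its own right.
GSM_CODE = {ch: code for code, ch in enumerate(GSM_BASIC) if code != 0x1B}


def gsm_code(ch: str) -> int:
    """Return the code (column * 16 + row) of a basic GSM character."""
    return GSM_CODE[ch]  # raises KeyError if ch is not in the basic table


def gsm_char(code: int) -> str:
    """Return the basic GSM character stored at a 7-bit code."""
    if not 0 <= code <= 0x7F or code == 0x1B:
        raise ValueError(f"0x{code:02X} is not a printable basic GSM code")
    return GSM_BASIC[code]
```

For example, `gsm_code("Δ")` gives 0x10 and `gsm_char(0x24)` gives "¤". Escaped characters are deliberately not covered here, since they need the ESC prefix.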

GSM 03.38 Escaped Characters
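Character   Escape sequence   Hex
€           ESC e             1B 65
FF          ESC LF            1B 0A
[           ESC <             1B 3C
\           ESC /             1B 2F
]           ESC >             1B 3E
^           ESC Λ             1B 14
{           ESC (             1B 28
|           ESC ¡             1B 40
}           ESC )             1B 29
~           ESC =             1B 3D

FF is the form feed character. Each escaped character is sent as two 7-bit codes, ESC (0x1B) followed by the code shown, so it counts as two characters towards the message length.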
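Putting both tables together, a sender can check whether a message fits in GSM 03.38 and how many septets it needs, falling back to UCS-2 otherwise. This Python sketch is illustrative only: it produces unpacked septet values (no 7-bit packing), and `to_gsm_septets` and `choose_encoding` are hypothetical names, not gateway calls.

```python
from typing import List, Optional

# GSM 03.38 default alphabet in code order (index = 7-bit code).
# 0x1B is ESC, kept only as a placeholder.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)
ESC = 0x1B

# Escaped characters: character -> code sent after ESC (from the escape table).
GSM_ESCAPED = {
    "€": 0x65, "\f": 0x0A, "[": 0x3C, "\\": 0x2F, "]": 0x3E,
    "^": 0x14, "{": 0x28, "|": 0x40, "}": 0x29, "~": 0x3D,
}


def to_gsm_septets(text: str) -> Optional[List[int]]:
    """Return unpacked GSM 03.38 septet values for text, or None if any
    character cannot be represented in GSM 03.38."""
    septets: List[int] = []
    for ch in text:
        if ch in GSM_ESCAPED:
            septets += [ESC, GSM_ESCAPED[ch]]  # escaped: two septets
            continue
        code = GSM_BASIC.find(ch)
        if code < 0 or code == ESC:  # not in the table, or a bare ESC
            return None
        septets.append(code)
    return septets


def choose_encoding(text: str) -> Optional[str]:
    """Prefer GSM 03.38, fall back to UCS-2, or return None if neither fits."""
    if to_gsm_septets(text) is not None:
        return "GSM 03.38"
    if all(ord(ch) <= 0xFFFF for ch in text):
        return "UCS-2"
    return None  # characters above U+FFFF fit in neither character set
```

For example, `to_gsm_septets("€5")` returns `[0x1B, 0x65, 0x35]`: three septets for two visible characters.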

What is the Modified Latin-9 Character Set?

This character set is based on ISO-8859-15 (Latin-9). However, to cover the full GSM character set, it adds ten Greek letters in the range 0x80 to 0x8A and uses a different character at 0xA8 (where ISO-8859-15 has š).

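Characters that are not present in the GSM character set are marked with an asterisk (*).

Modified Latin-9
    0     1     2     3     4     5     6     7     8     9     A     B     C     D     E     F
0               SP    0     @     P     `*    p     Δ           NBSP* °*    À*    Ð*    à     ð*
1               !     1     A     Q     a     q                 ¡     ±*    Á*    Ñ     á*    ñ
2               "     2     B     R     b     r     Φ           ¢*    ²*    Â*    Ò*    â*    ò
3               #     3     C     S     c     s     Γ           £     ³*    Ã*    Ó*    ã*    ó*
4               $     4     D     T     d     t     Λ           €     Ž*    Ä     Ô*    ä     ô*
5               %     5     E     U     e     u     Ω           ¥     µ*    Å     Õ*    å     õ*
6               &     6     F     V     f     v     Π           Š*    ¶*    Æ     Ö     æ     ö
7               '     7     G     W     g     w     Ψ           §     ·*    Ç     ×*    ç*    ÷*
8               (     8     H     X     h     x     Σ           ¤     ž*    È*    Ø     è     ø
9               )     9     I     Y     i     y     Θ           ©*    ¹*    É     Ù*    é     ù
A   LF          *     :     J     Z     j     z     Ξ           ª*    º*    Ê*    Ú*    ê*    ú*
B               +     ;     K     [     k     {                 «*    »*    Ë*    Û*    ë*    û*
C   FF          ,     <     L     \     l     |                 ¬*    Œ*    Ì*    Ü     ì     ü
D   CR          -     =     M     ]     m     }                 SHY*  œ*    Í*    Ý*    í*    ý*
E               .     >     N     ^     n     ~                 ®*    Ÿ*    Î*    Þ*    î*    þ*
F               /     ?     O     _     o                       ¯*    ¿     Ï*    ß     ï*    ÿ*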
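A hedged Python sketch of converting between Modified Latin-9 bytes and text: it starts from Python's standard `iso8859_15` codec and overlays the Greek letters at 0x80 and 0x82 to 0x8A. The character at 0xA8 is an assumption: ¤ is used because it is the one GSM 03.38 character, apart from the Greek letters, that ISO-8859-15 lacks; confirm it against the gateway before relying on it. The table and function names are hypothetical.

```python
# Overrides on top of ISO-8859-15 (byte value -> character).
# ASSUMPTION: 0xA8 is taken to be "¤"; confirm against the gateway.
MODIFIED_LATIN9_OVERRIDES = {
    0x80: "Δ", 0x82: "Φ", 0x83: "Γ", 0x84: "Λ", 0x85: "Ω",
    0x86: "Π", 0x87: "Ψ", 0x88: "Σ", 0x89: "Θ", 0x8A: "Ξ",
    0xA8: "¤",
}

# Full 256-entry decode table: start from the iso8859_15 codec, then overlay.
DECODE_TABLE = [bytes([b]).decode("iso8859_15") for b in range(256)]
for byte_value, char in MODIFIED_LATIN9_OVERRIDES.items():
    DECODE_TABLE[byte_value] = char

# Inverse table for encoding. "š" and the control characters that used to sit
# at 0x80 and 0x82-0x8A are no longer present, so they cannot be encoded.
ENCODE_TABLE = {char: byte_value for byte_value, char in enumerate(DECODE_TABLE)}


def decode_modified_latin9(data: bytes) -> str:
    """Decode Modified Latin-9 bytes to text."""
    return "".join(DECODE_TABLE[b] for b in data)


def encode_modified_latin9(text: str) -> bytes:
    """Encode text as Modified Latin-9, raising ValueError for unsupported characters."""
    out = bytearray()
    for ch in text:
        if ch not in ENCODE_TABLE:
            raise ValueError(f"{ch!r} has no Modified Latin-9 code")
        out.append(ENCODE_TABLE[ch])
    return bytes(out)
```

Building the whole 256-entry table up front keeps encoding and decoding symmetric, so a character displaced by an override (such as š) is rejected instead of being silently sent as the wrong byte.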