2013-11-06 11:17:51 +00:00
---
layout: post
title: Unicode codepoints in ruby
date: 06.11.2013 12:04
---
Another post in the category "better write it down before you forget it".

I ❤ Unicode. At least most of the time. That's why I have things like ✓, ✗ and
ツ mapped directly on my keyboard.

But sometimes you need not only the symbol itself but its codepoint as well. That's easy in Ruby:

~~~ruby
irb> "❤".codepoints
=> [10084]
~~~
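
If a string holds several symbols, the same call returns one integer per character. A small sketch (the variable names are only for illustration) that also formats the result in the familiar `U+XXXX` notation:

~~~ruby
symbols = "✓✗ツ"

# One integer per character; .to_a keeps this an Array on Ruby 1.9 as well.
codepoints = symbols.codepoints.to_a
# => [10003, 10007, 12484]

# Format each codepoint in the usual U+XXXX notation.
labels = codepoints.map { |cp| format("U+%04X", cp) }
# => ["U+2713", "U+2717", "U+30C4"]

# For a single character, String#ord gives the codepoint directly.
heart = "❤".ord
# => 10084
~~~
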
Got some codepoints and need to map them back to their symbols? Easy:

~~~ruby
irb> [10084, 10003].pack("U*")
=> "❤✓"
~~~
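
`pack` is not the only way back. As a rough sketch (variable names are made up for the example), `Integer#chr` with an explicit encoding handles a single codepoint, and `String#unpack` with the same `"U*"` directive reverses the conversion:

~~~ruby
codepoints = [10084, 10003]

# Array#pack with "U*" builds a UTF-8 string from all codepoints at once.
packed = codepoints.pack("U*")
# => "❤✓"

# Integer#chr with an explicit encoding converts a single codepoint.
single = 10084.chr(Encoding::UTF_8)
# => "❤"

# String#unpack with the same directive goes back to the integers.
round_trip = packed.unpack("U*")
# => [10084, 10003]
~~~
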
Oh, of course the usual `\uXXXX` escape syntax works as well, but you need the hex string for that:

~~~ruby
irb> 10084.to_s(16)
=> "2764"
irb> "\u{2764}"
=> "❤"
~~~
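
A few variations on the escape syntax, as a sketch with illustrative variable names. The escapes only work in string literals, so for a hex string that arrives at runtime the last line falls back to `hex` plus `pack`:

~~~ruby
# String#hex parses a hex string back into the integer codepoint.
codepoint = "2764".hex
# => 10084

# Exactly four hex digits work without braces.
short_form = "\u2764"
# => "❤"

# Braces accept several space-separated codepoints ...
pair = "\u{2764 2713}"
# => "❤✓"

# ... and are required for codepoints with more than four hex digits.
smiley = "\u{1F600}"
smiley_codepoints = smiley.codepoints.to_a
# => [128512]

# A hex string known only at runtime: convert to an integer, then pack.
from_hex = ["2764".hex].pack("U")
# => "❤"
~~~
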
Sometimes you may need to see the actual bytes. This is easy in Ruby as well:

~~~ruby
irb> "❤".bytes
=> [226, 157, 164]
~~~
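
Those three bytes are the UTF-8 encoding of the heart. The following sketch assumes a UTF-8 source file (the default since Ruby 2.0) and uses made-up variable names. It shows the difference between character count and byte count, and the way back from raw bytes to the string:

~~~ruby
heart = "❤"

# One character, but three bytes in UTF-8.
char_count = heart.length    # => 1
byte_count = heart.bytesize  # => 3

# The same bytes in hex notation.
hex_bytes = heart.bytes.map { |b| format("%02X", b) }
# => ["E2", "9D", "A4"]

# Raw bytes back to a string: pack them, then tag the result as UTF-8.
restored = [226, 157, 164].pack("C*").force_encoding("UTF-8")
# => "❤"
~~~
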
There is documentation on these things:

* [each_codepoint][]
* [codepoints][]
* [bytes][]

Enjoy the world of Unicode! [❤][unicode-heart]

[each_codepoint]: http://www.ruby-doc.org/core-2.0.0/String.html#method-i-each_codepoint
[codepoints]: http://www.ruby-doc.org/core-2.0.0/String.html#method-i-codepoints
[bytes]: http://www.ruby-doc.org/core-2.0.0/String.html#method-i-bytes
[unicode-heart]: http://codepoints.net/U+2764