I'm currently recharging my batteries on the beach and soaking up the sun. But as much as I love the quiet, I can't quite ignore the call of coding, can I? So, I thought, why not blend the relaxation of summer with the thrill of exploring new coding concepts?
With that in mind, I'm thrilled to kick off a new short vacation series of newsletters - Sea (#), Sun, and Shells - that will combine the best of both worlds. We will explore fun and interesting topics, each with a unique summer twist!
Let's start with something we're seeing a lot of these days: the sun, or more specifically, the sun emoji ☀️.
Ever wondered how we store and use this emoji in our C# programs? That's what we're going to learn today - diving deep into the world of Unicode and the .NET `System.Text.Rune` structure.
Unicode is a universal character encoding standard that covers almost all of the world's written languages. Its code space has room for 1,114,112 code points (U+0000 to U+10FFFF). That is enough for letters from many scripts as well as symbols and emojis, like our sun emoji ☀️.
In the world of .NET, we have the `System.Text.Rune` structure, introduced in .NET Core 3.0. It represents a Unicode scalar value: any code point in the ranges [U+0000..U+D7FF] or [U+E000..U+10FFFF], inclusive. That means every code point except the surrogate range U+D800..U+DFFF. A `Rune` is what you reach for when you want to handle a single Unicode scalar value, such as the sun emoji.
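The sketch below shows those two ranges in action. It is written as a small top-level C# program (C# 9 or later, with .NET Core 3.0+ for `Rune`). It probes the boundary values with `Rune.IsValid` and shows that the constructor rejects a surrogate code point. The variable names are just for illustration.

```csharp
using System;
using System.Text;

// Probe the edges of the two ranges that Rune accepts.
bool lastBeforeSurrogates = Rune.IsValid(0xD7FF); // end of the first range
bool firstSurrogate = Rune.IsValid(0xD800);       // surrogate code points are not scalar values
bool lastSurrogate = Rune.IsValid(0xDFFF);        // end of the surrogate block
bool firstAfterSurrogates = Rune.IsValid(0xE000); // start of the second range
bool maxScalar = Rune.IsValid(0x10FFFF);          // highest Unicode code point
bool beyondUnicode = Rune.IsValid(0x110000);      // outside Unicode altogether

Console.WriteLine($"{lastBeforeSurrogates} {firstSurrogate} {lastSurrogate} {firstAfterSurrogates} {maxScalar} {beyondUnicode}");
// True False False True True False

// The constructor enforces the same rule and throws for anything outside the ranges.
bool constructorThrew = false;
try
{
    _ = new Rune(0xD800);
}
catch (ArgumentOutOfRangeException)
{
    constructorThrew = true;
}
Console.WriteLine(constructorThrew); // True
```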
`System.Char` in .NET represents a single UTF-16 code unit, and a `System.String` is a sequence of these UTF-16 code units. A `char` is 16 bits wide, so it can only hold values up to U+FFFF. Scalar values above that, in the supplementary planes where many emojis live, have to be stored as a surrogate pair of two `char`s. This is where `Rune` comes in: it holds any Unicode scalar value as a single value, whether that value takes one `char` or two.
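Here's a quick sketch of that difference using 🌞 (U+1F31E SUN WITH FACE), a sun emoji that lives outside the BMP. It assumes a top-level C# program (C# 9 or later). It counts scalar values rather than `char`s with `string.EnumerateRunes()` plus LINQ.

```csharp
using System;
using System.Linq;
using System.Text;

// 🌞 (U+1F31E SUN WITH FACE) lies outside the BMP, so UTF-16 stores it as a surrogate pair.
string sunWithFace = "\U0001F31E";

int charCount = sunWithFace.Length; // counts UTF-16 code units (chars), not characters
bool isSurrogatePair = char.IsSurrogatePair(sunWithFace[0], sunWithFace[1]);

// EnumerateRunes walks the string by scalar value instead of by char.
int runeCount = sunWithFace.EnumerateRunes().Count();
Rune first = sunWithFace.EnumerateRunes().First();

Console.WriteLine($"chars={charCount}, surrogatePair={isSurrogatePair}, runes={runeCount}, value=U+{first.Value:X}");
// chars=2, surrogatePair=True, runes=1, value=U+1F31E
```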
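Take the sun emoji ☀️ for instance. Its base character is U+2600 (BLACK SUN WITH RAYS). That value lies in the Basic Multilingual Plane (BMP), so on its own it does fit into a single `char` (which covers U+0000 to U+FFFF). The emoji as you usually see it, ☀️, is actually two scalar values: U+2600 followed by the variation selector U+FE0F. Sun emojis beyond the BMP, such as 🌞 (U+1F31E), don't fit into a `char` at all. `System.Text.Rune` handles every one of them as a single value.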
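The sketch below lets you check this for yourself. It assumes the emoji-style sequence U+2600 U+FE0F and is written as a top-level C# program. It enumerates the runes that make up ☀️ and prints, for each one, whether it is in the BMP and how many `char`s it needs.

```csharp
using System;
using System.Text;

// The fully qualified ☀️ emoji is U+2600 followed by U+FE0F (VARIATION SELECTOR-16).
string sunEmoji = "\u2600\uFE0F";

foreach (Rune r in sunEmoji.EnumerateRunes())
{
    // IsBmp: the value is at most U+FFFF. Utf16SequenceLength: chars needed to store it.
    Console.WriteLine($"U+{r.Value:X4} IsBmp={r.IsBmp} chars={r.Utf16SequenceLength}");
}
// U+2600 IsBmp=True chars=1
// U+FE0F IsBmp=True chars=1

// U+2600 on its own fits into one char, and a Rune can be built straight from that char.
char sunChar = '\u2600';
var sunRune = new Rune(sunChar);
Console.WriteLine(sunEmoji.Length); // 2
```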
Here's how you can declare a `Rune` for the sun:
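```csharp
using System.Text; // Rune lives in the System.Text namespace
var sun = new Rune(0x2600); // U+2600 BLACK SUN WITH RAYS
```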
And to convert it back to a string:
```csharp
var sunString = sun.ToString(); // "☀": a single char, because U+2600 is inside the BMP
```
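Converting a BMP value gives a one-`char` string. The sketch below, written as a top-level C# program with names chosen for illustration, compares that with 🌞 (U+1F31E). It reads the value back out of the string with `Rune.GetRuneAt`. As an aside, it shows the UTF-8 byte counts via `Utf8SequenceLength`.

```csharp
using System;
using System.Text;

var sun = new Rune(0x2600);          // ☀ U+2600, inside the BMP
var sunWithFace = new Rune(0x1F31E); // 🌞 U+1F31E, outside the BMP

string sunText = sun.ToString();
string sunWithFaceText = sunWithFace.ToString();

Console.WriteLine(sunText.Length);         // 1: a single char
Console.WriteLine(sunWithFaceText.Length); // 2: a surrogate pair

// Round trip: read the scalar value back out of the string at index 0.
Rune roundTripped = Rune.GetRuneAt(sunWithFaceText, 0);
Console.WriteLine(roundTripped == sunWithFace); // True

// The same values take a different number of bytes in UTF-8.
Console.WriteLine(sun.Utf8SequenceLength);         // 3
Console.WriteLine(sunWithFace.Utf8SequenceLength); // 4
```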
Under the hood, `System.Text.Rune` provides several key features:
- Handles Unicode Scalar Values: It can represent and manipulate any Unicode scalar value, including those above U+FFFF that a single `char` cannot hold.
- Performs String Iteration and Validation: With methods like `Rune.DecodeFromUtf16`, it can step through a string one scalar value at a time and decode each `Rune`. The method's `OperationStatus` result reports whether the string is valid UTF-16. This is especially important for strings that include Unicode scalar values outside the Basic Multilingual Plane (BMP), where many emojis live.
- Represents Characters as Integers: A `Rune` is essentially an integer holding a Unicode scalar value (exposed through its `Value` property), which is why you can create a new `Rune` by passing an integer to the constructor.
- Converts to Strings: `System.Text.Rune` can convert a `Rune` back to a string using the `ToString` method. It encodes the `Rune` as one or two UTF-16 code units, so you can process it as a regular string.
- Checks Validity: `System.Text.Rune` includes methods to check whether a value is a valid Unicode scalar value. For example, the `IsValid` method determines whether a given integer falls within [U+0000..U+D7FF] or [U+E000..U+10FFFF].
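To tie the iteration, validation, and integer points together, here's a minimal sketch of decoding a string with `Rune.DecodeFromUtf16` in a top-level C# program. `DecodeAll` is a hypothetical helper written for this newsletter, not part of .NET. It stops at the first ill-formed sequence instead of substituting anything.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not a .NET API): decode a whole string into Runes with
// Rune.DecodeFromUtf16 and report whether the input was well-formed UTF-16.
static List<Rune> DecodeAll(string text, out bool isWellFormed)
{
    var runes = new List<Rune>();
    ReadOnlySpan<char> remaining = text.AsSpan();
    isWellFormed = true;

    while (!remaining.IsEmpty)
    {
        OperationStatus status = Rune.DecodeFromUtf16(remaining, out Rune rune, out int charsConsumed);
        if (status != OperationStatus.Done)
        {
            // InvalidData (e.g. a lone low surrogate) or NeedMoreData (a high surrogate at the end).
            isWellFormed = false;
            break;
        }
        runes.Add(rune);
        remaining = remaining.Slice(charsConsumed); // advance by 1 or 2 chars
    }
    return runes;
}

List<Rune> good = DecodeAll("Sun \U0001F31E and sea", out bool goodIsWellFormed);
List<Rune> broken = DecodeAll("Sun \uD83C", out bool brokenIsWellFormed); // lone high surrogate

Console.WriteLine($"{good.Count} {goodIsWellFormed}");     // 13 True
Console.WriteLine($"{broken.Count} {brokenIsWellFormed}"); // 4 False

// Each Rune is an integer under the hood: print the emoji's scalar value.
Console.WriteLine($"U+{good[4].Value:X}"); // U+1F31E
```

If you just want to loop over a string, `string.EnumerateRunes()` does a similar walk for you. It substitutes `Rune.ReplacementChar` (U+FFFD) for ill-formed sequences rather than reporting them.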
This suns it up for today! (pun intended, indeed)
Whether you're currently on vacation, looking forward to one, or reminiscing about a recent one, I hope this newsletter adds a dash of sunny coding to your day ☀️.
Stay tuned for the next issue of "Sea (#), Sun, and Shells", and remember - keep coding, even on the beach!