String Class
Definition
Represents text as a sequence of UTF-16 code units.
[System.Runtime.InteropServices.ComVisible(true)]
public sealed class String : ICloneable, IComparable, IComparable<string>, IConvertible, IEquatable<string>, System.Collections.Generic.IEnumerable<char>
- Inheritance
-
String
- Attributes
- Implements
Inherited Members
System.Object
Remarks
Note
To view the .NET Framework source code for this type, see the Reference Source. You can browse through the source code online, download the reference for offline viewing, and step through the sources (including patches and updates) during debugging; see instructions.
A string is a sequential collection of characters that is used to represent text. A String object is a sequential collection of System.Char objects that represent a string; a System.Char object corresponds to a UTF-16 code unit. The value of the String object is the content of the sequential collection of System.Char objects, and that value is immutable (that is, it is read-only). For more information about the immutability of strings, see the Immutability and the StringBuilder class section later in this topic. The maximum size of a String object in memory is 2GB, or about 1 billion characters.
In this section:
Instantiating a String object
Char objects and Unicode characters
Strings and The Unicode Standard
Strings and embedded null characters
Strings and indexes
Null strings and empty strings
Immutability and the StringBuilder class
Ordinal vs. culture-sensitive operations
Normalization
String operations by category
Instantiating a String object
You can instantiate a String object in the following ways:
By assigning a string literal to a String variable. This is the most commonly used method for creating a string. The following example uses assignment to create several strings. Note that in C#, because the backslash (\) is an escape character, literal backslashes in a string must be escaped or the entire string must be @-quoted.
using namespace System; void main() { String^ string1 = "This is a string created by assignment."; Console::WriteLine(string1); String^ string2a = "The path is C:\\PublicDocuments\\Report1.doc"; Console::WriteLine(string2a); } // The example displays the following output: // This is a string created by assignment. // The path is C:\PublicDocuments\Report1.doc
string string1 = "This is a string created by assignment."; Console.WriteLine(string1); string string2a = "The path is C:\\PublicDocuments\\Report1.doc"; Console.WriteLine(string2a); string string2b = @"The path is C:\PublicDocuments\Report1.doc"; Console.WriteLine(string2b); // The example displays the following output: // This is a string created by assignment. // The path is C:\PublicDocuments\Report1.doc // The path is C:\PublicDocuments\Report1.doc
Dim string1 As String = "This is a string created by assignment." Console.WriteLine(string1) Dim string2 As String = "The path is C:\PublicDocuments\Report1.doc" Console.WriteLine(string2) ' The example displays the following output: ' This is a string created by assignment. ' The path is C:\PublicDocuments\Report1.doc
By calling a String class constructor. The following example instantiates strings by calling several class constructors. Note that some of the constructors include pointers to character arrays or signed byte arrays as parameters. Visual Basic does not support calls to these constructors. For detailed information about String constructors, see the String constructor summary.
using namespace System; void main() { wchar_t chars[5] = L"word"; char bytes[6] = { 0x41, 0x42, 0x43, 0x44, 0x45, 0x00 }; // Create a string from a character array. String^ string1 = gcnew String(chars); Console::WriteLine(string1); // Create a string that consists of a character repeated 20 times. String^ string2 = gcnew String('c', 20); Console::WriteLine(string2); String^ stringFromBytes = nullptr; String^ stringFromChars = nullptr; char * pbytes = &bytes[0]; // Create a string from a pointer to a signed byte array. stringFromBytes = gcnew String(pbytes); wchar_t* pchars = &chars[0]; // Create a string from a pointer to a character array. stringFromChars = gcnew String(pchars); Console::WriteLine(stringFromBytes); Console::WriteLine(stringFromChars); Console::ReadLine(); } // The example displays the following output: // word // cccccccccccccccccccc // ABCDE // word
char[] chars = { 'w', 'o', 'r', 'd' }; sbyte[] bytes = { 0x41, 0x42, 0x43, 0x44, 0x45, 0x00 }; // Create a string from a character array. string string1 = new string(chars); Console.WriteLine(string1); // Create a string that consists of a character repeated 20 times. string string2 = new string('c', 20); Console.WriteLine(string2); string stringFromBytes = null; string stringFromChars = null; unsafe { fixed (sbyte* pbytes = bytes) { // Create a string from a pointer to a signed byte array. stringFromBytes = new string(pbytes); } fixed (char* pchars = chars) { // Create a string from a pointer to a character array. stringFromChars = new string(pchars); } } Console.WriteLine(stringFromBytes); Console.WriteLine(stringFromChars); // The example displays the following output: // word // cccccccccccccccccccc // ABCDE // word
Dim chars() As Char = { "w"c, "o"c, "r"c, "d"c } ' Create a string from a character array. Dim string1 As New String(chars) Console.WriteLine(string1) ' Create a string that consists of a character repeated 20 times. Dim string2 As New String("c"c, 20) Console.WriteLine(string2) ' The example displays the following output: ' word ' cccccccccccccccccccc
By using the string concatenation operator (+ in C# and & or + in Visual Basic) to create a single string from any combination of String instances and string literals. The following example illustrates the use of the string concatenation operator.
String^ string1 = "Today is " + DateTime::Now.ToString("D") + "."; Console::WriteLine(string1); String^ string2 = "This is one sentence. " + "This is a second. "; string2 += "This is a third sentence."; Console::WriteLine(string2); // The example displays output like the following: // Today is Tuesday, July 06, 2011. // This is one sentence. This is a second. This is a third sentence.
string string1 = "Today is " + DateTime.Now.ToString("D") + "."; Console.WriteLine(string1); string string2 = "This is one sentence. " + "This is a second. "; string2 += "This is a third sentence."; Console.WriteLine(string2); // The example displays output like the following: // Today is Tuesday, July 06, 2011. // This is one sentence. This is a second. This is a third sentence.
Dim string1 As String = "Today is " + Date.Now.ToString("D") + "." Console.WriteLine(string1) Dim string2 As String = "This is one sentence. " + "This is a second. " string2 += "This is a third sentence." Console.WriteLine(string2) ' The example displays output like the following: ' Today is Tuesday, July 06, 2011. ' This is one sentence. This is a second. This is a third sentence.
By retrieving a property or calling a method that returns a string. The following example uses the methods of the String class to extract a substring from a larger string.
String^ sentence = "This sentence has five words."; // Extract the second word. int startPosition = sentence->IndexOf(" ") + 1; String^ word2 = sentence->Substring(startPosition, sentence->IndexOf(" ", startPosition) - startPosition); Console::WriteLine("Second word: " + word2);
string sentence = "This sentence has five words."; // Extract the second word. int startPosition = sentence.IndexOf(" ") + 1; string word2 = sentence.Substring(startPosition, sentence.IndexOf(" ", startPosition) - startPosition); Console.WriteLine("Second word: " + word2); // The example displays the following output: // Second word: sentence
Dim sentence As String = "This sentence has five words." ' Extract the second word. Dim startPosition As Integer = sentence.IndexOf(" ") + 1 Dim word2 As String = sentence.Substring(startPosition, sentence.IndexOf(" ", startPosition) - startPosition) Console.WriteLine("Second word: " + word2) ' The example displays the following output: ' Second word: sentence
By calling a formatting method to convert a value or object to its string representation. The following example uses the composite formatting feature to embed the string representation of two objects into a string.
DateTime^ dateAndTime = gcnew DateTime(2011, 7, 6, 7, 32, 0); Double temperature = 68.3; String^ result = String::Format("At {0:t} on {0:D}, the temperature was {1:F1} degrees Fahrenheit.", dateAndTime, temperature); Console::WriteLine(result); // The example displays the following output: // At 7:32 AM on Wednesday, July 06, 2011, the temperature was 68.3 degrees Fahrenheit.
DateTime dateAndTime = new DateTime(2011, 7, 6, 7, 32, 0); double temperature = 68.3; string result = String.Format("At {0:t} on {0:D}, the temperature was {1:F1} degrees Fahrenheit.", dateAndTime, temperature); Console.WriteLine(result); // The example displays the following output: // At 7:32 AM on Wednesday, July 06, 2011, the temperature was 68.3 degrees Fahrenheit.
Dim dateAndTime As DateTime = #07/06/2011 7:32:00AM# Dim temperature As Double = 68.3 Dim result As String = String.Format("At {0:t} on {0:D}, the temperature was {1:F1} degrees Fahrenheit.", dateAndTime, temperature) Console.WriteLine(result) ' The example displays the following output: ' At 7:32 AM on Wednesday, July 06, 2011, the temperature was 68.3 degrees Fahrenheit.
Char objects and Unicode characters
Each character in a string is defined by a Unicode scalar value, also called a Unicode code point or the ordinal (numeric) value of the Unicode character. Each code point is encoded by using UTF-16 encoding, and the numeric value of each element of the encoding is represented by a Char object.
Note
Note that, because a String instance consists of a sequential collection of UTF-16 code units, it is possible to create a String object that is not a well-formed Unicode string. For example, it is possible to create a string that has a low surrogate without a corresponding high surrogate. Although some methods, such as the methods of encoding and decoding objects in the System.Text namespace, may performs checks to ensure that strings are well-formed, String class members do not ensure that a string is well-formed.
A single Char object usually represents a single code point; that is, the numeric value of the Char equals the code point. For example, the code point for the character "a" is U+0061. However, a code point might require more than one encoded element (more than one Char object). The Unicode standard defines two types of characters that correspond to multiple Char objects: graphemes, and Unicode supplementary code points that correspond to characters in the Unicode supplementary planes.
A grapheme is represented by a base character followed by one or more combining characters. For example, the character ä is represented by a Char object whose code point is U+0061 followed by a Char object whose code point is U+0308. This character can also be defined by a single Char object that has a code point of U+00E4. As the following example shows, a culture-sensitive comparison for equality indicates that these two representations are equal, although an ordinary ordinal comparison does not. However, if the two strings are normalized, an ordinal comparison also indicates that they are equal. (For more information on normalizing strings, see the Normalization section.)
using namespace System; using namespace System::Globalization; using namespace System::IO; void main() { StreamWriter^ sw = gcnew StreamWriter(".\\graphemes.txt"); String^ grapheme = L"a" + L"\u0308"; sw->WriteLine(grapheme); String^ singleChar = "\u00e4"; sw->WriteLine(singleChar); sw->WriteLine("{0} = {1} (Culture-sensitive): {2}", grapheme, singleChar, String::Equals(grapheme, singleChar, StringComparison::CurrentCulture)); sw->WriteLine("{0} = {1} (Ordinal): {2}", grapheme, singleChar, String::Equals(grapheme, singleChar, StringComparison::Ordinal)); sw->WriteLine("{0} = {1} (Normalized Ordinal): {2}", grapheme, singleChar, String::Equals(grapheme->Normalize(), singleChar->Normalize(), StringComparison::Ordinal)); sw->Close(); } // The example produces the following output: // ä // ä // ä = ä (Culture-sensitive): True // ä = ä (Ordinal): False // ä = ä (Normalized Ordinal): True
using System; using System.Globalization; using System.IO; public class Example { public static void Main() { StreamWriter sw = new StreamWriter(@".\graphemes.txt"); string grapheme = "\u0061\u0308"; sw.WriteLine(grapheme); string singleChar = "\u00e4"; sw.WriteLine(singleChar); sw.WriteLine("{0} = {1} (Culture-sensitive): {2}", grapheme, singleChar, String.Equals(grapheme, singleChar, StringComparison.CurrentCulture)); sw.WriteLine("{0} = {1} (Ordinal): {2}", grapheme, singleChar, String.Equals(grapheme, singleChar, StringComparison.Ordinal)); sw.WriteLine("{0} = {1} (Normalized Ordinal): {2}", grapheme, singleChar, String.Equals(grapheme.Normalize(), singleChar.Normalize(), StringComparison.Ordinal)); sw.Close(); } } // The example produces the following output: // ä // ä // ä = ä (Culture-sensitive): True // ä = ä (Ordinal): False // ä = ä (Normalized Ordinal): True
Imports System.Globalization Imports System.IO Module Example Public Sub Main() Dim sw As New StreamWriter(".\graphemes.txt") Dim grapheme As String = ChrW(&H0061) + ChrW(&h0308) sw.WriteLine(grapheme) Dim singleChar As String = ChrW(&h00e4) sw.WriteLine(singleChar) sw.WriteLine("{0} = {1} (Culture-sensitive): {2}", grapheme, singleChar, String.Equals(grapheme, singleChar, StringComparison.CurrentCulture)) sw.WriteLine("{0} = {1} (Ordinal): {2}", grapheme, singleChar, String.Equals(grapheme, singleChar, StringComparison.Ordinal)) sw.WriteLine("{0} = {1} (Normalized Ordinal): {2}", grapheme, singleChar, String.Equals(grapheme.Normalize(), singleChar.Normalize(), StringComparison.Ordinal)) sw.Close() End Sub End Module ' The example produces the following output: ' ä ' ä ' ä = ä (Culture-sensitive): True ' ä = ä (Ordinal): False ' ä = ä (Normalized Ordinal): True
A Unicode supplementary code point (a surrogate pair) is represented by a Char object whose code point is a high surrogate followed by a Char object whose code point is a low surrogate. The code units of high surrogates range from U+D800 to U+DBFF. The code units of low surrogates range from U+DC00 to U+DFFF. Surrogate pairs are used to represent characters in the 16 Unicode supplementary planes. The following example creates a surrogate character and passes it to the System.Char.IsSurrogatePair(Char, Char) method to determine whether it is a surrogate pair.
using namespace System; void main() { String^ surrogate = L"\xD800\xDC03" ; for (int ctr = 0; ctr < surrogate->Length; ctr++) Console::Write("U+{0:X4} ", Convert::ToUInt16(surrogate[ctr])); Console::WriteLine(); Console::WriteLine(" Is Surrogate Pair: {0}", Char::IsSurrogatePair(surrogate[0], surrogate[1])); Console::ReadLine(); } // The example displays the following output: // U+D800 U+DC03 // Is Surrogate Pair: True
using System; public class Example { public static void Main() { string surrogate = "\uD800\uDC03"; for (int ctr = 0; ctr < surrogate.Length; ctr++) Console.Write("U+{0:X2} ", Convert.ToUInt16(surrogate[ctr])); Console.WriteLine(); Console.WriteLine(" Is Surrogate Pair: {0}", Char.IsSurrogatePair(surrogate[0], surrogate[1])); } } // The example displays the following output: // U+D800 U+DC03 // Is Surrogate Pair: True
Module Example Public Sub Main() Dim surrogate As String = ChrW(&hD800) + ChrW(&hDC03) For ctr As Integer = 0 To surrogate.Length - 1 Console.Write("U+{0:X2} ", Convert.ToUInt16(surrogate(ctr))) Next Console.WriteLine() Console.WriteLine(" Is Surrogate Pair: {0}", Char.IsSurrogatePair(surrogate(0), surrogate(1))) End Sub End Module ' The example displays the following output: ' U+D800 U+DC03 ' Is Surrogate Pair: True
Strings and The Unicode Standard
Characters in a string are represented by UTF-16 encoded code units, which correspond to Char values.
Each character in a string has an associated Unicode character category, which is represented in the .NET Framework by the UnicodeCategory enumeration. The category of a character or a surrogate pair can be determined by calling the System.Globalization.CharUnicodeInfo.GetUnicodeCategory method.
The .NET Framework maintains its own table of characters and their corresponding categories, which ensures that a version of the .NET Framework running on different platforms returns identical character category information. The following table lists the versions of the .NET Framework and the versions of the Unicode Standard on which their character categories are based.
.NET Framework version | Version of the Unicode Standard |
---|---|
.NET Framework 1.1 | The Unicode Standard, Version 4.0.0 |
The .NET Framework 2.0 | The Unicode Standard, Version 5.0.0 |
.NET Framework 3.5 | The Unicode Standard, Version 5.0.0 |
.NET Framework 4 | The Unicode Standard, Version 5.0.0 |
.NET Framework 4.5 | The Unicode Standard, Version 6.3.0 |
.NET Framework 4.5.1 | The Unicode Standard, Version 6.3.0 |
.NET Framework 4.5.2 | The Unicode Standard, Version 6.3.0 |
.NET Framework 4.6 | The Unicode Standard, Version 6.3.0 |
.NET Framework 4.6.1 | The Unicode Standard, Version 6.3.0 |
.NET Framework 4.6.2 | The Unicode Standard, Version 8.0.0 |
In addition, the .NET Framework supports string comparison and sorting based on the Unicode standard. In versions of the .NET Framework through the .NET Framework 4, the .NET Framework maintains its own table of string data. This is also true of versions of the .NET Framework starting with the .NET Framework 4.5 running on Windows 7. Starting with the .NET Framework 4.5 running on Window 8 and later versions of the Windows operating system, the runtime delegates string comparison and sorting operations to the operating system. The following table lists the versions of the .NET Framework and the versions of the Unicode Standard on which character comparison and sorting are based.
.NET Framework version | Version of the Unicode Standard |
---|---|
.NET Framework 1.1 | The Unicode Standard, Version 4.0.0 |
The .NET Framework 2.0 | The Unicode Standard, Version 5.0.0 |
.NET Framework 3.5 | The Unicode Standard, Version 5.0.0 |
.NET Framework 4 | The Unicode Standard, Version 5.0.0 |
.NET Framework 4.5 and later on Windows 7 | The Unicode Standard, Version 5.0.0 |
.NET Framework 4.5 and later on Windows 8 and later Windows operating systems | The Unicode Standard, Version 6.3.0 |
Strings and embedded null characters
In the .NET Framework, a String object can include embedded null characters, which count as a part of the string's length. However, in some languages such as C and C++, a null character indicates the end of a string;it is not considered a part of the string and is not counted as part of the string's length. This means that the following common assumptions that C and C++ programmers or libraries written in C or C++ might make about strings are not necessarily valid when applied to String objects:
The value returned by the
strlen
orwcslen
functions does not necessarily equal System.String.Length.The string created by the
strcpy_s
orwcscpy_s
functions is not necessarily identical to the string created by the System.String.Copy method.
You should ensure that native C and C++ code that instantiates String objects, and code that is passed String objects through platform invoke, do not assume that an embedded null character marks the end of the string.
Embedded null characters in a string are also treated differently when a string is sorted (or compared) and when a string is searched. Null characters are ignored when performing culture-sensitive comparisons between two strings, including comparisons using the invariant culture. They are considered only for ordinal or case-insensitive ordinal comparisons. On the other hand, embedded null characters are always considered when searching a string with methods such as Contains, StartsWith, and IndexOf.
Strings and indexes
An index is the position of a Char object (not a Unicode character) in a String. An index is a zero-based, nonnegative number that starts from the first position in the string, which is index position zero. A number of search methods, such as IndexOf and LastIndexOf, return the index of a character or substring in the string instance.
The Chars[Int32] property lets you access individual Char objects by their index position in the string. Because the Chars[Int32] property is the default property (in Visual Basic) or the indexer (in C#), you can access the individual Char objects in a string by using code such as the following. This code looks for white space or punctuation characters in a string to determine how many words the string contains.
using namespace System;
void main()
{
String^ s1 = "This string consists of a single short sentence.";
int nWords = 0;
s1 = s1->Trim();
for (int ctr = 0; ctr < s1->Length; ctr++) {
if (Char::IsPunctuation(s1[ctr]) | Char::IsWhiteSpace(s1[ctr]))
nWords++;
}
Console::WriteLine("The sentence\n {0}\nhas {1} words.",
s1, nWords);
}
// The example displays the following output:
// The sentence
// This string consists of a single short sentence.
// has 8 words.
using System;
public class Example
{
public static void Main()
{
string s1 = "This string consists of a single short sentence.";
int nWords = 0;
s1 = s1.Trim();
for (int ctr = 0; ctr < s1.Length; ctr++) {
if (Char.IsPunctuation(s1[ctr]) | Char.IsWhiteSpace(s1[ctr]))
nWords++;
}
Console.WriteLine("The sentence\n {0}\nhas {1} words.",
s1, nWords);
}
}
// The example displays the following output:
// The sentence
// This string consists of a single short sentence.
// has 8 words.
Module Example
Public Sub Main()
Dim s1 As String = "This string consists of a single short sentence."
Dim nWords As Integer = 0
s1 = s1.Trim()
For ctr As Integer = 0 To s1.Length - 1
If Char.IsPunctuation(s1(ctr)) Or Char.IsWhiteSpace(s1(ctr))
nWords += 1
End If
Next
Console.WriteLine("The sentence{2} {0}{2}has {1} words.",
s1, nWords, vbCrLf)
End Sub
End Module
' The example displays the following output:
' The sentence
' This string consists of a single short sentence.
' has 8 words.
Because the String class implements the IEnumerable interface, you can also iterate through the Char objects in a string by using a foreach
construct, as the following example shows.
using namespace System;
void main()
{
String^ s1 = "This string consists of a single short sentence.";
int nWords = 0;
s1 = s1->Trim();
for each (Char ch in s1)
{
if (Char::IsPunctuation(ch) | Char::IsWhiteSpace(ch))
nWords++;
}
Console::WriteLine("The sentence\n {0}\nhas {1} words.",
s1, nWords);
Console::ReadLine();
}
// The example displays the following output:
// The sentence
// This string consists of a single short sentence.
// has 8 words.
using System;
public class Example
{
public static void Main()
{
string s1 = "This string consists of a single short sentence.";
int nWords = 0;
s1 = s1.Trim();
foreach (var ch in s1) {
if (Char.IsPunctuation(ch) | Char.IsWhiteSpace(ch))
nWords++;
}
Console.WriteLine("The sentence\n {0}\nhas {1} words.",
s1, nWords);
}
}
// The example displays the following output:
// The sentence
// This string consists of a single short sentence.
// has 8 words.
Module Example
Public Sub Main()
Dim s1 As String = "This string consists of a single short sentence."
Dim nWords As Integer = 0
s1 = s1.Trim()
For Each ch In s1
If Char.IsPunctuation(ch) Or Char.IsWhiteSpace(ch) Then
nWords += 1
End If
Next
Console.WriteLine("The sentence{2} {0}{2}has {1} words.",
s1, nWords, vbCrLf)
End Sub
End Module
' The example displays the following output:
' The sentence
' This string consists of a single short sentence.
' has 8 words.
Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one Char object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of Char objects, use the System.Globalization.StringInfo and TextElementEnumerator classes. The following example illustrates the difference between code that works with Char objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.
using namespace System;
using namespace System::Collections::Generic;
using namespace System::Globalization;
void main()
{
// First sentence of The Mystery of the Yellow Room, by Leroux.
String^ opening = L"Ce n'est pas sans une certaine émotion que "+
L"je commence à raconter ici les aventures " +
L"extraordinaires de Joseph Rouletabille.";
// Character counters.
int nChars = 0;
// Objects to store word count.
List<int>^ chars = gcnew List<int>();
List<int>^ elements = gcnew List<int>();
for each (Char ch in opening) {
// Skip the ' character.
if (ch == '\x0027') continue;
if (Char::IsWhiteSpace(ch) | (Char::IsPunctuation(ch))) {
chars->Add(nChars);
nChars = 0;
}
else {
nChars++;
}
}
TextElementEnumerator^ te = StringInfo::GetTextElementEnumerator(opening);
while (te->MoveNext()) {
String^ s = te->GetTextElement();
// Skip the ' character.
if (s == "\x0027") continue;
if ( String::IsNullOrEmpty(s->Trim()) | (s->Length == 1 && Char::IsPunctuation(Convert::ToChar(s)))) {
elements->Add(nChars);
nChars = 0;
}
else {
nChars++;
}
}
// Display character counts.
Console::WriteLine("{0,6} {1,20} {2,20}",
"Word #", "Char Objects", "Characters");
for (int ctr = 0; ctr < chars->Count; ctr++)
Console::WriteLine("{0,6} {1,20} {2,20}",
ctr, chars[ctr], elements[ctr]);
Console::ReadLine();
}
// The example displays the following output:
// Word # Char Objects Characters
// 0 2 2
// 1 4 4
// 2 3 3
// 3 4 4
// 4 3 3
// 5 8 8
// 6 8 7
// 7 3 3
// 8 2 2
// 9 8 8
// 10 2 1
// 11 8 8
// 12 3 3
// 13 3 3
// 14 9 9
// 15 15 15
// 16 2 2
// 17 6 6
// 18 12 12
using System;
using System.Collections.Generic;
using System.Globalization;
public class Example
{
public static void Main()
{
// First sentence of The Mystery of the Yellow Room, by Leroux.
string opening = "Ce n'est pas sans une certaine émotion que "+
"je commence à raconter ici les aventures " +
"extraordinaires de Joseph Rouletabille.";
// Character counters.
int nChars = 0;
// Objects to store word count.
List<int> chars = new List<int>();
List<int> elements = new List<int>();
foreach (var ch in opening) {
// Skip the ' character.
if (ch == '\u0027') continue;
if (Char.IsWhiteSpace(ch) | (Char.IsPunctuation(ch))) {
chars.Add(nChars);
nChars = 0;
}
else {
nChars++;
}
}
TextElementEnumerator te = StringInfo.GetTextElementEnumerator(opening);
while (te.MoveNext()) {
string s = te.GetTextElement();
// Skip the ' character.
if (s == "\u0027") continue;
if ( String.IsNullOrEmpty(s.Trim()) | (s.Length == 1 && Char.IsPunctuation(Convert.ToChar(s)))) {
elements.Add(nChars);
nChars = 0;
}
else {
nChars++;
}
}
// Display character counts.
Console.WriteLine("{0,6} {1,20} {2,20}",
"Word #", "Char Objects", "Characters");
for (int ctr = 0; ctr < chars.Count; ctr++)
Console.WriteLine("{0,6} {1,20} {2,20}",
ctr, chars[ctr], elements[ctr]);
}
}
// The example displays the following output:
// Word # Char Objects Characters
// 0 2 2
// 1 4 4
// 2 3 3
// 3 4 4
// 4 3 3
// 5 8 8
// 6 8 7
// 7 3 3
// 8 2 2
// 9 8 8
// 10 2 1
// 11 8 8
// 12 3 3
// 13 3 3
// 14 9 9
// 15 15 15
// 16 2 2
// 17 6 6
// 18 12 12
Imports System.Collections.Generic
Imports System.Globalization
Module Example
Public Sub Main()
' First sentence of The Mystery of the Yellow Room, by Leroux.
Dim opening As String = "Ce n'est pas sans une certaine émotion que "+
"je commence à raconter ici les aventures " +
"extraordinaires de Joseph Rouletabille."
' Character counters.
Dim nChars As Integer = 0
' Objects to store word count.
Dim chars As New List(Of Integer)()
Dim elements As New List(Of Integer)()
For Each ch In opening
' Skip the ' character.
If ch = ChrW(&h0027) Then Continue For
If Char.IsWhiteSpace(ch) Or Char.IsPunctuation(ch) Then
chars.Add(nChars)
nChars = 0
Else
nChars += 1
End If
Next
Dim te As TextElementEnumerator = StringInfo.GetTextElementEnumerator(opening)
Do While te.MoveNext()
Dim s As String = te.GetTextElement()
' Skip the ' character.
If s = ChrW(&h0027) Then Continue Do
If String.IsNullOrEmpty(s.Trim()) Or (s.Length = 1 AndAlso Char.IsPunctuation(Convert.ToChar(s)))
elements.Add(nChars)
nChars = 0
Else
nChars += 1
End If
Loop
' Display character counts.
Console.WriteLine("{0,6} {1,20} {2,20}",
"Word #", "Char Objects", "Characters")
For ctr As Integer = 0 To chars.Count - 1
Console.WriteLine("{0,6} {1,20} {2,20}",
ctr, chars(ctr), elements(ctr))
Next
End Sub
End Module
' The example displays the following output:
' Word # Char Objects Characters
' 0 2 2
' 1 4 4
' 2 3 3
' 3 4 4
' 4 3 3
' 5 8 8
' 6 8 7
' 7 3 3
' 8 2 2
' 9 8 8
' 10 2 1
' 11 8 8
' 12 3 3
' 13 3 3
' 14 9 9
' 15 15 15
' 16 2 2
' 17 6 6
' 18 12 12
This example works with text elements by using the System.Globalization.StringInfo.GetTextElementEnumerator method and the TextElementEnumerator class to enumerate all the text elements in a string. You can also retrieve an array that contains the starting index of each text element by calling the System.Globalization.StringInfo.ParseCombiningCharacters method.
For more information about working with units of text rather than individual Char values, see the StringInfo class.
Null strings and empty strings
A string that has been declared but has not been assigned a value is null
. Attempting to call methods on that string throws a NullReferenceException. A null string is different from an empty string, which is a string whose value is "" or System.String.Empty. In some cases, passing either a null string or an empty string as an argument in a method call throws an exception. For example, passing a null string to the System.Int32.Parse method throws an ArgumentNullException, and passing an empty string throws a FormatException. In other cases, a method argument can be either a null string or an empty string. For example, if you are providing an IFormattable implementation for a class, you want to equate both a null string and an empty string with the general ("G") format specifier.
The String class includes the following two convenience methods that enable you to test whether a string is null
or empty:
IsNullOrEmpty, which indicates whether a string is either
null
or is equal to System.String.Empty. This method eliminates the need to use code such as the following:if (str == nullptr || str->Equals(String::Empty))
if (str == null || str.Equals(String.Empty))
If str Is Nothing OrElse str.Equals(String.Empty) Then
IsNullOrWhiteSpace, which indicates whether a string is
null
, equals System.String.Empty, or consists exclusively of white-space characters. This method eliminates the need to use code such as the following:if (str == nullptr || str->Equals(String::Empty) || str->Trim()->Equals(String::Empty))
if (str == null || str.Equals(String.Empty) || str.Trim().Equals(String.Empty))
If str Is Nothing OrElse str.Equals(String.Empty) OrElse str.Trim().Equals(String.Empty)
The following example uses the IsNullOrEmpty method in the System.IFormattable.ToString implementation of a custom Temperature
class. The method supports the "G", "C", "F", and "K" format strings. If an empty format string or a format string whose value is null
is passed to the method, its value is changed to the "G" format string.
public:
virtual String^ ToString(String^ format, IFormatProvider^ provider)
{
if (String::IsNullOrEmpty(format)) format = "G";
if (provider == nullptr) provider = CultureInfo::CurrentCulture;
switch (Convert::ToUInt16(format->ToUpperInvariant()))
{
// Return degrees in Celsius.
case 'G':
case 'C':
return temp.ToString("F2", provider) + L"�C";
// Return degrees in Fahrenheit.
case 'F':
return (temp * 9 / 5 + 32).ToString("F2", provider) + L"�F";
// Return degrees in Kelvin.
case 'K':
return (temp + 273.15).ToString();
default:
throw gcnew FormatException(
String::Format("The {0} format string is not supported.",
format));
}
}
public string ToString(string format, IFormatProvider provider)
{
if (String.IsNullOrEmpty(format)) format = "G";
if (provider == null) provider = CultureInfo.CurrentCulture;
switch (format.ToUpperInvariant())
{
// Return degrees in Celsius.
case "G":
case "C":
return temp.ToString("F2", provider) + "°C";
// Return degrees in Fahrenheit.
case "F":
return (temp * 9 / 5 + 32).ToString("F2", provider) + "°F";
// Return degrees in Kelvin.
case "K":
return (temp + 273.15).ToString();
default:
throw new FormatException(
String.Format("The {0} format string is not supported.",
format));
}
}
Public Overloads Function ToString(fmt As String, provider As IFormatProvider) As String _
Implements IFormattable.ToString
If String.IsNullOrEmpty(fmt) Then fmt = "G"
If provider Is Nothing Then provider = CultureInfo.CurrentCulture
Select Case fmt.ToUpperInvariant()
' Return degrees in Celsius.
Case "G", "C"
Return temp.ToString("F2", provider) + "°C"
' Return degrees in Fahrenheit.
Case "F"
Return (temp * 9 / 5 + 32).ToString("F2", provider) + "°F"
' Return degrees in Kelvin.
Case "K"
Return (temp + 273.15).ToString()
Case Else
Throw New FormatException(
String.Format("The {0} format string is not supported.",
fmt))
End Select
End Function
Immutability and the StringBuilder class
A String object is called immutable (read-only), because its value cannot be modified after it has been created. Methods that appear to modify a String object actually return a new String object that contains the modification.
Because strings are immutable, string manipulation routines that perform repeated additions or deletions to what appears to be a single string can exact a significant performance penalty. For example, the following code uses a random number generator to create a string with 1000 characters in the range 0x0001 to 0x052F. Although the code appears to use string concatenation to append a new character to the existing string named str
, it actually creates a new String object for each concatenation operation.
using namespace System;
using namespace System::IO;
using namespace System::Text;
void main()
{
Random^ rnd = gcnew Random();
String^ str = String::Empty;
StreamWriter^ sw = gcnew StreamWriter(".\\StringFile.txt",
false, Encoding::Unicode);
for (int ctr = 0; ctr <= 1000; ctr++) {
str += Convert::ToChar(rnd->Next(1, 0x0530));
if (str->Length % 60 == 0)
str += Environment::NewLine;
}
sw->Write(str);
sw->Close();
}
using System;
using System.IO;
using System.Text;
public class Example
{
public static void Main()
{
Random rnd = new Random();
string str = String.Empty;
StreamWriter sw = new StreamWriter(@".\StringFile.txt",
false, Encoding.Unicode);
for (int ctr = 0; ctr <= 1000; ctr++) {
str += Convert.ToChar(rnd.Next(1, 0x0530));
if (str.Length % 60 == 0)
str += Environment.NewLine;
}
sw.Write(str);
sw.Close();
}
}
Imports System.IO
Imports System.Text
Module Example
Public Sub Main()
Dim rnd As New Random()
Dim str As String = String.Empty
Dim sw As New StreamWriter(".\StringFile.txt",
False, Encoding.Unicode)
For ctr As Integer = 0 To 1000
str += ChrW(rnd.Next(1, &h0530))
If str.Length Mod 60 = 0 Then str += vbCrLf
Next
sw.Write(str)
sw.Close()
End Sub
End Module
You can use the StringBuilder class instead of the String class for operations that make multiple changes to the value of a string. Unlike instances of the String class, StringBuilder objects are mutable; when you concatenate, append, or delete substrings from a string, the operations are performed on a single string. When you have finished modifying the value of a StringBuilder object, you can call its System.Text.StringBuilder.ToString method to convert it to a string. The following example replaces the String used in the previous example to concatenate 1000 random characters in the range to 0x0001 to 0x052F with a StringBuilder object.
using namespace System;
using namespace System::IO;
using namespace System::Text;
void main()
{
Random^ rnd = gcnew Random();
StringBuilder^ sb = gcnew StringBuilder();
StreamWriter^ sw = gcnew StreamWriter(".\\StringFile.txt",
false, Encoding::Unicode);
for (int ctr = 0; ctr <= 1000; ctr++) {
sb->Append(Convert::ToChar(rnd->Next(1, 0x0530)));
if (sb->Length % 60 == 0)
sb->AppendLine();
}
sw->Write(sb->ToString());
sw->Close();
}
using System;
using System.IO;
using System.Text;
public class Example
{
public static void Main()
{
Random rnd = new Random();
StringBuilder sb = new StringBuilder();
StreamWriter sw = new StreamWriter(@".\StringFile.txt",
false, Encoding.Unicode);
for (int ctr = 0; ctr <= 1000; ctr++) {
sb.Append(Convert.ToChar(rnd.Next(1, 0x0530)));
if (sb.Length % 60 == 0)
sb.AppendLine();
}
sw.Write(sb.ToString());
sw.Close();
}
}
Imports System.IO
Imports System.Text
Module Example
Public Sub Main()
Dim rnd As New Random()
Dim sb As New StringBuilder()
Dim sw As New StreamWriter(".\StringFile.txt",
False, Encoding.Unicode)
For ctr As Integer = 0 To 1000
sb.Append(ChrW(rnd.Next(1, &h0530)))
If sb.Length Mod 60 = 0 Then sb.AppendLine()
Next
sw.Write(sb.ToString())
sw.Close()
End Sub
End Module
Ordinal vs. culture-sensitive operations
Members of the String class perform either ordinal or culture-sensitive (linguistic) operations on a String object. An ordinal operation acts on the numeric value of each Char object. A culture-sensitive operation acts on the value of the String object, and takes culture-specific casing, sorting, formatting, and parsing rules into account. Culture-sensitive operations execute in the context of an explicitly declared culture or the implicit current culture. The two kinds of operations can produce very different results when they are performed on the same string.
The .NET Framework also supports culture-insensitive linguistic string operations by using the invariant culture (System.Globalization.CultureInfo.InvariantCulture), which is loosely based on the culture settings of the English language independent of region. Unlike other System.Globalization.CultureInfo settings, the settings of the invariant culture are guaranteed to remain consistent on a single computer, from system to system, and across versions of the .NET Framework. The invariant culture can be seen as a kind of black box that ensures stability of string comparisons and ordering across all cultures.
Important
If your application makes a security decision about a symbolic identifier such as a file name or named pipe, or about persisted data such as the text-based data in an XML file, the operation should use an ordinal comparison instead of a culture-sensitive comparison. This is because a culture-sensitive comparison can yield different results depending on the culture in effect, whereas an ordinal comparison depends solely on the binary value of the compared characters.
Important
Most methods that perform string operations include an overload that has a parameter of type StringComparison, which enables you to specify whether the method performs an ordinal or culture-sensitive operation. In general, you should call this overload to make the intent of your method call clear. For best practices and guidance for using ordinal and culture-sensitive operations on strings, see Best Practices for Using Strings.
Operations for casing, parsing and formatting, comparison and sorting, and testing for equality can be either ordinal or culture-sensitive. The following sections discuss each category of operation.
Tip
You should always call a method overload that makes the intent of your method call clear. For example, instead of calling the Compare(String, String) method to perform a culture-sensitive comparison of two strings by using the conventions of the current culture, you should call the Compare(String, String, StringComparison) method with a value of System.StringComparison for the comparisonType
argument. For more information, see Best Practices for Using Strings.
Casing
Casing rules determine how to change the capitalization of a Unicode character; for example, from lowercase to uppercase. Often, a casing operation is performed before a string comparison. For example, a string might be converted to uppercase so that it can be compared with another uppercase string. You can convert the characters in a string to lowercase by calling the ToLower or ToLowerInvariant method, and you can convert them to uppercase by calling the ToUpper or ToUpperInvariant method. In addition, you can use the System.Globalization.TextInfo.ToTitleCase method to convert a string to title case.
Casing operations can be based on the rules of the current culture, a specified culture, or the invariant culture. Because case mappings can vary depending on the culture used, the result of casing operations can vary based on culture. The actual differences in casing are of three kinds:
Differences in the case mapping of LATIN CAPITAL LETTER I (U+0049), LATIN SMALL LETTER I (U+0069), LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130), and LATIN SMALL LETTER DOTLESS I (U+0131). In the tr-TR (Turkish (Turkey)) and az-Latn-AZ (Azerbaijan, Latin) cultures, and in the tr, az, and az-Latn neutral cultures, the lowercase equivalent of LATIN CAPITAL LETTER I is LATIN SMALL LETTER DOTLESS I, and the uppercase equivalent of LATIN SMALL LETTER I is LATIN CAPITAL LETTER I WITH DOT ABOVE. In all other cultures, including the invariant culture, LATIN SMALL LETTER I and LATIN CAPITAL LETTER I are lowercase and uppercase equivalents.
The following example demonstrates how a string comparison designed to prevent file system access can fail if it relies on a culture-sensitive casing comparison. (The casing conventions of the invariant culture should have been used.)
using System; using System.Globalization; using System.Threading; public class Example { const string disallowed = "file"; public static void Main() { IsAccessAllowed(@"FILE:\\\c:\users\user001\documents\FinancialInfo.txt"); } private static void IsAccessAllowed(String resource) { CultureInfo[] cultures = { CultureInfo.CreateSpecificCulture("en-US"), CultureInfo.CreateSpecificCulture("tr-TR") }; String scheme = null; int index = resource.IndexOfAny( new Char[] { '\\', '/' } ); if (index > 0) scheme = resource.Substring(0, index - 1); // Change the current culture and perform the comparison. foreach (var culture in cultures) { Thread.CurrentThread.CurrentCulture = culture; Console.WriteLine("Culture: {0}", CultureInfo.CurrentCulture.DisplayName); Console.WriteLine(resource); Console.WriteLine("Access allowed: {0}", ! String.Equals(disallowed, scheme, StringComparison.CurrentCultureIgnoreCase)); Console.WriteLine(); } } } // The example displays the following output: // Culture: English (United States) // FILE:\\\c:\users\user001\documents\FinancialInfo.txt // Access allowed: False // // Culture: Turkish (Turkey) // FILE:\\\c:\users\user001\documents\FinancialInfo.txt // Access allowed: True
Imports System.Globalization Imports System.Threading Module Example Const disallowed = "file" Public Sub Main() IsAccessAllowed("FILE:\\\c:\users\user001\documents\FinancialInfo.txt") End Sub Private Sub IsAccessAllowed(resource As String) Dim cultures() As CultureInfo = { CultureInfo.CreateSpecificCulture("en-US"), CultureInfo.CreateSpecificCulture("tr-TR") } Dim scheme As String = Nothing Dim index As Integer = resource.IndexOfAny( {"\"c, "/"c }) If index > 0 Then scheme = resource.Substring(0, index - 1) ' Change the current culture and perform the comparison. For Each culture In cultures Thread.CurrentThread.CurrentCulture = culture Console.WriteLine("Culture: {0}", CultureInfo.CurrentCulture.DisplayName) Console.WriteLine(resource) Console.WriteLine("Access allowed: {0}", Not String.Equals(disallowed, scheme, StringComparison.CurrentCultureIgnoreCase)) Console.WriteLine() Next End Sub End Module ' The example displays the following output: ' Culture: English (United States) ' FILE:\\\c:\users\user001\documents\FinancialInfo.txt ' Access allowed: False ' ' Culture: Turkish (Turkey) ' FILE:\\\c:\users\user001\documents\FinancialInfo.txt ' Access allowed: True
Differences in case mappings between the invariant culture and all other cultures. In these cases, using the casing rules of the invariant culture to change a character to uppercase or lowercase returns the same character. For all other cultures, it returns a different character. Some of the affected characters are listed in the following table.
Character If changed to Returns MICRON SIGN (U+00B5) Uppercase GREEK CAPITAL LETTER MU (U+-39C) LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130) Lowercase LATIN SMALL LETTER I (U+0069) LATIN SMALL LETTER DOTLESS I (U+0131) Uppercase LATIN CAPITAL LETTER I (U+0049) LATIN SMALL LETTER LONG S (U+017F) Uppercase LATIN CAPITAL LETTER S (U+0053) LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON (U+01C5) Lowercase LATIN SMALL LETTER DZ WITH CARON (U+01C6) COMBINING GREEK YPOGEGRAMMENI (U+0345) Uppercase GREEK CAPITAL LETTER IOTA (U+0399) Differences in case mappings of two-letter mixed-case pairs in the ASCII character range. In most cultures, a two-letter mixed-case pair is equal to the equivalent two-letter uppercase or lowercase pair. This is not true for the following two-letter pairs in the following cultures, because in each case they are compared to a digraph:
"lJ" and "nJ" in the hr-HR (Croatian (Croatia)) culture.
"cH" in the cs-CZ (Czech (Czech Republic)) and sk-SK (Slovak (Slovakia)) cultures.
"aA" in the da-DK (Danish (Denmark)) culture.
"cS", "dZ", "dZS", "nY", "sZ", "tY", and "zS" in the hu-HU (Hungarian (Hungary)) culture.
"cH" and "lL" in the es-ES_tradnl (Spanish (Spain, Traditional Sort)) culture.
"cH", "gI", "kH", "nG" "nH", "pH", "qU', "tH", and "tR" in the vi-VN (Vietnamese (Vietnam)) culture.
However, it is unusual to encounter a situation in which a culture-sensitive comparison of these pairs creates problems, because these pairs are uncommon in fixed strings or identifiers.
The following example illustrates some of the differences in casing rules between cultures when converting strings to uppercase.
using namespace System;
using namespace System::Globalization;
using namespace System::IO;
String^ ShowHexValue(String^ s);
void main()
{
StreamWriter^ sw = gcnew StreamWriter(".\\case.txt");
array<String^>^ words = gcnew array<String^> { L"file", L"sıfır", L"Dženana" };
array<CultureInfo^>^ cultures = gcnew array<CultureInfo^> { CultureInfo::InvariantCulture,
gcnew CultureInfo("en-US"),
gcnew CultureInfo("tr-TR") };
for each (String^ word in words) {
sw->WriteLine("{0}:", word);
for each (CultureInfo^ culture in cultures) {
String^ name = String::IsNullOrEmpty(culture->Name) ?
"Invariant" : culture->Name;
String^ upperWord = word->ToUpper(culture);
sw->WriteLine(" {0,10}: {1,7} {2, 38}", name,
upperWord, ShowHexValue(upperWord));
}
sw->WriteLine();
}
sw->Close();
}
String^ ShowHexValue(String^ s)
{
String^ retval = nullptr;
for each (Char ch in s) {
array<Byte>^ bytes = BitConverter::GetBytes(ch);
retval += String::Format("{0:X2} {1:X2} ", bytes[1], bytes[0]);
}
return retval;
}
// The example displays the following output:
// file:
// Invariant: FILE 00 46 00 49 00 4C 00 45
// en-US: FILE 00 46 00 49 00 4C 00 45
// tr-TR: FİLE 00 46 01 30 00 4C 00 45
//
// sıfır:
// Invariant: SıFıR 00 53 01 31 00 46 01 31 00 52
// en-US: SIFIR 00 53 00 49 00 46 00 49 00 52
// tr-TR: SIFIR 00 53 00 49 00 46 00 49 00 52
//
// Dženana:
// Invariant: DžENANA 01 C5 00 45 00 4E 00 41 00 4E 00 41
// en-US: DŽENANA 01 C4 00 45 00 4E 00 41 00 4E 00 41
// tr-TR: DŽENANA 01 C4 00 45 00 4E 00 41 00 4E 00 41
using System;
using System.Globalization;
using System.IO;
public class Example
{
public static void Main()
{
StreamWriter sw = new StreamWriter(@".\case.txt");
string[] words = { "file", "sıfır", "Dženana" };
CultureInfo[] cultures = { CultureInfo.InvariantCulture,
new CultureInfo("en-US"),
new CultureInfo("tr-TR") };
foreach (var word in words) {
sw.WriteLine("{0}:", word);
foreach (var culture in cultures) {
string name = String.IsNullOrEmpty(culture.Name) ?
"Invariant" : culture.Name;
string upperWord = word.ToUpper(culture);
sw.WriteLine(" {0,10}: {1,7} {2, 38}", name,
upperWord, ShowHexValue(upperWord));
}
sw.WriteLine();
}
sw.Close();
}
private static string ShowHexValue(string s)
{
string retval = null;
foreach (var ch in s) {
byte[] bytes = BitConverter.GetBytes(ch);
retval += String.Format("{0:X2} {1:X2} ", bytes[1], bytes[0]);
}
return retval;
}
}
// The example displays the following output:
// file:
// Invariant: FILE 00 46 00 49 00 4C 00 45
// en-US: FILE 00 46 00 49 00 4C 00 45
// tr-TR: FİLE 00 46 01 30 00 4C 00 45
//
// sıfır:
// Invariant: SıFıR 00 53 01 31 00 46 01 31 00 52
// en-US: SIFIR 00 53 00 49 00 46 00 49 00 52
// tr-TR: SIFIR 00 53 00 49 00 46 00 49 00 52
//
// Dženana:
// Invariant: DžENANA 01 C5 00 45 00 4E 00 41 00 4E 00 41
// en-US: DŽENANA 01 C4 00 45 00 4E 00 41 00 4E 00 41
// tr-TR: DŽENANA 01 C4 00 45 00 4E 00 41 00 4E 00 41
Imports System.Globalization
Imports System.IO
Module Example
Public Sub Main()
Dim sw As New StreamWriter(".\case.txt")
Dim words As String() = { "file", "sıfır", "Dženana" }
Dim cultures() As CultureInfo = { CultureInfo.InvariantCulture,
New CultureInfo("en-US"),
New CultureInfo("tr-TR") }
For Each word In words
sw.WriteLine("{0}:", word)
For Each culture In cultures
Dim name As String = If(String.IsNullOrEmpty(culture.Name),
"Invariant", culture.Name)
Dim upperWord As String = word.ToUpper(culture)
sw.WriteLine(" {0,10}: {1,7} {2, 38}", name,
upperWord, ShowHexValue(upperWord))
Next
sw.WriteLine()
Next
sw.Close()
End Sub
Private Function ShowHexValue(s As String) As String
Dim retval As String = Nothing
For Each ch In s
Dim bytes() As Byte = BitConverter.GetBytes(ch)
retval += String.Format("{0:X2} {1:X2} ", bytes(1), bytes(0))
Next
Return retval
End Function
End Module
' The example displays the following output:
' file:
' Invariant: FILE 00 46 00 49 00 4C 00 45
' en-US: FILE 00 46 00 49 00 4C 00 45
' tr-TR: FİLE 00 46 01 30 00 4C 00 45
'
' sıfır:
' Invariant: SıFıR 00 53 01 31 00 46 01 31 00 52
' en-US: SIFIR 00 53 00 49 00 46 00 49 00 52
' tr-TR: SIFIR 00 53 00 49 00 46 00 49 00 52
'
' Dženana:
' Invariant: DžENANA 01 C5 00 45 00 4E 00 41 00 4E 00 41
' en-US: DŽENANA 01 C4 00 45 00 4E 00 41 00 4E 00 41
' tr-TR: DŽENANA 01 C4 00 45 00 4E 00 41 00 4E 00 41
Parsing and formatting
Formatting and parsing are inverse operations. Formatting rules determine how to convert a value, such as a date and time or a number, to its string representation, whereas parsing rules determine how to convert a string representation to a value such as a date and time. Both formatting and parsing rules are dependent on cultural conventions. The following example illustrates the ambiguity that can arise when interpreting a culture-specific date string. Without knowing the conventions of the culture that was used to produce a date string, it is not possible to know whether 03/01/2011, 3/1/2011, and 01/03/2011 represent January 3, 2011 or March 1, 2011.
using namespace System;
using namespace System::Globalization;
void main()
{
DateTime^ date = gcnew DateTime(2011, 3, 1);
array<CultureInfo^>^ cultures = gcnew array<CultureInfo^> { CultureInfo::InvariantCulture,
gcnew CultureInfo("en-US"),
gcnew CultureInfo("fr-FR") };
for each (CultureInfo^ culture in cultures)
Console::WriteLine("{0,-12} {1}", String::IsNullOrEmpty(culture->Name) ?
"Invariant" : culture->Name,
date->ToString("d", culture));
}
// The example displays the following output:
// Invariant 03/01/2011
// en-US 3/1/2011
// fr-FR 01/03/2011
using System;
using System.Globalization;
public class Example
{
public static void Main()
{
DateTime date = new DateTime(2011, 3, 1);
CultureInfo[] cultures = { CultureInfo.InvariantCulture,
new CultureInfo("en-US"),
new CultureInfo("fr-FR") };
foreach (var culture in cultures)
Console.WriteLine("{0,-12} {1}", String.IsNullOrEmpty(culture.Name) ?
"Invariant" : culture.Name,
date.ToString("d", culture));
}
}
// The example displays the following output:
// Invariant 03/01/2011
// en-US 3/1/2011
// fr-FR 01/03/2011
Imports System.Globalization
Module Example
Public Sub Main()
Dim dat As Date = #3/1/2011#
Dim cultures() As CultureInfo = { CultureInfo.InvariantCulture,
New CultureInfo("en-US"),
New CultureInfo("fr-FR") }
For Each culture In cultures
Console.WriteLine("{0,-12} {1}", If(String.IsNullOrEmpty(culture.Name),
"Invariant", culture.Name),
dat.ToString("d", culture))
Next
End Sub
End Module
' The example displays the following output:
' Invariant 03/01/2011
' en-US 3/1/2011
' fr-FR 01/03/2011
Similarly, as the following example shows, a single string can produce different dates depending on the culture whose conventions are used in the parsing operation.
using namespace System;
using namespace System::Globalization;
void main()
{
String^ dateString = "07/10/2011";
array<CultureInfo^>^ cultures = gcnew array<CultureInfo^> { CultureInfo::InvariantCulture,
CultureInfo::CreateSpecificCulture("en-GB"),
CultureInfo::CreateSpecificCulture("en-US") };
Console::WriteLine("{0,-12} {1,10} {2,8} {3,8}\n", "Date String", "Culture",
"Month", "Day");
for each (CultureInfo^ culture in cultures) {
DateTime date = DateTime::Parse(dateString, culture);
Console::WriteLine("{0,-12} {1,10} {2,8} {3,8}", dateString,
String::IsNullOrEmpty(culture->Name) ?
"Invariant" : culture->Name,
date.Month, date.Day);
}
}
// The example displays the following output:
// Date String Culture Month Day
//
// 07/10/2011 Invariant 7 10
// 07/10/2011 en-GB 10 7
// 07/10/2011 en-US 7 10
using System;
using System.Globalization;
public class Example
{
public static void Main()
{
string dateString = "07/10/2011";
CultureInfo[] cultures = { CultureInfo.InvariantCulture,
CultureInfo.CreateSpecificCulture("en-GB"),
CultureInfo.CreateSpecificCulture("en-US") };
Console.WriteLine("{0,-12} {1,10} {2,8} {3,8}\n", "Date String", "Culture",
"Month", "Day");
foreach (var culture in cultures) {
DateTime date = DateTime.Parse(dateString, culture);
Console.WriteLine("{0,-12} {1,10} {2,8} {3,8}", dateString,
String.IsNullOrEmpty(culture.Name) ?
"Invariant" : culture.Name,
date.Month, date.Day);
}
}
}
// The example displays the following output:
// Date String Culture Month Day
//
// 07/10/2011 Invariant 7 10
// 07/10/2011 en-GB 10 7
// 07/10/2011 en-US 7 10
Imports System.Globalization
Module Example
Public Sub Main()
Dim dateString As String = "07/10/2011"
Dim cultures() As CultureInfo = { CultureInfo.InvariantCulture,
CultureInfo.CreateSpecificCulture("en-GB"),
CultureInfo.CreateSpecificCulture("en-US") }
Console.WriteLine("{0,-12} {1,10} {2,8} {3,8}", "Date String", "Culture",
"Month", "Day")
Console.WriteLine()
For Each culture In cultures
Dim dat As Date = DateTime.Parse(dateString, culture)
Console.WriteLine("{0,-12} {1,10} {2,8} {3,8}", dateString,
If(String.IsNullOrEmpty(culture.Name),
"Invariant", culture.Name),
dat.Month, dat.Day)
Next
End Sub
End Module
' The example displays the following output:
' Date String Culture Month Day
'
' 07/10/2011 Invariant 7 10
' 07/10/2011 en-GB 10 7
' 07/10/2011 en-US 7 10
String comparison and sorting
Conventions for comparing and sorting strings vary from culture to culture. For example, the sort order may be based on phonetics or on the visual representation of characters. In East Asian languages, characters are sorted by the stroke and radical of ideographs. Sorting also depends on the order languages and cultures use for the alphabet. For example, the Danish language has an "Æ" character that it sorts after "Z" in the alphabet. In addition, comparisons can be case-sensitive or case-insensitive, and in some cases casing rules also differ by culture. Ordinal comparison, on the other hand, uses the Unicode code points of individual characters in a string when comparing and sorting strings.
Sort rules determine the alphabetic order of Unicode characters and how two strings compare to each other. For example, the System.String.Compare(String, String, StringComparison) method compares two strings based on the StringComparison parameter. If the parameter value is System.StringComparison, the method performs a linguistic comparison that uses the conventions of the current culture; if the parameter value is System.StringComparison, the method performs an ordinal comparison. Consequently, as the following example shows, if the current culture is U.S. English, the first call to the System.String.Compare(String, String, StringComparison) method (using culture-sensitive comparison) considers "a" less than "A", but the second call to the same method (using ordinal comparison) considers "a" greater than "A".
using namespace System;
using namespace System::Globalization;
using namespace System::Threading;
void main()
{
Thread::CurrentThread->CurrentCulture = CultureInfo::CreateSpecificCulture("en-US");
Console::WriteLine(String::Compare("A", "a", StringComparison::CurrentCulture));
Console::WriteLine(String::Compare("A", "a", StringComparison::Ordinal));
}
// The example displays the following output:
// 1
// -32
using System;
using System.Globalization;
using System.Threading;
public class Example
{
public static void Main()
{
Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-US");
Console.WriteLine(String.Compare("A", "a", StringComparison.CurrentCulture));
Console.WriteLine(String.Compare("A", "a", StringComparison.Ordinal));
}
}
// The example displays the following output:
// 1
// -32
Imports System.Globalization
Imports System.Threading
Module Example
Public Sub Main()
Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-US")
Console.WriteLine(String.Compare("A", "a", StringComparison.CurrentCulture))
Console.WriteLine(String.Compare("A", "a", StringComparison.Ordinal))
End Sub
End Module
' The example displays the following output:
' 1
' -32
The .NET Framework supports word, string, and ordinal sort rules:
A word sort performs a culture-sensitive comparison of strings in which certain nonalphanumeric Unicode characters might have special weights assigned to them. For example, the hyphen (-) might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. For a list of the String methods that compare two strings using word sort rules, see the String operations by category section.
A string sort also performs a culture-sensitive comparison. It is similar to a word sort, except that there are no special cases, and all nonalphanumeric symbols come before all alphanumeric Unicode characters. Two strings can be compared using string sort rules by calling the System.Globalization.CompareInfo.Compare method overloads that have an
options
parameter that is supplied a value of System.Globalization.CompareOptions. Note that this is the only method that the .NET Framework provides to compare two strings using string sort rules.An ordinal sort compares strings based on the numeric value of each Char object in the string. An ordinal comparison is automatically case-sensitive because the lowercase and uppercase versions of a character have different code points. However, if case is not important, you can specify an ordinal comparison that ignores case. This is equivalent to converting the string to uppercase by using the invariant culture and then performing an ordinal comparison on the result. For a list of the String methods that compare two strings using ordinal sort rules, see the String operations by category section.
A culture-sensitive comparison is any comparison that explicitly or implicitly uses a CultureInfo object, including the invariant culture that is specified by the System.Globalization.CultureInfo.InvariantCulture property. The implicit culture is the current culture, which is specified by the System.Threading.Thread.CurrentCulture and System.Globalization.CultureInfo.CurrentCulture properties. There is considerable variation in the sort order of alphabetic characters (that is, characters for which the System.Char.IsLetter property returns true
) across cultures. You can specify a culture-sensitive comparison that uses the conventions of a specific culture by supplying a CultureInfo object to a string comparison method such as Compare(String, String, CultureInfo, CompareOptions). You can specify a culture-sensitive comparison that uses the conventions of the current culture by supplying System.StringComparison, System.StringComparison, or any member of the CompareOptions enumeration other than System.Globalization.CompareOptions or System.Globalization.CompareOptions to an appropriate overload of the Compare method. A culture-sensitive comparison is generally appropriate for sorting whereas an ordinal comparison is not. An ordinal comparison is generally appropriate for determining whether two strings are equal (that is, for determining identity) whereas a culture-sensitive comparison is not.
The following example illustrates the difference between culture-sensitive and ordinal comparison. The example evaluates three strings, "Apple", "Æble", and "AEble", using ordinal comparison and the conventions of the da-DK and en-US cultures (each of which is the default culture at the time the Compare method is called). Because the Danish language treats the character "Æ" as an individual letter and sorts it after "Z" in the alphabet, the string "Æble" is greater than "Apple". However, "Æble" is not considered equivalent to "AEble", so "Æble" is also greater than "AEble". The en-US culture doesn't include the letter"Æ" but treats it as equivalent to "AE", which explains why "Æble" is less than "Apple" but equal to "AEble". Ordinal comparison, on the other hand, considers "Apple" to be less than "Æble", and "Æble" to be greater than "AEble".
using System;
using System.Globalization;
using System.Threading;
public class CompareStringSample
{
public static void Main()
{
string str1 = "Apple";
string str2 = "Æble";
string str3 = "AEble";
// Set the current culture to Danish in Denmark.
Thread.CurrentThread.CurrentCulture = new CultureInfo("da-DK");
Console.WriteLine("Current culture: {0}",
CultureInfo.CurrentCulture.Name);
Console.WriteLine("Comparison of {0} with {1}: {2}",
str1, str2, String.Compare(str1, str2));
Console.WriteLine("Comparison of {0} with {1}: {2}\n",
str2, str3, String.Compare(str2, str3));
// Set the current culture to English in the U.S.
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
Console.WriteLine("Current culture: {0}",
CultureInfo.CurrentCulture.Name);
Console.WriteLine("Comparison of {0} with {1}: {2}",
str1, str2, String.Compare(str1, str2));
Console.WriteLine("Comparison of {0} with {1}: {2}\n",
str2, str3, String.Compare(str2, str3));
// Perform an ordinal comparison.
Console.WriteLine("Ordinal comparison");
Console.WriteLine("Comparison of {0} with {1}: {2}",
str1, str2,
String.Compare(str1, str2, StringComparison.Ordinal));
Console.WriteLine("Comparison of {0} with {1}: {2}",
str2, str3,
String.Compare(str2, str3, StringComparison.Ordinal));
}
}
// The example displays the following output:
// Current culture: da-DK
// Comparison of Apple with Æble: -1
// Comparison of Æble with AEble: 1
//
// Current culture: en-US
// Comparison of Apple with Æble: 1
// Comparison of Æble with AEble: 0
//
// Ordinal comparison
// Comparison of Apple with Æble: -133
// Comparison of Æble with AEble: 133
Imports System.Globalization
Imports System.Threading
Public Module Example
Public Sub Main()
Dim str1 As String = "Apple"
Dim str2 As String = "Æble"
Dim str3 As String = "AEble"
' Set the current culture to Danish in Denmark.
Thread.CurrentThread.CurrentCulture = New CultureInfo("da-DK")
Console.WriteLine("Current culture: {0}",
CultureInfo.CurrentCulture.Name)
Console.WriteLine("Comparison of {0} with {1}: {2}",
str1, str2, String.Compare(str1, str2))
Console.WriteLine("Comparison of {0} with {1}: {2}",
str2, str3, String.Compare(str2, str3))
Console.WriteLine()
' Set the current culture to English in the U.S.
Thread.CurrentThread.CurrentCulture = New CultureInfo("en-US")
Console.WriteLine("Current culture: {0}",
CultureInfo.CurrentCulture.Name)
Console.WriteLine("Comparison of {0} with {1}: {2}",
str1, str2, String.Compare(str1, str2))
Console.WriteLine("Comparison of {0} with {1}: {2}",
str2, str3, String.Compare(str2, str3))
Console.WriteLine()
' Perform an ordinal comparison.
Console.WriteLine("Ordinal comparison")
Console.WriteLine("Comparison of {0} with {1}: {2}",
str1, str2,
String.Compare(str1, str2, StringComparison.Ordinal))
Console.WriteLine("Comparison of {0} with {1}: {2}",
str2, str3,
String.Compare(str2, str3, StringComparison.Ordinal))
End Sub
End Module
' The example displays the following output:
' Current culture: da-DK
' Comparison of Apple with Æble: -1
' Comparison of Æble with AEble: 1
'
' Current culture: en-US
' Comparison of Apple with Æble: 1
' Comparison of Æble with AEble: 0
'
' Ordinal comparison
' Comparison of Apple with Æble: -133
' Comparison of Æble with AEble: 133
Use the following general guidelines to choose an appropriate sorting or string comparison method:
If you want the strings to be ordered based on the user's culture, you should order them based on the conventions of the current culture. If the user's culture changes, the order of sorted strings will also change accordingly. For example, a thesaurus application should always sort words based on the user's culture.
If you want the strings to be ordered based on the conventions of a specific culture, you should order them by supplying a CultureInfo object that represents that culture to a comparison method. For example, in an application designed to teach students a particular language, you want strings to be ordered based on the conventions of one of the cultures that speaks that language.
If you want the order of strings to remain unchanged across cultures, you should order them based on the conventions of the invariant culture or use an ordinal comparison. For example, you would use an ordinal sort to organize the names of files, processes, mutexes, or named pipes.
For a comparison that involves a security decision (such as whether a username is valid), you should always perform an ordinal test for equality by calling an overload of the Equals method.
Note
The culture-sensitive sorting and casing rules used in string comparison depend on the version of the .NET Framework. In the .NET Framework 4.5 running on the Windows 8 operating system, sorting, casing, normalization, and Unicode character information conforms to the Unicode 6.0 standard. On other operating systems, it conforms to the Unicode 5.0 standard.
For more information about word, string, and ordinal sort rules, see the System.Globalization.CompareOptions topic. For additional recommendations on when to use each rule, see Best Practices for Using Strings.
Ordinarily, you do not call string comparison methods such as Compare directly to determine the sort order of strings. Instead, comparison methods are called by sorting methods such as System.Array.Sort or System.Collections.Generic.List<T>.Sort. The following example performs four different sorting operations (word sort using the current culture, word sort using the invariant culture, ordinal sort, and string sort using the invariant culture) without explicitly calling a string comparison method, although they do specify the type of comparison to use. Note that each type of sort produces a unique ordering of strings in its array.
using namespace System;
using namespace System::Collections;
using namespace System::Collections::Generic;
using namespace System::Globalization;
// IComparer<String> implementation to perform string sort.
ref class SCompare : System::Collections::Generic::IComparer<String^>
{
public:
SCompare() {};
virtual int Compare(String^ x, String^ y)
{
return CultureInfo::CurrentCulture->CompareInfo->Compare(x, y, CompareOptions::StringSort);
}
};
void main()
{
array<String^>^ strings = gcnew array<String^> { "coop", "co-op", "cooperative",
L"co\x00ADoperative", L"c�ur", "coeur" };
// Perform a word sort using the current (en-US) culture.
array<String^>^ current = gcnew array<String^>(strings->Length);
strings->CopyTo(current, 0);
Array::Sort(current, StringComparer::CurrentCulture);
// Perform a word sort using the invariant culture.
array<String^>^ invariant = gcnew array<String^>(strings->Length);
strings->CopyTo(invariant, 0);
Array::Sort(invariant, StringComparer::InvariantCulture);
// Perform an ordinal sort.
array<String^>^ ordinal = gcnew array<String^>(strings->Length);
strings->CopyTo(ordinal, 0);
Array::Sort(ordinal, StringComparer::Ordinal);
// Perform a string sort using the current culture.
array<String^>^ stringSort = gcnew array<String^>(strings->Length);
strings->CopyTo(stringSort, 0);
Array::Sort(stringSort, gcnew SCompare());
// Display array values
Console::WriteLine("{0,13} {1,13} {2,15} {3,13} {4,13}\n",
"Original", "Word Sort", "Invariant Word",
"Ordinal Sort", "String Sort");
for (int ctr = 0; ctr < strings->Length; ctr++)
Console::WriteLine("{0,13} {1,13} {2,15} {3,13} {4,13}",
strings[ctr], current[ctr], invariant[ctr],
ordinal[ctr], stringSort[ctr] );
}
// The example displays the following output:
// Original Word Sort Invariant Word Ordinal Sort String Sort
//
// coop c�ur c�ur co-op co-op
// co-op coeur coeur coeur c�ur
// cooperative coop coop coop coeur
// co�operative co-op co-op cooperative coop
// c�ur cooperative cooperative co�operative cooperative
// coeur co�operative co�operative c�ur co�operative
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;
public class Example
{
public static void Main()
{
string[] strings = { "coop", "co-op", "cooperative",
"co\u00ADoperative", "cœur", "coeur" };
// Perform a word sort using the current (en-US) culture.
string[] current = new string[strings.Length];
strings.CopyTo(current, 0);
Array.Sort(current, StringComparer.CurrentCulture);
// Perform a word sort using the invariant culture.
string[] invariant = new string[strings.Length];
strings.CopyTo(invariant, 0);
Array.Sort(invariant, StringComparer.InvariantCulture);
// Perform an ordinal sort.
string[] ordinal = new string[strings.Length];
strings.CopyTo(ordinal, 0);
Array.Sort(ordinal, StringComparer.Ordinal);
// Perform a string sort using the current culture.
string[] stringSort = new string[strings.Length];
strings.CopyTo(stringSort, 0);
Array.Sort(stringSort, new SCompare());
// Display array values
Console.WriteLine("{0,13} {1,13} {2,15} {3,13} {4,13}\n",
"Original", "Word Sort", "Invariant Word",
"Ordinal Sort", "String Sort");
for (int ctr = 0; ctr < strings.Length; ctr++)
Console.WriteLine("{0,13} {1,13} {2,15} {3,13} {4,13}",
strings[ctr], current[ctr], invariant[ctr],
ordinal[ctr], stringSort[ctr] );
}
}
// IComparer<String> implementation to perform string sort.
internal class SCompare : IComparer<String>
{
public int Compare(string x, string y)
{
return CultureInfo.CurrentCulture.CompareInfo.Compare(x, y, CompareOptions.StringSort);
}
}
// The example displays the following output:
// Original Word Sort Invariant Word Ordinal Sort String Sort
//
// coop cœur cœur co-op co-op
// co-op coeur coeur coeur cœur
// cooperative coop coop coop coeur
// cooperative co-op co-op cooperative coop
// cœur cooperative cooperative cooperative cooperative
// coeur cooperative cooperative cœur cooperative
Imports System.Collections
Imports System.Collections.Generic
Imports System.Globalization
Module Example
Public Sub Main()
Dim strings() As String = { "coop", "co-op", "cooperative",
"co" + ChrW(&h00AD) + "operative",
"cœur", "coeur" }
' Perform a word sort using the current (en-US) culture.
Dim current(strings.Length - 1) As String
strings.CopyTo(current, 0)
Array.Sort(current, StringComparer.CurrentCulture)
' Perform a word sort using the invariant culture.
Dim invariant(strings.Length - 1) As String
strings.CopyTo(invariant, 0)
Array.Sort(invariant, StringComparer.InvariantCulture)
' Perform an ordinal sort.
Dim ordinal(strings.Length - 1) As String
strings.CopyTo(ordinal, 0)
Array.Sort(ordinal, StringComparer.Ordinal)
' Perform a string sort using the current culture.
Dim stringSort(strings.Length - 1) As String
strings.CopyTo(stringSort, 0)
Array.Sort(stringSort, new SCompare())
' Display array values
Console.WriteLine("{0,13} {1,13} {2,15} {3,13} {4,13}",
"Original", "Word Sort", "Invariant Word",
"Ordinal Sort", "String Sort")
Console.WriteLine()
For ctr As Integer = 0 To strings.Length - 1
Console.WriteLine("{0,13} {1,13} {2,15} {3,13} {4,13}",
strings(ctr), current(ctr), invariant(ctr),
ordinal(ctr), stringSort(ctr))
Next
End Sub
End Module
' IComparer<String> implementation to perform string sort.
Friend Class SCompare : Implements IComparer(Of String)
Public Function Compare(x As String, y As String) As Integer _
Implements IComparer(Of String).Compare
Return CultureInfo.CurrentCulture.CompareInfo.Compare(x, y, CompareOptions.StringSort)
End Function
End Class
' The example displays the following output:
' Original Word Sort Invariant Word Ordinal Sort String Sort
'
' coop cœur cœur co-op co-op
' co-op coeur coeur coeur cœur
' cooperative coop coop coop coeur
' cooperative co-op co-op cooperative coop
' cœur cooperative cooperative cooperative cooperative
' coeur cooperative cooperative cœur cooperative
Tip
Internally, the.NET Framework uses sort keys to support culturallysensitive string comparison. Each character in a string is given several categories of sort weights, including alphabetic, case, and diacritic. A sort key, represented by the SortKey class, provides a repository of these weights for a particular string. If your app performs a large number of searching or sorting operations on the same set of strings, you can improve its performance by generating and storing sort keys for all the strings that it uses. When a sort or comparison operation is required, you use the sort keys instead of the strings. For more information, see the SortKey class.
If you don't specify a string comparison convention, sorting methods such as System.Array.Sort(Array) perform a culture-sensitive, case-sensitive sort on strings. The following example illustrates how changing the current culture affects the order of sorted strings in an array. It creates an array of three strings. First, it sets the System.Threading.Thread.CurrentThread.CurrentCulture
property to en-US and calls the System.Array.Sort(Array) method. The resulting sort order is based on sorting conventions for the English (United States) culture. Next, the example sets the System.Threading.Thread.CurrentThread.CurrentCulture
property to da-DK and calls the System.Array.Sort method again. Notice how the resulting sort order differs from the en-US results because it uses the sorting conventions for Danish (Denmark).
using System;
using System.Globalization;
using System.Threading;
public class ArraySort
{
public static void Main(String[] args)
{
// Create and initialize a new array to store the strings.
string[] stringArray = { "Apple", "Æble", "Zebra"};
// Display the values of the array.
Console.WriteLine( "The original string array:");
PrintIndexAndValues(stringArray);
// Set the CurrentCulture to "en-US".
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
// Sort the values of the array.
Array.Sort(stringArray);
// Display the values of the array.
Console.WriteLine("After sorting for the culture \"en-US\":");
PrintIndexAndValues(stringArray);
// Set the CurrentCulture to "da-DK".
Thread.CurrentThread.CurrentCulture = new CultureInfo("da-DK");
// Sort the values of the Array.
Array.Sort(stringArray);
// Display the values of the array.
Console.WriteLine("After sorting for the culture \"da-DK\":");
PrintIndexAndValues(stringArray);
}
public static void PrintIndexAndValues(string[] myArray)
{
for (int i = myArray.GetLowerBound(0); i <=
myArray.GetUpperBound(0); i++ )
Console.WriteLine("[{0}]: {1}", i, myArray[i]);
Console.WriteLine();
}
}
// The example displays the following output:
// The original string array:
// [0]: Apple
// [1]: Æble
// [2]: Zebra
//
// After sorting for the "en-US" culture:
// [0]: Æble
// [1]: Apple
// [2]: Zebra
//
// After sorting for the culture "da-DK":
// [0]: Apple
// [1]: Zebra
// [2]: Æble
Imports System.Globalization
Imports System.IO
Imports System.Threading
Public Class TextToFile
Public Shared Sub Main()
' Creates and initializes a new array to store
' these date/time objects.
Dim stringArray() As String = { "Apple", "Æble", "Zebra"}
' Displays the values of the array.
Console.WriteLine("The original string array:")
PrintIndexAndValues(stringArray)
' Set the CurrentCulture to "en-US".
Thread.CurrentThread.CurrentCulture = New CultureInfo("en-US")
' Sort the values of the Array.
Array.Sort(stringArray)
' Display the values of the array.
Console.WriteLine("After sorting for the ""en-US"" culture:")
PrintIndexAndValues(stringArray)
' Set the CurrentCulture to "da-DK".
Thread.CurrentThread.CurrentCulture = New CultureInfo("da-DK")
' Sort the values of the Array.
Array.Sort(stringArray)
' Displays the values of the Array.
Console.WriteLine("After sorting for the culture ""da-DK"":")
PrintIndexAndValues(stringArray)
End Sub
Public Shared Sub PrintIndexAndValues(myArray() As String)
For i As Integer = myArray.GetLowerBound(0) To myArray.GetUpperBound(0)
Console.WriteLine("[{0}]: {1}", i, myArray(i))
Next
Console.WriteLine()
End Sub
End Class
' The example displays the following output:
' The original string array:
' [0]: Apple
' [1]: Æble
' [2]: Zebra
'
' After sorting for the "en-US" culture:
' [0]: Æble
' [1]: Apple
' [2]: Zebra
'
' After sorting for the culture "da-DK":
' [0]: Apple
' [1]: Zebra
' [2]: Æble
Warning
If your primary purpose in comparing strings is to determine whether they are equal, you should call the System.String.Equals method. Typically, you should use Equals to perform an ordinal comparison. The System.String.Compare method is intended primarily to sort strings.
String search methods, such as System.String.StartsWith and System.String.IndexOf, also can perform culture-sensitive or ordinal string comparisons. The following example illustrates the differences between ordinal and culture-sensitive comparisons using the IndexOf method. A culture-sensitive search in which the current culture is English (United States) considers the substring "oe" to match the ligature "œ". Because a soft hyphen (U+00AD) is a zero-width character, the search treats the soft hyphen as equivalent to Empty and finds a match at the beginning of the string. An ordinal search, on the other hand, does not find a match in either case.
using namespace System;
void FindInString(String^ s, String^ substring, StringComparison options);
void main()
{
// Search for "oe" and "�u" in "�ufs" and "oeufs".
String^ s1 = L"�ufs";
String^ s2 = L"oeufs";
FindInString(s1, "oe", StringComparison::CurrentCulture);
FindInString(s1, "oe", StringComparison::Ordinal);
FindInString(s2, "�u", StringComparison::CurrentCulture);
FindInString(s2, "�u", StringComparison::Ordinal);
Console::WriteLine();
String^ s3 = L"co\x00ADoperative";
FindInString(s3, L"\x00AD", StringComparison::CurrentCulture);
FindInString(s3, L"\x00AD", StringComparison::Ordinal);
}
void FindInString(String^ s, String^ substring, StringComparison options)
{
int result = s->IndexOf(substring, options);
if (result != -1)
Console::WriteLine("'{0}' found in {1} at position {2}",
substring, s, result);
else
Console::WriteLine("'{0}' not found in {1}",
substring, s);
}
// The example displays the following output:
// 'oe' found in oufs at position 0
// 'oe' not found in oufs
// 'ou' found in oeufs at position 0
// 'ou' not found in oeufs
//
// '-' found in co-operative at position 0
// '-' found in co-operative at position 2
using System;
public class Example
{
public static void Main()
{
// Search for "oe" and "œu" in "œufs" and "oeufs".
string s1 = "œufs";
string s2 = "oeufs";
FindInString(s1, "oe", StringComparison.CurrentCulture);
FindInString(s1, "oe", StringComparison.Ordinal);
FindInString(s2, "œu", StringComparison.CurrentCulture);
FindInString(s2, "œu", StringComparison.Ordinal);
Console.WriteLine();
string s3 = "co\u00ADoperative";
FindInString(s3, "\u00AD", StringComparison.CurrentCulture);
FindInString(s3, "\u00AD", StringComparison.Ordinal);
}
private static void FindInString(string s, string substring, StringComparison options)
{
int result = s.IndexOf(substring, options);
if (result != -1)
Console.WriteLine("'{0}' found in {1} at position {2}",
substring, s, result);
else
Console.WriteLine("'{0}' not found in {1}",
substring, s);
}
}
// The example displays the following output:
// 'oe' found in œufs at position 0
// 'oe' not found in œufs
// 'œu' found in oeufs at position 0
// 'œu' not found in oeufs
//
// '' found in cooperative at position 0
// '' found in cooperative at position 2
Module Example
Public Sub Main()
' Search for "oe" and "œu" in "œufs" and "oeufs".
Dim s1 As String = "œufs"
Dim s2 As String = "oeufs"
FindInString(s1, "oe", StringComparison.CurrentCulture)
FindInString(s1, "oe", StringComparison.Ordinal)
FindInString(s2, "œu", StringComparison.CurrentCulture)
FindInString(s2, "œu", StringComparison.Ordinal)
Console.WriteLine()
Dim softHyphen As String = ChrW(&h00AD)
Dim s3 As String = "co" + softHyphen + "operative"
FindInString(s3, softHyphen, StringComparison.CurrentCulture)
FindInString(s3, softHyphen, StringComparison.Ordinal)
End Sub
Private Sub FindInString(s As String, substring As String,
options As StringComparison)
Dim result As Integer = s.IndexOf(substring, options)
If result <> -1
Console.WriteLine("'{0}' found in {1} at position {2}",
substring, s, result)
Else
Console.WriteLine("'{0}' not found in {1}",
substring, s)
End If
End Sub
End Module
' The example displays the following output:
' 'oe' found in œufs at position 0
' 'oe' not found in œufs
' 'œu' found in oeufs at position 0
' 'œu' not found in oeufs
'
' '' found in cooperative at position 0
' '' found in cooperative at position 2
Searching Strings
String search methods, such as System.String.StartsWith and System.String.IndexOf, also can perform culture-sensitive or ordinal string comparisons to determine whether a character or substring is found in a specified string.
The search methods in the String class that search for an individual character, such as the IndexOf method, or one of a set of characters, such as the IndexOfAny method, all perform an ordinal search. To perform a culture-sensitive search for a character, you must call a CompareInfo method such as System.Globalization.CompareInfo.IndexOf(String, Char) or System.Globalization.CompareInfo.LastIndexOf(String, Char). Note that the results of searching for a character using ordinal and culture-sensitive comparison can be very different. For example, a search for a precomposed Unicode character such as the ligature "Æ" (U+00C6) might match any occurrence of its components in the correct sequence, such as "AE" (U+041U+0045), depending on the culture. The following example illustrates the difference between the System.String.IndexOf(Char) and System.Globalization.CompareInfo.IndexOf(String, Char) methods when searching for an individual character. The ligature "æ" (U+00E6) is found in the string "aerial" when using the conventions of the en-US culture, but not when using the conventions of the da-DK culture or when performing an ordinal comparison.
using System;
using System.Globalization;
public class Example
{
public static void Main()
{
String[] cultureNames = { "da-DK", "en-US" };
CompareInfo ci;
String str = "aerial";
Char ch = 'æ'; // U+00E6
Console.Write("Ordinal comparison -- ");
Console.WriteLine("Position of '{0}' in {1}: {2}", ch, str,
str.IndexOf(ch));
foreach (var cultureName in cultureNames) {
ci = CultureInfo.CreateSpecificCulture(cultureName).CompareInfo;
Console.Write("{0} cultural comparison -- ", cultureName);
Console.WriteLine("Position of '{0}' in {1}: {2}", ch, str,
ci.IndexOf(str, ch));
}
}
}
// The example displays the following output:
// Ordinal comparison -- Position of 'æ' in aerial: -1
// da-DK cultural comparison -- Position of 'æ' in aerial: -1
// en-US cultural comparison -- Position of 'æ' in aerial: 0
Imports System.Globalization
Module Example
Public Sub Main()
Dim cultureNames() As String = { "da-DK", "en-US" }
Dim ci As CompareInfo
Dim str As String = "aerial"
Dim ch As Char = "æ"c ' U+00E6
Console.Write("Ordinal comparison -- ")
Console.WriteLine("Position of '{0}' in {1}: {2}", ch, str,
str.IndexOf(ch))
For Each cultureName In cultureNames
ci = CultureInfo.CreateSpecificCulture(cultureName).CompareInfo
Console.Write("{0} cultural comparison -- ", cultureName)
Console.WriteLine("Position of '{0}' in {1}: {2}", ch, str,
ci.IndexOf(str, ch))
Next
End Sub
End Module
' The example displays the following output:
' Ordinal comparison -- Position of 'æ' in aerial: -1
' da-DK cultural comparison -- Position of 'æ' in aerial: -1
' en-US cultural comparison -- Position of 'æ' in aerial: 0
On the other hand, String class methods that search for a string rather than a character perform a culture-sensitive search if search options are not explicitly specified by a parameter of type StringComparison. The sole exception is Contains, which performs an ordinal search.
Testing for equality
Use the System.String.Compare method to determine the relationship of two strings in the sort order. Typically, this is a culture-sensitive operation. In contrast, call the System.String.Equals method to test for equality. Because the test for equality usually compares user input with some known string, such as a valid user name, a password, or a file system path, it is typically an ordinal operation.
Warning
It is possible to test for equality by calling the System.String.Compare method and determining whether the return value is zero. However, this practice is not recommended. To determine whether two strings are equal, you should call one of the overloads of the System.String.Equals method. The preferred overload to call is either the instance Equals(String, StringComparison) method or the static Equals(String, String, StringComparison) method, because both methods include a System.StringComparison parameter that explicitly specifies the type of comparison.
The following example illustrates the danger of performing a culture-sensitive comparison for equality when an ordinal one should be used instead. In this case, the intent of the code is to prohibit file system access from URLs that begin with "FILE://" or "file://" by performing a case-insensitive comparison of the beginning of a URL with the string "FILE://". However, if a culture-sensitive comparison is performed using the Turkish (Turkey) culture on a URL that begins with "file://", the comparison for equality fails, because the Turkish uppercase equivalent of the lowercase "i" is "İ" instead of "I". As a result, file system access is inadvertently permitted. On the other hand, if an ordinal comparison is performed, the comparison for equality succeeds, and file system access is denied.
using namespace System;
using namespace System::Globalization;
using namespace System::Threading;
bool TestForEquality(String^ str, StringComparison cmp);
void main()
{
Thread::CurrentThread->CurrentCulture = CultureInfo::CreateSpecificCulture("tr-TR");
String^ filePath = "file://c:/notes.txt";
Console::WriteLine("Culture-sensitive test for equality:");
if (! TestForEquality(filePath, StringComparison::CurrentCultureIgnoreCase))
Console::WriteLine("Access to {0} is allowed.", filePath);
else
Console::WriteLine("Access to {0} is not allowed.", filePath);
Console::WriteLine("\nOrdinal test for equality:");
if (! TestForEquality(filePath, StringComparison::OrdinalIgnoreCase))
Console::WriteLine("Access to {0} is allowed.", filePath);
else
Console::WriteLine("Access to {0} is not allowed.", filePath);
}
bool TestForEquality(String^ str, StringComparison cmp)
{
int position = str->IndexOf("://");
if (position < 0) return false;
String^ substring = str->Substring(0, position);
return substring->Equals("FILE", cmp);
}
// The example displays the following output:
// Culture-sensitive test for equality:
// Access to file://c:/notes.txt is allowed.
//
// Ordinal test for equality:
// Access to file://c:/notes.txt is not allowed.
using System;
using System.Globalization;
using System.Threading;
public class Example
{
public static void Main()
{
Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("tr-TR");
string filePath = "file://c:/notes.txt";
Console.WriteLine("Culture-sensitive test for equality:");
if (! TestForEquality(filePath, StringComparison.CurrentCultureIgnoreCase))
Console.WriteLine("Access to {0} is allowed.", filePath);
else
Console.WriteLine("Access to {0} is not allowed.", filePath);
Console.WriteLine("\nOrdinal test for equality:");
if (! TestForEquality(filePath, StringComparison.OrdinalIgnoreCase))
Console.WriteLine("Access to {0} is allowed.", filePath);
else
Console.WriteLine("Access to {0} is not allowed.", filePath);
}
private static bool TestForEquality(string str, StringComparison cmp)
{
int position = str.IndexOf("://");
if (position < 0) return false;
string substring = str.Substring(0, position);
return substring.Equals("FILE", cmp);
}
}
// The example displays the following output:
// Culture-sensitive test for equality:
// Access to file://c:/notes.txt is allowed.
//
// Ordinal test for equality:
// Access to file://c:/notes.txt is not allowed.
Imports System.Globalization
Imports System.Threading
Module Example
Public Sub Main()
Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("tr-TR")
Dim filePath As String = "file://c:/notes.txt"
Console.WriteLine("Culture-sensitive test for equality:")
If Not TestForEquality(filePath, StringComparison.CurrentCultureIgnoreCase) Then
Console.WriteLine("Access to {0} is allowed.", filePath)
Else
Console.WriteLine("Access to {0} is not allowed.", filePath)
End If
Console.WriteLine()
Console.WriteLine("Ordinal test for equality:")
If Not TestForEquality(filePath, StringComparison.OrdinalIgnoreCase) Then
Console.WriteLine("Access to {0} is allowed.", filePath)
Else
Console.WriteLine("Access to {0} is not allowed.", filePath)
End If
End Sub
Private Function TestForEquality(str As String, cmp As StringComparison) As Boolean
Dim position As Integer = str.IndexOf("://")
If position < 0 Then Return False
Dim substring As String = str.Substring(0, position)
Return substring.Equals("FILE", cmp)
End Function
End Module
' The example displays the following output:
' Culture-sensitive test for equality:
' Access to file://c:/notes.txt is allowed.
'
' Ordinal test for equality:
' Access to file://c:/notes.txt is not allowed.
Normalization
Some Unicode characters have multiple representations. For example, any of the following code points can represent the letter "ắ":
U+1EAF
U+0103 U+0301
U+0061 U+0306 U+0301
Multiple representations for a single character complicate searching, sorting, matching, and other string operations.
The Unicode standard defines a process called normalization that returns one binary representation of a Unicode character for any of its equivalent binary representations. Normalization can use several algorithms, called normalization forms, that follow different rules. The .NET Framework supports Unicode normalization forms C, D, KC, and KD. When strings have been normalized to the same normalization form, they can be compared by using ordinal comparison.
An ordinal comparison is a binary comparison of the Unicode scalar value of corresponding Char objects in each string. The String class includes a number of methods that can perform an ordinal comparison, including the following:
Any overload of the Compare, Equals, StartsWith, EndsWith, IndexOf, and LastIndexOf methods that includes a StringComparison parameter. The method performs an ordinal comparison if you supply a value of System.StringComparison or OrdinalIgnoreCase for this parameter.
The overloads of the CompareOrdinal method.
Methods that use ordinal comparison by default, such as Contains, Replace, and Split.
Methods that search for a Char value or for the elements in a Char array in a string instance. Such methods include IndexOf(Char) and Split(Char[]).
You can determine whether a string is normalized to normalization form C by calling the System.String.IsNormalized() method, or you can call the System.String.IsNormalized(NormalizationForm) method to determine whether a string is normalized to a specified normalization form. You can also call the System.String.Normalize() method to convert a string to normalization form C, or you can call the System.String.Normalize(NormalizationForm) method to convert a string to a specified normalization form. For step-by-step information about normalizing and comparing strings, see the Normalize() and Normalize(NormalizationForm) methods.
The following simple example illustrates string normalization. It defines the letter "ố" in three different ways in three different strings, and uses an ordinal comparison for equality to determine that each string differs from the other two strings. It then converts each string to the supported normalization forms, and again performs an ordinal comparison of each string in a specified normalization form. In each case, the second test for equality shows that the strings are equal.
using namespace System;
using namespace System::Globalization;
using namespace System::IO;
using namespace System::Text;
public ref class Example
{
private:
StreamWriter^ sw;
void TestForEquality(... array<String^>^ words)
{
for (int ctr = 0; ctr <= words->Length - 2; ctr++)
for (int ctr2 = ctr + 1; ctr2 <= words->Length - 1; ctr2++)
sw->WriteLine("{0} ({1}) = {2} ({3}): {4}",
words[ctr], ShowBytes(words[ctr]),
words[ctr2], ShowBytes(words[ctr2]),
words[ctr]->Equals(words[ctr2], StringComparison::Ordinal));
}
String^ ShowBytes(String^ str)
{
String^ result = nullptr;
for each (Char ch in str)
result += String::Format("{0} ", Convert::ToUInt16(ch).ToString("X4"));
return result->Trim();
}
array<String^>^ NormalizeStrings(NormalizationForm nf, ... array<String^>^ words)
{
for (int ctr = 0; ctr < words->Length; ctr++)
if (! words[ctr]->IsNormalized(nf))
words[ctr] = words[ctr]->Normalize(nf);
return words;
}
public:
void Execute()
{
sw = gcnew StreamWriter(".\\TestNorm1.txt");
// Define three versions of the same word.
String^ s1 = L"sống"; // create word with U+1ED1
String^ s2 = L"s\x00F4\x0301ng";
String^ s3 = L"so\x0302\x0301ng";
TestForEquality(s1, s2, s3);
sw->WriteLine();
// Normalize and compare strings using each normalization form.
for each (String^ formName in Enum::GetNames(NormalizationForm::typeid))
{
sw->WriteLine("Normalization {0}:\n", formName);
NormalizationForm nf = (NormalizationForm) Enum::Parse(NormalizationForm::typeid, formName);
array<String^>^ sn = NormalizeStrings(nf, s1, s2, s3 );
TestForEquality(sn);
sw->WriteLine("\n");
}
sw->Close();
}
};
void main()
{
Example^ ex = gcnew Example();
ex->Execute();
}
// The example produces the following output:
// The example displays the following output:
// sống (0073 1ED1 006E 0067) = sống (0073 00F4 0301 006E 0067): False
// sống (0073 1ED1 006E 0067) = sống (0073 006F 0302 0301 006E 0067): False
// sống (0073 00F4 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): False
//
// Normalization FormC:
//
// sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
// sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
// sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
//
//
// Normalization FormD:
//
// sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
// sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
// sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
//
//
// Normalization FormKC:
//
// sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
// sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
// sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
//
//
// Normalization FormKD:
//
// sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
// sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
// sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
using System;
using System.Globalization;
using System.IO;
using System.Text;
public class Example
{
private static StreamWriter sw;
public static void Main()
{
sw = new StreamWriter(@".\TestNorm1.txt");
// Define three versions of the same word.
string s1 = "sống"; // create word with U+1ED1
string s2 = "s\u00F4\u0301ng";
string s3 = "so\u0302\u0301ng";
TestForEquality(s1, s2, s3);
sw.WriteLine();
// Normalize and compare strings using each normalization form.
foreach (string formName in Enum.GetNames(typeof(NormalizationForm)))
{
sw.WriteLine("Normalization {0}:\n", formName);
NormalizationForm nf = (NormalizationForm) Enum.Parse(typeof(NormalizationForm), formName);
string[] sn = NormalizeStrings(nf, s1, s2, s3);
TestForEquality(sn);
sw.WriteLine("\n");
}
sw.Close();
}
private static void TestForEquality(params string[] words)
{
for (int ctr = 0; ctr <= words.Length - 2; ctr++)
for (int ctr2 = ctr + 1; ctr2 <= words.Length - 1; ctr2++)
sw.WriteLine("{0} ({1}) = {2} ({3}): {4}",
words[ctr], ShowBytes(words[ctr]),
words[ctr2], ShowBytes(words[ctr2]),
words[ctr].Equals(words[ctr2], StringComparison.Ordinal));
}
private static string ShowBytes(string str)
{
string result = null;
foreach (var ch in str)
result += String.Format("{0} ", Convert.ToUInt16(ch).ToString("X4"));
return result.Trim();
}
private static string[] NormalizeStrings(NormalizationForm nf, params string[] words)
{
for (int ctr = 0; ctr < words.Length; ctr++)
if (! words[ctr].IsNormalized(nf))
words[ctr] = words[ctr].Normalize(nf);
return words;
}
}
// The example displays the following output:
// sống (0073 1ED1 006E 0067) = sống (0073 00F4 0301 006E 0067): False
// sống (0073 1ED1 006E 0067) = sống (0073 006F 0302 0301 006E 0067): False
// sống (0073 00F4 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): False
//
// Normalization FormC:
//
// sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
// sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
// sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
//
//
// Normalization FormD:
//
// sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
// sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
// sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
//
//
// Normalization FormKC:
//
// sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
// sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
// sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
//
//
// Normalization FormKD:
//
// sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
// sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
// sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
Imports System.Globalization
Imports System.IO
Imports System.Text
Module Example
Private sw As StreamWriter
Public Sub Main()
sw = New StreamWriter(".\TestNorm1.txt")
' Define three versions of the same word.
Dim s1 As String = "sống" ' create word with U+1ED1
Dim s2 AS String = "s" + ChrW(&h00F4) + ChrW(&h0301) + "ng"
Dim s3 As String = "so" + ChrW(&h0302) + ChrW(&h0301) + "ng"
TestForEquality(s1, s2, s3)
sw.WriteLine()
' Normalize and compare strings using each normalization form.
For Each formName In [Enum].GetNames(GetType(NormalizationForm))
sw.WriteLine("Normalization {0}:", formName)
Dim nf As NormalizationForm = CType([Enum].Parse(GetType(NormalizationForm), formName),
NormalizationForm)
Dim sn() As String = NormalizeStrings(nf, s1, s2, s3)
TestForEquality(sn)
sw.WriteLine(vbCrLf)
Next
sw.Close()
End Sub
Private Sub TestForEquality(ParamArray words As String())
For ctr As Integer = 0 To words.Length - 2
For ctr2 As Integer = ctr + 1 To words.Length - 1
sw.WriteLine("{0} ({1}) = {2} ({3}): {4}",
words(ctr), ShowBytes(words(ctr)),
words(ctr2), ShowBytes(words(ctr2)),
words(ctr).Equals(words(ctr2), StringComparison.Ordinal))
Next
Next
End Sub
Private Function ShowBytes(str As String) As String
Dim result As String = Nothing
For Each ch In str
result += String.Format("{0} ", Convert.ToUInt16(ch).ToString("X4"))
Next
Return result.Trim()
End Function
Private Function NormalizeStrings(nf As NormalizationForm, ParamArray words() As String) As String()
For ctr As Integer = 0 To words.Length - 1
If Not words(ctr).IsNormalized(nf) Then
words(ctr) = words(ctr).Normalize(nf)
End If
Next
Return words
End Function
End Module
' The example displays the following output:
' sống (0073 1ED1 006E 0067) = sống (0073 00F4 0301 006E 0067): False
' sống (0073 1ED1 006E 0067) = sống (0073 006F 0302 0301 006E 0067): False
' sống (0073 00F4 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): False
'
' Normalization FormC:
'
' sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
' sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
' sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
'
'
' Normalization FormD:
'
' sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
' sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
' sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
'
'
' Normalization FormKC:
'
' sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
' sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
' sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
'
'
' Normalization FormKD:
'
' sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
' sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
' sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
For more information about normalization and normalization forms, see System.Text.NormalizationForm, as well as Unicode Standard Annex #15: Unicode Normalization Forms and the Normalization FAQ on the unicode.org website.
String operations by category
The String class provides members for comparing strings, testing strings for equality, finding characters or substrings in a string, modifying a string, extracting substrings from a string, combining strings, formatting values, copying a string, and normalizing a string.
Comparing strings
You can compare strings to determine their relative position in the sort order by using the following String methods:
Compare returns an integer that indicates the relationship of one string to a second string in the sort order.
CompareOrdinal returns an integer that indicates the relationship of one string to a second string based on a comparison of their code points.
CompareTo returns an integer that indicates the relationship of the current string instance to a second string in the sort order. The CompareTo(String) method provides the IComparable and IComparable<T> implementations for the String class.
Testing strings for equality
You call the Equals method to determine whether two strings are equal. The instance Equals(String, String, StringComparison) and the static Equals(String, StringComparison) overloads let you specify whether the comparison is culture-sensitive or ordinal, and whether case is considered or ignored. Most tests for equality are ordinal, and comparisons for equality that determine access to a system resource (such as a file system object) should always be ordinal.
Finding characters in a string
The String class includes two kinds of search methods:
Methods that return a Boolean value to indicate whether a particular substring is present in a string instance. These include the Contains, EndsWith, and StartsWith methods.
Methods that indicate the starting position of a substring in a string instance. These include the IndexOf, IndexOfAny, LastIndexOf, and LastIndexOfAny methods.
Warning
If you want to search a string for a particular pattern rather than a specific substring, you should use regular expressions. For more information, see .NET Framework Regular Expressions.
Modifying a string
The String class includes the following methods that appear to modify the value of a string:
PadLeft inserts one or more occurrences of a specified character at the beginning of a string.
PadRight inserts one or more occurrences of a specified character at the beginning of a string.
Remove deletes a substring from the current String instance.
Replace replaces a substring with another substring in the current String instance.
ToLower and ToLowerInvariant convert all the characters in a string to lowercase.
ToUpper and ToUpperInvariant convert all the characters in a string to uppercase.
Trim removes all occurrences of a character from the beginning and end of a string.
TrimEnd removes all occurrences of a character from the end of a string.
TrimStart removes all occurrences of a character from the beginning of a string.
Important
All string modification methods return a new String object. They do not modify the value of the current instance.
Extracting substrings from a string
The System.String.Split method separates a single string into multiple strings. Overloads of the method allow you to specify multiple delimiters, to determine the maximum number of substrings that the method extracts, and to determine whether empty strings (which occur when delimiters are adjacent) are included among the returned strings.
Combining strings
The following String methods can be used for string concatenation:
Concat combines one or more substrings into a single string.
Join concatenates one or more substrings into a single element and adds a separator between each substring.
Formatting values
The System.String.Format method uses the composite formatting feature to replace one or more placeholders in a string with the string representation of some object or value. The Format method is often used to do the following:
To embed the string representation of a numeric value in a string.
To embed the string representation of a date and time value in a string.
To embed the string representation of an enumeration value in a string.
To embed the string representation of some object that supports the IFormattable interface in a string.
To right-justify or left-justify a substring in a field within a larger string.
For detailed information about formatting operations and examples, see the Format overload summary.
Copying a string
You can call the following String methods to make a copy of a string:
Copy creates a copy of an existing string.
CopyTo copies a portion of a string to a character array.
Normalizing a string
In Unicode, a single character can have multiple code points. Normalization converts these equivalent characters into the same binary representation. The System.String.Normalize method performs the normalization, and the System.String.IsNormalized method determines whether a string is normalized.
For more information and an example, see the Normalization section earlier in this topic.
Constructors
String(Char*) |
Initializes a new instance of the String class to the value indicated by a specified pointer to an array of Unicode characters. |
String(Char[]) |
Initializes a new instance of the String class to the value indicated by an array of Unicode characters. |
String(SByte*) |
Initializes a new instance of the String class to the value indicated by a pointer to an array of 8-bit signed integers. |
String(Char, Int32) |
Initializes a new instance of the String class to the value indicated by a specified Unicode character repeated a specified number of times. |
String(Char*, Int32, Int32) |
Initializes a new instance of the String class to the value indicated by a specified pointer to an array of Unicode characters, a starting character position within that array, and a length. |
String(Char[], Int32, Int32) |
Initializes a new instance of the String class to the value indicated by an array of Unicode characters, a starting character position within that array, and a length. |
String(SByte*, Int32, Int32) |
Initializes a new instance of the String class to the value indicated by a specified pointer to an array of 8-bit signed integers, a starting position within that array, and a length. |
String(SByte*, Int32, Int32, Encoding) |
Initializes a new instance of the String class to the value indicated by a specified pointer to an array of 8-bit signed integers, a starting position within that array, a length, and an Encoding object. |
Fields
Empty |
Represents the empty string. This field is read-only. |
Properties
Chars[Int32] |
Gets the Char object at a specified position in the current String object. |
Length |
Gets the number of characters in the current String object. |
Methods
Clone() |
Returns a reference to this instance of String. |
Compare(String, Int32, String, Int32, Int32, Boolean, CultureInfo) |
Compares substrings of two specified String objects, ignoring or honoring their case and using culture-specific information to influence the comparison, and returns an integer that indicates their relative position in the sort order. |
Compare(String, Int32, String, Int32, Int32, StringComparison) |
Compares substrings of two specified String objects using the specified rules, and returns an integer that indicates their relative position in the sort order. |
Compare(String, Int32, String, Int32, Int32, Boolean) |
Compares substrings of two specified String objects, ignoring or honoring their case, and returns an integer that indicates their relative position in the sort order. |
Compare(String, Int32, String, Int32, Int32) |
Compares substrings of two specified String objects and returns an integer that indicates their relative position in the sort order. |
Compare(String, String) |
Compares two specified String objects and returns an integer that indicates their relative position in the sort order. |
Compare(String, String, Boolean, CultureInfo) |
Compares two specified String objects, ignoring or honoring their case, and using culture-specific information to influence the comparison, and returns an integer that indicates their relative position in the sort order. |
Compare(String, String, StringComparison) |
Compares two specified String objects using the specified rules, and returns an integer that indicates their relative position in the sort order. |
Compare(String, String, Boolean) |
Compares two specified String objects, ignoring or honoring their case, and returns an integer that indicates their relative position in the sort order. |
Compare(String, Int32, String, Int32, Int32, CultureInfo, CompareOptions) |
Compares substrings of two specified String objects using the specified comparison options and culture-specific information to influence the comparison, and returns an integer that indicates the relationship of the two substrings to each other in the sort order. |
Compare(String, String, CultureInfo, CompareOptions) |
Compares two specified String objects using the specified comparison options and culture-specific information to influence the comparison, and returns an integer that indicates the relationship of the two strings to each other in the sort order. |
CompareOrdinal(String, Int32, String, Int32, Int32) |
Compares substrings of two specified String objects by evaluating the numeric values of the corresponding Char objects in each substring. |
CompareOrdinal(String, String) |
Compares two specified String objects by evaluating the numeric values of the corresponding Char objects in each string. |
CompareTo(Object) |
Compares this instance with a specified Object and indicates whether this instance precedes, follows, or appears in the same position in the sort order as the specified Object. |
CompareTo(String) |
Compares this instance with a specified String object and indicates whether this instance precedes, follows, or appears in the same position in the sort order as the specified string. |
Concat(String, String, String, String) |
Concatenates four specified instances of String. |
Concat(Object, Object, Object, Object) |
Concatenates the string representations of four specified objects and any objects specified in an optional variable length parameter list. |
Concat(Object, Object, Object) |
Concatenates the string representations of three specified objects. |
Concat(String, String) |
Concatenates two specified instances of String. |
Concat(String, String, String) |
Concatenates three specified instances of String. |
Concat(String[]) |
Concatenates the elements of a specified String array. |
Concat(Object[]) |
Concatenates the string representations of the elements in a specified Object array. |
Concat(Object) |
Creates the string representation of a specified object. |
Concat(IEnumerable<String>) |
Concatenates the members of a constructed IEnumerable<T> collection of type String. |
Concat(Object, Object) |
Concatenates the string representations of two specified objects. |
Concat<T>(IEnumerable<T>) |
Concatenates the members of an IEnumerable<T> implementation. |
Contains(String) |
Returns a value indicating whether a specified substring occurs within this string. |
Copy(String) |
Creates a new instance of String with the same value as a specified String. |
CopyTo(Int32, Char[], Int32, Int32) |
Copies a specified number of characters from a specified position in this instance to a specified position in an array of Unicode characters. |
EndsWith(String, Boolean, CultureInfo) |
Determines whether the end of this string instance matches the specified string when compared using the specified culture. |
EndsWith(String, StringComparison) |
Determines whether the end of this string instance matches the specified string when compared using the specified comparison option. |
EndsWith(String) |
Determines whether the end of this string instance matches the specified string. |
EndsWith(Char) | |
Equals(Object) |
Determines whether this instance and a specified object, which must also be a String object, have the same value. |
Equals(String) |
Determines whether this instance and another specified String object have the same value. |
Equals(String, String) |
Determines whether two specified String objects have the same value. |
Equals(String, StringComparison) |
Determines whether this string and a specified String object have the same value. A parameter specifies the culture, case, and sort rules used in the comparison. |
Equals(String, String, StringComparison) |
Determines whether two specified String objects have the same value. A parameter specifies the culture, case, and sort rules used in the comparison. |
Format(IFormatProvider, String, Object, Object, Object) |
Replaces the format items in a specified string with the string representation of three specified objects. An parameter supplies culture-specific formatting information. |
Format(String, Object, Object, Object) |
Replaces the format items in a specified string with the string representation of three specified objects. |
Format(IFormatProvider, String, Object, Object) |
Replaces the format items in a specified string with the string representation of two specified objects. A parameter supplies culture-specific formatting information. |
Format(String, Object, Object) |
Replaces the format items in a specified string with the string representation of two specified objects. |
Format(IFormatProvider, String, Object) |
Replaces the format item or items in a specified string with the string representation of the corresponding object. A parameter supplies culture-specific formatting information. |
Format(String, Object[]) |
Replaces the format item in a specified string with the string representation of a corresponding object in a specified array. |
Format(String, Object) |
Replaces one or more format items in a specified string with the string representation of a specified object. |
Format(IFormatProvider, String, Object[]) |
Replaces the format items in a specified string with the string representations of corresponding objects in a specified array. A parameter supplies culture-specific formatting information. |
GetEnumerator() |
Retrieves an object that can iterate through the individual characters in this string. |
GetHashCode() |
Returns the hash code for this string. |
GetHashCode(StringComparison) | |
GetTypeCode() | |
IndexOf(String, Int32, Int32) |
Reports the zero-based index of the first occurrence of the specified string in this instance. The search starts at a specified character position and examines a specified number of character positions. |
IndexOf(String, Int32, Int32, StringComparison) |
Reports the zero-based index of the first occurrence of the specified string in the current String object. Parameters specify the starting search position in the current string, the number of characters in the current string to search, and the type of search to use for the specified string. |
IndexOf(String, Int32, StringComparison) |
Reports the zero-based index of the first occurrence of the specified string in the current String object. Parameters specify the starting search position in the current string and the type of search to use for the specified string. |
IndexOf(Char, Int32, Int32) |
Reports the zero-based index of the first occurrence of the specified character in this instance. The search starts at a specified character position and examines a specified number of character positions. |
IndexOf(String) |
Reports the zero-based index of the first occurrence of the specified string in this instance. |
IndexOf(String, Int32) |
Reports the zero-based index of the first occurrence of the specified string in this instance. The search starts at a specified character position. |
IndexOf(Char, Int32) |
Reports the zero-based index of the first occurrence of the specified Unicode character in this string. The search starts at a specified character position. |
IndexOf(String, StringComparison) |
Reports the zero-based index of the first occurrence of the specified string in the current String object. A parameter specifies the type of search to use for the specified string. |
IndexOf(Char) |
Reports the zero-based index of the first occurrence of the specified Unicode character in this string. |
IndexOfAny(Char[]) |
Reports the zero-based index of the first occurrence in this instance of any character in a specified array of Unicode characters. |
IndexOfAny(Char[], Int32) |
Reports the zero-based index of the first occurrence in this instance of any character in a specified array of Unicode characters. The search starts at a specified character position. |
IndexOfAny(Char[], Int32, Int32) |
Reports the zero-based index of the first occurrence in this instance of any character in a specified array of Unicode characters. The search starts at a specified character position and examines a specified number of character positions. |
Insert(Int32, String) |
Returns a new string in which a specified string is inserted at a specified index position in this instance. |
Intern(String) |
Retrieves the system's reference to the specified String. |
IsInterned(String) |
Retrieves a reference to a specified String. |
IsNormalized() |
Indicates whether this string is in Unicode normalization form C. |
IsNormalized(NormalizationForm) |
Indicates whether this string is in the specified Unicode normalization form. |
IsNullOrEmpty(String) |
Indicates whether the specified string is |
IsNullOrWhiteSpace(String) |
Indicates whether a specified string is |
Join(String, String[], Int32, Int32) |
Concatenates the specified elements of a string array, using the specified separator between each element. |
Join(String, String[]) |
Concatenates all the elements of a string array, using the specified separator between each element. |
Join(String, Object[]) |
Concatenates the elements of an object array, using the specified separator between each element. |
Join(Char, String[], Int32, Int32) | |
Join(Char, String[]) | |
Join(Char, Object[]) | |
Join(String, IEnumerable<String>) |
Concatenates the members of a constructed IEnumerable<T> collection of type String, using the specified separator between each member. |
Join<T>(Char, IEnumerable<T>) | |
Join<T>(String, IEnumerable<T>) |
Concatenates the members of a collection, using the specified separator between each member. |
LastIndexOf(String, Int32, Int32, StringComparison) |
Reports the zero-based index position of the last occurrence of a specified string within this instance. The search starts at a specified character position and proceeds backward toward the beginning of the string for the specified number of character positions. A parameter specifies the type of comparison to perform when searching for the specified string. |
LastIndexOf(String, Int32, StringComparison) |
Reports the zero-based index of the last occurrence of a specified string within the current String object. The search starts at a specified character position and proceeds backward toward the beginning of the string. A parameter specifies the type of comparison to perform when searching for the specified string. |
LastIndexOf(Char, Int32, Int32) |
Reports the zero-based index position of the last occurrence of the specified Unicode character in a substring within this instance. The search starts at a specified character position and proceeds backward toward the beginning of the string for a specified number of character positions. |
LastIndexOf(String, StringComparison) |
Reports the zero-based index of the last occurrence of a specified string within the current String object. A parameter specifies the type of search to use for the specified string. |
LastIndexOf(String, Int32, Int32) |
Reports the zero-based index position of the last occurrence of a specified string within this instance. The search starts at a specified character position and proceeds backward toward the beginning of the string for a specified number of character positions. |
LastIndexOf(Char, Int32) |
Reports the zero-based index position of the last occurrence of a specified Unicode character within this instance. The search starts at a specified character position and proceeds backward toward the beginning of the string. |
LastIndexOf(String) |
Reports the zero-based index position of the last occurrence of a specified string within this instance. |
LastIndexOf(Char) |
Reports the zero-based index position of the last occurrence of a specified Unicode character within this instance. |
LastIndexOf(String, Int32) |
Reports the zero-based index position of the last occurrence of a specified string within this instance. The search starts at a specified character position and proceeds backward toward the beginning of the string. |
LastIndexOfAny(Char[]) |
Reports the zero-based index position of the last occurrence in this instance of one or more characters specified in a Unicode array. |
LastIndexOfAny(Char[], Int32) |
Reports the zero-based index position of the last occurrence in this instance of one or more characters specified in a Unicode array. The search starts at a specified character position and proceeds backward toward the beginning of the string. |
LastIndexOfAny(Char[], Int32, Int32) |
Reports the zero-based index position of the last occurrence in this instance of one or more characters specified in a Unicode array. The search starts at a specified character position and proceeds backward toward the beginning of the string for a specified number of character positions. |
Normalize(NormalizationForm) |
Returns a new string whose textual value is the same as this string, but whose binary representation is in the specified Unicode normalization form. |
Normalize() |
Returns a new string whose textual value is the same as this string, but whose binary representation is in Unicode normalization form C. |
PadLeft(Int32) |
Returns a new string that right-aligns the characters in this instance by padding them with spaces on the left, for a specified total length. |
PadLeft(Int32, Char) |
Returns a new string that right-aligns the characters in this instance by padding them on the left with a specified Unicode character, for a specified total length. |
PadRight(Int32) |
Returns a new string that left-aligns the characters in this string by padding them with spaces on the right, for a specified total length. |
PadRight(Int32, Char) |
Returns a new string that left-aligns the characters in this string by padding them on the right with a specified Unicode character, for a specified total length. |
Remove(Int32) |
Returns a new string in which all the characters in the current instance, beginning at a specified position and continuing through the last position, have been deleted. |
Remove(Int32, Int32) |
Returns a new string in which a specified number of characters in the current instance beginning at a specified position have been deleted. |
Replace(Char, Char) |
Returns a new string in which all occurrences of a specified Unicode character in this instance are replaced with another specified Unicode character. |
Replace(String, String) |
Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string. |
Replace(String, String, StringComparison) | |
Replace(String, String, Boolean, CultureInfo) | |
Split(String[], Int32, StringSplitOptions) |
Splits a string into a maximum number of substrings based on the strings in an array. You can specify whether the substrings include empty array elements. |
Split(String, Int32, StringSplitOptions) | |
Split(Char[], Int32, StringSplitOptions) |
Splits a string into a maximum number of substrings based on the characters in an array. |
Split(Char, Int32, StringSplitOptions) | |
Split(String[], StringSplitOptions) |
Splits a string into substrings based on the strings in an array. You can specify whether the substrings include empty array elements. |
Split(Char, StringSplitOptions) | |
Split(Char[], StringSplitOptions) |
Splits a string into substrings based on the characters in an array. You can specify whether the substrings include empty array elements. |
Split(Char[], Int32) |
Splits a string into a maximum number of substrings based on the characters in an array. You also specify the maximum number of substrings to return. |
Split(String, StringSplitOptions) | |
Split(Char[]) |
Splits a string into substrings that are based on the characters in an array. |
StartsWith(String, Boolean, CultureInfo) |
Determines whether the beginning of this string instance matches the specified string when compared using the specified culture. |
StartsWith(String, StringComparison) |
Determines whether the beginning of this string instance matches the specified string when compared using the specified comparison option. |
StartsWith(String) |
Determines whether the beginning of this string instance matches the specified string. |
StartsWith(Char) | |
Substring(Int32) |
Retrieves a substring from this instance. The substring starts at a specified character position and continues to the end of the string. |
Substring(Int32, Int32) |
Retrieves a substring from this instance. The substring starts at a specified character position and has a specified length. |
ToCharArray(Int32, Int32) |
Copies the characters in a specified substring in this instance to a Unicode character array. |
ToCharArray() |
Copies the characters in this instance to a Unicode character array. |
ToLower() |
Returns a copy of this string converted to lowercase. |
ToLower(CultureInfo) |
Returns a copy of this string converted to lowercase, using the casing rules of the specified culture. |
ToLowerInvariant() |
Returns a copy of this String object converted to lowercase using the casing rules of the invariant culture. |
ToString() |
Returns this instance of String; no actual conversion is performed. |
ToString(IFormatProvider) |
Returns this instance of String; no actual conversion is performed. |
ToUpper() |
Returns a copy of this string converted to uppercase. |
ToUpper(CultureInfo) |
Returns a copy of this string converted to uppercase, using the casing rules of the specified culture. |
ToUpperInvariant() |
Returns a copy of this String object converted to uppercase using the casing rules of the invariant culture. |
Trim(Char) | |
Trim(Char[]) |
Removes all leading and trailing occurrences of a set of characters specified in an array from the current String object. |
Trim() |
Removes all leading and trailing white-space characters from the current String object. |
TrimEnd() | |
TrimEnd(Char) | |
TrimEnd(Char[]) |
Removes all trailing occurrences of a set of characters specified in an array from the current String object. |
TrimStart() | |
TrimStart(Char) | |
TrimStart(Char[]) |
Removes all leading occurrences of a set of characters specified in an array from the current String object. |
Operators
Equality(String, String) |
Determines whether two specified strings have the same value. |
Inequality(String, String) |
Determines whether two specified strings have different values. |
Explicit Interface Implementations
Extension Methods
Thread Safety
This type is thread safe.