Core Java: How to sort strings in a language-specific order

When sorting a list of strings in alphabetical order, you shouldn’t use the comparison methods of the String class (especially when writing internationalized programs). The String.compareTo method compares the numeric Unicode values of the characters in the two strings, which gives the wrong order in most languages, because those values do not correspond to the relative order of the characters in the language’s alphabet. Instead, you should use the java.text.Collator class to sort strings in language-specific order.
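To make the difference concrete, here is a minimal sketch that compares a single pair of German words both ways. The class name CompareDemo and the helper methods binaryCompare and germanCompare are made up for this illustration; the string uses a Unicode escape only so that the result does not depend on the source file encoding.

```java
import java.text.Collator;
import java.util.Locale;

public class CompareDemo {

    // Binary comparison: orders strings by the numeric values of their characters.
    static int binaryCompare(String a, String b) {
        return a.compareTo(b);
    }

    // Language-sensitive comparison using the collation rules for German.
    static int germanCompare(String a, String b) {
        return Collator.getInstance(Locale.GERMAN).compare(a, b);
    }

    public static void main(String[] args) {
        String koennen = "k\u00f6nnen"; // "können"
        String kreativ = "kreativ";

        // 'ö' (U+00F6) has a higher value than 'r' (U+0072),
        // so compareTo places "können" after "kreativ".
        System.out.println(binaryCompare(koennen, kreativ) > 0); // true

        // The German collator sorts 'ö' with 'o', so "können" comes first.
        System.out.println(germanCompare(koennen, kreativ) < 0); // true
    }
}
```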

Let’s assume we have to sort the following list of German words:

über,
zahlen,
können,
kreativ,
Äther,
Österreich

Using the String.compareTo method (see code below), this will result in a “sorted” list of:

kreativ,
können,
zahlen,
Äther,
Österreich,
über

According to the collation rules of the German language, the preceding list is in the wrong order. In German, the proper sorting is as follows:

Äther,
können,
kreativ,
Österreich,
über,
zahlen

Here’s a quick example that demonstrates the differences:

import java.text.Collator;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class CollatorDemo {

	public static void main(String[] args) {

		List<String> words = Arrays.asList("über", "zahlen", "können",
				"kreativ", "Äther", "Österreich");

		//
		// Natural ordering (String.compareTo) sorts by Unicode value,
		// which does not give the correct order for German
		//
		Collections.sort(words);

		System.out.println("wrong sorting: " + words);

		//
		// Define a collator for German language
		//
		Collator collator = Collator.getInstance(Locale.GERMAN);

		//
		// Sort the list using Collator
		//
		Collections.sort(words, collator);

		System.out.println("proper sorting with Collator: " + words);

	}
}

In conclusion, the Collator class applies language-specific collation rules instead of ordering words by their Unicode character values. You can also set the Collator’s strength property to determine the level of difference considered significant in comparisons. Four strengths are provided: PRIMARY, SECONDARY, TERTIARY, and IDENTICAL. Exactly which language features each strength covers depends on the locale. Typically, PRIMARY treats only differences between base letters as significant (“a” vs. “b”), so accents and case are ignored; SECONDARY also distinguishes accented forms of the same base letter (“a” vs. “ä”) but still ignores case; TERTIARY additionally distinguishes upper and lower case (“a” vs. “A”); and IDENTICAL treats every difference as significant, including control characters and the difference between pre-composed and combining accents. See the Collator javadoc for more information.
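As a rough illustration of the strength setting, the following sketch checks whether two strings count as equal at a given strength. The class name StrengthDemo and the helper equalAt are invented for this example, and the results shown for case differences assume the typical behavior described above.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthDemo {

    // Returns true if the German collator treats a and b as equal
    // at the given strength (Collator.PRIMARY, SECONDARY, ...).
    static boolean equalAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.GERMAN);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // Case is a tertiary difference: ignored at PRIMARY and SECONDARY.
        System.out.println(equalAt(Collator.PRIMARY, "zahlen", "Zahlen"));   // true
        System.out.println(equalAt(Collator.SECONDARY, "zahlen", "Zahlen")); // true
        System.out.println(equalAt(Collator.TERTIARY, "zahlen", "Zahlen"));  // false

        // How an accent difference is classified depends on the locale's
        // rules, so no expected result is given here ("\u00fcber" is "über").
        System.out.println(equalAt(Collator.PRIMARY, "uber", "\u00fcber"));
    }
}
```

A collator set to SECONDARY strength can therefore be passed to Collections.sort when a case-insensitive, language-aware ordering is wanted.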
