So your company wants to move into international markets. Not just France or Italy, either: you might need to support a language that reads right-to-left, such as Hebrew, or a language whose writing is based not on an alphabet but on characters, like Chinese or Japanese. Let's talk about international applications, both what they are and how to test them for common problems.
Internationalization (sometimes abbreviated i18n, for "the letter i, eighteen letters, and the letter n") is the ability of your software to accept, display and otherwise process characters beyond the basic 26 on an English keyboard. Today we are focused on how to test for it.
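The abbreviation is a numeronym: keep the first and last letters and replace everything in between with a count of the letters removed. A tiny Python sketch makes the arithmetic concrete (the `numeronym` helper is hypothetical, just for illustration):

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("internationalization"))  # i18n (20 letters, 18 in the middle)
print(numeronym("localization"))          # l10n (12 letters, 10 in the middle)
```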
When you think about testing those characters, think about three tiers of character sets. The "classic" US-ASCII character set contains only unaccented English letters, digits, and common punctuation. The extended character sets (such as ISO 8859-1, also called Latin-1) add the accented letters used in French, French-Canadian, Spanish, German and other Western European languages. Finally, think about full Unicode, which covers virtually every writing system, including right-to-left scripts such as Hebrew and character-based scripts such as Chinese.
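One way to see the three tiers is to ask which encoding can hold a given test string without loss. This is a minimal sketch that assumes Latin-1 stands in for "the extended character set" (Windows-1252 is a close cousin); `smallest_charset` is a hypothetical helper, not a library function.

```python
def smallest_charset(text: str) -> str:
    """Return the first of the three tiers that can encode `text` without loss."""
    # Latin-1 (ISO 8859-1) stands in for the "extended" character set here.
    for charset in ("ascii", "latin-1"):
        try:
            text.encode(charset)
            return charset
        except UnicodeEncodeError:
            continue  # this tier can't represent the text; try a bigger one
    return "utf-8"  # UTF-8 can encode any valid Unicode text


for sample in ["Save", "Québec", "niña", "שלום", "中文"]:
    # Print the tier, the character count, and the UTF-8 byte count.
    print(sample, smallest_charset(sample), len(sample), len(sample.encode("utf-8")))
```

Note the last two columns: "中文" is 2 characters but 6 bytes in UTF-8, so any field or column sized in bytes rather than characters is a good place to paste Unicode test input.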
Testing for i18n appears to be simple. Just go to every major category of input, paste international characters in, save, and look at the results. You'll also want to make sure that searches for special characters return the right records, and that sorting works "properly." By properly, I mean that words containing ñ, such as niña, come back in the correct order: in Spanish, ñ is a separate letter that sorts after n and before o.
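Why does sorting break? A plain string sort compares code points, and ñ is U+00F1, which is larger than every unaccented letter, including z. The sketch below contrasts that with a hypothetical `spanish_key` that only handles primary letter order (ñ as its own letter, other accents ignored). It is an illustration of the bug, not a full collation; a real application would rely on proper locale-aware collation, such as ICU or Python's `locale.strxfrm` with a Spanish locale installed.

```python
import unicodedata
from typing import List

# Spanish primary alphabet: ñ is its own letter between n and o.
SPANISH_ALPHABET = "abcdefghijklmnñopqrstuvwxyz"
RANK = {letter: i for i, letter in enumerate(SPANISH_ALPHABET)}


def base_letter(ch: str) -> str:
    """Strip accents (ú -> u) but keep ñ, which Spanish treats as a separate letter."""
    if ch == "ñ":
        return ch
    return unicodedata.normalize("NFD", ch)[0]


def spanish_key(word: str) -> List[int]:
    """Hypothetical sort key: rank each letter by its place in the Spanish alphabet."""
    word = unicodedata.normalize("NFC", word.lower())
    # Characters outside the alphabet sort after all letters, by code point.
    return [RANK.get(base_letter(ch), len(RANK) + ord(ch)) for ch in word]


words = ["oso", "ñandú", "niña", "nube"]
print(sorted(words))                   # ['niña', 'nube', 'oso', 'ñandú']
print(sorted(words, key=spanish_key))  # ['niña', 'nube', 'ñandú', 'oso']
```

The first result is exactly the kind of "looks fine in English" order a tester should flag.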
Notice I wrote that it appears to be simple. Beyond basic save and display, we’ll want to think about the application's other touch points. For example, if user actions generate files on disk, consider whether the file names can contain international characters, and whether every trading partner or other consumer of those files can handle international text. Likewise, what happens when users on a plain English system try to edit, cut, and paste international characters?
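The file-name check can be automated on your own side of the fence. This is a minimal sketch, assuming files are written as UTF-8; `file_round_trip` and the sample strings are hypothetical. It compares names after NFC normalization because some file systems store names in a different Unicode normal form. Whether a trading partner can read the file still has to be tested with that partner.

```python
import tempfile
import unicodedata
from pathlib import Path

SAMPLES = ["niña", "Québec", "Ελληνικά", "שלום", "中文", "日本語"]


def nfc(s: str) -> str:
    # Some file systems store names in a different Unicode normal form.
    return unicodedata.normalize("NFC", s)


def file_round_trip(text: str) -> bool:
    """Save `text` into a file named after it, then check name and contents survive."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{text}.txt"
        try:
            path.write_text(text, encoding="utf-8")
            listed = [nfc(p.name) for p in Path(tmp).iterdir()]
            contents = path.read_text(encoding="utf-8")
        except (OSError, UnicodeError):
            return False  # the OS or file system rejected the name or the text
        return nfc(f"{text}.txt") in listed and contents == text


for sample in SAMPLES:
    print(sample, "OK" if file_round_trip(sample) else "FAILED")
```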
This integration problem exists with any third-party component. For example, if you use a special tool to index your database, or a component to generate files in PDF or Microsoft Word format, it's possible that tool does not support international characters.
When internationalization succeeds, it "just looks right." Conversely, international text handled incorrectly "just looks wrong." The term for this is Mojibake (Japanese for "character transformation"), and it can show up as a series of question marks, strange accented characters, or other garbled letters.
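Mojibake usually comes from one program writing bytes in one encoding and another reading them in a different one. The sketch below reproduces three common cases and includes a hypothetical `looks_like_mojibake` heuristic for scanning saved output. It only catches UTF-8 misread as Latin-1 and the Unicode replacement character; it cannot tell a lossy "?" from a real question mark, and it misses other pairings such as UTF-8 misread as Windows-1252.

```python
import re

# UTF-8 bytes for accented Latin letters start with 0xC2 or 0xC3; decoded as
# Latin-1 they become "Â" or "Ã" followed by a character in U+0080..U+00BF.
UTF8_AS_LATIN1 = re.compile("[\u00c2\u00c3][\u0080-\u00bf]")
REPLACEMENT_CHAR = "\ufffd"  # what decoders insert for bytes they can't decode


def looks_like_mojibake(text: str) -> bool:
    """Hypothetical heuristic: catches common cases, not every garbling."""
    return REPLACEMENT_CHAR in text or bool(UTF8_AS_LATIN1.search(text))


good = "niña"
print(good.encode("utf-8").decode("latin-1"))        # niÃ±a: UTF-8 read as Latin-1
print(good.encode("ascii", errors="replace"))        # b'ni?a': lossy save to ASCII
print(b"ni\xf1a".decode("utf-8", errors="replace"))  # Latin-1 bytes read as UTF-8
```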
Localization (L10n) testing is another aspect of testing for regional differences. Where internationalization is about accepting and processing international input, localization is about presenting information in the user's language of choice. A common approach to localized software is to build a lookup table. Every row in the table has a unique key (Save, Open, Save As, Exit; each common operation might be a row), with one column per language. The software then lets you set the locale, either at the server level or for each user, and presents text from the matching column. To test localization, switch a user's locale to a language other than English, then drive the application. You should see all of the interface text translated into the new language.
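The lookup table described above can be sketched as a dictionary of rows keyed by a unique ID, with one entry per locale. Everything here (`TRANSLATIONS`, `translate`, `missing_translations`, the fall-back-to-English rule) is a hypothetical illustration, not any particular framework's API. The German "Exit" cell is deliberately left empty to show the kind of gap a tester finds.

```python
# Hypothetical lookup table: one row per key, one column per locale.
TRANSLATIONS = {
    "save":    {"en": "Save",    "fr": "Enregistrer",      "de": "Speichern"},
    "open":    {"en": "Open",    "fr": "Ouvrir",           "de": "Öffnen"},
    "save_as": {"en": "Save As", "fr": "Enregistrer sous", "de": "Speichern unter"},
    "exit":    {"en": "Exit",    "fr": "Quitter"},  # German left out on purpose
}
DEFAULT_LOCALE = "en"


def translate(key: str, locale: str) -> str:
    """Look up `key` for `locale`, falling back to English if the cell is empty."""
    row = TRANSLATIONS[key]
    return row.get(locale, row[DEFAULT_LOCALE])


def missing_translations(locale: str) -> list:
    """Keys with no entry for `locale`: exactly what a tester hunts for on screen."""
    return sorted(key for key, row in TRANSLATIONS.items() if locale not in row)


for locale in ("en", "fr", "de"):
    print(locale, [translate(key, locale) for key in TRANSLATIONS])
print("missing in de:", missing_translations("de"))  # ['exit']
```

The silent fallback keeps the application usable, but it also means a German screen quietly shows "Exit" in English, which is why flipping the locale and looking at every screen finds real bugs.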
I remember one assignment where we wanted to test for localization before we had any international clients. This meant we needed the capability, but had not yet funded a translator for any specific language. We wanted to be sure we could sell to a foreign customer without changing the architecture, and that the localization architecture was in place and solid. So the technical team created a 'ZZ' locale: a meaningless language where every letter was Z or z. To test localization for a feature, I just set the locale to ZZ (which was cheap), then looked for any text not in ZZzzz format. Those were bugs.
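The story doesn't say how that ZZ locale was built, so here is one plausible reconstruction: generate the ZZ column from the English one by replacing every ASCII letter with Z or z (keeping case), then scan captured screen text for words that aren't pure Z/z. `to_zz` and `non_zz_words` are hypothetical helpers. One design caveat: a blanket letter replacement would also mangle placeholders such as `{name}`, so a real generator would need to skip them.

```python
import re


def to_zz(text: str) -> str:
    """Build the 'ZZ' pseudo-translation: every ASCII letter becomes Z or z."""
    return re.sub(r"[A-Za-z]", lambda m: "Z" if m.group().isupper() else "z", text)


def non_zz_words(screen_text: str) -> list:
    """Return words on a screen that are not pure Z/z: likely untranslated text."""
    words = re.findall(r"[A-Za-z]+", screen_text)
    return [word for word in words if not re.fullmatch(r"[Zz]+", word)]


english = ["Save", "Open", "Save As", "Exit"]
zz_table = {label: to_zz(label) for label in english}
print(zz_table)  # {'Save': 'Zzzz', 'Open': 'Zzzz', 'Save As': 'Zzzz Zz', 'Exit': 'Zzzz'}

# Pretend this text was captured from a screen with the locale set to ZZ;
# "Cancel" was hard-coded, so it never went through the lookup table.
screen = "Zzzz | Zzzz Zz | Cancel"
print(non_zz_words(screen))  # ['Cancel']
```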
Tools and utilities
If you want to test with sample input you need, well, sample input. That can be hard to generate with an operating system running English and an English keyboard. Microsoft Windows comes with a built-in program called charmap.exe which will help solve this problem. To run charmap, click Start->Run (or press Windows+R), type charmap and press Enter. That will bring up the Windows Character Map program.
Charmap.exe lets you pick a font, select characters, and copy them to the clipboard. Once they are on the clipboard, you can paste them (Ctrl+V) right into your application. Extended Latin characters are visible out of the box; to reach Chinese, Hebrew and other complex character sets, check "Advanced view" and use the "Group by" dropdown.
If you aren't using Windows, there is also a free, Web-based character mapping tool online that you can use.
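If you want sample input inside automated tests rather than on the clipboard, you can also generate it from Unicode code point ranges. This is a sketch: the block list and the `sample` helper are illustrative choices, not a standard set, though the ranges shown are real Unicode blocks whose opening characters are all assigned.

```python
import unicodedata

# A few Unicode blocks worth sampling: (first, last) code points, inclusive.
SAMPLE_RANGES = {
    "Latin-1 Supplement": (0x00C0, 0x00FF),
    "Cyrillic": (0x0410, 0x044F),
    "Hebrew": (0x05D0, 0x05EA),
    "Hiragana": (0x3041, 0x3096),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "Emoji": (0x1F600, 0x1F64F),  # outside the Basic Multilingual Plane
}


def sample(block: str, count: int = 8) -> str:
    """Return the first `count` characters of a Unicode block as one string."""
    start, end = SAMPLE_RANGES[block]
    last = min(end, start + count - 1)
    return "".join(chr(cp) for cp in range(start, last + 1))


for block in SAMPLE_RANGES:
    text = sample(block)
    # Print the block, the sample, and the official name of its first character.
    print(f"{block:24} {text}  ({unicodedata.name(text[0])})")
```

The emoji range is worth keeping: characters above U+FFFF take two UTF-16 code units, a frequent source of truncation bugs in applications that count characters in UTF-16.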
Quick attacks for localization
Once the framework is in place, we can test for common failures. A few of my favorite tests for localization include:
- Words that are substantially longer in a translation. Look in the translation table for entries that are much longer than the English version, then browse to the screens that show them. Depending on the user interface, the words may be cut off or wrap badly; those are errors.
- Error Messages. Because they are coded in a different place, developers often forget to localize error messages and pop-ups. To test this, set the locale for a user and run through the application with invalid input, forcing the errors and pop-ups to appear. Make sure they are translated.
- Images. Some images -- especially graphs -- can have text embedded in them. Make sure any such images are localized.
- ALT text and tooltips on images. Many Web-based applications give images an ALT attribute (shown when the image can't load, and read aloud by screen readers) or a title attribute that appears as a tooltip when you mouse over the image. These should also be localized.
- Date Stamps. In the United States, the date format is typically MM/DD/YYYY, yet much of Europe puts the day first (DD/MM/YYYY or DD.MM.YYYY), and some systems use DD-MON-YYYY, where MON is the three-letter abbreviation for the month. Your application may need to format dates by locale, or allow users to pick their preference.
- Time Stamps. If the time stamps on actions come from the server's clock, customers working in other time zones will see times that are off by hours, sometimes even stamped with the wrong day. It turns out this idea, letting customers set the time zone for their users, is an often overlooked “hidden requirement” of international applications.
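Three of the quick attacks above can be partly scripted. The sketch below flags translations much longer than their English source, formats a date in three user-selectable styles, and converts a UTC server time stamp to a user's offset. The function names, the 1.5× threshold, and the style names are hypothetical choices. It uses fixed UTC offsets to stay self-contained; a real application would store IANA time zone names (for example with Python's `zoneinfo`) so daylight saving time is handled.

```python
from datetime import datetime, timedelta, timezone

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def long_translations(rows: dict, threshold: float = 1.5) -> list:
    """Keys whose translation is more than `threshold` times the English length."""
    return [key for key, (english, translated) in rows.items()
            if len(translated) > threshold * len(english)]


def format_date(moment: datetime, style: str) -> str:
    """Format a date in one of three hypothetical user-selectable styles."""
    if style == "US":
        return moment.strftime("%m/%d/%Y")
    if style == "EU":
        return moment.strftime("%d/%m/%Y")
    if style == "DD-MON-YYYY":
        # Build the month abbreviation ourselves so the result doesn't depend
        # on the server's locale settings.
        return f"{moment.day:02d}-{MONTHS[moment.month - 1]}-{moment.year}"
    raise ValueError(f"unknown date style: {style}")


def to_user_time(server_utc: datetime, utc_offset_hours: float) -> datetime:
    """Convert a server time stamp (in UTC) to a user's fixed UTC offset."""
    return server_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))


rows = {"save": ("Save", "Enregistrer"), "open": ("Open", "Öffnen")}
print(long_translations(rows))  # ['save']: 11 characters vs 4

stamp = datetime(2024, 3, 5, 23, 30, tzinfo=timezone.utc)
for style in ("US", "EU", "DD-MON-YYYY"):
    print(style, format_date(stamp, style))  # 03/05/2024, 05/03/2024, 05-MAR-2024
# 23:30 UTC is already the next day for a user at UTC+1.
print(to_user_time(stamp, 1).strftime("%Y-%m-%d %H:%M"))  # 2024-03-06 00:30
```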
Putting it all together
Here we've looked at two different kinds of features:
- The ability to handle international character sets and
- The ability to present localized versions of our software for different customers
We can test for the first by introducing international characters and looking for Mojibake, and for the second by setting locales and looking for untranslated text.
In the end, there's nothing quite like having a native speaker of the target language try to actually use your software. But with a good handle on internationalization and localization testing, you can make sure your application is easy to use and free from foreign-language errors.