Friday, December 4, 2009

Character Set Encoding Issues.

Character set encoding can become an issue in some scenarios. Migrating an app from one OS to another with a different default character set encoding is one case that needs examining.

Developing or compiling on an OS with a different default encoding than the one your app runs on is another.

It should also be a consideration any time you exchange data with a system that uses a different character set encoding, whether you are receiving files, communicating via a message queue, or just using HTTP.

The default character set encodings on common OSs (Windows, AIX, Linux) may differ, but they are mostly compatible, typically agreeing on the plain ASCII range, so most of the time this doesn't lead to issues.

"Most of the time" is a red flag for me. Any time I see the potential for subtle, hard-to-reproduce issues, I like to be proactive in making sure they do not occur to begin with.
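To make "mostly compatible" concrete, here is a small sketch of how the same bytes read under two encodings. The class name DecodeDemo and the sample bytes are my own illustration, not taken from any particular system; the byte arrays stand in for data written by another machine.

```java
import java.nio.charset.Charset;

class DecodeDemo {
    // Decode raw bytes using the named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "cafe" in plain ASCII.
        byte[] ascii = { 0x63, 0x61, 0x66, 0x65 };
        // "cafe" with an accented e, as a UTF-8 system would write it.
        byte[] utf8 = { 0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9 };

        // ASCII-only data decodes identically, so the problem stays hidden.
        System.out.println(decode(ascii, "ISO-8859-1").equals(decode(ascii, "UTF-8")));
        // One accented character and the two readings diverge.
        System.out.println(decode(utf8, "ISO-8859-1").equals(decode(utf8, "UTF-8")));
    }
}
```

The first comparison is true and the second is false, which is exactly why this kind of bug can sit unnoticed until the first non-ASCII character shows up in production data.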

To specify the character set encoding in Java, wrap the java.io.InputStream in a java.io.InputStreamReader and pass it a java.nio.charset.Charset:

Charset cs = Charset.forName("ISO-8859-1");
InputStreamReader reader = new InputStreamReader(inputStream, cs);

For example, if you are reading a file, inputStream would be:

InputStream inputStream = new FileInputStream(fileName);

In practice you would probably make the file encoding configurable, so that if the source system for your file changes, you only need a config change rather than a code change.
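One way this could look is sketched below. The class name ConfigurableReader and the system property app.input.encoding are hypothetical names I made up for illustration; a properties file or any other config mechanism would work just as well. The fallback to ISO-8859-1 is likewise only an assumption for the example.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class ConfigurableReader {
    // Hypothetical property name; not a standard Java property.
    static final String ENCODING_PROPERTY = "app.input.encoding";

    // Look up the configured charset, falling back to ISO-8859-1 if none is set.
    static Charset configuredCharset() {
        return Charset.forName(System.getProperty(ENCODING_PROPERTY, "ISO-8859-1"));
    }

    // Read the whole file as text, decoding with the configured charset.
    static String readAll(String fileName) throws IOException {
        InputStream inputStream = new FileInputStream(fileName);
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(inputStream, configuredCharset()))) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(readAll(args[0]));
    }
}
```

With this sketch, switching the source system would only mean starting the app with something like java -Dapp.input.encoding=UTF-8 ConfigurableReader data.txt, with no recompile.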

----------------------

If you develop on one system and deploy on another with a different character set encoding, you will need to read up on Java's handling of character set encoding. Good information on this is easy to find on the internet. The point of this post is simply to alert you that this could be an issue.

My instinct would be to compile/build on the target system and to tell javac the character set encoding of the source files explicitly, using its -encoding option.

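The issue isn't so much with the compiled code itself as with String constants and the like: javac has to decode the source file to read them, and if it assumes the wrong encoding, any non-ASCII characters in those constants end up wrong in the compiled class.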
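A defensive option for String constants, sketched here under my own assumptions (the class name ConstantDemo and the sample word are just for illustration), is to write non-ASCII characters as Unicode escapes so the source file stays pure ASCII, and to name the charset whenever the String is turned back into bytes.

```java
import java.nio.charset.Charset;

class ConstantDemo {
    // The accented e is written as a Unicode escape, so the source file is
    // pure ASCII and compiles the same under any source encoding.
    static final String CAFE = "caf\u00e9";

    // Encode with an explicit charset rather than the platform default.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        System.out.println(CAFE.length());                      // 4 characters
        System.out.println(encode(CAFE, "ISO-8859-1").length);  // 4 bytes
        System.out.println(encode(CAFE, "UTF-8").length);       // 5 bytes
    }
}
```

The String has four characters either way; only the byte count changes with the charset, which is the part that differs between systems.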
