When running a Java application (on any system), the JVM can be started with different character encodings. Typically it will pick up the system default based on the OS and locale settings. For an excellent introduction to character encodings I can highly recommend Joel Spolsky’s article.
I recently encountered a problem with this on an AS400 where the Java application was trying to write a file out to the IFS. The filename in question contained Polish characters, or at least it was supposed to.
You can see the encoding in use by looking at the JVM properties against the job. Use WRKJOB to work with the job and take option 45 to Work with Java virtual machine. There are a couple of places you can then check.
- Option 2 will show the system environment variables that were used to initialize the JVM
- Option 7 will show the current Java system properties that are in use by the JVM
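The same information can also be read from inside the application. The following is only a minimal sketch: the class name EncodingReport is made up for illustration, and sun.jnu.encoding is an implementation-specific property that is not guaranteed to exist on every JVM (including IBM's), so the code falls back to a placeholder when it is missing.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: prints the encoding-related settings the JVM started with.
public class EncodingReport {

    // Collects the relevant system properties; missing ones show as "(not set)".
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("file.encoding", System.getProperty("file.encoding", "(not set)"));
        // Implementation-specific property; may be absent on some JVMs.
        report.put("sun.jnu.encoding", System.getProperty("sun.jnu.encoding", "(not set)"));
        report.put("defaultCharset", Charset.defaultCharset().name());
        report.put("user.language", System.getProperty("user.language", "(not set)"));
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running this in the same job as the application should give values that match what option 7 displays.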
Searching online, I saw many sources claiming that setting the JVM argument file.encoding will set the default character encoding. On the AS400 this can be achieved by adding a file.encoding line to the SystemDefault.properties file.
Note: SystemDefault.properties can either reside in /QIBM/UserData/Java400 or in the user home directory for the user starting the JVM.
This may work when it comes to writing out the contents of a file, but it has no effect on the encoding used to read or write filenames.
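To see why the filename encoding matters, it helps to check whether a given charset can represent the name at all. The sketch below is an illustration only: the class name, the method name and the sample filename are invented, and it does not touch the file system. It just asks a charset whether it can encode a name containing Polish characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks whether a filename survives a given encoding.
public class FilenameEncodingCheck {

    // True if every character of the name can be encoded in the charset.
    static boolean canRepresent(String fileName, Charset charset) {
        return charset.newEncoder().canEncode(fileName);
    }

    public static void main(String[] args) {
        // Sample name with Polish letters, written as Unicode escapes so the
        // source file's own encoding cannot interfere.
        String name = "za\u017C\u00F3\u0142\u0107.txt";

        System.out.println(canRepresent(name, StandardCharsets.ISO_8859_1)); // false
        System.out.println(canRepresent(name, StandardCharsets.UTF_8));      // true
    }
}
```

If the encoding the JVM uses for filenames cannot represent a character, the name cannot be written correctly, regardless of what file.encoding is set to for file contents.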
To properly influence the encoding used to initialize the JVM you have to set environment variables for the locale. In the Linux/Unix world this can be done with the LC_ALL environment variable, which can be set to something like en_US.UTF-8.
The AS400 isn’t a *nix platform, so LC_ALL does not apply, and its JVM is an IBM platform-specific implementation. By looking at the environment variables under option 45 in WRKJOB, and with some trial and error, I found that setting two environment variables did the trick.
Using ADDENVVAR, add the following two environment variables:
QIBM_PASE_CCSID = 1208
PASE_LANG = EN_US
You can try different combinations of CCSID and locale depending on your desired character encoding. I set this up so we would be using UTF-8.
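A quick way to confirm the variables actually reached the JVM is to read them back at startup. This is a rough sketch that assumes job-level environment variables are visible through System.getenv(); the class and method names are hypothetical, and the expected values are simply the two used in this post.

```java
import java.util.Map;

// Hypothetical helper: reports the two PASE variables as seen by the JVM.
public class PaseEnvCheck {

    // Builds a one-line summary from the given environment map.
    static String describe(Map<String, String> env) {
        String ccsid = env.get("QIBM_PASE_CCSID");
        String lang = env.get("PASE_LANG");
        // Values taken from this post: CCSID 1208 with locale EN_US.
        boolean matches = "1208".equals(ccsid) && "EN_US".equals(lang);
        return "QIBM_PASE_CCSID=" + ccsid + ", PASE_LANG=" + lang
                + (matches ? " (matches the UTF-8 setup)" : " (differs from the UTF-8 setup)");
    }

    public static void main(String[] args) {
        System.out.println(describe(System.getenv()));
    }
}
```

Taking the map as a parameter rather than calling System.getenv() inside the method keeps the check easy to try out with other CCSID and locale combinations.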
A list of locales can be found here. Note that locale names are case-sensitive: en_US is either ISO8859-1 or ISO8859-15, whereas EN_US is UTF-8.