Java FileWriter, XML and UTF-8

Oddly enough, the java.io.FileWriter class doesn’t use UTF-8 by default. I’m not exactly sure what the default encoding is (possibly ISO-8859-1 or US-ASCII?), but it doesn’t seem to be UTF-8, which is odd given that Java strings are supposed to be Unicode. This causes a problem if you want to write non-ASCII characters and you don’t realise what’s happening. It caused a bug in SQLEditor: somebody accidentally typed an umlaut into one of the fields and the file wouldn’t reload (which was annoying).

The correct thing to do seems to be to use the following:

OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(path),"UTF-8");

This ensures that the file is written as UTF-8.
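As a fuller illustration, here is a minimal sketch of a round trip: it writes a small XML snippet containing an umlaut through an OutputStreamWriter with an explicit "UTF-8" encoding and reads it back the same way. The class name Utf8RoundTrip, the helper methods and the temporary file are made up for the example and are not taken from SQLEditor.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

class Utf8RoundTrip {

    // Writes the text as UTF-8, whatever the platform default encoding is.
    static void writeUtf8(File file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(new FileOutputStream(file), "UTF-8")) {
            out.write(text);
        }
    }

    // Reads the whole file back, decoding the bytes as UTF-8.
    static String readUtf8(File file) throws IOException {
        try (Reader in = new InputStreamReader(new FileInputStream(file), "UTF-8")) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            int count;
            while ((count = in.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
            return text.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temporary file, used only for this demonstration.
        File file = File.createTempFile("utf8-demo", ".xml");
        file.deleteOnExit();

        // \u00fc is the umlaut in "wuenscht"; the escape keeps this source file pure ASCII.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<name>w\u00fcnscht</name>\n";

        writeUtf8(file, xml);
        System.out.println(xml.equals(readUtf8(file))); // prints "true"
    }
}
```

The important part is that the writer and the reader name the same encoding; if either side falls back to the platform default, the umlaut may not survive the trip.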

I suppose the motivation is that simple use of FileWriter stays compatible with applications that are not Unicode-aware and don’t support UTF-8. It probably makes sense at some level, but it just goes to show that you can’t assume anything. 🙂

Update: Bela’s comment (below) explains more about which character set you’ll actually get.
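
To see which character set a plain FileWriter would pick on a particular machine, something like the following sketch can be used. The class name DefaultEncodingCheck and the temporary file are made up for the example; getEncoding(), Charset.defaultCharset() and the file.encoding system property are standard Java.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.Charset;

class DefaultEncodingCheck {

    // Returns the encoding name that a FileWriter created without an explicit charset reports.
    static String fileWriterEncoding() throws IOException {
        // Hypothetical temporary file, used only so that a FileWriter can be opened.
        File tmp = File.createTempFile("encoding-check", ".txt");
        tmp.deleteOnExit();
        try (FileWriter writer = new FileWriter(tmp)) {
            // Must be called while the writer is still open; it returns null after close().
            return writer.getEncoding();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("file.encoding property: " + System.getProperty("file.encoding"));
        System.out.println("Default charset:        " + Charset.defaultCharset());
        System.out.println("FileWriter encoding:    " + fileWriterEncoding());
    }
}
```

The output depends on the machine and the Java version, so none is shown here. getEncoding() may report a historical name such as Cp1252 or UTF8 rather than the canonical one. As far as I know, Java 18 and later default to UTF-8, so the surprise described above mostly affects older runtimes.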

Comments

43 responses to “Java FileWriter, XML and UTF-8”

  1. Florian

    That’s exactly the line of code I needed. You are currently no. 2 in a Google search for “FileWriter UTF-8” 😉

    The Java input/output API is sooo unintuitive, and when someone actually wrote an easy-to-use FileWriter class he forgot to implement a setEncoding(…).

  2. oh

    Thanks!

  3. Severine

    And no. 1 with “java FileWriter UTF-8”
    Danke schön !

  4. laurent

    Thank You !

    Your answer is so accurate for my
    “FileWriter UTF8” Google search !!!

    That’s exactly the line of code I needed too !

  5. Edge

    Thank you!
    That’s what I need.

  6. sinka

    Thank you!! Gracias!

  7. Nabil

    chukran! (thx in Arabic)

  8. Shachar

    Toda (thx in Hebrew)

  9. simon

    4 years later and your code is still helping people. Many thanks my friend!!

  10. Damian Mora

    Excellent, just the code line I was looking for. Muchas Gracias. 🙂

  11. kann

    Thanks, nice work ^^

  12. Bela

    Köszönöm (thx in Hungarian)

    After reading this article I read the Java API documentation carefully, and the answer is there: (http://java.sun.com/javase/6/docs/api/java/io/FileWriter.html)

    “Convenience class for writing character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are acceptable. To specify these values yourself, construct an OutputStreamWriter on a FileOutputStream.”

    You get the default character encoding of your system:
    System.getProperty("file.encoding") => on my machine it is cp1252

    So, never use FileWriter! It is anything but convenient.

    1. Angus Hardie

      A most illuminating explanation, thank you!

  13. Gowmukhi

    Awesome !!!

    Thanks

  14. Vijay

    Thank you! Dude.. for those who are struggling with Castor UTF-8 conversion, this is a very helpful piece of code…

  15. Nitish

    Your answer gave me exactly the answer to my question. I used the same concept with UTF-16 encoding for my encryption/decryption project, and it worked. But I still have a problem: during decryption it saves the file with [] blocks every time, although it reads it correctly. I checked, and it is reading the UTF-16 code. I would like to chat with you about the problem any time you like.

  16. Senny

    I was using FileWriter and was having problems with the copyright symbol, which left invalid characters in my XML. Your line of code gave me exactly what I was looking for….

  17. Vaibhav

    Thanks a lot!! Finally I got what I was looking for 🙂

  18. Vishal

    This post is really great!

  19. Simon

    Very nice! Exactly what I’m looking for.

  20. Konstantin Petrukhnov

    first result from Google:
    “java xml output utf-8”

  21. Thiago

    Thanks!!!!!

  22. Sebastien

    Thanks a lot for posting this… even many months later, it still helps some people! 🙂

  23. Marcelo

    Thanks !! Gracias !!! It’s Nov-2009 and this code keeps helping people ! =)

  24. Paw Hermansen

    You’re still the number one hit on Google for “java filewriter for utf-8”. Your code is exactly what I need. Thank You.

  25. agustin

    Just fine…

    tks

    From Chile.

  26. Iulian

    A big thank you from Romania !

  27. Menio

    …and from the Netherlands too!

  28. Y

    Thank you a lot, Java sux in default…

  29. milan

    dakujem (in Slovak) 🙂

  30. arny

    great & thanks much.
    just to add:
    looking at:
    http://java.sun.com/javase/6/docs/api/java/io/FileWriter.html
    made me add it like this:
    java:
    Writer out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(this.outputFilename), "UTF-8"));

    (I guess that’s what they call the “decorator pattern” in for example:
    http://oreilly.com/catalog/9780596007126
    )
    HTH

  31. Abraham

    Thanks man! I’m starting to use JDOM to create XML files, and this post was what I was looking for 😉 GBY

  32. Russ

    Yes, now July 29, 2010 and this post is still a lifesaver! I didn’t suspect that this class was the source of my problems, now solved.

  33. Esteban

    thànks mán!!!!!!!!!

  34. Marco

    Love you, man! It solved my problem! =D

  35. milkywayfarer

    Moreover, today is the 25th of December and the post is still relevant!
    Thx from cold Russia (:

  36. mohan verma

    Thanks a lot my friend!!!!
    I found what I was looking for!!!

  37. Ivan

    Thank you!!!!!!!!!!!!!!!!!!!!!!!!!!!)

  38. Jomo Frodo

    Beautiful – thanks!

  39. nick

    I found a similar problem in March 2008 when reading UTF-8 encoded files in. I wrote it up here:

    http://footech.blogspot.com/search/label/UTF8

  40. EPO

    Before
    new FileWriter( ….
    output
    wÃŒnscht

    After
    new OutputStreamWriter(…. ,"UTF-8")
    output
    wĂĽnscht

    expected
    wünscht

    shit went in the second round ….

  41. Anh

    Cảm ơn bạn!!!

  42. fereshteh

    Sepaas (Thanks in Persian!) ^_^