Ropardo Software development company

Experience software development with ROPARDO S.R.L.

Encoding in Python

Several issues came up when I dealt (or failed to deal) with encoding in Python scripts. As background: Python has two types of string, Unicode strings and byte strings. Python often lets you mix the two as if they were the same thing, but you have to be careful in your code and know at each point whether you are holding a Unicode object or a byte string.
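A minimal sketch of the two string types, written so that it also runs on Python 3, where the text type is called str and the byte type is called bytes (in Python 2 the same pair is unicode and str). The sample word is only an illustration.

```python
# Text string: a sequence of Unicode characters.
text = u"Gr\u00fc\u00dfe"   # "Grüße"

# Byte strings: the same text after encoding it with a specific codec.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("iso-8859-1")

print(len(text))       # number of characters
print(len(as_utf8))    # number of bytes in UTF-8
print(len(as_latin1))  # number of bytes in ISO-8859-1

# Decoding with the codec that produced the bytes restores the text.
back_from_utf8 = as_utf8.decode("utf-8")
back_from_latin1 = as_latin1.decode("iso-8859-1")
```

The character count and the byte count differ as soon as a character needs more than one byte in the chosen encoding, which is why the two types should not be treated as interchangeable.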
If a source file does not declare an encoding, Python assumes ASCII as the default.

Examples of encoding declarations:

#!/usr/bin/python
# -*- coding: latin-1 -*-
...
#!/usr/bin/python
# -*- coding: iso-8859-15 -*-

PEP 263 states that a complete Python source file should use a single encoding: "Embedding of differently encoded data is not allowed and will result in a decoding error during compilation of the Python source code." http://www.python.org/dev/peps/

The following examples cover small blocks of code that were nevertheless tricky for me at the time.

Case 1:

If you want to write German characters (ö, Ä, é, ß, etc.) directly in a Python script, you need to declare the encoding on the first line of the .py file (or on the second line, if the first one is the #! line):

# -*- coding: ISO-8859-1 -*-

This declares the 'latin-1' encoding. It allows you to write special characters that the Python interpreter recognizes when it compiles the script. The file itself must also be saved in that same encoding.
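To see how the declaration is picked up, here is a small sketch for Python 3, where the standard tokenize module exposes the detection step as detect_encoding(). The source text below is a made-up two-line script, not a real file.

```python
import io
import tokenize

# Hypothetical script content, stored as ISO-8859-1 bytes.
source = (
    u"# -*- coding: ISO-8859-1 -*-\n"
    u"greeting = u'Gr\u00fc\u00dfe'\n"
).encode("iso-8859-1")

# detect_encoding() reads at most the first two lines, looking for the declaration.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)

# Decoding the bytes with the declared encoding recovers the German characters.
decoded = source.decode(encoding)
```

Without the declaration on the first or second line, the same bytes would be read with the default encoding and the special characters would not be decoded as intended.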

Case 2:

If you want to read files whose contents include German characters:

import codecs
fileObj = codecs.open( "filename", "r", encoding="utf-8" )
fileContents = fileObj.read()

When you read a line from a file opened with the plain open() function you get bytes, not characters; codecs.open() decodes those bytes for you using the encoding you pass in. Text files always contain encoded text, not characters: each character of the text is stored in the file as one or more bytes. The most commonly used encodings, UTF-8 and ISO-8859-1, are supersets of ASCII. This means that the first 128 characters have their usual ASCII meaning, and the remaining characters are specific to each encoding.
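The difference between bytes and characters can be sketched as follows. This uses io.open(), which takes an encoding argument in the same way as codecs.open() and also works on Python 3; the temporary file and the sample word are assumptions made for the illustration.

```python
import io
import os
import tempfile

text = u"K\u00f6ln"   # "Köln", sample text for illustration

# Write the text to a temporary file as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Binary mode returns the raw encoded bytes.
with io.open(path, "rb") as f:
    raw = f.read()

# Text mode with an explicit encoding returns decoded characters.
with io.open(path, "r", encoding="utf-8") as f:
    decoded = f.read()

print(len(raw), len(decoded))

# ASCII characters encode to the same byte in both encodings;
# characters outside ASCII do not.
same_for_ascii = u"K".encode("utf-8") == u"K".encode("iso-8859-1")
differs_for_umlaut = u"\u00f6".encode("utf-8") != u"\u00f6".encode("iso-8859-1")
```

The file holds one byte more than the text has characters, because ö takes two bytes in UTF-8.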

Case 3:

If this error occurs:

"UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)"

then Python has a problem with the way the encoding was handled: it tried to convert a Unicode string to bytes with the ASCII codec. My first and best solution for this error was Django's smart_str() function, which I found out about here: http://www.b-list.org/weblog/2007/nov/10/unicode/

from django.utils.encoding import smart_str, smart_unicode
...
text = u'\xa1'
print smart_str(text)

Result: ¡

This is a very basic example. In my case the problem appeared when reading a database table column that contained German characters. When I displayed the text, the characters were not the correct German ones. The problem was fixed simply by using

smart_str(text)
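The error itself can be reproduced without Django. The sketch below encodes explicitly with the ASCII codec and then with UTF-8. As far as I know, the Django versions of that time also encoded to UTF-8 by default in smart_str(), but treat that as an assumption. The code also runs on Python 3, where the message shows '\xa1' without the u prefix.

```python
text = u"\xa1"   # inverted exclamation mark

# Encoding with the ASCII codec fails: the character is outside range(128).
try:
    text.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)

# Choosing an encoding that can represent the character avoids the error.
encoded = text.encode("utf-8")
```

The fix in every such case is the same: decide which encoding the output needs and encode to it explicitly, rather than letting Python fall back to ASCII.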

Encoding/Decoding:
To decode a byte string into a Unicode string, use the decode() method:

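typeOfEncoding = "iso-8859-1"
fileObj = open("filename", "rb")  # binary mode: read raw bytes
fileContents = fileObj.readline()
# Convert the bytes into a Unicode string
text = fileContents.decode(typeOfEncoding)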
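Decoding only gives the right characters if the encoding matches the one the bytes were written with. A short sketch, with a made-up sample word, of what happens when it does not:

```python
# Bytes as they might arrive from a file or a database, encoded as UTF-8.
raw = u"sch\u00f6n".encode("utf-8")   # "schön"

right = raw.decode("utf-8")
wrong = raw.decode("iso-8859-1")   # no error, but the wrong characters

print(repr(right))
print(repr(wrong))
```

ISO-8859-1 maps every byte to some character, so a wrong guess does not raise an exception; it silently produces two characters where there should be one. Wrongly displayed German characters are a typical symptom of this kind of mismatch.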

The encode() method converts a Unicode string into an encoded byte string:

output = text.encode(typeOfEncoding)

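If the string contains characters that cannot be represented in the given encoding, Python raises a UnicodeEncodeError. You can change this behaviour by passing an error handler as the second argument to encode(); "replace" substitutes a question mark for each such character: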

output = text.encode(typeOfEncoding, "replace")
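For comparison, here is a sketch of a few of the standard error handlers applied to a string that ISO-8859-1 cannot fully represent. The sample text is an assumption chosen because the euro sign is missing from that encoding.

```python
text = u"Stra\u00dfe 5 \u20ac"   # "Straße 5 €"; the euro sign is not in ISO-8859-1

# The default handler is "strict", which raises an exception.
try:
    text.encode("iso-8859-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

replaced = text.encode("iso-8859-1", "replace")            # '?' is substituted
ignored = text.encode("iso-8859-1", "ignore")              # the character is dropped
escaped = text.encode("iso-8859-1", "xmlcharrefreplace")   # numeric character reference

print(replaced)
print(ignored)
print(escaped)
```

Note that "replace" and "ignore" lose information, so they are better suited to display or logging than to data that has to be read back later.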

Encoding in Python can be a bit unclear at first glance, but once you know these basics and write your scripts with the encoding in mind, such issues become much less frequent.

Tags: ASCII decode Encoding PEP Python unicode UTF-8

 Posted in: Web Development
August 21, 2009 | Gabriela Radu | No Comments

© 2010 ROPARDO s.r.l..
