Several issues come up when dealing (or not dealing) with encoding in Python scripts. As background, Python has two types of string: Unicode strings and byte strings. In theory Python lets you treat Unicode strings much like byte strings, but you have to be careful in your code and know at each point whether you are holding a Unicode object or an encoded byte string.
By default, Python 2 assumes ASCII as the source file encoding if no other encoding is declared.
Examples:
    #!/usr/bin/python
    # -*- coding: latin-1 -*-
    ...
    #!/usr/bin/python
    # -*- coding: iso-8859-15 -*-
PEP 263 states that the complete Python source file should use a single encoding: “Embedding of differently encoded data is not allowed and will result in a decoding error during compilation of the Python source code.” http://www.python.org/dev/peps/
The following examples cover small blocks of code that were tricky for me at the time.
Case 1:
If you want to write German characters (ö, Ä, é, ß, etc.) directly in your Python script, you need to declare the encoding on the first or second line of the .py file:
    # -*- coding: ISO-8859-1 -*-
This selects the ‘latin-1’ encoding and lets the Python interpreter recognize the special characters when it compiles the script. The file itself must actually be saved in that encoding.
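To see what the declaration changes, here is a small sketch that compiles source code held as raw bytes. It assumes Python 3, where compile() accepts a bytes object and applies the coding line itself; the variable names are made up for illustration.

```python
# Source code as raw bytes, the way it would sit on disk in an
# ISO-8859-1 encoded .py file. 0xF6 is the single latin-1 byte for "ö".
source_bytes = b"# -*- coding: iso-8859-1 -*-\ngreeting = u'sch\xf6n'\n"

namespace = {}
# compile() reads the coding line and decodes the rest of the source with it.
exec(compile(source_bytes, "<demo>", "exec"), namespace)

greeting = namespace["greeting"]
print(repr(greeting))
```

The byte 0xF6 ends up as the single character "ö" because the declaration told the compiler how to decode it.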
Case 2:
If you want to read files that have German characters in their contents:
    import codecs
    fileObj = codecs.open( "filename", "r", encoding="utf-8" )
    fileContents = fileObj.read()
When you read a line from a file opened with the plain open(), you get bytes, not characters; codecs.open() with an encoding argument, as above, decodes those bytes into Unicode for you. Text files always contain encoded text, not characters: each character is stored as one or more bytes. The most common encodings, UTF-8 and ISO-8859-1, are supersets of ASCII. This means that the first 128 characters have their usual ASCII meaning, while all other characters are represented in a way specific to each encoding.
Case 3:
If this error occurs:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)
then Python has a problem with the way the encoding was handled: a Unicode string was implicitly converted with the ASCII codec. My first and best solution for this error was Django’s smart_str() function, which I found out about here: http://www.b-list.org/weblog/2007/nov/10/unicode/
    from django.utils.encoding import smart_str, smart_unicode
    ...
    text = u'\xa1'
    print smart_str(text)
Result: ¡
This is a very basic example. In my case the problem appeared when reading a database column that contained German characters. When I wanted to display the text, the characters shown were not the correct German ones. This was fixed by simply using
smart_str(text)
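Without Django, the same idea can be sketched by encoding explicitly. The helper to_utf8_bytes below is hypothetical and is not Django's implementation; it only assumes that the desired output is UTF-8, and it is written for Python 3.

```python
text = u"\xa1"  # INVERTED EXCLAMATION MARK, outside the ASCII range

# Encoding with ASCII fails, which is what Python 2 attempts implicitly
# when a Unicode string meets a byte-oriented API.
try:
    text.encode("ascii")
    error_name = None
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__


def to_utf8_bytes(value):
    """Hypothetical helper: return UTF-8 bytes for text, pass bytes through."""
    if isinstance(value, bytes):
        return value
    return value.encode("utf-8")


encoded = to_utf8_bytes(text)
print(error_name, repr(encoded))
```

Naming the target encoding explicitly avoids the implicit ASCII conversion that triggers the error.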
Encoding/Decoding:
To decode a byte string into Unicode, use the decode() method:
    typeOfEncoding = "iso-8859-1"
    # fileObj was opened with the plain open(), so readline() returns bytes
    fileContents = fileObj.readline()
    text = fileContents.decode(typeOfEncoding)
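Picking the wrong encoding for decode() can go wrong in two different ways, as this small sketch shows. It is written for Python 3 and the sample word is only an illustration.

```python
raw = u"sch\xf6n".encode("utf-8")  # b'sch\xc3\xb6n', as read from a UTF-8 file

right = raw.decode("utf-8")        # correct: "schön"
wrong = raw.decode("iso-8859-1")   # no error, but garbled: "schÃ¶n"

# The reverse mistake fails loudly: a lone latin-1 byte is not valid UTF-8.
latin1_raw = u"sch\xf6n".encode("iso-8859-1")
try:
    latin1_raw.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

print(repr(right), repr(wrong), failed)
```

Decoding UTF-8 bytes as ISO-8859-1 never raises, because every byte is a valid latin-1 character, so this kind of mistake only shows up as wrong characters on screen.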
The encode() method converts from Unicode to an encoded byte string:
    encoded = text.encode(typeOfEncoding)
If the string contains characters that cannot be represented in the given encoding, Python raises a UnicodeEncodeError. You can change this by passing a second argument to encode(); with "replace", each such character is replaced by a question mark:
    output = text.encode(typeOfEncoding, "replace")
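As a rough comparison of the error handlers, the sketch below encodes a string containing the euro sign, which ISO-8859-1 cannot represent. It is written for Python 3; "ignore" and "xmlcharrefreplace" are other standard handlers, shown here alongside "replace".

```python
text = u"Stra\xdfe \u20ac5"  # "Straße €5"; the euro sign is not in ISO-8859-1

try:
    text.encode("iso-8859-1")  # default "strict" handler
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

replaced = text.encode("iso-8859-1", "replace")           # b'Stra\xdfe ?5'
ignored = text.encode("iso-8859-1", "ignore")             # b'Stra\xdfe 5'
as_refs = text.encode("iso-8859-1", "xmlcharrefreplace")  # b'Stra\xdfe &#8364;5'

print(strict_failed, replaced, ignored, as_refs)
```

All three handlers lose or alter information, so they fit output that must not fail (logs, display) better than data that will be read back later.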
Encoding in Python can be a bit unclear at first glance, but knowing these basics and keeping the encoding in mind when writing a script keeps such issues to a minimum.