UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1

Question

Markum

I'm having trouble encoding a string to UTF-8. I've tried several things, including string.encode('utf-8') and unicode(string), but I always get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1: ordinal not in range(128)

This is my string:

(｡･ω･｡)ﾉ

I don't know what I'm doing wrong. Any ideas?

EDIT: The problem is that the string does not display properly when I print it as it is. I also get this error when I try to convert it:

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)

Rubens Mariuzzo

Question edited 4 Dec 2014 at 4:34

Programming

python

unicode

utf-8

Answers

mata

12 May 2012 at 7:53

You can try:

string.decode('utf-8')  # or:
unicode(string, 'utf-8')

Edit:

'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'.decode('utf-8') gives u'(\uff61\uff65\u03c9\uff65\uff61)\uff89', which is correct.

The problem probably occurs when you do something with the string that triggers an implicit conversion (for example, printing it or writing it to a stream).

I can't say more without seeing some code.
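To illustrate the implicit conversion mentioned above, here is a minimal sketch. It assumes that calling .encode('utf-8') on a Python 2 byte string first decodes it with the ASCII codec, which is where the error in the title would come from. The helper name implicit_ascii_decode is hypothetical and only makes that hidden step explicit; the snippet is written to run on Python 2.7 and Python 3.

```python
# The UTF-8 bytes from the question, written as a bytes literal.
raw = b'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'


def implicit_ascii_decode(data):
    """Hypothetical helper: perform the ASCII decode explicitly.

    Returns the decoded text, or the error message if decoding fails.
    """
    try:
        return data.decode('ascii')
    except UnicodeDecodeError as exc:
        return str(exc)


# Decoding with the wrong codec (ASCII) fails on the first non-ASCII byte.
message = implicit_ascii_decode(raw)
print(message)

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode('utf-8')
print(repr(text))
```

If this assumption holds, the fix is to decode the bytes with the right codec once, and to encode only text that has already been decoded.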

mata

Answer edited 12 May 2012 at 8:08

wim

12 May 2012 at 8:08

Your string appears to be encoded in UTF-8, so what exactly is the problem? Or what are you trying to do here?

Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
(｡･ω･｡)ﾉ
>>> s2 = u'(｡･ω･｡)ﾉ'
>>> s2 == s1
True
>>> s2
u'(\uff61\uff65\u03c9\uff65\uff61)\uff89'

Nick Craig-Wood · Accepted Answer · 2012-05-12T11:05:58+00:00

This happens because your terminal's encoding is not set to UTF-8. Here is my terminal:

$ echo $LANG
en_GB.UTF-8
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
(｡･ω･｡)ﾉ
>>>

On my terminal the example works as shown above, but it stops working if I remove the LANG setting:

$ unset LANG
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
>>>

To make this change permanent, consult the documentation for your Linux distribution.
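To check which encoding Python has picked up from the terminal, and to see why an ASCII setting fails for this string, a small sketch along these lines may help. The helper encode_for_output is hypothetical, and the fallback to 'ascii' when no encoding is reported is an assumption that mirrors the Python 2 behaviour shown above.

```python
import sys

# The decoded string from the question.
text = u'(\uff61\uff65\u03c9\uff65\uff61)\uff89'


def encode_for_output(text, encoding):
    """Hypothetical helper: encode text as an output stream would.

    Returns (encoded bytes, None) on success or (None, error message).
    """
    try:
        return text.encode(encoding), None
    except UnicodeEncodeError as exc:
        return None, str(exc)


# Encoding reported for stdout; assumed to fall back to ASCII when unset.
stdout_encoding = getattr(sys.stdout, 'encoding', None) or 'ascii'
print(stdout_encoding)

# With ASCII, the non-ASCII characters cannot be encoded.
ascii_bytes, ascii_error = encode_for_output(text, 'ascii')
print(ascii_error)

# With UTF-8, the same string encodes without error.
utf8_bytes, utf8_error = encode_for_output(text, 'utf-8')
print(repr(utf8_bytes))
```

If the first line printed is not a UTF-8 encoding, the terminal locale is the likely culprit rather than the string itself.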