Skip to content

Unicode issues#252

Merged
takluyver merged 15 commits into
ipython:masterfrom
takluyver:unicode-issues
Mar 24, 2011
Merged

Unicode issues#252
takluyver merged 15 commits into
ipython:masterfrom
takluyver:unicode-issues

Conversation

@takluyver

Copy link
Copy Markdown
Member

This branch fixes two particularly nasty unicode bugs:

  1. Entering code with non-ascii characters will fail immediately with a UnicodeEncodeError. This was due to code in compilerop attempting to do MD5 hashing without first encoding.
  2. Starting IPython when there are non-ascii characters in the .ipython/history.json file will fail. This was due to readline.add_history, which expects text to be encoded.

I've started creating a test suite for unicode related issues. Unfortunately, this has to be run separately from the main test suite, as something in the test suite's initialisation causes it to change the default encoding from ascii to utf-8, in which situation neither of the problems above occur. This is, for now, in tools/unicode_tests.py.

@rkern

rkern commented Jan 27, 2011

Copy link
Copy Markdown
Contributor

I'm not able to see a problem with the default encoding in the test suite. Certainly, there isn't any code in IPython or nose that calls sys.setdefaultencoding(). Do you have any nose plugins installed? It's possible that a poorly behaved plugin is doing that. The other changes seem fine.

@takluyver

Copy link
Copy Markdown
Member Author

I shouldn't have any nose plugins installed - I only installed nose to test IPython. I didn't try to track down where it got changed, but when I tried to add the tests to the main test suite, I couldn't get them to fail, and when I tried sticking a sys.getdefaultencoding() in there, it told me it was utf-8. I've just installed nose 1.0.0 in the virtualenv I'm using, and it's still doing the same.

@rkern

rkern commented Jan 27, 2011

Copy link
Copy Markdown
Contributor

Some other packages (e.g. Pylons) will also provide nose plugins for their own use. But if you just have IPython and nose installed in the virtualenv, then I'm not sure what the issue would be.

@takluyver

Copy link
Copy Markdown
Member Author

Sorting out the test suite revealed a unicode related test failure in inputsplitter, which has also been resolved.

@takluyver

Copy link
Copy Markdown
Member Author

Interestingly, somewhere along the line, the annoying bug where unicode literals turn into UTF-8 has been resolved.

@takluyver

Copy link
Copy Markdown
Member Author

Rebased on trunk - I needed to check it against newer code.

@rkern

rkern commented Mar 2, 2011

Copy link
Copy Markdown
Contributor

Looks good to me. Recommend merge.

@takluyver

Copy link
Copy Markdown
Member Author

Trying to resolve some test failures on Windows.

@takluyver

Copy link
Copy Markdown
Member Author

N.B. I suggest that this waits until my sqlite history branch is merged. It will need a couple of changes following the new history system.

@fperez

fperez commented Mar 13, 2011

Copy link
Copy Markdown
Member

OK for waiting post sqlite history merger. I just wanted to thank you for tackling these unicode issues!

One thing to note: we may need to make history loading more robust against encoding issues. Just in testing this I input some unicode, then went back to running on master, but since the unicode had gone into my history file on disk, now I can't start ipython because it crashes on initialization:

amirbar[~]> ip
Error in sys.excepthook:
Traceback (most recent call last):
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/crashhandler.py", line 137, in __call__
    traceback = TBhandler.text(etype,evalue,etb,context=31)
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/ultratb.py", line 397, in text
    tb_offset, context)
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/ultratb.py", line 958, in structured_traceback
    ipinst = ipapi.get()
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/ipapi.py", line 29, in get
    return InteractiveShell.instance()
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/interactiveshell.py", line 315, in instance
    inst = cls(*args, **kwargs)
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/interactiveshell.py", line 279, in __init__
    self.init_readline()
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/interactiveshell.py", line 1565, in init_readline
    self.reload_history() 
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/interactiveshell.py", line 1259, in reload_history
    self.history_manager.reload_history()
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/history.py", line 174, in reload_history
    self.populate_readline_history()
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/history.py", line 146, in populate_readline_history
    self.shell.readline.add_history(line)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 6: ordinal not in range(128)

Original exception was:
Traceback (most recent call last):
  File "/home/fperez/usr/bin/ipython", line 10, in 
    launch_new_instance()
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/frontend/terminal/ipapp.py", line 660, in launch_new_instance
    app.start()
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/application.py", line 210, in start
    self.initialize()
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/application.py", line 202, in initialize
    self.construct()
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/frontend/terminal/ipapp.py", line 478, in construct
    self.shell = TerminalInteractiveShell.instance(config=self.master_config)
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/interactiveshell.py", line 315, in instance
    inst = cls(*args, **kwargs)
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/frontend/terminal/interactiveshell.py", line 87, in __init__
    user_global_ns=user_global_ns, custom_exceptions=custom_exceptions
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/interactiveshell.py", line 279, in __init__
    self.init_readline()
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/interactiveshell.py", line 1565, in init_readline
    self.reload_history() 
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/interactiveshell.py", line 1259, in reload_history
    self.history_manager.reload_history()
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/history.py", line 174, in reload_history
    self.populate_readline_history()
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/core/history.py", line 146, in populate_readline_history
    self.shell.readline.add_history(line)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 6: ordinal not in range(128)

Now, in this case I know perfectly what's going on (the product of going back and forth between branches with different unicode behavior) and I also know how to get out of it. But a normal user may get seriously frustrated by a failure of this kind.

So in the history branch, we probably want to make sure that in the face of catastrophic errors with history loading, we simply clean things up, start a new history store (db in the new code) and get up and running (informing the user of what happened) rather than blocking out startup.

This is just a comment to keep in mind as we rework the history stuff... As far as this branch is concerned, all I can say is thank you for tackling this unicode mess. In my mind, this was one of the major, if not the key, blocker for getting 0.11 out the door.

Awesome work, as always.

@takluyver

Copy link
Copy Markdown
Member Author

You're welcome. And yes, some error handling code for loading history into readline is not a bad idea, although hopefully once we've got all our ducks in order, it won't actually be needed.

@minrk

minrk commented Mar 22, 2011

Copy link
Copy Markdown
Member

Merging this will also close #88

Comment thread IPython/core/history.py Outdated

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detect encoding outside loop

@fperez

fperez commented Mar 24, 2011

Copy link
Copy Markdown
Member

Done with review during sage days 29 sprint, ready for merge after minor fixes discussed in review.

Will be merged after rebase on top of the sqlite branch (once that's merged), since this depends on that.

Thanks again for the great work!

@takluyver takluyver merged commit 528125c into ipython:master Mar 24, 2011
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants