Saturday, June 1, 2013

Iterating in Python

I was looking into writing a function that would convert a dictionary filled with unicode strings into a dictionary filled with plain, non-unicode str strings (I'll loosely call them ANSI strings). I'm using Python 2.7.4. I found an answer in this Stack Overflow post: http://stackoverflow.com/questions/1254454/fastest-way-to-convert-a-dicts-keys-values-from-unicode-to-str, but the given solution didn't quite work for me.
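For a single value, the conversion itself is just a call to encode('utf-8'). Here is a small illustration using a sample word of my own choosing: for non-ASCII text, the UTF-8 byte string is longer than the unicode string, because some characters need more than one byte. The snippet runs on Python 2.7, where the result is a str, and on Python 3, where it is bytes.

```python
text = u'na\xefve'              # u'naïve': 5 characters
encoded = text.encode('utf-8')  # byte string: str on Python 2, bytes on Python 3

assert len(text) == 5
assert len(encoded) == 6        # u'\xef' becomes the two bytes '\xc3\xaf'
assert encoded == b'na\xc3\xafve'
assert encoded.decode('utf-8') == text  # decoding gives back the original text
```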

I discovered that Python treats strings as iterable collections:

>>> import collections
>>> isinstance(u'abc', collections.Iterable)
True

A unicode string is iterable? That makes sense, since I can loop over its characters:

>>> for c in u'abc':
...     print c
...
a
b
c

A plain (non-unicode) str is iterable too:

>>> import collections
>>> isinstance('abc', collections.Iterable)
True
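To see what happens when the string check is missing, here is a minimal sketch of a hypothetical naive_convert that only tests for Mapping and Iterable. The collections.abc import fallback and the recurses_forever helper are my own additions, so the snippet runs on both Python 2.7 and Python 3.

```python
try:
    from collections.abc import Iterable, Mapping  # Python 3.3+
except ImportError:
    from collections import Iterable, Mapping      # Python 2.7


def naive_convert(in_data):
    # Hypothetical ordering: Mapping and Iterable are checked, strings are NOT.
    if isinstance(in_data, Mapping):
        return dict((naive_convert(k), naive_convert(v)) for k, v in in_data.items())
    elif isinstance(in_data, Iterable):
        # A string lands here; each character is a one-character string,
        # which iterates to itself, so this branch is re-entered forever.
        return [naive_convert(element) for element in in_data]
    else:
        return in_data


def recurses_forever(value):
    """Return True if naive_convert(value) hits the recursion limit."""
    try:
        naive_convert(value)
    except RuntimeError:  # RecursionError on Python 3.5+ subclasses RuntimeError
        return True
    return False


print(list(u'a'))                   # [u'a'] on Python 2, ['a'] on Python 3
print(recurses_forever(u'abc'))     # True
print(recurses_forever([1, 2, 3]))  # False
```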

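This means my function has to check whether an input is a string before checking whether it is iterable. Otherwise a string would be treated as a list of characters, and because each character is itself a one-character string (which is iterable too), the recursion would never end. Here's what I got: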

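import collections

def convert(in_data):
    # Check for strings first: str and unicode are both Iterable, and
    # iterating a one-character string yields that same string forever.
    if isinstance(in_data, unicode):
        return in_data.encode('utf-8')
    elif isinstance(in_data, str):
        # Already a byte string; calling encode() on it would first decode it
        # as ASCII and fail on non-ASCII bytes.
        return in_data
    elif isinstance(in_data, collections.Mapping):
        # Convert both the keys and the values of dict-like objects.
        return {convert(key): convert(value) for key, value in in_data.iteritems()}
    elif isinstance(in_data, collections.Iterable):
        # Any other iterable (list, tuple, set, ...) comes back as a list.
        return [convert(element) for element in in_data]
    else:
        # Numbers, None and other non-iterables pass through unchanged.
        return in_data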

Now for a quick test:

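import unittest

class ConvertTest(unittest.TestCase):
    def test1(self):
        unicode_d = {u'key1': u'val1', u'sub_key': {u'name1': u'value1', u'name2': u'value2'},
                     u'list': [1, u'a', u'b', 2, [u'you', u'and', u'me', 221, 321]]}
        ansi_d = {'key1': 'val1', 'sub_key': {'name1': 'value1', 'name2': 'value2'},
                  'list': [1, 'a', 'b', 2, ['you', 'and', 'me', 221, 321]]}
        result = convert(unicode_d)
        # u'a' == 'a' is True in Python 2, so equality alone can't catch leftover unicode.
        self.assertEqual(result, ansi_d)
        self.assertTrue(all(type(key) is str for key in result))
        self.assertEqual(result, convert(ansi_d))

if __name__ == '__main__':
    unittest.main()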

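Of course, this function doesn't handle every possibility. Non-iterable values such as None and integers simply pass through unchanged, but every other iterable, including sets and tuples, comes back as a list. A tuple used as a dictionary key is even worse: it turns into a list, which can't be hashed, so building the new dictionary fails with a TypeError.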
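One possible way around the set-to-list conversion is to rebuild tuples, sets and frozensets with their own type. This is a minimal sketch of my own variation, not the Stack Overflow answer; convert_keep_types and text_type are hypothetical names, and the import fallbacks are there so it runs on both Python 2.7 and Python 3 (where it turns str into bytes).

```python
try:
    from collections.abc import Iterable, Mapping  # Python 3.3+
except ImportError:
    from collections import Iterable, Mapping      # Python 2.7

try:
    text_type = unicode   # Python 2: the type we want to encode
except NameError:
    text_type = str       # Python 3: str is the unicode type


def convert_keep_types(in_data):
    """Encode text to UTF-8 bytes, keeping tuples, sets and frozensets as such."""
    if isinstance(in_data, text_type):
        return in_data.encode('utf-8')
    if isinstance(in_data, bytes):
        return in_data    # already a byte string (bytes is str on Python 2)
    if isinstance(in_data, Mapping):
        return dict((convert_keep_types(k), convert_keep_types(v))
                    for k, v in in_data.items())
    if type(in_data) in (tuple, set, frozenset):
        # Rebuild the same container type instead of a list, so tuple keys stay hashable.
        return type(in_data)(convert_keep_types(e) for e in in_data)
    if isinstance(in_data, Iterable):
        return [convert_keep_types(e) for e in in_data]
    return in_data        # None, numbers and other non-iterables
```

I used an exact type check rather than isinstance so that tuple subclasses such as namedtuples, whose constructors take separate arguments instead of one iterable, still fall back to lists.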