TransWikia.com

Operations on nested data

Code Review Asked by kluvin on December 11, 2020

Problem description.

I have JSON which comes in bad shape:

data = [
    {"ids": [1]},
    {"ids": [3, 4]},
    {"ids": [1, 2]},
    {"ids": [4]},
    {"ids": [3]},
    {"ids": [2]},
] # LD. List of dictionaries.

I want it to get in shape, like this:

expected = [
    [{"ids": [1]}, {"ids": [4]}, {"ids": [3]}, {"ids": [2]}],  # Length = 1
    [{"ids": [3, 4]}, {"ids": [1, 2]}],                        # Length = 2
]  # LOLD. The list is now a list of lists of dictionaries.

To simplify the problem, we can remove the dictionaries of a single kv-pair, keeping in mind that we must reconstruct it later:

# in
[
    [3, 4], [1, 2],
    [1], [4], [3], [2]
]

# out
[
    [[3, 4], [1, 2]],
    [[1], [4], [3], [2]]
]

This is very easy: assemble . op . disassemble $ data, where op is the grouping step:

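from itertools import groupby

def main(ids):
    # assemble . cardinality_groups . disassemble
    return [list(x) for x in assemble(cardinality_groups(disassemble(ids)))]

def cardinality_groups(lol):
    # groupby only merges adjacent items, so sort by length first.
    return [list(group) for _, group in groupby(sorted(lol, key=len), key=len)]

def assemble(data):
    # Re-wrap every bare list in each group as {"ids": ...}.
    return [tag_datum(x) for x in data]

def tag_datum(datum):
    return [{"ids": x} for x in datum]

def disassemble(ids):
    # Strip the single-key dictionaries, keeping only the id lists.
    return [x['ids'] for x in ids]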

But I insist, it must be simpler, purer! I am not sure, though, whether Python has the amenities to make things prettier, so please suggest functionality present in other languages.

I am curious about two ways the program can expand here, and some other things:

  1. By the complexity and nesting of the data. Data can take the form of any JSON found in the wild. Here we simply descend a few levels down.
  2. By the operation performed: Here grouping by cardinality solved the issue. In another world we want no two sets to intersect. Are there more complex operations? What are they, and do they break anything?
  3. Assembly-disassembly symmetry. The two should be each other’s inverse, so can I deduce one from the other and avoid coding both? Does any language provide such tools?
  4. Beyond typing, are there any languages that support describing how data looks when it comes in, and how it will look when it comes out? Not just the top-level type, but the shape of the data the code works with at that level of abstraction. My presumption is that many programs lend themselves well to this type of reasoning.
  5. I don’t like f(g(h(x))). I like f . g . h $ x; it’s purer. It really bothers me that I can’t do something like this in Python or JavaScript, two of the most popular languages! Consequently, I frequently find myself doing either:
someValue = dostuff(someInput)
valueIsNowSlightlyChanged = doMoreStuff(someValue)
iAmLosingTrack = doStuffMoreNow(valueIsNowSlightlyChanged)
final = wtf(iAmLosingTrack)

return final

Or variations thereof. At this point I don’t feel like using either language. Doing things this way, of course, isn’t always going to be possible, but I don’t even get the opportunity. Am I confused, or do I have a point, and do you possibly have a solution to my supposed confusion?
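For the composition wish in item 5, one possible workaround in Python is a small helper built on functools.reduce. The sketch below is only an illustration: compose is a hypothetical name, not a standard-library function, and the three pipeline stages are restated from the question so the example runs on its own.

```python
from functools import reduce
from itertools import groupby


def compose(*funcs):
    """Right-to-left composition: compose(f, g, h)(x) == f(g(h(x))).

    Hypothetical helper, not part of the standard library.
    """
    return lambda x: reduce(lambda acc, f: f(acc), reversed(funcs), x)


def disassemble(ids):
    # Strip the {"ids": ...} wrapper from every item.
    return [x["ids"] for x in ids]


def cardinality_groups(lol):
    # Sort by length so groupby sees equal lengths as adjacent.
    return [list(g) for _, g in groupby(sorted(lol, key=len), key=len)]


def assemble(groups):
    # Put the {"ids": ...} wrapper back on every item of every group.
    return [[{"ids": x} for x in group] for group in groups]


# Reads in the same order as `assemble . cardinality_groups . disassemble`.
main = compose(assemble, cardinality_groups, disassemble)

data = [{"ids": [1]}, {"ids": [3, 4]}, {"ids": [1, 2]}, {"ids": [4]}]
result = main(data)
```

The intermediate names from the step-by-step version disappear, at the cost of a helper that has to be defined or imported in every project.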


This code was originally written for a question on StackOverflow. I believe that I found a neat way to do it in comparison to the rest. Regardless, I must assume that I don’t have the only good solution. Do you have an example?

Feel free to interpret my questions liberally. Apologies if some of this is beyond the scope of this forum. I appreciate any pointers to literature on these topics, as well as all your impressions.

Please also let me know if I am unclear. Thank you!

One Answer

You can avoid assemble, disassemble, and tag_datum if you use the key parameter of both itertools.groupby and sorted, and condense everything into one simple function.

Build a custom function for the key parameter

Write a function to return the length of each value in the dictionary.

val_len = lambda x: len(x['ids'])

Now pass this as the key argument to both itertools.groupby and sorted

from itertools import groupby

def transform(data):
    sorted_data = sorted(data, key=val_len)
    return [list(group) for _, group in groupby(sorted_data, key=val_len)]

transform(data) # data taken from question itself.
# [[{'ids': [1]}, {'ids': [4]}, {'ids': [3]}, {'ids': [2]}],
#  [{'ids': [3, 4]}, {'ids': [1, 2]}]]
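If sorting the whole input is a concern, the same grouping can probably be done in a single pass with a dictionary of buckets. This is a sketch of an alternative, not part of the answer above; transform_single_pass is a hypothetical name chosen for illustration.

```python
from collections import defaultdict


def transform_single_pass(data):
    """Group dicts by len(d['ids']) without sorting the input first."""
    buckets = defaultdict(list)
    for item in data:
        buckets[len(item["ids"])].append(item)
    # Sort only the distinct lengths so groups come out shortest first.
    return [buckets[length] for length in sorted(buckets)]


data = [
    {"ids": [1]}, {"ids": [3, 4]}, {"ids": [1, 2]},
    {"ids": [4]}, {"ids": [3]}, {"ids": [2]},
]
grouped = transform_single_pass(data)
```

Items keep their original relative order inside each group, which matches what the stable sort in transform produces.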

Details

sorted_data = sorted(data, key=val_len) # sorts data based on the length of
                                        # the value of 'ids'
# [{'ids': [1]}, ---|
#  {'ids': [4]},    |-- group with length 1
#  {'ids': [3]},    |
#  {'ids': [2]}, ---|

#  {'ids': [3, 4]},---|-- group with length 2
#  {'ids': [1, 2]}]---|

groupby(sorted_data, key=val_len) # groups data based on the length of 
                                  # the value of 'ids'
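The sort matters because itertools.groupby only merges items that are adjacent. A small check, restating the data from the question and writing val_len as a def, shows how the groups fragment when the sort is skipped:

```python
from itertools import groupby

data = [
    {"ids": [1]}, {"ids": [3, 4]}, {"ids": [1, 2]},
    {"ids": [4]}, {"ids": [3]}, {"ids": [2]},
]


def val_len(x):
    return len(x["ids"])


# Without sorting, length 1 appears twice because the runs are not adjacent.
unsorted_keys = [k for k, _ in groupby(data, key=val_len)]

# After sorting, each length forms exactly one run.
sorted_keys = [k for k, _ in groupby(sorted(data, key=val_len), key=val_len)]

print(unsorted_keys)  # [1, 2, 1]
print(sorted_keys)    # [1, 2]
```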

Answered by Ch3steR on December 11, 2020
