AnswerBun.com

Pyspark Transpose

Stack Overflow Asked by Sawan S on November 23, 2020

I have data in the below format with 38 measure columns for various months as shown below.

+---------+-----------------+-----------------+-----+------------------+-----------------+-----------------+-----+------------------+-----+------------------+-----+
| Cust_No | Measure1_month1 | Measure1_month2 | ... | Measure1_month72 | Measure2_month1 | Measure2_month2 | ... | Measure2_month72 | ... | Measure38_month1 | ... |
+---------+-----------------+-----------------+-----+------------------+-----------------+-----------------+-----+------------------+-----+------------------+-----+
|       1 |              10 |              20 | ... |              500 |              40 |              50 | ... |                  | ... |                  | ... |
|       2 |              20 |              40 | ... |              800 |              70 |             150 | ... |                  | ... |                  | ... |
+---------+-----------------+-----------------+-----+------------------+-----------------+-----------------+-----+------------------+-----+------------------+-----+

I want to reshape it into the format below using PySpark, with one row per customer and month:

+---------+-------+----------+----------+-----+-----------+
| CustNum | Month | Measure1 | Measure2 | ... | Measure38 |
+---------+-------+----------+----------+-----+-----------+
|       1 |     1 |       10 |       30 | ... |           |
|       1 |     2 |       20 |       40 | ... |           |
|       1 |     3 |       30 |       80 | ... |           |
|       1 |     4 |       70 |       90 | ... |           |
|       1 |     5 |       40 |      100 | ... |           |
|       . |     . |        . |        . |   . |         . |
|       . |     . |        . |        . |   . |         . |
|       1 |    72 |      700 |       50 | ... |           |
+---------+-------+----------+----------+-----+-----------+

and so on for every customer number

Can you please help me with this?

Thanks

One Answer

IIUC, you need a wide-to-long transformation, which can be done with the stack function in PySpark.
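To make the reshaping concrete, here is a minimal pure-Python sketch of what a call like stack(n, expr1, ..., exprk) does to a single wide row: it cuts the k values into n rows of equal width. The helper name stack_row is hypothetical and exists only for illustration, and the sketch assumes the number of values is an exact multiple of n.

```python
def stack_row(n, *values):
    """Split `values` into n rows of equal width, mimicking stack() on one row."""
    if n <= 0 or len(values) % n != 0:
        raise ValueError("expected a positive n that divides the number of values")
    width = len(values) // n
    return [tuple(values[r * width:(r + 1) * width]) for r in range(n)]


# One wide row for customer 1: two measures over two months (made-up sample values).
wide = {"cust": 1, "Measure1_month1": 10, "Measure1_month2": 20,
        "Measure2_month1": 40, "Measure2_month2": 50}

# Each group of three arguments (label, Measure1, Measure2) becomes one long row.
long_rows = [
    (wide["cust"],) + row
    for row in stack_row(
        2,
        "month1", wide["Measure1_month1"], wide["Measure2_month1"],
        "month2", wide["Measure1_month2"], wide["Measure2_month2"],
    )
]
print(long_rows)  # [(1, 'month1', 10, 40), (1, 'month2', 20, 50)]
```

In Spark the same grouping happens for every input row, so the argument order (month label first, then one column per measure) decides what lands in each output column.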

I created a sample DataFrame with 5 months of data:

from pyspark.sql.functions import expr

# Sample data: one row per customer, five months for each of two measures.
# `spark` is an existing SparkSession.
df = spark.createDataFrame(
    [(1, 10, 20, 30, 40, 50, 10, 20, 30, 40, 50),
     (2, 10, 20, 30, 40, 50, 10, 20, 30, 40, 50)],
    ['cust',
     'Measure1_month1', 'Measure1_month2', 'Measure1_month3', 'Measure1_month4', 'Measure1_month5',
     'Measure2_month1', 'Measure2_month2', 'Measure2_month3', 'Measure2_month4', 'Measure2_month5'])

Next, build the argument list for the stack operation. This can be done in better ways, but here is the simplest example:

# The trailing underscore keeps 'Measure1_' from also matching Measure10, Measure11, ...
Measure1 = [i for i in df.columns if i.startswith('Measure1_')]
Measure2 = [i for i in df.columns if i.startswith('Measure2_')]

# Pair up the columns that share the same month suffix.
final = []
for i in Measure1:
    for j in Measure2:
        if i.split('_')[1] == j.split('_')[1]:
            final.append((i, j))

rows = len(final)
# One (label, Measure1 column, Measure2 column) triple per output row.
values = ','.join([f"'{i.split('_')[1]}',{i},{j}" for i, j in final])

Now apply the stack operation:

df.select('cust', expr(f'stack({rows},{values})').alias('Month', 'Measure1', 'Measure2')).show()

+----+------+--------+--------+
|cust| Month|Measure1|Measure2|
+----+------+--------+--------+
|   1|month1|      10|      10|
|   1|month2|      20|      20|
|   1|month3|      30|      30|
|   1|month4|      40|      40|
|   1|month5|      50|      50|
|   2|month1|      10|      10|
|   2|month2|      20|      20|
|   2|month3|      30|      30|
|   2|month4|      40|      40|
|   2|month5|      50|      50|
+----+------+--------+--------+
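The example above covers two measures and keeps the month as a label such as month1, while the question has 38 measures and wants a numeric month. One possible way to generalize is to generate the stack expression from the column names. The sketch below is pure Python and only builds the SQL string; the helper name build_stack_expr and the regular expression are assumptions based on the column names shown in the question (Measure<k>_month<m>, optionally with an underscore before the month number).

```python
import re
from collections import defaultdict

# Assumed naming scheme: Measure<k>_month<m> or Measure<k>_month_<m>.
COLUMN_PATTERN = re.compile(r"^Measure(\d+)_month_?(\d+)$")


def build_stack_expr(columns):
    """Build a Spark SQL stack(...) expression covering every measure and month."""
    cells = defaultdict(dict)  # month number -> {measure number -> column name}
    for name in columns:
        match = COLUMN_PATTERN.match(name)
        if match:
            measure, month = int(match.group(1)), int(match.group(2))
            cells[month][measure] = name
    if not cells:
        raise ValueError("no measure columns matched the assumed pattern")

    measures = sorted({k for row in cells.values() for k in row})
    rows = []
    for month in sorted(cells):
        missing = [k for k in measures if k not in cells[month]]
        if missing:
            raise ValueError(f"month {month} lacks measures {missing}")
        # Numeric month first, then one backtick-quoted column per measure.
        rows.append(", ".join([str(month)] + [f"`{cells[month][k]}`" for k in measures]))

    aliases = ", ".join(["Month"] + [f"Measure{k}" for k in measures])
    return f"stack({len(rows)}, {', '.join(rows)}) as ({aliases})"


example_columns = ["cust", "Measure1_month1", "Measure1_month2",
                   "Measure2_month1", "Measure2_month2"]
stack_sql = build_stack_expr(example_columns)
print(stack_sql)
```

With a DataFrame whose columns follow that naming scheme, the string could then be passed along as df.selectExpr('cust', build_stack_expr(df.columns)). Note that stack expects the values in each output column to share a type, so measure columns with mixed types may need a cast first.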

Answered by Shubham Jain on November 23, 2020
