Inserting data into a Pandas DataFrame

Question

The main idea of the program is to take a DataFrame with two date columns that together define a period.
rowId   end_period  init_period
K_000000000002  2018-02-17  2018-01-23

The goal is to generate one row for each month of the period, stating what fraction of the period falls in that month:
Dias_Trasncurridos  DiaInicialTarget    InicioPeriodo   FinPeriodo  porcentaje  rowId1  
9   2018-01-01  2018-01-23  2018-02-17  0.34615384615384615 K_000000000002 
17  2018-02-01  2018-01-23  2018-02-17  0.6538461538461539  K_000000000002
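
As a quick check of the expected numbers above, the arithmetic can be sketched as follows. This assumes both the first and the last day of the period are counted (an inclusive range), which is what the sample output implies; the variable names are illustrative only.

```python
from datetime import date

inicio = date(2018, 1, 23)
fin = date(2018, 2, 17)

# Total length of the period, counting both end points
total_dias = (fin - inicio).days + 1

# Days of the period that fall in January (23rd to 31st) and in February (1st to 17th)
dias_enero = (date(2018, 1, 31) - inicio).days + 1
dias_febrero = (fin - date(2018, 2, 1)).days + 1

print(total_dias, dias_enero, dias_febrero)
print(dias_enero / total_dias, dias_febrero / total_dias)
```

The two fractions add up to 1, so each month gets its share of the whole period.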

I had no trouble writing this function, but when I try to store its results in a new DataFrame so I can keep working with them, I haven't found a way to do it without ending up with only the last row.
Here is the code I wrote; I'm open to any suggestions.
import calendar
from datetime import datetime, timedelta

def CalculoReal(InicioPeriodo,FinPeriodo,rowId1):
    
    Target=InicioPeriodo
    
    UltimoFinPeriodo=datetime(FinPeriodo.year, FinPeriodo.month,calendar.monthrange(FinPeriodo.year, FinPeriodo.month)[1])
    
    while Target <= UltimoFinPeriodo:
        
        DiaInicialTarget=datetime(Target.year, Target.month, 1)
        DiaFinalTarget=datetime(Target.year, Target.month,calendar.monthrange(Target.year, Target.month)[1])
        DIAS=max(0,abs(min(DiaFinalTarget, FinPeriodo) - max(DiaInicialTarget,InicioPeriodo)).days)+1
        
        f=FinPeriodo-InicioPeriodo
        
        s = f.days
        if s == 0 :
            s=1
        
        porce =DIAS/(s+1)
        
        Target=DiaFinalTarget + timedelta(days=1)
        
        print(DIAS,DiaInicialTarget,InicioPeriodo,FinPeriodo, porce,rowId1)

I know the values are computed correctly because of the print, but as I said, I haven't managed to store them in the DataFrame.
Here is a test DataFrame:
rowId   end_period  init_period
K_000000000002  2018-02-17  2018-01-23
K_000000000003  2018-02-17  2018-01-17
K_000000000007  2018-03-26  2018-02-23
K_000000000008  2018-03-25  2018-02-23
K_000000000009  2018-03-25  2018-02-24
K_000000000010  2018-03-20  2018-02-19
K_000000000011  2018-03-24  2018-02-17
K_000000000012  2018-04-01  2018-03-05
K_000000000013  2018-03-24  2018-02-18
K_000000000014  2018-03-25  2018-02-24
K_000000000015  2018-04-01  2018-02-28

Regards, and many thanks in advance.

Candid Moe · Answer

I modified the CalculoReal function slightly, removing the rowId1 parameter since it is not used. I also removed some unnecessary lines.
The function now returns the computed values:
import pandas as pd
import calendar
from datetime import datetime, timedelta

df = pd.DataFrame({'rowId':       ["K_000000000002", "K_000000000003", "K_000000000007", "K_000000000008", "K_000000000009"],
                   'end_period':  ["2018-02-17", "2018-02-17", "2018-03-26", "2018-03-25", "2018-03-25"],
                   'init_period': ["2018-01-23", "2018-01-17", "2018-02-23", "2018-02-23", "2018-02-24"] })

def CalculoReal(InicioPeriodo, FinPeriodo):
    Target = InicioPeriodo

    UltimoFinPeriodo = datetime(FinPeriodo.year, FinPeriodo.month,
                                calendar.monthrange(FinPeriodo.year, FinPeriodo.month)[1])

    while Target <= UltimoFinPeriodo:

        DiaInicialTarget = datetime(Target.year, Target.month, 1)
        DiaFinalTarget = datetime(Target.year, Target.month, calendar.monthrange(Target.year, Target.month)[1])
        DIAS = max(0, abs(min(DiaFinalTarget, FinPeriodo) - max(DiaInicialTarget, InicioPeriodo)).days) + 1

        f = FinPeriodo - InicioPeriodo

        s = f.days
        if s == 0:
            s = 1

        porce = DIAS / (s + 1)

        # Returns on the first pass of the loop, i.e. the values for the first month only
        return DIAS, DiaInicialTarget, InicioPeriodo, FinPeriodo, porce

With that we can build the requested DataFrame.
First we add the missing columns:
for column in  ['Dias_Transcurridos', 'DiaInicialTarget', 'InicioPeriodo', 'FinPeriodo', 'porcentaje']:
    df[column] = ''

and then we iterate over the DataFrame row by row, performing the calculation and storing the result in the corresponding row:
for row in range(len(df)):
    inicio = datetime.strptime(df.loc[row]["init_period"], "%Y-%m-%d")
    fin = datetime.strptime(df.loc[row]["end_period"], "%Y-%m-%d")
    DIAS, DiaInicialTarget, InicioPeriodo, FinPeriodo, porce = CalculoReal(inicio, fin)
    df.at[row, 'Dias_Transcurridos'] = DIAS
    df.at[row, 'DiaInicialTarget'] = DiaInicialTarget
    df.at[row, 'InicioPeriodo'] = InicioPeriodo
    df.at[row, 'FinPeriodo'] = FinPeriodo
    df.at[row, 'porcentaje'] = porce

print(df.head())

which produces:
  end_period init_period           rowId Dias_Transcurridos  
0  2018-02-17  2018-01-23  K_000000000002                  9   
1  2018-02-17  2018-01-17  K_000000000003                 15   
2  2018-03-26  2018-02-23  K_000000000007                  6   
3  2018-03-25  2018-02-23  K_000000000008                  6   
4  2018-03-25  2018-02-24  K_000000000009                  5

      DiaInicialTarget        InicioPeriodo           FinPeriodo porcentaje  
0  2018-01-01 00:00:00  2018-01-23 00:00:00  2018-02-17 00:00:00   0.346154  
1  2018-01-01 00:00:00  2018-01-17 00:00:00  2018-02-17 00:00:00    0.46875  
2  2018-02-01 00:00:00  2018-02-23 00:00:00  2018-03-26 00:00:00     0.1875  
3  2018-02-01 00:00:00  2018-02-23 00:00:00  2018-03-25 00:00:00   0.193548  
4  2018-02-01 00:00:00  2018-02-24 00:00:00  2018-03-25 00:00:00   0.166667

which looks right (these are the values returned by CalculoReal).
However, CalculoReal seems to compute the wrong value: there are 25 days between 2018-01-23 and 2018-02-17, not 9, as the function returns.
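
The 9 reported for the first row matches the January share in the question's expected output, which suggests that only the first month of each period is being kept. One possible way to keep every month is to collect one dictionary per month in a list and build the DataFrame once at the end. The following is only a sketch under some assumptions: the helper name calculo_por_mes is hypothetical, both end dates are counted as part of the period, and a single-day period is given a percentage of 1.0 (the original formula would give 0.5 in that case).

```python
import calendar
from datetime import timedelta

import pandas as pd


def calculo_por_mes(inicio, fin, row_id):
    """Return one dict per calendar month touched by the period [inicio, fin]."""
    total_dias = (fin - inicio).days + 1  # inclusive length of the period
    filas = []
    target = inicio
    while target <= fin:
        primer_dia = target.replace(day=1)
        ultimo_dia = target.replace(
            day=calendar.monthrange(target.year, target.month)[1])
        # Days of the period that fall inside the current month
        dias = (min(ultimo_dia, fin) - max(primer_dia, inicio)).days + 1
        filas.append({
            "Dias_Transcurridos": dias,
            "DiaInicialTarget": primer_dia,
            "InicioPeriodo": inicio,
            "FinPeriodo": fin,
            "porcentaje": dias / total_dias,
            "rowId1": row_id,
        })
        # Jump to the first day of the next month
        target = ultimo_dia + timedelta(days=1)
    return filas


df = pd.DataFrame({
    "rowId": ["K_000000000002", "K_000000000003"],
    "end_period": ["2018-02-17", "2018-02-17"],
    "init_period": ["2018-01-23", "2018-01-17"],
})
# Parse the date strings once instead of inside the loop
df["init_period"] = pd.to_datetime(df["init_period"])
df["end_period"] = pd.to_datetime(df["end_period"])

filas = []
for fila in df.itertuples(index=False):
    filas.extend(calculo_por_mes(fila.init_period, fila.end_period, fila.rowId))

# Build the result in one step from the accumulated rows
resultado = pd.DataFrame(filas)
print(resultado)
```

Accumulating plain dictionaries and calling pd.DataFrame once avoids overwriting the same row on each pass of the loop, which is the usual reason only the last line survives.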
