AnswerBun.com

Linear Regression not working due to wrong kind of array

Data Science Asked on September 16, 2020

I am working on a homework assignment. The task is to take this data and perform a linear regression on it.

The code is published here.

I am quite new to programming in Python and to data science. I tried transforming the data as the interpreter suggests, but it didn't work.
My first error was that a 2D array was expected but a 1D array was given. Then, following a StackOverflow answer, I put the raw array into an empty list; now the error is that a scalar array was given but a 2D array was expected.

import pandas as pd
from sklearn.preprocessing import StandardScaler

#Import
data = pd.read_csv('uscrime.txt', sep="\t")
crime = pd.concat([data], axis = 1)
print(crime)

from sklearn.linear_model import LinearRegression
regression = LinearRegression()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(crime.get("M"), crime.get("Crime"), test_size=0.2, random_state=0)

X_train_new = []
X_train_new.append(X_train.values)

y_train_new = []
y_train_new.append(y_train.values)

regression.fit(X_train_new, y_train_new)
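
A minimal sketch of what the list-wrapping step above does to the shape, using made-up numbers rather than the real dataset. Appending a 1-D array to an empty list produces one row with n columns, not n rows with one column. The names `values`, `wrapped`, and `column` are illustrative only.

```python
import numpy as np

# Made-up stand-in for X_train.values: five samples of a single feature.
values = np.array([15.1, 14.3, 14.2, 13.6, 14.1])

# What the code above does: put the whole array into an empty list.
wrapped = []
wrapped.append(values)
print(np.asarray(wrapped).shape)   # (1, 5): one "sample" with five "features"

# What a single-feature regression needs: one row per sample, one column.
column = values.reshape(-1, 1)
print(column.shape)                # (5, 1)
```

With a single feature, `reshape(-1, 1)` is probably the transformation the interpreter's message is hinting at.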

One Answer

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


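# Load the tab-separated file straight from the website
data = pd.read_csv("http://www.statsci.org/data/general/uscrime.txt", sep="\t")
# Features: every column except the target, as a 2D array (n_samples, n_features)
x = data.loc[:, data.columns != 'Crime'].to_numpy()
# Target: the 'Crime' column as a 1D array
y = np.squeeze(data.loc[:,'Crime'].to_numpy())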

regression = LinearRegression()

regression.fit(x, y)

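scikit-learn needs X as a two-dimensional array of shape (n_samples, n_features) and y as a one-dimensional array. A single column selected from a DataFrame is one-dimensional, which is what the "2D array expected" error complains about. Here I convert the DataFrame to NumPy arrays with to_numpy() (scikit-learn accepts DataFrames as well, so the conversion is optional) and make sure that the array for y has only one dimension, which I achieved via np.squeeze. Bonus: see above how you can load the CSV directly from the website.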
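
If the goal is to regress Crime on the single column M, as in the question, one possible sketch is to select the feature with double brackets so that it stays two-dimensional. The table here is made up (only the column names M and Crime come from the question), so the fitted numbers say nothing about the real data; `toy` and `model` are illustrative names.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up stand-in for the crime table; Crime = 2 * M + 1 by construction.
toy = pd.DataFrame({
    "M":     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "Crime": [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0, 19.0, 21.0],
})

# Double brackets keep X two-dimensional, shape (n_samples, 1);
# single brackets give the one-dimensional target.
X = toy[["M"]]
y = toy["Crime"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

print(X_train.shape, y_train.shape)   # (8, 1) (8,)
print(model.coef_, model.intercept_)  # slope close to 2, intercept close to 1
```

The same selection should work on the real DataFrame once it is loaded with the correct separator.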

Correct answer by PalimPalim on September 16, 2020
