TransWikia.com

What tool would you suggest I use if I have to analyze millions of rows of CSV data?

Software Recommendations Asked by Karan on September 25, 2021

I’m using Windows and can’t change that. I’m a novice at programming.

Something that would let me build pivot tables easily would help. I have about 20 GB of data with user IDs, spend, country and so on, and I want to analyze it: say, total spend by region and other metrics.

2 Answers

My approach doesn't really target a "novice" user, but a novice could learn it.

  1. Get a SQL database such as PostgreSQL (https://www.postgresql.org/).
  2. Load the data into the database. Example: https://www.postgresqltutorial.com/import-csv-file-into-posgresql-table/
  3. Learn enough SQL to be dangerous. (https://www.w3schools.com/sql/)
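
As a rough illustration of step 2, here is a minimal Python sketch that streams a CSV into a database table in batches, so the whole file never has to sit in memory. It uses SQLite rather than PostgreSQL only because SQLite ships with Python and needs no server; PostgreSQL has its own bulk loader, the COPY command. The column names (userid, region, spend), the table name spend_info, and the batch size are assumptions made for the example.

```python
import csv
import io
import itertools
import sqlite3


def load_csv(conn, rows, batch_size=50_000):
    """Insert rows from a csv.DictReader into spend_info in batches.

    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spend_info "
        "(userid TEXT, region TEXT, spend REAL)"
    )
    total = 0
    while True:
        # Pull at most batch_size rows from the reader at a time.
        batch = [
            (r["userid"], r["region"], float(r["spend"]))
            for r in itertools.islice(rows, batch_size)
        ]
        if not batch:
            break
        conn.executemany("INSERT INTO spend_info VALUES (?, ?, ?)", batch)
        total += len(batch)
    conn.commit()
    return total


# Tiny in-memory sample standing in for the real file (assumed layout).
sample = io.StringIO("userid,region,spend\nu1,EU,10.5\nu2,US,20\nu3,EU,4.5\n")
conn = sqlite3.connect(":memory:")
loaded = load_csv(conn, csv.DictReader(sample), batch_size=2)
print(loaded)
```

For a real file, the same function could be fed `csv.DictReader(open("spend.csv", newline=""))` and a connection opened with `sqlite3.connect("spend.db")`; both file names are hypothetical.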

Rationale: PostgreSQL takes care of disk storage, compression, indexing, and RAM usage for you, so the data does not have to fit in memory. Many other software packages do not handle these problems.

It's not quite programming, but it is right on the border of it. SQL can look scary, but it is still readable for simple questions. Example:

SELECT region, SUM(spend) AS total_spend
FROM spend_info
GROUP BY region;

SQL has been around for decades and will continue to be around, so it's a valuable skill to learn. And there are lots of tutorials out there to help along the way.
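
To get a feel for this kind of query before installing anything, the following sketch runs a per-region summary against a three-row table in an in-memory SQLite database (bundled with Python, used here as a stand-in for PostgreSQL). The sample rows, the userid column, and the extra distinct-user count are assumptions added for illustration.

```python
import sqlite3

# Build a tiny table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend_info (userid TEXT, region TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO spend_info VALUES (?, ?, ?)",
    [("u1", "EU", 10.5), ("u2", "US", 20.0), ("u3", "EU", 4.5)],
)

# One row per region: total spend and number of distinct users,
# with the highest-spending region first.
query = """
    SELECT region,
           SUM(spend) AS total_spend,
           COUNT(DISTINCT userid) AS users
    FROM spend_info
    GROUP BY region
    ORDER BY total_spend DESC
"""
totals = conn.execute(query).fetchall()
for region, total_spend, users in totals:
    print(region, total_spend, users)
```

Selecting the grouping column alongside the aggregates is what labels each total with its region, which is roughly what a pivot table does in a spreadsheet.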

Answered by cmonkey on September 25, 2021

If you need to process the files within your intranet, get a server or PC with lots of memory. Command-line tools such as grep, sort and uniq -c are a good start for simple analyses, assuming that the data is reasonably clean.
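
Since grep, sort and uniq are not available on Windows out of the box, here is a rough Python analogue of the `sort | uniq -c` idea, not those tools themselves: it reads the CSV one row at a time and keeps only per-region counters in memory. The column names (region, spend) and the sample data are assumptions, and like the command-line approach it presumes reasonably clean data.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize(lines, key="region", value="spend"):
    """Return (row count per key, summed value per key) in a single pass."""
    counts = Counter()
    sums = defaultdict(float)
    for row in csv.DictReader(lines):
        counts[row[key]] += 1
        sums[row[key]] += float(row[value])
    return counts, dict(sums)


# Tiny in-memory sample standing in for the real file (assumed layout).
sample = io.StringIO("userid,region,spend\nu1,EU,10.5\nu2,US,20\nu3,EU,4.5\n")
counts, sums = summarize(sample)
print(counts.most_common())  # regions ordered by row count
print(sums)
```

Memory use grows with the number of distinct regions rather than the number of rows, so this should cope with a file far larger than RAM; passing an open file object instead of the sample works the same way.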

Alternatively, process the data files in the cloud. New customers get a free tier there. Upload the files to, say, Google BigQuery and process them with its related tools (e.g. Cloud Dataprep, which would calculate summaries for each column during upload).

However, you probably shouldn't do this with confidential user data where data protection laws apply.

Answered by knb on September 25, 2021
