TransWikia.com

Text File Problems

Stack Overflow Asked by taco_attack on January 7, 2021

I don't know how well I'll be able to ask this question, but given a text file, I need to parse it and extract the productID values into a HashSet, the userID values into a HashSet, and the review/score values into an ArrayList. They also need to be used to create a graph, where each productID is connected by an edge to the userID from the same review.

The data is found here: http://snap.stanford.edu/data/web-FineFoods.html
You can ignore the review/time, review/helpfulness, review/summary, and review/text fields; they don't need to be stored in memory.

My current code looks like this:

import java.io.*;
import java.util.*;
import java.nio.charset.*;

public class Reviews
{
    String fileName = "newfinefoods.txt";
    GraphType<String> foodReview;
    HashSet<String> productID;
    HashSet<String> userID;
    ArrayList<String> review;
    
    int counter; //was using this to make sure I'm counting all the lines which I think I am
    
    public Reviews(){
        foodReview = new GraphType<>();
        productID = new HashSet<>();
        userID = new HashSet<>();
        review = new ArrayList<>();
        counter = 0;
    }
    
    public int numReviews(){
        return review.size();
    }
    
    public int numProducts(){
        return productID.size();
    }
    
    public int numUsers(){
        return userID.size();
    }
    
    public void setupGraph(){
        Scanner fileScanner;
        String line = "";
        try{
            fileScanner = new Scanner (new File (fileName), "UTF-8");
            String pr = "";
            while(fileScanner.hasNextLine()){
                line = fileScanner.nextLine();
                String[] reviewInfo = line.split(": ");
                String productInfo = reviewInfo[1];
                System.out.println(productInfo);
            }
        }
        
        catch (IOException e){
            System.out.println(e);
        }
    }
    
    
    
    public static void main(String[] args){
        Reviews review = new Reviews();
        review.setupGraph();
        System.out.println("Number of Reviews:" + review.numReviews());
        System.out.println("Number of Products:" + review.numProducts());
        System.out.println("Number of Users:" + review.numUsers());
        
    }
}

Whenever I run the code and print index 1 of the reviewInfo array, it only prints one set of data, but if I change the index to 0 it seems to print all the information (just not the information I need). I need to create this graph and get the info from the data, but I'm really stuck, and any tips or help would be much appreciated!

Here is a sample of the data:

product/productId: B001E4KFG0
review/userId: A3SGXH7AUHU8GW
review/profileName: delmartian
review/helpfulness: 1/1
review/score: 5.0
review/time: 1303862400
review/summary: Good Quality Dog Food
review/text: I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most.

product/productId: B00813GRG4
review/userId: A1D87F6ZCVE5NK
review/profileName: dll pa
review/helpfulness: 0/0
review/score: 1.0
review/time: 1346976000
review/summary: Not as Advertised
review/text: Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as "Jumbo".

product/productId: B000LQOCH0
review/userId: ABXLMWJIXXAIN
review/profileName: Natalia Corres "Natalia Corres"
review/helpfulness: 1/1
review/score: 4.0
review/time: 1219017600
review/summary: "Delight" says it all
review/text: This is a confection that has been around a few centuries.  It is a light, pillowy citrus gelatin with nuts - in this case Filberts. And it is cut into tiny squares and then liberally coated with powdered sugar.  And it is a tiny mouthful of heaven.  Not too chewy, and very flavorful.  I highly recommend this yummy treat.  If you are familiar with the story of C.S. Lewis' "The Lion, The Witch, and The Wardrobe" - this is the treat that seduces Edmund into selling out his Brother and Sisters to the Witch.

product/productId: B000UA0QIQ

One Answer

The initial approach of your design is right, but you should structure it a little more:

The setupGraph method should be split into small, specific, parameterized methods:

  • Since the users and products are part of the class's state, I think it is better for the class's constructor to receive the scanner as an input parameter. Then, after initializing the state variables, it should call setupGraph (which should be private), passing it the input scanner.
  • setupGraph should receive an input scanner and be responsible for reading lines from it and for properly handling any IOExceptions that might arise. For each line, it should merely call another private method that processes the line. If you want to count all the lines read, this is where you should place the increment.
  • The line-processing method should receive an input string and be responsible for deciding whether it contains product data, user data, score data, or none of these. This must be done by properly parsing its contents. Here is where you can use String.split() to get the name and value of each line, and then evaluate the name to decide where to store the value. If you want to count all the processed lines, this is where you should place the increment.
  • Last, the main method should be responsible for instantiating the scanner and passing it in when constructing the Reviews object. This way, you could receive the file name as an input argument from the command line, which would make your program more flexible.
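
As a minimal sketch of the line-processing step described above: in the sample data the reviews are separated by blank lines, and "".split(": ") returns a one-element array, so reading index 1 throws an ArrayIndexOutOfBoundsException that the catch (IOException e) block does not catch. That is most likely why only the first review is printed. The ReviewLine class and its parse method are hypothetical names introduced here for illustration; they are not part of the question's code.

```java
import java.util.Optional;

/** Hypothetical helper (not from the question): one parsed "name: value" line. */
final class ReviewLine {
    final String name;
    final String value;

    private ReviewLine(String name, String value) {
        this.name = name;
        this.value = value;
    }

    /**
     * Returns the parsed field, or Optional.empty() for lines that contain
     * no ": " separator, such as the blank lines between reviews.
     */
    static Optional<ReviewLine> parse(String line) {
        // A limit of 2 keeps any later ": " inside the value instead of splitting on it.
        String[] parts = line.split(": ", 2);
        if (parts.length < 2) {
            return Optional.empty();
        }
        return Optional.of(new ReviewLine(parts[0], parts[1]));
    }
}
```

The caller can then check the name field (for example "product/productId", "review/userId" or "review/score") to decide where the value goes, and simply skip lines for which parse returns an empty Optional.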

Note that the only public methods of your class should be the constructor and the getters, and the state variables should be private.
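
Putting these points together, the class could look roughly like the sketch below. It rests on several assumptions: the API of the question's GraphType is not shown, so a plain Map from product ID to user IDs stands in for the graph; the field names and the currentProduct variable are invented for illustration; and each review/userId line is assumed to follow the product/productId line of the same review, as in the sample data.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;

class Reviews {
    private final Set<String> productIds = new HashSet<>();
    private final Set<String> userIds = new HashSet<>();
    private final List<String> scores = new ArrayList<>();
    // Assumption: stand-in for GraphType; maps each product to the users who reviewed it.
    private final Map<String, Set<String>> edges = new HashMap<>();
    private String currentProduct; // product ID of the review currently being read

    public Reviews(Scanner in) {
        setupGraph(in);
    }

    public int numReviews()  { return scores.size(); }
    public int numProducts() { return productIds.size(); }
    public int numUsers()    { return userIds.size(); }

    private void setupGraph(Scanner in) {
        while (in.hasNextLine()) {
            processLine(in.nextLine());
        }
        // Scanner swallows read errors; ioException() exposes the last one, if any.
        IOException e = in.ioException();
        if (e != null) {
            throw new UncheckedIOException(e);
        }
    }

    private void processLine(String line) {
        int sep = line.indexOf(": ");
        if (sep < 0) {
            return; // blank line or a line without a "name: value" shape
        }
        String name = line.substring(0, sep);
        String value = line.substring(sep + 2);
        switch (name) {
            case "product/productId":
                productIds.add(value);
                currentProduct = value;
                break;
            case "review/userId":
                userIds.add(value);
                if (currentProduct != null) {
                    edges.computeIfAbsent(currentProduct, k -> new HashSet<>()).add(value);
                }
                break;
            case "review/score":
                scores.add(value);
                break;
            default:
                break; // fields that do not need to be kept in memory
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        // The default file name is the one used in the question.
        String fileName = args.length > 0 ? args[0] : "newfinefoods.txt";
        try (Scanner in = new Scanner(new File(fileName), "UTF-8")) {
            Reviews reviews = new Reviews(in);
            System.out.println("Number of Reviews: " + reviews.numReviews());
            System.out.println("Number of Products: " + reviews.numProducts());
            System.out.println("Number of Users: " + reviews.numUsers());
        }
    }
}
```

Because the constructor takes a Scanner rather than a file name, the same class can also be exercised on a small in-memory sample, for example with new Scanner(someString), without touching the full data file.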

Answered by Little Santi on January 7, 2021
