How to insert several csv files into Elasticsearch?

Database Administrators · Asked by Revolucion for Monica on December 19, 2020

I have several CSV files on university courses that all seem linked by an ID, which you can find here, and I wondered how to load them into Elasticsearch. Thanks to this video, I know how to insert one single CSV file into Elasticsearch with Logstash. But do you know how to insert several files, such as those in the provided link?

At the moment I have started with a first .config file for a single CSV file, ACCREDITATION.csv, but it would be painful to write one for every file…

The .config file is:

input{
    file{
        path =>"Users/mike/Data/ACCREDITATION.csv"
        start_position => "begining"
        sincedb_path => "/dev/null"
    }
}

filter{
    csv{
        separator => ","
        columns => ['PUBUKPRN', 'UKPRN', 'KISCOURSEID', 'KISMODE', 'ACCTYPE', 'ACCDEPEND', 'ACCDEPENDURL', 'ACCDEPENDURLW']

    }
    mutate{convert => ["PUBUKPRN","integer"]}
    mutate{convert => ["UKPRN","integer"]}
    mutate{convert => ["KISMODE","integer"]}
    mutate{convert => ["ACCTYPE","integer"]}
    mutate{convert => ["ACCDEPEND","integer"]}
}

output{
    elasticsearch{
        hosts =>"localhost"
        index =>"accreditation"
        document_type =>"accreditaiton keys"
    }
    stdout{}
}
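
For reference, a config like this is normally run by pointing Logstash at the file from the installation directory; the config file name below is just an assumed example:

bin/logstash -f accreditation.conf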

Update, May 3rd

Not knowing how to use a .config file to load several CSV files into Elasticsearch, I fell back on the Elastic blog and tried a shell script, importCSVFiles, for a first .csv file before trying to generalize the approach:

importCSVFiles:

#!/bin/bash
while read f1
do        
   curl -XPOST 'https://XXX.us-east-1.aws.found.io:9243/courses/accreditation' -H "Content-Type: application/json" -u elastic:XXX -d "{ "accreditation": "$f1" }"
done < AccreditationByHep.csv

Yet I received a mapper_parsing_exception in the terminal:

~/Data/on_2018_04_25_16_43_17$ ./importCSVFiles
{"error":{"root_cause":
            [{"type":"mapper_parsing_exception","reason":"failed to parse"}],
          "type":"mapper_parsing_exception",
          "reason":"failed to parse",
          "caused_by":{"type":"i_o_exception","reason":"Illegal unquoted character ((CTRL-CHAR, code 13)):
              has to be escaped using backslash to be included in string value\n at [Source: org.elasticsearch…@…e18584; line: 1, column: 88]"}
         },"status":400
}
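
CTRL-CHAR, code 13 is a carriage return, which suggests the CSV has Windows-style CRLF line endings: read strips the trailing newline but leaves the \r on each line, and that raw character ends up inside the JSON string. The inner double quotes of the -d payload are also not escaped, so the shell strips them and the body curl sends is not the JSON intended. Below is a minimal sketch of the same loop with both issues addressed; the endpoint and credentials are left as the placeholders from the script above:

#!/bin/bash
# Same loop as above, with two fixes:
#  1. strip the trailing carriage return (CTRL-CHAR, code 13) left by CRLF line endings
#  2. escape the inner double quotes so curl sends valid JSON
# Note: this still assumes the CSV line itself contains no double quotes.
while IFS= read -r f1          # -r keeps any backslashes in the data intact
do
    f1=${f1%$'\r'}             # drop the \r inherited from Windows line endings
    curl -XPOST 'https://XXX.us-east-1.aws.found.io:9243/courses/accreditation' \
         -H "Content-Type: application/json" \
         -u elastic:XXX \
         -d "{ \"accreditation\": \"$f1\" }"
done < AccreditationByHep.csv

Alternatively, the file can be cleaned once up front, for example with dos2unix or tr -d '\r', before looping over it.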

One Answer

I just had a look at the data in the Higher Education Statistics Agency (HESA) zipped file and the files are all different.

This means you will either have to create an individual .config file for each import or create a single .config file using conditions as described in the following article:

Reference: How to use multiple csv files in logstash (Elastic Discuss Forum)

Expanding on your first .config by one level:

input{
    file{
        path => "/Users/mike/Data/ACCREDITATION.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
    file{
        path => "/Users/mike/Data/AccreditationByHep.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}

filter{
    # added condition for first file
    if [path] == "/Users/mike/Data/ACCREDITATION.csv"{
        csv{
            separator => ","
            columns => ['PUBUKPRN', 'UKPRN', 'KISCOURSEID', 'KISMODE', 'ACCTYPE', 'ACCDEPEND', 'ACCDEPENDURL', 'ACCDEPENDURLW']

        }
        mutate{convert => ["PUBUKPRN","integer"]}
        mutate{convert => ["UKPRN","integer"]}
        mutate{convert => ["KISMODE","integer"]}
        mutate{convert => ["ACCTYPE","integer"]}
        mutate{convert => ["ACCDEPEND","integer"]}
    }
    # added condition for second file
    else if [path] == "/Users/mike/Data/AccreditationByHep.csv"{
        csv{
            separator => ","
            columns => ['AccreditingBodyName', 'AccreditionType', 'HEP', 'KisCourseTitle', 'KiscourseID']
        }
    # omitted mutations for second file
    }

}

output{
    # added condition for first file
    if [path] == "/Users/mike/Data/ACCREDITATION.csv"{
        elasticsearch{
            hosts =>"localhost"
            index =>"accreditation"
            document_type =>"accreditaiton keys"
        }
    }
    # added condition for second file
    else if [path] == "/Users/mike/Data/AccreditationByHep.csv"{
        elasticsearch{
            hosts =>"localhost"
            index =>"accreditationByHep"
            document_type =>"accreditaitonbyhep keys"
        }
    }
    stdout{}
}

document_type is a deprecated configuration option
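
If your Logstash and Elasticsearch versions are recent enough (6.x or later), you can simply drop the option and let it default; a minimal sketch of the first output block without it:

    elasticsearch{
        hosts => "localhost"
        index => "accreditation"
    }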

You should be able to expand on this example on your own.

Answered by John aka hot2use on December 19, 2020
