TransWikia.com

Convert column value stripping html tags into sql view with rows and columns

Database Administrators Asked by Velocity on October 28, 2021

I have a table named data with a column pid & display_data.
Pid is the serial number of all rows.

The value of display_data column is as below :

For each row display_Data has values in the format =>

For PID= 1:

The labels are constant for all rows which would become column headers. Values are changing with respect to labels.
There are total 400 such rows hence, the query needs to be dynamic.

I want to parse this column value,strip html tags and into a new view split it using sql query(Regexp_Replace maybe) such that:
All label values become columns, i.e.

The actual data is much more with many labels and values however this is just sample to get help.

One Answer

TL;DR Answer

Convert HTML to XML and parse with XMLTABLE()

Long Answer

HTML and XML are both descendants of SGML.

From the sample data you have posted, it looks like you just need to remove the <br> tags and wrap desc_data in a root element (eg <row>) for the conversion into XML.

Once in XML format, you can parse the data with XMLTABLE() to give you:

  • instance of <span> ( via for ordinality, column name RN )
  • value of @label
  • value of <span>

This is done within the CTE named xml_data in my example.

Assuming value always follows label, you can convert the instance of <span> ( RN ) to have the same value for each pair. (this is also done in xml_data)

From there, you can PIVOT the data into a table that has:

  • your row identifier (RID)
  • value for label (LABEL is a key word so I used LABEL_TXT )
  • value for value (VALUE is a key word so I used VALUE_TXT )

This is done within the CTE labeled key_val in my example.

For the final query, the final columns names MUST BE KNOWN at compile time. Once you know the columns, you can just do another PIVOT for a proper table.

Notes

As you can see, HTML is not a good data format to use for parsing data and/or using the data within a computer (eg a database).

Unless desc_data is used for display, I recommend that the data is stored in XML or JSON format. (I'm assuming you can change the source)

Once data is in a parseable format, PL/SQL is no longer needed.

Example Code

with data as (
    select q'[<span class ="label">Project Name</span><br> <span class ="value">Digital</span> <span class ="label">Dept Name</span><br> <span class ="value">Finance</span><span class ="label">Level</span><br> <span class ="value">Level 5</span>]'
        as txt
        ,1 AS RID
    from dual
),xml_data as (
select RID, a.rn - mod(a.rn+1,2) rn,a.label_txt, a.val
from data d, xmltable( '/row/span'
    passing xmltype( '<row>' || replace(d.txt,'<br>','') || '</row>' ) 
    columns
        rn for ordinality,
        label_txt varchar2(20)  path '/span/@class',
        val       varchar2(20)  path '/span'
    ) a
), key_val as (
    select rid, label_txt, value_txt
    from xml_data
        pivot (
            min( val )
            for label_txt in ( 'label' as LABEL_TXT, 'value' as VALUE_TXT)
        )
)
select RID, PNAME, LVL, DNAME
from key_val
    pivot (
        min( VALUE_TXT )
        for LABEL_TXT in ('Project Name' as PNAME, 'Level' as LVL, 'Dept Name' as DNAME )
)

Answered by Michael Kutz on October 28, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP