SharePoint workflow to conduct a keyword search within documents to populate columns

SharePoint Asked by Noah M on October 26, 2021

I’ve tried a couple of ways of going about this but keep running into dead ends, so I wanted to see if anyone has done something similar in the past. My goal is to create a workflow that will search through any documents uploaded to a library for specified keywords (in this case I’m trying to search through resumes, so they aren’t uniformly formatted), and then auto-populate a "Keyword" column in the document library. That way, the resumes could be easily filtered based on which keywords we’re looking for. I know that the basic SharePoint search will search through documents and return those that contain a keyword, but it doesn’t always seem to work, and we would much rather have the keywords in column form.

Thanks!

One Answer

I have not done anything like this. But you can take a look here:

https://powerusers.microsoft.com/t5/Power-Automate-Community-Blog/Extract-data-from-documents-with-Microsoft-Flow/ba-p/370422

For a code-based approach, use an event receiver and override the ItemAdded event:

  1. Note: I have assumed that all files are .doc files. If they are different file types, we can write separate code for each and use a switch based on the extension.
  2. I have assumed that you need to store all the keywords in a single column. You can use any kind of logic, though; the basic code remains the same.
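// Requires: using System; using System.Collections.Generic; using System.IO; using Microsoft.SharePoint;
public override void ItemAdded(SPItemEventProperties properties)
{
    try
    {
        SPSecurity.RunWithElevatedPrivileges(delegate ()
        {
            // Re-open the site and web inside the delegate so they actually run elevated.
            using (SPSite site = new SPSite(properties.SiteId))
            using (SPWeb web = site.OpenWeb(properties.Web.ID))
            {
                SPListItem currentItem = web.Lists[properties.ListId].GetItemById(properties.ListItemId);
                SPFile file = currentItem.File;
                byte[] byteArray = file.OpenBinary();
                string filePath = Path.Combine("<<Path where you want to store>>", file.Name);

                // Save a temporary copy on disk so Word can open it.
                using (FileStream fs = new FileStream(filePath, FileMode.Create))
                {
                    fs.Write(byteArray, 0, byteArray.Length);
                }

                List<string> availableKeywords = ExtractKeyWords(filePath);
                currentItem["KeyWordColumn"] = string.Join(",", availableKeywords);
                currentItem.Update();
            }
        });
    }
    catch (Exception ex)
    {
        // Log ex here; an empty catch block hides failures.
    }
    base.ItemAdded(properties);
}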
public List<string> ExtractKeyWords(string filePath)
{
    // Needs a reference to Microsoft.Office.Interop.Word.
    List<string> keyWords = new List<string>() { "SharePoint", "O365" };
    List<string> availableKeyWords = new List<string>();

    Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
    Microsoft.Office.Interop.Word.Document docs = null;
    object miss = System.Reflection.Missing.Value;
    object path = filePath;
    object readOnly = true;
    object doNotSave = false;

    try
    {
        docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss);

        // Word collections are 1-based.
        for (int i = 1; i <= docs.Paragraphs.Count; i++)
        {
            string text = docs.Paragraphs[i].Range.Text;
            foreach (var item in keyWords)
            {
                // Record each keyword once, even if it appears in several paragraphs.
                if (text.Contains(item) && !availableKeyWords.Contains(item))
                {
                    availableKeyWords.Add(item);
                }
            }
        }
    }
    finally
    {
        // Close the document and quit Word so no WINWORD.EXE process is left running.
        if (docs != null)
        {
            docs.Close(ref doNotSave, ref miss, ref miss);
        }
        word.Quit(ref miss, ref miss, ref miss);
    }

    return availableKeyWords;
}
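
Since resumes are not uniformly formatted, the matching step may be worth isolating from the Word-reading code. Below is a minimal sketch of that step on plain text, assuming you want case-insensitive matches and each keyword listed only once. The class name KeywordMatcher and the method FindKeywords are hypothetical names chosen for illustration, not part of any SharePoint or Office API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordMatcher
{
    // Hypothetical helper: returns each keyword that occurs in the text,
    // ignoring case, listed once and in the order the keywords were supplied.
    public static List<string> FindKeywords(string text, IEnumerable<string> keywords)
    {
        var found = new List<string>();
        if (string.IsNullOrEmpty(text))
        {
            return found;
        }

        foreach (string keyword in keywords.Distinct(StringComparer.OrdinalIgnoreCase))
        {
            // Skip blank entries, which would otherwise match any text.
            if (string.IsNullOrWhiteSpace(keyword))
            {
                continue;
            }

            if (text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                found.Add(keyword);
            }
        }
        return found;
    }

    public static void Main()
    {
        string resumeText = "Built sharepoint farms and migrated mailboxes to O365.";
        var keywords = new[] { "SharePoint", "O365", "Azure" };

        // Prints: SharePoint,O365
        Console.WriteLine(string.Join(",", FindKeywords(resumeText, keywords)));
    }
}
```

This is a plain substring match, so a keyword such as "SQL" would also match "MySQL". If that matters for your filtering, whole-word matching would need to be added.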

Client-side approach, if you can't deploy a WSP: add a hidden column called Status (hide it from the list forms using CSS in a Content Editor web part, or by creating custom new-item and edit forms in SharePoint Designer).

By default, set Status to 'New'.

Create a Windows scheduled task that runs every hour (or at whatever interval you want). The task calls an .exe file whose Main function contains the following code:

// Add references to the Microsoft.SharePoint.Client
// and Microsoft.SharePoint.Client.Runtime DLLs.

using System.Collections.Generic;
using System.IO;
using System.Net;
using Microsoft.SharePoint.Client;

class Program
{
    public static void Main(string[] args)
    {
        using (ClientContext context = new ClientContext("Url of the web containing the library"))
        {
            context.Credentials = CredentialCache.DefaultNetworkCredentials;
            List resumeLibrary = context.Web.Lists.GetByTitle("Title of the library");

            // Select only the items that have not been processed yet.
            CamlQuery camlQuery = new CamlQuery
            {
                ViewXml = @"<View><Query><Where><Eq><FieldRef Name='Status'/><Value Type='Text'>New</Value></Eq></Where></Query></View>"
            };
            ListItemCollection resumes = resumeLibrary.GetItems(camlQuery);
            context.Load(resumes, items => items.Include(item => item.File));
            context.ExecuteQuery();

            foreach (ListItem resume in resumes)
            {
                var file = resume.File;

                // In CSOM the file content is available as a stream after the query executes.
                ClientResult<Stream> data = file.OpenBinaryStream();
                context.ExecuteQuery();
                string filePath = Path.Combine("<<Path where you want to store>>", file.Name);

                using (FileStream fs = new FileStream(filePath, FileMode.Create))
                {
                    data.Value.CopyTo(fs);
                }

                // Same method described earlier; declare it static to call it from Main.
                List<string> availableKeywords = ExtractKeyWords(filePath);

                resume["KeyWordColumn"] = string.Join(",", availableKeywords);
                resume["Status"] = "Completed";
                resume.Update();
                context.ExecuteQuery();
            }
        }
    }
}

This code will execute every time the scheduled job runs, pick up all new resumes, and extract the keywords. It then changes the status to Completed.

Answered by SOURAV MUKHERJEE on October 26, 2021
