
Scan a directory for files and load it in memory efficiently

Code Review Asked on October 27, 2021

I am working on a small project where I need to scan all the files in a folder on disk and load them into memory. My code below does exactly that and works fine.

Here are the steps:

  • On disk there is already a default Records folder containing all the default config files. It is the fallback if something goes wrong or if loadDefaultFlag is enabled.
  • New config files are also available as a tar.gz file (max 100 MB) at a remote URL. If loadDefaultFlag is disabled, I need to download that file and store its contents on disk in _secondaryLocation.
  • Depending on whether loadDefaultFlag is enabled, we either load the default local files already on disk or load the files from _secondaryLocation (after downloading them from the remote URL).
  • During server startup, my RecordManager constructor is called. It checks whether loadDefaultFlag is enabled and, based on that, either loads the files from the Records folder (point 1) or downloads the new configs from the URL (point 2), and then loads them into memory.

I get the JSON value of configKey from the IConfiguration object in my constructor. It says whether to use the default configs or to download files from a remote URL and store them on disk. Sample content of configKey:

{"loadDefaultFlag": "false", "remoteFileName":"data-1234.tgz", ...}

Based on this JSON value, I decide what to do, as outlined in the points above.
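
As a minimal sketch of that decision only (separate from the full class below), the JSON could be mapped to a config source like this. ConfigSourceSelector and Source are hypothetical names, System.Text.Json is used instead of Newtonsoft.Json to keep the sketch self-contained, and treating a missing or unparsable flag as "use defaults" is an assumption rather than the behavior of the posted code. Malformed JSON would throw a JsonException here.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, not part of the posted class: decides which config
// source to use from the JSON stored under "configKey".
public static class ConfigSourceSelector
{
    public enum Source { Default, Remote }

    public static Source Select(string json, out string remoteFileName)
    {
        remoteFileName = null;
        if (string.IsNullOrWhiteSpace(json)) return Source.Default;

        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement root = doc.RootElement;

            // The flag arrives as a string ("false"), so parse it explicitly.
            // Assumption: a missing or unparsable flag falls back to defaults.
            bool loadDefault = true;
            if (root.TryGetProperty("loadDefaultFlag", out JsonElement flag)
                && flag.ValueKind == JsonValueKind.String
                && bool.TryParse(flag.GetString(), out bool parsed))
            {
                loadDefault = parsed;
            }

            if (root.TryGetProperty("remoteFileName", out JsonElement name)
                && name.ValueKind == JsonValueKind.String)
            {
                remoteFileName = name.GetString();
            }

            // Remote is chosen only when defaults are disabled and a file name is present.
            return !loadDefault && !string.IsNullOrWhiteSpace(remoteFileName)
                ? Source.Remote
                : Source.Default;
        }
    }
}
```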

Below is my code:

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using System.Net.Http;
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;
using Polly;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Primitives; // needed for ChangeToken
using Newtonsoft.Json;

public class RecordManager
{
    private readonly string _remoteUrl = "remote-url-from-where-to-download-new-configs";
    private readonly string _secondaryLocation = "SecondaryConfigs";
    private readonly string _localPath = null;
    private readonly IConfiguration _configuration;

    private static HttpClient _httpClient = new HttpClient()
    {
        Timeout = TimeSpan.FromSeconds(3)
    };

    public RecordManager(IConfiguration configuration, string localPath = "Records")
    {
        _localPath = localPath ?? throw new ArgumentNullException(nameof(localPath));
        _configuration = configuration;
        ChangeToken.OnChange(configuration.GetReloadToken, _ => ConfigChanged(), new object());

        string jsonValue = configuration["configKey"];
        if (!string.IsNullOrWhiteSpace(jsonValue))
        {
            RecordPojo dcc = JsonConvert.DeserializeObject<RecordPojo>(jsonValue);
            Boolean.TryParse((string)dcc.loadDefaultFlag, out bool loadDefaultFlag);
            string remoteFileName = dcc.remoteFileName;
            if (!loadDefaultFlag && !string.IsNullOrWhiteSpace(remoteFileName))
            {
                // get all the configs from the url and load it in memory
                if (!LoadAllConfigsInMemory(_remoteUrl, remoteFileName, _secondaryLocation).Result) throw new ArgumentNullException(nameof(_records));
            }
            else
            {
                var recordsList = LoadDefaultConfigsInMemory() ?? throw new ArgumentNullException("recordsList");
                if (recordsList.Count == 0) throw new ArgumentNullException("recordsList");

                if (!UpdateRecords(recordsList)) throw new ArgumentNullException(nameof(_records));
            }
        }
        else
        {
            var recordsList = LoadDefaultConfigsInMemory() ?? throw new ArgumentNullException("recordsList");
            if (recordsList.Count == 0) throw new ArgumentNullException("recordsList");

            if (!UpdateRecords(recordsList)) throw new ArgumentNullException(nameof(_records));
        }
    }

    // This method will load all the configs downloaded from the url in memory
    private async Task<bool> LoadAllConfigsInMemory(string url, string fileName, string directory)
    {
        IList<RecordHolder> recordsList = new List<RecordHolder>();
        try
        {
            recordsList = GetRecords(url, fileName, directory);
            if (recordsList == null || recordsList.Count == 0)
            {
                throw new ArgumentException("No config records loaded from remote service.");
            }
            return UpdateRecords(recordsList);
        }
        catch (Exception ex)
        {
            // log error
        }
        // falling back to load default configs
        recordsList = LoadDefaultConfigsInMemory();

        return UpdateRecords(recordsList);
    }

    // This will return list of all the RecordHolder by iterating on all the files.
    private IList<RecordHolder> GetRecords(string url, string fileName, string directory)
    {
        var recordsList = new List<RecordHolder>();
        var recordPaths = GetAllTheFiles(url, fileName, directory);
        for (int i = 0; i < recordPaths.Count; i++)
        {
            var configPath = recordPaths[i];
            if (File.Exists(configPath))
            {
                var fileDate = File.GetLastWriteTimeUtc(configPath);
                string fileContent = File.ReadAllText(configPath);
                // Path.GetFileName avoids redeclaring the fileName parameter
                var recordName = Path.GetFileName(configPath);
                recordsList.Add(new RecordHolder()
                {
                    Name = recordName,
                    Date = fileDate,
                    JDoc = fileContent
                });
            }
        }
        return recordsList;
    }

    // This method will return list of all the files by downloading a tar.gz file
    // from a url and then extracting contents of tar.gz into a folder.
    // Maybe this code can be simplified better - I am doing lot of boolean checks here
    // not sure if that's good.
    private IList<string> GetAllTheFiles(string url, string fileName, string directory)
    {
        IList<string> allFiles = new List<string>();
        bool isDownloadSuccessful = DownloadConfigs(url, fileName).Result;
        if (!isDownloadSuccessful)
        {
            return allFiles;
        }
        bool isExtracted = ExtractTarGz(fileName, directory);
        if (!isExtracted)
        {
            return allFiles;
        }
        return GetFiles(directory);
    }

    // This method will download a tar.gz file from a remote url and save it onto the disk
    // in a particular folder
    private async Task<bool> DownloadConfigs(string remoteUrl, string fileName)
    {
        var policyResult = await Policy
           .Handle<TaskCanceledException>()
           .WaitAndRetryAsync(retryCount: 5, sleepDurationProvider: i => TimeSpan.FromMilliseconds(500))
           .ExecuteAndCaptureAsync(async () =>
           {
               using (var httpResponse = await _httpClient.GetAsync(remoteUrl + fileName).ConfigureAwait(false))
               {
                   httpResponse.EnsureSuccessStatusCode();
                   return await httpResponse.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
               }
           }).ConfigureAwait(false);

        if (policyResult.Outcome == OutcomeType.Failure || policyResult.Result == null)
            return false;
        try
        {
            // write all the content of tar.gz file onto the disk
            File.WriteAllBytes(fileName, policyResult.Result);
            return true;
        }
        catch (Exception ex)
        {
            // log error
            return false;
        }
    }

    // This method extracts contents of tar.gz file in a directory
    private bool ExtractTarGz(string fileName, string directory)
    {
        try
        {
            Stream inStream = File.OpenRead(fileName);
            Stream gzipStream = new GZipInputStream(inStream);

            TarArchive tarArchive = TarArchive.CreateInputTarArchive(gzipStream);
            tarArchive.ExtractContents(directory);
            tarArchive.Close();

            gzipStream.Close();
            inStream.Close();
        }
        catch (Exception ex)
        {
            // log error
            return false;
        }
        return true;
    }

    // This method gets list of all files in a folder matching particular suffix
    private IList<string> GetFiles(string path)
    {
        var allFiles = new List<string>();
        try
        {
            var jsonFiles = Directory.GetFiles(path, "*.json", SearchOption.AllDirectories);
            var testFiles = Directory.GetFiles(path, "*.txt", SearchOption.AllDirectories);
            allFiles.AddRange(jsonFiles);
            allFiles.AddRange(testFiles);
        }
        catch (UnauthorizedAccessException ex)
        {
            // log error
        }
        return allFiles;
    }

    // This method will load all the default local configs in memory
    // if  `loadDefaultFlag` is enabled or cannot talk to remote url location
    private IList<RecordHolder> LoadDefaultConfigsInMemory()
    {
        var configs = new List<RecordHolder>();
        var recordPaths = GetFiles(_localPath);
        for (int i = 0; i < recordPaths.Count; i++)
        {
            var configPath = recordPaths[i];
            if (File.Exists(configPath))
            {
                var fileDate = File.GetLastWriteTimeUtc(configPath);
                string fileContent = File.ReadAllText(configPath);
                var pathPieces = configPath.Split(System.IO.Path.DirectorySeparatorChar, StringSplitOptions.RemoveEmptyEntries);
                var fileName = pathPieces[pathPieces.Length - 1];
                configs.Add(new RecordHolder()
                {
                    Name = fileName,
                    Date = fileDate,
                    JDoc = fileContent
                });
            }
        }
        return configs;
    }

    private bool UpdateRecords(IList<RecordHolder> recordsHolder)
    {
        // leaving out this code as it just updates the config in memory
    }

}

I am asking for a code review. I am specifically interested in the design and implementation of this class. I am sure there is a cleaner way to structure it, and a few of the methods above could probably be written more efficiently as well.

The idea is very simple: during server startup, either load the default local configs already on disk, or download the configs from the remote URL into a secondary folder on disk and load them from there.

2 Answers

In addition to @Reinderien's answer:

Constructor

You're doing a lot of work in your constructor. Consider moving most of the configuration logic into a separate method and keeping the constructor focused on validating its parameters. If other code must run at construction time, put it in a private method and call that from the constructor. Also, don't use optional parameters in the constructor. Use overloads instead: they are safer for future changes and avoid confusion.

Naming Convention

Your naming is partially clear, but it took me some time to follow the code because of naming confusion. For instance, GetAllTheFiles and GetFiles confused me at first. Once I dug into the code, it became clear that GetFiles gets the files from the local disk, while GetAllTheFiles downloads the remote file. Consider naming things after their logic and result. For instance, GetAllTheFiles could be renamed to something like GetConfigurationFileFromServer (just an example).

Methods

The methods are partially unclear and could mislead others. Since your requirement is clear (switch between local and remote configuration), you should reduce the number of methods to make the code clearer. Some methods, such as GetFiles, work as helper methods, so it would be useful to move them into a separate helper class for managing files and use that class here. That way you can reuse these methods anywhere in the project.
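
As a rough sketch of such a helper class (only the file-listing part; downloading and extraction are left out), something like the following could work. The class name FileHelper and the fixed .json/.txt filter mirror the code in the question, but the exact shape is an assumption, and access errors are not handled here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper class; the name and members are illustrative only.
public class FileHelper
{
    // Extensions the question's GetFiles method searches for.
    private static readonly HashSet<string> Wanted =
        new HashSet<string>(new[] { ".json", ".txt" }, StringComparer.OrdinalIgnoreCase);

    // Returns every .json and .txt file under root, including subdirectories.
    // A missing directory yields an empty list instead of throwing.
    public IReadOnlyList<string> GetFiles(string root)
    {
        if (!Directory.Exists(root)) return Array.Empty<string>();

        return Directory
            .EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .Where(path => Wanted.Contains(Path.GetExtension(path)))
            .ToList();
    }
}
```

A single directory walk replaces the two Directory.GetFiles calls, and the same instance can be shared by the local and remote loading paths.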

Design Pattern

I suggest finding a design pattern that fits your project. A clear design makes the code easier to adapt to future changes.

For instance, you could use the Fluent API design pattern. Here is an example based on your code (including some changes from the notes above).

public class RecordManager
{
    private const string _remoteUrl = "remote-url-from-where-to-download-new-configs";
    private string _remoteFileName; 
    
    private const string SecondaryLocation = "SecondaryConfigs";
    private readonly IConfiguration _configuration;
    private readonly string _localPath; 
    private IEnumerable<RecordHolder> _records; 
    private readonly FileHelper _fileHelper = new FileHelper();
    
    public enum ConfigLocation { System, Local, Remote }
    
    public RecordManager(IConfiguration configuration, string localPath)
    {
        if(configuration == null) { throw new ArgumentNullException(nameof(configuration)); }
        
        // IsNullOrEmpty also rejects null; localPath?.Length == 0 lets null through
        if(string.IsNullOrEmpty(localPath)) { throw new ArgumentNullException(nameof(localPath)); }
        
        _localPath = localPath;
        _configuration = configuration;
        ChangeToken.OnChange(configuration.GetReloadToken, _ => ConfigChanged(), new object());
    }
    
    public RecordManager(IConfiguration configuration) : this(configuration, "Records") { } 
    
    public RecordManager LoadConfigurationsFrom(ConfigLocation location)
    {
        switch(location)
        {
            case ConfigLocation.Remote:
                _records = GetConfigurationsFromServer();
                break; 
            case ConfigLocation.Local:
                _records = GetConfigurationsFromLocalFiles();
                break; 
            case ConfigLocation.System:
                _records = IsConfigurationFromServer() ?  GetConfigurationsFromServer() : GetConfigurationsFromLocalFiles();
                break;  
        }
        
        return this; 
    }
    
    public void Save()
    {
        // finalize your work.
    }

    private bool IsConfigurationFromServer()
    {
        string configValue = _configuration["configKey"];

        if (string.IsNullOrWhiteSpace(configValue)){ return false; }
        
        var dcc = JsonConvert.DeserializeObject<RecordPojo>(configValue);
        
        // use conditional access instead of casting to avoid casting exceptions 
        // also you only need a valid boolean value, any other value should be ignored.
        if(!bool.TryParse(dcc.loadDefaultFlag?.ToString(), out bool loadDefaultFlag)) { return false; }
        
        _remoteFileName = dcc.remoteFileName;
        
        return !loadDefaultFlag && !string.IsNullOrWhiteSpace(dcc.remoteFileName);
    }
    
    // adjust this to be parameterless
    // use the global variables _remoteUrl, _remoteFileName instead
    private IEnumerable<RecordHolder> GetConfigurationsFromServer()
    {       
        var isDownloaded = _fileHelper.Download($"{_remoteUrl}{_remoteFileName}", SecondaryLocation);
        
        if(!isDownloaded) { yield return default; }
        
        var isExtracted = _fileHelper.ExtractTarGz(_remoteFileName, SecondaryLocation);
        
        if(!isExtracted) { yield return default; }
        
        foreach(var configPath in _fileHelper.GetFiles(SecondaryLocation))
        {
            if(!File.Exists(configPath)) { continue; }
            
            var fileDate = File.GetLastWriteTimeUtc(configPath);
            
            var fileContent = File.ReadAllText(configPath);
            
            var pathPieces = configPath.Split(System.IO.Path.DirectorySeparatorChar, StringSplitOptions.RemoveEmptyEntries);
            
            var fileName = pathPieces[pathPieces.Length - 1];
          
            yield return new RecordHolder
            {
                Name = fileName,
                Date = fileDate,
                JDoc = fileContent
            };
        }
    }


    private IEnumerable<RecordHolder> GetConfigurationsFromLocalFiles()
    {
        // Same concept as GetConfigurationsFromServer 
    }

}

Usage would look like this:

new RecordManager(configuration)
    .LoadConfigurationsFrom(RecordManager.ConfigLocation.Remote)
    .Save();

I hope this would give you the boost you're seeking.

From Comments :

Btw can you also explain what is the use of yield here and what advantage does it have compared to what I had earlier.

The yield keyword is basically a shortcut for what you've already done in the same method, but with more efficient enumeration.

It creates a lazy enumeration that only produces the elements you ask for, nothing more, nothing less. Say you're iterating over 100 elements and you only need the first one: it will produce just that one element and ignore the rest. It works only in iterator methods, that is, methods returning IEnumerable or IEnumerator (including their generic forms). I encourage you to read more about it and use it when possible.
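
A small illustration of that laziness, assuming nothing beyond the base class library (LazyDemo, Numbers, and Produced are made-up names for this sketch):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Counts how many elements the iterator has actually produced.
    public static int Produced;

    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;          // runs only when the caller asks for the next element
            yield return i;
        }
    }

    public static int FirstOnly()
    {
        Produced = 0;
        // First() pulls a single element and then disposes the enumerator,
        // so the loop body runs once even though 100 elements were available.
        return Numbers(100).First();
    }
}
```

After calling LazyDemo.FirstOnly(), Produced is 1. Materializing the sequence with ToList() would instead run the loop body 100 times.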

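Also, what does yield return default mean here?

It yields the default value of the element type. Say you're enumerating over an int collection: the default value of int is 0, since it's a non-nullable type. The same goes for other types (each type has its own default value); for a reference type such as RecordHolder the default is null. Note that yield return default does not end the enumeration: execution continues with the next statement, so use yield break if the sequence should stop there.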
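
To make the difference between yield return default and yield break concrete, here is a minimal sketch (YieldDemo and its methods are hypothetical names, with string standing in for the element type):

```csharp
using System.Collections.Generic;

public static class YieldDemo
{
    // yield return default emits one default element (null for a reference
    // type) and then keeps executing the rest of the method.
    public static IEnumerable<string> WithDefault(bool ok)
    {
        if (!ok) { yield return default; }
        yield return "record";
    }

    // yield break ends the sequence immediately; nothing after it runs.
    public static IEnumerable<string> WithBreak(bool ok)
    {
        if (!ok) { yield break; }
        yield return "record";
    }
}
```

With ok set to false, WithDefault produces two elements (null, then "record"), while WithBreak produces none.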

Answered by iSR5 on October 27, 2021

Coalesce abuse

There's no reason for this to use ??, since the value of the second half of the expression isn't actually used:

_localPath = localPath ?? throw new ArgumentNullException(nameof(localPath));
    

Just use if (localPath == null).

Anonymous lambda

Try replacing this:

_ => ConfigChanged()

with ConfigChanged (no parens). This should bind to the function itself rather than wrapping it in a lambda. Under certain circumstances I seem to remember this needing a cast and I'm not sure whether that's needed here.
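
Whether the method group works depends on the signature of ConfigChanged, which the question does not show. The sketch below uses a plain Action<object> in place of the callback parameter of ChangeToken.OnChange, and both handler methods are assumptions made for illustration:

```csharp
using System;

public static class MethodGroupDemo
{
    public static int Calls;

    // Parameterless handler, like the ConfigChanged() call in the question.
    public static void ConfigChanged() => Calls++;

    // Handler whose signature matches Action<object>.
    public static void ConfigChangedWithState(object state) => Calls++;

    public static void Run()
    {
        // A parameterless method needs the lambda wrapper to fit Action<object>.
        Action<object> viaLambda = _ => ConfigChanged();

        // A method with a matching parameter converts directly from the method group.
        Action<object> viaMethodGroup = ConfigChangedWithState;

        viaLambda(new object());
        viaMethodGroup(new object());
    }
}
```

So if ConfigChanged takes no parameters, the lambda stays; if it accepts the state object, the bare method name is enough.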

Log the error

    catch (Exception ex)
    {
        // log error
    }

Okay? But you didn't log it. That needs to happen.

For-each

    for (int i = 0; i < recordPaths.Count; i++)
    {
        var configPath = recordPaths[i];

should use a simple foreach.
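
A sketch of what that loop could look like as a foreach, pulled into one helper so that the remote and default paths can share it. The RecordHolder shape is assumed from how the question uses it, and RecordReader is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Assumed shape of RecordHolder; the question does not show its definition.
public class RecordHolder
{
    public string Name { get; set; }
    public DateTime Date { get; set; }
    public string JDoc { get; set; }
}

public static class RecordReader
{
    // Hypothetical helper: reads each existing file into a RecordHolder.
    public static List<RecordHolder> ReadRecords(IEnumerable<string> paths)
    {
        var records = new List<RecordHolder>();
        foreach (var path in paths)
        {
            if (!File.Exists(path)) continue;

            records.Add(new RecordHolder
            {
                Name = Path.GetFileName(path),          // replaces the manual Split
                Date = File.GetLastWriteTimeUtc(path),
                JDoc = File.ReadAllText(path)
            });
        }
        return records;
    }
}
```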

IDisposable

This:

        TarArchive tarArchive = TarArchive.CreateInputTarArchive(gzipStream);
        tarArchive.ExtractContents(directory);
        tarArchive.Close();

should be checked for inheritance from IDisposable. If that is the case, remove your explicit Close and use a using statement. using should also be used for the two Streams in that method.

See https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/using-statement for more details.

Read your library's documentation:

Implements

System.IDisposable

So it can be used as using (TarArchive tarArchive = TarArchive.CreateInputTarArchive(gzipStream)) { ... }
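
The practical benefit is that Dispose runs even when extraction throws, which the explicit Close calls in the question do not guarantee. A self-contained sketch of that behavior, with TrackedResource as a made-up stand-in for the stream and archive types:

```csharp
using System;

// Hypothetical stand-in for a stream or archive type; it only records disposal.
public sealed class TrackedResource : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() => Disposed = true;
}

public static class UsingDemo
{
    // Returns true if the resource was disposed even though the body threw.
    public static bool DisposedAfterFailure()
    {
        var resource = new TrackedResource();
        try
        {
            using (resource)
            {
                throw new InvalidOperationException("simulated extraction failure");
            }
        }
        catch (InvalidOperationException)
        {
            // The using block has already called Dispose() by this point.
        }
        return resource.Disposed;
    }
}
```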

Answered by Reinderien on October 27, 2021
