TransWikia.com

C# - name separation & last name in upper to Camel

Code Review Asked on January 3, 2021

I had an xls file with lots of full names in the following form at work:

+----------------------------+--------------+-----+-----+
|        [Full name]         | [More data]  |[...]|[...]|
+----------------------------|--------------------------+
|   Cristiano RONALDO        |     ...      | ... | ... |
+----------------------------+--------------+-----+-----+
|   Carol SEVILLA            |     ...      | ... | ... |
+----------------------------|--------------+-----+-----+
|   Ronald Chris MAC DONALDS |     ...      | ... | ... |
+----------------------------|--------------+-----+-----+

some of the data will still be input this way but I want to add a column for last name and let it clear that I don’t need the last name in upper case anymore, so I separated name from last name, and then changed the last name to camel case, notice that last names can have many words like "Mc Donalds Rodriguez" (it happens) so I solved it as follows

public static string GetLastNameFromFullName(string fullName)
{
    var lastName = "";
    foreach (var ch in fullName)
    {
        lastName += ch;
        if (char.IsLower(ch))
        {
            lastName = "";
        }
    }
    return lastName.TrimStart();
}

public static string GetCameledLastName(string lastNames)
{
    string[] lastNamesArr = lastNames.Split(' ');
    var lastNamesCameled = "";

    foreach (string lastNameUpper in lastNamesArr)
    {
        lastNamesCameled += lastNameUpper[0];
        for (int i = 1; i < lastNameUpper.Length; i++)
        {
            lastNamesCameled += char.ToLower(lastNameUpper[i]);
        }
    }
    return lastNamesCameled;
}

public static string GetNameWithoutLastName(string fullName)
{
    var possibleLastName = false;
    char possibleLastNameChar = ' '; //just initialized
    var name = "";
    foreach (var ch in fullName)
    {
        if (char.IsUpper(ch))
        {
            possibleLastNameChar = ch;
            if (possibleLastName)
            {
                break;
            }
            possibleLastName = true;
        }
        else
        {
            if (possibleLastName)
            {
                name += possibleLastNameChar;
            }
            name += ch;
            possibleLastName = false;
        }
    }
    return name;
}
private void Form1_Load(object sender, EventArgs e)
{
    var path = @"../../file.txt"; //dumped from xls file
    string contents = File.ReadAllText(path);

    using (StreamReader reader = new StreamReader(path, Encoding.GetEncoding("iso-8859-1"))) //some names had ñ or accented characters
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var fullName = "";
            foreach (char ch in line)
            {
                if (ch == 't')
                {
                    //The columns in the xls file were divided by tab characters
                }
                else
                {
                    fullName += ch;
                }
            }
            var lastName = GetLastNameFromFullName(fullName);
            Console.WriteLine("Name: " + GetNameWithoutLastName(fullName));
            Console.WriteLine("Last name: " + GetCameledLastName(lastName));
        }
    }
}

I think my code could be a lot better.

UPDATE: please note that while it is true that cases like "Cinthia del Río" is an actual name that is not considered in this way, it will be converted to "Cinthia Del Rio" because in the xls file it would be in a single column as "Cinthia DEL RIO", and of course it is impossible for the algorithm to know that "DEL" should actually be "del" even though it is perfectly OK for a last name’s word to start with a lower case.

2 Answers

Well, I don't know if your code could be better or faster but the code could be a lot shorter by using some Linq-"magic".

Your code could use some level of input-parameter-validation because the methods in question are public which means anybody who uses these methods can pass whatever he/she wants, even null which would blow each method and would expose implementation details.

I don't know if the requirement is meant to be that passing Ronald Chris MAC DONALDS returns as lastname MacDonalds but for me this doesn't sound correct.

Instead of splitting the fullname twice and then splitting the lastname again, you should consider to just pass a string[] to the methods.

You could consider to have one public method where you pass the fullname and get a Tuple<string, string> so you would need only one parameter validation because you can make the other methods private.

Because a lastname contains only UpperCase letters we can take the passed string[] and take each string in this array which contains only upper-case letters, we will leave the first char because it allready is uppercase and take the remaining chars as lower-case chars. Last we join them using a space char like so

private static string GetLastName(string[] nameParts)
{
    return string.Join(" ", nameParts.Where(s => s.All(c => char.IsUpper(c)))
                                     .Select(s => s[0] + s.Substring(1).ToLowerInvariant()));
}  

For the firstname we know that not all chars are upper-case chars. So we take each string inside the passed array and check if any char is a lower-case char, and then join the found strings by using a space char like so

private static string GetFirstName(string[] nameParts)
{
    return string.Join(" ", nameParts.Where(s => s.Any(c => char.IsLower(c))));
}  

Last but not least we need to call these 2 methods after some proper validation like so

public static Tuple<string, string> GetNormalizedNames(string fullName)
{
    if (fullName == null) { throw new ArgumentNullException(nameof(fullName)); }
    if (string.IsNullOrWhiteSpace(fullName)) { return Tuple.Create("", ""); }

    var nameParts = fullName.Split(' ');

    return Tuple.Create(GetFirstName(nameParts), GetLastName(nameParts));
}  

which we then call like so

var firstNameLastNameTuple = GetNormalizedNames(fullName);
Console.WriteLine("Name: " + firstNameLastNameTuple.Item1);
Console.WriteLine("Last name: " + firstNameLastNameTuple.Item2);  

The whole code is now easier to read and therefor easier to maintain. Sure linq is only syntactic sugar and won't be faster than iterating over the chars by "hand" but the benefit is less and easier to read code.

Correct answer by Heslacher on January 3, 2021

just need to add another approach. You could use Substring and IndexOf to get the first and the last name without looping. The only loop that you need is on last name to camelCase it. Though, names that needed to be lowered case needs to be defined in an array or a switch statement when looping over the last name, that's if you need to add more precision on your output. Here is an example :

public static KeyValuePair<string, string> GetFirstAndLastName(string fullName)
{
    if(fullName?.Length == 0) { return; }

    // take the first name, trim any whitespace and camelcase it
    var firstName = ToCamelCase(fullName.Substring(0, fullName.IndexOf(' ') + 1).Trim());
    
    // take the last name, trim any whitespace, and convert it to array
    var lastNameArray = fullName.Substring(firstName.Length).Trim().Split(' '); 
    
    var lastName = string.Empty;

    foreach(var name in lastNameArray)
    {
        lastName += ToCamelCase(name) + " ";
    }

    lastName.TrimEnd();

    return new KeyValuePair<string, string>(firstName, lastName);
}

public static string ToCamelCase(string name)
{
    return name.Substring(0, 1).ToUpperInvariant() + name.Substring(1).ToLowerInvariant();
}

usage :

var firstLastName = GetFirstAndLastName(fullName);
Console.WriteLine($"Name: {firstLastName.Key}");
Console.WriteLine($"Last name: {firstLastName.Value}");

another note on :

string contents = File.ReadAllText(path);

it's not used, and even if it's used, it would be useless, since ReadAllText would open a StreamReader, so you either use ReadAllText or StreamReader, using both would be redundant.

Also, since your columns are separated by a tab, you can do this :

string line;

while ((line = reader.ReadLine()) != null)
{
    var columns = line.Split('t'); 

    if(columns != null && columns.Length > 0)
    {
        var fullName = columns[0]; 
        
        var firstLastName = GetFirstAndLastName(fullName);
        Console.WriteLine($"Name: {firstLastName.Key}");
        Console.WriteLine($"Last name: {firstLastName.Value}"); 
    }
}

finally, I would suggest you use any type of converter that would parse your CVS or excel file into DataTable or an object model to make your work much maintainable. So, you can set your validation process once, and focus on processing the data whenever needed.

Answered by iSR5 on January 3, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP