TransWikia.com

Perl - Splitting a string

Code Review Asked by Linny on February 7, 2021

I’ve decided to work on my Perl skills. I’ve written a small subroutine that splits a string based on an optional delimiter. I’d like any feedback on this program so I can kick bad habits to the curb. Written in Perl 5.

sub split_string {
    my $string = @_[0];
    my $delimiter = @_[1] ? @_[1] : " ";
    my @result = ();
    my $temp = "";

    for $i (0..length($string)) {
        my $char = substr($string, $i, 1);
        if (($char eq $delimiter) or $i == length($string)) {
            push(@result, $temp);
            $temp = "";
        } else {
            $temp .= $char;
        }
    }
    return @result;
}

And this is how I test this subroutine.

@test = split_string("This is a test to ensure this works correctly.");
foreach $element (@test) {
    print $element . "n";
}

3 Answers

All the main code comments you've had so far are solid.

I thought I would address your approach to testing.

If you write your code as a module that can be loaded with "use" it's very easy to use Perl's extensive testing toolset.

You can use the classical Test::Simple and Test::More modules that are built in with many versions of Perl. But if you are comfortable with CPAN module installation (it's worth learning, if you aren't) you can install the newer Test2 suite which makes writing tests even easier.

Check out Test2::V0 which is a nice big bundle of testing functions.

Also see the introduction to the tools.

In short write your code like:

package MySplit;

use Exporter qw<import>;
our @EXPORT_OK = qw( split_string );

sub split_string {
    # do stuff
}

1;

Then write tests like:

#!/bin/env perl
use strict;
use warnings;
use Test::V0;

use MySplit qw< split_string >;

is  split_string('1,2,4'),
    array {
        item 1;
        item 2;
        item 4;
        end();
    }, 
    "Basic split works";

done_testing();

The above sample doesn't really dig into the expressiveness and power of the comparator constructors. You can see more in the docs.

Answered by daotoad on February 7, 2021

I agree with the previous answer, especially the remarks about using the strict and warnings pragmas. I fixed so many Perl bugs which could be seen easily using these pragmas.

First of all you should know that Perl's split command uses regular expression as the delimiter, would you like to dare and write a regular expression based split_string?

Secondly, to make it look more as the Perl's split you could use prototypes (which will also check for correct parameter passing):

sub split_string ($;$);

Than you can call the function as following (note that there is no need for parenthesis):

my $test = split_string "This is a test to ensure this works correctly.";

I like to use prototypes when I write basic functions.

Answered by Ronen Moldovan on February 7, 2021

- Use the strict and warnings pragmas

This helps catch many errors at an early stage.

- Declare lexical variables with my instead of using package variables

If you define variables without having declared them they will be defined as package variables (which are seen by all code in your package). Note that if you use the strict pragma you need to declare package variables with our.

- use say instead of print

Since perl version 5.10 you can use say to print a line and add the line terminator (newline character) automatically. Just remember to enable the the feature with use feature qw(say).

- Unpack arguments to a function/method from the @_ array for clarity

Prefer my ($str, $delim) = @_ over my $str = $_[0]; my $delim = $_[1]

- Use $array[$N] to refer to the ($N+1)th element of @array.

In you code you used @_[1] to refer to the second element of the @_ array. The correct syntax is to use $_[1].

- Do not use parenthesis around argument for builtin functions if not necessary.

In Perl parenthesis around function arguments is optional. A common style is to avoid parenthesis around builtin function calls. This reduces visual clutter and disambiguates built-in functions from user functions, see also What is the reason to use parenthesis-less subroutine calls in Perl?

- Don't declare empty arrays with empty parenthesis. Simply use my @arr;

- Return a reference to an array and not an array value.

By returning a reference you avoid copying, but see also In perl, when assigning a subroutine's return value to a variable, is the data duplicated in memory?

- Don't reinvent the wheel, use the Perl builtin function split

You tagged your question with [reinventing-the-wheel] so I assume this is for learning purposes only.

Here is a revised version of your code that implements the above comments:

use feature qw(say);
use strict;
use warnings;

{ # <-- create a scope so lexical variable does not "leak" into the subs below

    my $test = split_string("This is a test to ensure this works correctly.");
    foreach my $element (@$test) {
        say $element;
    }
}

sub split_string {
    my ( $string, $delimiter ) = @_;

    $delimiter //= " ";
    my @result;
    my $temp = "";

    for my $i (0..(length $string)) {
        my $char = substr $string, $i, 1;
        if (($char eq $delimiter) or $i == (length $string)) {
            push @result, $temp;
            $temp = "";
        } else {
            $temp .= $char;
        }
    }
    return @result;
}

Answered by Håkon Hægland on February 7, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP