Perl - Splitting a string

Question

I've decided to work on my Perl skills. I've written a small subroutine that splits a string based on an optional delimiter. I'd like any feedback on this program so I can kick bad habits to the curb. Written in Perl 5.

sub split_string {
    my $string = @_[0];
    my $delimiter = @_[1] ? @_[1] : " ";
    my @result = ();
    my $temp = "";

for $i (0..length($string)) {
        my $char = substr($string, $i, 1);
        if (($char eq $delimiter) or $i == length($string)) {
            push(@result, $temp);
            $temp = "";
        } else {
            $temp .= $char;
        }
    }
    return @result;
}

And this is how I test this subroutine.

@test = split_string("This is a test to ensure this works correctly.");
foreach $element (@test) {
    print $element . "n";
}

daotoad · Answer

All the main code comments you've had so far are solid.

I thought I would address your approach to testing.

If you write your code as a module that can be loaded with "use" it's very easy to use Perl's extensive testing toolset.

You can use the classical Test::Simple and Test::More modules that are built in with many versions of Perl. But if you are comfortable with CPAN module installation (it's worth learning, if you aren't) you can install the newer Test2 suite which makes writing tests even easier.

Check out Test2::V0 which is a nice big bundle of testing functions.

Also see the introduction to the tools.

In short write your code like:

package MySplit;

use Exporter qw<import>;
our @EXPORT_OK = qw( split_string );

sub split_string {
    # do stuff
}

1;

Then write tests like:

#!/bin/env perl
use strict;
use warnings;
use Test::V0;

use MySplit qw< split_string >;

is  split_string('1,2,4'),
    array {
        item 1;
        item 2;
        item 4;
        end();
    }, 
    "Basic split works";

done_testing();

The above sample doesn't really dig into the expressiveness and power of the comparator constructors. You can see more in the docs.

Ronen Moldovan · Answer

I agree with the previous answer, especially the remarks about using the strict and warnings pragmas. I fixed so many Perl bugs which could be seen easily using these pragmas.
First of all you should know that Perl's split command uses regular expression as the delimiter, would you like to dare and write a regular expression based split_string?
Secondly, to make it look more as the Perl's split you could use prototypes (which will also check for correct parameter passing):
sub split_string ($;$);

Than you can call the function as following (note that there is no need for parenthesis):
my $test = split_string "This is a test to ensure this works correctly.";

I like to use prototypes when I write basic functions.

Håkon Hægland · Answer

- Use the strict and warnings pragmas

This helps catch many errors at an early stage.

- Declare lexical variables with my instead of using package variables

If you define variables without having declared them they will be defined as package variables (which are seen by all code in your package). Note that if you use the strict pragma you need to declare package variables with our.

- use say instead of print

Since perl version 5.10 you can use say to print a line and add the line terminator (newline character) automatically. Just remember to enable the the feature with use feature qw(say).

- Unpack arguments to a function/method from the @_ array for clarity

Prefer my ($str, $delim) = @_ over my $str = $_[0]; my $delim = $_[1]

- Use $array[$N] to refer to the ($N+1)th element of @array.

In you code you used @_[1] to refer to the second element of the @_ array. The correct syntax is to use $_[1].

- Do not use parenthesis around argument for builtin functions if not necessary.

In Perl parenthesis around function arguments is optional. A common style is to avoid parenthesis around builtin function calls. This reduces visual clutter and disambiguates built-in functions from user functions, see also What is the reason to use parenthesis-less subroutine calls in Perl?

- Don't declare empty arrays with empty parenthesis. Simply use my @arr;

- Return a reference to an array and not an array value.

By returning a reference you avoid copying, but see also In perl, when assigning a subroutine's return value to a variable, is the data duplicated in memory?

- Don't reinvent the wheel, use the Perl builtin function split

You tagged your question with [reinventing-the-wheel] so I assume this is for learning purposes only.

Here is a revised version of your code that implements the above comments:

use feature qw(say);
use strict;
use warnings;

{ # <-- create a scope so lexical variable does not "leak" into the subs below

my $test = split_string("This is a test to ensure this works correctly.");
    foreach my $element (@$test) {
        say $element;
    }
}

sub split_string {
    my ( $string, $delimiter ) = @_;

$delimiter //= " ";
    my @result;
    my $temp = "";

for my $i (0..(length $string)) {
        my $char = substr $string, $i, 1;
        if (($char eq $delimiter) or $i == (length $string)) {
            push @result, $temp;
            $temp = "";
        } else {
            $temp .= $char;
        }
    }
    return @result;
}

Perl - Splitting a string

3 Answers

- Use the `strict` and `warnings` pragmas

- Declare lexical variables with `my` instead of using package variables

- use `say` instead of `print`

- Unpack arguments to a function/method from the `@_` array for clarity

- Use `$array[$N]` to refer to the (`$N+1`)th element of `@array`.

- Do not use parenthesis around argument for builtin functions if not necessary.

- Don't declare empty arrays with empty parenthesis. Simply use `my @arr`;

- Return a reference to an array and not an array value.

- Don't reinvent the wheel, use the Perl builtin function `split`

Add your own answers!

Ask a Question