
Taking n lines from a file which contains m lines, repeating the file lazily if necessary

Code Review, asked on October 27, 2021

The desire for such a function came out of the need to generate a "source" of Lorem ipsum text by repeating the same 50 or so paragraphs over and over.

Actually, I am curious to know how to do this in C++ (I guess boost::hana has something to offer in this respect), and I have already asked a question about it on StackOverflow.

However, to get a better understanding of the functional approach, here’s my working solution in Haskell:

import System.IO (readFile)
main :: IO ()
main = (\x -> readFile "file" >>= putStrLn . unlines . take x . concat . repeat . lines)
       =<< return . read
       =<< getLine

where

  • I’m not sure the return . read part is the best choice where it is (one reason being that it doesn’t fail gracefully if the program is fed non-numeric input);
  • I haven’t written the lambda in fully point-free style because I think it would be less readable; and
  • most importantly, I’m not sure this works lazily in all its parts, and I’d really like to understand this bit.

The program above assumes the current directory contains a text file named file, which can contain something like this:

line1
line2
line3

Given this, compiling and running the program gives, for instance:

ghc --make -dynamic program.hs && ./program <<< 7
line1
line2
line3
line1
line2
line3
line1

Any suggestions?

One Answer

  • Even though module Main (main) where is implied when no header is given, I'd write it explicitly.
  • return . read =<< getLine = read <$> getLine = fmap read getLine is a worse version of readLn. When the parse fails, read <$> getLine is basically return (error msg): it passes the bottoming value into the "pure" part of your code. readLn crashes on a bad string at a well-defined time, upon its own execution, and if you actually wanted to handle the error you'd use Text.Read.readMaybe or readEither.
  • concat . repeat is cycle. Gets your point across better, doesn't it?
  • putStrLn . unlines is a bug: it causes an extra newline in the output, since unlines already ends the last line with a newline and putStrLn adds another. Use putStr.
  • I don't hate the "expression" style here, but once we replace =<< return . read =<< getLine with =<< readLn, you may as well bring it to the front and use >>=, and at that point let's just use a do block.
  • x? Really?
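To see when a bad parse actually fails under each approach, here is a small self-contained sketch. It assumes a hypothetical fakeGetLine that always returns "seven", standing in for real terminal input so the example runs unattended. It also checks the trailing newline that unlines adds, which is what makes putStrLn . unlines print one line too many.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Text.Read (readEither, readMaybe)

-- Hypothetical stand-in for getLine, so no interactive input is needed.
fakeGetLine :: IO String
fakeGetLine = return "seven"

main :: IO ()
main = do
  -- read <$> getLine "succeeds": the failed parse is only an unevaluated thunk.
  n <- read <$> fakeGetLine :: IO Int
  putStrLn "read <$> getLine returned normally"
  -- The error only surfaces wherever the value is first forced.
  r <- try (evaluate n)
  case r of
    Left err -> putStrLn ("forcing it later fails: " ++ show (err :: ErrorCall))
    Right v -> print v
  -- readMaybe / readEither turn failure into an ordinary value to handle.
  print (readMaybe "seven" :: Maybe Int)
  print (readEither "7" :: Either String Int)
  -- unlines already terminates the last line, so putStrLn would add a second newline.
  print (unlines ["line1", "line2"])
```

readLn, by contrast, would throw inside the IO action itself, so the failure happens at the line that reads the input rather than somewhere downstream.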

So:

module Main (main) where -- not my style of spacing; but fine
import System.IO (readFile)

main = do
    wanted <- readLn
    contents <- readFile "file"
    putStr $ unlines $ take wanted $ cycle $ lines contents

Though I'd also take

main = do
    wanted <- readLn
    putStr . unlines . take wanted . cycle . lines =<< readFile "file"

Or, if you're attached to the expression style:

main = readLn >>= (\wanted -> putStr . unlines . take wanted . cycle . lines =<< readFile "file")

As to your concern about laziness: this is perfectly fine. readFile is lazy, and so is lines. cycle is actually better than concat . repeat: the latter keeps allocating (:)-cells for as long as take consumes them, while cycle copies the input list once and then links its end back to the beginning of the output list, which should make it faster. Also, cycle errors on an empty input instead of potentially looping forever. take is of course lazy, and we've eliminated the excess laziness that would cause any parse errors to be triggered inside it. Passing an infinite input into this program (I'm on a *nix, so I used yes(1) and mkfifo(1)) doesn't break it, which is a good litmus test for laziness.
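A rough way to check the same laziness claims without mkfifo is to feed the pipeline an infinite String built inside the program. This is only a sketch: infiniteInput and firstFive are made-up names, and the empty-input case catches the ErrorCall that cycle [] raises in GHC's base library.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- An "infinite file": line1, line2, line3, ... forever.
infiniteInput :: String
infiniteInput = concatMap (\n -> "line" ++ show n ++ "\n") [1 :: Int ..]

-- Same shape as the answer's pipeline; only five lines are ever demanded,
-- so this terminates even though the input never ends.
firstFive :: [String]
firstFive = take 5 (cycle (lines infiniteInput))

main :: IO ()
main = do
  mapM_ putStrLn firstFive
  -- take 1 (concat (repeat [])) would spin forever; cycle [] fails fast instead.
  r <- try (evaluate (take 1 (cycle ([] :: [String])))) :: IO (Either ErrorCall [String])
  putStrLn (either (const "cycle [] raised an error") show r)
```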

Now, as to the functionality of this program: the wanted number should really be an argument, and so should the file (which should be optional and default to stdin). That's done with getArgs. This is complicated enough to move into new definitions. Let's get that readMaybe going, too. (Hoogle the new names to find the imports.)

dieWithUsage :: IO a
dieWithUsage = do
    name <- getProgName
    die $ "Usage: " ++ name ++ " lines [file]"

parseArgs :: [String] -> IO (Int, Handle)
parseArgs [] = dieWithUsage
parseArgs (wanted : rest) = (,) <$> getWanted <*> getHandle
    where getWanted = maybe dieWithUsage return $ readMaybe wanted
          getHandle = case rest of
              [] -> return stdin
              [path] -> openFile path ReadMode
              _ -> dieWithUsage

leaving

main = do
    (wanted, input) <- parseArgs =<< getArgs
    putStr . unlines . take wanted . cycle . lines =<< hGetContents input
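For reference, here is one way the whole program might look assembled in one file, with the imports filled in (all from base). The repeatLines helper is a name introduced here, not part of the answer above; it just pulls the pure part out so it can be tested on its own. Like the answer's version, it fails with cycle's empty-list error on an empty input whenever a positive number of lines is requested.

```haskell
module Main (main) where

import System.Environment (getArgs, getProgName)
import System.Exit (die)
import System.IO (Handle, IOMode (ReadMode), hGetContents, openFile, stdin)
import Text.Read (readMaybe)

-- Hypothetical helper: the first `wanted` lines of the input, cycling as needed.
repeatLines :: Int -> String -> String
repeatLines wanted = unlines . take wanted . cycle . lines

dieWithUsage :: IO a
dieWithUsage = do
    name <- getProgName
    die $ "Usage: " ++ name ++ " lines [file]"

-- First argument: number of lines; optional second argument: input file (default stdin).
parseArgs :: [String] -> IO (Int, Handle)
parseArgs [] = dieWithUsage
parseArgs (wanted : rest) = (,) <$> getWanted <*> getHandle
  where
    getWanted = maybe dieWithUsage return $ readMaybe wanted
    getHandle = case rest of
        [] -> return stdin
        [path] -> openFile path ReadMode
        _ -> dieWithUsage

main :: IO ()
main = do
    (wanted, input) <- parseArgs =<< getArgs
    putStr . repeatLines wanted =<< hGetContents input
```

Compiled as program, this would be run as ./program 7 file, or ./program 7 < file to read from stdin.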

Answered by HTNW on October 27, 2021
