File Handling In Perl Scripts

File Handling

by pixelatedcyberdust 4-19-04

Sooner or later you'll want to read from files, write to them, set their permissions, search them, and more. Working with files makes data storage and retrieval a simple yet powerful task, especially for projects that don't require a full-blown database.

If you have been reading the tutorials from the beginning, you'll have come across at least one occasion where we opened a file for reading or writing. That is the starting point for this tutorial, which broadens the topic to cover the main aspects of file handling.

Opening a file for reading

One of the most powerful and fundamental tools in Perl is the ability to open a file and read the contents into the script for processing.

open (FILEHANDLE, "file.txt");

When you open a file, its contents are not copied into the script's memory. The filehandle is a named connection between your script and the file: every read through the filehandle fetches the next piece of data from the file, and every write through it sends data to the file. Once the file is open, you refer to the filehandle rather than the file name for all further reading or writing.

FILEHANDLE can be nearly any name you want. Most programmers type filehandle names in all caps to prevent confusion with other identifiers, and the name should start with a letter.

open (DATA, "file.txt");

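This is the same as the above; the only difference is that file.txt is now associated with the filehandle DATA. (Perl also uses the name DATA for its built-in handle that reads the __DATA__ section of a script, so a different name is safer in real code.)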

open (DATA, "file.txt") or die "an error occurred: $!";

By default, if you attempt to open a file that cannot be opened (the permissions are wrong, the file doesn't exist, etc.), you will not be informed of the problem. Your script will appear to run properly, but whatever you attempted to read or write will be missing. Adding or die makes the script stop with your message if open fails for any reason. $! is Perl's built-in error variable; after a failed open it holds the system's description of what went wrong.

If file.txt did not exist, you would get an error such as "an error occurred: No such file or directory" (the exact wording comes from the operating system, and die adds the script name and line number), which is very helpful in figuring out the source of the problem. You always want to include this sanity check when opening a file.
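Here is a small sketch of that check in action. It tries to open a file that is assumed not to exist (the name no_such_file.txt is made up for the example) and reports the failure instead of dying, so you can see what $! contains. It uses the three-argument form of open, which Perl also accepts and which passes the mode separately from the file name.

```perl
use strict;
use warnings;

# Hypothetical file name, assumed not to exist in the current directory.
my $file = "no_such_file.txt";

# Three-argument open: handle, mode, file name.
my $opened = open(FILEHANDLE, "<", $file);

# Copy $! right away, because later system calls may overwrite it.
my $error = $opened ? "" : "$!";

if ($opened) {
    print "opened $file\n";
    close(FILEHANDLE);
}
else {
    print "could not open $file: $error\n";
}
```

In a real script you would normally keep the or die form shown above; printing the error yourself is only useful when the script can carry on without the file.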

To print the entire contents of your file, you can use a while loop on the filehandle, which reads one line at a time until it reaches the end of the file (EOF). Each line is placed in $_ with its newline still attached, so there is no need to add another one when printing.

while(<FILEHANDLE>)
{
print $_;
}
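The loop above can be tried out with a sketch like the following. It creates a throwaway file with File::Temp (a core module, used here only so the example runs on its own), reads it back line by line, and uses chomp to strip the newline from each line before storing it. The variable names are just for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file so the example is self-contained.
my ($tmp, $file) = tempfile(UNLINK => 1);
print $tmp "first line\nsecond line\nthird line\n";
close($tmp) or die "cannot close $file: $!";

open(FILEHANDLE, "<", $file) or die "cannot open $file for reading: $!";

my @lines;
while (my $line = <FILEHANDLE>) {
    chomp $line;              # strip the trailing newline
    push @lines, $line;
    print "$.: $line\n";      # $. holds the current line number
}

close(FILEHANDLE) or die "cannot close $file: $!";
```

Naming the loop variable ($line) instead of relying on $_ is a matter of taste; it tends to be easier to follow once the loop body grows.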

File Attributes

This is the chart of the different modes you may use while opening a file.

MODE	SETTINGS
< file	READ
> file	WRITE, CREATE, TRUNCATE
>> file	WRITE, APPEND, CREATE
+< file	READ, WRITE
+> file	READ, WRITE, TRUNCATE, CREATE
+>> file	READ, WRITE, APPEND, CREATE
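To see the difference between truncating and appending from the chart, here is a minimal sketch. It writes to a temporary file (created with File::Temp purely for the example) twice with ">" and once with ">>", then reads the result back.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway file for the demonstration.
my ($tmp, $file) = tempfile(UNLINK => 1);
close($tmp) or die "cannot close $file: $!";

# ">" creates the file, or truncates it if it already exists.
open(FILEHANDLE, ">", $file) or die "cannot open $file for writing: $!";
print FILEHANDLE "one\n";
close(FILEHANDLE) or die "cannot close $file: $!";

# Opening with ">" again throws away the old contents.
open(FILEHANDLE, ">", $file) or die "cannot open $file for writing: $!";
print FILEHANDLE "two\n";
close(FILEHANDLE) or die "cannot close $file: $!";

# ">>" keeps what is there and adds to the end.
open(FILEHANDLE, ">>", $file) or die "cannot open $file for appending: $!";
print FILEHANDLE "three\n";
close(FILEHANDLE) or die "cannot close $file: $!";

# "<" reads the result back.
open(FILEHANDLE, "<", $file) or die "cannot open $file for reading: $!";
my @lines = <FILEHANDLE>;
close(FILEHANDLE) or die "cannot close $file: $!";

print @lines;   # "two" and "three"; "one" was lost to the second ">"
```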

Closing a file

Perl closes any filehandles that are still open when your script ends, but it's safer to close each one as soon as you're done with it. Closing flushes any data still waiting to be written and prevents accidental changes to the file later on. If you leave several filehandles open, it's easy to write to the wrong one by mistake.

close(FILEHANDLE);

You close the file by closing the filehandle you named when you originally opened it. A common practice is to also catch errors if the filehandle fails to close for some unforeseen reason. This is done using the same method as above.

close(FILEHANDLE) or die "an error occurred while trying to close the file: $!";

Opening a file for writing

We know how to read the contents of a file; now it's time to learn how we can store our information in one. The syntax for opening a file for writing is virtually the same as for reading; the only difference is the mode.

open(FILEHANDLE, ">file.txt") or die "cannot open file for writing: $!";

The mode is the character prepended to the file name, in this case it's ">". This tells Perl we want to open the file for writing, not for reading.

Writing to a filehandle is much like printing text to your screen or web site. We do this using the print function, but when writing to a file we need to name the filehandle right after print (with no comma after it).

open(FILEHANDLE, ">file.txt") or die "cannot open file for writing: $!";
print FILEHANDLE "I love goldfishes, they're so delicious!";

We are specifically telling it to write our sentence to the FILEHANDLE instead of printing it to screen. file.txt now contains the phrase "I love goldfishes, they're so delicious!".

You can print anything you want to a filehandle: text, variables, lists, you name it!

print FILEHANDLE "$name";
print FILEHANDLE @list;
print FILEHANDLE %hash;

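If we defined these variables, we could print each of them to the file. The array and hash are not placed in quotes: inside double quotes an array is interpolated with spaces between its elements, and a hash is not interpolated at all. Printed this way, the array's elements are written one after another with nothing between them, and the hash is written as its keys and values in no particular order.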
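Since a bare array or hash is written with nothing between the items, you will usually want to add separators yourself. The following is one possible sketch: it joins the array with commas and writes the hash as key=value lines. The sample data, the separators and the temporary file are all assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my @list = ("apple", "banana", "cherry");
my %hash = (cat => "meow", dog => "woof");

# Throwaway file for the demonstration.
my ($tmp, $file) = tempfile(UNLINK => 1);
close($tmp) or die "cannot close $file: $!";

open(FILEHANDLE, ">", $file) or die "cannot open $file for writing: $!";

# join puts a separator between the array elements.
print FILEHANDLE join(",", @list), "\n";

# Write one key=value pair per line; keys are sorted so the order is stable.
for my $key (sort keys %hash) {
    print FILEHANDLE "$key=$hash{$key}\n";
}

close(FILEHANDLE) or die "cannot close $file: $!";

# Read the file back to see what was written.
open(FILEHANDLE, "<", $file) or die "cannot open $file for reading: $!";
my @lines = <FILEHANDLE>;
close(FILEHANDLE) or die "cannot close $file: $!";

print @lines;
```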

If you print like we did above, all the content will run together on the same line with no separators between the pieces of data. This means if we had

my $cat = "meow";
my $dog = "purrr";

open(FILEHANDLE, ">file.txt") or die "cannot open file for writing: $!";
print FILEHANDLE "$cat";
print FILEHANDLE "$dog";

Our results would be on one line: "meowpurrr". Typically when writing to a file, we want to include line breaks so we know where each new piece of information begins. Tack a newline \n onto each print and it will break up the content for you.

my $cat = "meow";
my $dog = "purrr";

open(FILEHANDLE, ">file.txt") or die "cannot open file for writing: $!";
print FILEHANDLE "$cat\n";
print FILEHANDLE "$dog\n";

meow
purrr

Remember, as we did when opening the file, we should close the filehandle when we are done writing data to it.

close(FILEHANDLE);

Binmode

Writing to a file doesn't necessarily mean text or strings; you can also write binary data. When writing an image, a sound file or an executable, you need binary mode because some systems (Windows, for example) translate every newline \n into a carriage return plus line feed \r\n on output, which corrupts binary data.

To prevent some systems from automatically rewriting this, we use binmode.

binmode FILEHANDLE;

Binmode may take a second argument, DISCIPLINE (newer Perl documentation calls it a layer). DISCIPLINE sets the mode: either ":raw" for binary or ":crlf" for text with line-ending translation. If the discipline is omitted, the filehandle is set to binary (":raw").

binmode FILEHANDLE, DISCIPLINE;

binmode FILEHANDLE, ":raw"; # this is for binary files

binmode FILEHANDLE, ":crlf"; # this is for text files

Do not type the first of these three lines literally. DISCIPLINE is only a placeholder to show where the mode goes; it is not actual code for you to use.

We switch on binmode after we open the filehandle but before we read or write anything. Let's say we are opening a file and want to write a string to it in binary mode.

open(FILEHANDLE, ">text.dat") or die "an error occurred: $!";
binmode FILEHANDLE;
print FILEHANDLE "Hello there, aliens from Earth!";
close(FILEHANDLE);
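To check that binmode really leaves the bytes alone, a sketch along these lines may help. It writes the eight bytes that start every PNG file (which happen to include both \r and \n) to a temporary file, then compares the file size and the bytes read back with what was written. The temporary file and variable names are assumptions for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The 8-byte PNG signature; it contains \x0D (\r) and \x0A (\n) on purpose.
my $bytes = "\x89PNG\x0D\x0A\x1A\x0A";

# Throwaway file for the demonstration.
my ($tmp, $file) = tempfile(UNLINK => 1);
close($tmp) or die "cannot close $file: $!";

open(FILEHANDLE, ">", $file) or die "cannot open $file for writing: $!";
binmode FILEHANDLE;               # no newline translation on output
print FILEHANDLE $bytes;
close(FILEHANDLE) or die "cannot close $file: $!";

my $size = -s $file;              # size on disk, in bytes

open(FILEHANDLE, "<", $file) or die "cannot open $file for reading: $!";
binmode FILEHANDLE;               # no newline translation on input either
my $read_back = do { local $/; <FILEHANDLE> };   # read the whole file at once
close(FILEHANDLE) or die "cannot close $file: $!";

print "wrote ", length($bytes), " bytes, file holds $size bytes\n";
print "round trip ", ($read_back eq $bytes ? "matches" : "differs"), "\n";
```

On systems that do not translate newlines (Linux, for instance) the result is the same with or without binmode, so calling it costs nothing and keeps the script portable.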

Buffering

When you print something to the screen, it is first stored in a buffer. The buffered output is not actually sent until the buffer fills up, is flushed, or the script ends. In some cases we do not want to wait on the buffer and want our data to appear as soon as it is printed.

This is very useful when you have a lot of individual processing that needs to be completed and output displayed each step along the way. An example where I've used anti-buffering was when I created the Link Checker tool.

It scans complete chunks of code from the URLs it's given and then prints a message for each one. This has a large overhead, and if I kept the default buffering, the output would appear to hang until enough of it had built up to fill the buffer.

$| = 1;

$| is one of Perl's many built-in special variables. If it is set to any non-zero value (that is, any true value), output to the currently selected filehandle (STDOUT by default) is flushed after every print instead of waiting in the buffer.
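Because $| only affects the currently selected filehandle, other handles need their own setting. The sketch below shows one way to do that with the autoflush method from the core IO::Handle module. The log file here is a temporary file invented for the example; it is not how the Link Checker tool itself is written.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

# STDOUT is the selected handle by default, so this unbuffers STDOUT.
$| = 1;

# A throwaway log file, used only for this demonstration.
my ($log, $logfile) = tempfile(UNLINK => 1);

# autoflush(1) does for this one handle what $| = 1 does for the selected one.
$log->autoflush(1);

print $log "checked link 1\n";

# The line is already in the file, even though the handle is still open.
my $size_before_close = -s $logfile;
print "log file already holds $size_before_close bytes\n";

close($log) or die "cannot close $logfile: $!";
```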

Retrieving by characters

We know how to read and write files; are there any other neat tricks for us to use? There certainly are! One of them is getc. Instead of reading a whole line of the file at once, we can retrieve the file one character at a time.

We do this using getc, which is short for Get Character.

getc FILEHANDLE;

To retrieve just the very first character in a filehandle, we would do something like this..

open (FILEHANDLE, "< test.txt") or die "oops: $!";
   my $char = getc FILEHANDLE;
   print $char;
close(FILEHANDLE);

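If we wanted to read the first X characters of a file, we could use a for loop. The file is opened once, before the loop, so that each getc picks up where the last one stopped. The following example reads the first 20 characters of your filehandle and prints them to the screen.

open (FILEHANDLE, "< test.txt") or die "oops: $!";
for (1 .. 20)
{
   my $char = getc FILEHANDLE;
   last unless defined $char; # stop early if the file is shorter than 20 characters
   print $char;
}
close(FILEHANDLE);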

You can also print the entire file until EOF using getc. This entirely defeats the purpose of the function, but here is an example of how you could do it, not that you should! getc returns undef at the end of the file, which ends the loop.

open (FILEHANDLE, "< test.txt") or die "oops: $!";
while (defined(my $char = getc(FILEHANDLE)))
{
   print $char;
}
close(FILEHANDLE);
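When you want a fixed number of characters rather than one at a time, Perl's built-in read function can fetch them in a single call. This is an alternative to the getc loops above, not a replacement for them; the sample text and temporary file are assumptions for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway file with some sample text.
my ($tmp, $file) = tempfile(UNLINK => 1);
print $tmp "The quick brown fox jumps over the lazy dog\n";
close($tmp) or die "cannot close $file: $!";

open(FILEHANDLE, "<", $file) or die "cannot open $file for reading: $!";

# read(HANDLE, BUFFER, LENGTH) fills $buffer and returns how many
# characters it actually got (fewer than LENGTH near the end of the file).
my $buffer;
my $count = read(FILEHANDLE, $buffer, 20);
defined $count or die "read failed: $!";

close(FILEHANDLE) or die "cannot close $file: $!";

print "read $count characters: '$buffer'\n";
```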

Seek

Let's say we have a 30MB text file (okay, if you have a 30 MB text file my hat is off to you..that's very impressive and at the same time, very illogical) and there is only a portion of it we want to read. It would be a waste of resources and load time if we loaded the entire file into memory if all we wanted was a few lines.

seek gives us the power to begin reading a file wherever we want, instead of always starting from the beginning. If reading through a large file just to reach the part you need creates a lot of overhead, seek may be the solution you are looking for.

seek FILEHANDLE, POSITION, OPTION;

OPTION controls how POSITION is interpreted. The possible values are 0 (POSITION is counted from the start of the file), 1 (POSITION is added to the current position) and 2 (POSITION is counted from the end of the file, so it is usually zero or negative).

POSITION is the location in the file, in bytes, where you want the next read to begin. If we wanted to skip the first 10 bytes and read from there until the end of the file, we would use..

open (FILEHANDLE, "< test.txt") or die "oops: $!";
seek FILEHANDLE, 10, 0;
while(<FILEHANDLE>)
{
print;
}
close(FILEHANDLE);
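The three OPTION values can be tried side by side with a sketch like this one. It uses the named constants SEEK_SET, SEEK_CUR and SEEK_END from the core Fcntl module, which stand for 0, 1 and 2. The 20-byte sample file is made up so the results are easy to follow.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);            # SEEK_SET (0), SEEK_CUR (1), SEEK_END (2)
use File::Temp qw(tempfile);

# Throwaway file holding exactly 20 bytes: offsets 0-9 are digits, 10-19 letters.
my ($tmp, $file) = tempfile(UNLINK => 1);
binmode $tmp;
print $tmp "0123456789abcdefghij";
close($tmp) or die "cannot close $file: $!";

open(FILEHANDLE, "<", $file) or die "cannot open $file for reading: $!";
binmode FILEHANDLE;

my ($from_start, $after_skip, $tail);

# Option 0: go to offset 10, counted from the start of the file.
seek(FILEHANDLE, 10, SEEK_SET) or die "seek failed: $!";
read(FILEHANDLE, $from_start, 3);

# Option 1: move 2 bytes forward from wherever we are now.
seek(FILEHANDLE, 2, SEEK_CUR) or die "seek failed: $!";
read(FILEHANDLE, $after_skip, 3);

# Option 2: go to 4 bytes before the end of the file.
seek(FILEHANDLE, -4, SEEK_END) or die "seek failed: $!";
read(FILEHANDLE, $tail, 4);

close(FILEHANDLE) or die "cannot close $file: $!";

print "$from_start $after_skip $tail\n";   # prints "abc fgh ghij"
```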

Get current location in file

If we want to know our current location in the filehandle, we use tell. It returns the current position, in bytes, of the filehandle you specify; if no filehandle is specified, the filehandle that was last read is used.

tell FILEHANDLE;

Using the last example from seek, we moved 10 bytes into the file. Just to be sure, we will use tell to report where we actually are within the filehandle.

open (FILEHANDLE, "< test.txt") or die "oops: $!";
seek FILEHANDLE, 10, 0;
print tell FILEHANDLE;
close(FILEHANDLE);

10
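tell also moves along as you read, not only when you seek. Here is a short sketch that records the position right after opening, after reading one line, and after a seek. The two-line sample file is an assumption for the example, and binmode is used so the byte counts are the same on every system.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway file: "hello\n" is 6 bytes, "world\n" is another 6.
my ($tmp, $file) = tempfile(UNLINK => 1);
binmode $tmp;
print $tmp "hello\nworld\n";
close($tmp) or die "cannot close $file: $!";

open(FILEHANDLE, "<", $file) or die "cannot open $file for reading: $!";
binmode FILEHANDLE;

my $start = tell(FILEHANDLE);          # 0: nothing read yet

my $line = <FILEHANDLE>;               # reads "hello\n"
my $after_first = tell(FILEHANDLE);    # 6: just past the first line

seek(FILEHANDLE, 10, 0) or die "seek failed: $!";
my $after_seek = tell(FILEHANDLE);     # 10: where seek put us

close(FILEHANDLE) or die "cannot close $file: $!";

print "$start $after_first $after_seek\n";   # prints "0 6 10"
```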

File Statistics

There is so much more information you can find out about a particular file. You can check the size of the file, the time it was last accessed, the owner of the file, etc. This is extremely helpful if you are a system administrator and you need to watch how files are being used or if you build a file management system.

An example from this site that uses a few file statistics is the File Upload Pro which determines the file size in bytes and then recalculates it into larger units of measure if possible.

Chart:
$dev      - the device number of the file system
$ino      - inode number
$mode     - file mode (type and permissions)
$nlink    - number of (hard) links to the file
$uid      - the user ID of the file's owner
$gid      - the group ID of the file's owner
$rdev     - the device identifier (special files only)
$size     - file size in bytes
$atime    - last access time
$mtime    - last modification time
$ctime    - last inode change time (permissions, owner, etc.)
$blksize  - preferred block size for file system I/O
$blocks   - number of blocks allocated to the file

stat();

For our first example, we will determine the size of a given file. We are setting our variable $file (which could be any other name) to a filename then setting up all possible file stats.

my $file = "test.txt";

my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size, $atime, $mtime, $ctime, $blksize, $blocks) = stat($file);

print $size;

         108664

All of the possible file statistics are already inside of these variables. You can pick and choose which of these you are interested in and manipulate/print the information.
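If you only need one or two of these values, you don't have to name all thirteen variables. A possible shortcut is to index into the list that stat returns: the size is element 7 and the modification time is element 9, counting from 0 in the order of the chart. The sketch below does that on a temporary file created for the example, and also shows the -s file test, which returns the size directly.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway file holding exactly 11 bytes.
my ($tmp, $file) = tempfile(UNLINK => 1);
binmode $tmp;
print $tmp "hello world";
close($tmp) or die "cannot close $file: $!";

# Take single elements out of the list returned by stat.
my $size  = (stat($file))[7];    # $size in the chart
my $mtime = (stat($file))[9];    # $mtime in the chart, in seconds since the epoch

# The -s file test is a shorter way to get the size.
my $size_again = -s $file;

print "size: $size bytes\n";
print "modified: ", scalar localtime($mtime), "\n";
print "-s says: $size_again bytes\n";
```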

Another example: if you wanted to see the ID of the file's owner, you'd use $uid.

my $file = "test.txt";

my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size, $atime, $mtime, $ctime, $blksize, $blocks) = stat($file);

print $uid;

        0

Feel free to experiment with these other settings to get a feel for what each of them has to say about the file (or files) you're working with.
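Earlier we mentioned recalculating a size in bytes into larger units. One way that might be done is sketched below. The format_size helper is hypothetical, written for this tutorial only (it is not the code File Upload Pro uses), and it assumes 1024 bytes per KB.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a byte count into a short readable string.
sub format_size {
    my ($bytes) = @_;
    my @units = ("bytes", "KB", "MB", "GB");
    my $index = 0;

    # Divide by 1024 until the number is small enough or we run out of units.
    while ($bytes >= 1024 && $index < $#units) {
        $bytes /= 1024;
        $index++;
    }

    # Whole bytes are printed as-is; larger units get one decimal place.
    return $index == 0
        ? "$bytes $units[0]"
        : sprintf("%.1f %s", $bytes, $units[$index]);
}

print format_size(108664), "\n";    # prints "106.1 KB"
print format_size(512), "\n";       # prints "512 bytes"
```

You could feed it the $size value from stat, for example format_size($size).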

Challenges

1) What does the special variable $! do and what is an example of how we would use it?

A possible solution is given between the lines below:
------------------------------------------------------------------------
$! is a built-in variable that holds the most recent system error. If an operation such as open fails and you print $!, you get a message that helps you debug the program.

An example:
open(FILE, "file.txt") or die "Oops, we had an error: $!";
------------------------------------------------------------------------

2) What is a filehandle and what is the purpose behind them?
------------------------------------------------------------------------
A filehandle is a named connection between your script and a file. It is not a copy of the file; it keeps track of the open file and your current position in it. Its purpose is to give you one name through which all reads and writes go, so that you work with the filehandle rather than repeating the file name. Anything you write through the filehandle does change the file itself.
------------------------------------------------------------------------

3) We want to write to an image file, if we write as usual to the file we will get unexpected results. What must we remember to do?
------------------------------------------------------------------------
When writing images, sounds, flash, etc., you need to remember to switch binmode on to tell Perl to write in binary. Failure to do this can corrupt the file on systems that translate newlines, leaving it unreadable by the programs that expect it.
------------------------------------------------------------------------