File Handling
by pixelatedcyberdust 4-19-04
Sooner or later you'll want
to read, write, search, or set permissions on files. Working with files
makes data storage and retrieval simple yet powerful, especially for
projects that don't require full-blown databases.
If you have been reading the
tutorials from the beginning, you'll have come across at least one occasion
of us opening a file for reading or writing. That is part of the
outline for this tutorial, but it will be broadened to cover the other
aspects of file handling.
Opening a file for reading
One of the most
powerful and fundamental tools in Perl is the ability to open a file and
read the contents into the script for processing.
open (FILEHANDLE,
"file.txt");
When reading or
writing a file, it is important to know that opening it does not load
the file into the script's memory. The filehandle is a connection
between your script and the file. Once the file is open, you work
through the filehandle instead of the file name: you read lines from it
or print data to it, depending on the mode you opened it with. Data
reaches your script only as you read it, and data you print reaches
the file through the filehandle.
FILEHANDLE can
be nearly any name you want. Most programmers type these in
all caps to prevent confusion, and the name should start with a letter.
open (DATA,
"file.txt");
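This is the same
as the above; the only difference is that the filehandle is now named
DATA. (Perl also uses the name DATA for its built-in handle to a
script's __DATA__ section, so a different name is safer in real scripts.)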
open (DATA,
"file.txt") or die "an error occurred: $!";
By default, if
you attempt to open a file that cannot be opened (if the permissions were
set improperly, the file doesn't exist, etc.), you will not be informed
of this problem. Your script will appear to be running properly,
but whatever you attempted to read or write will give undesirable
results. or die forces the script to stop with a message if for any
reason the file you are trying to access cannot be opened.
$! is Perl's built-in error variable; it holds the message for the
most recent system error, in this case the reason open failed.
If file.txt did
not exist, you would get an error something like "an error occurred: No
such file or directory", which is very helpful in figuring out
the source of the problem. You always want to include this sanity
check when opening a file.
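When stopping the whole script is too drastic, the return value of open can be tested directly instead of using or die. The following is a minimal sketch, not part of the original tutorial: it assumes the three-argument form of open with a lexical filehandle ($fh), and the file name no_such_file.txt is made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A path that cannot exist: a made-up name inside a fresh, empty directory.
my $dir     = tempdir(CLEANUP => 1);
my $missing = "$dir/no_such_file.txt";

my $error = '';
if (open(my $fh, '<', $missing)) {
    print "opened $missing\n";
    close($fh) or die "cannot close $missing: $!";
} else {
    # Copy $! right away; a later system call may overwrite it.
    $error = "$!";
    print "could not open $missing: $error\n";
}
```

The script keeps running after the failed open, so it can fall back to a default or ask for another file name.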
To print the entire
contents of your file, you could use a while loop on
the filehandle, which reads and prints one line at a time
until it reaches the end of the file (EOF). The loop stops at EOF
because the read then returns a non-true value. Each line still ends
with its own newline, so print does not need to add one.
while(<FILEHANDLE>)
{
print $_;
}
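Putting the pieces together, here is a small self-contained sketch of opening, reading line by line, and closing. It is only an illustration: it creates its own scratch file with File::Temp so it can run anywhere, and it uses the three-argument open with a lexical filehandle, which is a style assumption rather than the syntax used in this tutorial.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file so the example does not depend on file.txt.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "first line\nsecond line\nthird line\n";
close($tmp) or die "cannot close $path: $!";

# Open for reading and stop with the system error if that fails.
open(my $fh, '<', $path) or die "cannot open $path: $!";

my $count = 0;
while (my $line = <$fh>) {
    chomp $line;               # strip the trailing newline
    $count++;
    print "$count: $line\n";   # add the newline back after the line number
}

close($fh) or die "cannot close $path: $!";
```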
File Attributes
This is
the chart of the different modes you may use while opening a file.
| MODE | FILE SETTINGS |
| < file | READ |
| > file | WRITE, CREATE, TRUNCATE |
| >> file | WRITE, APPEND, CREATE |
| +< file | READ, WRITE |
| +> file | READ, WRITE, TRUNCATE, CREATE |
| +>> file | READ, WRITE, APPEND, CREATE |
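The difference between TRUNCATE and APPEND is easiest to see by writing to the same file several times. This sketch is an illustration only; the helper write_with_mode is a hypothetical name invented here, and the three-argument open is a style assumption.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp) or die "cannot close $path: $!";

# Hypothetical helper: open $path with the given mode, write $text, close.
sub write_with_mode {
    my ($mode, $path, $text) = @_;
    open(my $fh, $mode, $path) or die "cannot open $path with mode $mode: $!";
    print $fh $text;
    close($fh) or die "cannot close $path: $!";
}

write_with_mode('>',  $path, "one\n");     # creates the content
write_with_mode('>',  $path, "two\n");     # truncates: "one" is gone
write_with_mode('>>', $path, "three\n");   # appends after "two"

open(my $in, '<', $path) or die "cannot open $path: $!";
my @lines = <$in>;
close($in) or die "cannot close $path: $!";

print @lines;   # prints "two" and "three", each on its own line
```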
Closing a file
All opened files
are closed automatically at the end of your script, but
it's safer to close the filehandles yourself when you're done using them.
Closing a filehandle makes sure everything you printed to it has been
written out and prevents any accidental file changes later on. If you
open three files and don't close any of them, it is easy to print to
the wrong filehandle and overwrite contents by mistake.
close(FILEHANDLE);
You close the
file by closing the filehandle you named when you originally
opened it. A common practice is to also catch errors if the filehandle
fails to close for some unforeseen reason. This is done using
the same method as above.
close(FILEHANDLE)
or die "an error occurred while trying to close the file: $!";
Opening
a file for writing:
We know how to
read the contents of a file; now it's time to learn how we can store our
information in one. The syntax for opening a file for reading and for
writing is virtually the same; the only difference is the mode.
open(FILEHANDLE,
">file.txt") or die "cannot open file for writing: $!";
The mode is the
character prepended to the file name; in this case it's ">".
This tells Perl we want to open the file for writing, not for reading.
Writing to a filehandle
is much like printing text to your screen or web site. We do this
using the print function, but when writing to a filehandle we
need to name the filehandle we want to print to.
open(FILEHANDLE,
">file.txt") or die "cannot open file for writing: $!";
print FILEHANDLE "I love goldfishes, they're so delicious!";
We are specifically
telling it to write our sentence to the FILEHANDLE instead of printing
it to screen. file.txt now contains the phrase "I love
goldfishes, they're so delicious!".
You can print
anything you want to a filehandle: text, variables, lists, you name it!
print FILEHANDLE
"$name";
print FILEHANDLE @list;
print FILEHANDLE %hash;
If we defined
these variables, we could print them each to file. The array and
hash are left outside quotes on purpose: inside double quotes an array's
elements are joined with spaces, and a hash is not interpolated at all.
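How a list is handed to print changes what ends up in the file. A short sketch under the usual defaults (no output separator set), using a scratch file and a lexical filehandle as illustrative assumptions:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp) or die "cannot close $path: $!";

my @list = ('red', 'green', 'blue');

open(my $out, '>', $path) or die "cannot open $path for writing: $!";
print $out @list, "\n";              # elements run together: redgreenblue
print $out "@list\n";                # in quotes, joined with spaces: red green blue
print $out join(',', @list), "\n";   # explicit separator: red,green,blue
close($out) or die "cannot close $path: $!";

# Read the file back to see what was written.
open(my $in, '<', $path) or die "cannot open $path for reading: $!";
my @lines = <$in>;
close($in) or die "cannot close $path: $!";

print @lines;
```

Using join makes the separator explicit, which is usually the easiest form to read back later.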
If you print like
we did above, all the content will run together with no separators
between the pieces of data. This
means if we had
my $cat = "meow";
my $dog = "purrr";
open(FILEHANDLE, ">file.txt") or die "cannot open file for writing: $!";
print FILEHANDLE "$cat";
print FILEHANDLE "$dog";
Our results would
be on one line: "meowpurrr". Typically when writing to
a file, we want to include line breaks so we know where each new piece of
information begins. Tack a newline \n onto the prints
and it will break the content for you.
my $cat = "meow";
my $dog = "purrr";
open(FILEHANDLE, ">file.txt") or die "cannot open file for writing: $!";
print FILEHANDLE "$cat\n";
print FILEHANDLE "$dog\n";
meow
purrr
Remember, as we
did when opening the file, we should close the filehandle when we are
done writing data to it.
close(FILEHANDLE);
Binmode
Writing to a file
doesn't necessarily mean text or strings; you can also write to files in binary
format. When writing an image, a sound file or an executable,
you'll need to write it in binary because some systems (Windows, for
example) translate every newline \n into a carriage return plus
newline, \r\n, which corrupts binary data.
To prevent these
systems from automatically rewriting your data, we use binmode.
binmode FILEHANDLE;
Binmode may take
a second argument, DISCIPLINE. DISCIPLINE is used to set
the mode: either ":raw" for binary or ":crlf" for text with line-ending
translation. If the discipline is left out, binmode defaults to binary.
binmode FILEHANDLE,
DISCIPLINE;
binmode FILEHANDLE,
":raw"; # this is for binary files
binmode FILEHANDLE,
":crlf"; # this is for text files
Please do not
use the first of these three, which literally reads DISCIPLINE.
It merely shows where you set the mode; it is not actual
code for you to use.
We switch on binmode
after we open the filehandle but before we do anything else with it. Let's say
we are opening a file to write a string to it, and we want the string
written in binary mode.
open(FILEHANDLE,
">text.dat") or die "an error occurred: $!";
binmode FILEHANDLE;
print FILEHANDLE "Hello there, aliens from Earth!";
close(FILEHANDLE);
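For data that really is binary, the bytes can be built with pack and written through a filehandle in binmode. The sketch below is an illustration only: it writes the eight-byte PNG file signature (which happens to contain a newline byte, 0x0A) to a scratch file and checks the size on disk with the -s file test.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp) or die "cannot close $path: $!";

# The 8-byte PNG signature; note the 0x0A (newline) bytes inside it.
my @signature = (0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A);
my $bytes     = pack('C*', @signature);

open(my $out, '>', $path) or die "cannot open $path for writing: $!";
binmode($out);                 # no line-ending translation on any system
print $out $bytes;
close($out) or die "cannot close $path: $!";

my $size = -s $path;           # size on disk in bytes
print "wrote $size bytes\n";   # prints "wrote 8 bytes"
```

Without binmode, a system that translates line endings would write extra bytes and the signature would no longer be valid.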
Buffering:
When printing
something, it is first stored in a buffer. The output is not
actually printed until the buffer is flushed, which normally happens
when it fills up or when the script ends.
In some cases, we do not want to rely on the buffer and we want our data
to print as it comes up, with no more waiting for the buffer to get filled.
This is very useful
when you have a lot of individual processing that needs to be completed
and output displayed each step along the way. An example where I've
used anti-buffering was when I created the Link Checker tool.
It scans complete
chunks of code from the URLs it's given and then prints a message for
each process. This has a large overhead and if I kept the buffering
at default, parts of the program would appear to hang until there was
enough output to fill the buffer.
$| = 1;
$| is one of Perl's many built-in special variables.
If it is set to any non-zero value (literally, if it is true), the
currently selected output filehandle, STDOUT by default, is flushed
after every print, which in effect turns off buffering.
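$| only affects the currently selected output filehandle, so applying it to a file takes a select before and after. The sketch below is an illustration with an assumed scratch file and lexical handle; it checks the size on disk while the file is still open to show when the data actually arrives.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp) or die "cannot close $path: $!";

open(my $log, '>', $path) or die "cannot open $path for writing: $!";

print $log "step 1;";
# Usually still 0 here: the text is sitting in the buffer, not on disk.
my $size_buffered = (-s $path) || 0;

# Select $log, switch on autoflush for it, then restore the old handle.
my $previous = select($log);
$| = 1;
select($previous);

print $log "step 2;";
# Both prints (7 bytes each) are on disk now, without closing the file.
my $size_flushed = (-s $path) || 0;

close($log) or die "cannot close $path: $!";

print "before autoflush: $size_buffered bytes, after: $size_flushed bytes\n";
```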
Retrieving by characters:
We know how to read and write
files; are there any other neat tricks for us to use? There certainly
are! One of them is getc. Instead of reading big chunks
of the file at one time (or one line of the file at once), we
can retrieve the file one character at a time.
We do this using getc,
which is short for Get Character.
getc FILEHANDLE;
To retrieve just
the very first character in a filehandle, we would do something like this..
open (FILEHANDLE,
"< test.txt") or die "oops: $!";
my $char = getc FILEHANDLE;
print $char;
close(FILEHANDLE);
If we wanted to
read the first X characters of a file, we could use a for loop.
The following example reads the first 20 characters from your filehandle
and prints them to screen. The file is opened once, before the loop;
reopening it inside the loop would return the first character every time.
open (FILEHANDLE, "< test.txt") or die "oops: $!";
for (1 .. 20)
{
my $char = getc FILEHANDLE;
last unless defined $char;
print $char;
}
close(FILEHANDLE);
You can also print
the entire file until EOF using getc. This entirely
defeats the purpose of the function, but here is an example of how you could
do it, not that you should! getc returns an undefined value at EOF,
which ends the loop.
open (FILEHANDLE,
"< test.txt") or die "oops: $!";
while(defined(my $char = getc(FILEHANDLE)))
{
print $char;
}
close(FILEHANDLE);
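Because getc returns undef once the file runs out, a loop that asks for more characters than the file holds should check for it. A minimal sketch, assuming a four-character scratch file and a lexical filehandle:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file that is shorter than the number of characters we ask for.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "Perl";
close($tmp) or die "cannot close $path: $!";

open(my $fh, '<', $path) or die "cannot open $path: $!";

my $text = '';
for (1 .. 10) {
    my $char = getc($fh);
    last unless defined $char;   # undef means we hit EOF early
    $text .= $char;
}

close($fh) or die "cannot close $path: $!";

print "read ", length($text), " characters: $text\n";   # read 4 characters: Perl
```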
Seek
Let's say we have
a 30MB text file (okay, if you have a 30 MB text file my hat is off to
you..that's very impressive and at the same time, very illogical) and
there is only a portion of it we want to read. It would be a waste
of resources and load time if we loaded the entire file into memory if
all we wanted was a few lines.
seek
gives us the power to begin reading a file wherever we want, instead
of always starting from the beginning and reading the entire file. If
reading a whole file creates a large overhead, seek
may just be the solution you are looking for.
seek FILEHANDLE,
POSITION, OPTION;
Option gives us
a little more control over position. The possible values
are 0: position is counted from the start of the file; 1: position is
added to the current position; 2: position is counted from the end of
the file (so it is normally zero or negative).
Position is the
location in the file, in bytes, where you want the next input to begin.
If we wanted to skip the first 10 bytes and read from there until the
end of file, we would use..
open (FILEHANDLE,
"< test.txt") or die "oops: $!";
seek FILEHANDLE, 10, 0;
while(<FILEHANDLE>)
{
print;
}
close(FILEHANDLE);
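The three values of the last argument can be compared side by side on a file whose contents are known. This sketch is an illustration only: it assumes a scratch file holding the 26 lowercase letters with no newline, and a lexical filehandle.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file containing "abcdefghijklmnopqrstuvwxyz" (26 bytes, no newline).
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp join('', 'a' .. 'z');
close($tmp) or die "cannot close $path: $!";

open(my $fh, '<', $path) or die "cannot open $path: $!";

# 0: count from the start of the file.
seek($fh, 10, 0) or die "seek failed: $!";
my $from_ten = <$fh>;            # "klmnopqrstuvwxyz"

# 2: count from the end of the file (negative moves backwards).
seek($fh, -3, 2) or die "seek failed: $!";
my $tail = <$fh>;                # "xyz"

# 1: count from the current position.
seek($fh, 0, 0) or die "seek failed: $!";
my $first = getc($fh);           # "a", position is now 1
seek($fh, 4, 1) or die "seek failed: $!";
my $skipped = getc($fh);         # position was 5, so this is "f"

close($fh) or die "cannot close $path: $!";

print "$from_ten $tail $first $skipped\n";
```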
Get current location in file
If we wanted to
know our current position in the filehandle, we use tell.
It returns the position, in bytes, for the filehandle you specify;
if no filehandle is specified, the filehandle last read from is used.
tell FILEHANDLE;
Using the last
example from seek, we are reading from the 10th byte. Just
to be sure, we will use tell to inform us where we actually are
within the filehandle.
open (FILEHANDLE,
"< test.txt") or die "oops: $!";
seek FILEHANDLE, 10, 0;
print tell FILEHANDLE;
close(FILEHANDLE);
10
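tell and seek work well as a pair: record where each line starts with tell, then jump straight back to any line with seek. A minimal sketch, assuming a three-line scratch file and a lexical filehandle:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "one\ntwo\nthree\n";
close($tmp) or die "cannot close $path: $!";

open(my $fh, '<', $path) or die "cannot open $path: $!";

# Remember the byte offset at which each line begins.
my @offsets;
while (1) {
    my $pos  = tell($fh);
    my $line = <$fh>;
    last unless defined $line;   # EOF: no line started at $pos
    push @offsets, $pos;
}

# Jump directly to the start of the third line and read it again.
seek($fh, $offsets[2], 0) or die "seek failed: $!";
my $third = <$fh>;
chomp $third;

close($fh) or die "cannot close $path: $!";

print "line 3 starts at byte $offsets[2]: $third\n";
```

For a large file, an index like @offsets lets you fetch single lines without rereading everything before them.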
File Statistics:
There is so much
more information you can find out about a particular file. You can
check the size of the file, the time it was last accessed, the owner of
the file, etc. This is extremely helpful if you are a system administrator
and need to watch how files are being used, or if you are building a file
management system.
An example from
this site that uses a few file statistics is the File Upload Pro which
determines the file size in bytes and then recalculates it into larger
units of measure if possible.
Chart:
$dev - the file system device number
$ino - inode number
$mode - mode of file (type and permissions)
$nlink - number of (hard) links to file
$uid - the ID of the file's owner
$gid - the group ID of the file's owner
$rdev - the device identifier (special files only)
$size - file size in bytes
$atime - last access time
$mtime - last modification time
$ctime - last inode change time (for example, a change of the mode)
$blksize - preferred block size for file system I/O
$blocks - number of blocks allocated to the file
stat();
For our first
example, we will determine the size of a given file. We are setting
our variable $file (which could be any other name) to a filename then
setting up all possible file stats.
my $file = "test.txt";
my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
$atime, $mtime, $ctime, $blksize, $blocks) = stat($file);
print $size;
108664
All of the possible file statistics are already inside
of these variables. You can pick and choose which of these you are
interested in and manipulate/print the information.
Another example: if we wanted to see the ID of the file's
owner, we'd use $uid.
my $file = "test.txt";
my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
$atime, $mtime, $ctime, $blksize, $blocks) = stat($file);
print $uid;
0
Feel free to experiment with these other settings to get
a feel for what each of them has to say about a file (or files) you're
working with.
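When only one or two statistics are needed, the list returned by stat can be sliced by position instead of naming all thirteen variables. A short sketch under assumed conditions (a 100-byte scratch file); positions 7 and 9 are $size and $mtime in the chart above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file with exactly 100 bytes in it.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp 'x' x 100;
close($tmp) or die "cannot close $path: $!";

# stat returns an empty list on failure, so the assignment is false.
my @info = stat($path) or die "cannot stat $path: $!";

# Counting from 0: element 7 is the size, element 9 the modification time.
my ($size, $mtime) = @info[7, 9];

print "size: $size bytes\n";
print "last modified: ", scalar localtime($mtime), "\n";
```

For the size alone, the -s file test gives the same number with less typing.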
Challenges
1) What does the special variable
$! do and what is an example of how we would use it?
A possible
solution appears between the lines below:
------------------------------------------------------------------------
$! is a built-in variable
that holds the last recorded error in a program. If your program errors
out and you print $!, you may get an error that helps you debug the program.
An example:
open(FILE, "file.txt") or die "Oops, we had an error: $!";
------------------------------------------------------------------------
2) What is a filehandle
and what is the purpose behind them?
------------------------------------------------------------------------
A filehandle is the
reference to a file. It is not a copy of the file; it is the connection
your script uses to reach the file once it has been opened. The purpose
behind these is that you name the file and its mode once, when you open
it, and from then on you read from or write to the filehandle. What you
do through a filehandle only affects the file in the way its mode allows.
------------------------------------------------------------------------
3) We want to write to an image
file, if we write as usual to the file we will get unexpected results.
What must we remember to do?
------------------------------------------------------------------------
When using files to create images, sounds,
flash, etc., you need to remember to switch binmode on to tell
Perl to write in binary. Failure to do this can corrupt the file you're
writing on systems that translate line endings.
------------------------------------------------------------------------