What Does read() Return at EOF in Python
Update Mar 14, 2020: I'm working on an update to the article based on all the feedback I've received so far. Stay tuned!
I was reading Computer Systems: A Programmer's Perspective the other day, and in the chapter on Unix I/O the authors mention that there is no explicit "EOF character" at the end of a file.
If you've spent some time reading and/or playing with Unix I/O and have written some C programs that read text files and run on Unix/Linux, that statement is probably obvious. But let's take a closer look at the following two points related to the statement in the book:
- EOF is not a character
- EOF is not a character you find at the end of a file
1. Why would anyone say or think that EOF is a character? I think it may be because in some C programs you can find code that explicitly checks for EOF using the getchar() and getc() routines:
#include <stdio.h>
...
while ((c = getchar()) != EOF)
    putchar(c);

OR

FILE *fp;
int c;
...
while ((c = getc(fp)) != EOF)
    putc(c, stdout);
And if you check the man page for getchar() or getc(), you'll read that both routines get the next character from the input stream. So that could be what leads to confusion about the nature of EOF, but that's just me speculating. Let's get back to the point that EOF is not a character.
What is a character anyway? A character is the smallest component of a text. 'A', 'a', 'B', 'b' are all different characters. A character has a numeric value that is called a code point in the Unicode standard. For instance, the English character 'A' has a numeric value of 65 in decimal. You can check this quickly in a Python shell:
$ python
>>> ord('A')
65
>>> chr(65)
'A'
Or you could look it up in the ASCII table on your Unix/Linux box (for example, with man ascii).
Let's check the value of EOF by writing a little C program. In ANSI C, EOF is defined in <stdio.h> as part of the standard library. The standard only requires it to be a negative int; its value is usually -1. Save the following code in file printeof.c, compile it, and run it:
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("EOF value on my system: %d\n", EOF);
    return 0;
}
$ gcc -o printeof printeof.c
$ ./printeof
EOF value on my system: -1
Okay, so on my system the value is -1 (I tested it on both Mac OS and Ubuntu Linux). Is there a character with a numerical value of -1? Again, you could check the available numeric values in the ASCII table or check the official Unicode page to find the legitimate range of numeric values for representing characters. But let's fire up a Python shell and use the built-in chr() function to return a character for -1:
$ python
>>> chr(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: chr() arg not in range(0x110000)
As expected, there is no character with a numeric value of -1. Okay, so EOF (as seen in C programs) is not a character.
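Why does a sentinel like -1 exist at all? A byte read from a file can take any of the 256 values 0 through 255, including 0xFF, so an end-of-data signal has to live outside that range. The sketch below uses a hypothetical getc_like() helper (not a real library function) that mirrors the contract of C's getc(): it returns a byte value in 0..255, or -1 once the data runs out. An in-memory byte string stands in for a file, which is an assumption made for self-containment. This is also why the C loops above store the result in an int c rather than a char.

```python
# A minimal sketch, assuming an in-memory byte string stands in for a file.
# getc_like() is a hypothetical helper mimicking the contract of C's getc().
EOF = -1  # outside the range of any byte value (0..255)


def getc_like(data: bytes, pos: int) -> int:
    """Return the byte at pos as an int in 0..255, or EOF if pos is past the end."""
    if pos >= len(data):
        return EOF
    return data[pos]


data = bytes([0x48, 0x69, 0xFF])  # b'Hi' followed by the byte 0xFF
values = []
pos = 0
while (b := getc_like(data, pos)) != EOF:
    values.append(b)
    pos += 1

print(values)  # [72, 105, 255]: 0xFF is ordinary data, not EOF

# -1 cannot even be stored as a byte value:
try:
    bytes([-1])
except ValueError as exc:
    print("ValueError:", exc)
```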
On to the second point.
2. Is EOF a character that you can find at the end of a file? I think at this point you already know the answer, but let's double-check our assumption.
Let's take a simple text file, helloworld.txt, and get a hexdump of its contents. We can use xxd for that:
$ cat helloworld.txt
Hello world!

$ xxd helloworld.txt
00000000: 4865 6c6c 6f20 776f 726c 6421 0a         Hello world!.
As you can see, the last character at the end of the file is the hex 0a. You can find in the ASCII table that 0a represents nl, the newline character. Or you can check it in a Python shell:
$ python
>>> chr(0x0a)
'\n'
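The same check can be done entirely from Python. This sketch writes its own copy of the "Hello world!" text to a temporary file (an assumption, so it does not depend on the article's helloworld.txt), reads it back in binary mode, and prints every byte in hex. The last byte is the newline 0x0a and nothing follows it. Note that bytes.hex() with a separator argument needs Python 3.8+.

```python
import os
import tempfile

# A minimal sketch: a temporary file holds the same 13 bytes as helloworld.txt.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Hello world!\n")
    path = tmp.name

with open(path, "rb") as f:  # binary mode: no decoding, no newline translation
    content = f.read()

os.remove(path)

print(content.hex(" "))   # 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 0a
print(len(content))       # 13: twelve visible characters plus the newline
print(hex(content[-1]))   # 0xa: the final byte is '\n', not an "EOF byte"
```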
Okay. If EOF is not a character and it's not a character that you find at the end of a file, what is it then?
EOF (end-of-file) is a condition provided by the kernel that can be detected by an application.
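One way to convince yourself that EOF is a condition rather than a stored marker: once a read reports EOF, append more bytes to the file and read again. In the sketch below (a temporary file stands in for helloworld.txt, an assumption for self-containment), os.read() keeps returning b'' at the end, then returns the new bytes after the file grows. No "EOF byte" had to be removed or overwritten.

```python
import os
import tempfile

# A minimal sketch: a temporary file stands in for helloworld.txt.
fd_w, path = tempfile.mkstemp()     # descriptor opened for writing
os.write(fd_w, b"Hello world!\n")

fd_r = os.open(path, os.O_RDONLY)   # separate read-only descriptor
first = os.read(fd_r, 100)          # all 13 bytes
at_eof = os.read(fd_r, 100)         # b'': the end-of-file condition
again = os.read(fd_r, 100)          # still b'': nothing new to read yet

# Grow the file through the writer's descriptor.
os.write(fd_w, b"More data\n")
after_append = os.read(fd_r, 100)   # the new bytes, read past the old "end"

os.close(fd_r)
os.close(fd_w)
os.remove(path)

print(first, at_eof, again, after_append)
```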
Let's see how we can detect the EOF condition in various programming languages when reading a text file using the high-level I/O routines those languages provide. For this purpose, we'll write a very simple cat version called mcat that reads an ASCII-encoded text file byte by byte (character by character) and explicitly checks for EOF. Let's write our cat version in the following programming languages:
- ANSI C
- Python
- Go
- JavaScript (node.js)
You can find source code for all of the examples in this article on GitHub. Okay, let's get started with the venerable C programming language.
ANSI C (a modified cat version from The C Programming Language book)
/* mcat.c */
#include <stdio.h>

int main(int argc, char *argv[])
{
    FILE *fp;
    int c;  /* int, not char, so that EOF (-1) is distinguishable from any byte */

    if (argc < 2) {  /* guard against a missing file argument */
        printf("usage: mcat file\n");
        return 1;
    }

    if ((fp = fopen(*++argv, "r")) == NULL) {
        printf("mcat: can't open %s\n", *argv);
        return 1;
    }

    while ((c = getc(fp)) != EOF)
        putc(c, stdout);

    fclose(fp);

    return 0;
}
Compile:
$ gcc -o mcat mcat.c

Run:
$ ./mcat helloworld.txt
Hello world!
Quick explanation of the code above:
- The program opens a file passed as a command line argument
- The while loop copies data from the file to the standard output one byte at a time until it reaches the end of the file
- On reaching EOF, the program closes the file and terminates
Python 3
Python doesn't have a mechanism to explicitly check for EOF like in ANSI C, but if you read a text file one character at a time, you can determine the end-of-file condition by checking whether the character read is an empty string:
# mcat.py
import sys

with open(sys.argv[1]) as fin:
    while True:
        c = fin.read(1)  # read max 1 char
        if c == '':      # EOF
            break
        print(c, end='')
$ python mcat.py helloworld.txt
Hello world!
Python 3.8+ (a shorter version of the above using the walrus operator):
# mcat38.py
import sys

with open(sys.argv[1]) as fin:
    while (c := fin.read(1)) != '':  # read max 1 char at a time until EOF
        print(c, end='')
$ python3.8 mcat38.py helloworld.txt
Hello world!
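To answer the title question directly: in Python, read() at EOF returns an empty object of the stream's type. Text-mode streams return an empty str '' and binary-mode streams return empty bytes b''. The same rule is what lets readline() distinguish a blank line ('\n') from EOF (''). In this sketch, io.StringIO and io.BytesIO stand in for files opened with "r" and "rb" (an assumption for self-containment), and read_all_lines() is a hypothetical helper name.

```python
import io


def read_all_lines(stream):
    """Collect lines with readline(); an empty string (not a newline) signals EOF."""
    lines = []
    while (line := stream.readline()) != "":
        lines.append(line)
    return lines


text = io.StringIO("Hi\n\nBye\n")  # a text stream, like open(path, "r")
binary = io.BytesIO(b"Hi\n")       # a binary stream, like open(path, "rb")

lines = read_all_lines(text)
text_eof = text.read(1)            # '' at EOF in text mode
binary_data = binary.read()        # b'Hi\n'
binary_eof = binary.read(1)        # b'' at EOF in binary mode

print(lines)                       # ['Hi\n', '\n', 'Bye\n']
print(repr(text_eof), repr(binary_eof))  # '' b''
```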
Go
In Go we can explicitly check whether the error returned by Read() is io.EOF.
// mcat.go
package main

import (
	"fmt"
	"io"
	"os"
)

func main() {
	file, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintf(os.Stderr, "mcat: %v\n", err)
		os.Exit(1)
	}
	defer file.Close()

	buffer := make([]byte, 1) // 1-byte buffer
	for {
		bytesread, err := file.Read(buffer)
		// Per the io.Reader contract, handle any bytes read before checking err.
		fmt.Print(string(buffer[:bytesread]))
		if err == io.EOF {
			break
		}
		if err != nil { // any other error would otherwise loop forever
			fmt.Fprintf(os.Stderr, "mcat: %v\n", err)
			os.Exit(1)
		}
	}
}
$ go run mcat.go helloworld.txt
Hello world!
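Python's closest analogue to Go's Read(buffer) is readinto(), which fills a caller-supplied buffer and returns the number of bytes read. On a blocking binary stream, a return value of 0 plays the role that io.EOF plays in Go. A minimal sketch, with an in-memory io.BytesIO as an assumed stand-in for the opened file:

```python
import io

stream = io.BytesIO(b"Hello world!\n")  # stand-in for a file opened in "rb" mode
buffer = bytearray(1)                   # 1-byte buffer, like make([]byte, 1) in Go
out = bytearray()

while True:
    n = stream.readinto(buffer)
    if n == 0:  # nothing was read: end-of-file
        break
    out += buffer[:n]

print(out.decode(), end="")  # Hello world!
```

Unlike Go, Python does not report EOF through an error value here; the byte count alone carries the signal.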
JavaScript (node.js)
There is no explicit check for EOF, but the 'end' event on a stream is fired when the end of a file is reached and a read operation tries to read more data.
/* mcat.js */
const fs = require('fs');
const process = require('process');

const fileName = process.argv[2];

var readable = fs.createReadStream(fileName, {
  encoding: 'utf8',
  fd: null,
});

readable.on('readable', function() {
  var chunk;
  while ((chunk = readable.read(1)) !== null) {
    /* with 'utf8' encoding, chunk is one character (one byte for ASCII text) */
    process.stdout.write(chunk);
  }
});

readable.on('end', () => {
  console.log('\nEOF: There will be no more data.');
});
$ node mcat.js helloworld.txt
Hello world!

EOF: There will be no more data.
How do the high-level I/O routines in the examples above determine the end-of-file condition? On Linux systems the routines either directly or indirectly use the read() system call provided by the kernel. The getc() function (or macro) in C, for instance, uses the read() system call and returns EOF if read() indicated the end-of-file condition. The read() system call returns 0 to indicate the EOF condition.
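The layering described above can be sketched in Python. BufferedByteReader is a hypothetical class, not the actual C library implementation: it refills an internal buffer with os.read(), a thin wrapper over the read() system call, and like getc() it returns -1 when os.read() returns b'', meaning read() reported end-of-file. A pipe from os.pipe() is used as the data source (an assumption) so the example needs no files; closing the write end is what lets the reader see EOF.

```python
import os

EOF = -1


class BufferedByteReader:
    """Hypothetical getc()-style reader layered over the read() system call."""

    def __init__(self, fd: int, bufsize: int = 4096):
        self.fd = fd
        self.bufsize = bufsize
        self.buf = b""
        self.pos = 0

    def getc(self) -> int:
        if self.pos >= len(self.buf):
            # Buffer exhausted: ask the kernel for more bytes.
            self.buf = os.read(self.fd, self.bufsize)
            self.pos = 0
            if not self.buf:  # read() returned 0 bytes: EOF condition
                return EOF
        byte = self.buf[self.pos]
        self.pos += 1
        return byte


r, w = os.pipe()
os.write(w, b"Hi!\n")
os.close(w)  # no more writers, so read() returns 0 after the data

reader = BufferedByteReader(r)
chars = []
while (c := reader.getc()) != EOF:
    chars.append(chr(c))
os.close(r)

print("".join(chars), end="")  # Hi!
```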
Let's write a cat version called syscat using only Unix system calls, both for fun and potentially some profit. Let's do that in C first:
/* syscat.c */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int fd;
    char c;

    if (argc < 2)
        return 1;

    fd = open(argv[1], O_RDONLY, 0);
    if (fd == -1)
        return 1;  /* open failed */

    /* read() returns the number of bytes read (1 here), 0 at EOF, or -1 on error */
    while (read(fd, &c, 1) > 0)
        write(STDOUT_FILENO, &c, 1);

    close(fd);
    return 0;
}
$ gcc -o syscat syscat.c
$ ./syscat helloworld.txt
Hello world!
In the code above, you can see that we use the fact that the read() system call returns 0 to indicate EOF.
And the same in Python 3:
# syscat.py
import sys
import os

fd = os.open(sys.argv[1], os.O_RDONLY)

while True:
    c = os.read(fd, 1)
    if not c:  # EOF
        break
    os.write(sys.stdout.fileno(), c)

os.close(fd)
$ python syscat.py helloworld.txt
Hello world!
And in Python 3.8+ using the walrus operator:
# syscat38.py
import sys
import os

fd = os.open(sys.argv[1], os.O_RDONLY)

while c := os.read(fd, 1):  # os.read() returns b'' at EOF, ending the loop
    os.write(sys.stdout.fileno(), c)

os.close(fd)
$ python3.8 syscat38.py helloworld.txt
Hello world!
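The syscat examples read one byte at a time. With a larger buffer it becomes clearer that only a 0-byte read means EOF: a short read (fewer bytes than requested) is not by itself the end-of-file signal. This sketch uses an 8-byte chunk size (an arbitrary choice) over a temporary copy of the 13-byte "Hello world!\n" text, which is an assumption made for self-containment.

```python
import os
import tempfile

CHUNK = 8  # arbitrary small read size for illustration

fd_w, path = tempfile.mkstemp()
os.write(fd_w, b"Hello world!\n")  # 13 bytes
os.close(fd_w)

fd = os.open(path, os.O_RDONLY)
sizes = []
while True:
    chunk = os.read(fd, CHUNK)
    sizes.append(len(chunk))
    if not chunk:  # only a 0-byte read means EOF
        break
os.close(fd)
os.remove(path)

print(sizes)  # [8, 5, 0]: the short 5-byte read is not EOF, the 0-byte read is
```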
Let's recap the main points about EOF once more:
- EOF is not a character
- EOF is not a character that you find at the end of a file
- EOF is a condition provided by the kernel that can be detected by an application when a read operation reaches the end of a file
Update Mar 3, 2020: Let's recap the main points about EOF with added details for more clarity:
- EOF in ANSI C is not a character. It's a constant defined in <stdio.h> and its value is usually -1
- EOF is not a character in the ASCII or Unicode character set
- EOF is not a character that you find at the end of a file on Unix/Linux systems
- There is no explicit "EOF character" at the end of a file on Unix/Linux systems
- EOF (end-of-file) is a condition provided by the kernel that can be detected by an application when a read operation reaches the end of a file (if k is the current file position and m is the size of a file, performing a read() when k >= m triggers the condition)
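Computer Systems: A Programmer's Perspective describes EOF in terms of the current file position k and the file size m: performing a read when k >= m triggers the condition. That rule can be checked directly by moving the file position with os.lseek() before reading. In this sketch, again on a temporary 13-byte file (an assumption), a read at k = 12 still returns the final newline, while reads at k = m = 13 and at k = 20 > m both return b''. Seeking past the end of a regular file is allowed; it is the read there that reports EOF.

```python
import os
import tempfile

fd_w, path = tempfile.mkstemp()
os.write(fd_w, b"Hello world!\n")
os.close(fd_w)

fd = os.open(path, os.O_RDONLY)
m = os.fstat(fd).st_size  # file size in bytes (13)

results = {}
for k in (12, m, m + 7):  # positions before, at, and beyond the end
    os.lseek(fd, k, os.SEEK_SET)
    results[k] = os.read(fd, 1)

os.close(fd)
os.remove(path)

print(m, results)  # 13 {12: b'\n', 13: b'', 20: b''}
```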
Happy learning and have a great day!
Resources used in preparation for this article (some links are affiliate links):
- Computer Systems: A Programmer's Perspective (3rd Edition)
- The C Programming Language (2nd Edition)
- The Unix Programming Environment (Prentice-Hall Software Series)
- Advanced Programming in the UNIX Environment (3rd Edition)
- The Go Programming Language (Addison-Wesley Professional Computing Series)
- Unicode HOWTO
- Node.js Stream module
- Go io package
- cat (Unix)
- End-of-file
- End-of-Transmission character
Source: https://ruslanspivak.com/eofnotchar/