
Introduction to programming principles

A program that works on a meaningful task will always communicate in some way, i.e. receive information and, in particular, output the results. Different scenarios are conceivable for both processes.

In Chapter 2 we already got to know the input() function, which allows the program to ask the user for input. An alternative is to specify parameters directly when calling the program from the command line. The use of a graphical user interface is also conceivable.

In the case of more complex problems, however, it often does not make sense to enter all parameters by hand, and above all to have to go through this process every time the program is started. Then it makes more sense to save the information required by the program in a file and then have it read in by the program. In some cases it can also be useful or necessary to load data from the Internet into the program.

Results can also be output in different ways. In our examples, we have so far used the option of displaying data on the screen. If a graphical user interface is used, information could also be output there. Larger amounts of data, on the other hand, will in most cases be saved in a file. Of course, it is also possible to send the output to the Internet, a task that every web server does.

In the natural and engineering sciences, in particular, but also in other areas that have to deal with large amounts of data, the data will often be represented graphically in a suitable manner. The question that often arises here is whether the data required for the graphic is saved first or whether it is processed further directly. A major factor will be the effort involved in generating the data. It is very annoying to lose the results of a multi-day calculation only because you made an error in the evaluation or the generation of the graphic. If, on the other hand, the data can be generated very quickly, the route via intermediate storage of the data can be a nuisance, for example if you want to quickly see the effect of parameter changes in the result.

As these introductory remarks already show, the input and output of data can be a complex topic in detail. In the following, we deal relatively briefly with input via the command line and the keyboard and also take a quick look at the print() function. After that, we mainly deal with reading and writing files. For more specific aspects, the further notes point out that Python has useful modules in the standard library for many of the tasks that may arise.

7.1. Input via the command line and the keyboard

When calling programs from the command line, it is not uncommon to pass parameters. Programming languages such as Fortran, C and Python offer the possibility of accessing these parameters within the program. The alternative of defining such parameters in the program code itself has the disadvantage that you have to change the code to change a parameter, which is generally not a good idea, among other reasons because you might unintentionally change other parts of the program.

We demonstrate the procedure with a small sample program called example_1.py, which outputs a text specified by the user several times. The first argument in the call is the text, the second argument specifies the number of repetitions.

    # example_1.py

    import sys
    print(sys.argv)
    text = sys.argv[1]
    nmax = int(sys.argv[2])
    for n in range(nmax):
        print(text)

We use the argv attribute of the sys module from the Python standard library. Here argv stands for argument vector. In line 4, we first output this attribute to see what it contains. Then, in lines 5 and 6, we use two entries of this attribute to determine the text to be output and the number of repetitions.

If we run the program with the parameters hello and 3, we get the following output.

    $ python example_1.py hello 3
    ['example_1.py', 'hello', '3']
    hello
    hello
    hello

Line 1 repeats the input after the $ prompt, and the generated output follows in lines 2 to 5. In line 2 we see that sys.argv contains a list whose first entry is the name of the Python script, followed by the specified parameters. As we can see, the entries are all strings. This explains why we had to convert the second parameter to an integer in line 6 of our example script. Lines 3 to 5 then contain the expected output, namely the text three times.

Further note

It can happen that the number of parameters is significantly larger than in our small example and that default values are defined for a number of parameters. In most cases, only a subset of the parameters will then be specified, and these must be labeled so that they can be assigned unambiguously. This situation is similar to what we got to know in chapter 5.7 for function calls with keywords and default values. In such cases it is advisable to take a look at the argparse module of the Python standard library. Since this module offers relatively complex options, a dedicated tutorial is also available.
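As a minimal sketch, the sample program might look as follows with argparse. The option name --repeat and its default value of 1 are assumptions chosen for illustration, and the argument list is passed explicitly so that the example runs without a command line.

```python
import argparse


def parse_arguments(argv):
    """Parse a text and an optional repetition count from a list of strings."""
    parser = argparse.ArgumentParser(description="Output a text several times.")
    # positional argument: always required
    parser.add_argument("text", help="text to be output")
    # optional argument with a default value; argparse converts it to int
    parser.add_argument("-n", "--repeat", type=int, default=1,
                        help="number of repetitions (default: 1)")
    return parser.parse_args(argv)


# In a real script one would call parse_arguments(sys.argv[1:]).
args = parse_arguments(["hello", "--repeat", "3"])
for _ in range(args.repeat):
    print(args.text)

# Omitting the option falls back to the default value.
default_args = parse_arguments(["hello"])
print(default_args.repeat)
```

Because the option is labeled by its name, it can be omitted or given in any position relative to the text.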

As an alternative to specifying parameters when the program is called, the program itself can ask for the parameters. The input() function, which we already got to know in Chapter 2, is used for this purpose. Here, too, it should be noted that the entries are available in the program as character strings and must be converted as required.

Warning notice

The input received via the input() function can also be evaluated as a Python expression using the eval() function. What may seem particularly attractive at first glance can be a security risk under certain circumstances. Executing code unseen can cause considerable damage, since it is entirely possible, for example, to delete files from within a Python program.
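If the input is only meant to be interpreted as a Python literal such as a number or a list, the function ast.literal_eval from the standard library is a more restrictive alternative. The following sketch assumes that rejected input should simply yield None; the strings stand in for text typed by a user.

```python
import ast


def parse_literal(user_input):
    """Convert a string to a Python literal without executing any code."""
    try:
        return ast.literal_eval(user_input)
    except (ValueError, SyntaxError):
        # anything that is not a plain literal is rejected
        return None


numbers = parse_literal("[1, 2.5, 3]")
rejected = parse_literal("__import__('os').remove('foo.dat')")
print(numbers)
print(rejected)
```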

Let's see how the input() function can be used for an alternative implementation of our sample program.

    # example_2.py

    text = input('text to be output: ')
    nmax = int(input('number of repetitions: '))
    for n in range(nmax):
        print(text)

After calling the program, the parameters are now queried, whereby the number of repetitions must again be converted into an integer. The entered text is then output several times.

    $ python example_2.py
    text to be output: Hello
    number of repetitions: 3
    Hello
    Hello
    Hello

We have limited ourselves here to demonstrating the basic procedure. In practice it would of course make sense in both examples to catch errors, for example in the event that the entered number of repetitions cannot be converted into an integer.
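A minimal sketch of such error handling follows. The helper name parse_repetitions and the rule that negative values are rejected are assumptions made for this illustration.

```python
def parse_repetitions(value):
    """Return value as a non-negative integer or raise a ValueError."""
    try:
        nmax = int(value)
    except ValueError:
        raise ValueError(
            f"number of repetitions must be an integer, got {value!r}"
        ) from None
    if nmax < 0:
        raise ValueError(f"number of repetitions must not be negative, got {nmax}")
    return nmax


# the strings stand in for sys.argv[2] or the result of input()
for candidate in ["3", "three"]:
    try:
        print(candidate, "->", parse_repetitions(candidate))
    except ValueError as error:
        print("invalid input:", error)
```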

Further note

If you don't want to enter parameters on the command line or generally in a terminal window, but via a graphical user interface, the Python standard library offers support in the form of the tkinter module.

Having concentrated on input so far, we conclude with a brief look at output using the print() function. We have used this function on a variety of occasions. In chapter 3.7 we saw, among other things, that the standard line break at the end of the output can be replaced by another string using the end parameter.

Let us now use the print() function to discuss a general aspect of outputting data that can occasionally be important in practice. Input and output operations are typically very slow compared to arithmetic operations in the processor. Executing every single input or output operation immediately would therefore unnecessarily delay the execution of the program. When outputting data, for example, it is better to collect a certain amount of data in a buffer and then output the entire data block. In practical terms, this means that you cannot rely on the output triggered by a print() call being executed immediately and completely. However, immediate output can be forced by setting the flush argument to True.
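The following sketch shows how the flush argument might be used for a countdown that appears number by number on a single line. The function name countdown and the delay are assumptions for illustration; whether the output would be held back without flush=True depends on the environment.

```python
import time


def countdown(start, delay=0.0):
    """Print a countdown on one line, forcing each number out immediately."""
    for n in range(start, 0, -1):
        # flush=True empties the output buffer after every call
        print(n, end=" ", flush=True)
        time.sleep(delay)
    print("done")


countdown(3, delay=0.1)
```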

Since we are dealing with time sequences here, the effect of buffering is best illustrated in a film. We use the print() function for this, but data buffering will play a role again later when we discuss outputting data to a file.

7.2. Reading files

For simple scientific problems it may be sufficient to enter parameters via the command line or at the request of the program and to view the resulting data on the screen. However, it will often be the case that this procedure no longer makes sense due to the size of the data. It can be too cumbersome to enter ten or twenty parameters by hand over and over again, and in some cases the number of input parameters can be significantly larger. Think, for example, of structural data for quantum chemical calculations. The data generated are often very extensive, so that one would like to save them for further analysis. This is especially true if the generation of the data is very time-consuming.

Let's first turn to reading data from files. The basic procedure is similar to reading a book. Just as you have to open a book to read, you first have to open a file to read it. Then you can read in the book, although we want to limit ourselves to the situation in which reading begins at the beginning. As in a book, it is in principle also possible to jump directly to a specific point in a file, but in practice this mainly occurs in connection with binary files. Here we want to limit ourselves to text files, whereby text does not exclude that the file contains numerical information only or in part.

Just as you close a book after reading part or all of its content, you should also close a file. In principle, modern operating systems will do this if necessary, but it is not good practice to hope that someone else will clean up for you. This would be particularly problematic if you open many books at the same time or open many files at the same time.

In order to be read, a file must first exist. In the following we assume that there is a file with the name foo.dat in the current directory. We output the contents of this file here with a so-called magic command from Jupyter.

     1.37 2.59
    10.3 -1.3
     5.8 2.0

Then we can open this file for reading. We get back a file object with which we can then access the content of the file.

    file = open('foo.dat')
    print(file)

    <_io.TextIOWrapper name='foo.dat' mode='r' encoding='UTF-8'>

The print() call here does not output the content of the file, which has not yet been read at all, but rather information about the file object referred to by the variable file. As we can see, the file object does give us access to our file foo.dat.

The access mode is 'r', which is short for read and indicates that the file is open for reading. Later we will learn about other values for the access mode. If you want to emphasize that the file is to be opened for reading only, you can do so with the mode argument of the open() function. However, since read access is the default, this is not strictly necessary.

Finally, the text encoding is shown. By default, the encoding preferred by the operating system is used, which in our case is UTF-8, as should be the case on most systems today. If the text to be read uses a different encoding, the encoding argument of the open() function must be set accordingly.
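As a small illustration of both arguments, the following sketch reads a file that is stored in Latin-1 rather than UTF-8. The temporary file and its content are assumptions made so that the example is self-contained.

```python
import os
import tempfile
from pathlib import Path

# create a small Latin-1 encoded file for demonstration purposes
path = os.path.join(tempfile.mkdtemp(), "latin1.txt")
Path(path).write_bytes("Größe 1.37\n".encode("latin-1"))

# access mode and encoding are stated explicitly
file = open(path, mode="r", encoding="latin-1")
text = file.read()
file.close()

print(file.mode, file.encoding)
print(text)
```

Reading the same file with the wrong encoding would either raise a UnicodeDecodeError or silently produce wrong characters, depending on the encodings involved.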

Trying to open a file for reading that doesn't exist results in a FileNotFoundError.

    open('nonexistent.dat')

    ---------------------------------------------------------------------------
    FileNotFoundError                         Traceback (most recent call last)
    ----> 1 open('nonexistent.dat')

    FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent.dat'
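In a program, such an exception would typically be caught. The following sketch assumes that falling back to a default content is acceptable; the function name read_text_or_default is hypothetical.

```python
import os
import tempfile


def read_text_or_default(filename, default=""):
    """Return the content of filename, or default if the file does not exist."""
    try:
        file = open(filename)
    except FileNotFoundError:
        print(f"{filename} not found, using default")
        return default
    content = file.read()
    file.close()
    return content


# a path in a fresh temporary directory is guaranteed not to exist
missing = os.path.join(tempfile.mkdtemp(), "nonexistent.dat")
content = read_text_or_default(missing, default="0.0 0.0\n")
print(repr(content))
```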

However, we now want to use the file object that was created earlier to read the file. In view of the large main memory available in modern computers, it is in most cases possible to load the contents of the entire file into this memory. There are two ways to do this in Python. First, let's use the file object's read() method.

    content = file.read()
    content

    ' 1.37 2.59\n10.3 -1.3\n 5.8 2.0\n'

As we can see, the entire content is loaded into a character string in which the line breaks can be recognized by the control character \n. In order to make these control characters visible, we have not used the print() function for output here.

The splitlines() method provides a convenient way of dividing this character string into individual lines.

    print(content.splitlines())

    [' 1.37 2.59', '10.3 -1.3', ' 5.8 2.0']

This gives us a list that contains the individual lines as entries.

Now what happens if we try to read the file a second time by calling read() again on the same file object?

The result is now an empty string. How can it be explained that we cannot reproduce our previous result? The best way to visualize the reading process is to imagine the file being read from a historical data store, a magnetic tape. There a read head moves along the magnetic tape. Correspondingly, a file object has a pointer that points to the current position in the file. At the beginning of the reading process this pointer is at the beginning of the file, and after reading it is at the end. If you then try to read on, you will no longer receive any data. In principle, you can reposition the pointer anywhere in the file, but this is rarely required for text files, since we have already read in the entire file and can work with its content.
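The behavior just described can be reproduced with a short sketch. The temporary copy of the file is an assumption made to keep the example self-contained; seek(0) is the file-object method that moves the pointer back to the beginning.

```python
import os
import tempfile
from pathlib import Path

path = os.path.join(tempfile.mkdtemp(), "foo.dat")
Path(path).write_text(" 1.37 2.59\n10.3 -1.3\n 5.8 2.0\n")

file = open(path)
first = file.read()    # reads everything, the pointer ends up at the end
second = file.read()   # nothing is left to read
file.seek(0)           # move the pointer back to the beginning of the file
third = file.read()    # the complete content is available again
file.close()

print(repr(second))
print(first == third)
```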

Since we want to demonstrate further possibilities for reading in data, we first close the file again and then open it again.

    file.close()
    print(file.closed)

    True

With the first line we close the file. In the second line we check, for illustration, whether the file is really closed.

Especially with larger files, you might not want to load the entire file at once, but read and process it line by line. If you iterate over the file object in a loop, you get the individual lines.

    file = open('foo.dat')
    for line in file:
        print(line)
    file.close()

     1.37 2.59

    10.3 -1.3

     5.8 2.0


We can see from the blank lines that are output that the line break character at the end of the line has not been removed.

To ensure that the file is closed under all circumstances, even in the event of an error, a context manager is normally used in Python. The previous example can then be formulated as follows.

    with open('foo.dat') as file:
        for line in file:
            print(line)

     1.37 2.59

    10.3 -1.3

     5.8 2.0


We recognize the familiar structure with a keyword, here with, and a colon at the end of the line. The following indented block runs under the control of the context manager, which ensures that the file is closed at the end of the block. What is new is the construction in the first line, which assigns the result of the open() call to the variable file using the keyword as.

In our specific example, one would certainly like to access the individual floating point values separately, so that in practice one first applies the split() method to each line. This works here without specifying an argument, since the line is then split at white space, i.e. in particular spaces or tabs. In addition, such characters are removed completely, including the line break control character.

    with open('foo.dat') as file:
        for line in file:
            print(line.split())

    ['1.37', '2.59']
    ['10.3', '-1.3']
    ['5.8', '2.0']

However, the result shows that initially only character strings are present after reading in. It is still necessary to convert the individual line components into the correct data type, in this case into floating point numbers. Of course, you could iterate over each individual list and use the float() function to build new lists after the conversion. In Python this can be done more easily and clearly with the map() function; the following loop is only used for output.

    data = map(float, ['1.37', '2.59'])
    for d in data:
        print(d, type(d))

    1.37 <class 'float'>
    2.59 <class 'float'>
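Combining the steps shown so far, a whole file of whitespace-separated numbers could be read into a list of rows roughly as follows. The function name read_table and the decision to skip empty lines are assumptions of this sketch; the temporary file only serves to make the example self-contained.

```python
import os
import tempfile
from pathlib import Path

path = os.path.join(tempfile.mkdtemp(), "foo.dat")
Path(path).write_text(" 1.37 2.59\n10.3 -1.3\n 5.8 2.0\n")


def read_table(filename):
    """Read whitespace-separated numbers into a list of rows of floats."""
    rows = []
    with open(filename) as file:
        for line in file:
            if line.strip():  # skip empty lines
                # map() yields an iterator, list() collects the converted values
                rows.append(list(map(float, line.split())))
    return rows


table = read_table(path)
print(table)
```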

In principle, one could proceed in a similar way if the individual entries in a line are separated by commas or semicolons. Such files are often encountered in practice when data was recorded with Excel and then saved in CSV format, where the abbreviation CSV stands for comma separated values. Instead of programming the import yourself, it is then a good idea to use functions from the Python standard library, in this case specifically the csv module. Beyond the standard library, pandas is a program library that can read various input formats and is particularly suitable for processing structured data in Python.
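As a hedged illustration of the csv module, the following sketch reads semicolon-separated data with a header row. The column names and values are made up, and an in-memory string stands in for a file that would normally be opened with open(filename, newline='').

```python
import csv
import io

# stands in for the content of a CSV file
csv_text = "x;y\n1.37;2.59\n10.3;-1.3\n"

reader = csv.reader(io.StringIO(csv_text), delimiter=";")
header = next(reader)  # the first row holds the column names
rows = [[float(entry) for entry in row] for row in reader]

print(header)
print(rows)
```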

Further note

Large amounts of structured data are now often stored in the HDF5 format, which can be read by pandas. However, if you do not want to use pandas to further process the data, it makes sense to have a look at a dedicated HDF5 package such as h5py.

Instead of passing parameters to a program as described in Chapter 7.1, configuration files are also used, as known from INI files in Windows. The Python standard library provides the configparser module for reading such files.
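A minimal sketch of reading such a configuration with configparser follows. The section name simulation and the keys are hypothetical, and the configuration is read from a string here, whereas config.read(filename) would read it from a file.

```python
import configparser

ini_text = """
[simulation]
nmax = 3
text = hello
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

# values are stored as strings; getint() performs the conversion
nmax = config.getint("simulation", "nmax")
text = config.get("simulation", "text")
print(nmax, text)
```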

Popular formats for data exchange include XML (eXtensible Markup Language) and JSON (JavaScript Object Notation). Python also supports these in the standard library with a number of XML processing modules and the json module.

The packages mentioned not only support reading of the file types mentioned, but also writing.
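For JSON, writing and reading might look like the following sketch. The parameter names are made up; json.dumps and json.loads work on strings, while json.dump and json.load take an open file object instead.

```python
import json

parameters = {"text": "hello", "nmax": 3, "values": [1.37, 2.59]}

encoded = json.dumps(parameters, indent=2)  # Python object -> JSON string
decoded = json.loads(encoded)               # JSON string -> Python object

print(encoded)
print(decoded == parameters)
```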

7.3. Writing files

Just as for reading from files, you first have to open the file in order to write to it. Once you have finished writing, the file should be closed again. In Python, this is easiest to do within a with context, as we have already seen when reading from files.

In chapter 7.2 we saw that a file is opened by default in mode 'r', i.e. for reading. In this mode it is not possible to write to the open file. To demonstrate this behavior, we have to use the write() method instead of the read() method.

    with open('foo.txt') as file:
        file.write('This is a test.')

    ---------------------------------------------------------------------------
    FileNotFoundError                         Traceback (most recent call last)
    ----> 1 with open('foo.txt') as file:
          2     file.write('This is a test.')

    FileNotFoundError: [Errno 2] No such file or directory: 'foo.txt'
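For a file that already exists, opening in read mode succeeds and it is the write() call itself that fails, with an io.UnsupportedOperation exception. The following sketch demonstrates this with a temporary file, which is an assumption made to keep the example self-contained.

```python
import io
import os
import tempfile
from pathlib import Path

path = os.path.join(tempfile.mkdtemp(), "foo.txt")
Path(path).write_text("existing content\n")

with open(path) as file:  # default mode 'r'
    try:
        file.write("This is a test.")
        outcome = "written"
    except io.UnsupportedOperation as error:
        outcome = f"refused: {error}"

print(outcome)
```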

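Here the attempt already fails when the file is opened, because foo.txt does not exist yet. If, on the other hand, we open the file in write mode, it is created if necessary and we can write to it as we want.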

    with open('foo.txt', mode='w') as file:
        file.write('This is a test.')

We can look at the result as in Chapter 7.2 with the magic command in a notebook cell.

In connection with the write() method, however, it should be noted that, unlike the print() function, it does not automatically append a line break. In the following example we do not specify the keyword mode for the second argument, since this argument is in the correct position. Of course, there is nothing to be said against using the keyword for clarification purposes.

    with open('foo.txt', 'w') as file:
        for n in range(1, 4):
            file.write(f'line {n}')

This behavior corresponds to what we know from the print() function when we set end=''. If we want to achieve a line break, we must explicitly specify the corresponding control character.

    with open('foo.txt', 'w') as file:
        for n in range(1, 4):
            file.write(f'line {n}\n')

We discussed the options for formatting in f-strings in some detail in Chapter 3.7, which we would like to refer to at this point.

If you use the mode 'w' to open a file for writing, you must be aware that the content of any existing file with this name is deleted first. This is at least true if you have the rights to do so at the operating system level. Depending on the situation, this can be the desired behavior, or at least it doesn't bother you. However, there are applications in which one will use a different mode.

Instead of the mode 'w', you can use the mode 'x', which only opens the file for writing if a file with the relevant name does not yet exist. We can demonstrate this using the file we just wrote.

    with open('foo.txt', 'x') as file:
        for n in range(1, 4):
            file.write(f'line {n}\n')

    ---------------------------------------------------------------------------
    FileExistsError                           Traceback (most recent call last)
    ----> 1 with open('foo.txt', 'x') as file:
          2     for n in range(1, 4):
          3         file.write(f'line {n}\n')

    FileExistsError: [Errno 17] File exists: 'foo.txt'

The attempt to write to an existing file is therefore prevented in mode 'x'.

But it can also happen that you want to continue writing at the end of an already existing file. The mode 'a' for append is intended for this. One use case is to work around the problems caused by buffering the output, which we discussed at the end of Chapter 7.1. In the case of a program with a long runtime in which data is written at longer intervals, the file could be opened only immediately before the data is written and then closed again. If the program is aborted, the data that has already been output is not lost. However, this procedure only makes sense if the times between the write operations are not too short, as otherwise the effort involved in opening and closing the file would slow down the program.

In the following example, we demonstrate repeated appending to a file using a loop outside the with context. We also check that the file has been closed again.

    from datetime import datetime
    from time import sleep

    for n in range(1, 5):
        sleep(5)
        now = datetime.now()
        with open('spam.dat', 'a') as file:
            msg = f'{now:%H:%M:%S} - pass {n}\n'
            file.write(msg)
        if file.closed:
            msg = f'{now:%H:%M:%S} - file closed'
            print(msg)

    17:13:18 - file closed
    17:13:23 - file closed
    17:13:28 - file closed
    17:13:33 - file closed

The file spam.dat then contains the following lines.

    17:13:18 - pass 1
    17:13:23 - pass 2
    17:13:28 - pass 3
    17:13:33 - pass 4

If you want to write several files, for example because you want to perform calculations for several parameter sets, you should keep in mind that the name that has to be given when opening the file is simply a character string that can be constructed accordingly. So you have the option of either including parameters in the file name or numbering the files consecutively. We want to demonstrate the latter with an example.

    for n in range(1, 16):
        with open(f'mydata_{n:04}.dat', 'w') as file:
            file.write(f'file no. {n}\n')

    mydata_0001.dat  mydata_0005.dat  mydata_0009.dat  mydata_0013.dat
    mydata_0002.dat  mydata_0006.dat  mydata_0010.dat  mydata_0014.dat
    mydata_0003.dat  mydata_0007.dat  mydata_0011.dat  mydata_0015.dat
    mydata_0004.dat  mydata_0008.dat  mydata_0012.dat

In order to create a clear sorting of the files, it makes sense to provide a sufficiently wide field for the number of the file and to fill the free spaces with zeros.
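The effect of the zero padding on the sort order can be seen in a small comparison. The file numbers 1, 2 and 10 are chosen arbitrarily for this sketch.

```python
numbers = (1, 2, 10)

# without padding, the alphabetical order differs from the numerical order
unpadded = sorted(f"mydata_{n}.dat" for n in numbers)
# with a zero-padded field of width 4, both orders agree
padded = sorted(f"mydata_{n:04}.dat" for n in numbers)

print(unpadded)
print(padded)
```

Without padding, mydata_10.dat is sorted before mydata_2.dat, because the names are compared character by character.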

Regardless of how you choose the file name, it makes sense to save information that is required to generate the data at the beginning of the file. This includes not only the parameters used, but also information about the program version used. This means that in the event of an error in the program, it can also be decided retrospectively whether the data is affected.
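One possible way of storing such information is a block of comment lines at the beginning of the file. In the following sketch, the version string, the parameter names, the # prefix for header lines and the function name write_results are all assumptions made for illustration.

```python
import os
import tempfile

PROGRAM_VERSION = "1.0.0"  # hypothetical version of the generating program


def write_results(filename, parameters, rows):
    """Write a header with version and parameters, followed by the data rows."""
    with open(filename, "w") as file:
        file.write(f"# program version: {PROGRAM_VERSION}\n")
        for name, value in parameters.items():
            file.write(f"# {name} = {value}\n")
        for row in rows:
            file.write(" ".join(f"{x:g}" for x in row) + "\n")


path = os.path.join(tempfile.mkdtemp(), "mydata_0001.dat")
write_results(path, {"nmax": 3, "dt": 0.01}, [[1.37, 2.59], [10.3, -1.3]])

with open(path) as file:
    print(file.read(), end="")
```

A reader for such files would have to skip or parse the lines starting with # before converting the remaining lines to numbers.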