How are command line arguments interpreted? Can arbitrary data be exchanged between the calling program on the one hand and the called program on the other hand? Command line interfaces comprise an essential class of interfaces used in system architecture and the above questions deserve precise answers.
In this article I try to clarify why command line arguments are nothing but raw byte sequences that deserve proper interpretation in the receiving program. To some degree, the article dives into the topic of character encoding. Towards the end, I provide simple and expressive code examples based on bash and Python. Please note that the article only applies to Unix-like systems.
Ultimately, the system call execve is the entry point for all program invocations on these platforms. It instructs the operating system to run the loader, which prepares the new program for execution, brings it into the running state, and then leaves it to itself. One argument provided to execve is a character string: a file system path pointing to the executable file on disk. The new program may also receive arguments. These arguments are provided to the loader via the argv argument to execve; argv means argument vector.
Simply put, this is a sequence of strings. More precisely, each of these strings is a null-terminated C char array. One could say that each element in a C char array is a character. A character, however, has quite an abstract meaning. In times of real character abstractions, the Unicode code points, we should call each element in a C char array what it is: each element in such an array stores one byte of information.
In a null-terminated char array, each byte may assume any value between 0x01 and 0xFF; the value 0x00 is reserved for marking the end of the array. In essence, the data in argv (which by itself is a pointer to an array of pointers to char arrays) as provided to execve takes, so to say, a long journey through the kernel and the loader, and finally ends up as the second argument to the main function in the new program (the first argument is the argument count).
To provide argument data to execve and read it in the new program, one could write two small programs. The first, the recipient, would have a main(argc, argv) function which evaluates the contents of argv and prints them in some human-readable form to stdout (in hex representation, for example). The second program, the sender, would prepare the argument data and call execve with the path to the recipient.
Knowing how the above works is useful in order to understand that, internally, command line arguments are just a portion of memory copied to the stack of the main thread of the new program. You might have noticed that in this picture, however, there is no actual command line involved. So far, we have discussed a programming interface provided by the operating system that enables us to use argv in order to bootstrap a newly invoked program with some input data that is not coming from any file, socket, or other data source.
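The recipient and sender pair described above can be sketched in a single Python 3 file. This is only an illustration under a few assumptions: it uses subprocess (which forks and then calls an exec-family function) rather than a bare execve call, the recipient is an inline Python snippet instead of a C program, and the names RECIPIENT and send are made up for this sketch.

```python
import os
import subprocess
import sys

# Hypothetical recipient: prints every received argument as hex digits.
# os.fsencode() recovers the raw bytes that Python 3 decoded from argv.
RECIPIENT = (
    "import os, sys\n"
    "print(' '.join(os.fsencode(a).hex() for a in sys.argv[1:]))\n"
)


def send(*raw_args):
    """Run the recipient with the given byte strings as arguments.

    Returns the recipient's stdout as text (one hex group per argument).
    """
    argv = [os.fsencode(sys.executable), b"-c", RECIPIENT.encode("ascii")]
    argv.extend(raw_args)
    result = subprocess.run(argv, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    # 0x62 is the letter b; the second argument is not valid UTF-8 at all.
    print(send(b"b", b"\xff\xfe"))
```

No shell is involved here: the byte strings go from the sender straight into the argument vector of the new process, and the recipient reports them back unchanged.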
The concept of argv quickly translates to the concept of command line arguments. A command line interface is something that enables us to call programs in the established "program arg1 arg2 ..." fashion. Right, that is one of the main features a shell provides! Behind the scenes, the shell program takes parts of the user input on the command line and translates these parts to C char arrays that it later provides to execve. Hence, it does all the things that our hypothetical sender program from above would have done, and more.
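The translation step a shell performs can be approximated in Python with the standard shlex module. This is a rough sketch, not a model of bash: shlex only mimics POSIX-style word splitting and quoting, a real shell also performs expansions and appends the terminating null byte to each array, and the UTF-8 encoding as well as the helper name to_argv are assumptions made for illustration.

```python
import shlex


def to_argv(command_line):
    """Split a command line into words and encode each word as bytes."""
    words = shlex.split(command_line)  # POSIX-style quoting rules
    return [word.encode("utf-8") for word in words]


if __name__ == "__main__":
    print(to_argv("program arg1 'arg 2'"))
```

The quoted part stays together as a single argument, which is the behavior one is used to from the command line.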
An example of a shell is bash. Python 2 is a useful tool in this case because, in contrast to Python 3, it provides raw access to the byte sequences provided via argv. Consider invoking a small Python script from bash with a single argument. This makes the shell spawn the Python executable in a child process. The Python program consumes the first command line argument, which is the path to our little Python script. The remaining arguments (one in this case) are left for our script. They are accessible in the sys.argv list. Each item in sys.argv is of type str.
This type carries raw byte sequences. Called with the single argument b, the script reports b in its output. The fact that b in the input translates to b in the output seems normal. However, several encoding and decoding steps lie between the keyboard and the terminal display, and these interpretation steps do not always do what you would expect. The common ground of most of these encoders and decoders is the 7-bit ASCII character set (a two-way translation table between byte values and characters).
As you will see below, it is not always that simple, and oftentimes you need to understand the details of the data interpretation steps involved.
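For readers on Python 3, a comparable inspection script might look like the sketch below. It is not the script used in this article: Python 3 decodes argv into str objects before the script runs, so the sketch assumes that os.fsencode is an acceptable way to get back to the raw bytes, and the function name raw_arguments is hypothetical.

```python
import os
import sys


def raw_arguments(argv):
    """Return the script's arguments (without argv[0]) as byte strings."""
    return [os.fsencode(arg) for arg in argv[1:]]


if __name__ == "__main__":
    # Print the raw representation of every argument, one per line.
    for raw in raw_arguments(sys.argv):
        print(repr(raw))
```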
From an ASCII character table we can infer that the letter b corresponds to the byte value 98, or 0x62 in hexadecimal representation. Let us now try to explicitly use this byte value as a command line argument to our little Python script. There is one difficulty: the raw byte value has to be written down on the command line somehow. There are a couple of possibilities. I like one of them particularly: a hex notation that the shell translates into the raw byte before it calls execve (in bash, $'\x62' is one way to write this). Let us create the same output as before, but with a more interesting input. This worked as expected.
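The ASCII correspondence between the letter b and its byte value can be checked directly in Python. This is a small sketch; the helper name ascii_info is made up for illustration.

```python
def ascii_info(character):
    """Return (decimal value, hex string, raw byte) for an ASCII character."""
    raw = character.encode("ascii")
    return raw[0], hex(raw[0]), raw


if __name__ == "__main__":
    print(ascii_info("b"))  # (98, '0x62', b'b')
```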
The input chain is clear: the shell translates the hex notation into a raw byte and passes it on as the argument. I guess that the shell internally terminates this byte sequence properly with a null byte before calling execve. Therefore, it is no surprise that the output is the same as before.
This example again suggests that there is some magic happening between the stdout of the Python script and our terminal. Let us now provide two arguments in explicit binary form. Earlier, I mentioned the null-termination of arguments. A null byte is difficult to create with the keyboard, right? It is straightforward with the hex notation. The result shows that a null byte actually terminates an argument byte sequence. The first argument arrives as an empty byte sequence, because it only contains a null byte.
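The effect of null-termination can also be reproduced without a shell. The following sketch uses Python's ctypes module as a stand-in for a C char array; it illustrates how C-style string reading stops at the first null byte and makes no claim about how bash handles such input internally. The helper name as_c_string is hypothetical.

```python
import ctypes


def as_c_string(data):
    """Return what C code sees when it reads `data` as a C string."""
    buffer = ctypes.create_string_buffer(data)  # appends a trailing null byte
    return buffer.value  # reads up to, not including, the first null byte


if __name__ == "__main__":
    print(as_c_string(b"ab\x00cd"))  # everything after the null byte is cut off
    print(as_c_string(b"\x00"))      # a lone null byte yields an empty sequence
```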
We could have defined the input with a different notation representing the same byte sequence and would have gotten the same output. The script itself cannot make sense of the byte sequence without us defining how to interpret this data. As I said earlier, these three bytes are the UTF-8 encoded form of a smiley. In order to make sense of this data, the Python script needs to decode it. A modified version of the script does this in two steps. It first prints the raw representation of the byte string via repr (the same as before). Secondly, it decodes the data using the explicitly defined codec UTF-8. This leads to a unicode data type containing a certain code point representing a character.
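The decoding step can be sketched in Python 3 as follows. The three bytes used here (the UTF-8 form of U+263A) are an assumption chosen for illustration and may differ from the smiley used in this article; the function name describe is hypothetical as well.

```python
import unicodedata


def describe(raw):
    """Decode UTF-8 bytes and describe each resulting code point."""
    text = raw.decode("utf-8")
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    smiley = b"\xe2\x98\xba"  # assumed example bytes, not taken from the article
    print(repr(smiley))       # raw representation of the byte string
    print(describe(smiley))   # [('U+263A', 'WHITE SMILING FACE')]
```

Decoding with an explicitly named codec is the important design choice: the bytes alone do not say how they are meant to be read.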
The code point and its official Unicode name may not ring a bell for you, but they are the abstract and unambiguous description of our character. So when Python prints this unicode data type, it actually encodes it in the encoding expected by the terminal.
The terminal then decodes it again and displays the character (if the terminal font has a glyph for it).
I hope the article made clear that command line arguments are nothing but byte sequences (with certain limitations) that deserve proper interpretation in the receiving program.
At this point, Python 2 and 3 behave quite differently.
I guess you are mostly thinking in terms of interfaces across network boundaries. There, you are right. In my opinion, however, command line interfaces do not compete at all with the interfaces you were referring to. Overall, the command line interface is one of the most fundamental classes of interfaces, even in modern IT.
Just think about how the web server and the web application that actually provide the APIs you were referring to have been invoked on your Linux server.
An argument may contain arbitrary binary data, with certain limitations
A command line argument is nothing but a sequence of bytes. These bytes are raw binary data that may mean anything. It is up to the receiving program to make sense of these bytes (to decode them into something meaningful).
Any kind of binary data can be provided within the byte sequence of a command line argument. However, there is one important exception: the null byte 0x00. It always terminates the byte sequence.
If 0x00 is the first or the only byte in the sequence, then the sequence is considered empty. Since argv data is initially and entirely written to the stack of the new program, the total amount of data that may be provided is limited by the operating system. These limits are defined in the system header files.
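The limit that applies on a given machine can be queried at run time. The sketch below assumes a Unix-like system where Python exposes the POSIX SC_ARG_MAX value through os.sysconf; that value covers the combined size of arguments and environment, and the function name max_argument_bytes is made up for this example.

```python
import os


def max_argument_bytes():
    """Return the system limit for argument plus environment data, in bytes."""
    return os.sysconf("SC_ARG_MAX")


if __name__ == "__main__":
    print(max_argument_bytes())
```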