# Project notes

cf. [Bash Reference Manual](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html)

Comparative testing with bash should be done with bash --norc.

Where regular bash and POSIX-mode bash differ, we follow regular bash.

## Ideas for testing

* use prysk or shellspec with shell=./minishell

### Prysk

Seems like it would work, but only after we implement execution of simple commands and the $? variable, since prysk (and cram as well) rely on it to collect the exit status.

For it to work, the shell must be able to execute the following command:

```shell
$ echo PRYSK12345 2 $?
PRYSK12345 2 0
```

## Useful resources

[Bash Reference Manual](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html)

[Internal parsing & flow](https://mail.gnu.org/archive/html/help-bash/2014-01/msg00000.html)

[Python parser for bash](https://github.com/idank/bashlex)

[The Stages of Word Expansion](https://www.gnu.org/software/libc/manual/html_node/Expansion-Stages.html)

[Diagram of bash parse flow](https://web.archive.org/web/20160308045823/https://stuff.lhunath.com/parser.png)

[The Bash Parser](http://mywiki.wooledge.org/BashParser)

[The Architecture of Open Source Applications (Volume 1) The Bourne-Again Shell](https://aosabook.org/en/v1/bash.html)

[IEEE Open Group Base Specification Issue 7: Shell Command Language](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_01)

[The Bash Hackers Wiki: Basic grammar rules of Bash (see also other pages on this website)](https://bash-hackers.gabe565.com/syntax/basicgrammar/)

## Shell Operation

cf. 3.1.1 Shell Operation

Reads the input. When reading input in non-interactive mode, it MUST do so one character at a time, so that it does not accidentally read a character that is intended for a called program.

Breaks the input into **words** and **operators**, obeying the *Quoting Rules*. These tokens are delimited by **metacharacters**.

Parses the tokens into simple and compound commands (see *Shell Commands*).

Performs the various _Shell Expansions_, breaking the expanded tokens into lists of filenames and commands and arguments.

Performs any necessary _redirections_ and removes the **redirection operators** and their operands from the argument list.

Executes the command (see _Command Execution_).

Waits for the command to complete and collects its exit status (see _Exit Status_).

### Quoting Rules

cf. 3.1.2 Quoting

Quoting escapes metacharacters.

The quoting mechanisms we have to implement are:

cf. Subject

* Single quotes, which prevent metacharacter interpretation.
* Double quotes, which prevent metacharacter interpretation except for '$' (see _Shell Parameter Expansion_).

In the Bash Reference Manual, these are defined as follows (keeping only the parts we have to implement):

cf. 3.1.2.2 Single Quotes

Preserves the literal value of each character within the quotes.

cf. 3.1.2.3 Double Quotes

Preserves the literal value of all characters within the quotes, with the exception of '$'.

Per the subject: minishell should not interpret unclosed quotes.

### Shell Commands

cf. 3.2 Shell Commands

A Shell Command may be either a *Simple Command*, a *Pipeline*, a *List of Commands* (composed of one or more *Pipelines*), or a *Grouped Command* (composed of one or more *Lists of Commands*).

#### Simple Commands

cf. 3.2.2 Simple Commands

It’s just a sequence of words separated by **blanks**, terminated by one of the shell’s **control operators**.

The first **word** specifies a command to be executed, with the rest of the **words** being that command’s arguments.

The return status (see _Exit Status_) of a simple command is its exit status as provided by the POSIX 1003.1 waitpid function, or 128+n if the command was terminated by signal n.

#### Pipelines

cf. 3.2.3 Pipelines

A pipeline is a sequence of one or more commands separated by the control operator '|'.
The output of each command in the pipeline is connected via a pipe to the input of the next command. That is, each command reads the previous command’s output. This connection is performed before any redirections specified by the command itself.

The shell waits for all commands in the pipeline to complete before reading the next command.

Each command in a multi-command pipeline, where pipes are created, is executed in its own _subshell_, which is a separate process.

e.g.

```shell
export TT=1 | echo $TT
```

prints an empty string, because TT is unset in the second subshell.

The exit status of a pipeline is the exit status of the last command in the pipeline.

The shell waits for all commands in the pipeline to terminate before returning a value.

#### Lists of Commands

cf. 3.2.4 Lists of Commands

A list is a sequence of one or more pipelines separated by one of the **operators** ‘&&’ or ‘||’, and optionally terminated by a newline.

AND and OR lists are sequences of one or more pipelines separated by the control operators ‘&&’ and ‘||’, respectively. AND and OR lists are executed with left associativity.

e.g.

```shell
A && B && C
```

is the same as

```shell
(A && B) && C
```

An AND list has the form

```shell
A && B
```

B is executed if and only if A has an exit status of 0 (success).

An OR list has the form

```shell
A || B
```

B is executed if and only if A has a non-zero exit status (failure).

The return status of AND and OR lists is the exit status of the last command executed in the list.

#### Group of Commands

cf. 3.2.5 Compound Commands

Each group begins with the **control operator** '(' and ends with the **control operator** ')'.

Any redirections associated with a _group of commands_ apply to all commands within that _group of commands_ unless explicitly overridden.

cf. 3.2.5.3 Grouping Commands

When commands are grouped, redirections may be applied to the entire command list.
For example, the output of all the commands in the list may be redirected to a single stream.

( LIST )

The parentheses are operators, and are recognized as separate tokens by the shell even if they are not separated from the LIST by whitespace.

Placing a list of commands between parentheses forces the shell to create a _subshell_, and each of the commands in LIST is executed in that subshell environment. Since the LIST is executed in a subshell, variable assignments do not remain in effect after the subshell completes.

The exit status of this construct is the exit status of LIST.

```c
/* enum e_cmdgroup_item_type, struct s_cmdlist and struct s_redirects
** are assumed to be defined elsewhere. */
struct	s_cmdgroup_item;

/* Defined first: it only holds a pointer to its items, so the item
** type below can embed a group by value. */
typedef struct s_cmdgroup
{
	int						item_num;
	struct s_cmdgroup_item	*items;
	struct s_redirects		*redirections;
}	t_cmdgroup;

/* An item is either a nested group or a list of commands. */
typedef union u_cmdgroup_item_inner
{
	struct s_cmdgroup	cmdgroup;
	struct s_cmdlist	cmdlist;
}	t_cmdgroup_item_inner;

typedef struct s_cmdgroup_item
{
	enum e_cmdgroup_item_type	type;
	union u_cmdgroup_item_inner	inner;
}	t_cmdgroup_item;
```

### Shell Expansion

cf. 3.5 Shell Expansions

Expansion is performed on the command line after it has been split into **tokens**.

There are seven kinds of expansion performed, in the following order:

* brace expansion
* tilde expansion
* parameter and variable expansion
* arithmetic expansion
* command substitution (left to right)
* word splitting
* filename expansion

We only have to implement the following kinds:

* parameter expansion
* word splitting
* filename expansion

After these expansions are performed, quote characters present in the original word are removed unless they have been quoted themselves ("_quote removal_").

Only brace expansion, word splitting, and filename expansion can increase the number of words of the expansion; other expansions expand a single word to a single word.

#### Shell Parameter Expansion

cf. 3.5.3 Shell Parameter Expansion

The '$' character introduces parameter expansion, command substitution, or arithmetic expansion.
The form is $VAR, where VAR may only contain the following characters:

* a-z
* A-Z
* _
* 0-9 (not as the first character)

Just noticed an interesting case:

```shell
bash-5.2$ VAR=hello # we set a shell variable (NOT environment variable)
bash-5.2$ VAR=hi env | grep VAR=; env | grep VAR= # here VAR is an environment variable, which is only valid for the next command (the second env returns nothing, confirming that it is not valid for that command)
VAR=hi
bash-5.2$ env | grep VAR= # var is not an environment variable
bash-5.2$ echo $VAR # but it is a shell variable
hello
```

Luckily for us, we don't have to handle shell variables, nor do we have to handle `VAR=value` or `VAR=value cmd`.

#### Word Splitting

cf. 3.5.7 Word Splitting

The shell scans the results of parameter expansion that did not occur within double quotes for word splitting.

The shell splits the results of the other expansions into **words**.

The shell treats the following characters as delimiters:

* (space)
* (tab)
* (newline)

Explicit null arguments ('""' or '''') are retained and passed to commands as empty strings. Unquoted implicit null arguments, resulting from the expansion of parameters that have no values, are removed.

If a parameter with no value is expanded within double quotes, a null argument results and is retained and passed to a command as an empty string.

When a quoted null argument appears as part of a word whose expansion is non-null, the null argument is removed. That is, the word '-d''' becomes '-d' after word splitting and null argument removal.

Note that if no expansion occurs, no splitting is performed.

#### Filename Expansion

cf. 3.5.8 Filename Expansion

Bash scans each word for the character '\*'.

If one of these characters appears, and is not quoted, then the word is regarded as a PATTERN, and replaced with an alphabetically sorted list of filenames matching the pattern (see: _Pattern Matching_).

If no matching filenames are found, the word is left unchanged.
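
The matching step could be sketched as below. This is a minimal illustration, not the project's implementation: `wildcard_match` is a hypothetical name, it assumes '\*' stands for any run of characters (including none), and it ignores quoting and hidden files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true when `name` matches `pattern`,
 * where '*' is the only special character. Quoting and hidden files
 * are deliberately not handled in this sketch. */
bool	wildcard_match(const char *pattern, const char *name)
{
	assert(pattern != NULL && name != NULL);
	/* An exhausted pattern only matches an exhausted name. */
	if (*pattern == '\0')
		return (*name == '\0');
	if (*pattern == '*')
	{
		/* Either '*' matches nothing and the rest of the pattern
		 * must match here... */
		if (wildcard_match(pattern + 1, name))
			return (true);
		/* ...or '*' swallows one more character of the name. */
		return (*name != '\0' && wildcard_match(pattern, name + 1));
	}
	/* Ordinary characters match themselves. */
	return (*pattern == *name && wildcard_match(pattern + 1, name + 1));
}
```

A caller would run this against every entry of the directory, sort the matches, and keep the original word when nothing matched. The recursion is simple to read but can backtrack heavily on patterns with many '\*'; an iterative matcher may be preferable in the real shell.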

When a pattern is used for filename expansion, the character '.' at the start of a filename or immediately following a slash must be matched explicitly.

In order to match the filenames '.' and '..', the pattern must begin with '.'.

When matching a filename, the slash character must always be matched explicitly by a slash in the pattern.

##### Pattern Matching

cf. 3.5.8.1 Pattern Matching

Any character that appears in a pattern, other than the special pattern characters described below, matches itself.

The NUL character may not occur in a pattern.

The special pattern characters have the following meanings:

'\*' Matches any string, including the null string.

The special pattern characters must be quoted if they are to be matched literally.

e.g. this is the required behaviour

```shell
bash-5.1$ ls *there
'hello*there' 'hi*there' noonethere
bash-5.1$ ls *'*'there
'hello*there' 'hi*there'
```

#### Quote Removal

cf. 3.5.9 Quote Removal

After the preceding expansions, all unquoted occurrences of the characters ''' and '"' that did not result from one of the above expansions are removed.

### Redirection

cf. 3.6 Redirections

Before a command is executed, its input and output may be "redirected" using a special notation interpreted by the shell.

"Redirection" allows commands' file handles to be made to refer to different files, and can change the files the command reads from and writes to.

The redirection operators may precede or appear anywhere within a simple command or may follow a command.

e.g. this is the correct behaviour

```shell
bash-5.1$ ls > hello.txt *here
bash-5.1$ cat hello.txt
hello*there
hi*there
noonethere
```

Redirections are processed in the order they appear, from left to right.

e.g.
this is the correct behaviour

```shell
bash-5.1$ ls > hello.txt share > here.txt *.txt
bash-5.1$ ls -l hello.txt here.txt
-rw-r--r-- 1 kcolin 2024_le-havre  0 Feb  7 15:54 hello.txt
-rw-r--r-- 1 kcolin 2024_le-havre 68 Feb  7 15:54 here.txt
bash-5.1$ cat here.txt
hello.txt
here.txt
log.txt
newlog-strict.txt
newlog.txt

share:
man
```

'<' refers to the standard input (fd 0, STDIN\_FILENO)

'>' refers to the standard output (fd 1, STDOUT\_FILENO)

The word following the redirection operator, unless the redirection operator is '<<', is subjected to parameter expansion, filename expansion, word splitting, and quote removal.

If it expands to more than one word, Bash reports an error.

This is the correct behaviour:

```shell
bash-5.1$ var="file1 file2"
bash-5.1$ echo "hello world" > $var
bash: $var: ambiguous redirect
```

If the variable is not defined, bash prints the following error:

```shell
bash-5.1$ echo "hello world" > $nonexist
bash: $nonexist: ambiguous redirect
```

Interesting cases:

* a command group must handle redirections at the scale of the group
* pipelines handle pipes
* simple commands handle their own file redirections
* group redirections may not appear at the start of the group; a here_doc placed there is still read, and the parse error is only reported *afterwards*

```shell
bash-5.2$ (echo hello | cat -e > outcat && echo hi) > outgroup
bash-5.2$ cat outgroup
hi
bash-5.2$ cat outcat
hello$
bash-5.2$ (echo hello | cat -e && echo hi) > outgroup2
bash-5.2$ cat outgroup2
hello$
hi
bash-5.2$ (echo hello > outhello | cat -e && echo hi) > outgroup3
bash-5.2$ cat outhello
hello
bash-5.2$ cat outgroup3
hi
bash-5.2$ > outgroup4 (echo hello > outhello | cat -e && echo hi)
bash: syntax error near unexpected token `('
bash-5.2$ echo bonjour > infile
bash-5.2$ < infile (cat - > outhello | cat -e && echo hi)
bash: syntax error near unexpected token `('
bash-5.2$ << EOF (cat - > outhello | cat -e && echo hi)
> hello
> EOF
bash: syntax error near unexpected token `('
bash-5.2$ (cat - > outhello | cat -e && echo hi) < infile
hi
bash-5.2$ (cat - > outhello | cat -e && echo hi) << EOF
> helllo
> EOF
hi
bash-5.2$ cat outhello
helllo
bash-5.2$ (echo coucou | echo hello) > outfile
bash-5.2$ cat outfile
hello
```

#### Here Documents

cf. Bash Reference Manual 3.6.6 Here Documents

This type of redirection instructs the shell to read input from the current source until a line containing only word (with no trailing blanks) is seen.

All of the lines read up to that point are then used as the standard input for a command.

No parameter and variable expansion, command substitution, arithmetic expansion, or filename expansion is performed on word.

If any part of word is quoted, the delimiter is the result of quote removal on word, and the lines in the here-document are not expanded.

If word is unquoted, all lines of the here-document are subjected to parameter expansion.

This is the correct behaviour for quoting and parameter expansion:

```shell
bash-5.2$ cat << EOF
> hello
> $$
> EOF
hello
1491742
bash-5.2$ cat << "EOF"
> hello
> $$
> EOF
hello
$$
bash-5.2$ cat << 'E'OF
> hello
> $$
> EOF
hello
$$
bash-5.2$ cat << $USER
> hello
> khais
> $USER
hello
khais
bash-5.2$ echo $USER
khais
bash-5.2$ cat << "$USER"
> $USER
```

The subject says \\ is not required, so we will not implement this behaviour:

```shell
bash-5.2$ cat << EOF
> hello \
world
> EOF
hello world
```

### Executing Commands

cf. 3.7 Executing Commands

#### Simple Command Execution

cf. 3.7.1 Simple Command Expansion

When a simple command is executed, the shell performs the following expansions, assignments, and redirections, from left to right, in the following order.

1. The words that the parser has marked as redirections are saved for later processing.
2. The words that are not redirections are expanded (see _Shell Expansions_). If any words remain after expansion, the first word is taken to be the name of the command and the remaining words are the arguments.
3. Redirections are performed as described above (see _Redirections_).
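
Step 3 could be sketched as follows. This is only an illustration under assumptions: `redirect_flags` and `redirect_fd` are hypothetical names, the target word is assumed to be already expanded to a single path, and '<<' is left out because it needs its own handling. The 0666 mode is passed to `open` so that the process umask decides the final permissions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: maps a redirection operator to open() flags.
 * Returns -1 for anything else ('<<' is handled separately). */
int	redirect_flags(const char *op)
{
	assert(op != NULL);
	if (strcmp(op, "<") == 0)
		return (O_RDONLY);
	if (strcmp(op, ">") == 0)
		return (O_WRONLY | O_CREAT | O_TRUNC);
	if (strcmp(op, ">>") == 0)
		return (O_WRONLY | O_CREAT | O_APPEND);
	return (-1);
}

/* Hypothetical helper: makes `target_fd` (0 for '<', 1 for '>' and
 * '>>') refer to `path`. Returns 0 on success, -1 on error. */
int	redirect_fd(int target_fd, const char *path, int flags)
{
	int	fd;

	assert(path != NULL);
	fd = open(path, flags, 0666);
	if (fd == -1)
		return (-1);
	if (fd == target_fd)
		return (0);
	if (dup2(fd, target_fd) == -1)
	{
		close(fd);
		return (-1);
	}
	/* The temporary descriptor is no longer needed once duplicated. */
	close(fd);
	return (0);
}
```

Calling `redirect_fd` once per saved redirection, in the order they were written, reproduces the left-to-right behaviour shown earlier: a later '>' on the same descriptor simply replaces the earlier one, after the earlier file has already been created or truncated.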

If no command name results, redirections are performed, but do not affect the current shell environment. A redirection error causes the command to exit with a non-zero status.

If there is a command name left after expansion, execution proceeds as described below. Otherwise, the command exits with a status of zero.

#### Command Search and Execution

cf. 3.7.2 Command Search and Execution

After a command has been split into words, if it results in a simple command and an optional list of arguments, the following actions are taken.

1. The shell searches for it in the list of shell builtins. If a match is found, that builtin is invoked.
2. If the name is not a builtin, and contains no slashes, Bash searches each element of '$PATH' for a directory containing an executable file by that name. If the search is unsuccessful, the shell prints an error message and returns an exit status of 127.
3. If the search is successful, or if the command name contains one or more slashes, the shell executes the named program in a separate execution environment. Argument 0 is set to the name given, and the remaining arguments to the command are set to the arguments supplied, if any.
4. If this execution fails because the file is not in executable format, and the file is not a directory, it is assumed to be a "shell script" and the shell executes it as described in _Shell Scripts_. NOTE: we may implement this later; it does not seem to be required.
5. The shell waits for the command to complete and collects its exit status.

#### Subshell

cf. 3.7.3 Command Execution Environment

The shell has an execution environment, which consists of the following:

* open files inherited by the shell at invocation, as modified by redirections
* the current working directory as set by cd or inherited by the shell at invocation
* shell variables, passed in the environment

A command invoked in this separate environment cannot affect the shell’s execution environment.
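
The isolation can be observed with a small experiment. The sketch below assumes the separate environment is created with `fork`; `child_cannot_change_parent` is a hypothetical name, and `setenv`/`getenv` are used only to keep the demonstration short, not because the project would use them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical experiment: sets `name` in the parent, lets a forked
 * child overwrite its own copy, then checks the parent's value.
 * Returns 1 if the parent is unaffected, 0 if it changed,
 * -1 on error. */
int	child_cannot_change_parent(const char *name)
{
	pid_t		pid;
	const char	*value;

	assert(name != NULL);
	if (setenv(name, "parent", 1) == -1)
		return (-1);
	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0)
	{
		/* Only the child's copy of the environment is modified. */
		setenv(name, "child", 1);
		_exit(0);
	}
	if (waitpid(pid, NULL, 0) == -1)
		return (-1);
	value = getenv(name);
	return (value != NULL && strcmp(value, "parent") == 0);
}
```

This is the same effect as in `export TT=1 | echo $TT`: the export happens in one child process, and neither the parent shell nor the other child ever sees it.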

A subshell is a copy of the shell process.

#### Environment

cf. 3.7.4 Environment

When a program is invoked it is given an array of strings called the "environment". This is a list of name-value pairs, of the form 'name=value'.

Bash provides several ways to manipulate the environment. On invocation, the shell scans its own environment and creates a parameter for each name found, automatically marking it for 'export' to child processes. Executed commands inherit the environment. The 'export' and 'unset' builtins allow parameters to be added to and deleted from the environment. If the value of a parameter in the environment is modified using the 'export' builtin, the new value becomes part of the environment, replacing the old. The environment inherited by any executed command consists of the shell's initial environment, whose values may be modified in the shell, less any pairs removed by the 'unset' builtin, plus any additions via the 'export' command.

#### Exit Status

cf. 3.7.5 Exit Status

The exit status of an executed command is the value returned by the 'waitpid' system call or equivalent function. Exit statuses fall between 0 and 255, though, as explained below, the shell may use values above 125 specially. Exit statuses from shell builtins and compound commands are also limited to this range. Under certain circumstances, the shell will use special values to indicate specific failure modes.

For the shell's purposes, a command which exits with a zero exit status has succeeded. A non-zero exit status indicates failure. This seemingly counter-intuitive scheme is used so there is one well-defined way to indicate success and a variety of ways to indicate various failure modes. When a command terminates on a fatal signal whose number is N, Bash uses the value 128+N as the exit status.

If a command is not found, the child process created to execute it returns a status of 127. If a command is found but is not executable, the return status is 126.
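
These three rules could be combined as in the sketch below. The names `status_to_exit_code` and `run_and_wait` are hypothetical. `execvp` is used only to keep the example short (the project would do its own PATH search), and mapping `ENOENT` to 127 and every other exec failure to 126 is an assumption made for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Converts a waitpid() status into the value a shell would expose
 * through $?. */
int	status_to_exit_code(int wstatus)
{
	if (WIFEXITED(wstatus))
		return (WEXITSTATUS(wstatus));
	if (WIFSIGNALED(wstatus))
		return (128 + WTERMSIG(wstatus));
	/* Stopped or continued children are not reported without
	 * extra waitpid() options, so this is only a fallback. */
	return (1);
}

/* Hypothetical helper: runs argv in a child process and returns its
 * shell-style exit status. */
int	run_and_wait(char *const argv[])
{
	pid_t	pid;
	int		wstatus;

	assert(argv != NULL && argv[0] != NULL);
	pid = fork();
	if (pid == -1)
		return (1);
	if (pid == 0)
	{
		execvp(argv[0], argv);
		/* Only reached when the exec itself failed. */
		if (errno == ENOENT)
			_exit(127);
		_exit(126);
	}
	/* Retry when the wait is interrupted by a signal. */
	while (waitpid(pid, &wstatus, 0) == -1)
	{
		if (errno != EINTR)
			return (1);
	}
	return (status_to_exit_code(wstatus));
}
```

Note that 127 and 126 are produced by the child itself after a failed exec, which is why the parent needs no special case for them: they come back through `WEXITSTATUS` like any other exit code.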

If a command fails because of an error during expansion or redirection, the exit status is greater than zero.

All of the Bash builtins return an exit status of zero if they succeed and a non-zero status on failure, so they may be used by the conditional and list constructs. All builtins return an exit status of 2 to indicate incorrect usage, generally invalid options or missing arguments.

The exit status of the last command is available in the special parameter $?.

#### Signals

cf. 3.7.6 Signals

When Bash is interactive, it ignores 'SIGTERM' (so that 'kill 0' does not kill an interactive shell), and 'SIGINT' is caught and handled. When Bash receives a 'SIGINT', it breaks out of any executing loops. In all cases, Bash ignores 'SIGQUIT'.

Bash ignores 'SIGTTIN', 'SIGTTOU', and 'SIGTSTP'.

NOTE: The rules for when ^C is printed seem strange; investigate further once we implement this.

TODO: investigate this further, this seems very complicated.

## Definitions

cf. [Bash Reference Manual](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Definitions)

cf. 2 Definitions

**token**
A sequence of characters considered a single unit by the shell. It is either a word or an operator.

**word**
A sequence of characters treated as a unit by the shell. Words may not include unquoted metacharacters.

**operator**
A **control operator** or a **redirection operator**. Operators contain at least one unquoted **metacharacter**.

**control operator**
A token that performs a control function. It is a newline or one of the following: '|', ‘||’, ‘&&’, ‘(’, or ‘)’.

**redirection operator**
For our project:

* '<' redirects input
* '>' redirects output
* '<<' is here_doc with delimiter. The delimiter is a **word**. Does not have to update history
* '>>' redirects output in append mode

**blank**
A space or tab character.
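
The operator definitions above translate fairly directly into a recognizer for the tokenizer. The following is a sketch under assumptions: `operator_length` is a hypothetical name, and the caller is assumed to invoke it only on unquoted input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of the control or
 * redirection operator starting at `s`, or 0 if there is none.
 * Two-character operators are tried first so that "||" is not read
 * as two separate '|' tokens. */
size_t	operator_length(const char *s)
{
	static const char *const	two_char[] = {"||", "&&", "<<", ">>"};
	size_t						i;

	assert(s != NULL);
	i = 0;
	while (i < sizeof(two_char) / sizeof(two_char[0]))
	{
		if (strncmp(s, two_char[i], 2) == 0)
			return (2);
		i++;
	}
	/* The explicit '\0' check matters: strchr() would otherwise
	 * match the terminator of the set string. */
	if (*s != '\0' && strchr("|()<>\n", *s) != NULL)
		return (1);
	return (0);
}
```

A lone '&' returns 0 here because it is not among the operators listed for this project; how the shell should report it is left open.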