Merge branch 'notes'

Khaïs COLIN 2025-02-12 17:39:37 +01:00
commit 394a923fdc
Signed by: logistic-bot
SSH key fingerprint: SHA256:RlpiqKeXpcPFZZ4y9Ou4xi2M8OhRJovIwDlbCaMsuAo
3 changed files with 551 additions and 26 deletions

NOTES.md

@ -2,7 +2,9 @@
cf. [Bash Reference Manual](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html)
Comparative testing with bash should be done with `bash --norc`, not `bash --norc --posix`.
In case of difference between regular bash and posix bash, we decide to follow regular bash.
## Ideas for testing
* use prysk or shellspec with shell=./minishell
@ -16,6 +18,22 @@ $ echo PRYSK12345 2 $?
PRYSK12345 2 0
```
## Useful resources
* [Bash Reference Manual](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html)
* [Internal parsing & flow](https://mail.gnu.org/archive/html/help-bash/2014-01/msg00000.html)
* [Python parser for bash](https://github.com/idank/bashlex)
* [The Stages of Word Expansion](https://www.gnu.org/software/libc/manual/html_node/Expansion-Stages.html)
* [Diagram of bash parse flow](https://web.archive.org/web/20160308045823/https://stuff.lhunath.com/parser.png)
* [The Bash Parser](http://mywiki.wooledge.org/BashParser)
* [The Architecture of Open Source Applications (Volume 1) The Bourne-Again Shell](https://aosabook.org/en/v1/bash.html)
## Shell Operation
cf. 3.1.1 Shell Operation
@ -28,6 +46,13 @@ Parses the tokens into simple and compound commands (see *Shell Commands*)
Performs the various _Shell Expansions_, breaking the expanded tokens into lists
of filenames and commands and arguments.
Performs any necessary _redirections_ and removes the **redirection operators**
and their operands from the argument list.
Executes the command (see _Command Execution_);
Waits for the command to complete and collects its exit status (see _Exit Status_).
### Quoting Rules
cf. 3.1.2 Quoting
@ -52,9 +77,6 @@ cf. 3.1.2.3 Double Quotes
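Preserves the literal value of all characters within the quotes, with the exception of '$'. (Bash also treats `` ` `` and `\` specially inside double quotes, but minishell only has to handle `$`.)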
TODO: The special parameters * and @ have special meaning when in double quotes (see Shell Parameter Expansion).
See if we have to handle this
Per the subject: minishell should not interpret unclosed quotes
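A quick sanity check of the single versus double quote rules. It is written as a small script that should behave the same under `bash --norc` and a POSIX `sh`. The variable name `greeting` is only for illustration.
```shell
greeting=hello
single='$greeting world'   # single quotes: every character is literal
double="$greeting world"   # double quotes: '$' still expands
mixed="it's $greeting"     # a single quote is an ordinary character inside double quotes
printf '%s\n' "$single"    # prints: $greeting world
printf '%s\n' "$double"    # prints: hello world
printf '%s\n' "$mixed"     # prints: it's hello
```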
### Shell Commands
@ -217,6 +239,17 @@ The form is $VAR, where VAR may only contain the following characters:
* _
* 0-9 (not in the first character)
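A sketch of the name check under these rules, written as a shell function. `is_valid_name` is a hypothetical helper, not something bash provides. It assumes the full allowed set is letters, digits and `_`, as in bash.
```shell
# is_valid_name NAME: succeed if NAME may follow '$' as a variable name:
# only letters, digits and '_', and not starting with a digit.
is_valid_name() {
    case $1 in
        '' | [0-9]* | *[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}
for candidate in HOME _tmp var2 2var my-var; do
    if is_valid_name "$candidate"; then
        printf '%s: valid\n' "$candidate"
    else
        printf '%s: invalid\n' "$candidate"
    fi
done
```
For the lexer, the same test tells us where `$VAR` stops: at the first character that would make the name invalid.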
Just noticed an interesting case:
```shell
bash-5.2$ VAR=hello # we set a shell variable (NOT environment variable)
bash-5.2$ VAR=hi env | grep VAR=; env | grep VAR= # here VAR is an environment variable, which is only valid for the next command (the second env returns nothing, confirming that it is not valid for that command)
VAR=hi
bash-5.2$ env | grep VAR= # VAR is not an environment variable
bash-5.2$ echo $VAR # but it is a shell variable
hello
```
Luckily for us, we don't have to handle shell variables, nor do we have to handle `VAR=value` or `VAR=value cmd`.
#### Word Splitting
cf. 3.5.7 Word Splitting
@ -282,12 +315,183 @@ bash-5.1$ ls *'*'there
#### Quote Removal
cf. 3.5.9 Quote Removal
After the preceding expansions, all unquoted occurrences of the
characters ''' and '"' that did not result from one of the above
expansions are removed.
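A minimal demonstration of the rule, runnable in bash or a POSIX `sh`. Quotes typed on the command line are removed. Quote characters that come out of a parameter expansion survive as ordinary characters.
```shell
set -- "hel"'lo'      # two quoted parts typed by the user form one word
typed=$1
quotes='"x"'          # the value itself contains double-quote characters
set -- $quotes        # unquoted expansion: split into words, but no quote removal
expanded=$1
printf '%s\n' "$typed"      # prints: hello
printf '%s\n' "$expanded"   # prints: "x"
```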
## Subshell
### Redirection
cf. 3.6 Redirections
Before a command is executed, its input and output may be "redirected" using a
special notation interpreted by the shell. "Redirection" allows commands' file
handles to be made to refer to different files, and can change the files the
command reads from and writes to.
The redirection operators may precede or appear anywhere within a simple command
or may follow a command.
e.g. this is the correct behaviour
```shell
bash-5.1$ ls > hello.txt *here
bash-5.1$ cat hello.txt
hello*there
hi*there
noonethere
```
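The same rule on a smaller scale, run in a throwaway directory created with `mktemp -d`. This is an assumption: `mktemp` is not POSIX, but GNU and BSD systems ship it. The three commands below all write `a b`.
```shell
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
> first.txt echo a b        # redirection before the command name
echo a > second.txt b       # redirection between the arguments
echo a b > third.txt        # redirection after the arguments
one=$(cat first.txt)
two=$(cat second.txt)
three=$(cat third.txt)
cd / && rm -rf "$dir"
printf '%s\n' "$one" "$two" "$three"   # prints "a b" three times
```
For the parser, this means redirections can be pulled out of the word list wherever they occur. The remaining words keep their order.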
Redirections are processed in the order they appear, from left to right.
e.g. this is the correct behaviour
```shell
bash-5.1$ ls > hello.txt share > here.txt *.txt
bash-5.1$ ls -l hello.txt here.txt
-rw-r--r-- 1 kcolin 2024_le-havre 0 Feb 7 15:54 hello.txt
-rw-r--r-- 1 kcolin 2024_le-havre 68 Feb 7 15:54 here.txt
bash-5.1$ cat here.txt
hello.txt
here.txt
log.txt
newlog-strict.txt
newlog.txt

share:
man
```
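A reduced version of the example above, assuming `mktemp -d` is available. Both files are created (and truncated) in order, but only the last redirection stays attached to stdout.
```shell
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
echo hello > a.txt > b.txt   # a.txt is opened first, then stdout is moved to b.txt
if [ -s a.txt ]; then a_state=non-empty; else a_state=empty; fi
b_text=$(cat b.txt)
cd / && rm -rf "$dir"
printf 'a.txt: %s, b.txt: %s\n' "$a_state" "$b_text"   # prints: a.txt: empty, b.txt: hello
```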
'<' refers to the standard input (fd 0, STDIN\_FILENO)
'>' refers to the standard output (fd 1, STDOUT\_FILENO)
The word following the redirection operator, unless the redirection operator is
'<<', is subjected to parameter expansion, filename expansion, word splitting,
and quote removal.
If it expands to more than one word, Bash reports an error.
This is the correct behaviour:
```shell
bash-5.1$ var="file1 file2"
bash-5.1$ echo "hello world" > $var
bash: $var: ambiguous redirect
```
If the variable is not defined, it expands to no word at all, which bash also reports as an ambiguous redirect:
```shell
bash-5.1$ echo "hello world" > $nonexist
bash: $nonexist: ambiguous redirect
```
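The two cases above as a script that checks the exit status and confirms that no file gets created. It calls `bash --norc -c` explicitly, since the ambiguous-redirect error is bash behaviour and another `sh` may treat the target word differently. It also assumes bash is on `$PATH` and `mktemp -d` is available.
```shell
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
# Two words after expansion: bash refuses the redirection.
st_two_words=0
bash --norc -c 'var="file1 file2"; echo "hello world" > $var' 2>/dev/null || st_two_words=$?
# Zero words after expansion (unset variable): same error.
st_unset=0
bash --norc -c 'unset nonexist; echo "hello world" > $nonexist' 2>/dev/null || st_unset=$?
files=$(ls)    # expected to be empty: neither failed redirection created a file
cd / && rm -rf "$dir"
printf 'two words: %s, unset: %s, files: [%s]\n' "$st_two_words" "$st_unset" "$files"
```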
#### Here Documents
cf. Bash Reference Manual 3.6.6 Here Documents
This type of redirection instructs the shell to read input from the current
source until a line containing only word (with no trailing blanks) is seen. All
of the lines read up to that point are then used as the standard input for a
command.
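A sketch of the reading loop described above, written as a shell function. `read_heredoc` is a hypothetical name. It only collects the lines and does not expand anything (expansion is covered below). `IFS= read -r` keeps leading and trailing blanks, so only a line that is exactly the delimiter ends the document.
```shell
# read_heredoc DELIM: copy stdin to stdout up to (not including) the first
# line that is exactly DELIM. Returns 1 if the input ends before the delimiter.
read_heredoc() {
    while IFS= read -r line; do
        [ "$line" = "$1" ] && return 0
        printf '%s\n' "$line"
    done
    return 1
}
# '  EOF' and 'EOF ' have extra blanks, so they do not end the document.
body=$(printf 'hello\n  EOF\nEOF \n$USER\nEOF\nnot read\n' | read_heredoc EOF)
printf '%s\n' "$body"
```
In minishell, the collected text would then become the command's standard input, for example through a pipe or a temporary file. That choice is still open here.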
No parameter and variable expansion, command substitution, arithmetic expansion,
or filename expansion is performed on word.
If any part of word is quoted, the delimiter is the result of quote removal on
word, and the lines in the here-document are not expanded. If word is unquoted,
all lines of the here-document are subjected to parameter expansion.
This is the correct behaviour for quoting and parameter expansion:
```shell
bash-5.2$ cat << EOF
> hello
> $$
> EOF
hello
1491742
bash-5.2$ cat << "EOF"
> hello
> $$
> EOF
hello
$$
bash-5.2$ cat << 'E'OF
> hello
> $$
> EOF
hello
$$
bash-5.2$ cat << $USER
> hello
> khais
> $USER
hello
khais
bash-5.2$ echo $USER
khais
bash-5.2$ cat << "$USER"
> $USER
```
The subject says `\` is not required, so we will not implement this behaviour:
```shell
bash-5.2$ cat << EOF
> hello \
world
> EOF
hello world
```
### Executing Commands
cf. 3.7 Executing Commands
#### Simple Command Expansion
cf. 3.7.1 Simple Command Expansion
When a simple command is executed, the shell performs the following
expansions, assignments, and redirections, from left to right, in the
following order.
1. The words that the parser has marked as redirections are saved for later
processing.
2. The words that are not redirections are expanded (see _Shell Expansions_).
If any words remain after expansion, the first word is taken to be the name
of the command and the remaining words are the arguments.
3. Redirections are performed as described above (see _Redirections_).
If no command name results, redirections are performed, but do not affect the
current shell environment. A redirection error causes the command to exit with
a non-zero status.
If there is a command name left after expansion, execution proceeds as described
below. Otherwise, the command exits with a status of zero.
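A check of the "no command name" case, assuming a POSIX `sh` and `mktemp -d`. An unquoted empty variable expands to no words, so only the redirection runs and the status is zero. A failing redirection gives a non-zero status.
```shell
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
empty=
$empty > created.txt        # no command name left: only the redirection is performed
st_empty=$?
if [ -f created.txt ]; then created=yes; else created=no; fi
st_bad=0
sh -c '> missing-dir/file.txt' 2>/dev/null || st_bad=$?   # redirection error
cd / && rm -rf "$dir"
printf 'created: %s, status: %s, failing redirection: %s\n' "$created" "$st_empty" "$st_bad"
```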
#### Command Search and Execution
cf. 3.7.2 Command Search and Execution
After a command has been split into words, if it results in a simple command and
an optional list of arguments, the following actions are taken.
1. The shell searches for it in the list of shell builtins. If a match is
found, that builtin is invoked.
2. If the name is not a builtin, and contains no slashes, Bash searches each
element of '$PATH' for a directory containing an executable file by that
name. If the search is unsuccessful, the shell prints an error message and
returns an exit status of 127.
3. If the search is successful, or if the command name contains one or more
slashes, the shell executes the named program in a separate execution
environment. Argument 0 is set to the name given, and the remaining
arguments to the command are set to the arguments supplied, if any.
4. If this execution fails because the file is not in executable format, and
the file is not a directory, it is assumed to be a "shell script" and the
shell executes it as described in _Shell Scripts_.
NOTE: we will _maybe_ implement this; it does not seem to be required.
5. The shell waits for the command to complete and collects its exit status.
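Steps 2 and 3 can be sketched as a shell function that walks `$PATH` the way the manual describes. `find_command` is a hypothetical helper. It skips the builtin lookup of step 1, does not treat a trailing `:` in `$PATH` as the current directory, and returns 127 like a failed search.
```shell
# find_command NAME: print what would be executed for NAME (builtins ignored).
# The body runs in a subshell ( ... ), so IFS and set -f do not leak out.
find_command() (
    case $1 in
        */*) printf '%s\n' "$1"; exit 0 ;;    # a slash disables the PATH search
    esac
    set -f      # no filename expansion on the PATH elements
    IFS=:       # split $PATH on ':' only
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.                 # empty element: current directory
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            exit 0
        fi
    done
    exit 127    # search failed: "command not found"
)
find_command sh                      # a path such as /usr/bin/sh, depending on $PATH
find_command ./script.sh             # printed as-is: the name contains a slash
find_command no-such-command-xyz || echo "not found, status $?"
```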
#### Subshell
cf. 3.7.3 Command Execution Environment
The shell has an execution environment, which consists of the following:
@ -303,30 +507,66 @@ execution environment.
A subshell is a copy of the shell process.
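A small illustration of "a copy of the shell process". The parenthesised list below runs in a subshell and inherits `count` and the working directory. Its changes do not reach the parent.
```shell
start_pwd=$PWD
count=1
(
    count=2          # changes the subshell's copy only
    cd / || exit 1   # so does changing directory
    printf 'inside:  count=%s pwd=%s\n' "$count" "$PWD"
)
printf 'outside: count=%s pwd=%s\n' "$count" "$PWD"   # count is still 1
```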
#### Environment
cf. 3.7.4 Environment
When a program is invoked it is given an array of strings called the
"environment". This is a list of name-value pairs, of the form 'name=value'.
TODO: The following paragraph may not apply fully to our project, check it again!
Bash provides several ways to manipulate the environment. On invocation, the
shell scans its own environment and creates a parameter for each name found,
automatically marking it for 'export' to child processes. Executed commands
inherit the environment. The 'export' and 'unset' builtins allow parameters to
be added to and deleted from the environment. If the value of a parameter in
the environment is modified using the 'export' builtin, the new value becomes
part of the environment, replacing the old. The environment inherited by any
executed command consists of the shell's initial environment, whose values may
be modified in the shell, less any pairs removed by the 'unset' builtin, plus
any additions via the 'export' command.
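A sketch of the `export`/`unset` rules, using `sh -c` as the executed command. `MY_VAR` is an arbitrary name, and the block unsets it first.
```shell
unset MY_VAR                       # start from a clean state
MY_VAR=hello                       # shell variable only, not exported
before=$(sh -c 'printf "%s" "${MY_VAR-<not set>}"')
export MY_VAR                      # now part of the children's environment
exported=$(sh -c 'printf "%s" "${MY_VAR-<not set>}"')
unset MY_VAR                       # removed from the shell and from the environment
after=$(sh -c 'printf "%s" "${MY_VAR-<not set>}"')
printf '%s | %s | %s\n' "$before" "$exported" "$after"   # prints: <not set> | hello | <not set>
```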
No parameter and variable expansion, command substitution, arithmetic expansion,
or filename expansion is performed on word. If any part of word is quoted, the
delimiter is the result of quote removal on word, and the lines in the
here-document are not expanded. If word is unquoted, all lines of the
here-document are subjected to parameter expansion, command substitution, and
arithmetic expansion, the character sequence \newline is ignored, and \ must
be used to quote the characters \, $, and `.
#### Exit Status
cf. 3.7.5 Exit Status
## Signal handling
The exit status of an executed command is the value returned by the 'waitpid'
system call or equivalent function. Exit statuses fall between 0 and 255,
though, as explained below, the shell may use values above 125 specially. Exit
statuses from shell builtins and compound commands are also limited to this
range. Under certain circumstances, the shell will use special values to
indicate specific failure modes.
cf. 6.12 Shell Compatibility Mode => compat32
For the shell's purposes, a command which exits with a zero exit status has
succeeded. A non-zero exit status indicates failure. This seemingly
counter-intuitive scheme is used so there is one well-defined way to indicate
success and a variety of ways to indicate various failure modes. When a command
terminates on a fatal signal whose number is N, Bash uses the value 128+N as the
exit status.
interrupting a command list such as "a; b; c" causes the execution of the entire
list to be aborted.
If a command is not found, the child process created to execute it returns a
status of 127. If a command is found but is not executable, the return status
is 126.
If a command fails because of an error during expansion or redirection, the exit
status is greater than zero.
All of the Bash builtins return an exit status of zero if they succeed and a
non-zero status on failure, so they may be used by the conditional and list
constructs. All builtins return an exit status of 2 to indicate incorrect
usage, generally invalid options or missing arguments.
The exit status of the last command is available in the special parameter $?.
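The three special statuses above can be reproduced from a script. This assumes a POSIX `sh`, `mktemp -d`, and that SIGTERM is signal 15 (true on Linux and macOS), so a shell killed by it is reported as 128 + 15 = 143.
```shell
dir=$(mktemp -d) || exit 1
st_signal=0
sh -c 'kill -TERM $$' || st_signal=$?                 # killed by signal 15: 128 + 15
st_not_found=0
no-such-command-xyz 2>/dev/null || st_not_found=$?    # not found: 127
printf 'echo hi\n' > "$dir/not-exec.sh"
chmod 644 "$dir/not-exec.sh"                          # readable but not executable
st_not_exec=0
"$dir/not-exec.sh" 2>/dev/null || st_not_exec=$?      # found but not executable: 126
rm -rf "$dir"
printf 'signal: %s, not found: %s, not executable: %s\n' "$st_signal" "$st_not_found" "$st_not_exec"
# prints: signal: 143, not found: 127, not executable: 126
```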
#### Signals
cf. 3.7.6 Signals
When Bash is interactive, it ignores 'SIGTERM' (so that 'kill 0' does not kill
an interactive shell), and 'SIGINT' is caught and handled. When Bash receives a
'SIGINT', it breaks out of any executing loops. In all cases, Bash ignores
'SIGQUIT'. Bash ignores 'SIGTTIN', 'SIGTTOU', and 'SIGTSTP'.
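A minimal illustration of an ignored signal, assuming a POSIX `sh`. The child shell ignores SIGTERM and SIGQUIT with `trap ''`, similar to interactive bash. When it sends itself SIGTERM, the signal is discarded and the next command still runs.
```shell
# Ignore TERM and QUIT in a child shell, then send it SIGTERM: nothing happens.
out=$(sh -c 'trap "" TERM QUIT; kill -TERM $$; echo still running')
printf '%s\n' "$out"   # prints: still running
```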
NOTE: When exactly bash prints ^C seems strange; investigate further once we
implement this.
TODO: investigate this further; it seems very complicated.
## Definitions
cf. [Bash Reference Manual](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Definitions)