mirror of
https://codeberg.org/la-chouette/minishell.git
synced 2025-12-06 07:28:09 +01:00
# Project Notes

cf. [Bash Reference Manual](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html)

Comparative testing with bash should be done with `bash --norc`.

Where regular bash and POSIX-mode bash differ, we follow regular bash.

## Ideas for testing

* Use prysk or shellspec with `shell=./minishell`.

## Shell Operation

cf. 3.1.1 Shell Operation

The shell:

1. Breaks the input into **words** and **operators**, obeying the *Quoting Rules*.
   These tokens are delimited by **metacharacters**.
2. Parses the tokens into simple and compound commands (see *Shell Commands*).
3. Performs the various _Shell Expansions_, breaking the expanded tokens into
   lists of filenames, commands and arguments.
4. Performs any necessary _redirections_ and removes the **redirection operators**
   and their operands from the argument list.
5. Executes the command (see _Executing Commands_).

TODO: add missing operations

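This is a quick way to see the tokenizing step in action. It is a sketch to run with `bash --norc`, and it could later be pointed at minishell. `|` is a metacharacter, so it delimits tokens even with no blanks around it. A quoted `|` is just an ordinary part of a word. The variable names are illustrative.

```shell
# Assumes a POSIX sh such as bash; variable names are illustrative.
# '|' is a metacharacter: "echo hello|tr a-z A-Z" is tokenized as
# echo, hello, |, tr, a-z, A-Z although there are no blanks around '|'.
piped=$(echo hello|tr a-z A-Z)
echo "$piped"     # HELLO

# Inside quotes the same character is an ordinary part of a word.
quoted=$(echo 'hello|tr a-z A-Z')
echo "$quoted"    # hello|tr a-z A-Z
```
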
### Quoting Rules

cf. 3.1.2 Quoting

Quoting removes the special meaning of metacharacters.

The quoting mechanisms we have to implement are (cf. subject):

* Single quotes, which prevent the interpretation of all metacharacters.
* Double quotes, which prevent the interpretation of all metacharacters except
  `$` (see _Shell Parameter Expansion_).

In the Bash Reference Manual, these are defined as follows (keeping only the
parts we have to implement):

cf. 3.1.2.2 Single Quotes

Preserves the literal value of each character within the quotes.

cf. 3.1.2.3 Double Quotes

Preserves the literal value of all characters within the quotes, with the
exception of `$`.

Per the subject, minishell should not interpret unclosed quotes.

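This small comparison covers both quoting mechanisms and assumes bash or another POSIX shell. `name` is an illustrative variable. Inside single quotes nothing is special. Inside double quotes, only `$` still triggers an expansion.

```shell
# Assumes a POSIX sh such as bash; 'name' is an illustrative variable.
name=world
# Single quotes: '$', '|', '>' and '"' are all literal.
single=$(echo '$name | > "x"')
# Double quotes: '|', '>' and "'" are literal, but $name is expanded.
double=$(echo "$name | > 'x'")
echo "$single"    # $name | > "x"
echo "$double"    # world | > 'x'
```
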
### Shell Commands

cf. 3.2 Shell Commands

A Shell Command may be either a *Simple Command*, a *Pipeline*, a *List of
Commands* (composed of one or more *Pipelines*), or a *Grouped Command*
(a *List of Commands* enclosed in parentheses).

#### Simple Commands

cf. 3.2.2 Simple Commands

A simple command is a sequence of words separated by **blanks**, terminated by
one of the shell’s **control operators**.
The first **word** specifies the command to be executed, and the remaining
**words** are its arguments.

The return status (see _Exit Status_) of a simple command is its exit status as
provided by the POSIX 1003.1 `waitpid` function, or 128+n if the command was
terminated by signal n.

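You can observe the 128+n rule directly if a POSIX `sh` is available. The child shell below kills itself with signal 9 (`SIGKILL`), so the parent sees 137. The `|| status=$?` form only captures the status, so the snippet does not abort if it is run under `set -e`.

```shell
# Assumes a POSIX 'sh' in PATH; 'status' is an illustrative variable.
status=0
# $$ is expanded by the child sh, so the child sends SIGKILL to itself.
sh -c 'kill -9 $$' || status=$?
echo "$status"    # 137 = 128 + 9
```
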
#### Pipelines

cf. 3.2.3 Pipelines

A pipeline is a sequence of one or more commands separated by the control operator `|`.

The output of each command in the pipeline is connected via a pipe to the input
of the next command.
That is, each command reads the previous command’s output.
This connection is performed before any redirections specified by the commands
themselves, so an explicit redirection takes precedence over the pipe (e.g. in
`ls > out.txt | wc -l`, the output of `ls` goes to `out.txt`).

Each command in a multi-command pipeline, where pipes are created, is executed
in its own _subshell_, which is a separate process.

e.g.

```shell
export TT=1 | echo $TT
```

prints an empty line (assuming TT was not already set), because `export` runs
in the first subshell, so TT is unset in the second one.

The exit status of a pipeline is the exit status of the last command in the pipeline.

The shell waits for all commands in the pipeline to terminate before returning
this value and before reading the next command.

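A short script can check the three pipeline rules above: subshells, explicit redirections winning over the pipe, and the status of the last command. This sketch assumes bash (or another POSIX shell) without `set -o pipefail`. The variable and file names are illustrative.

```shell
# Assumes a POSIX sh such as bash, pipefail off; names are illustrative.
# Each part of the pipeline runs in its own subshell,
# so the export does not reach the current shell.
unset TT
export TT=1 | true
echo "${TT-unset}"          # unset

# The explicit '>' is applied after the pipe is connected, so it wins:
# "hi" goes to the file and cat reads nothing from the pipe.
tmp=$(mktemp -d)
through_pipe=$(echo hi > "$tmp/out" | cat)
in_file=$(cat "$tmp/out")
rm -rf "$tmp"

# The status of a pipeline is the status of its last command.
last_ok=0;   false | true || last_ok=$?     # stays 0
last_fail=0; true | false || last_fail=$?   # becomes 1
echo "$last_ok $last_fail"  # 0 1
```
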
#### Lists of Commands

cf. 3.2.4 Lists of Commands

A list is a sequence of one or more pipelines separated by one of the
**operators** `&&` or `||`, and optionally terminated by a newline.

AND and OR lists are sequences of one or more pipelines separated by the control
operators `&&` and `||`, respectively.
AND and OR lists are executed with left associativity.

e.g.

```shell
A && B && C
```

is evaluated as

```shell
(A && B) && C
```

(here the parentheses only indicate the grouping; in real shell syntax they
would also create a subshell).

An AND list has the form

```shell
A && B
```

B is executed if and only if A returns an exit status of 0 (success).

An OR list has the form

```shell
A || B
```

B is executed if and only if A returns a non-zero exit status (failure).

The return status of AND and OR lists is the exit status of the last command
executed in the list.

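These one-liners exercise short-circuiting and left associativity. They were written with bash in mind, but any POSIX shell should behave the same. The expected values are in the comments, and the variable names are illustrative.

```shell
# Assumes a POSIX sh such as bash; variable names are illustrative.
# (false && echo A) || echo B: echo A is skipped, so echo B runs.
r1=$(false && echo A || echo B)      # B
# (true || echo A) && echo C: echo A is skipped, the list succeeds, echo C runs.
r2=$(true || echo A && echo C)       # C
# 'true' never runs, so the list's status is the status of 'false'.
false && true
and_status=$?                        # 1
```
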
#### Group of Commands

cf. 3.2.5 Compound Commands

Each group begins with the **control operator** `(` and ends with the
**control operator** `)`.

Any redirections associated with a _group of commands_ apply to all commands
within that _group of commands_ unless explicitly overridden.

cf. 3.2.5.3 Grouping Commands

When commands are grouped, redirections may be applied to the entire command
list. For example, the output of all the commands in the list may be redirected
to a single stream.

`( LIST )`

The parentheses are operators, and are recognized as separate tokens by the
shell even if they are not separated from the LIST by whitespace.

Placing a list of commands between parentheses forces the shell to create a
_subshell_, and each of the commands in LIST is executed in that subshell
environment. Since the LIST is executed in a subshell, variable assignments do
not remain in effect after the subshell completes.

The exit status of this construct is the exit status of LIST.

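The following checks subshell isolation and group-wide redirection. It is a sketch that assumes bash (or a similar POSIX shell) and a writable temporary directory from `mktemp -d`. The variable and file names are illustrative, and only `&&` is used inside the groups to stay within minishell syntax.

```shell
# Assumes a POSIX sh such as bash; names are illustrative.
tmp=$(mktemp -d)
start_dir=$(pwd)
x=before

# Both the cd and the assignment happen in a subshell and are lost afterwards.
(cd "$tmp" && x=inside)
echo "$x"      # before
pwd            # still the starting directory

# One redirection applies to every command of the group...
(echo one && echo two) > "$tmp/all"
# ...unless a command inside the group overrides it.
(echo three && echo four > "$tmp/other") > "$tmp/mixed"

all=$(cat "$tmp/all")        # "one" and "two" on two lines
mixed=$(cat "$tmp/mixed")    # three
other=$(cat "$tmp/other")    # four
rm -rf "$tmp"
```
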
### Shell Expansion

cf. 3.5 Shell Expansions

Expansion is performed on the command line after it has been split into
**tokens**. Seven kinds of expansion are performed, in the following order:

* brace expansion
* tilde expansion
* parameter and variable expansion
* arithmetic expansion
* command substitution
* word splitting
* filename expansion

Tilde expansion, parameter and variable expansion, arithmetic expansion and
command substitution are performed at the same time, from left to right.

We only have to implement the following kinds:
* parameter expansion
* word splitting
* filename expansion

After these expansions are performed, quote characters present in the original
word are removed unless they have been quoted themselves ("_quote removal_").

Only brace expansion, word splitting, and filename expansion can increase the
number of words of the expansion; other expansions expand a single word to a
single word.

#### Shell Parameter Expansion
cf. 3.5.3 Shell Parameter Expansion

The `$` character introduces parameter expansion, command substitution, or
arithmetic expansion.

The form is `$VAR`, where VAR may only contain the following characters:
* a-z
* A-Z
* _
* 0-9 (not in the first character)

Just noticed an interesting case:
```shell
bash-5.2$ VAR=hello # we set a shell variable (NOT environment variable)
bash-5.2$ VAR=hi env | grep VAR=; env | grep VAR= # here VAR is an environment variable, which is only valid for the next command (the second env returns nothing, confirming that it is not valid for that command)
VAR=hi
bash-5.2$ env | grep VAR= # VAR is not an environment variable
bash-5.2$ echo $VAR # but it is a shell variable
hello
```
Luckily for us, we don't have to handle shell variables, nor do we have to handle `VAR=value` or `VAR=value cmd`.

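The character rules above decide where a variable name stops. This small check assumes bash or another POSIX shell. The variable names are illustrative, and they are unset first so that the environment cannot interfere.

```shell
# Assumes a POSIX sh such as bash; variable names are illustrative.
unset name_txt name2
name=world
a="$name.txt"    # '.' is not a name character: $name, then ".txt"
b="$name_txt"    # '_' is: this looks up a variable called name_txt (unset)
c="$name2"       # digits are allowed after the first character: looks up name2
echo "[$a] [$b] [$c]"    # [world.txt] [] []
```
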
#### Word Splitting
cf. 3.5.7 Word Splitting

The shell scans the results of parameter expansions that did not occur within
double quotes for word splitting.

It splits those results into **words**, treating each of the following
characters (the default value of `IFS`) as a delimiter:
* (space)
* (tab)
* (newline)

Explicit null arguments (`""` or `''`) are retained and passed to commands as
empty strings. Unquoted implicit null arguments, resulting from the expansion
of parameters that have no values, are removed. If a parameter with no value is
expanded within double quotes, a null argument results and is retained and
passed to a command as an empty string. When a quoted null argument appears as
part of a word whose expansion is non-null, the null argument is removed. That
is, the word `-d''` becomes `-d` after word splitting and null argument removal.

Note that if no expansion occurs, no splitting is performed.

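You can observe the splitting and null-argument rules with two tiny helper functions. `count` and `first` are hypothetical names used only for this sketch, which assumes bash (or a POSIX shell) with the default `IFS`.

```shell
# Assumes a POSIX sh such as bash with default IFS;
# 'count' and 'first' are hypothetical helpers for this sketch.
count() { echo "$#"; }        # prints how many arguments it received
first() { printf '%s' "$1"; } # prints its first argument verbatim

v=$(printf 'a  b\tc')         # two spaces and a tab
unset empty

n_split=$(count $v)           # 3: unquoted expansion is split on blanks
n_quoted=$(count "$v")        # 1: no splitting inside double quotes
n_unset=$(count $empty)       # 0: unquoted implicit null argument is removed
n_null=$(count "$empty")      # 1: quoted null argument is kept
n_explicit=$(count '')        # 1: explicit null argument is kept
d=$(first -d'')               # -d: quoted null part of a non-null word is removed
```
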
#### Filename Expansion
cf. 3.5.8 Filename Expansion

Bash scans each word for the character `*`.

If this character appears and is not quoted, the word is regarded as a PATTERN
and replaced with an alphabetically sorted list of filenames matching the
pattern (see _Pattern Matching_). If no matching filenames are found, the word
is left unchanged.

When a pattern is used for filename expansion, the character `.` at the start of
a filename or immediately following a slash must be matched explicitly. In order
to match the filenames `.` and `..`, the pattern must begin with `.`.

When matching a filename, the slash character must always be matched explicitly
by a slash in the pattern.

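The following sketches the dotfile and no-match rules. It assumes bash (or a POSIX shell) with default options, in particular without `nullglob` or `dotglob`. It works in a fresh directory from `mktemp -d`, so the result does not depend on the current directory. The file names are illustrative.

```shell
# Assumes a POSIX sh such as bash, default glob options; names are illustrative.
old=$(pwd)
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch alpha beta .hidden

all=$(echo *)       # alpha beta: sorted, and .hidden is not matched
dots=$(echo .h*)    # .hidden: the leading '.' is matched explicitly
none=$(echo zz*)    # zz*: no match, so the word is left unchanged

cd "$old" || exit 1
rm -rf "$tmp"
```
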
##### Pattern Matching
cf. 3.5.8.1 Pattern Matching

* [ ] Any character that appears in a pattern, other than the special pattern
characters described below, matches itself. The NUL character may not occur in
a pattern.

The special pattern characters have the following meanings:
* `*` matches any string, including the null string.

The special pattern characters must be quoted if they are to be matched
literally.

e.g. this is the required behaviour:
```shell
bash-5.1$ ls *there
'hello*there' 'hi*there' noonethere
bash-5.1$ ls *'*'there
'hello*there' 'hi*there'
```

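The transcript above can become a self-contained check. This version assumes bash (or another POSIX shell) and creates the same illustrative files in a temporary directory, so the result does not depend on where it is run.

```shell
# Assumes a POSIX sh such as bash; the file names mirror the transcript.
old=$(pwd)
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch 'hello*there' 'hi*there' noonethere

any=$(echo *there)           # every name ending in "there"
literal=$(echo *'*'there)    # the quoted '*' only matches a literal '*'

cd "$old" || exit 1
rm -rf "$tmp"
echo "$any"        # hello*there hi*there noonethere
echo "$literal"    # hello*there hi*there
```
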
#### Quote Removal
cf. 3.5.9 Quote Removal

After the preceding expansions, all unquoted occurrences of the characters `'`
and `"` that did not result from one of the above expansions are removed.

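A quote-removal step has to get two cases right. This sketch assumes bash or another POSIX shell, and `q` is an illustrative variable. Quotes written on the command line disappear, and the adjacent pieces form a single word. Quotes that come out of an expansion are kept.

```shell
# Assumes a POSIX sh such as bash; 'q' is an illustrative variable.
joined=$(echo "a"'b'c)   # abc: the quotes are removed, one single word remains
q='"x"'
kept=$(echo $q)          # "x": these quotes result from an expansion, so they stay
```
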
### Redirection
cf. 3.6 Redirections

Before a command is executed, its input and output may be "redirected" using a
special notation interpreted by the shell. "Redirection" allows commands' file
handles to be made to refer to different files, and can change the files the
command reads from and writes to.

The redirection operators may precede or appear anywhere within a simple command
or may follow a command.

e.g. this is the correct behaviour:
```shell
bash-5.1$ ls > hello.txt *here
bash-5.1$ cat hello.txt
hello*there
hi*there
noonethere
```

Redirections are processed in the order they appear, from left to right.

e.g. this is the correct behaviour:
```shell
bash-5.1$ ls > hello.txt share > here.txt *.txt
bash-5.1$ ls -l hello.txt here.txt
-rw-r--r-- 1 kcolin 2024_le-havre 0 Feb 7 15:54 hello.txt
-rw-r--r-- 1 kcolin 2024_le-havre 68 Feb 7 15:54 here.txt
bash-5.1$ cat here.txt
hello.txt
here.txt
log.txt
newlog-strict.txt
newlog.txt

share:
man
```

`<` redirects the standard input (fd 0, `STDIN_FILENO`).

`>` redirects the standard output (fd 1, `STDOUT_FILENO`).

The word following the redirection operator, unless the redirection operator is
`<<`, is subjected to parameter expansion, filename expansion, word splitting,
and quote removal.

If it expands to more than one word, Bash reports an error.

This is the correct behaviour:
```shell
bash-5.1$ var="file1 file2"
bash-5.1$ echo "hello world" > $var
bash: $var: ambiguous redirect
```

If the variable is not defined, the word expands to no word at all, and bash
reports the same error:

```shell
bash-5.1$ echo "hello world" > $nonexist
bash: $nonexist: ambiguous redirect
```

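The placement, ordering and append rules can be checked in a temporary directory. This assumes a POSIX shell such as bash, and the file names are illustrative. The ambiguous-redirect case runs explicitly through `bash`, because other shells (dash, for instance) may not split the target word and may simply create a file named `file1 file2`.

```shell
# Assumes a POSIX sh plus 'bash' in PATH; file names are illustrative.
tmp=$(mktemp -d)

# Redirections may appear anywhere and are processed left to right:
# "first" is created (and truncated), then "second" becomes stdout.
> "$tmp/first" echo hi > "$tmp/second"
first=$(cat "$tmp/first")      # (empty)
second=$(cat "$tmp/second")    # hi

# '>>' appends instead of truncating.
echo again >> "$tmp/second"
appended=$(cat "$tmp/second")  # "hi" and "again" on two lines

# More than one word after expansion: bash refuses ("ambiguous redirect").
amb=0
(cd "$tmp" && bash -c 'var="file1 file2"; echo hello > $var') 2>/dev/null || amb=$?
rm -rf "$tmp"
```
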
#### Here Documents
cf. 3.6.6 Here Documents

This type of redirection instructs the shell to read input from the current
source until a line containing only *word* (with no trailing blanks) is seen. All
of the lines read up to that point are then used as the standard input for a
command.

No parameter and variable expansion, command substitution, arithmetic expansion,
or filename expansion is performed on *word*.

If any part of *word* is quoted, the delimiter is the result of quote removal on
*word*, and the lines in the here-document are not expanded. If *word* is
unquoted, all lines of the here-document are subjected to parameter expansion.

This is the correct behaviour for quoting and parameter expansion:
```shell
bash-5.2$ cat << EOF
> hello
> $$
> EOF
hello
1491742
bash-5.2$ cat << "EOF"
> hello
> $$
> EOF
hello
$$
bash-5.2$ cat << 'E'OF
> hello
> $$
> EOF
hello
$$
bash-5.2$ cat << $USER
> hello
> khais
> $USER
hello
khais
bash-5.2$ echo $USER
khais
bash-5.2$ cat << "$USER"
> $USER
```

The subject says `\` does not need to be handled, so we will not implement this
behaviour:
```shell
bash-5.2$ cat << EOF
> hello \
> world
> EOF
hello world
```

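The quoting rule for the delimiter fits in a short script, assuming bash or another POSIX shell. `x` is an illustrative variable. Each here-document is read inside a command substitution so that its output can be compared.

```shell
# Assumes a POSIX sh such as bash; 'x' is an illustrative variable.
x=expanded

unquoted=$(cat <<EOF
$x
EOF
)
quoted=$(cat <<'EOF'
$x
EOF
)
partly=$(cat <<'E'OF
$x
EOF
)

echo "$unquoted"   # expanded
echo "$quoted"     # $x
echo "$partly"     # $x: quoting any part of the delimiter disables expansion
```
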
### Executing Commands
cf. 3.7 Executing Commands

#### Simple Command Execution
cf. 3.7.1 Simple Command Expansion

When a simple command is executed, the shell performs the following
expansions, assignments, and redirections, from left to right, in the
following order.

1. The words that the parser has marked as redirections are saved for later
   processing.

2. The words that are not redirections are expanded (see _Shell Expansion_).
   If any words remain after expansion, the first word is taken to be the name
   of the command and the remaining words are the arguments.

3. Redirections are performed as described above (see _Redirection_).

If no command name results, redirections are performed, but do not affect the
current shell environment. A redirection error causes the command to exit with
a non-zero status.

If there is a command name left after expansion, execution proceeds as described
below. Otherwise, the command exits with a status of zero.

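A line that contains only a redirection shows the no-command-name case. This sketch assumes bash or another POSIX shell, and the file names are illustrative. The failing redirection is wrapped in a subshell because some shells may treat a redirection error here as fatal for a non-interactive script.

```shell
# Assumes a POSIX sh such as bash; file names are illustrative.
tmp=$(mktemp -d)

# No command name: the redirection is still performed, the status is 0.
> "$tmp/created"
empty_status=$?

# A redirection error gives a non-zero status (subshell keeps the script alive).
redir_status=0
( < "$tmp/missing" ) 2>/dev/null || redir_status=$?

created=no
if [ -e "$tmp/created" ]; then created=yes; fi
rm -rf "$tmp"
```
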
#### Command Search and Execution
cf. 3.7.2 Command Search and Execution

After a command has been split into words, if it results in a simple command and
an optional list of arguments, the following actions are taken.

1. The shell searches for it in the list of shell builtins. If a match is
   found, that builtin is invoked.

2. If the name is not a builtin and contains no slashes, Bash searches each
   element of `$PATH` for a directory containing an executable file by that
   name. If the search is unsuccessful, the shell prints an error message and
   returns an exit status of 127.

3. If the search is successful, or if the command name contains one or more
   slashes, the shell executes the named program in a separate execution
   environment. Argument 0 is set to the name given, and the remaining
   arguments to the command are set to the arguments supplied, if any.

4. If this execution fails because the file is not in executable format, and
   the file is not a directory, it is assumed to be a "shell script" and the
   shell executes it as described in _Shell Scripts_.

   NOTE: we will _maybe_ implement this, we will see. It does not seem to be
   required.

5. The shell waits for the command to complete and collects its exit status.

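You can check the two error statuses of the search step in bash or another POSIX shell. `no_such_command_xyz` is a deliberately non-existent, hypothetical name. The script file is created without execute permission.

```shell
# Assumes a POSIX sh such as bash; 'no_such_command_xyz' is hypothetical.
tmp=$(mktemp -d)

# Not a builtin and not found in any $PATH directory: 127.
not_found=0
no_such_command_xyz 2>/dev/null || not_found=$?

# Contains a slash, so no $PATH search; the file exists but is not executable: 126.
printf 'echo hi\n' > "$tmp/script"
not_exec=0
"$tmp/script" 2>/dev/null || not_exec=$?

rm -rf "$tmp"
echo "$not_found $not_exec"   # 127 126
```
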
#### Subshell
cf. 3.7.3 Command Execution Environment

The shell has an execution environment, which consists of the following:

* open files inherited by the shell at invocation, as modified by redirections
* the current working directory as set by `cd` or inherited by the shell at invocation
* shell variables, passed in the environment

A simple command other than a builtin is invoked in a separate execution
environment, so it cannot affect the shell’s execution environment.

A subshell is a copy of the shell process.

#### Environment
cf. 3.7.4 Environment

When a program is invoked it is given an array of strings called the
"environment". This is a list of name-value pairs, of the form 'name=value'.

Bash provides several ways to manipulate the environment. On invocation, the
shell scans its own environment and creates a parameter for each name found,
automatically marking it for 'export' to child processes. Executed commands
inherit the environment. The 'export' and 'unset' builtins allow parameters to
be added to and deleted from the environment. If the value of a parameter in
the environment is modified using the 'export' builtin, the new value becomes
part of the environment, replacing the old. The environment inherited by any
executed command consists of the shell's initial environment, whose values may
be modified in the shell, less any pairs removed by the 'unset' builtin, plus
any additions via the 'export' command.

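Asking a child `sh` to print a variable tells exported and unexported variables apart. This sketch assumes bash or another POSIX shell. The variable names are illustrative, and they are unset first.

```shell
# Assumes a POSIX sh such as bash; variable names are illustrative.
unset GREETING LOCAL_ONLY

export GREETING=hello
LOCAL_ONLY=x                                        # set, but not exported

seen=$(sh -c 'echo "$GREETING"')                    # hello: exported
not_exported=$(sh -c 'echo "${LOCAL_ONLY-unset}"')  # unset: shell variable only

unset GREETING
after_unset=$(sh -c 'echo "${GREETING-unset}"')     # unset: removed from the environment
```
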
#### Exit Status
cf. 3.7.5 Exit Status

The exit status of an executed command is the value returned by the 'waitpid'
system call or equivalent function. Exit statuses fall between 0 and 255,
though, as explained below, the shell may use values above 125 specially. Exit
statuses from shell builtins and compound commands are also limited to this
range. Under certain circumstances, the shell will use special values to
indicate specific failure modes.

For the shell's purposes, a command which exits with a zero exit status has
succeeded. A non-zero exit status indicates failure. This seemingly
counter-intuitive scheme is used so there is one well-defined way to indicate
success and a variety of ways to indicate various failure modes. When a command
terminates on a fatal signal whose number is N, Bash uses the value 128+N as the
exit status.

If a command is not found, the child process created to execute it returns a
status of 127. If a command is found but is not executable, the return status
is 126.

If a command fails because of an error during expansion or redirection, the exit
status is greater than zero.

All of the Bash builtins return an exit status of zero if they succeed and a
non-zero status on failure, so they may be used by the conditional and list
constructs. All builtins return an exit status of 2 to indicate incorrect
usage, generally invalid options or missing arguments.

The exit status of the last command is available in the special parameter `$?`.

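Only the low 8 bits of an exit status reach the parent, so values above 255 wrap around. This sketch assumes a POSIX `sh` is available, and `wrapped` is an illustrative variable.

```shell
# Assumes a POSIX 'sh' in PATH; 'wrapped' is an illustrative variable.
wrapped=0
sh -c 'exit 300' || wrapped=$?
echo "$wrapped"    # 44, i.e. 300 - 256
```
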
#### Signals
cf. 3.7.6 Signals

When Bash is interactive, it ignores 'SIGTERM' (so that 'kill 0' does not kill
an interactive shell), and 'SIGINT' is caught and handled. When Bash receives a
'SIGINT', it breaks out of any executing loops. In all cases, Bash ignores
'SIGQUIT'. If job control is in effect, Bash ignores 'SIGTTIN', 'SIGTTOU', and
'SIGTSTP'.

NOTE: The rules for when `^C` is printed seem strange; investigate further once
we implement this.

TODO: investigate this further, this seems very complicated

## Definitions
cf. [Bash Reference Manual](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Definitions)
cf. 2 Definitions

**token**
A sequence of characters considered a single unit by the shell. It is either a
word or an operator.

**word**
A sequence of characters treated as a unit by the shell. Words may not include
unquoted metacharacters.

**operator**
A **control operator** or a **redirection operator**.
Operators contain at least one unquoted **metacharacter**.

**control operator**
A token that performs a control function.

It is a newline or one of the following: `|`, `||`, `&&`, `(`, or `)`.

**redirection operator**
For our project:

* `<` redirects input
* `>` redirects output
* `<<` is a here-document with a delimiter. The delimiter is a **word**.
  It does not have to update the history.
* `>>` redirects output in append mode

**blank**
A space or tab character.