minishell/NOTES.md
Khaïs COLIN eff1eede66
notes: add redirection section
Signed-off-by: Khaïs COLIN <kcolin@student.42lehavre.fr>
2025-02-07 17:32:16 +01:00

# Project notes
cf. [Bash Reference Manual](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html)
Comparative testing with bash should be done with bash --norc --posix.
## Ideas for testing
* use prysk or shellspec with shell=./minishell
## Shell Operation
cf. 3.1.1 Shell Operation
The shell:
* Breaks the input into **words** and **operators**, obeying the *Quoting Rules*.
  These tokens are delimited by **metacharacters**.
* Parses the tokens into simple and compound commands (see *Shell Commands*).
* Performs the various _Shell Expansions_, breaking the expanded tokens into lists
  of filenames and commands and arguments.
* Performs any necessary _redirections_ and removes the **redirection operators**
  and their operands from the argument list.

TODO: add missing operations
### Quoting Rules
cf. 3.1.2 Quoting
Quoting escapes metacharacters.
The quoting mechanisms we have to implement are:
cf. Subject
* Single quotes, which prevent the interpretation of metacharacters.
* Double quotes, which prevent the interpretation of metacharacters except for '$' (see
_Shell Parameter Expansion_).
In the Bash Reference Manual, these are defined as follows (keeping only the parts we have to implement):
cf. 3.1.2.2 Single Quotes
Preserves the literal value of each character within the quotes.
cf. 3.1.2.3 Double Quotes
Preserves the literal value of all characters within the quotes, with the exception of '$'.
TODO: The special parameters * and @ have special meaning when in double quotes (see Shell Parameter Expansion).
See if we have to handle this
Per the subject: minishell should not interpret unclosed quotes
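The difference between the two quoting mechanisms can be checked with a short sketch like the one below. The variable `name` is hypothetical and only serves the illustration; the expected output is what bash prints, which is the behaviour minishell is meant to reproduce.

```shell
# Hypothetical variable used only for this illustration.
name=world

single='hello $name'   # single quotes: '$' stays literal
double="hello $name"   # double quotes: $name is expanded

printf '%s\n' "$single"   # prints: hello $name
printf '%s\n' "$double"   # prints: hello world
```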
### Shell Commands
cf. 3.2 Shell Commands
A Shell Command may be either a *Simple Command*, a *Pipeline*, a *List of
Commands* (composed of one or more *Pipelines*), or a *Grouped Command*
(composed of one or more *List of Commands*).
#### Simple Commands
cf. 3.2.2 Simple Commands
It is just a sequence of words separated by **blanks**, terminated by one of the
shell's **control operators**.
The first **word** specifies a command to be executed, with the rest of the
**words** being that command's arguments.
The return status (see _Exit Status_) of a simple command is its exit status as
provided by the POSIX 1003.1 waitpid function, or 128+n if the command was
terminated by signal n.
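A minimal sketch of both cases, assuming a `sh` binary is available and that SIGTERM is signal 15 (as on Linux). The variable names are hypothetical. The `|| var=$?` form is only there so that the status is captured even if the script runs under `set -e`.

```shell
# Normal termination: the status is the value passed to exit.
status_exit=0
sh -c 'exit 3' || status_exit=$?

# Termination by a signal: the child shell sends SIGTERM (15) to itself,
# so the reported status is 128 + 15.
status_signal=0
sh -c 'kill -TERM $$' || status_signal=$?

echo "exit: $status_exit, signal: $status_signal"   # exit: 3, signal: 143
```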
#### Pipelines
cf. 3.2.3 Pipelines
A pipeline is a sequence of one or more commands separated by the control operator '|'.
The output of each command in the pipeline is connected via a pipe to the input
of the next command.
That is, each command reads the previous command's output.
This connection is performed before any redirections specified by the first
command.
The shell waits for all commands in the pipeline to complete before reading the next command.
Each command in a multi-command pipeline, where pipes are created, is executed
in its own _subshell_, which is a separate process.
e.g.
```shell
export TT=1 | echo $TT
```
prints an empty line: the export runs in its own subshell, so TT is still unset in the subshell that runs echo.
The exit status of a pipeline is the exit status of the last command in the pipeline.
The shell waits for all commands in the pipeline to terminate before returning a value.
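The exit status rule can be observed with `true` and `false`. This is a small sketch for comparison against bash; the variable names `s1` and `s2` are hypothetical.

```shell
# Only the last command of the pipeline decides the exit status.
s1=0; false | true || s1=$?    # last command succeeds
s2=0; true | false || s2=$?    # last command fails

echo "false | true -> $s1, true | false -> $s2"
# prints: false | true -> 0, true | false -> 1
```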
#### Lists of Commands
cf. 3.2.4 Lists of Commands
A list is a sequence of one or more pipelines separated by one of the
**operators** && or ||, and optionally terminated by a newline.
AND and OR lists are sequences of one or more pipelines separated by the control
operators && and ||, respectively.
AND and OR lists are executed with left associativity.
e.g.
```shell
A && B && C
```
is the same as
```shell
(A && B) && C
```
An AND list has the form
```shell
A && B
```
B is executed if and only if A has an exit status of 0 (success).
An OR list has the form
```shell
A || B
```
B is executed if and only if A has a non-zero exit status (failure).
The return status of AND and OR lists is the exit status of the last command
executed in the list.
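Because && and || have equal precedence and associate to the left, mixed lists may behave differently from what a reader used to C would expect. The sketch below uses command substitution only as a convenient way to capture output (minishell does not have to implement it); `out1` and `out2` are hypothetical names.

```shell
# Parsed as (false && echo B) || echo C: the AND list fails, so C runs.
out1=$(false && echo B || echo C)

# Parsed as (true || echo B) && echo C: the OR list succeeds, so C runs.
# If && bound tighter than ||, nothing would be printed here.
out2=$(true || echo B && echo C)

echo "$out1 $out2"   # prints: C C
```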
#### Group of Commands
cf. 3.2.5 Compound Commands
Each group begins with the **control operator** '(' and ends with the
**control operator** ')'.
Any redirections associated with a _group of commands_ apply to all commands
within that _group of commands_ unless explicitly overridden.
cf. 3.2.5.3 Grouping Commands
When commands are grouped, redirections may be applied to the entire command
list. For example, the output of all the commands in the list may be redirected
to a single stream.
( LIST )
The parentheses are operators, and are recognized as separate tokens by the
shell even if they are not separated from the LIST by whitespace.
Placing a list of commands between parentheses forces the shell to create a
_subshell_, and each of the commands in LIST is executed in that subshell
environment. Since the LIST is executed in a subshell, variable assignments do
not remain in effect after the subshell completes.
The exit status of this construct is the exit status of LIST.
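The three properties above (parentheses are tokens on their own, a redirection applies to the whole group, assignments do not leak out) can be checked with a sketch like this one. It assumes `mktemp` is available; `tmp`, `x` and `captured` are hypothetical names.

```shell
tmp=$(mktemp)   # scratch file that receives the group's output
x=outer

# No blanks are needed around the parentheses: they are operators.
# The single redirection applies to every command of the group.
(x=inner;echo "in group: $x";echo "second line") > "$tmp"

captured=$(cat "$tmp")
rm -f "$tmp"

echo "after group: $x"   # prints: after group: outer
echo "$captured"         # prints the two lines written by the group
```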
### Shell Expansion
cf. 3.5 Shell Expansions
Expansion is performed on the command line after it has been split into
**tokens**. Seven kinds of expansion are performed, in the following
order:
* brace expansion
* tilde expansion
* parameter and variable expansion
* arithmetic expansion
* command substitution (left to right)
* word splitting
* filename expansion
We only have to implement the following kinds:
* parameter expansion
* word splitting
* filename expansion
After these expansions are performed, quote characters present in the original
word are removed unless they have been quoted themselves ("_quote removal_").
Only brace expansion, word splitting, and filename expansion can increase the
number of words of the expansion; other expansions expand a single word to a
single word.
#### Shell Parameter Expansion
cf. 3.5.3 Shell Parameter Expansion
The '$' character introduces parameter expansion, command substitution, or
arithmetic expansion.
The form is $VAR, where VAR may only contain the following characters:
* a-z
* A-Z
* _
* 0-9 (not as the first character)
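The character set above also determines where a variable name ends. A minimal sketch, with the hypothetical variable `USER_1`:

```shell
USER_1=alice        # hypothetical variable
unset USER_1home    # make sure the longer name is really unset

with_dash="$USER_1-home"   # '-' cannot be part of a name, so $USER_1 is expanded
with_text="$USER_1home"    # the name read is USER_1home, which is unset

echo "[$with_dash] [$with_text]"   # prints: [alice-home] []
```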
#### Word Splitting
cf. 3.5.7 Word Splitting
The shell scans the results of parameter expansion that did not occur within
double quotes for word splitting.
The shell splits the results of the other expansions into **words**.
The shell treats the following characters as delimiters:
* (space)
* (tab)
* (newline)
Explicit null arguments (`""` or `''`) are retained and passed to commands as
empty strings. Unquoted implicit null arguments, resulting from the expansion
of parameters that have no values, are removed. If a parameter with no value is
expanded within double quotes, a null argument results and is retained and
passed to a command as an empty string. When a quoted null argument appears as
part of a word whose expansion is non-null, the null argument is removed. That
is, the word `-d''` becomes `-d` after word splitting and null argument removal.
Note that if no expansion occurs, no splitting is performed.
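Counting the arguments a command receives is a simple way to test these rules. The sketch below assumes the default delimiters (space, tab, newline); the helper `count_args` and the variables are hypothetical.

```shell
# Hypothetical helper: prints the number of arguments it received.
count_args() { echo "$#"; }

var="a b   c"
empty=

count_args $var       # 3: the unquoted expansion is split on blanks
count_args "$var"     # 1: no splitting inside double quotes
count_args $empty     # 0: an unquoted implicit null argument is removed
count_args "$empty"   # 1: a quoted null argument is retained
count_args -d''       # 1: the word becomes -d
```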
#### Filename Expansion
cf. 3.5.8 Filename Expansion
Bash scans each word for the character '\*'.
If this character appears, and is not quoted, then the word is regarded
as a PATTERN, and replaced with an alphabetically sorted list of filenames
matching the pattern (see: _Pattern Matching_). If no matching filenames are
found, the word is left unchanged.
When a pattern is used for filename expansion, the character '.' at the start of
a filename or immediately following a slash must be matched explicitly. In order
to match the filenames '.' and '..', the pattern must begin with '.'
When matching a filename, the slash character must always be matched explicitly
by a slash in the pattern.
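These rules can be reproduced in a scratch directory. This is a sketch assuming `mktemp -d` is available and bash's default globbing options; the file names are hypothetical, and command substitution is used only to capture the output.

```shell
dir=$(mktemp -d)   # scratch directory with known contents
touch "$dir/a.txt" "$dir/b.txt" "$dir/.hidden"

visible=$(cd "$dir" && echo *)      # a.txt b.txt (.hidden is skipped)
dotfile=$(cd "$dir" && echo .h*)    # .hidden (leading '.' matched explicitly)
nomatch=$(cd "$dir" && echo *.md)   # *.md (no match: word left unchanged)
quoted=$(cd "$dir" && echo '*')     # * (quoted: not a pattern)

rm -rf "$dir"
printf '%s\n' "$visible" "$dotfile" "$nomatch" "$quoted"
```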
##### Pattern Matching
cf. 3.5.8.1 Pattern Matching
Any character that appears in a pattern, other than the special pattern
characters described below, matches itself. The NUL character may not occur in
a pattern.
The special pattern characters have the following meanings:
* '\*': matches any string, including the null string.
The special pattern characters must be quoted if they are to be matched
literally.
e.g. this is the required behaviour
```shell
bash-5.1$ ls *there
'hello*there' 'hi*there' noonethere
bash-5.1$ ls *'*'there
'hello*there' 'hi*there'
```
#### Quote Removal
cf. 3.5.9 Quote Removal
After the preceding expansions, all unquoted occurrences of the characters `'`
and `"` that did not result from one of the above expansions are removed.
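A short sketch of the distinction between quotes typed on the command line and quotes that come from an expansion. The variable names are hypothetical.

```shell
var='"kept"'   # the value itself contains double quotes

literal=$(printf '%s' "a"'b'c)   # quotes typed in the word are removed
expanded=$(printf '%s' $var)     # quotes produced by an expansion survive

echo "$literal $expanded"   # prints: abc "kept"
```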
### Redirection
cf. 3.6 Redirections
Before a command is executed, its input and output may be "redirected" using a
special notation interpreted by the shell. "Redirection" allows commands' file
handles to be made to refer to different files, and can change the files the
command reads from and writes to.
The redirection operators may precede or appear anywhere within a simple command
or may follow a command.
e.g. this is the correct behaviour
```shell
bash-5.1$ ls > hello.txt *here
bash-5.1$ cat hello.txt
hello*there
hi*there
noonethere
```
Redirections are processed in the order they appear, from left to right.
e.g. this is the correct behaviour
```shell
bash-5.1$ ls > hello.txt share > here.txt *.txt
bash-5.1$ ls -l hello.txt here.txt
-rw-r--r-- 1 kcolin 2024_le-havre 0 Feb 7 15:54 hello.txt
-rw-r--r-- 1 kcolin 2024_le-havre 68 Feb 7 15:54 here.txt
bash-5.1$ cat here.txt
hello.txt
here.txt
log.txt
newlog-strict.txt
newlog.txt

share:
man
```
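The transcripts above depend on the contents of the author's directory. A self-contained variant, assuming `mktemp -d` is available and using hypothetical file names, could look like this:

```shell
dir=$(mktemp -d)   # scratch directory

# Two output redirections on one command, processed left to right:
# first.txt is created (and stays empty), then fd 1 ends up on second.txt.
( cd "$dir" && echo hello > first.txt > second.txt )

# A redirection may also come before the command name.
> "$dir/third.txt" echo before

if [ -e "$dir/first.txt" ] && [ ! -s "$dir/first.txt" ]; then
    first_state="empty"
else
    first_state="unexpected"
fi
second=$(cat "$dir/second.txt")
third=$(cat "$dir/third.txt")
rm -rf "$dir"

echo "first.txt: $first_state, second.txt: $second, third.txt: $third"
# prints: first.txt: empty, second.txt: hello, third.txt: before
```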
'<' refers to the standard input (fd 0, STDIN\_FILENO)
'>' refers to the standard output (fd 1, STDOUT\_FILENO)
TODO: check unless otherwise noted
The word following the redirection operator, unless otherwise noted, is
subjected to parameter expansion, filename expansion, word splitting, and
quote removal.
If it expands to more than one word, Bash reports an error, except in POSIX
mode.
TODO: decide if we follow the POSIX-mode behaviour or the Bash default
Note: this behaviour change appears to be listed in the Bash Reference Manual
under 6.11 Bash POSIX Mode (redirection operators do not perform word splitting
on the word in the redirection).
```shell
# bash behaviour
bash-5.1$ var="file1 file2"
bash-5.1$ echo "hello world" > $var
bash: $var: ambiguous redirect
# posix behaviour
bash-5.1$ var="file1 file2"
bash-5.1$ echo "hello world" > $var
bash-5.1$ cat "$var"
hello world
```
In bash mode and in posix mode, if the variable is not defined, bash prints the
following error:
```shell
bash-5.1$ echo "hello world" > $nonexist
bash: $nonexist: ambiguous redirect
```
## Subshell
cf. 3.7.3 Command Execution Environment
The shell has an execution environment, which consists of the following:
* open files inherited by the shell at invocation, as modified by redirections
* the current working directory as set by cd or inherited by the shell at invocation
* shell variables, passed in the environment

A command invoked in a separate execution environment cannot affect the shell's
execution environment.
A subshell is a copy of the shell process.
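As an illustration, neither a `cd` nor an assignment made inside a subshell is visible in the parent shell. The variable names `start` and `scratch` are hypothetical.

```shell
start=$PWD
scratch=parent

# The subshell works on its own copy of the environment.
( cd / && scratch=child && echo "in subshell: $PWD $scratch" )
# prints: in subshell: / child

echo "in parent: $scratch"   # prints: in parent: parent
[ "$PWD" = "$start" ] && echo "working directory unchanged"
```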
## Here Documents
cf. Bash Reference Manual 3.6.6 Here Documents
This type of redirection instructs the shell to read input from the current
source until a line containing only word (with no trailing blanks) is seen. All
of the lines read up to that point are then used as the standard input for a
command.
TODO: The following paragraph may not apply fully to our project, check it again!
No parameter and variable expansion, command substitution, arithmetic expansion,
or filename expansion is performed on word. If any part of word is quoted, the
delimiter is the result of quote removal on word, and the lines in the
here-document are not expanded. If word is unquoted, all lines of the
here-document are subjected to parameter expansion, command substitution, and
arithmetic expansion, the character sequence \newline is ignored, and \ must
be used to quote the characters \, $, and `.
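The effect of quoting the delimiter can be compared in bash with the sketch below. It only shows bash's behaviour; how much of it applies to minishell is still subject to the TODO above. The variable names are hypothetical.

```shell
name=world   # hypothetical variable

# Unquoted delimiter: the lines of the here-document are expanded.
read -r expanded <<EOF
hello $name
EOF

# Quoted delimiter: the lines are passed through unchanged.
read -r literal <<'EOF'
hello $name
EOF

echo "$expanded"   # prints: hello world
echo "$literal"    # prints: hello $name
```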
## Signal handling
cf. 6.12 Shell Compatibility Mode => compat32
Interrupting a command list such as "a; b; c" causes the execution of the entire
list to be aborted.
## Definitions
cf. [Bash Reference Manual](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Definitions)
cf. 2 Definitions
**token**
A sequence of characters considered a single unit by the shell. It is either a
word or an operator.
**word**
A sequence of characters treated as a unit by the shell. Words may not include
unquoted metacharacters.
**operator**
A **control operator** or a **redirection operator**.
Operators contain at least one unquoted **metacharacter**.
**control operator**
A token that performs a control function.
It is a newline or one of the following: '|', '||', '&&', '(', or ')'.
**redirection operator**
For our project:
* '<' redirects input
* '>' redirects output
* '<<' is a here-document with a delimiter. The delimiter is a **word**. It does not have to update the history.
* '>>' redirects output in append mode
**blank**
A space or tab character