From 2cdd540ed7230bd3f6106c1ac71ce33846b5ea66 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kha=C3=AFs=20COLIN?=
Date: Fri, 7 Feb 2025 13:22:51 +0100
Subject: [PATCH] notes: add lots of notes

---
 NOTES.md | 320 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 320 insertions(+)
 create mode 100644 NOTES.md

diff --git a/NOTES.md b/NOTES.md
new file mode 100644
index 0000000..717c529
--- /dev/null
+++ b/NOTES.md
@@ -0,0 +1,320 @@
+# Project notes
+
+cf. [Bash Reference Manual](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html)
+
+Comparative testing with bash should be done with `bash --norc --posix`.
+
+## Shell Operation
+
+cf. 3.1.1 Shell Operation
+
+The shell breaks the input into **words** and **operators**, obeying the *Quoting Rules*.
+These tokens are delimited by **metacharacters**.
+
+It parses the tokens into simple and compound commands (see *Shell Commands*).
+
+It performs the various _Shell Expansions_, breaking the expanded tokens into lists
+of filenames and commands and arguments.
+
+### Quoting Rules
+
+cf. 3.1.2 Quoting
+
+Quoting removes the special meaning of metacharacters.
+
+The quoting mechanisms we have to implement are:
+
+cf. Subject
+
+* Single quotes, which prevent metacharacter interpretation.
+* Double quotes, which prevent metacharacter interpretation except for '$' (see
+  _Shell Parameter Expansion_).
+
+In the Bash Reference Manual, these are defined as follows (keeping only the parts we have to implement):
+
+cf. 3.1.2.2 Single Quotes
+
+Preserves the literal value of each character within the quotes.
+
+cf. 3.1.2.3 Double Quotes
+
+Preserves the literal value of all characters within the quotes, with the exception of '$'.
+
+TODO: The special parameters ‘*’ and ‘@’ have special meaning when in double quotes (see Shell Parameter Expansion).
+See if we have to handle this.
+
+Per the subject: minishell should not interpret unclosed quotes.
+
+### Shell Commands
+
+cf.
3.2 Shell Commands
+
+A Shell Command may be either a *Simple Command*, a *Pipeline*, a *List of
+Commands* (composed of one or more *Pipelines*), or a *Grouped Command*
+(composed of one or more *Lists of Commands*).
+
+#### Simple Commands
+
+cf. 3.2.2 Simple Commands
+
+A simple command is a sequence of words separated by **blanks**, terminated by one of the
+shell’s **control operators**.
+The first **word** specifies a command to be executed, with the rest of the
+**words** being that command’s arguments.
+
+The return status (see _Exit Status_) of a simple command is its exit status as
+provided by the POSIX 1003.1 `waitpid` function, or 128+n if the command was
+terminated by signal n.
+
+#### Pipelines
+
+cf. 3.2.3 Pipelines
+
+A pipeline is a sequence of one or more commands separated by the control operator '|'.
+
+The output of each command in the pipeline is connected via a pipe to the input
+of the next command.
+That is, each command reads the previous command’s output.
+This connection is performed before any redirections specified by the command
+itself.
+
+The shell waits for all commands in the pipeline to complete before reading the next command.
+
+Each command in a multi-command pipeline, where pipes are created, is executed
+in its own _subshell_, which is a separate process.
+
+e.g.
+
+```shell
+export TT=1 | echo $TT
+```
+
+prints an empty line, because `export` runs in its own subshell and TT is never set in the subshell that runs `echo`.
+
+The exit status of a pipeline is the exit status of the last command in the pipeline.
+
+The shell waits for all commands in the pipeline to terminate before returning a value.
+
+#### Lists of Commands
+
+cf. 3.2.4 Lists of Commands
+
+A list is a sequence of one or more pipelines separated by one of the
+**operators** ‘&&’ or ‘||’, and optionally terminated by a newline.
+
+AND and OR lists are sequences of one or more pipelines separated by the control
+operators ‘&&’ and ‘||’, respectively.
+AND and OR lists are executed with left associativity.
+
+e.g.
+
+```shell
+A && B && C
+```
+
+is the same as
+
+```shell
+(A && B) && C
+```
+
+An AND list has the form
+
+```shell
+A && B
+```
+
+B is executed if and only if A has an exit status of 0 (success).
+
+An OR list has the form
+
+```shell
+A || B
+```
+
+B is executed if and only if A has a non-zero exit status (failure).
+
+The return status of AND and OR lists is the exit status of the last command
+executed in the list.
+
+#### Group of Commands
+
+cf. 3.2.5 Compound Commands
+
+Each group begins with the **control operator** '(' and ends with the
+**control operator** ')'.
+
+Any redirections associated with a _group of commands_ apply to all commands
+within that _group of commands_ unless explicitly overridden.
+
+cf. 3.2.5.3 Grouping Commands
+
+When commands are grouped, redirections may be applied to the entire command
+list. For example, the output of all the commands in the list may be redirected
+to a single stream.
+
+`( LIST )`
+
+The parentheses are operators, and are recognized as separate tokens by the
+shell even if they are not separated from the LIST by whitespace.
+
+Placing a list of commands between parentheses forces the shell to create a
+_subshell_, and each of the commands in LIST is executed in that subshell
+environment. Since the LIST is executed in a subshell, variable assignments do
+not remain in effect after the subshell completes.
+
+The exit status of this construct is the exit status of LIST.
+
+### Shell Expansion
+
+cf. 3.5 Shell Expansions
+
+Expansion is performed on the command line after it has been split into
+**tokens**. There are seven kinds of expansion, performed
in the following
+order:
+
+ * brace expansion
+ * tilde expansion
+ * parameter and variable expansion
+ * arithmetic expansion
+ * command substitution (left to right)
+ * word splitting
+ * filename expansion
+
+We only have to implement the following kinds:
+ * parameter expansion
+ * word splitting
+ * filename expansion
+
+After these expansions are performed, quote characters present in the original
+word are removed unless they have been quoted themselves ("_quote removal_").
+
+Only brace expansion, word splitting, and filename expansion can increase the
+number of words of the expansion; other expansions expand a single word to a
+single word.
+
+#### Shell Parameter Expansion
+cf. 3.5.3 Shell Parameter Expansion
+
+The '$' character introduces parameter expansion, command substitution, or
+arithmetic expansion.
+
+The form is $VAR, where VAR may only contain the following characters:
+* a-z
+* A-Z
+* _
+* 0-9 (not as the first character)
+
+#### Word Splitting
+cf. 3.5.7 Word Splitting
+
+The shell scans the results of parameter expansion that did not occur within
+double quotes for word splitting.
+
+The shell splits the results of the other expansions into **words**.
+
+The shell treats the following characters as delimiters:
+* (space)
+* (tab)
+* (newline)
+
+Explicit null arguments ('""' or '''') are retained and passed to commands as
+empty strings. Unquoted implicit null arguments, resulting from the expansion
+of parameters that have no values, are removed. If a parameter with no value is
+expanded within double quotes, a null argument results and is retained and
+passed to a command as an empty string. When a quoted null argument appears as
+part of a word whose expansion is non-null, the null argument is removed. That
+is, the word '-d''' becomes '-d' after word splitting and null argument removal.
+
+Note that if no expansion occurs, no splitting is performed.
+
+#### Filename Expansion
+cf.
3.5.8 Filename Expansion
+
+Bash scans each word for the character '\*'.
+
+If this character appears, and is not quoted, then the word is regarded
+as a PATTERN, and replaced with an alphabetically sorted list of filenames
+matching the pattern (see: _Pattern Matching_). If no matching filenames are
+found, the word is left unchanged.
+
+When a pattern is used for filename expansion, the character '.' at the start of
+a filename or immediately following a slash must be matched explicitly. In order
+to match the filenames '.' and '..', the pattern must begin with '.'.
+
+
+
+#### Quote Removal
+TODO
+
+## Subshell
+
+cf. 3.7.3 Command Execution Environment
+
+The shell has an execution environment, which consists of the following:
+
+* open files inherited by the shell at invocation, as modified by redirections
+
+* the current working directory as set by `cd` or inherited by the shell at invocation
+
+* shell variables, passed in the environment
+
+A command invoked in a separate execution environment (a copy of this one) cannot affect the shell’s
+execution environment.
+
+A subshell is a copy of the shell process.
+
+## Here Documents
+cf. Bash Reference Manual 3.6.6 Here Documents
+
+This type of redirection instructs the shell to read input from the current
+source until a line containing only *word* (with no trailing blanks) is seen. All
+of the lines read up to that point are then used as the standard input for a
+command.
+
+TODO: The following paragraph may not apply fully to our project, check it again!
+
+No parameter and variable expansion, command substitution, arithmetic expansion,
+or filename expansion is performed on word. If any part of word is quoted, the
+delimiter is the result of quote removal on word, and the lines in the
+here-document are not expanded.
If word is unquoted, all lines of the
+here-document are subjected to parameter expansion, command substitution, and
+arithmetic expansion, the character sequence \newline is ignored, and ‘\’ must
+be used to quote the characters ‘\’, ‘$’, and ‘`’.
+
+## Definitions
+cf. [Bash Reference Manual](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Definitions)
+cf. 2 Definitions
+
+**token**
+A sequence of characters considered a single unit by the shell. It is either a
+word or an operator.
+
+**word**
+A sequence of characters treated as a unit by the shell. Words may not include
+unquoted metacharacters.
+
+**operator**
+A **control operator** or a **redirection operator**.
+Operators contain at least one unquoted **metacharacter**.
+
+**control operator**
+A token that performs a control function.
+
+It is a newline or one of the following: ‘|’, ‘||’, ‘&&’, ‘(’, or ‘)’.
+
+**redirection operator**
+For our project:
+
+'<' redirects input.
+
+'>' redirects output.
+
+'<<' introduces a here-document with a delimiter.
+The delimiter is a **word**.
+It does not have to update the history.
+
+'>>' redirects output in append mode.
+
+**blank**
+A space or tab character.
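
The definitions above (token, word, operator, blank) can be sketched as a small tokenizer in C. This is only an illustrative sketch, not the project's actual lexer: the names `t_kind`, `is_blank`, `operator_len` and `next_token` are hypothetical, a lone '&' is assumed to be an ordinary word character because the notes only list ‘&&’, and an unclosed quote is assumed to simply run to the end of the line.

```c
#include <stddef.h>
#include <string.h>

/* Token kinds from the Definitions section (names are hypothetical). */
typedef enum e_kind
{
	TOK_WORD,
	TOK_CONTROL_OP,
	TOK_REDIR_OP,
	TOK_END
}	t_kind;

/* A blank is a space or tab character. */
static int	is_blank(char c)
{
	return (c == ' ' || c == '\t');
}

/* Length of the operator starting at s, or 0 if there is none.
 * Two-character operators are listed first so the longest one wins. */
static size_t	operator_len(const char *s, t_kind *kind)
{
	static const char *const	ops[] = {"||", "&&", "<<", ">>",
		"|", "(", ")", "\n", "<", ">"};
	size_t						i;
	size_t						n;

	i = 0;
	while (i < sizeof(ops) / sizeof(ops[0]))
	{
		n = strlen(ops[i]);
		if (strncmp(s, ops[i], n) == 0)
		{
			if (ops[i][0] == '<' || ops[i][0] == '>')
				*kind = TOK_REDIR_OP;
			else
				*kind = TOK_CONTROL_OP;
			return (n);
		}
		i++;
	}
	return (0);
}

/* Reads the token starting at or after *pos, reports where it starts and
 * how long it is, and moves *pos just past it. Quotes stay in the word:
 * quote removal is a later step. */
t_kind	next_token(const char *line, size_t *pos, size_t *start, size_t *len)
{
	t_kind	kind;
	t_kind	ignored;
	char	quote;
	size_t	i;

	i = *pos;
	while (is_blank(line[i]))
		i++;
	*start = i;
	*len = 0;
	*pos = i;
	if (line[i] == '\0')
		return (TOK_END);
	*len = operator_len(line + i, &kind);
	if (*len == 0)
	{
		kind = TOK_WORD;
		quote = '\0';
		while (line[i] != '\0')
		{
			/* Unquoted blanks and operators end the word. */
			if (quote == '\0' && (is_blank(line[i])
					|| operator_len(line + i, &ignored) > 0))
				break ;
			if (quote == '\0' && (line[i] == '\'' || line[i] == '"'))
				quote = line[i];
			else if (line[i] == quote)
				quote = '\0';
			i++;
		}
		*len = i - *start;
	}
	*pos = *start + *len;
	return (kind);
}
```

Called repeatedly on `echo 'a | b'>>out&&ls`, this sketch yields the word `echo`, the word `'a | b'` (the quoted ‘|’ is not an operator), the redirection operator `>>`, the word `out`, the control operator `&&`, the word `ls`, and then the end marker.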