minishell/NOTES.md
2025-02-07 15:43:24 +01:00

Notes about the project

cf. Bash Reference Manual

Comparative testing with bash should be done with bash --norc --posix.

Shell Operation

cf. 3.1.1 Shell Operation

Breaks the input into words and operators, obeying the Quoting Rules. These tokens are delimited by metacharacters.

Parses the tokens into simple and compound commands (see Shell Commands).

Performs the various Shell Expansions, breaking the expanded tokens into lists of filenames and commands and arguments.
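The first step (splitting the input into words and operators while obeying the quoting rules) could look roughly like the C sketch below. It is an illustration under assumptions, not the project's lexer. next_token and operator_len are hypothetical names, and the operator set is the one listed under Definitions. Quotes are kept inside words so that quote removal can happen later.

```c
#include <stddef.h>
#include <string.h>

/* Blanks that separate tokens (newline is a metacharacter too). */
static int is_blank(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Length of the operator at the start of s, or 0 if there is none.
   Operator set taken from the Definitions section of these notes. */
static size_t operator_len(const char *s)
{
    if ((s[0] == '|' && s[1] == '|') || (s[0] == '&' && s[1] == '&')
        || (s[0] == '<' && s[1] == '<') || (s[0] == '>' && s[1] == '>'))
        return 2;
    if (s[0] != '\0' && strchr("|()<>", s[0]) != NULL)
        return 1;
    return 0;
}

/* Copies the next token of *cursor into buf (NUL-terminated, truncated to
   cap - 1 bytes, cap must be at least 1) and advances *cursor past it.
   Returns the token length, or 0 at end of input. Quotes stay in the token
   (quote removal comes later); an unclosed quote runs to the end of input. */
size_t next_token(const char **cursor, char *buf, size_t cap)
{
    const char *s = *cursor;
    size_t      len;
    size_t      n;
    char        quote = 0;

    while (is_blank(*s))
        s++;
    len = operator_len(s);
    if (len == 0)
    {
        /* A word: scan until an unquoted blank or operator. */
        while (s[len] != '\0')
        {
            if (quote != 0)
            {
                if (s[len] == quote)
                    quote = 0;
            }
            else if (s[len] == '\'' || s[len] == '"')
                quote = s[len];
            else if (is_blank(s[len]) || operator_len(s + len) > 0)
                break;
            len++;
        }
    }
    n = len < cap - 1 ? len : cap - 1;
    memcpy(buf, s, n);
    buf[n] = '\0';
    *cursor = s + len;
    return len;
}
```

For the input echo "a b"|cat>>out, successive calls yield echo, "a b", |, cat, >> and out, then 0 at the end of the input.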

Quoting Rules

cf. 3.1.2 Quoting

Quoting removes the special meaning of metacharacters.

The quoting mechanisms we have to implement are:

cf. Subject

  • Single quotes, which prevent the interpretation of all metacharacters.
  • Double quotes, which prevent the interpretation of metacharacters except for '$' (see Shell Parameter Expansion).

In the Bash Reference Manual, these are defined as follows (keeping only the parts we have to implement):

cf. 3.1.2.2 Single Quotes

Preserves the literal value of each character within the quotes.

cf. 3.1.2.3 Double Quotes

Preserves the literal value of all characters within the quotes, with the exception of '$'.

TODO: The special parameters * and @ have special meaning when in double quotes (see Shell Parameter Expansion). See if we have to handle this

Per the subject: minishell should not interpret unclosed quotes
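A simple way to honour this is to check the line for an unclosed quote before parsing. The line can then be rejected (for instance with a syntax error) instead of asking for more input, as interactive bash does. The C sketch below assumes this approach, and has_unclosed_quote is a hypothetical name. It follows the rules above: a double quote is literal inside single quotes, and vice versa.

```c
#include <stdbool.h>

/* Returns true if s contains a quote that is never closed. */
bool has_unclosed_quote(const char *s)
{
    char open = 0; /* quote character currently open, or 0 */

    for (; *s != '\0'; s++)
    {
        if (open == 0 && (*s == '\'' || *s == '"'))
            open = *s;      /* start of a quoted sequence */
        else if (*s == open)
            open = 0;       /* matching closing quote */
        /* anything else, including the other quote kind, is literal */
    }
    return open != 0;
}
```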

Shell Commands

cf. 3.2 Shell Commands

A Shell Command may be either a Simple Command, a Pipeline, a List of Commands (composed of one or more Pipelines), or a Grouped Command (composed of one or more Lists of Commands).

Simple Commands

cf. 3.2.2 Simple Commands

It's just a sequence of words separated by blanks, terminated by one of the shell's control operators. The first word specifies a command to be executed, with the rest of the words being that command's arguments.

The return status (see Exit Status) of a simple command is its exit status as provided by the POSIX 1003.1 waitpid function, or 128+n if the command was terminated by signal n.
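The 128+n rule maps directly onto the POSIX waitpid() status macros. Below is a minimal C sketch. run_in_child() and the two example child bodies are hypothetical helpers for experimenting. The sketch assumes waitpid() is called without WUNTRACED, so a stopped child is never reported.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Converts a waitpid() status into a shell exit status: the exit code if the
   child exited normally, 128 + n if it was terminated by signal n. */
int status_to_exit_code(int wstatus)
{
    if (WIFEXITED(wstatus))
        return WEXITSTATUS(wstatus);
    if (WIFSIGNALED(wstatus))
        return 128 + WTERMSIG(wstatus);
    return -1; /* not reached when waitpid() is called without WUNTRACED */
}

/* Hypothetical helper: forks, runs child_body in the child, waits for it and
   returns its shell exit status, or -1 if fork() or waitpid() fails. */
int run_in_child(void (*child_body)(void))
{
    pid_t pid = fork();
    int   wstatus;

    if (pid == -1)
        return -1;
    if (pid == 0)
    {
        child_body();
        _exit(0);
    }
    if (waitpid(pid, &wstatus, 0) == -1)
        return -1;
    return status_to_exit_code(wstatus);
}

/* Example child bodies. */
static void exit_with_42(void)
{
    _exit(42);
}

static void killed_by_sigkill(void)
{
    raise(SIGKILL);
}
```

With these helpers, run_in_child(exit_with_42) gives 42 and run_in_child(killed_by_sigkill) gives 128 + SIGKILL.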

Pipelines

cf. 3.2.3 Pipelines

A pipeline is a sequence of one or more commands separated by the control operator '|'.

The output of each command in the pipeline is connected via a pipe to the input of the next command. That is, each command reads the previous command's output. This connection is performed before any redirections specified by the command, so an explicit redirection takes precedence over the pipe.

The shell waits for all commands in the pipeline to complete before reading the next command.

Each command in a multi-command pipeline, where pipes are created, is executed in its own subshell, which is a separate process.

e.g.

export TT=1 | echo $TT

prints an empty line (assuming TT was not already set): export runs in its own subshell, so the assignment never reaches the shell nor the subshell that runs echo.

The exit status of a pipeline is the exit status of the last command in the pipeline.

The shell waits for all commands in the pipeline to terminate before returning a value.
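The rules above can be sketched in C with pipe(), fork(), dup2() and execvp(). They are: one process per command, pipes connected before redirections, a wait for every command, and the status of the last one. run_pipeline() is hypothetical and simplified. Commands are plain argv arrays resolved through PATH, there are no redirections or builtins, and error handling is reduced to returning -1. Note that wait() collects any child of the process, which is only acceptable if the shell has no other children running.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs cmds[0] | cmds[1] | ... | cmds[n - 1], where each cmds[i] is a
   NULL-terminated argv resolved through PATH by execvp(). Each command runs
   in its own child process. Waits for all of them and returns the exit
   status of the last one, or -1 on error (no cleanup in this sketch). */
int run_pipeline(char **cmds[], int n)
{
    int   in_fd = STDIN_FILENO; /* read end feeding the next command */
    int   fds[2];
    int   wstatus;
    int   status = -1;
    pid_t last = -1;
    pid_t done;

    for (int i = 0; i < n; i++)
    {
        if (i < n - 1 && pipe(fds) == -1)
            return -1;
        pid_t pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0)
        {
            /* Child: connect the pipes first; redirections (not handled
               here) would be applied after this and override them. */
            if (in_fd != STDIN_FILENO)
            {
                dup2(in_fd, STDIN_FILENO);
                close(in_fd);
            }
            if (i < n - 1)
            {
                close(fds[0]);
                dup2(fds[1], STDOUT_FILENO);
                close(fds[1]);
            }
            execvp(cmds[i][0], cmds[i]);
            _exit(127); /* exec failed */
        }
        /* Parent: drop the pipe ends it no longer needs. */
        if (in_fd != STDIN_FILENO)
            close(in_fd);
        if (i < n - 1)
        {
            close(fds[1]);
            in_fd = fds[0];
        }
        last = pid;
    }
    /* Wait for every child, keeping only the last command's status. */
    while ((done = wait(&wstatus)) > 0)
    {
        if (done == last)
            status = WIFEXITED(wstatus) ? WEXITSTATUS(wstatus)
                                        : 128 + WTERMSIG(wstatus);
    }
    return status;
}
```

With false | true the function returns 0, and with true | false it returns 1, since only the last command's status counts.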

Lists of Commands

cf. 3.2.4 Lists of Commands

A list is a sequence of one or more pipelines separated by one of the operators && or ||, and optionally terminated by a newline.

AND and OR lists are sequences of one or more pipelines separated by the control operators && and ||, respectively. && and || have equal precedence, and AND and OR lists are executed with left associativity.

e.g.

A && B && C

is the same as

(A && B) && C

An AND list has the form

A && B

B is executed if and only if A returns an exit status of 0 (success).

An OR list has the form

A || B

B is executed if and only if A returns a non-zero exit status (failure).

The return status of AND and OR lists is the exit status of the last command executed in the list.
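Because && and || have equal precedence and associate to the left, a flat list can be evaluated in one left-to-right pass, where a skipped pipeline leaves the previous status in place. The C sketch below assumes a hypothetical representation: each pipeline is a callback returning its exit status. fake_true and fake_false are stand-ins for the true and false commands.

```c
#include <stddef.h>

typedef enum { OP_AND, OP_OR } list_op;

/* Hypothetical representation: each pipeline is a callback that runs it
   and returns its exit status. */
typedef int (*pipeline_fn)(void);

/* Evaluates run[0] ops[0] run[1] ops[1] ... run[n - 1] (n >= 1) from left
   to right, i.e. as ((run[0] ops[0] run[1]) ops[1] run[2]) ...
   Returns the exit status of the last pipeline actually executed. */
int run_and_or_list(const pipeline_fn run[], const list_op ops[], size_t n)
{
    int status = run[0]();

    for (size_t i = 1; i < n; i++)
    {
        /* && continues after success, || after failure; a skipped pipeline
           leaves status unchanged for the next operator. */
        if ((ops[i - 1] == OP_AND && status == 0)
            || (ops[i - 1] == OP_OR && status != 0))
            status = run[i]();
    }
    return status;
}

/* Stand-ins for the true and false commands. */
static int fake_true(void)  { return 0; }
static int fake_false(void) { return 1; }
```

For instance, false && true || true runs false, skips true, then runs the last true and returns 0. Likewise, true || false && false is evaluated as (true || false) && false and returns 1.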

Group of Commands

cf. 3.2.5 Compound Commands

Each group begins with the control operator '(' and ends with the control operator ')'.

Any redirections associated with a group of commands apply to all commands within that group of commands unless explicitly overridden.

cf. 3.2.5.3 Grouping Commands

When commands are grouped, redirections may be applied to the entire command list. For example, the output of all the commands in the list may be redirected to a single stream.

( LIST )

The parentheses are operators, and are recognized as separate tokens by the shell even if they are not separated from the LIST by whitespace.

Placing a list of commands between parentheses forces the shell to create a subshell, and each of the commands in LIST is executed in that subshell environment. Since the LIST is executed in a subshell, variable assignments do not remain in effect after the subshell completes.

The exit status of this construct is the exit status of LIST.
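Since ( LIST ) runs in a subshell, i.e. a forked copy of the shell process, any change made inside it stays in the child. The C sketch below illustrates this with a hypothetical run_in_subshell(). The global g_counter stands in for a shell variable; this is an assumption of the example, not how variables would actually be stored.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stands in for a shell variable (assumption of this example). */
static int g_counter = 0;

/* Runs body in a forked copy of the process, as "( LIST )" does, and
   returns its exit status (or -1 on error). Whatever body changes in
   memory is changed in the child's copy only. */
int run_in_subshell(int (*body)(void))
{
    pid_t pid = fork();
    int   wstatus;

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(body());
    if (waitpid(pid, &wstatus, 0) == -1)
        return -1;
    if (WIFEXITED(wstatus))
        return WEXITSTATUS(wstatus);
    return 128 + WTERMSIG(wstatus);
}

/* Example LIST: "assigns" the variable, then fails with status 3. */
static int assign_then_fail(void)
{
    g_counter = 42;
    return 3;
}
```

After run_in_subshell(assign_then_fail) returns 3, g_counter is still 0 in the parent.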

Shell Expansion

cf. 3.5 Shell Expansions

Expansion is performed on the command line after it has been split into tokens. Seven kinds of expansion are performed, in the following order (tilde expansion, parameter and variable expansion, arithmetic expansion and command substitution are done together, in a single left-to-right pass):

  • brace expansion
  • tilde expansion
  • parameter and variable expansion
  • arithmetic expansion
  • command substitution
  • word splitting
  • filename expansion

We only have to implement the following kinds:

  • parameter expansion
  • word splitting
  • filename expansion

After these expansions are performed, quote characters present in the original word are removed unless they have been quoted themselves ("quote removal").

Only brace expansion, word splitting, and filename expansion can increase the number of words of the expansion; other expansions expand a single word to a single word.

Shell Parameter Expansion

cf. 3.5.3 Shell Parameter Expansion

The '$' character introduces parameter expansion, command substitution, or arithmetic expansion.

The form is $VAR, where VAR may only contain the following characters:

  • a-z
  • A-Z
  • _
  • 0-9 (not in the first character)
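The name rules above amount to the pattern [A-Za-z_][A-Za-z0-9_]*. Below is a C sketch, where var_name_len and lookup_var are hypothetical helpers and getenv() stands in for the shell's own variable table. How to treat a '$' that is not followed by a valid name is left to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* a-z, A-Z and '_' anywhere; 0-9 only after the first character. */
static bool is_name_char(char c, bool first)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_'
        || (!first && c >= '0' && c <= '9');
}

/* Length of the variable name at the start of s (the text just after '$'),
   or 0 if s does not start with a valid name. */
size_t var_name_len(const char *s)
{
    size_t len = 0;

    while (s[len] != '\0' && is_name_char(s[len], len == 0))
        len++;
    return len;
}

/* Returns the value of the variable named at the start of s, or "" if it
   is unset; *name_len receives the name length (0 means no valid name).
   Names of 256 characters or more are treated as unset in this sketch.
   getenv() stands in for the shell's own variable table. */
const char *lookup_var(const char *s, size_t *name_len)
{
    char        name[256];
    size_t      len = var_name_len(s);
    const char *value;

    *name_len = len;
    if (len == 0 || len >= sizeof name)
        return "";
    memcpy(name, s, len);
    name[len] = '\0';
    value = getenv(name);
    return value != NULL ? value : "";
}
```

For $HOME/bin, the name is HOME (length 4) and expansion resumes at /bin.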

Word Splitting

cf. 3.5.7 Word Splitting

The shell scans the results of parameter expansion that did not occur within double quotes for word splitting.

It splits those results into words, treating each of the following characters as a delimiter:

  • (space)
  • (tab)
  • (newline)

Explicit null arguments ("" or '') are retained and passed to commands as empty strings. Unquoted implicit null arguments, resulting from the expansion of parameters that have no values, are removed. If a parameter with no value is expanded within double quotes, a null argument results and is retained and passed to a command as an empty string. When a quoted null argument appears as part of a word whose expansion is non-null, the null argument is removed. That is, the word -d'' becomes -d after word splitting and null argument removal.

Note that if no expansion occurs, no splitting is performed.
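The splitting rule for unquoted expansion results can be sketched as follows. split_fields and free_fields are hypothetical names. The sketch assumes its input is exactly the unquoted result of one expansion; in a real word, quoted parts adjacent to the expansion must not be split.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* The delimiters listed above: space, tab and newline. */
static bool is_delim(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Frees an array returned by split_fields(). */
void free_fields(char **fields)
{
    for (size_t i = 0; fields[i] != NULL; i++)
        free(fields[i]);
    free(fields);
}

/* Splits s into fields separated by runs of delimiters. Leading and
   trailing delimiters produce no empty field, so a value that is empty or
   only blanks yields no field at all. Returns a NULL-terminated malloc'ed
   array, or NULL if an allocation fails. */
char **split_fields(const char *s)
{
    size_t      count = 0;
    size_t      i;
    const char *p;
    char      **fields;

    for (p = s; *p != '\0';) /* first pass: count the fields */
    {
        while (is_delim(*p))
            p++;
        if (*p == '\0')
            break;
        count++;
        while (*p != '\0' && !is_delim(*p))
            p++;
    }
    fields = malloc((count + 1) * sizeof *fields);
    if (fields == NULL)
        return NULL;
    for (i = 0, p = s; i < count; i++) /* second pass: copy them */
    {
        while (is_delim(*p))
            p++;
        const char *start = p;
        while (*p != '\0' && !is_delim(*p))
            p++;
        fields[i] = malloc((size_t)(p - start) + 1);
        if (fields[i] == NULL) /* fields[i] == NULL terminates the array */
        {
            free_fields(fields);
            return NULL;
        }
        memcpy(fields[i], start, (size_t)(p - start));
        fields[i][p - start] = '\0';
    }
    fields[count] = NULL;
    return fields;
}
```

Splitting "  ls <tab>-l<newline>/tmp  " gives ls, -l and /tmp. A value made only of blanks gives no field at all, which matches the removal of unquoted implicit null arguments.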

Filename Expansion

cf. 3.5.8 Filename Expansion

Bash scans each word for the character '*'.

If this character appears unquoted, the word is regarded as a PATTERN and replaced with an alphabetically sorted list of filenames matching the pattern (see Pattern Matching). If no matching filenames are found, the word is left unchanged.

When a pattern is used for filename expansion, the character '.' at the start of a filename or immediately following a slash must be matched explicitly. In order to match the filenames '.' and '..', the pattern must begin with '.'.

When matching a filename, the slash character must always be matched explicitly by a slash in the pattern.
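Below is a sketch of the matching and sorting steps for patterns without '/', in the current directory. glob_filter is a hypothetical helper that receives the directory entries (which would normally come from opendir()/readdir()). It applies the leading-'.' rule and sorts with strcmp(), i.e. in byte order, whereas bash sorts according to the current locale. If it returns 0, the word is left unchanged.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* '*' matches any string, including the empty one; any other character
   matches itself. Simple recursion, fine for short patterns. */
static bool star_match(const char *pat, const char *s)
{
    if (*pat == '\0')
        return *s == '\0';
    if (*pat == '*')
        return star_match(pat + 1, s) || (*s != '\0' && star_match(pat, s + 1));
    return *pat == *s && star_match(pat + 1, s + 1);
}

static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Stores in out the names among names[0..n) that match pat, skipping names
   starting with '.' unless pat does too, sorted in byte order.
   out must have room for n pointers. Returns the number of matches. */
size_t glob_filter(const char *pat, char *const names[], size_t n, char *out[])
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (names[i][0] == '.' && pat[0] != '.')
            continue;
        if (star_match(pat, names[i]))
            out[kept++] = names[i];
    }
    qsort(out, kept, sizeof *out, cmp_str);
    return kept;
}
```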

Pattern Matching

cf. 3.5.8.1 Pattern Matching

Any character that appears in a pattern, other than the special pattern characters described below, matches itself. The NUL character may not occur in a pattern.

The only special pattern character we have to handle is '*', which matches any string, including the null string.

The special pattern characters must be quoted if they are to be matched literally.

e.g. this is the required behaviour

bash-5.1$ ls *there
'hello*there'  'hi*there'   noonethere
bash-5.1$ ls *'*'there
'hello*there'  'hi*there'
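To get the behaviour shown above, the matcher has to know which '*' were quoted, and that information is lost once quotes are removed. One way, sketched here, is to record a per-character flag while removing the quotes. The sketch assumes a fixed-size pattern buffer, and make_pattern, pattern_match and MAX_PATTERN are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PATTERN 256 /* hypothetical fixed limit for this sketch */

/* A pattern after quote removal: its characters plus, for each one,
   whether it is an unquoted (hence special) '*'. */
typedef struct
{
    char   chars[MAX_PATTERN];
    bool   wild[MAX_PATTERN];
    size_t len;
} pattern;

/* Builds a pattern from a raw word such as *'*'there, removing the quotes
   and remembering which '*' were quoted. Returns false if too long. */
bool make_pattern(const char *word, pattern *p)
{
    char quote = 0;

    p->len = 0;
    for (; *word != '\0'; word++)
    {
        if (quote == 0 && (*word == '\'' || *word == '"'))
            quote = *word;
        else if (quote != 0 && *word == quote)
            quote = 0;
        else
        {
            if (p->len == MAX_PATTERN)
                return false;
            p->chars[p->len] = *word;
            p->wild[p->len] = (*word == '*' && quote == 0);
            p->len++;
        }
    }
    return true;
}

/* An unquoted '*' matches any string; every other character, including a
   quoted '*', matches itself. */
static bool match_from(const pattern *p, size_t i, const char *s)
{
    if (i == p->len)
        return *s == '\0';
    if (p->wild[i])
        return match_from(p, i + 1, s) || (*s != '\0' && match_from(p, i, s + 1));
    return p->chars[i] == *s && match_from(p, i + 1, s + 1);
}

bool pattern_match(const pattern *p, const char *s)
{
    return match_from(p, 0, s);
}
```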

Quote Removal

cf. 3.5.9 Quote Removal

After the preceding expansions, all unquoted occurrences of the characters ''' and '"' that did not result from one of the above expansions are removed.
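Below is a sketch of quote removal on a single word in C. remove_quotes is a hypothetical helper. It assumes the word has not been expanded yet, so every quote character it sees comes from the original input. In a real implementation, quotes produced by an expansion must be kept, for instance by removing quotes during expansion rather than after it.

```c
#include <string.h>

/* Removes, in place, the quote characters that open and close quoted
   sequences of word; a quote of the other kind inside them is kept. */
void remove_quotes(char *word)
{
    char   quote = 0; /* quote character currently open, or 0 */
    size_t w = 0;

    for (size_t r = 0; word[r] != '\0'; r++)
    {
        if (quote == 0 && strchr("'\"", word[r]) != NULL)
            quote = word[r];
        else if (quote != 0 && word[r] == quote)
            quote = 0;
        else
            word[w++] = word[r];
    }
    word[w] = '\0';
}
```

For example, 'a'"b"c becomes abc, "it's" becomes it's, and -d'' becomes -d.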

Subshell

cf. 3.7.3 Command Execution Environment

The shell has an execution environment, which consists of the following:

open files inherited by the shell at invocation, as modified by redirections

the current working directory as set by cd or inherited by the shell at invocation

shell variables, passed in the environment

A command invoked in a separate environment, such as a subshell, cannot affect the shell's execution environment.

A subshell is a copy of the shell process.

Here Documents

cf. Bash Reference Manual 3.6.6 Here Documents

This type of redirection instructs the shell to read input from the current source until a line containing only the delimiter word (with no trailing blanks) is seen. All of the lines read up to that point are then used as the standard input for a command.

TODO: The following paragraph may not apply fully to our project, check it again!

No parameter and variable expansion, command substitution, arithmetic expansion, or filename expansion is performed on word. If any part of word is quoted, the delimiter is the result of quote removal on word, and the lines in the here-document are not expanded. If word is unquoted, all lines of the here-document are subjected to parameter expansion, command substitution, and arithmetic expansion, the character sequence \newline is ignored, and \ must be used to quote the characters \, $, and `.
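Reading the body of a here-document up to the delimiter line can be sketched with POSIX getline(). read_heredoc is a hypothetical helper that reads from any FILE * (fmemopen() is handy for testing). It performs no expansion and assumes the delimiter has already gone through quote removal. If input ends before the delimiter, the lines read so far are returned; bash does the same, with a warning.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L /* for getline() */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Reads lines from in until one is exactly delim (no trailing blanks) or
   input ends, and returns them joined with their newlines as a malloc'ed
   string; NULL if an allocation fails. No expansion is performed. */
char *read_heredoc(FILE *in, const char *delim)
{
    char   *line = NULL;
    char   *body = calloc(1, 1); /* starts as "" */
    size_t  cap = 0;
    size_t  len = 0;
    ssize_t n;

    while (body != NULL && (n = getline(&line, &cap, in)) != -1)
    {
        if (n > 0 && line[n - 1] == '\n')
            line[--n] = '\0';
        if (strcmp(line, delim) == 0)
            break;
        char *grown = realloc(body, len + (size_t)n + 2);
        if (grown == NULL)
        {
            free(body);
            body = NULL;
            break;
        }
        body = grown;
        memcpy(body + len, line, (size_t)n);
        len += (size_t)n;
        body[len++] = '\n';
        body[len] = '\0';
    }
    free(line);
    return body;
}
```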

Definitions

cf. Bash Reference Manual 2 Definitions

token: A sequence of characters considered a single unit by the shell. It is either a word or an operator.

word: A sequence of characters treated as a unit by the shell. Words may not include unquoted metacharacters.

operator: A control operator or a redirection operator. Operators contain at least one unquoted metacharacter.

control operator: A token that performs a control function.

It is a newline or one of the following: '|', '||', '&&', '(' or ')'.

redirection operator: for our project, one of the following:

'<' redirects input.

'>' redirects output.

'<<' starts a here-document (see Here Documents); the word that follows it is the delimiter. Lines read for the here-document do not have to be added to the history.

'>>' redirects output in append mode.

blank: A space or tab character.